We Asked ChatGPT to Analyze Malware. It Failed.

If ChatGPT is an excellent assistant for building malware, can it help analyze it too? The team behind the ANY.RUN malware sandbox decided to put this to the test.

Lately, there has been a great deal of discussion about malicious actors using ChatGPT, the latest conversational AI, to create malware.

Malware analysts, researchers, and IT specialists agree that writing code is one of GPT's greatest strengths, and that it is especially good at mutating code. By leveraging this capability, even would-be hackers can apparently build polymorphic malware simply by feeding text prompts to the bot, which spits back working malicious code.

OpenAI released ChatGPT in November 2022, and at the time of writing this article, the chatbot already has over 600 million monthly visits, according to SimilarWeb. It’s scary to think how many people are being armed with the tools to develop advanced malware.

Going into this, our hopes were high, but unfortunately, the results weren’t that great.

How did we test ChatGPT?

We fed the chatbot malicious scripts of varying complexity and asked it to explain the purpose behind the code.

We used simple prompts such as “explain what this code does” or “analyze this code”.
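As a rough illustration of how such a prompt might be assembled before pasting it into the chat window, here is a minimal sketch. The `build_prompt` helper and the `BEGIN SCRIPT` / `END SCRIPT` delimiters are hypothetical and were not part of the test; only the two instruction strings come from the text above.

```python
# Instruction strings taken from the article; everything else is illustrative.
PROMPTS = ["Explain what this code does", "Analyze this code"]


def build_prompt(instruction: str, code: str) -> str:
    """Combine a plain-language instruction with the script to analyze."""
    # The delimiters are an assumption: they only mark where the
    # instruction ends and the pasted sample begins.
    return (
        f"{instruction.strip()}\n\n"
        "--- BEGIN SCRIPT ---\n"
        f"{code.strip()}\n"
        "--- END SCRIPT ---"
    )
```

Calling `build_prompt(PROMPTS[0], script_text)` returns a single string that can be pasted as one message.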

ChatGPT can recognize and explain simple malware

Based on our testing, ChatGPT can recognize and explain malicious code, but only when the scripts are simple.

The first example we asked it to analyze is a code snippet that hides drives from the Windows Explorer interface. That is exactly what GPT told us when we pasted the code with this prompt: What does this script do?

The bot was able to give a fairly detailed explanation:

ChatGPT identifies simple malicious scripts.

So far so good. The AI understands the purpose of the code, highlights its malicious intent and logically lays out what it does step-by-step.
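The article does not say how the sample hides drives. One common mechanism on Windows is the `NoDrives` policy value under `HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Explorer`, a bitmask in which bit 0 stands for A:, bit 1 for B:, and so on up to Z:. Assuming a script of this kind sets that value, an analyst could decode it with a small helper like the hypothetical `hidden_drives` below.

```python
import string


def hidden_drives(no_drives: int) -> list[str]:
    """Return the drive letters hidden by a NoDrives bitmask (bit 0 = A, bit 25 = Z)."""
    return [
        letter
        for bit, letter in enumerate(string.ascii_uppercase)
        if no_drives & (1 << bit)
    ]
```

For example, a value of 12 (binary 1100) sets bits 2 and 3, which correspond to C: and D:.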

But let’s try something a bit more complex. We pasted code from this task, using the same prompt.

ChatGPT was able to understand what the code does and, again, gave us a fairly detailed explanation, correctly identifying that we’re dealing with a fake ransomware attack. Here’s the answer that it generated:

ChatGPT answer for a ransomware script's analysis

We like how GPT explains the end goal of the code and paints a compelling picture of the aftermath of its execution.

We also tested it with this task — a similar one — and the answer was about the same: comprehensive enough and correct.

Not bad so far. Let's keep going.

ChatGPT struggles in real-life situations

The performance the AI has shown so far is impressive, no doubt about it. But let's be honest: in a real-life situation, you usually won't be dealing with code as simple as in the previous examples.

So for the next couple of tests, we ramped up the complexity and provided it with code that is closer to what you can expect to be asked to analyze on the job.

Unfortunately, ChatGPT just couldn't keep up.

In this task, the code ended up being too large and the AI straight up refused to analyze it. And when we took obfuscated code from this example and asked the chatbot to deobfuscate it, it threw an error.
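One obvious workaround for an input size limit is to split the script into smaller pieces and submit them one at a time. The sketch below is not something tried in this test, and there is no evidence here that it would have produced a useful analysis, since each chunk loses the context of the others. The `split_into_chunks` name and the 4000-character default are assumptions chosen for illustration.

```python
def split_into_chunks(code: str, max_chars: int = 4000) -> list[str]:
    """Split code into pieces of at most max_chars, breaking on line boundaries."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for line in code.splitlines(keepends=True):
        # A single line longer than the limit is cut into fixed-size slices.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when adding this line would exceed the limit.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Joining the returned chunks reproduces the original text, so nothing is dropped. Minified or obfuscated scripts often sit on a single very long line, which is why the over-long-line case is handled separately.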

After a bit of tinkering and trying different prompts, we got it to work, but the answer wasn’t what we had hoped for:

Instead of trying to deobfuscate the script, it just told us that the code is not human-readable, which is something we already knew. Unfortunately, there's no value in this answer.
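To give a sense of what a deobfuscation step can look like, here is a minimal sketch for one very simple case: a payload wrapped in several layers of Base64. This is purely an assumption for illustration. The obfuscation used in the sample above is not described here and is likely more involved. The `peel_base64_layers` helper is hypothetical.

```python
import base64
import binascii


def peel_base64_layers(data: str, max_layers: int = 10) -> tuple[str, int]:
    """Repeatedly Base64-decode data while the result is still valid UTF-8 text.

    Returns the innermost text and the number of layers removed.
    """
    layers = 0
    current = data.strip()
    while layers < max_layers:
        try:
            # validate=True rejects any character outside the Base64 alphabet.
            decoded = base64.b64decode(current, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError, ValueError):
            break  # not Base64, or not text: stop peeling
        if not decoded:
            break
        current = decoded.strip()
        layers += 1
    return current, layers
```

The heuristic is crude: a short plain-text string that happens to consist only of Base64 characters could be decoded by mistake, so the result still needs a human check.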

Wrapping up

As long as you provide ChatGPT with simple samples, it is able to explain them in a relatively useful way. But as soon as we get closer to real-world scenarios, the AI just breaks down. In our experience, at least, we weren't able to get anything of value out of it.

It seems that either there is an imbalance and the tool is more useful to red-teamers and hackers, or the articles warning about its use for creating advanced malware are overhyping what it can do.

In any case, bearing in mind how quickly this technology has developed, it’s worth keeping an eye on how it’s progressing. Chances are that in a couple of updates, it will be a lot more useful.

But for now, as far as coding goes, it is best used for helping cybersecurity specialists write simple Bash or Python scripts slightly faster and for light debugging.
