Zero-day Attack Uses Corrupted Files to Bypass Detection: Technical Analysis

Recently, our analyst team shared their research into a zero-day attack involving the use of corrupted malicious files to bypass static detection systems. Now, we present a technical analysis of this method and its mechanics. 

In this article, we will:  

  • Demonstrate how attackers corrupt archives, office documents, and other files 
  • Explain how this method successfully evades detection by security systems 
  • Show how corrupted files get recovered by their native applications 

Let’s get started. 

Sandbox Analysis of a Corrupted File Attack

To see how such attacks unfold, we can upload one of the corrupted files used by attackers to ANY.RUN’s sandbox.


Analysis of a corrupted docx file in the ANY.RUN sandbox

Thanks to its interactivity, the sandbox lets us simulate a real scenario in which a user opens the broken malicious file in its corresponding application.

Word asking to restore a corrupted file

In our case, it’s a docx file. When we open it with Word, the program immediately offers us the option to recover the content of the file and successfully does it. 

ANY.RUN allows you to manually open a broken file with Word

Inside, we find a QR code with a phishing link. The sandbox also automatically detects malicious activity and notifies us about this. 


How Corrupted Files Bypass Antivirus Software and Other Automated Solutions

Analysis inside the ANY.RUN sandbox showed how a corrupted file gets restored thanks to Word’s built-in recovery mechanisms, which allows us to identify its malicious nature. 

VirusTotal shows no detections for such corrupted files

Yet, if we submit the same corrupted file to VirusTotal, which provides verdicts from numerous security solutions, we will see zero threat detections. The question is why? 

The answer is simple: most antivirus software and automated tools lack the recovery functionality found in applications such as Word. Without it, they cannot accurately identify the type of the corrupted file, so they fail to detect and mitigate the threat.
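As a rough illustration of a parser with no recovery logic, the sketch below uses Python's standard `zipfile` module as a stand-in. It is not one of the tools discussed in this article, and the helper names `build_sample` and `strict_open` are hypothetical. The sample is a benign archive whose trailing record (the End of Central Directory record described later) is cut off.

```python
import io
import zipfile


def build_sample() -> bytes:
    """Create a small, benign ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("note.txt", "harmless test content")
    return buf.getvalue()


def strict_open(data: bytes) -> str:
    """List members with a parser that has no recovery mode."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return "opened: " + ", ".join(zf.namelist())
    except zipfile.BadZipFile as exc:
        return f"rejected: {exc}"


intact = build_sample()
# Assumption: the archive has no comment, so its trailing record
# is exactly the last 22 bytes of the file.
damaged = intact[:-22]

print(strict_open(intact))
print(strict_open(damaged))
# The member's header and data are still physically present in the file.
print(b"note.txt" in damaged)
```

The parser rejects the damaged file outright even though the member name and its data are still in the byte stream, which is the gap a recovery-capable application fills.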

Docx is not the only file format used by attackers. There are also corrupted archives with malicious files inside, which easily bypass spam filters because security systems cannot view their contents due to corruption.  

Once downloaded onto a system, tools like WinRAR easily restore the damaged archive, making its contents available to the victim. 

Now, let’s see how exactly it works on a technical level. 

Technical Analysis of a Corrupted Word Document 

The Structure of a Word Document 

Since the mid-2000s (for example, OpenOffice.org 2.0, released in 2005), office documents have been structured as archives containing the document’s content.

In the image below, you can see the structure of a Word document. 

Word document structure (Figure 1)

As we can see, all structures within this archive are interconnected, and the chain of references starts at the end of the file.

At the end of the archive, there is a structure called the End of Central Directory Record (EOCD). This structure contains information about the size of the Central Directory File Header (CDFH), its offset, and the total number of entries in the archive. This structure helps locate the CDFH.  
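A minimal sketch of how the EOCD can be located and decoded, assuming a single-disk archive without ZIP64 extensions. The function name `find_eocd` is hypothetical; the field layout follows the standard 22-byte fixed part of the record.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
# signature, disk no., CD start disk, entries on disk, total entries,
# CD size, CD offset, comment length
EOCD_FMT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FMT)  # 22 bytes


def find_eocd(data: bytes) -> dict:
    """Search backward for the EOCD signature and decode the record."""
    pos = data.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        raise ValueError("no End of Central Directory record found")
    (_sig, _disk, _cd_disk, _disk_entries, total_entries,
     cd_size, cd_offset, _comment_len) = struct.unpack_from(EOCD_FMT, data, pos)
    return {
        "eocd_pos": pos,
        "entries": total_entries,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
    }


# Build a benign two-file archive to inspect.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "first")
    zf.writestr("b.txt", "second")
data = buf.getvalue()

info = find_eocd(data)
print(info)
```

In an intact archive, `cd_offset + cd_size` lands exactly on the EOCD itself, and the bytes at `cd_offset` begin with the central directory signature.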

The CDFH duplicates the data stored in the Local File Header (LFH) and stores the offset to it. It does not contain the compressed data itself but rather represents the hierarchy of files within the archive. This part of the structure allows you to find the LFH of each file in the archive.
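The following sketch walks the central directory and prints, for each entry, the file name and the offset of its LFH. It assumes an archive with no comment (so the EOCD is the last 22 bytes) and no ZIP64 extensions; `iter_central_directory` is a hypothetical helper name.

```python
import io
import struct
import zipfile

CDFH_SIG = b"PK\x01\x02"
# signature, 6 short fields, CRC32 + two sizes, 5 short fields,
# external attributes, LFH offset
CDFH_FMT = "<4s6H3L5H2L"
CDFH_SIZE = struct.calcsize(CDFH_FMT)  # 46 bytes


def iter_central_directory(data: bytes, cd_offset: int, entries: int):
    """Yield name, sizes, and LFH offset for each central directory entry."""
    pos = cd_offset
    for _ in range(entries):
        fields = struct.unpack_from(CDFH_FMT, data, pos)
        if fields[0] != CDFH_SIG:
            raise ValueError(f"bad central directory signature at {pos}")
        _crc32, csize, usize = fields[7:10]
        name_len, extra_len, comment_len = fields[10:13]
        lfh_offset = fields[16]
        name = data[pos + CDFH_SIZE:pos + CDFH_SIZE + name_len]
        yield {
            "name": name.decode("utf-8", "replace"),
            "compressed_size": csize,
            "uncompressed_size": usize,
            "lfh_offset": lfh_offset,
        }
        # Variable-length name, extra field, and comment follow the fixed part.
        pos += CDFH_SIZE + name_len + extra_len + comment_len


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<w:document/>")
data = buf.getvalue()

# Assumption: no archive comment, so the EOCD is the final 22 bytes.
eocd = struct.unpack_from("<4s4H2LH", data, len(data) - 22)
entries, cd_offset = eocd[4], eocd[6]

listing = list(iter_central_directory(data, cd_offset, entries))
for entry in listing:
    print(entry)
```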

The LFH is considered the header for each file in the archive. It contains important data such as the file name, compressed and uncompressed sizes, CRC32 checksum, and other parameters.  

The compressed data is located after the header. 
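To make the LFH layout concrete, here is a sketch that decodes one header and extracts the data that follows it. The name `read_member` is hypothetical, and the code assumes the sizes are stored in the header itself (no trailing data descriptor) and that the member is either stored or deflated. The header offset is taken from `ZipInfo.header_offset` to keep the example short.

```python
import io
import struct
import zipfile
import zlib

LFH_SIG = b"PK\x03\x04"
# signature, version, flags, method, time, date,
# CRC32, compressed size, uncompressed size, name length, extra length
LFH_FMT = "<4s5H3L2H"
LFH_SIZE = struct.calcsize(LFH_FMT)  # 30 bytes


def read_member(data: bytes, lfh_offset: int):
    """Decode the LFH at lfh_offset and return (name, content)."""
    (sig, _ver, _flags, method, _time, _date,
     crc32, csize, _usize, name_len, extra_len) = struct.unpack_from(
        LFH_FMT, data, lfh_offset)
    if sig != LFH_SIG:
        raise ValueError("not a Local File Header")
    name_start = lfh_offset + LFH_SIZE
    name = data[name_start:name_start + name_len].decode("utf-8", "replace")
    # The compressed data begins right after the name and extra field.
    start = name_start + name_len + extra_len
    raw = data[start:start + csize]
    if method == 8:            # deflate, stored as a raw stream
        content = zlib.decompress(raw, wbits=-15)
    elif method == 0:          # stored without compression
        content = raw
    else:
        raise ValueError(f"unsupported compression method {method}")
    if zlib.crc32(content) != crc32:
        raise ValueError("CRC32 mismatch")
    return name, content


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("word/document.xml", "<w:document/>")
    offset = zf.getinfo("word/document.xml").header_offset
data = buf.getvalue()

name, content = read_member(data, offset)
print(name, content)
```

Note that nothing in `read_member` depends on the central directory or the EOCD: the header alone carries enough information to extract the file.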

How the File Structure Can Be Manipulated by Attackers 

As shown in the image above (Figure 1), the archive is structured backward, starting with the end, while all parts are linked together.  

This has led us to test three different hypotheses (Figure 2): 

Three hypotheses we tested (Figure 2)

1. Can Word or an archiving program recover and successfully open a file if additional data is added to the beginning of the archive? 

2. Can Word or an archiving program recover and successfully open a file if we corrupt the linking between the parts and delete the CDFH, which does not contain the file data itself?  

3. Can Word or an archiving program recover and successfully open a file if we corrupt the linking between the parts and erase the EOCD, which is a crucial part of the recovery process? 

You can see the results of our hypothesis testing in the table below.

Hypothesis     Word                                               ZIP (archiving program)
Hypothesis 1   Success                                            Fail (the file is no longer an archive)
Hypothesis 2   Success                                            Success
Hypothesis 3   Success (thanks to undamaged Local File Headers)   Success (thanks to undamaged Local File Headers)
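For readers who want to check how their own parser or scanner behaves, the three hypotheses can be turned into test fixtures built from a benign archive. This is only a sketch of one possible way to produce such samples, not necessarily how the files in our tests were made; all function names are hypothetical, and the archive is assumed to have no ZIP64 extensions.

```python
import io
import struct
import zipfile


def build_sample() -> bytes:
    """Create a small, benign ZIP archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("note.txt", "harmless test content")
    return buf.getvalue()


def split_archive(data: bytes):
    """Split an archive into (file entries, central directory, EOCD)."""
    eocd_pos = data.rfind(b"PK\x05\x06")
    if eocd_pos < 0:
        raise ValueError("no End of Central Directory record found")
    # Central directory size and offset sit 12 bytes into the EOCD.
    cd_size, cd_offset = struct.unpack_from("<2L", data, eocd_pos + 12)
    return (data[:cd_offset],
            data[cd_offset:cd_offset + cd_size],
            data[eocd_pos:])


def variant_prefixed(data: bytes, junk: bytes = b"\x00" * 16) -> bytes:
    """Hypothesis 1: extra bytes in front of the archive."""
    return junk + data


def variant_without_cdfh(data: bytes) -> bytes:
    """Hypothesis 2: central directory removed, EOCD left with stale offsets."""
    files, _cd, eocd = split_archive(data)
    return files + eocd


def variant_without_eocd(data: bytes) -> bytes:
    """Hypothesis 3: EOCD removed, so nothing points to the central directory."""
    files, cd, _eocd = split_archive(data)
    return files + cd


sample = build_sample()
h1 = variant_prefixed(sample)
h2 = variant_without_cdfh(sample)
h3 = variant_without_eocd(sample)
for label, blob in (("H1", h1), ("H2", h2), ("H3", h3)):
    print(label, len(blob), blob[:4])
```

In all three variants the Local File Header and the file data are untouched; only the structures that lead to them are damaged or displaced.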

During our hypothesis testing, we’ve made several noteworthy observations: 

1. For minimal recovery of a Word document, the following files are essential:

  • [Content_Types].xml
  • word/document.xml
  • word/_rels/document.xml.rels
  • _rels/.rels

These contain crucial information regarding the relationships between elements and form the standard file hierarchy required for Word to interpret the document.

2. A ZIP archive with corrupted Local File Headers will only show the file structure. The actual file content will be empty. 

3. If the end part of the ZIP file is damaged, the archiving software and Word will attempt to use an alternative recovery method: leveraging intact Local File Headers.
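The first observation can be expressed as a simple presence check. The sketch below only verifies that the four parts exist in an archive, using the lowercase `word/` folder as it conventionally appears inside a .docx; it does not validate their XML, and the placeholder content used here would not open in Word. `missing_parts` and `build_archive` are hypothetical helper names.

```python
import io
import zipfile

# Parts listed in observation 1.
REQUIRED_PARTS = (
    "[Content_Types].xml",
    "word/document.xml",
    "word/_rels/document.xml.rels",
    "_rels/.rels",
)


def missing_parts(docx_bytes: bytes) -> list:
    """Return the required parts that are absent from the archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        present = set(zf.namelist())
    return [part for part in REQUIRED_PARTS if part not in present]


def build_archive(names) -> bytes:
    """Build a benign archive containing placeholder files with these names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "<placeholder/>")
    return buf.getvalue()


complete = build_archive(REQUIRED_PARTS)
partial = build_archive(REQUIRED_PARTS[:2])

print(missing_parts(complete))
print(missing_parts(partial))
```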

Our findings demonstrate that Word is more resilient to file corruption than ZIP archiving software. Word successfully recovered files with a corrupted CDFH, a corrupted EOCD, and even with random bytes added to create a non-existent LFH structure. ZIP failed on the first hypothesis, where random bytes were added to the beginning of the file.
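The recovery path based on intact Local File Headers can be approximated with a signature scan. The sketch below is an assumption about the general approach, not a description of how Word or any archiver actually implements it. `recover_by_local_headers` is a hypothetical name, and the code handles only stored or deflated members whose sizes are recorded in the header.

```python
import io
import struct
import zipfile
import zlib

LFH_SIG = b"PK\x03\x04"
LFH_FMT = "<4s5H3L2H"
LFH_SIZE = struct.calcsize(LFH_FMT)  # 30 bytes


def recover_by_local_headers(data: bytes) -> dict:
    """Scan for LFH signatures and extract every member that passes its CRC."""
    recovered = {}
    pos = data.find(LFH_SIG)
    while pos != -1:
        if pos + LFH_SIZE <= len(data):
            (_sig, _ver, flags, method, _time, _date, crc32,
             csize, _usize, name_len, extra_len) = struct.unpack_from(
                LFH_FMT, data, pos)
            name_end = pos + LFH_SIZE + name_len
            start = name_end + extra_len
            raw = data[start:start + csize]
            # Flag bit 3 means sizes live in a trailing data descriptor;
            # this sketch skips such entries.
            if not flags & 0x08 and len(raw) == csize and method in (0, 8):
                try:
                    content = (zlib.decompress(raw, wbits=-15)
                               if method == 8 else raw)
                except zlib.error:
                    content = None
                if content is not None and zlib.crc32(content) == crc32:
                    name = data[pos + LFH_SIZE:name_end]
                    recovered[name.decode("utf-8", "replace")] = content
        pos = data.find(LFH_SIG, pos + 4)
    return recovered


# Benign sample: strip the central directory and EOCD, then prepend junk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("word/document.xml", "<w:document/>")
    zf.writestr("_rels/.rels", "<Relationships/>")
data = buf.getvalue()
# Assumption: no archive comment, so the EOCD is the final 22 bytes.
cd_offset = struct.unpack_from("<L", data, len(data) - 22 + 16)[0]
damaged = b"\x00" * 8 + data[:cd_offset]

print(zipfile.is_zipfile(io.BytesIO(damaged)))
print(recover_by_local_headers(damaged))
```

A scanner that adds a fallback of this kind can still reach the contents of an archive that a strict, EOCD-first parser refuses to open.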

Why Security Systems Fail to Read Corrupted Files 

Security systems attempt to identify file types in several ways, including by checking the magic bytes in file headers. In the case of office documents and ZIP archives, because the file is effectively read from the end, an attacker can corrupt the archive structure and the magic bytes, making it difficult for detection systems to identify the file type.

This leads to the inability to unpack and inspect the contents. 
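A small sketch of the identification problem, using two hypothetical checks: `naive_is_zip` looks only at the first four bytes, as a simple magic-byte test would, while `tolerant_is_zip` looks for a Local File Header signature anywhere in the file. Real products use more elaborate logic, so this is an illustration of the principle only.

```python
import io
import zipfile

LFH_SIG = b"PK\x03\x04"


def naive_is_zip(data: bytes) -> bool:
    """Magic-byte check: the file must start with the LFH signature."""
    return data[:4] == LFH_SIG


def tolerant_is_zip(data: bytes) -> bool:
    """Looser check: an LFH signature anywhere in the file is enough.

    This is prone to false positives and is shown only for contrast.
    """
    return LFH_SIG in data


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("note.txt", "harmless test content")
intact = buf.getvalue()

# Benign sample with extra bytes in front of the archive.
prefixed = b"\x00" * 16 + intact

for label, blob in (("intact", intact), ("prefixed", prefixed)):
    print(label, naive_is_zip(blob), tolerant_is_zip(blob))
```

Once the first check fails, a pipeline that routes files by detected type never hands the sample to its archive unpacker, so the contents are never inspected.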

Consider this email with a corrupted Word document attached.

ANY.RUN’s Sandbox identifies malicious activity of the corrupted file

The sandbox once again has no problem detecting the threat, returning a “malicious activity” verdict.

Only one detection in VirusTotal

But when the same file is submitted to VirusTotal, only a single detection comes back.


Conclusion

Our study revealed a weakness in how document and archive structures are handled. By manipulating specific components like the CDFH and EOCD, attackers can create corrupted files that are successfully repaired by applications but remain undetected by security software. As a result, security systems have not yet developed clear logic for detecting such attacks, which leaves their users exposed.



khr0x
Malware analyst at ANY.RUN

I'm 21 years old and have worked as a malware analyst for more than a year. I like finding out what kind of malware got on my computer. In my spare time I do sports and play video games.
