Encryption has been around since ancient times, with early examples dating back to ancient Egypt, where hieroglyphics were used to conceal the meaning of messages. Over the millennia, encryption has evolved and become more complex, but its purpose and basic principles have remained largely unchanged.
Malware encryption is a common evasion and anti-analysis technique. You’ll often see 2 applications of it:
- Encrypted traffic (such as exfiltrated data sent to C2)
- Encrypted strings (hard-coded URLs, IP addresses, and other sensitive details that are part of the malware configuration file).
Both of these convert data that security systems could scan and flag into something that appears random and unrecognizable. XOR encryption is a common method in malware that you will likely encounter sooner or later. Ciphers frequently seen in malware include:
- XOR
- RC4
- AES
- DES
- 3DES (Triple DES)
Since this is our first article explaining encryption basics, we’ll cover the fundamentals and how encryption works, and then take a deep dive into the XOR cipher. You will learn:
- The fundamentals of cryptography
- How some classical ciphers work
- How ciphers have evolved with the advent of digital technology
- What bitwise operations are
- What XOR is and how it works
- How to tell when you encounter an XOR cipher
- How to decrypt it
We’ll conclude with a practical example where we go from detecting malware in ANY.RUN to decrypting command and control (C2) communications encrypted with the XOR cipher.
Feel free to skip ahead to the XOR explanation and practical examples.
Otherwise, read on!
The fundamentals of encryption
So, what is encryption, exactly? At its core, encryption is the process of transforming an input (known as plaintext) into a seemingly random, unreadable set of characters (known as ciphertext) to hide its meaning from unauthorized parties.
Encryption transforms data according to a set of rules (an encryption algorithm) so that the transformation can be reversed by applying a key.
Here’s a breakdown of the main concepts in encryption:
- Plaintext: The original, unencrypted data that is readable and understandable without any processing.
- Ciphertext: The encrypted data that is unreadable and appears as random characters or bytes. It is the output of the encryption process.
- Encryption algorithm: A set of rules by which plaintext is transformed into ciphertext. It usually involves applying a series of substitutions and permutations to the plaintext.
- Key: A piece of information, usually a string of characters or numbers, that is used in conjunction with the encryption algorithm to encrypt and decrypt data.
Symmetric vs. Asymmetric encryption
The encryption methods we’re going to explore in this article, including XOR cipher, all use symmetric encryption. This means that they use the same key to encrypt and decrypt the data. However, there is another type of encryption — asymmetric, or, as it’s sometimes called, public-key cryptography. Here’s how they compare:
Symmetric encryption
- Uses the same key for both encryption and decryption
- Faster and simpler, but requires secure key exchange
Asymmetric encryption
- Uses two related keys: a public key for encryption and a private key for decryption.
- Allows keys to be exchanged more securely, but is slower and more complex.
How encryption works: building up to XOR
Let’s start a little bit from afar. Encryption is easier to explain through examples, so we’ll consider a few below.
Each method we look at will introduce a new concept, and then we’ll put them together like Legos to eventually understand the XOR cipher. Here are the concepts we’ll cover:
- Simple substitution
- Mutating cleartext against a key
- Bitwise operations
Concept #1. Simple substitution
If we create a codebook and replace words, symbols, or even concepts with something random, we get a simple substitution cipher. Look at the table below:
Original | Substitution |
---|---|
ANY | 🤖 |
RUN | 🏃♂️ |
IS | 👈 |
AWESOME | 🤘 |
If we want to encode the message ANY RUN IS AWESOME, we get 🤖 🏃♂️ 👈 🤘. This is actually similar to how ancient Egyptians used hieroglyphics to communicate secretly around 1900 BC.
This idea might seem very basic — and it is — but the concept of replacing one symbol with another according to some rule (or key) is at the heart of all cryptography.
Of course, if we use emojis you can kind of guess at the meaning, so a logical evolution is to introduce more randomness into the output.
Caesar Cipher, a famous example of a real substitution cipher, does this.
In the Caesar Cipher, each letter in the plaintext is shifted by a fixed number of positions up or down the alphabet. Julius Caesar allegedly used this encryption in his private correspondence around 50 BC.
To encrypt the phrase “ANY RUN IS AWESOME” using the Caesar Cipher with a right shift of 3, we get:
Plaintext: | ANY RUN IS AWESOME |
Ciphertext: | DQB UXQ LV DZHVRPH |
However, shift ciphers are incredibly easy to crack. They create predictable patterns and don’t change the frequency of occurring symbols.
- Encryption always uses some kind of rule set to substitute a symbol (or byte) for another one.
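The shift rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `caesar` is our own, and the sketch assumes uppercase A-Z input, leaving every other character untouched.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26 letters A-Z


def caesar(text: str, shift: int) -> str:
    """Shift every A-Z letter by `shift` positions, wrapping around at Z."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)


ciphertext = caesar("ANY RUN IS AWESOME", 3)
print(ciphertext)              # DQB UXQ LV DZHVRPH
print(caesar(ciphertext, -3))  # ANY RUN IS AWESOME

# Only 25 non-trivial shifts exist, so trying them all recovers the message.
candidates = [caesar(ciphertext, -s) for s in range(1, 26)]
print("ANY RUN IS AWESOME" in candidates)  # True
```

The last three lines show why shift ciphers are so weak: the whole key space can be searched by hand.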
Concept #2. Using a key for mutation
To overcome this limitation, cryptographers developed the idea of mutating the plaintext using a key that dictates the logic of each mutation. Let’s look at another example to see this in practice.
We can use a keyword to generate multiple substitution alphabets and thus apply a Caesar Cipher with a different shift to each letter. The shift is determined by the alphabetic position of the corresponding letter in the key.
Let’s encrypt the message ANY.RUN IS AWESOME using this logic. First, we need a key. Let’s choose a keyword, say, CRYPTO. Here’s how it works:
Plaintext | ANY.RUN IS AWESOME |
Key | CRYPTOC RY PTOCRYP |
This encryption method is called the Vigenère cipher, and it dates back to the sixteenth century. In this cipher, for each letter in the plaintext, we find the corresponding letter in the key and shift the plaintext letter by the alphabetic position of the key letter (A=0, B=1, etc.):
Plaintext: | A N Y . R U N I S A W E S O M E |
Key: | C R Y P T O C R Y P T O C R Y P |
Ciphertext: | C E W . K I P Z Q P P S U F K T |
Note that because the plaintext is longer than the key, we had to repeat the key until it matched the length of the plaintext. That’s a vulnerability because it introduces repetition. Remember this, as it also holds true for XOR and can help you detect when it’s used in malware.
- A key dictates the logic of encryption on a per symbol or per bit basis.
- The key should match the plaintext in length, otherwise it makes the encryption easier to crack.
- This rule is also true for modern symmetric encryption methods like XOR.
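Here is a hedged Python sketch of the keyed shift described above. The function name `vigenere` is our own, and the sketch makes two assumptions: the input is uppercase A-Z, and every symbol (including the dot) consumes one key letter. Spaces are removed from the message before encryption.

```python
from itertools import cycle


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each A-Z letter by the position of its key letter (A=0, B=1, ...)."""
    out = []
    # cycle() repeats the key so that every symbol is paired with a key letter.
    for ch, k in zip(text, cycle(key)):
        if "A" <= ch <= "Z":
            shift = ord(k) - ord("A")
            if decrypt:
                shift = -shift
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # non-letters are copied as-is
    return "".join(out)


message = "ANY.RUNISAWESOME"
ciphertext = vigenere(message, "CRYPTO")
print(ciphertext)
print(vigenere(ciphertext, "CRYPTO", decrypt=True))  # ANY.RUNISAWESOME
```

Because `cycle()` simply restarts the key, any plaintext longer than six symbols reuses the same shifts, which is exactly the repetition an analyst can exploit.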
Concept #3. Bitwise operations
Until now we’ve mutated symbols of the English alphabet directly, but digital data is represented in binary code.
If we take the same string ANY.RUN IS AWESOME and represent it in binary, it will look like this:
Plaintext | 8-bit binary |
---|---|
ANY.RUN IS AWESOME | 01000001 01001110 01011001 00101110 01010010 01010101 01001110 00100000 01001001 01010011 00100000 01000001 01010111 01000101 01010011 01001111 01001101 01000101 |
A bitwise operation works directly on individual bits. Various bitwise operations exist, but XOR (which stands for exclusive or) is of special interest to us, because it’s widely used in encryption. It’s reversible and provides a simple way to combine data with a key. Here’s how it works:
XOR (^) | Returns 1 if exactly one of the operand bits is 1, otherwise returns 0 |
In code, the XOR operator is represented by the caret symbol (^).
- A bitwise operation is an operation directly on binary code.
- XOR is a bitwise operation that’s common in encryption.
- In code, the XOR operator is usually represented by the caret symbol (^).
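To make the takeaways concrete, the short Python sketch below prints text as 8-bit binary and applies the `^` operator to individual bits and to whole bytes. The helper name `to_bits` is our own.

```python
def to_bits(data: bytes) -> str:
    """Render each byte as an 8-bit binary string, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)


print(to_bits(b"ANY"))  # 01000001 01001110 01011001

# XOR on single bits: the result is 1 only when the two bits differ.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, a ^ b)

# XOR is its own inverse: applying the same value twice restores the input.
value, key = 0b01000001, 0b01001111  # the letters "A" and "O"
mixed = value ^ key
print(f"{mixed:08b}")        # 00001110
print(mixed ^ key == value)  # True
```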
Now, let’s look at an encryption method that uses bitwise operations. We’ll encrypt the message ANY.RUN IS AWESOME using the key ONETIMEPADCIPHERS followed by a single space, which pads the key to the same 18 bytes as the message.
First, we convert both to binary:
Plaintext | 01000001 01001110 01011001 00101110 01010010 01010101 01001110 00100000 01001001 01010011 00100000 01000001 01010111 01000101 01010011 01001111 01001101 01000101 |
Key | 01001111 01001110 01000101 01010100 01001001 01001101 01000101 01010000 01000001 01000100 01000011 01001001 01010000 01001000 01000101 01010010 01010011 00100000 |
(Note that the binary representation of the key matches the binary representation of the plaintext in length, which makes our encryption more robust.)
Then, we XOR each pair of bits:
Ciphertext | 00001110 00000000 00011100 01111010 00011011 00011000 00001011 01110000 00001000 00010111 01100011 00001000 00000111 00001101 00010110 00011101 00011110 01100101 |
Most of the resulting bytes are non-printable control characters, so instead of converting them back to text, we show them in hexadecimal:
Ciphertext (hex) | 0E 00 1C 7A 1B 18 0B 70 08 17 63 08 07 0D 16 1D 1E 65 |
The resulting string appears totally random. This cipher is called a one-time pad (OTP): it uses a random key of the same length as the plaintext to encrypt the data. The key is used only once and then discarded. As long as the key is truly random, kept secret, and never reused, this encryption is mathematically impossible to crack. Our example uses a readable key only for illustration; a real pad must be random.
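A one-time pad can be sketched in Python as follows. This is an illustration under stated assumptions, not a production design: `xor_bytes` is a hypothetical helper name, and the pad is generated with the standard library’s `secrets.token_bytes`, so the ciphertext differs on every run.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings, one byte pair at a time."""
    if len(data) != len(key):
        raise ValueError("a one-time pad key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"ANY.RUN IS AWESOME"
pad = secrets.token_bytes(len(plaintext))  # fresh random key, used only once

ciphertext = xor_bytes(plaintext, pad)
print(ciphertext.hex(" "))         # random-looking bytes, different every run
print(xor_bytes(ciphertext, pad))  # b'ANY.RUN IS AWESOME'
```

The length check is a deliberate design choice: it refuses to silently repeat a short key, which would turn the pad into the much weaker repeating-key XOR.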
And now all the Lego pieces we need for XOR are in place:
- Replacing one character or bit with another.
- Using a key to dictate the substitution logic.
- Performing operations on binary bits.
- Using a binary XOR operation.
How does XOR cipher work?
Let’s break down the XOR cipher itself. As we discussed above, the XOR operation compares two bits and returns 1 if exactly one of the bits is 1, otherwise it returns 0. Here’s the truth table for XOR:
A | B | A XOR B |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
Let’s say we have a plaintext message “Hello” and a key “Secret”. First, we need to convert both the message and the key to binary:
Text | Binary |
Hello (plaintext) | 01001000 01100101 01101100 01101100 01101111 |
Secret (key) | 01010011 01100101 01100011 01110010 01100101 01110100 |
Now, we XOR each bit of the plaintext with the corresponding bit of the key:
Plaintext | 01001000 01100101 01101100 01101100 01101111 |
Key | 01010011 01100101 01100011 01110010 01100101 01110100 |
Ciphertext | 00011011 00000000 00001111 00011110 00001010 |
To decrypt the ciphertext, we XOR it with the same key, and we get back the original plaintext “Hello”.
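The same round trip can be reproduced with a short Python sketch. The function name `xor_cipher` is our own, and the sketch assumes the key simply repeats when the data is longer than the key.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key, repeating the key when it runs out."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


ciphertext = xor_cipher(b"Hello", b"Secret")
print(ciphertext.hex(" "))                # 1b 00 0f 1e 0a
print(xor_cipher(ciphertext, b"Secret"))  # b'Hello'
```

Encryption and decryption are the same function call, because XORing twice with the same key cancels out. Note that the last byte of the six-byte key goes unused here, since the message is only five bytes long.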
An interesting peculiarity occurs when we XOR with 0. When you XOR a bit with 0, you get the original bit back. This is because:
0 XOR 0 = 0
1 XOR 0 = 1
The same holds the other way around: wherever the input contains a zero byte, the output is simply the corresponding byte of the key. If the key is a string, such as a number written out as text, its bytes show up directly in the hexadecimal view of the output, repeated as many times as necessary to cover the run of zeros in the input.
In practice, in hexadecimal values of XORed data, this manifests in series of repetition, which is a hint that XOR was possibly used. We will see this in action later when we analyze a real example of XOR encryption.
Input (HEX) | 00 00 |
Key (UTF8 String) | 33 53 |
Output (HEX) | 33 53 |
This weakness is most pronounced when XOR is used with short, repeating keys, and it is especially evident in hexadecimal dumps of data that contains a high frequency of zeros.
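The key leak can be demonstrated with a small Python sketch. Both helper names are our own, and the two-byte key `3S` (hex 33 53) is a hypothetical value chosen to match the table above.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key, repeating the key when it runs out."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def shortest_period(chunk: bytes) -> bytes:
    """Return the shortest prefix that, when repeated, reproduces `chunk`."""
    for size in range(1, len(chunk) + 1):
        unit = chunk[:size]
        repeated = (unit * (len(chunk) // size + 1))[: len(chunk)]
        if repeated == chunk:
            return unit
    return chunk


key = b"3S"                         # hypothetical key, hex 33 53
leaked = xor_cipher(bytes(8), key)  # eight zero bytes as input
print(leaked.hex(" "))              # 33 53 33 53 33 53 33 53
print(shortest_period(leaked))      # b'3S'
```

In real traffic, a run of zeros rarely starts exactly at a key boundary, so the repeating unit you recover may be a rotation of the key rather than the key itself.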
Decrypting XOR
Let’s look at an example of XOR obfuscation and encryption in practice, using this recording of an interactive analysis session in ANY.RUN.
In this example, we notice that a process spawned by an executable sends a GET request to a URL for a file with an .mp4 extension. Knowing that the process is malicious, we can guess that it’s downloading some kind of module.
We can click the orange button to open Static Discovering and view the content transmitted with the request.
ANY.RUN has a built-in text transformer which shows HEX values in cleartext. It’s complete gibberish, suggesting encryption. But note the telltale repetition of 5s and 3s. Recalling our earlier discussion, this could hint that the transmission content was encrypted with XOR.
We can make an educated guess that the key involves a sequence of 5s and 3s, but we don’t know the exact length. To figure that out, we need to examine the executable’s decompiled code and find the encryption function holding the key.
In ANY.RUN, you can download the object you’re analyzing by clicking on it in the top-right corner, which opens Static Discovering. Click Download to retrieve the file.
Downloading malware can be dangerous. Only do this if you’re working in a secure environment and know what you’re doing.
Let’s load the sample into dnSpy so we can confirm if it uses XOR and find the key itself. We need to locate where the XOR encryption occurs.
This particular sample lacks further obfuscation, so we easily find the function performing the XOR encryption. In many cases, the code will be more obfuscated, with the key constructed by additional functions rather than stated explicitly.
Once we know the key, we can decrypt this traffic. Let’s download the transmitted file (view the transmission in Static Discovering as above and click Download) and use this CyberChef recipe to decrypt it. Insert the file’s ciphertext as input and set the key to 355.
We then get the decrypted file’s bytes, which tell us that it’s a Windows Portable Executable (PE) file, specifically a DLL.
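If you prefer a script to CyberChef, the same decryption step can be sketched in Python. Several things here are assumptions: the key 355 is treated as an ASCII string, the helper names are our own, and the payload is a synthetic stand-in built inside the script rather than the real download.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key, repeating the key when it runs out."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def looks_like_pe(data: bytes) -> bool:
    """PE files (both EXE and DLL) start with the two-byte DOS magic 'MZ'."""
    return data[:2] == b"MZ"


KEY = b"355"  # assumption: the key is used as an ASCII string

# Stand-in for the downloaded payload: the start of a fake PE header,
# encrypted with KEY. With a real capture you would read the bytes from
# disk instead (for example from a hypothetical file named payload.mp4).
sample = xor_cipher(b"MZ\x90\x00\x03\x00\x00\x00", KEY)

decrypted = xor_cipher(sample, KEY)
print(looks_like_pe(sample))     # False
print(looks_like_pe(decrypted))  # True
```

The `MZ` check only confirms that the output is a PE file. Telling a DLL from an EXE requires parsing the PE header further, which is outside the scope of this sketch.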
Wrapping up
In this article, we explored the fundamentals of encryption, starting with hieroglyphics and simple substitution ciphers and progressing to the XOR cipher.
We learned:
- How encryption transforms plaintext into unreadable ciphertext using an algorithm and a key.
- How encryption methods evolved over time and became more complex and secure.
- Finally, we applied our knowledge to a practical example, demonstrating how to detect and decrypt XOR-encrypted malware communications using ANY.RUN and other tools.
About ANY.RUN
ANY.RUN is a trusted partner for more than 400,000 cybersecurity professionals around the world. Our interactive sandbox simplifies malware analysis of threats targeting both Windows and Linux systems, providing analysts with an advanced tool for investigations. Our threat intelligence products, Lookup and Feeds, offer refined indicators of compromise and context that lets users detect threats and respond to incidents faster.
Advantages of ANY.RUN
ANY.RUN helps you analyze threats faster while improving detection rates. The platform detects common malware families with YARA and Suricata rules and identifies malware behavior with signatures when detection by family is not possible.
With ANY.RUN you can:
- Detect malware in under 40s: ANY.RUN detects malware within about 40 seconds of a file upload. It identifies prevalent malware families using YARA and Suricata rules and uses behavioral signatures to detect malicious actions when you encounter a new threat.
- Interact with samples in real time: ANY.RUN is an interactive cloud sandbox powered by VNC, which means that you can do everything you could on a real system: browse webpages, click through installers, open password-protected archives.
- Save time and money on sandbox setup and maintenance: ANY.RUN’s cloud-based nature eliminates the need for setup or maintenance by your DevOps team, making it a cost-effective solution for businesses.
- Record and study all aspects of malware behavior: ANY.RUN provides a detailed analysis of malware behavior, including network traffic, system calls, and file system changes.
- Collaborate with your team: easily share analysis results or, as a senior team member, check the work of junior analysts by viewing recordings of their analysis sessions.
- Scale as you need: because ANY.RUN is a cloud service, you can scale your team simply by adding more licenses.