I’ve picked up on several patterns in which malware authors try to hide function calls, and a major one is hiding them behind other function calls. To analysts who know the operating system’s API and system calls well, this is not a big deal, but against those who do not, it is an effective way to hide behavior in code. When reading disassembly in either a debugger or a disassembler, it is very useful to pay attention to the assembly “call” instructions, because they indicate API, system, or application functions being run. Sometimes, however, no “call” instruction leads directly to any malicious code, yet the malware still performs malicious behavior. This is because a very popular trick is to pass the memory address of the malicious code to another, more benign-looking API call or function, which covertly transfers execution to that code at some point. Some API calls that can be used for this are:
- SetWindowsHookEx – Not especially stealthy, but it lets an author install a hook procedure (essentially an event handler) that the system calls when events of the specified hook type occur.
- CreateThread – Instead of a “call” on the current thread, CreateThread starts a new thread at a function address passed to it, and that thread then executes the function.
- CreateRemoteThread – Same as above, but the thread is created in another process.
- CreateProcess – This isn’t very stealthy, but it’s another way nonetheless.
- WinExec – An older API, retained for compatibility with 16-bit Windows, that launches an executable.
- cmd.exe – Often used in conjunction with other methods, this indicates that other processes are being started via the command line.
- COM library calls and other, similar operating system services which allow remote function calling. In a nutshell, these allow applications to call functions within other applications by identifier (for example, a GUID) rather than by name, which avoids any obvious API-name strings. 1
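As a rough illustration of how an analyst might use this list, here is a minimal Python sketch that flags any of the API names above in a list of imported function names. It assumes the import names have already been extracted by some other tool. The names `INDIRECT_EXECUTION_APIS`, `normalize`, and `flag_indirect_execution` are hypothetical, as are the short reason labels. cmd.exe and COM are left out because they do not show up as a single import name.

```python
# API names come from the list above; the reason labels are illustrative only.
INDIRECT_EXECUTION_APIS = {
    "SetWindowsHookEx": "hook callback",
    "CreateThread": "new thread at supplied address",
    "CreateRemoteThread": "new thread in another process",
    "CreateProcess": "new process",
    "WinExec": "new process (legacy API)",
}


def normalize(import_name):
    """Strip a trailing A/W (ANSI/Unicode variant) if the base name is in our table."""
    if import_name[-1:] in ("A", "W") and import_name[:-1] in INDIRECT_EXECUTION_APIS:
        return import_name[:-1]
    return import_name


def flag_indirect_execution(imports):
    """Return (import_name, reason) pairs for imports that can start code indirectly."""
    flagged = []
    for name in imports:
        reason = INDIRECT_EXECUTION_APIS.get(normalize(name))
        if reason is not None:
            flagged.append((name, reason))
    return flagged


if __name__ == "__main__":
    # Hypothetical import list for demonstration.
    sample_imports = ["CreateFileW", "WriteFile", "CreateRemoteThread", "SetWindowsHookExA"]
    for name, reason in flag_indirect_execution(sample_imports):
        print(f"{name}: {reason}")
```

A hit here proves nothing on its own, since plenty of legitimate software creates threads and processes. It only tells the analyst which arguments are worth tracing back to see what address or command line is being handed over.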
The other, more cryptic and stealthy way to execute malicious code is to simply redirect the instruction pointer (EIP) to some shellcode at some point. This method is effective because shellcode is raw machine code (just opcode bytes) that can be placed in areas not normally designated for program code. The instruction pointer can then be redirected to this shellcode in an obscure location in memory, or the shellcode can be copied to a location just ahead of the instruction pointer so that execution runs straight into it.
For example, malware could open any executable file, seek to a specific offset inside that file, and overwrite some bytes with malicious code. It could then execute the file itself, or simply wait for a legitimate program or even the user to execute it. When this occurs, the original malware file contains no calls to malicious code at all. It may only appear to call CreateFile and WriteFile in the case of a file on disk, or memory-mapping calls and WriteProcessMemory in the case of memory. Essentially, a program that may not appear to be malware at all is writing the actual malware into a separate file. It may delay execution, or hook that file into a legitimate program such as Mozilla Firefox so that the malicious code runs when the user launches it.
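From the analyst’s side, one way to catch this kind of overwrite is to compare a known-good copy of a file against the copy found on the suspect machine. The following is a minimal sketch under that assumption (that both copies are available as bytes). The function name `changed_ranges` is hypothetical and not from any particular tool.

```python
def changed_ranges(before, after):
    """Return a list of (offset, length) runs where `after` differs from `before`.

    Bytes are compared position by position over the common length; any
    size difference is reported as one extra run starting at that length.
    """
    ranges = []
    start = None
    common = min(len(before), len(after))
    for i in range(common):
        if before[i] != after[i]:
            if start is None:
                start = i  # a new run of modified bytes begins here
        elif start is not None:
            ranges.append((start, i - start))
            start = None
    if start is not None:
        ranges.append((start, common - start))
    if len(before) != len(after):
        ranges.append((common, abs(len(after) - len(before))))
    return ranges


if __name__ == "__main__":
    # Stand-in byte strings; real use would read both files in binary mode.
    original = b"AAAAAAAAAA"
    suspect = b"AAAXXAAAAZ"
    for offset, length in changed_ranges(original, suspect):
        print(f"modified: offset={offset} length={length}")
```

The offsets that come back tell the analyst exactly where in the file to start disassembling, even though the program that did the writing never called the injected code itself.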
Although we may not know every possible way to conceal such behavior, there is one constant: malware can either execute malicious code, or it can write malicious code that will presumably be executed right away or at another time. The execution of malicious code is defined as the instruction pointer of the CPU being placed at the starting address of the malicious “payload,” followed by the instruction-by-instruction execution of that payload. With these facts, we can reason about the different methods that malware authors may use to conceal the calling and execution of malicious code:
- By directly calling it – Easiest to detect.
- By writing the malcode directly into another process’s memory, then either moving EIP to it or placing it just ahead of EIP so that the code is executed.
- By completely replacing a legitimate process in memory, then executing the code.
- By writing the malcode to a file on disk which gets loaded by a legitimate process that mistakes it for a legitimate file.
- By writing the malcode to a file on disk and then loading it directly using an executable malware program.
- By writing a malcode executable to disk and tricking a user into executing it manually.
- By exploiting a legitimate program in a way that accomplishes one or more of the above automatically, without any user input.
The above methods (or a combination of them) are the only conceivable ways for code to be executed on a machine. This is because any program must be loaded into memory in order to be executed; thus either memory is written to directly and executed, or a file on disk is written, loaded, and executed. There’s no other way to execute a program on a modern operating system, aside from firmware.
The final way in which malcode can be further concealed is via compression and encryption. All this really does is hide the code itself, and it is a last-ditch attempt to do so. Compression and encryption differ from the above methods because the analyst’s knowledge of API calls and functions is completely irrelevant until the malicious code has been decompressed/unpacked and decrypted; until then, the malicious code is not legible in the first place.
To the analyst’s advantage, the CPU itself cannot execute code that is encrypted or compressed, so at some point prior to execution the code must be decrypted and/or decompressed so that the CPU can read it. It is at this instant that the malware analyst can capture the malicious code for analysis. What this means is that computers in general are fundamentally not built for hiding behavior and code. A program is nothing more than a list of instructions, much like a recipe for a cook or chef. No matter how secret the recipe, it must eventually be decrypted for the chef to follow the instructions. The same is true of the CPU, and this is the space that malware researchers and reverse engineers work in.
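The point about decryption can be shown with a harmless toy. In the Python sketch below, a one-line snippet is hidden with a single-byte XOR, which is my stand-in for whatever packer or cipher a real sample would use. All names (`xor_bytes`, `run_hidden`, `PLAIN`, `KEY`, `ENCODED`) are hypothetical. What matters is that the readable code has to exist in memory just before it runs, and that is where it can be captured.

```python
def xor_bytes(data, key):
    """XOR every byte with a one-byte key (a toy stand-in for real encryption)."""
    return bytes(b ^ key for b in data)


# A harmless "payload": one line of Python source that computes a number.
PLAIN = b"result = 6 * 7"
KEY = 0x5A
ENCODED = xor_bytes(PLAIN, KEY)  # this is all that static inspection would see


def run_hidden(encoded, key, capture):
    """Decode the payload, record the decoded bytes, then execute them."""
    decoded = xor_bytes(encoded, key)
    capture.append(decoded)  # the analyst's capture point: code is legible here
    namespace = {}
    exec(decoded.decode("ascii"), namespace)
    return namespace["result"]


if __name__ == "__main__":
    captured = []
    print("readable before decoding:", b"result" in ENCODED)
    value = run_hidden(ENCODED, KEY, captured)
    print("captured at decode time:", captured[0])
    print("payload computed:", value)
```

In a real investigation the capture step is usually a breakpoint or memory dump placed right after the unpacking routine rather than a line inside the program, but the idea is the same.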
These hiding methods can be broken down into three categories:
- Code obfuscation/Compression/Encryption – destroys legibility of code
- Malcode hiding – Placing malicious code in hard-to-find or unexpected locations either on disk or in memory
- Functionality Concealment – Using functions/routines/procedures in a way that makes it more difficult for an analyst or third-party reader to understand the malicious behavior of the program
Every technique that authors use to try to thwart analysis falls into one of those categories. Understanding the above information can greatly help when an analyst seems to be “stuck” and unable to find sufficient intel on a malware sample.