 
vm for the masses - a vm compiler incl source
0rp
April 11th, 2007, 14:56
hi,
i have attached the complete sourcecode of a working vm compiler. this compiler was used for the 'impossible crackme' - crackmes
i have also included a brief explanation of everything
please keep in mind that this vm underwent some major changes (read the impossible crackme threads), thats why parts of the code are messy and smelly
p.
Sab
April 11th, 2007, 15:05
Great to see a good public contribution. Thanks orp.
ZaiRoN
April 11th, 2007, 15:58
Thank you Orp 

b3n
April 11th, 2007, 21:56
thanks orp, i was looking for something like this! 

winndy
April 12th, 2007, 06:16
thank you very much!!
That's what I'm looking for.
FaTaL_PrIdE
April 12th, 2007, 07:56
Great contribution. Thank you for sharing!

winndy
April 12th, 2007, 21:51
I tried to compile it with VS6.
I downloaded msvcr80.dll and msvcp80.dll,
but opcodetoheader still can't be executed.
Finally I found out it's a side-by-side configuration error.
I installed vcredist_x86.exe, but it still can't run.
The opcodetoheader source isn't included.
Orp, would you please upload the opcodetoheader source code?
Thanks!!
Another question:
What's the BYTE base[] array?
How is it generated?
NeOXOeN
April 13th, 2007, 06:50
thx for source i was looking for something like this for long time 
i think its for VC 7
bye
0rp
April 13th, 2007, 12:32
hi,
i have attached the opcodetoheader sources
the base[] array is the ready-to-use vm-binary-code.
this whole sourcefile (vmfuncs.cpp) is generated by the backend
see void Backend::generateCPP()
Silver
April 13th, 2007, 12:52
Nice stuff 0rp, I'll have a browse through your code later.
Is there a lot of interest in VM these days? Was mulling over a RECON submission for this year...
winndy
April 13th, 2007, 21:58
Thank you, Orp.
I'll study your code carefully.
So if we want to add more functions to vmfuncs.cpp,
we should write code to generate it.
Every function in vmfuncs.cpp has a different offset.
And instructions.dat is the base array.
char* mem points to the randomized data which is written into base[] later.
In compiler.cpp, some DWORDs of the base array are overwritten with function addresses or variable addresses.
Code:
	*(DWORD *)(base + 0) = (DWORD)xm_allocate;
	*(DWORD *)(base + 4) = (DWORD)xm_free;
	*(DWORD *)(base + 8) = (DWORD)sprintf;
	*(DWORD *)(base + 12) = (DWORD)globals;
	*(DWORD *)(base + 16) = (DWORD)xm_printf;
	*(DWORD *)(base + 20) = (DWORD)xm_export;
I'll study it more carefully to understand the blueprint of how VM works.
BR
0rp
April 14th, 2007, 02:21
if you want more functions in vmfuncs.cpp, then you have to put more funcs into your input script (test.txt)
basically each function has its own start offset in this base array, but only functions that are exported (__export) get special code that pushes the real stack parameters to the vm stack:
Code:
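	if (function->containsDeclSpec("export"))
	{
		INSTR_BEGIN(ENTER);
			vmFunction->exportStart = instr;
		INSTR_END();
		// copy each real stack parameter onto the vm stack
		for (int i = 0; i < function->parameters.size(); i++)
		{
			// TEMP(1) = APPREGS + (10 + i) * 4, the slot of the i-th real parameter
			MOV_TEMP_CONST(TEMP(1), (10 + i) * 4);
			ADD(TEMP(1), APPREGS);
			// TEMP(0) = [TEMP(1)]
			MOV_TEMP_MEM(TEMP(0), TEMP(1));
			
			// [ESP] = TEMP(0), then advance the vm stack pointer by 4
			MOV_MEM_TEMP(ESP, TEMP(0));
			MOV_TEMP_CONST(TEMP(0), 4);
			ADD(ESP, TEMP(0));
		}
	}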
Code:
	*(DWORD *)(base + 0) = (DWORD)xm_allocate;
	*(DWORD *)(base + 4) = (DWORD)xm_free;
	*(DWORD *)(base + 8) = (DWORD)sprintf;
.....
these are required 'imports' that the vm needs to run happily. so once you have generated a vm and want to start it, you have to write these function pointers to those vm addresses. its done in the compiler just for testing purposes, since the vm gets executed there:
Code:
	char msg[1024];
	test(43, msg);
	info("%s", msg);
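The import patching described above can be sketched in isolation. This is only an illustration under assumptions: `demo_allocate`, `demo_free`, `writeImport`, `readImport` and `patchImports` are hypothetical names, and the slots are pointer-sized so the sketch also builds on 64-bit targets, whereas the original writes 4-byte DWORDs on 32-bit x86.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the real imports (xm_allocate, xm_free, ...).
inline void* demo_allocate(std::size_t size) { return std::malloc(size); }
inline void demo_free(void* ptr) { std::free(ptr); }

// One slot per import. The original uses 4-byte DWORD slots; pointer-sized
// slots are an assumption made here for portability.
constexpr std::size_t kSlotSize = sizeof(std::uintptr_t);

// Write one import address into slot `slot` at the start of the vm image.
inline void writeImport(std::vector<unsigned char>& base, std::size_t slot,
                        std::uintptr_t value) {
    assert((slot + 1) * kSlotSize <= base.size());
    std::memcpy(base.data() + slot * kSlotSize, &value, kSlotSize);
}

// Read a slot back, roughly what vm code would do before calling an import.
inline std::uintptr_t readImport(const std::vector<unsigned char>& base,
                                 std::size_t slot) {
    assert((slot + 1) * kSlotSize <= base.size());
    std::uintptr_t value = 0;
    std::memcpy(&value, base.data() + slot * kSlotSize, kSlotSize);
    return value;
}

// Fill the import slots before the vm is started.
inline void patchImports(std::vector<unsigned char>& base) {
    writeImport(base, 0, reinterpret_cast<std::uintptr_t>(&demo_allocate));
    writeImport(base, 1, reinterpret_cast<std::uintptr_t>(&demo_free));
}
```

A host program would call `patchImports` once on the loaded image and only then jump into the vm.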
winndy
April 14th, 2007, 03:48
Orp, thanks for your explanation.
I'm sorry to trouble you again.
Coco.exe caused a side-by-side configuration error.
It just doesn't work.
It seems that you rebuilt your coco.exe.
Is your coco source this one:
Quote:
| Coco/R for C++
 ported and maintained by Markus Löberbauer and Csaba Balazs | 
I replaced Coco.exe with the above coco.exe
and just got this error:
Quote:
| Coco/R (Jan 15, 2007)
 checking
 FuncCallParams deletable
 Statements deletable
 XM deletable
 LL1 warning in Factor: "(" is start of several alternatives
 LL1 warning in IfElse: "else" is start & successor of deletable structure
 parser -- incomplete or corrupt parser frame file | 
I wonder which coco.exe you used. Thanks.
I just want to compile your xm sourcecode. I didn't expect so many problems.
Sorry.
And I think I should switch to VS2005.
BR
0rp
April 14th, 2007, 04:59
check the attachment
i recompiled coco without msvcrt dlls, ive also included its source
i changed coco a bit to fit my needs
i also re-enabled a fancy vm feature:
	data MessageBoxA = __export("user32.dll", "MessageBoxA");
	MessageBoxA(0, "oook", "hi", 3);
winndy
April 14th, 2007, 05:40
That's very kind of you.
I'll study it.
You're a great coder and reverser.
What's more,you are my patient teacher.

NeOXOeN
April 14th, 2007, 18:53
i think its one of the rare sources that went public and is really great...
thx again ..
b3n
April 19th, 2007, 07:40
do i get this right that coco only generates you the parser and scanner but you have to write the compiler yourself? 
from what i understand so far, coco is run on a language to produce some sort of output. is the coco output already the code that gets executed by the virtual machine, or is it processed further to create virtual machine byte code?
im a bit lost here (even after having a look at the sources), so maybe someone can point me in the right direction.
0rp
April 19th, 2007, 13:29
coco generates the sourcecode of the compiler that is used, it is configured by the grammarfile xm.atg
so basically i dont write the compilersource myself, i just make a config file for coco. based on this config, coco generates the sources for the compiler, which are then used
b3n
April 19th, 2007, 20:34
so the compiler generated by coco transforms your instructions into this for example:
00000000    mov temp_0000, 0
00000001    mov i, temp_0000
00000002    mov temp_0000, i
00000003    mov_data temp_0000, src
00000004    mov temp_0001, 0
00000005    not_equal temp_0000, temp_0001
(taken from your strcpy snippet in the bigpicture.txt)
and this is then executed by the vm? or is it processed further into some sort of binary code? which of the methods in the package is actually executing the instructions?
Silver
April 20th, 2007, 04:19
b3n, I don't think following 0rp's code is going to help you with what you want. It might actually make it harder to understand.
0rp, no reflection on your code, just that b3n and I had quite a detailed discussion about VMs via privmsg.
b3n
April 20th, 2007, 09:31
hi silver,
its not really concerned with what we talked about, i just want to get an understanding on how 0rp's code works and i couldnt figure that out yet.
0rp
April 20th, 2007, 15:31
lets assume you have this expression:
1 + 2 * 3
the coco-generated compiler (aka frontend), transforms this expression into:
Code:
00000000    mov temp_0000, 1
00000001    mov temp_0001, 2
00000002    mov temp_0002, 3
00000003    mul temp_0001, temp_0002
00000004    add temp_0000, temp_0001
(if you prefer stackmachines, this code is equivalent to
Code:
push 1
push 2
push 3
mul
add
actually the first xm generation was a stackmachine)
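The equivalence of the two forms can be checked with a small sketch. It is not taken from the xm sources: `runTemps` and `runStack` are hypothetical helpers that evaluate the three-address listing and the stack listing for `1 + 2 * 3`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Three-address form: mov temp, const / mul temp, temp / add temp, temp.
enum class TOp { MovConst, Mul, Add };
struct TInstr {
    TOp op;
    std::size_t dst;   // index of the destination temp
    std::int32_t src;  // constant for MovConst, temp index otherwise
};

inline std::int32_t runTemps(const std::vector<TInstr>& program) {
    std::vector<std::int32_t> temps(3, 0);
    for (const TInstr& in : program) {
        assert(in.dst < temps.size());
        if (in.op == TOp::MovConst) {
            temps[in.dst] = in.src;
            continue;
        }
        assert(in.src >= 0 && static_cast<std::size_t>(in.src) < temps.size());
        std::int32_t rhs = temps[static_cast<std::size_t>(in.src)];
        if (in.op == TOp::Mul) temps[in.dst] *= rhs;
        else                   temps[in.dst] += rhs;
    }
    return temps[0];  // the result ends up in temp_0000
}

// Stack form: push const / mul / add.
enum class SOp { Push, Mul, Add };
struct SInstr {
    SOp op;
    std::int32_t value;  // only used by Push
};

inline std::int32_t runStack(const std::vector<SInstr>& program) {
    std::vector<std::int32_t> stack;
    for (const SInstr& in : program) {
        if (in.op == SOp::Push) {
            stack.push_back(in.value);
            continue;
        }
        assert(stack.size() >= 2);
        std::int32_t rhs = stack.back(); stack.pop_back();
        std::int32_t lhs = stack.back(); stack.pop_back();
        stack.push_back(in.op == SOp::Mul ? lhs * rhs : lhs + rhs);
    }
    assert(!stack.empty());
    return stack.back();
}

// The two listings for 1 + 2 * 3.
inline std::vector<TInstr> tempProgram() {
    return {{TOp::MovConst, 0, 1}, {TOp::MovConst, 1, 2}, {TOp::MovConst, 2, 3},
            {TOp::Mul, 1, 2}, {TOp::Add, 0, 1}};
}
inline std::vector<SInstr> stackProgram() {
    return {{SOp::Push, 1}, {SOp::Push, 2}, {SOp::Push, 3},
            {SOp::Mul, 0}, {SOp::Add, 0}};
}
```

Both evaluators yield the same value for the two listings, which is all "equivalent" means here.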
this frontend code is given to the backend, which transforms it into real down-to-the-metal vm-instructions:
Code:
  00000000    mov temp_0000, 1
  ---------------------------------------------------------
  10    00000126    MOV_TEMP_CONST      00000064,  00000001
  11    00000ca0    MOV_TEMP_CONST      00000078,  00000000
  12    0000020a    ADD                 00000078,  00000008
  13    00000ce5    MOV_MEM_TEMP        00000078,  00000064
  00000001    mov temp_0001, 2
  ---------------------------------------------------------
  14    000003f1    MOV_TEMP_CONST      00000064,  00000002
  15    00000944    MOV_TEMP_CONST      00000078,  00000004
  16    0000074c    ADD                 00000078,  00000008
  17    0000031a    MOV_MEM_TEMP        00000078,  00000064
  00000002    mov temp_0002, 3
  ---------------------------------------------------------
  18    00000f62    MOV_TEMP_CONST      00000064,  00000003
  19    00000d2e    MOV_TEMP_CONST      00000078,  00000008
  1a    000008fd    ADD                 00000078,  00000008
  1b    00000ff0    MOV_MEM_TEMP        00000078,  00000064
  mul temp_0001, temp_0002
  ---------------------------------------------------------
  1c    00001187    MOV_TEMP_CONST      00000078,  00000004
  1d    000011cc    ADD                 00000078,  00000008
  1e    00000d73    MOV_TEMP_MEM        00000064,  00000078
  1f    0000125a    MOV_TEMP_CONST      00000078,  00000008
  20    00000c59    ADD                 00000078,  00000008
  21    00000e46    MOV_TEMP_MEM        00000068,  00000078
  22    0000081f    MUL                 00000064,  00000068
  23    0000004f    MOV_TEMP_CONST      00000078,  00000004
  24    00000a62    ADD                 00000078,  00000008
  25    00000a19    MOV_MEM_TEMP        00000078,  00000064
  add temp_0000, temp_0001
  ---------------------------------------------------------
  26    000012b3    MOV_TEMP_CONST      00000078,  00000000
  27    00000989    ADD                 00000078,  00000008
  28    000003a8    MOV_TEMP_MEM        00000064,  00000078
  29    00000507    MOV_TEMP_CONST      00000078,  00000004
  2a    00000f1b    ADD                 00000078,  00000008
  2b    00000094    MOV_TEMP_MEM        00000068,  00000078
  2c    000012f8    ADD                 00000064,  00000068
  2d    00000ed6    MOV_TEMP_CONST      00000078,  00000000
  2e    0000066c    ADD                 00000078,  00000008
  2f    000009d0    MOV_MEM_TEMP        00000078,  00000064
(first the frontend instruction, followed by the required vm instructions)
as you can see, there are a lot of vm instructions required to do one frontend instruction (e.g. add temp, temp requires 10 vm instructions)
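The expansion pattern in the listing can be reproduced with a short sketch. The register roles are guesses read off the listing (0x64 and 0x68 as scratch temps, 0x78 as an address temp, 0x08 as the base that frontend temps are addressed from), and `lowerMovConst` and `lowerBinary` are hypothetical names, not functions from the real backend.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct VmInstr {
    std::string op;
    std::uint32_t a;
    std::uint32_t b;
};

// Offsets as they appear in the listing; their roles are assumptions.
constexpr std::uint32_t kScratch0 = 0x64;
constexpr std::uint32_t kScratch1 = 0x68;
constexpr std::uint32_t kAddr     = 0x78;
constexpr std::uint32_t kBase     = 0x08;

// Emit "kAddr = kBase + index * 4", the address of frontend temp `index`.
inline void emitTempAddress(std::vector<VmInstr>& out, std::uint32_t index) {
    out.push_back({"MOV_TEMP_CONST", kAddr, index * 4});
    out.push_back({"ADD", kAddr, kBase});
}

// mov temp_<dst>, <constant>  ->  4 vm instructions
inline std::vector<VmInstr> lowerMovConst(std::uint32_t dst, std::uint32_t constant) {
    std::vector<VmInstr> out;
    out.push_back({"MOV_TEMP_CONST", kScratch0, constant});
    emitTempAddress(out, dst);
    out.push_back({"MOV_MEM_TEMP", kAddr, kScratch0});  // store the constant
    return out;
}

// <op> temp_<dst>, temp_<src> with op "ADD" or "MUL"  ->  10 vm instructions
inline std::vector<VmInstr> lowerBinary(const std::string& op, std::uint32_t dst,
                                        std::uint32_t src) {
    assert(op == "ADD" || op == "MUL");
    std::vector<VmInstr> out;
    emitTempAddress(out, dst);
    out.push_back({"MOV_TEMP_MEM", kScratch0, kAddr});  // load dst operand
    emitTempAddress(out, src);
    out.push_back({"MOV_TEMP_MEM", kScratch1, kAddr});  // load src operand
    out.push_back({op, kScratch0, kScratch1});          // the actual operation
    emitTempAddress(out, dst);
    out.push_back({"MOV_MEM_TEMP", kAddr, kScratch0});  // store the result
    return out;
}
```

Calling `lowerBinary("MUL", 1, 2)` yields the same operand pattern as the `mul temp_0001, temp_0002` block in the listing, minus the randomized offsets in the second column.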
b3n
April 21st, 2007, 20:13
thanks for that explanation 0rp, that made it a lot clearer. im currently still digging through the code commenting as much as i can. but i havent found the method that is doing the execution of the vm instructions yet. where is the generated backend code executed? or is the backend code generated and executed on the fly when the frontend instructions are read?
edit:
am i right if i assume the following snippet of vm code would translate to the instructions shown below?
10    00000126    MOV_TEMP_CONST      00000064,  00000001
11    00000ca0    MOV_TEMP_CONST      00000078,  00000000
12    0000020a    ADD                 00000078,  00000008
13    00000ce5    MOV_MEM_TEMP        00000078,  00000064
mov 	        dword [ebx+0xededed00], 0xededed01
mov 	        dword [ebx+0xededed00], 0xededed01
mov		eax, [ebx+0xededed01]
add		[ebx+0xededed00], eax
mov		eax, [ebx+0xededed01]
mov		ecx, [ebx+0xededed00]
mov		[ecx], eax
i dont know what 0xededed00 and 0xededed01 are used for, could you please explain that to me?
[--MOV_TEMP_CONST--]
//initialize temp reg with 1 (ebx+0xededed00 points to the first temp reg?)
//is 00000064 in ebx?
mov 	dword [ebx+0xededed00], 0xededed01  
[--END MOV_TEMP_CONST--]
[--MOV_TEMP_CONST--]
//same as above, initialize second temp reg with 0
mov 	dword [ebx+0xededed00], 0xededed01
[--END MOV_TEMP_CONST--]
[--ADD--]
//move value of temp reg 2 into eax
mov		eax, [ebx+0xededed01]
//probably add the value in eax to the first temp reg, but im not sure what
//the 00000008 in the vm code stands for
add		[ebx+0xededed00], eax
[--END ADD--]
[--MOV_MEM_TEMP--]
//move value of second temp reg into eax
mov		eax, [ebx+0xededed01]
//move address of first temp reg in ecx
mov		ecx, [ebx+0xededed00]
//save eax at address of first temp reg
mov		[ecx], eax
[--END MOV_MEM_TEMP--]
0rp
April 22nd, 2007, 08:08
the instructions themselves are executable: when the vm is entered, it goes straight to the first opcode, this opcode knows who is next and jumps to it, and so on
these edededXX values are markers. i compile the opcode source into .bin and overwrite the edededXX markers with their real values (done in void Backend::writeParam)
example:
ADD TEMP_0064, TEMP_0078
add opcode source:
	mov		eax, [ebx+0xededed01]
	add		[ebx+0xededed00], eax
which becomes:
	mov		eax, [ebx+0x78]
	add		[ebx+0x64], eax
so 0xededed01 (the source operand) is replaced with 0x78 during generation, and 0xededed00 (the dest) is replaced by 0x64
and you are right with your example of those 4 instructions and their real asm
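The marker replacement can be sketched as a plain byte search. The real Backend::writeParam is not shown in this thread, so `patchMarker` and `addTemplate` are hypothetical, and the template bytes are a hand-assembled standard x86 encoding of the two instructions rather than a dump of the author's .bin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian 32-bit value starting at code[pos].
inline std::uint32_t readLe32(const std::vector<std::uint8_t>& code, std::size_t pos) {
    return static_cast<std::uint32_t>(code[pos]) |
           (static_cast<std::uint32_t>(code[pos + 1]) << 8) |
           (static_cast<std::uint32_t>(code[pos + 2]) << 16) |
           (static_cast<std::uint32_t>(code[pos + 3]) << 24);
}

// Overwrite every little-endian occurrence of `marker` with `value`.
// Returns the number of occurrences that were patched.
inline std::size_t patchMarker(std::vector<std::uint8_t>& code, std::uint32_t marker,
                               std::uint32_t value) {
    std::size_t patched = 0;
    std::size_t pos = 0;
    while (pos + 4 <= code.size()) {
        if (readLe32(code, pos) != marker) {
            ++pos;
            continue;
        }
        for (std::size_t i = 0; i < 4; ++i) {
            code[pos + i] = static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF);
        }
        ++patched;
        pos += 4;  // skip the bytes that were just written
    }
    return patched;
}

// Template for the add opcode:
//   mov eax, [ebx+0xededed01]   -> 8B 83 01 ED ED ED
//   add [ebx+0xededed00], eax   -> 01 83 00 ED ED ED
inline std::vector<std::uint8_t> addTemplate() {
    return {0x8B, 0x83, 0x01, 0xED, 0xED, 0xED,
            0x01, 0x83, 0x00, 0xED, 0xED, 0xED};
}
```

For `ADD TEMP_0064, TEMP_0078` the generator would patch the template twice, once with 0x78 for marker 0xededed01 and once with 0x64 for marker 0xededed00. A plain byte search like this assumes the marker bytes never occur by accident inside the template.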
b3n
April 22nd, 2007, 08:22
thanks 0rp!
so do i get this right:
1. you let the compiler generate the vm instruction from the input script
2. the vm runs over this script and executes the matching instructions
so:
ADD TEMP_0064, TEMP_0078
will be executed by the vm like:
1. find out instruction (in this case add)
2. look up the compiled opcode
3. patch the 0xededed00 and 0xededed01 markers
4. execute the opcode instructions
5. get next instruction
did i get this right?
0rp
April 22nd, 2007, 09:36
this replacement of edededXX is done during generation, not during execution
so, when generation is done, you have a big block of x86 executable code that makes up the single steps, so somewhere it will contain 
mov eax, [ebx+0x78]
add [ebx+0x64], eax
which was required for something
here is what the final generation result looks like without encryption:
mov temp_64, 1:
0049E845  mov         dword ptr [ebx+64h],1 
0049E84F  mov         ecx,4FCh 
0049E854  mov         edx,19h 
0049E859  add         ecx,dword ptr [ebx+2Ch] 
0049E85C  jmp         ecx  
mov temp_78, 0:
0049ECDC  mov         dword ptr [ebx+78h],0 
0049ECE6  mov         ecx,0C8h 
0049ECEB  mov         edx,1Bh 
0049ECF0  add         ecx,dword ptr [ebx+2Ch] 
0049ECF3  jmp         ecx  
add temp_78, temp_8:
0049E8A8  mov         eax,dword ptr [ebx+8] 
0049E8AE  add         dword ptr [ebx+78h],eax 
0049E8B4  mov         ecx,516h 
0049E8B9  mov         edx,1Dh 
0049E8BE  add         ecx,dword ptr [ebx+2Ch] 
0049E8C1  jmp         ecx  
so the vm instructions end up as a chain of small executable and customized (the edededXX markers are replaced) x86 blocks, that are chained
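The chaining can be modelled without real machine code. In this sketch each block is a function that does its work and returns the offset of the next block, and the loop in `run` plays the role of `add ecx, [ebx+2Ch]` / `jmp ecx`. The offsets 0x65, 0x4FC, 0xC8 and 0x516 come from the listing if it is read as relative to a base of 0049E7E0, which is an inference and not something stated in the thread. The value loaded into edx is not modelled because its purpose is not explained here. `Vm`, `Block`, `buildChain` and `run` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>

struct Vm {
    std::map<std::uint32_t, std::uint32_t> regs;  // keyed by ebx-relative offset
};

// A block performs one vm instruction and returns the offset of its successor.
using Block = std::function<std::uint32_t(Vm&)>;

inline std::map<std::uint32_t, Block> buildChain() {
    std::map<std::uint32_t, Block> blocks;
    // mov temp_64, 1   then continue at 0x4FC
    blocks[0x065] = [](Vm& vm) { vm.regs[0x64] = 1; return std::uint32_t{0x4FC}; };
    // mov temp_78, 0   then continue at 0x0C8
    blocks[0x4FC] = [](Vm& vm) { vm.regs[0x78] = 0; return std::uint32_t{0x0C8}; };
    // add temp_78, temp_8   then continue at 0x516 (not part of this sketch)
    blocks[0x0C8] = [](Vm& vm) {
        vm.regs[0x78] += vm.regs[0x08];
        return std::uint32_t{0x516};
    };
    return blocks;
}

// Follow the chain from `entry` until it leaves the known blocks.
// Returns the number of blocks that were executed.
inline std::size_t run(Vm& vm, const std::map<std::uint32_t, Block>& blocks,
                       std::uint32_t entry) {
    std::size_t executed = 0;
    std::uint32_t offset = entry;
    for (auto it = blocks.find(offset); it != blocks.end(); it = blocks.find(offset)) {
        offset = it->second(vm);  // stands in for the block's trailing "jmp ecx"
        ++executed;
    }
    return executed;
}
```

There is no central dispatcher and no opcode table in this design: the order of execution lives only in the successor offset baked into each block.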
b3n
April 22nd, 2007, 17:24
i see, so the compiled opcode snippets are just small templates of code that get customized by the vm environment and put together to form the final program? the way i see it the backend is kind of a compiler too, which produces the final binary as output. the final program is then run by executing the first instruction in the instruction chain?
0rp
April 23rd, 2007, 12:54
yes, exactly 

b3n
April 23rd, 2007, 18:39
why did you decide to create a final binary version of the input program instead of letting the vm execute the vm instructions during runtime as kind of an interpreter? if you have a binary version of the input program, what do you need the vm for? (maybe i missed something on the way but thats what i ask myself)
0rp
April 24th, 2007, 13:38
there was an xm version that worked like you suggested
it had a static number of generic opcodes (add, mov, mul,...) that were parameterized. therefore the vm also contained a big parameter stream
i didnt like this idea too much, bc you can easy replace the static number of opcodes by own hacked opcodes and do whatever you want
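For contrast, the interpreter-style design described above (a fixed set of generic opcodes driven by a parameter stream) might look roughly like this. It is a generic sketch, not the early xm code: the opcode numbering, the `[opcode, dst, src]` triple layout and the name `interpret` are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum Opcode : std::uint32_t { OP_MOV_CONST, OP_ADD, OP_MUL, OP_HALT };

// Interpret a flat parameter stream of [opcode, dst, src] triples.
// OP_HALT (or the end of the stream) stops execution.
inline void interpret(const std::vector<std::uint32_t>& stream,
                      std::vector<std::uint32_t>& regs) {
    std::size_t pc = 0;
    while (pc < stream.size() && stream[pc] != OP_HALT) {
        assert(pc + 2 < stream.size());
        const std::uint32_t dst = stream[pc + 1];
        const std::uint32_t src = stream[pc + 2];
        assert(dst < regs.size());
        switch (stream[pc]) {
            case OP_MOV_CONST:
                regs[dst] = src;  // src is an immediate here
                break;
            case OP_ADD:
                assert(src < regs.size());
                regs[dst] += regs[src];
                break;
            case OP_MUL:
                assert(src < regs.size());
                regs[dst] *= regs[src];
                break;
            default:
                assert(false && "unknown opcode");
                return;
        }
        pc += 3;
    }
}

// 1 + 2 * 3 encoded as a parameter stream.
inline std::vector<std::uint32_t> demoStream() {
    return {OP_MOV_CONST, 0, 1, OP_MOV_CONST, 1, 2, OP_MOV_CONST, 2, 3,
            OP_MUL, 1, 2, OP_ADD, 0, 1, OP_HALT};
}
```

The weakness mentioned above is visible here: the handlers in the `switch` are few and static, so replacing one of them changes the behaviour of every instruction that uses it.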
b3n
April 24th, 2007, 20:28
maybe you can help me with this 0rp: im just trying to develop my own little grammar to play around with, but the generated scanner and parser use wchar_t* everywhere instead of char*. i saw your scanner and parser use just char*. is there any way to tell coco to use char* instead of wchar_t*? its driving me nuts cause every time i change something in the grammar and have to regenerate the parser and scanner, i have to manually edit all the files...
dELTA
April 25th, 2007, 02:23
Quote:
| [Originally Posted by b3n]why did you decide to create a final binary version of the input program instead of letting the vm execute the vm instructions during runtime as kind of an interpreter? if you have a binary version of the input program, what do you need the vm for? | 
Quote:
| [Originally Posted by 0rp]bc you can easy replace the static number of opcodes by own hacked opcodes and do whatever you want | 
Well, sure, but in the case of building a normal binary like this, you lose the entire idea of people not being able to analyze the code statically with any tool they like, not to mention creating a simple IDC script that marks up all these sequences into their corresponding VM instruction (or even dumps the entire original script to a text file). (and yes, a much more advanced IDC script could do this even if you do it in VM code, but that's much harder, and again, exactly what is the reason/advantage with a VM in the first place with this method?)
And I really don't want to be rude or anything, I just wanted to check if I missed something here, just like b3n?
b3n
April 25th, 2007, 02:56
i think you got more to the point than me dELTA 

0rp
April 25th, 2007, 12:09
its using vmregs or a vmstack, so i would still call it a vm, or whats the definition of a vm?
as i said, it was a vm like you mean in some early version:
http://woodmann.com/forum/attachment.php?attachmentid=1531&d=1166647623
opcodes were much bigger and generic, and there was an array of vminstructions that were in fact the params for those generic opcodes
an opcode looked like this:
Code:
0040D0EE    8B6B 24         mov     ebp, dword ptr ds:[ebx+24]
0040D0F1    036B 14         add     ebp, dword ptr ds:[ebx+14]
0040D0F4    8D75 6C         lea     esi, dword ptr ss:[ebp+6C]
0040D0F7    8B06            mov     eax, dword ptr ds:[esi]
0040D0F9    B9 08000000     mov     ecx, 8
0040D0FE    8B148E          mov     edx, dword ptr ds:[esi+ecx*4]
0040D101    3353 28         xor     edx, dword ptr ds:[ebx+28]
0040D104    0353 14         add     edx, dword ptr ds:[ebx+14]
0040D107    3302            xor     eax, dword ptr ds:[edx]
0040D109  ^ E2 F3           loopd   short testcon.0040D0FE
0040D10B    8943 4C         mov     dword ptr ds:[ebx+4C], eax
0040D10E    8DB5 90000000   lea     esi, dword ptr ss:[ebp+90]
0040D114    8B06            mov     eax, dword ptr ds:[esi]
0040D116    B9 08000000     mov     ecx, 8
0040D11B    8B148E          mov     edx, dword ptr ds:[esi+ecx*4]
0040D11E    3353 28         xor     edx, dword ptr ds:[ebx+28]
0040D121    0353 14         add     edx, dword ptr ds:[ebx+14]
0040D124    3302            xor     eax, dword ptr ds:[edx]
0040D126  ^ E2 F3           loopd   short testcon.0040D11B
0040D128    8943 50         mov     dword ptr ds:[ebx+50], eax
0040D12B    8B43 4C         mov     eax, dword ptr ds:[ebx+4C]
0040D12E    8B4B 50         mov     ecx, dword ptr ds:[ebx+50]
0040D131    890C03          mov     dword ptr ds:[ebx+eax], ecx
0040D134    8D75 00         lea     esi, dword ptr ss:[ebp]
0040D137    8B06            mov     eax, dword ptr ds:[esi]
0040D139    B9 08000000     mov     ecx, 8
0040D13E    8B148E          mov     edx, dword ptr ds:[esi+ecx*4]
0040D141    3353 28         xor     edx, dword ptr ds:[ebx+28]
0040D144    0353 14         add     edx, dword ptr ds:[ebx+14]
0040D147    3302            xor     eax, dword ptr ds:[edx]
0040D149  ^ E2 F3           loopd   short testcon.0040D13E
0040D14B    8943 24         mov     dword ptr ds:[ebx+24], eax
0040D14E    8D75 48         lea     esi, dword ptr ss:[ebp+48]
0040D151    8B06            mov     eax, dword ptr ds:[esi]
0040D153    B9 08000000     mov     ecx, 8
0040D158    8B148E          mov     edx, dword ptr ds:[esi+ecx*4]
0040D15B    3353 28         xor     edx, dword ptr ds:[ebx+28]
0040D15E    0353 14         add     edx, dword ptr ds:[ebx+14]
0040D161    3302            xor     eax, dword ptr ds:[edx]
0040D163  ^ E2 F3           loopd   short testcon.0040D158
0040D165    50              push    eax
0040D166    8D75 24         lea     esi, dword ptr ss:[ebp+24]
0040D169    8B06            mov     eax, dword ptr ds:[esi]
0040D16B    B9 08000000     mov     ecx, 8
0040D170    8B148E          mov     edx, dword ptr ds:[esi+ecx*4]
0040D173    3353 28         xor     edx, dword ptr ds:[ebx+28]
0040D176    0353 14         add     edx, dword ptr ds:[ebx+14]
0040D179    3302            xor     eax, dword ptr ds:[edx]
0040D17B  ^ E2 F3           loopd   short testcon.0040D170
0040D17D    8F43 28         pop     dword ptr ds:[ebx+28]
0040D180    0343 14         add     eax, dword ptr ds:[ebx+14]
0040D183    FFE0            jmp     eax
but again, then you just need to make this basic opcode set patch safe (crcing and backup, or completely remove crcing), thats why i switched to executable instructions, which are harder to retrieve from the vm, esp. when they are encrypted (yes, i failed here too: http://woodmann.com/forum/attachment.php?attachmentid=1572&d=1170436383)
b3n: try switching your project to multibyte, or if you use coco, you can change the parser/lexer code templates. they are in parser.frame and scanner.frame
Fh_prg
September 18th, 2009, 17:48
hello, how can we use this source code to protect a sample app with its VM?
0rp
November 1st, 2009, 07:49
you cant protect x86 code with it. you have to write your secret code with the vm-script language and compile it to vm
(dont use it for serious business, because its too weak)
Fh_prg
November 1st, 2009, 09:04
Thank you so much.