 
Thoughts on code de-obfuscator(s)
SiNTAX
September 17th, 2002, 08:23
Has anybody done any work on code deobfuscators (i.e. for the anti-disassembly stuff that SafeDisc and the like add to code)?
I have Imhotep from AntharXes (did I spell this correctly?). It does a decent job, but not a perfect one.
Just wondering if this can simply be done with some crafty IDA scripting?!
In the SafeDisc unwrapper from R!sc there was a cleaned dump of the SD DLLs; I doubt this was all done by hand..
[yAtEs]
September 18th, 2002, 14:20
risc got a maid to clean his dlls (-;
It just takes a little time and analysis. Dead-list something,
for example the SafeDisc DLLs, define as bytes what you
think is junk and refine the code afterwards. After a while you'll
spot the patterns. Obfuscation generally works by adding
certain bytes before or after certain opcodes. For example, JMP EAX
is FF E0; stick an EB in front of it and EB FF E0 makes a jmp -1,
so it jumps back to the FF E0 and is disassembled wrong,
etc.
yates.
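A rough sketch of how the EB FF pattern above could be located in a raw dump. It scans for short jumps (opcode EB) whose target falls on the jump's own operand byte, which is exactly the "jmp -1" case. The function name, the sample bytes and the base address are made up for illustration, and a raw byte scan like this will also flag EB bytes that are really data or part of a longer instruction, so treat the hits as candidates only.

```python
def find_self_overlapping_jumps(code: bytes, base: int = 0):
    """Return (jump_address, target_address) for every short JMP
    whose target lies inside the JMP instruction itself."""
    hits = []
    for i in range(len(code) - 1):
        if code[i] != 0xEB:            # EB rel8 = JMP SHORT
            continue
        rel = code[i + 1]
        if rel >= 0x80:                # rel8 is a signed byte
            rel -= 0x100
        target = i + 2 + rel           # relative to the end of the 2-byte jump
        if i < target < i + 2:         # lands on the jump's own operand byte
            hits.append((base + i, base + target))
    return hits


# Illustrative bytes only: NOP, then EB FF E0, then RET.
sample = bytes([0x90, 0xEB, 0xFF, 0xE0, 0xC3])
for jump, target in find_self_overlapping_jumps(sample, base=0x1000):
    print(f"jmp at {jump:X} re-enters at {target:X}")
```

Each hit marks a spot where the disassembler should restart decoding one byte after the EB.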
SiNTAX
September 18th, 2002, 15:06
Yup.. I know how this stuff works, but I doubt anyone of sane mind would do it completely manually.
 
 
Imhotep does a decent job at it, but not a perfect one... I think using a decent disassembler like IDA with a script file could do wonders.
The 100% automatic tools usually fail in various places.
(It would be nice if OllyDbg had something like that built-in... Now THAT would be cool!)
.. for the record ...  been playing a bit with a v2.60.52 SD file (maybe get it to run in WINE one of these days)
Sure miss PROTECT / Frogsice under WinXP 

cyberheg
September 18th, 2002, 17:31
I once tried a SafeWrap-protected program, which I assume uses the same type of obfuscation as SafeDisc itself. However, I was surprised how 'stupid' the obfuscation was. If the same goes for SafeDisc, I am sure it's possible to write a good anti-tool for it.
Maybe one of you could post more examples of this type of obfuscation.
In most cases obfuscation is applied to compiled code or to-be-compiled code, and since most engines aren't smart enough to track which registers are in use, the obfuscation is mostly made of "null instructions". From SafeWrap I remember stuff like xchg eax, edx; xchg edx, eax and push/pop series.
However, there are ways to make obfuscation harder to remove again.
// CyberHeg
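To make the "null instructions" idea concrete, here is a minimal sketch that removes cancelling pairs from an already-disassembled instruction list. The (mnemonic, operands) tuple format and the function names are assumptions for the example, not the format of any particular tool. It only knows the two patterns mentioned above (a repeated xchg on the same two registers, and push X followed by pop X), and it ignores branch targets that might land between the two halves of a pair, so a real tool would need more checks.

```python
def cancels(a, b):
    """True if instruction b undoes instruction a (register operands assumed)."""
    (m1, ops1), (m2, ops2) = a, b
    if m1 == "xchg" and m2 == "xchg":
        return len(ops1) == 2 and set(ops1) == set(ops2)
    if m1 == "push" and m2 == "pop":
        return ops1 == ops2
    return False


def strip_null_pairs(instrs):
    """Drop adjacent cancelling pairs, including nested ones, using a stack."""
    out = []
    for ins in instrs:
        if out and cancels(out[-1], ins):
            out.pop()          # the pair does nothing, drop both halves
        else:
            out.append(ins)
    return out


code = [
    ("xchg", ("eax", "edx")),
    ("xchg", ("edx", "eax")),
    ("mov", ("ecx", "1")),
    ("push", ("ebx",)),
    ("pop", ("ebx",)),
    ("ret", ()),
]
print(strip_null_pairs(code))
```

Using a stack means nested junk such as push eax / push ebx / pop ebx / pop eax collapses in a single pass.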
bsod
September 18th, 2002, 22:05
well, many protectors simply use the same obfuscator code again and again, so we simply search for those bytes over the whole code section and nop them out..
bye,
bsod
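A minimal sketch of that search-and-NOP approach on a raw code section. The function name and the sample pattern (50 58, i.e. push eax / pop eax) are placeholders chosen for the example, not a signature taken from any real protector. Note that a blind byte search can also match inside operands or data, so the pattern should be long enough to be unambiguous.

```python
def nop_out(code: bytes, pattern: bytes):
    """Replace every non-overlapping occurrence of pattern with NOPs (0x90).

    Returns the patched bytes and the number of occurrences replaced.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    count = code.count(pattern)
    patched = code.replace(pattern, b"\x90" * len(pattern))
    return patched, count


# Illustrative section contents: junk, RET, junk.
section = bytes.fromhex("5058c35058")
patched, count = nop_out(section, bytes.fromhex("5058"))
print(count, patched.hex())
```

Replacing with the same number of NOP bytes keeps every other address in the section unchanged, so jump targets stay valid.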
SiNTAX
September 19th, 2002, 08:23
SafeDisc doesn't use the exact same sequence over and over again, that would be too easy.
As for an example:
[correctly decoded version]
10007E6F loc_10007E6F:                           ; CODE XREF: .txt2:10007E76j
10007E6F                 mov     ebx, ebx
10007E71                 jg      short loc_10007E79
10007E73                 nop
10007E74                 jle     short loc_10007E79
10007E76 
10007E76 loc_10007E76:                           ; CODE XREF: .txt2:10007E6Dj
10007E76                 jmp     short loc_10007E6F
10007E76 ; ---------------------------------------------------------------------------
10007E78                 db  2Bh ; +
10007E79 ; ---------------------------------------------------------------------------
10007E79 
10007E79 loc_10007E79:                           ; CODE XREF: .txt2:10007E71j
10007E79                                         ; .txt2:10007E74j
10007E79                 js      short loc_10007E84
10007E7B 
10007E7B loc_10007E7B:                           ; CODE XREF: .txt2:10007E86j
10007E7B                 nop
10007E7C                 xchg    eax, eax
10007E7E                 jg      short loc_10007E89
10007E80                 xchg    ebx, ebx
10007E82                 jle     short loc_10007E89
10007E84 
10007E84 loc_10007E84:                           ; CODE XREF: .txt2:10007E79j
10007E84                 jz      short $+2
10007E86                 js      short loc_10007E7B
10007E86 ; ---------------------------------------------------------------------------
10007E88                 db  22h ; "
10007E89 ; ---------------------------------------------------------------------------
10007E89 
10007E89 loc_10007E89:                           ; CODE XREF: .txt2:10007E7Ej
10007E89                                         ; .txt2:10007E82j
10007E89                 jnz     short loc_10007E96
10007E8B                 push    ebx
10007E8C                 call    nullsub_23
10007E91 ; ---------------------------------------------------------------------------
10007E91                 pop     ebx
10007E92                 jz      short loc_10007E96
10007E94 
10007E94 ; ═══════════════ S U B R O U T I N E ═══════════════════════════════════════
10007E94 
10007E94 
10007E94 nullsub_22      proc near               ; CODE XREF: sub_10007E20+1Dp
10007E94                 retn
10007E94 nullsub_22      endp
10007E94 
10007E94 ; ---------------------------------------------------------------------------
10007E95                 db  38h ; 8
10007E96 ; ---------------------------------------------------------------------------
10007E96 
10007E96 loc_10007E96:                           ; CODE XREF: .txt2:10007E89j
And this is what it normally looks like:
10007E6F ; ---------------------------------------------------------------------------
10007E6F 
10007E6F loc_10007E6F:                           ; CODE XREF: .txt2:10007E76j
10007E6F                 mov     ebx, ebx
10007E71                 jg      short loc_10007E79
10007E73                 nop
10007E74                 jle     short loc_10007E79
10007E76 
10007E76 loc_10007E76:                           ; CODE XREF: .txt2:10007E6Dj
10007E76                 jmp     short loc_10007E6F
10007E76 ; ---------------------------------------------------------------------------
10007E78                 db  2Bh ; +
10007E79 ; ---------------------------------------------------------------------------
10007E79 
10007E79 loc_10007E79:                           ; CODE XREF: .txt2:10007E71j
10007E79                                         ; .txt2:10007E74j
10007E79                 js      short loc_10007E84
10007E7B 
10007E7B loc_10007E7B:                           ; CODE XREF: .txt2:10007E86j
10007E7B                 nop
10007E7C                 xchg    eax, eax
10007E7E                 jg      short near ptr loc_10007E88+1
10007E80                 xchg    ebx, ebx
10007E82                 jle     short near ptr loc_10007E88+1
10007E84 
10007E84 loc_10007E84:                           ; CODE XREF: .txt2:10007E79j
10007E84                 jz      short $+2
10007E86                 js      short loc_10007E7B
10007E88 
10007E88 loc_10007E88:                           ; CODE XREF: .txt2:10007E7Ej
10007E88                                         ; .txt2:10007E82j
10007E88                 and     dh, [ebp+0Bh]
10007E8B                 push    ebx
10007E8C                 call    nullsub_23
10007E91                 pop     ebx
10007E92                 jz      short loc_10007E96
10007E94 
10007E94 locret_10007E94:                        ; CODE XREF: sub_10007E20+1Dp
10007E94                 retn
10007E94 ; ---------------------------------------------------------------------------
10007E95                 db  38h ; 8
10007E96 ; ---------------------------------------------------------------------------
Fixing this in IDA is as simple as going to the address of the label with the +1 (in this case loc_10007E88), pressing U to undefine the code, then going one byte down and pressing C for code.
So a simple script that does that sequence will clean up the decode. 
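As a sketch of the first half of such a script, the snippet below works on a saved text listing rather than inside IDA: it collects every "loc_X+N" reference and reports which label to undefine and where to restart the code. The function name and the assumption that the offset after the + is hexadecimal are mine. The actual U and C steps would still have to be done by hand or through IDA's own scripting functions, which this example deliberately does not call.

```python
import re

# Matches references such as "loc_10007E88+1" in a text listing.
PLUS_REF = re.compile(r"loc_([0-9A-Fa-f]+)\+([0-9A-Fa-f]+)")


def misaligned_targets(listing: str):
    """Return sorted, de-duplicated (label_address, real_target) pairs."""
    found = set()
    for m in PLUS_REF.finditer(listing):
        label = int(m.group(1), 16)
        offset = int(m.group(2), 16)   # assumed to be a hex offset
        found.add((label, label + offset))
    return sorted(found)


listing = """\
10007E7E                 jg      short near ptr loc_10007E88+1
10007E82                 jle     short near ptr loc_10007E88+1
"""
for label, target in misaligned_targets(listing):
    print(f"undefine at {label:X}, make code at {target:X}")
```

For the listing above this points at loc_10007E88 and the byte after it, matching the manual fix.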
Imhotep works a bit differently: it finds null instructions like mov ecx,ecx and NOPs them out.
SiNTAX
October 8th, 2002, 01:17
Ahh finally took the time to code something up... anyway it's actually not that hard to do.. should have done it sooner 
