CN110245467A - Android application program guard method based on Dex2C and LLVM - Google Patents
- Publication number
- CN110245467A (application CN201910394117.5A)
- Authority
- CN
- China
- Prior art keywords
- instruction
- protected
- conversion
- dex2c
- function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/12—Protecting executable software
- G06F21/14—Protecting executable software against software analysis or reverse engineering, e.g. by obfuscation
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Technology Law (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The invention discloses an Android application protection method based on Dex2C and LLVM. The method comprises: decompressing the APK to obtain and parse the Dex file, and collecting for every assembly instruction all the information needed to restore it to C code; deciding by means of an assessment model whether to perform Dex2C conversion; if the threshold is exceeded, performing the Dex2C conversion, which consists of conversion preprocessing (locating the method to be protected, inserting assembly instruction statements, establishing the adjacency relationships between instructions, and so on) followed by converting the instructions one by one, selecting one of three sets of conversion logic according to the assembly instruction type; implementing compile-time virtualization based on LLVM and, if the threshold is not exceeded, running the LLVM compile-time virtualization module directly; and, after the framework generates the So file, repackaging and signing to produce a functionally equivalent APK. The invention combines protection at the Dex layer with protection at the native layer: on the one hand it can improve the execution efficiency of the APK, and on the other hand it greatly raises the difficulty and cost of an attack for a malicious attacker.
Description
Technical field
The invention belongs to the technical field of Dex file encryption and compile-time virtualization of So files in Android applications, and specifically relates to an Android application protection method based on the conversion of Dex files to C files and on LLVM-based compile-time virtualization.
Background art
In recent years, with the flourishing of the mobile Internet ecosystem, the number of mobile applications has grown exponentially. According to a Statista survey, Google Play offered 2.6 million Android applications as of March 2019. However, because reverse-engineering toolchains have matured, an attacker can easily use reverse-engineering tools to obtain the core logic of the So (shared object) files or the classes.dex file of a legitimate application, tamper with it (for example by adding malicious code or replacing the original advertisements), and finally repackage and sign the application and bring it to market through illegal channels. This not only harms the interests of application developers but also threatens users' property and privacy, and seriously affects the healthy development of the mobile application industry.
Most apps on the market are developed in Java and C; Java code is compiled into Dex files and C code is compiled into So files. The main protection approaches for Dex files at present are whole-file encryption, partial class-loading encryption, and Dex file virtualization; correspondingly, tools such as DexExtractor, ZjDroid and PackGrind can effectively attack these three Dex-layer protection schemes. The current mainstream Dex protection methods have one further, most important defect: packing does not improve the execution efficiency of the Dex file. The main protection approaches for So files at present are OLLVM and UPX packing, for which corresponding attack tools or schemes exist, namely DecLLVM and UPX unpacking tools. It can be seen that, on the one hand, current protection schemes suffer from insufficient protection strength and reduced efficiency after protection; on the other hand, there is as yet no system on the market that can protect Dex files and So files at the same time.
Summary of the invention
The invention proposes an Android application protection method based on Dex2C and LLVM (Low Level Virtual Machine) that can protect Dex files and So files at the same time, so as to effectively resist static analysis and dynamic analysis by malicious attackers.
To accomplish this task, the invention adopts the following technical solution:
An Android application protection method based on Dex2C and LLVM, comprising the following steps:
The Dex file is obtained from the installation package of the application to be protected and parsed layer by layer according to the file format, so that for each assembly instruction in the Dex file all the information needed to restore it to C code is obtained and stored in a data structure; the method to be protected is determined and changed into a native-type method, the Dex file is rewritten, and the pre-conversion preprocessing is carried out; an assessment model is established that uses the core call-time share of the method to be protected at run time as its decision basis and, by means of a set threshold, decides whether the method to be protected undergoes Dex2C conversion, so as to avoid frequent reflection calls and redundant loop operations as far as possible;
If the method to be protected undergoes Dex2C conversion, the necessary information stored in the data structure for that method is converted into C code; during conversion, different conversion logic is established for different assembly instructions and the predecessor/successor connection relationships between assembly instructions are restored, while ensuring that assembly instruction types are restored correctly and that data transfer remains consistent; the converted C code is taken as the object to be protected;
If the method to be protected does not undergo Dex2C conversion, the entry function in the So file of the installation package of the application to be protected is taken as the object to be protected;
The object to be protected is virtualized while it is compiled, producing a virtualized binary So file, which is then repackaged and signed to generate the protected application.
Further, the information needed to restore an assembly instruction to C code includes the descriptions of the methods and fields in the class and the details of every instruction;
The data structure stores, for every assembly instruction, the numbers and contents of the registers involved, the information of the owning class, the parameter information, and so on.
Further, the pre-conversion preprocessing includes inserting assembly instruction statements into the class initializer (the class constructor that runs when the class is loaded) of the class containing the method to be protected, so as to establish the predecessor/successor relationships between instructions.
Further, the assessment model is as follows:
The self call time of the functions in the function call chain of the method to be protected is divided by the sum of the call times of the method's own calls, its sub-function calls, and the related system API calls. The computed value is compared with the set threshold: if it exceeds the threshold, the method to be protected is added to the conversion whitelist and Dex2C conversion is performed; otherwise the method is added to the conversion blacklist and no Dex2C conversion is performed.
Further, establishing different conversion logic for different assembly instructions comprises:
Three kinds of conversion logic are established according to the assembly instruction:
The first kind, general-type instructions, includes data manipulation instructions, return instructions, data definition instructions and data arithmetic instructions; these are translated directly according to the semantics of the assembly instruction;
The second kind, reference-type instructions, includes instance operation instructions, method call instructions and field operation instructions; these realize the semantics of the instruction by calling Java-layer methods reflectively through JNI functions;
The third kind, jump-type instructions, includes the jump instructions; for these, scopes are divided and instructions are converted according to the connection relationships between instructions.
Further, virtualizing the object to be protected while it is compiled comprises:
Under the LLVM compiler framework, lexical analysis and syntax analysis are performed on the object to be protected and its AST is built, from which the intermediate representation (IR) is generated; the IR removes the platform-dependent characteristics of the source code but retains its logic and semantic information;
The virtual instruction interpreter divides the virtual instructions into three types for concrete handling: arithmetic instructions, data transfer instructions, and control-flow transfer instructions;
The program scheduler simulates the execution process of a CPU: it first fetches a virtual instruction, decodes it and indexes the corresponding interpreter routine, hands control to the interpreter to interpret the instruction, then takes control back and repeats the process until all instructions have been interpreted;
The function body replacer transforms the body of a function on the basis of the IR: it first deletes the function body and generates a signature for the function; the function signature is used to locate the address of the virtual instructions executed by the function, and the parameters of the function are passed to the virtual instruction interpreter to initialize the corresponding virtual registers in the interpreter.
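The fetch, decode and dispatch cycle of the program scheduler can be illustrated with a minimal sketch in Python. This is not the patent's virtual instruction set (Fig. 7 is not reproduced here): the opcodes `add`, `mov`, `jnz` and `halt`, the `VM` class and the tuple encoding are all hypothetical, chosen only to give one instruction per category named above.

```python
from dataclasses import dataclass, field

# Hypothetical opcodes, one per instruction category named in the text.
ADD, MOV, JNZ, HALT = "add", "mov", "jnz", "halt"


@dataclass
class VM:
    regs: dict = field(default_factory=dict)  # virtual registers
    pc: int = 0                               # index of the next virtual instruction
    running: bool = True


def op_add(vm, dst, a, b):        # arithmetic instruction
    vm.regs[dst] = vm.regs[a] + vm.regs[b]


def op_mov(vm, dst, imm):         # data transfer instruction
    vm.regs[dst] = imm


def op_jnz(vm, reg, target):      # control-flow transfer instruction
    if vm.regs[reg] != 0:
        vm.pc = target


def op_halt(vm):
    vm.running = False


HANDLERS = {ADD: op_add, MOV: op_mov, JNZ: op_jnz, HALT: op_halt}


def run(program, args):
    """Scheduler loop: fetch, decode, hand control to a handler, repeat."""
    vm = VM(regs=dict(args))      # function parameters seed the virtual registers
    while vm.running and vm.pc < len(program):
        opcode, *operands = program[vm.pc]    # fetch
        vm.pc += 1
        HANDLERS[opcode](vm, *operands)       # decode and dispatch
    return vm.regs


# Example program: acc = n + (n - 1) + ... + 1
PROGRAM = [
    (MOV, "acc", 0),
    (MOV, "minus1", -1),
    (ADD, "acc", "acc", "n"),     # index 2: loop head
    (ADD, "n", "n", "minus1"),
    (JNZ, "n", 2),
    (HALT,),
]
```

Calling `run(PROGRAM, {"n": 4})` leaves the sum in the `acc` register. Seeding the registers from the arguments mirrors the way the function body replacer passes the original function's parameters to the interpreter.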
Compared with the prior art, the invention has the following technical features:
1. The invention can effectively prevent reverse-engineering attacks on the Dex layer. Compared with traditional Dex-layer packing and encryption, the proposed method uses a custom converter to turn the Dex-layer code implementation into a native-layer C implementation, and can therefore effectively protect core Java methods.
2. The invention can protect Dex-layer methods and native-layer methods at the same time. For Dex-layer methods, the scheme applies double encryption through Dex2C and the LLVM-based compile-time virtualization. For native-layer methods, the scheme applies the LLVM-based compile-time virtualization. A malicious attacker has to analyze and study both protection mechanisms in depth at the same time, so the combination of the two effectively raises the threshold of an attack.
3. The invention is designed for good compatibility. Dex2C first parses the user-specified method to be protected with a custom parser and converts the Dex code into C code, which is a code-level conversion. The subsequent LLVM-based compile-time virtualization is also a code-level conversion, performed at compile time, so the poor compatibility found in existing hardening schemes does not arise.
4. The design of the invention is flexible: the assessment model can be used to tune the protection scheme, improving the execution efficiency of the APK as far as possible and avoiding the performance overhead caused by redundant loop operations and frequent JNI calls.
5. Test experiments show that, compared with the application before protection, the protected application's APK size is reduced by 13.53% on average, the Dex file size is reduced by 20.72% on average, and CPU utilization is reduced by 12.51%. This is because the Dex-layer method implementations are extracted and replaced by native-layer implementations, and running in the native layer costs less than running on the Dalvik virtual machine (DVM). The method can effectively resist static analysis and dynamic analysis by malicious attackers.
Brief description of the drawings
Fig. 1 is a flowchart of the invention;
Fig. 2 is the system framework diagram of the invention;
Fig. 3(a) is a schematic diagram of the Dex file format; Fig. 3(b) shows the types of the Dalvik assembly instruction set;
Fig. 4(a) shows an example of an invoke-static assembly instruction before and after conversion; Fig. 4(b) shows an example of an if-lez assembly instruction before and after conversion;
Fig. 5(a) is the generated Smali instruction fragment; Fig. 5(b) is the intermediate code representation; Fig. 5(c) is the path tree generated from the intermediate representation;
Fig. 6 compares a Java method before conversion with the C method after conversion;
Fig. 7 lists the virtual instructions in the LLVM virtualization module and their functions;
Fig. 8 compares the code before and after virtualization;
Fig. 9(a) shows the change in APK and Dex file size before and after protection for five Android applications from F-Droid; Fig. 9(b) shows the change in CPU usage before and after protection for the same five Android applications.
Detailed description of the embodiments
The invention proposes an Android application protection method based on Dex2C and LLVM, whose core is to convert the methods to be protected in the Dex layer into corresponding C code and to apply LLVM-based compile-time virtualization. Dex2C first decompresses the APK file and parses the Dex file to obtain the Node information of every instruction, and then converts the function declaration and the function body. The scheme designs three kinds of conversion logic for the 256 assembly instructions. The converted C code passes through the LLVM compile-time virtualization module to form a virtualized binary So file specific to the CPU architecture, which is repackaged and signed to generate a new APK. On the one hand, the protected application has a smaller APK and a higher execution efficiency; on the other hand, the cost of reverse-engineering attacks at both the Dex layer and the native layer is increased, and the Dex-layer methods to be protected receive double encryption.
An Android application protection method based on Dex2C and LLVM, comprising the following steps:
Step 1, the Dex file is obtained from the installation package of the application to be protected and parsed layer by layer according to the file format, so that for each assembly instruction in the Dex file all the information needed to restore it to C code is obtained and stored in a data structure.
The basic process of this step is as follows: decompress the APK installation package of the application to be protected to obtain the Dex file and the AndroidManifest.xml file, and parse the XML file to obtain the main entry class; parse the Dex file layer by layer according to the file format shown in Fig. 3(a), parsing in turn each Class in the Dex file, each Method in each Class, and each assembly instruction in each Method; finally obtain, for each assembly instruction, all the information needed to restore it to a C file. The details are as follows:
Step 1.1, unpack the installation package of the Android application to be protected to obtain the Dex file and the AndroidManifest.xml file, and parse the XML file to obtain all Activity entry classes.
Step 1.2, parse according to the Dex file format shown in Fig. 3(a). First parse the dex_header field to obtain the offsets and sizes of fields such as string_ids, type_ids and method_ids within the Dex file, so that the start address and end address of each field can be located precisely.
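As a minimal sketch of step 1.2, the header can be read with Python's `struct` module. The field order and the per-item sizes follow the publicly documented Dex `header_item` layout; the function names `parse_dex_header` and `section_ranges` are illustrative and do not come from the patent.

```python
import struct

# header_item layout from the published Dex format (112 bytes in total):
# magic[8], checksum (uint32), signature[20], then 20 little-endian uint32 fields.
HEADER = struct.Struct("<8sI20s20I")
FIELDS = (
    "file_size", "header_size", "endian_tag", "link_size", "link_off", "map_off",
    "string_ids_size", "string_ids_off", "type_ids_size", "type_ids_off",
    "proto_ids_size", "proto_ids_off", "field_ids_size", "field_ids_off",
    "method_ids_size", "method_ids_off", "class_defs_size", "class_defs_off",
    "data_size", "data_off",
)
# Size in bytes of one entry in each ID table.
ITEM_SIZE = {"string_ids": 4, "type_ids": 4, "proto_ids": 12,
             "field_ids": 8, "method_ids": 8, "class_defs": 32}


def parse_dex_header(data: bytes) -> dict:
    """Return the size/offset fields of the Dex header as a dict."""
    if len(data) < HEADER.size:
        raise ValueError("too short to contain a Dex header")
    magic, checksum, _signature, *values = HEADER.unpack_from(data)
    if not magic.startswith(b"dex\n"):
        raise ValueError("not a Dex file")
    header = dict(zip(FIELDS, values))
    header["checksum"] = checksum
    return header


def section_ranges(header: dict) -> dict:
    """Map each ID table to its (start, end) byte range within the file."""
    return {
        name: (header[name + "_off"],
               header[name + "_off"] + header[name + "_size"] * size)
        for name, size in ITEM_SIZE.items()
    }
```

For a real file, `section_ranges(parse_dex_header(open("classes.dex", "rb").read()))` would give the start and end addresses that the later parsing steps rely on; the file name here is only an example.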
Step 1.3, parse the class_defs and method_ids fields. From class_defs, mainly obtain the type and superclass type of each class, the position offsets of its static/instance fields and direct/virtual methods, the related annotation information, and so on. Then parse each method class by class, mainly obtaining from method_ids the owning class information, parameter information and method name of the method.
Step 1.4, parse every assembly instruction method by method. Based on the BakSmali decompilation engine, the scheme obtains the correspondence between the binary encoding and the Dalvik assembly instructions, and finally obtains all the information needed to restore a C file, as shown in Fig. 5(a); the necessary information includes the descriptions of the methods and fields in the class, the details of every instruction, and so on. The scheme stores the details of every instruction in the data structure InsInfoNode.
Specifically, Google's Dalvik virtual machine instruction set has 256 assembly instructions, for example move, return, new-instance, goto, if-eq, cmpl-float, invoke-virtual, add-type, and so on, which are conventionally divided into 14 types according to their function; the types are shown in Fig. 3(b). The scheme constructs the data structure InsInfoNode to hold the numbers and contents of the registers involved in every assembly instruction, the information of the owning class, the parameter information, and so on, and stores different data in it depending on the type of the instruction read. If the instruction read is a data definition instruction such as const_string, the scheme only needs to obtain the number of the register it is stored in, the opcode, and the String value, and store them in the InsInfoNode data structure. For a method call instruction such as invoke_direct, the scheme needs to obtain not only the owning class information, parameter information and method name of the called method, but also the numbers and values of the registers and the opcode information. These are read in order, so that the details of every instruction in the Method are encapsulated.
Step 2, determine the method to be protected, change it into a native-type method and rewrite the Dex file, carrying out the pre-conversion preprocessing.
In this step, the method to be protected is uniquely determined mainly from the class and method name specified by the user; the assembly instruction statements that call the native-layer interpreter code are constructed; and the method found is changed into a native-type method and the Dex file is rewritten. The details are as follows:
Step 2.1, the user specifies the class and method name of the method to be protected; the class_defs field and the method_ids field are traversed using these two pieces of information to determine the unique method to be protected.
In this embodiment, the scheme protects the alarm clock application talalarmo.apk from the open-source store F-Droid. The method protected is named onStartCommand and its class is trikita.talalarmo.alarm.AlarmService. The function of this method is to play the ringtone and jump to a new Activity screen.
Step 2.2, preprocessing before conversion
Assembly instruction statements are inserted into the class initializer Clinit of the class containing the method to be protected obtained in step 2.1, so as to establish the predecessor/successor relationships between instructions; in this embodiment they are:
const-string v1, "libDex2C"
invoke-static {v1}, Ljava/lang/System;->loadLibrary(Ljava/lang/String;)V
The attribute of the method to be protected obtained in step 2.1 is then changed to the native type, and the Dex file is rewritten. The purpose of these operations is to load, during the initialization phase of the class, the So file generated by the subsequent compilation.
Step 3, establish the assessment model, which uses the core call-time share of the method to be protected at run time as its decision basis and, by means of a set threshold, decides whether the method to be protected undergoes Dex2C conversion, so as to avoid frequent reflection calls and redundant loop operations as far as possible.
Establishing the assessment model: JNI is the bridge between the Java layer and the native layer. When a JNI function is called, the reflection call is made with the parameters supplied by the native layer, which is more time-consuming than executing directly on the Dalvik virtual machine. Since the method to be converted may contain too many JNI calls after conversion, which would inevitably degrade performance, the system introduces an assessment model to improve the execution efficiency of the APK as far as possible.
In this scheme, the share of the core call time (the self call time of all functions in the function call chain) when the method to be protected runs serves as the decision basis of the assessment model.
First, Google's CPU Profiler is used to collect the call time of each function of the Android application before protection. All sub-functions called by the current method to be protected are looked up recursively until the bottom of every sub-function is reached, that is, until a sub-function calls no other function, and the function call chain of the method to be protected is generated from the call relationships obtained in this way. Here, the function (method) currently to be protected is onStartCommand.
The assessment model is as follows:
The self call time of the functions in the function call chain of the method to be protected is divided by the sum of the call times of the method's own calls, its sub-function calls, and the related system API calls. The computed value is compared with the set threshold: if it exceeds the threshold, the method to be protected is added to the conversion whitelist, Dex2C conversion is performed, and step 4 is executed; otherwise the method is added to the conversion blacklist, no Dex2C conversion is performed, and step 5 is executed. In this embodiment the threshold is set to 60%. The user can tune it according to the desired protection strength and execution efficiency: to favor execution efficiency, raise the threshold; to favor protection strength, lower it.
As an example, suppose the method to be protected is method A. The self call time of method A accounts for 40% of A's total execution time; A calls method B, whose execution time accounts for 35% of A's total execution time; and the related system APIs account for 25% of the time in A. The self call time of method B accounts for 50% of B's total execution time; B calls method C, whose execution time accounts for 35% of B's total execution time; and the related system APIs account for 15% of the time in B. Method C calls no other function; its self call time accounts for 80% of C's total execution time, and the related system APIs account for 20% of the time in C. The core call time is therefore 40% + 35% × (50% + 35% × 80%) = 67.3%, which exceeds the 60% threshold, so the method to be protected is written to the conversion whitelist and step 4 is executed.
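The arithmetic of this example can be reproduced with a short recursive sketch in Python. The `CallNode` class and both function names are hypothetical; the shares are the figures from the example, and the system API shares are left out because they do not count towards the core call time.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CallNode:
    name: str
    self_share: float   # fraction of this function's total time spent in its own code
    # (fraction of this function's total time spent in the callee, callee node)
    callees: List[Tuple[float, "CallNode"]] = field(default_factory=list)


def core_call_share(node: CallNode) -> float:
    """Self time of every function in the call chain, as a share of the root's total time."""
    return node.self_share + sum(
        share * core_call_share(callee) for share, callee in node.callees
    )


def should_convert(node: CallNode, threshold: float = 0.60) -> bool:
    """True means: whitelist the method and run Dex2C conversion."""
    return core_call_share(node) > threshold


# The worked example: A calls B (35% of A), B calls C (35% of B).
method_c = CallNode("C", 0.80)
method_b = CallNode("B", 0.50, [(0.35, method_c)])
method_a = CallNode("A", 0.40, [(0.35, method_b)])
```

With these inputs `core_call_share(method_a)` evaluates to 0.4 + 0.35 × (0.5 + 0.35 × 0.8) = 0.673, so `should_convert(method_a)` is true at the 60% threshold; raising the threshold above 0.673 would send the same method to the blacklist.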
Step 4, Dex2C conversion operation
If the method to be protected undergoes Dex2C conversion, the necessary information stored in the data structure for that method is converted into C code; during conversion, different conversion logic is established for different assembly instructions and the predecessor/successor connection relationships between assembly instructions are restored, while ensuring that assembly instruction types are restored correctly and that data transfer remains consistent; the converted C code is taken as the object to be protected. The details are as follows:
Step 4.1, all the information needed to restore the assembly instructions of the method to be protected to C code, held in the InsInfoNode data structure built in step 1.4, is converted into C code, starting with the conversion of the function declaration and the function body.
In this embodiment, the function to be protected is onStartCommand. The scheme selects different conversion logic for different assembly instruction types; three sets of conversion logic are established here, and a register-simulation approach, together with incrementing and randomizing the local variables, ensures that types are restored correctly by Dex2C and that data transfer is consistent.
Step 4.1.1, conversion of the function declaration
Static registration is used to solve the function overloading problem: the method name and the parameters are integrated into the converted declaration.
In this embodiment, the method before protection is:
public int onStartCommand(Intent intent, int flags, int startId).
The statically registered method after conversion is:
JNIEXPORT jint Java_trikita_talalarmo_alarm_AlarmService_onStartCommand(JNIEnv *env, jobject a0, jstring a1, jint a2, jint a3).
Here env is a reference to the Android virtual machine environment, a0 is a local variable register, and the rest are parameter registers; this effectively solves the function overloading problem.
Step 4.1.2, conversion of the function body
Different conversion logic is established for different instruction types; the scheme distinguishes three kinds of conversion logic:
The first kind, general-type instructions
These include data manipulation instructions, return instructions, data definition instructions, data arithmetic instructions, and so on. This kind is relatively simple and is translated directly according to the semantics of the assembly instruction. Take the data definition instruction const-string vx, string_id as an example: its semantics are to construct a string from the string index and assign it to register vx, so the scheme directly obtains the corresponding string from string_ids by its offset address and assigns it to the corresponding variable. The C code after conversion is: char *dqP = "java/lang/Math"; Another example is the comparison instruction cmp-long v1, v1, v2. Its semantics are: compare two long integers; the result is 1 if the value of register v1 is greater than the value of register v2, 0 if they are equal, and -1 if it is less. The C code after conversion is therefore: jint a16 = (a15 > a14 ? 1 : (a15 < a14 ? -1 : 0));
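A direct translation of this kind amounts to filling a fixed C template per opcode. The Python sketch below shows two such emitters; the function names are hypothetical, and the variable names (`dqP`, `a14` to `a16`) are simply taken from the examples above rather than generated by a register-simulation scheme.

```python
def c_string_literal(value: str) -> str:
    """Quote a string for C, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def emit_const_string(var: str, value: str) -> str:
    """const-string vx, string_id  ->  a C string assignment."""
    return "char *%s = %s;" % (var, c_string_literal(value))


def emit_cmp_long(dst: str, lhs: str, rhs: str) -> str:
    """cmp-long  ->  a nested conditional yielding 1, 0 or -1."""
    return "jint {d} = ({l} > {r} ? 1 : ({l} < {r} ? -1 : 0));".format(
        d=dst, l=lhs, r=rhs
    )


def cmp_long(lhs: int, rhs: int) -> int:
    """Reference semantics of the emitted expression, for checking."""
    return (lhs > rhs) - (lhs < rhs)
```

`emit_const_string("dqP", "java/lang/Math")` and `emit_cmp_long("a16", "a15", "a14")` reproduce the two converted statements quoted in the text, with spaces added for readability.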
The second kind, reference-type instructions
These include instance operation instructions, method call instructions, field operation instructions, and so on. Assembly instructions of this kind realize their semantics mainly by calling Java-layer methods reflectively through JNI. Take the method call instruction invoke-static {parameters}, methodcall as an example: its semantics are to call a static method. The scheme first obtains from string_ids the method name, class name and parameter type strings of the called method, obtains the corresponding jclass object through the FindClass method, and parses the parameter list to obtain the contents of the corresponding registers. It then builds a jmethodID object from the strings just obtained and the jclass object through GetStaticMethodID, calls the CallStaticLongMethodA method to perform the reflection call, and finally writes the result back to the corresponding register. Example code before and after conversion is shown in Fig. 4(a).
The third kind: jump type instructions
These include jump instructions, for which scopes are divided and instructions are converted according to the connection relationships between instructions. This kind of instruction is converted mainly according to the instruction connection relationships established in step 4.2. Take the jump instruction if-lez vx, target as an example. Its semantics are to jump to target if the value of the vx register is less than or equal to zero. Therefore, according to the connection relationships among the if-lez assembly instruction, the next instruction, and the LabelInsNode instruction, the C code after conversion is: if (a17 <= 0) goto L78b66d36; {...} L78b66d36: as shown in Fig. 4(b).
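By way of a non-limiting sketch, the goto form above can be placed in a complete function to show which instructions are skipped. The function name translated_if_lez and the flag executed are hypothetical, and int32_t is an assumed stand-in for jint.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical translated method body. Returns 1 when the guarded block
 * ran and 0 when the branch jumped over it; a17 plays the role of vx. */
int32_t translated_if_lez(int32_t a17)
{
    int32_t executed = 0;

    if (a17 <= 0) goto L78b66d36;   /* if-lez vx, target */
    {
        /* instructions between the branch and the label */
        executed = 1;
    }
L78b66d36:
    return executed;
}

/* Usage: zero and negative values take the jump, positive values do not. */
void if_lez_demo(void)
{
    assert(translated_if_lez(-1) == 0);
    assert(translated_if_lez(0) == 0);
    assert(translated_if_lez(4) == 1);
}
```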
Step 4.2, restore the predecessor and successor connection relationships between assembly instructions
A plain sequential traversal cannot solve the variable scope problem. This scheme therefore uses a depth-first traversal to translate the sequence of nodes on each path from step 1.4 until the instructions on all nodes have been converted, as shown in step 4.2. During translation, the variables accessible in the current scope, together with the associations between variables and registers, are passed down continuously according to the parent-child relationships between nodes. This allows variables in a higher-level scope to be accessed from lower-level scopes. Each node completes its own translation using the available variables, generating an intermediate representation of C code as shown in Fig. 5(b). The path tree built from the intermediate representation is shown in Fig. 5(c). In this way the scopes are divided and the predecessor and successor connection relationships are effectively established.
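By way of a non-limiting sketch, the scope propagation of step 4.2 can be pictured as a depth-first walk in which every node receives a copy of its parent's register-to-variable table. The Node layout, the limits MAX_REGS and MAX_CHILDREN, and the function names are assumptions made for illustration; the patent does not specify these structures.

```c
#include <assert.h>
#include <string.h>

#define MAX_REGS 16
#define MAX_CHILDREN 4

/* Hypothetical path-tree node: it may bind one register to a newly
 * declared C variable. */
typedef struct Node {
    int def_reg;                        /* register written here, or -1 */
    const char *def_var;                /* C variable declared for it */
    struct Node *children[MAX_CHILDREN];
    int child_count;
    const char *visible[MAX_REGS];      /* filled by translate() */
} Node;

/* Depth-first translation: a node sees what its ancestors declared plus
 * its own definition; sibling branches never see each other's variables. */
void translate(Node *node, const char *const inherited[MAX_REGS])
{
    memcpy(node->visible, inherited, sizeof node->visible);
    if (node->def_reg >= 0) {
        assert(node->def_reg < MAX_REGS);
        node->visible[node->def_reg] = node->def_var;
    }
    for (int i = 0; i < node->child_count; i++)
        translate(node->children[i], node->visible);
}

/* Usage: root binds v0 to a1, branch A binds v1 to a2, branch B binds
 * nothing. Returns 1 when B sees a1 but not a2. */
int scope_demo(void)
{
    const char *empty[MAX_REGS] = {0};
    Node a = { .def_reg = 1, .def_var = "a2" };
    Node b = { .def_reg = -1 };
    Node root = { .def_reg = 0, .def_var = "a1",
                  .children = { &a, &b }, .child_count = 2 };

    translate(&root, empty);
    return strcmp(b.visible[0], "a1") == 0
        && b.visible[1] == NULL
        && strcmp(a.visible[1], "a2") == 0;
}
```

Copying the table at each node keeps the sketch simple; a real translator could equally push and pop bindings on a shared stack.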
Step 4.3, guarantee correct restoration of assembly instruction types and consistency of data transfer
This scheme uses simulated registers, together with incrementing local variable names and randomizing local variable names, to guarantee correct type restoration and consistent data transfer in Dex2C.
This scheme establishes 15 simulated registers in total and uses the associations between registers to pass data between instructions. For example, for the assembly statements:
iget-wide v2, p0, Lcom/uberspot/a2048/MainActivity;->mLastBackPress:J
sub-long v2, v0, v2
this scheme stores the reflection result obtained by the first statement in the v2 register. The second statement then subtracts the value of the v2 register from the value of the v0 register and stores the result back in the v2 register. The consistency of data transfer is thus achieved through the associations between registers.
How, then, are variables named once register values have been converted to C code? This scheme combines incrementing local variable names with randomizing local variable names. For method call instructions, field operation instructions, and the like, the local variables produced by the conversion do not read or write registers, so this scheme names them with randomized variable names; only the final result of processing the assembly instruction needs to be assigned to the corresponding register. For other instructions, this scheme names variables with incrementing variable names, because registers are untyped whereas variables are typed. It can therefore happen that the variable previously stored in a register is of type int while the next one stored is of type double. This scheme handles that case by incrementing the variable name. On a store, if the type of the variable has not changed, the variable name is not incremented; if it has changed, the name is incremented. At the same time, an association between the register and its most recent variable name is established. On a read, this scheme only needs to obtain the most recent variable name through the register.
In this embodiment, suppose the first write stores an int value in the v0 register, the second write stores a double value in the v0 register, and the third operation reads the v0 register. This scheme then names the first variable a1 with type jint and the second variable a2 with type jdouble, and the third operation reads the value of a2.
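By way of a non-limiting sketch, the incrementing naming rule of this embodiment can be modeled with a small table that maps each simulated register to its latest variable name. The NameTable layout, the single shared counter, and the function names on_write and on_read are assumptions for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NUM_REGS 15   /* the scheme establishes 15 simulated registers */

typedef enum { T_NONE, T_INT, T_DOUBLE, T_LONG } VarType;

/* Hypothetical bookkeeping: latest variable type and name per register. */
typedef struct {
    VarType type[NUM_REGS];
    char    name[NUM_REGS][16];
    int     counter;              /* last numeric suffix handed out */
} NameTable;

/* Write to a register: a fresh name is allocated only when the type of
 * the stored value differs from the type of the previous one. */
const char *on_write(NameTable *t, int reg, VarType type)
{
    assert(reg >= 0 && reg < NUM_REGS);
    if (t->type[reg] != type) {
        t->type[reg] = type;
        snprintf(t->name[reg], sizeof t->name[reg], "a%d", ++t->counter);
    }
    return t->name[reg];
}

/* Read from a register: always the most recent variable name. */
const char *on_read(const NameTable *t, int reg)
{
    assert(reg >= 0 && reg < NUM_REGS);
    return t->name[reg];
}
```

With a zero-initialized table, writing an int and then a double to v0 yields a1 and then a2, and a following read of v0 resolves to a2, matching the embodiment above.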
This completes the conversion of the custom methods to be protected in the Dex file. This scheme writes the converted C code to the file Dex2C.cpp. Complete code of a method before and after conversion is shown in Fig. 6.
Step 4.4, after code conversion is complete, the converted C code is taken as the object to be protected.
If the method to be protected does not undergo Dex2C conversion, the entry function in the So file in the installation package of the application to be protected is taken as the object to be protected.
Step 5, virtualize the object to be protected while compiling it, generating the virtualized binary So file.
If the result of the assessment model for the method to be protected exceeds the threshold, the C code converted in step 4 is taken as the object to be protected and passed to the LLVM virtualization module for compile-time virtualization. Otherwise, the entry function (the JNI_OnLoad method) in the original So file of the Android application is used directly as input to the LLVM virtualization module for compile-time virtualization, generating the virtualized binary file.
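By way of a non-limiting sketch, the threshold decision can be written as a ratio test, following the assessment model stated in claim 4: the method's own call time divided by the sum of its own, sub-function, and related system API call times. The CallProfile structure, its field names, and the handling of an empty profile are assumptions, not details given in the patent.

```c
#include <assert.h>

/* Hypothetical timing record for one method to be protected. Units only
 * need to be consistent, for example milliseconds. */
typedef struct {
    double self_time;   /* time spent in the method's own code */
    double sub_time;    /* time spent in sub-function calls */
    double api_time;    /* time spent in related system API calls */
} CallProfile;

/* Returns 1 if the method should undergo Dex2C conversion (whitelist)
 * and 0 if it should be left unconverted (blacklist). */
int should_convert(const CallProfile *p, double threshold)
{
    assert(p->self_time >= 0 && p->sub_time >= 0 && p->api_time >= 0);
    double total = p->self_time + p->sub_time + p->api_time;
    if (total <= 0.0)
        return 0;       /* assumed policy: no timing data, no conversion */
    return (p->self_time / total) > threshold;
}
```

Under these assumptions, a method that spends 80 of 100 time units in its own code passes a 0.5 threshold, whereas one that spends 90 of 100 units in callees and system APIs does not.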
The JNI_OnLoad function is the entry function of the So file. This step is divided into three stages. In the first stage, the code enters the Clang front end, passes through lexical analysis, syntax analysis, and AST construction, and finally the built-in LLVM API is called to generate an intermediate representation with the suffix ".ll". The second stage constructs the virtualization components, chiefly the virtual instructions, the virtual instruction interpreter, the program scheduler, and the function body replacer. The third stage integrates the compilation chain, generating a Clang compiler with the specific obfuscation function. The details are as follows:
Step 5.1, the object to be protected passes through Clang's lexical analysis, syntax analysis, AST construction, and related processes to generate a platform-independent intermediate representation (IR). The LLVM virtualization module splits the original program code and converts it into a custom instruction architecture, with the aim of making the program more complex.
Running virtualized code is a process of dynamically interpreting custom instructions, rather than restoring the code to the original C code and executing that. The LLVM virtualization module first constructs virtual instructions similar to those of the JVM architecture. Virtual instruction generation first determines the required size of the virtual register space from the number and size of the temporary variables used in the IR, so that the virtual instructions can carry out dynamic memory allocation. The virtual instructions then simulate the logic flow of the original program on a stack, with the virtual registers assisting in storing intermediate results. Finally, the virtual instructions that have been run are destroyed. Fig. 7 lists some virtual instruction names and their descriptions.
Step 5.2, a virtual instruction is only a custom representation of the IR and cannot be handed directly to the target for execution. The virtual instruction interpreter is used to interpret the custom virtual instructions. The interpreter divides the virtual instructions into three types for concrete handling: arithmetic operation instructions, data transfer instructions, and control flow transfer instructions. For example, the store instruction is defined as a data transfer instruction, and the interpreter interprets it as follows:
Value v1 = vmdata[vpc++];          /* operand: index of the storage slot */
Value v2 = stack[stack_index--];   /* pop the object to be stored */
if (!Reg[v1])
    alloc(Reg[v1]);                /* allocate the slot on first use */
Reg[v1] = cast_i64(v2);
This process first advances the virtual program counter to obtain the index of the storage location, then takes the object to be stored from the top of the stack and puts it in the corresponding storage location.
Step 5.3, the program scheduler simulates the execution process of a CPU. The scheduler first fetches a virtual instruction, decodes it, and indexes into the interpreter, handing control to the interpreter to interpret that instruction. It then takes control back and repeats this process until all instructions have been interpreted.
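By way of a non-limiting sketch, the fetch, decode, and dispatch cycle of the scheduler can be shown on a toy stack machine with one or more handlers for each of the three instruction types. The opcode set, the VM structure, and all names below are assumptions for illustration and do not reproduce the virtual instruction set of Fig. 7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SIZE 32
#define NUM_VREGS  8

enum { OP_PUSH, OP_ADD, OP_STORE, OP_JZ, OP_HALT, OP_COUNT };

typedef struct {
    const int64_t *code;          /* virtual instruction stream */
    size_t  vpc;                  /* virtual program counter */
    int64_t stack[STACK_SIZE];
    int     sp;                   /* index of the top element, -1 if empty */
    int64_t reg[NUM_VREGS];       /* virtual registers */
    int     halted;
} VM;

typedef void (*Handler)(VM *);

/* Data transfer: push an immediate, or pop into a virtual register. */
static void op_push(VM *vm)
{
    assert(vm->sp < STACK_SIZE - 1);
    vm->stack[++vm->sp] = vm->code[vm->vpc++];
}
static void op_store(VM *vm)
{
    int64_t r = vm->code[vm->vpc++];
    assert(r >= 0 && r < NUM_VREGS && vm->sp >= 0);
    vm->reg[r] = vm->stack[vm->sp--];
}
/* Arithmetic: replace the two top elements by their sum. */
static void op_add(VM *vm)
{
    assert(vm->sp >= 1);
    int64_t rhs = vm->stack[vm->sp--];
    vm->stack[vm->sp] += rhs;
}
/* Control flow: pop a value and jump to the operand if it is zero. */
static void op_jz(VM *vm)
{
    int64_t target = vm->code[vm->vpc++];
    assert(vm->sp >= 0);
    if (vm->stack[vm->sp--] == 0)
        vm->vpc = (size_t)target;
}
static void op_halt(VM *vm) { vm->halted = 1; }

static const Handler HANDLERS[OP_COUNT] = {
    [OP_PUSH] = op_push, [OP_ADD] = op_add, [OP_STORE] = op_store,
    [OP_JZ] = op_jz, [OP_HALT] = op_halt,
};

/* Scheduler: fetch, decode to a handler, hand over control, repeat. */
void vm_run(VM *vm)
{
    while (!vm->halted) {
        int64_t opcode = vm->code[vm->vpc++];
        assert(opcode >= 0 && opcode < OP_COUNT);
        HANDLERS[opcode](vm);
    }
}

/* Usage: reg0 = 2 + 3, then a taken jump skips an overwrite of reg0. */
int64_t scheduler_demo(void)
{
    static const int64_t program[] = {
        OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_STORE, 0,
        OP_PUSH, 0, OP_JZ, 15,
        OP_PUSH, 99, OP_STORE, 0,
        OP_HALT,
    };
    VM vm = { .code = program, .sp = -1 };
    vm_run(&vm);
    return vm.reg[0];
}
```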
Step 5.4, the function body replacer transforms the body of a function at the intermediate representation level, with the aim of replacing the original program's execution with virtual interpretation. It first deletes the function body and generates the function's signature. The function signature is used to locate the address of the virtual instructions executed for that function, and the function's parameters are passed to the virtual instruction interpreter to initialize the corresponding virtual registers in the interpreter. Fig. 8 shows an equivalent source-level representation of the function body before and after the change, together with the final binary result.
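By way of a non-limiting sketch, the effect of function body replacement can be shown at source level: the original body disappears and a stub forwards the parameters to an interpreter that runs the function's virtual instructions. The opcodes, the array ADD_BYTECODE, and the function names are assumptions for illustration; Fig. 8 of the patent is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Function body before virtualization. */
int32_t add_original(int32_t a, int32_t b) { return a + b; }

enum { V_LOADREG, V_ADD, V_RET };

/* Virtual instructions standing in for the deleted body. In the scheme
 * described above, the function signature locates this address. */
static const int64_t ADD_BYTECODE[] = {
    V_LOADREG, 0, V_LOADREG, 1, V_ADD, V_RET
};

/* Minimal interpreter: the arguments initialize the virtual registers,
 * then the logic is simulated on a stack. */
static int64_t vm_interpret(const int64_t *code, const int64_t *args, int argc)
{
    int64_t reg[4] = {0}, stack[8];
    int sp = -1;

    assert(argc >= 0 && argc <= 4);
    for (int i = 0; i < argc; i++)
        reg[i] = args[i];

    for (int vpc = 0;;) {
        switch (code[vpc++]) {
        case V_LOADREG:
            assert(sp < 7);
            stack[++sp] = reg[code[vpc++]];
            break;
        case V_ADD:
            assert(sp >= 1);
            stack[sp - 1] += stack[sp];
            sp--;
            break;
        case V_RET:
            assert(sp >= 0);
            return stack[sp];
        default:
            assert(!"unknown virtual opcode");
            return 0;
        }
    }
}

/* Function body after replacement: a stub that hands off to the VM. */
int32_t add_virtualized(int32_t a, int32_t b)
{
    const int64_t args[] = { a, b };
    return (int32_t)vm_interpret(ADD_BYTECODE, args, 2);
}
```

The stub keeps the original signature, so callers are unaffected while the visible logic is reduced to an interpreter call.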
Step 5.5, the compilation chain must also be integrated to realize compile-time virtualization. The NDK already integrates Clang. The steps above are implemented as LLVM analysis passes, which are handed to the PassManager for unified management, and finally the modified source code is recompiled to generate a compiler with the obfuscation function.
To activate the compilation chain, the virtualizing compiler must be specified through the NDK_TOOLCHAIN_VERSION parameter in the Application.mk generated for native compilation, and the parameters required for virtualization are specified in Android.mk using -mllvm -vm. Clang receives the virtualization parameters and calls PassManagerBuilder to determine whether to perform compile-time virtualization. As can be seen, the virtualization steps above all operate on the IR, so the virtualization process is naturally compatible with every platform. The virtualized code is passed to the LLVM back end to generate a platform-specific binary executable (the So file).
Step 6, the So file generated by compilation, the Dex file rewritten in step 2.2, and the other resource files are repackaged and signed, finally producing a protected Android application that is functionally equivalent to the APK before protection.
Experimental section:
The inventors carried out the following performance tests and attack experiments.
Performance test platform: the test device is a Google Nexus 5 running Android 4.4.2. The test APKs are five frequently downloaded applications from the open-source store F-Droid: the BMI calculator BMICalculate.APK, the popular game 2048.APK, the QR code scanning tool QRScanner.APK, the alarm clock program talalarmo.apk, and the notepad program JustNote.APK.
The changes in APK size and Dex size before and after protection are shown in Fig. 9(a), and the change in CPU usage is shown in Fig. 9(b); CPU usage is the average of 50 measurements. As the figures show, the APK package size decreases by 13.53% on average, the Dex file size decreases by 20.72% on average, and CPU usage decreases by 12.51%. CPU usage falls because execution in the native layer is much faster than execution on the DVM virtual machine: the C code is compiled directly to machine code and executed. The APK package and the Dex file shrink because we extract the Dex-layer implementation and reimplement it in the native layer, while the So file itself is relatively small, so the overall APK size decreases.
The tools used in the attack experiments are the interactive disassembler IDA Pro, the Android decompilation tool AndroidKiller, the Dex-layer unpacking tool PackGrind, and DecLLVM, a script for countering OLLVM. The attack target is the protected Android application.
The attack experiments show that these common reverse-engineering tools are ineffective against applications protected by our method, because our tool is not simple packing or encryption. The double encryption of the native layer and the Dex layer effectively prevents static and dynamic attacks by malicious attackers.
Claims (6)
1. An Android application protection method based on Dex2C and LLVM, characterized by comprising the following steps:
obtaining the Dex file from the installation package of the application to be protected and parsing it section by section according to the file format, so as to obtain, for each assembly instruction in the Dex file, all the information necessary to restore it to C language code, and storing this information in a data structure; determining the method to be protected, changing it to a native-layer method, and then rewriting the Dex file as pre-conversion preprocessing; establishing an assessment model that uses the proportion of core call time during execution of the method to be protected as its decision basis and, by means of a set threshold, determines whether to perform Dex2C conversion on the method to be protected, so as to avoid frequent reflection calls and redundant loop operations as far as possible;
if the method to be protected undergoes Dex2C conversion, converting the necessary information stored in the data structure for the method to be protected into C language code, wherein different conversion logic is established for different assembly instructions during conversion, the predecessor and successor connection relationships between assembly instructions are restored, and correct restoration of assembly instruction types and consistency of data transfer are guaranteed; and taking the converted C code as the object to be protected;
if the method to be protected does not undergo Dex2C conversion, taking the entry function in the So file in the installation package of the application to be protected as the object to be protected;
virtualizing the object to be protected while compiling it to generate a virtualized binary So file, then repackaging and signing to generate the protected application.
2. The Android application protection method based on Dex2C and LLVM according to claim 1, characterized in that the information necessary to restore an assembly instruction to C language code includes the descriptions of the methods and fields in the class and the detailed information of each instruction;
the data structure is used to store the numbers and contents of the registers involved in each assembly instruction, the information of the class to which it belongs, parameter information, and the like.
3. The Android application protection method based on Dex2C and LLVM according to claim 1, characterized in that the pre-conversion preprocessing includes inserting assembly instruction statements into the class constructor of the class containing the method to be protected, and establishing the predecessor and successor relationships between instructions.
4. The Android application protection method based on Dex2C and LLVM according to claim 1, characterized in that the assessment model is as follows:
the call time of the function itself in the function call chain of the method to be protected is divided by the sum of the call times of the method's own calls, its sub-function calls, and the related system API calls; the computed value is compared with the set threshold; if it exceeds the threshold, the method to be protected is added to the conversion whitelist and undergoes Dex2C conversion; otherwise the method to be protected is added to the conversion blacklist and does not undergo Dex2C conversion.
5. The Android application protection method based on Dex2C and LLVM according to claim 1, characterized in that establishing different conversion logic for different assembly instructions comprises:
establishing three kinds of conversion logic according to the assembly instruction:
the first kind, general type instructions, including data manipulation instructions, return instructions, data definition instructions, and data operation instructions, which are translated directly according to the semantic information of the assembly instruction;
the second kind, reference type instructions, including instance operation instructions, method call instructions, and field operation instructions, whose semantics are realized by calling Java-layer methods through JNI reflection;
the third kind, jump type instructions, including jump instructions, for which scopes are divided and instructions are converted according to the connection relationships between instructions.
6. The Android application protection method based on Dex2C and LLVM according to claim 1, characterized in that virtualizing the object to be protected while compiling it comprises:
under the LLVM compiler framework, performing lexical analysis and syntax analysis on the object to be protected and constructing its AST, thereby generating an intermediate representation (IR), which removes the platform-dependent characteristics of the source code while preserving its logic and semantic information;
the virtual instruction interpreter dividing the virtual instructions into three types for concrete handling, namely arithmetic operation instructions, data transfer instructions, and control flow transfer instructions;
the program scheduler simulating the execution process of a CPU by first fetching a virtual instruction, decoding it, and indexing into the interpreter, handing control to the interpreter to interpret that instruction, then taking control back and repeating this process until all instructions have been interpreted;
the function body replacer transforming the body of a function at the intermediate representation level by first deleting the function body and generating the function's signature, wherein the function signature is used to locate the address of the virtual instructions executed for that function, and the function's parameters are passed to the virtual instruction interpreter to initialize the corresponding virtual registers in the interpreter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910394117.5A CN110245467B (en) | 2019-05-13 | 2019-05-13 | Android application program protection method based on Dex2C and LLVM |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110245467A (en) | 2019-09-17 |
CN110245467B (en) | 2023-02-07 |
Family
ID=67884280
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910394117.5A Active CN110245467B (en) | 2019-05-13 | 2019-05-13 | Android application program protection method based on Dex2C and LLVM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110245467B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN110245467B (en) | 2023-02-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |