CN110245467A - Android application program guard method based on Dex2C and LLVM - Google Patents

Publication number
CN110245467A
CN110245467A CN201910394117.5A CN201910394117A CN110245467A CN 110245467 A CN110245467 A CN 110245467A CN 201910394117 A CN201910394117 A CN 201910394117A CN 110245467 A CN110245467 A CN 110245467A
Authority
CN
China
Prior art keywords
instruction
protected
conversion
dex2c
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910394117.5A
Other languages
Chinese (zh)
Other versions
CN110245467B (en)
Inventor
汤战勇
何中凯
张宇翔
王薇
龚晓庆
陈晓江
房鼎益
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University
Original Assignee
Northwest University
Application filed by Northwest University
Priority to CN201910394117.5A
Publication of CN110245467A
Application granted
Publication of CN110245467B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10: Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12: Protecting executable software
    • G06F21/14: Protecting executable software against software analysis or reverse engineering, e.g. by obfuscation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses an Android application protection method based on Dex2C and LLVM, comprising: decompressing the APK to obtain and parse the Dex file, and obtaining all the information needed to restore each assembly instruction to C code; deciding, according to an assessment model, whether to perform Dex2C conversion; if the threshold is exceeded, performing Dex2C conversion, which consists of conversion preprocessing (locating the method to be protected, inserting assembly instruction statements, establishing the adjacency relationships between instructions, and so on) followed by converting the instructions one by one, selecting one of three sets of conversion logic according to the assembly instruction type, and then applying LLVM-based compile-time virtualization; if the threshold is not exceeded, directly executing the LLVM compile-time virtualization module; and, after the framework has generated the So file, repackaging and signing to generate a functionally equivalent APK. The invention combines protection at the Dex layer and at the native layer: on the one hand it can improve the execution efficiency of the APK, and on the other hand it greatly increases the difficulty and cost of an attack by a malicious attacker.

Description

Android application program guard method based on Dex2C and LLVM
Technical field
The invention belongs to the technical field of Dex file encryption and compile-time virtualization of So files in Android applications, and specifically relates to an Android application protection method based on conversion of Dex files to C files and on LLVM compile-time virtualization.
Background art
In recent years, as the mobile Internet ecosystem has flourished, the number of mobile applications has grown exponentially. According to a Statista survey, as of March 2019 Google Play offered a total of 2,600,000 Android applications. However, because reverse-engineering toolchains have matured, an attacker can easily use reverse-engineering tools to obtain the core logic of the So (shared object) files or the classes.dex file of a legitimate application and then tamper with it, for example by adding malicious code or replacing the original advertisements, and finally repackage and sign the application and release it to the market through illegal channels. This not only harms the interests of application developers but also threatens users' property and privacy, and seriously affects the healthy development of the mobile application industry.
Most apps on the market are developed in Java and C; Java code is compiled into Dex files and C code is compiled into So files. The main protection schemes for Dex files at present are whole-file encryption, partial class-loading encryption, and Dex file virtualization; correspondingly, the tools DexExtractor, ZjDroid, and PackGrind can effectively attack these three Dex-layer protection schemes. Another important defect of the current mainstream Dex protection methods is that packing does not improve the execution efficiency of the Dex file. The main protection schemes for So files at present are OLLVM and UPX packing, and the corresponding attack tools or schemes are DecLLVM and UPX unpacking tools. It can be seen that, on the one hand, current protection schemes suffer from insufficient protection capability and reduced efficiency after protection; on the other hand, there is currently no system on the market that can protect Dex files and So files at the same time.
Summary of the invention
The invention proposes an Android application protection method based on Dex2C and LLVM (Low Level Virtual Machine) that protects Dex files and So files at the same time, so as to effectively resist static and dynamic analysis by malicious attackers.
To achieve this, the invention adopts the following technical solution:
An Android application protection method based on Dex2C and LLVM, comprising the following steps:
A Dex file is obtained from the installation package of the application to be protected and parsed layer by layer according to the file format, so as to obtain, for each assembly instruction in the Dex file, all the information needed to restore it to C code, and this information is stored in a data structure; the method to be protected is determined and changed to a native method, after which the Dex file is rewritten, completing the preprocessing before conversion; an assessment model is established that uses the proportion of core call time during execution of the method to be protected as its decision basis, and a threshold is set to determine whether the method to be protected undergoes Dex2C conversion, so as to avoid frequent reflective calls and redundant loop operations as far as possible;
If the method to be protected undergoes Dex2C conversion, the information stored in the data structure for that method is converted into C code; during conversion, different conversion logic is established for different assembly instructions, the connections between the predecessors and successors of each assembly instruction are restored, and it is ensured that the types of the assembly instructions are restored correctly and that data transfer is consistent; the converted C code is taken as the object to be protected;
If the method to be protected does not undergo Dex2C conversion, the entry function in the So file of the installation package of the application to be protected is taken as the object to be protected;
Compile-time virtualization is applied to the object to be protected, generating a virtualized binary So file, which is then repackaged and signed to produce the protected application.
Further, the information needed to restore an assembly instruction to C code includes the descriptions of the methods and fields in each class and the details of each instruction;
The data structure stores the numbers and contents of the registers involved in each assembly instruction, the owning-class information, the parameter information, and so on.
Further, the preprocessing before conversion includes inserting assembly instruction statements into the class constructor of the class containing the method to be protected, so as to establish the predecessor-successor relationships between instructions.
Further, the assessment model is as follows:
The self call time of the functions in the call chain of the method to be protected is divided by the sum of the method's self call time, its sub-function call time, and its related system API call time, and the resulting value is compared with the set threshold; if it exceeds the threshold, the method to be protected is added to the conversion whitelist and Dex2C conversion is performed; otherwise the method is added to the conversion blacklist and no Dex2C conversion is performed.
Further, establishing different conversion logic for different assembly instructions comprises:
Establishing three kinds of conversion logic according to the assembly instruction:
The first kind, general instructions, includes data manipulation instructions, return instructions, data definition instructions, and data arithmetic instructions; these are translated directly according to the semantics of the assembly instruction;
The second kind, reference instructions, includes instance operation instructions, method call instructions, and field operation instructions; these call Java-layer methods reflectively through JNI functions to realize the semantics the instructions express;
The third kind, jump instructions, includes the jump instructions; for these, scopes are divided and instructions are converted according to the connections between instructions.
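The three-way split above can be illustrated with a small dispatcher. This is a minimal sketch, not the patented converter: the function name `conversion_logic` and the prefix lists are assumptions chosen for illustration and do not cover all 256 Dalvik opcodes.

```python
# Hypothetical sketch: route a Dalvik mnemonic to one of the three conversion logics.
GENERAL, REFERENCE, JUMP = "general", "reference", "jump"

# Prefix lists are illustrative assumptions, not a complete opcode table.
REFERENCE_PREFIXES = ("invoke-", "iget", "iput", "sget", "sput",
                      "new-instance", "instance-of", "check-cast")
JUMP_PREFIXES = ("goto", "if-", "packed-switch", "sparse-switch")


def conversion_logic(mnemonic: str) -> str:
    """Return which of the three conversion logics handles this instruction."""
    if mnemonic.startswith(JUMP_PREFIXES):
        return JUMP
    if mnemonic.startswith(REFERENCE_PREFIXES):
        return REFERENCE
    # Moves, returns, constants and arithmetic are translated directly.
    return GENERAL
```

A converter built this way would look up the category first and then hand the instruction to the matching translation routine.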
Further, applying compile-time virtualization to the object to be protected comprises:
Under the LLVM compiler framework, lexical analysis and syntax analysis are performed on the object to be protected and its AST is constructed, from which the intermediate representation (IR) is generated; the IR removes the platform-dependent characteristics of the source code but retains its logic and semantic information;
The virtual instruction interpreter divides virtual instructions into three types for concrete handling: arithmetic instructions, data transfer instructions, and control-flow transfer instructions;
The program scheduler simulates the execution process of a CPU: it first fetches a virtual instruction, decodes it and indexes to the interpreter, hands control to the interpreter to interpret the instruction, then takes control back and repeats the process until all instructions have been interpreted;
The function body replacer transforms the body of a function at the IR level: it first deletes the function body and generates the function's signature, which is used to locate the address of the virtual instructions executed by that function, and passes the function's parameters to the virtual instruction interpreter to initialize the corresponding virtual registers in the interpreter.
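The scheduler and interpreter described above follow the familiar fetch, decode, dispatch loop. The sketch below illustrates that loop only; the opcode names (`mov`, `add`, `sub`, `jlez`, `jmp`) and the tuple encoding are hypothetical and are not the virtual instruction set of the invention.

```python
# Minimal sketch of a fetch-decode-dispatch loop; opcode names are hypothetical.
def run(program, regs):
    """Interpret `program` (a list of tuples) over the register dict `regs`."""
    pc = 0  # virtual program counter
    while pc < len(program):
        op, *args = program[pc]        # fetch and decode
        pc += 1
        if op == "mov":                # data transfer: dst <- immediate
            regs[args[0]] = args[1]
        elif op == "add":              # arithmetic: dst <- a + b
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "sub":              # arithmetic: dst <- a - b
            regs[args[0]] = regs[args[1]] - regs[args[2]]
        elif op == "jlez":             # control flow: jump if reg <= 0
            if regs[args[0]] <= 0:
                pc = args[1]
        elif op == "jmp":              # control flow: unconditional jump
            pc = args[0]
        else:
            raise ValueError(f"unknown virtual opcode: {op}")
    return regs


# Example program: acc = n + (n - 1) + ... + 1
PROGRAM = [
    ("mov", "acc", 0),
    ("mov", "one", 1),
    ("jlez", "n", 6),
    ("add", "acc", "acc", "n"),
    ("sub", "n", "n", "one"),
    ("jmp", 2),
]
```

Calling `run(PROGRAM, {"n": 4})` initializes the virtual register `n` from the caller's argument, in the same spirit as the function body replacer passing parameters to the interpreter.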
Compared with the prior art, the invention has the following technical features:
1. The invention effectively prevents reverse engineering of the Dex layer. Compared with traditional Dex-layer packing and encryption techniques, the proposed method uses a custom converter to convert Dex-layer code into a native-layer C implementation, and can effectively protect core Java methods.
2. The invention protects Dex-layer methods and native-layer methods at the same time. Dex-layer methods are doubly protected, by Dex2C and by LLVM-based compile-time virtualization. Native-layer methods are protected by LLVM-based compile-time virtualization. A malicious attacker has to analyze and study both protection modes in depth at the same time, so combining the two schemes effectively raises the threshold for an attack.
3. The invention has good compatibility. Dex2C first parses the method to be protected specified by the user with a custom parser and converts the Dex code into C code, which is a code-level conversion. The subsequent LLVM-based compile-time virtualization is also a code-level conversion performed at compile time, so the poor compatibility found in existing hardening schemes does not arise.
4. The design is flexible: the assessment model can be used to tune the protection scheme, improving the execution efficiency of the APK as far as possible and avoiding the performance overhead caused by redundant loop operations and frequent JNI calls.
5. Tests show that, compared with the application before protection, the application protected by the invention has an APK package that is on average 13.53% smaller and a Dex file that is on average 20.72% smaller, and CPU utilization is reduced by 12.51%. This is because the implementations of the Dex-layer methods have been extracted and replaced with native-layer implementations, and execution in the native layer has less overhead than execution on the Dalvik virtual machine (DVM). The method can effectively resist static and dynamic analysis by malicious attackers.
Brief description of the drawings
Fig. 1 is a flow chart of the invention;
Fig. 2 is a system framework diagram of the invention;
Fig. 3(a) is a schematic diagram of the Dex file format; Fig. 3(b) shows the types of Dalvik assembly instructions;
Fig. 4(a) shows an example of an invoke-static assembly instruction before and after conversion; Fig. 4(b) shows an example of an if-lez assembly instruction before and after conversion;
Fig. 5(a) is the generated Smali instruction segment; Fig. 5(b) is the intermediate code representation; Fig. 5(c) is the path tree generated from the intermediate representation;
Fig. 6 compares a Java method before conversion with the C method after conversion;
Fig. 7 lists the virtual instructions in the LLVM virtualization module and describes their functions;
Fig. 8 compares the code before and after virtualization;
Fig. 9(a) shows the change in APK and Dex file size before and after protection for five Android applications from F-Droid; Fig. 9(b) shows the change in CPU usage before and after protection for the same five Android applications.
Detailed description of the embodiments
The invention proposes an Android application protection method based on Dex2C and LLVM. Its core is to convert the methods to be protected in the Dex layer into corresponding C code and then apply LLVM-based compile-time virtualization. Dex2C first decompresses the APK file and parses the Dex file to obtain the Node information of each instruction, and then converts the function declaration and the function body. This scheme designs three kinds of conversion logic for the 256 assembly instructions; the converted C code is passed through the LLVM compile-time virtualization module to form a virtualized binary So file specific to the CPU architecture, which is then repackaged and signed to generate a new APK. On the one hand, the protected application has a smaller APK and executes more efficiently; on the other hand, the cost to a malicious attacker of reverse-engineering the Dex layer and the native layer is increased, and the methods to be protected in the Dex layer are doubly protected.
An Android application protection method based on Dex2C and LLVM comprises the following steps:
Step 1: a Dex file is obtained from the installation package of the application to be protected and parsed layer by layer according to the file format, so as to obtain, for each assembly instruction in the Dex file, all the information needed to restore it to C code, and this information is stored in a data structure.
The basic process of this step is to decompress the APK installation package of the application to be protected to obtain the Dex file and the AndroidManifest.xml file, and to parse the XML file to obtain the main entry class; the Dex file is parsed layer by layer according to the file format shown in Fig. 3(a), yielding each Class in the Dex file, each Method in each Class, and each assembly instruction in each Method; finally, all the information needed to restore each assembly instruction to a C file is obtained, specifically as follows:
Step 1.1: the installation package of the Android application to be protected is unpacked to obtain the Dex file and the AndroidManifest.xml file, and the XML file is parsed to obtain all Activity entry classes.
Step 1.2: the Dex file is parsed according to the format shown in Fig. 3(a). The dex_header field is parsed first to obtain the offsets and sizes of fields such as string_ids, type_ids, and method_ids in the Dex file, so that the start and end addresses of each field can be located precisely.
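Reading the header fields named in Step 1.2 can be sketched as follows. This is an illustrative sketch that assumes the publicly documented 0x70-byte little-endian Dex header layout; the function name `parse_dex_header` and the returned dictionary are hypothetical and are not the parser of the invention.

```python
import struct

# Field order follows the published Dex header layout (0x70 bytes, little-endian):
# magic[8], checksum, signature[20], then twenty 32-bit size/offset fields.
_HEADER = struct.Struct("<8sI20s20I")
_NAMES = ("file_size", "header_size", "endian_tag", "link_size", "link_off",
          "map_off", "string_ids_size", "string_ids_off", "type_ids_size",
          "type_ids_off", "proto_ids_size", "proto_ids_off", "field_ids_size",
          "field_ids_off", "method_ids_size", "method_ids_off",
          "class_defs_size", "class_defs_off", "data_size", "data_off")


def parse_dex_header(data: bytes) -> dict:
    """Return the size/offset fields of a Dex header as a dict."""
    if len(data) < _HEADER.size:
        raise ValueError("buffer shorter than a Dex header")
    magic, checksum, _signature, *fields = _HEADER.unpack_from(data)
    if not magic.startswith(b"dex\n"):
        raise ValueError("not a Dex file")
    header = dict(zip(_NAMES, fields))
    header["checksum"] = checksum
    return header
```

With the header in hand, the start of each section is its `*_off` value, and its end follows from the entry count in `*_size` multiplied by the fixed entry width of that section.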
Step 1.3: the class_defs and method_ids fields are then parsed. From class_defs, the type and superclass type of each class are obtained, together with the position offsets of its static/instance fields and direct/virtual methods, related annotation information, and so on. Each method is then parsed class by class; from method_ids, the method's owning-class information, parameter information, and method name information are obtained.
Step 1.4: each assembly instruction is parsed method by method. Based on the BakSmali decompilation engine, this scheme obtains the correspondence between the binary encoding and the Dalvik assembly instructions, and finally obtains all the information needed to restore a C file, as shown in Fig. 5(a); this information includes the descriptions of the methods and fields in each class, the details of each instruction, and so on. This scheme stores the details of each instruction in the InsInfoNode data structure.
Specifically, Google's Dalvik virtual machine instruction set has 256 assembly instructions in total, such as move, return, new-instance, goto, if-eq, cmpl-float, invoke-virtual, and add-type, which the industry divides into 14 types according to function; the type information is shown in Fig. 3(b). This scheme builds the InsInfoNode data structure to hold the numbers and contents of the registers involved in each assembly instruction, the owning-class information, the parameter information, and so on; different data are stored in the structure depending on the type of the instruction read. If the instruction read is a data definition instruction such as const-string, the scheme only needs to record which register the value is stored in, the opcode, and the String value in InsInfoNode. For a method call instruction such as invoke-direct, the scheme needs not only the called method's owning-class information, parameter information, and method name, but also the register numbers, values, and opcode information. Reading the instructions in turn yields the details of every instruction encapsulated in the Method.
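A per-instruction record of the kind described above might look like the sketch below. Only the name InsInfoNode comes from the text; every field name and the helper `make_node` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InsInfoNode:
    """Per-instruction record; field names are illustrative assumptions."""
    opcode: str
    registers: List[int] = field(default_factory=list)
    literal: Optional[str] = None        # e.g. the string of a const-string
    owner_class: Optional[str] = None    # set for method-call instructions
    method_name: Optional[str] = None
    param_types: List[str] = field(default_factory=list)


def make_node(opcode: str, registers, **details) -> InsInfoNode:
    """Store only what the instruction type needs."""
    if opcode.startswith("invoke-"):
        # Method calls must also carry class, name and parameter information.
        required = {"owner_class", "method_name", "param_types"}
        missing = required - details.keys()
        if missing:
            raise ValueError(f"{opcode} is missing {sorted(missing)}")
    return InsInfoNode(opcode, list(registers), **details)
```

A const-string node then carries just a register and a literal, while an invoke-direct node is rejected unless the call target is fully described.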
Step 2: the method to be protected is determined and changed to a native method, after which the Dex file is rewritten, completing the preprocessing before conversion.
In this step, the unique method to be protected is determined from the class and method name specified by the user, the assembly instruction statements that call the native-layer interpreter code are constructed, the located method is changed to a native method, and the Dex file is rewritten, specifically as follows:
Step 2.1: the user specifies the class and name of the method to be protected; the class_defs and method_ids fields are traversed using these two pieces of information to determine the unique method to be protected.
In this embodiment, the scheme protects the alarm clock application talalarmo.apk from the open-source store F-Droid. The method to be protected is named onStartCommand, and its class is trikita.talalarmo.alarm.AlarmService. This method plays the ringtone and jumps to a new Activity.
Step 2.2: preprocessing before conversion
Assembly instruction statements are inserted into the static class constructor (clinit) of the class containing the method to be protected obtained in step 2.1, so as to establish the predecessor-successor relationships between instructions. In this embodiment the inserted statements are:
const-string v1, "libDex2C"
invoke-static {v1}, Ljava/lang/System;->loadLibrary(Ljava/lang/String;)V
The attribute of the method to be protected obtained in step 2.1 is then changed to native, and the Dex file is rewritten. The purpose of these operations is to load, during class initialization, the So file generated by the subsequent compilation.
Step 3: an assessment model is established that uses the proportion of core call time during execution of the method to be protected as its decision basis, and a threshold is set to determine whether the method to be protected undergoes Dex2C conversion, so as to avoid frequent reflective calls and redundant loop operations as far as possible.
JNI is the bridge between the Java layer and the native layer. When a JNI function is called, it uses the parameters supplied by the native layer to perform a reflective call, which is more time-consuming than direct execution in the Dalvik virtual machine. Because a method to be converted may contain too many JNI calls after conversion, which would inevitably degrade performance, this system introduces an assessment model to improve the execution efficiency of the APK as far as possible.
In this scheme, the proportion of core call time during execution of the method to be protected (the self call time of all functions in the function call chain) is used as the decision basis of the assessment model.
First, Google's CPU Profiler is used to collect the call time of each function of the Android application before protection. All the sub-functions called by the current method to be protected are searched recursively until the bottom of the call hierarchy is reached, that is, until the sub-functions call no other functions, and the function call chain of the method to be protected is generated from the resulting call relationships. Here the current function (method) to be protected is onStartCommand.
The assessment model is as follows:
The self call time of the functions in the call chain of the method to be protected is divided by the sum of the method's self call time, its sub-function call time, and its related system API call time, and the resulting value is compared with the set threshold. If it exceeds the threshold, the method to be protected is added to the conversion whitelist, Dex2C conversion is performed, and step 4 is executed; otherwise the method is added to the conversion blacklist, no Dex2C conversion is performed, and step 5 is executed. In this embodiment the threshold is set to 60%; the user can tune it according to the desired protection strength and execution efficiency, raising the threshold if execution efficiency is the priority and lowering it if protection strength is the priority.
For example, suppose the method to be protected is method A. The self call time of method A accounts for 40% of A's total execution time; A calls method B, whose execution time accounts for 35% of A's total execution time; and related system APIs account for 25% of the time in A. The self call time of method B accounts for 50% of B's total execution time; B calls method C, whose execution time accounts for 35% of B's total execution time; and related system APIs account for 15% of the time in B. Method C calls no other functions; its self call time accounts for 80% of C's total execution time, and related system APIs account for 20% of the time in C. The core call time share is therefore 40% + 35% × (50% + 35% × 80%) = 67.3%, which exceeds the 60% threshold, so the method to be protected is written to the conversion whitelist and step 4 is executed.
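The worked example above can be reproduced with a short recursive computation. This is a sketch of the arithmetic only; the profile layout and the names `core_time_share`, `PROFILE`, and `THRESHOLD` are assumptions, and in practice the shares would come from profiler output rather than a hand-written table.

```python
def core_time_share(name, profile):
    """Share of a method's total time that is self time along its call chain.

    `profile[name]` is (self_share, {callee: callee_share}); the remainder is
    system-API time and is not counted as core time.
    """
    self_share, callees = profile[name]
    return self_share + sum(share * core_time_share(callee, profile)
                            for callee, share in callees.items())


# Figures from the worked example: A calls B, B calls C.
PROFILE = {
    "A": (0.40, {"B": 0.35}),
    "B": (0.50, {"C": 0.35}),
    "C": (0.80, {}),
}
THRESHOLD = 0.60  # threshold used in this embodiment

share = core_time_share("A", PROFILE)
decision = "whitelist" if share > THRESHOLD else "blacklist"
```

Here `share` evaluates to 0.4 + 0.35 × (0.5 + 0.35 × 0.8), matching the 67.3% in the example, so method A lands on the whitelist.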
Step 4: Dex2C conversion
If the method to be protected undergoes Dex2C conversion, the information stored in the data structure for that method is converted into C code; during conversion, different conversion logic is established for different assembly instructions, the connections between the predecessors and successors of each assembly instruction are restored, and it is ensured that the types of the assembly instructions are restored correctly and that data transfer is consistent; the converted C code is taken as the object to be protected. Specifically:
Step 4.1: using the InsInfoNode data structure established in step 1.4, all the information needed to restore the assembly instructions of the method to be protected is converted into C code, starting with the function declaration and the function body.
In this embodiment, the function to be protected is onStartCommand. This scheme selects different conversion logic for different assembly instruction types and establishes three sets of conversion logic; register simulation, together with incrementing some local variable names and randomizing others, ensures that types are restored correctly and that data transfer is consistent in Dex2C.
Step 4.1.1: conversion of the function declaration
The static registration method is used to solve the function overloading problem: the method name and parameters are combined during conversion.
In this embodiment, the method before protection is:
public int onStartCommand(Intent intent, int flags, int startId).
The statically registered method after conversion is:
JNIEXPORT jint Java_trikita_talalarmo_alarm_AlarmService_onStartCommand(JNIEnv *env, jobject a0, jstring a1, jint a2, jint a3).
Here env is a reference to the Android virtual machine environment, a0 is the local variable register, and the rest are parameter registers; this effectively solves the function overloading problem.
Step 4.1.2: conversion of the function body
Different conversion logic is established for different instruction types; this scheme uses three kinds of conversion logic:
The first kind: general instructions
These include data manipulation instructions, return instructions, data definition instructions, data arithmetic instructions, and so on. They are relatively simple and are translated directly according to the semantics of the assembly instruction. Take the data definition instruction const-string vx, string_id as an example: its semantics are to construct a string from a string index and assign it to register vx, so this scheme obtains the corresponding string directly from string_ids by offset address and assigns it to the corresponding variable. The converted C code is: char *dqP = "java/lang/Math"; Another example is the comparison instruction cmp-long v1, v1, v2. Its semantics are to compare two long integers: if the value of the first source register is greater than the value of the second, the result is 1; if they are equal, the result is 0; and if it is less, the result is -1. The converted C code is therefore: jint a16 = (a15 > a14 ? 1 : (a15 < a14 ? -1 : 0));
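Direct translation of general instructions can be sketched as a small code generator. This is an illustrative sketch covering three opcodes only; the function name `translate_general` is hypothetical, register-to-variable resolution is assumed to have happened already, and string escaping is not handled.

```python
def translate_general(opcode, dst, operands):
    """Return one C statement for a few general-type instructions.

    `dst` and `operands` are already-resolved C variable names or literals.
    String literals are emitted verbatim (no escaping in this sketch).
    """
    if opcode == "const-string":
        return 'char *%s = "%s";' % (dst, operands[0])
    if opcode == "cmp-long":
        a, b = operands
        return "jint %s = (%s > %s ? 1 : (%s < %s ? -1 : 0));" % (dst, a, b, a, b)
    if opcode == "add-int":
        a, b = operands
        return "jint %s = %s + %s;" % (dst, a, b)
    raise NotImplementedError(opcode)
```

Called with `("cmp-long", "a16", ["a15", "a14"])`, the function emits the same ternary expression as the example in the text.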
The second kind: reference instructions
These include instance operation instructions, method call instructions, field operation instructions, and so on. Such assembly instructions mainly call Java-layer methods reflectively through JNI to realize the semantics the instructions express. Take the method call instruction invoke-static {parameters}, methodcall as an example; its semantics are to call a static method. This scheme first obtains from string_ids the strings for the called method's name, class name, and parameter types; obtains the corresponding jclass object through the FindClass method; parses the parameter list to obtain the corresponding register contents; uses GetStaticMethodID to construct a jmethodID object from those strings and the jclass object; and finally calls the CallStaticLongMethodA method to perform the reflective call and returns the result to the corresponding register. Example code before and after conversion is shown in Fig. 4(a).
The third kind: jump instructions
These include the jump instructions; scopes are divided and instructions are converted according to the connections between instructions, mainly the instruction connections established in step 4.2. Take the jump instruction if-lez vx, target as an example: its semantics are to jump to target if the value of register vx is less than or equal to zero. Therefore, according to the connection between the if-lez assembly instruction and the next instruction, the LabelInsNode instruction, the converted C code is if (a17 <= 0) goto L78b66d36; {...} L78b66d36: as shown in Fig. 4(b).
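The shape of the C code emitted for if-lez can be sketched as below. The function name `translate_if_lez` and the label format are assumptions for illustration; the real converter derives labels and scope boundaries from the instruction connections of step 4.2.

```python
def translate_if_lez(cond_var, target_offset, body_lines):
    """Emit C for `if-lez vx, target`: skip `body_lines` when cond_var <= 0."""
    label = "L%08x" % target_offset          # hypothetical label naming scheme
    lines = ["if (%s <= 0) goto %s;" % (cond_var, label), "{"]
    lines += ["    " + line for line in body_lines]
    # A trailing empty statement keeps the label valid C even at block end.
    lines += ["}", label + ": ;"]
    return "\n".join(lines)
```

The braces give the fall-through instructions their own scope, so variables declared between the branch and the label do not leak past the jump target.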
Step 4.2: restoring the connections between the predecessors and successors of assembly instructions
Ordinary sequential traversal cannot solve the variable scope problem. This scheme therefore uses depth-first traversal to translate the nodes on each path of the node sequence from step 1.4 until the instructions on all nodes have been converted. During translation, the variables accessible in the current scope and the associations between variables and registers are passed down according to the parent-child relationships between nodes, which allows variables in a higher-level scope to be accessed from lower-level scopes; each node completes its own translation using the variables available to it, producing an intermediate representation of the C code as shown in Fig. 5(b). The path tree built from the intermediate representation is shown in Fig. 5(c). In this way scopes are divided and the connections between predecessors and successors are effectively established.
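Passing visible variables down a path tree during depth-first traversal can be sketched as follows. The node layout (dicts with "defines" and "children" keys) and the function name `translate_tree` are assumptions for illustration, and the sketch records visibility rather than emitting C.

```python
def translate_tree(node, visible=None, depth=0, out=None):
    """Depth-first walk that hands each child the variables visible so far.

    `node` is a dict with hypothetical keys: "defines" (register -> variable)
    and "children" (list of nodes).
    """
    visible = dict(visible or {})      # copy: siblings must not see each other
    out = [] if out is None else out
    visible.update(node.get("defines", {}))
    out.append((depth, sorted(visible.values())))
    for child in node.get("children", []):
        translate_tree(child, visible, depth + 1, out)
    return out
```

Because each call copies the mapping before extending it, a variable defined in one branch is visible to that branch's descendants but not to sibling branches, which mirrors how nested C scopes behave.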
Step 4.3: ensure correct type recovery for assembly instructions and consistent data transfer
This scheme uses simulated registers, together with incrementing local variable names and randomized local variable names, to ensure correct type recovery and consistent data transfer during Dex2C conversion.
This scheme establishes 15 simulated registers in total and uses the associations between registers to pass data between instructions. Consider, for example, the assembly statements:
iget-wide v2, p0, Lcom/uberspot/a2048/MainActivity;->mLastBackPress:J
sub-long v2, v0, v2
This scheme stores the reflection result obtained by the first statement in register v2. The second statement subtracts the value of register v2 from the value of register v0 and stores the result back in register v2. Consistent data transfer is thus achieved through the associations between registers.
How, then, are variables named once register values have been converted into C code? This scheme combines incrementing local variable names with randomized local variable names. For method call instructions, field operation instructions and the like, the local variables produced by conversion are not involved in reading or writing registers, so this scheme names them randomly; only the final result of processing the assembly instruction needs to be assigned to the corresponding register. For other instructions, this scheme names variables by incrementing the variable name, because registers are untyped whereas variables are typed. A register may therefore hold an int variable at one point and a double variable at the next. During a store, if the type of the variable does not change, the variable name is not incremented; if the type changes, the name is incremented. At the same time, an association is maintained between each register and its most recent variable name, so that on a read this scheme only needs to look up the most recent variable name through the register.
In this embodiment, suppose the first write operation writes an int value to register v0, the second write operation writes a double value to register v0, and the third operation reads register v0. This scheme names the first variable a1 with type jint and the second variable a2 with type jdouble; the third operation then reads the value of a2.
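The incrementing naming rule in this embodiment can be sketched as a small bookkeeping class. The class name RegisterNamer and its methods are hypothetical, and the sketch assumes a single counter shared by all registers and the prefix "a" seen in the example; randomized naming for reflective instructions is not modelled.

```cpp
#include <map>
#include <string>

// Tracks, for each simulated register, the most recent typed C variable.
class RegisterNamer {
public:
    // Record a write of a value of `type` into `reg` and return the
    // C variable that the converted code should assign to.
    std::string write(const std::string& reg, const std::string& type) {
        auto it = current_.find(reg);
        if (it != current_.end() && it->second.type == type) {
            return it->second.name;   // same type: reuse the variable name
        }
        Binding b{"a" + std::to_string(++counter_), type};  // type changed: next name
        current_[reg] = b;
        return b.name;
    }

    // Return the variable most recently associated with `reg`
    // (an empty string if the register was never written).
    std::string read(const std::string& reg) const {
        auto it = current_.find(reg);
        return it == current_.end() ? std::string() : it->second.name;
    }

private:
    struct Binding {
        std::string name;
        std::string type;
    };
    std::map<std::string, Binding> current_;  // register -> latest binding
    int counter_ = 0;                         // shared name counter (assumption)
};
```

Writing a jint to v0, then a jdouble to v0, then reading v0 reproduces the a1, a2, a2 sequence of the embodiment.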
This completes the conversion of the user-specified methods to be protected in the Dex file. This scheme writes the converted C code to the file Dex2C.cpp. The complete code of a method before and after conversion is shown in Fig. 6.
Step 4.4: after code conversion is complete, the converted C code is taken as the object to be protected.
If the method to be protected does not undergo Dex2C conversion, the entry function in the So file of the application installation package to be protected is taken as the object to be protected.
Step 5: virtualize the object to be protected while compiling it, generating the virtualized binary So file.
If the result for the method to be protected, as evaluated by the assessment model, exceeds the threshold, the C code converted in step 4 is taken as the object to be protected and passed to the LLVM virtualization module for compile-time virtualization. Otherwise, the entry function (the JNI_OnLoad method) of the original So file in the Android application is passed directly to the LLVM virtualization module as input for compile-time virtualization, generating the virtualized binary file.
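The threshold decision can be sketched as follows, using the ratio stated in claim 4: the method's own call time divided by the sum of its own call time, its sub-function call time and the related system API call time. The names CallProfile, coreTimeRatio and shouldConvertWithDex2C are hypothetical, the source of the timings is left open, and no particular threshold value is implied by the patent text reproduced here.

```cpp
// Timings in any consistent unit, as measured for one method to be protected.
struct CallProfile {
    double selfTime;       // time spent in the method's own code
    double calleeTime;     // time spent in sub-function calls
    double systemApiTime;  // time spent in related system API calls
};

// Proportion of core call time: self / (self + callees + system APIs).
double coreTimeRatio(const CallProfile& p) {
    double total = p.selfTime + p.calleeTime + p.systemApiTime;
    return total > 0.0 ? p.selfTime / total : 0.0;  // guard against empty profiles
}

// True  -> whitelist: convert with Dex2C, then virtualize the generated C code.
// False -> blacklist: virtualize only the So entry function.
bool shouldConvertWithDex2C(const CallProfile& p, double threshold) {
    return coreTimeRatio(p) > threshold;
}
```

A method dominated by its own computation passes the check, whereas a method that mostly calls into other functions or system APIs does not, which matches the stated goal of avoiding frequent reflective calls.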
JNI_OnLoad is the entry function of the So file. The process is divided into three stages. In the first stage, the code enters the Clang front end, goes through lexical analysis, syntax analysis and AST construction, and finally the built-in LLVM API is called to generate an intermediate representation with the suffix ".ll". The second stage constructs the virtualization components, chiefly the virtual instructions, the virtual instruction interpreter, the program scheduler and the function body replacer. The third stage integrates the compilation chain to produce a Clang compiler with obfuscation capability. The details are as follows:
Step 5.1: the object to be protected is passed through Clang's lexical analysis, syntax analysis and AST construction to generate a platform-independent intermediate representation (IR). The LLVM virtualization module splits the original program code and converts it into a custom instruction architecture, with the aim of making the program more complex.
Virtualized code runs by dynamically interpreting the custom instructions, rather than by restoring the original C code and executing it. The LLVM virtualization module first constructs virtual instructions similar to those of the JVM architecture. Virtual instruction generation first determines the required size of the virtual register space from the number and size of the temporary variables used in the IR, so that the virtual instructions can perform dynamic memory allocation. The virtual instructions simulate the logic flow of the original program on a stack, with the virtual registers assisting in storing intermediate results. Finally, the virtual instructions that have run are destroyed. Fig. 7 lists some of the virtual instruction names and their descriptions.
Step 5.2: a virtual instruction is only a custom representation of the IR and cannot be handed directly to the target machine for execution. The virtual instruction interpreter interprets the custom virtual instructions; it divides them into three types and handles each accordingly: arithmetic instructions, data transfer instructions and control-flow transfer instructions. For example, the store instruction is defined as a data transfer instruction, and the interpreter handles it as follows:
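Value v1 = vmdata[vpc++];          /* operand: index of the storage slot */
Value v2 = stack[stack_index--];   /* take the object to store from the stack top */
if (!Reg[v1])
    alloc(Reg[v1]);                /* allocate the slot on first use */
Reg[v1] = cast_i64(v2);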
This process first advances the virtual program counter to obtain the index of the storage slot, then takes the object to be stored from the top of the stack and places it in the corresponding storage slot.
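A self-contained version of this store handler might look as follows. It is a sketch under stated assumptions: the VM struct and handleStore are hypothetical names, values are modelled as 64-bit integers, and the virtual registers are kept in a map so that a slot is created on first use, standing in for the alloc call in the fragment above.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical interpreter state.
struct VM {
    std::vector<std::int64_t> vmdata;          // virtual instruction stream and operands
    std::vector<std::int64_t> stack;           // operand stack
    std::map<std::int64_t, std::int64_t> reg;  // virtual registers, created on demand
    std::size_t vpc = 0;                       // virtual program counter
};

// Handler for the data transfer instruction "store <slot>".
void handleStore(VM& vm) {
    std::int64_t slot = vm.vmdata.at(vm.vpc++);  // read the operand and advance vpc
    std::int64_t value = vm.stack.back();        // object to store is on the stack top
    vm.stack.pop_back();
    vm.reg[slot] = value;                        // operator[] allocates the slot if missing
}
```

After one call, the virtual program counter has moved past the operand, the stack is one element shorter and the addressed register holds the former stack top.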
Step 5.3: the program scheduler simulates the execution process of a CPU. The scheduler first fetches a virtual instruction, decodes it and indexes the corresponding interpreter routine, hands control to the interpreter to interpret the instruction, then takes control back and repeats this process until all instructions have been interpreted.
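The fetch, decode and dispatch cycle can be sketched with a handler table. The opcode numbering (0 = push immediate, 1 = add, 2 = halt), the VM struct and the function names are assumptions for illustration and are not the instruction set of Fig. 7.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical interpreter state.
struct VM {
    std::vector<std::int64_t> code;   // virtual instruction stream
    std::vector<std::int64_t> stack;  // operand stack
    std::size_t vpc = 0;              // virtual program counter
    bool running = true;
};

using Handler = void (*)(VM&);

void opPush(VM& vm) { vm.stack.push_back(vm.code.at(vm.vpc++)); }
void opAdd(VM& vm) {
    std::int64_t b = vm.stack.back(); vm.stack.pop_back();
    std::int64_t a = vm.stack.back(); vm.stack.pop_back();
    vm.stack.push_back(a + b);
}
void opHalt(VM& vm) { vm.running = false; }

// Interpreter routines indexed by opcode.
const Handler kHandlers[] = {opPush, opAdd, opHalt};
const std::size_t kHandlerCount = sizeof(kHandlers) / sizeof(kHandlers[0]);

// Scheduler loop: fetch, decode, hand control to the handler, take it back.
std::int64_t schedule(VM& vm) {
    while (vm.running) {
        std::int64_t opcode = vm.code.at(vm.vpc++);                   // fetch
        if (opcode < 0 || static_cast<std::size_t>(opcode) >= kHandlerCount) {
            break;                                                    // unknown opcode: stop
        }
        kHandlers[static_cast<std::size_t>(opcode)](vm);              // decode and dispatch
    }
    return vm.stack.empty() ? 0 : vm.stack.back();
}
```

Running the stream push 2, push 3, add, halt leaves 5 on top of the stack.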
Step 5.4: the function body replacer transforms the body of a function at the level of the intermediate representation, with the aim of replacing the original execution process with the virtual interpretation process. It first deletes the function body and generates a signature for the function. The signature is used to locate the address of the virtual instructions that the function executes, and the function's parameters are passed to the virtual instruction interpreter to initialize the corresponding virtual registers in the interpreter. Fig. 8 shows the source-level equivalent of the function body before and after the change, together with the final binary result.
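At source level, the effect of the replacement might resemble the following sketch. The patent performs the transformation on LLVM IR; the C++ below is only an equivalent picture under stated assumptions. The opcode numbering (0 = load register, 1 = add, 2 = ret), the signature string used as a lookup key and the names kVmCode, kEntry and interpret are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Virtual instructions for all protected functions, laid out back to back.
const std::vector<std::int64_t> kVmCode = {
    // offset 0: add(int, int) -> load r0, load r1, add, ret
    0, 0, 0, 1, 1, 2,
};

// Function signature -> offset of that function's virtual instructions.
const std::map<std::string, std::size_t> kEntry = {{"int add(int,int)", 0}};

std::int64_t interpret(const std::string& signature,
                       const std::vector<std::int64_t>& params) {
    std::vector<std::int64_t> regs = params;  // parameters initialize the virtual registers
    std::vector<std::int64_t> stack;
    std::size_t vpc = kEntry.at(signature);   // locate the virtual code by signature
    for (;;) {
        switch (kVmCode.at(vpc++)) {
            case 0:   // load <reg>
                stack.push_back(regs.at(static_cast<std::size_t>(kVmCode.at(vpc++))));
                break;
            case 1: { // add
                std::int64_t b = stack.back(); stack.pop_back();
                std::int64_t a = stack.back(); stack.pop_back();
                stack.push_back(a + b);
                break;
            }
            case 2:   // ret
            default:  // unknown opcode: stop as well
                return stack.empty() ? 0 : stack.back();
        }
    }
}

// After replacement the original body `return a + b;` no longer exists;
// only this stub, which forwards to the interpreter, remains.
int add(int a, int b) {
    return static_cast<int>(interpret("int add(int,int)", {a, b}));
}
```

Callers still invoke add as before, but the arithmetic is now carried out by interpreting virtual instructions rather than by native code for the original body.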
Step 5.5: the compilation chain must also be integrated in order to virtualize at compile time. The NDK already integrates Clang. The steps above are implemented as LLVM analysis passes, which are handed to the PassManager for unified management; finally the modified source code is recompiled to produce a compiler with obfuscation capability.
To activate the compilation chain, the virtualizing compiler just built is specified through the NDK_TOOLCHAIN_VERSION parameter in Application.mk, and the parameters required for virtualization are specified using -mllvm-vm in Android.mk. Clang receives the virtualization parameters and calls PassManagerBuilder to determine whether the compile-time virtualization function is to be applied. As the steps above show, virtualization operates entirely on the IR, so the virtualization process is naturally compatible with every platform; the virtualized code then goes through the LLVM back end to generate the platform-specific binary executable (the So file).
Step 6: the So file generated by compilation, the Dex file rewritten in step 2.2 and the other resource files are repackaged and signed, finally producing a protected Android application that is functionally equivalent to the APK before protection.
Experimental section:
The inventors carried out the following performance tests and attack experiments:
The performance test platform was as follows: the test device was a Google Nexus 5 running Android 4.4.2, and the test APKs were five applications with high download counts from the open-source store F-Droid: the BMI calculator BMICalculate.APK, the popular game 2048.APK, the QR code scanner QRScanner.APK, the alarm clock program talalarmo.apk and the notepad program JustNote.APK.
The changes in APK size and Dex size before and after protection are shown in Fig. 9(a), and the change in CPU usage is shown in Fig. 9(b); CPU usage is the average of 50 measurements. As the figures show, the size of the APK package decreased by 13.53% on average, the size of the Dex file decreased by 20.72% on average, and CPU usage decreased by 12.51%. CPU usage falls because execution in the native layer is much faster than execution on the DVM virtual machine, the C code being compiled directly into machine code for execution. The APK package and the Dex file shrink because the Dex-layer implementation has been extracted and re-implemented in the native layer; since the So file itself is small, the APK size decreases overall.
The tools used in the attack experiments were: the interactive disassembler IDA Pro, the Android decompilation tool AndroidKiller, the Dex-layer unpacking tool PackGrind and the OLLVM deobfuscation script DecLLVM. The attack targets were the protected Android applications.
The attack experiments show that these common reverse-engineering tools are ineffective against applications protected by this method, because the tool does not simply pack or encrypt the application. The double-layer encryption covering both the native layer and the Dex layer effectively resists static and dynamic attacks by malicious attackers.

Claims (6)

1. An Android application protection method based on Dex2C and LLVM, characterized by comprising the following steps:
obtaining the Dex file from the installation package of the application to be protected and parsing it section by section according to the file format, so as to obtain, for each assembly instruction in the Dex file, all the information necessary to restore it to C language code, and storing that information in a data structure; determining the methods to be protected, changing them to native-type methods and then rewriting the Dex file, thereby carrying out the pre-conversion preprocessing; establishing an assessment model that uses the proportion of core call time when the method to be protected runs as its decision basis and, by setting a threshold, determines whether the method to be protected undergoes Dex2C conversion, so as to avoid frequent reflective calls and redundant loop operations as far as possible;
if the method to be protected undergoes Dex2C conversion, converting the necessary information stored in the data structure for that method into C language code, wherein different conversion logic is established for different assembly instructions, the predecessor and successor connection relationships between assembly instructions are restored, and correct type recovery for assembly instructions and consistent data transfer are ensured; and taking the converted C code as the object to be protected;
if the method to be protected does not undergo Dex2C conversion, taking the entry function in the So file of the installation package of the application to be protected as the object to be protected;
virtualizing the object to be protected while compiling it, generating the virtualized binary So file, and repackaging and signing to generate the protected application.
2. The Android application protection method based on Dex2C and LLVM according to claim 1, characterized in that all the information necessary to restore an assembly instruction to C language code includes the descriptions of the methods and fields in a class and the detailed information of each instruction;
the data structure is used to store the number and contents of the registers involved in each assembly instruction, the information on the class to which it belongs, the parameter information and the like.
3. The Android application protection method based on Dex2C and LLVM according to claim 1, characterized in that the pre-conversion preprocessing includes inserting assembly instruction statements into the class constructor executed for the class containing the method to be protected, and establishing the predecessor and successor relationships between instructions.
4. The Android application protection method based on Dex2C and LLVM according to claim 1, characterized in that the assessment model is as follows:
in the function call chain of the method to be protected, the call time of the function itself is divided by the sum of the call times of the method's own call, its sub-function calls and the related system APIs; the computed value is compared with the set threshold; if it exceeds the threshold, the method to be protected is added to the conversion whitelist and Dex2C conversion is performed; otherwise the method to be protected is added to the conversion blacklist and Dex2C conversion is not performed.
5. The Android application protection method based on Dex2C and LLVM according to claim 1, characterized in that establishing different conversion logic for different assembly instructions comprises:
establishing three kinds of conversion logic according to the different assembly instructions:
the first kind, ordinary-type instructions, including data manipulation instructions, return instructions, data definition instructions and data arithmetic instructions, which are translated directly according to the semantic information of the assembly instruction;
the second kind, reference-type instructions, including instance operation instructions, method call instructions and field operation instructions, which realize their semantics by calling Java-layer methods reflectively through JNI functions;
the third kind, jump-type instructions, including jump instructions, for which scopes are divided and instructions converted according to the connection relationships between instructions.
6. The Android application protection method based on Dex2C and LLVM according to claim 1, characterized in that virtualizing the object to be protected while compiling it comprises:
under the LLVM compiler framework, performing lexical analysis and syntax analysis on the object to be protected and constructing its AST, thereby generating the intermediate representation IR, which removes the platform-dependent characteristics of the source code while retaining its logic and semantic information;
the virtual instruction interpreter dividing the virtual instructions into three types and handling each accordingly, namely arithmetic instructions, data transfer instructions and control-flow transfer instructions;
the program scheduler simulating the execution process of a CPU by first fetching a virtual instruction, decoding it and indexing the interpreter, handing control to the interpreter to interpret the instruction, then taking control back and repeating this process until all instructions have been interpreted;
the function body replacer transforming the body of a function at the level of the intermediate representation by first deleting the function body and generating a signature for the function, the signature being used to locate the address of the virtual instructions that the function executes, and passing the function's parameters to the virtual instruction interpreter to initialize the corresponding virtual registers in the interpreter.
CN201910394117.5A 2019-05-13 2019-05-13 Android application program protection method based on Dex2C and LLVM Active CN110245467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910394117.5A CN110245467B (en) 2019-05-13 2019-05-13 Android application program protection method based on Dex2C and LLVM

Publications (2)

Publication Number Publication Date
CN110245467A true CN110245467A (en) 2019-09-17
CN110245467B CN110245467B (en) 2023-02-07

Family

ID=67884280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910394117.5A Active CN110245467B (en) 2019-05-13 2019-05-13 Android application program protection method based on Dex2C and LLVM

Country Status (1)

Country Link
CN (1) CN110245467B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant