CN105426707B - Instruction-level cryptographic algorithm identification method and system - Google Patents
Instruction-level cryptographic algorithm identification method and system
- Publication number
- CN105426707B (application CN201510755316.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/12—Protecting executable software
- G06F21/14—Protecting executable software against software analysis or reverse engineering, e.g. by obfuscation
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Technology Law (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Storage Device Security (AREA)
Abstract
The present invention relates to the technical field of cryptographic algorithm identification and discloses an instruction-level cryptographic algorithm identification method comprising the following steps. Step 1: establish a feature library of public cryptographic algorithms, where the features of each algorithm include static signatures and dynamic feature instruction sequences. Step 2: scan the target program for static signatures and identify cryptographic algorithms by signature matching. Step 3: collect and analyze the execution trace of the target program, and extract the program code that implements a cryptographic algorithm together with its input and output parameters. Step 4: compare the matching relationship between the input and output parameters against the dynamic feature data to confirm the cryptographic algorithm executed in the target program. Identifying cryptographic algorithms at the instruction level in this way yields high recognition accuracy.
Description
Technical field
The present invention relates to the technical field of cryptographic algorithm identification, and in particular to an instruction-level cryptographic algorithm identification method and system.
Background art
Cryptographic algorithms have become an essential means of ensuring information security. In the era of networked information, important electronic devices on the network, such as switches, routers, firewalls and dedicated encryption and decryption equipment, all use cryptographic algorithms in their embedded software. To analyze the security mechanisms of the software in these devices and to detect and eliminate hidden security risks, the cryptographic algorithms must be identified while the program code is being reverse engineered. In addition, malware such as computer viruses and Trojans widely uses cryptographic algorithms to alter its own static features, hide its network traffic, or protect its payload. Cryptographic algorithm identification is therefore a key technology for extracting features from such malware and decrypting its core content.
Cryptographic algorithm identification is a branch of program comprehension. Program comprehension obtains knowledge from inside a computer program, typically by locating and recognizing the structure and functions of the program from its object code. Cryptographic algorithms in software can be identified at two levels: binary code and assembly code. Identification at the binary level mainly uses static signature matching similar to that used in virus detection: signatures that occur in common cryptographic algorithms, such as initialization values and S-box parameters, are collected into a feature library in advance, and the target software is then scanned; if a matching signature is found, the corresponding cryptographic algorithm is judged to be present. Grobert and Zhao et al. designed automated identification methods based on static signatures in 2010 and 2011, respectively. Identification at the assembly level disassembles the target software, extracts instruction sequences with specific structures such as loops, and compares them with known cryptographic algorithms by pattern matching to identify the algorithms in the target software.
Much of the current work on cryptographic algorithm identification serves the reverse analysis of malware. In 2009, Wang et al. were the first to propose detecting and identifying the cryptographic algorithms in a program dynamically while it runs. They first used data lifetime analysis, including data taint tagging and binary instrumentation, to determine the turning point between ciphertext and plaintext, that is, the point between message decryption and message processing. They then located the memory regions that hold the decrypted message. They evaluated the method on four protocols (HTTPS, IRC, MIME and an unknown protocol used by the malware Agobot), and in their tests the tool they developed decrypted all encrypted messages. The main drawback of the method is that it can handle only a single turning point between message decryption and message processing: if a program decrypts one block of a message, processes it, and then decrypts again, the method cannot correctly identify the algorithm.
Lutz observed that cryptographic operations contain a high proportion of bitwise arithmetic instructions. His method rests on three observations: (1) loops are a core component of cryptographic algorithms; (2) cryptographic algorithms make heavy use of integer arithmetic; (3) decryption reduces the information entropy of the tainted data. The core of the tool Lutz developed is taint analysis combined with computing the information entropy of a buffer to decide whether that buffer has been decrypted. Caballero et al. refined the method of Wang and exploited the findings of Lutz about cryptographic operations to perform automated protocol reverse engineering and cryptographic algorithm identification on the malware MegaD. For each function instance they computed the ratio of bitwise arithmetic instructions; if the function executed bitwise arithmetic instructions at least 20 times and the ratio exceeded 55%, the function was flagged as an encryption or decryption function. In a practical test this method found all cryptographic functions. To identify the parameters of a cryptographic function (such as the plaintext), they tried to determine the set of data read by a flagged function. To distinguish the plaintext from the key and other data used by the function, they compared the read sets of different instances of the same function: only the plaintext portion changes between instances, so the plaintext can be identified.
Other malware has been analyzed in a similar way. In 2010, Werber and Leder analyzed the malware Conficker and found that it uses a publicly available template implementation of SHA-1 from OpenSSL and of MD6. Interestingly, the attackers later patched the MD6 implementation to fix a buffer overflow vulnerability in it. Porras et al. further found that the developers of many P2P malware programs use 1024-bit RSA as the signature verification algorithm, and that newer versions of some of this software even use 4096-bit RSA. Werber and Leder subsequently analyzed the malware Waledac and found that 1000 of its 4000 functions come from the cryptographic library OpenSSL, and that its AES algorithm uses CBC mode with an IV of 0. Stewart analyzed and identified the algorithms in the malware Storm Worm: for high-speed peer-to-peer communication, the software authenticates child nodes with a static XOR algorithm and uses RSA with a 56-bit key. In 2012, Li Ji et al. in China identified cryptographic algorithms by judging the similarity of instruction statistics, but their method can only extract cryptographic functions and cannot identify the algorithm name. In 2013, Shu Hui et al. extracted and recognized the loop features of cryptographic algorithms, improving the accuracy with which cryptographic functions are located.
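The bitwise-instruction heuristic attributed above to Caballero et al. can be illustrated with a minimal Python sketch. The thresholds (at least 20 bitwise instructions, ratio above 55%) come from the text; the set of mnemonics counted as bitwise and the function name `looks_like_crypto` are assumptions made only for this illustration.

```python
# Assumed set of x86 mnemonics treated as bitwise arithmetic; the source does not list them.
BITWISE_MNEMONICS = {"and", "or", "xor", "not", "shl", "shr", "sar", "rol", "ror"}


def looks_like_crypto(mnemonics, min_bitwise=20, min_ratio=0.55):
    """Flag one function instance given the mnemonics it executed."""
    if not mnemonics:
        return False
    bitwise = sum(1 for m in mnemonics if m.lower() in BITWISE_MNEMONICS)
    # Both conditions from the text: enough bitwise instructions and a high enough ratio.
    return bitwise >= min_bitwise and bitwise / len(mnemonics) > min_ratio


mixing = ["xor", "rol", "and", "mov"] * 10   # 30 of 40 executed instructions are bitwise
copying = ["mov", "add", "cmp", "jne"] * 10  # no bitwise instructions at all
print(looks_like_crypto(mixing), looks_like_crypto(copying))
```

A heuristic of this kind only flags candidate functions; as the background notes, it does not by itself name the algorithm.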
In summary, cryptographic algorithm identification always relies on certain features of the algorithm itself (static signatures or dynamic instruction sequences). Current methods for identifying cryptographic algorithms in software have the following main problems:
(1) Identification accuracy is low. Some cryptographic algorithms used in practice lack distinctive constant features, or these features are hidden in the data section of the program and are hard to detect, so most current identification methods that depend on static features have low accuracy.
(2) It is difficult to determine the specific algorithm name. Dynamic methods can track the execution of a program and extract its execution trace, but they can only use loop detection to locate key functions. Because the relationships along the whole function call chain are not obtained, they can recognize functions that implement part of a cryptographic algorithm but cannot identify the algorithm as a whole.
(3) Identifying cryptographic algorithms in obfuscated software is very difficult. Much current malware makes extensive use of code obfuscation techniques such as packing to hide and filter the features and execution of its cryptographic algorithms, which greatly increases the difficulty of identification.
(4) The degree of automation is low. Understanding the behavior of target malware and identifying its cryptographic algorithms entirely by hand is undoubtedly very time-consuming. Raising the level of automation of algorithm identification is an important research direction.
Summary of the invention
To address the above problems of prior-art cryptographic algorithm identification methods, the present invention discloses an instruction-level cryptographic algorithm identification method and system.
The invention discloses an instruction-level cryptographic algorithm identification method comprising the following steps. Step 1: establish a feature library of public cryptographic algorithms, where the features of each algorithm include static signatures and dynamic feature instruction sequences. Step 2: scan the target program for static signatures and identify cryptographic algorithms by signature matching. Step 3: collect and analyze the execution trace of the target program, and extract the program code that implements a cryptographic algorithm together with its input and output parameters. Step 4: compare the matching relationship between the input and output parameters against the dynamic feature data to confirm the cryptographic algorithm executed in the target program.
Further, the dynamic features of a cryptographic algorithm are formed by extracting the instruction sequence and the associated operation data during execution of a template program that implements the algorithm; a finite sequence of dynamic instructions D1, D2, ..., Dn forms one execution trace.
Further, the binary instrumentation tool PIN is used as the tool for collecting execution traces.
Further, the method also includes a process of collecting and analyzing the execution trace, which mainly comprises two parts: data reduction and data analysis. Data reduction uses two kinds of filtering: excluding instructions that come from known code libraries, and filtering by thread ID. Data analysis comprises basic block detection, loop detection, generation of the loop data flow graph, and parameter information collection.
Further, basic blocks are generated dynamically from the execution trace. Basic block detection follows the executed instruction sequence: a sequence with a single entry and a single exit is identified as a basic block. If a basic block is changed by self-modifying code, the change is discovered the first time the new code executes.
Further, loop detection comprises the following steps. Step a: process the machine instructions in the execution trace one by one and store them in a list called History. Step b: from the repeated instructions that occur in History, derive multiple candidate loop instances, each of which has a corresponding next expected instruction. Step c: as each new machine instruction is appended to History, eliminate the candidate loop instances that it does not match. Step d: confirm a loop instance and mark it in History with a loop label X.
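The following Python sketch is one simplified reading of the four loop detection steps, not the patented algorithm itself. It assumes instructions are represented as comparable values (single letters here), treats a candidate as confirmed once a second full iteration has matched, and stops at the first confirmed loop; nested loops and the replacement of a confirmed instance by a loop label X are left out. The name `detect_loop` is hypothetical.

```python
def detect_loop(trace):
    """Return (start_index, body) of the first confirmed loop, or None."""
    history = []      # Step a: instructions processed so far
    last_seen = {}    # instruction -> index of its latest occurrence in history
    candidates = []   # each candidate: start index, body, position of next expected instruction
    for index, ins in enumerate(trace):
        # Step c: keep only candidates whose next expected instruction is `ins`.
        survivors = []
        for cand in candidates:
            if cand["body"][cand["pos"]] == ins:
                cand["pos"] += 1
                survivors.append(cand)
        candidates = survivors
        # Step b: a repeated instruction opens a new candidate loop instance.
        if ins in last_seen:
            start = last_seen[ins]
            candidates.append({"start": start, "body": history[start:index], "pos": 1})
        # Step d: a candidate is confirmed once its body has been matched a second time.
        for cand in candidates:
            if cand["pos"] == len(cand["body"]):
                return cand["start"], cand["body"]
        history.append(ins)
        last_seen[ins] = index
    return None


print(detect_loop(list("ABCBCBCD")))
```

In this toy trace the body B, C starting at index 1 is reported; a trace with no repeated instruction yields `None`.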
Further, the loop data flow graph is generated as follows: for every pair of detected loop instances Li and Lj, test whether the pair satisfies the binary relation, and use a standard graph algorithm to compute the connected components, thereby constructing the loop data flow graph.
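As a rough illustration of grouping loop instances by connected components, the Python sketch below makes an explicit assumption that the text does not spell out here: two loop instances are taken to be related when an output location of one is an input location of the other. The function name and the data layout (a dictionary from loop name to a pair of location sets) are likewise hypothetical.

```python
def loop_data_flow_components(loops):
    """loops maps a loop name to (input_locations, output_locations), both sets."""
    names = list(loops)
    neighbours = {name: set() for name in names}
    for a in names:
        for b in names:
            # Assumed binary relation: some output of a is an input of b.
            if a != b and loops[a][1] & loops[b][0]:
                neighbours[a].add(b)
                neighbours[b].add(a)
    components, seen = [], set()
    for name in names:
        if name in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [name], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        components.append(component)
    return components


loops = {
    "L1": ({0x10}, {0x20}),
    "L2": ({0x20}, {0x30}),   # reads what L1 wrote
    "L3": ({0x40}, {0x50}),   # unrelated loop
}
print(loop_data_flow_components(loops))
```

Each component is then a candidate for one multi-loop cryptographic routine, while isolated loops stand alone.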
Further, parameter information collection comprises the following. First, bytes are grouped into parameter variables according to a set of conditions; these parameter variables are then divided by condition into two classes, input parameters and output parameters. Next, each parameter variable obtained in the previous step is assigned a fixed value, using the value that the execution trace records for each data access, according to the following rule: an input parameter takes the value of its first read, and an output parameter takes the value of its last write. Finally, for each loop instance L, the algorithm returns IN_M(L) and IN_R(L), the input parameters in memory and in registers respectively, and OUT_M(L) and OUT_R(L), the output parameters in memory and in registers respectively.
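The value assignment rule (first read for inputs, last write for outputs) can be sketched in Python as follows. The access record layout `(address, mode, value)` and the rule that a location counts as an input only if it is read before being written inside the loop are assumptions for this illustration; the grouping of bytes into parameter variables is not shown.

```python
def collect_parameters(accesses):
    """accesses: (address, mode, value) tuples in trace order, mode is 'R' or 'W'."""
    inputs, outputs, written = {}, {}, set()
    for address, mode, value in accesses:
        if mode == "R":
            # Assumed classification: an input is read before the loop ever writes it.
            # Its value is fixed at the first read.
            if address not in written and address not in inputs:
                inputs[address] = value
        else:
            written.add(address)
            outputs[address] = value   # later writes overwrite: the last write wins
    return inputs, outputs


accesses = [
    (0x100, "R", 0x11),   # first read of an input
    (0x200, "W", 0x22),
    (0x200, "R", 0x22),   # read of a value the loop itself produced
    (0x200, "W", 0x33),   # last write to the output
    (0x100, "R", 0x11),
]
print(collect_parameters(accesses))
```

The same rule would apply to registers, giving the register counterparts of the memory parameter sets.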
The invention also discloses an instruction-level cryptographic algorithm identification system comprising a feature library establishing unit, a static feature identification unit and a dynamic feature identification unit. The feature library establishing unit establishes the feature library of public cryptographic algorithms, where the features of each algorithm include static signatures and dynamic feature instruction sequences. The static feature identification unit scans the target program for static signatures and identifies cryptographic algorithms by signature matching. The dynamic feature identification unit collects and analyzes the execution trace of the target program, extracts the program code that implements a cryptographic algorithm together with its input and output parameters, and compares the matching relationship between the input and output parameters against the dynamic feature data to confirm the cryptographic algorithm executed in the target program.
With the above technical solution, the invention has the following beneficial effects. The method proposes a new cryptographic algorithm identification scheme from which an automated identification tool can be designed, significantly reducing the time needed to analyze software security mechanisms. The method combines the advantages of static signature scanning and dynamic execution analysis, which significantly improves identification accuracy. Identifying algorithms through the relationship between input and output parameters captures the essence of what a cryptographic algorithm does when it executes, and can effectively overcome the obstacles created by code obfuscation such as software packing. The method provides a general workflow framework for parsing and identifying cryptographic algorithms; it can be used to identify not only block ciphers but also public key algorithm modules.
Description of the drawings
Fig. 1 is the overall flow chart of cryptographic algorithm identification.
Fig. 2 is the flow chart of controlled program execution under the PIN tool.
Fig. 3 is a diagram of a nested loop (ABBBCABBC).
Fig. 4 is an instruction sequence of a cryptographic algorithm.
Fig. 5 is the loop state after instruction I1.
Fig. 6 is another instruction sequence of a cryptographic algorithm.
Fig. 7 is the loop state after instruction I3.
Fig. 8 is the state after a loop instance X has been identified.
Detailed description of the embodiments
The specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
The instruction-level cryptographic algorithm identification method disclosed by the invention can be used to extract the cryptographic functions executed in an executable program and to identify the names of the public cryptographic algorithms they implement. The method mainly comprises the following four steps (as shown in Fig. 1):
Step 1: Establish the feature library of public cryptographic algorithms. The cryptographic algorithms include block ciphers, stream ciphers, hash algorithms and public key algorithms, and the features of each algorithm include static signatures and dynamic feature instruction sequences. For block ciphers the static signatures are mainly S-box constants and initial permutation constants; for hash algorithms they are mainly the initialization vector values of the iterated loop; for stream ciphers they are shift register lengths and the like; and for public key algorithms they mainly involve large primes. Because more and more malware uses code obfuscation, which masks the static signatures of cryptographic algorithms, relying entirely on static signatures gives very poor results. A dynamic feature library of cryptographic algorithms is therefore also needed.
Dynamic features are the concrete runtime information of a program's execution; a running program is in essence a series of instructions and the data they operate on. Collecting the dynamic features of a cryptographic algorithm requires a template program that implements it. From this template program, the instruction sequence and the associated operation data at execution time are extracted as the dynamic features of the algorithm. These dynamic instruction sequences and data constitute what is called an execution trace. A dynamic instruction D is a tuple consisting of the following information: (1) a memory address A[D]; (2) the machine instruction I[D] executed at A[D]; (3) the two sets of memory addresses that I[D] reads and writes, denoted R_A[D] and W_A[D] respectively; (4) the two sets of registers that I[D] reads and writes, denoted R_R[D] and W_R[D] respectively. An execution trace T is a finite sequence of dynamic instructions D1, D2, ..., Dn. Because the execution trace of each cryptographic algorithm is unique, the algorithms can be clearly distinguished. Preferably, the dynamic features of a cryptographic algorithm are formed by extracting the instruction sequence and the associated operation data during execution of a template program that implements the algorithm, a finite sequence of dynamic instructions D1, D2, ..., Dn forming one execution trace.
The extraction of dynamic features depends on the concrete implementation of the algorithm, and template programs generated by different compilers may produce different execution traces. In practice, the execution trace of a cryptographic algorithm can be extracted from a program built with the mainstream compiler of the target platform, for example the Visual Studio family of compilers on Windows or GCC on Linux. Once execution traces have been collected for every cryptographic algorithm, the dynamic feature library of these public algorithms is established. It serves as the template information against which instruction sequences are compared in the dynamic identification method described later.
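The dynamic instruction tuple and the execution trace defined above map naturally onto a small data structure. The Python sketch below is only illustrative: the class and field names are hypothetical, and the three-instruction trace is invented example data rather than output of any real tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class DynamicInstruction:
    address: int                               # A[D]: address of the instruction
    instruction: str                           # I[D]: machine instruction executed at A[D]
    mem_read: FrozenSet[int] = frozenset()     # R_A[D]: memory addresses read
    mem_written: FrozenSet[int] = frozenset()  # W_A[D]: memory addresses written
    reg_read: FrozenSet[str] = frozenset()     # R_R[D]: registers read
    reg_written: FrozenSet[str] = frozenset()  # W_R[D]: registers written


def touched_memory(trace: List[DynamicInstruction]):
    """Union of all memory addresses read and written along a trace."""
    read, written = set(), set()
    for d in trace:
        read |= d.mem_read
        written |= d.mem_written
    return read, written


# An execution trace T is a finite sequence D1, D2, ..., Dn (invented example data).
trace = [
    DynamicInstruction(0x401000, "mov eax, [0x1000]",
                       mem_read=frozenset({0x1000}), reg_written=frozenset({"eax"})),
    DynamicInstruction(0x401005, "xor eax, ebx",
                       reg_read=frozenset({"eax", "ebx"}), reg_written=frozenset({"eax"})),
    DynamicInstruction(0x401007, "mov [0x2000], eax",
                       mem_written=frozenset({0x2000}), reg_read=frozenset({"eax"})),
]
print(touched_memory(trace))
```

Keeping the read and write sets per instruction is what later allows loops to be connected by data flow and parameters to be classified as inputs or outputs.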
Step 2: Scan the program for static signatures and identify cryptographic algorithms by signature matching. The target software is scanned for static features, and the cryptographic feature data extracted from it is compared with the static feature information in the cryptographic algorithm feature library. Static signatures are the most direct manifestation of a cryptographic algorithm; in essence they are the constants contained in the various algorithms, such as specific initialization values and S-box values. Although a cryptographic algorithm may be implemented in different programming languages and built with different compilers, these static signatures are fixed and usually appear in a directly recognizable form in any executable that contains the algorithm. For example, if the hexadecimal values 3A, 32, 2A, 22, 1A, 12, 0A, 02, 3C, 34, 2C, 24, 1C, 14, 0C, 04 appear in the scanned program code, it can be concluded that the program uses the DES algorithm, because these are constants of the initial permutation table of DES. If the code fragment 00 00 00 00 00 00 00 00 A5 63 63 C6 84 7C 7C F8 99 77 77 EE 8D 7B 7B F6 0D F2 F2 FF BD 6B 6B D6 B1 6F 6F DE 54 C5 C5 91 50 30 30 60 03 01 01 02 appears in the program, it can be concluded that the AES algorithm is used: the values (63 63 7C 7C 77 77 7B 7B F2 F2 6B 6B 6F 6F C5 C5 30 30 01 01) in it are part of the constants of the AES S-box.
Hash algorithms almost always contain initialization values, so their static signatures are even easier to identify. For example, the initialization of the SHA256 algorithm uses eight hexadecimal constants: 0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A, 0x510E527F, 0x9B05688C, 0x1F83D9AB and 0x5BE0CD19. If these constants are present in the code, it can be concluded that SHA256 is used. Note that storage order differs between computer architectures: the common Intel architecture stores values in little-endian order (the low address holds the low-order byte of a word and the high address holds the high-order byte), so the constant 0x6A09E667 is actually stored as 67 E6 09 6A, and likewise for the other seven constants.
In ordinary program implementations, static signatures are a workable means of identifying cryptographic algorithms. Malware, however, increasingly uses code obfuscation techniques such as software packing. The static signatures are then hidden or altered, and this identification method hardly works. The dynamic identification method described next is then required.
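A static signature scan of the kind described in Step 2 can be sketched in a few lines of Python. The DES and SHA256 constants are those quoted in the text; the function name `scan_static_signatures`, the choice to require all eight SHA256 words, and the decision to try both byte orders are assumptions for this illustration.

```python
import struct

# First 16 entries of the DES initial permutation table, as quoted in the text.
DES_IP_PREFIX = bytes([0x3A, 0x32, 0x2A, 0x22, 0x1A, 0x12, 0x0A, 0x02,
                       0x3C, 0x34, 0x2C, 0x24, 0x1C, 0x14, 0x0C, 0x04])
# The eight SHA256 initialization constants, as quoted in the text.
SHA256_INIT = [0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
               0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19]


def scan_static_signatures(image: bytes):
    """Return the names of algorithms whose constants occur in a binary image."""
    found = set()
    if DES_IP_PREFIX in image:
        found.add("DES")
    # Try little-endian ("<") and big-endian (">") storage of the 32-bit words.
    for byte_order in ("<", ">"):
        words = [struct.pack(byte_order + "I", w) for w in SHA256_INIT]
        if all(w in image for w in words):
            found.add("SHA256")
    return found


# On a little-endian machine 0x6A09E667 is stored as 67 E6 09 6A.
print(struct.pack("<I", 0x6A09E667).hex())
image = b"\x90" * 8 + struct.pack("<8I", *SHA256_INIT) + b"\x00" * 4
print(scan_static_signatures(image))
```

A scan like this fails once a packer encrypts or compresses the data section, which is the motivation for the dynamic steps that follow.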
Step 3: Collect and analyze the execution trace of the target program, and extract the cryptographic code (the program code that implements a cryptographic algorithm) and its input and output parameters. (The input/output relationship used in Step 4 is the matching relationship between these input and output parameters, from which the corresponding cryptographic algorithm is determined.) Cryptographic algorithms can be identified mainly because of three important characteristics observed in code that implements them. These characteristics were found and confirmed in the course of our research.
Observation 1: Cryptographic code makes heavy use of bitwise arithmetic instructions. The nature of cryptographic algorithms leads to code that contains many arithmetic instructions; substitution and permutation operations in particular compile to large numbers of bitwise instructions. In addition, many cryptographic algorithms are optimized for modern computer architectures; contemporary algorithms such as AES, for example, have been speed-optimized for the 32-bit Intel architecture and use bitwise instructions that are convenient on it.
Observation 2: Cryptographic code contains loops. Substitution and permutation operations modify internal data repeatedly, so it is certain that the basic blocks of cryptographic code are executed many times, even when loop unrolling is applied.
Observation 3: The inputs and outputs of cryptographic code satisfy a predefined and verifiable relationship. Cryptographic algorithms are deterministic: for any given input, the corresponding output is always the same. If a cryptographic algorithm is executed in a trace, the input and output parameters contained in the trace obey the relationship determined by that algorithm.
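Observation 3 is what makes the final confirmation possible: candidate input and output values extracted from a trace can be checked against reference implementations. The Python sketch below shows the idea for hash functions only, using the standard `hashlib` module as the reference. The candidate set and the function name `identify_by_io` are assumptions, and the sketch assumes the extracted input is the complete unpadded message, which a real trace may not provide directly.

```python
import hashlib

# Reference implementations to test against (an assumed, deliberately small set).
REFERENCE = {
    "SHA1": hashlib.sha1,
    "SHA256": hashlib.sha256,
    "SHA512": hashlib.sha512,
}


def identify_by_io(input_bytes: bytes, output_bytes: bytes):
    """Return the names of reference algorithms that map the input to the output."""
    return [name for name, algorithm in REFERENCE.items()
            if algorithm(input_bytes).digest() == output_bytes]


# Pretend these two values were extracted from a loop instance in a trace.
extracted_input = b"abc"
extracted_output = hashlib.sha256(b"abc").digest()
print(identify_by_io(extracted_input, extracted_output))
```

Because the check depends only on values, not on instruction patterns, it is unaffected by how the code was obfuscated.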
Our research targets programs on the Windows/x86 platform, and we chose PIN, the dynamic binary instrumentation tool from Intel, to collect execution traces. The advantages of this tool are its ease of use and its ability to handle self-modifying code, which is common in obfuscated programs. To collect an execution trace, we use dynamic binary instrumentation (DBI) to profile the data flow of the program, running the target program under the control of the DBI tool PIN (as shown in Fig. 2); the tool supports fine-grained instruction tracing of a single process. PIN collects the execution trace, including the memory regions that the program accesses and modifies. To detect cryptographic algorithms and their parameters, the execution trace must be lifted to a higher-level structured representation: loops, basic blocks and data-flow graphs. The next step then checks this higher-level representation for the execution of cryptographic code and, based on the result, identifies the algorithm and its parameter information.
Preferably, the collection and analysis of the execution trace mainly comprises two parts, data reduction and data analysis. Data reduction uses two kinds of filtering: excluding instructions that come from known code libraries, and filtering by thread ID. Data analysis comprises basic block detection, loop detection, generation of the loop data flow graph, and parameter information collection.
Data reduction shrinks the execution trace file, and we use two filtering methods. First, we exclude instructions that come from known code libraries which we know in advance contain no cryptographic code. A whitelist of dynamic link libraries (DLLs) lets us skip large sections of code, which is especially useful for reducing both trace generation time and file size. Second, we can filter by thread ID. We can also start generating the trace only after a given number of instructions have executed; for example, if the target program is known to be packed, the unpacking code can be skipped.
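
The filters described above can be sketched as follows. This is a minimal illustration only: the record keys `tid`, `module` and `ins`, and the function name `reduce_trace`, are hypothetical and are not taken from the patent or from PIN.

```python
def reduce_trace(records, dll_whitelist, thread_id=None, skip_first=0):
    """Yield only the trace records that survive the reduction filters.

    records: iterable of dicts with the hypothetical keys
             'tid' (thread ID), 'module' (module name) and 'ins' (instruction).
    """
    whitelist = {name.lower() for name in dll_whitelist}
    for index, rec in enumerate(records):
        if index < skip_first:
            continue  # e.g. skip the unpacking stub of a packed program
        if thread_id is not None and rec["tid"] != thread_id:
            continue  # keep a single thread only
        if rec["module"].lower() in whitelist:
            continue  # known library assumed to contain no cryptographic code
        yield rec


records = [
    {"tid": 1, "module": "packer_stub", "ins": "pushad"},
    {"tid": 1, "module": "kernel32.dll", "ins": "mov eax, ebx"},
    {"tid": 2, "module": "target.exe", "ins": "xor eax, eax"},
    {"tid": 1, "module": "target.exe", "ins": "xor ecx, edx"},
]
kept = list(reduce_trace(records, ["KERNEL32.DLL"], thread_id=1, skip_first=1))
print([r["ins"] for r in kept])  # ['xor ecx, edx']
```

In a real collector these checks would more plausibly run inside the instrumentation tool, so that filtered instructions are never written to the trace file at all.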
Data analysis comprises basic block detection, loop detection, loop data-flow graph generation and parameter information collection. To support the analysis of cryptographic code, the following information must first be recorded at instruction-level granularity: (1) the current thread ID; (2) the current instruction with its associated registers and data; (3) the memory values before and after the instruction executes, including the access mode (read or write), length and address; (4) optionally, debug information for the current instruction location, such as the DLL module, the function symbol and the offset from the function symbol.
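
One possible in-memory layout for such a record is sketched below. All class and field names are assumptions made for illustration; the patent lists which information is recorded but does not specify a record format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class MemAccess:
    mode: str            # "R" for read, "W" for write
    address: int         # first byte touched
    length: int          # number of bytes touched
    value_before: bytes  # memory content before the instruction executes
    value_after: bytes   # memory content after the instruction executes


@dataclass
class TraceRecord:
    thread_id: int                        # (1) current thread ID
    address: int                          # (2) address of the instruction
    instruction: str                      # (2) disassembled instruction
    registers: dict                       # (2) registers it uses and their values
    mem_accesses: list = field(default_factory=list)  # (3) memory accesses
    module: Optional[str] = None          # (4) optional debug information
    symbol: Optional[str] = None
    symbol_offset: Optional[int] = None


rec = TraceRecord(
    thread_id=1,
    address=0x401000,
    instruction="xor byte ptr [esi], al",
    registers={"esi": 0x400000, "al": 0x5A},
    mem_accesses=[
        MemAccess("R", 0x400000, 1, b"\x41", b"\x41"),
        MemAccess("W", 0x400000, 1, b"\x41", b"\x1b"),
    ],
    module="target.exe",
)
```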
Basic block detection. A basic block is an ordered sequence of instructions that always executes in the given order. Detection follows the execution trace of the instruction sequence: a sequence with a single entry and a single exit is identified as a basic block. Because every basic block is generated dynamically from the execution trace, the result of this detection algorithm differs from that of a static detection algorithm. In particular, the algorithm never considers code that was not executed, since such code does not appear in the trace. One advantage of a dynamic trace is that the branches actually taken by the program can be observed and fed into the basic block detection algorithm. If a basic block is altered by self-modifying code, the change is discovered the first time the new code executes, because the instructions of the new code differ from those of the old code.
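
A minimal sketch of dynamic basic block detection, under the assumption that each trace entry is a hypothetical pair `(address, is_branch)`. A block starts at the first instruction and at every instruction executed right after a control transfer, and it ends at a control transfer or just before another block's entry. Self-modifying code is not handled here; handling it would require keying blocks on instruction bytes as well as addresses.

```python
def detect_basic_blocks(trace):
    """Return the distinct basic blocks (tuples of addresses) seen in a trace."""
    # Pass 1: collect block entry points (leaders) from the observed control flow.
    leaders = set()
    after_branch = True  # the very first instruction starts a block
    for address, is_branch in trace:
        if after_branch:
            leaders.add(address)
        after_branch = is_branch

    blocks, seen, current = [], set(), []

    def close():
        block = tuple(current)
        if block and block not in seen:
            seen.add(block)
            blocks.append(block)
        current.clear()

    # Pass 2: cut the trace at leaders (single entry) and branches (single exit).
    for address, is_branch in trace:
        if address in leaders:
            close()
        current.append(address)
        if is_branch:
            close()
    close()
    return blocks


trace = [(0x10, False), (0x11, True), (0x20, False), (0x21, True),
         (0x10, False), (0x11, True), (0x30, False)]
print(detect_basic_blocks(trace))  # [(16, 17), (32, 33), (48,)]
```

Only executed instructions can appear in a block, which matches the point made above about dynamic versus static detection.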
Loop detection. Loops are an important feature of cryptographic code, so detection focuses on the code that contains loops. We first define the loops we are looking for, which are of two kinds: simple loops and nested loops. Let X86 be the machine instruction set and Trace the set of execution traces. For a word $a \in X86^{*}$ (that is, $a$ is a sequence of X86 instructions), let $Pre(a)$ denote the set of prefixes of $a$: if there exists $r \in X86^{*}$ such that $a = br$, then $b \in Pre(a)$. If $a$ contains at least one instruction, we write $a \in X86^{+}$. A simple loop is any trace $L$ that satisfies
$L_{/ins} \in \{ a^{n} b \mid a \in X86^{+},\ n > 2,\ b \in Pre(a) \}$, where $L_{/ins}$ denotes the instruction sequence of $L$.
The instruction sequence $a$ in the definition of a simple loop is called the loop body, and it can be replaced by a loop label $X$.
Further, let $L_{ID}$ be the set of loop labels. A nested loop is then any trace $L$ that satisfies
$L_{/ins} \in \{ a^{n} b \mid a \in (X86 \cup L_{ID})^{+},\ n > 2,\ b \in Pre(a) \}$.
Note that in the nested loop definition, the number of iterations of an inner loop may differ from one iteration of the outer loop to the next. The loop shown in Fig. 3 is ABBBCABBC, with the inner loop body B denoted by X. Although the inner loop B executes 3 times in the first outer iteration and only 2 times in the second, we still write the whole loop as AXCAXC, so that AXC is the body of the outer loop.
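
The labeling in this example can be reproduced with a small offline sketch. It is not the online detection algorithm of the method: it repeatedly finds the leftmost repetition with the shortest body, replaces it with a label, and starts again. The function names and the label scheme `X1`, `X2`, ... are assumptions, two consecutive iterations are treated as a repetition (as in the example above), and a trailing partial iteration is not absorbed.

```python
def find_repetition(seq):
    """Return (start, body_len, iterations) of the leftmost shortest-body repetition."""
    n = len(seq)
    for k in range(1, n // 2 + 1):
        for i in range(0, n - 2 * k + 1):
            body = seq[i:i + k]
            if seq[i + k:i + 2 * k] != body:
                continue
            count = 2
            while seq[i + count * k:i + (count + 1) * k] == body:
                count += 1
            return i, k, count
    return None


def collapse_loops(seq):
    """Collapse loops innermost-first; return every intermediate sequence and the labels."""
    seq = list(seq)
    labels = {}           # loop body (tuple) -> loop label
    steps = [list(seq)]
    while True:
        found = find_repetition(seq)
        if found is None:
            return steps, labels
        i, k, count = found
        body = tuple(seq[i:i + k])
        label = labels.setdefault(body, f"X{len(labels) + 1}")
        seq[i:i + k * count] = [label]   # same body always gets the same label
        steps.append(list(seq))


steps, labels = collapse_loops("ABBBCABBC")
for s in steps:
    print(" ".join(s))
# A B B B C A B B C
# A X1 C A B B C
# A X1 C A X1 C
# X2
```

Both runs of B map to the same label `X1` even though their iteration counts differ, which is what allows the outer body A X1 C to be recognized as repeating.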
We now outline the loop detection algorithm. The algorithm works by recognizing the language $\{ a^{n} b \mid a \in (X86 \cup L_{ID})^{+},\ n > 2,\ b \in Pre(a) \}$, and the recognition depends closely on the context of each instruction. The algorithm processes the machine instructions of the execution trace one by one and stores them in a list-like structure called History. A typical situation is shown in Fig. 4. The instructions I1, I2, I1, I3 have been recorded in History, and the machine instruction currently being processed is I1, which already occurs twice in History. Each occurrence of I1 in History may be the start of a loop, so there are two cases: in the first the loop body is a = I1, I2, I1, I3, and in the second it is a = I1, I3. The algorithm therefore creates two loop instances, L1 and L2. Each instance has a cursor that points to the next expected instruction, I2 for L1 and I3 for L2 (as shown in Fig. 5). I1 is then appended to History. Suppose I3 is the next machine instruction in the trace, as shown in Fig. 6. Loop instance L1 is discarded, because the instruction that occurred is not I2. The cursor of L2, on the other hand, advances and now points to the next expected instruction I1, as shown in Fig. 7. At this point L2 has completed two iterations, I1, I3, I1, I3, so we take it as a confirmed loop instance and mark it in History with the loop label X, as shown in Fig. 8. Suppose the next machine instruction is I4 while the cursor of L2 expects I1: the current loop L2 is then closed and recorded. Because L2 is represented by the label X, an enclosing outer loop can still be detected, regardless of how many times L2 itself iterates in each iteration of the outer loop.
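
The History and cursor mechanism can be sketched as follows for simple loops only. This is a simplified illustration under stated assumptions: the names `LoopInstance` and `detect_simple_loops` are hypothetical, a candidate is reported once it has completed two iterations (as in the walkthrough above), and the replacement of confirmed loops by labels in History, which is what enables nested loop detection, is left out.

```python
from dataclasses import dataclass


@dataclass
class LoopInstance:
    body: tuple          # instructions of one iteration
    cursor: int = 0      # index in body of the next expected instruction
    iterations: int = 1  # complete iterations seen so far

    def expected(self):
        return self.body[self.cursor]

    def advance(self):
        self.cursor += 1
        if self.cursor == len(self.body):
            self.cursor = 0
            self.iterations += 1


def detect_simple_loops(trace, min_iterations=2):
    history = []     # instructions processed so far
    candidates = []  # loop instances that are still consistent with the trace
    confirmed = []   # (body, iterations) of loops that ended after enough iterations
    for ins in trace:
        survivors = []
        for inst in candidates:
            if inst.expected() == ins:
                inst.advance()              # cursor moves to the next expected instruction
                survivors.append(inst)
            elif inst.iterations >= min_iterations:
                confirmed.append((inst.body, inst.iterations))
            # otherwise the candidate is silently discarded
        # Every earlier occurrence of ins may be the start of a loop.
        for start, old in enumerate(history):
            if old == ins:
                inst = LoopInstance(body=tuple(history[start:]))
                inst.advance()              # ins matches the first body instruction
                survivors.append(inst)
        candidates = survivors
        history.append(ins)
    confirmed.extend((c.body, c.iterations) for c in candidates
                     if c.iterations >= min_iterations)
    return confirmed


print(detect_simple_loops(["I1", "I2", "I1", "I3", "I1", "I3", "I4"]))
# [(('I1', 'I3'), 2)]
```

On longer repetitive traces this sketch also reports overlapping candidates (for example rotations of the same body), because it does not prune candidates once a loop has been confirmed.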
Loop data-flow graph. So far we have assumed that each piece of cryptographic code consists of a single loop. In practice, however, a cryptographic function is usually made up of several loops, as in the RC4 algorithm. An abstraction based on single loops therefore cannot capture a cryptographic function completely. To handle this, we introduce data flow to group together the loop instances that take part in the same cryptographic implementation. Data flow between loop instances is defined as follows: two loop instances L1 and L2 are connected if some output parameter of L1 is used as an input parameter of L2. For simplicity we consider only memory parameters, because register parameters would require precise taint tracking over the code that runs between loop instances. In effect, we assume that all memory input and output parameters are processed by loops. For each loop instance L, let $IN_M(L)$ and $IN_R(L)$ be its input parameters in memory and in registers respectively, and let $OUT_M(L)$ and $OUT_R(L)$ be its output parameters in memory and in registers respectively.
We now outline the algorithm that constructs the loop data-flow graph.
Let $\{L_1, \ldots, L_n\}$ be the set of loop instances extracted from a trace $T \in Trace$. We define a binary relation $\rightarrow$ on these loop instances: for any $(i, j) \in [1, n]^2$, if $L_i$ appears before $L_j$ in the trace and the intersection of $OUT_M(L_i)$ and $IN_M(L_j)$ is not empty, then $L_i \rightarrow L_j$. The loop data-flow graph is then $G = (\{L_1, \ldots, L_n\}, \rightarrow)$. $G$ is an acyclic graph that may have several connected components $g_1, g_2, \ldots, g_m$, each of which may have several root nodes and leaf nodes. For a connected component $g_k$, we write $ROOT[g_k]$ and $LEAF[g_k]$ for its sets of root nodes and leaf nodes respectively.
Each connected component $g_k$ represents one abstraction, similar to a function in an ordinary binary program. Each $g_k$ is therefore a candidate implementation of a cryptographic function, which is later compared with known cryptographic implementations. The loop data-flow graph is built with a standard graph algorithm: every pair of detected loop instances $L_i$ and $L_j$ is tested against the binary relation defined above, and the connected components are then computed. We call these connected components loop data-flow graphs.
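
The construction can be sketched as follows, assuming each loop instance is given in trace order as a dict with hypothetical keys `in_mem` and `out_mem` holding sets of memory addresses. An edge is added when an earlier loop writes memory that a later loop reads, and the connected components are computed with a small union-find; the choice of union-find is an assumption, since the text only asks for a standard graph algorithm.

```python
def build_edges(loops):
    """Return the set of edges (i, j) such that loop i feeds loop j."""
    edges = set()
    for i, li in enumerate(loops):
        for j in range(i + 1, len(loops)):   # loop i appears before loop j
            if li["out_mem"] & loops[j]["in_mem"]:
                edges.add((i, j))
    return edges


def connected_components(n, edges):
    """Group the nodes 0..n-1 into connected components, ignoring edge direction."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


loops = [
    {"in_mem": {0x1000}, "out_mem": {0x2000}},          # L1
    {"in_mem": {0x2000, 0x3000}, "out_mem": {0x4000}},  # L2 reads what L1 wrote
    {"in_mem": {0x4000}, "out_mem": {0x5000}},          # L3 reads what L2 wrote
    {"in_mem": {0x9000}, "out_mem": {0x9100}},          # L4 is unrelated
]
edges = build_edges(loops)
print(sorted(edges))                            # [(0, 1), (1, 2)]
print(connected_components(len(loops), edges))  # [[0, 1, 2], [3]]
```

Each resulting component is one candidate cryptographic function for the later comparison step.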
When different cryptographic functions are composed, that is, the output of one function is used as the input of another, they are grouped into the same loop data-flow graph. The solution is to consider every possible path in the graph. For example, if g is a loop data-flow graph over {L1, L2, L3} with data flowing from L1 to L2 and from L2 to L3, then in the comparison phase we test not only {L1, L2, L3} but also {L1, L2} and {L2, L3}, and finally each loop instance on its own. In this way we can recognize compositions of cryptographic functions.
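
Enumerating the candidate paths of a loop data-flow graph can be sketched with a depth-first walk. The function name `all_paths` and the edge-list representation are assumptions; the walk terminates because the graph is acyclic.

```python
def all_paths(nodes, edges):
    """Return every directed path (as a tuple of nodes) in an acyclic graph."""
    successors = {n: [] for n in nodes}
    for a, b in edges:
        successors[a].append(b)
    paths = []

    def extend(path):
        paths.append(tuple(path))
        for nxt in successors[path[-1]]:
            extend(path + [nxt])

    for n in nodes:
        extend([n])
    # Longest candidates first, matching the order used in the example above.
    return sorted(paths, key=len, reverse=True)


paths = all_paths(["L1", "L2", "L3"], [("L1", "L2"), ("L2", "L3")])
for p in paths:
    print(p)
# ('L1', 'L2', 'L3')
# ('L1', 'L2')
# ('L2', 'L3')
# ('L1',)
# ('L2',)
# ('L3',)
```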
Parameter information collection. Loop detection extracts candidate cryptographic code from the execution trace, but the final goal is to collect the cryptographic parameters. The parameters of a loop instance are the low-level counterparts of the parameters of the high-level implementation (for example, the source code). The bytes read and written in the execution trace are our starting point. For a loop instance L, we collect its parameter data by combining the following three necessary conditions:
(1) Bytes that belong to the same parameter of instance L are either adjacent in memory or held in the same register at the same time. This condition tends to merge several high-level parameters into a single parameter of L, because different high-level parameters may indeed be adjacent in memory, particularly on the stack. Such over-merging would significantly increase the complexity of the final comparison phase, so the following two conditions are also needed.
(2) Bytes that belong to the same parameter of instance L are processed by the same instruction in the loop body BODY[L] with the same access mode (read or write). An instruction in BODY[L] may process different bytes in each iteration, but those bytes play the same role.
(3) Finally, bytes that belong to an input parameter of instance L are not written by code in L before they are read; likewise, bytes that belong to an output parameter must have been written by code in L. To collect these parameters we define parameter variables, that is, byte arrays that start at a given memory address. A parameter variable that starts at address 0x400000 and contains 4 bytes is written 0x400000:4.
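
Conditions (1) and (2) can be illustrated with a short sketch for memory bytes only. Each access is a hypothetical triple `(instruction_address, mode, byte_address)`; bytes are first grouped by the instruction and access mode that touched them, and each group is then split into runs of adjacent addresses. Register parameters are not covered.

```python
def group_parameter_variables(accesses):
    """Return parameter variables as (mode, start_address, length) tuples."""
    by_role = {}
    for ins_addr, mode, byte_addr in accesses:
        # Condition (2): same instruction and same access mode.
        by_role.setdefault((ins_addr, mode), set()).add(byte_addr)

    variables = []
    for (ins_addr, mode), addrs in sorted(by_role.items()):
        run_start = prev = None
        for addr in sorted(addrs):
            if prev is not None and addr == prev + 1:
                prev = addr                      # condition (1): adjacent byte
                continue
            if run_start is not None:
                variables.append((mode, run_start, prev - run_start + 1))
            run_start = prev = addr
        variables.append((mode, run_start, prev - run_start + 1))
    return variables


def fmt(start, length):
    """Format a parameter variable in the start:length notation used above."""
    return f"{start:#x}:{length}"


accesses = [(0x401000, "R", 0x400000 + i) for i in range(4)]      # one byte per iteration
accesses += [(0x401005, "W", a) for a in (0x500000, 0x500001, 0x500010)]
variables = group_parameter_variables(accesses)
print([(mode, fmt(start, length)) for mode, start, length in variables])
# [('R', '0x400000:4'), ('W', '0x500000:2'), ('W', '0x500010:1')]
```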
We now outline the parameter collection algorithm.
Bytes are first grouped into parameter variables using the first two necessary conditions above, and the third condition is then used to divide these parameter variables into two classes, input parameters and output parameters. The same parameter variable can appear in both classes. Each parameter variable obtained in the previous step is then assigned a fixed value. Our execution trace records the value of every data access, and values are assigned as follows: an input parameter takes the value of its first read, and an output parameter takes the value of its last write. Finally, for each loop instance L the algorithm returns $IN_M(L)$ and $IN_R(L)$, the input parameters in memory and in registers respectively, and $OUT_M(L)$ and $OUT_R(L)$, the output parameters in memory and in registers respectively.
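
The classification and value assignment can be sketched at byte granularity. The access format `(mode, address, value)` and the function name are assumptions; a byte is an input if it is read before the loop writes it, and an output if the loop writes it at all.

```python
def classify_bytes(accesses):
    """Split a loop instance's memory accesses into input and output bytes.

    accesses: (mode, address, value) triples in execution order,
              where mode is "R" or "W" and value is the byte read or written.
    Returns (inputs, outputs) as dicts mapping address -> value.
    """
    inputs, outputs = {}, {}
    for mode, address, value in accesses:
        if mode == "R":
            # Input only if not yet written by the loop; keep the first value read.
            if address not in outputs and address not in inputs:
                inputs[address] = value
        else:
            outputs[address] = value   # later writes overwrite: last value written
    return inputs, outputs


accesses = [
    ("R", 0x400000, 0x41),   # read before any write: input
    ("W", 0x400000, 0x1B),   # same byte written later: also an output
    ("W", 0x500000, 0x00),
    ("R", 0x500000, 0x00),   # read after the loop wrote it: not an input
    ("W", 0x500000, 0x7F),   # last write wins
]
inputs, outputs = classify_bytes(accesses)
print(inputs)   # {4194304: 65}
print(outputs)  # {4194304: 27, 5242880: 127}
```

The byte at 0x400000 appears in both classes, which corresponds to the remark above that the same parameter variable can be both an input and an output.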
The loop data-flow graph lays the foundation for the model that identifies cryptographic implementations; our final goal is to extract the cryptographic parameters. We define the memory parameters of a loop data-flow graph as those memory parameters of its loop instances that are not used in intermediate data flow. For register parameters, we take the input registers of the root nodes and the output registers of the leaf nodes.
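Let $G$ be a loop data-flow graph over the loop instances $L_1, \ldots, L_n$. Its input parameters $IN_G$ are defined as
$IN_G = \bigcup_{1 \le i \le n} \Big( IN_M(L_i) \setminus \bigcup_{1 \le j \le n} OUT_M(L_j) \Big) \cup \bigcup_{L \in ROOT[G]} IN_R(L)$,
and its output parameters $OUT_G$ are defined as
$OUT_G = \bigcup_{1 \le i \le n} \Big( OUT_M(L_i) \setminus \bigcup_{1 \le j \le n} IN_M(L_j) \Big) \cup \bigcup_{L \in LEAF[G]} OUT_R(L)$.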
The values of these parameters were already collected during loop instance parameter extraction. We have thus built a model that extracts candidate cryptographic implementations and their parameters from an execution trace, and the next step can identify the cryptographic algorithm.
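
The graph-level parameters can be computed with plain set operations, following the rule stated above: memory parameters used only as intermediate data flow are dropped, and register parameters come from root and leaf nodes. The dict keys, the function name and the example parameter names are all hypothetical.

```python
def graph_parameters(loops, roots, leaves):
    """Return (inputs, outputs) of a loop data-flow graph.

    loops: dict name -> dict with sets 'in_mem', 'out_mem', 'in_reg', 'out_reg'.
    roots, leaves: names of the root and leaf loop instances.
    """
    all_in = set().union(*(l["in_mem"] for l in loops.values()))
    all_out = set().union(*(l["out_mem"] for l in loops.values()))
    # Memory inputs never produced inside the graph, plus root input registers.
    graph_in = (all_in - all_out) | set().union(*(loops[r]["in_reg"] for r in roots))
    # Memory outputs never consumed inside the graph, plus leaf output registers.
    graph_out = (all_out - all_in) | set().union(*(loops[l]["out_reg"] for l in leaves))
    return graph_in, graph_out


loops = {
    "L1": {"in_mem": {"key"}, "out_mem": {"state"},
           "in_reg": {"reg:ecx"}, "out_reg": set()},
    "L2": {"in_mem": {"state", "plaintext"}, "out_mem": {"ciphertext"},
           "in_reg": set(), "out_reg": {"reg:eax"}},
}
graph_in, graph_out = graph_parameters(loops, roots=["L1"], leaves=["L2"])
print(sorted(graph_in))   # ['key', 'plaintext', 'reg:ecx']
print(sorted(graph_out))  # ['ciphertext', 'reg:eax']
```

The intermediate buffer `state` is written by L1 and read by L2, so it appears in neither result.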
Step 4: Use the input/output relationship (that is, the matching relationship between input parameters and output parameters) and compare against the dynamic feature data to confirm the cryptographic algorithm executed in the target software. The final step of our identification method compares the loop data-flow graphs with template implementations of cryptographic algorithms: the name of the algorithm is determined by whether the template program reproduces the extracted input/output relationship. The comparison algorithm takes the following two kinds of input:
(1) Each extracted loop data-flow graph $g_k$, together with the parameters in $IN_{g_k}$ and $OUT_{g_k}$.
(2) For each public cryptographic algorithm F, the source code of a corresponding reference template program $P_F$. In particular, a function prototype describes its high-level input and output parameters, including whether each parameter is of variable length.
The comparison algorithm rests on the following idea: a function that implements a cryptographic algorithm preserves a specific input/output relationship. Indeed, if F1 is a cryptographic function such that F1(K, C) = P, where K is the key, C the ciphertext and P the decrypted plaintext, then it is very unlikely that another cryptographic function F2 also satisfies F2(K, C) = P. In other words, the pair ((K, C), P) of key, ciphertext and plaintext identifies the implemented algorithm F1 with overwhelming probability. The purpose of the comparison algorithm is to check whether the relationship between the values in $IN_{g_k}$ and $OUT_{g_k}$ is also preserved by the implementation $P_F$. If it is, $g_k$ implements the function F. In other words, when the program $P_F$ is run on the input values in $IN_{g_k}$, its actual output should match the values in $OUT_{g_k}$.
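
The comparison step can be sketched with reference implementations from the Python standard library standing in for the template programs. The table `TEMPLATES`, the function `identify` and the choice of MD5, SHA-1 and HMAC-SHA256 as examples are assumptions for illustration; the patent does not list these algorithms at this point. Each reference function is run on every ordering of the extracted input values, and a match with any extracted output value identifies the algorithm.

```python
import hashlib
import hmac
from itertools import permutations

# name -> (number of inputs, reference implementation)
TEMPLATES = {
    "MD5": (1, lambda data: hashlib.md5(data).digest()),
    "SHA-1": (1, lambda data: hashlib.sha1(data).digest()),
    "HMAC-SHA256": (2, lambda key, msg: hmac.new(key, msg, hashlib.sha256).digest()),
}


def identify(inputs, outputs):
    """Return the name of the first template that reproduces an extracted output."""
    outputs = set(outputs)
    for name, (arity, reference) in TEMPLATES.items():
        # The role of each extracted input is unknown, so try every ordering.
        for args in permutations(inputs, arity):
            if reference(*args) in outputs:
                return name
    return None


# Values that parameter extraction might have produced for two candidates.
digest = hashlib.sha1(b"hello").digest()
tag = hmac.new(b"key", b"msg", hashlib.sha256).digest()
print(identify([b"hello"], [digest]))     # SHA-1
print(identify([b"msg", b"key"], [tag]))  # HMAC-SHA256
```

Trying every ordering grows quickly with the number of extracted parameters, which is one reason the parameter collection conditions above aim to avoid over-merging or over-splitting parameters.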
The coefficients and parameters given in the above embodiments are provided so that those skilled in the art can make or use the invention. The invention is not limited to the values disclosed above; those skilled in the art may modify or adjust the embodiments in various ways without departing from the idea of the invention. The scope of protection is therefore not limited by the above embodiments, but is the broadest scope consistent with the inventive features set out in the claims.
Claims (4)
1. An instruction-level cryptographic algorithm identification method, comprising the following steps: Step 1, building a feature library of public cryptographic algorithms, the features of each algorithm comprising static feature codes and dynamic feature instruction sequences; Step 2, scanning the target program for the static feature codes and identifying cryptographic algorithms by the matched static feature codes; Step 3, collecting and analyzing the execution trace of the target program and extracting the program code that implements a cryptographic algorithm together with its input and output parameters; Step 4, comparing the matching relationship between input parameters and output parameters with the dynamic feature data to confirm the cryptographic algorithm executed in the target program;
the method further comprises a process of collecting and analyzing the execution trace, which consists mainly of two parts, data reduction and data analysis; the data reduction uses two filters, namely excluding instructions that come from known code libraries and filtering by thread ID; the data analysis comprises basic block detection, loop detection, loop data-flow graph generation and parameter information collection;
the basic blocks are generated dynamically from the execution trace; basic block detection follows the execution trace of the dynamic feature instruction sequence, and a sequence with a single entry and a single exit is identified as a basic block; when a basic block is altered by self-modifying code, the change is discovered the first time the new code executes;
the loop detection comprises the following steps: step a, processing the machine instructions of the execution trace one by one and storing them in a list called History; step b, obtaining several candidate loop instances from the repeated instructions that occur in History, each loop instance having a corresponding next expected instruction; step c, appending the new machine instruction to History and thereby discarding the loop instances that no longer match; step d, confirming a loop instance and marking it in History with a loop label X;
the loop data-flow graph generation comprises: for every pair of detected loop instances $L_i$ and $L_j$, using a standard graph algorithm to build the loop data-flow graph by testing whether the pair satisfies the binary relation and by computing the connected components;
the parameter information collection comprises: first grouping bytes into parameter variables according to conditions, then using a condition to divide these parameter variables into two classes, input parameters and output parameters; then assigning a fixed value to each parameter variable obtained in the previous step, where the execution trace records the value of every data access and values are assigned as follows: an input parameter takes the value of its first read, and an output parameter takes the value of its last write; finally, for each loop instance L, the algorithm returns $IN_M(L)$ and $IN_R(L)$, the input parameters in memory and in registers respectively, and $OUT_M(L)$ and $OUT_R(L)$, the output parameters in memory and in registers respectively.
2. The instruction-level cryptographic algorithm identification method of claim 1, characterized in that the dynamic features of a cryptographic algorithm are formed by extracting, from a template program that implements the algorithm, the instruction sequence and related operation data produced when it executes, a finite sequence of dynamic instructions D1, D2, ..., Dn forming one execution trace.
3. The instruction-level cryptographic algorithm identification method of claim 1, characterized in that the binary instrumentation tool PIN is used as the tool for collecting execution traces.
4. An instruction-level cryptographic algorithm identification system, characterized by comprising a feature library building unit, a static feature identification unit and a dynamic feature identification unit; the feature library building unit is used to build a feature library of public cryptographic algorithms, the features of each algorithm comprising static feature codes and dynamic feature instruction sequences; the static feature identification unit is used to scan the target program for the static feature codes and to identify cryptographic algorithms by the matched static feature codes; the dynamic feature identification unit is used to collect and analyze the execution trace of the target program, to extract the program code that implements a cryptographic algorithm together with its input and output parameters, and to compare the matching relationship between input parameters and output parameters with the dynamic feature data so as to confirm the cryptographic algorithm executed in the target program;
the process of collecting and analyzing the execution trace consists mainly of two parts, data reduction and data analysis; the data reduction uses two filters, namely excluding instructions that come from known code libraries and filtering by thread ID; the data analysis comprises basic block detection, loop detection, loop data-flow graph generation and parameter information collection;
the basic blocks are generated dynamically from the execution trace; basic block detection follows the execution trace of the dynamic feature instruction sequence, and a sequence with a single entry and a single exit is identified as a basic block; when a basic block is altered by self-modifying code, the change is discovered the first time the new code executes;
the loop detection comprises the following steps: step a, processing the machine instructions of the execution trace one by one and storing them in a list called History; step b, obtaining several candidate loop instances from the repeated instructions that occur in History, each loop instance having a corresponding next expected instruction; step c, appending the new machine instruction to History and thereby discarding the loop instances that no longer match; step d, confirming a loop instance and marking it in History with a loop label X;
the loop data-flow graph generation comprises: for every pair of detected loop instances $L_i$ and $L_j$, using a standard graph algorithm to build the loop data-flow graph by testing whether the pair satisfies the binary relation and by computing the connected components;
the parameter information collection comprises: first grouping bytes into parameter variables according to conditions, then using a condition to divide these parameter variables into two classes, input parameters and output parameters; then assigning a fixed value to each parameter variable obtained in the previous step, where the execution trace records the value of every data access and values are assigned as follows: an input parameter takes the value of its first read, and an output parameter takes the value of its last write; finally, for each loop instance L, the algorithm returns $IN_M(L)$ and $IN_R(L)$, the input parameters in memory and in registers respectively, and $OUT_M(L)$ and $OUT_R(L)$, the output parameters in memory and in registers respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510755316.6A CN105426707B (en) | 2015-11-09 | 2015-11-09 | A kind of instruction-level cryptographic algorithm recognition methods and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105426707A CN105426707A (en) | 2016-03-23 |
CN105426707B true CN105426707B (en) | 2018-06-19 |
Family
ID=55504915
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510755316.6A Active CN105426707B (en) | 2015-11-09 | 2015-11-09 | A kind of instruction-level cryptographic algorithm recognition methods and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105426707B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106452733A (en) * | 2016-11-24 | 2017-02-22 | 中国电子科技集团公司第三十研究所 | Block cipher identification method based on ciphertext analysis |
CN108073814B (en) * | 2017-12-29 | 2021-10-15 | 安天科技集团股份有限公司 | Shelling method and system based on static structured shelling parameters and storage medium |
CN110347432B (en) * | 2019-06-17 | 2021-09-14 | 海光信息技术股份有限公司 | Processor, branch predictor, data processing method thereof and branch prediction method |
CN112395613B (en) * | 2019-08-15 | 2022-04-08 | 奇安信安全技术(珠海)有限公司 | Static feature library loading method, device and equipment |
CN111222138A (en) * | 2019-12-31 | 2020-06-02 | 阿尔法云计算(深圳)有限公司 | Algorithm checking method, algorithm right confirming method and device |
CN112149138B (en) * | 2020-11-24 | 2021-02-19 | 北京智芯微电子科技有限公司 | Method and system for detecting program vulnerability of cryptographic algorithm and storage medium |
CN118378288B (en) * | 2024-06-24 | 2024-09-06 | 山东省计算中心(国家超级计算济南中心) | Encryption algorithm dynamic detection method and system based on Pin tool |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103577323A (en) * | 2013-09-27 | 2014-02-12 | 西安交通大学 | Dynamic key command sequence birthmark-based software plagiarism detecting method |
CN104484175A (en) * | 2014-12-16 | 2015-04-01 | 上海交通大学 | Method for detecting cryptology misuse of Android application programs |
CN104517057A (en) * | 2014-12-22 | 2015-04-15 | 中国人民解放军信息工程大学 | Software hybrid measure method based on trusted computing |
Non-Patent Citations (1)
Title |
---|
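Research on Key Technologies for Cryptographic Algorithm Identification and Analysis (密码算法识别与分析关键技术研究); Li Jizhong (李继中); China Doctoral Dissertations Full-text Database (《中国博士学位论文全文数据库》); 20140415; pp. 1-17, 21-22, 82-83 *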
Also Published As
Publication number | Publication date |
---|---|
CN105426707A (en) | 2016-03-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||