CN105426707B - Instruction-level cryptographic algorithm identification method and system - Google Patents

Instruction-level cryptographic algorithm identification method and system

Info

Publication number
CN105426707B
Authority
CN
China
Prior art keywords
code
algorithm
instruction
cryptographic algorithm
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510755316.6A
Other languages
Chinese (zh)
Other versions
CN105426707A (en
Inventor
张李军
吉庆兵
于飞
罗杰
陈曼
刘丹
谈程
高鹏军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 30 Research Institute filed Critical CETC 30 Research Institute
Priority to CN201510755316.6A priority Critical patent/CN105426707B/en
Publication of CN105426707A publication Critical patent/CN105426707A/en
Application granted granted Critical
Publication of CN105426707B publication Critical patent/CN105426707B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software
    • G06F21/14 Protecting executable software against software analysis or reverse engineering, e.g. by obfuscation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)

Abstract

The present invention relates to the technical field of cryptographic algorithm identification and discloses an instruction-level cryptographic algorithm identification method comprising the following steps. Step 1: establish a feature library of public cryptographic algorithms, where the features of each algorithm include static signatures and dynamic feature instruction sequences. Step 2: scan the target program and match static signatures, identifying cryptographic algorithms by their static signatures. Step 3: collect and analyze the execution trace of the target program, and extract the program code that implements a cryptographic algorithm together with its input and output parameters. Step 4: compare the matching relationship between the input parameters and output parameters against the dynamic feature data to confirm which cryptographic algorithm the target program executes. Identifying cryptographic algorithms at the instruction level in this way yields high identification accuracy.

Description

Instruction-level cryptographic algorithm identification method and system
Technical field
The present invention relates to the technical field of cryptographic algorithm identification, and in particular to an instruction-level cryptographic algorithm identification method and system.
Background art
Cryptographic algorithms have become an essential means of ensuring information security. In the network information age, important network devices such as switches, routers, firewalls and dedicated encryption and decryption equipment all use cryptographic algorithms in their embedded software. To analyze the security mechanisms of the software in these devices and to detect and eliminate security risks, the cryptographic algorithms must be identified during reverse engineering of the program code. In addition, malware such as computer viruses and Trojans widely uses cryptographic algorithms to change its own static signatures, hide its network traffic, or protect its payload. Cryptographic algorithm identification is therefore a key technique for extracting features from such malware and decrypting its core content.
Cryptographic algorithm identification is a branch of program comprehension. Program comprehension obtains knowledge from the inside of a computer program, usually by locating and identifying the architecture and functions of the program from its object code. Cryptographic algorithms in software can be identified at two levels: binary code and assembly code. Identification at the binary level mainly uses static signature matching similar to that used in virus detection: signatures such as the initialization values and S-box parameters that appear in common cryptographic algorithms are collected into a feature library in advance, the target software is then scanned, and a matching signature indicates the corresponding cryptographic algorithm. Grobert and Zhao et al. designed automated identification methods based on static signatures in 2010 and 2011 respectively. Identification at the assembly level disassembles the target software, extracts instruction sequences with specific structures such as loops, and compares them with known cryptographic algorithms by pattern matching to identify the algorithms in the target software.
Much of the current work on cryptographic algorithm identification serves the reverse analysis of malware. In 2009, Wang et al. first proposed detecting and identifying the cryptographic algorithms in a program dynamically while it runs. They first use data lifetime analysis, including data taint tracking and binary instrumentation, to determine the transition point between ciphertext and plaintext, i.e., the point between message decryption and message processing, and then determine the memory region that stores the decrypted message. They evaluated the method on four protocols (HTTPS, IRC, MIME and an unknown protocol used in the malware Agobot); in their tests, the tool they developed decrypted all encrypted messages. The main drawback of the method is that it can handle only a single transition point between message decryption and message processing: if a program decrypts one block of a message, processes it, and then decrypts again, the method cannot correctly identify the cryptographic algorithm.
Lutz found that cryptographic operations contain a large proportion of bitwise arithmetic instructions. His method rests on three observations: (1) loops are a core component of cryptographic algorithms; (2) cryptographic algorithms make heavy use of integer arithmetic; (3) decryption reduces the information entropy of tainted data. The core of the tool developed by Lutz is taint analysis combined with computing the information entropy of a buffer to decide whether the buffer has been decrypted.
Caballero et al. refined the method of Wang and made use of the findings of Lutz about cryptographic operations. They performed automated protocol reverse engineering and cryptographic algorithm identification on the malware MegaD. For each function instance in the software, they compute the ratio of bitwise arithmetic instructions. If the function executes at least 20 bitwise arithmetic instructions and the ratio exceeds 55%, the function is marked as an encryption or decryption function. In a practical test, this method found all cryptographic functions. To identify the parameters of a cryptographic function (plaintext and so on), they determine the set of data read by the marked function. To distinguish the plaintext from the key and other data used by the encryption function, they compare the read sets of different instances of the same function: only the plaintext portion changes, so the plaintext data can be identified.
Other analyses of cryptographic algorithms in malware include the following. In 2010, Werber and Leder analyzed the malware Conficker and found that it uses publicly available template implementations of SHA-1 from OpenSSL and of MD6. Interestingly, the attacker later patched the MD6 implementation to fix a buffer overflow vulnerability in it. Porras et al. found that the developers of many P2P malware programs use 1024-bit RSA as the signature verification algorithm, and that newer versions of some of this software even use 4096-bit RSA. Werber and Leder also analyzed the malware Waledac and found that 1000 of its 4000 functions come from the cryptographic library OpenSSL, and that its AES algorithm uses the CBC encryption mode with an IV of 0. Stewart analyzed and identified the algorithms in the malware Storm Worm: for peer-to-peer high-speed communication, the software uses a static XOR algorithm to authenticate child nodes, and its key uses 56-bit RSA.
In 2012, Li Ji et al. in China identified cryptographic algorithms by judging the similarity of instruction statistics, but their method can only extract cryptographic functions and cannot identify the algorithm name. In 2013, Shu Hui et al. extracted and identified the loop features of cryptographic algorithms, improving the accuracy of locating cryptographic functions.
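The bitwise-ratio heuristic attributed above to Caballero et al. can be illustrated with a minimal Python sketch. The thresholds (at least 20 bitwise instructions, ratio above 55%) are the ones quoted in the text; the set of mnemonics counted as bitwise and the function names are assumptions made only for this illustration.

```python
# Mnemonics treated as bitwise arithmetic: an assumption for this sketch,
# not a list taken from the cited work.
BITWISE_MNEMONICS = frozenset(
    {"and", "or", "xor", "not", "shl", "shr", "sar", "rol", "ror"}
)


def bitwise_stats(mnemonics):
    """Return (number of bitwise instructions, their ratio) for one function instance."""
    if not mnemonics:
        return 0, 0.0
    count = sum(1 for m in mnemonics if m.lower() in BITWISE_MNEMONICS)
    return count, count / len(mnemonics)


def looks_cryptographic(mnemonics, min_bitwise=20, min_ratio=0.55):
    """Apply the two thresholds quoted in the text to one function instance."""
    count, ratio = bitwise_stats(mnemonics)
    return count >= min_bitwise and ratio > min_ratio


# A hypothetical function instance: 30 bitwise instructions out of 40.
sample = ["xor", "rol", "and", "mov"] * 10
print(bitwise_stats(sample), looks_cryptographic(sample))
```

Such a filter only flags candidate functions; it does not name the algorithm.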
In summary, cryptographic algorithm identification relies on features of the cryptographic algorithms themselves (static signatures or dynamic instruction sequences). Current methods for identifying cryptographic algorithms in software mainly have the following problems:
(1) The accuracy of identification is low. Some cryptographic algorithms used in practice lack distinctive constants, or these features are hidden in the data section of the program and are hard to detect, so most current methods that depend on static features have low accuracy.
(2) The specific algorithm name is hard to identify. Dynamic methods can trace program execution and extract the execution trace, but they can only use loop detection to locate key functions. Because the relationships along the whole function call chain are not obtained, such methods identify functions that implement part of a cryptographic algorithm but cannot identify the algorithm as a whole.
(3) Identification in obfuscated software is extremely difficult. Much current malware makes extensive use of code obfuscation techniques such as packing to hide the features and the execution of cryptographic algorithms, which greatly increases the difficulty of identification.
(4) The degree of automation is low. Identifying cryptographic algorithms entirely by hand in order to understand the behavior of the target malware is very time-consuming. Raising the level of automation of algorithm identification is an important research direction.
Summary of the invention
To address the above problems of prior-art cryptographic algorithm identification methods, the invention discloses an instruction-level cryptographic algorithm identification method and system.
The invention discloses an instruction-level cryptographic algorithm identification method comprising the following steps. Step 1: establish a feature library of public cryptographic algorithms, where the features of each algorithm include static signatures and dynamic feature instruction sequences. Step 2: scan the target program and match static signatures, identifying cryptographic algorithms by their static signatures. Step 3: collect and analyze the execution trace of the target program, and extract the program code that implements a cryptographic algorithm together with its input and output parameters. Step 4: compare the matching relationship between the input parameters and output parameters against the dynamic feature data to confirm which cryptographic algorithm the target program executes.
Further, the dynamic features of a cryptographic algorithm are formed by extracting, from a template program that implements the algorithm, the instruction sequence executed and the related operation data; a finite sequence of dynamic instructions $D_1, D_2, \ldots, D_n$ forms one execution trace.
Further, the binary instrumentation tool PIN is used as the tool for collecting the execution trace.
Further, the method also includes a process of collecting and analyzing the execution trace, which consists mainly of data reduction and data analysis. Data reduction uses two kinds of filtering: excluding instructions that come from known code libraries, and filtering by thread ID. Data analysis includes basic block detection, loop detection, loop data flow graph generation and parameter information collection.
Further, basic blocks are generated dynamically from the execution trace. Basic block detection follows the executed instruction sequence: a sequence with a single entry and a single exit is identified as a basic block. If a basic block is changed by self-modifying code, the change is discovered the first time the new code executes.
Further, loop detection comprises the following steps. Step a: process the machine instructions in the execution trace one by one and store them in a list called History. Step b: from the repeated instructions in History, obtain multiple possible loop instances, each of which has a corresponding next expected instruction. Step c: add new machine instructions to History, discarding the loop instances that no longer qualify. Step d: confirm a loop instance and mark it in History with a loop label X.
Further, loop data flow graph generation is as follows: for every pair of detected loop instances $L_i$ and $L_j$, test whether the pair satisfies a binary relation, and use a standard graph algorithm to compute the connected components, which form the loop data flow graphs.
Further, parameter information collection is as follows. First, bytes are grouped into parameter variables according to a condition, and these parameter variables are then divided by a condition into two classes: input parameters and output parameters. Each parameter variable obtained in this way is then assigned a fixed value, using the values that the execution trace records for each data access: an input parameter takes the value of its first read, and an output parameter takes the value of its last write. Finally, for each loop instance L, the algorithm returns $IN_M(L)$ and $IN_R(L)$, the input parameters in memory and in registers respectively, and $OUT_M(L)$ and $OUT_R(L)$, the output parameters in memory and in registers respectively.
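The value-assignment rule for parameters (first value read for inputs, last value written for outputs) can be sketched in Python as follows. The text does not spell out the conditions used to group bytes and to classify parameters, so this sketch assumes a simple rule: a location is an input if it is read before the loop writes it, and an output if the loop writes it. The `Access` record and the location naming are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Access:
    kind: str      # "R" for a read, "W" for a write
    location: str  # hypothetical name, e.g. "mem:0x1000" or "reg:eax"
    value: int     # value observed in the execution trace


def collect_parameters(accesses: Iterable[Access]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (inputs, outputs) for the accesses of one loop instance."""
    inputs: Dict[str, int] = {}
    outputs: Dict[str, int] = {}
    for access in accesses:
        if access.kind == "R":
            # Assumed input condition: read before any write inside the loop.
            if access.location not in outputs and access.location not in inputs:
                inputs[access.location] = access.value  # keep the first value read
        elif access.kind == "W":
            outputs[access.location] = access.value     # keep the last value written
    return inputs, outputs


def split_by_storage(params: Dict[str, int]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split parameters into (memory, register) parts, mirroring the M and R subscripts."""
    memory = {k: v for k, v in params.items() if k.startswith("mem:")}
    registers = {k: v for k, v in params.items() if k.startswith("reg:")}
    return memory, registers
```

Applying `split_by_storage` to the inputs and outputs of a loop instance gives four dictionaries that play the roles of $IN_M(L)$, $IN_R(L)$, $OUT_M(L)$ and $OUT_R(L)$ in this simplified setting.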
The invention also discloses an instruction-level cryptographic algorithm identification system comprising a feature library establishing unit, a static feature identification unit and a dynamic feature identification unit. The feature library establishing unit establishes the feature library of public cryptographic algorithms, where the features of each algorithm include static signatures and dynamic feature instruction sequences. The static feature identification unit scans the target program and matches static signatures, identifying cryptographic algorithms by their static signatures. The dynamic feature identification unit collects and analyzes the execution trace of the target program, extracts the program code that implements a cryptographic algorithm together with its input and output parameters, and compares the matching relationship between the input parameters and output parameters against the dynamic feature data to confirm which cryptographic algorithm the target program executes.
With the above technical solution, the invention has the following beneficial effects. The method proposes a new cryptographic algorithm identification scheme; an automated identification tool designed according to this scheme can significantly reduce the time needed to analyze software security mechanisms. The method combines the advantages of static signature scanning and dynamic execution analysis, which markedly improves identification accuracy. Identifying algorithms through the relationship between input and output parameters captures the essence of what a cryptographic algorithm does when it executes, and can effectively overcome the identification obstacles created by code obfuscation methods such as packing. The method provides a general workflow framework for parsing and identifying cryptographic algorithms: it can be used to identify block ciphers and also public-key algorithm modules.
Description of the drawings
Fig. 1 is the overall flow chart of cryptographic algorithm identification.
Fig. 2 is a flow chart of the controlled execution of a program under the PIN tool.
Fig. 3 is a diagram of a nested loop (ABBBCABBC).
Fig. 4 shows an instruction sequence of a cryptographic algorithm.
Fig. 5 shows the loop state after instruction $I_1$.
Fig. 6 shows another instruction sequence of a cryptographic algorithm.
Fig. 7 shows the loop state after instruction $I_3$.
Fig. 8 shows the state after a loop instance X has been identified.
Detailed description of the embodiments
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
The instruction-level cryptographic algorithm identification method disclosed by the invention can be used to extract the cryptographic functions executed in an executable program and to identify the names of public cryptographic algorithms. The method mainly comprises the following four steps (as shown in Fig. 1):
Step 1: Establish a feature library of public cryptographic algorithms. The cryptographic algorithms include block ciphers, stream ciphers, hash algorithms and public-key algorithms, and the features of each algorithm include static signatures and dynamic feature instruction sequences. For block ciphers, the static signatures are mainly S-box constants and initial permutation constants; for hash algorithms, mainly the initialization vector values of the iteration; for stream ciphers, shift register lengths and the like; for public-key algorithms, mainly large primes. Because more and more malware uses code obfuscation, which masks these static signatures, identification that relies entirely on static signatures performs poorly, so a dynamic feature library of cryptographic algorithms must also be established. Dynamic features are the concrete run-time information produced while a program executes; in essence, a running program is a series of instructions and the data they operate on. Collecting the dynamic features of a cryptographic algorithm requires a template program that implements the algorithm. From the template program, the instruction sequence executed and the related operation data are extracted as the dynamic features of the algorithm. These dynamic instruction sequences and data constitute the so-called execution trace. A dynamic instruction D is a tuple consisting of the following information: (1) a memory address $A[D]$; (2) the machine instruction $I[D]$ executed at $A[D]$; (3) the two sets of memory addresses read and written by $I[D]$, denoted $R_A[D]$ and $W_A[D]$ respectively; (4) the two sets of registers read and written by $I[D]$, denoted $R_R[D]$ and $W_R[D]$ respectively. An execution trace T is a finite sequence of dynamic instructions $D_1, D_2, \ldots, D_n$. Since the execution trace of each cryptographic algorithm is unique, the algorithms can be clearly distinguished from one another. The extraction of dynamic features depends on the concrete implementation of the algorithm: template programs generated by different compilers may have different execution traces. In practice, the execution trace of a cryptographic algorithm can be extracted from a program built with the mainstream compiler of the target platform; for example, the Visual Studio compilers can be used on Windows and the GCC compiler on Linux. Once execution traces have been collected for each cryptographic algorithm, the dynamic feature library of these public cryptographic algorithms is established. It serves as the template information for instruction sequence comparison in the dynamic identification method described later.
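The dynamic instruction tuple and the execution trace defined above map naturally onto a small record type. The following Python sketch is only an illustration: the field names are chosen here to mirror $A[D]$, $I[D]$, $R_A[D]$, $W_A[D]$, $R_R[D]$ and $W_R[D]$, and the two sample instructions are made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class DynamicInstruction:
    address: int                               # A[D]: memory address of the instruction
    instruction: str                           # I[D]: machine instruction executed at A[D]
    mem_reads: FrozenSet[int] = frozenset()    # R_A[D]: memory addresses read
    mem_writes: FrozenSet[int] = frozenset()   # W_A[D]: memory addresses written
    reg_reads: FrozenSet[str] = frozenset()    # R_R[D]: registers read
    reg_writes: FrozenSet[str] = frozenset()   # W_R[D]: registers written


# An execution trace T is a finite sequence D_1, D_2, ..., D_n.
ExecutionTrace = List[DynamicInstruction]


def touched_memory(trace: ExecutionTrace) -> FrozenSet[int]:
    """All memory addresses read or written anywhere in the trace."""
    touched = set()
    for d in trace:
        touched |= d.mem_reads | d.mem_writes
    return frozenset(touched)


# Hypothetical two-instruction trace, for illustration only.
example_trace: ExecutionTrace = [
    DynamicInstruction(0x401000, "mov eax, [0x500000]",
                       mem_reads=frozenset({0x500000}),
                       reg_writes=frozenset({"eax"})),
    DynamicInstruction(0x401005, "xor eax, ebx",
                       reg_reads=frozenset({"eax", "ebx"}),
                       reg_writes=frozenset({"eax"})),
]
```

Making the record immutable keeps traces safe to share between the later analysis passes (basic block detection, loop detection, parameter collection).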
Step 2: Scan the target program and match static signatures, identifying cryptographic algorithms by their static signatures. The target software is scanned for static features, and the extracted cryptographic feature data are compared with the static feature information in the cryptographic algorithm feature library. Static signatures are the most direct manifestation of a cryptographic algorithm; in essence they are the constants that the various algorithms contain, such as specific initialization values and S-box values. Although a cryptographic algorithm may be implemented in different programming languages and built with different compilers, these static signatures are fixed and usually appear in directly recognizable form in an executable that contains the algorithm. For example, if the hexadecimal values 3A, 32, 2A, 22, 1A, 12, 0A, 02, 3C, 34, 2C, 24, 1C, 14, 0C, 04 appear in the scanned program code, it can be concluded that the program uses the DES algorithm, because these are constants of the initial permutation table of DES. If the program contains the byte sequence 00 00 00 00 00 00 00 00 A5 63 63 C6 84 7C 7C F8, 99 77 77 EE 8D 7B 7B F6 0D F2 F2 FF BD 6B 6B D6, B1 6F 6F DE 54 C5 C5 91 50 30 30 60 03 01 01 02, it can be concluded that the AES algorithm is used: the values in it (63 63 7C 7C 77 77 7B 7B F2 F2 6B 6B 6F 6F C5 C5 30 30 01 01) are part of the S-box constants of AES. Almost all hash algorithms contain initialization values, so their static signatures are even easier to identify. For example, the initialization of SHA256 uses 8 hexadecimal constants: 0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A, 0x510E527F, 0x9B05688C, 0x1F83D9AB and 0x5BE0CD19. If these constants appear in the code, it can be concluded that SHA256 is used. Note that storage order differs between computer architectures. The common Intel architecture stores data in little-endian order (the low address holds the low-order byte of a word and the high address holds the high-order byte), so the constant 0x6A09E667 is actually stored as 67 E6 09 6A, and the other 7 constants are stored similarly. For ordinary program implementations, static signatures are a feasible way to identify cryptographic algorithms. However, malware increasingly uses code obfuscation techniques such as packing. The static signatures are then hidden or altered, and this identification method hardly works, so the dynamic identification method below is needed.
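The static scan of Step 2 can be sketched in Python as a byte-pattern search. The constants below are the DES and SHA256 values quoted in the text; everything else is an assumption of this sketch, in particular that the eight SHA256 words are stored contiguously (a compiler may instead embed them as separate immediates) and the signature names used as dictionary keys.

```python
import struct

SHA256_INIT = (0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
               0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19)

DES_INITIAL_PERMUTATION = bytes([0x3A, 0x32, 0x2A, 0x22, 0x1A, 0x12, 0x0A, 0x02,
                                 0x3C, 0x34, 0x2C, 0x24, 0x1C, 0x14, 0x0C, 0x04])


def pack_words(words, little_endian=True):
    """Serialize 32-bit words the way they would sit in memory."""
    fmt = ("<" if little_endian else ">") + "I" * len(words)
    return struct.pack(fmt, *words)


# Hypothetical miniature feature library: signature name -> byte pattern.
SIGNATURES = {
    "SHA256 init (little-endian)": pack_words(SHA256_INIT, little_endian=True),
    "SHA256 init (big-endian)": pack_words(SHA256_INIT, little_endian=False),
    "DES initial permutation": DES_INITIAL_PERMUTATION,
}


def scan(image: bytes):
    """Return (signature name, offset) for every signature found in the image."""
    hits = []
    for name, pattern in SIGNATURES.items():
        offset = image.find(pattern)
        if offset != -1:
            hits.append((name, offset))
    return hits


# Synthetic "binary": padding, then the SHA256 constants in little-endian order.
image = b"\x90" * 16 + pack_words(SHA256_INIT) + b"\x00" * 4
print(scan(image))
```

Searching for both byte orders covers the endianness caveat raised in the text; a packed or obfuscated binary would simply produce no hits, which is the case the dynamic method is meant to handle.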
Step 3: Collect and analyze the execution trace of the target program and extract the cryptographic code (the program code that implements a cryptographic algorithm) and its input and output parameters. The input-output relation in Step 4 refers to the matching relationship between these input and output parameters, from which the corresponding cryptographic algorithm is determined. That cryptographic algorithms can be identified rests mainly on three important characteristics observed in code that implements cryptography. These characteristics were found and confirmed in the course of our research.
Observation 1: Cryptographic code makes heavy use of bitwise arithmetic instructions. The nature of cryptographic algorithms leads to code with many arithmetic instructions, in particular for substitution and permutation operations, whose compiled code uses a large number of bitwise instructions. Likewise, many cryptographic algorithms are optimized for modern computer architectures; contemporary algorithms such as AES have been speed-optimized for the 32-bit Intel architecture and use the bitwise instructions that are convenient there.
Observation 2: Cryptographic code contains loops. When substitution and permutation modify internal data, they modify these data repeatedly. Even when loop unrolling is used, the basic blocks of cryptographic code are certain to be executed many times.
Observation 3: The inputs and outputs of cryptographic code satisfy a predefined and verifiable relation. Cryptographic algorithms are deterministic: for any given input, the corresponding output is always the same. For a cryptographic algorithm in an execution trace, the input and output parameters contained in the trace obey the relation determined by that algorithm.
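Observation 3 is what makes the final confirmation possible: feed the extracted inputs to a reference implementation and check whether an extracted output comes back. The following Python sketch illustrates the idea with hash functions from the standard library standing in for the feature library. The function name and the choice of whole-message hashes are assumptions of this sketch; a real trace would typically expose the parameters of individual loops rather than of a whole hash computation.

```python
import hashlib

# Reference implementations standing in for the feature library (assumption).
REFERENCES = {
    "SHA-1": lambda data: hashlib.sha1(data).digest(),
    "SHA-256": lambda data: hashlib.sha256(data).digest(),
    "SHA-512": lambda data: hashlib.sha512(data).digest(),
}


def identify_by_io(inputs, outputs):
    """Return the names of reference algorithms that map some input to some output."""
    observed = set(outputs)
    return [
        name
        for name, reference in REFERENCES.items()
        if any(reference(candidate) in observed for candidate in inputs)
    ]


# Parameters as they might be extracted from a trace (made-up values).
message = b"abc"
digest = hashlib.sha256(message).digest()
print(identify_by_io([b"unrelated buffer", message], [digest]))
```

Because the check depends only on input and output values, it is unaffected by how the instructions that computed them were obfuscated.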
Our research targets programs on the Windows/x86 platform, and the dynamic binary instrumentation tool PIN from Intel is selected as the tool for collecting execution traces. The advantages of this tool are its ease of use and its ability to handle self-modifying code, which is common in obfuscated programs. To collect the execution trace, dynamic binary instrumentation (DBI) is used to profile the data flow of the program: the target program executes under the control of the DBI tool PIN (as shown in Fig. 2), which supports fine-grained instruction tracing of a process. PIN collects the execution trace, including the memory regions that the program accesses and modifies. To detect cryptographic algorithms and their parameters, the execution trace must be lifted to a higher-level structured representation, namely loops, basic blocks and data flow graphs. The next step then checks whether these higher-level structures contain the execution of cryptographic code and, based on the result, identifies the algorithm and the related parameter information.
Preferably, the process of collecting and analyzing the execution trace consists mainly of data reduction and data analysis. Data reduction uses two kinds of filtering: excluding instructions that come from known code libraries, and filtering by thread ID. Data analysis includes basic block detection, loop detection, loop data flow graph generation and parameter information collection.
Data reduction reduces the size of the execution trace file. Two filtering methods are used. On the one hand, instructions that come from known code libraries are excluded; for these libraries it is known in advance that they contain no cryptographic code. Using a whitelist of dynamic link libraries (DLLs) avoids tracing large sections of code, which is particularly useful for reducing both the time needed to generate the trace and the size of the trace file. On the other hand, the trace can be filtered by thread ID, or trace generation can start only after a certain number of instructions have executed; for example, if the target program is known to be packed, the code of the unpacker can be skipped.
Data analysis includes basic block detection, loop detection, loop data flow graph generation and parameter information collection. To meet the requirements of cryptographic code analysis, the following information is first recorded at instruction-level granularity: (1) the current thread ID; (2) the current instruction and the related registers and data; (3) the memory values before and after the instruction executes, including the access mode (read or write), length and address; (4) optionally, debug information for the current instruction location, such as the DLL module, the function symbol and the offset from the function symbol.
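The two filters of data reduction can be sketched in Python as a single pass over recorded trace entries. The record fields, the whitelist contents and the `skip_first` parameter (modelling "start tracing after a certain number of instructions") are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class TraceRecord:
    thread_id: int     # current thread ID
    module: str        # module (e.g. DLL) that contains the instruction
    address: int       # instruction address
    instruction: str   # disassembled instruction text


# Hypothetical whitelist of libraries assumed to contain no cryptographic code.
LIBRARY_WHITELIST = frozenset({"kernel32.dll", "ntdll.dll", "user32.dll"})


def reduce_trace(records: Iterable[TraceRecord],
                 thread_id: Optional[int] = None,
                 skip_first: int = 0,
                 whitelist: frozenset = LIBRARY_WHITELIST) -> List[TraceRecord]:
    """Drop whitelisted-library code, other threads and an initial prefix."""
    kept = []
    for index, record in enumerate(records):
        if index < skip_first:
            continue  # e.g. skip the unpacking stub of a packed program
        if thread_id is not None and record.thread_id != thread_id:
            continue  # keep only the thread of interest
        if record.module.lower() in whitelist:
            continue  # instruction belongs to a known library
        kept.append(record)
    return kept
```

In a real instrumentation tool these checks would normally run before a record is written at all, so that the cost of logging whitelisted code is never paid; filtering after the fact is used here only to keep the sketch self-contained.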
Basic block detection. A basic block is an ordered sequence of instructions that always runs in the specified order. Detection follows the executed instruction sequence: a sequence with a single entry and a single exit is identified as a basic block. Because every basic block is generated dynamically from the execution trace, the result of the basic block detection algorithm differs from that of a static detection algorithm. Since basic blocks are generated from the dynamic trace, the detection algorithm does not consider code that was never executed, because such code does not appear in the execution trace. One advantage of a dynamic trace is that the branches the program actually takes can be observed, and the outcomes of these branches can be incorporated into the basic block detection algorithm. If a basic block is changed by self-modifying code, the change is discovered the first time the new code executes, because the instructions of the new code differ from those of the old code.
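A minimal Python sketch of dynamic basic block detection follows. It assumes each trace entry is a tuple `(address, instruction_text, is_control_transfer)`, a format invented for this illustration. A block ends at a control transfer and a new block starts at any address that was observed as the successor of a control transfer.

```python
def detect_basic_blocks(trace):
    """Return the distinct basic blocks of a trace, in order of first execution.

    Each block is a tuple of (address, instruction_text) pairs.
    """
    # Pass 1: block leaders are the first address executed and every address
    # that was executed immediately after a control-transfer instruction.
    leaders = {trace[0][0]} if trace else set()
    for current, following in zip(trace, trace[1:]):
        if current[2]:
            leaders.add(following[0])

    blocks, seen, pending = [], set(), []

    def flush():
        block = tuple(pending)
        if block and block not in seen:
            seen.add(block)
            blocks.append(block)
        pending.clear()

    # Pass 2: cut the trace at leaders and after control transfers.
    for address, text, is_control_transfer in trace:
        if pending and address in leaders:
            flush()
        pending.append((address, text))
        if is_control_transfer:
            flush()
    flush()
    return blocks


# Hypothetical trace of a small loop that runs twice and then returns.
trace = [
    (0x10, "mov ecx, 2", False),
    (0x12, "xor eax, ebx", False),
    (0x14, "loop 0x12", True),
    (0x12, "xor eax, ebx", False),
    (0x14, "loop 0x12", True),
    (0x16, "ret", True),
]
for block in detect_basic_blocks(trace):
    print([hex(address) for address, _ in block])
```

Because a block is keyed by its instruction text as well as its addresses, code that has been rewritten in place shows up as a new block the first time it executes, in line with the remark about self-modifying code above.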
Loop detection. Loops are an important feature of cryptographic code, so we focus on detecting modules that contain loop instructions. We first define the loops to be detected, which include simple loops and nested loops. Let $X86$ be the machine instruction set and $Trace$ the set of execution traces. For a word $a \in X86^{*}$ (that is, $a$ is a sequence of X86 instructions), denote the set of prefixes of $a$ by $Pre(a)$: if there exists $r \in X86^{*}$ such that $a = br$, then $b \in Pre(a)$. If $a$ contains at least one instruction, we write $a \in X86^{+}$. A simple loop is any trace $L$ that satisfies the following condition:
$L/ins \in \{ a^{n} b \mid a \in X86^{+},\ n \geq 2,\ b \in Pre(a) \}$, where $L/ins$ denotes the instruction sequence of $L$.
The instruction sequence $a$ in the definition of a simple loop is called the loop body, and it can be replaced by a loop label $X$. Further, let $L_{ID}$ be the set of loop labels; a nested loop is then defined by
$L/ins \in \{ a^{n} b \mid a \in (X86 \cup L_{ID})^{+},\ n \geq 2,\ b \in Pre(a) \}$.
Note that in the definition of a nested loop, the body of an inner loop may execute a different number of times in different iterations of the outer loop. The loop shown in Figure 3 is ABBBCABBC, where the inner loop body B is represented by X. Although the inner loop B executes 3 times in the first iteration of the outer loop and only 2 times in the second, we still write the whole loop as AXCAXC, so that AXC is the body of the outer loop.
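The loop definition can be checked mechanically. The sketch below tests whether a sequence has the form a^n b, with b a prefix of a and at least two full iterations, and returns the shortest such body; the function name `loop_body` and the use of one-character tokens are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def loop_body(seq: Sequence[str],
              min_iterations: int = 2) -> Optional[Tuple[str, ...]]:
    """Return the shortest body a such that seq = a^n b, n >= min_iterations,
    and b is a prefix of a; return None if seq is not a loop trace."""
    for k in range(1, len(seq) // min_iterations + 1):
        body = tuple(seq[:k])
        # seq is a^n b exactly when every position repeats the body cyclically
        if all(seq[i] == body[i % k] for i in range(len(seq))):
            return body
    return None
```

Applied to the Figure 3 example, the raw sequence ABBBCABBC is not a simple loop, whereas the sequence AXCAXC obtained after replacing the inner loop by its label is a loop with body AXC. This is why loop labels are part of the alphabet in the nested definition.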
The idea of the loop detection algorithm is as follows. The algorithm recognizes traces of the form $\{ a^{n} b \mid a \in (X86 \cup L_{ID})^{+},\ n \geq 2,\ b \in Pre(a) \}$, and this recognition depends closely on the context of each instruction. The algorithm processes the machine instructions of the execution trace one by one and stores them in a list-like structure called History. A common situation is shown in Figure 4: the instructions I1, I2, I1, I3 have been recorded in History, and the machine instruction currently being processed is I1, which therefore already occurs twice in History. Each occurrence of I1 in History may be the beginning of a loop, so there are two cases: in the first, the loop body is a = I1, I2, I1, I3; in the second, the loop body is a = I1, I3. The algorithm thus obtains two loop instances, denoted L1 and L2. Each instance has a "cursor" that indicates the next expected instruction, which is I2 for L1 and I3 for L2 (see Figure 5). I1 is then appended to History. Suppose that I3 is the next machine instruction in the execution trace, as shown in Figure 6. The candidate loop instance L1 is now discarded, because the instruction that occurred is not I2. The cursor of L2, on the other hand, advances and points to the next expected instruction I1, as shown in Figure 7. At this point L2 has exactly two iterations, namely I1, I3, I1, I3, so we confirm it as a loop instance and mark it in History with the loop label X, as shown in Figure 8. Suppose the next machine instruction is I4, whereas the instruction expected by the cursor of L2 is I1; the current loop L2 is then closed and recorded. Because L2 has been replaced by the loop label X, an enclosing outer loop can be detected regardless of how many times L2 itself iterates within each iteration of the outer loop.
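The following sketch is a simplified variant of this idea, not the cursor-based algorithm itself. It keeps a History list, folds the end of History into a loop label as soon as it contains two consecutive copies of the same body, and absorbs each further full iteration into the existing label. It does not keep cursors for competing candidates and leaves a trailing partial iteration unfolded. All names (`detect_loops`, the labels `X0`, `X1`, ...) are hypothetical, and instruction tokens are assumed never to collide with label names.

```python
from typing import Dict, List, Tuple

Body = Tuple[str, ...]


def _absorb_iteration(history: List[str], bodies: Dict[str, Body]) -> bool:
    """If History ends with a label followed by one more copy of its body,
    drop that copy (the loop simply iterated once more)."""
    for label, body in bodies.items():
        k = len(body)
        if (len(history) > k and history[-k - 1] == label
                and tuple(history[-k:]) == body):
            del history[-k:]
            return True
    return False


def _fold_repetition(history: List[str], bodies: Dict[str, Body],
                     labels: Dict[Body, str]) -> bool:
    """If History ends with two consecutive copies of a body, replace both
    copies by the loop label of that body (shortest body first)."""
    for k in range(1, len(history) // 2 + 1):
        if history[-k:] == history[-2 * k:-k]:
            body = tuple(history[-k:])
            label = labels.get(body)
            if label is None:
                label = f"X{len(labels)}"
                labels[body] = label
                bodies[label] = body
            del history[-2 * k:]
            history.append(label)
            return True
    return False


def detect_loops(trace: List[str]) -> Tuple[List[str], Dict[str, Body]]:
    """Return the folded History and a map from loop label to loop body."""
    history: List[str] = []
    bodies: Dict[str, Body] = {}
    labels: Dict[Body, str] = {}
    for ins in trace:
        history.append(ins)
        changed = True
        while changed:  # keep reducing until History is stable
            changed = (_absorb_iteration(history, bodies)
                       or _fold_repetition(history, bodies, labels))
    return history, bodies
```

On the sequence I1, I2, I1, I3, I1, I3, I4 this leaves I1, I2, X0, I4 in History with X0 standing for the body (I1, I3). On ABBBCABBC it first labels the inner loop B and then folds the outer loop with body (A, X0, C), independently of how often B iterated each time.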
Loop data flow graph. So far we have assumed that each piece of cryptographic code consists of a single loop. In practice, however, a cryptographic function usually consists of several loops, as in the RC4 algorithm. An abstraction based on a single loop therefore cannot capture a cryptographic function completely. To handle this problem, we introduce the notion of data flow in order to group together the loop instances that take part in the same cryptographic implementation. The data flow between loop instances is defined as follows: two loop instances L1 and L2 are connected if some output parameter of L1 is used as an input parameter of L2. For brevity we consider only memory parameters, because register parameters would require precise taint tracking over the code between loop instances. In effect, we assume that the input/output parameters in memory are all processed by loops. For each loop instance L, $IN_M(L)$ and $IN_R(L)$ denote its input parameters in memory and in registers, and $OUT_M(L)$ and $OUT_R(L)$ denote its output parameters in memory and in registers, respectively.
The idea of the loop data flow graph construction algorithm is given below.
Let $\{L_1, \ldots, L_n\}$ be the set of loop instances extracted from a trace $T \in Trace$. Define a binary relation $\rightarrow$ between these loop instances: for any $(i, j) \in [1, n]^2$, if $L_i$ appears before $L_j$ and the intersection $OUT_M(L_i) \cap IN_M(L_j)$ is not empty, then $L_i \rightarrow L_j$. The loop data flow graph is then defined as $G = (\{L_1, \ldots, L_n\}, \rightarrow)$. $G$ is an acyclic graph that may have several connected components $g_1, g_2, \ldots, g_m$, and each component may have several root nodes and leaf nodes. For a connected component $g_k$, we write $ROOT[g_k]$ and $LEAF[g_k]$ for the sets of its root nodes and leaf nodes, respectively.
Each connected component $g_k$ represents an abstraction similar to a function in an ordinary binary program. Each $g_k$ is therefore a candidate cryptographic function implementation, which is later compared with known cipher implementations. For every pair of detected loop instances $L_i$ and $L_j$, a standard graph algorithm tests whether the pair satisfies the binary relation and determines the connected components, which yields the loop data flow graph. We refer to these connected components as loop data flow graphs.
When different cryptographic functions are composed, that is, when the output of one function is used as the input of another, they are grouped into the same loop data flow graph. The solution to this problem is to consider every possible path of the loop data flow graph. For example, let g be a loop data flow graph on {L1, L2, L3} in which L1 feeds L2 and L2 feeds L3. In the comparison phase we then test not only the whole component {L1, L2, L3}, but also {L1, L2} and {L2, L3}, and finally each single loop instance. In this way we can recognize compositions of cryptographic functions.
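The construction above can be sketched as follows, assuming that loop instances are numbered in trace order and that each one is summarized by two sets of memory addresses. The function names and the representation by integer indices are assumptions made for this example; connected components are computed with a small union-find, which is one standard choice among several.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def build_edges(out_mem: List[Set[int]], in_mem: List[Set[int]]) -> Set[Edge]:
    """Edge i -> j when loop i precedes loop j and writes memory that j reads."""
    n = len(out_mem)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if out_mem[i] & in_mem[j]}


def connected_components(n: int, edges: Set[Edge]) -> List[Set[int]]:
    """Connected components of the graph, ignoring edge direction."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups: Dict[int, Set[int]] = {}
    for v in range(n):
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=min)


def candidate_paths(n: int, edges: Set[Edge]) -> List[Tuple[int, ...]]:
    """Every directed path, including single nodes, to be tested separately."""
    succ = {i: sorted(j for (a, j) in edges if a == i) for i in range(n)}
    paths: List[Tuple[int, ...]] = []

    def extend(path: List[int]) -> None:
        paths.append(tuple(path))
        for nxt in succ[path[-1]]:
            extend(path + [nxt])

    for start in range(n):
        extend([start])
    return paths
```

Enumerating all paths can grow quickly on large graphs; the sketch makes no attempt to bound it, which a practical implementation would probably need to do.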
Parameter information collection. Loop detection makes it possible to extract candidate cryptographic code from the execution trace, but our final goal is to collect cryptographic parameter information. The parameters of a loop instance are the low-level counterparts of the objects of a high-level implementation (such as source code), and the bytes read and written in the execution trace are our starting point. For a loop instance L, we collect its parameter data by combining the following three necessary conditions:
(1) Bytes that belong to the same parameter of instance L are either adjacent in memory or held in the same register at the same time. This condition tends to pack several high-level parameters into a single parameter of L, because different high-level parameters may indeed be adjacent in memory, particularly on the stack. Such excessive merging would significantly increase the complexity of the final comparison phase, so the following two conditions are also needed.
(2) Bytes that belong to the same parameter of instance L are processed by the same instruction of the loop body BODY[L] with the same access mode (read or write). An instruction in BODY[L] may process different bytes in each iteration, but these bytes play the same role.
(3) Finally, bytes that belong to an input parameter of instance L are not written by code of L before they are read; likewise, bytes that belong to an output parameter must be written by code of L. To collect these parameters we define parameter variables, that is, byte arrays that start at some memory address. A parameter variable that starts at address 0x400000 and contains 4 bytes is written 0x400000:4.
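Conditions (1) and (2) can be sketched for memory bytes as follows: byte accesses are grouped by the instruction and access mode that touched them, and adjacent addresses inside each group are merged into parameter variables written in the address:length notation. The function name `parameter_variables` and the `(instruction address, mode, byte address)` input format are assumptions made for this example; registers are not handled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Access = Tuple[int, str, int]  # (instruction address, "R" or "W", byte address)


def parameter_variables(accesses: Iterable[Access]) -> Dict[Tuple[int, str], List[str]]:
    """Group bytes by (instruction, mode), then merge adjacent addresses."""
    by_role = defaultdict(set)
    for ins, mode, addr in accesses:
        by_role[(ins, mode)].add(addr)          # condition (2): same role

    result: Dict[Tuple[int, str], List[str]] = {}
    for role, addrs in by_role.items():
        runs: List[List[int]] = []              # each run is [start, length]
        for addr in sorted(addrs):
            if runs and addr == runs[-1][0] + runs[-1][1]:
                runs[-1][1] += 1                # condition (1): adjacent byte
            else:
                runs.append([addr, 1])
        result[role] = [f"0x{start:x}:{length}" for start, length in runs]
    return result
```

Grouping by role before merging is what keeps two unrelated buffers apart when they happen to sit next to each other on the stack but are handled by different instructions.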
The idea of the parameter collection algorithm is as follows.
First, bytes are packed into parameter variables using the first two necessary conditions above. The third condition is then used to divide these parameter variables into two classes, input parameters and output parameters; the same parameter variable may appear in both classes. Next, each parameter variable obtained in the previous step is assigned a fixed value. The execution trace records the corresponding value for each data access, and values are assigned according to the following rule: an input parameter is assigned the value read the first time it is read, and an output parameter is assigned the value written the last time it is written. Finally, for each loop instance L, the algorithm returns $IN_M(L)$ and $IN_R(L)$, the input parameters in memory and in registers, and $OUT_M(L)$ and $OUT_R(L)$, the output parameters in memory and in registers.
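The classification and value assignment can be sketched at byte granularity as shown below. The function name `classify_bytes` and the `(mode, address, value)` input format are assumptions made for this example, and the accesses are assumed to be listed in execution order for a single loop instance.

```python
from typing import Dict, Iterable, Tuple

ByteAccess = Tuple[str, int, int]  # (mode "R" or "W", byte address, byte value)


def classify_bytes(accesses: Iterable[ByteAccess]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Return (inputs, outputs): inputs map an address to the first value read
    before any write by the loop; outputs map an address to the last value written."""
    inputs: Dict[int, int] = {}
    outputs: Dict[int, int] = {}
    written = set()
    for mode, addr, value in accesses:
        if mode == "R":
            # an input byte must not have been written by the loop beforehand
            if addr not in written and addr not in inputs:
                inputs[addr] = value
        elif mode == "W":
            written.add(addr)
            outputs[addr] = value   # later writes overwrite earlier ones
        else:
            raise ValueError(f"unknown access mode: {mode!r}")
    return inputs, outputs
```

A byte that is read and later overwritten ends up in both maps, which corresponds to the remark that the same parameter variable may be both an input and an output.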
The loop data flow graph lays the foundation for the model that identifies cipher implementations, and our final goal is to extract the cryptographic parameters. We define the parameters of a loop data flow graph as those memory parameters of its loop instances that are not used in intermediate data flows. For register parameters, we take the input registers of the root nodes and the output registers of the leaf nodes.
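Let $G = (\{L_1, \ldots, L_n\}, \rightarrow)$ be a loop data flow graph. Its input parameters $IN_G$ are defined as
$$IN_G = \bigcup_{1 \le i \le n} \Big( IN_M(L_i) \setminus \bigcup_{1 \le j \le n} OUT_M(L_j) \Big) \;\cup \bigcup_{L \in ROOT[G]} IN_R(L),$$
and its output parameters $OUT_G$ are defined as
$$OUT_G = \bigcup_{1 \le i \le n} \Big( OUT_M(L_i) \setminus \bigcup_{1 \le j \le n} IN_M(L_j) \Big) \;\cup \bigcup_{L \in LEAF[G]} OUT_R(L).$$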
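In words, the graph inputs are the memory inputs that no loop of the graph produces, together with the register inputs of the root nodes, and symmetrically for the outputs. A small sketch under assumed names: each loop instance is a dictionary with the hypothetical keys `in_m`, `out_m`, `in_r`, `out_r`, and roots and leaves are given as index lists.

```python
from typing import Dict, List, Set, Tuple

Loop = Dict[str, Set[str]]  # keys: "in_m", "out_m", "in_r", "out_r"


def graph_parameters(loops: List[Loop], roots: List[int],
                     leaves: List[int]) -> Tuple[Set[str], Set[str]]:
    """Compute (IN_G, OUT_G) for one loop data flow graph."""
    all_in_m = set().union(*(l["in_m"] for l in loops))
    all_out_m = set().union(*(l["out_m"] for l in loops))

    # memory parameters that are not part of an intermediate data flow
    in_g = all_in_m - all_out_m
    out_g = all_out_m - all_in_m

    # register parameters come only from root and leaf nodes
    for r in roots:
        in_g |= loops[r]["in_r"]
    for l in leaves:
        out_g |= loops[l]["out_r"]
    return in_g, out_g
```

For two loops where the first writes a 256-byte table that the second reads, that table is dropped from both sets, and only the external inputs and the final outputs remain.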
The values of these parameters have already been collected during loop instance parameter extraction. We have thus established a model that extracts candidate cipher implementations and their parameters from the execution trace, and we can proceed to the identification of the cryptographic algorithm.
Step 4: Compare the input/output relation (that is, the matching relation between input parameters and output parameters) with the dynamic feature data to confirm the cryptographic algorithm executed in the target software. The final step of our recognition method compares the loop data flow graphs with the cipher template implementations, and determines the name of the cryptographic algorithm according to whether the input/output relation matches that of the template program. The comparison algorithm takes the following two kinds of input:
(1) Each extracted loop data flow graph $g_k$, together with the parameter values in $IN_{g_k}$ and $OUT_{g_k}$.
(2) For each public cryptographic algorithm F, the source code of a corresponding reference template program $P_F$. In particular, a function prototype describes its high-level input/output parameters, including whether these parameters have variable length.
The comparison algorithm rests on the following idea: a function that implements a cryptographic algorithm preserves a specific input/output relation. Indeed, if $F_1$ is a cryptographic function that satisfies $F_1(K, C) = P$, where K is the key, C is the ciphertext, and P is the plaintext after decryption, then it is very unlikely that another cryptographic function $F_2$ also satisfies $F_2(K, C) = P$. In other words, the pair ((K, C), P) of key, ciphertext, and plaintext identifies the implemented function $F_1$ with overwhelming probability. The purpose of the comparison algorithm is to check whether the relation between the values in $IN_{g_k}$ and $OUT_{g_k}$ is also preserved by $P_F$. If this relation holds, then $g_k$ implements the function F. Put differently, when the program $P_F$ is executed on the input values in $IN_{g_k}$, its actual output values should match the values in $OUT_{g_k}$.
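The comparison step can be illustrated with a toy template library. In the sketch below, the two reference implementations (a textbook RC4 and a repeating-key XOR), the `TEMPLATES` table, and the `identify` function are all assumptions made for illustration; the patent does not specify how the template programs are invoked. Because the mapping of extracted values to high-level parameters is unknown, every ordered pair of input values is tried as (key, data).

```python
from itertools import permutations
from typing import Callable, Dict, List


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling followed by keystream generation."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Repeating-key XOR, used here only as a second template."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# hypothetical template library: algorithm name -> reference implementation
TEMPLATES: Dict[str, Callable[[bytes, bytes], bytes]] = {
    "RC4": rc4,
    "XOR": xor_cipher,
}


def identify(inputs: List[bytes], outputs: List[bytes]) -> List[str]:
    """Names of templates that reproduce an extracted output from the inputs."""
    matches = []
    for name, impl in TEMPLATES.items():
        for key, data in permutations(inputs, 2):
            if key and impl(key, data) in outputs:
                matches.append(name)
                break
    return matches
```

For example, if the extracted inputs are a key and an RC4 ciphertext and the extracted output is the corresponding plaintext, only the RC4 template reproduces the output, so the candidate is reported as RC4.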
The coefficients and parameters given in the above embodiments are provided so that those skilled in the art can implement or use the invention. The invention is not limited to the specific values disclosed above. Without departing from the idea of the invention, those skilled in the art may make various modifications or adjustments to the above embodiments; the scope of protection of the invention is therefore not limited by the above embodiments, but should be the broadest scope consistent with the inventive features set out in the claims.

Claims (4)

1. An instruction-level cryptographic algorithm recognition method, comprising the following steps: Step 1, establishing a feature library of public cryptographic algorithms, the features of an algorithm including static feature codes and dynamic feature instruction sequences; Step 2, scanning for and matching static feature codes in the target program, and recognizing cryptographic algorithms by means of the static feature codes; Step 3, collecting and analyzing the execution trace of the target program, and extracting the program code that implements the cryptographic algorithm together with its input/output parameters; Step 4, comparing the matching relation between input parameters and output parameters with the dynamic feature data, and confirming the cryptographic algorithm executed in the target program;
The method further comprises a process of collecting and analyzing the execution trace, which mainly comprises two parts, data reduction and data analysis; the data reduction comprises two filtering modes, namely excluding instructions that come from known code libraries and filtering by thread ID; the data analysis comprises basic block detection, loop detection, loop data flow graph generation, and parameter information collection;
The basic blocks are generated dynamically from the execution trace; basic block detection follows the execution trace of the dynamic feature instruction sequence, and a sequence that has only a single entry and a single exit is identified as a basic block; if a basic block is changed by self-modifying code, the change is discovered the first time the new code executes;
The loop detection specifically comprises the following steps: Step a, processing the machine instructions of the execution trace one by one and storing them in a list called History; Step b, obtaining several possible loop instances according to the repeated instructions occurring therein, each loop instance having a corresponding next expected instruction; Step c, appending new machine instructions to History, thereby eliminating the loop instances that do not meet the condition; Step d, confirming a loop instance and marking it in History with the loop label X;
The loop data flow graph generation comprises: for every pair of detected loop instances $L_i$ and $L_j$, using a standard graph algorithm to test whether the pair satisfies the binary relation and to determine the connected components, thereby constructing the loop data flow graph;
The parameter information collection specifically comprises: first packing bytes into parameter variables according to the conditions, and then using a condition to divide these parameter variables into two classes, input parameters and output parameters; next assigning a fixed value to each parameter variable obtained in the previous step, the execution trace recording the corresponding value for each data access, according to the following rule: an input parameter is assigned the value read the first time it is read, and an output parameter is assigned the value written the last time it is written; finally, for each loop instance L, the algorithm returns $IN_M(L)$ and $IN_R(L)$, the input parameters in memory and in registers, and $OUT_M(L)$ and $OUT_R(L)$, the output parameters in memory and in registers.
2. The instruction-level cryptographic algorithm recognition method according to claim 1, characterized in that the dynamic features of the cryptographic algorithm are formed by extracting, from a template program that implements the algorithm, the instruction sequence and the associated operand data during its execution, a finite sequence of dynamic instructions D1, D2, ..., Dn forming an execution trace.
3. The instruction-level cryptographic algorithm recognition method according to claim 1, characterized in that the binary instrumentation tool Pin is used as the tool for collecting execution traces.
4. An instruction-level cryptographic algorithm recognition system, characterized by comprising a feature library establishing unit, a static feature recognition unit, and a dynamic feature recognition unit; the feature library establishing unit is used to establish a feature library of public cryptographic algorithms, the features of an algorithm including static feature codes and dynamic feature instruction sequences; the static feature recognition unit is used to scan for and match static feature codes in the target program and to recognize cryptographic algorithms by means of the static feature codes; the dynamic feature recognition unit is used to collect and analyze the execution trace of the target program, to extract the program code that implements the cryptographic algorithm together with its input/output parameters, and to compare the matching relation between input parameters and output parameters with the dynamic feature data in order to confirm the cryptographic algorithm executed in the target program;
The process of collecting and analyzing the execution trace mainly comprises two parts, data reduction and data analysis; the data reduction comprises two filtering modes, namely excluding instructions that come from known code libraries and filtering by thread ID; the data analysis comprises basic block detection, loop detection, loop data flow graph generation, and parameter information collection;
The basic blocks are generated dynamically from the execution trace; basic block detection follows the execution trace of the dynamic feature instruction sequence, and a sequence that has only a single entry and a single exit is identified as a basic block; if a basic block is changed by self-modifying code, the change is discovered the first time the new code executes;
The loop detection comprises the following steps: Step a, processing the machine instructions of the execution trace one by one and storing them in a list called History; Step b, obtaining several possible loop instances according to the repeated instructions occurring therein, each loop instance having a corresponding next expected instruction; Step c, appending new machine instructions to History, thereby eliminating the loop instances that do not meet the condition; Step d, confirming a loop instance and marking it in History with the loop label X;
The loop data flow graph generation comprises: for every pair of detected loop instances $L_i$ and $L_j$, using a standard graph algorithm to test whether the pair satisfies the binary relation and to determine the connected components, thereby constructing the loop data flow graph;
The parameter information collection comprises: first packing bytes into parameter variables according to the conditions, and then using a condition to divide these parameter variables into two classes, input parameters and output parameters; next assigning a fixed value to each parameter variable obtained in the previous step, the execution trace recording the corresponding value for each data access, according to the following rule: an input parameter is assigned the value read the first time it is read, and an output parameter is assigned the value written the last time it is written; finally, for each loop instance L, the algorithm returns $IN_M(L)$ and $IN_R(L)$, the input parameters in memory and in registers, and $OUT_M(L)$ and $OUT_R(L)$, the output parameters in memory and in registers.
CN201510755316.6A 2015-11-09 2015-11-09 A kind of instruction-level cryptographic algorithm recognition methods and system Active CN105426707B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510755316.6A CN105426707B (en) 2015-11-09 2015-11-09 A kind of instruction-level cryptographic algorithm recognition methods and system

Publications (2)

Publication Number Publication Date
CN105426707A CN105426707A (en) 2016-03-23
CN105426707B true CN105426707B (en) 2018-06-19

Family

ID=55504915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510755316.6A Active CN105426707B (en) 2015-11-09 2015-11-09 A kind of instruction-level cryptographic algorithm recognition methods and system

Country Status (1)

Country Link
CN (1) CN105426707B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106452733A (en) * 2016-11-24 2017-02-22 中国电子科技集团公司第三十研究所 Block cipher identification method based on ciphertext analysis
CN108073814B (en) * 2017-12-29 2021-10-15 安天科技集团股份有限公司 Shelling method and system based on static structured shelling parameters and storage medium
CN110347432B (en) * 2019-06-17 2021-09-14 海光信息技术股份有限公司 Processor, branch predictor, data processing method thereof and branch prediction method
CN112395613B (en) * 2019-08-15 2022-04-08 奇安信安全技术(珠海)有限公司 Static feature library loading method, device and equipment
CN111222138A (en) * 2019-12-31 2020-06-02 阿尔法云计算(深圳)有限公司 Algorithm checking method, algorithm right confirming method and device
CN112149138B (en) * 2020-11-24 2021-02-19 北京智芯微电子科技有限公司 Method and system for detecting program vulnerability of cryptographic algorithm and storage medium
CN118378288B (en) * 2024-06-24 2024-09-06 山东省计算中心(国家超级计算济南中心) Encryption algorithm dynamic detection method and system based on Pin tool

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577323A (en) * 2013-09-27 2014-02-12 西安交通大学 Dynamic key command sequence birthmark-based software plagiarism detecting method
CN104484175A (en) * 2014-12-16 2015-04-01 上海交通大学 Method for detecting cryptology misuse of Android application programs
CN104517057A (en) * 2014-12-22 2015-04-15 中国人民解放军信息工程大学 Software hybrid measure method based on trusted computing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
密码算法识别与分析关键技术研究;李继中;《中国博士学位论文全文数据库》;20140415;第1-17页,第21-22页,第82-83页 *

Also Published As

Publication number Publication date
CN105426707A (en) 2016-03-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant