CN109740347A - Method for identifying and cracking fragile hash functions in smart device firmware

Publication number: CN109740347A
Authority: CN (China)
Application number: CN201811406960.2A
Other versions: CN109740347B (en)
Other languages: Chinese (zh)
Legal status (an assumption, not a legal conclusion): Granted; active
Inventors: 石志强, 张国栋, 杨寿国, 孙利民
Current assignee: Institute of Information Engineering of CAS
Original assignee: Institute of Information Engineering of CAS
Prior art keywords: firmware; fragile; hash function; function; code

Abstract

An embodiment of the present invention provides a method for identifying and cracking fragile hash functions in smart device firmware. The main steps are: preprocessing the firmware to obtain the binary files to be analyzed; extracting the common features of fragile hash functions that are unaffected by architecture and compiler optimization options, or affected by less than a preset threshold, converting the features to numerical values, training and testing on the feature data to build a reliable neural network model based on logistic regression, and identifying and locating the fragile hash functions of the firmware by a structural matching method; and partitioning and extracting the structure of the fragile hash function's code, converting the machine code or assembly code into VEX IR intermediate-language statements, constructing a Z3 SMT solving expression based on symbolic execution, adding solving constraints, recovering a collision value by reverse solving, and verifying whether the collision value is correct. For identifying the fragile hash functions of firmware, the method has the beneficial effects of a low false positive rate, accurate location, and fast cracking.

Description

Method for identifying and cracking fragile hash functions in smart device firmware
Technical field
Embodiments of the present invention relate to the fields of function association for intelligent embedded device firmware and vulnerability mining in binary program functions, and in particular to a method for identifying and cracking fragile hash functions in smart device firmware.
Background art
In recent years, with the frequent occurrence of attacks that exploit smart device firmware vulnerabilities, research on and analysis of firmware security have become a focus and hot topic in information security. Because embedded devices emphasize practical operation and have limited computing power, some devices, to optimize performance, use extremely fragile hash functions or simplified versions of standard hash functions. This introduces security defects into smart devices and poses a serious threat to the security of the device system and even of the whole network. Vulnerabilities in the fragile hash functions of firmware binary files are therefore often one of the main entry points both for attackers ("hackers") mounting attacks on devices and for security experts studying them. For example, the VxWorks encryption algorithm vulnerability CVE-2010-2967 arises because the fragile hash function loginDefaultEncrypt() in loginLib, in VxWorks versions before 6.9, suffers from password collisions: the total number of distinct password hashes is only up to 220,000, so an attacker can build a password dictionary and brute-force telnet, ftp, rlogin and other sessions to obtain control of the system, causing great harm. Identifying and cracking the fragile hash functions of smart device firmware has therefore become essential.
Current vulnerability mining and detection techniques for smart device firmware mainly include: vulnerability mining at the firmware source-code level, static auditing at the binary-code level based on reverse engineering, detection of command-line injection vulnerabilities, detection of buffer-overflow vulnerabilities, and firmware vulnerability function association.
Methods for recognizing hash functions in firmware mainly include the following: comparing the different binary code produced by compiling the same hash function source code for different devices, architectures, and compiler optimization options, and looking for the similarities and differences among them in order to find another identical or homologous hash function; and performing function association by bit-stream comparison or instruction-sequence comparison, using a sliding window to obtain 0/1 sequence features and compute similarity, or by studying the similarity of assembly statements.
Common collision-based methods for cracking hash functions include the rainbow table method, the birthday attack method, the equal-substring method, the meet-in-the-middle method, the differential attack method, and the leading attack method.
Judging from the development of prior techniques, the research foundation for mining vulnerabilities in the hash functions of smart device firmware is still very weak. At present there is no automated analysis method for the fragile hash functions of smart device firmware that is simple to implement and has a low identification false positive rate, accurate location, and fast cracking.
Summary of the invention
Embodiments of the present invention provide a method for identifying and cracking fragile hash functions in smart device firmware, to remedy the current lack of an automated analysis method that is simple to implement and that identifies fragile hash functions in smart device firmware with a low false positive rate, locates them accurately, and cracks them quickly.
According to a first aspect of the embodiments of the present invention, a method for identifying and cracking fragile hash functions in smart device firmware is provided, comprising:
preprocessing the firmware to obtain the binary files to be analyzed;
extracting the common features of fragile hash functions that are unaffected by architecture and compiler optimization options, or affected by less than a preset threshold, converting the features to numerical values, training and testing on the feature data to build a reliable neural network model based on logistic regression, and identifying and locating the fragile hash functions of the firmware by a structural matching method;
partitioning and extracting the structure of the fragile hash function's code, converting the machine code or assembly code into VEX IR intermediate-language statements, constructing a Z3 SMT solving expression based on symbolic execution, adding solving constraints, recovering a collision value by reverse solving, and verifying whether the collision value is correct.
Further, preprocessing the firmware to obtain the binary files to be analyzed comprises:
Firmware crawling: developing a firmware web crawler and crawling firmware from preselected smart device manufacturers;
Firmware storage: building a MongoDB database to store important firmware information such as firmware name, manufacturer name, product name, firmware version number, product category, firmware description, and firmware download link; writing a firmware download script that batch-downloads firmware from the download links; and storing the downloaded firmware images at a designated location on a server;
Firmware decoding: writing a script that calls decompression tools such as Binwalk to decode firmware in batches;
Filtering of executable, disassemblable binary files: obtaining the binary files of the firmware to be analyzed.
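The final filtering step can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the executables of interest are ELF files and recognizes them by the 4-byte ELF magic number. The function names `is_elf` and `find_elf_files` are hypothetical, and firmware in other formats (for example raw images) would need a different check.

```python
import os

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf(path):
    """Return True if the file at `path` starts with the ELF magic number."""
    try:
        with open(path, "rb") as handle:
            return handle.read(4) == ELF_MAGIC
    except OSError:
        return False  # unreadable entries are simply skipped


def find_elf_files(root):
    """Walk an extracted firmware tree and return the sorted paths of ELF files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks so the same binary is not reported twice.
            if os.path.isfile(full) and not os.path.islink(full) and is_elf(full):
                hits.append(full)
    return sorted(hits)
```

Calling `find_elf_files` on the directory produced by the decompression step yields the candidate binaries for disassembly.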
Further, extracting the common features of fragile hash functions that are unaffected by architecture and compiler optimization options, or affected by less than a preset threshold, comprises:
collecting the source code of the fragile hash functions of several firmware images and of other functions, and compiling the collected source code under different architectures and different compiler optimization options to generate multiple executable, disassemblable binary files;
reverse-analyzing the compiled binary files with IDA Pro, and writing an IDA Pro plug-in to statistically analyze the various features of fragile hash functions and other functions, so as to extract the features that most clearly distinguish fragile hash functions from other functions, mainly the following 9 features: function name, instruction count, number of instruction types, number of jump instructions, number of calls, number of XOR operations, stack size, number of basic blocks, and whether the function contains a loop.
Further, converting the common features of fragile hash functions to numerical values comprises:
studying and analyzing each function in the compiled executable, disassemblable binary files; analyzing every instruction in every basic block of each function; accumulating counts to obtain the numerical value of each feature; and labeling hash functions as positive samples and other functions as negative samples.
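The counting step can be illustrated with a small sketch. It assumes each function has already been disassembled into basic blocks, represented here as lists of instruction mnemonics; that input format, the x86-style mnemonic set, and the name `extract_features` are assumptions for illustration and do not come from the patent's IDA Pro plug-in. Stack size and loop detection are omitted because they need more than mnemonics.

```python
# Assumed x86-style jump mnemonics; other architectures need their own set.
JUMP_MNEMONICS = {"jmp", "je", "jne", "jz", "jnz", "jl", "jle", "jg", "jge", "ja", "jb"}


def extract_features(basic_blocks):
    """Accumulate numeric features over every instruction of every basic block."""
    mnemonics = [m for block in basic_blocks for m in block]
    return {
        "instruction_count": len(mnemonics),
        "instruction_type_count": len(set(mnemonics)),
        "jump_count": sum(m in JUMP_MNEMONICS for m in mnemonics),
        "call_count": mnemonics.count("call"),
        "xor_count": mnemonics.count("xor"),
        "basic_block_count": len(basic_blocks),
    }


# A toy hash-like function: initialization, a loop body, and a return block.
toy_function = [
    ["mov", "xor"],
    ["movzx", "imul", "add", "inc", "cmp", "jne"],
    ["ret"],
]
features = extract_features(toy_function)
label = 1  # 1 marks a positive sample (hash function), 0 a negative sample
```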
Further, training and testing on the feature data to build a reliable neural network model based on logistic regression comprises:
training and testing on the generated feature data with a classifier from the Python machine learning library sklearn, using the logistic regression method, to build a neural network model for subsequent structural matching;
evaluating the mathematical model with precision, recall, and the comprehensive evaluation index F-measure. If the model performs well, it is saved; if not, more fragile hash functions and other functions are collected, more fragile hash function features are extracted, or other fragile hash function features of higher value are extracted, and testing is repeated until a reliable neural network model based on logistic regression is obtained.
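The three evaluation indices follow the usual definitions, sketched below from the counts of true positives, false positives, and false negatives. The function name `evaluate` is hypothetical, and the text does not state which score counts as "good", so no threshold is shown.

```python
def evaluate(true_pos, false_pos, false_neg):
    """Compute precision, recall and F-measure from confusion-matrix counts."""
    predicted = true_pos + false_pos   # functions the model flagged as hash functions
    actual = true_pos + false_neg      # functions that really are hash functions
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


scores = evaluate(true_pos=8, false_pos=2, false_neg=2)
```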
Further, identifying and locating the firmware's fragile hash functions by the structural matching method, to obtain function names and function entry addresses, comprises:
performing structural-matching-based function association between the executable, disassemblable binary files of the firmware to be analyzed and the reliable neural network model based on logistic regression, thereby identifying and locating the fragile hash functions of the firmware and obtaining their function names and function entry addresses.
Further, partitioning and extracting the structural modules of the fragile hash function's code comprises:
loading the target binary file with cle, a submodule of angr; extracting the control flow graph (CFG) of the identified and located fragile function; splitting the fragile function, using depth-first search and a topological sorting algorithm, into three basic modules (an initialization block, a loop-body block, and an end block); and obtaining the entry addresses and jump addresses of the basic blocks.
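The splitting idea can be sketched on a toy control flow graph without angr. In this illustration the CFG is a plain dictionary mapping each block to its successors (an assumed representation, not angr's); a depth-first search finds the back edge that closes the loop, and blocks are then classified as initialization, loop body, or end. The name `split_function` is hypothetical, the topological-sorting step is omitted, and the sketch only targets functions with one simple loop.

```python
def split_function(cfg, entry):
    """Split a toy CFG ({block: [successors]}) into init, loop and end blocks."""
    state = {}        # block -> "active" while on the DFS stack, "done" afterwards
    back_edges = []   # (tail, header) edges that close a loop

    def dfs(block):
        state[block] = "active"
        for succ in cfg.get(block, []):
            if state.get(succ) == "active":
                back_edges.append((block, succ))
            elif succ not in state:
                dfs(succ)
        state[block] = "done"

    dfs(entry)

    # Natural loop of a back edge: the header plus every block that can
    # reach the tail without passing through the header.
    loop = set()
    for tail, header in back_edges:
        loop.add(header)
        work = [tail]
        while work:
            block = work.pop()
            if block not in loop:
                loop.add(block)
                work.extend(p for p in cfg if block in cfg[p])

    reaches = {}

    def leads_to_loop(block):
        if block in loop:
            return True
        if block not in reaches:
            reaches[block] = False  # guard against unexpected cycles
            reaches[block] = any(leads_to_loop(s) for s in cfg.get(block, []))
        return reaches[block]

    outside = [b for b in state if b not in loop]  # DFS visiting order
    return {
        "init": [b for b in outside if leads_to_loop(b)],
        "loop": sorted(loop),
        "end": [b for b in outside if not leads_to_loop(b)],
    }


toy_cfg = {"init": ["head"], "head": ["body", "exit"], "body": ["head"], "exit": []}
parts = split_function(toy_cfg, "init")
```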
Further, converting the machine code or assembly code of each partitioned basic block into VEX IR intermediate-language statements comprises:
converting the machine code or assembly code of each module into the corresponding VEX IR intermediate-language statements with pyvex, a submodule of angr.
Further, constructing the symbolic-execution-based Z3 SMT constraint-solving expression and adding solving constraints comprises:
converting variable values into symbolic values by symbolic execution, constructing the Z3 expression, and determining the number of loop iterations in the function;
converting the program analysis problem into a constraint-solving problem and adding solving constraints so that solving proceeds toward the target, which narrows the brute-force search range and speeds up cracking of the function.
Further, solving for the collision value and verifying whether it is correct comprises:
comparing the values produced by the fragile hash function for the original input data and for the collision value; if the two are equal, the cracking of the firmware's fragile hash function has succeeded, and otherwise it has failed.
The present invention proposes a method for identifying and cracking fragile hash functions in smart device firmware, and builds a complete workflow from firmware acquisition, through identification and location of the firmware's fragile hash functions, to their analysis and cracking. It identifies and locates fragile hash functions in firmware more accurately and efficiently, and cracks collision values for the original input data of some fragile hash functions in firmware more efficiently and quickly, advancing research in function association for intelligent embedded device firmware and in vulnerability mining of binary program functions.
The present invention achieves the following beneficial effects:
For fragile hash function feature extraction, the present invention collects the source code of fragile hash functions in firmware and of other functions, compiles this source code into multiple executable, disassemblable binary files for different architectures and different compiler optimization options, and studies and analyzes these binary files to extract the common features of fragile hash functions that are unaffected by architecture and compiler optimization options, or affected by less than a preset threshold. These 9 features (function name, instruction count, number of instruction types, number of jump instructions, number of calls, number of XOR operations, stack size, number of basic blocks, and whether the function contains a loop) distinguish fragile hash functions from other functions well. Binary files compiled for multiple architectures and compiler optimization options can thus be studied and analyzed, which improves the accuracy of identifying and locating fragile hash functions and saves time when studying hash functions across many binary file types.
For identifying and locating the fragile hash functions of firmware, the present invention extracts fragile hash function features, converts them to numerical values, and trains and tests on the feature data to build a reliable neural network model based on logistic regression. The model can be applied well to structural-matching-based identification and location of the firmware's fragile hash functions, which improves the efficiency of identifying and locating them.
For studying and analyzing the code structure of a fragile hash function, the present invention uses the control flow graph (CFG) of the fragile function and, based on depth-first search and a topological sorting algorithm, splits the function into three basic modules: an initialization block, a loop-body block, and an end block. It also converts the machine code or assembly code of each module into VEX IR intermediate-language statements, which effectively avoids the influence of different architectures and compiler optimization options and makes the study of binary-code functions easier and simpler.
For analyzing and cracking the fragile hash functions of firmware, the present invention constructs a Z3 SMT solving expression based on symbolic execution and adds solving constraints so that solving proceeds toward the target, which effectively narrows the brute-force search range and speeds up cracking of the fragile hash function.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an overall flow diagram of a specific embodiment of the method of the present invention for identifying and cracking fragile hash functions in smart device firmware;
Fig. 2 is a flow diagram of firmware preprocessing in another specific embodiment of the method;
Fig. 3 is a flow diagram of the algorithm for identifying and locating the firmware's fragile hash functions in another specific embodiment of the method;
Fig. 4 is a flow diagram of the analysis and cracking of fragile hash functions in another specific embodiment of the method;
Fig. 5 is a schematic diagram of solving for and verifying the collision value in another specific embodiment of the method;
Fig. 6 is a schematic diagram of the physical structure of an electronic device embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
To address the problems of a high false positive rate in identifying fragile hash functions in firmware, inaccurate location, and difficulty of cracking, the present invention provides a method for identifying and cracking fragile hash functions in smart device firmware. The method can quickly and effectively identify and locate some fragile hash functions in the binary files of smart device firmware, and can effectively crack some extremely fragile hash functions. It applies to the research and analysis of hash functions in binary files compiled under many different architectures and compiler optimization options.
The method flow designed by the present invention mainly includes: preprocessing the firmware; filtering the binary files to be analyzed; collecting and compiling the fragile hash functions of firmware and other functions; extracting the common features of fragile hash functions that are unaffected by architecture and compiler optimization options, or affected by less than a preset threshold; converting the features to numerical values; training and testing on the feature data; building the neural network model based on logistic regression; identifying and locating the firmware's fragile hash functions by the structural matching method; partitioning and extracting the structure of the fragile hash function's code; converting machine code or assembly code into the VEX IR intermediate language; constructing the symbolic-execution-based Z3 SMT constraint-solving expression; adding solving constraints; solving for the collision value; and verifying whether the collision value is correct. The innovations of the present invention are: by studying and analyzing the fragile hash functions of firmware, extracting their common features that are unaffected by architecture and compiler optimization options, or affected by less than a preset threshold; training, from the fragile hash function feature information, a reliable mathematical model for structural matching of firmware fragile hash functions; partitioning the structure of a fragile hash function and converting the code of each module into VEX IR intermediate-language statements; and solving for a collision value of the hash function's original input data with symbolic-execution-based Z3 SMT constraint solving. The present invention can identify and locate some fragile hash functions in firmware well, and can recover collision values for the original input data of some extremely fragile hash functions.
A specific embodiment of the present invention provides a method for identifying and cracking fragile hash functions in smart device firmware, comprising:
S1: preprocessing the firmware to obtain the binary files to be analyzed;
S2: extracting the common features of fragile hash functions that are unaffected by architecture and compiler optimization options, or affected by less than a preset threshold, converting the features to numerical values, training and testing on the feature data to build a reliable neural network model based on logistic regression, and identifying and locating the fragile hash functions of the firmware by a structural matching method;
S3: partitioning and extracting the structure of the fragile hash function's code, converting the machine code or assembly code into VEX IR intermediate-language statements, constructing a Z3 SMT solving expression based on symbolic execution, adding solving constraints, recovering a collision value by reverse solving, and verifying whether the collision value is correct.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking fragile hash functions in smart device firmware is provided, in which preprocessing the firmware to obtain the binary files to be analyzed comprises:
Firmware crawling: developing a firmware web crawler and crawling firmware from preselected smart device manufacturers;
Firmware storage: building a MongoDB database to store important firmware information such as firmware name, manufacturer name, product name, firmware version number, product category, firmware description, and firmware download link; writing a firmware download script that batch-downloads firmware from the download links; and storing the downloaded firmware images at a designated location on a server;
Firmware decoding: writing a script that calls decompression tools such as Binwalk to decode firmware in batches;
Filtering of executable, disassemblable binary files: obtaining the binary files of the firmware to be analyzed.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking fragile hash functions in smart device firmware is provided, in which extracting the common features of fragile hash functions that are unaffected by architecture and compiler optimization options, or affected by less than a preset threshold, comprises:
collecting the source code of the fragile hash functions of several firmware images and of other functions, and compiling the collected source code under different architectures and different compiler optimization options to generate multiple executable, disassemblable binary files;
reverse-analyzing the compiled binary files with IDA Pro, and writing an IDA Pro plug-in to statistically analyze the various features of fragile hash functions and other functions, so as to extract the features that most clearly distinguish fragile hash functions from other functions, mainly the following 9 features: function name, instruction count, number of instruction types, number of jump instructions, number of calls, number of XOR operations, stack size, number of basic blocks, and whether the function contains a loop.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking fragile hash functions in smart device firmware is provided, in which converting the common features of fragile hash functions to numerical values comprises:
studying and analyzing each function in the compiled executable, disassemblable binary files; analyzing every instruction in every basic block of each function; accumulating counts to obtain the numerical value of each feature; and labeling hash functions as positive samples and other functions as negative samples.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking fragile hash functions in smart device firmware is provided, in which training and testing on the feature data to build a reliable neural network model based on logistic regression comprises:
training and testing on the generated feature data with a classifier from the Python machine learning library sklearn, using the logistic regression method, to build a neural network model for subsequent structural matching;
evaluating the mathematical model with precision, recall, and the comprehensive evaluation index F-measure. If the model performs well, it is saved; if not, more fragile hash functions and other functions are collected, more fragile hash function features are extracted, or other fragile hash function features of higher value are extracted, and testing is repeated until a reliable neural network model based on logistic regression is obtained.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking fragile hash functions in smart device firmware is provided, in which identifying and locating the firmware's fragile hash functions by the structural matching method, to obtain function names and function entry addresses, comprises:
performing structural-matching-based function association between the executable, disassemblable binary files of the firmware to be analyzed and the reliable neural network model based on logistic regression, thereby identifying and locating the fragile hash functions of the firmware and obtaining their function names and function entry addresses.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking fragile hash functions in smart device firmware is provided, in which partitioning and extracting the structural modules of the fragile hash function's code comprises:
loading the target binary file with cle, a submodule of angr; extracting the control flow graph (CFG) of the identified and located fragile function; splitting the fragile function, using depth-first search and a topological sorting algorithm, into three basic modules (an initialization block, a loop-body block, and an end block); and obtaining the entry addresses and jump addresses of the basic blocks.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking fragile hash functions in smart device firmware is provided, in which converting the machine code or assembly code of each partitioned basic block into VEX IR intermediate-language statements comprises:
converting the machine code or assembly code of each module into the corresponding VEX IR intermediate-language statements with pyvex, a submodule of angr.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking fragile hash functions in smart device firmware is provided, in which constructing the symbolic-execution-based Z3 SMT constraint-solving expression and adding solving constraints comprises:
converting variable values into symbolic values by symbolic execution, constructing the Z3 expression, and determining the number of loop iterations in the function;
converting the program analysis problem into a constraint-solving problem and adding solving constraints so that solving proceeds toward the target, which narrows the brute-force search range and speeds up cracking of the function.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking fragile hash functions in smart device firmware is provided, in which solving for the collision value and verifying whether it is correct comprises:
comparing the values produced by the fragile hash function for the original input data and for the collision value; if the two are equal, the cracking of the firmware's fragile hash function has succeeded, and otherwise it has failed.
On the basis of any of the above specific embodiments of the present invention, a specific embodiment of the method for identifying and cracking fragile hash functions in smart device firmware is provided. Its overall flow is shown in Fig. 1 and includes the following specific steps:
a) Preprocessing of smart device firmware, as shown in Fig. 2, including firmware crawling, firmware storage, firmware decompression, and filtering of executable, disassemblable binary files. Firmware crawling: well-known smart device manufacturers at home and abroad are studied and analyzed, and a firmware web crawler is developed with the Python Scrapy framework to crawl firmware. Firmware storage: a MongoDB database stores important firmware information such as firmware name, manufacturer name, product name, firmware version number, product category, firmware description, and firmware download link; the main item stored is the firmware download URL, and a firmware download script batch-downloads firmware from these URLs and stores the downloaded firmware at a designated location on the server. Firmware decoding: a Python script calls tools such as Binwalk to decompress firmware automatically in batches. Filtering of executable, disassemblable binary files: the binary files obtained after decompression are screened to obtain the executable, disassemblable binary files of the firmware to be analyzed.
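The batch decompression step might look like the sketch below, which drives the Binwalk command-line tool from Python. It assumes Binwalk is installed and on the PATH; the helper names `binwalk_command` and `extract_batch` are hypothetical, and the patent does not state which Binwalk options its script uses.

```python
import subprocess


def binwalk_command(firmware_path, output_dir):
    """Build a Binwalk command line for one firmware image."""
    # -e extracts recognized file types, -M recursively scans extracted files,
    # -C sets the directory that receives the extracted data.
    return ["binwalk", "-e", "-M", "-C", output_dir, firmware_path]


def extract_batch(firmware_paths, output_dir):
    """Run Binwalk on each image and record whether it exited successfully."""
    results = {}
    for path in firmware_paths:
        proc = subprocess.run(binwalk_command(path, output_dir), capture_output=True)
        results[path] = proc.returncode == 0
    return results
```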
b) Collect C source code for a variety of open-source fragile hash functions found in firmware (such as BKDRHash and BPHash) and for other functions (such as readdir and system). Compile the collected source code for multiple architectures such as x86, ARM, and MIPS, with multiple compiler optimization options such as -O0, -O1, -O2, -O3, and -Os, to generate multiple different binary files.
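For orientation, BKDRHash as it is commonly published is a multiply-and-add loop over the input bytes. The sketch below is a Python rendering under stated assumptions: the seed 131, the 32-bit wraparound, and the final 0x7FFFFFFF mask follow the widely circulated C version and are not specified in this document.

```python
def bkdr_hash(data: str, seed: int = 131) -> int:
    """BKDRHash over ASCII input, emulating 32-bit unsigned arithmetic (assumed)."""
    h = 0
    for byte in data.encode("ascii"):
        # Multiply the running hash by the seed and add the next byte.
        h = (h * seed + byte) & 0xFFFFFFFF
    # Clear the top bit, as the commonly published C version does.
    return h & 0x7FFFFFFF


print(bkdr_hash("a"), bkdr_hash("ab"))
```

The function has the shape that the later steps rely on: an initialization, a short loop body built from a multiplication and an addition, and a return.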
c) Reverse-analyze the binary files compiled in b) with IDA Pro, and use IDA Python scripts to extract and analyze features of the fragile hash functions and of the other functions. The analysis shows that fragile hash functions have some features that clearly distinguish them from other functions, mainly the following 9 features: function name, instruction count, number of instruction types, number of jump instructions, number of calls, number of XOR operations, stack size, number of basic blocks, and whether the function contains a loop. Many other features may also exist, including ones that blur the distinction from functions that are not fragile hash functions.
d) Convert the features in c) to numeric values. Every instruction in each basic block of each function in the compiled binary files is analyzed, and the number of times the instructions behind each feature are used is counted by numeric accumulation. At the same time, fragile hash functions are labeled as positive samples and other functions as negative samples.
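The accumulation in step d) can be sketched as follows. This is an illustrative example only: the input is a hand-written list of basic blocks (each a list of instruction mnemonics) rather than IDA Pro output, the x86-style mnemonic groups are assumptions, and only some of the 9 features are counted.

```python
from collections import Counter

# Hypothetical x86-style mnemonic groups; a real extractor would need one table per architecture.
JUMP_MNEMONICS = {"jmp", "je", "jne", "jz", "jnz", "jl", "jle", "jg", "jge", "ja", "jb"}
CALL_MNEMONICS = {"call"}
XOR_MNEMONICS = {"xor"}


def numeric_features(basic_blocks):
    """Accumulate per-function counts from a list of basic blocks (lists of mnemonics)."""
    counts = Counter(m for block in basic_blocks for m in block)
    return {
        "instructions": sum(counts.values()),
        "instruction_types": len(counts),
        "jumps": sum(counts[m] for m in JUMP_MNEMONICS),
        "calls": sum(counts[m] for m in CALL_MNEMONICS),
        "xors": sum(counts[m] for m in XOR_MNEMONICS),
        "basic_blocks": len(basic_blocks),
    }


# Toy function with an initialization block, a loop body, and an end block.
blocks = [
    ["push", "mov", "xor"],
    ["movzx", "imul", "add", "inc", "cmp", "jne"],
    ["mov", "pop", "ret"],
]
features = numeric_features(blocks)
label = 1  # 1 = fragile hash function (positive sample), 0 = other function
print(features, label)
```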
e) Randomly split the feature data extracted in d) into a 70% training set and a 30% test set, and use the logistic regression (LogisticRegression) module of the Python machine learning library sklearn to train and test on the feature data and build the neural network model. Before a model is accepted as reliable, the classifier's performance and the practical value of the mathematical model are evaluated with metrics such as accuracy, recall, and the composite F-measure. If the classification results are good and the practical value is high, a reliable mathematical model is obtained for the subsequent structured matching of firmware fragile hash functions. If the classification results or practical performance are poor, the features are re-extracted and the model is retrained until a reliable mathematical model for fragile hash function structure matching is obtained.
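The evaluation metrics named in step e) can be computed from predicted and true labels as in the sketch below. It is a plain-Python illustration, not the sklearn workflow itself: the label vectors are made up, and precision is included as an assumption because the F-measure is the harmonic mean of precision and recall.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall, and F-measure for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Made-up test-set labels: 3 fragile hash functions and 5 other functions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
print(scores)
```

A low recall here would mean that fragile hash functions are being missed, which is the case in which the text calls for re-extracting features and retraining.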
f) Using the structure matching method, perform function association between the executable, disassemblable binary files to be analyzed obtained in a) and the mathematical model built in e), thereby identifying and locating the firmware's fragile hash functions and obtaining their function names and function entry addresses. Fig. 3 is a flow diagram of the algorithm for identifying and locating fragile hash functions in firmware.
g) In machine language, a jump caused by an indirect transfer instruction can usually be regarded as a boundary, so that a complete function is divided into several distinct code blocks. The fragile hash functions located in f) are studied experimentally, and their code is structurally divided and extracted: the angr submodule cle loads the target binary file, and the control flow graph (CFG) inside each identified and located fragile function is extracted. Based on depth-first search and a topological sorting algorithm, the fragile function is split into three basic modules, namely an initialization block, a loop-body block, and an end block, and the entry address and jump address of each basic block are obtained.
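The split in step g) can be illustrated on a toy control flow graph. The sketch below is an assumption-laden stand-in for the real pipeline: the CFG is a hand-written dictionary of block addresses instead of an angr CFG, and `split_function_blocks` is a hypothetical helper that uses depth-first search to find back edges, reverse post-order as the topological order, and a backward walk from each loop tail to collect the loop body.

```python
def split_function_blocks(cfg, entry):
    """Split a CFG {block_addr: [successor_addrs]} into init / loop / end blocks."""
    state = {}        # block -> "visiting" or "done"
    post_order = []
    back_edges = []   # (tail, head) edges that close a loop

    def dfs(node):
        state[node] = "visiting"
        for succ in cfg.get(node, []):
            if succ not in state:
                dfs(succ)
            elif state[succ] == "visiting":
                back_edges.append((node, succ))
        state[node] = "done"
        post_order.append(node)

    dfs(entry)
    topo = post_order[::-1]  # reverse post-order: topological order ignoring back edges
    position = {node: i for i, node in enumerate(topo)}

    # Predecessor map, needed to walk backwards from each loop tail.
    preds = {node: [] for node in topo}
    for node in topo:
        for succ in cfg.get(node, []):
            preds[succ].append(node)

    loop = set()
    for tail, head in back_edges:
        body = {head}
        stack = [tail]
        while stack:
            node = stack.pop()
            if node not in body:
                body.add(node)
                stack.extend(preds[node])
        loop |= body

    if not loop:
        return {"init": topo, "loop": [], "end": []}
    first_head = min(position[head] for _, head in back_edges)
    init = [n for n in topo if n not in loop and position[n] < first_head]
    end = [n for n in topo if n not in loop and position[n] > first_head]
    return {"init": init, "loop": sorted(loop, key=position.get), "end": end}


# Toy CFG: 0x1000 initializes, 0x1010 tests the loop condition,
# 0x1020 is the loop body that jumps back, 0x1030 returns.
toy_cfg = {0x1000: [0x1010], 0x1010: [0x1020, 0x1030], 0x1020: [0x1010], 0x1030: []}
parts = split_function_blocks(toy_cfg, 0x1000)
print({name: [hex(a) for a in addrs] for name, addrs in parts.items()})
```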
h) Use the angr submodule pyvex to convert the machine code or assembly code of each module divided in g) into the corresponding intermediate-language VEX IR statements.
i) The hash function code is converted to intermediate language in execution order, and during symbolic execution the unknown variables in the loop body are replaced with symbolic values. Symbolic execution must produce Z3 expressions that contain the unknown variables. Because the valid statements of the intermediate language are all equations, both sides of each equation must be analyzed and then translated into the corresponding Z3 variables. Z3 variables follow their own naming rule: a name begins with a letter and may consist only of digits, upper- and lower-case letters, and underscores. The specific process is: split the statement at the equals sign to obtain the operations on both sides; extract the left-hand variable operand; determine the right-hand operation type and extract its operands; check whether every extracted operand has been declared as a variable and, if not, declare it using the uniform Z3 variable naming described above; assign the value of the right-hand side to the variable; and execute the converted statement.
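The statement-by-statement translation in step i) can be sketched as below. The statements are simplified VEX-like text written by hand for illustration, not pyvex output, and `z3_safe_name` and `parse_statement` are hypothetical helper names; the naming rule implemented is the one stated in the text (start with a letter, then only letters, digits, and underscores).

```python
import re


def z3_safe_name(name):
    """Map an operand name to one that starts with a letter and uses only [A-Za-z0-9_]."""
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", name)
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "v_" + cleaned
    return cleaned


def parse_statement(stmt):
    """Split 'lhs = Op(arg1,arg2)' at the equals sign into (lhs, op, [args])."""
    lhs, rhs = (part.strip() for part in stmt.split("=", 1))
    match = re.fullmatch(r"(\w+)\((.*)\)", rhs)
    if match:
        op = match.group(1)
        args = [a.strip() for a in match.group(2).split(",") if a.strip()]
    else:
        op, args = "Copy", [rhs]  # plain assignment such as 't1 = t0'
    return z3_safe_name(lhs), op, args


declared = set()
for stmt in ["t2 = Mul32(t0,t1)", "t3 = Add32(t2,t4)", "t5 = t3"]:
    lhs, op, args = parse_statement(stmt)
    # Declare every operand that has not been seen before, under its sanitized name.
    for operand in [lhs, *args]:
        declared.add(z3_safe_name(operand))
    print(lhs, op, args)
```

In a full translator, each declared name would become a Z3 variable and each operation name would select the matching Z3 operator; that mapping is left out here.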
j) In the Z3 expressions containing unknown variables obtained in i), replace the variable values with symbolic values. The symbols b0, b1, ..., b(i-1) stand in, byte by byte, for the 1st, 2nd, ..., i-th characters of the input string. While the loop statements are executed, the input parameter is replaced with a symbolic value by checking whether the value of an expression in a statement equals the value at the parameter register's location.
k) The program analysis problem is converted into a constraint solving problem, which requires adding solving constraints so that the solver proceeds toward the target, narrowing the brute-force search range and increasing the cracking speed. For example, the following constraints are set. First, assuming the original input data is a string made up of digits and upper- and lower-case letters, the characters of the cracked value are restricted to digits and upper- and lower-case letters when the solving constraints are added. Second, the value of the Z3 expression containing unknowns obtained after symbolic execution in j) is set equal to the output value of the original input data after processing by the fragile hash function in use. Fig. 4 is a flow diagram of the analysis and cracking of fragile hash functions.
l) Use Z3's check() method to determine whether the constraints can be satisfied. If s.check() == sat, at least one solution exists, and the model() method is then called to solve for a collision value. If s.check() == unsat, no solution exists. When a solution satisfying the conditions of step k) is obtained, the result is stored as ASCII code values. For readability, these values are converted to digits or letters before output by calling the Python built-in function chr(), and the characters are output in order from b0 to b(i-1).
m) Verify the collision value; Fig. 5 is a schematic diagram of verifying a solved collision value. Whether the reverse-solved result in l) is correct is judged by comparing whether the fragile hash function outputs for the original input data and for the collision value are equal, which in turn determines whether cracking the firmware's fragile hash function succeeded: if they are equal, the crack succeeded; if not, the crack failed. Verification then continues with the next collision value, repeating step m).
An example is as follows:
Fig. 6 is a schematic diagram of the physical structure of a server. As shown in Fig. 6, the server may include a processor 710, a communications interface 720, a memory 730, and a communication bus 740, where the processor 710, the communications interface 720, and the memory 730 communicate with one another over the communication bus 740. The processor 710 can call logic instructions in the memory 730 to perform the following method: preprocess the firmware to obtain the binary files to be analyzed; extract the common features of fragile hash functions that are unaffected by architecture and compiler optimization options, or affected by less than a preset threshold; convert the features to numeric values; train and test on the feature data to build a reliable logistic-regression-based neural network model; identify and locate the firmware's fragile hash functions based on the structure matching method; structurally divide and extract the code of the fragile hash functions; convert the machine code or assembly code into intermediate-language VEX IR statements; build Z3 SMT solving expressions based on symbolic execution; add solving constraints; reverse-crack a collision value; and verify whether the collision value is correct.
In addition, when the logic instructions in the memory 730 are implemented as software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the method provided by the above embodiments, for example: preprocess the firmware to obtain the binary files to be analyzed; extract the common features of fragile hash functions that are unaffected by architecture and compiler optimization options, or affected by less than a preset threshold; convert the features to numeric values; train and test on the feature data to build a reliable logistic-regression-based neural network model; identify and locate the firmware's fragile hash functions based on the structure matching method; structurally divide and extract the code of the fragile hash functions; convert the machine code or assembly code into intermediate-language VEX IR statements; build Z3 SMT solving expressions based on symbolic execution; add solving constraints; reverse-crack a collision value; and verify whether the collision value is correct.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
From the description of the embodiments above, those skilled in the art can clearly understand that each embodiment can be implemented by software together with a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes to the prior art, can be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
The above embodiments are intended only to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (10)

1. A method for identifying and cracking fragile hash functions in smart device firmware, characterized by comprising:
Preprocessing the firmware to obtain binary files to be analyzed;
Extracting the common features of fragile hash functions that are unaffected by architecture and compiler optimization options, or affected by less than a preset threshold, converting the features to numeric values, training and testing on the feature data, building a reliable logistic-regression-based neural network model, and identifying and locating the firmware's fragile hash functions based on a structure matching method;
Structurally dividing and extracting the code of the fragile hash functions, converting the machine code or assembly code into intermediate-language VEX IR statements, building Z3 SMT solving expressions based on symbolic execution, adding solving constraints, reverse-cracking a collision value, and verifying whether the collision value is correct.
2. The method of claim 1, characterized in that preprocessing the firmware to obtain binary files to be analyzed mainly comprises:
Firmware crawling: developing a firmware web crawler and crawling firmware from preselected smart device manufacturers;
Firmware storage: setting up a MongoDB database to store firmware information comprising firmware name, manufacturer name, product name, firmware version number, product category, firmware description, and firmware download link, writing a firmware download script that downloads firmware in bulk from the firmware download links, and storing the downloaded firmware at a designated location on the server;
Firmware decompression: writing a script that calls the Binwalk decompression tool to decompress the firmware in batches;
Filtering for executable, disassemblable binary files to obtain the binary files of the firmware to be analyzed.
3. The method of claim 1, characterized in that extracting the common features of fragile hash functions that are unaffected by architecture and compiler optimization options, or affected by less than a preset threshold, comprises:
Collecting source code for several fragile hash functions found in firmware and for other functions, and compiling the collected source code for different architectures and with different compiler optimization options to generate multiple executable, disassemblable binary files;
Reverse-analyzing the generated executable, disassemblable binary files with IDA Pro, writing an IDA Pro plug-in to statistically analyze the various features of the fragile hash functions and of the other functions, and extracting the features that most clearly distinguish fragile hash functions from other functions, comprising 9 features: function name, instruction count, number of instruction types, number of jump instructions, number of calls, number of XOR operations, stack size, number of basic blocks, and whether the function contains a loop.
4. The method of claim 3, characterized in that converting the common features of fragile hash functions to numeric values comprises:
Analyzing each function in the generated executable, disassemblable binary files, analyzing every instruction in each basic block of each function, counting the numeric value of each feature by numeric accumulation, and at the same time labeling fragile hash functions as positive samples and other functions as negative samples.
5. The method of claim 4, characterized in that training and testing on the generated feature data to build a reliable logistic-regression-based neural network model comprises:
Training and testing on the feature data using a classifier from the Python machine learning library sklearn with the logistic regression method, and building a neural network model for subsequent structured matching;
Evaluating the mathematical model using accuracy, recall, and the composite F-measure, and saving a reliable mathematical model.
6. The method of claim 5, characterized in that identifying and locating the firmware's fragile hash functions based on the structure matching method comprises:
Performing function association, based on the structure matching method, between the executable, disassemblable binary files of the firmware to be analyzed and the reliable logistic-regression-based neural network model, thereby identifying and locating the firmware's fragile hash functions and obtaining the function names and function entry addresses of the fragile hash functions.
7. The method of claim 6, characterized in that structurally dividing and extracting the code of the fragile hash functions comprises:
Loading the target binary file with the angr submodule cle, extracting the control flow graph (CFG) inside each identified and located fragile function, splitting the fragile function, based on depth-first search and a topological sorting algorithm, into three basic modules, namely an initialization block, a loop-body block, and an end block, and obtaining the entry address and jump address of each basic block.
8. The method of claim 7, characterized in that converting the machine code or assembly code of each divided basic block into intermediate-language VEX IR statements comprises:
Converting the machine code or assembly code of each module into the corresponding intermediate-language VEX IR statements with the angr submodule pyvex.
9. The method of claim 8, characterized in that building Z3 SMT constraint-solving expressions based on symbolic execution and adding solving constraints comprises:
Converting variable values into symbolic values based on symbolic execution, building Z3 expressions, and determining the number of loop iterations in the function;
Converting the program analysis problem into a constraint solving problem and adding solving constraints so that the solver proceeds toward the target, narrowing the brute-force search range and increasing the cracking speed.
10. The method of claim 9, characterized in that solving for a collision value and verifying whether the collision value is correct comprises:
Comparing whether the fragile hash function produces the same output value for the original input data and for the collision value, in order to judge whether cracking the firmware's fragile hash function succeeded: if the two values are equal, the crack succeeded; if they are not equal, the crack failed.
CN201811406960.2A 2018-11-23 2018-11-23 Method for identifying and cracking fragile hash function of intelligent device firmware Active CN109740347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811406960.2A CN109740347B (en) 2018-11-23 2018-11-23 Method for identifying and cracking fragile hash function of intelligent device firmware

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811406960.2A CN109740347B (en) 2018-11-23 2018-11-23 Method for identifying and cracking fragile hash function of intelligent device firmware

Publications (2)

Publication Number Publication Date
CN109740347A true CN109740347A (en) 2019-05-10
CN109740347B CN109740347B (en) 2020-07-10

Family

ID=66358138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811406960.2A Active CN109740347B (en) 2018-11-23 2018-11-23 Method for identifying and cracking fragile hash function of intelligent device firmware

Country Status (1)

Country Link
CN (1) CN109740347B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101714118A (en) * 2009-11-20 2010-05-26 北京邮电大学 Detector for binary-code buffer-zone overflow bugs, and detection method thereof
CN101976319A (en) * 2010-11-22 2011-02-16 张平 BIOS firmware Rootkit detection method based on behaviour characteristic
CN104982011A (en) * 2013-03-08 2015-10-14 比特梵德知识产权管理有限公司 Document classification using multiscale text fingerprints
CN105184146A (en) * 2015-06-05 2015-12-23 北京北信源软件股份有限公司 Method and system for checking weak password of operating system
CN106295335A (en) * 2015-06-11 2017-01-04 中国科学院信息工程研究所 The firmware leak detection method of a kind of Embedded equipment and system
CN105740477A (en) * 2016-03-18 2016-07-06 中国科学院信息工程研究所 Function searching method for large-scale embedded device firmware and search engine
CN107229563A (en) * 2016-03-25 2017-10-03 中国科学院信息工程研究所 A kind of binary program leak function correlating method across framework
CN105868108A (en) * 2016-03-28 2016-08-17 中国科学院信息工程研究所 Instruction-set-irrelevant binary code similarity detection method based on neural network
CN108008960A (en) * 2017-11-09 2018-05-08 北京航空航天大学 A kind of feature code generating method towards critical software binary file
US10057243B1 (en) * 2017-11-30 2018-08-21 Mocana Corporation System and method for securing data transport between a non-IP endpoint device that is connected to a gateway device and a connected service

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YANIV DAVID .ETAL: ""FirmUp: precise static detection of common vulnerabilities in firmware"", 《ASPLOS’18》 *
ZHANG XING .ETAL: ""Staticly Detect Stack Overflow Vulnerabilities with Taint Analysis"", 《ITM WEB OF CONFERENCES》 *
常青 等: ""VDNS:一种跨平台的固件漏洞关联算法"", 《计算机研究与发展》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362966A (en) * 2019-07-11 2019-10-22 华东师范大学 A kind of cross-platform firmware homology safety detection method based on fuzzy Hash
CN110764784A (en) * 2019-10-24 2020-02-07 北京智游网安科技有限公司 Method for identifying three-party SO file, intelligent terminal and storage medium
CN111444513A (en) * 2019-11-14 2020-07-24 中国电力科学研究院有限公司 Firmware compiling optimization option identification method and device for power grid embedded terminal
CN111444513B (en) * 2019-11-14 2024-03-12 中国电力科学研究院有限公司 Firmware compiling optimization option identification method and device for power grid embedded terminal
CN110941832A (en) * 2019-11-28 2020-03-31 杭州安恒信息技术股份有限公司 Embedded Internet of things equipment firmware vulnerability discovery method, device and equipment
CN111580822A (en) * 2020-04-22 2020-08-25 中国科学院信息工程研究所 Internet of things equipment assembly version information extraction method based on VEX intermediate language
CN112394984A (en) * 2020-10-29 2021-02-23 北京软安科技有限公司 Firmware code analysis method and device
CN112394984B (en) * 2020-10-29 2022-09-30 北京智联安行科技有限公司 Firmware code analysis method and device

Also Published As

Publication number Publication date
CN109740347B (en) 2020-07-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant