CN109740347A - Method for identifying and cracking weak (fragile) hash functions in smart device firmware
- Publication number: CN109740347A
- Application number: CN201811406960.2A
- Authority: CN (China)
- Prior art keywords: firmware, fragile, hash function, function, code
- Legal status: Granted (as listed; the status is an assumption and is not a legal conclusion)
Abstract
An embodiment of the present invention provides a method for identifying and cracking weak hash functions in smart device firmware. The main steps are: preprocessing the firmware to obtain the binary files to be analyzed; extracting the common features of weak hash functions that are unaffected by the architecture and compiler optimization options, or affected by them less than a preset threshold; converting the features to numeric values; training and testing on the feature data to build a reliable logistic-regression-based neural network model; identifying and locating the weak hash functions of the firmware by structural matching; dividing the code of each weak hash function into structural parts and extracting them; converting the machine code or assembly code into VEX IR intermediate-language statements; building a Z3 SMT solver expression based on symbolic execution; adding solver constraints; recovering a collision value by reverse solving; and verifying whether the collision value is correct. The method identifies weak hash functions in firmware with a low false positive rate, locates them accurately, and cracks them quickly.
Description
Technical field
Embodiments of the present invention relate to the fields of function association for intelligent embedded device firmware and vulnerability mining in binary program functions, and in particular to a method for identifying and cracking weak hash functions in smart device firmware.
Background art
In recent years, frequent attacks exploiting smart device firmware vulnerabilities have made firmware security research and analysis one of the key topics in information security. Embedded devices emphasize practical operation and have limited computing power, so to optimize performance some devices use extremely weak hash functions or simplified versions of standard hash functions. This leaves the devices with security defects and exposes the device systems, and even the security of the whole network, to serious threats. Vulnerabilities in the weak hash functions of firmware binaries have therefore become one of the main entry points both for attackers and for security experts studying the field. For example, the VxWorks encryption algorithm vulnerability CVE-2010-2967 arises because the weak hash function loginDefaultEncrypt() in loginLib, in VxWorks versions before 6.9, suffers from password collisions: the total number of distinct password hashes is only about 220,000. An attacker can build a password dictionary and brute-force telnet, ftp, rlogin, and similar sessions to gain control of the system, causing great harm. Identifying and cracking the weak hash functions of smart device firmware has therefore become very important.
Current vulnerability mining and detection techniques for smart device firmware mainly include: vulnerability mining at the firmware source code level, static auditing of binary code based on reverse engineering, detection of command injection vulnerabilities, detection of buffer overflow vulnerabilities, and firmware vulnerability function association.
Methods for identifying hash functions in firmware mainly include the following. One approach compares the different binary code produced when the same hash function source is compiled for different devices, different architectures, and different compiler optimization options, and looks for the similarities and differences between them in order to find other identical or homologous hash functions. Another approach performs function association through bit stream comparison or instruction sequence comparison, using a sliding window to obtain 0/1 sequence signatures and compute their similarity, or by studying the similarity of assembly statements.
Common methods for cracking hash functions by collision include the rainbow table method, the birthday attack, the equal substring method, the meet-in-the-middle attack, the differential attack, and the leading (prefix) attack.
Judging by the development of the prior art, the research foundation for vulnerability mining of hash functions in smart device firmware is still weak. What is currently lacking is an automated analysis method that is simple to implement, identifies weak hash functions in smart device firmware with a low false positive rate, locates them accurately, and cracks them quickly.
Summary of the invention
Embodiments of the present invention provide a method for identifying and cracking weak hash functions in smart device firmware, to address the current lack of an automated analysis method that is simple to implement, has a low false positive rate when identifying weak hash functions in smart device firmware, locates them accurately, and cracks them quickly.
According to a first aspect of the embodiments of the present invention, a method for identifying and cracking weak hash functions in smart device firmware is provided, comprising:
preprocessing the firmware to obtain the binary files to be analyzed;
extracting the common features of weak hash functions that are unaffected by the architecture and compiler optimization options, or affected by them less than a preset threshold; converting the features to numeric values; training and testing on the feature data; building a reliable logistic-regression-based neural network model; and identifying and locating the weak hash functions of the firmware by structural matching;
dividing the code of the weak hash function into structural parts and extracting them; converting the machine code or assembly code into VEX IR intermediate-language statements; building a Z3 SMT solver expression based on symbolic execution; adding solver constraints; recovering a collision value by reverse solving; and verifying whether the collision value is correct.
Further, preprocessing the firmware to obtain the binary files to be analyzed comprises:
firmware crawling: developing a firmware web crawler and crawling firmware from preselected smart device vendors;
firmware storage: building a MongoDB database to store important firmware information such as firmware name, vendor name, product name, firmware version number, product category, firmware description, and firmware download link; writing a firmware download script that downloads firmware in batches from the download links; and storing the downloaded firmware at a designated location on a server;
firmware unpacking: writing a script that calls decompression tools such as Binwalk to unpack the firmware in batches;
filtering of executable, disassemblable binary files: obtaining the binary files of the firmware to be analyzed.
Further, extracting the common features of weak hash functions that are unaffected by the architecture and compiler optimization options, or affected by them less than a preset threshold, comprises:
collecting the source code of a number of firmware weak hash functions and of other functions, and compiling the collected source code for different architectures and with different compiler optimization options to generate multiple executable, disassemblable binary files;
reverse-analyzing the generated binary files with IDA Pro, and writing an IDA Pro plug-in that statistically analyzes the various features of the weak hash functions and the other functions, in order to extract the features that most clearly distinguish weak hash functions from other functions. These are mainly nine features: function name, instruction count, number of instruction types, number of jump instructions, number of calls, number of XOR instructions, stack size, number of basic blocks, and whether the function contains a loop.
Further, converting the common features of weak hash functions to numeric values comprises:
studying and analyzing each function in the compiled executable, disassemblable binary files; analyzing every instruction in the basic blocks of each function; computing the value of each feature by cumulative counting; and labeling hash functions as positive samples and other functions as negative samples.
Further, training and testing on the feature data to build a reliable logistic-regression-based neural network model comprises:
training and testing on the generated feature data with a classifier from the Python machine learning library sklearn, using logistic regression, to build a neural network model for the subsequent structural matching;
evaluating the mathematical model using precision, recall, and the combined evaluation metric F-measure. If the model performs well, it is saved. If it does not, more weak hash functions and other functions are collected, more weak hash function features are extracted, in particular other features with higher discriminative value, and the tests are repeated until a reliable logistic-regression-based neural network model is obtained.
Further, identifying and locating the weak hash functions of the firmware by structural matching to obtain the function name and function entry address comprises:
performing function association by structural matching between the executable, disassemblable binary files of the firmware to be analyzed and the reliable logistic-regression-based neural network model, thereby identifying and locating the weak hash functions of the firmware and obtaining their function names and function entry addresses.
Further, dividing the code of the weak hash function into structural modules and extracting them comprises:
loading the target binary file with cle, a submodule of angr; extracting the control flow graph (CFG) of the weak function that has been identified and located; splitting the weak function, using depth-first search and a topological sorting algorithm, into three basic modules, namely an initialization block, a loop body block, and an end block; and obtaining the entry address and jump address of each basic block.
Further, converting the machine code or assembly code of each divided basic block into VEX IR intermediate-language statements comprises:
converting the machine code or assembly code of each module into the corresponding VEX IR statements using pyvex, a submodule of angr.
Further, building the Z3 SMT constraint-solving expression based on symbolic execution and adding solver constraints comprises:
converting variable values into symbolic values by symbolic execution, building the Z3 expression, and determining the number of loop iterations in the function;
converting the program analysis problem into a constraint-solving problem and adding solver constraints so that solving proceeds toward the target, which narrows the brute-force search space and increases the cracking speed.
Further, solving for the collision value and verifying whether it is correct comprises:
comparing the values produced by the weak hash function for the original input data and for the collision value. If the two hash values are equal, the weak hash function of the firmware has been cracked successfully; if not, the cracking has failed.
The present invention proposes a method for identifying and cracking weak hash functions in smart device firmware. It builds a complete workflow that runs from firmware acquisition, through identification and location of the weak hash functions in the firmware, to analysis and cracking of those functions. It identifies and locates weak hash functions in firmware more accurately and efficiently, and it cracks collision values for the original input data of some weak hash functions in firmware more efficiently and quickly. It advances research in function association for intelligent embedded device firmware and in vulnerability mining of binary program functions.
The present invention can achieve the following beneficial effects:
For weak hash function feature extraction, the invention collects the source code of firmware weak hash functions and other functions, compiles it into multiple executable, disassemblable binary files for different architectures and compiler optimization options, and studies and analyzes these binaries to extract the common features of weak hash functions that are unaffected by the architecture and compiler optimization options, or affected by them less than a preset threshold. The nine features (function name, instruction count, number of instruction types, number of jump instructions, number of calls, number of XOR instructions, stack size, number of basic blocks, and whether the function contains a loop) distinguish weak hash functions from other functions well and can be analyzed in binaries compiled for multiple architectures and optimization options. This improves the accuracy of identifying and locating weak hash functions and saves time when studying hash functions across multiple types of binary file.
For identifying and locating the weak hash functions of firmware, the invention extracts the weak hash function features, converts them to numeric values, and trains and tests on the feature data to build a reliable logistic-regression-based neural network model. The model is well suited to identifying and locating firmware weak hash functions by structural matching and improves the efficiency of that identification and location.
For analyzing the code structure of a weak hash function, the invention uses the control flow graph (CFG) of the weak function to split it, by depth-first search and a topological sorting algorithm, into three basic modules: an initialization block, a loop body block, and an end block. It then converts the machine code or assembly code of each module into VEX IR intermediate-language statements. This effectively avoids the influence of different architectures and compiler optimization options and makes the study of functions in binary code easier and simpler.
For analyzing and cracking the weak hash functions of firmware, the invention builds a Z3 SMT solver expression based on symbolic execution and adds solver constraints so that solving proceeds toward the target. This effectively narrows the brute-force search space and increases the speed at which weak hash functions are cracked.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the overall flow of a specific embodiment of the method of the present invention for identifying and cracking weak hash functions in smart device firmware;
Fig. 2 is a schematic diagram of the firmware preprocessing process in another specific embodiment of the method;
Fig. 3 is a flow diagram of the algorithm for identifying and locating firmware weak hash functions in another specific embodiment of the method;
Fig. 4 is a flow diagram of the analysis and cracking of weak hash functions in another specific embodiment of the method;
Fig. 5 is a schematic diagram of verifying the solved collision value in another specific embodiment of the method;
Fig. 6 is a schematic diagram of the physical structure of an electronic device embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
To address the high false positive rate, inaccurate location, and difficulty of cracking in the identification of firmware weak hash functions, the present invention provides a method for identifying and cracking weak hash functions in smart device firmware. The method can quickly and effectively identify and locate some weak hash functions in the binary files of smart device firmware, and it can effectively crack some extremely weak hash functions. It is applicable to the research and analysis of hash functions in binary files compiled for a variety of architectures and compiler optimization options.
The method flow designed by the present invention mainly includes: preprocessing of the firmware; filtering of the binary files to be examined; collection and compilation of firmware weak hash functions and other functions; extraction of the common features of weak hash functions that are unaffected by the architecture and compiler optimization options, or affected by them less than a preset threshold; numeric conversion of the features; training and testing on the feature data; construction of the logistic-regression-based neural network model; identification and location of firmware weak hash functions by structural matching; structural division and extraction of the weak hash function code; conversion of machine code or assembly code into the intermediate language VEX IR; construction of the Z3 SMT constraint-solving expression based on symbolic execution; addition of solver constraints; solving for the collision value; and verification of the collision value. The innovations of the present invention are as follows: through research and analysis of firmware weak hash functions, it extracts common features that are unaffected by the architecture and compiler optimization options, or affected by them less than a preset threshold; it trains, from the weak hash function feature data, a reliable mathematical model for structural matching of firmware weak hash functions; it divides the weak hash function structurally and converts the code of each module into VEX IR statements; and it uses Z3 SMT constraint solving based on symbolic execution to find a collision value for the original input data of the hash function. The invention can identify and locate some weak hash functions in firmware well and can crack collision values for the original input data of some extremely weak hash functions.
A specific embodiment of the present invention provides a method for identifying and cracking weak hash functions in smart device firmware, comprising:
S1: preprocessing the firmware to obtain the binary files to be analyzed;
S2: extracting the common features of weak hash functions that are unaffected by the architecture and compiler optimization options, or affected by them less than a preset threshold; converting the features to numeric values; training and testing on the feature data; building a reliable logistic-regression-based neural network model; and identifying and locating the weak hash functions of the firmware by structural matching;
S3: dividing the code of the weak hash function into structural parts and extracting them; converting the machine code or assembly code into VEX IR intermediate-language statements; building a Z3 SMT solver expression based on symbolic execution; adding solver constraints; recovering a collision value by reverse solving; and verifying whether the collision value is correct.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking weak hash functions in smart device firmware is provided, in which preprocessing the firmware to obtain the binary files to be analyzed comprises:
firmware crawling: developing a firmware web crawler and crawling firmware from preselected smart device vendors;
firmware storage: building a MongoDB database to store important firmware information such as firmware name, vendor name, product name, firmware version number, product category, firmware description, and firmware download link; writing a firmware download script that downloads firmware in batches from the download links; and storing the downloaded firmware at a designated location on a server;
firmware unpacking: writing a script that calls decompression tools such as Binwalk to unpack the firmware in batches;
filtering of executable, disassemblable binary files: obtaining the binary files of the firmware to be analyzed.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking weak hash functions in smart device firmware is provided, in which extracting the common features of weak hash functions that are unaffected by the architecture and compiler optimization options, or affected by them less than a preset threshold, comprises:
collecting the source code of a number of firmware weak hash functions and of other functions, and compiling the collected source code for different architectures and with different compiler optimization options to generate multiple executable, disassemblable binary files;
reverse-analyzing the generated binary files with IDA Pro, and writing an IDA Pro plug-in that statistically analyzes the various features of the weak hash functions and the other functions, in order to extract the features that most clearly distinguish weak hash functions from other functions. These are mainly nine features: function name, instruction count, number of instruction types, number of jump instructions, number of calls, number of XOR instructions, stack size, number of basic blocks, and whether the function contains a loop.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking weak hash functions in smart device firmware is provided, in which converting the common features of weak hash functions to numeric values comprises:
studying and analyzing each function in the compiled executable, disassemblable binary files; analyzing every instruction in the basic blocks of each function; computing the value of each feature by cumulative counting; and labeling hash functions as positive samples and other functions as negative samples.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking weak hash functions in smart device firmware is provided, in which training and testing on the feature data to build a reliable logistic-regression-based neural network model comprises:
training and testing on the generated feature data with a classifier from the Python machine learning library sklearn, using logistic regression, to build a neural network model for the subsequent structural matching;
evaluating the mathematical model using precision, recall, and the combined evaluation metric F-measure. If the model performs well, it is saved. If it does not, more weak hash functions and other functions are collected, more weak hash function features are extracted, in particular other features with higher discriminative value, and the tests are repeated until a reliable logistic-regression-based neural network model is obtained.
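The three evaluation metrics named above can be computed directly from the true and predicted labels. The following is a minimal sketch, assuming that label 1 marks a weak hash function (positive sample) and label 0 any other function; the helper name `evaluate` is hypothetical and does not come from the source.

```python
def evaluate(y_true, y_pred):
    """Return (precision, recall, f_measure) for binary labels, 1 = weak hash function."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hash functions correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # other functions wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # hash functions that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure (F1): harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```

A low precision corresponds to a high false positive rate in identification, which is the defect the method sets out to reduce.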
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking weak hash functions in smart device firmware is provided, in which identifying and locating the weak hash functions of the firmware by structural matching to obtain the function name and function entry address comprises:
performing function association by structural matching between the executable, disassemblable binary files of the firmware to be analyzed and the reliable logistic-regression-based neural network model, thereby identifying and locating the weak hash functions of the firmware and obtaining their function names and function entry addresses.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking weak hash functions in smart device firmware is provided, in which dividing the code of the weak hash function into structural modules and extracting them comprises:
loading the target binary file with cle, a submodule of angr; extracting the control flow graph (CFG) of the weak function that has been identified and located; splitting the weak function, using depth-first search and a topological sorting algorithm, into three basic modules, namely an initialization block, a loop body block, and an end block; and obtaining the entry address and jump address of each basic block.
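The three-way split of a control flow graph can be illustrated on a toy graph without angr. The sketch below is one possible reading of the step, not the patent's actual algorithm: it assumes the CFG is given as a dictionary mapping each block to its successors, uses depth-first search to find back edges, collects the natural loop of each back edge, and then classifies the remaining blocks by whether they can still reach the loop. The name `split_cfg` and the block labels are hypothetical.

```python
def split_cfg(cfg, entry):
    """Split a CFG (dict: block -> list of successors) into (init, loop, end) block lists."""
    # 1. Depth-first search: an edge to a block still on the DFS stack is a back edge.
    back_edges, state = [], {}

    def dfs(node):
        state[node] = "active"
        for succ in cfg.get(node, []):
            if state.get(succ) == "active":
                back_edges.append((node, succ))
            elif succ not in state:
                dfs(succ)
        state[node] = "done"

    dfs(entry)

    # Predecessor map, needed to walk the graph backwards.
    preds = {}
    for src, dsts in cfg.items():
        for dst in dsts:
            preds.setdefault(dst, []).append(src)

    # 2. Natural loop of a back edge (tail -> header): the header plus every
    #    block that reaches the tail without passing through the header.
    loop = set()
    for tail, header in back_edges:
        body, work = {header}, [tail]
        while work:
            node = work.pop()
            if node not in body:
                body.add(node)
                work.extend(preds.get(node, []))
        loop |= body

    # 3. Reachable blocks outside the loop that lead into it form the
    #    initialization part; everything else reachable is the end part.
    init, work = set(), [p for node in loop for p in preds.get(node, [])]
    while work:
        node = work.pop()
        if node in state and node not in loop and node not in init:
            init.add(node)
            work.extend(preds.get(node, []))
    end = set(state) - loop - init
    return sorted(init), sorted(loop), sorted(end)
```

For a typical hash loop with blocks `entry -> head -> body -> head` and `head -> exit`, this yields `entry` as initialization, `head` and `body` as the loop body, and `exit` as the end part. A function without any loop ends up entirely in the end part.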
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking weak hash functions in smart device firmware is provided, in which converting the machine code or assembly code of each divided basic block into VEX IR intermediate-language statements comprises:
converting the machine code or assembly code of each module into the corresponding VEX IR statements using pyvex, a submodule of angr.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking weak hash functions in smart device firmware is provided, in which building the Z3 SMT constraint-solving expression based on symbolic execution and adding solver constraints comprises:
converting variable values into symbolic values by symbolic execution, building the Z3 expression, and determining the number of loop iterations in the function;
converting the program analysis problem into a constraint-solving problem and adding solver constraints so that solving proceeds toward the target, which narrows the brute-force search space and increases the cracking speed.
On the basis of any of the above specific embodiments of the present invention, a method for identifying and cracking weak hash functions in smart device firmware is provided, in which solving for the collision value and verifying whether it is correct comprises:
comparing the values produced by the weak hash function for the original input data and for the collision value. If the two hash values are equal, the weak hash function of the firmware has been cracked successfully; if not, the cracking has failed.
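The verification step amounts to one comparison. The sketch below assumes a toy additive hash as a stand-in for the firmware function, and it adds the requirement that the candidate differ from the original input, which the source does not state explicitly; both `additive_hash` and `verify_collision` are hypothetical names.

```python
def additive_hash(data: bytes) -> int:
    """Toy weak hash (sum of bytes modulo 2**16); stands in for the firmware function."""
    return sum(data) & 0xFFFF


def verify_collision(hash_fn, original: bytes, candidate: bytes) -> bool:
    """Cracking succeeds when a different input produces the same hash value."""
    return candidate != original and hash_fn(candidate) == hash_fn(original)
```

With this toy hash, `b"ab"` and `b"ba"` collide because byte order does not affect the sum, so `verify_collision(additive_hash, b"ab", b"ba")` reports success.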
On the basis of any of the above specific embodiments of the present invention, a specific embodiment of a method for identifying and cracking weak hash functions in smart device firmware is provided. Its overall flow is shown in Fig. 1 and includes the following steps:
a) Preprocessing of the smart device firmware, as shown in Fig. 2, which includes firmware crawling, firmware storage, firmware unpacking, and filtering of executable, disassemblable binary files. Firmware crawling: well-known smart device vendors in China and abroad are surveyed, and a firmware web crawler is developed with the Python Scrapy framework to crawl firmware. Firmware storage: a MongoDB database stores important firmware information such as firmware name, vendor name, product name, firmware version number, product category, firmware description, and firmware download link. The main item stored is the firmware download URL; a firmware download script downloads firmware in bulk from these URLs and stores it at a designated location on the server. Firmware unpacking: a Python script calls tools such as Binwalk to unpack the firmware automatically in batches. Filtering of executable, disassemblable binary files: the files obtained after unpacking are screened to obtain the executable, disassemblable binary files of the firmware to be analyzed.
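The final filtering step of a) can be sketched with the standard library alone. The example assumes that "executable, disassemblable binary files" means ELF files, which is common for Linux-based firmware but is not stated in the source, and that the unpacking tool has already extracted the firmware into a directory tree. The helper name `find_elf_files` is hypothetical.

```python
import os

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def find_elf_files(root):
    """Return the sorted paths of all regular files under `root` that start with the ELF magic."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files, which are common in unpacked root filesystems.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    if handle.read(4) == ELF_MAGIC:
                        hits.append(path)
            except OSError:
                continue  # unreadable file: ignore and keep scanning
    return sorted(hits)
```

Checking the magic bytes rather than the file extension matters here because firmware binaries usually carry no extension at all.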
b) The C source code of a variety of open-source firmware weak hash functions (such as BKDRHash and BPHash) and other functions (such as readdir and system) is collected. The collected source code is compiled for multiple architectures, such as x86, ARM, and MIPS, and with multiple compiler optimization options, such as -O0, -O1, -O2, -O3, and -Os, to generate multiple different binary files.
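The compilation matrix in b) is the Cartesian product of architectures and optimization levels. The sketch below only builds the command lines and does not run them. The cross-compiler names are assumptions based on common Debian toolchain packages, and the output naming scheme and the function name `build_commands` are hypothetical.

```python
from itertools import product

# Assumed compiler executables for each target architecture.
COMPILERS = {
    "x86": "gcc",
    "arm": "arm-linux-gnueabi-gcc",
    "mips": "mips-linux-gnu-gcc",
}
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Os"]


def build_commands(source, out_dir="build"):
    """Return one compiler command (as an argument list) per architecture/optimization pair."""
    commands = []
    for (arch, compiler), opt in product(COMPILERS.items(), OPT_LEVELS):
        output = f"{out_dir}/{arch}{opt}.elf"  # for example build/arm-O2.elf
        commands.append([compiler, opt, "-o", output, source])
    return commands
```

Three architectures and five optimization levels give fifteen binaries per source file, each of which can then be passed to a runner such as `subprocess.run` on a machine where the toolchains are installed.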
c) The binary files compiled in b) are reverse-analyzed with IDA Pro, and IDAPython scripts are used to extract and analyze features of the weak hash functions and the other functions. The analysis shows that weak hash functions have features that clearly distinguish them from other functions, mainly nine: function name, instruction count, number of instruction types, number of jump instructions, number of calls, number of XOR instructions, stack size, number of basic blocks, and whether the function contains a loop. There may of course be many other features of weak hash functions as well.
d) Convert the features from c) into numerical values. Every instruction in each basic block of each function in the compiled binary files is analyzed, and running counts record how many times the instructions corresponding to each feature are used. Fragile hash functions are labeled as positive samples and all other functions as negative samples.
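The counting in steps c) and d) can be sketched over a simplified function record. The record layout (a dict with `blocks`, each holding `insns` and `succs`), the jump-mnemonic subset, and the helper names are assumptions made for this illustration; the source obtains the same information through IDA Pro. The function name is left out of the vector because the source does not say how it is encoded numerically.

```python
JUMP_MNEMONICS = {"jmp", "je", "jne", "jl", "jle", "jg", "jge", "jb", "ja"}  # illustrative x86 subset


def has_cycle(blocks):
    """Detect a loop by depth-first search for a back edge between basic blocks."""
    succs = {b["addr"]: b["succs"] for b in blocks}
    state = {}  # addr -> 1 while on the DFS stack, 2 when finished

    def visit(addr):
        state[addr] = 1
        for nxt in succs.get(addr, []):
            if state.get(nxt) == 1:
                return True
            if nxt in succs and nxt not in state and visit(nxt):
                return True
        state[addr] = 2
        return False

    return any(visit(a) for a in succs if a not in state)


def feature_vector(func):
    """Accumulate per-instruction counts into a numeric feature vector."""
    mnemonics = [m for b in func["blocks"] for m in b["insns"]]
    return [
        len(mnemonics),                               # instruction count
        len(set(mnemonics)),                          # distinct instruction types
        sum(m in JUMP_MNEMONICS for m in mnemonics),  # jump instructions
        sum(m == "call" for m in mnemonics),          # calls
        sum(m == "xor" for m in mnemonics),           # XOR operations
        func["stack_size"],                           # stack size in bytes
        len(func["blocks"]),                          # basic blocks
        int(has_cycle(func["blocks"])),               # 1 if the function loops
    ]


# Made-up function record shaped like a small hashing loop.
demo_func = {
    "name": "BKDRHash",
    "stack_size": 16,
    "blocks": [
        {"addr": 0x10, "insns": ["push", "mov", "xor", "jmp"], "succs": [0x30]},
        {"addr": 0x20, "insns": ["imul", "movzx", "add", "inc"], "succs": [0x30]},
        {"addr": 0x30, "insns": ["cmp", "jne"], "succs": [0x20, 0x40]},
        {"addr": 0x40, "insns": ["pop", "ret"], "succs": []},
    ],
}
demo_features = feature_vector(demo_func)
demo_label = 1  # positive sample: fragile hash function
```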
e) Randomly split the feature data extracted in d) into a 70% training set and a 30% test set, and train and test on it with the LogisticRegression module of the Python machine-learning library sklearn to build the neural network model. Before the model is accepted as reliable, the classifier's performance and the mathematical model's practical value are evaluated using accuracy, recall, and the combined F-measure. If classification performance is good and the practical value is high, the model is kept as the reliable mathematical model for the subsequent structured matching of firmware fragile hash functions. If performance is poor, features are extracted again and the model is retrained until a model reliable enough for structured matching of fragile hash functions is obtained.
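The training and evaluation loop of step e) might look like the sketch below. The feature matrix is synthetic stand-in data generated for the example, not data from the source, and the three columns are an arbitrary choice; only the 70/30 split, the use of sklearn's `LogisticRegression`, and the evaluation by accuracy, recall, and F-measure follow the description.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in features: columns are [XOR count, jump count, call count].
pos = rng.normal(loc=[4.0, 3.0, 0.5], scale=1.0, size=(100, 3))  # "fragile hash" samples
neg = rng.normal(loc=[0.5, 6.0, 4.0], scale=1.0, size=(100, 3))  # "other function" samples
X = np.vstack([pos, neg])
y = np.array([1] * 100 + [0] * 100)

# Random split into a 70% training set and a 30% test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f_measure": f1_score(y_test, y_pred),
}
print(scores)
```

If the scores fall below whatever acceptance level is chosen, the description calls for returning to feature extraction and retraining rather than tuning the classifier alone.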
f) Using the structured-matching method, associate the functions in the executable, disassemblable binary file obtained in a) with the mathematical model built in e). This identifies and locates the fragile hash functions in the firmware and yields their function names and function entry addresses. Fig. 3 is a flow diagram of the algorithm for identifying and locating fragile hash functions in firmware.
g) In machine language, the jump caused by an indirect transfer instruction can usually be regarded as a boundary, so a complete function is divided into several distinct code blocks. The fragile hash functions located in f) are analyzed experimentally, and their code is structurally divided and extracted: the angr submodule cle loads the target binary file, and the control-flow graph (CFG) inside each identified and located fragile function is extracted. Using depth-first search and topological sorting, the fragile function is split into three basic modules, an initialization block, a loop-body block, and an end block, and the entry address and jump address of each basic block are obtained.
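The division in step g) can be illustrated on a plain adjacency map instead of a CFG recovered by angr. The sketch assumes a function with a single loop; the function name `split_function`, the dict representation, and the rule that blocks before the loop header in topological order form the initialization part are illustrative choices, not the source's exact algorithm.

```python
def split_function(cfg, entry):
    """Split a single-loop CFG into initialization, loop-body, and end blocks.

    cfg maps a basic-block address to the list of its successor addresses.
    """
    on_stack, done, postorder, back_edges = set(), set(), [], []

    def dfs(node):
        on_stack.add(node)
        for nxt in cfg.get(node, []):
            if nxt in on_stack:
                back_edges.append((node, nxt))  # edge that closes a loop
            elif nxt not in done:
                dfs(nxt)
        on_stack.discard(node)
        done.add(node)
        postorder.append(node)

    dfs(entry)
    topo = postorder[::-1]  # reverse postorder: topological order ignoring back edges

    if not back_edges:
        return {"init": topo, "loop": [], "end": []}

    tail, header = back_edges[0]
    # Natural loop: the header plus every block that reaches the tail without
    # passing through the header (walk predecessor edges backwards from the tail).
    preds = {}
    for src, dsts in cfg.items():
        for dst in dsts:
            preds.setdefault(dst, []).append(src)
    loop, work = {header}, [tail]
    while work:
        node = work.pop()
        if node not in loop:
            loop.add(node)
            work.extend(preds.get(node, []))

    pos = topo.index(header)
    return {
        "init": [n for n in topo[:pos] if n not in loop],
        "loop": [n for n in topo if n in loop],
        "end": [n for n in topo[pos:] if n not in loop],
    }


# Made-up addresses: 0x10 initializes, 0x30 tests the condition,
# 0x20 is the loop body, 0x40 returns.
demo_cfg = {0x10: [0x30], 0x20: [0x30], 0x30: [0x20, 0x40], 0x40: []}
parts = split_function(demo_cfg, 0x10)
```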
h) Use the angr submodule pyvex to convert the machine code or assembly code of each module divided in g) into the corresponding statements of the VEX IR intermediate language.
i) The code is converted to the intermediate language in the execution order of the hash function, and during symbolic execution the unknown variables in the loop body are replaced by symbolic values. Symbolic execution must produce Z3 expressions containing the unknown variables. Because every valid statement of the intermediate language is an equation, both sides of the equation are analyzed and then translated into the corresponding Z3 variables. Z3 variables follow their own naming rule: a name begins with a letter and may contain only digits, upper- and lower-case letters, and underscores. The specific procedure is: split on the equals sign to obtain the operations on both sides of the equation; extract the left-hand variable operand; determine the type of the right-hand operation and extract its operands; check whether every extracted operand has been declared as a variable and, if not, declare it using the uniform Z3 variable naming above and assign to the right-hand variables; execute the converted statement.
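The statement-by-statement procedure in step i) can be sketched as a small parser. The statement text format (`t5 = Mul32(t3,0x00000083)`) is assumed to resemble printed VEX IR, and the helper names `z3_safe_name`, `parse_operand`, and `translate` are hypothetical; the sketch records declarations in a dict rather than creating real Z3 variables.

```python
import re

RHS = re.compile(r"^(\w+)\((.*)\)$")  # operation name followed by its operands


def z3_safe_name(operand):
    """Map an operand to a name that starts with a letter and uses only [A-Za-z0-9_]."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", operand)
    if not name or not name[0].isalpha():
        name = "v_" + name
    return name


def parse_operand(text):
    """Hex literals become integers; everything else becomes a variable name."""
    return int(text, 16) if text.lower().startswith("0x") else z3_safe_name(text)


def translate(statement, declared):
    """Split at the equals sign, classify the right-hand side, declare new variables."""
    lhs, sep, rhs = statement.partition("=")  # equals sign as separator
    match = RHS.match(rhs.strip())
    if not sep or match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    op, args = match.groups()
    operands = [parse_operand(a.strip()) for a in args.split(",") if a.strip()]
    target = z3_safe_name(lhs.strip())
    for name in [target] + [o for o in operands if isinstance(o, str)]:
        declared.setdefault(name, 32)  # record one 32-bit declaration per variable
    return target, op, operands


declared = {}
first = translate("t5 = Mul32(t3,0x00000083)", declared)
second = translate("t7 = Add32(t5,t6)", declared)
```

A full translator would map each operation name such as `Mul32` to the matching Z3 bit-vector operator and evaluate the statement; that mapping is omitted here.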
j) In the Z3 expressions containing unknown variables obtained in i), replace the variable values with symbolic values. The symbols b0, b1, ..., bi-1, bi stand, position by position, for the 1st, 2nd, ..., (i-1)-th, and i-th characters of the input string. While the loop statements are executed, the input parameter is replaced by a symbolic value by checking whether the value of an expression in a statement equals the value held in the parameter register.
k) Convert the program-analysis problem into a constraint-solving problem. Solving constraints must be added so that the solver works toward the target, which narrows the brute-force search range and speeds up cracking of the function. For example, suppose the original input data is a string of digits and upper- and lower-case letters; when the solving constraints are added, the characters to be recovered are then restricted to digits and upper- and lower-case letters. A further constraint requires the value of the Z3 expression containing unknowns, obtained after symbolic execution in j), to equal the output value produced when the original input data is processed by the fragile hash function. Fig. 4 is a flow diagram of the analysis and cracking of fragile hash functions.
l) Use Z3's check() method to determine whether the constraints can be satisfied. If s.check() == sat, at least one solution exists, and calling the model() method yields a collision value. If s.check() == unsat, no solution exists. A solution obtained under the conditions of step k) is stored as ASCII code values; for readability these are converted to digits or letters with Python's built-in function chr() before output, and the characters are output in order from b0 to bi-1.
m) Verify the collision value; Fig. 5 is a schematic diagram of verifying a solved collision value. The fragile-hash outputs of the original input data and of the collision value are compared to determine whether the reverse-solved result from l) is correct, and hence whether cracking the firmware's fragile hash function succeeded: if the two values are equal, cracking succeeded; otherwise it failed. Verification then continues with the next collision value, repeating step m).
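The comparison in step m) reduces to hashing both inputs and checking equality. The sketch below again assumes the BKDRHash formulation with multiplier 131 and 32-bit wraparound, and uses a two-byte collision chosen by hand for the demonstration (1*131 + 200 = 2*131 + 69 = 331); the bytes are not printable characters and do not come from the source.

```python
def bkdr_hash(data, seed=131):
    """BKDRHash over a bytes object, truncated to 32 bits (assumed formulation)."""
    h = 0
    for byte in data:
        h = (h * seed + byte) & 0xFFFFFFFF
    return h


def verify_collision(hash_fn, original, candidate):
    """Cracking succeeds when both inputs produce the same hash output."""
    return hash_fn(original) == hash_fn(candidate)


# Hand-made collision: both byte strings hash to 331.
collides = verify_collision(bkdr_hash, bytes([1, 200]), bytes([2, 69]))
differs = verify_collision(bkdr_hash, b"a", b"b")
```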
An example is as follows:
Fig. 6 is a schematic diagram of the physical structure of a server. As shown in Fig. 6, the server may include a processor 710, a communications interface 720, a memory 730, and a communication bus 740, where the processor 710, the communications interface 720, and the memory 730 communicate with one another over the communication bus 740. The processor 710 can call logical instructions in the memory 730 to perform the following method: preprocessing the firmware to obtain binary files to be analyzed; extracting the common features of fragile hash functions that are unaffected by the architecture and compiler optimization options, or affected by less than a preset threshold; converting the features into numerical values; training and testing on the feature data to build a reliable logistic-regression-based neural network model; identifying and locating the fragile hash functions of the firmware using the structured-matching method; structurally dividing and extracting the code of the fragile hash functions; converting machine code or assembly code into VEX IR intermediate-language statements; constructing Z3 SMT solving expressions based on symbolic execution; adding solving constraints; reverse-solving collision values; and verifying whether the collision values are correct.
In addition, the logical instructions in the memory 730 may be implemented as software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present invention also provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, performs the method provided by the embodiments above, for example: preprocessing the firmware to obtain binary files to be analyzed; extracting the common features of fragile hash functions that are unaffected by the architecture and compiler optimization options, or affected by less than a preset threshold; converting the features into numerical values; training and testing on the feature data to build a reliable logistic-regression-based neural network model; identifying and locating the fragile hash functions of the firmware using the structured-matching method; structurally dividing and extracting the code of the fragile hash functions; converting machine code or assembly code into VEX IR intermediate-language statements; constructing Z3 SMT solving expressions based on symbolic execution; adding solving constraints; reverse-solving collision values; and verifying whether the collision values are correct.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
From the description of the embodiments above, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. On this understanding, the technical solution above in essence, or the part of it that contributes to the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
The embodiments above merely illustrate the technical solutions of the present invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for identifying and cracking fragile hash functions in smart-device firmware, characterized by comprising:
preprocessing the firmware to obtain binary files to be analyzed;
extracting the common features of fragile hash functions that are unaffected by the architecture and compiler optimization options, or affected by less than a preset threshold; converting the features into numerical values; training and testing on the feature data to build a reliable logistic-regression-based neural network model; and identifying and locating the fragile hash functions of the firmware using a structured-matching method;
structurally dividing and extracting the code of the fragile hash functions; converting machine code or assembly code into VEX IR intermediate-language statements; constructing Z3 SMT solving expressions based on symbolic execution; adding solving constraints; reverse-solving collision values; and verifying whether the collision values are correct.
2. The method of claim 1, characterized in that preprocessing the firmware to obtain the binary files to be analyzed mainly comprises:
firmware crawling: developing a firmware crawler and crawling firmware from preselected smart-device manufacturers;
firmware storage: setting up a MongoDB database to store firmware information comprising firmware name, manufacturer name, product name, firmware version number, product category, firmware description, and firmware download link; writing a firmware download script that downloads firmware in bulk from the download links; and storing the downloaded firmware at a designated location on the server;
firmware decompression: writing a script that calls the Binwalk decompression tool to decompress the firmware in batches;
filtering for executable, disassemblable binary files to obtain the binary files of the firmware to be analyzed.
3. The method of claim 1, characterized in that extracting the common features of fragile hash functions that are unaffected by the architecture and compiler optimization options, or affected by less than a preset threshold, comprises:
collecting the source code of fragile hash functions and of other functions from several firmware images, and compiling the collected source code for different architectures and with different compiler optimization options to generate multiple executable, disassemblable binary files;
reverse-engineering the compiled executable, disassemblable binary files with an IDA Pro plug-in; writing an IDA Pro plug-in to statistically analyze the features of the fragile hash functions and the other functions; and extracting the features that most clearly distinguish fragile hash functions from other functions, namely nine features: function name, instruction count, number of instruction types, number of jump instructions, number of calls, number of XOR operations, stack size, number of basic blocks, and whether the function contains a loop.
4. The method of claim 3, characterized in that converting the common features of fragile hash functions into numerical values comprises:
analyzing each function in the compiled executable, disassemblable binary files; analyzing every instruction in each basic block of each function and counting the value of each feature by numerical accumulation; and labeling the hash functions as positive samples and the other functions as negative samples.
5. The method of claim 4, characterized in that training and testing on the generated feature data to build a reliable logistic-regression-based neural network model comprises:
training and testing on the feature data with a classifier from the Python machine-learning library sklearn, using the logistic regression method, to build a neural network model for the subsequent structured matching;
evaluating the mathematical model using accuracy, recall, and the combined F-measure, and saving a reliable mathematical model.
6. The method of claim 5, characterized in that identifying and locating firmware fragile hash functions using the structured-matching method comprises:
associating the functions in the executable, disassemblable binary file of the firmware to be analyzed with the reliable logistic-regression-based neural network model by the structured-matching method, thereby identifying and locating the fragile hash functions of the firmware and obtaining their function names and function entry addresses.
7. The method of claim 6, characterized in that structurally dividing and extracting the code of the fragile hash functions comprises:
loading the target binary file with the angr submodule cle; extracting the control-flow graph (CFG) inside each identified and located fragile function; splitting the fragile function, using depth-first search and topological sorting, into three basic modules, an initialization block, a loop-body block, and an end block; and obtaining the entry address and jump address of each basic block.
8. The method of claim 7, characterized in that converting the machine code or assembly code of each divided basic block into VEX IR intermediate-language statements comprises:
converting the machine code or assembly code of each module into the corresponding VEX IR intermediate-language statements with the angr submodule pyvex.
9. The method of claim 8, characterized in that constructing Z3 SMT constraint-solving expressions based on symbolic execution and adding solving constraints comprises:
converting variable values into symbolic values by symbolic execution, constructing Z3 expressions, and determining the number of loop iterations in the function;
converting the program-analysis problem into a constraint-solving problem and adding solving constraints so that the solver works toward the target, narrowing the brute-force search range and speeding up cracking of the function.
10. The method of claim 9, characterized in that solving for a collision value and verifying whether the collision value is correct comprises:
comparing the fragile-hash outputs of the original input data and of the collision value to determine whether cracking the firmware's fragile hash function succeeded: if they are equal, cracking succeeded; otherwise it failed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811406960.2A CN109740347B (en) | 2018-11-23 | 2018-11-23 | Method for identifying and cracking fragile hash function of intelligent device firmware |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811406960.2A CN109740347B (en) | 2018-11-23 | 2018-11-23 | Method for identifying and cracking fragile hash function of intelligent device firmware |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109740347A (en) | 2019-05-10 |
CN109740347B (en) | 2020-07-10 |
Family
ID=66358138
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811406960.2A Active CN109740347B (en) | 2018-11-23 | 2018-11-23 | Method for identifying and cracking fragile hash function of intelligent device firmware |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109740347B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101714118A (en) * | 2009-11-20 | 2010-05-26 | 北京邮电大学 | Detector for binary-code buffer-zone overflow bugs, and detection method thereof |
CN101976319A (en) * | 2010-11-22 | 2011-02-16 | 张平 | BIOS firmware Rootkit detection method based on behaviour characteristic |
CN104982011A (en) * | 2013-03-08 | 2015-10-14 | 比特梵德知识产权管理有限公司 | Document classification using multiscale text fingerprints |
CN105184146A (en) * | 2015-06-05 | 2015-12-23 | 北京北信源软件股份有限公司 | Method and system for checking weak password of operating system |
CN105740477A (en) * | 2016-03-18 | 2016-07-06 | 中国科学院信息工程研究所 | Function searching method for large-scale embedded device firmware and search engine |
CN105868108A (en) * | 2016-03-28 | 2016-08-17 | 中国科学院信息工程研究所 | Instruction-set-irrelevant binary code similarity detection method based on neural network |
CN106295335A (en) * | 2015-06-11 | 2017-01-04 | 中国科学院信息工程研究所 | The firmware leak detection method of a kind of Embedded equipment and system |
CN107229563A (en) * | 2016-03-25 | 2017-10-03 | 中国科学院信息工程研究所 | A kind of binary program leak function correlating method across framework |
CN108008960A (en) * | 2017-11-09 | 2018-05-08 | 北京航空航天大学 | A kind of feature code generating method towards critical software binary file |
US10057243B1 (en) * | 2017-11-30 | 2018-08-21 | Mocana Corporation | System and method for securing data transport between a non-IP endpoint device that is connected to a gateway device and a connected service |
Non-Patent Citations (3)
Title |
---|
YANIV DAVID .ETAL: ""FirmUp: precise static detection of common vulnerabilities in firmware"", 《ASPLOS’18》 * |
ZHANG XING .ETAL: ""Staticly Detect Stack Overflow Vulnerabilities with Taint Analysis"", 《ITM WEB OF CONFERENCES》 * |
常青 等: ""VDNS:一种跨平台的固件漏洞关联算法"", 《计算机研究与发展》 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110362966A (en) * | 2019-07-11 | 2019-10-22 | 华东师范大学 | A kind of cross-platform firmware homology safety detection method based on fuzzy Hash |
CN110764784A (en) * | 2019-10-24 | 2020-02-07 | 北京智游网安科技有限公司 | Method for identifying three-party SO file, intelligent terminal and storage medium |
CN111444513A (en) * | 2019-11-14 | 2020-07-24 | 中国电力科学研究院有限公司 | Firmware compiling optimization option identification method and device for power grid embedded terminal |
CN111444513B (en) * | 2019-11-14 | 2024-03-12 | 中国电力科学研究院有限公司 | Firmware compiling optimization option identification method and device for power grid embedded terminal |
CN110941832A (en) * | 2019-11-28 | 2020-03-31 | 杭州安恒信息技术股份有限公司 | Embedded Internet of things equipment firmware vulnerability discovery method, device and equipment |
CN111580822A (en) * | 2020-04-22 | 2020-08-25 | 中国科学院信息工程研究所 | Internet of things equipment assembly version information extraction method based on VEX intermediate language |
CN112394984A (en) * | 2020-10-29 | 2021-02-23 | 北京软安科技有限公司 | Firmware code analysis method and device |
CN112394984B (en) * | 2020-10-29 | 2022-09-30 | 北京智联安行科技有限公司 | Firmware code analysis method and device |
Also Published As
Publication number | Publication date |
---|---|
CN109740347B (en) | 2020-07-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||