CN109740347B - Method for identifying and cracking fragile hash function of intelligent device firmware - Google Patents

Method for identifying and cracking fragile hash function of intelligent device firmware

Info

Publication number
CN109740347B
Authority
CN
China
Prior art keywords
firmware
fragile
hash function
function
hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811406960.2A
Other languages
Chinese (zh)
Other versions
CN109740347A (en)
Inventor
石志强
张国栋
杨寿国
孙利民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN201811406960.2A priority Critical patent/CN109740347B/en
Publication of CN109740347A publication Critical patent/CN109740347A/en
Application granted granted Critical
Publication of CN109740347B publication Critical patent/CN109740347B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Stored Programmes (AREA)

Abstract

The embodiment of the invention provides a method for identifying and cracking a fragile hash function in intelligent device firmware, which mainly comprises the following steps: preprocessing the firmware to obtain a binary file to be analyzed; extracting common features of fragile hash functions that are either unaffected by the architecture and the compilation optimization options or affected by them by less than a preset threshold, converting the features into numerical values, training and testing on the feature data to construct a reliable neural network model based on logistic regression, and identifying and locating the fragile hash function in the firmware with a structured matching method; dividing the code of the fragile hash function into structural modules and extracting them, converting the machine code or assembly code into statements of the intermediate language VEX IR, constructing a Z3 SMT solving expression based on symbolic execution, adding solving constraints, solving backward for a collision value, and verifying whether the collision value is correct. For fragile hash functions in firmware, the method offers a low false alarm rate, accurate location and a high cracking speed.

Description

Method for identifying and cracking fragile hash function of intelligent device firmware
Technical Field
The embodiment of the invention relates to the field of function association and binary program function vulnerability mining of intelligent embedded equipment firmware, in particular to a method for identifying and cracking a fragile hash function of the intelligent equipment firmware.
Background
In recent years, attacks exploiting firmware vulnerabilities in intelligent devices have occurred frequently, and the research and analysis of firmware security has become a key focus in the field of information security. Because embedded devices are designed to be compact and efficient and have limited computing capacity, many of them use extremely fragile hash functions, or simplified versions of standard hash functions, to optimize performance. As a result the devices have security defects, and the security of intelligent device systems, and even of the whole cyberspace, faces a huge threat.
At present, vulnerability mining and detection technologies for intelligent device firmware mainly include: source-code-level firmware vulnerability mining, binary-level static auditing based on reverse engineering, searching for command injection vulnerabilities, searching for buffer overflow vulnerabilities, firmware vulnerable-function association, and the like.
Existing methods for identifying hash functions in firmware mainly work as follows: the binary code compiled from the same hash function source under different devices, architectures and compilation optimization options is compared, and the similarities and differences among the binaries are used to search for another identical or homologous hash function; alternatively, bit streams and instruction sequences are compared, with a sliding window used to obtain 0/1 sequence features from which a similarity is calculated, or the similarity of assembly statements is studied to associate functions.
Common methods for cracking hash functions by collision include the rainbow table method, the birthday attack, the equal substring method, the meet-in-the-middle attack, the differential attack, the first attack method and the like.
Judging from the development of the prior art, research on mining hash function vulnerabilities in intelligent device firmware is still at an early stage. What is currently lacking is an automatic analysis method for fragile hash functions in intelligent device firmware that is simple to implement, has a low false alarm rate in identification, locates functions accurately and cracks them quickly.
Disclosure of Invention
The embodiment of the invention provides a method for identifying and cracking a fragile hash function in intelligent device firmware, which addresses the lack, in the prior art, of a method that is simple to implement, has a low false alarm rate when identifying fragile hash functions in intelligent device firmware, locates them accurately and cracks them quickly.
According to a first aspect of the embodiments of the present invention, there is provided a method for identifying and cracking a fragile hash function of a smart device firmware, including:
preprocessing firmware to obtain a binary file to be analyzed;
extracting common features of fragile hash functions that are either unaffected by the architecture and the compilation optimization options or affected by them by less than a preset threshold, converting the features into numerical values, training and testing on the feature data to construct a reliable neural network model based on logistic regression, and identifying and locating the fragile hash function in the firmware with a structured matching method;
dividing the code of the fragile hash function into structural modules and extracting them, converting the machine code or assembly code into statements of the intermediate language VEX IR, constructing a Z3 SMT solving expression based on symbolic execution, adding solving constraints, solving backward for a collision value, and verifying whether the collision value is correct.
Further, preprocessing the firmware to obtain a binary file to be analyzed comprises:
firmware crawling: developing a firmware web crawler and crawling the firmware of preselected intelligent device manufacturers;
firmware storage: building a MongoDB database to store important firmware information such as the firmware name, manufacturer name, product name, firmware version number, product category, firmware description and firmware download link, writing a download script that fetches the firmware in batches from the download links, and storing the downloaded firmware at a designated location on a server;
firmware unpacking: writing a script that calls decompression tools such as Binwalk to unpack the firmware in batches;
filtering of executable binary files that can be disassembled: screening the unpacked files to obtain the firmware binary file to be analyzed.
Further, extracting the common features of fragile hash functions that are either unaffected by the architecture and the compilation optimization options or affected by them by less than a preset threshold comprises:
collecting the source code of fragile hash functions and other functions from a plurality of firmware images, and compiling the collected source code under different architectures and different compilation optimization options to generate a plurality of executable binary files that can be disassembled;
reverse-analyzing the compiled binary files with IDA Pro, writing an IDA Pro plug-in to statistically analyze the various features of the fragile hash functions and the other functions, and extracting the features in which fragile hash functions clearly differ from other functions, mainly the following nine: function name, number of instructions, number of instruction types, number of jump instructions, number of calls, number of XOR operations, stack size, number of basic blocks, and whether a loop is present.
Further, converting the common features of the fragile hash function into numerical values comprises:
analyzing every function in the compiled binary files, examining each instruction in each basic block of the function, counting the value of each feature by numerical accumulation, and labeling hash functions as positive samples and other functions as negative samples.
Further, training and testing on the feature data to construct a reliable neural network model based on logistic regression comprises:
training and testing on the generated feature data with a classifier from the Python machine learning library sklearn, using the logistic regression method, to construct a neural network model for the subsequent structured matching;
evaluating the mathematical model with the accuracy, the recall rate and the comprehensive evaluation index F-measure. If the model performs well, it is saved; if it does not, more fragile hash functions and other functions are collected, more fragile hash function features are extracted or other, more valuable features are selected, and the experiment is repeated until a reliable neural network model based on logistic regression is obtained.
Further, identifying and locating the fragile hash function in the firmware with the structured matching method, to obtain a function name and a function entry address, comprises:
performing function association, based on the structured matching method, between the executable, disassemblable binary file of the firmware to be analyzed and the reliable neural network model based on logistic regression, thereby identifying and locating the fragile hash function in the firmware and obtaining its function name and function entry address.
Further, dividing the code of the fragile hash function into structural modules and extracting them comprises:
loading the target binary file with the angr submodule cle, extracting the control flow graph (CFG) of the identified and located fragile function, splitting the fragile function into three basic modules, namely an initialization block, a loop block and a tail block, based on a depth-first algorithm and a topological sorting algorithm, and obtaining the entry address and jump address of each basic block.
Further, converting the machine code or assembly code of each divided basic block into statements of the intermediate language VEX IR comprises:
converting the machine code or assembly code of each module into the corresponding VEX IR statements using the angr submodule pyvex.
Further, constructing the Z3 SMT constraint solving expression based on symbolic execution and adding solving constraints comprises:
converting variable values into symbolic values based on symbolic execution, constructing a Z3 expression, and determining the number of loop iterations in the function;
converting the program analysis problem into a constraint solving problem and adding solving constraints so that the solving proceeds toward the target, which narrows the search range compared with brute-force cracking and increases the speed at which the function is cracked.
Further, solving for the collision value and verifying whether the collision value is correct comprises:
comparing the value that the fragile hash function produces for the original input data with the value it produces for the collision value in order to judge whether the fragile hash function in the firmware has been cracked: if the two values are equal, the cracking has succeeded; if they are not equal, the cracking has failed.
The invention provides a method for identifying and cracking a fragile hash function in intelligent device firmware, which builds a complete workflow from firmware acquisition, through identification and location of fragile hash functions in the firmware, to analysis and cracking of those functions. Fragile hash functions in firmware can be identified and located more accurately and efficiently, and collision values for the original input data of some fragile hash functions in the firmware can be found more effectively and quickly. The method advances research in the fields of function association for intelligent embedded device firmware and vulnerability mining in binary program functions.
The invention can obtain the following beneficial effects:
When extracting the features of the fragile hash function, the invention collects the source code of fragile hash functions and other functions from firmware, compiles it under different architectures and compilation optimization options into a plurality of executable binary files that can be disassembled, and analyzes these binaries to extract the common features of fragile hash functions that are either unaffected by the architecture and the compilation optimization options or affected by them by less than a preset threshold. The nine features are: function name, number of instructions, number of instruction types, number of jump instructions, number of calls, number of XOR operations, stack size, number of basic blocks, and whether a loop is present. These features distinguish fragile hash functions from other functions well and apply to binaries compiled under many architectures and compilation optimization options, which improves the accuracy of identifying and locating fragile hash functions and saves time when studying hash functions across multiple types of binary file.
According to the invention, when the fragile hash function of the firmware is identified and positioned, the characteristics of the fragile hash function are extracted and are subjected to numerical processing, meanwhile, the characteristic data is trained and tested, and a reliable neural network model based on logistic regression is constructed.
When the method is used for researching and analyzing the code structure of the fragile hash function, the fragile function is divided into three basic modules, namely an initialization Block, a loop body Block and a tail end Block based on a depth-first algorithm and a topological sorting algorithm according to a control flow graph CFG in the fragile function, and meanwhile machine codes or assembly codes of the divided modules are converted into intermediate language VEX IR statements, so that the influences of different architectures and different compiling optimization options are effectively avoided, and the function research of binary codes is easier and simpler.
When the fragile hash function in the firmware is analyzed and cracked, a Z3 SMT solving expression based on symbolic execution is constructed and solving constraints are added, so that the solving proceeds toward the target; this effectively narrows the search range compared with brute-force cracking and increases the speed at which the fragile hash function is cracked.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a general flowchart of a method for identifying and cracking a fragile hash function of intelligent device firmware according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the firmware preprocessing flow of the method according to yet another embodiment of the present invention;
FIG. 3 is a schematic flowchart of the algorithm for identifying and locating the fragile hash function of the firmware according to still another embodiment of the present invention;
FIG. 4 is a schematic flowchart of the algorithm for analyzing and cracking the fragile hash function according to still another embodiment of the present invention;
FIG. 5 is a schematic diagram of the method for identifying and cracking a fragile hash function of intelligent device firmware according to another embodiment of the present invention;
FIG. 6 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a method for identifying and cracking a fragile hash function of intelligent equipment firmware, aiming at the problems of high false alarm rate, inaccurate positioning, high cracking difficulty and the like of the fragile hash function of the firmware. The method can be applied to the field of hash function research and analysis of binary files compiled under various different architectures and compiling optimization options.
The method and workflow designed by the invention mainly comprise: preprocessing the firmware; filtering the binary files to be examined; collecting and compiling fragile hash functions and other functions from firmware; extracting the common features of fragile hash functions that are either unaffected by the architecture and the compilation optimization options or affected by them by less than a preset threshold; converting the features into numerical values; training and testing on the feature data; constructing a neural network model based on logistic regression; identifying and locating the fragile hash function in the firmware with a structured matching method; dividing the fragile hash function code into structural modules and extracting them; converting the machine code or assembly code to VEX IR; constructing a Z3 SMT constraint solving expression based on symbolic execution; adding solving constraints; solving for a collision value; and verifying whether the collision value is correct. The technical innovation of the invention is as follows: by studying and analyzing fragile hash functions in firmware, the common features that are either unaffected by the architecture and the compilation optimization options or affected by them by less than a preset threshold are extracted; a reliable mathematical model for structured matching of fragile hash functions in firmware is trained on this feature information; the fragile hash function is divided into structural modules; the code of each module is converted into statements of the intermediate language VEX IR; and a collision value for the original input data of the hash function is solved with Z3 SMT constraints based on symbolic execution.
The invention can effectively identify and locate some fragile hash functions in firmware and find collision values for the original input data of some of them.
The specific embodiment of the invention discloses a method for identifying and cracking a fragile hash function of intelligent equipment firmware, which comprises the following steps:
S1, preprocessing the firmware to obtain a binary file to be analyzed;
S2, extracting common features of fragile hash functions that are either unaffected by the architecture and the compilation optimization options or affected by them by less than a preset threshold, converting the features into numerical values, training and testing on the feature data to construct a reliable neural network model based on logistic regression, and identifying and locating the fragile hash function in the firmware with a structured matching method;
S3, dividing the code of the fragile hash function into structural modules and extracting them, converting the machine code or assembly code into statements of the intermediate language VEX IR, constructing a Z3 SMT solving expression based on symbolic execution, adding solving constraints, solving backward for a collision value, and verifying whether the collision value is correct.
On the basis of any one of the above embodiments of the present invention, a method for identifying and cracking a fragile hash function of intelligent device firmware is provided, in which preprocessing the firmware to obtain a binary file to be analyzed comprises:
firmware crawling: developing a firmware web crawler and crawling the firmware of preselected intelligent device manufacturers;
firmware storage: building a MongoDB database to store important firmware information such as the firmware name, manufacturer name, product name, firmware version number, product category, firmware description and firmware download link, writing a download script that fetches the firmware in batches from the download links, and storing the downloaded firmware at a designated location on a server;
firmware unpacking: writing a script that calls decompression tools such as Binwalk to unpack the firmware in batches;
filtering of executable binary files that can be disassembled: screening the unpacked files to obtain the firmware binary file to be analyzed.
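The filtering step can be illustrated with a minimal sketch. The text does not say how executable files are recognized, so the check below assumes the unpacked binaries are ELF files and looks for the ELF magic number; the function names `is_elf` and `filter_elf_files` are hypothetical and not part of the patent.

```python
import os

ELF_MAGIC = b"\x7fELF"  # assumption: target executables are ELF files


def is_elf(path):
    """Return True if the file at `path` starts with the ELF magic number."""
    try:
        with open(path, "rb") as handle:
            return handle.read(4) == ELF_MAGIC
    except OSError:
        # Unreadable files (device nodes, broken permissions) are skipped.
        return False


def filter_elf_files(root):
    """Walk an unpacked firmware tree and return the sorted paths of ELF files."""
    matches = []
    for directory, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(directory, name)
            # Symbolic links are ignored so each binary is reported once.
            if os.path.isfile(path) and not os.path.islink(path) and is_elf(path):
                matches.append(path)
    return sorted(matches)
```

Calling `filter_elf_files` on the directory produced by the unpacking tool would return the candidate binaries to pass on to disassembly; other executable formats would need their own checks.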
On the basis of any one of the above embodiments of the present invention, a method for identifying and cracking a fragile hash function of intelligent device firmware is provided, in which extracting the common features of fragile hash functions that are either unaffected by the architecture and the compilation optimization options or affected by them by less than a preset threshold comprises:
collecting the source code of fragile hash functions and other functions from a plurality of firmware images, and compiling the collected source code under different architectures and different compilation optimization options to generate a plurality of executable binary files that can be disassembled;
reverse-analyzing the compiled binary files with IDA Pro, writing an IDA Pro plug-in to statistically analyze the various features of the fragile hash functions and the other functions, and extracting the features in which fragile hash functions clearly differ from other functions, mainly the following nine: function name, number of instructions, number of instruction types, number of jump instructions, number of calls, number of XOR operations, stack size, number of basic blocks, and whether a loop is present.
On the basis of any one of the above embodiments of the present invention, a method for identifying and cracking a fragile hash function of intelligent device firmware is provided, in which converting the common features of the fragile hash function into numerical values comprises:
analyzing every function in the compiled binary files, examining each instruction in each basic block of the function, counting the value of each feature by numerical accumulation, and labeling hash functions as positive samples and other functions as negative samples.
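The accumulation described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's IDA Pro plug-in: a function is represented as a list of basic blocks, each a list of instruction mnemonics, and the x86 mnemonic sets are hypothetical. The function name, the ninth feature, is left out because the text does not say how it is turned into a number.

```python
# Assumed x86 mnemonic sets; other architectures would need their own tables.
JUMP_MNEMONICS = {"jmp", "je", "jne", "jz", "jnz", "jg", "jge", "jl", "jle", "ja", "jb"}
CALL_MNEMONICS = {"call"}
XOR_MNEMONICS = {"xor"}


def extract_features(blocks, stack_size, has_loop):
    """Accumulate numeric features over every instruction of every basic block.

    Returns [instructions, instruction types, jumps, calls, XORs,
             stack size, basic blocks, loop flag].
    """
    instructions = jumps = calls = xors = 0
    kinds = set()
    for block in blocks:
        for mnemonic in block:
            name = mnemonic.lower()
            instructions += 1
            kinds.add(name)
            if name in JUMP_MNEMONICS:
                jumps += 1
            elif name in CALL_MNEMONICS:
                calls += 1
            elif name in XOR_MNEMONICS:
                xors += 1
    return [instructions, len(kinds), jumps, calls, xors,
            stack_size, len(blocks), int(has_loop)]


def label_sample(features, is_fragile_hash):
    """Pair a feature vector with its label: 1 = positive, 0 = negative."""
    return features, 1 if is_fragile_hash else 0
```

Each labeled pair is one row of the data set that the training step consumes.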
On the basis of any one of the above embodiments of the present invention, a method for identifying and cracking a fragile hash function of intelligent device firmware is provided, in which training and testing on the feature data to construct a reliable neural network model based on logistic regression comprises:
training and testing on the generated feature data with a classifier from the Python machine learning library sklearn, using the logistic regression method, to construct a neural network model for the subsequent structured matching;
evaluating the mathematical model with the accuracy, the recall rate and the comprehensive evaluation index F-measure. If the model performs well, it is saved; if it does not, more fragile hash functions and other functions are collected, more fragile hash function features are extracted or other, more valuable features are selected, and the experiment is repeated until a reliable neural network model based on logistic regression is obtained.
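The evaluation loop relies on a train/test split and on precision-style metrics. The sketch below uses only the standard library rather than sklearn, so it is a stand-in for illustration; the helper names are hypothetical, and reading the text's "accuracy" as precision (the quantity the F-measure combines with recall) is an assumption.

```python
import random


def split_samples(samples, train_ratio=0.7, seed=0):
    """Shuffle the samples and split them into a training and a testing set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def evaluate(y_true, y_pred):
    """Return (precision, recall, F-measure) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

A model whose scores fall below a chosen bar would send the workflow back to collecting more functions or features, as the paragraph above describes; the bar itself is not given in the text.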
On the basis of any one of the above embodiments of the present invention, a method for identifying and cracking a fragile hash function of intelligent device firmware is provided, in which identifying and locating the fragile hash function in the firmware with the structured matching method, to obtain a function name and a function entry address, comprises:
performing function association, based on the structured matching method, between the executable, disassemblable binary file of the firmware to be analyzed and the reliable neural network model based on logistic regression, thereby identifying and locating the fragile hash function in the firmware and obtaining its function name and function entry address.
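The text does not spell out the structured matching method, so the following is only one possible reading: every function in the target binary is scored with a trained logistic regression model, and the name and entry address of each function above a threshold are reported. The weights, bias, threshold and the dictionary layout of a function record are all hypothetical.

```python
import math


def predict_probability(weights, bias, features):
    """Logistic regression score: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp within range
    return 1.0 / (1.0 + math.exp(-z))


def locate_fragile_hashes(functions, weights, bias, threshold=0.5):
    """Return (name, entry address) for every function scored as fragile."""
    hits = []
    for function in functions:
        score = predict_probability(weights, bias, function["features"])
        if score >= threshold:
            hits.append((function["name"], function["entry"]))
    return hits
```

In this reading the output list is what the later cracking stage consumes: each entry address marks where control flow graph extraction starts.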
On the basis of any one of the above embodiments of the present invention, a method for identifying and cracking a fragile hash function of intelligent device firmware is provided, in which dividing the code of the fragile hash function into structural modules and extracting them comprises:
loading the target binary file with the angr submodule cle, extracting the control flow graph (CFG) of the identified and located fragile function, splitting the fragile function into three basic modules, namely an initialization block, a loop block and a tail block, based on a depth-first algorithm and a topological sorting algorithm, and obtaining the entry address and jump address of each basic block.
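The three-way split can be sketched on a plain adjacency dictionary instead of an angr CFG object. This is a simplified illustration, not the patent's algorithm: it uses depth-first search to find back edges and collects the loop of each back edge, then treats non-loop blocks that can still reach the loop as initialization and the rest as the tail. The topological sorting mentioned in the text is not reproduced here, and all function names are hypothetical.

```python
def find_back_edges(cfg, entry):
    """Depth-first search; u -> v is a back edge if v is still on the DFS stack."""
    back_edges, visited, on_stack = [], set(), set()

    def visit(node):
        visited.add(node)
        on_stack.add(node)
        for succ in cfg.get(node, []):
            if succ in on_stack:
                back_edges.append((node, succ))
            elif succ not in visited:
                visit(succ)
        on_stack.discard(node)

    visit(entry)
    return back_edges


def natural_loop(cfg, latch, header):
    """Blocks that reach `latch` without passing through `header`, plus the header."""
    preds = {}
    for node, succs in cfg.items():
        for succ in succs:
            preds.setdefault(succ, []).append(node)
    body, work = {header}, [latch]
    while work:
        node = work.pop()
        if node not in body:
            body.add(node)
            work.extend(preds.get(node, []))
    return body


def reaches(cfg, start, targets):
    """Return True if any block in `targets` is reachable from `start`."""
    seen, work = set(), [start]
    while work:
        node = work.pop()
        if node in targets:
            return True
        if node not in seen:
            seen.add(node)
            work.extend(cfg.get(node, []))
    return False


def split_function(cfg, entry):
    """Split basic blocks into initialization, loop and tail groups."""
    loop = set()
    for latch, header in find_back_edges(cfg, entry):
        loop |= natural_loop(cfg, latch, header)
    init, tail = [], []
    for node in sorted(cfg):
        if node not in loop:
            (init if reaches(cfg, node, loop) else tail).append(node)
    return {"init": init, "loop": sorted(loop), "tail": tail}
```

For a typical string hash with one loop, the keys of the result map to the initialization block, the loop block and the tail block named in the text; the dictionary keys are block entry addresses and the values are jump targets.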
On the basis of any one of the above specific embodiments of the present invention, a method for identifying and cracking a fragile hash function of intelligent device firmware is provided, in which converting the machine code or assembly code of each divided basic block into statements of the intermediate language VEX IR comprises:
converting the machine code or assembly code of each module into the corresponding VEX IR statements using the angr submodule pyvex.
On the basis of any one of the above embodiments of the present invention, a method for identifying and cracking a fragile hash function of intelligent device firmware is provided, in which constructing the Z3 SMT constraint solving expression based on symbolic execution and adding solving constraints comprises:
converting variable values into symbolic values based on symbolic execution, constructing a Z3 expression, and determining the number of loop iterations in the function;
converting the program analysis problem into a constraint solving problem and adding solving constraints so that the solving proceeds toward the target, which narrows the search range compared with brute-force cracking and increases the speed at which the function is cracked.
On the basis of any one of the above embodiments of the present invention, a method for identifying and cracking a fragile hash function of intelligent device firmware is provided, in which solving for the collision value and verifying whether the collision value is correct comprises:
comparing the value that the fragile hash function produces for the original input data with the value it produces for the collision value in order to judge whether the fragile hash function in the firmware has been cracked: if the two values are equal, the cracking has succeeded; if they are not equal, the cracking has failed.
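The verification step reduces to one comparison, sketched below with BKDRHash (seed 131) as the example function. The requirement that the candidate differ from the original input is an added assumption, since the text only requires equal hash values; `is_collision` is a hypothetical name.

```python
def bkdr_hash(data, seed=131):
    """BKDRHash over bytes, truncated to 31 bits."""
    value = 0
    for byte in data:
        value = (value * seed + byte) & 0xFFFFFFFF
    return value & 0x7FFFFFFF


def is_collision(hash_function, original, candidate):
    """Cracking succeeds if a different input produces the same hash value."""
    return candidate != original and hash_function(candidate) == hash_function(original)


# Hand-built pair: 1 * 131 + 200 == 2 * 131 + 69 == 331.
original = bytes([1, 200])
candidate = bytes([2, 69])
print(is_collision(bkdr_hash, original, candidate))
```

The pair works because raising the first byte by one adds 131 to the hash, which lowering the second byte by 131 cancels exactly.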
On the basis of any one of the above specific embodiments of the present invention, a specific embodiment of a method for identifying and cracking a fragile hash function of an intelligent device firmware is provided, where a general flow is shown in fig. 1, and the general flow includes the following specific flows:
a) The preprocessing flow for intelligent device firmware, shown in FIG. 2, comprises firmware crawling, firmware storage, firmware decompression and filtering of executable binary files that can be disassembled. Firmware crawling: well-known intelligent device manufacturers at home and abroad are studied, a firmware web crawler is developed with the Python Scrapy framework, and the firmware is crawled. Firmware storage: a MongoDB database stores important firmware information such as the firmware name, manufacturer name, product name, firmware version number, product category, firmware description and firmware download link, of which the download link URL is the most important; a download script fetches the firmware in batches from the download link URLs, and the downloaded firmware is stored at a specified location on the server. Firmware decompression: a Python script calls tools such as Binwalk to decompress the firmware automatically in batches. Filtering: the decompressed files are screened to obtain the executable, disassemblable firmware binary files to be analyzed.
b) C source code is collected for fragile hash functions (e.g., BKDRHash, BPHash) and for other functions (e.g., readdir, system) from multiple open-source firmware projects. The collected function source code is compiled for several architectures, such as X86, ARM and MIPS, and with several compiler optimization options, such as -O0, -O1, -O2, -O3 and -Os, to generate a number of different binary files.
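The compilation matrix in step b) can be pictured as the Cartesian product of target architectures and optimization options. The sketch below only builds the command lines and does not run them; the cross-compiler names, the source file name `hash_funcs.c` and the output naming scheme are illustrative assumptions, not details given in the source.

```python
from itertools import product

# Hypothetical toolchain name per architecture (assumed, not from the source).
COMPILERS = {
    "x86": "gcc",
    "arm": "arm-linux-gnueabi-gcc",
    "mips": "mips-linux-gnu-gcc",
}
# Optimization options listed in step b).
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Os"]


def build_commands(source="hash_funcs.c"):
    """Return one compile command per (architecture, optimization) pair."""
    commands = []
    for (arch, compiler), opt in product(COMPILERS.items(), OPT_LEVELS):
        output = f"hash_funcs_{arch}_{opt.lstrip('-')}.o"
        commands.append([compiler, opt, "-c", source, "-o", output])
    return commands


# Print the commands instead of executing them.
for command in build_commands():
    print(" ".join(command))
```

Three architectures and five options give fifteen binaries per source file, which is the variety the later feature extraction is meant to be robust against.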
c) The binary files compiled in step b) are reverse-analyzed with IDA Pro, and the features of the fragile hash functions and of the other functions are extracted and analyzed with an IDAPython script. The analysis finds that fragile hash functions have features that differ markedly from those of other functions, mainly the following 9: function name, number of instructions, number of instruction types, number of jump instructions, number of calls, number of XOR operations, stack size, number of basic blocks, and whether a loop exists. Some functions that are not fragile hash functions nevertheless have features that can be confused with them.
d) The features in c) are converted to numerical values. Each instruction in each basic block of every function in the compiled binary files is analyzed, and the number of instructions matching each feature is counted by numerical accumulation. Fragile hash functions are labeled as positive samples and all other functions as negative samples.
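The accumulation in step d) might look like the following minimal sketch. It assumes a function has already been reduced to a list of basic blocks, each a list of instruction mnemonics. The x86-style mnemonic sets, the feature order, and the reading of "number of calls" as call instructions inside the function are all assumptions made for illustration.

```python
from collections import Counter

# Illustrative x86-style mnemonic sets; a real extractor would need one
# table per architecture. These sets are assumptions for this sketch.
JUMP_MNEMONICS = {"jmp", "je", "jne", "jz", "jnz", "jl", "jg", "jle", "jge"}
CALL_MNEMONICS = {"call"}
XOR_MNEMONICS = {"xor"}

FEATURE_ORDER = ["instructions", "instruction_types", "jumps",
                 "calls", "xors", "basic_blocks"]


def count_features(basic_blocks):
    """Accumulate per-function counts over a list of basic blocks."""
    counts = Counter()
    seen_types = set()
    for block in basic_blocks:
        counts["basic_blocks"] += 1
        for mnemonic in block:
            counts["instructions"] += 1
            seen_types.add(mnemonic)
            if mnemonic in JUMP_MNEMONICS:
                counts["jumps"] += 1
            elif mnemonic in CALL_MNEMONICS:
                counts["calls"] += 1
            elif mnemonic in XOR_MNEMONICS:
                counts["xors"] += 1
    counts["instruction_types"] = len(seen_types)
    return counts


def to_sample(basic_blocks, is_fragile_hash):
    """Return (feature_vector, label); label 1 is positive, 0 is negative."""
    counts = count_features(basic_blocks)
    return [counts[name] for name in FEATURE_ORDER], int(is_fragile_hash)


# A made-up three-block function: initialization, loop body, return.
toy_function = [
    ["mov", "xor"],
    ["movzx", "imul", "add", "inc", "cmp", "jne"],
    ["ret"],
]
print(to_sample(toy_function, is_fragile_hash=True))
```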
e) The feature data extracted in step d) is randomly divided into a 70% training set and a 30% test set, the feature data is trained and tested with the LogisticRegression module of the Python machine learning library sklearn (scikit-learn), and a neural network model is constructed.
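A minimal sketch of the 70/30 split and the logistic regression fit in step e), using scikit-learn. The feature vectors below are made-up toy values for illustration only, and the column layout, `random_state` and `max_iter` settings are assumptions rather than parameters given in the source.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy feature vectors: [instructions, jumps, xors, basic_blocks, has_loop].
# The numbers are invented for illustration and are not measured data.
X = [
    [12, 1, 1, 3, 1], [15, 1, 2, 3, 1], [10, 1, 1, 3, 1],
    [18, 2, 2, 4, 1], [14, 1, 1, 3, 1],
    [80, 9, 0, 12, 0], [45, 6, 0, 8, 1], [120, 15, 1, 20, 1],
    [60, 7, 0, 10, 0], [95, 11, 0, 15, 1],
]
# 1 = fragile hash function (positive), 0 = other function (negative).
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Random 70% / 30% split, with a fixed seed so the run is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(len(X_train), len(X_test), list(predictions))
```

With real data the held-out 30% would be scored before the model is kept for the matching stage.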
f) Function association based on a structured matching method is performed between the executable, disassemblable binary file to be analyzed obtained in step a) and the mathematical model constructed in step e), so as to identify and locate the fragile hash function of the firmware and obtain its function name and function entry address. Fig. 3 is a flow diagram of the algorithm for identifying and locating the fragile hash function of the firmware.
g) In machine language, a jump caused by an indirect branch instruction is usually treated as a boundary, so that a complete function is divided into several different code blocks. The fragile hash function located in step f) is studied experimentally, and its code is structurally divided and extracted: the target binary file is loaded with angr's sub-module cle, and the control flow graph (CFG) inside the identified and located fragile function is extracted. Based on a depth-first algorithm and a topological sorting algorithm, the fragile function is split into three basic modules, namely an initialization block, a loop block and a tail block, and the entry address and jump address of each basic block are obtained.
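The split into initialization, loop and tail blocks in step g) can be illustrated on a plain adjacency-list CFG. The sketch below is a simplification under stated assumptions: it uses depth-first search to find back edges and collects the natural loop of each, rather than reproducing the exact depth-first plus topological-sort procedure of the source, and the block addresses are hypothetical.

```python
def split_function(cfg, entry):
    """Split a function CFG into init, loop and tail block addresses.

    cfg maps a basic-block address to the list of its successor addresses.
    """
    visited, on_stack, back_edges = set(), set(), []

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for succ in cfg.get(node, []):
            if succ in on_stack:        # edge to an ancestor: a back edge
                back_edges.append((node, succ))
            elif succ not in visited:
                dfs(succ)
        on_stack.discard(node)

    dfs(entry)

    # Predecessor map, needed to walk backwards from a back-edge source.
    preds = {}
    for node, succs in cfg.items():
        for succ in succs:
            preds.setdefault(succ, []).append(node)

    # Natural loop of a back edge: the header plus every block that can
    # reach the back-edge source without passing through the header.
    loop = set()
    for source, header in back_edges:
        loop.add(header)
        work = [source]
        while work:
            node = work.pop()
            if node not in loop:
                loop.add(node)
                work.extend(preds.get(node, []))

    def reaches_loop(start):
        seen, work = set(), [start]
        while work:
            node = work.pop()
            if node in loop:
                return True
            if node not in seen:
                seen.add(node)
                work.extend(cfg.get(node, []))
        return False

    rest = [node for node in visited if node not in loop]
    init = sorted(node for node in rest if reaches_loop(node))
    tail = sorted(node for node in rest if not reaches_loop(node))
    return init, sorted(loop), tail


# Hypothetical addresses: entry -> loop header <-> loop body, header -> exit.
toy_cfg = {0x1000: [0x1010], 0x1010: [0x1020, 0x1030],
           0x1020: [0x1010], 0x1030: []}
print(split_function(toy_cfg, 0x1000))
```

Blocks that can still reach the loop are treated as initialization and the remaining non-loop blocks as the tail, which matches the single-loop shape of simple string hash functions.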
h) The angr sub-module pyvex is used to convert the machine code or assembly code of each module divided in g) into the corresponding statements of the intermediate language VEX IR.
i) During symbolic execution, the execution sequence of the hash function code is converted into the intermediate language, and unknown variables are replaced with symbolic values in the loop body. Symbolic execution yields a Z3 expression containing the unknown variables. Since the valid statements of the intermediate language are all equations, both sides of each equation must be analyzed and then converted into the corresponding Z3 variables. The Z3 variables follow their own naming convention: a name begins with a letter and consists only of digits, upper- and lower-case letters, and underscores. The specific process is as follows: the equals sign is used as a separator to obtain the operations on the two sides of the equation; the variable operand on the left side is extracted; the operation type on the right side is determined and its operands are extracted; each extracted operand is checked for a variable declaration and, if it has none, is declared under the unified Z3 variable naming, with values assigned to the right-hand variables; finally, the translated statement is executed.
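The statement-by-statement translation in step i) can be sketched in plain Python before any solver is involved. The example assumes statements in a simplified textual form such as `t5 = Mul32(t4,0x00000083)`; the helper names `z3_name` and `translate`, the `v_` prefix and the `Copy` pseudo-operation are hypothetical choices for illustration.

```python
import re

# Naming rule from the text: starts with a letter, then letters, digits, "_".
VALID_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")
# Right-hand side of the form Op(arg1,arg2,...).
OPERATION = re.compile(r"^(\w+)\((.*)\)$")


def z3_name(operand):
    """Map an operand to a name that satisfies the naming rule."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", operand)
    if not name or not name[0].isalpha():
        name = "v_" + name
    assert VALID_NAME.match(name)
    return name


def parse_constant(operand):
    """Return an int for a hex or decimal literal, else None."""
    if operand.startswith("0x"):
        return int(operand, 16)
    if operand.isdigit():
        return int(operand)
    return None


def translate(statement, declared):
    """Split 'lhs = Op(a,b)' at the equals sign and declare new variables.

    Returns (target_name, operation, arguments) and adds every variable
    name it meets to the `declared` set.
    """
    left, right = (part.strip() for part in statement.split("=", 1))
    match = OPERATION.match(right)
    if match:
        operation, raw_args = match.group(1), match.group(2).split(",")
    else:
        operation, raw_args = "Copy", [right]   # plain assignment
    arguments = []
    for raw in (arg.strip() for arg in raw_args):
        constant = parse_constant(raw)
        if constant is not None:
            arguments.append(constant)
        else:
            name = z3_name(raw)
            declared.add(name)                  # declare if not yet seen
            arguments.append(name)
    target = z3_name(left)
    declared.add(target)
    return target, operation, arguments


declared_names = set()
print(translate("t5 = Mul32(t4,0x00000083)", declared_names))
print(sorted(declared_names))
```

A full implementation would then map each operation name to the matching bit-vector operation and evaluate the translated statement.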
j) Variable values are replaced with symbolic values according to the Z3 expression containing the unknown variables obtained in i). The symbols b_0, b_1, ..., b_{i-1}, b_i respectively replace the contents of the 1st, 2nd, ..., (i-1)-th and i-th positions of the input string. During execution of a loop statement, the input parameter is located, and replaced with a symbolic value, by determining whether the value of an expression in the statement equals the value of the parameter register.
k) The program analysis problem is converted into a constraint solving problem. Solving constraints must be added so that the solver works toward the target, which narrows the search range compared with brute-force cracking and speeds up cracking of the function. For example, suppose the original input data is a string consisting of digits and upper- and lower-case letters. The added constraints then restrict each cracked character to digits and upper- and lower-case letters, and require the value of the Z3 expression containing the unknowns, obtained from symbolic execution in j), to equal the output of the fragile hash function for the original input data. Fig. 4 is a flow diagram of the algorithm for analyzing and cracking a fragile hash function.
l) The check() method of Z3 is used to judge whether the constraints can be satisfied. If s.check() returns sat, at least one solution exists, and the model() method is called to solve for the collision value. If s.check() returns unsat, there is no solution. A solution satisfying the conditions set in step k) is stored as ASCII code values; to make it readable, each ASCII code value is converted to a digit or letter by calling the Python built-in function chr(), and the characters are output in order from b_0 to b_{i-1}.
m) The collision value is verified; fig. 5 is a schematic diagram of verifying the solved collision value. Whether the reverse-solving result of step l) is correct, and hence whether the fragile hash function of the firmware has been cracked, is judged by comparing the output of the fragile hash function for the original input data with its output for the collision value: if the two are equal, the cracking has succeeded; if not, it has failed. Verification then continues with the next collision value, repeating step m).
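The check in step m) amounts to hashing both inputs and comparing the results. A minimal sketch follows, again assuming the commonly cited BKDRHash form with seed 131; the helper names are hypothetical. The byte pair in the example was chosen because its arithmetic is easy to confirm by hand (1 × 131 + 0 = 0 × 131 + 131), not because it satisfies the alphanumeric constraint of step k).

```python
def bkdr_hash(data, seed=131):
    """BKDRHash over bytes: 32-bit arithmetic, result masked to 31 bits."""
    h = 0
    for byte in data:
        h = (h * seed + byte) & 0xFFFFFFFF
    return h & 0x7FFFFFFF


def verify_collision(hash_fn, original, candidate):
    """Cracking succeeds when both inputs produce the same hash value."""
    return hash_fn(original) == hash_fn(candidate)


def verify_all(hash_fn, original, candidates):
    """Check candidates one after another and keep those that verify."""
    return [c for c in candidates if verify_collision(hash_fn, original, c)]


original_input = b"\x01\x00"
candidate_inputs = [b"\x00\x83", b"\x00\x84"]
print(verify_all(bkdr_hash, original_input, candidate_inputs))
```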
An example is as follows:
Fig. 6 illustrates the physical structure of a server, which may include a processor 710, a communications interface 720, a memory 730 and a communication bus 740, wherein the processor 710, the communications interface 720 and the memory 730 communicate with one another via the communication bus 740. The processor 710 may call logic instructions in the memory 730 to perform the following method: preprocessing firmware to obtain a binary file to be analyzed; extracting common features of fragile hash functions that are unaffected by the architecture and the compilation optimization options, or affected by them to a degree below a preset threshold, converting the features to numerical values, training and testing on the feature data, constructing a reliable neural network model based on logistic regression, and identifying and locating the fragile hash function of the firmware based on a structured matching method; structurally dividing and extracting the code of the fragile hash function, converting machine code or assembly code into statements of the intermediate language VEX IR, constructing a Z3 SMT solving expression based on symbolic execution, adding solving constraints, reverse-solving for collision values, and verifying whether the collision values are correct.
In addition, the logic instructions in the memory 730 may be implemented in the form of software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the method provided in the foregoing embodiments, the method including: preprocessing firmware to obtain a binary file to be analyzed; extracting common features of fragile hash functions that are unaffected by the architecture and the compilation optimization options, or affected by them to a degree below a preset threshold, converting the features to numerical values, training and testing on the feature data, constructing a reliable neural network model based on logistic regression, and identifying and locating the fragile hash function of the firmware based on a structured matching method; structurally dividing and extracting the code of the fragile hash function, converting machine code or assembly code into statements of the intermediate language VEX IR, constructing a Z3 SMT solving expression based on symbolic execution, adding solving constraints, reverse-solving for collision values, and verifying whether the collision values are correct.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for identifying and cracking a fragile hash function of intelligent device firmware is characterized by comprising the following steps:
preprocessing firmware to obtain a binary file to be analyzed;
extracting common features of fragile hash functions that are unaffected by the architecture and the compilation optimization options, or affected by them to a degree below a preset threshold, converting the features to numerical values, training and testing on the feature data, constructing a reliable neural network model based on logistic regression, and identifying and locating the fragile hash function of the firmware based on a structured matching method;
structurally dividing and extracting the code of the fragile hash function, converting machine code or assembly code into statements of the intermediate language VEX IR, constructing a Z3 SMT solving expression based on symbolic execution, adding solving constraints, reverse-solving for collision values, and verifying whether the collision values are correct.
2. The method of claim 1, wherein preprocessing the firmware to obtain the binary file to be analyzed mainly comprises:
firmware crawling: developing a firmware web crawler and crawling firmware from preselected intelligent device manufacturers;
firmware storage: setting up a MongoDB database to store firmware information comprising the firmware name, manufacturer name, product name, firmware version number, product category, firmware description and firmware download link, writing a firmware download script to download firmware in batches from the firmware download links, and saving the downloaded firmware to a designated location on a server;
firmware decompression: writing a script that calls the Binwalk decompression tool to decompress the firmware in batches;
filtering of executable, disassemblable binary files: obtaining the executable, disassemblable firmware binary file to be analyzed.
3. The method of claim 1, wherein extracting common features of fragile hash functions that are unaffected by the architecture and the compilation optimization options, or affected by them to a degree below a preset threshold, comprises:
collecting source code of fragile hash functions and other functions from a plurality of firmware, and compiling the collected source code under different architectures and different compilation optimization options to generate a plurality of executable, disassemblable binary files;
reverse-analyzing the compiled executable, disassemblable binary files with IDA Pro, writing an IDA Pro plug-in to statistically analyze the features of the fragile hash functions and of the other functions, and extracting the features in which fragile hash functions differ markedly from other functions, namely the following 9 features: function name, number of instructions, number of instruction types, number of jump instructions, number of calls, number of XOR operations, stack size, number of basic blocks, and whether a loop exists.
4. The method of claim 3, wherein converting the common features of the fragile hash function to numerical values comprises:
analyzing each function in the compiled executable, disassemblable binary files, analyzing each instruction of each basic block of each function, computing the value of each feature by numerical accumulation, and labeling hash functions as positive samples and other functions as negative samples.
5. The method of claim 4, wherein training and testing on the generated feature data to construct a reliable neural network model based on logistic regression comprises:
training and testing on the feature data using a classifier of the Python machine learning library sklearn with the logistic regression method, to construct a neural network model for subsequent structured matching;
evaluating the mathematical model by accuracy, recall and the combined evaluation index F-measure, and saving a reliable mathematical model.
6. The method of claim 5, wherein identifying and locating the fragile hash function of the firmware based on a structured matching method comprises:
performing function association based on the structured matching method between the executable, disassemblable binary file of the firmware to be analyzed and the reliable neural network model based on logistic regression, so as to identify and locate the fragile hash function of the firmware and obtain the function name and function entry address of the fragile hash function.
7. The method of claim 6, wherein structurally dividing and extracting the code of the fragile hash function comprises:
loading the target binary file with the angr sub-module cle, extracting the control flow graph (CFG) inside the identified and located fragile function, splitting the fragile function into three basic modules, namely an initialization block, a loop block and a tail block, based on a depth-first algorithm and a topological sorting algorithm, and obtaining the entry address and jump address of each basic block.
8. The method of claim 7, wherein converting the machine code or assembly code of each divided basic block into intermediate language VEX IR statements comprises:
converting the machine code or assembly code of each module into the corresponding intermediate language VEX IR statements based on the angr sub-module pyvex.
9. The method of claim 8, wherein constructing a Z3 SMT constraint solving expression based on symbolic execution and adding solving constraints comprises:
converting variable values into symbolic values based on symbolic execution, constructing a Z3 expression, and determining the number of loop iterations in the function;
converting the program analysis problem into a constraint solving problem and adding solving constraints so that the solver works toward the target, which narrows the search range compared with brute-force cracking and speeds up cracking of the function.
10. The method of claim 9, wherein solving for the collision value and verifying whether the collision value is correct comprises:
comparing the output of the fragile hash function for the original input data with its output for the collision value to judge whether the fragile hash function of the firmware has been cracked, wherein if the two outputs are equal the cracking has succeeded, and if they are not equal the cracking has failed.
CN201811406960.2A 2018-11-23 2018-11-23 Method for identifying and cracking fragile hash function of intelligent device firmware Active CN109740347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811406960.2A CN109740347B (en) 2018-11-23 2018-11-23 Method for identifying and cracking fragile hash function of intelligent device firmware

Publications (2)

Publication Number Publication Date
CN109740347A CN109740347A (en) 2019-05-10
CN109740347B true CN109740347B (en) 2020-07-10

Family

ID=66358138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811406960.2A Active CN109740347B (en) 2018-11-23 2018-11-23 Method for identifying and cracking fragile hash function of intelligent device firmware

Country Status (1)

Country Link
CN (1) CN109740347B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant