CN114021146B

CN114021146B - Unstructured difference patch analysis method based on value set analysis

Info

Publication number: CN114021146B
Application number: CN202111348410.1A
Authority: CN
Inventors: 常瑞; 林键; 戴勤明
Original assignee: Hangzhou Rongshu Network Security Technology Co ltd
Current assignee: Hangzhou Rongshu Network Security Technology Co ltd
Priority date: 2021-11-15
Filing date: 2021-11-15
Publication date: 2022-07-05
Anticipated expiration: 2041-11-15
Also published as: CN114021146A

Abstract

The invention discloses an unstructured difference patch analysis method based on value set analysis. The method uses value set analysis to search for unstructured difference patches, which solves the problem that such patches cannot be found by traditional structured patch comparison technology. For functions that a structured patch comparison technology reports as matched and unchanged, the method uses value set analysis to recover the function stack frame and to extract the constant values in function parameters and conditional jump instructions. It then applies a stack frame matching algorithm to the recovered stack frames and a constant matching algorithm to the extracted constant values to find unstructured difference patches, thereby providing a practical and effective method for detecting them.

Description

Unstructured difference patch analysis method based on value set analysis

Technical Field

The invention relates to a patch comparison technology, in particular to an unstructured difference patch analysis method based on value set analysis.

Background

A 1day bug is a bug that has already been disclosed. Software vendors typically do not disclose detailed information about a 1day bug but instead repair it through a security patch; when a user does not apply the patch in a timely manner, the threat of the 1day bug persists. Patch comparison technology can effectively discover 1day bugs in a program by discovering the patches in the program. Structured patch comparison, currently the most mainstream patch comparison technology, performs well, but it still misses patches. One of the most important problems is that unstructured difference patches cannot be found; for example, there are 247 patches in the 124 programs of the CGC test set, of which 24 are unstructured difference patches, accounting for 9.72% of the total. The effect of an unstructured difference patch on a program is not structural and therefore cannot be captured by structured patch comparison techniques. To detect such patches, BinHunt and iBinHunt perform symbolic execution on the basic blocks of a function on the basis of function matching, but this approach relies on heavyweight symbolic execution, which results in low efficiency: for example, patch comparison takes 3 hours for the gzip program and 6 hours for the thttpd program.

Disclosure of Invention

The invention aims to provide an unstructured difference patch analysis method based on value set analysis, aiming at the defects of the prior art.

The purpose of the invention is realized by the following technical scheme: an unstructured difference patch analysis method based on value set analysis comprises the following steps:

step one: performing function identification on the original file and the patch file, identifying each function's starting address and size, and generating two IDA Pro scripts;

step two: obtaining a matched function pair by applying a structured patch comparison technology to the two IDA Pro scripts, wherein the matched function pair comprises a first function and a second function, the first function corresponds to a function in the original file, and the second function corresponds to a function in the patch file; then extracting a first basic block from the first function and a second basic block from the second function;

step three: normalizing the first basic block and the second basic block extracted in the step two, obtaining a normalized first basic block pair from the first basic block, and obtaining a normalized second basic block pair from the second basic block;

step four: performing value set analysis on the function, extracting all operations on stack memory, and expressing each of them as a triple [ins₁, offset₁, size₁], wherein ins₁ is the instruction address of the stack memory operation, offset₁ is the offset of the stack memory relative to the base address of the stack frame (offset₁ is a negative number), and size₁ is the number of bytes read from or written to the stack memory; meanwhile, the value set analysis yields, at each normalized basic block of the function, the value ranges s of all variables in the basic block;

step five: performing mapping analysis from stack memory to instructions on the stack memory operations to obtain a mapping table from the offset of the stack memory relative to the stack frame base address to an instruction list (offset → ins_list); then performing cluster analysis on this mapping table, namely classifying the offsets that have the same instruction list into one variable, so as to obtain a new mapping table from instruction list to variable (ins_list → variable), which constitutes the stack frame, wherein ins_list is the instruction list of all stack memory operations on the same offset, and variable is the set of offsets that have the same instruction list; a first stack frame is thereby obtained from the first function and a second stack frame from the second function;

step six: judging whether the unstructured difference patch exists by any one of the following methods:

method one: analyzing the first stack frame and the second stack frame obtained in step five by using a stack frame matching algorithm to obtain matched variable pairs, and then comparing the sizes of each pair of matched variables, wherein if the sizes of a pair of variables are not equal, the stack frames of the two functions do not match, which means that an unstructured difference patch exists in the matched function pair obtained from the original file and the patch file;

method two: analyzing the normalized first basic block pair and second basic block pair of the first function and the second function by using a constant matching algorithm to obtain matched constant pairs, wherein when the values of a matched constant pair are found to be unequal, an unstructured difference patch exists in the matched function pair obtained from the original file and the patch file.

Further, the normalization in step three is specifically: searching for a function call instruction in the basic block; if a function call instruction exists in the basic block and is not the last instruction of the basic block, cutting the basic block at the function call instruction to obtain two new basic blocks, the first of which ends with the function call instruction; then normalizing the other basic block in the same way until no further cutting is possible.

Further, the mapping analysis from stack memory to instructions in step five is specifically: enumerating all operations [ins₁, offset₁, size₁] on stack memory and adding ins₁ to mem_ins_map[offset₁], mem_ins_map[offset₁+1], …, mem_ins_map[offset₁+size₁-1]; in the initial state, mem_ins_map is empty; meanwhile, the minimum value of the offset of the stack memory relative to the stack frame base address is recorded during the analysis and denoted min_offset.

Further, the cluster analysis in step five is specifically: traversing the offset of the stack memory relative to the stack frame base address from the minimum value of the offset up to 0 (not including 0); for each offset offset₁, obtaining from the offset-to-instruction-list mapping the instruction list of all stack memory operations on offset₁, and, in the mapping from instruction lists to variables, adding offset₁ to the variable range corresponding to that instruction list; initially, the range of every variable is empty.

Further, the stack frame matching algorithm in step six is specifically: traversing all elements of stack frame 1 and stack frame 2 and obtaining the instruction list and variable pairs therein, denoted (ins_list₁, variable₁) and (ins_list₂, variable₂) respectively, wherein (ins_list₁, variable₁) is an instruction list and variable pair in stack frame 1 and (ins_list₂, variable₂) is an instruction list and variable pair in stack frame 2; if ins_list₁ and ins_list₂ are equal, variable₁ and variable₂ are matched; otherwise they are not matched.

Further, the constant matching algorithm in step six is specifically: traversing each basic block in the function and first checking the type of its exit instruction; when the exit instruction is a function call instruction or a conditional branch instruction, further analysis is performed, and otherwise the basic block is skipped; when the exit instruction is a function call instruction, all parameters of the called function are obtained according to the function calling convention, and a constant judgment algorithm is then used to judge, one by one, whether each parameter is a constant of non-pointer type; when the exit instruction is a conditional branch instruction, value set analysis is performed on the basic block to obtain a path predicate, and the path predicate is then converted into standard form to obtain the constant in the path predicate.

Further, the constant judgment algorithm in step six is specifically: obtaining the value range s of a variable in the basic block from the analysis result and judging whether s is equal to 0; if so, the variable is a constant, and otherwise it is not; if the variable is a constant, further judging whether it is of pointer type by attempting to read the memory content at the address given by the constant: if the corresponding memory can be read successfully, the constant is of pointer type, and otherwise it is a constant of non-pointer type.

Further, the standard form conversion in step six is specifically: to convert the path predicate into standard form, all symbolic values in the path predicate are moved to the left side and the constant value is moved to the right side; as for operators, every operator other than = and ≠ is converted into ≤; specifically, when the operator is > or ≥, both sides are multiplied by -1, which converts the operator into < or ≤ respectively; then, for <, 1 is subtracted from the constant value on the right side, which converts the operator into ≤; finally, the constant on the right side of the path predicate is the constant in the path predicate.

The invention has the following beneficial effects: unstructured difference patches are searched for by using value set analysis, which solves the problem that they cannot be found by traditional structured patch comparison technology. With the method, the false negative rate (the proportion of missed patches) of the improved patch comparison technology on the CGC test set is reduced from 11.02% to 1.63%. In addition, 20 unstructured difference patches were successfully found in the HTTP service program of a real device, the Netgear R6400 router, meaning that new 1day bugs can be found, which demonstrates the effectiveness of the method in a practical application scenario.

Drawings

FIG. 1 is a flow chart of a method for analyzing unstructured difference patches based on value set analysis.

Detailed Description

The present invention will be described in detail below with reference to the accompanying drawings and preferred embodiments, and the objects and effects of the present invention will become more apparent, it being understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention.

As shown in fig. 1, an unstructured difference patch analysis method based on value set analysis of the present invention includes the following steps:

Step one: performing function identification on the original file and the patch file, identifying each function's starting address and size, and then generating two IDA Pro scripts, denoted origin.idb and patch.idb respectively.

Wherein the original file and the patch file are binary programs;

In one implementation, function identification uses the function identification technique based on structural control flow graph analysis proposed by Nucleus.

Step two: obtaining a matched function pair from origin.idb and patch.idb by using a structured patch comparison technology, wherein the matched function pair comprises a first function and a second function, the first function corresponds to a function in the original file, and the second function corresponds to a function in the patch file; then extracting a first basic block from the first function and a second basic block from the second function;

In one embodiment, the BinDiff tool is used as the structured patch comparison technology.

The normalization is specifically as follows: searching for a function call instruction in the basic block; if a function call instruction exists in the basic block and is not the last instruction of the basic block, cutting the basic block at the function call instruction to obtain two new basic blocks, the first of which ends with the function call instruction; then normalizing the other basic block in the same way until no further cutting is possible;

A normalized basic block contains no function call instruction other than its last instruction and is therefore suitable for the subsequent value set analysis.
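The cutting rule above can be sketched in a few lines of Python. This is only an illustrative sketch: instructions are modelled as plain strings, and the helper names `normalize_block` and `is_call` are hypothetical, not part of the described method or of any disassembler API.

```python
from typing import Callable, List


def is_call(ins: str) -> bool:
    # Assumption: an instruction is a text line whose first token is the mnemonic.
    return ins.split()[0] == "call"


def normalize_block(block: List[str], is_call_ins: Callable[[str], bool]) -> List[List[str]]:
    """Cut a basic block after every call so that a call can only end a piece."""
    pieces: List[List[str]] = []
    current: List[str] = []
    for ins in block:
        current.append(ins)
        if is_call_ins(ins):
            # The call closes the current piece; the rest is normalized in turn.
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces


block = ["mov edi, 8", "call malloc", "mov [rbp-8], rax", "call init", "ret"]
normalized = normalize_block(block, is_call)
```

In this example the five-instruction block is cut into three pieces, the first two ending in a call and the last holding the remaining `ret`.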

Step four: performing value set analysis on the function, extracting all operations on stack memory, and expressing each of them as a triple [ins₁, offset₁, size₁], wherein ins₁ is the instruction address of the stack memory operation, offset₁ is the offset of the stack memory relative to the base address of the stack frame (offset₁ is a negative number), and size₁ is the number of bytes read from or written to the stack memory; meanwhile, the value set analysis yields, at each normalized basic block of the function, the value ranges s of all variables in the basic block;

Step five: performing mapping analysis from stack memory to instructions on the stack memory operations to obtain a mapping table (mem_ins_map) from the offset of the stack memory relative to the stack frame base address to an instruction list (offset → ins_list); then performing cluster analysis on mem_ins_map, namely classifying the offsets (offset) that have the same instruction list (ins_list) into one variable (variable), so as to obtain a new mapping table from instruction list to variable (ins_list → variable), which constitutes the stack frame, wherein ins_list is the instruction list of all stack memory operations on the same offset, and variable is the set of offsets that have the same instruction list; a first stack frame is thereby obtained from the first function and a second stack frame from the second function;

The mapping analysis from stack memory to instructions is specifically: enumerating all operations [ins₁, offset₁, size₁] on stack memory and adding ins₁ to mem_ins_map[offset₁], mem_ins_map[offset₁+1], …, mem_ins_map[offset₁+size₁-1]; in the initial state, mem_ins_map is empty; the minimum value of the offset of the stack memory relative to the stack frame base address is recorded during the analysis and denoted min_offset;
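A minimal Python sketch of this mapping analysis follows. It assumes the stack operations are already available as (ins, offset, size) tuples; the function name `build_mem_ins_map` and the sample addresses are hypothetical, and storing each instruction list as a set is a simplification chosen here, not something the description prescribes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

StackOp = Tuple[int, int, int]  # (ins, offset, size)


def build_mem_ins_map(stack_ops: Iterable[StackOp]) -> Tuple[Dict[int, Set[int]], int]:
    """Map every touched stack byte offset to the instructions that access it."""
    mem_ins_map: Dict[int, Set[int]] = defaultdict(set)  # initially empty
    min_offset = 0  # offsets are negative, so 0 is a safe starting value
    for ins, offset, size in stack_ops:
        # Record ins for offset, offset+1, ..., offset+size-1.
        for byte_offset in range(offset, offset + size):
            mem_ins_map[byte_offset].add(ins)
        min_offset = min(min_offset, offset)
    return dict(mem_ins_map), min_offset


stack_ops = [
    (0x401000, -16, 8),  # 8-byte write at [base-16]
    (0x401008, -8, 4),   # 4-byte write at [base-8]
    (0x401010, -16, 8),  # 8-byte read at [base-16]
]
mem_ins_map, min_offset = build_mem_ins_map(stack_ops)
```

Here the eight bytes at offsets -16 to -9 share the instruction set {0x401000, 0x401010}, while the four bytes at -8 to -5 are touched only by 0x401008.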

The cluster analysis is specifically: traversing the offset of the stack memory relative to the stack frame base address from the minimum value of the offset (min_offset) up to 0 (not including 0); for each offset offset₁, obtaining from the offset-to-instruction-list mapping the instruction list of all stack memory operations on offset₁, and, in the mapping from instruction lists to variables, adding offset₁ to the variable range corresponding to that instruction list; initially, the range of every variable is empty.
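The clustering step can be sketched as follows, again as an illustration under stated assumptions rather than the patented implementation. The function name `cluster_offsets` is hypothetical, instruction lists are canonicalised as sorted tuples so they can serve as dictionary keys, and bytes that no instruction touches are skipped, which the description does not specify.

```python
from typing import Dict, List, Set, Tuple


def cluster_offsets(mem_ins_map: Dict[int, Set[int]], min_offset: int) -> Dict[Tuple[int, ...], List[int]]:
    """Group offsets with identical instruction lists into one variable."""
    stack_frame: Dict[Tuple[int, ...], List[int]] = {}  # ins_list -> variable
    for offset in range(min_offset, 0):  # min_offset up to, but excluding, 0
        ins_set = mem_ins_map.get(offset)
        if not ins_set:
            continue  # assumption: untouched bytes belong to no variable
        ins_list = tuple(sorted(ins_set))
        # Extend the variable range for this instruction list by the offset.
        stack_frame.setdefault(ins_list, []).append(offset)
    return stack_frame


mem_ins_map = {off: {0x401000, 0x401010} for off in range(-16, -8)}
mem_ins_map.update({off: {0x401008} for off in range(-8, -4)})
stack_frame = cluster_offsets(mem_ins_map, -16)
```

The size of a recovered variable is then simply the number of offsets in its range, 8 and 4 bytes in this example.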

The stack frame matching algorithm is specifically: traversing all elements of stack frame 1 and stack frame 2 and obtaining the instruction list and variable pairs therein, denoted (ins_list₁, variable₁) and (ins_list₂, variable₂) respectively, wherein (ins_list₁, variable₁) is an instruction list and variable pair in stack frame 1 and (ins_list₂, variable₂) is an instruction list and variable pair in stack frame 2; if ins_list₁ and ins_list₂ are equal, variable₁ and variable₂ are matched; otherwise they are not matched;
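The matching rule, followed by the size comparison of matched variables, might look like the sketch below. One assumption needs stating loudly: for instruction lists from two different binaries to compare equal, instructions must be identified by something stable across both files. The sketch uses the instruction's index within its function for that purpose; this identifier scheme, like the names `match_stack_frames` and `size_mismatches`, is hypothetical and not given in the description.

```python
from typing import Dict, List, Tuple

StackFrame = Dict[Tuple[int, ...], List[int]]  # ins_list -> variable (offsets)


def match_stack_frames(frame1: StackFrame, frame2: StackFrame):
    """Pair variables whose instruction lists are equal in both frames."""
    return [
        (ins_list, var1, frame2[ins_list])
        for ins_list, var1 in frame1.items()
        if ins_list in frame2
    ]


def size_mismatches(frame1: StackFrame, frame2: StackFrame):
    """Return matched variables whose sizes (in bytes) differ."""
    return [
        (ins_list, len(var1), len(var2))
        for ins_list, var1, var2 in match_stack_frames(frame1, frame2)
        if len(var1) != len(var2)
    ]


# Instructions are identified by their index in the function (assumption).
frame_orig = {(0, 5): list(range(-72, -8)), (2,): list(range(-8, -4))}
frame_patch = {(0, 5): list(range(-136, -8)), (2,): list(range(-8, -4))}
mismatches = size_mismatches(frame_orig, frame_patch)
```

In this made-up example a buffer grows from 64 to 128 bytes while the code structure stays the same, which is the kind of change the stack frame comparison is meant to flag.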

Method two: analyzing the normalized first basic block pair and second basic block pair of the first function and the second function by using a constant matching algorithm to obtain matched constant pairs, wherein when the values of a matched constant pair are found to be unequal, an unstructured difference patch exists in the matched function pair obtained from the original file and the patch file;

the constant matching algorithm specifically comprises: traversing each basic block in the function, firstly checking the exit instruction type of the basic block, performing further analysis when the exit instruction type is a function call instruction or a conditional branch instruction, and otherwise skipping the analysis of the basic block; when the exit instruction type is a function call instruction, acquiring all parameters of a function according to a function call convention, and then judging whether the parameters are constants of a non-pointer type one by using a constant judgment algorithm; when the exit instruction type is a conditional branch instruction, carrying out value set analysis on the basic block to obtain a path predicate, and then carrying out standard form conversion on the path predicate to obtain a constant in the path predicate;

The constant judgment algorithm is specifically: obtaining the value range s of a variable in the basic block from the analysis result and judging whether s is equal to 0; if so, the variable is a constant, and otherwise it is not; if the variable is a constant, further judging whether it is of pointer type by attempting to read the memory content at the address given by the constant: if the corresponding memory can be read successfully, the constant is of pointer type, and otherwise it is a constant of non-pointer type;
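The two checks can be illustrated with the sketch below. It rests on assumptions not stated in the text: the value range is modelled as a strided interval `stride[low, high]`, as is common in value set analysis, and "s equal to 0" is read as a zero stride with `low == high`. The pointer test is approximated by checking the value against a list of mapped address ranges. All names (`ValueRange`, `is_mapped`, `non_pointer_constant`) and the sample segment are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class ValueRange(NamedTuple):
    stride: int
    low: int
    high: int


def is_mapped(address: int, segments: Sequence[Tuple[int, int]]) -> bool:
    """True if the address falls inside a readable [start, end) segment."""
    return any(start <= address < end for start, end in segments)


def non_pointer_constant(vr: ValueRange, segments: Sequence[Tuple[int, int]]) -> Optional[int]:
    """Return the value if vr is a non-pointer constant, else None."""
    if vr.stride != 0 or vr.low != vr.high:
        return None  # more than one possible value: not a constant
    if is_mapped(vr.low, segments):
        return None  # memory at this address is readable: treat as a pointer
    return vr.low


segments = [(0x400000, 0x401000)]  # hypothetical mapped region of the binary
size_arg = non_pointer_constant(ValueRange(0, 64, 64), segments)
string_arg = non_pointer_constant(ValueRange(0, 0x400123, 0x400123), segments)
loop_var = non_pointer_constant(ValueRange(4, 0, 16), segments)
```

Only the first argument survives as a comparable constant; the second is discarded as a pointer and the third because it can take several values.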

The standard form conversion is specifically: to convert the path predicate into standard form, all symbolic values in the path predicate are moved to the left side and the constant value is moved to the right side; as for operators, every operator other than = and ≠ is converted into ≤; specifically, when the operator is > or ≥, both sides are multiplied by -1, which converts the operator into < or ≤ respectively; then, for <, 1 is subtracted from the constant value on the right side, which converts the operator into ≤; finally, the constant on the right side of the path predicate is the constant in the path predicate.
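A small Python sketch of the operator rewriting follows. It assumes integer arithmetic and a predicate that already has its symbolic part on the left and a single constant on the right; the symbolic part is tracked only through a `negated` flag. The function name `to_standard_form` and the tuple representation are hypothetical.

```python
from typing import Tuple


def to_standard_form(op: str, constant: int, negated: bool = False) -> Tuple[bool, str, int]:
    """Rewrite `lhs op constant` so that op is one of ==, != or <=.

    Returns (negated, op, constant), where negated tells whether the
    left-hand side was multiplied by -1 during the rewrite.
    """
    if op not in ("==", "!=", "<", "<=", ">", ">="):
        raise ValueError(f"unsupported operator: {op}")
    if op in (">", ">="):
        # Multiply both sides by -1: > becomes <, >= becomes <=.
        negated = not negated
        constant = -constant
        op = "<" if op == ">" else "<="
    if op == "<":
        # For integers, lhs < c is equivalent to lhs <= c - 1.
        constant -= 1
        op = "<="
    return negated, op, constant


gt = to_standard_form(">", 5)    # x > 5   becomes  -x <= -6
lt = to_standard_form("<", 10)   # x < 10  becomes   x <= 9
ge = to_standard_form(">=", 5)   # x >= 5  becomes  -x <= -5
eq = to_standard_form("==", 3)   # x == 3  is left unchanged
```

Because every inequality ends up in the same `≤` shape, the right-hand constants of two matched predicates can be compared directly.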

The following lists the results on the CGC test set of the prototype tool EnBinDiff, which uses the unstructured difference patch analysis method based on value set analysis, with the state-of-the-art patch comparison tools BinDiff and Diaphora selected as controls. Finally, the ability of EnBinDiff to discover unstructured difference patches in real devices is explored. The invention is described in further detail through these experiments.

The CGC test set contains original programs and their patches, which run in a custom operating system named DECREE. There are 131 services in the test set, but 5 of them involve communication between multiple binaries, so only the 126 single-binary services are considered. According to the prefixes of the binary program names, these binary programs can be classified into 4 types: CROMU, KPRCA, NRFIN and YAN01. During the experiments it was found that the binary files NRFIN_00026 and NRFIN_00032 contain a large number of unstructured difference patches; for example, NRFIN_00026 contains 1004 patches, of which 1003 are unstructured difference patches not found by BinDiff. For fairness, these two programs are not included in the final test results. The final results show that, of a total of 245 patches, BinDiff detects 218 and misses 27, Diaphora detects 212 and misses 33, while the prototype tool EnBinDiff of the present invention detects 241 and misses only 4. The rate of missed patches is thus reduced by 9.39 percentage points compared with BinDiff and by 11.84 percentage points compared with Diaphora.
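The reported reductions follow directly from the missed-patch counts above, as the short calculation below shows. The variable names are illustrative only; the numbers are taken from the preceding paragraph.

```python
total_patches = 245
missed = {"BinDiff": 27, "Diaphora": 33, "EnBinDiff": 4}

# Missed-patch rate of each tool, in percent.
miss_rate = {tool: round(100 * n / total_patches, 2) for tool, n in missed.items()}

# Reduction achieved by EnBinDiff, in percentage points.
reduction_vs_bindiff = round(100 * (missed["BinDiff"] - missed["EnBinDiff"]) / total_patches, 2)
reduction_vs_diaphora = round(100 * (missed["Diaphora"] - missed["EnBinDiff"]) / total_patches, 2)
```

The rates come out as 11.02% for BinDiff, 13.47% for Diaphora and 1.63% for EnBinDiff, giving the reductions of 9.39 and 11.84 percentage points.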

In terms of performance, the average time overhead per program is 8.12 seconds for BinDiff, 16.46 seconds for Diaphora, and 14.03 seconds for EnBinDiff; the additional 5.91 seconds relative to BinDiff is within an acceptable range.

Meanwhile, EnBinDiff can discover new 1day bugs in a real device. In an experiment, the HTTP service program of the Netgear R6400 router is used as the test set to verify the ability of EnBinDiff to find 1day bugs in real software. Specifically, every two adjacent versions among 18 versions of the HTTP service program are used as input for patch detection. In the resulting 17 comparisons, unstructured difference patches are found in 4 comparisons, 20 unstructured difference patches in total, and analysis confirms that all 20 repair 1day bugs.

It will be understood by those skilled in the art that the foregoing is only a single example of the invention and is not intended to limit the invention, which has been described in detail with reference to the foregoing examples, but it will be apparent to those skilled in the art that various changes in the form and details of the invention may be made and equivalents may be substituted for elements thereof. All modifications, equivalents and the like which come within the spirit and principle of the invention are intended to be included within the scope of the invention.

Claims

1. An unstructured difference patch analysis method based on value set analysis is characterized by comprising the following steps:

step four: performing value set analysis on the function, extracting all operations on stack memory, and expressing each of them as a triple [ins₁, offset₁, size₁], wherein ins₁ is the instruction address of the stack memory operation, offset₁ is the offset of the stack memory relative to the base address of the stack frame (offset₁ is a negative number), and size₁ is the number of bytes read from or written to the stack memory; meanwhile, the value set analysis yields, at each normalized basic block of the function, the value ranges s of all variables in the basic block;

2. The method for analyzing unstructured difference patches based on value set analysis as claimed in claim 1, wherein the normalization in step three is specifically: and searching a function call instruction in the basic block, if the function call instruction exists in the basic block and is not the last instruction in the basic block, cutting the basic block by taking the function call instruction as a boundary to obtain two new basic blocks, wherein the last instruction of one basic block is the function call instruction, and then standardizing the other basic block until the cutting cannot be continued.

3. The method for analyzing unstructured difference patches based on value set analysis as claimed in claim 1, wherein the mapping analysis from stack memory to instructions in step five is specifically: enumerating all operations [ins₁, offset₁, size₁] on stack memory and adding ins₁ to mem_ins_map[offset₁], mem_ins_map[offset₁+1], …, mem_ins_map[offset₁+size₁-1]; in the initial state, mem_ins_map is empty; the minimum value of the offset of the stack memory relative to the base address of the stack frame is recorded during the analysis and denoted min_offset.

4. The method as claimed in claim 1, wherein the cluster analysis in step five is specifically: traversing the offset of the stack memory relative to the stack frame base address from the minimum value of the offset up to 0 (not including 0); for each offset offset₁, obtaining from the offset-to-instruction-list mapping the instruction list of all stack memory operations on offset₁, and, in the mapping from instruction lists to variables, adding offset₁ to the variable range corresponding to that instruction list; initially, the range of every variable is empty.

5. The unstructured difference patch analysis method based on value set analysis as claimed in claim 1, wherein the stack frame matching algorithm in step six is specifically: traversing all elements of stack frame 1 and stack frame 2 and obtaining the instruction list and variable pairs therein, denoted (ins_list₁, variable₁) and (ins_list₂, variable₂) respectively, wherein (ins_list₁, variable₁) is an instruction list and variable pair in stack frame 1 and (ins_list₂, variable₂) is an instruction list and variable pair in stack frame 2; if ins_list₁ and ins_list₂ are equal, variable₁ and variable₂ are matched; otherwise they are not matched.

6. The method for analyzing unstructured difference patches based on value set analysis as claimed in claim 1, wherein the constant matching algorithm in step six specifically is: traversing each basic block in the function, firstly checking the exit instruction type of the basic block, performing further analysis when the exit instruction type is a function call instruction or a conditional branch instruction, and otherwise, skipping the analysis of the basic block; when the exit instruction type is a function call instruction, acquiring all parameters of a function according to a function call convention, and then judging whether the parameters are constants of a non-pointer type one by using a constant judgment algorithm; and when the exit instruction type is a conditional branch instruction, performing value set analysis on the basic block to obtain a path predicate, and then performing standard form conversion on the path predicate to obtain a constant in the path predicate.

7. The method for analyzing unstructured difference patches based on value set analysis as claimed in claim 1, wherein the constant judgment algorithm in step six is specifically: obtaining a variable value range s in the basic block according to the analysis result, then judging whether s is equal to 0, if so, indicating that the variable is a constant, otherwise, not determining that the variable is a constant; if the constant is a constant, further judging whether the constant is a pointer type, judging by trying to acquire the content of the address in the memory, if the memory corresponding to the constant can be successfully acquired, indicating that the constant is the pointer type, otherwise, indicating that the constant is a constant which is not the pointer type.

8. The method for analyzing unstructured difference patches based on value set analysis as claimed in claim 6, wherein the standard form conversion in step six is specifically: to convert the path predicate into standard form, all symbolic values in the path predicate are moved to the left side and the constant value is moved to the right side; as for operators, every operator other than = and ≠ is converted into ≤; specifically, when the operator is > or ≥, both sides are multiplied by -1, which converts the operator into < or ≤ respectively; then, for <, 1 is subtracted from the constant value on the right side, which converts the operator into ≤; finally, the constant on the right side of the path predicate is the constant in the path predicate.