CN113553041B - Method, apparatus and medium for generating function code formalized structure in binary program
- Publication number: CN113553041B (application number CN202111108278.7A)
- Authority: CN (China)
- Prior art keywords: function; function code; code; information; binary program
- Legal status: Active (as listed by Google Patents; the legal status is an assumption and is not a legal conclusion)
Classifications
- G06F8/30: Creation or generation of source code (G Physics > G06 Computing; calculating or counting > G06F Electric digital data processing > G06F8/00 Arrangements for software engineering)
- G06F8/75: Structural analysis for program understanding (G Physics > G06 Computing; calculating or counting > G06F Electric digital data processing > G06F8/00 Arrangements for software engineering > G06F8/70 Software maintenance or management)
Abstract
The invention provides a method, a device and a medium for generating the formalized structure of function code in a binary program. The method takes the function code in a binary program as its basic unit of analysis. It classifies the binary program's code by functional attribute, partitions the address space of the binary code, and generates function code sets for the various functional attributes. It then builds a classification function information table and a function distribution table that describe the attributes of the function code. Next, it constructs a set of machine-instruction operand types and formalizes the operands in the function code, which generates the formalized structure sets of the various function codes. Finally, it builds a function code formalized structure matrix from the classification function information table, the function distribution table and the formalized structures. This matrix supports analysis of the formalized structure of all function code in the binary program at both the whole-program level and the function level. The method and device thus enable effective analysis of the function code structure in binary programs and provide practical support for accurately detecting the functional attributes of a binary program.
Description
Technical Field
The present invention relates to the field of information security, and in particular, to a method, an apparatus, and a medium for generating a function code formalized structure in a binary program.
Background
Cloud computing platforms, the Internet of Things, mobile networks and the industrial Internet have all developed rapidly in recent years. Binary programs are an important component of these systems, so their security, reliability and trustworthiness matter more and more. As information security technology has advanced, countermeasure techniques have advanced just as quickly. The kinds of harmful techniques that endanger network and system security keep multiplying, and the techniques used in practice keep evolving. A binary program consists of machine instructions, so existing methods have difficulty analyzing its code structure effectively. As a result, they cannot effectively counter harmful techniques in the information security field, and the security threats facing systems and platforms of all kinds are growing more serious. Against this background, binary program analysis has become a prominent and difficult problem in international software security research.
In summary, there is no generally applicable method for effectively analyzing the binary program code structure.
Disclosure of Invention
In view of this, the present invention provides a method, an apparatus and a medium for generating a function code formalized structure in a binary program. They address the difficulty of effectively analyzing the code structure of binary programs.
The technical scheme of the invention is realized as follows:
In a first aspect, the invention discloses a method for generating a function code formalized structure in a binary program, comprising the following steps:
S1, perform structure analysis on the binary program and identify and measure each piece of function code. Obtain the address space information, the code structure information and the measurement information of the functional attributes of the function code. Construct function code sets for the different functional attributes and classify the function code. Then go to step S2;
S2, use the code structure information, the measurement information of the functional attributes and the classification information of the function code as basic data. Generate the feature information of the function code, and construct the classification function information tables and the function distribution table. Then go to step S3;
S3, classify the operands of machine instructions to obtain their formalized values. Replace the operands contained in the function code of each function code set with those formalized values to generate the formalized structure of the function code. Then go to step S4;
S4, from the function distribution table and the formalized structures of the function code, construct a matrix representation of the formalized structure of the function code in the binary program.
With this method, the invention achieves effective analysis of the code structure of a binary program at two levels: globally, over the whole code, and locally, over individual function code.
On the basis of the above technical solution, preferably, step S1 specifically includes:
S1-1, from the file structure description information contained in the binary program, extract the valid data that describes the binary program and its code structure;
S1-2, based on the valid data, parse the address space information of the binary code using a program structure parsing method. The address space information describes the storage structure of the code, such as the start address, size and entry point of the binary code;
S1-3, based on the address space information, traverse the binary code address space in the binary program and identify each piece of function code. Obtain its code structure information, which describes the storage structure of the function code, such as its start address, size and end address;
S1-4, based on the address space information and the code structure information, obtain the functional attributes of the different function code in the binary program and construct function code sets with different functional attributes;
S1-5, classify the function code according to the sets $P_0 \sim P_n$ as follows. Each piece of function code in the binary program is placed in one of the sets $P_0 \sim P_n$. Function code in the same set is of the same type, and function code in different sets is of different types, which yields $n+1$ types of function code.
This method takes the function code in the binary program as the basic unit of analysis. By classifying the binary program's code by functional attribute and partitioning the address space of the binary code, it generates function code sets for the various functional attributes.
On the basis of the above technical solution, preferably, step S1-4 specifically includes:
S1-4-1, use the address space information and the specific information in the code structure information that describes the binary code structure. On that basis, measure the relationships between different function code and obtain the measurement information of the functional attributes of each piece of function code. This information describes the functional characteristics of the function code, such as its storage interval, similarity and metric values;
S1-4-2, mark the functional attribute of each piece of function code in the binary program according to that measurement information;
S1-4-3, according to the functional attributes, place each piece of function code in the binary program into the function code sets $P_0 \sim P_n$. These sets satisfy the following conditions:
for any function code set $P_i$, $0 \le i \le n$, there is a functional attribute $\xi_i$ such that every function code in $P_i$ has the functional attribute $\xi_i$;
for any function code $\omega$ in the binary code address space, there is a unique function code set $P_j$, $0 \le j \le n$, such that $\omega \in P_j$;
In this way, the method obtains the functional attributes of the function code in the binary program. It then partitions the function code in the binary program's address space by those attributes, which generates function code sets for the various functional attributes.
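The two S1-4-3 conditions say that the sets $P_0 \sim P_n$ partition the function code and that each set is homogeneous in its attribute. Below is a minimal C sketch of a checker for these conditions. The `FuncCode` record, its fields and `is_valid_partition` are hypothetical names introduced only for illustration. Functional attributes are assumed to be encoded as plain integers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one identified piece of function code. */
typedef struct {
    unsigned long start;  /* start address of the function code */
    int attr;             /* measured functional attribute (S1-4-2) */
    int set_index;        /* index j of the set P_j, or -1 if unassigned */
} FuncCode;

/* Checks the S1-4-3 conditions for the sets P_0..P_n:
   (1) every member of P_j carries the set's attribute set_attr[j] (xi_j);
   (2) every function code belongs to exactly one set.
   Each record stores a single set_index, so (2) reduces to
   "set_index lies in 0..n". */
bool is_valid_partition(const FuncCode *codes, size_t count,
                        const int *set_attr, int n)
{
    for (size_t k = 0; k < count; k++) {
        int j = codes[k].set_index;
        if (j < 0 || j > n)
            return false;             /* not in any P_j */
        if (codes[k].attr != set_attr[j])
            return false;             /* attribute differs from xi_j */
    }
    return true;
}
```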
On the basis of the above technical solution, preferably, step S2 specifically includes:
S2-1, use the code structure information, the measurement information of the functional attributes and the classification information of the function code as basic data to generate the feature information of the function code. This feature information comprises the start address, address interval, functional attribute, classification type and other data describing the function code;
S2-2, construct the classification function information tables $T_0 \sim T_n$. Put the feature information of each function code in the sets $P_0 \sim P_n$ into the tables $T_0 \sim T_n$ respectively as directory entries. Sort the entries of each table in ascending or descending order of the start address of the function code;
S2-3, construct the function distribution table $F$. Put the directory entries of the classification function information tables $T_0 \sim T_n$ into table $F$, and sort them in ascending or descending order of the start address of the function code.
With this method, the invention establishes the classification function information tables and the function distribution table that describe the attributes of the function code. The subsequent steps can then analyze the code structure of the binary program at both the whole-program level and the function level. They can also accurately describe characteristics of the function code such as its distribution pattern and interrelationships.
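Below is a minimal C sketch of S2-2 and S2-3. It assumes each directory entry stores the start address, end address, functional attribute and classification type as integers. `FeatureEntry`, `build_class_table` and `build_distribution_table` are hypothetical names. The sketch uses ascending order; descending order would only flip the comparator.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical directory entry with the S2-1 feature information. */
typedef struct {
    unsigned long start;   /* start address */
    unsigned long end;     /* end address; [start, end] is the interval */
    int attr;              /* functional attribute */
    int type;              /* classification type j, i.e. the set P_j */
} FeatureEntry;

/* qsort comparator: ascending order of start address. */
static int by_start(const void *a, const void *b)
{
    const FeatureEntry *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* S2-2: builds T_type by copying the entries of one class into out
   and sorting them by start address. Returns the entry count. */
size_t build_class_table(const FeatureEntry *all, size_t count,
                         int type, FeatureEntry *out)
{
    size_t k = 0;
    for (size_t i = 0; i < count; i++)
        if (all[i].type == type)
            out[k++] = all[i];
    qsort(out, k, sizeof *out, by_start);
    return k;
}

/* S2-3: builds F from T_0..T_n, passed here as one concatenated
   array; F is that array sorted by start address. */
void build_distribution_table(const FeatureEntry *tables, size_t count,
                              FeatureEntry *F)
{
    memcpy(F, tables, count * sizeof *F);
    qsort(F, count, sizeof *F, by_start);
}
```

$F$ is just the union of $T_0 \sim T_n$ reordered by start address. A k-way merge of the already sorted $T_i$ would work equally well; the sketch uses `qsort` over the concatenation only for brevity.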
On the basis of the above technical solution, preferably, step S3 specifically includes:
S3-1, classify the operands of machine instructions according to the functional attributes of the instructions, and construct the operand type set $D$;
S3-2, apply a uniform normalization to the function code sets $P_0 \sim P_n$ to generate the formalized structure sets $P_0' \sim P_n'$ of the function code. Establish a function code mapping table $W$ that describes the bijective relationship between them.
On the basis of the above technical solution, preferably, step S3-1 specifically includes:
Construct the operand type set $D$, which satisfies the following conditions:
the elements of $D$ are of single-byte character type, such as digits or letters;
for the operand $x$ of any machine instruction, there is a unique element $t \in D$ such that $t$ is the operand type of $x$;
the elements of $D$ are called the formalized values of the operands in the function code.
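The patent does not list the operand categories or the characters used as formalized values. The C sketch below assumes four hypothetical categories (register, immediate, memory and branch target) and uses illustrative characters. It shows only how each operand kind maps to exactly one element of $D$.

```c
#include <stddef.h>

/* Hypothetical operand categories; the source does not enumerate them. */
typedef enum {
    OPK_REGISTER,
    OPK_IMMEDIATE,
    OPK_MEMORY,
    OPK_BRANCH_TARGET,
    OPK_COUNT
} OperandKind;

/* Operand type set D: one single-byte character per category.
   The characters are illustrative only. */
static const char D[OPK_COUNT] = { 'R', 'I', 'M', 'J' };

/* Returns the formalized value t in D for an operand of the given kind.
   Each valid kind maps to exactly one element of D; '?' is returned
   only for values outside the enumeration. */
char formal_value(OperandKind kind)
{
    return ((unsigned)kind < (unsigned)OPK_COUNT) ? D[kind] : '?';
}
```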
On the basis of the above technical solution, preferably, step S3-2 specifically includes:
S3-2-1, based on the operand type set $D$, replace each operand contained in the function code of the sets $P_0 \sim P_n$ with its formalized value. This yields the formalized sets $P_0' \sim P_n'$ of the function code. Establish the function code mapping table $W$, mainly as follows:
S3-2-1-A, select each function code in each of the function code sets $P_0 \sim P_n$ in turn, and continue with step S3-2-1-B;
S3-2-1-B, for any function code $f \in P_i$, $0 \le i \le n$, replace each operand in $f$ with its formalized value to obtain the formalized structure $f'$ of $f$, so that the correspondence $f \leftrightarrow f'$ holds. Continue with step S3-2-1-C;
S3-2-1-C, put $f'$ into the set $P_i'$, and put the vector $(P_i, f, P_i', f')$ into $W$ as a directory entry;
S3-2-2, optimize the function code mapping table $W$ by sorting its directory entries in ascending or descending order of the start address of the function code;
S3-2-3, build a search index for table $W$. The index makes it quick to retrieve the directory entries that describe the bijective relationship between the formalized sets $P_0' \sim P_n'$ and the function code sets $P_0 \sim P_n$. These entries satisfy the following conditions:
for any function code $f \in P_i$, $0 \le i \le n$, table $W$ contains a unique directory entry $(P_i, f, P_i', f')$ describing the bijective relationship: there is a unique set $P_i'$ and a unique element $f' \in P_i'$ such that $f' \leftrightarrow f$;
for any element $f' \in P_i'$, $0 \le i \le n$, table $W$ contains a unique directory entry $(P_i, f, P_i', f')$ describing the bijective relationship: there is a unique function code set $P_i$ and a unique element $f \in P_i$ such that $f \leftrightarrow f'$.
By constructing the operand type set of machine instructions, the invention obtains the formalized structure of the function code. It generates the formalized structure sets of the various function codes and the function code mapping table $W$. Together these provide the basis for constructing the matrix representation of the function code formalized structure in the binary program.
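Steps S3-2-1 to S3-2-3 can be sketched in C under several assumptions:
- instructions are already decoded into a mnemonic plus a string of operand formalized values from $D$;
- $f'$ is stored as text such as `mov R,I;ret;`;
- each entry of $W$ is keyed by the start address of $f$.

`Insn`, `WEntry`, `formalize`, `sort_table` and `lookup` are hypothetical names. The search index is simply the sorted order of $W$ combined with the standard `bsearch`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoded instruction: mnemonic plus the formalized
   values of its operands, e.g. "RI" for (register, immediate). */
typedef struct {
    const char *mnemonic;
    const char *operand_types;
} Insn;

/* One directory entry (P_i, f, P_i', f') of W. The set index i names
   both P_i and P_i'; f is identified by its start address. */
typedef struct {
    int set;
    unsigned long start;
    char formal[128];        /* f' as text */
} WEntry;

/* S3-2-1-B: writes f' into out, keeping each mnemonic and replacing
   every operand by its formalized value ("mov R,I;ret;"). */
void formalize(const Insn *code, size_t len, char *out, size_t cap)
{
    size_t used = 0;
    if (cap == 0)
        return;
    out[0] = '\0';
    for (size_t k = 0; k < len && used < cap; k++) {
        const char *t = code[k].operand_types;
        char ops[16];
        size_t q = 0;
        for (size_t p = 0; t[p] != '\0' && q + 2 < sizeof ops; p++) {
            if (p > 0)
                ops[q++] = ',';
            ops[q++] = t[p];
        }
        ops[q] = '\0';
        int w = snprintf(out + used, cap - used, "%s%s%s;",
                         code[k].mnemonic, q > 0 ? " " : "", ops);
        if (w < 0)
            break;
        used += (size_t)w;   /* loop stops once output is truncated */
    }
}

static int w_cmp(const void *a, const void *b)
{
    const WEntry *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* S3-2-2: order W by the start address of f. */
void sort_table(WEntry *W, size_t n)
{
    qsort(W, n, sizeof *W, w_cmp);
}

/* S3-2-3: binary search on the sorted W; returns the unique entry for
   the function starting at start, or NULL if there is none. */
const WEntry *lookup(const WEntry *W, size_t n, unsigned long start)
{
    WEntry key = { .start = start };
    return bsearch(&key, W, n, sizeof *W, w_cmp);
}
```

Two different functions can formalize to identical text. For that reason the sketch keys each entry by the start address of $f$ rather than by the string $f'$, while each entry still pairs exactly one $f$ with one $f'$.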
On the basis of the above technical solution, preferably, step S4 specifically includes:
S4-1, suppose the function distribution table $F$ contains $m$ directory entries in total, $m \in \mathbb{N}^+$, $m \ge n$. Traverse these $m$ entries and extract the address interval and classification type of each function code in the binary program to form its feature array. For any function code $f_i \in P_j$, $1 \le i \le m$, $0 \le j \le n$, the feature array is $([ds_i, dd_i], t_j)$. Here $[ds_i, dd_i]$ and $t_j$ are the address interval and the classification type of $f_i$, $ds_i$ and $dd_i$ are its start and end addresses, and $ds_i, dd_i \in \mathbb{N}$, where $\mathbb{N}$ is the set of natural numbers;
S4-2, look up table $W$ and use the bijective relationship between the function code sets $P_0 \sim P_n$ and the formalized sets $P_0' \sim P_n'$. Take the feature array of each function code in $P_0 \sim P_n$ as the matrix coordinates of its formalized structure in $P_0' \sim P_n'$. From the elements of $P_0' \sim P_n'$, construct a matrix $A$ with $m$ rows and $n+1$ columns that satisfies the following conditions:
the elements of $A$ include all elements of the sets $P_0' \sim P_n'$, and apart from the empty element $\varnothing$ the elements of $A$ are pairwise distinct;
for any element $f_{ij}'$ of $A$ other than $\varnothing$, there is a unique set $P_j'$, $0 \le j \le n$, such that $f_{ij}' \in P_j'$;
take any two elements $f_{ij}'$ and $f_{kl}'$ of $A$ that correspond to function codes $f_{ij}$ and $f_{kl}$, that is, $f_{ij}' \leftrightarrow f_{ij}$ and $f_{kl}' \leftrightarrow f_{kl}$, with $1 \le k \le m$, $0 \le l \le n$. If the feature arrays of $f_{ij}$ and $f_{kl}$ are $([ds_i, dd_i], t_j)$ and $([ds_k, dd_k], t_l)$ respectively, the following holds:
if $t_j$ and $t_l$ are the same classification type, $f_{ij}'$ and $f_{kl}'$ lie in the same column of $A$; if $t_j$ and $t_l$ are different classification types, $f_{ij}'$ and $f_{kl}'$ lie in different columns of $A$.
Through this method, the invention establishes the function code formalized structure matrix $A$ of the binary program. Matrix $A$ is built from the function distribution table, the classification function information tables and the elements of the sets $P_0' \sim P_n'$. It works at two granularities, the whole code and the individual function code. At these two granularities it captures characteristics of the function code such as its structure, distribution pattern and interrelationships. It thereby describes the formalized structure of all function code in the binary program at both the whole-program level and the function level.
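S4-2 can be read as follows: row $i$ of $A$ corresponds to the $i$-th entry of $F$ (ascending start address), and column $t$ corresponds to classification type $t$. The only non-empty cell in row $i$ is then the one in the column of $f_i$'s type. This reading of "feature array as matrix coordinates" is an assumption. The C sketch below stores $A$ row-major and uses `NULL` for the empty element $\varnothing$. Its input is `Feature`, a hypothetical record that combines the feature array with a pointer to $f_i'$ taken from $W$. `build_matrix` is also a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical row of F joined with W: the feature array
   ([ds, dd], t) plus a pointer to the formalized structure f'. */
typedef struct {
    unsigned long ds, dd;   /* start and end address of f_i */
    int t;                  /* classification type = column 0..n */
    const char *formal;     /* f_i' looked up in W */
} Feature;

/* Fills the m x (n+1) matrix A, stored row-major in a[], so that cell
   (i, t_i) holds f_i' and every other cell is NULL (the empty element).
   Rows follow the order of F. Returns 0, or -1 if a type is outside 0..n. */
int build_matrix(const Feature *F, size_t m, int n, const char **a)
{
    size_t cols = (size_t)n + 1;
    for (size_t i = 0; i < m * cols; i++)
        a[i] = NULL;
    for (size_t i = 0; i < m; i++) {
        if (F[i].t < 0 || F[i].t > n)
            return -1;
        a[i * cols + (size_t)F[i].t] = F[i].formal;
    }
    return 0;
}
```

Each element is placed in the column of its own type, so the same-column and different-column conditions hold by construction. Each row receives exactly one element.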
In a second aspect, the invention discloses an electronic device comprising at least one processor, at least one memory, a communication interface and a bus. The processor, the memory and the communication interface communicate with one another over the bus. The memory stores a program, executable by the processor, of the method for generating a function code formalized structure in a binary program. This program is configured to implement the method according to the first aspect of the invention.
In a third aspect, the invention discloses a computer-readable storage medium. The medium stores a program of the method for generating a function code formalized structure in a binary program. When executed, this program implements the method according to the first aspect of the invention.
Compared with the prior art, the method, the system, the equipment and the medium for generating the function code formalized structure in the binary program have the following beneficial effects:
(1) The invention accurately describes, in matrix form, characteristics of the function code in a binary program such as its structure, distribution pattern and interrelationships. It can therefore analyze the formalized structure of all function code in the binary program at two levels, the whole code and the individual function code;
(2) Working at the two granularities of the whole code and the function code, the method can recursively and accurately analyze the function code structure in a binary program. It addresses the difficulty, in current international software security research, of effectively analyzing binary programs. It also provides practical support for accurately detecting the functional attributes of binary programs and helps counter the security threats posed by harmful techniques. This safeguards the security, reliability and trustworthiness of systems;
(3) The method can be applied to binary programs of different formats in various system environments, which gives it a wide range of application and good portability;
(4) The method can be applied in software security fields including:
- binary code structure and vulnerability analysis;
- detection and removal of viruses and malicious programs;
- vulnerability discovery and exploitation;
- identification of program code defects;
- software homology identification.
It thereby supports the security, reliability and trustworthiness of systems and forms a basis for further work on system security attack and defense techniques.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of a method for generating a function code formalized structure in a binary program according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Examples
The workflow of the method for generating a function code formalized structure in a binary program is shown in FIG. 1, and the processing steps are described below.
Binary program formats differ between operating system environments. This example therefore takes the concrete workflow of generating the matrix representation of the function code formalized structure of a PE program in a Windows environment. The method mainly comprises the following steps:
First, extract the valid data describing the binary program and its code from the file structure description information contained in the binary program. In the Windows environment, common PE files include the EXE, DLL, OCX, SYS and COM formats. Their main file structure is shown in Table 1 below:
TABLE 1 Overall Structure Table of PE File
The PE file contains data structures that describe the program structure, from which valid information about the program and its code can be extracted. For example, the value of the "e_lfanew" field in the "DOS header" is the file offset of the NT header. Binary code is usually compiled into one section of the PE file, typically the ".text" section. In the "PE optional header" part of the "NT header", the fields of the IMAGE_OPTIONAL_HEADER structure describe the basic information of the program and of the section that holds its code:
- "ImageBase" is the preferred load base address of the program;
- "SizeOfImage" is the size of the program once loaded;
- "BaseOfCode" is the starting RVA of the section containing the code.
Whether a section contains binary code can be determined from whether its characteristics flags include the executable attribute (IMAGE_SCN_MEM_EXECUTE). The last part of the "PE optional header" is the data directory table. Each entry in this table describes the function code of one type of functional attribute. Its data structure is of type "IMAGE_DATA_DIRECTORY", and the value of its "VirtualAddress" field gives the address of the data it describes. Go to the second step.
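The header fields mentioned above can be read directly from the raw file bytes. The C sketch below follows the published PE/COFF layout:
- `e_lfanew` is at offset 0x3C of the DOS header;
- the optional header starts 24 bytes after the NT header (a 4-byte signature plus the 20-byte file header);
- `BaseOfCode` is at offset 20 and `SizeOfImage` at offset 56 of the optional header;
- `ImageBase` is 4 bytes at offset 28 for PE32, or 8 bytes at offset 24 for PE32+.

`PeInfo` and `parse_pe` are hypothetical names. The sketch deliberately avoids `<windows.h>` so that it compiles on any platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers; PE fields are stored little-endian. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

typedef struct {
    uint32_t e_lfanew;      /* file offset of the NT header */
    uint16_t magic;         /* 0x10b = PE32, 0x20b = PE32+ */
    uint64_t image_base;    /* ImageBase */
    uint32_t size_of_image; /* SizeOfImage */
    uint32_t base_of_code;  /* BaseOfCode (RVA) */
} PeInfo;

/* Returns 0 on success, -1 if buf does not start with a PE header
   long enough to contain the fields read here. */
int parse_pe(const uint8_t *buf, size_t len, PeInfo *out)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;
    uint32_t nt = rd32(buf + 0x3C);              /* e_lfanew */
    if (nt > len || len - nt < 24 + 60)          /* need up to SizeOfImage */
        return -1;
    if (rd32(buf + nt) != 0x00004550u)           /* "PE\0\0" */
        return -1;
    const uint8_t *opt = buf + nt + 24;          /* optional header */
    out->e_lfanew = nt;
    out->magic = rd16(opt);
    out->base_of_code = rd32(opt + 20);
    out->size_of_image = rd32(opt + 56);
    if (out->magic == 0x10b)
        out->image_base = rd32(opt + 28);
    else if (out->magic == 0x20b)
        out->image_base = (uint64_t)rd32(opt + 24) | (uint64_t)rd32(opt + 28) << 32;
    else
        return -1;
    return 0;
}
```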
Second, based on the valid data, traverse the address space of the binary code in the binary program. Using a program structure parsing method, parse the address space information of the binary code and the code structure information of each piece of function code. The address space information describes the storage structure of the program code, such as the start address, size and entry point of the binary code. The code structure information describes the storage structure of the function code, such as its start address, size and end address. For example, consider the binary code compiled from the source code of a PE file, which the present invention calls the own code. Its start address is calculated as follows:
$$\text{own code start address} = \text{NT header address} + \text{PE signature size} + \text{PE header size} + \text{PE optional header size}$$
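The formula can be evaluated from two header fields: `e_lfanew`, and `SizeOfOptionalHeader`, which sits at offset 16 of the 20-byte IMAGE_FILE_HEADER. The PE signature is 4 bytes. Below is a minimal C sketch; `own_code_start` is a hypothetical name. In the PE/COFF layout, the resulting file offset is also where the section table begins. The section headers found there are what locate the executable sections themselves.

```c
#include <stddef.h>
#include <stdint.h>

#define PE_SIGNATURE_SIZE   4u   /* "PE\0\0" */
#define PE_FILE_HEADER_SIZE 20u  /* IMAGE_FILE_HEADER */

/* Evaluates: NT header address + signature size + file header size
   + SizeOfOptionalHeader. Returns 0 for a short or malformed buffer. */
uint32_t own_code_start(const uint8_t *buf, size_t len)
{
    if (len < 0x40)
        return 0;
    uint32_t nt = (uint32_t)buf[0x3C] | (uint32_t)buf[0x3D] << 8 |
                  (uint32_t)buf[0x3E] << 16 | (uint32_t)buf[0x3F] << 24;
    if (nt > len || len - nt < PE_SIGNATURE_SIZE + PE_FILE_HEADER_SIZE)
        return 0;
    const uint8_t *fh = buf + nt + PE_SIGNATURE_SIZE;   /* file header */
    uint32_t opt_size = (uint32_t)fh[16] | (uint32_t)fh[17] << 8;
    return nt + PE_SIGNATURE_SIZE + PE_FILE_HEADER_SIZE + opt_size;
}
```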
For the function code in a PE file, the code structure information of function code with different functional attributes can be parsed from the data contained in each entry of the data directory table. Most of the information about function code with different functional attributes in a PE file is contained in this table, where each entry describes the function code of one functional attribute; see Table 2:
table 2 data directory table of PE files
Each entry in the table is defined as follows:
typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress; // RVA of the described data (start position)
    DWORD Size;           // size of the described data in bytes
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
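Reading the data directory table amounts to locating the array of IMAGE_DATA_DIRECTORY entries at the end of the optional header. In the PE/COFF layout, `NumberOfRvaAndSizes` is at offset 92 (PE32) or 108 (PE32+) of the optional header. The table follows at offset 96 or 112, with at most 16 entries in the fixed order listed below. The C sketch redeclares the entry as `DataDir` so that it stays independent of `<windows.h>`. `read_data_dirs` and `dir_name` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t VirtualAddress;  /* RVA of the described data */
    uint32_t Size;            /* size in bytes */
} DataDir;

/* Standard order of the 16 data directory entries (PE/COFF). */
static const char *const kDirNames[16] = {
    "Export", "Import", "Resource", "Exception", "Certificate",
    "Base relocation", "Debug", "Architecture", "Global ptr", "TLS",
    "Load config", "Bound import", "IAT", "Delay import",
    "CLR runtime header", "Reserved"
};

const char *dir_name(int i) { return (i >= 0 && i < 16) ? kDirNames[i] : "?"; }

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Copies up to max directory entries from the optional header at opt
   (opt_len bytes available). Returns the number copied, or -1 on an
   unknown magic or a buffer too short for the table. */
int read_data_dirs(const uint8_t *opt, size_t opt_len, DataDir *dirs, int max)
{
    size_t count_off, table_off;
    if (opt_len < 2 || max < 0)
        return -1;
    uint16_t magic = (uint16_t)(opt[0] | opt[1] << 8);
    if (magic == 0x10b) {            /* PE32 */
        count_off = 92;
        table_off = 96;
    } else if (magic == 0x20b) {     /* PE32+ */
        count_off = 108;
        table_off = 112;
    } else {
        return -1;
    }
    if (opt_len < table_off)
        return -1;
    uint32_t count = le32(opt + count_off);   /* NumberOfRvaAndSizes */
    if (count > 16)
        count = 16;
    if (count > (uint32_t)max)
        count = (uint32_t)max;
    if (opt_len < table_off + (size_t)count * 8)
        return -1;
    for (uint32_t i = 0; i < count; i++) {
        const uint8_t *e = opt + table_off + (size_t)i * 8;
        dirs[i].VirtualAddress = le32(e);
        dirs[i].Size = le32(e + 4);
    }
    return (int)count;
}
```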
Go to the third step.
Third, based on the address space information and the code structure information, obtain the measurement information of the functional attributes of the different function code in the binary program. Construct function code sets for the various functional attributes and classify the function code. The main process is as follows:
each entry of the data directory table describes the function code of one functional attribute. Using these descriptions, obtain the function code segments described by each entry in ascending (or descending) order of address. Place them into $n$ function code sets with different functional attributes, $P_0 \sim P_{n-1}$, where $n > 0$ and $n \in \mathbb{N}^+$. The sets $P_0 \sim P_{n-1}$ satisfy the following conditions:
for any function code set $P_i$, $0 \le i \le n-1$, there is a functional attribute $\xi_i$ such that every function code in $P_i$ has the functional attribute $\xi_i$;
traverse the binary code address space and identify each function code segment in turn. Put any segment $\omega$ that satisfies $\omega \notin P_i$ for all $0 \le i \le n-1$ into the set $P_n$;
classify the function code as follows. The function code of the binary program is divided into the sets $P_0 \sim P_n$. Function code in the same set is of the same type, and function code in different sets is of different types;
this process yields $n+1$ function code sets $P_0 \sim P_n$ with different types of functional attributes, and divides the function code in the PE file into $n+1$ types. Go to the fourth step.
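Below is a minimal C sketch of the third step. It assumes each directory is reduced to an address range $[\text{VirtualAddress}, \text{VirtualAddress} + \text{Size})$. It also assumes that a segment is "described by" a directory when it lies entirely inside that range; the patent does not fix this rule, so the containment test is an assumption. `DirRange`, `Segment` and `assign_sets` are hypothetical names. Taking the first matching directory ensures that each segment lands in exactly one set, and every remaining segment goes to $P_n$.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one directory reduced to the range [va, va + size). */
typedef struct { uint32_t va, size; } DirRange;

/* Hypothetical identified function code segment [start, end). */
typedef struct { uint32_t start, end; int set; } Segment;

/* Places each segment in the first P_i (0..n-1) whose directory range
   contains it; segments described by no directory go to P_n. */
void assign_sets(Segment *segs, size_t count, const DirRange *dirs, int n)
{
    for (size_t k = 0; k < count; k++) {
        segs[k].set = n;                          /* default: P_n */
        for (int i = 0; i < n; i++) {
            uint64_t lo = dirs[i].va;
            uint64_t hi = (uint64_t)dirs[i].va + dirs[i].size;
            if (dirs[i].size != 0 && segs[k].start >= lo && segs[k].end <= hi) {
                segs[k].set = i;
                break;                            /* unique set per segment */
            }
        }
    }
}
```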
Fourth, construct the classification function information tables $T_0 \sim T_n$. The main process is as follows:
generate the feature information of the function code in the sets $P_0 \sim P_n$ from data such as its code structure information, functional attribute measurement information and classification information. The feature information comprises descriptive data such as the start address, address interval, functional attribute and classification type of the function code;
put the feature information of each function code in the sets $P_0 \sim P_n$ into the tables $T_0 \sim T_n$ respectively as directory entries. Sort the entries of each table $T_0 \sim T_n$ in ascending (or descending) order of the start address of the function code. Go to the fifth step.
Fifth, construct the function distribution table $F$. Put the directory entries of the classification function information tables $T_0 \sim T_n$ into table $F$, and sort them in ascending (or descending) order of the start address of the function code. Go to the sixth step.
Sixth, classify the operands of machine instructions according to the functional attributes of the instructions, and construct the operand type set $D$. It satisfies the following conditions:
the elements of $D$ are of single-byte character type, such as digits or letters;
for the operand $x$ of any machine instruction, there is a unique element $t \in D$ such that $t$ is the operand type of $x$;
the elements of $D$ are called the formalized values of the operands in the function code. Go to the seventh step.
Seventh, based on the operand type set $D$, replace each operand contained in the function code of the sets $P_0 \sim P_n$ with its formalized value. This yields the formalized sets $P_0' \sim P_n'$ of the function code. Establish the function code mapping table $W$. The main process is as follows:
select each function code in each of the function code sets $P_0 \sim P_n$ in turn and perform the following steps;
for any function code $f \in P_i$, $0 \le i \le n$, replace each operand in $f$ with its formalized value to obtain the formalized structure $f'$ of $f$, so that $f \leftrightarrow f'$ holds;
put $f'$ into the set $P_i'$, and put the vector $(P_i, f, P_i', f')$ into $W$ as a directory entry;
optimize the function code mapping table $W$ by sorting its directory entries in ascending or descending order of the start address of the function code;
build a search index for table $W$. The index makes it quick to retrieve the directory entries that describe the bijective relationship between the formalized sets $P_0' \sim P_n'$ and the function code sets $P_0 \sim P_n$. These entries satisfy the following conditions:
for any function code $f \in P_i$, $0 \le i \le n$, table $W$ contains a unique directory entry $(P_i, f, P_i', f')$ describing the bijective relationship: there is a unique set $P_i'$ and a unique element $f' \in P_i'$ such that $f' \leftrightarrow f$;
for any element $f' \in P_i'$, $0 \le i \le n$, table $W$ contains a unique directory entry $(P_i, f, P_i', f')$ describing the bijective relationship: there is a unique function code set $P_i$ and a unique element $f \in P_i$ such that $f \leftrightarrow f'$. Go to the eighth step.
Step 8: Construct the matrix representation of the formalized structure of the function codes in the binary program from the function distribution table and the formalized structures of the function codes. The main process is as follows:
If the function distribution table $F$ has $m$ directory entries in total, $m \in \mathbb{N}^+$, $m \ge n$, traverse these $m$ entries to extract the address interval and classification type of each function code in the binary program, forming a feature array for each function code. For any function code $f_i \in P_j$, $0 \le i \le m$, $0 \le j \le n$, the feature array is $([ds_i, dd_i], t_j)$, where $[ds_i, dd_i]$ and $t_j$ are the address interval and classification type of $f_i$, respectively, $ds_i$ and $dd_i$ are the start and end addresses of $f_i$, and $ds_i, dd_i \in \mathbb{N}$, with $\mathbb{N}$ the set of natural numbers;
Search table $W$ and, according to the bijection between the function code sets $P_0 \sim P_n$ and the formalized sets $P_0' \sim P_n'$, use the feature array of each function code as the matrix coordinates of its formalized structure, constructing from the elements of $P_0' \sim P_n'$ a matrix $A$ with $m$ rows and $n+1$ columns. Matrix $A$ satisfies the following conditions:
The elements of $A$ include all elements of $P_0' \sim P_n'$, and apart from the empty set $\varnothing$, all elements of $A$ are distinct;
For any element $f_{ij}'$ of $A$ that is not $\varnothing$, there is a unique set $P_j'$, $0 \le j \le n$, such that $f_{ij}' \in P_j'$;
For any two elements $f_{ij}'$ and $f_{kl}'$ of $A$, if they are related to the function codes $f_{ij}$ and $f_{kl}$ by $f_{ij}' \leftrightarrow f_{ij}$ and $f_{kl}' \leftrightarrow f_{kl}$, $0 \le k \le m$, $0 \le l \le n$, and the feature arrays of $f_{ij}$ and $f_{kl}$ are $([ds_i, dd_i], t_j)$ and $([ds_k, dd_k], t_l)$ respectively, then the following holds:
If $t_j$ and $t_l$ are the same classification type, $f_{ij}'$ and $f_{kl}'$ lie in the same column of $A$; if they are different classification types, $f_{ij}'$ and $f_{kl}'$ lie in different columns of $A$.
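One simple layout that meets these conditions can be sketched as follows. Because Table 3 is not reproduced here, the row-per-function layout (row $i$ holds $f_i'$ in column $t_i$, all other cells are $\varnothing$) is an assumption, as are the names `build_matrix`, `formal_of`, and `column_types_consistent`; `None` stands in for $\varnothing$.

```python
EMPTY = None  # stands for the empty set ∅ in matrix A


def build_matrix(features, formal_of, n):
    """Build an m x (n+1) matrix A from m feature arrays.

    features: list of ((ds, dd), t) in function distribution table order,
              where t in 0..n is the classification type of the function code.
    formal_of: maps a start address ds to the formalized structure f' of that
               function code (in the patent this lookup goes through table W).
    Row i holds f_i' in column t_i; every other cell of the row is empty.
    """
    A = [[EMPTY] * (n + 1) for _ in features]
    for i, ((ds, _dd), t) in enumerate(features):
        if not 0 <= t <= n:
            raise ValueError(f"classification type {t} out of range 0..{n}")
        A[i][t] = formal_of[ds]
    return A


def column_types_consistent(A, type_of):
    """Check: same classification type <=> same column (type_of: f' -> t)."""
    return all(type_of[cell] == j
               for row in A for j, cell in enumerate(row) if cell is not EMPTY)


features = [((0x1000, 0x100F), 1), ((0x1010, 0x1020), 0), ((0x1030, 0x1040), 1)]
formal_of = {0x1000: "f1'", 0x1010: "f2'", 0x1030: "f3'"}
A = build_matrix(features, formal_of, n=1)
for row in A:
    print(row)
```

Since the column index is the classification type itself, elements of the same type always share a column and elements of different types never do; rows keep the start-address order of table $F$.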
Matrix $A$ is shown in Table 3.
Table 3 Formalized structure matrix of function codes
Through this specific example, the proposed method for generating a function code formalized structure in a binary program is applied to build the formalized structure matrix $A$ of the function codes in a PE file under Windows.
Matrix $A$ is constructed from the function distribution table, the classification function information tables, and the elements of $P_0' \sim P_n'$. Working at two granularities, the whole code and the individual function code, it supports analysis of the structure, distribution patterns, and interrelations of the function codes in the binary program, and describes the formalized structure of all function codes in the binary program at both the whole-program and function levels.
In the method for generating a function code formalized structure in a binary program provided by the invention, a two-dimensional matrix is used to analyze the structure, distribution patterns, and interrelations of the function codes in a binary program at two levels, the whole code and the individual function code. The code structure of the binary program can thus be analyzed effectively, providing a basis for further detecting and defending against system security risks.
The method can analyze the function code structure in a binary program recursively and accurately at the two granularities of whole code and function code. It addresses a difficult problem in current international software security research, namely that binary programs are hard to analyze effectively; it provides practical support for accurately detecting the functional attributes of binary programs and helps counter security threats posed by harmful techniques, thereby ensuring the security, reliability, and trustworthiness of the system.
The method can be applied to binary programs of different formats in various system environments, giving it a wide range of application and strong portability.
The method can be applied in software security fields such as binary code structure and vulnerability analysis, detection and removal of viruses and malicious programs, vulnerability discovery and exploitation, identification of program code defects, and software homology identification. It provides effective support for the security, reliability, and trustworthiness of a system and serves as a basis for further developing system security attack and defense techniques.
The invention also discloses an electronic device, comprising at least one processor, at least one memory, a communication interface, and a bus. The processor, the memory, and the communication interface communicate with one another through the bus. The memory stores a program, executable by the processor, of the method for generating a function code formalized structure in a binary program, and the program is configured to implement the method according to the embodiments of the invention.
The invention also discloses a computer-readable storage medium storing a program of the method for generating a function code formalized structure in a binary program; when executed, the program implements the method according to the embodiments of the invention.
The above description covers only preferred embodiments of the invention and is not intended to limit it; any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (9)
1. A method for generating a function code formalized structure in a binary program, characterized by comprising the following steps:
S1: performing structural analysis on the binary program, identifying and measuring each function code, obtaining address space information, code structure information, and measurement information of the functional attributes of the function codes, constructing function code sets $P_0 \sim P_n$ with different functional attributes, classifying the function codes, and proceeding to step S2;
S2: generating feature information of the function codes from basic data consisting of the code structure information, the measurement information of the functional attributes of the function codes, and the classification information of the function codes, and constructing classification function information tables and a function distribution table, wherein step S2 specifically comprises:
generating the feature information of the function codes using the code structure information, the measurement information of the functional attributes, and the classification information of the function codes as basic data, wherein the feature information of a function code includes but is not limited to its start address, address interval, functional attribute, and classification type, and describes the characteristics of the function code;
constructing classification function information tables $T_0 \sim T_n$: collecting the feature information of each function code in $P_0 \sim P_n$ as table directory entries, putting them into tables $T_0 \sim T_n$ respectively, and sorting the directory entries of each of $T_0 \sim T_n$ in ascending or descending order of function code start address;
constructing a function distribution table $F$: putting the directory entries of the classification function information tables $T_0 \sim T_n$ into table $F$ and sorting them in ascending or descending order of function code start address; proceeding to step S3;
S3: classifying the operands in machine instructions to obtain formalized values of the operands, replacing the operands contained in the function codes of the function code sets with their formalized values to generate the formalized structures of the function codes, and proceeding to step S4;
S4: constructing, from the function distribution table and the formalized structures of the function codes, a matrix representation of the formalized structure of the function codes in the binary program.
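The table construction in step S2 can be illustrated with a short Python sketch. The `FeatureInfo` record, the use of the set index $i$ as the classification type, and the attribute labels `"library"` and `"user"` are hypothetical choices for the example; only the sorting of $T_0 \sim T_n$ and $F$ by start address comes from the claim.

```python
from typing import NamedTuple


class FeatureInfo(NamedTuple):
    start: int       # start address of the function code
    end: int         # end address; the address interval is [start, end]
    attribute: str   # functional attribute shared by the set P_i
    class_type: int  # classification type, here simply the index i of P_i


def build_tables(sets, attributes, descending=False):
    """sets[i]: (start, end) intervals of the function codes in P_i.
    attributes[i]: functional attribute label of P_i (hypothetical labels).

    Returns (T, F): T[i] is the classification function information table
    of P_i, and F is the function distribution table merging all T_i.
    """
    T = []
    for i, p_i in enumerate(sets):
        table = [FeatureInfo(s, e, attributes[i], i) for s, e in p_i]
        table.sort(key=lambda row: row.start, reverse=descending)
        T.append(table)
    F = sorted((row for table in T for row in table),
               key=lambda row: row.start, reverse=descending)
    return T, F


sets = [[(0x3000, 0x3010)], [(0x2000, 0x2008), (0x1000, 0x1020)]]
T, F = build_tables(sets, ["library", "user"])
for row in F:
    print(hex(row.start), hex(row.end), row.attribute, row.class_type)
```

Because every entry of $F$ carries its own classification type, $F$ alone is enough to recover the feature arrays $([ds_i, dd_i], t_j)$ used later in step S4.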
2. The method for generating a function code formalized structure in a binary program according to claim 1, wherein step S1 specifically comprises:
S1-1: extracting, from the file structure description information contained in the binary program, valid data describing the binary program and its code structure;
S1-2: parsing, based on the valid data and according to the program structure parsing method, the address space information of the binary code, wherein the address space information includes but is not limited to the start address, size, and entry point of the binary code, and describes the code storage structure;
S1-3: traversing the binary code address space of the binary program based on the address space information, identifying each function code segment in it, and obtaining the code structure information of each segment, wherein the code structure information includes but is not limited to the start address, size, and end address of the function code, and describes the function code storage structure;
S1-4: obtaining, based on the address space information and the code structure information, the functional attributes of the different function codes in the binary program, and constructing function code sets with different functional attributes;
S1-5: classifying the function codes according to the sets $P_0 \sim P_n$ as follows: the function codes in the binary program are divided into the sets $P_0 \sim P_n$; function codes in the same set are of the same type and function codes in different sets are of different types, giving $n$ types of function code.
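For a PE file such as the one in the Windows example, step S1-2 might read the address space information from the PE headers. The sketch below assumes a PE32 or PE32+ image and uses the field offsets from the Microsoft PE/COFF specification; treating `ImageBase + BaseOfCode` and `SizeOfCode` as the code region is a simplification (a real tool would also walk the section table), and the header bytes in the usage part are synthetic.

```python
import struct


def pe_address_space(data: bytes) -> dict:
    """Read code-related address space fields from PE headers.

    Offsets follow the Microsoft PE/COFF specification: e_lfanew at 0x3C,
    a 4-byte "PE\\0\\0" signature, a 20-byte COFF header, then the optional
    header (SizeOfCode at +4, AddressOfEntryPoint at +16, BaseOfCode at +20,
    ImageBase at +28 for PE32 or +24 for PE32+).
    """
    if data[:2] != b"MZ":
        raise ValueError("missing DOS header")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = e_lfanew + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    (size_of_code,) = struct.unpack_from("<I", data, opt + 4)
    entry_rva, base_of_code = struct.unpack_from("<II", data, opt + 16)
    if magic == 0x10B:    # PE32
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:  # PE32+
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    return {
        "code_start": image_base + base_of_code,
        "code_size": size_of_code,
        "entry_point": image_base + entry_rva,
    }


# Synthetic PE32 header for illustration (not a real executable).
buf = bytearray(0x200)
buf[0:2] = b"MZ"
struct.pack_into("<I", buf, 0x3C, 0x80)
buf[0x80:0x84] = b"PE\0\0"
opt = 0x80 + 24
struct.pack_into("<H", buf, opt, 0x10B)
struct.pack_into("<I", buf, opt + 4, 0x5000)
struct.pack_into("<II", buf, opt + 16, 0x1234, 0x1000)
struct.pack_into("<I", buf, opt + 28, 0x400000)
info = pe_address_space(bytes(buf))
print({k: hex(v) for k, v in info.items()})
```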
3. The method for generating a function code formalized structure in a binary program according to claim 2, wherein step S1-4 specifically comprises:
S1-4-1: measuring, based on the address space information and the specific information describing the binary code structure in the code structure information, the relationships between different function codes to obtain measurement information of the functional attributes of each function code segment, wherein the measurement information includes but is not limited to the storage interval, similarity, and metric values of the function code, and describes the functional characteristics of the function code;
S1-4-2: marking the functional attribute of each function code segment in the binary program according to the measurement information of its functional attributes;
S1-4-3: putting each function code segment in the binary program into the function code sets $P_0 \sim P_n$ according to its functional attribute, wherein $P_0 \sim P_n$ satisfy the following conditions:
each function code set $P_i$, $0 \le i \le n$, has a functional attribute $\xi_i$ such that every function code in $P_i$ has functional attribute $\xi_i$;
for any function code $\omega$ in the binary code address space, there is a unique function code set $P_j$, $0 \le j \le n$, such that $\omega \in P_j$;
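The two conditions of step S1-4-3 say that $P_0 \sim P_n$ partition the function codes by attribute. A minimal sketch, assuming each function code is identified by its start address, that the attributes $\xi_0 \sim \xi_n$ are distinct, and using made-up labels `"entry"` and `"library"`:

```python
def partition_by_attribute(functions, attribute_of, xi):
    """Put each function code into the set P_j whose attribute xi[j] it carries.

    functions: identifiers of all function codes in the code address space
    attribute_of: identifier -> functional attribute marked in step S1-4-2
    xi: list of distinct functional attributes xi_0..xi_n
    """
    index_of = {a: j for j, a in enumerate(xi)}
    sets = [set() for _ in xi]
    for omega in functions:
        sets[index_of[attribute_of[omega]]].add(omega)
    return sets


def satisfies_conditions(sets, attribute_of, xi, functions):
    """Check the two conditions of step S1-4-3."""
    # (1) every function code in P_i has functional attribute xi_i
    same_attribute = all(attribute_of[f] == xi[i]
                         for i, p_i in enumerate(sets) for f in p_i)
    # (2) every function code omega lies in exactly one set P_j
    unique_set = all(sum(omega in p_j for p_j in sets) == 1 for omega in functions)
    return same_attribute and unique_set


functions = [0x1000, 0x2000, 0x3000]
attribute_of = {0x1000: "entry", 0x2000: "library", 0x3000: "library"}
xi = ["entry", "library"]
sets = partition_by_attribute(functions, attribute_of, xi)
print(sets, satisfies_conditions(sets, attribute_of, xi, functions))
```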
4. The method for generating a function code formalized structure in a binary program according to claim 2, wherein step S3 specifically comprises:
S3-1: classifying the operands in machine instructions according to the functional attributes of the machine instructions and constructing an operand type set $D$;
S3-2: normalizing the function code sets $P_0 \sim P_n$ in a unified way to generate the formalized structure sets $P_0' \sim P_n'$ of the function codes, and establishing a function code mapping table $W$ describing the bijection between them.
5. The method for generating a function code formalized structure in a binary program according to claim 4, wherein step S3-1 specifically comprises:
constructing an operand type set $D$ that satisfies the following conditions:
the elements of $D$ are of single-byte character type, including but not limited to integers and letters;
for the operand $x$ in any machine instruction, there is a unique element $t \in D$ such that $t$ is the operand type of $x$;
the elements of $D$ are called the formalized values of the operands in the function codes.
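To make the operand type set $D$ concrete, here is a hedged sketch for x86 instructions in Intel syntax. The four classes (register, immediate, memory, label), their single-character values, and the abbreviated register list are assumptions chosen for illustration; the claim only requires that each operand map to exactly one single-byte element of $D$.

```python
import re

# Hypothetical operand type set D for x86 Intel-syntax operands: each element
# is a single-byte character, one per operand class.
D = {"register": "r", "immediate": "i", "memory": "m", "label": "l"}

REGISTERS = {
    "eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp",
    "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp",
    "ax", "bx", "cx", "dx", "si", "di", "bp", "sp",
    "al", "ah", "bl", "bh", "cl", "ch", "dl", "dh",
}


def operand_type(x: str) -> str:
    """Return the unique element t of D that is the operand type of x."""
    x = x.strip().lower()
    if "[" in x:  # memory reference, e.g. "dword ptr [ebp+8]"
        return D["memory"]
    if x in REGISTERS or re.fullmatch(r"r(8|9|1[0-5])[dwb]?", x):
        return D["register"]
    if re.fullmatch(r"-?(0x[0-9a-f]+|\d+)", x):
        return D["immediate"]
    return D["label"]  # anything else is treated as a symbolic target


def formalize_instruction(line: str) -> str:
    """'mov eax, [ebp+8]' -> 'mov r, m': keep the mnemonic, formalize operands."""
    mnemonic, _, rest = line.strip().partition(" ")
    if not rest.strip():
        return mnemonic
    return mnemonic + " " + ", ".join(operand_type(op) for op in rest.split(","))


for insn in ["mov eax, [ebp+8]", "push 0x10", "call sub_401000", "ret"]:
    print(formalize_instruction(insn))
```

Keeping the mnemonic while erasing concrete registers, constants, and addresses is what lets two function codes that differ only in operand values share the same formalized structure.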
6. The method for generating a function code formalized structure in a binary program according to claim 4, wherein step S3-2 specifically comprises:
S3-2-1: replacing, based on the operand type set $D$, the operands contained in the function codes of the sets $P_0 \sim P_n$ with their formalized values to obtain the formalized sets $P_0' \sim P_n'$, and establishing the function code mapping table $W$, mainly through the following steps:
S3-2-1-A: selecting each function code in each of the function code sets $P_0 \sim P_n$ in turn, and proceeding to step S3-2-1-B;
S3-2-1-B: for any function code $f \in P_i$, $0 \le i \le n$, replacing each operand in $f$ with its formalized value to obtain the formalized structure $f'$ of $f$, so that $f' \leftrightarrow f$ holds, and proceeding to step S3-2-1-C;
S3-2-1-C: putting $f'$ into the set $P_i'$, and putting the vector $(P_i, f, P_i', f')$ into $W$ as a directory entry;
S3-2-2: optimizing the function code mapping table $W$ by sorting all directory entries of $W$ in ascending or descending order of function code start address;
S3-2-3: building an index for table $W$ so that the directory entries of $W$ describing the bijection between the formalized sets $P_0' \sim P_n'$ and the function code sets $P_0 \sim P_n$ can be retrieved quickly, the directory entries satisfying the following conditions:
for any function code $f \in P_i$, $0 \le i \le n$, table $W$ contains a unique directory entry $(P_i, f, P_i', f')$ describing the bijection: there is a unique set $P_i'$ and a unique element $f' \in P_i'$ such that $f' \leftrightarrow f$;
7. The method for generating a function code formalized structure in a binary program according to claim 6, wherein step S4 specifically comprises:
S4-1: if the function distribution table $F$ has $m$ directory entries in total, $m \in \mathbb{N}^+$, $m \ge n$, traversing these $m$ entries to extract the address interval and classification type of each function code in the binary program, forming a feature array for each function code; for any function code $f_i \in P_j$, $0 \le i \le m$, $0 \le j \le n$, the feature array is $([ds_i, dd_i], t_j)$, where $[ds_i, dd_i]$ and $t_j$ are the address interval and classification type of $f_i$, respectively, $ds_i$ and $dd_i$ are the start and end addresses of $f_i$, and $ds_i, dd_i \in \mathbb{N}$, with $\mathbb{N}$ the set of natural numbers;
S4-2: searching table $W$ and, according to the bijection between the function code sets $P_0 \sim P_n$ and the formalized sets $P_0' \sim P_n'$, using the feature array of each function code in $P_0 \sim P_n$ as the matrix coordinates of the corresponding formalized structure in $P_0' \sim P_n'$, constructing from the elements of $P_0' \sim P_n'$ a matrix $A$ with $m$ rows and $n+1$ columns, wherein matrix $A$ satisfies the following conditions:
the elements of $A$ include all elements of $P_0' \sim P_n'$, and apart from the empty set $\varnothing$, all elements of $A$ are distinct;
for any element $f_{ij}'$ of $A$ that is not $\varnothing$, there is a unique set $P_j'$, $0 \le j \le n$, such that $f_{ij}' \in P_j'$;
for any two elements $f_{ij}'$ and $f_{kl}'$ of $A$, if they are related to the function codes $f_{ij}$ and $f_{kl}$ by $f_{ij}' \leftrightarrow f_{ij}$ and $f_{kl}' \leftrightarrow f_{kl}$, $0 \le k \le m$, $0 \le l \le n$, and the feature arrays of $f_{ij}$ and $f_{kl}$ are $([ds_i, dd_i], t_j)$ and $([ds_k, dd_k], t_l)$ respectively, then the following relationship holds:
8. An electronic device, comprising at least one processor, at least one memory, a communication interface, and a bus, wherein the processor, the memory, and the communication interface communicate with one another through the bus, and the memory stores a program, executable by the processor, of a method for generating a function code formalized structure in a binary program, the program being configured to implement the method for generating a function code formalized structure in a binary program according to any one of claims 1 to 7.
9. A computer-readable storage medium, characterized in that the storage medium stores a program of a method for generating a function code formalized structure in a binary program, and the program, when executed, implements the method for generating a function code formalized structure in a binary program according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111108278.7A CN113553041B (en) | 2021-09-22 | 2021-09-22 | Method, apparatus and medium for generating function code formalized structure in binary program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111108278.7A CN113553041B (en) | 2021-09-22 | 2021-09-22 | Method, apparatus and medium for generating function code formalized structure in binary program |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113553041A CN113553041A (en) | 2021-10-26 |
CN113553041B true CN113553041B (en) | 2021-12-10 |
Family
ID=78134573
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111108278.7A Active CN113553041B (en) | 2021-09-22 | 2021-09-22 | Method, apparatus and medium for generating function code formalized structure in binary program |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113553041B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105138914A (en) * | 2015-08-03 | 2015-12-09 | 南京大学 | Software security detection method for code reuse programming |
CN105787368A (en) * | 2016-02-26 | 2016-07-20 | 武汉大学 | ROP defense method and device based on function scrambling |
CN105786512A (en) * | 2016-02-29 | 2016-07-20 | 浪潮(苏州)金融技术服务有限公司 | Program generation method and dimension manager |
CN107943481A (en) * | 2017-05-23 | 2018-04-20 | 清华大学 | C programmer code specification building method based on multi-model |
CN109101235A (en) * | 2018-06-05 | 2018-12-28 | 北京航空航天大学 | A kind of intelligently parsing method of software program |
CN111382439A (en) * | 2020-03-28 | 2020-07-07 | 玉溪师范学院 | Malicious software detection method based on multi-mode deep learning |
CN111930386A (en) * | 2020-09-24 | 2020-11-13 | 武汉精鸿电子技术有限公司 | PATTERN file compiling method and device and electronic equipment |
CN112068883A (en) * | 2020-07-31 | 2020-12-11 | 中国人民解放军战略支援部队信息工程大学 | Method for identifying number of parameters of large binary firmware under simplified instruction set |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9021589B2 (en) * | 2012-06-05 | 2015-04-28 | Los Alamos National Security, Llc | Integrating multiple data sources for malware classification |
US20140180660A1 (en) * | 2012-12-14 | 2014-06-26 | Life Technologies Holdings Pte Limited | Methods and systems for in silico design |
CN103150626B (en) * | 2013-03-01 | 2016-08-03 | 南京理工大学 | BPEL process consistency metric method based on program dependency graph |
US11157250B2 (en) * | 2017-12-05 | 2021-10-26 | Phase Change Software Llc | Inductive equivalence in machine-based instruction editing |
CN108415795B (en) * | 2018-02-12 | 2019-04-05 | 人和未来生物科技(长沙)有限公司 | A kind of container Dockerfile, container mirror image rapid generation and system |
CN111667135B (en) * | 2020-03-25 | 2023-07-28 | 国网天津市电力公司 | Load structure analysis method based on typical feature extraction |
2021-09-22: CN202111108278.7A filed in CN (CN113553041B, status: active)
Non-Patent Citations (3)
Title |
---|
How could Neural Networks understand Programs; Dinglan Peng, Shuxin Zheng; Proceedings of the 38th International Conference on Machine; 20210531; full text * |
二进制代码比对分析研究 (Research on binary code comparison and analysis); 郑瀚Andrew.Hann; https://www.cnblogs.com/LittleHann/p/13451724.html; 20200812; full text * |
程序分析研究进展 (Research progress in program analysis); 张健; 软件学报 (Journal of Software); 20190130; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN113553041A (en) | 2021-10-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |