CN113553041B - Method, apparatus and medium for generating function code formalized structure in binary program - Google Patents


Info

Publication number
CN113553041B
Authority
CN
China
Prior art keywords
function
function code
code
information
binary program
Prior art date
Legal status
Active
Application number
CN202111108278.7A
Other languages
Chinese (zh)
Other versions
CN113553041A (en)
Inventor
郭昌盛
黄河
许团
聂永春
汪文晓
Current Assignee
Wuhan Jiangmin Wangan Technology Co ltd
Original Assignee
Wuhan Jiangmin Wangan Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Jiangmin Wangan Technology Co ltd
Priority to CN202111108278.7A
Publication of CN113553041A
Application granted
Publication of CN113553041B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding

Abstract

The invention provides a method, a device and a medium for generating a formalized structure of the function codes in a binary program. Taking the function code in the binary program as the basic unit of analysis, the method classifies the binary program code by functional attribute, partitions the address space of the binary code, generates function code sets for the various functional attributes, and builds a classification function information table and a function distribution table that describe the attributes of the function codes. It then constructs a set of machine instruction operand types and formalizes the operands in each function code, producing formalized structure sets for the various function codes. Finally, it builds a function code formalized structure matrix from the classification function information tables, the function distribution table and the formalized function code structures; this matrix supports analysis of the formalized structure of all function codes in the binary program both at the whole-program level and at the level of individual functions. The method and device thus enable effective analysis of the function code structure in binary programs and provide practical support for accurately detecting the functional attributes of binary programs.

Description

Method, apparatus and medium for generating function code formalized structure in binary program
Technical Field
The present invention relates to the field of information security, and in particular, to a method, an apparatus, and a medium for generating a function code formalized structure in a binary program.
Background
Cloud computing platforms, the Internet of Things, mobile networks, the industrial Internet and similar systems are developing rapidly. Binary programs are an important component of all these systems, so their security, reliability and trustworthiness are increasingly important. As information security technology has advanced, so have the techniques used to counter it: the variety of harmful techniques that endanger network and system security keeps growing, and the techniques themselves keep being refined. Because a binary program consists of machine instructions, existing methods have difficulty analyzing its code structure effectively and therefore cannot effectively counter these harmful techniques, so the security threats facing systems and platforms are becoming increasingly serious. Against this background, binary program analysis has become both a focus and a difficult problem in international software security research.
In summary, there is currently no generally applicable method for effectively analyzing the code structure of binary programs.
Disclosure of Invention
In view of this, the present invention provides a method, a device and a medium for generating a formalized structure of function codes in a binary program, to solve the problem that the code structure of a binary program is difficult to analyze effectively.
The technical solution of the invention is implemented as follows:
The invention discloses a method for generating a function code formalized structure in a binary program, comprising the following steps:
S1: perform structural analysis on the binary program, identify and measure each function code, obtain the address space information, the code structure information and the measurement information of the functional attributes of the function codes, construct function code sets for the various functional attributes, classify the function codes, and go to step S2;
S2: using the code structure information, the measurement information of the functional attributes and the classification information of the function codes as base data, generate the feature information of the function codes, construct the classification function information tables and the function distribution table, and go to step S3;
S3: classify the operands of machine instructions to obtain the formalized value of each operand, replace the operands contained in every function code of the function code sets with their formalized values to generate the formalized structure of each function code, and go to step S4;
S4: from the function distribution table and the formalized structures of the function codes, construct a matrix representation of the formalized structure of the function codes in the binary program.
By this method, the invention analyzes the code structure of a binary program effectively at two levels: globally, over the whole code, and locally, over individual function codes.
On the basis of the above technical solution, preferably, step S1 specifically includes:
S1-1: extract valid data describing the binary program and its code structure from the file structure description information contained in the binary program;
S1-2: based on the valid data, parse the various address space information of the binary code according to the parsing rules of the program structure; the address space information includes information describing how the code is stored, such as the start address, size and entry point of the binary code;
S1-3: based on the address space information, traverse the binary code address space in the binary program, identify each function code segment, and obtain the code structure information of each segment; the code structure information includes information describing how the function code is stored, such as its start address, size and end address;
S1-4: based on the address space information and the code structure information, obtain the functional attributes of the different function codes in the binary program and construct function code sets with different functional attributes, $P_0, P_1, \ldots, P_n$, where $n \in \mathbb{N}^+$ and $\mathbb{N}^+$ denotes the positive integers;
S1-5: classify the function codes according to the sets $P_0, \ldots, P_n$: function codes in the same set belong to the same type and function codes in different sets belong to different types, which yields $n+1$ types of function code.
With this method, the invention takes the function code in a binary program as the basic unit of analysis, classifies the binary program code by functional attribute, partitions the address space of the binary code, and generates function code sets for the various functional attributes.
On the basis of the above technical solution, preferably, step S1-4 specifically includes:
S1-4-1: based on the address space information and the detailed information describing the binary code structure in the code structure information, measure the relationships between different function codes and obtain measurement information for the functional attributes of each function code segment; this includes information describing the functional characteristics of the function code, such as its storage interval, similarity and measured values;
S1-4-2: label the functional attribute of each function code segment in the binary program according to this measurement information;
S1-4-3: according to their functional attributes, place each function code segment of the binary program into the function code sets $P_0, \ldots, P_n$, which satisfy the following conditions:
(1) every function code set $P_i$, $0 \le i \le n$, has a functional attribute $\xi_i$ such that every function code in $P_i$ has the attribute $\xi_i$;
(2) for any two different function code sets $P_i, P_k$, $0 \le i, k \le n$, $P_i \cap P_k = \varnothing$;
(3) for any function code $\omega$ in the binary code address space there is a unique function code set $P_j$, $0 \le j \le n$, such that $\omega \in P_j$;
(4) if $U$ denotes all of the code in the binary program, then $U = P_0 \cup P_1 \cup \cdots \cup P_n$.
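As a minimal illustration of conditions (2) to (4), the C sketch below assumes each identified function code is stored with a single set index; the record type `FuncCode` and the functions `is_partition` and `set_sizes` are hypothetical names, not part of the patent. Keeping one index per function code makes the sets pairwise disjoint by construction, so the check only has to confirm that every code falls in some $P_j$ with $0 \le j \le n$; condition (1), the shared attribute $\xi_j$, is assumed to be encoded by the index itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one identified function code segment. */
typedef struct {
    uint32_t start;   /* start address of the function code */
    uint32_t end;     /* end address (exclusive) */
    int set;          /* index j of the set P_j it was placed in, or -1 */
} FuncCode;

/* Returns 1 if every function code lies in exactly one set P_j with
 * 0 <= j <= n (conditions 3 and 4). Condition 2 (disjointness) holds by
 * construction because each record carries a single set index. */
int is_partition(const FuncCode *codes, size_t count, int n)
{
    for (size_t i = 0; i < count; i++) {
        if (codes[i].set < 0 || codes[i].set > n)
            return 0; /* this omega is not covered by any P_j */
    }
    return 1;
}

/* Counts |P_j| for j = 0..n into sizes[0..n]. */
void set_sizes(const FuncCode *codes, size_t count, int n, size_t *sizes)
{
    for (int j = 0; j <= n; j++)
        sizes[j] = 0;
    for (size_t i = 0; i < count; i++)
        if (codes[i].set >= 0 && codes[i].set <= n)
            sizes[codes[i].set]++;
}
```

Calling `is_partition(codes, count, n)` on the labels produced by step S1-4-3 returns 1 exactly when every function code has been assigned to one of the $n+1$ sets.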
With this method, the invention obtains the functional attributes of the function codes in the binary program and partitions the function codes in the binary program's address space by those attributes, generating function code sets for the various functional attributes.
On the basis of the above technical solution, preferably, step S2 specifically includes:
S2-1: using the code structure information, the measurement information of the functional attributes and the classification information of the function codes as base data, generate the feature information of each function code; it includes the start address, address interval, functional attribute, classification type and other data describing the function code;
S2-2: construct the classification function information tables $T_0, \ldots, T_n$: take the feature information of each function code in $P_0, \ldots, P_n$ as a table entry, place it in the corresponding table $T_0, \ldots, T_n$, and sort the entries of each table in ascending or descending order of function code start address;
S2-3: construct the function distribution table $F$: place all entries of the classification function information tables $T_0, \ldots, T_n$ into $F$ and sort them in ascending or descending order of function code start address.
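A minimal sketch of steps S2-2 and S2-3 in C, assuming each table entry is a hypothetical `Entry` record holding the start address, end address, functional attribute and classification type. It sorts in ascending order only, and `sort_table` and `build_distribution_table` are illustrative names rather than anything defined by the patent. Because $F$ is the union of $T_0, \ldots, T_n$ re-sorted by start address, concatenating the tables and calling the standard `qsort` is enough.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: feature information of one function code. */
typedef struct {
    uint32_t start;   /* start address */
    uint32_t end;     /* end address; [start, end] is the address interval */
    int attribute;    /* functional attribute xi_j, encoded as an int */
    int type;         /* classification type j, i.e. the set P_j */
} Entry;

/* qsort comparator: ascending start address. */
static int by_start_asc(const void *a, const void *b)
{
    uint32_t x = ((const Entry *)a)->start, y = ((const Entry *)b)->start;
    return (x > y) - (x < y);
}

/* S2-2: sorts one classification function information table T_j. */
void sort_table(Entry *t, size_t len)
{
    qsort(t, len, sizeof *t, by_start_asc);
}

/* S2-3: copies the entries of T_0..T_n into out and sorts them, giving F.
 * tables[j] points to T_j with lens[j] entries; out must hold all of them.
 * Returns the number of entries written (m). */
size_t build_distribution_table(Entry *const *tables, const size_t *lens,
                                int n, Entry *out)
{
    size_t m = 0;
    for (int j = 0; j <= n; j++) {
        memcpy(out + m, tables[j], lens[j] * sizeof *out);
        m += lens[j];
    }
    qsort(out, m, sizeof *out, by_start_asc);
    return m;
}
```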
On this basis, the invention builds classification function information tables and a function distribution table describing the attributes of the function codes; the subsequent steps use them to analyze the code structure of the binary program at both the whole-program and function levels, and to describe characteristics of the function codes such as their distribution patterns and interrelationships accurately.
On the basis of the above technical solution, preferably, step S3 specifically includes:
S3-1: classify the operands of machine instructions according to the functional attributes of the instructions and construct an operand type set $D$;
S3-2: normalize the function code sets $P_0, \ldots, P_n$ in a uniform way to generate the formalized structure sets $P_0', \ldots, P_n'$ of the function codes, and build a function code mapping table $W$ describing the bijection between them.
On the basis of the above technical solution, preferably, step S3-1 specifically includes:
constructing an operand type set $D$ that satisfies the following conditions:
(1) the elements of $D$ are single-byte character values, such as integers or letters;
(2) for the operand $x$ of any machine instruction there is a unique element $t \in D$ such that $t$ is the operand type of $x$.
The elements of $D$ are called the formalized values of the operands in the function codes.
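The patent does not list the members of $D$, so the following C sketch uses a hypothetical set of four operand kinds (register, immediate, memory reference, branch target) and an assumed single-byte value for each; `OperandKind` and `formalized_value` are illustrative names. A `switch` with one return per kind gives each operand type exactly one formalized value, which is condition (2).

```c
/* Hypothetical operand kinds; the patent does not enumerate D. */
typedef enum {
    OPND_REG,     /* general-purpose register */
    OPND_IMM,     /* immediate constant */
    OPND_MEM,     /* memory reference */
    OPND_ADDR     /* direct branch or call target address */
} OperandKind;

/* Maps each operand kind to exactly one single-byte formalized value t in D
 * (the letters chosen here are assumptions for illustration). */
char formalized_value(OperandKind kind)
{
    switch (kind) {
    case OPND_REG:  return 'R';
    case OPND_IMM:  return 'I';
    case OPND_MEM:  return 'M';
    case OPND_ADDR: return 'A';
    }
    return '?'; /* not reached for valid kinds */
}
```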
On the basis of the above technical solution, preferably, step S3-2 specifically includes:
S3-2-1: based on the operand type set $D$, replace the various operands contained in the function codes of the sets $P_0, \ldots, P_n$ with their formalized values to obtain the formalized sets $P_0', \ldots, P_n'$ of the function codes, and build the function code mapping table $W$. The main steps are:
S3-2-1-A: select each function code in each of the function code sets $P_0, \ldots, P_n$ in turn and go to step S3-2-1-B for it;
S3-2-1-B: for any function code $f \in P_i$, $0 \le i \le n$, replace each operand in $f$ with its formalized value to obtain the formalized structure $f'$ of $f$, so that $f' \leftrightarrow f$ (equivalently, $f \leftrightarrow f'$); go to step S3-2-1-C;
S3-2-1-C: put $f'$ into the set $P_i'$ and insert the vector $(P_i, f, P_i', f')$ into $W$ as a table entry;
S3-2-2: optimize the function code mapping table $W$ by sorting its entries in ascending or descending order of function code start address;
S3-2-3: build a search index for $W$, with which the entries of $W$ that describe the bijection between the formalized sets $P_0', \ldots, P_n'$ and the function code sets $P_0, \ldots, P_n$ can be retrieved quickly; these entries satisfy the following conditions:
(1) for any function code $f \in P_i$, $0 \le i \le n$, $W$ contains a unique entry $(P_i, f, P_i', f')$ describing the bijection: there is a unique set $P_i'$ and a unique element $f' \in P_i'$ such that $f' \leftrightarrow f$;
(2) for any element $f' \in P_i'$, $0 \le i \le n$, $W$ contains a unique entry $(P_i, f, P_i', f')$ describing the bijection: there is a unique function code set $P_i$ and a unique element $f \in P_i$ such that $f \leftrightarrow f'$.
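The following C sketch illustrates S3-2-1 to S3-2-3 under stated assumptions: an instruction is modelled by a hypothetical `Insn` record whose operand types have already been mapped to single-byte values from $D$, a formalized structure $f'$ is rendered as a text string such as `mov R,I;`, and an entry of $W$ identifies $f$ by its start address. `formalize`, `WEntry`, `sort_w` and `lookup_w` are illustrative names. Sorting $W$ by start address lets the standard `bsearch` act as the search index in the $f \to f'$ direction; a lookup from $f'$ back to $f$ would need a second index, and because two different function codes can produce identical text, $f'$ has to be identified by its entry in $W$ rather than by its string alone.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical instruction: mnemonic plus up to two formalized operands. */
typedef struct {
    const char *mnemonic;
    int nops;
    char optype[2];   /* formalized values from D, e.g. 'R', 'I', 'M' */
} Insn;

/* Appends s to buf (capacity cap, current length *len); -1 if it won't fit. */
static int append(char *buf, size_t cap, size_t *len, const char *s)
{
    size_t n = strlen(s);
    if (*len + n + 1 > cap)
        return -1;
    memcpy(buf + *len, s, n + 1);
    *len += n;
    return 0;
}

/* S3-2-1-B: writes f' for a function code f (an instruction array) into buf,
 * keeping mnemonics and replacing operands by their formalized values. */
int formalize(const Insn *f, size_t count, char *buf, size_t cap)
{
    size_t len = 0;
    if (cap == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        if (append(buf, cap, &len, f[i].mnemonic) != 0)
            return -1;
        for (int k = 0; k < f[i].nops; k++) {
            char op[3] = { k == 0 ? ' ' : ',', f[i].optype[k], '\0' };
            if (append(buf, cap, &len, op) != 0)
                return -1;
        }
        if (append(buf, cap, &len, ";") != 0)
            return -1;
    }
    return 0;
}

/* One entry (P_i, f, P_i', f') of W: sets by index i, f by start address. */
typedef struct {
    int set;              /* i: f is in P_i and f' is in P_i' */
    uint32_t f_start;     /* identifies f */
    const char *f_prime;  /* formalized structure f' */
} WEntry;

static int w_cmp(const void *a, const void *b)
{
    uint32_t x = ((const WEntry *)a)->f_start, y = ((const WEntry *)b)->f_start;
    return (x > y) - (x < y);
}

/* S3-2-2: order W by the start address of f. */
void sort_w(WEntry *w, size_t m) { qsort(w, m, sizeof *w, w_cmp); }

/* S3-2-3: binary search over the sorted W; NULL if f is not in W. */
const WEntry *lookup_w(const WEntry *w, size_t m, uint32_t f_start)
{
    WEntry key = { 0, f_start, NULL };
    return bsearch(&key, w, m, sizeof *w, w_cmp);
}
```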
With this method, the invention constructs the operand type set of machine instructions, obtains the formalized structures of the function codes, and generates the formalized structure sets of the various function codes together with the function code mapping table $W$, which form the basis for constructing the matrix representation of the function code formalized structure in the binary program.
On the basis of the above technical solution, preferably, step S4 specifically includes:
S4-1: if the function distribution table $F$ has $m$ entries in total, $m \in \mathbb{N}^+$, $m \ge n$, traverse the $m$ entries and extract the address interval and classification type of each function code in the binary program to form its feature array. For any function code $f_i \in P_j$, $0 \le i < m$, $0 \le j \le n$, the feature array is $([ds_i, dd_i], t_j)$, where $[ds_i, dd_i]$ and $t_j$ are the address interval and classification type of $f_i$, $ds_i$ and $dd_i$ are its start and end addresses, and $ds_i, dd_i \in \mathbb{N}$, with $\mathbb{N}$ denoting the natural numbers;
S4-2: search table $W$; using the bijection between the function code sets $P_0, \ldots, P_n$ and the formalized sets $P_0', \ldots, P_n'$, take the feature array of each function code in $P_0, \ldots, P_n$ as the matrix coordinates of its formalized structure in $P_0', \ldots, P_n'$, and build from the elements of $P_0', \ldots, P_n'$ a matrix $A$ with $m$ rows and $n+1$ columns that satisfies the following conditions:
(1) the elements of $A$ include all elements of $P_0', \ldots, P_n'$, and apart from empty elements all elements of $A$ are distinct;
(2) for any element $f_{ij}'$ of $A$ that is not empty, there is a unique set $P_j'$, $0 \le j \le n$, such that $f_{ij}' \in P_j'$;
(3) for any two elements $f_{ij}'$, $f_{kl}'$ of $A$ whose function codes $f_{ij}$, $f_{kl}$ satisfy $f_{ij}' \leftrightarrow f_{ij}$ and $f_{kl}' \leftrightarrow f_{kl}$, $0 \le k < m$, $0 \le l \le n$, and whose feature arrays are $([ds_i, dd_i], t_j)$ and $([ds_k, dd_k], t_l)$ respectively, the following hold:
(a) if $i < k$ then $ds_i < ds_k$; if $i > k$ then $ds_i > ds_k$;
(b) if $t_j$ and $t_l$ are the same classification type, $f_{ij}'$ and $f_{kl}'$ lie in the same column of $A$; if they are different classification types, $f_{ij}'$ and $f_{kl}'$ lie in different columns of $A$.
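One reading of conditions (1) to (3) is that row $i$ of $A$ holds the $i$-th function code of $F$ in start-address order and its column is the function code's classification type, so every row has exactly one non-empty cell. The C sketch below builds $A$ under that assumption, storing in each non-empty cell the row index of the function code in $F$ (which $W$ maps one-to-one to $f'$) and -1 in empty cells; `Feature` and `build_matrix` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature array ([ds_i, dd_i], t_j) of one function code,
 * taken from the function distribution table F (sorted by ds). */
typedef struct {
    uint32_t ds, dd;  /* start and end address */
    int type;         /* classification type j, 0 <= j <= n */
} Feature;

/* Fills the m x (n+1) matrix A, stored row-major in a. Row i holds the i-th
 * function code of F, column j its classification type, so ds_i < ds_k
 * whenever i < k and same-type codes share a column; empty cells hold -1.
 * Returns 0, or -1 if F is not strictly ascending or a type is out of range. */
int build_matrix(const Feature *F, size_t m, int n, long *a)
{
    size_t cols = (size_t)n + 1;
    for (size_t i = 0; i < m * cols; i++)
        a[i] = -1;
    for (size_t i = 0; i < m; i++) {
        if (F[i].type < 0 || F[i].type > n)
            return -1;
        if (i > 0 && F[i - 1].ds >= F[i].ds)
            return -1;
        a[i * cols + (size_t)F[i].type] = (long)i;
    }
    return 0;
}
```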
With this method, the invention builds the matrix $A$ of the formalized structure of the function codes in the binary program. $A$ is constructed from the function distribution table, the classification function information tables and the elements of the sets $P_0', \ldots, P_n'$; it lets the structure, distribution patterns and interrelationships of the function codes be analyzed at two granularities, the whole code and the individual function code, and so describes the formalized structure of all function codes in the binary program effectively at both the whole-program and function levels.
In a second aspect, the invention discloses an electronic device comprising at least one processor, at least one memory, a communication interface and a bus, where the processor, the memory and the communication interface communicate with each other over the bus. The memory stores a program, executable by the processor, that implements the method for generating a function code formalized structure in a binary program according to the first aspect of the invention.
In a third aspect, the invention discloses a computer-readable storage medium storing a program that, when executed, implements the method for generating a function code formalized structure in a binary program according to the first aspect of the invention.
Compared with the prior art, the method, system, device and medium for generating a function code formalized structure in a binary program according to the invention have the following beneficial effects:
(1) the invention describes the structure, distribution patterns and interrelationships of the function codes in a binary program accurately in matrix form, and can analyze the formalized structure of all function codes in the binary program at two levels, the whole code and the individual function code;
(2) the method analyzes the function code structure of a binary program recursively and accurately at two granularities, the whole code and the function code, addressing the difficulty of analyzing binary programs effectively that current international software security research faces; it provides practical support for accurately detecting the functional attributes of binary programs and helps counter security threats from harmful techniques, thereby safeguarding the security, reliability and trustworthiness of systems;
(3) the method applies to binary programs of different formats in a variety of system environments, giving it a wide range of application and good portability;
(4) the method can be applied in software security fields such as binary code structure and vulnerability analysis, detection and removal of viruses and malware, vulnerability discovery and exploitation, identification of program code defects, and software homology identification, thereby supporting system security, reliability and trustworthiness and providing a basis for further development of system security attack and defense techniques.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of a method for generating a function code formalized structure in a binary program according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Examples
The workflow of the method for generating a function code formalized structure in a binary program is shown in FIG. 1, and the processing steps are as follows.
Because binary program formats differ between operating system environments, the following takes as an example the specific workflow for generating the matrix representation of the function code formalized structure of a PE program in a Windows environment. The method mainly comprises the following steps:
First, extract valid data describing the binary program and its code from the file structure description information contained in the binary program. In the Windows environment, common PE files include formats such as EXE, DLL, OCX, SYS and COM, and their main file structure is shown in Table 1 below:
Table 1 Overall structure of a PE file
A PE file contains data structures that describe the program structure, from which valid information about the program and its code can be extracted. For example, the value of the e_lfanew field in the DOS header is the file offset of the NT headers. The binary code is usually compiled into one section of the PE file, typically the .text section. In the optional header (IMAGE_OPTIONAL_HEADER) within the NT headers, the data items describe basic information about the program and the section that contains its code: ImageBase is the base address at which the program is loaded, SizeOfImage is the size of the program once loaded, and BaseOfCode is the starting RVA of the section containing the code. Whether a section contains binary code can be determined by checking whether its Characteristics flags include the executable attribute (IMAGE_SCN_MEM_EXECUTE). The last part of the optional header is the data directory table; each directory in it describes the function codes of one type of functional attribute, its data structure is of type IMAGE_DATA_DIRECTORY, and the value of its VirtualAddress data item gives the address of the data it describes. Go to the second step.
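A minimal C sketch of reading the header fields named above from a PE file already loaded into memory. The field offsets follow the Microsoft PE/COFF layout for PE32 (optional header magic 0x10B) and PE32+ (magic 0x20B); `PeInfo` and `parse_pe` are hypothetical names, and the sketch does not walk the section table or check section flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers; PE header fields are little-endian. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static uint64_t rd64(const uint8_t *p)
{
    return (uint64_t)rd32(p) | (uint64_t)rd32(p + 4) << 32;
}

typedef struct {
    uint32_t nt_offset;     /* e_lfanew: file offset of the NT headers */
    uint64_t image_base;    /* ImageBase */
    uint32_t size_of_image; /* SizeOfImage */
    uint32_t base_of_code;  /* BaseOfCode (an RVA) */
    uint32_t num_dirs;      /* NumberOfRvaAndSizes */
    uint32_t dir_offset;    /* file offset of the first data directory */
} PeInfo;

/* Returns 0 on success, -1 if buf does not look like a PE32/PE32+ file. */
int parse_pe(const uint8_t *buf, size_t len, PeInfo *out)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;
    uint32_t nt = rd32(buf + 0x3C);                   /* e_lfanew */
    if ((uint64_t)nt + 24 + 2 > len || rd32(buf + nt) != 0x00004550) /* "PE\0\0" */
        return -1;
    const uint8_t *opt = buf + nt + 24;  /* 4-byte signature + 20-byte file header */
    uint16_t magic = rd16(opt);
    size_t need = magic == 0x20B ? 112 : 96;  /* bytes before DataDirectory */
    if ((magic != 0x10B && magic != 0x20B) || (uint64_t)nt + 24 + need > len)
        return -1;
    out->nt_offset = nt;
    out->base_of_code = rd32(opt + 20);
    out->image_base = magic == 0x20B ? rd64(opt + 24) : rd32(opt + 28);
    out->size_of_image = rd32(opt + 56);
    out->num_dirs = rd32(opt + (magic == 0x20B ? 108 : 92));
    out->dir_offset = nt + 24 + (uint32_t)need;
    return 0;
}
```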
Second, based on the valid data, traverse the binary code address space of the binary program and parse, according to the rules of the program structure, the address space information of the binary code and the code structure information of each function code segment. The address space information describes how the program code is stored, for example the start address, size and entry point of the binary code; the code structure information describes how a function code is stored, for example its start address, size and end address. For instance, for binary code generated by compiling the PE file's own source code (referred to in the present invention as self code), the start address is calculated as follows:
Self-code start address = address of the NT headers + size of the PE signature + size of the PE file header + size of the PE optional header
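The formula can be evaluated directly from the headers, as in the C sketch below, where the PE signature is 4 bytes, the PE file header (IMAGE_FILE_HEADER) is 20 bytes, and the optional header size is read from the SizeOfOptionalHeader field at offset 16 of the file header; `self_code_start` is a hypothetical name. In the standard PE layout this same offset is also where the section table begins, so an implementation will usually follow the section headers from there to reach the code bytes themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* Evaluates: NT header offset + PE signature (4) + file header (20)
 *            + SizeOfOptionalHeader.
 * Returns the resulting file offset, or 0 if the buffer is too short. */
uint32_t self_code_start(const uint8_t *buf, size_t len)
{
    if (len < 0x40)
        return 0;
    /* e_lfanew at offset 0x3C of the DOS header, little-endian */
    uint32_t nt = (uint32_t)buf[0x3C] | (uint32_t)buf[0x3D] << 8 |
                  (uint32_t)buf[0x3E] << 16 | (uint32_t)buf[0x3F] << 24;
    if ((uint64_t)nt + 24 > len)
        return 0;
    /* SizeOfOptionalHeader sits at offset 16 inside the 20-byte file header */
    const uint8_t *fh = buf + nt + 4;
    uint32_t opt_size = (uint32_t)fh[16] | (uint32_t)fh[17] << 8;
    return nt + 4 + 20 + opt_size;
}
```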
For the function codes in the PE file, the code structure information of the function codes with different functional attributes can be parsed from the data contained in each directory of the data directory table. Most of the function code information for the different functional attributes in a PE file is held in the data directory table, where each directory describes the function code information of one functional attribute; see Table 2:
Table 2 Data directory table of a PE file
Each entry in the table is defined as follows:
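#include <stdint.h>
typedef uint32_t DWORD;  /* stand-in for the 32-bit unsigned Windows DWORD */

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;  /* start position (RVA) */
    DWORD Size;            /* size in bytes */
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;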
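A minimal C sketch of reading the data directory table as a sequence of 8-byte entries (VirtualAddress, then Size, both little-endian). The 16 slot names follow the standard IMAGE_DIRECTORY_ENTRY_* indices from winnt.h; how these slots map to the functional attributes $\xi_i$ is left to the method, and `DataDir`, `read_data_dirs` and `data_dir_name` are hypothetical names. The start offset and the entry count would come from the optional header (its NumberOfRvaAndSizes field).

```c
#include <stddef.h>
#include <stdint.h>

/* Standard data directory slots (IMAGE_DIRECTORY_ENTRY_* indices 0..15). */
static const char *const kDirNames[16] = {
    "Export", "Import", "Resource", "Exception", "Security",
    "BaseReloc", "Debug", "Architecture", "GlobalPtr", "TLS",
    "LoadConfig", "BoundImport", "IAT", "DelayImport",
    "COMDescriptor", "Reserved"
};

typedef struct {
    uint32_t VirtualAddress;  /* start position (RVA) */
    uint32_t Size;            /* size in bytes */
} DataDir;

/* Reads up to max of the count entries starting at file offset dir_off.
 * Stops early if an entry would run past len. Returns entries read. */
size_t read_data_dirs(const uint8_t *buf, size_t len, size_t dir_off,
                      uint32_t count, DataDir *out, size_t max)
{
    size_t n = 0;
    for (uint32_t i = 0; i < count && n < max; i++) {
        size_t off = dir_off + (size_t)i * 8;
        if (off + 8 > len)
            break;
        const uint8_t *p = buf + off;
        out[n].VirtualAddress = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                                (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
        out[n].Size = (uint32_t)p[4] | (uint32_t)p[5] << 8 |
                      (uint32_t)p[6] << 16 | (uint32_t)p[7] << 24;
        n++;
    }
    return n;
}

/* Name of directory slot i, or NULL outside the 16 standard slots. */
const char *data_dir_name(size_t i) { return i < 16 ? kDirNames[i] : NULL; }
```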
Go to the third step.
Third, based on the address space information and the code structure information, obtain the measurement information of the functional attributes of the different function codes in the binary program, construct function code sets for the various functional attributes, and classify the function codes. The main process is as follows:
(1) Based on the description that each directory in the data directory table gives of the function codes of its functional attribute, obtain the function code segments described by each directory in ascending (or descending) address order and place them into $n$ function code sets with different functional attributes, $P_0, \ldots, P_{n-1}$, $n > 0$, $n \in \mathbb{N}^+$, such that the sets satisfy the following conditions:
(a) every function code set $P_i$, $0 \le i \le n-1$, has a functional attribute $\xi_i$ such that every function code in $P_i$ has the attribute $\xi_i$;
(b) for any two different function code sets $P_i, P_k$, $0 \le i, k \le n-1$, $P_i \cap P_k = \varnothing$;
(2) Traverse the binary code address space and identify each function code segment in turn; for any segment $\omega$, if $\omega \notin P_i$ for every $0 \le i \le n-1$, put $\omega$ into the set $P_n$;
(3) Classify the function codes: function codes in the same set of $P_0, \ldots, P_n$ belong to the same type, and function codes in different sets belong to different types.
Through this process, the $n+1$ function code sets $P_0, \ldots, P_n$ with different functional attributes are obtained, and the function codes in the PE file are divided into $n+1$ types. Go to the fourth step.
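A minimal C sketch of the set assignment in the third step, assuming that a traversed segment belongs to $P_i$ when its address range lies entirely inside the range $[\mathrm{VirtualAddress}, \mathrm{VirtualAddress} + \mathrm{Size})$ described by directory $i$, with both ranges expressed as RVAs. The containment rule, the `Range` type and `assign_set` are assumptions for illustration. Returning the first matching directory keeps the sets disjoint even if two directory ranges overlap, and a segment that matches no directory falls into $P_n$.

```c
#include <stdint.h>

/* Hypothetical address range [va, va + size) described by one directory. */
typedef struct { uint32_t va, size; } Range;

/* Returns the index of the set for segment [start, end): the first
 * directory i (0 <= i <= n-1) whose range contains it, otherwise n.
 * Directories with size 0 are treated as absent. */
int assign_set(const Range *dirs, int n, uint32_t start, uint32_t end)
{
    for (int i = 0; i < n; i++) {
        uint64_t lo = dirs[i].va;
        uint64_t hi = (uint64_t)dirs[i].va + dirs[i].size;
        if (dirs[i].size != 0 && start >= lo && end <= hi)
            return i;
    }
    return n;  /* omega is in none of P_0..P_{n-1}, so it goes to P_n */
}
```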
Fourth, construct the classification function information tables $T_0, \ldots, T_n$. The main process is as follows:
(1) from the code structure information, functional attribute measurement information, classification information and other data of the function codes in $P_0, \ldots, P_n$, generate the feature information of each function code, comprising descriptive data such as its start address, address interval, functional attribute and classification type;
(2) take the feature information of each function code in $P_0, \ldots, P_n$ as a table entry, place it in the corresponding table $T_0, \ldots, T_n$, and sort the entries of each table in ascending (or descending) order of function code start address. Go to the fifth step.
Fifth, construct the function distribution table $F$: place all entries of the classification function information tables $T_0, \ldots, T_n$ into $F$ and sort them in ascending (or descending) order of function code start address. Go to the sixth step.
Sixth, classify the operands of machine instructions according to the functional attributes of the instructions and construct an operand type set $D$ satisfying the following conditions:
(1) the elements of $D$ are single-byte character values, such as integers or letters;
(2) for the operand $x$ of any machine instruction there is a unique element $t \in D$ such that $t$ is the operand type of $x$.
The elements of $D$ are called the formalized values of the operands in the function codes. Go to the seventh step.
Seventh step: based on the operand type set D, replace every operand contained in the function codes of the sets P_0 ~ P_n with its formalized value, obtaining the formalized function code sets P_0' ~ P_n', and establish the function code mapping table W. The main process is as follows:
(1) Select each of the function code sets P_0 ~ P_n in turn, and perform the following steps on every function code in the selected set;
(2) For any function code f ∈ P_i, 0 ≤ i ≤ n, replace each operand in f with its formalized value to obtain the formalized structure f' of f, such that f' ↔ f (or f ↔ f');
(3) Put f' into the set P_i', and insert the vector (P_i, f, P_i', f') into W as a table entry;
(4) Optimize the function code mapping table W by sorting its entries in ascending or descending order of the function-code start address;
(5) Build an index on table W, through which the entries of W describing the bijection between the formalized sets P_0' ~ P_n' and the function code sets P_0 ~ P_n can be retrieved quickly. The index satisfies the following conditions:
① for any function code f ∈ P_i, 0 ≤ i ≤ n, table W contains a unique entry (P_i, f, P_i', f') describing the bijection: there exist a unique set P_i' and a unique element f' ∈ P_i' such that f' ↔ f;
② for any element f' ∈ P_i', 0 ≤ i ≤ n, table W contains a unique entry (P_i, f, P_i', f') describing the bijection: there exist a unique function code set P_i and a unique element f ∈ P_i such that f ↔ f'.
Go to the eighth step.
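A minimal Python sketch of the seventh step follows, under several stated assumptions: a function code is modeled as a start address plus a tuple of (mnemonic, operands) instructions; set indices stand in for P_i and P_i' in each entry of W; and `simple_operand_type` is a deliberately crude, hypothetical stand-in for a real operand type set D. Each f' keeps the start address of its f, so two function codes whose formalized instruction text happens to coincide still yield distinct elements of P_i', which is what keeps the correspondence in W bijective. All names here are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, Tuple[str, ...]]  # (mnemonic, operands); assumed layout


@dataclass(frozen=True)
class FunctionCode:
    start: int                  # start address; also serves as identity
    body: Tuple[Instr, ...]


@dataclass(frozen=True)
class FormalCode:
    start: int                          # inherited from f so each f' stays distinct
    body: Tuple[Tuple[str, str], ...]   # (mnemonic, formalized operand types)


Entry = Tuple[int, FunctionCode, int, FormalCode]  # stands in for (P_i, f, P_i', f')


def simple_operand_type(op: str) -> str:
    """Crude stand-in for D: 'm' memory, 'i' immediate, 'r' everything else."""
    s = op.strip().lower()
    if "[" in s:
        return "m"
    if s.lstrip("+-").isdigit() or s.startswith("0x"):
        return "i"
    return "r"


def formalize(f: FunctionCode, classify: Callable[[str], str]) -> FormalCode:
    """Step (2): replace every operand of f by its formalized value."""
    return FormalCode(
        f.start, tuple((mn, "".join(classify(o) for o in ops)) for mn, ops in f.body)
    )


def build_mapping_table(sets: List[List[FunctionCode]],
                        classify: Callable[[str], str] = simple_operand_type):
    formal_sets: List[List[FormalCode]] = [[] for _ in sets]
    W: List[Entry] = []
    for i, p_i in enumerate(sets):               # step (1): walk every set P_i
        for f in p_i:
            f_prime = formalize(f, classify)     # step (2)
            formal_sets[i].append(f_prime)       # step (3): f' into P_i'
            W.append((i, f, i, f_prime))         #           entry into W
    W.sort(key=lambda e: e[1].start)             # step (4): sort by start address
    fwd: Dict[FunctionCode, Entry] = {e[1]: e for e in W}  # step (5): f  -> entry
    bwd: Dict[FormalCode, Entry] = {e[3]: e for e in W}    #           f' -> entry
    if not len(fwd) == len(bwd) == len(W):
        raise ValueError("correspondence between P and P' is not bijective")
    return formal_sets, W, fwd, bwd
```

The two dictionaries play the role of the index on W: `fwd` answers condition ① (from f to its entry) and `bwd` answers condition ② (from f' back to its entry).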
Eighth step: construct the matrix representation of the formalized structure of the function code in the binary program according to the function distribution table and the formalized structures of the function codes. The main process is as follows:
(1) If the function distribution table F contains m entries in total, m ∈ N⁺, m ≥ n, traverse these m entries and extract the address interval and classification type of each function code in the binary program to form its feature array. For any function code f_i ∈ P_j, 0 ≤ i ≤ m, 0 ≤ j ≤ n, the feature array is ([ds_i, dd_i], t_j), where [ds_i, dd_i] and t_j are respectively the address interval and classification type of f_i, ds_i and dd_i are respectively its start and end addresses, and ds_i, dd_i ∈ N, N being the set of natural numbers;
(2) Search table W and, according to the bijection between the function code sets P_0 ~ P_n and the formalized sets P_0' ~ P_n', use the feature array of each function code as the matrix coordinates of its formalized structure, and construct from the elements of P_0' ~ P_n' a matrix A with m rows and n+1 columns. Matrix A satisfies the following conditions:
① A contains all elements of the sets P_0' ~ P_n', and apart from the empty set ∅, the elements of A are pairwise distinct;
② for any element f_ij' of A, if it is not ∅, there exists a unique set P_j', 0 ≤ j ≤ n, such that f_ij' ∈ P_j';
③ for any two elements f_ij', f_kl' of A, if they correspond to the function codes f_ij, f_kl through f_ij' ↔ f_ij and f_kl' ↔ f_kl, 0 ≤ k ≤ m, 0 ≤ l ≤ n, and the feature arrays of f_ij and f_kl are ([ds_i, dd_i], t_j) and ([ds_k, dd_k], t_l) respectively, then:
(a) if i < k, then ds_i < ds_k; if i > k, then ds_i > ds_k;
(b) if t_j and t_l are the same classification type, f_ij' and f_kl' lie in the same column of A; if t_j and t_l are different classification types, f_ij' and f_kl' lie in different columns of A.
Matrix A is shown in Table 3.
Table 3. Formalized structure matrix of the function codes (table image not reproduced)
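Under one reading of conditions ① to ③, each row of A corresponds to one entry of F in address order and holds exactly one non-empty element, placed in the column given by that function code's classification type. The Python sketch below builds A on that assumption, uses `None` for the empty set ∅, and replaces the lookup in W with a plain dictionary from start address to f'. Row indices are 0-based here, so rows run from 0 to m-1. The names `Feature`, `build_matrix` and `satisfies_conditions` are hypothetical.

```python
from typing import Dict, Hashable, List, NamedTuple, Optional, Sequence


class Feature(NamedTuple):
    """Feature array ([ds, dd], t) of one function code."""
    ds: int  # start address
    dd: int  # end address
    t: int   # classification type, used directly as the column index j


def build_matrix(F: Sequence[Feature], formal: Dict[int, Hashable],
                 n: int) -> List[List[Optional[Hashable]]]:
    """Build the m x (n + 1) matrix A from F (sorted by ds); None marks an empty cell."""
    A: List[List[Optional[Hashable]]] = [[None] * (n + 1) for _ in F]
    for i, feat in enumerate(F):
        A[i][feat.t] = formal[feat.ds]   # f' placed at row i, column t_j
    return A


def satisfies_conditions(A: List[List[Optional[Hashable]]],
                         F: Sequence[Feature]) -> bool:
    """Check the distinctness, ordering and column conditions on A."""
    # (a): row order follows strictly increasing start addresses
    if any(F[i].ds >= F[i + 1].ds for i in range(len(F) - 1)):
        return False
    # ①: non-empty elements are pairwise distinct
    cells = [x for row in A for x in row if x is not None]
    if len(cells) != len(set(cells)):
        return False
    # (b): each row's single non-empty cell sits in the column of its type
    return all(
        [j for j, x in enumerate(row) if x is not None] == [F[i].t]
        for i, row in enumerate(A)
    )
```

Using the classification type itself as the column index is the simplest way to satisfy condition (b): equal types coincide in one column and different types cannot.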
Through this specific example, the invention applies the proposed method for generating the formalized structure of function code in a binary program to establish the formalized structure matrix A of the function code in a PE file under the Windows environment.
Matrix A is constructed from the function distribution table, the classification function information tables and the elements of the sets P_0' ~ P_n'. It captures characteristics of the function code in the binary program, such as its structure, distribution pattern and interrelationships, at two granularities, the whole code and the individual function, and thereby describes the formalized structure of all function code in the binary program at both the overall level and the function level.
Using a two-dimensional matrix, the method provided by the invention analyzes characteristics of the function code in a binary program, such as its structure, distribution pattern and interrelationships, at two levels, the whole code and the individual function. The code structure of the binary program can thus be analyzed effectively, providing a basis for further detecting and defending against system security risks.
The method can recursively and accurately analyze the function code structure of a binary program at the two granularities of the whole code and the function code. It addresses the difficulty, in current international software security research, of analyzing binary programs effectively, supports accurate detection of the functional attributes of binary programs, and helps resist security threats posed by malicious techniques, thereby safeguarding the security, reliability and trustworthiness of the system.
The method can be applied to binary programs of different formats in various system environments, giving it a wide range of application and strong portability.
The method can be applied in software security fields such as binary code structure and vulnerability analysis, detection and removal of viruses and malicious programs, vulnerability mining and exploitation, identification of program code defects, and software homology identification. It can effectively support the security, reliability and trustworthiness of a system and serves as a basis for further developing system security attack and defense techniques.
The invention also discloses an electronic device, comprising at least one processor, at least one memory, a communication interface and a bus. The processor, the memory and the communication interface communicate with one another via the bus. The memory stores a program, executable by the processor, of the method for generating the formalized structure of function code in a binary program, and the program is configured to implement that method according to the embodiments of the invention.
The invention also discloses a computer-readable storage medium storing a program of the method for generating the formalized structure of function code in a binary program; when executed, the program implements that method according to the embodiments of the invention.
The above description covers only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principle of the invention shall fall within its scope of protection.

Claims (9)

1. A method for generating a function code formalization structure in a binary program is characterized by comprising the following steps:
S1: performing structural analysis on the binary program, identifying and measuring each function code to obtain address space information, code structure information and measurement information of the functional attributes of the function codes, constructing function code sets P_0 ~ P_n with different functional attributes, classifying the function codes, and continuing to step S2;
S2: generating feature information of the function codes using the code structure information, the measurement information of the functional attributes of the function codes and the classification information of the function codes as basic data, and constructing the classification function information tables and the function distribution table, wherein step S2 specifically comprises:
generating feature information of the function codes using the code structure information, the measurement information of the functional attributes of the function codes and the classification information of the function codes as basic data, wherein the feature information includes, but is not limited to, the start address, address interval, functional attribute and classification type of the function code, and describes the features of the function code;
constructing classification function information tables T_0 ~ T_n: collecting the feature information of each function code in P_0 ~ P_n as table directory entries, placing them into the tables T_0 ~ T_n respectively, and sorting the entries of each table T_0 ~ T_n in ascending or descending order of the function-code start address;
constructing a function distribution table F: placing the entries of the classification function information tables T_0 ~ T_n into table F, and sorting the entries in ascending or descending order of the function-code start address; continuing to step S3;
S3: classifying the operands in machine instructions to obtain their formalized values, replacing the operands contained in the function codes of the function code sets with these formalized values to generate the formalized structures of the function codes, and continuing to step S4;
S4: constructing, according to the function distribution table and the formalized structures of the function codes, a matrix representation of the formalized structure of the function code in the binary program.
2. The method for generating a function code formalized structure in a binary program according to claim 1, wherein the step S1 specifically includes:
S1-1: extracting, from the file structure description information contained in the binary program, valid data describing the binary program and its code structure;
S1-2: based on the valid data, parsing the address space information of the binary code according to the program-structure parsing method, wherein the address space information includes, but is not limited to, the start address, size and entry point of the binary code, and describes the code storage structure;
S1-3: based on the address space information, traversing the binary code address space of the binary program, identifying each segment of function code therein, and obtaining the code structure information of each segment, wherein the code structure information includes, but is not limited to, the start address, size and end address of the function code, and describes the storage structure of the function code;
S1-4: based on the address space information and the code structure information, obtaining the functional attributes of the different function codes in the binary program, and constructing function code sets with different functional attributes:
P_0, P_1, …, P_n, n ∈ N⁺,
where N⁺ is the set of positive integers;
S1-5: classifying the function codes according to the sets P_0 ~ P_n, as follows: the function codes of the binary program are divided into the sets P_0 ~ P_n; function codes in the same set are of the same type and function codes in different sets are of different types, yielding n types of function code.
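Step S1-3 leaves the function identification technique open. As one simple, hypothetical heuristic, not necessarily the one used by the applicant, the Python sketch below derives the start address, size and end address of each function code by treating each known entry address as the start of a function that extends up to the next entry or to the end of the code section. Real binaries often contain padding, embedded data and non-contiguous functions, so a practical tool would normally rely on disassembly and control-flow analysis instead. The names `CodeStructure` and `function_structures` are illustrative.

```python
from typing import Iterable, List, NamedTuple


class CodeStructure(NamedTuple):
    """Code structure information of one function code."""
    start: int  # start address
    size: int   # size in bytes
    end: int    # end address (inclusive)


def function_structures(section_start: int, section_size: int,
                        entries: Iterable[int]) -> List[CodeStructure]:
    """Approximate each function's extent as [entry, next_entry - 1]."""
    section_end = section_start + section_size  # exclusive bound
    # Keep only entries inside the code section; deduplicate and sort them.
    starts = sorted({e for e in entries if section_start <= e < section_end})
    out = []
    for k, s in enumerate(starts):
        nxt = starts[k + 1] if k + 1 < len(starts) else section_end
        out.append(CodeStructure(s, nxt - s, nxt - 1))
    return out
```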
3. The method for generating a function code formalized structure in a binary program according to claim 2, wherein the step S1-4 specifically includes:
S1-4-1: based on the address space information and the specific information describing the binary code structure in the code structure information, measuring the relationships between different function codes and obtaining the measurement information of the functional attributes of each segment of function code, wherein the measurement information includes, but is not limited to, the storage interval, similarity and metric values of the function code, and describes the functional characteristics of the function code;
S1-4-2: marking the functional attribute of each segment of function code in the binary program according to the measurement information of its functional attributes;
S1-4-3: according to the functional attributes of the function codes, placing each segment of function code in the binary program into the function code sets P_0 ~ P_n, which satisfy the following conditions:
① each function code set P_i, 0 ≤ i ≤ n, has a functional attribute ξ_i, and all function codes in P_i have the functional attribute ξ_i;
② for any two different function code sets P_i, P_k, 0 ≤ i, k ≤ n, P_i ∩ P_k = ∅;
③ for any function code ω in the binary code address space, there exists a unique function code set P_j, 0 ≤ j ≤ n, such that ω ∈ P_j;
④ if all code in the binary program is denoted U, then U = P_0 ∪ P_1 ∪ … ∪ P_n.
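Conditions ② to ④ of step S1-4-3 together state that P_0 ~ P_n form a partition of U, and condition ① requires each set to be homogeneous in functional attribute. The Python sketch below checks both properties, assuming, for illustration only, that each function code is identified by its start address and that a dictionary supplies its functional attribute. The functions `is_partition` and `attributes_consistent` are hypothetical names.

```python
from typing import Hashable, Mapping, Sequence, Set


def is_partition(U: Set[Hashable], sets: Sequence[Set[Hashable]]) -> bool:
    """Conditions ②-④: pairwise disjoint sets whose union is exactly U.

    Disjointness plus full coverage implies condition ③: every element
    of U lies in exactly one P_j.
    """
    seen: Set[Hashable] = set()
    for p in sets:
        if seen & p:          # overlaps an earlier set, so P_i ∩ P_k ≠ ∅
            return False
        seen |= p
    return seen == U


def attributes_consistent(sets: Sequence[Set[Hashable]],
                          attr_of: Mapping[Hashable, str],
                          xi: Sequence[str]) -> bool:
    """Condition ①: every function code in P_i carries the attribute xi[i]."""
    return all(attr_of[f] == xi[i] for i, p in enumerate(sets) for f in p)
```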
4. The method for generating a function code formalized structure in a binary program according to claim 2, wherein the step S3 specifically includes:
S3-1: classifying the operands in machine instructions according to the functional attributes of the machine instructions, and constructing an operand type set D;
S3-2: normalizing the function code sets P_0 ~ P_n in a unified manner to generate the formalized structure sets P_0' ~ P_n' of the function codes, and establishing a function code mapping table W describing the bijection between them.
5. The method for generating a function code formalized structure in a binary program according to claim 4, wherein said step S3-1 specifically includes:
constructing an operand type set D that satisfies the following conditions:
① the elements of the operand type set D are of single-byte character type, including but not limited to integers and letters;
② for the operand x in any machine instruction, there exists a unique element t ∈ D such that t is the operand type of x;
the elements of the operand type set D are called the formalized values of the operands in the function code.
6. The method for generating a function code formalized structure in a binary program according to claim 4, wherein the step S3-2 specifically includes:
S3-2-1: based on the operand type set D, replacing each operand contained in the function codes of the sets P_0 ~ P_n with its formalized value to obtain the formalized function code sets P_0' ~ P_n', and establishing the function code mapping table W, mainly comprising the following steps:
S3-2-1-A: selecting each of the function code sets P_0 ~ P_n in turn, and performing step S3-2-1-B on each function code in the selected set;
S3-2-1-B: for any function code f ∈ P_i, 0 ≤ i ≤ n, replacing each operand in f with its formalized value to obtain the formalized structure f' of f, such that f' ↔ f (or f ↔ f'), and continuing to step S3-2-1-C;
S3-2-1-C: putting f' into the set P_i', and inserting the vector (P_i, f, P_i', f') into W as a table entry;
S3-2-2: optimizing the function code mapping table W by sorting all of its entries in ascending or descending order of the function-code start address;
S3-2-3: building an index on table W, through which the entries of W describing the bijection between the formalized sets P_0' ~ P_n' and the function code sets P_0 ~ P_n can be retrieved quickly, the index satisfying the following conditions:
① for any function code f ∈ P_i, 0 ≤ i ≤ n, table W contains a unique entry (P_i, f, P_i', f') describing the bijection: there exist a unique set P_i' and a unique element f' ∈ P_i' such that f' ↔ f;
② for any element f' ∈ P_i', 0 ≤ i ≤ n, table W contains a unique entry (P_i, f, P_i', f') describing the bijection: there exist a unique function code set P_i and a unique element f ∈ P_i such that f ↔ f'.
7. The method for generating a function code formalized structure in a binary program according to claim 6, wherein said step S4 specifically includes:
S4-1: if the function distribution table F contains m entries in total, m ∈ N⁺, m ≥ n, traversing the m entries to extract the address interval and classification type of each function code in the binary program and form the feature array of each function code; for any function code f_i ∈ P_j, 0 ≤ i ≤ m, 0 ≤ j ≤ n, the feature array is expressed as ([ds_i, dd_i], t_j), where [ds_i, dd_i] and t_j are respectively the address interval and classification type of f_i, ds_i and dd_i are respectively the start address and end address of f_i, ds_i, dd_i ∈ N, and N is the set of natural numbers;
S4-2: searching table W and, according to the bijection between the function code sets P_0 ~ P_n and the formalized sets P_0' ~ P_n', taking the feature array of each function code in P_0 ~ P_n as the matrix coordinates of its formalized structure in P_0' ~ P_n', and constructing from the elements of P_0' ~ P_n' a matrix A with m rows and n+1 columns, the matrix A satisfying the following conditions:
① A contains all elements of the sets P_0' ~ P_n', and apart from the empty set ∅, the elements of A are pairwise distinct;
② for any element f_ij' of A, if it is not ∅, there exists a unique set P_j', 0 ≤ j ≤ n, such that f_ij' ∈ P_j';
③ for any two elements f_ij', f_kl' of A, if they correspond to the function codes f_ij, f_kl through f_ij' ↔ f_ij and f_kl' ↔ f_kl, 0 ≤ k ≤ m, 0 ≤ l ≤ n, and the feature arrays of f_ij and f_kl are ([ds_i, dd_i], t_j) and ([ds_k, dd_k], t_l) respectively, then:
(a) if i < k, then ds_i < ds_k; if i > k, then ds_i > ds_k;
(b) if t_j and t_l are the same classification type, f_ij' and f_kl' lie in the same column of A; if t_j and t_l are different classification types, f_ij' and f_kl' lie in different columns of A.
8. An electronic device, comprising at least one processor, at least one memory, a communication interface and a bus, wherein the processor, the memory and the communication interface communicate with one another via the bus; the memory stores a program, executable by the processor, of the method for generating a function code formalized structure in a binary program, and the program is configured to implement the method for generating a function code formalized structure in a binary program according to any one of claims 1 to 7.
9. A computer-readable storage medium, characterized in that it stores a program of the method for generating a function code formalized structure in a binary program, and the program, when executed, implements the method for generating a function code formalized structure in a binary program according to any one of claims 1 to 7.
CN202111108278.7A 2021-09-22 2021-09-22 Method, apparatus and medium for generating function code formalized structure in binary program Active CN113553041B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111108278.7A CN113553041B (en) 2021-09-22 2021-09-22 Method, apparatus and medium for generating function code formalized structure in binary program

Publications (2)

Publication Number Publication Date
CN113553041A CN113553041A (en) 2021-10-26
CN113553041B true CN113553041B (en) 2021-12-10

Family

ID=78134573

Country Status (1)

Country Link
CN (1) CN113553041B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105138914A (en) * 2015-08-03 2015-12-09 Nanjing University Software security detection method for code reuse programming
CN105787368A (en) * 2016-02-26 2016-07-20 Wuhan University ROP defense method and device based on function scrambling
CN105786512A (en) * 2016-02-29 2016-07-20 Inspur (Suzhou) Financial Technology Service Co., Ltd. Program generation method and dimension manager
CN107943481A (en) * 2017-05-23 2018-04-20 Tsinghua University Multi-model-based method for constructing C program code specifications
CN109101235A (en) * 2018-06-05 2018-12-28 Beihang University Intelligent parsing method for software programs
CN111382439A (en) * 2020-03-28 2020-07-07 Yuxi Normal University Malware detection method based on multimodal deep learning
CN111930386A (en) * 2020-09-24 2020-11-13 Wuhan Jinghong Electronic Technology Co., Ltd. PATTERN file compiling method and device and electronic equipment
CN112068883A (en) * 2020-07-31 2020-12-11 PLA Strategic Support Force Information Engineering University Method for identifying the number of parameters of large binary firmware under a reduced instruction set

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US9021589B2 (en) * 2012-06-05 2015-04-28 Los Alamos National Security, Llc Integrating multiple data sources for malware classification
US20140180660A1 (en) * 2012-12-14 2014-06-26 Life Technologies Holdings Pte Limited Methods and systems for in silico design
CN103150626B (en) * 2013-03-01 2016-08-03 Nanjing University of Science and Technology BPEL process consistency metric method based on program dependency graph
US11157250B2 (en) * 2017-12-05 2021-10-26 Phase Change Software Llc Inductive equivalence in machine-based instruction editing
CN108415795B (en) * 2018-02-12 2019-04-05 Renhe Future Biotechnology (Changsha) Co., Ltd. Rapid generation method and system for container Dockerfiles and container images
CN111667135B (en) * 2020-03-25 2023-07-28 State Grid Tianjin Electric Power Company Load structure analysis method based on typical feature extraction

Non-Patent Citations (3)

Title
How could Neural Networks understand Programs; Dinglan Peng, Shuxin Zheng; Proceedings of the 38th International Conference on Machine; 2021-05-31; full text *
Research on binary code comparison analysis; 郑瀚 Andrew.Hann; https://www.cnblogs.com/LittleHann/p/13451724.html; 2020-08-12; full text *
Research progress in program analysis; Zhang Jian; Journal of Software; 2019-01-30; full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant