CN114357454A - Binary executable file dependency library analysis method and device, electronic equipment and storage medium


Info

Publication number
CN114357454A
Authority
CN
China
Legal status
Pending
Application number
CN202111518401.2A
Other languages
Chinese (zh)
Inventor
陈灵锋
肖新光
Current Assignee
Antiy Technology Group Co Ltd
Original Assignee
Antiy Technology Group Co Ltd
Application filed by Antiy Technology Group Co Ltd
Priority to CN202111518401.2A
Publication of CN114357454A


Abstract

The embodiments of the invention disclose a method, a device and electronic equipment for analyzing the dependency libraries of a binary executable file, and relate to the technical field of network security. The method comprises: performing feature extraction on a binary executable file to be analyzed to obtain feature information of the binary executable file; matching the feature information of the binary executable file against a fingerprint feature database to determine the corresponding dependency library fingerprint features, where the fingerprint feature database stores fingerprint features for identifying dependency library attribute information; and determining the corresponding dependency library attribute information according to the dependency library fingerprint features. With these steps, the dependency library information used by a binary executable file can be identified simply and efficiently from the binary itself, without any source code information, which makes it convenient to analyze the dependency libraries the binary uses.

Description

Binary executable file dependency library analysis method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of network security, and in particular to a method and a device for analyzing the dependency libraries of a binary executable file, an electronic device and a storage medium.
Background
In software development, reusing open-source code improves development efficiency and reduces cost, but the many defects and even security vulnerabilities present in open-source software are introduced along with it, which poses serious risks to the software. By analyzing the composition of a binary executable file, the dependency libraries it uses and their versions can be identified, so that dependency libraries with security vulnerabilities can be found.
In the course of making the invention, the inventors found that most source code information is lost when a program written in a high-level language is compiled into a binary executable file, which makes it considerably harder to determine which dependency libraries the binary uses.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus and an electronic device for analyzing the dependency libraries of a binary executable file, which make it convenient to analyze the dependency library information used by the binary executable file.
To achieve the purpose of the invention, the following technical solutions are adopted:
In a first aspect, an embodiment of the present invention provides a method for analyzing the dependency libraries of a binary executable file, the method comprising:
performing feature extraction on a binary executable file to be analyzed to obtain feature information of the binary executable file;
matching the feature information of the binary executable file against a fingerprint feature database to determine the corresponding dependency library fingerprint features, where the fingerprint feature database stores fingerprint features for identifying dependency library attribute information; and
determining the corresponding dependency library attribute information according to the dependency library fingerprint features.
With reference to the first aspect, in a first implementation of the first aspect, the feature information of the binary executable file includes one or more types of feature information;
matching the feature information of the binary executable file against the fingerprint feature database to determine the corresponding dependency library fingerprint features comprises: globally matching each item of feature information of the binary executable file against the fingerprint features in the fingerprint feature database;
and, if the matching degree between any feature information of the binary executable file and a first fingerprint feature in the fingerprint feature database reaches a preset global matching degree threshold, determining the first fingerprint feature as a dependency library fingerprint feature corresponding to the binary executable file.
With reference to the first aspect and its first implementation, in a second implementation of the first aspect, the feature information of the binary executable file and the fingerprint features in the fingerprint feature database each include: export information, visible strings and/or intermediate language sequences.
With reference to the first aspect and its first and second implementations, in a third implementation of the first aspect, determining the corresponding dependency library attribute information according to the dependency library fingerprint features comprises: determining the corresponding dependency library name identifier according to the dependency library fingerprint features.
With reference to the first aspect and its first, second and third implementations, in a fourth implementation of the first aspect, the feature information includes visible strings;
determining the corresponding dependency library attribute information according to the dependency library fingerprint features further comprises: after determining the corresponding dependency library name identifier, matching the visible strings against the version information table of the corresponding dependency library to determine the version information of the dependency library identified by that name identifier; the version information table is preset with dependency library name identifiers and the correspondence between visible strings and dependency library version information.
With reference to the first aspect, in a fifth implementation of the first aspect, before performing feature extraction on the binary executable file to be analyzed, the method further comprises: acquiring the file header of the binary executable file to be analyzed;
and determining the type of the binary executable file to be analyzed according to the file header.
With reference to the first aspect, in a sixth implementation of the first aspect, performing feature extraction on the binary executable file to be analyzed to obtain its feature information comprises:
parsing the data segment and/or code segment, the export information and the processor architecture information of the binary executable file according to its type;
if a data segment is obtained by parsing, performing feature extraction on the data segment to obtain visible strings and/or export information;
if a code segment is obtained by parsing, disassembling the code segment according to the processor architecture information to obtain assembly instructions, each assembly instruction including assembly code;
translating the assembly code into an intermediate language independent of the processor architecture;
establishing a control flow graph based on the intermediate language;
and performing static program analysis on the control flow graph to obtain an intermediate language sequence.
With reference to the first aspect or any one of its first to sixth implementations, in a seventh implementation of the first aspect, the method further comprises: parsing acquired open-source software libraries of different versions as sample files;
extracting fingerprint features from the sample files;
and building the fingerprint feature database from the fingerprint features in the sample files, the fingerprint features including export information, visible strings and/or intermediate languages.
With reference to the first aspect or any one of its first to seventh implementations, in an eighth implementation of the first aspect, building the fingerprint feature database from the fingerprint features in the sample files comprises: performing similarity calculation, based on a dimensionality reduction algorithm, on fingerprint features of the same type in different sample files to obtain the fingerprint feature values of that type shared by the different sample files;
and storing the shared fingerprint feature values of each type to form the fingerprint feature database.
In a second aspect, an embodiment of the present invention provides an apparatus for analyzing the dependency libraries of a binary executable file, comprising: an extraction program module, configured to perform feature extraction on the binary executable file to be analyzed to obtain its feature information; a matching program module, configured to match the feature information of the binary executable file against a fingerprint feature database and determine the corresponding dependency library fingerprint features, the fingerprint feature database storing fingerprint features for identifying dependency library attribute information; and a determination program module, configured to determine the corresponding dependency library attribute information according to the dependency library fingerprint features.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising: one or more processors; and a memory storing one or more executable programs, where the one or more processors read the executable program code stored in the memory and run the corresponding program to execute the binary executable file dependency library analysis method according to any implementation of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the binary executable file dependency library analysis method according to any implementation of the first aspect.
The embodiments of the present invention provide a binary executable file dependency library analysis method and apparatus, an electronic device and a storage medium. The method comprises: performing feature extraction on a binary executable file to be analyzed to obtain feature information of the binary executable file; matching the feature information of the binary executable file against a fingerprint feature database to determine the corresponding dependency library fingerprint features, where the fingerprint feature database stores fingerprint features for identifying dependency library attribute information; and determining the corresponding dependency library attribute information according to the dependency library fingerprint features. With these steps, the dependency library information used by a binary executable file can be identified simply and efficiently from the binary itself without any source code information, which makes the technical solution convenient for analyzing the dependency libraries used by binary executable files.
Drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an embodiment of the binary executable file dependency library analysis method of the present invention;
FIG. 2 is a flowchart of another embodiment of the binary executable file dependency library analysis method of the present invention;
FIG. 3 is a flowchart of yet another embodiment of the binary executable file dependency library analysis method of the present invention;
FIG. 4 is a block diagram of an embodiment of the binary executable file dependency library analysis apparatus of the present invention;
FIG. 5 is a block diagram of another embodiment of the binary executable file dependency library analysis apparatus of the present invention;
FIG. 6 is a block diagram of yet another embodiment of the binary executable file dependency library analysis apparatus of the present invention;
FIG. 7 is a schematic structural diagram of an embodiment of the electronic device of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
FIG. 1 is a flowchart of an embodiment of the binary executable file dependency library analysis method of the present invention. Referring to FIG. 1, the method provided by this embodiment can be applied to dependency library analysis scenarios and can also be used to find dependency libraries with security vulnerabilities, improving the security of the developed software to a certain extent; here, a dependency library means an open-source library that the software depends on. It should be noted that the method may be embodied as software in a manufactured product, and the method flow of the present application is reproduced when a user uses the product.
For example, the binary executable file dependency library analysis method of this embodiment may be installed as application software on an electronic device such as a computer or a mobile phone; when the user runs the product on the computer or mobile phone, the method embodied in it is executed to carry out the dependency library analysis steps of this embodiment.
The binary executable file dependency library analysis method may include the following steps:
110. Perform feature extraction on the binary executable file to be analyzed to obtain feature information of the binary executable file.
The feature information of the binary executable file includes: export information, visible strings and/or intermediate language sequences. An intermediate language sequence contains one or more intermediate language statements. Intermediate language (IL) is a term from compiler technology: it is an equivalent internal representation of a source program that sits between the source language (a high-level language such as C#, VB or F#) and the target language (machine code executable by a computer, as in a binary executable file) and is easy to translate into the target program; the intermediate language acts as a bridge during compilation. Export information includes, for example, exported functions.
120. Match the feature information of the binary executable file against the fingerprint feature database to determine the corresponding dependency library fingerprint features; the fingerprint feature database stores fingerprint features for identifying dependency library attribute information.
A fingerprint feature here is not a biometric fingerprint but characteristic information that identifies an object, for example dependency library attribute information. The dependency library attribute information may include the dependency library name (which identifies the library) and/or the dependency library version information.
130. Determine the corresponding dependency library attribute information according to the dependency library fingerprint features.
With steps 110 to 130 of the method provided by this embodiment, the dependency library information used by a binary executable file can be identified simply and efficiently from the binary itself without any source code information, which makes it convenient to analyze the dependency libraries the binary uses.
In some embodiments, the feature information of the binary executable file includes one or more types of feature information;
Referring to FIG. 2, matching the feature information of the binary executable file against the fingerprint feature database to determine the corresponding dependency library fingerprint features (step 120) includes: 121. globally matching each item of feature information of the binary executable file against the fingerprint features in the fingerprint feature database; 122. if the matching degree between any feature information of the binary executable file and a first fingerprint feature in the fingerprint feature database reaches a preset global matching degree threshold, determining the first fingerprint feature as a dependency library fingerprint feature corresponding to the binary executable file.
The fingerprint features in the fingerprint feature database likewise include: export information, visible strings and/or intermediate language sequences.
Specifically, the preset global matching degree threshold may be 30%, 40%, 50%, 60%, 70%, 80%, 90% or 100%.
For example, with a preset global matching degree threshold of 30%, if 30% of a given type of feature information of the file under examination matches the corresponding fingerprint features of a third-party open-source library, it can be determined that the binary executable file uses that library; in this way the dependency libraries used can be determined directly from the binary executable file.
Specifically, when the feature information parsed from the binary executable file is export information, the export information is matched against the fingerprint features in the fingerprint feature database as described above; if it matches the export information fingerprint features of an open-source library recorded in the database, it is determined that the binary executable file under examination uses that open-source library.
Similarly, when the feature information is visible strings, the visible strings of the binary executable file are matched against the fingerprint features in the fingerprint feature database; if they match the visible string fingerprint features of a recorded open-source library, it is determined that the binary executable file uses that library.
Likewise, if 30% of the intermediate language sequences of the binary executable file to be analyzed match the intermediate language sequence fingerprint features of an open-source library in the fingerprint feature database, it is determined that the binary executable file uses that open-source library.
If a type of feature information cannot be parsed from the binary executable file, the matching step for that type is skipped: if no export information is parsed, matching based on export information is skipped; if no visible strings are parsed, matching based on visible strings is skipped; and if no intermediate language sequence is parsed, matching based on intermediate language sequences is skipped. This improves matching efficiency.
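Steps 121 and 122, together with the rule of skipping feature types that were not parsed, can be sketched as follows. The text does not define how the matching degree is computed, so this sketch assumes it is the fraction of a library's fingerprint entries that also appear among the file's features (plain set overlap); the feature-type keys ("exports", "strings", "il_sequences") and the dictionary layout of the database are hypothetical, and the 30% default is taken from the example above.

```python
def matching_degree(file_features: set[str], fingerprint: set[str]) -> float:
    """Assumed metric: share of a library fingerprint's entries found in the file."""
    if not fingerprint:
        return 0.0
    return len(file_features & fingerprint) / len(fingerprint)


def match_dependencies(
    file_features: dict[str, set[str]],
    fingerprint_db: dict[str, dict[str, set[str]]],
    threshold: float = 0.3,
) -> set[str]:
    """Return the names of libraries whose fingerprint reaches the threshold.

    file_features maps a feature type (hypothetical keys such as "exports",
    "strings", "il_sequences") to the features parsed from the binary;
    fingerprint_db maps a library name to the same structure.
    """
    matched = set()
    for feature_type, features in file_features.items():
        if not features:
            continue  # this feature type was not parsed from the file: skip it
        for library, fingerprints in fingerprint_db.items():
            fingerprint = fingerprints.get(feature_type, set())
            if matching_degree(features, fingerprint) >= threshold:
                matched.add(library)
    return matched
```

Measuring overlap relative to the fingerprint rather than to the whole file keeps a large binary that bundles several libraries from diluting the score of each one; that is a design choice of this sketch, not something the text specifies.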
Specifically, when the dependency library attribute information includes the dependency library name, referring to FIG. 3, determining the corresponding dependency library attribute information according to the dependency library fingerprint features (step 130) includes: 131. determining the corresponding dependency library name identifier according to the dependency library fingerprint features. The name identifier indicates which dependency library the binary executable file uses.
Furthermore, the fingerprint feature database also maintains a dependency library version information table, and the feature information includes visible strings; the version information table is preset with dependency library name identifiers and the correspondence between visible strings and dependency library version information.
Determining the corresponding dependency library attribute information according to the dependency library fingerprint features (step 130) further includes: 132. after the corresponding dependency library name identifier has been determined (step 131), matching the visible strings against the version information table of that dependency library to determine the version information of the dependency library identified by that name identifier.
In this embodiment, the open-source dependency libraries used by the binary executable file under examination are determined by traversing and matching each type of feature information. Once a dependency library name identifier is determined, the visible strings of the binary are matched against the preset version information table of that library to obtain the corresponding version feature information, and on a match the dependency library version information is extracted. The invention can therefore detect attribute information such as the name and version of the dependency libraries used by a binary executable file accurately and efficiently, directly from the binary and without any source code information.
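Step 132 can be sketched as a lookup keyed by the library name identifier from step 131. The text only says the table maps visible strings to versions; representing each entry as a regular expression whose first group captures the version is an assumption of this sketch, and the table contents below are hypothetical examples rather than the patent's data.

```python
import re
from typing import Optional

# Hypothetical version information table: library name identifier -> patterns
# over visible strings, where group 1 captures the version number.
VERSION_TABLE = {
    "zlib": [re.compile(r"deflate (\d+\.\d+\.\d+)")],
    "openssl": [re.compile(r"OpenSSL (\d+\.\d+\.\d+[a-z]?)")],
}


def lookup_version(library: str, visible_strings: list[str]) -> Optional[str]:
    """Return the first version found in the visible strings, or None."""
    for pattern in VERSION_TABLE.get(library, []):
        for text in visible_strings:
            match = pattern.search(text)
            if match:
                return match.group(1)
    return None
```

Because the table is consulted only after the library name is known, a version-like string belonging to one library cannot be misattributed to another, which matches the ordering of steps 131 and 132.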
Furthermore, once the open-source dependency libraries used by the binary executable file and their versions are known, the historical vulnerability information of these libraries can be queried easily, so that security vulnerabilities that may be exposed when the binary runs can be found and countermeasures taken, improving the security of the software.
It can be understood that executable files use different formats depending on the operating system or platform on which they run: on the Microsoft Windows operating system they are usually PE (Portable Executable) files, whereas Unix and x86-64 Linux use ELF (Executable and Linkable Format) files.
Therefore, to analyze the dependency libraries used by the binary executable file more accurately, in some embodiments the method further includes, before performing feature extraction on the binary executable file to be analyzed (step 110): acquiring the file header of the binary executable file to be analyzed; and determining the type of the binary executable file according to the file header.
In this embodiment, whether the binary executable file is a PE file or an ELF file may be identified by calling the third-party library libmagic.
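The text uses libmagic for this check. As a standard-library alternative, the sketch below reads the header bytes directly, and also reads the machine field that later steps need as processor architecture information. The magic numbers and field offsets follow the ELF and PE/COFF specifications; the machine-code tables list only a small subset of defined values, and the function name is hypothetical.

```python
import struct

# Small subsets of the machine codes defined by the ELF and PE/COFF specifications.
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86-64", 183: "aarch64"}
PE_MACHINES = {0x014C: "x86", 0x01C4: "arm", 0x8664: "x86-64", 0xAA64: "aarch64"}


def detect_executable_type(data: bytes) -> tuple[str, str]:
    """Return (file_type, architecture) judged from the header bytes only."""
    if data[:4] == b"\x7fELF" and len(data) >= 20:
        # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
        endian = "<" if data[5] == 1 else ">"
        (machine,) = struct.unpack_from(endian + "H", data, 18)  # e_machine
        return "ELF", ELF_MACHINES.get(machine, "unknown")
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00" and len(data) >= pe_offset + 6:
            # The COFF Machine field immediately follows the signature.
            (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
            return "PE", PE_MACHINES.get(machine, "unknown")
    return "unknown", "unknown"
```

A production tool would validate far more of the header before trusting it; the sketch only shows why the header alone is enough to choose between the PE and ELF parsing paths.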
Specifically, performing feature extraction on the binary executable file to be analyzed to obtain its feature information (step 110) includes:
parsing the data segment and/or code segment, the export information and the processor architecture information of the binary executable file according to its type;
if a data segment is obtained by parsing, performing feature extraction on it to obtain visible strings and/or export information; if the binary file is found to contain no data segment, this step is skipped.
The visible strings extracted from the data segment include ASCII strings and UCS-2 strings.
In some embodiments, each visible string carries its virtual address, and after the visible strings are obtained the method further includes sorting them by virtual address; the visible strings can then be matched one by one, in that order, against the fingerprint feature database to find the open-source libraries the binary file depends on.
If a code segment is obtained by parsing, the code segment is disassembled according to the processor architecture information to obtain assembly instructions, each of which may include assembly code; the assembly code is translated into an intermediate language (IL) independent of the processor architecture; and a control flow graph (CFG) is built based on the intermediate language.
In this embodiment, if the binary file contains no code segment, this step is skipped. If a code segment is obtained by parsing, a disassembler processes the code segment data according to the processor architecture to obtain a series of assembly instructions. Each assembly instruction may include assembly code and its virtual address.
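Extracting ASCII and UCS-2 strings and ordering them by virtual address can be sketched with regular expressions over the raw segment bytes. Assumptions of this sketch: a minimum length of 4 characters, UCS-2 strings stored little-endian and limited to the printable ASCII range, and a hypothetical `base_va` parameter giving the segment's virtual address.

```python
import re

MIN_LEN = 4  # assumed minimum length for a "visible" string

# Runs of printable ASCII bytes.
ASCII_RE = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# UCS-2 little-endian: each printable ASCII character followed by a zero byte.
UCS2_RE = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_visible_strings(segment: bytes, base_va: int) -> list[tuple[int, str]]:
    """Return (virtual_address, string) pairs sorted by virtual address."""
    found = []
    for m in ASCII_RE.finditer(segment):
        found.append((base_va + m.start(), m.group().decode("ascii")))
    for m in UCS2_RE.finditer(segment):
        found.append((base_va + m.start(), m.group().decode("utf-16-le")))
    found.sort()  # order by virtual address, as described above
    return found
```

Sorting merges the two string kinds into one address-ordered list, so strings that sit next to each other in the binary are also adjacent when they are matched against the database.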
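The point of translating assembly into an architecture-independent intermediate language is that the same library function compiled for different processors yields comparable IL. Real lifters (for example the VEX IR used by angr, or Ghidra's P-code) are far richer. The toy table below is purely hypothetical and maps a few x86-64 and AArch64 mnemonics onto made-up IL operation names, just to show the idea.

```python
# Hypothetical, heavily simplified lifting table: (architecture, mnemonic) -> IL op.
LIFT_TABLE = {
    ("x86-64", "mov"): "ASSIGN",
    ("x86-64", "add"): "ADD",
    ("x86-64", "cmp"): "COMPARE",
    ("x86-64", "jne"): "BRANCH_NE",
    ("x86-64", "jmp"): "JUMP",
    ("x86-64", "ret"): "RETURN",
    ("aarch64", "mov"): "ASSIGN",
    ("aarch64", "add"): "ADD",
    ("aarch64", "cmp"): "COMPARE",
    ("aarch64", "b.ne"): "BRANCH_NE",
    ("aarch64", "b"): "JUMP",
    ("aarch64", "ret"): "RETURN",
}


def lift(arch, instructions):
    """Translate (virtual_address, mnemonic, operands) tuples into IL tuples.

    The virtual address is kept so that later steps can build a control
    flow graph; unsupported mnemonics become "UNKNOWN".
    """
    return [
        (va, LIFT_TABLE.get((arch, mnemonic), "UNKNOWN"), operands)
        for va, mnemonic, operands in instructions
    ]
```

For instance, lifting `mov eax, 0; add eax, 1; ret` for x86-64 and `mov w0, #0; add w0, w0, #1; ret` for AArch64 gives the same IL operation sequence, which is what makes IL usable as a cross-architecture fingerprint.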
Static program analysis is then performed on the control flow graph to obtain an intermediate language sequence.
In this embodiment, static program analysis is used to simplify the intermediate-language control flow graph and obtain an optimized intermediate language sequence, which contains one or more intermediate language statements.
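Building a control flow graph and reducing it to an IL sequence can be sketched with the standard leader-based basic-block partition. The input is a toy IL in which each instruction is `(virtual_address, op, branch_target)`; the op names are hypothetical. The "static analysis" here is only a stand-in: a depth-first walk from the entry block that drops unreachable blocks and operands, whereas the patent does not specify which simplifications it applies.

```python
from collections import defaultdict

BRANCHES = {"JUMP", "BRANCH_NE", "BRANCH_EQ"}
TERMINATORS = BRANCHES | {"RETURN"}


def build_cfg(il):
    """il: list of (va, op, target) sorted by va; target is a va or None.

    Returns (blocks, edges): blocks maps a block's start va to its IL ops,
    edges maps a block's start va to the start vas of its successors.
    """
    if not il:
        return {}, {}
    addrs = [va for va, _, _ in il]
    # Leaders: the first instruction, every branch target, and every
    # instruction that follows a terminator.
    leaders = {addrs[0]}
    for i, (va, op, target) in enumerate(il):
        if op in BRANCHES and target is not None:
            leaders.add(target)
        if op in TERMINATORS and i + 1 < len(il):
            leaders.add(addrs[i + 1])
    blocks, edges = {}, defaultdict(list)
    current = None
    for i, (va, op, target) in enumerate(il):
        if va in leaders:
            current = va
            blocks[current] = []
        blocks[current].append(op)
        if i + 1 == len(il) or addrs[i + 1] in leaders:  # last op of the block
            if op in BRANCHES and target is not None:
                edges[current].append(target)
            if op not in {"JUMP", "RETURN"} and i + 1 < len(il):
                edges[current].append(addrs[i + 1])  # fall-through edge
    return blocks, dict(edges)


def il_sequence(blocks, edges, entry):
    """Emit the ops of every block reachable from entry, depth-first,
    visiting branch targets before fall-through successors."""
    seq, seen, stack = [], set(), [entry]
    while stack:
        block = stack.pop()
        if block in seen or block not in blocks:
            continue
        seen.add(block)
        seq.extend(blocks[block])
        stack.extend(reversed(edges.get(block, [])))
    return seq
```

Dropping addresses and register operands from the output is what lets the sequence match across builds whose code is laid out differently.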
To implement the above dependency library analysis method, in some embodiments the method further includes building a fingerprint feature database, which may include an export information feature table, a visible string feature table, an intermediate language sequence feature table, a dependency library version information table, and so on.
Building the fingerprint feature database may include: parsing acquired open-source software libraries of different versions as sample files; extracting fingerprint features from the sample files; and building the fingerprint feature database from those fingerprint features, which include export information, visible strings and/or intermediate languages.
In this embodiment, a large number of sample files of existing open-source software libraries, covering different versions, architectures and compilation options, are parsed automatically. From the feature information (fingerprint features) they contain, such as export information, visible strings and intermediate language sequences, a fingerprint feature database is built that includes an export information feature table, a visible string feature table, an intermediate language sequence feature table and a dependency library version information table, so that dependency library analysis can be performed directly on a binary executable file.
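The text names the four tables but not their storage or columns. As one possible layout, the sketch below keeps them in SQLite via Python's built-in `sqlite3` module; every table and column name is a hypothetical choice, and the query shows how an exported symbol found in a binary could be traced back to candidate library versions.

```python
import sqlite3

# Hypothetical schema for the four tables named in the text.
SCHEMA = """
CREATE TABLE export_features (library TEXT, version TEXT, symbol TEXT);
CREATE TABLE string_features (library TEXT, version TEXT, visible_string TEXT);
CREATE TABLE il_features     (library TEXT, version TEXT, il_hash TEXT);
CREATE TABLE version_info    (library TEXT, version TEXT, version_string TEXT);
CREATE INDEX idx_export_symbol ON export_features(symbol);
"""


def create_fingerprint_db(path=":memory:"):
    """Open (or create) the database file and create the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def libraries_exporting(conn, symbol):
    """Return (library, version) pairs whose samples export the given symbol."""
    rows = conn.execute(
        "SELECT DISTINCT library, version FROM export_features "
        "WHERE symbol = ? ORDER BY version",
        (symbol,),
    )
    return rows.fetchall()
```

An index on the feature column is the main design point: matching looks features up by value far more often than it scans by library.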
It is understood that for the same open source software library, such as openssl library, whose versions are 0.9.8, 0.9.7 and 1.0.2, where 0.9.8 and 0.9.7 belong to the same large version 0.9.x family, the derived information, visible strings or intermediate language information belonging to the same large version will have common fingerprint features, for example, 0.9.8 and 0.9.7 belong to the large version 0.9.x family with common derived information, such as the derived symbol AES _ cfbr _ encrypt _ block, but no derived symbol in 1. x.
Therefore, in order to improve the efficiency of subsequent matching, in some embodiments, a maximum common interval may be calculated for derived information, visible strings, or intermediate language information belonging to the same large version based on their similarity. Each derived information, visible character string or intermediate language information in the common interval is shared by at least two different versions of the open source software, namely, a first mapping relation of corresponding fingerprint characteristics and open source software name identification can be established. Further, a second mapping relationship between the visible string and the open source software version may be established. Of course, if there is no feature information of a certain type in a sample file, it is sufficient to calculate feature information of other types.
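The grouping by major version family and the first mapping can be sketched as below. Simplifications, all assumptions of this sketch: a version's family is taken to be its first two numeric components (so "0.9.7" and "0.9.8" fall into "0.9"), and the "common interval" is approximated by exact membership, keeping each feature that at least two versions of the same family share.

```python
from collections import Counter


def common_features(version_features):
    """version_features: version string -> set of features of one library.

    Returns family -> set of features shared by at least two versions of
    that family, where the family is assumed to be "major.minor".
    """
    families = {}
    for version, feats in version_features.items():
        family = ".".join(version.split(".")[:2])
        families.setdefault(family, []).append(feats)
    shared = {}
    for family, feature_sets in families.items():
        # Each version contributes a feature at most once, since feats is a set.
        counts = Counter(f for feats in feature_sets for f in feats)
        shared[family] = {f for f, n in counts.items() if n >= 2}
    return shared


def feature_to_library(library, shared):
    """First mapping: shared fingerprint feature -> library name identifier."""
    return {f: library for feats in shared.values() for f in feats}
```

With the OpenSSL example above, AES_cfbr_encrypt_block appears in both 0.9.7 and 0.9.8 and therefore lands in the "0.9" family's shared set, mapped to the name identifier "openssl".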
To improve subsequent matching efficiency, creating the fingerprint feature database from the fingerprint features in the sample files specifically includes two steps. First, similarity is calculated over fingerprint features of the same type in different sample files using a dimensionality reduction algorithm, which yields the fingerprint feature values of that type shared by the different sample files. Second, these shared fingerprint feature values are stored to form the fingerprint feature database.
The dimensionality reduction algorithm may be the MinHash algorithm, and the fingerprint feature values may be hash values. In this embodiment, the visible string information and intermediate language information obtained from the sample files may be stored in Apache Spark (an open source data processing engine). The MinHash implementation built into Spark is then used to reduce the dimensionality of the data.
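The source relies on the MinHash implementation built into Apache Spark, which Spark MLlib exposes as `MinHashLSH`. The Python below is a standalone, hedged sketch of the same idea rather than the patent's code. It computes MinHash signatures using seeded SHA-1 hashes in place of random permutations, then estimates Jaccard similarity from those signatures. The function names and the default of 128 hash functions are assumptions.

```python
import hashlib


def _seeded_hash(seed: int, token: str) -> int:
    """64-bit hash derived from SHA-1; each seed acts as one independent hash function."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(features, num_perm: int = 128):
    """Reduce a set of fingerprint features to num_perm integers (its MinHash signature)."""
    if not features:
        raise ValueError("MinHash needs a non-empty feature set")
    return [min(_seeded_hash(i, f) for f in features) for i in range(num_perm)]


def estimated_jaccard(sig_a, sig_b) -> float:
    """The fraction of equal signature slots approximates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Two feature sets can then be compared by their fixed-length signatures instead of their full contents. This is the storage and comparison saving that the dimensionality reduction step aims for.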
To create the version information table, the version characteristics of each open source software library are collected. For example, the zlib library (a library that provides data compression) contains a line of text reading "deflate 1.2.11copy 1995-…
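As a hedged sketch of how version characteristics might be collected, the code below scans the visible strings of each sample file whose version is already known. It keeps the strings that literally embed that version number and records them in a table. The input layout `(library, version, visible_strings)` and the function name `build_version_table` are hypothetical. The boundary checks stop "1.2.1" from matching inside "1.2.11".

```python
import re


def build_version_table(samples):
    """Collect version-identifying visible strings per library.

    samples: iterable of (library_name, version, visible_strings) tuples,
    a hypothetical layout for analyzed sample files with known versions.
    Returns {library_name: {visible_string: version}}.
    """
    table = {}
    for library, version, strings in samples:
        # Match the version literally, but not as part of a longer number
        # (so "1.2.1" does not match inside "1.2.11").
        pattern = re.compile(r"(?<![\d.])" + re.escape(version) + r"(?!\.?\d)")
        for s in strings:
            if pattern.search(s):
                table.setdefault(library, {})[s] = version
    return table
```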
Through the above steps, this embodiment can efficiently build a fingerprint feature database that integrates multiple kinds of fingerprint features (feature information). The database can then be used to analyze the dependency library information of a binary file efficiently and accurately.
As the above description shows, the binary executable file dependency library analysis method of this embodiment can identify the names and versions of the dependency libraries used by a binary executable file even when no source code is available. This makes it possible to determine whether the file contains known vulnerabilities. That in turn reduces the risk of the computer being attacked while running the file, which helps ensure that the computer runs safely.
Example two
FIG. 4 is a schematic structural diagram of an embodiment of the binary executable file dependency library analysis apparatus. As shown in FIG. 4, the apparatus includes:
an extraction program module 210, configured to perform feature extraction on a binary executable file to be analyzed to obtain feature information of the binary executable file;
a matching program module 220, configured to match the feature information of the binary executable file against a fingerprint feature database and determine the corresponding dependency library fingerprint features; the fingerprint feature database stores fingerprint features for identifying dependency library attribute information;
and the determining program module 230 is configured to determine corresponding dependency library attribute information according to the dependency library fingerprint features.
The apparatus of this embodiment may be used to implement the technical solution of the method embodiment shown in FIG. 1. Its implementation principle and technical effects are similar, so they are not repeated here, and the two embodiments may be read with reference to each other.
Referring to FIG. 5, in this embodiment the feature information of the binary executable file includes one or more types of feature information. As an optional embodiment, the apparatus is similar to the apparatus described in the previous embodiment, except that the matching program module 220 includes:
the matching program unit 221, configured to perform global matching between any feature information of the binary executable file and the fingerprint features in the fingerprint feature database;
a first determining program unit 222, configured to determine, if a matching degree of any feature information of the binary executable file and a first fingerprint feature in the fingerprint feature database reaches a preset global matching degree threshold, the first fingerprint feature as a dependent library fingerprint feature corresponding to the binary executable file.
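The global matching step can be sketched as follows. The patent does not define the matching degree, so this sketch assumes it is the share of a stored fingerprint's features that also appear in the binary. The database layout, the name `match_dependency_libraries` and the default threshold of 0.8 are all hypothetical.

```python
def match_dependency_libraries(features, fingerprint_db, threshold=0.8):
    """Globally match a binary's features against every stored fingerprint.

    features: set of feature strings extracted from the binary.
    fingerprint_db: {fingerprint_id: (library_name, set_of_features)} (hypothetical layout).
    Matching degree (assumption): share of the fingerprint's features found in the binary.
    Returns (fingerprint_id, library_name, degree) tuples at or above threshold, best first.
    """
    hits = []
    for fp_id, (library, fp_features) in fingerprint_db.items():
        if not fp_features:
            continue  # an empty fingerprint cannot identify anything
        degree = len(features & fp_features) / len(fp_features)
        if degree >= threshold:
            hits.append((fp_id, library, degree))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

Each returned fingerprint plays the role of the "first fingerprint feature" whose matching degree reaches the preset global threshold.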
In this embodiment, as another optional embodiment, the feature information of the binary executable file and the fingerprint features in the fingerprint feature database each include: export information, a visible string, and/or an intermediate language sequence.
Referring to FIG. 6, as another alternative embodiment, the determination program module 230 includes: a second determining program unit 231, configured to determine, according to the dependent library fingerprint feature, a corresponding dependent library name identifier.
With continued reference to fig. 6, the feature information includes: a visible string;
the determination program module 230 further includes: a third determining program unit 232, configured to match the visible string against the version information table of the corresponding dependency library, after the dependency library name identifier has been determined from the dependency library fingerprint features, and thereby determine the dependency library version information for that name identifier; for each dependency library name identifier, the version information table presets the correspondence between visible strings and dependency library version information.
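A minimal sketch of the version lookup follows. It assumes the version information table maps each library name identifier to pairs of visible string and version. `identify_version` and the table contents are illustrative only. Preferring the longest matching string avoids confusing versions such as 1.2.1 and 1.2.11.

```python
def identify_version(version_table, library, visible_strings):
    """Return the version whose recorded string best matches the binary's strings.

    version_table: {library_name: {version_string: version}} (hypothetical layout).
    The longest matching recorded string wins, so "deflate 1.2.11" beats "deflate 1.2.1".
    """
    entries = version_table.get(library, {})
    best_marker, best_version = "", None
    for s in visible_strings:
        for marker, version in entries.items():
            if marker in s and len(marker) > len(best_marker):
                best_marker, best_version = marker, version
    return best_version
```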
In this embodiment, as a further optional embodiment, the apparatus further includes: an identification program module for: acquiring a file header of a binary executable file to be analyzed; and determining the type of the binary executable file to be analyzed according to the file header.
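Determining the file type from the header usually means checking magic numbers. The sketch below recognizes ELF, PE and thin Mach-O files. The function name and the choice to report everything else as "unknown" are assumptions. Fat (universal) Mach-O binaries are not handled.

```python
import struct

# Thin Mach-O magic numbers as they appear on disk (32/64-bit, either byte order).
MACHO_MAGICS = {b"\xfe\xed\xfa\xce", b"\xfe\xed\xfa\xcf", b"\xce\xfa\xed\xfe", b"\xcf\xfa\xed\xfe"}


def detect_binary_type(header: bytes) -> str:
    """Classify an executable from its leading bytes (pass e.g. the first 4 KiB)."""
    if header.startswith(b"\x7fELF"):
        return "ELF"
    if header.startswith(b"MZ") and len(header) >= 0x40:
        # In a PE file the DOS header field e_lfanew (offset 0x3C) points to "PE\0\0".
        (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
        if header[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    if header[:4] in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"
```

For a file on disk, open it in binary mode and pass the first 4096 bytes to `detect_binary_type`.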
Specifically, the extraction program module 210 includes:
the parsing program unit is used for parsing the data segment and/or the code segment, the export information and the processor architecture information of the binary executable file according to the type of the binary executable file;
the first extraction program unit is used for performing feature extraction on the data segment, if one is obtained through parsing, to obtain a visible string and/or export information;
the disassembling program unit is used for disassembling the code segment, if one is obtained through parsing, according to the processor architecture information to obtain assembly instructions, which comprise assembly code;
a translator unit for translating the assembly code into an intermediate language independent of the processor architecture;
a program establishing unit for establishing a control flow graph based on the intermediate language;
the static analysis program unit is used for carrying out static program analysis on the control flow graph to obtain an intermediate language sequence; the intermediate language sequence includes: one or more intermediate languages.
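The two extraction paths above can be sketched in Python under heavy simplifying assumptions. Visible strings are taken as runs of printable ASCII, as the Unix `strings` tool does. The "intermediate language" is a hypothetical opcode vocabulary produced by a toy mnemonic table rather than by a real lifter. A depth-first walk of the control flow graph then yields the intermediate language sequence. Equivalent x86 and ARM code produce the same sequence, which is the point of an architecture-neutral intermediate language.

```python
import re


def extract_visible_strings(data: bytes, min_len: int = 4):
    """Printable-ASCII runs of at least min_len bytes, similar to the Unix `strings` tool."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical mnemonic -> IL opcode tables; a real lifter emits far richer IL.
IL_MAP = {
    "x86": {"mov": "ASSIGN", "add": "ADD", "cmp": "CMP", "jne": "CJUMP", "call": "CALL", "ret": "RET"},
    # On ARM, "bx lr" is treated as a return in this toy table.
    "arm": {"mov": "ASSIGN", "add": "ADD", "cmp": "CMP", "bne": "CJUMP", "bl": "CALL", "bx": "RET"},
}


def lift_block(arch, instructions):
    """Translate one basic block of assembly into architecture-neutral IL opcodes."""
    return [IL_MAP[arch].get(ins.split()[0], "UNKNOWN") for ins in instructions]


def il_sequence(arch, blocks, edges, entry):
    """Depth-first walk of the CFG from entry, emitting each block's IL once.

    blocks: {block_id: [assembly lines]}; edges: {block_id: [successor ids]}.
    """
    sequence, visited, stack = [], set(), [entry]
    while stack:
        block_id = stack.pop()
        if block_id in visited:
            continue
        visited.add(block_id)
        sequence.extend(lift_block(arch, blocks[block_id]))
        # Push successors in reverse so the first-listed successor is visited first.
        stack.extend(reversed(edges.get(block_id, [])))
    return sequence
```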
In some embodiments, the apparatus further comprises: the sample analysis program module is used for analyzing the acquired open source software libraries of different versions as sample files;
a second extraction program module, configured to extract fingerprint features from the sample files;
the feature library establishing program module is used for establishing a fingerprint feature database according to the fingerprint features in the sample files; the fingerprint features include: export information, visible strings, and/or intermediate languages.
In still other embodiments, the feature library creation program module comprises:
the dimension reduction program unit is used for carrying out similarity calculation on the same type of fingerprint features in different sample files based on a dimension reduction algorithm to obtain the same type of fingerprint feature values shared by different sample files;
and the storage program unit is used for storing the same type of fingerprint characteristic values shared by different sample files to form the fingerprint characteristic database.
The binary executable file dependency library analysis apparatus provided by the embodiment of the present invention relies on the same specific technical features as the first embodiment. It can therefore identify, simply and efficiently, the dependency library information used by a binary executable file directly from the file without any source code, which makes that information easy to analyze.
Each embodiment of the binary executable file dependency library analysis apparatus is substantially similar to the method embodiment, so reference may be made to the description of the method embodiment for relevant details.
A further embodiment of the present invention provides an electronic device, including one or more processors; a memory; the memory stores one or more executable programs, and the one or more processors read the executable program codes stored in the memory to run programs corresponding to the executable program codes so as to execute the method of any one of the embodiments.
FIG. 7 is a schematic structural diagram of an embodiment of the electronic device of the present invention, which may implement the method of any embodiment of the present invention. As shown in FIG. 7, as an optional embodiment, the electronic device may include a housing 41, a processor 42, a memory 43, a circuit board 44 and a power supply circuit 45. The circuit board 44 is arranged inside the space enclosed by the housing 41, and the processor 42 and the memory 43 are arranged on the circuit board 44. The power supply circuit 45 supplies power to each circuit or component of the electronic device. The memory 43 stores executable program code. The processor 42 reads the executable program code stored in the memory 43 and runs the corresponding program, so as to execute the binary executable file dependency library analysis method described in any of the embodiments.
For the specific execution process of the above steps by the processor 42 and the further steps executed by the processor 42 by running the executable program code, reference may be made to the description of the first embodiment of the binary executable file dependency library analysis method of the present invention, which is not described herein again.
The electronic device exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: these are characterized by mobile communication capabilities and are primarily intended to provide voice and data communication. Such terminals include smartphones (e.g., iPhone), multimedia phones, feature phones and low-end phones.
(2) Ultra-mobile personal computer devices: these belong to the personal computer category, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID and UMPC devices, such as the iPad.
(3) Portable entertainment devices: these can display and play multimedia content. They include audio and video players (e.g., iPod), handheld game consoles, e-book readers, smart toys and portable car navigation devices.
(4) Servers: devices that provide computing services, comprising a processor, hard disk, memory, system bus and so on. A server's architecture is similar to that of a general-purpose computer. Because it must provide highly reliable services, however, it has higher requirements for processing capacity, stability, reliability, security, scalability and manageability.
(5) Other electronic devices with data interaction functions.
Yet another embodiment of the present invention provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the method for analyzing a dependent library of binary executable files according to any one of the preceding embodiments.
In summary, the method and the device for analyzing the dependency library of the binary executable file provided by the present invention do not need to provide source code information, but can still identify the library name of the dependency library and the corresponding version information used by the binary executable file from the binary executable file, so as to facilitate the analysis of the library information used by the binary executable file.
Further, the dependency library names and version information obtained through the analysis can reveal whether the binary executable file contains known vulnerabilities. This reduces the risk of the computer being attacked while running the file and thereby helps ensure that the computer operates safely.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments.
For convenience of description, the above apparatus is described in terms of functional units/modules. Of course, when the present invention is implemented, the functions of the units/modules may be realized in one or more pieces of software and/or hardware.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (20)

1. A method for analyzing a dependency library of a binary executable file, the method comprising the steps of:
performing feature extraction on a binary executable file to be analyzed to obtain feature information of the binary executable file;
matching the feature information of the binary executable file against a fingerprint feature database to determine the corresponding dependency library fingerprint features; the fingerprint feature database stores fingerprint features for identifying dependency library attribute information;
and determining corresponding attribute information of the dependency library according to the fingerprint features of the dependency library.
2. The method of claim 1, wherein the feature information of the binary executable file comprises one or more types of feature information;
matching the feature information of the binary executable file against the fingerprint feature database to determine the corresponding dependency library fingerprint features comprises: performing global matching between any feature information of the binary executable file and the fingerprint features in the fingerprint feature database;
and if the matching degree of any feature information of the binary executable file and the first fingerprint feature in the fingerprint feature database reaches a preset global matching degree threshold value, determining the first fingerprint feature as a dependent library fingerprint feature corresponding to the binary executable file.
3. The method according to claim 1 or 2, wherein the feature information of the binary executable file and the fingerprint features in the fingerprint feature database each comprise: export information, a visible string, and/or an intermediate language sequence.
4. The method of claim 1, wherein determining the corresponding dependency library attribute information according to the dependency library fingerprint features comprises:
and determining the corresponding dependent library name identification according to the dependent library fingerprint characteristics.
5. The method of claim 4, wherein the feature information comprises: a visible string;
the determining the corresponding dependency library attribute information according to the dependency library fingerprint features further comprises: after determining the corresponding dependent library name identification according to the dependent library fingerprint characteristics, matching the visible character string with the version information table of the corresponding dependent library to determine the version information of the dependent library corresponding to the dependent library name identification; the version information table of the dependency library is preset with a dependency library name identifier and a corresponding relation between the visible character string and the dependency library version information.
6. The method of claim 1, wherein prior to feature extraction of the binary executable file to be analyzed, the method further comprises:
acquiring a file header of a binary executable file to be analyzed;
and determining the type of the binary executable file to be analyzed according to the file header.
7. The method of claim 6, wherein the extracting features of the binary executable file to be analyzed to obtain the feature information of the binary executable file comprises:
parsing a data segment and/or a code segment, export information and processor architecture information of the binary executable file according to the type of the binary executable file;
if a data segment is obtained through parsing, performing feature extraction on the data segment to obtain a visible string and/or export information;
if a code segment is obtained through parsing, disassembling the code segment according to the processor architecture information to obtain assembly instructions, the assembly instructions comprising assembly code;
translating the assembly code into an intermediate language independent of the processor architecture;
establishing a control flow graph based on the intermediate language;
and performing static program analysis on the control flow graph to obtain an intermediate language sequence.
8. The method of claim 1, further comprising: analyzing the acquired open source software libraries of different versions as sample files;
extracting fingerprint features in the sample file;
establishing a fingerprint feature database according to the fingerprint features in the sample files; the fingerprint features include: export information, visible strings, and/or intermediate languages.
9. The method of claim 8, wherein establishing the fingerprint feature database according to the fingerprint features in the sample files comprises:
similarity calculation is carried out on the same type of fingerprint features in different sample files based on a dimensionality reduction algorithm, and the same type of fingerprint feature values shared by the different sample files are obtained;
and storing the common fingerprint characteristic values of the same type of different sample files to form the fingerprint characteristic database.
10. An apparatus for analyzing a binary executable dependency library, the apparatus comprising: the extraction program module is used for extracting the characteristics of the binary executable file to be analyzed to obtain the characteristic information of the binary executable file;
the matching program module is used for matching the feature information of the binary executable file against a fingerprint feature database and determining the corresponding dependency library fingerprint features; the fingerprint feature database stores fingerprint features for identifying dependency library attribute information;
and the determining program module is used for determining corresponding attribute information of the dependency library according to the fingerprint characteristics of the dependency library.
11. The apparatus of claim 10, wherein the feature information of the binary executable file comprises one or more types of feature information;
the matching program module comprises:
the matching program unit is used for carrying out global matching on the fingerprint features in the fingerprint feature database according to any feature information of the binary executable file;
a first determining program unit, configured to determine, if a matching degree of any feature information of the binary executable file and a first fingerprint feature in the fingerprint feature database reaches a preset global matching degree threshold, the first fingerprint feature as a dependent library fingerprint feature corresponding to the binary executable file.
12. The apparatus according to claim 10 or 11, wherein the feature information of the binary executable file and the fingerprint features in the fingerprint feature database each comprise: export information, a visible string, and/or an intermediate language sequence.
13. The apparatus of claim 10, wherein the determining program module comprises: and the second determining program unit is used for determining the corresponding dependent library name identifier according to the dependent library fingerprint characteristics.
14. The apparatus of claim 13, wherein the feature information comprises: a visible string;
the determining program module further comprises: a third determining program unit, configured to, after determining a corresponding dependent library name identifier according to the dependent library fingerprint feature, perform matching according to the visible character string and a version information table of a corresponding dependent library, and determine dependent library version information corresponding to the dependent library name identifier; the version information table of the dependency library is preset with a dependency library name identifier and a corresponding relation between the visible character string and the dependency library version information.
15. The apparatus of claim 10, further comprising: an identification program module for:
acquiring a file header of a binary executable file to be analyzed;
and determining the type of the binary executable file to be analyzed according to the file header.
16. The apparatus of claim 15, wherein the extraction program module comprises:
the parsing program unit is used for parsing the data segment and/or the code segment, the export information and the processor architecture information of the binary executable file according to the type of the binary executable file;
the first extraction program unit is used for performing feature extraction on the data segment, if one is obtained through parsing, to obtain a visible string and/or export information;
the disassembling program unit is used for disassembling the code segment, if one is obtained through parsing, according to the processor architecture information to obtain assembly instructions, the assembly instructions comprising assembly code;
a translator unit for translating the assembly code into an intermediate language independent of the processor architecture;
a program establishing unit for establishing a control flow graph based on the intermediate language;
the static analysis program unit is used for carrying out static program analysis on the control flow graph to obtain an intermediate language sequence; the intermediate language sequence includes: one or more intermediate languages.
17. The apparatus of claim 10, further comprising: the sample analysis program module is used for analyzing the acquired open source software libraries of different versions as sample files;
the second extraction program module is used for extracting fingerprint features in the sample file;
the feature library establishing program module is used for establishing a fingerprint feature database according to the fingerprint features in the sample files; the fingerprint features include: export information, visible strings, and/or intermediate languages.
18. The apparatus of claim 17, wherein the feature library establishing program module comprises:
the dimension reduction program unit is used for carrying out similarity calculation on the same type of fingerprint features in different sample files based on a dimension reduction algorithm to obtain the same type of fingerprint feature values shared by different sample files;
and the storage program unit is used for storing the same type of fingerprint characteristic values shared by different sample files to form the fingerprint characteristic database.
19. An electronic device, comprising: one or more processors; a memory; the memory stores one or more executable programs, and the one or more processors read the executable program codes stored in the memory to execute programs corresponding to the executable program codes for executing the method of any one of claims 1 to 9.
20. A computer readable storage medium, characterized in that the computer readable storage medium stores one or more programs which are executable by one or more processors to implement the method of any of the preceding claims 1 to 9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111518401.2A CN114357454A (en) 2021-12-13 2021-12-13 Binary executable file dependency library analysis method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114357454A (en) 2022-04-15

Family

ID=81100120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111518401.2A Pending CN114357454A (en) 2021-12-13 2021-12-13 Binary executable file dependency library analysis method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114357454A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069352A (en) * 2015-07-29 2015-11-18 Inspur Electronic Information Industry Co., Ltd. Method for constructing trusted application program running environment on server
US20170214704A1 (en) * 2013-12-30 2017-07-27 Beijing Qihoo Technology Company Limited Method and device for feature extraction
CN112800423A (en) * 2021-01-26 2021-05-14 Beihang University Binary code authorization vulnerability detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHARITON KARAMITAS et al.: "Function matching between binary executables: efficient algorithms and features", pages 1 - 17, published online, retrieved from the Internet <URL: https://link.springer.com/article/10.1007/s11416-019-00339-6> *
SUN Jinguang et al.: "Face shape classification method based on local features", CAAI Transactions on Intelligent Systems, vol. 12, no. 1, 8 May 2017 (2017-05-08), pages 104 - 109 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117075960A (en) * 2023-10-17 2023-11-17 UnionTech Software Technology Co., Ltd. Program reconstruction method, application cross-platform migration method, device and computing equipment
CN117075960B (en) * 2023-10-17 2024-01-23 UnionTech Software Technology Co., Ltd. Program reconstruction method, application cross-platform migration method, device and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination