CN117707953A - Binary software component analysis method, binary software component analysis device, electronic equipment and storage medium - Google Patents

Binary software component analysis method, binary software component analysis device, electronic equipment and storage medium

Info

Publication number
CN117707953A
Authority
CN
China
Prior art keywords
binary
analyzed
feature
file
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311724616.9A
Other languages
Chinese (zh)
Inventor
余进奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Hubei Topsec Network Security Technology Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Hubei Topsec Network Security Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd, Hubei Topsec Network Security Technology Co Ltd filed Critical Beijing Topsec Technology Co Ltd
Priority to CN202311724616.9A priority Critical patent/CN117707953A/en
Publication of CN117707953A publication Critical patent/CN117707953A/en
Pending legal-status Critical Current

Abstract

The embodiment of the disclosure discloses a binary software component analysis method, a binary software component analysis device, electronic equipment and a storage medium. The binary software component analysis method comprises the following steps: acquiring a binary file to be analyzed in binary software; extracting multiple types of feature fingerprints to be analyzed of the binary file to be analyzed; setting matching priority for the multiple types of feature fingerprints to be analyzed; according to the matching priority, performing feature matching on the feature fingerprint to be analyzed and a knowledge base to obtain a feature matching result; and carrying out component analysis on the binary file to be analyzed according to the feature matching result. The method can accurately analyze the components of the binary software.

Description

Binary software component analysis method, binary software component analysis device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of network security, and in particular relates to a binary software component analysis method, a binary software component analysis device, electronic equipment and a storage medium.
Background
Open source components are gradually becoming core infrastructure for software development, but their use also introduces risks and potential security hazards. In embedded systems such as industrial systems and Internet of Vehicles systems, the source code of many programs may be proprietary or may have been lost over time, so their security cannot be assessed at the source code level. It is therefore necessary to use binary software component analysis and risk detection techniques to detect the open source components in use and their potential risks.
In the prior art, component analysis and risk detection for binary software mainly proceed as follows: first, ASCII character strings in a binary file are searched and string feature information, such as version number information, is extracted to identify the version of the binary file; then, software and versions known to carry security vulnerability risks are matched against the identified name and version information of the binary file, thereby achieving vulnerability scanning of the binary file.
The inventor found that string features are only one of the typical features of binary software, and that some commercial software may modify its string features during development and compilation. The extracted ASCII strings therefore cannot comprehensively and accurately reflect the features of the binary software, so accurate component analysis cannot be performed, and missed reports or even false reports easily occur.
Disclosure of Invention
In view of the above, the embodiments of the present disclosure provide a binary software component analysis method, apparatus, electronic device, and storage medium, which can perform accurate component analysis on binary software.
In a first aspect, an embodiment of the present disclosure provides a binary software component analysis method, which adopts the following technical scheme:
The binary software component analysis method comprises the following steps:
acquiring a binary file to be analyzed in binary software;
extracting multiple types of feature fingerprints to be analyzed of the binary file to be analyzed;
setting matching priority for the multiple types of feature fingerprints to be analyzed;
according to the matching priority, performing feature matching on the feature fingerprint to be analyzed and a knowledge base to obtain a feature matching result;
and carrying out component analysis on the binary file to be analyzed according to the feature matching result.
Optionally, the obtaining the binary file to be analyzed in the binary software includes:
acquiring binary files in various file formats in the binary software;
identifying the binary files with the multiple file formats to obtain the binary file to be analyzed;
the binary file to be analyzed is an executable file, a relocatable file or a shared object file.
Optionally, the extracting the multi-class feature fingerprint to be analyzed of the binary file to be analyzed includes:
extracting a plurality of original features of the binary file to be analyzed;
data cleaning is carried out on the plurality of original features;
and classifying the plurality of original features after data cleaning to obtain the multi-class feature fingerprints to be analyzed.
Optionally, the original features include hash values, MD5, SHA1 values, symbol table information, file dependent library information, disassembled code symbols, and ASCII strings.
Optionally, the multiple types of feature fingerprints to be analyzed include: a digital signature class feature fingerprint to be analyzed, a symbol table feature fingerprint to be analyzed, a dependency relationship feature fingerprint to be analyzed and a character string feature fingerprint to be analyzed; the digital signature feature fingerprint to be analyzed has a first matching priority; the knowledge base comprises a plurality of known binary components and corresponding digital signature characteristic fingerprints, symbol table characteristic fingerprints, dependency characteristic fingerprints and character string characteristic fingerprints.
Optionally, the feature matching is performed on the feature fingerprint to be analyzed and the knowledge base according to the matching priority, so as to obtain a feature matching result, including:
performing feature matching on the digital signature feature fingerprints to be analyzed and the digital signature feature fingerprints of all known binary components;
if the matching is successful, a first feature matching result is obtained, wherein the first feature matching result is that the binary file to be analyzed is a known binary component;
if the matching fails, performing feature matching on the symbol table feature fingerprint to be analyzed and the symbol table feature fingerprint of the i-th known binary component in the knowledge base to obtain a first matching result, wherein i is 1,2,3, …, P, and P is the total number of the known binary components in the knowledge base;
performing feature matching on the character string feature fingerprint to be analyzed and the character string feature fingerprint of the i-th known binary component to obtain a second matching result;
and performing feature matching on the dependency feature fingerprint to be analyzed and the dependency feature fingerprint of the i-th known binary component to obtain a third matching result.
Optionally, the first matching result is the number of matches between the symbol table feature fingerprint to be analyzed and the symbol table feature fingerprint of the i-th known binary component, the second matching result is the number of matches between the character string feature fingerprint to be analyzed and the character string feature fingerprint of the i-th known binary component, and the third matching result is the number of matches between the dependency feature fingerprint to be analyzed and the dependency feature fingerprint of the i-th known binary component.
Optionally, the performing component analysis on the binary file to be analyzed according to the feature matching result includes:
obtaining feature similarity between the binary file to be analyzed and an ith known binary component according to the first matching result, the second matching result and the third matching result;
And obtaining a component analysis result according to the highest feature similarity.
Optionally, obtaining the feature similarity between the binary file to be analyzed and the i-th known binary component according to the first matching result, the second matching result and the third matching result includes:
calculating a first scoring factor K1, K1 ∈ [0,1], according to the first matching result;
calculating a second scoring factor K2, K2 ∈ [0,1], according to the second matching result;
calculating a third scoring factor K3, K3 ∈ [0,1], according to the third matching result;
and calculating the feature similarity according to the first scoring factor K1, the second scoring factor K2, the third scoring factor K3, and the weights corresponding to the three.
Optionally, the calculating to obtain the feature similarity according to the first scoring factor K1, the second scoring factor K2, the third scoring factor K3, and weights corresponding to the three includes:
presetting a first weight N1 for the first scoring factor K1, a second weight N2 for the second scoring factor K2, and a third weight N3 for the third scoring factor K3, wherein N1 ∈ [0,1], N2 ∈ [0,1], N3 ∈ [0,1];
when the third scoring factor K3 = 1, correcting the value of the third weight N3 to 1;
and calculating the feature similarity according to the first scoring factor K1, the second scoring factor K2, the third scoring factor K3, the first weight N1, the second weight N2 and the corrected third weight N3.
Optionally, the obtaining the component analysis result according to the highest feature similarity includes:
if the highest feature similarity is greater than or equal to the preset threshold, the component analysis result is that the main component of the binary file to be analyzed is a known binary component corresponding to the highest feature similarity;
and if the highest feature similarity is smaller than the preset threshold value, the component analysis result is that the main component of the binary file to be analyzed is not a known binary component.
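The weighting and threshold steps above can be sketched as follows. This is only an illustration under stated assumptions: the text here does not give the formula that combines the scoring factors and weights, so a normalized weighted average is assumed, and the function names feature_similarity and pick_component are hypothetical.

```python
def feature_similarity(k1, k2, k3, n1, n2, n3):
    """Combine three scoring factors with their weights.

    ASSUMPTION: a normalized weighted average; the source only says the
    similarity is calculated from K1..K3 and N1..N3.
    """
    for value in (k1, k2, k3, n1, n2, n3):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scoring factors and weights must lie in [0, 1]")
    # Correction described in the text: when K3 = 1, N3 is set to 1.
    if k3 == 1:
        n3 = 1.0
    total_weight = n1 + n2 + n3
    if total_weight == 0:
        return 0.0
    return (n1 * k1 + n2 * k2 + n3 * k3) / total_weight


def pick_component(similarities, threshold):
    """Return the known component with the highest similarity, or None
    when that similarity is below the preset threshold."""
    if not similarities:
        return None
    best = max(similarities, key=similarities.get)
    return best if similarities[best] >= threshold else None
```

Here similarities maps an identifier of each known binary component to its feature similarity; a return value of None corresponds to the case where the main component is not a known binary component.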
Optionally, the binary software component analysis method further includes: and carrying out safety detection on the binary file to be analyzed according to the component analysis result and the multiple types of characteristic fingerprints to be analyzed.
Optionally, the performing security detection on the binary file to be analyzed according to the component analysis result and the multiple types of feature fingerprints to be analyzed includes:
if the component analysis result is that the binary file to be analyzed is a known binary component, searching relevant information in the knowledge base according to the known binary component, and carrying out vulnerability detection and license term compliance detection on the binary file to be analyzed according to the relevant information;
if the component analysis result is that the main component of the binary file to be analyzed is a known binary component, searching related information in the knowledge base according to the known binary component, and performing vulnerability detection, license term compliance detection and open source license compatibility detection on the binary file to be analyzed according to the related information;
and if the component analysis result is that the main component of the binary file to be analyzed is not a known binary component, performing license term compliance detection and open source license compatibility detection on the binary file to be analyzed according to the multiple types of feature fingerprints to be analyzed.
In a second aspect, an embodiment of the present disclosure further provides a binary software component analysis apparatus, which adopts the following technical scheme:
the binary software component analysis apparatus includes:
the file acquisition module is used for acquiring a binary file to be analyzed in the binary software;
the feature extraction module is used for extracting multiple types of feature fingerprints to be analyzed of the binary file to be analyzed;
the priority setting module is used for setting matching priorities for the multiple types of feature fingerprints to be analyzed;
the feature matching module is used for carrying out feature matching on the feature fingerprint to be analyzed and the knowledge base according to the matching priority, so as to obtain a feature matching result;
And the component analysis module is used for carrying out component analysis on the binary file to be analyzed according to the characteristic matching result.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, which adopts the following technical scheme:
the electronic device includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the binary software component analysis methods described above.
In a fourth aspect, the disclosed embodiments also provide a computer-readable storage medium storing computer instructions for causing a computer to perform any one of the binary software component analysis methods described above.
The embodiment of the disclosure provides a binary software component analysis method, a device, an electronic device and a storage medium, wherein the binary software component analysis method is higher in accuracy compared with the scheme of component analysis only through ASCII character strings in the prior art by extracting multiple types of feature fingerprints to be analyzed of a binary file to be analyzed and carrying out component analysis on the binary file to be analyzed through feature matching results of the multiple types of feature fingerprints to be analyzed and a knowledge base. In addition, in the analysis method, matching priorities are set for the multiple types of feature fingerprints to be analyzed, feature matching is carried out on the feature fingerprints to be analyzed and the knowledge base according to the matching priorities, feature matching results are obtained, and the efficiency of component analysis can be improved.
The foregoing is only an overview of the technical solution of the present disclosure. So that the technical means of the disclosure can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the disclosure are more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort to a person of ordinary skill in the art.
FIG. 1 is a flow chart of a binary software component analysis method provided by an embodiment of the present disclosure;
fig. 2 is a specific flowchart of step S1 provided in an embodiment of the present disclosure;
fig. 3 is a specific flowchart of step S2 provided in an embodiment of the present disclosure;
FIG. 4 is a flowchart of a method for constructing a knowledge base according to an embodiment of the present disclosure;
fig. 5 is a specific flowchart of step S4 provided in an embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating step S5 according to an embodiment of the present disclosure;
FIG. 7 is a flowchart illustrating security detection according to an embodiment of the present disclosure;
FIG. 8 is a functional block diagram of a binary software component analysis apparatus provided by an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
It should be appreciated that the following specific embodiments of the disclosure are described in order to provide a better understanding of the present disclosure, and that other advantages and effects will be apparent to those skilled in the art from the present disclosure. It will be apparent that the described embodiments are merely some, but not all embodiments of the present disclosure. The disclosure may be embodied or practiced in other different specific embodiments, and details within the subject specification may be modified or changed from various points of view and applications without departing from the spirit of the disclosure. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure are intended to be within the scope of this disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should also be noted that the illustrations provided in the following embodiments merely illustrate the basic concepts of the disclosure by way of illustration, and only the components related to the disclosure are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided in order to provide a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
The embodiment of the disclosure provides a binary software component analysis method, specifically, as shown in fig. 1, the binary software component analysis method includes:
Step S1: acquiring a binary file to be analyzed in the binary software.
The binary software may be independent binary software or binary software obtained by decompressing a binary software package, one binary software may contain one or more binary files (also referred to as binary components, which are the minimum units of component analysis), and in the actual analysis process, the binary files to be analyzed need to be obtained from the binary software.
Optionally, as shown in fig. 2, step S1 of obtaining the binary file to be analyzed in the binary software specifically includes:
Substep S11: acquiring binary files in multiple file formats in the binary software.
Binary software or a binary software package uploaded by a user is acquired. If the upload is an executable file, the method proceeds directly to substep S12; if the upload is a binary software package, the package is parsed recursively to extract the binary files in multiple file formats that it contains.
Illustratively, the binary software package may be a compressed file, an executable file, a firmware image, or the like.
Illustratively, the recursive parsing process includes performing an unpacking operation on the uploaded binary software package according to its file type. For example, a ZIP decompression tool may be used on a compressed file to extract the files and directories it contains, and a corresponding tool may be used to unpack a firmware image. Recursive parsing yields the directory tree of the unpacked binary software package.
The process of extracting binary files in multiple file formats from the binary software package includes identifying the file type of each file under the directory tree; the file types include executable and linkable files, compressed files, code files, configuration files, data files, plain text files, and other files.
Substep S12: identifying the binary files in the multiple file formats to obtain the binary file to be analyzed.
Illustratively, the binary file to be analyzed is an executable file, a relocatable file, or a shared object file.
For example, the executable and linkable files are identified, and executable files, relocatable files and shared object files are extracted from them; for compressed files, the unpacking operation continues, and the corresponding identification is then performed according to the file type of each unpacked file.
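As a minimal sketch of substeps S11 and S12, the following Python code walks an unpacked directory tree, recursively unpacks nested ZIP archives, and keeps only files whose header marks them as relocatable, executable, or shared object files. It assumes the executable and linkable files are in ELF format and handles only ZIP archives; the function names, the ".unpacked" directory suffix, and the depth limit are illustrative choices, not part of the disclosure.

```python
import os
import struct
import zipfile

# e_type values from the ELF header: relocatable, executable, shared object.
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object"}


def elf_kind(header: bytes):
    """Return the ELF file kind for the first 18 bytes of a file, or None."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        return None
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    fmt = "<H" if header[5] == 1 else ">H"
    (e_type,) = struct.unpack(fmt, header[16:18])
    return ELF_TYPES.get(e_type)


def collect_candidates(root: str, depth: int = 0, max_depth: int = 5):
    """Yield (path, kind) for each binary file to be analyzed under root.

    Nested ZIP archives are unpacked next to the archive and walked in turn;
    max_depth (an assumed safety limit) bounds the recursion.
    """
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                kind = elf_kind(fh.read(18))
            if kind:
                yield path, kind
            elif depth < max_depth and zipfile.is_zipfile(path):
                out_dir = path + ".unpacked"
                with zipfile.ZipFile(path) as zf:
                    zf.extractall(out_dir)
                yield from collect_candidates(out_dir, depth + 1, max_depth)
```

Firmware images and other archive formats would need their own unpacking tools, which are not shown here.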
Step S2: extracting multiple types of feature fingerprints to be analyzed of the binary file to be analyzed.
Optionally, as shown in fig. 3, step S2 extracts multiple types of feature fingerprints to be analyzed of the binary file to be analyzed, including:
Substep S21: extracting a plurality of original features of the binary file to be analyzed.
Optionally, the original features include hash values, MD5 values, SHA1 values, symbol table information, file dependency library information, disassembled code symbols, and ASCII strings. In the embodiments of the present disclosure, the original features can be extracted with system commands, existing tools, or programs written according to the structure of the binary file. For example, the hash, MD5 and SHA1 values of a binary file are extracted using Python's hashlib library; symbol table information and disassembled code symbols are extracted using a disassembly tool such as objdump; file dependency library information, such as the list of dependent libraries, is extracted using a system tool such as ldd (Linux) or dumpbin /DEPENDENTS (Windows); and ASCII strings in the binary file are extracted using the strings system command.
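A minimal sketch of two of these extractions, using only the Python standard library, might look like the following. The choice of SHA-256 as the generic "hash value", the minimum string length of 4, and the function names are assumptions; the regular expression only approximates what the strings command does.

```python
import hashlib
import re


def digest_fingerprints(data: bytes) -> dict:
    """Compute digest features of a binary file's contents."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        # Assumed stand-in for the unspecified generic "hash value".
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII characters of at least min_len bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]
```

Symbol table and dependency information would typically come from external tools such as objdump or ldd and are not shown here.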
Substep S22: performing data cleaning on the plurality of original features.
By data cleaning the original features, invalid features can be eliminated, preserving key features useful for component analysis.
Illustratively, the data cleaning includes one or more of the following: (1) removing missing values: detecting and deleting missing values in the original features, such as large numbers of empty lines; (2) removing duplicate values: detecting and deleting repeated features, mostly applied to ASCII strings; (3) removing outliers: detecting and deleting abnormal features that are too long or too short, applied to symbol table information, ASCII strings and the like; (4) normalizing data: converting hash values, MD5 values, SHA1 values, file dependency library information and the like into a standard data format; (5) cross-validation: using cross-validation to ensure that no features conflict with one another, commonly used to cross-validate symbol table information, version numbers in ASCII strings, and the like.
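Cleaning operations (1) to (4) can be sketched in a few lines of Python. The length bounds, the lowercase digest format, and the function names are assumptions chosen for illustration; cross-validation (5) depends on the concrete features and is not shown.

```python
def clean_features(values, min_len=4, max_len=256):
    """Drop empty, too-short, too-long and duplicate features, keeping order."""
    seen = set()
    cleaned = []
    for value in values:
        text = value.strip()
        if not text:
            continue  # (1) missing value, e.g. an empty line
        if not min_len <= len(text) <= max_len:
            continue  # (3) abnormal length
        if text in seen:
            continue  # (2) duplicate value
        seen.add(text)
        cleaned.append(text)
    return cleaned


def normalize_digest(value: str) -> str:
    """(4) Convert a digest to an assumed standard form: trimmed lowercase hex."""
    return value.strip().lower()
```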
Substep S23: classifying the plurality of original features after data cleaning to obtain the multiple types of feature fingerprints to be analyzed.
The plurality of original features after data cleaning are equivalent to metadata for identifying binary files to be analyzed, and the fingerprints of the plurality of types of features to be analyzed can be obtained by classifying the original features according to the fingerprint attributes of the original features. The fingerprint attributes comprise four categories of digital signature fingerprints, symbol table fingerprints, dependency fingerprints and character string fingerprints, and the plurality of original features after data cleaning are classified to obtain digital signature feature fingerprints to be analyzed, symbol table feature fingerprints to be analyzed, dependency relationship feature fingerprints to be analyzed and character string feature fingerprints to be analyzed. Wherein each class of feature fingerprints to be analyzed comprises at least one feature fingerprint, preferably it comprises a plurality of feature fingerprints, e.g. the digital signature class feature fingerprint to be analyzed comprises: one or more of a hash value, MD5 value, SHA1 value, etc.; the symbol table feature fingerprint to be analyzed comprises: one or more of symbol table information, disassembly code symbols, etc.; the dependency characteristic fingerprint to be analyzed comprises the following steps: file dependency library information; the character string feature fingerprint to be analyzed comprises: ASCII string.
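The classification into four fingerprint categories can be sketched as a simple lookup. The raw feature keys (such as "disassembly_symbols") and the output layout are hypothetical names introduced for this example.

```python
# Maps each raw feature kind (hypothetical key names) to its fingerprint category.
CATEGORY_OF = {
    "hash": "digital_signature",
    "md5": "digital_signature",
    "sha1": "digital_signature",
    "symbol_table": "symbol_table",
    "disassembly_symbols": "symbol_table",
    "dependency_libraries": "dependency",
    "ascii_strings": "string",
}


def classify_features(raw: dict) -> dict:
    """Group cleaned raw features into the four fingerprint categories."""
    fingerprints = {
        "digital_signature": {},  # algorithm name -> digest value
        "symbol_table": set(),
        "dependency": set(),
        "string": set(),
    }
    for kind, value in raw.items():
        category = CATEGORY_OF.get(kind)
        if category is None:
            continue  # unknown feature kinds are ignored in this sketch
        if category == "digital_signature":
            fingerprints[category][kind] = value
        else:
            fingerprints[category].update(value)
    return fingerprints
```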
Step S3: setting matching priorities for the multiple types of feature fingerprints to be analyzed.
Optionally, because the digital signature class feature fingerprint to be analyzed is unique and a matching result obtained from it is highly credible, it is given the first matching priority. On this basis, the symbol table feature fingerprint, the dependency feature fingerprint and the character string feature fingerprint to be analyzed may all have the second matching priority; alternatively, the symbol table feature fingerprint and the character string feature fingerprint to be analyzed have the second matching priority, and the dependency feature fingerprint to be analyzed has the third matching priority.
Step S4: performing feature matching on the feature fingerprints to be analyzed and the knowledge base according to the matching priorities to obtain a feature matching result.
Illustratively, the knowledge base includes a plurality of known binary components and their corresponding digital signature class feature fingerprints, symbol table feature fingerprints, dependency feature fingerprints, and string feature fingerprints. The knowledge base can be pre-constructed, and the pre-constructed knowledge base is loaded when binary file composition analysis is performed. It should be noted that in the embodiment of the present disclosure, if the vendor and the name of two binary components are the same and the version information is different, the two binary components are considered to be two different binary components in the knowledge base, and this principle is applied in the subsequent steps.
Illustratively, as shown in FIG. 4, the process of pre-building a knowledge base includes:
(1) Acquiring a known binary component: known binary components, binary software, or binary software packages may be downloaded from software vendors, open source communities (e.g., GitHub) and the like by means such as automated crawlers.
(2) Acquiring product information of a known binary component: product information such as vendor information, version information, MD5 value, applicable environment and programming language can be obtained from the location of the binary component, or from the release page of the binary software or binary software package.
(3) Acquiring component information of a known binary component: dependent components and their version information (such as the dependencies declared in the pom.xml file of a Java software package), the license information used, and the like can be obtained from the release page of the binary component, binary software or binary software package, and the associated known vulnerability information can be obtained from vulnerability disclosure websites (such as NVD and CNNVD).
(4) Building a unique identification of a known binary component: CPE (Common Platform Enumeration) may be used as the unique identification of a binary component.
(5) Acquiring feature fingerprints of known binary components: the digital signature class feature fingerprint, symbol table feature fingerprint, dependency feature fingerprint and character string feature fingerprint of each known binary component are acquired, wherein the feature fingerprints comprise all features of the known binary component in the knowledge base.
(6) Storing the information or content obtained in steps (1) to (5) to complete the construction of the knowledge base.
In application, all feature fingerprints of the knowledge base can be loaded by default, or only part of them can be loaded selectively according to the type of the binary file, user configuration and the like. For example, when the suffix of the binary file to be analyzed is jar, its programming language is Java, and only the Java-related feature fingerprints in the knowledge base need to be loaded.
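One possible in-memory shape for a knowledge base entry, together with the selective loading just described, is sketched below. The KnownComponent fields and the select_entries helper are hypothetical; the disclosure does not prescribe a storage schema.

```python
from dataclasses import dataclass, field


@dataclass
class KnownComponent:
    """One known binary component in the knowledge base (illustrative schema)."""
    cpe: str        # unique identification, e.g. a CPE 2.3 string
    vendor: str
    name: str
    version: str
    language: str
    digests: dict = field(default_factory=dict)  # algorithm -> digest value
    symbols: frozenset = frozenset()
    dependencies: frozenset = frozenset()
    strings: frozenset = frozenset()


def select_entries(knowledge_base, file_name: str):
    """Load only Java-related entries for .jar files, otherwise all entries."""
    if file_name.lower().endswith(".jar"):
        return [e for e in knowledge_base if e.language.lower() == "java"]
    return list(knowledge_base)
```

Because two components with the same vendor and name but different versions count as different components, each version gets its own entry and its own CPE.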
Optionally, as shown in fig. 5, step S4 performs feature matching on the feature fingerprint to be analyzed and the knowledge base according to the matching priority, so as to obtain a feature matching result, including:
Substep S41: performing feature matching between the digital signature feature fingerprint to be analyzed and the digital signature feature fingerprints of the known binary components.
Because the digital signature feature fingerprint to be analyzed has the first matching priority, it is matched first against the digital signature feature fingerprints of all known binary components. For example, the type and value of the digital signature feature fingerprint to be analyzed are used as query conditions to search the knowledge base for an identical digital signature feature fingerprint of a known binary component; if one is found, the matching succeeds, and otherwise the matching fails.
Substep S42: if the matching succeeds, obtaining a first feature matching result indicating that the binary file to be analyzed is a known binary component; for example, the first feature matching result includes the vendor, name and version information of the matched known binary component.
If the digital signature feature fingerprint to be analyzed is matched successfully, the first feature matching result shows that the binary file to be analyzed is a known binary component in the knowledge base. No further component analysis of the binary file is needed, and the first feature matching result can be used directly as the component analysis result.
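A minimal sketch of the digital signature lookup in substeps S41 and S42, assuming the fingerprint is an MD5 digest (as in the examples of this disclosure) and that the knowledge base exposes an index keyed by (type, value). The index layout and the function names are assumptions for illustration.

```python
import hashlib


def md5_fingerprint(data: bytes):
    """Build a (type, value) digital signature fingerprint from raw file bytes."""
    return ("md5", hashlib.md5(data).hexdigest())


def match_signature(fingerprint, signature_index):
    """Return the matched component record, or None when the matching fails."""
    return signature_index.get(fingerprint)


known_bytes = b"example component bytes"
# Hypothetical index: (type, value) -> vendor, name and version of a known component.
signature_index = {
    md5_fingerprint(known_bytes): {"vendor": "jaxen", "name": "jaxen", "version": "1.1.14"},
}

hit = match_signature(md5_fingerprint(known_bytes), signature_index)
miss = match_signature(md5_fingerprint(b"something else"), signature_index)
```

A hit ends the matching flow immediately, while a miss hands the file over to the symbol table, character string and dependency matching.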
Substep S43: if the matching fails, performing feature matching between the symbol table feature fingerprint to be analyzed and the symbol table feature fingerprint of the i-th known binary component in the knowledge base to obtain a first matching result, where i = 1, 2, 3, …, P and P is the total number of known binary components in the knowledge base.
Illustratively, the symbol table feature fingerprint to be analyzed is used as the condition for matching within the symbol table feature fingerprint of the i-th known binary component in the knowledge base, and the first matching result can be obtained by regular-expression matching, character string matching or the like. Because i takes every value from 1 to P, the symbol table feature fingerprint to be analyzed is matched against the symbol table feature fingerprints of all known binary components in the knowledge base, yielding a plurality of first matching results.
Illustratively, the first matching result is the number of matches between the symbol table feature fingerprint to be analyzed and the symbol table feature fingerprint of the i-th known binary component.
Substep S44: performing feature matching between the character string feature fingerprint to be analyzed and the character string feature fingerprint of the i-th known binary component to obtain a second matching result.
Illustratively, the character string feature fingerprint to be analyzed is used as the condition for matching within the character string feature fingerprint of the i-th known binary component in the knowledge base, and the second matching result can be obtained by regular-expression matching, character string matching or the like. Because i takes every value from 1 to P, the character string feature fingerprint to be analyzed is matched against the character string feature fingerprints of all known binary components in the knowledge base, yielding a plurality of second matching results.
Illustratively, the second matching result is the number of matches between the character string feature fingerprint to be analyzed and the character string feature fingerprint of the i-th known binary component.
Substep S45: performing feature matching between the dependency feature fingerprint to be analyzed and the dependency feature fingerprint of the i-th known binary component to obtain a third matching result.
In a specific implementation, the numbers of dependent components may be compared first; if the numbers are the same, the dependent component lists are then compared for consistency, and the third matching result is finally obtained.
Illustratively, the third matching result is the number of matches between the dependency feature fingerprint to be analyzed and the dependency feature fingerprint of the i-th known binary component.
It should be noted that no order is imposed among the feature matching of the symbol table feature fingerprint, the character string feature fingerprint and the dependency feature fingerprint to be analyzed. Any one of them may be matched first; alternatively, the symbol table feature fingerprint and the character string feature fingerprint to be analyzed may be matched at the same time, followed by the dependency feature fingerprint to be analyzed.
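Substeps S43 to S45 can be sketched as below, assuming each feature fingerprint is a collection of strings and that a "match" is an exact string match (the text also allows regular-expression matching). The dictionary layout and the names `count_matches` and `match_all` are assumptions made for the example.

```python
def count_matches(to_analyze, known):
    """Number of features present in both fingerprints (exact string match)."""
    return len(set(to_analyze) & set(known))


def match_all(candidate, knowledge_base):
    """Return the first, second and third matching results for every known component."""
    results = {}
    for component_id, known in knowledge_base.items():
        results[component_id] = {
            "first": count_matches(candidate["symbols"], known["symbols"]),
            "second": count_matches(candidate["strings"], known["strings"]),
            "third": count_matches(candidate["dependencies"], known["dependencies"]),
        }
    return results


candidate = {
    "symbols": ["main", "rdp_connect", "custom_init"],
    "strings": ["rdesktop", "usage: %s"],
    "dependencies": ["libssl", "libX11"],
}
knowledge_base = {
    "a:rdesktop:rdesktop:1.8.3": {
        "symbols": ["main", "rdp_connect", "rdp_disconnect"],
        "strings": ["rdesktop", "version 1.8.3"],
        "dependencies": ["libssl", "libX11"],
    },
}
results = match_all(candidate, knowledge_base)
```

Because the three counts are computed independently, they can be evaluated in any order or in parallel, consistent with the note that no order is imposed among them.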
Step S5: performing component analysis on the binary file to be analyzed according to the feature matching result.
Optionally, as shown in fig. 6, step S5 performs component analysis on the binary file to be analyzed according to the feature matching result, including:
Substep S51: obtaining the feature similarity between the binary file to be analyzed and the i-th known binary component according to the first matching result, the second matching result and the third matching result.
Optionally, the substep S51 specifically includes:
calculating a first scoring factor K1 according to the first matching result, where K1 ∈ [0,1];
calculating a second scoring factor K2 according to the second matching result, where K2 ∈ [0,1];
calculating a third scoring factor K3 according to the third matching result, where K3 ∈ [0,1];
and calculating the feature similarity according to the first scoring factor K1, the second scoring factor K2, the third scoring factor K3 and the weights corresponding to the first scoring factor K1, the second scoring factor K2 and the third scoring factor K3.
The higher a scoring factor K (a collective term for the first scoring factor K1, the second scoring factor K2 and the third scoring factor K3), the more similar the corresponding feature fingerprint to be analyzed is to the corresponding feature fingerprint of the known binary component in the knowledge base.
Illustratively, the specific formula for calculating the scoring factor K (collectively, the first scoring factor K1, the second scoring factor K2, and the third scoring factor K3) is as follows:
wherein K is the scoring factor; A is the number of features contained in the feature fingerprint to be analyzed (the symbol table feature fingerprint, the character string feature fingerprint or the dependency feature fingerprint to be analyzed); B is the number of features contained in the corresponding feature fingerprint in the knowledge base (the symbol table feature fingerprint, the character string feature fingerprint or the dependency feature fingerprint); and M is the number of matches. It should be understood that if the scoring factor K is the first scoring factor K1, A is the number of features included in the character string feature fingerprint to be analyzed, B is the number of features included in the character string feature fingerprint in the knowledge base, and M is the number of matches corresponding to the first matching result.
In calculating the scoring factor K, only the intersection and union of the feature fingerprint to be analyzed and the feature fingerprint of the known binary component in the knowledge base are considered; no complex mathematical operation or distance measurement is needed, so discrete data are processed efficiently and the calculation is fast.
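One expression consistent with the quantities A, B and M defined above and with the intersection-and-union description is a Jaccard-style ratio. It is given here only as an assumed reading, not as the formula of this disclosure:

```latex
K = \frac{M}{A + B - M}, \qquad 0 \le M \le \min(A, B), \qquad K \in [0, 1]
```

Under this reading, M counts the intersection and A + B - M counts the union, so K = 1 only when A = B = M, that is, when the two feature fingerprints coincide exactly.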
Optionally, calculating the feature similarity according to the first scoring factor K1, the second scoring factor K2, the third scoring factor K3, and weights corresponding to the three factors includes:
A first weight N1 is preset for the first scoring factor K1, a second weight N2 is preset for the second scoring factor K2, and a third weight N3 is preset for the third scoring factor K3, where N1 ∈ [0,1], N2 ∈ [0,1] and N3 ∈ [0,1].
When the third scoring factor K3 = 1, the value of the third weight N3 is corrected to 1. K3 = 1 indicates that the third matching result is in complete agreement with the known binary component in the knowledge base, which characterizes the binary file to be analyzed as either compiled directly from the source code of the known binary component or compiled from that known open source component after source code modification.
The feature similarity is calculated according to the first scoring factor K1, the second scoring factor K2, the third scoring factor K3, the first weight N1, the second weight N2 and the corrected third weight N3.
Because the first scoring factor K1, the second scoring factor K2 and the third scoring factor K3 are combined in the calculation, the resulting feature similarity is more comprehensive and accurate.
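The weight correction and the combination of the three scoring factors can be sketched as follows. The combination rule used here, a weighted average normalized by the sum of the weights, is an assumption chosen so that the result stays in [0,1]; the text does not confirm this exact form. The function name `feature_similarity` is likewise hypothetical.

```python
def feature_similarity(k1, k2, k3, n1, n2, n3):
    """Combine three scoring factors into one similarity (assumed weighted average)."""
    if k3 == 1:
        # Correction described in the text: a complete dependency match
        # raises the third weight N3 to 1.
        n3 = 1.0
    total_weight = n1 + n2 + n3
    if total_weight == 0:
        return 0.0
    # Assumed combination rule: weighted average of K1, K2 and K3.
    return (n1 * k1 + n2 * k2 + n3 * k3) / total_weight


# Weights taken from Example 2 of this disclosure; the factor values are made up.
similarity = feature_similarity(0.5, 0.5, 1, 0.8, 0.9, 0.7)
```

With these inputs the third weight is raised from 0.7 to 1 before the average is taken, so the complete dependency match contributes more to the result than it would with the preset weight.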
Illustratively, the calculation formula of the feature similarity S is:
Substep S52: obtaining a component analysis result according to the highest feature similarity.
When determining the highest feature similarity, a reliability threshold (for example, 0.4) may be set, and feature similarities below the reliability threshold are excluded directly, which simplifies the determination of the highest feature similarity.
Optionally, the substep S52 specifically includes:
comparing the highest feature similarity with a preset threshold;
if the highest feature similarity is greater than or equal to the preset threshold, the component analysis result is that the main component of the binary file to be analyzed is the known binary component corresponding to the highest feature similarity; for example, the component analysis result contains the vendor, name and version information of the known binary component with the highest feature similarity;
if the highest feature similarity is smaller than the preset threshold, the component analysis result is that the main component of the binary file to be analyzed is not a known binary component.
The preset threshold may be set according to the actual situation. For example, considering that symbol table information and character string information may be extracted with errors because binary components are built in different ways, the preset threshold is set to 0.9.
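The two-threshold decision of substep S52 can be sketched as below, using the example values 0.4 and 0.9 from the text as defaults. The result dictionary layout and the function name `analyze_component` are assumptions for illustration.

```python
def analyze_component(similarities, reliability_threshold=0.4, preset_threshold=0.9):
    """Pick the most similar known component and decide whether it is the main component."""
    # Exclude candidates below the reliability threshold before searching for the maximum.
    candidates = {c: s for c, s in similarities.items() if s >= reliability_threshold}
    if not candidates:
        return {"main_component_known": False, "component": None, "similarity": None}
    best = max(candidates, key=candidates.get)
    score = candidates[best]
    if score >= preset_threshold:
        return {"main_component_known": True, "component": best, "similarity": score}
    return {"main_component_known": False, "component": None, "similarity": score}


result = analyze_component({
    "a:rdesktop:rdesktop:1.8.3": 0.95,
    "a:jaxen:jaxen:1.1.14": 0.10,
})
```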
Optionally, the binary software component analysis method in the embodiment of the present disclosure further includes: performing security detection on the binary file to be analyzed according to the component analysis result and the multiple types of feature fingerprints to be analyzed. Specifically, if the component analysis result is that the binary file to be analyzed is a known binary component, related information is searched in the knowledge base according to the known binary component, and vulnerability detection and license term compliance detection are performed on the binary file to be analyzed according to the related information. If the component analysis result is that the main component of the binary file to be analyzed is a known binary component, related information is searched in the knowledge base according to the known binary component, and vulnerability detection, license term compliance detection and compatibility detection between open source licenses are performed on the binary file to be analyzed according to the related information. If the component analysis result is that the main component of the binary file to be analyzed is not a known binary component, license term compliance detection and compatibility detection between open source licenses are performed on the binary file to be analyzed according to the multiple types of feature fingerprints to be analyzed.
In the security detection process, when the binary file to be analyzed is a known binary component, or its main component is a known binary component, the component analysis result contains the name and version information of the known binary component, so the vulnerability of the binary file to be analyzed can be detected. When the binary file to be analyzed is itself a known binary component, it has no compatibility problem between open source licenses, so that risk is not detected. When the main component of the binary file to be analyzed is not a known binary component, vulnerability cannot be detected, but license term compliance detection and compatibility detection between open source licenses can still be performed on the binary file through the multiple types of feature fingerprints to be analyzed.
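The three cases above amount to a small dispatch table from the component analysis result to the set of checks to run. A minimal sketch, in which the enum members and check names are hypothetical labels introduced for the example:

```python
from enum import Enum


class AnalysisResult(Enum):
    KNOWN_COMPONENT = "known_component"            # digital signature matched
    MAIN_COMPONENT_KNOWN = "main_component_known"  # similarity reached the preset threshold
    MAIN_COMPONENT_UNKNOWN = "main_component_unknown"


def select_checks(result):
    """Return the security checks that apply to one component analysis result."""
    if result is AnalysisResult.KNOWN_COMPONENT:
        return ["vulnerability", "license_terms"]
    if result is AnalysisResult.MAIN_COMPONENT_KNOWN:
        return ["vulnerability", "license_terms", "license_compatibility"]
    # Main component unknown: no name or version is available, so vulnerability
    # detection is skipped and the checks rely on the extracted fingerprints.
    return ["license_terms", "license_compatibility"]


checks = select_checks(AnalysisResult.MAIN_COMPONENT_UNKNOWN)
```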
For the case that the main component of the binary file to be analyzed is not a known binary component, the embodiments of the present disclosure have more obvious technical advantages, specifically as follows:
In the prior art, the version of a binary file is identified by extracting character string feature information from ASCII strings, and software and versions with known security vulnerability risks are then matched according to the identified name, version information and the like of the binary file, thereby implementing vulnerability scanning of the binary file. Consequently, when the components and versions of the binary software cannot be identified, no security detection can be performed at all.
In the present application, the multiple types of feature fingerprints to be analyzed are extracted from the binary file to be analyzed first, and security detection is performed on the binary file according to both the component analysis result and these feature fingerprints. Therefore, even if the component analysis result is that the main component of the binary file to be analyzed is not a known binary component, license term compliance detection and compatibility detection between open source licenses can still be performed on the binary file according to the multiple types of feature fingerprints.
In the security detection process, vulnerability and open source license clause compliance of the binary file are detected, and compatibility risks among a plurality of open source licenses contained in the binary file are detected, so that the security requirement of the binary file is guaranteed to the greatest extent.
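Compatibility detection between several open source licenses can be sketched as a pairwise lookup in a conflict table. The table below is a hypothetical stand-in for knowledge base content; its single entry reflects the Apache-2.0 and GPLv2 combination discussed in Example 2 of this disclosure, and the SPDX-style identifiers are an assumption of the sketch.

```python
from itertools import combinations

# Hypothetical conflict table; a real knowledge base would hold many more pairs.
INCOMPATIBLE_PAIRS = {frozenset({"Apache-2.0", "GPL-2.0"})}


def find_license_conflicts(licenses):
    """Return every pair of detected licenses listed as incompatible."""
    unique = sorted(set(licenses))
    return [
        pair for pair in combinations(unique, 2)
        if frozenset(pair) in INCOMPATIBLE_PAIRS
    ]


conflicts = find_license_conflicts(["GPL-2.0", "Apache-2.0", "Apache-2.0"])
```

An empty result means no listed conflict was found among the detected licenses, which is weaker than proving the licenses compatible.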
Illustratively, as shown in fig. 7, the process of performing security detection on the binary file to be analyzed according to the component analysis result includes one or more of the following steps:
(1) Building a unique identification of the binary component: constructing a standard CPE data format according to the component manufacturer, the name and the version information;
(2) Knowledge base query: inquiring product information, known security vulnerability information, open source license information, dependent component relation and the like of the known binary component in a knowledge base by taking the constructed CPE as an inquiry condition;
(3) Constructing a dependency graph of the binary component: when dependent components are found in the previous step, executing recursive queries that cover both the direct and the indirect dependent components of the component, and finally generating the dependency graph of the binary component;
(4) Security detection, including in particular one or more of the following:
vulnerability detection: performing security detection on all binary components in the dependency graph, identifying the known security vulnerabilities of the components, and judging the vulnerability of the components according to the vulnerability descriptions and risk levels;
license term compliance detection: identifying the terms of the open source component licenses. In general, the terms protect intellectual property rights in the release and use of a component and restrict how the component may be used, modified and distributed (for example, if a binary component uses the GPLv2 license and the component is referenced with code modifications, the terms require the source code to be opened). Compliance of the license terms is detected to avoid business risks;
detection of compatibility between open source licenses: if the binary software references a plurality of open source components under different licenses during development, and the license terms contain conflicting requirements, there is a compatibility problem between the open source licenses.
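Steps (1) and (3) above can be sketched as follows. The text does not say which CPE version is meant; the sketch assumes the CPE 2.3 formatted string, whose leading fields are part, vendor, product and version, with the remaining seven fields wildcarded. The `direct_dependencies` mapping stands in for knowledge base queries and is hypothetical.

```python
def build_cpe(vendor, name, version, part="a"):
    """Build a CPE 2.3 formatted string; unspecified trailing fields are wildcards."""
    return f"cpe:2.3:{part}:{vendor}:{name}:{version}:*:*:*:*:*:*:*"


def build_dependency_graph(component, direct_dependencies, graph=None):
    """Recursively collect the direct and indirect dependencies of a component."""
    if graph is None:
        graph = {}
    if component in graph:
        # Already visited: stops the recursion on shared or cyclic dependencies.
        return graph
    deps = list(direct_dependencies.get(component, []))
    graph[component] = deps
    for dep in deps:
        build_dependency_graph(dep, direct_dependencies, graph)
    return graph


cpe = build_cpe("jaxen", "jaxen", "1.1.14")
# Hypothetical dependency data, including a cycle between libA and libC.
direct_dependencies = {"app": ["libA", "libB"], "libA": ["libC"], "libC": ["libA"]}
graph = build_dependency_graph("app", direct_dependencies)
```

Vulnerability detection then iterates over every key of `graph`, so indirect dependencies are checked in the same way as direct ones.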
In the binary software component analysis method provided by the embodiment of the disclosure, the multiple types of feature fingerprints to be analyzed of the binary file to be analyzed are extracted from multiple dimensions, the dimension of feature recognition of the binary file is enriched, and the component analysis is performed on the binary file to be analyzed through the feature matching result of the multiple types of feature fingerprints to be analyzed and the knowledge base, so that the binary software component analysis method has higher accuracy compared with the scheme that component analysis is performed only through ASCII character strings in the prior art. In addition, in the analysis method, matching priorities are set for the multiple types of feature fingerprints to be analyzed, feature matching is carried out on the feature fingerprints to be analyzed and the knowledge base according to the matching priorities, feature matching results are obtained, component analysis efficiency can be improved, and resource expenditure is effectively reduced.
In addition, an embodiment of the present disclosure further provides a binary software component analysis apparatus, as shown in fig. 8, including:
the file acquisition module is used for acquiring a binary file to be analyzed in the binary software;
the feature extraction module is used for extracting multiple types of feature fingerprints to be analyzed of the binary file to be analyzed;
the priority setting module is used for setting matching priorities for the multiple types of feature fingerprints to be analyzed;
The feature matching module is used for carrying out feature matching on the feature fingerprints to be analyzed and the knowledge base according to the matching priority, so as to obtain feature matching results;
and the component analysis module is used for carrying out component analysis on the binary file to be analyzed according to the feature matching result.
Optionally, the binary software component analysis apparatus further includes a security detection module. When the binary file to be analyzed is a known binary component, the security detection module searches related information in the knowledge base according to the known binary component and performs vulnerability detection and license term compliance detection according to the related information. When the main component of the binary file to be analyzed is a known binary component, the module searches related information in the knowledge base according to the known binary component and performs vulnerability detection, license term compliance detection and compatibility detection between open source licenses according to the related information. When the main component of the binary file to be analyzed is not a known binary component, the module performs license term compliance detection and compatibility detection between open source licenses according to the multiple types of feature fingerprints of the binary file to be analyzed.
In order to facilitate a better understanding and implementation of the binary software component analysis method of the embodiments of the present disclosure by those skilled in the art, two specific embodiments are provided below for illustration.
Example 1
The component analysis performed on Javasample.jar in Example 1 specifically includes:
(1) Acquiring Javasample.jar, identifying the file format as JAR, and directly performing decompression to obtain the decompressed file directory tree.
(2) Traversing the decompressed file directory tree and finding class files, mainly compiled from Java code, and a package manager feature file (pom.xml); the binary file is Javasample.jar.
(3) Extracting the original features of Javasample.jar, including the file MD5 value, the file dependency library information and the ASCII strings; JAR binary files do not involve symbol table information, so none is extracted.
(4) Performing data cleaning on the original features, including normalizing the file dependency library information and removing duplicate values and abnormal values from the ASCII strings.
(5) Crawling the product information, component dependency information, known security vulnerability information, binary file and digital signature information of the component; obtaining, from the crawled information, the product information, digital signature information, dependency relationships, known security vulnerabilities and open source license information of the binary component; extracting ASCII string features from the binary file; and completing the knowledge base construction for the jaxen1.1.4 component, whose unique identification is a:jaxen:jaxen:1.1.14.
(6) Loading the Java knowledge base of step (5) according to the file format of Javasample.jar, and matching the digital signature feature fingerprint among the feature fingerprints extracted in step (4) against the digital signature feature fingerprints extracted from the known binary components in the knowledge base. In the matching process, the type and value of the digital signature feature fingerprint are used as query conditions to search the knowledge base, and a matching MD5 value is found, so Javasample.jar is a known binary component (a:jaxen:jaxen:1.1.14), and the feature matching and component analysis flow exits directly.
(7) Analyzing, based on the knowledge base, the direct and indirect dependent component relationships of component a:jaxen:jaxen:1.1.14, performing security detection on all binary components in the dependency graph, and identifying the known security vulnerabilities; 7 known security vulnerabilities are identified in total.
(8) Performing license term compliance detection; 1 open source license is identified in total, its terms are compliant, and it is a low-risk open source license.
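Steps (3) and (4) of this example can be sketched as follows for the MD5 value and the ASCII strings. The minimum run length of 4 printable characters and the maximum length used to discard abnormal values are assumptions of the sketch; the text does not define what counts as an abnormal value.

```python
import hashlib
import re

# Runs of at least 4 printable ASCII bytes (assumed minimum length).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_features(data: bytes):
    """Extract the MD5 value and the raw ASCII strings from file bytes."""
    strings = [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]
    return {"md5": hashlib.md5(data).hexdigest(), "strings": strings}


def clean_strings(strings, max_len=256):
    """Drop duplicates, empty values and overly long values, keeping first-seen order."""
    seen = set()
    cleaned = []
    for s in strings:
        s = s.strip()
        if not s or len(s) > max_len or s in seen:
            continue
        seen.add(s)
        cleaned.append(s)
    return cleaned


sample = b"\x00\x01org.jaxen.XPath\x00ab\x00org.jaxen.XPath\x00version 1.1\x00"
features = extract_features(sample)
cleaned = clean_strings(features["strings"])
```

For a real JAR file, the same functions would be applied to the bytes of each class file found in the decompressed directory tree.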
Example 2
Component analysis of CSample in example 2 specifically included:
(1) Acquiring CSample and identifying the file format as an executable file.
(2) Extracting the original features of CSample, including the MD5 value, symbol table information, file dependency library information and ASCII strings of the binary file.
(3) Performing data cleaning on the original features, including normalizing the file dependency library information and removing duplicate values and abnormal values from the ASCII strings.
(4) Loading the binary C program knowledge base according to the file format of CSample, and matching the digital signature feature fingerprint among the feature fingerprints extracted in step (3) against the digital signature feature fingerprints extracted from the known binary components in the knowledge base. In the matching process, the type and value of the digital signature feature fingerprint, namely the MD5 value, are used as query conditions to search the knowledge base, and no matching binary component is found. The character string feature fingerprint, the symbol table feature fingerprint and the dependency feature fingerprint among the feature fingerprints extracted in step (3) are then matched against the corresponding feature fingerprints of the known binary components in the knowledge base to obtain a plurality of corresponding matching results.
(5) Calculating the feature similarity between CSample and each known binary component according to the matching results for that known binary component:
The original features of CSample extracted in step (2) include: for the known binary component a:rdesktop:rdesktop:1.8.3, 3110 character string features, 710 symbol table features and 22 dependency features of a:rdesktop:rdesktop:1.8.3 are extracted from the knowledge base, and the numbers of features matched between the two are: 2156 character string features, 597 symbol table features and 22 dependency features. Then:
Character string scoring factor:
symbol table scoring factors:
dependency scoring factor:
The importance weights are set as follows: the character string weight N1 is 0.8, the symbol table weight N2 is 0.9 and the dependency weight N3 is 0.7; since K3 = 1, N3 is adjusted to 1 in the subsequent calculation.
The feature similarity between CSample and a:rdesktop:rdesktop:1.8.3 is:
The feature similarity calculation is performed for all known binary components in the knowledge base to obtain a plurality of feature similarities, among which the known binary component with the highest feature similarity is a:rdesktop:rdesktop:1.8.3. Because this feature similarity reaches the preset threshold of 0.8, CSample is considered to be based on a:rdesktop:rdesktop:1.8.3 through direct reference and source code modification, and its main component is the same as a:rdesktop:rdesktop:1.8.3.
(6) Analyzing, based on the knowledge base, the direct and indirect dependent component relationships of component a:rdesktop:rdesktop:1.8.3 and performing security detection, which includes the following. First, because CSample is based on a:rdesktop:rdesktop:1.8.3, license term compliance needs to be detected. Second, an Apache-2.0 open source license is identified for a:rdesktop:rdesktop:1.8.3; its terms are compliant and it is a low-risk open source license. Third, based on the symbol table information, a GPLv2.0 license is found to be used in CSample as well; it is a high-risk open source license whose terms do not allow the user to distribute the modified code or derivative software as commercial software. Fourth, Apache v2.0 code and GPLv2.0 code are used together in CSample and released in combination, which raises a compatibility problem, because Apache v2.0 explicitly grants the user a patent license and contains patent termination and infringement protection terms, whereas GPLv2.0 prohibits patent licensing.
An electronic device according to an embodiment of the present disclosure includes a memory and a processor. The memory is for storing non-transitory computer readable instructions. In particular, the memory may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), and the like. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like.
The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform the desired functions. In one embodiment of the present disclosure, the processor is configured to execute the computer readable instructions stored in the memory, to cause the electronic device to perform all or part of the steps of the binary software composition analysis method of the embodiments of the present disclosure described above.
It should be understood by those skilled in the art that, in order to solve the technical problem of how to obtain a good user experience effect, the present embodiment may also include well-known structures such as a communication bus, an interface, and the like, and these well-known structures are also included in the protection scope of the present disclosure.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. A schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 9 is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 9, the electronic device may include a processor (e.g., a central processing unit, a graphic processor, etc.) that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) or a program loaded from a storage device into a Random Access Memory (RAM). In the RAM, various programs and data required for the operation of the electronic device are also stored. The processor, ROM and RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.
In general, the following devices may be connected to the I/O interface: input means including, for example, sensors or visual information gathering devices; output devices including, for example, display screens and the like; storage devices including, for example, magnetic tape, hard disk, etc.; a communication device. The communication means may allow the electronic device to communicate wirelessly or by wire with other devices, such as edge computing devices, to exchange data. While fig. 9 shows an electronic device having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via a communication device, or installed from a storage device, or installed from ROM. All or part of the steps of the binary software component analysis method of the embodiments of the present disclosure are performed when the computer program is executed by a processor.
The detailed description of the present embodiment may refer to the corresponding description in the foregoing embodiments, and will not be repeated herein.
A computer-readable storage medium according to an embodiment of the present disclosure has stored thereon non-transitory computer-readable instructions. When executed by a processor, the non-transitory computer-readable instructions perform all or part of the steps of the binary software component analysis methods of the various embodiments of the present disclosure described previously.
The computer-readable storage medium described above includes, but is not limited to: optical storage media (e.g., CD-ROM and DVD), magneto-optical storage media (e.g., MO), magnetic storage media (e.g., magnetic tape or removable hard disk), media with built-in rewritable non-volatile memory (e.g., memory card), and media with built-in ROM (e.g., ROM cartridge).
The detailed description of the present embodiment may refer to the corresponding description in the foregoing embodiments, and will not be repeated herein.
The basic principles of the present disclosure have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present disclosure are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.
In this disclosure, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. The block diagrams of devices, apparatuses, equipment, and systems involved in this disclosure are merely illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," and "having" are open-ended terms that mean "including but not limited to" and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
In addition, as used herein, the use of "or" in a recitation of items beginning with "at least one" indicates a disjunctive recitation, such that, for example, a recitation of "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (i.e., A and B and C). Furthermore, the term "exemplary" does not mean that the described example is preferred or better than other examples.
It is also noted that in the systems and methods of the present disclosure, components or steps may be decomposed and/or recombined. Such decomposition and/or recombination should be considered equivalent to the present disclosure.
Various changes, substitutions, and alterations are possible to the techniques described herein without departing from the teachings of the techniques defined by the appended claims. Furthermore, the scope of the claims of the present disclosure is not limited to the particular aspects of the process, machine, manufacture, composition of matter, means, methods and acts described above. The processes, machines, manufacture, compositions of matter, means, methods, or acts, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or acts.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (16)

1. A binary software component analysis method, comprising:
acquiring a binary file to be analyzed in binary software;
extracting multiple types of feature fingerprints to be analyzed of the binary file to be analyzed;
setting matching priority for the multiple types of feature fingerprints to be analyzed;
according to the matching priority, performing feature matching on the feature fingerprint to be analyzed and a knowledge base to obtain a feature matching result;
and carrying out component analysis on the binary file to be analyzed according to the feature matching result.
2. The binary software component analysis method according to claim 1, wherein the acquiring the binary file to be analyzed in the binary software includes:
acquiring binary files in various file formats in the binary software;
identifying the binary files with the multiple file formats to obtain the binary file to be analyzed;
wherein the binary file to be analyzed is an executable file, a relocatable file, or a shared object file.
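The identification step of claim 2 can be illustrated with a minimal sketch. It assumes the files are in ELF format, which the claim does not state; the three file kinds named in the claim happen to correspond to the ELF `e_type` values ET_EXEC, ET_REL, and ET_DYN. The function name `classify_elf` is hypothetical.

```python
import struct

# e_type values defined by the ELF specification.
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object"}


def classify_elf(header: bytes):
    """Return the ELF file kind, or None if the bytes are not a file to analyze.

    Assumption: only ELF files are candidates; other formats are skipped.
    """
    if len(header) < 18 or header[:4] != b"\x7fELF":
        return None
    # e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        return None
    # e_type is a 16-bit field that follows the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return ELF_TYPES.get(e_type)
```

Only the first 18 bytes of each candidate file are needed, so a scan over a large firmware image or package can stay cheap.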
3. The binary software component analysis method according to claim 1, wherein the extracting the multi-class feature fingerprint to be analyzed of the binary file to be analyzed comprises:
extracting a plurality of original features of the binary file to be analyzed;
data cleaning is carried out on the plurality of original features;
and classifying the plurality of original features after data cleaning to obtain the multi-class feature fingerprints to be analyzed.
4. The binary software component analysis method of claim 3, wherein the original features include hash values, MD5, SHA1 values, symbol table information, file-dependent library information, decompiled code symbols, and ASCII strings.
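As a rough illustration of claims 3 and 4, the sketch below extracts two of the listed original features (MD5 and SHA1 values, and ASCII strings), applies a simple cleaning pass, and groups the results by fingerprint class. The cleaning rules (trimming whitespace, dropping duplicates), the minimum string length, and the dictionary keys are assumptions, not details taken from the claims; symbol tables, dependency libraries, and decompiled symbols are omitted.

```python
import hashlib
import re


def extract_original_features(data: bytes, min_len: int = 4) -> dict:
    """Extract hashes and printable ASCII strings from raw file bytes."""
    # Runs of at least min_len printable ASCII characters (0x20 to 0x7e).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    raw_strings = [m.decode("ascii") for m in pattern.findall(data)]
    # Assumed data cleaning: trim whitespace, drop empties and duplicates.
    cleaned = sorted({s.strip() for s in raw_strings if s.strip()})
    # Classify the cleaned features into fingerprint classes.
    return {
        "digital_signature": {
            "md5": hashlib.md5(data).hexdigest(),
            "sha1": hashlib.sha1(data).hexdigest(),
        },
        "strings": cleaned,
    }
```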
5. The binary software component analysis method according to claim 1, wherein the multiple types of feature fingerprints to be analyzed include: a digital signature feature fingerprint to be analyzed, a symbol table feature fingerprint to be analyzed, a dependency feature fingerprint to be analyzed, and a string feature fingerprint to be analyzed; the digital signature feature fingerprint to be analyzed has a first matching priority; and the knowledge base comprises a plurality of known binary components and their corresponding digital signature feature fingerprints, symbol table feature fingerprints, dependency feature fingerprints, and string feature fingerprints.
6. The binary software component analysis method according to claim 5, wherein the feature matching between the feature fingerprint to be analyzed and the knowledge base according to the matching priority to obtain a feature matching result comprises:
performing feature matching on the digital signature feature fingerprints to be analyzed and the digital signature feature fingerprints of all known binary components;
if the matching is successful, a first feature matching result is obtained, wherein the first feature matching result is that the binary file to be analyzed is a known binary component;
if the matching fails, performing feature matching on the symbol table feature fingerprint to be analyzed and the symbol table feature fingerprint of the i-th known binary component in the knowledge base to obtain a first matching result, wherein i is 1,2,3, …, P, and P is the total number of the known binary components in the knowledge base;
performing feature matching on the string feature fingerprint to be analyzed and the string feature fingerprint of the i-th known binary component to obtain a second matching result;
and performing feature matching on the dependency feature fingerprint to be analyzed and the dependency feature fingerprint of the i-th known binary component to obtain a third matching result.
7. The binary software component analysis method according to claim 6, wherein the first matching result is a number of matches between the symbol table feature fingerprint to be analyzed and the symbol table feature fingerprint of the i-th known binary component, the second matching result is a number of matches between the string feature fingerprint to be analyzed and the string feature fingerprint of the i-th known binary component, and the third matching result is a number of matches between the dependency feature fingerprint to be analyzed and the dependency feature fingerprint of the i-th known binary component.
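The priority-ordered matching of claims 6 and 7 can be sketched as follows, under the assumption that each fingerprint class is held as a Python set and that a "number of matches" is the size of the set intersection. The knowledge-base layout and the function name `match_features` are hypothetical.

```python
def match_features(target: dict, knowledge_base: dict) -> dict:
    """Match a target's fingerprints against known components by priority."""
    # First priority: an exact digital signature hit identifies the file outright.
    for name, known in knowledge_base.items():
        if target["signature"] == known["signature"]:
            return {"known_component": name}
    # Otherwise count shared entries per fingerprint class for every component.
    counts = {}
    for name, known in knowledge_base.items():
        counts[name] = {
            "symbols": len(target["symbols"] & known["symbols"]),
            "strings": len(target["strings"] & known["strings"]),
            "dependencies": len(target["dependencies"] & known["dependencies"]),
        }
    return {"match_counts": counts}
```

Checking the digital signature first means an unmodified copy of a known component is resolved without any per-class comparison.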
8. The binary software component analysis method according to claim 6 or 7, wherein the component analysis of the binary file to be analyzed according to the feature matching result comprises:
obtaining feature similarity between the binary file to be analyzed and the i-th known binary component according to the first matching result, the second matching result and the third matching result;
and obtaining a component analysis result according to the highest feature similarity.
9. The binary software component analysis method according to claim 8, wherein obtaining feature similarities between the binary file to be analyzed and an i-th known binary component based on the first matching result, the second matching result, and the third matching result comprises:
calculating a first scoring factor K1, K1 ∈ [0,1], according to the first matching result;
calculating a second scoring factor K2, K2 ∈ [0,1], according to the second matching result;
calculating a third scoring factor K3, K3 ∈ [0,1], according to the third matching result;
and calculating the feature similarity according to the first scoring factor K1, the second scoring factor K2, the third scoring factor K3, and the weights corresponding to the three.
10. The binary software component analysis method according to claim 9, wherein the calculating the feature similarity according to the first scoring factor K1, the second scoring factor K2, the third scoring factor K3, and the weights corresponding to the three comprises:
presetting a first weight N1 for the first scoring factor K1, presetting a second weight N2 for the second scoring factor K2, presetting a third weight N3 for the third scoring factor K3, wherein N1 ∈ [0,1], N2 ∈ [0,1], N3 ∈ [0,1];
when the third scoring factor K3 = 1, correcting the value of the third weight N3 to 1;
and calculating the feature similarity according to the first scoring factor K1, the second scoring factor K2, the third scoring factor K3, the first weight N1, the second weight N2 and the corrected third weight N3.
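Claims 9 and 10 do not state how the scoring factors and weights are combined. The sketch below assumes two things that are not in the claims: each scoring factor is the match count divided by the size of the known component's fingerprint set, and the similarity is a weight-normalized sum. Only the correction of N3 to 1 when K3 = 1 comes from claim 10. Both function names are hypothetical.

```python
def scoring_factor(matches: int, total: int) -> float:
    """Assumed normalization: fraction of the known fingerprints that matched."""
    return matches / total if total else 0.0


def feature_similarity(k1, k2, k3, n1, n2, n3) -> float:
    """Combine three scoring factors with their weights (assumed formula)."""
    for value in (k1, k2, k3, n1, n2, n3):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scoring factors and weights must lie in [0, 1]")
    # From claim 10: when K3 equals 1, the third weight is corrected to 1.
    if k3 == 1:
        n3 = 1.0
    total_weight = n1 + n2 + n3
    if total_weight == 0:
        return 0.0
    # Assumption: weighted average, which keeps the result within [0, 1].
    return (n1 * k1 + n2 * k2 + n3 * k3) / total_weight
```

With this assumed formula, raising N3 to 1 increases the influence of the third scoring factor on the result whenever the third matching result is a complete match.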
11. The binary software component analysis method according to claim 8, wherein the obtaining the component analysis result according to the highest feature similarity comprises:
if the highest feature similarity is greater than or equal to a preset threshold, the component analysis result is that the main component of the binary file to be analyzed is a known binary component corresponding to the highest feature similarity;
and if the highest feature similarity is smaller than the preset threshold value, the component analysis result is that the main component of the binary file to be analyzed is not a known binary component.
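The decision rule of claim 11 can be sketched as a maximum over per-component similarities followed by a threshold comparison. The function name and the use of `None` for "not a known binary component" are illustrative choices; the claim does not specify a threshold value.

```python
def analyze_main_component(similarities: dict, threshold: float):
    """Return the best-matching known component, or None if below threshold.

    similarities maps each known component name to its feature similarity.
    """
    if not similarities:
        return None
    # Pick the known component with the highest feature similarity.
    best = max(similarities, key=similarities.get)
    # Greater than or equal to the preset threshold counts as a match.
    return best if similarities[best] >= threshold else None
```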
12. The binary software component analysis method according to claim 11, further comprising: performing security detection on the binary file to be analyzed according to the component analysis result and the multiple types of feature fingerprints to be analyzed.
13. The method for analyzing components of binary software according to claim 12, wherein the security detection of the binary file to be analyzed according to the component analysis result and the multi-class feature fingerprint to be analyzed comprises:
if the component analysis result is that the binary file to be analyzed is a known binary component, searching relevant information in the knowledge base according to the known binary component, and carrying out vulnerability detection and license term compliance detection on the binary file to be analyzed according to the relevant information;
if the component analysis result is that the main component of the binary file to be analyzed is a known binary component, searching related information in the knowledge base according to the known binary component, and performing vulnerability detection, license term compliance detection and open source license compatibility detection on the binary file to be analyzed according to the related information;
and if the component analysis result is that the main component of the binary file to be analyzed is not a known binary component, performing license term compliance detection and open source license compatibility detection on the binary file to be analyzed according to the multi-class feature fingerprints to be analyzed.
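The three branches of claim 13 amount to a lookup from analysis outcome to the set of detections to run. The sketch below records only that mapping; the enum and check names are hypothetical labels, and the detections themselves are not implemented.

```python
from enum import Enum


class AnalysisResult(Enum):
    KNOWN_COMPONENT = "file is a known binary component"
    MAIN_COMPONENT_KNOWN = "main component is a known binary component"
    MAIN_COMPONENT_UNKNOWN = "main component is not a known binary component"


# Detections listed in claim 13 for each outcome.
CHECKS = {
    AnalysisResult.KNOWN_COMPONENT: (
        "vulnerability", "license_compliance"),
    AnalysisResult.MAIN_COMPONENT_KNOWN: (
        "vulnerability", "license_compliance", "license_compatibility"),
    AnalysisResult.MAIN_COMPONENT_UNKNOWN: (
        "license_compliance", "license_compatibility"),
}


def security_checks(result: AnalysisResult) -> tuple:
    """Return the names of the detections to run for an analysis outcome."""
    return CHECKS[result]
```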
14. A binary software component analysis apparatus, comprising:
the file acquisition module is used for acquiring a binary file to be analyzed in the binary software;
the feature extraction module is used for extracting multiple types of feature fingerprints to be analyzed of the binary file to be analyzed;
the priority setting module is used for setting matching priorities for the multiple types of feature fingerprints to be analyzed;
the feature matching module is used for carrying out feature matching on the feature fingerprint to be analyzed and the knowledge base according to the matching priority, so as to obtain a feature matching result;
and the component analysis module is used for carrying out component analysis on the binary file to be analyzed according to the feature matching result.
15. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the binary software component analysis method according to any one of claims 1-13.
16. A computer readable storage medium storing computer instructions for causing a computer to perform the binary software component analysis method of any one of claims 1 to 13.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311724616.9A CN117707953A (en) 2023-12-13 2023-12-13 Binary software component analysis method, binary software component analysis device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117707953A true CN117707953A (en) 2024-03-15

Family

ID=90149259

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination