CN115686623A - Homologous detection method of closed-source software - Google Patents
Homologous detection method of closed-source software
- Publication number: CN115686623A
- Application number: CN202211372662.2A
- Authority: CN (China)
- Prior art keywords: software, closed, source software, database, files
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Stored Programmes (AREA)
Abstract
An embodiment of the application discloses a homologous detection method for closed-source software. The method comprises: constructing a closed-source software database, which comprises a closed-source software binary feature library and an association database of closed-source software files and features; acquiring a binary file of the software to be detected; determining the features in the closed-source software database that match the binary file of the software to be detected, and determining the files corresponding to the matched features from the association database of closed-source software files and features; and determining the detection result of the software to be detected based on the matched features and the corresponding files. The method makes it possible to establish a closed-source software database. A database-building program implemented from this building process can build the database automatically, so the database can be supplemented at any time and a new database can be built according to detection requirements, which provides support for homologous detection of closed-source software.
Description
Technical Field
The application relates to the field of computer technology, and in particular to a homologous detection method for closed-source software.
Background
As society develops and internet technology matures, intellectual property and the autonomy of software technology are receiving growing attention. The open-source security field has developed rapidly, and related techniques such as open-source security detection and open-source code clone detection are now quite mature, which has greatly promoted software autonomy. However, these techniques all rely on databases built from open-source software, so they can detect references to open-source software but cannot detect references to closed-source software.
Although the source code of closed-source software cannot be obtained, the software can still be introduced as an independent functional unit. A software developer may obtain a functional entity of closed-source software by purchase or other means and embed it in the developer's own software as a component. Such a reference makes the software's functionality depend on the referenced closed-source software, which greatly reduces the software's autonomy and poses a serious hidden risk to technical security.
Disclosure of Invention
In order to solve or partially solve the above problems, the present application provides a homologous detection method for closed source software.
The application provides a homologous detection method for closed-source software, which comprises the following steps: constructing a closed-source software database, wherein the closed-source software database comprises a closed-source software binary feature library and an association database of closed-source software files and features; acquiring a binary file of the software to be detected; determining the features in the closed-source software database that match the binary file of the software to be detected, and determining the files corresponding to the matched features from the association database of closed-source software files and features; and determining the detection result of the software to be detected based on the matched features and the corresponding files.
In some examples, constructing the closed-source software database includes: collecting closed-source software; acquiring binary files from the closed-source software, decompiling those binary files, and extracting the binary file features corresponding to the closed-source software; and constructing the closed-source software binary feature library and the association database of closed-source software files and features from the extracted binary file features.
In some examples, acquiring the binary file of the software to be detected includes: acquiring the software to be detected; and acquiring the binary file corresponding to the software to be detected.
In some examples, determining the features in the closed-source software database that match the binary file of the software to be detected, and determining the files corresponding to the matched features from the association database of closed-source software files and features, includes: decompiling the binary file corresponding to the software to be detected and extracting its binary file features; matching the binary file features of the software to be detected one by one against the binary file features contained in the closed-source software database, and recording the successfully matched binary file features; and determining the corresponding files from the association database of closed-source software files and features according to the successfully matched binary file features.
In some examples, determining the detection result of the software to be detected based on the matched features and the corresponding files includes: grouping the successfully matched binary file features by the file to which they belong, comparing the successfully matched binary file features in each group with the binary features of the file to which they belong, and determining the matching proportion for each such file; and outputting the detection result of the software to be detected according to the matching proportions.
In some examples, before the detection result of the software to be detected is output according to the matching proportions, the method further includes: filtering all the matching proportions according to a proportion threshold.
In some examples, the closed-source software database further comprises a closed-source software information database. Outputting the detection result of the software to be detected according to the matching proportions comprises: determining the corresponding closed-source software from the closed-source software information database according to the files whose matching proportions remain after filtering; and outputting the closed-source software, the corresponding files and the matching proportions that remain after filtering.
Compared with the prior art, the method has the following beneficial effects:
In the technical solution provided by the application, a closed-source software database is constructed, wherein the closed-source software database comprises a closed-source software binary feature library and an association database of closed-source software files and features; a binary file of the software to be detected is acquired; the features in the closed-source software database that match the binary file of the software to be detected are determined, and the files corresponding to the matched features are determined from the association database of closed-source software files and features; and the detection result of the software to be detected is determined based on the matched features and the corresponding files. The method makes it possible to establish a closed-source software database. A database-building program implemented from this building process can build the database automatically, so the database can be supplemented at any time and a new database can be built according to detection requirements, which provides support for homologous detection of closed-source software. The method can therefore perform binary homologous detection based on a closed-source software library.
Drawings
Fig. 1 is a basic flowchart of a method for detecting homology of closed-source software according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should also be noted that "a plurality" in this application means two or more. "And/or" describes an association between the associated objects and covers three cases; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship.
Example one
Fig. 1 illustrates a homologous detection method for closed-source software according to an exemplary embodiment. The method includes:
S101, constructing a closed-source software database, wherein the closed-source software database comprises a closed-source software binary feature library and an association database of closed-source software files and features;
S102, acquiring a binary file of the software to be detected; determining the features in the closed-source software database that match the binary file of the software to be detected, and determining the files corresponding to the matched features from the association database of closed-source software files and features;
S103, determining the detection result of the software to be detected based on the matched features and the corresponding files.
When a corresponding file is detected, the detection result identifies the file that is homologous with the software to be detected and the closed-source software to which that file belongs. When no corresponding file is detected, the detection result indicates that no homologous file, and therefore no homologous closed-source software, was found.
Specifically, the closed-source software database comprises a closed-source software binary feature library, a closed-source software information database, and an association database linking closed-source software, files and features. The closed-source software information database can be populated by a web crawler, and the association database is generated during the database-building process. The closed-source software binary feature library is the core part and provides the data support for binary feature matching detection. The database is built in the following steps:
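The three stores described above could be laid out as relational tables. The following is a minimal sketch using Python's built-in `sqlite3` module; the table and column names (`software`, `binary_file`, `feature`, `file_feature`) are assumptions made for illustration and do not come from the patent, which does not specify a storage engine or schema.

```python
import sqlite3

# Hypothetical schema: every table and column name below is illustrative only.
SCHEMA = """
CREATE TABLE software (          -- closed-source software information database
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    version TEXT
);
CREATE TABLE binary_file (       -- binary files belonging to each software
    id          INTEGER PRIMARY KEY,
    software_id INTEGER NOT NULL REFERENCES software(id),
    path        TEXT NOT NULL
);
CREATE TABLE feature (           -- binary feature library
    id    INTEGER PRIMARY KEY,
    value TEXT NOT NULL UNIQUE
);
CREATE TABLE file_feature (      -- association between files and features
    file_id    INTEGER NOT NULL REFERENCES binary_file(id),
    feature_id INTEGER NOT NULL REFERENCES feature(id),
    PRIMARY KEY (file_id, feature_id)
);
"""


def create_database(path=":memory:"):
    """Create an empty closed-source software database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
print(tables)
```

Keeping the file-to-feature association in its own table means one feature can belong to several files, which is what the later matching step relies on.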
1) Collecting closed-source software installation packages
To build a closed-source software library, the closed-source software must first be collected. Closed-source software is not openly published and cannot be acquired directly. Because most closed-source software is distributed as a software installation package, which the user runs to install the software, the closed-source software is obtained indirectly by collecting its installation packages.
2) Extracting the software main body from installation packages
A software installation package is an installer for the software, not the software itself. Running the installation package installs the software, and the installed software is the software main body that the invention requires. In practice, however, it is not feasible to install the collected installation packages one by one. Instead, the types of the installation packages are analyzed and a corresponding extraction program is written for each type. The extraction program processes an installation package and extracts the software main body without installing it, and the extracted software main body is consistent with the software that the package would install. The installation package types that the extraction program designed and implemented by the invention can process include, but are not limited to, apk, exe, msi, dmg, iso, rpm, deb and tar.
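One way to organize per-type extraction is a lookup from package type to an unpacking routine. The sketch below is only an assumption about how such a dispatcher might look: it covers just the two listed types that the Python standard library can unpack (`.apk`, which is a ZIP archive, and `.tar`), and the names `ARCHIVE_FORMATS` and `extract_software_body` are hypothetical. Types such as exe, msi, dmg, iso, rpm and deb would need dedicated tools that are not shown here.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

# Hypothetical mapping from package suffix to a shutil archive format.
# Only formats the standard library can unpack are listed.
ARCHIVE_FORMATS = {".apk": "zip", ".tar": "tar"}


def extract_software_body(package: Path, dest: Path) -> Path:
    """Unpack an installation package into dest without running any installer."""
    fmt = ARCHIVE_FORMATS.get(package.suffix.lower())
    if fmt is None:
        raise ValueError(f"no extractor registered for {package.suffix!r}")
    dest.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(package), str(dest), format=fmt)
    return dest


# Demo: build a tiny fake .apk and extract it.
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "demo.apk"
    with zipfile.ZipFile(package, "w") as archive:
        archive.writestr("lib/libdemo.so", b"\x7fELF")
    body = extract_software_body(package, Path(tmp) / "body")
    extracted = sorted(
        p.relative_to(body).as_posix() for p in body.rglob("*") if p.is_file()
    )

print(extracted)
```

Archives collected from untrusted sources should be unpacked in an isolated location, since archive members can carry unexpected paths.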
3) Obtaining binary files from the software main body
After the software main body is extracted from the installation package, the binary files it contains can be obtained: all files of the software main body are traversed, and the binary files are screened out by identifying file types. At the same time, the mapping between the software and its binary files is stored in the database as a basis for homologous detection.
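The text does not say how file types are identified. A common heuristic, assumed here purely for illustration, is to check the leading magic bytes of each file for well-known executable formats (ELF, PE and Mach-O). The helper names `binary_type` and `collect_binaries` are hypothetical.

```python
import tempfile
from pathlib import Path

# Leading magic bytes of common executable formats (a heuristic, not exhaustive).
MAGIC_NUMBERS = {
    b"\x7fELF": "ELF",
    b"MZ": "PE",
    b"\xcf\xfa\xed\xfe": "Mach-O (64-bit)",
    b"\xce\xfa\xed\xfe": "Mach-O (32-bit)",
}


def binary_type(path: Path):
    """Return the executable format name, or None if the header is not recognized."""
    with open(path, "rb") as handle:
        header = handle.read(4)
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return None


def collect_binaries(root: Path) -> dict:
    """Traverse the software main body and map each binary's relative path to its type."""
    found = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            kind = binary_type(path)
            if kind is not None:
                found[path.relative_to(root).as_posix()] = kind
    return found


# Demo on a throwaway directory containing two fake binaries and one text file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lib").mkdir()
    (root / "app.exe").write_bytes(b"MZ\x90\x00")
    (root / "lib" / "libcore.so").write_bytes(b"\x7fELF\x02\x01")
    (root / "readme.txt").write_bytes(b"hello")
    found = collect_binaries(root)

print(found)
```

The returned mapping is the kind of software-to-binary-file record that would be stored as a basis for detection. Short magic values such as `MZ` can produce false positives, so a production screen would validate more of the header.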
4) Extracting binary file features
A binary file cannot be used directly. It is decompiled to obtain its intermediate representation code, and the required features are extracted from that intermediate code.
5) Constructing a binary feature library
The extracted binary file features are stored in the database. The mapping between features and files is stored together with the feature data, which facilitates feature matching during detection.
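The patent does not define what a "feature" is. As a sketch only, the code below assumes that each feature is the SHA-256 hash of a whitespace-normalized snippet of intermediate code, and that a decompiler has already produced those snippets per file. The function names and the sample snippets are hypothetical; the point is the pair of mappings (feature to files, file to features) that the feature library has to retain.

```python
import hashlib
from collections import defaultdict


def feature_of(ir_text: str) -> str:
    """Assumed feature: SHA-256 of the intermediate code with whitespace normalized."""
    normalized = " ".join(ir_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_feature_library(ir_by_file: dict):
    """Return (feature -> files, file -> features) for the given intermediate code."""
    feature_to_files = defaultdict(set)
    file_to_features = {}
    for file_name, snippets in ir_by_file.items():
        features = {feature_of(snippet) for snippet in snippets}
        file_to_features[file_name] = features
        for feature in features:
            feature_to_files[feature].add(file_name)
    return dict(feature_to_files), file_to_features


# Demo input: stand-ins for decompiler output, not real intermediate code.
ir_by_file = {
    "libA.so": ["add r0, r1", "ret"],
    "libB.so": ["ret", "mul  r0,  r2"],
}
feature_to_files, file_to_features = build_feature_library(ir_by_file)
print(sorted(feature_to_files[feature_of("ret")]))
```

Storing both directions keeps the later steps cheap: the feature-to-files map answers "which files contain this feature", and the file-to-features map gives the denominator for the matching proportion.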
Binary homologous detection
The binary homologous detection process based on the closed-source software database comprises the following steps:
1) Uploading software installation packages/software packages
The software to be detected is uploaded. It can be either a software installation package or the closed-source software itself. Closed-source software can be used directly; if an installation package is uploaded, the closed-source software must first be extracted from it, using the same method as in the database-building steps above.
2) Obtaining binary files
The binary files of the uploaded software to be detected are obtained, using the same method as in the database-building steps above.
3) Extracting binary file features
The binary files of the uploaded software to be detected are decompiled and their binary features are extracted, using the same method as in the database-building steps above.
4) Feature matching
The binary features of the software to be detected are matched against the features in the closed-source software feature library, and the matched features and the files to which they belong are recorded.
5) Calculating the matching proportion
The matched features are grouped by the file to which they belong and compared with that file's original features to calculate the matching proportion of each file.
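The matching and proportion steps can be sketched as follows. This assumes, as one plausible reading of the text, that the matching proportion of a file is the number of its features found in the software to be detected divided by the total number of features recorded for that file. The feature values (`f1`, `f2`, ...) and file names are placeholders.

```python
from collections import defaultdict


def match_features(target_features: set, feature_to_files: dict) -> dict:
    """Group the target's matched features by the library file they belong to."""
    matched = defaultdict(set)
    for feature in target_features:
        for file_name in feature_to_files.get(feature, ()):
            matched[file_name].add(feature)
    return dict(matched)


def matching_proportions(matched: dict, file_to_features: dict) -> dict:
    """Assumed definition: matched feature count / total feature count of the file."""
    return {
        file_name: len(features) / len(file_to_features[file_name])
        for file_name, features in matched.items()
    }


# Placeholder library: two files with their recorded features.
file_to_features = {
    "libA.so": {"f1", "f2", "f3", "f4"},
    "libB.so": {"f4", "f5"},
}
# Invert the mapping to get feature -> files, as stored in the association database.
feature_to_files = defaultdict(set)
for file_name, features in file_to_features.items():
    for feature in features:
        feature_to_files[feature].add(file_name)

target_features = {"f1", "f2", "f4", "f9"}  # features of the software to be detected
matched = match_features(target_features, feature_to_files)
proportions = matching_proportions(matched, file_to_features)
print(proportions)
```

In this example `libA.so` has three of its four features matched (0.75) and `libB.so` has one of two (0.5); feature `f9` matches nothing and is ignored.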
6) Filtering results according to a threshold
The preliminary detection result is filtered, and unreliable entries are removed according to a threshold. The threshold can be set on the file matching proportion and on the number of matched features.
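A filter over the preliminary result might look like the sketch below. It assumes both thresholds are applied together (a file must pass the proportion threshold and the matched-feature-count threshold); the text leaves open whether they are combined or used separately, and the default values `0.5` and `2` are arbitrary placeholders rather than values from the patent.

```python
def filter_results(results: dict, min_proportion: float = 0.5, min_matched: int = 2) -> dict:
    """Keep only files that meet both the proportion and the matched-count thresholds."""
    return {
        file_name: entry
        for file_name, entry in results.items()
        if entry["proportion"] >= min_proportion and entry["matched"] >= min_matched
    }


# Placeholder preliminary result: matched feature count and matching proportion per file.
preliminary = {
    "libA.so": {"matched": 3, "proportion": 0.75},
    "libB.so": {"matched": 1, "proportion": 0.5},   # too few matched features
    "libC.so": {"matched": 10, "proportion": 0.1},  # proportion too low
}
kept = filter_results(preliminary)
print(kept)
```

The count threshold guards against small files that reach a high proportion from one or two generic features, while the proportion threshold guards against large files that share a handful of features by coincidence.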
7) Outputting the detection result
The final detection result is output. It comprises the homologous files and the closed-source software to which they belong.
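Producing the final result amounts to looking up, for each file that survived filtering, the closed-source software it belongs to. In the sketch below the file-to-software mapping stands in for the closed-source software information database; the software names, file names and the `build_report` helper are all hypothetical.

```python
def build_report(filtered: dict, file_to_software: dict) -> list:
    """Return one row per homologous file, highest matching proportion first."""
    rows = []
    for file_name, proportion in sorted(
        filtered.items(), key=lambda item: item[1], reverse=True
    ):
        rows.append(
            {
                "software": file_to_software.get(file_name, "unknown"),
                "file": file_name,
                "proportion": proportion,
            }
        )
    return rows


# Placeholder data: filtered matching proportions and the software each file belongs to.
filtered = {"libnet.so": 0.6, "libcodec.dll": 0.9}
file_to_software = {
    "libcodec.dll": "VendorPlayer 2.1",
    "libnet.so": "VendorSync 1.0",
}
report = build_report(filtered, file_to_software)
for row in report:
    print(f'{row["software"]}: {row["file"]} ({row["proportion"]:.0%})')
```

An empty report corresponds to the case where no homologous file, and therefore no homologous closed-source software, is detected.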
The invention provides a homologous detection method for closed-source software. The method makes it possible to establish a closed-source software database. A database-building program implemented from the building process can build the database automatically, so the database can be supplemented at any time and a new database can be built according to detection requirements, which provides support for homologous detection of closed-source software. The method can therefore perform binary homologous detection based on a closed-source software library.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The technical solutions provided by the embodiments of the present invention are described in detail above, and the principles and embodiments of the present invention are explained in this patent by applying specific examples, and the descriptions of the embodiments above are only used to help understanding the principles of the embodiments of the present invention; the above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (7)
1. A method for detecting the homology of closed-source software is characterized by comprising the following steps:
constructing a closed-source software database, wherein the closed-source software database comprises a closed-source software binary feature library and an incidence relation database of closed-source software files and features;
acquiring a binary file of the software to be detected;
determining the characteristics matched with the binary files of the software to be detected in the closed source software database, and determining files corresponding to the matched characteristics from the incidence relation database of the closed source software files and the characteristics;
and determining the detection result of the software to be detected based on the matched features and the corresponding files.
2. The method of claim 1, wherein building a closed-source software database comprises:
collecting closed source software;
acquiring a binary file from the closed source software, performing decompiling on the binary file of the closed source software, and extracting the binary file characteristics corresponding to the closed source software;
and constructing a closed source software binary feature library and an incidence relation database of the closed source software file and the features according to the extracted binary file features corresponding to the closed source software.
3. The method of claim 1, wherein obtaining the binary file of the software to be detected comprises:
acquiring the software to be detected;
and acquiring a binary file corresponding to the software to be detected.
4. The method according to claim 1, wherein determining the features in the closed source software database that match the binary files of the software to be detected, and determining the files corresponding to the matched features from the closed source software files and the association database of the features comprises:
decompiling the binary file corresponding to the software to be detected, and extracting the characteristics of the binary file corresponding to the software to be detected;
matching the binary file characteristics of the software to be detected with the binary file characteristics contained in the closed source software database one by one, and recording the successfully matched binary file characteristics;
and determining a corresponding file from the incidence relation database of the closed source software file and the characteristics according to the binary file characteristics successfully matched.
5. The method of claim 4, wherein determining the detection result of the software to be detected based on the matched features and the corresponding files comprises:
grouping the binary file characteristics successfully matched according to the files to which the binary file characteristics belong, comparing the binary file characteristics successfully matched in each group with the binary characteristics corresponding to the files to which the binary file characteristics belong, and determining the matching proportion of the binary file characteristics to each file to which the binary file characteristics belong;
and outputting the detection result of the software to be detected according to the matching proportion.
6. The method according to claim 5, wherein before outputting the detection result of the software to be detected according to the matching proportion, the method further comprises:
and filtering all the matching proportions according to a proportion threshold value.
7. The method of claim 6, wherein the closed-source software database further comprises: a closed source software information database; outputting the detection result of the software to be detected according to the matching proportion, comprising the following steps:
determining corresponding closed source software from the closed source software information database according to the filtered files corresponding to the matching proportion;
and outputting the files corresponding to the closed source software and the matched proportion after the filtering is finished.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211372662.2A CN115686623A (en) | 2022-11-03 | 2022-11-03 | Homologous detection method of closed-source software |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211372662.2A CN115686623A (en) | 2022-11-03 | 2022-11-03 | Homologous detection method of closed-source software |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115686623A (en) | 2023-02-03 |
Family
ID=85047236
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211372662.2A Pending CN115686623A (en) | 2022-11-03 | 2022-11-03 | Homologous detection method of closed-source software |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115686623A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116149669A (en) * | 2023-04-14 | 2023-05-23 | 杭州安恒信息技术股份有限公司 | Binary file-based software component analysis method, binary file-based software component analysis device and binary file-based medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100241469A1 (en) * | 2009-03-18 | 2010-09-23 | Novell, Inc. | System and method for performing software due diligence using a binary scan engine and parallel pattern matching |
CN111666101A (en) * | 2020-04-24 | 2020-09-15 | 北京大学 | Software homologous analysis method and device |
CN113987427A (en) * | 2021-10-28 | 2022-01-28 | 苏州棱镜七彩信息科技有限公司 | Tracing method of homologous codes |
CN114385231A (en) * | 2021-12-20 | 2022-04-22 | 杭州安恒信息安全技术有限公司 | Data processing method, data processing device, storage medium and electronic equipment |
CN115238102A (en) * | 2022-06-28 | 2022-10-25 | 北京关键科技股份有限公司 | Code data feature extraction and retrieval method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | CB03 | Change of inventor or designer information | Inventor after: Liang Dagong, Wang Bo, Lv Jinbiao, Wang Xiaozhou. Inventor before: Liang Dagong, Wang Bo, Lv Jinbiao. |