CN115686623A - Homologous detection method of closed-source software - Google Patents

Homologous detection method of closed-source software

Info

Publication number
CN115686623A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211372662.2A
Other languages
Chinese (zh)
Inventor
梁大功
王博
吕金彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Lengjing Qicai Information Technology Co ltd
Original Assignee
Suzhou Lengjing Qicai Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-11-03
Filing date
2022-11-03
Publication date
2023-02-03
Application filed by Suzhou Lengjing Qicai Information Technology Co ltd
Priority to CN202211372662.2A
Publication of CN115686623A

Abstract

The embodiments of the application disclose a homologous detection method for closed-source software. The method comprises: constructing a closed-source software database that includes a closed-source software binary feature library and an association database of closed-source software files and features; acquiring a binary file of the software to be detected; determining the features in the closed-source software database that match the binary file of the software to be detected, and determining the files corresponding to the matched features from the association database; and determining the detection result of the software to be detected based on the matched features and the corresponding files. A database-building program that implements this process can build the closed-source software database automatically, supplement it at any time, and build a new database to suit a given detection requirement, which supports homologous detection of closed-source software.

Description

Homologous detection method of closed-source software
Technical Field
The application relates to the technical field of computers, in particular to a homologous detection method of closed-source software.
Background
As society develops and internet technology matures, intellectual property and the autonomy of software technology receive more and more attention. The open-source security field has developed rapidly, and related technologies such as open-source security detection and open-source code clone detection are now quite mature, which has greatly promoted software autonomy. However, these techniques all rely on databases built from open-source software, so they can detect references to open-source software but not references to closed-source software.
Although the source code of closed-source software cannot be obtained, the software can still be introduced as a separate functional unit. A software developer may obtain the functional entity of closed-source software by purchase or by other means and embed it in the developer's own software as a component. Such a reference makes the functionality of the software depend on the referenced closed-source software, which greatly reduces the autonomy of the software and poses a serious hidden risk to technical security.
Disclosure of Invention
In order to solve or partially solve the above problems, the present application provides a homologous detection method for closed source software.
The application provides a homologous detection method for closed-source software, which comprises the following steps: constructing a closed-source software database, wherein the closed-source software database comprises a closed-source software binary feature library and an association database of closed-source software files and features; acquiring a binary file of the software to be detected; determining the features in the closed-source software database that match the binary file of the software to be detected, and determining the files corresponding to the matched features from the association database of closed-source software files and features; and determining the detection result of the software to be detected based on the matched features and the corresponding files.
In some examples, constructing the closed-source software database includes: collecting closed-source software; acquiring binary files from the closed-source software, decompiling those binary files, and extracting the binary file features corresponding to the closed-source software; and constructing the closed-source software binary feature library and the association database of closed-source software files and features from the extracted binary file features.
In some examples, acquiring the binary file of the software to be detected includes: acquiring the software to be detected; and acquiring the binary file corresponding to the software to be detected.
In some examples, determining the features in the closed-source software database that match the binary file of the software to be detected, and determining the files corresponding to the matched features from the association database of closed-source software files and features, includes: decompiling the binary file corresponding to the software to be detected and extracting its binary file features; matching the binary file features of the software to be detected one by one against the binary file features contained in the closed-source software database, and recording the successfully matched binary file features; and determining the corresponding files from the association database of closed-source software files and features according to the successfully matched binary file features.
In some examples, determining the detection result of the software to be detected based on the matched features and the corresponding files includes: grouping the successfully matched binary file features by the file to which they belong, comparing the matched features in each group with the full set of binary features of that file, and determining the matching proportion for each file; and outputting the detection result of the software to be detected according to the matching proportions.
In some examples, before outputting the detection result of the software to be detected according to the matching proportions, the method further includes: filtering all the matching proportions according to a proportion threshold.
In some examples, the closed-source software database further comprises a closed-source software information database, and outputting the detection result of the software to be detected according to the matching proportions comprises: determining the corresponding closed-source software from the closed-source software information database according to the files whose matching proportions remain after filtering; and outputting that closed-source software together with the corresponding files and their matching proportions.
Compared with the prior art, the method has the following beneficial effects:
In the technical solution provided by the application, a closed-source software database is constructed, wherein the closed-source software database comprises a closed-source software binary feature library and an association database of closed-source software files and features; a binary file of the software to be detected is acquired; the features in the closed-source software database that match the binary file of the software to be detected are determined, and the files corresponding to the matched features are determined from the association database of closed-source software files and features; and the detection result of the software to be detected is determined based on the matched features and the corresponding files. The method can establish a closed-source software database; a database-building program that implements the database-building process can build the database automatically, supplement it at any time, and build a new database according to detection requirements, which supports homologous detection of closed-source software. The method can perform binary homologous detection based on the closed-source software library.
Drawings
Fig. 1 is a basic flowchart of a method for detecting homology of closed-source software according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should also be noted that: reference to "a plurality" in this application means two or more. "and/or" describe the association relationship of the associated objects, meaning that there may be three relationships, e.g., A and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Example one
Referring to fig. 1, fig. 1 illustrates a closed-source software homology detection method according to an exemplary embodiment, where the closed-source software homology detection method includes:
S101, constructing a closed-source software database, wherein the closed-source software database comprises a closed-source software binary feature library and an association database of closed-source software files and features;
S102, acquiring a binary file of the software to be detected; determining the features in the closed-source software database that match the binary file of the software to be detected, and determining the files corresponding to the matched features from the association database of closed-source software files and features;
S103, determining the detection result of the software to be detected based on the matched features and the corresponding files.
When corresponding files are detected, the detection result indicates the files that are homologous with the software to be detected and the closed-source software to which those files belong; when no corresponding file is detected, the detection result indicates that no file in the closed-source software database is homologous with the software to be detected.
Specifically, the closed-source software database comprises a closed-source software binary feature library, a closed-source software information database, and an association database relating closed-source software, files, and features. The closed-source software information database can be populated by a web crawler, the association database is generated during the database-building process, and the closed-source software binary feature library is the core part, providing data support for binary feature matching. The database is built as follows:
1) Collecting closed source software installation packages
To construct a closed-source software library, software packages of closed-source software are collected first. Because closed-source software is not open, it cannot be obtained directly. However, most closed-source software is distributed as a software installation package that the user runs to install the software, so the closed-source software is acquired indirectly by collecting its installation packages.
2) Extracting the software main body from installation packages
A software installation package is an installer for the software, not the software itself. Running the installation package installs the software, and the installed software is the software main body required by the invention. In practice, however, it is not feasible to install every collected installation package one by one. Instead, the types of the installation packages are analyzed and a corresponding extraction program is written for each type. The extraction program processes an installation package and extracts the software main body without installing it, and the extracted software main body is consistent with the software that the installation package would install. Types of installation packages that the extraction program designed and implemented by the invention can process include, but are not limited to, apk, exe, msi, dmg, iso, rpm, deb, and tar.
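As an illustration of type-specific extraction, the following minimal sketch handles only two container formats with the Python standard library, treating apk as a ZIP container. The function names are hypothetical and the patent does not disclose its extraction programs; formats such as exe, msi, dmg, iso, rpm, and deb would each need their own extractor.

```python
import tarfile
import zipfile
from pathlib import Path
from typing import List


def detect_package_type(package: Path) -> str:
    """Classify a package by container format (hypothetical helper)."""
    if zipfile.is_zipfile(package):
        return "zip"  # apk files are ZIP containers
    if tarfile.is_tarfile(package):
        return "tar"
    return "unsupported"


def extract_software_body(package: Path, dest: Path) -> List[Path]:
    """Unpack the software main body without running the installer."""
    kind = detect_package_type(package)
    if kind == "unsupported":
        raise ValueError(f"no extractor for {package.name}")
    dest.mkdir(parents=True, exist_ok=True)
    # NOTE: archives from untrusted sources should be checked for
    # path traversal before extraction; omitted here for brevity.
    if kind == "zip":
        with zipfile.ZipFile(package) as archive:
            archive.extractall(dest)
    else:
        with tarfile.open(package) as archive:
            archive.extractall(dest)
    # Return every extracted regular file in a stable order.
    return sorted(p for p in dest.rglob("*") if p.is_file())
```

Dispatching on the detected container format rather than on the file extension is a design choice of this sketch, since installer file names are not always reliable.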
3) Obtaining binary files from the software main body
After the software main body is extracted from the installation package, the binary files it contains can be obtained: all files of the software main body are traversed, and the binary files are screened out by identifying file types. At the same time, the mapping between the software and its binary files is stored in the database as a basis for homologous detection.
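One common way to identify file types is to inspect the leading magic bytes of each file. The sketch below assumes that approach, which the patent does not specify, and recognizes only a few well-known executable formats; the names `MAGIC_PREFIXES`, `binary_type`, and `screen_binaries` are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional

# Leading bytes of a few executable formats (not an exhaustive list).
MAGIC_PREFIXES = {
    b"\x7fELF": "elf",               # Linux/Android shared objects and executables
    b"MZ": "pe",                     # Windows exe/dll (DOS header)
    b"dex\n": "dex",                 # Android Dalvik bytecode
    b"\xcf\xfa\xed\xfe": "mach-o",   # 64-bit little-endian Mach-O
    b"\xce\xfa\xed\xfe": "mach-o",   # 32-bit little-endian Mach-O
}


def binary_type(path: Path) -> Optional[str]:
    """Return the detected binary format, or None for other files."""
    with path.open("rb") as handle:
        head = handle.read(4)
    for prefix, kind in MAGIC_PREFIXES.items():
        if head.startswith(prefix):
            return kind
    return None


def screen_binaries(root: Path) -> Dict[str, str]:
    """Traverse the software main body and keep only binary files."""
    found = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            kind = binary_type(path)
            if kind is not None:
                found[path.relative_to(root).as_posix()] = kind
    return found
```

A two-byte prefix such as `MZ` can produce false positives on arbitrary data, so a production screener would validate more of the header; this sketch does not.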
4) Extracting binary file features
A binary file cannot be used directly. The binary file is decompiled to obtain its intermediate representation code, and the required features are extracted from that intermediate code.
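The patent does not state which features are extracted from the intermediate code. Purely as an assumption for illustration, the sketch below treats each function's intermediate representation as text (produced by some external decompiler, not shown), masks hexadecimal addresses so that relocation does not change the result, and uses a SHA-256 digest of the normalized text as the feature. All names are hypothetical.

```python
import hashlib
import re
from typing import Iterable, Set

# Hexadecimal literals such as 0x401000 vary between builds, so mask them.
_ADDRESS = re.compile(r"0x[0-9a-fA-F]+")


def normalize_ir(ir_text: str) -> str:
    """Mask addresses and collapse whitespace in one function's IR text."""
    masked = _ADDRESS.sub("ADDR", ir_text)
    return " ".join(masked.split())


def extract_features(functions_ir: Iterable[str]) -> Set[str]:
    """Map each non-empty function IR to a hex digest used as its feature."""
    features = set()
    for ir_text in functions_ir:
        normalized = normalize_ir(ir_text)
        if normalized:
            digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
            features.add(digest)
    return features
```

Exact hashing only matches functions whose normalized code is identical; tolerating compiler or version differences would require a fuzzier feature, which is outside this sketch.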
5) Constructing a binary feature library
The extracted binary file features are stored in the database. The mapping between features and files must be stored along with the feature data to facilitate feature matching during detection.
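The three parts of the database described above (software information, files, and the feature-to-file association) could be laid out as in the following SQLite sketch. The schema, table names, and helper functions are assumptions made for illustration; the patent does not disclose a storage layout.

```python
import sqlite3
from typing import Iterable, List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS software (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, version TEXT);
CREATE TABLE IF NOT EXISTS file (
    id INTEGER PRIMARY KEY,
    software_id INTEGER NOT NULL REFERENCES software(id),
    path TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS file_feature (
    file_id INTEGER NOT NULL REFERENCES file(id),
    feature TEXT NOT NULL,
    PRIMARY KEY (file_id, feature));
CREATE INDEX IF NOT EXISTS idx_feature ON file_feature(feature);
"""


def open_library(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the feature library and ensure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_software(conn: sqlite3.Connection, name: str, version: str) -> int:
    cur = conn.execute(
        "INSERT INTO software (name, version) VALUES (?, ?)", (name, version))
    return cur.lastrowid


def add_file(conn: sqlite3.Connection, software_id: int, path: str,
             features: Iterable[str]) -> int:
    """Store one binary file together with its feature-to-file mapping."""
    cur = conn.execute(
        "INSERT INTO file (software_id, path) VALUES (?, ?)",
        (software_id, path))
    file_id = cur.lastrowid
    conn.executemany(
        "INSERT OR IGNORE INTO file_feature (file_id, feature) VALUES (?, ?)",
        [(file_id, feature) for feature in features])
    conn.commit()
    return file_id


def files_for_feature(conn: sqlite3.Connection,
                      feature: str) -> List[Tuple[int, str]]:
    """Look up every file that contains the given feature."""
    rows = conn.execute(
        "SELECT file.id, file.path FROM file_feature "
        "JOIN file ON file.id = file_feature.file_id "
        "WHERE file_feature.feature = ?", (feature,))
    return rows.fetchall()
```

The index on `feature` is what makes the lookup in the matching step cheap, since detection queries by feature rather than by file.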
Binary homologous detection
The binary homologous detection process based on the closed-source software database comprises the following steps:
1) Uploading software installation packages/software packages
The software to be detected is uploaded; it can be either a software installation package or the software itself (a software package). A software package can be used directly, whereas for an installation package the software main body must first be extracted, by the same method as in the database-building steps above.
2) Obtaining binary files
The binary files of the uploaded software to be detected are obtained, by the same method as in the database-building steps above.
3) Extracting binary file features
The binary files of the uploaded software to be detected are decompiled and their binary features are extracted, by the same method as in the database-building steps above.
4) Feature matching
The binary features of the software to be detected are matched against the features in the closed-source software feature library, and the matched features and the files to which they belong are recorded.
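A minimal sketch of this matching step, assuming that features are compared for exact equality and that the feature library is available as an in-memory mapping from each feature to the files that contain it (both are assumptions; `match_features` is a hypothetical name):

```python
from typing import Dict, Iterable, List, Set, Tuple


def match_features(target_features: Iterable[str],
                   feature_to_files: Dict[str, Set[str]]
                   ) -> List[Tuple[str, str]]:
    """Record (feature, file) pairs for every feature found in the library."""
    matches = []
    for feature in sorted(set(target_features)):
        # Features absent from the library simply produce no record.
        for file_id in sorted(feature_to_files.get(feature, ())):
            matches.append((feature, file_id))
    return matches


library = {"f1": {"a.dll"}, "f2": {"a.dll", "b.so"}, "f3": {"b.so"}}
records = match_features({"f2", "f1", "zz"}, library)
```

Here `records` holds one pair per matched feature and owning file, which is the input the later grouping step needs.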
5) Calculating the matching proportion
The matched features are grouped by the file to which they belong and compared with the full set of original features of that file to calculate the matching proportion of each file.
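Read literally, the matching proportion of a file is the number of its features that were matched divided by the total number of features stored for that file. The sketch below implements that reading; the function and variable names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def matching_proportions(matches: Iterable[Tuple[str, str]],
                         file_features: Dict[str, Set[str]]
                         ) -> Dict[str, float]:
    """Group matched features by file and divide by the file's feature count."""
    grouped = defaultdict(set)
    for feature, file_id in matches:
        grouped[file_id].add(feature)
    proportions = {}
    for file_id, matched in grouped.items():
        original = file_features[file_id]
        # Only features that really belong to the file count as matched.
        proportions[file_id] = len(matched & original) / len(original)
    return proportions


matches = [("f1", "a.dll"), ("f2", "a.dll"), ("f2", "b.so")]
file_features = {"a.dll": {"f1", "f2", "f4", "f5"}, "b.so": {"f2", "f3"}}
proportions = matching_proportions(matches, file_features)
```

In this example two of the four features of `a.dll` and one of the two features of `b.so` were matched, so both files have a matching proportion of 0.5.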
6) Filtering results according to threshold
The preliminary detection result is filtered, and unreliable entries are removed according to thresholds. The thresholds can be set on the file matching proportion and on the number of matched features.
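The two thresholds can be applied together as in the sketch below. The default values 0.6 and 10 are arbitrary placeholders chosen for illustration; the patent gives no concrete threshold values, and `filter_results` is a hypothetical name.

```python
from typing import Dict, Tuple


def filter_results(candidates: Dict[str, Tuple[float, int]],
                   min_proportion: float = 0.6,
                   min_matches: int = 10) -> Dict[str, Tuple[float, int]]:
    """Keep files that meet both the proportion and the match-count threshold.

    `candidates` maps a file to (matching proportion, number of matched features).
    """
    return {
        file_id: (proportion, count)
        for file_id, (proportion, count) in candidates.items()
        if proportion >= min_proportion and count >= min_matches
    }


candidates = {
    "a.dll": (0.9, 120),   # high proportion, many matches
    "b.so": (0.95, 3),     # high proportion, but too few features to trust
    "c.dll": (0.2, 400),   # many matches, but only a small part of the file
}
kept = filter_results(candidates)
```

Requiring a minimum match count guards against tiny files that reach a high proportion by coincidence, which is one plausible reason for filtering on both quantities.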
7) Output of detection result
The final detection result is output. It comprises the homologous files and the closed-source software to which they belong.
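Assembling the output amounts to joining the surviving files with the software information database. The following sketch assumes a simple in-memory mapping from file to software name and a list-of-records output format; both, along with the name `build_report`, are hypothetical.

```python
from typing import Dict, List


def build_report(kept: Dict[str, float],
                 file_to_software: Dict[str, str]) -> List[Dict[str, object]]:
    """Attach the owning closed-source software to each homologous file."""
    report = [
        {
            "software": file_to_software.get(file_id, "unknown"),
            "file": file_id,
            "proportion": proportion,
        }
        for file_id, proportion in kept.items()
    ]
    # Strongest matches first; ties are ordered by file name.
    report.sort(key=lambda row: (-row["proportion"], row["file"]))
    return report


report = build_report({"b.so": 0.75, "a.dll": 0.9},
                      {"a.dll": "DemoSuite 1.0"})
```

An empty report corresponds to the case in which no file in the closed-source software database is homologous with the software to be detected.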
The invention provides a homologous detection method for closed-source software. The method can establish a closed-source software database; a database-building program that implements the database-building process can build the database automatically, supplement it at any time, and build a new database according to detection requirements, which supports homologous detection of closed-source software. The method performs binary homologous detection based on the closed-source software library.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is only a logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The technical solutions provided by the embodiments of the present invention are described in detail above, and the principles and embodiments of the present invention are explained in this patent by applying specific examples, and the descriptions of the embodiments above are only used to help understanding the principles of the embodiments of the present invention; the above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A homologous detection method for closed-source software, characterized by comprising the following steps:
constructing a closed-source software database, wherein the closed-source software database comprises a closed-source software binary feature library and an association database of closed-source software files and features;
acquiring a binary file of the software to be detected;
determining the features in the closed-source software database that match the binary file of the software to be detected, and determining the files corresponding to the matched features from the association database of closed-source software files and features;
and determining the detection result of the software to be detected based on the matched features and the corresponding files.
2. The method of claim 1, wherein constructing the closed-source software database comprises:
collecting closed-source software;
acquiring binary files from the closed-source software, decompiling the binary files of the closed-source software, and extracting the binary file features corresponding to the closed-source software;
and constructing the closed-source software binary feature library and the association database of closed-source software files and features according to the extracted binary file features corresponding to the closed-source software.
3. The method of claim 1, wherein acquiring the binary file of the software to be detected comprises:
acquiring the software to be detected;
and acquiring the binary file corresponding to the software to be detected.
4. The method according to claim 1, wherein determining the features in the closed-source software database that match the binary file of the software to be detected, and determining the files corresponding to the matched features from the association database of closed-source software files and features, comprises:
decompiling the binary file corresponding to the software to be detected, and extracting the binary file features corresponding to the software to be detected;
matching the binary file features of the software to be detected one by one against the binary file features contained in the closed-source software database, and recording the successfully matched binary file features;
and determining the corresponding files from the association database of closed-source software files and features according to the successfully matched binary file features.
5. The method of claim 4, wherein determining the detection result of the software to be detected based on the matched features and the corresponding files comprises:
grouping the successfully matched binary file features by the file to which they belong, comparing the matched features in each group with the full set of binary features of that file, and determining the matching proportion for each file;
and outputting the detection result of the software to be detected according to the matching proportion.
6. The method according to claim 5, wherein before outputting the detection result of the software to be detected according to the matching proportion, the method further comprises:
and filtering all the matching proportions according to a proportion threshold value.
7. The method of claim 6, wherein the closed-source software database further comprises: a closed source software information database; outputting the detection result of the software to be detected according to the matching proportion, comprising the following steps:
determining the corresponding closed-source software from the closed-source software information database according to the files whose matching proportions remain after filtering;
and outputting that closed-source software together with the corresponding files and their matching proportions.
CN202211372662.2A 2022-11-03 2022-11-03 Homologous detection method of closed-source software Pending CN115686623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211372662.2A CN115686623A (en) 2022-11-03 2022-11-03 Homologous detection method of closed-source software

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211372662.2A CN115686623A (en) 2022-11-03 2022-11-03 Homologous detection method of closed-source software

Publications (1)

Publication Number Publication Date
CN115686623A (en) 2023-02-03

Family

ID=85047236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211372662.2A Pending CN115686623A (en) 2022-11-03 2022-11-03 Homologous detection method of closed-source software

Country Status (1)

Country Link
CN (1) CN115686623A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116149669A (en) * 2023-04-14 2023-05-23 杭州安恒信息技术股份有限公司 Binary file-based software component analysis method, binary file-based software component analysis device and binary file-based medium

Similar Documents

Publication Publication Date Title
CN104123493B (en) The safety detecting method and device of application program
CN110083623B (en) Business rule generation method and device
EP2693356B1 (en) Detecting pirated applications
CN112669138A (en) Data processing method and related equipment
CN115686623A (en) Homologous detection method of closed-source software
CN111222137A (en) Program classification model training method, program classification method and device
CN113011889A (en) Account abnormity identification method, system, device, equipment and medium
CN112579462A (en) Test case acquisition method, system, equipment and computer readable storage medium
CN106301979B (en) Method and system for detecting abnormal channel
CN106998336B (en) Method and device for detecting user in channel
CN114841820A (en) Transaction risk control method and system
CN114627412A (en) Method, device and processor for realizing unsupervised depth forgery video detection processing based on error reconstruction and computer storage medium thereof
CN103577543A (en) Ranking fraud detection method and ranking fraud detection system of application program
CN109460474B (en) User preference trend mining method
CN115131139B (en) Method, device and medium for obtaining target result based on structural data
CN113297498B (en) Internet-based food attribute mining method and system
CN115422522A (en) Abnormal equipment judgment reference establishment method, abnormal equipment identification method, abnormal equipment judgment reference establishment device, abnormal equipment identification device and abnormal equipment identification device
CN113434826A (en) Detection method and system for counterfeit mobile application and related products
CN107229865B (en) Method and device for analyzing Webshell intrusion reason
CN112016961A (en) Pushing method and device, electronic equipment and computer readable storage medium
CN114579711A (en) Method, device, equipment and storage medium for identifying fraud application program
CN113434860A (en) Virus detection method and device, computing equipment and storage medium
CN113254352A (en) Test method, device, equipment and storage medium for test case
CN116305134A (en) Binary system-based software traceability detection method
US20230214842A1 (en) Locating suspect transaction patterns in financial networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Liang Dagong

Inventor after: Wang Bo

Inventor after: Lv Jinbiao

Inventor after: Wang Xiaozhou

Inventor before: Liang Dagong

Inventor before: Wang Bo

Inventor before: Lv Jinbiao