CN114064116A - Software detection method and device - Google Patents

Publication number
CN114064116A
Authority
CN
China
Prior art keywords
feature
software
features
groups
source software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010753824.1A
Other languages
Chinese (zh)
Inventor
郑志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202010753824.1A priority Critical patent/CN114064116A/en
Publication of CN114064116A publication Critical patent/CN114064116A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/70Software maintenance or management
    • G06F8/75Structural analysis for program understanding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/70Software maintenance or management
    • G06F8/77Software metrics

Abstract

The application provides a software detection method and device, and belongs to the technical field of computers. The scheme provided by the application can divide the multiple features extracted from the software under test into multiple feature groups according to their association relationships, and can perform feature matching at feature-group granularity. Feature matching therefore considers not only the information of the individual features but also the associations among the features, which can effectively improve the accuracy of feature matching, that is, the accuracy with which the target open source software is detected.

Description

Software detection method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a software detection method and apparatus.
Background
Open source software (OSS) refers to software whose source code is available to the public, and whose use, modification, and distribution are not restricted by license. For software developed on the basis of open source software, the open source software that it references needs to be detected in order to analyze its security.
In the related art, a Software Composition Analysis (SCA) method is generally used to detect the name and version number of the open source software referenced by a given piece of software. The SCA method extracts features from the software under test, matches the extracted features against the features of each open source software stored in a feature library, and then determines the open source software referenced by the software under test based on the result of feature matching.
However, because different open source software may share the same or similar features, the SCA method may determine that several open source software packages match the software under test, so the detection accuracy is low.
Disclosure of Invention
The application provides a software detection method and device, which can solve the technical problem that the accuracy of a software detection method in the related technology is low.
In one aspect, a software detection method is provided. The method can extract a plurality of features from the code of the software under test and divide the plurality of features into a plurality of feature groups, wherein each feature group comprises at least two features having an association relationship; if the proportion of feature groups, among the plurality of feature groups, that match a feature template group of the target open source software is greater than a ratio threshold, it is determined that the software under test references the target open source software; the target open source software comprises a plurality of feature template groups, and each feature template group comprises at least two feature templates having an association relationship.
In the software detection method provided by the application, the extracted features can be divided into multiple feature groups according to their association relationships, and feature matching is performed at feature-group granularity, so that the matching precision can be effectively improved, which in turn improves the accuracy with which the target open source software is detected.
Optionally, the code of the software under test comprises a plurality of function call chains, each function call chain comprising at least one function with a dependency relationship; the process of dividing the plurality of features into a plurality of feature groups may include: dividing at least two features called by the functions included in each function call chain into one feature group, to obtain a plurality of feature groups.
Grouping based on the function call chains ensures that the features in each resulting feature group reflect the dependency relationships between functions. After feature matching is performed based on the feature groups, the target open source software referenced by the software under test can therefore be accurately identified.
Optionally, the process of dividing the plurality of features into a plurality of feature groups may further include: determining a storage address of an instruction for calling each feature in the code of the software to be tested; dividing the determined storage address of each instruction into a plurality of non-overlapping address ranges; at least two characteristics called by the instruction with the storage address in the same address range are divided into one characteristic group to obtain a plurality of characteristic groups.
Grouping the plurality of features based on the storage addresses of the instructions that call the features ensures that the features in each resulting feature group are called by instructions whose storage addresses lie in the same address range. That is, the resulting feature groups reflect the relative positions of the instructions that call the features. After feature matching is performed based on the feature groups, the target open source software referenced by the software under test can therefore be accurately identified.
Optionally, the method may further include: for each feature group in the plurality of feature groups, determining a first similarity between the feature group and each feature template group of the target open source software; and if the first similarity between the feature group and any feature template group of the target open source software is greater than a first similarity threshold, determining that the feature group matches that feature template group.
The first similarity between a feature group and a given feature template group of the target open source software may be the ratio of the number of features in the feature group that match feature templates of the feature template group to the total number of features in the feature group.
Optionally, before matching the set of features with the set of feature templates, the method may further comprise: determining second similarity of the plurality of features and a plurality of feature templates included in each open source software in a feature library, wherein the feature library includes the plurality of feature templates included in each open source software in the plurality of open source software; determining at least one alternative open-source software from the plurality of open-source software, wherein second similarity of the plurality of features and a plurality of feature templates included in each alternative open-source software is greater than a second similarity threshold, and the at least one alternative open-source software comprises the target open-source software; and determining the number of feature groups matched with any feature template group of each alternative open source software in the plurality of feature groups.
The method provided by the application can also be used for carrying out feature matching by taking the features as granularity, so that at least one alternative open-source software is screened out from a plurality of open-source software included in the feature library. And then, continuously determining the target open-source software from the at least one alternative open-source software. Therefore, the software detection efficiency can be effectively improved on the premise of ensuring the accuracy of the determined target open source software.
Optionally, the process of dividing the plurality of features into a plurality of feature groups may further include: dividing the plurality of features into a plurality of initial groups, wherein each initial group comprises one feature or at least two features with association relationship; a plurality of feature groups are determined from the plurality of initial groupings, each feature group including a number of features greater than a number threshold, the number threshold being greater than 1.
If the number of the features included in the feature group is small, in the subsequent feature matching process, the feature group may be matched with more feature template groups of open source software in the feature library, so that more interference items appear, and the efficiency of feature matching is affected. In the method provided by the application, only the initial group with a large number of included features can be reserved as the feature group, so that the efficiency of subsequent feature matching can be ensured to be high.
In another aspect, a software detection apparatus is provided, which may include at least one module, and the at least one module may be configured to implement the software detection method provided in the above aspect.
In yet another aspect, a software detection apparatus is provided, which may include: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the software detection method as provided in any one of the above aspects when executing the computer program.
In yet another aspect, a computer-readable storage medium is provided, in which instructions are stored, the instructions being executed by a processor to implement the software detection method provided in any one of the above aspects.
In a further aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the software detection method as provided in any one of the above aspects.
In summary, the present application provides a software detection method and device, which can divide a plurality of features extracted from the software under test into a plurality of feature groups according to their association relationships, and can perform feature matching at feature-group granularity. Feature matching can therefore consider not only the similarity between the features of the software under test and the feature templates of the open source software, but also the associations among the features of the software under test and among the feature templates of the open source software, which can effectively improve the accuracy of feature matching, that is, the accuracy with which the target open source software is detected.
Drawings
FIG. 1 is a schematic diagram of an implementation environment of a software detection method provided by an embodiment of the present application;
FIG. 2 is a flow chart of a software detection method provided by an embodiment of the present application;
FIG. 3 is a diagram of a function call provided by an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a storage location of an instruction to invoke a feature according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a software detection process provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a software detection apparatus according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of another software detection apparatus provided in the embodiments of the present application;
FIG. 8 is a schematic structural diagram of another software detection apparatus according to an embodiment of the present application.
Detailed Description
The following describes a software detection method and apparatus provided in the embodiments of the present application in detail with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment of a software detection method according to an embodiment of the present application. As shown in fig. 1, the implementation environment may include a server 01 and at least one client 02. For example, only one client 02 is schematically shown in fig. 1. Each client 02 may establish a communication connection with the server 01 through a wired or wireless network. Each client 02 may send the code of the software under test to the server 01. A software detection device is deployed in the server 01, and the software detection device can determine the open source software referenced by the software under test based on the received code of the software under test. The server 01 may also be referred to as a software detection platform, and the server 01 may be a single server, a server cluster composed of a plurality of servers, or a cloud computing center. The client 02 may be a computer device, for example, a desktop computer, or a mobile terminal such as a notebook computer, a tablet computer, or a mobile phone.
Fig. 2 is a flowchart of a software detection method provided in an embodiment of the present application, where the method may be applied to a software detection apparatus, for example, the server 01 shown in fig. 1. Referring to fig. 2, the method may include:
Step 101, extracting a plurality of features from the code of the software under test.
In the embodiment of the application, the software detection device may receive the code of the software under test uploaded by the client, and perform feature extraction (extract signature) on that code to obtain a plurality of features of the software under test. For example, the software detection device may extract the features from the code of the software under test based on a pre-configured feature extraction algorithm.
Optionally, the code of the software under test uploaded by the client may be object code (object code), which refers to code generated after compiling source code. The object code may also be referred to as binary code or a binary program. In this embodiment, the software detection apparatus may perform disassembly processing on the object code, so as to obtain an assembly code of the software under test. The software detection means may then extract a plurality of features from the assembly code.
Alternatively, the code uploaded by the client may also be the source code of the software under test. The software detection device can directly perform feature extraction on the source code after receiving the source code.
Alternatively, each feature extracted from the code of the software under test by the software detection apparatus may be a constant, and the constant may be a character string. That is, each feature extracted by the software detection apparatus may be a constant string (constant string), or may also be referred to as a string constant.
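As an illustration of extracting constant strings from object code, the following is a minimal sketch only. The application does not specify the feature extraction algorithm; the function name `extract_constant_strings`, the minimum length of 4, and the restriction to printable ASCII are all assumptions made for this example.

```python
import re

def extract_constant_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long.

    Hypothetical extractor: it scans raw bytes much like the Unix `strings`
    tool and does not disassemble the object code.
    """
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}")
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# A made-up byte blob standing in for object code.
blob = b"\x00\x01inflate 1.2.11\x00\xff\x02ab\x00incorrect header check\x00"
features = extract_constant_strings(blob)
print(features)
```

Short runs such as `ab` are skipped because very short strings are unlikely to be distinctive; the appropriate minimum length would depend on the feature library in use.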
Step 102, dividing the plurality of features into a plurality of feature groups, wherein each feature group comprises at least two features with an association relationship.
In this embodiment, the software detection apparatus may divide the plurality of features into a plurality of feature groups according to a pre-configured grouping method. Wherein each feature group comprises at least two features having an association relationship.
As an alternative implementation, since the codes of the software under test generally include a plurality of function call chains, the software detection apparatus may group the plurality of features according to the function call chains of the software under test. The grouping process is as follows:
First, the software detection device determines the plurality of function call chains included in the code of the software under test based on the dependency relationships among the plurality of functions included in that code. For example, the software detection apparatus may generate a function call graph (CG) of the software under test; the CG is a directed graph that captures the function call chains of the software under test. Each function call chain runs from an entry function to an exit function, and a dependency relationship exists between adjacent functions in the chain.
For example, it is assumed that the function call graph of the software under test generated by the software detection apparatus based on the code of the software under test is as shown in fig. 3. Then, as can be seen from fig. 3, the code of the software under test includes 10 functions f0 to f9, where the function f0 may be a main (main) function. There are 6 different function call chains in the 10 functions:
1. f0→f1→f4→f6→f9;
2. f0→f1→f4→f7→f9;
3. f0→f2→f4→f6→f9;
4. f0→f2→f4→f7→f9;
5. f0→f3→f5→f7→f9;
6. f0→f3→f5→f8→f9.
The software detection device may then place the at least two features called (which may also be described as referenced) by the functions included in each function call chain into one feature group, thereby obtaining a plurality of feature groups. For example, in step 101, the software detection device may first identify the plurality of functions included in the code of the software under test, where each function may also be referred to as a function body. The software detection device then performs feature extraction on each function separately, so as to extract the one or more features called by that function. Accordingly, after determining the plurality of function call chains included in the software under test, the software detection device may group the plurality of features according to the function that references each feature.
For example, for the function call graph shown in fig. 3, the software detection apparatus may divide the plurality of features of the software under test into 6 feature groups based on the 6 function call chains.
The above grouping method based on function call chains ensures that the features in each resulting feature group reflect the dependency relationships between the functions. After feature matching is performed based on the feature groups, the target open source software referenced by the software under test can therefore be accurately identified.
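The call-chain grouping described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the application: the adjacency list is read off the six chains listed for fig. 3, the feature names `s0` to `s9` are invented (one per function), and the helper names `call_chains` and `group_by_chain` are hypothetical. The sketch also assumes the call graph is acyclic.

```python
# Call graph of fig. 3 as an adjacency list (edges inferred from the six chains).
CALL_GRAPH = {
    "f0": ["f1", "f2", "f3"],
    "f1": ["f4"], "f2": ["f4"], "f3": ["f5"],
    "f4": ["f6", "f7"], "f5": ["f7", "f8"],
    "f6": ["f9"], "f7": ["f9"], "f8": ["f9"],
    "f9": [],
}

def call_chains(graph, entry):
    """Enumerate every entry-to-exit path with an iterative depth-first search."""
    chains, stack = [], [[entry]]
    while stack:
        path = stack.pop()
        callees = graph[path[-1]]
        if not callees:              # exit function reached: the chain is complete
            chains.append(path)
        for callee in callees:
            stack.append(path + [callee])
    return chains

def group_by_chain(chains, features_of):
    """Collect, per chain, the features referenced by its functions (order kept)."""
    groups = []
    for chain in chains:
        group = []
        for fn in chain:
            for feat in features_of.get(fn, []):
                if feat not in group:
                    group.append(feat)
        if len(group) >= 2:          # a feature group needs at least two features
            groups.append(group)
    return groups

# Invented data: function fi references a single constant string si.
features_of = {f"f{i}": [f"s{i}"] for i in range(10)}
chains = sorted(call_chains(CALL_GRAPH, "f0"))
groups = group_by_chain(chains, features_of)
print(len(chains), groups[0])
```

With this invented data every chain yields one feature group, giving six groups for the six chains of fig. 3.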
As another optional implementation manner, the software detection apparatus may further group the plurality of features according to an address range where a storage address of an instruction calling each feature is located, so as to obtain a plurality of feature groups. Where the instructions are part of a function, i.e. each function comprises one or more instructions. The process of this instruction-based storage address grouping is as follows:
First, the software detection device can determine the storage address, in memory, of each instruction in the code of the software under test that calls a feature. The software detection device may then sort the determined storage addresses of the instructions in ascending or descending order. Next, the software detection device may divide the determined storage addresses into a plurality of non-overlapping address ranges, and place at least two features called by instructions whose storage addresses lie in the same address range into one feature group. That is, the software detection device may place the features called by instructions with adjacent or nearby storage locations into one feature group.
For example, referring to fig. 4, assume that the storage addresses, determined by the software detection device, of the instructions that call the features fall within the following address segments in memory: A0 through A1, A2 through A3, and A4 through A5. The software detection device may place the features called by instructions whose storage addresses lie in the segment A0 through A1 into feature group 1, the features called by instructions whose storage addresses lie in the segment A2 through A3 into feature group 2, and the features called by instructions whose storage addresses lie in the segment A4 through A5 into feature group 3. As shown in fig. 4, the features called by instructions whose storage addresses lie in the segment A2 through A3 are feature S1 through feature S5, so the software detection device may place these 5 features into feature group 2.
The source code project of the software under test may comprise a plurality of different source code files. The storage addresses of instructions from different source code files are far apart, whereas the storage addresses of instructions from the same source code file are close together. Grouping the features based on the storage addresses of the instructions therefore means that the features in each resulting feature group are likely to be called by instructions from the same source code file. That is, the resulting feature groups reflect the relative positions of the instructions that call the features. After feature matching is performed based on the feature groups, the target open source software referenced by the software under test can therefore be accurately identified.
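The address-based grouping can be sketched as below. The application does not state how the non-overlapping address ranges are chosen, so this example assumes a simple rule: a new range starts whenever the gap between consecutive sorted addresses exceeds a hypothetical `max_gap`. The function name, the addresses, and the feature names other than S1 to S5 are invented for illustration.

```python
def group_by_address(calls, max_gap):
    """Group features by the storage address of the calling instruction.

    calls:   list of (address, feature) pairs.
    max_gap: assumed splitting rule; a gap larger than this between two
             consecutive addresses starts a new address range.
    """
    groups, current, prev = [], [], None
    for addr, feat in sorted(calls):        # ascending address order
        if prev is not None and addr - prev > max_gap:
            groups.append(current)
            current = []
        current.append(feat)
        prev = addr
    if current:
        groups.append(current)
    # Keep only ranges that yield at least two features.
    return [g for g in groups if len(g) >= 2]

calls = [
    (0x1000, "Sa"), (0x1010, "Sb"),                      # first address segment
    (0x4000, "S1"), (0x4008, "S2"), (0x4020, "S3"),
    (0x4030, "S4"), (0x4044, "S5"),                      # second address segment
    (0x9000, "Sc"), (0x9004, "Sd"),                      # third address segment
]
address_groups = group_by_address(calls, max_gap=0x100)
print(address_groups)
```

Fixed-size address windows would be an equally plausible reading of "non-overlapping address ranges"; gap-based splitting is used here only because it needs a single parameter.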
As another alternative implementation, the software detection apparatus may group the plurality of features in both of the above manners and then take the union of the resulting feature groups. That is, the software detection apparatus may group the plurality of features according to the function call chains of the software under test to obtain a plurality of feature groups, and may also group the plurality of features according to the address range in which the storage address of the instruction calling each feature is located to obtain a further plurality of feature groups. The feature groups obtained in the two manners are then de-duplicated to obtain the plurality of feature groups used for subsequent feature matching.
By combining the two grouping modes, the finally obtained multiple feature groups can not only reflect the dependency relationship between functions for calling features, but also reflect the relative position relationship between instructions for calling the features, so that the feature matching precision can be further improved, and the accuracy of the determined target open source software is further improved.
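Taking the union of the two groupings and removing duplicates might look like the following sketch. It assumes that two feature groups count as duplicates when they contain the same features regardless of order; the application does not define the de-duplication criterion, and `merge_groupings` is a hypothetical name.

```python
def merge_groupings(*groupings):
    """Union several lists of feature groups, dropping groups with identical members."""
    seen, merged = set(), []
    for grouping in groupings:
        for group in grouping:
            key = frozenset(group)          # order-insensitive identity of a group
            if key not in seen:
                seen.add(key)
                merged.append(sorted(key))
    return merged

by_chain = [["a", "b"], ["b", "c", "d"]]    # invented call-chain groups
by_address = [["b", "a"], ["c", "d"]]       # invented address-range groups
merged_groups = merge_groupings(by_chain, by_address)
print(merged_groups)
```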
Optionally, in this embodiment of the application, the software detection device may first divide the plurality of features into a plurality of initial groups based on the above grouping manners, where each initial group comprises one feature or at least two features having an association relationship. The software detection device then determines a plurality of feature groups from the plurality of initial groups, where the number of features included in each feature group is greater than a number threshold. The number threshold may be greater than 1. That is, the software detection device may prune the initial groups that include fewer features and retain only the initial groups that include more features as feature groups.
The number threshold may be a fixed value pre-configured in the software detection device, and the size of the number threshold may be configured by operation and maintenance personnel according to an application scenario.
Since the initial group including a small number of features may be matched with a large number of feature template groups of open source software in the feature library in the subsequent feature matching process, resulting in a large number of interference items, the efficiency of feature matching may be affected. In the embodiment of the present application, since only the initial group including a larger number of features may be retained as the feature group, the efficiency of subsequent feature matching can be ensured.
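The pruning of small initial groups reduces to a single filter, sketched here. The default threshold of 2 is an arbitrary value chosen for the example; the application only requires that the number threshold be greater than 1.

```python
def prune_groups(initial_groups, number_threshold=2):
    """Keep only initial groups with more than number_threshold features."""
    if number_threshold <= 1:
        raise ValueError("number_threshold is expected to be greater than 1")
    return [g for g in initial_groups if len(g) > number_threshold]

initial_groups = [["a"], ["a", "b"], ["a", "b", "c"], ["d", "e", "f", "g"]]
kept_groups = prune_groups(initial_groups)
print(kept_groups)
```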
Step 103, determining a second similarity between the plurality of features and the plurality of feature templates included in each open source software in the feature library.
In this embodiment of the present application, a feature library may be stored in the software detection apparatus, where the feature library records a plurality of feature template groups included in each of a plurality of open source software, and each feature template group includes at least two feature templates having an association relationship. The plurality of feature template groups of each open source software may also be obtained by performing feature extraction and feature grouping on the open source software in advance. Also, each feature template may be a constant in the source code of the open source software. The process of extracting the features of the open source software may refer to the related description of step 101, and the process of grouping the features of the open source software may refer to the related description of step 102.
When the software detection apparatus performs feature matching (match signature), the plurality of features of the software under test may be compared with each open source software in the feature library one by one, so as to calculate the second similarity between the plurality of features of the software under test and the plurality of feature templates included in each open source software. The second similarity between the plurality of features and the plurality of feature templates included in an open source software may be expressed as n/N, where N is the total number of the plurality of features and is an integer greater than 1, and n is the number of features that are the same as any feature template included in the open source software. The second similarity may also be referred to as the feature coverage of the software under test.
For example, referring to fig. 5, it is assumed that the software detection apparatus extracts 12 features from the software under test (each of the rectangles and triangles in fig. 5 represents one feature), and the 12 features are divided into two feature groups of Z1 and Z2. The software detecting device may first calculate a second similarity between the 12 features and a plurality of feature templates included in each open source software in the feature library. As shown in fig. 5, since each of the 12 features matches one of the 12 feature templates included in the open source software 1, the software detection apparatus may determine that the second similarity between the plurality of features and the 12 feature templates included in the open source software 1 is 1.
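The second similarity is the number of features of the software under test that equal some feature template of the open source software, divided by the total number of features. A minimal sketch follows, with invented feature names and a hypothetical function name; exact string equality is assumed as the matching criterion.

```python
def second_similarity(features, templates):
    """Fraction of features that equal any feature template of one open source software."""
    template_set = set(templates)
    matched = sum(1 for f in features if f in template_set)
    return matched / len(features)

# Invented data mirroring the fig. 5 example: 12 features, all present in open source software 1.
features = [f"feat{i}" for i in range(12)]
oss1_templates = list(features)
oss2_templates = features[:3] + ["other1", "other2"]

sim1 = second_similarity(features, oss1_templates)
sim2 = second_similarity(features, oss2_templates)
print(sim1, sim2)
```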
Step 104, determining at least one alternative open-source software from the plurality of open-source software included in the feature library based on the second similarity.
And the second similarity of the plurality of features and the plurality of feature templates included by each alternative open source software is greater than a second similarity threshold value. That is, the software detection device may initially screen out at least one alternative open source software that is similar to the software to be detected from the feature library based on the second similarity. And then, accurately determining the target open-source software referenced by the software to be tested from the at least one alternative open-source software. The method provided by the embodiment of the application can screen out at least one alternative open-source software based on coarse-grained feature matching, and then perform fine-grained feature matching, so that the software detection efficiency can be effectively improved.
The second similarity threshold may be a fixed value pre-stored in the software detection device, and the second similarity threshold may also be configured by the operation and maintenance personnel according to the application scenario. For example, the second similarity threshold may be 0.7.
Alternatively, if the software detection device does not detect the alternative open source software satisfying the above conditions from the feature library, it may be determined that the open source software matching the software under test does not exist in the feature library, that is, the software under test fails to match. And the software detection device can send a prompt message for indicating that the matching of the open source software fails to the client.
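The coarse-grained screening of step 104 can be sketched as a filter over the feature library. The dictionary layout of `feature_library`, the package names, and the function name are assumptions for illustration; the threshold 0.7 is the example value given above.

```python
def select_candidates(features, feature_library, second_threshold=0.7):
    """Return names of open source software whose feature coverage exceeds the threshold."""
    candidates = []
    for name, templates in feature_library.items():
        template_set = set(templates)
        similarity = sum(f in template_set for f in features) / len(features)
        if similarity > second_threshold:   # strictly greater than, as in the text
            candidates.append(name)
    return candidates

features = [f"feat{i}" for i in range(10)]
feature_library = {                          # hypothetical library contents
    "oss1": features,                        # coverage 10/10
    "oss2": features[:8],                    # coverage 8/10
    "oss3": features[:7],                    # coverage 7/10, not above 0.7
}
candidates = select_candidates(features, feature_library)
if not candidates:
    print("no matching open source software found")
print(candidates)
```

An empty result corresponds to the matching-failure case, in which a prompt message would be sent to the client.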
Step 105, for each feature group in the plurality of feature groups, determining a first similarity between the feature group and each feature template group of each alternative open source software.
In this embodiment of the application, after determining at least one alternative open-source software, the software detection apparatus may continue to perform fine-grained feature matching with each alternative open-source software by using the feature group of the software to be detected as a unit. Alternatively, for each feature group, the software detection device may calculate a first similarity between the feature group and each of a plurality of feature template groups included in each alternative open source software.
The first similarity between a feature group and a feature template group in an alternative open source software can be expressed as m/M, where M is the total number of features included in the feature group and is an integer greater than 1, and m is the number of features in the feature group that are the same as any feature template included in the feature template group. The first similarity may also be referred to as the coverage of the feature group of the software under test.
For example, referring to fig. 5, it is assumed that the alternative open source software determined by the software detection apparatus based on step 104 includes open source software 1, and the open source software 1 includes two feature template groups of M1 and M2. The software inspection device may calculate a first similarity of feature group Z1 to feature template group M1, a first similarity of feature group Z1 to feature template group M2, a first similarity of feature group Z2 to feature template group M1, and a first similarity of feature group Z2 to feature template group M2, respectively, for the software under inspection.
Step 106, if the first similarity between the feature group and any feature template group of the alternative open source software is greater than a first similarity threshold, determining that the feature group matches that feature template group.
The first similarity threshold may be a fixed value pre-stored in the software detection device, and the first similarity threshold may also be configured by the operation and maintenance staff according to the application scenario. The first similarity threshold and the second similarity threshold may be equal or different.
For example, assume that the first similarity threshold is 0.8. Referring to fig. 5, 5 of the 6 features included in feature group Z1 are identical to feature templates in feature template group M1, and 5 of the 6 features included in feature group Z2 are identical to feature templates in feature template group M2. The software detection device may determine that the first similarity of feature group Z1 to feature template group M1 and the first similarity of feature group Z2 to feature template group M2 are both 5/6 (approximately 0.83). Since 5/6 is greater than the first similarity threshold 0.8, the software detection device can determine that feature group Z1 matches feature template group M1 and feature group Z2 matches feature template group M2.
Step 107, detecting whether the ratio of the feature groups, among the plurality of feature groups, that match a feature template group of each alternative open source software is greater than a ratio threshold.
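The threshold comparison in step 106 can be sketched as below. The function name and the use of exact string equality are assumptions for illustration; exact fractions are used so that the strict "greater than" comparison is not affected by floating-point rounding.

```python
from fractions import Fraction

# Example value from the text (0.8); in practice this would be configurable.
FIRST_SIMILARITY_THRESHOLD = Fraction(4, 5)


def group_matches(feature_group, template_group,
                  threshold=FIRST_SIMILARITY_THRESHOLD):
    """Return True if the first similarity m/M is strictly greater than the threshold."""
    features = list(feature_group)
    templates = set(template_group)
    matched = sum(1 for feature in features if feature in templates)  # m
    return Fraction(matched, len(features)) > threshold  # m/M > threshold
```

A group with 5 of 6 features found in the template group (5/6) matches under a 0.8 threshold, whereas a group with exactly 4 of 5 (0.8) does not, because the comparison is strict.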
In this embodiment, for each alternative open source software, the software detection device may count the proportion of feature groups, among the plurality of feature groups, that match a feature template group of the alternative open source software (this proportion may also be referred to as coverage or confidence), and detect whether the proportion is greater than a ratio threshold. The ratio threshold may be a fixed value pre-configured in the software detection device, or may be configured by operation and maintenance personnel according to the application scenario. For example, the ratio threshold may be 80%.
If the ratio of the feature group matched with the feature template group of any alternative open source software in the plurality of feature groups is greater than the ratio threshold, executing step 108; if the ratio of the feature groups matched with the feature template group of each alternative open source software in the plurality of feature groups is not greater than the ratio threshold, step 109 is executed.
Step 108, determining the alternative open source software as the target open source software referenced by the software under test.
If the ratio of the feature groups, among the plurality of feature groups of the software under test, that match a feature template group of any alternative open source software is greater than the ratio threshold, the software detection device may determine that alternative open source software as the target open source software referenced by the software under test. The software under test referencing the target open source software may mean that the code of the software under test includes code of the target open source software. In this embodiment of the application, the number of target open source software determined by the software detection device may be 1, or may be greater than 1.
For example, assuming that the ratio threshold is 80%, and as shown in fig. 5, the two feature groups Z1 and Z2 included in the software under test each match one feature template group in the open source software 1, the ratio of matched feature groups is 2/2, that is, 100%. Since this ratio is greater than the ratio threshold, the software detection device may determine the open source software 1 as the target open source software referenced by the software under test.
In the embodiment of the application, after determining the target open source software referenced by the software under test, the software detection device may further send related information of the target open source software to the client. The related information may include information such as the name and version number of the target open source software. After the client acquires the related information, the client can monitor and maintain the security of the software under test based on the known vulnerabilities of the target open source software.
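Steps 105 to 109 taken together can be sketched as follows. This is an illustrative outline rather than the implementation of the application: the function names, the data layout (a dictionary mapping each alternative open source software to its list of feature template groups), and the default thresholds of 0.8 and 80% taken from the examples are all assumptions.

```python
def first_similarity(feature_group, template_group):
    """m / M: share of the group's features found in the template group."""
    templates = set(template_group)
    matched = sum(1 for feature in feature_group if feature in templates)
    return matched / len(feature_group)


def find_target_software(feature_groups, candidates,
                         first_threshold=0.8, ratio_threshold=0.8):
    """Return the names of candidates whose matched-group ratio exceeds the threshold.

    candidates maps a software name to its list of feature template groups.
    An empty result corresponds to step 109 (no referenced software found).
    """
    if not feature_groups:
        return []
    targets = []
    for name, template_groups in candidates.items():
        # Steps 105-106: a feature group matches if it is similar enough
        # to at least one feature template group of this candidate.
        matched_groups = sum(
            1 for group in feature_groups
            if any(first_similarity(group, templates) > first_threshold
                   for templates in template_groups)
        )
        # Step 107: proportion of matched groups among all feature groups.
        if matched_groups / len(feature_groups) > ratio_threshold:
            targets.append(name)  # Step 108
    return targets
```

Because every candidate is checked independently, the result may contain more than one name, which is consistent with the statement that the number of target open source software may be greater than 1.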
Step 109, determining that the tested software does not reference the alternative open source software.
In this embodiment of the present application, if, for every alternative open source software, the ratio of the feature groups, among the plurality of feature groups of the software under test, that match a feature template group of that alternative open source software is not greater than the ratio threshold, the software detection device may determine that the software under test does not match any alternative open source software, and may further determine that the software under test does not reference the alternative open source software. That is, the software detection device may determine that no open source software matching the software under test exists in the feature library, that is, matching for the software under test fails. The software detection device may then send the client a prompt message indicating that the open source software matching failed.
Optionally, the order of steps of the software detection method provided in the embodiment of the present application may be appropriately adjusted, and the steps may also be correspondingly increased or decreased according to the situation. For example, the steps 103 and 104 may be deleted as the case may be, that is, the software detection apparatus may also directly match the feature set of the software under test with the feature template set of each open source software in the feature library without determining the alternative open source software first. Alternatively, step 103 and step 104 may be performed before step 102, that is, the software detection apparatus may determine the alternative open source software first and then group the features. Accordingly, in this implementation, if the software detection apparatus does not determine the alternative open-source software that satisfies the condition, the step 102 may not be executed any more, so that the computing resource of the software detection apparatus may be saved. Any method that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application is covered by the protection scope of the present application, and thus the detailed description thereof is omitted.
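The reordered variant, in which the alternative open source software is determined before the features are grouped, can be sketched as below. The individual steps are passed in as callables because their details are described elsewhere in the application; all parameter names here are hypothetical.

```python
def detect(features, library, select_candidates, group_features, fine_match):
    """Run coarse candidate selection first and group features only if needed.

    select_candidates(features, library) -> list of candidate names (steps 103-104)
    group_features(features)             -> list of feature groups   (step 102)
    fine_match(groups, template_groups)  -> bool                     (steps 105-108)
    """
    candidates = select_candidates(features, library)
    if not candidates:
        # No alternative open source software: skip grouping entirely,
        # which saves the computation of step 102.
        return []
    groups = group_features(features)
    return [name for name in candidates if fine_match(groups, library[name])]
```

The early return is the point of the reordering: when no candidate satisfies the coarse condition, the grouping step is never executed.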
In summary, the embodiment of the present application provides a software detection method, which can divide the plurality of features extracted from the software under test into a plurality of feature groups according to their association relationships, and can perform feature matching at the granularity of feature groups. Therefore, the feature matching process can consider not only the similarity between the features of the software under test and the feature templates of the open source software, but also the association relationships among the features of the software under test and among the feature templates of the open source software, which can effectively improve the accuracy of feature matching, that is, the accuracy of the detected target open source software.
Fig. 6 is a schematic structural diagram of a software detection apparatus provided in an embodiment of the present application, where the apparatus may be used to implement the software detection method provided in the foregoing method embodiment, and the apparatus may be deployed in the server 01 shown in fig. 1. Referring to fig. 6, the apparatus may include:
the extraction module 201 is configured to extract a plurality of features from the code of the software under test. The functional implementation of the extraction module 201 may refer to the related description of step 101 above.
A grouping module 202, configured to divide the plurality of features into a plurality of feature groups, where each feature group includes at least two of the features having an association relationship. The functional implementation of the grouping module 202 can refer to the related description of step 102 above.
The first determining module 203 is configured to determine that the software to be tested refers to the target open-source software if a ratio of a feature group, which is matched with a feature template group of the target open-source software, in the plurality of feature groups is greater than a ratio threshold. The functional implementation of the first determining module 203 can refer to the related description of step 107 and step 108.
The target open source software comprises a plurality of feature template groups, and each feature template group comprises at least two feature templates with an association relationship.
Optionally, the code of the software under test comprises a plurality of function call chains, each function call chain comprising at least one function with a dependency relationship; the grouping module 202 may be configured to: and dividing at least two features called by the functions included in each function calling chain into a feature group to obtain a plurality of feature groups.
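Grouping by function call chain can be sketched as follows. The input layout is an assumption made for illustration: each call chain is a list of function names, and a hypothetical mapping `features_by_function` records which features each function calls.

```python
def group_by_call_chain(call_chains, features_by_function):
    """Build one feature group per call chain that yields at least two features."""
    groups = []
    for chain in call_chains:
        group = []
        for function in chain:
            for feature in features_by_function.get(function, ()):
                if feature not in group:  # keep each feature once per group
                    group.append(feature)
        if len(group) >= 2:  # a feature group needs at least two features
            groups.append(group)
    return groups
```

Chains whose functions call fewer than two distinct features produce no group, in line with the requirement that each feature group includes at least two features.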
Optionally, the grouping module 202 may be configured to: determine a storage address of the instruction for calling each feature in the code of the software under test; divide the determined storage addresses of the instructions into a plurality of non-overlapping address ranges; and divide at least two features called by instructions whose storage addresses are in the same address range into one feature group, to obtain a plurality of feature groups.
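Grouping by address range can be sketched as below. How the address ranges are chosen is not specified in this passage, so the sketch assumes fixed-width, non-overlapping ranges (`range_size` is a hypothetical parameter) and one calling instruction per feature.

```python
from collections import defaultdict


def group_by_address_range(feature_addresses, range_size=0x100):
    """Group features whose calling instructions fall in the same address range.

    feature_addresses maps a feature to the storage address of the instruction
    that calls it. Range k covers [k * range_size, (k + 1) * range_size), so
    the ranges never overlap.
    """
    buckets = defaultdict(list)
    ordered = sorted(feature_addresses.items(), key=lambda item: item[1])
    for feature, address in ordered:
        buckets[address // range_size].append(feature)
    # Keep only ranges that contain at least two features.
    return [buckets[key] for key in sorted(buckets) if len(buckets[key]) >= 2]
```

A different partitioning rule, for example splitting wherever the gap between consecutive addresses is large, would fit the same description; the fixed-width choice is only the simplest one to illustrate.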
Optionally, as shown in fig. 7, the apparatus may further include:
a second determining module 204, configured to determine, for each feature group of the plurality of feature groups, a first similarity between the feature group and each feature template group of the target open source software. The functional implementation of the second determination module 204 can refer to the related description of step 105 above.
A third determining module 205, configured to determine that the feature set matches any feature template set of the target open-source software if the first similarity between the feature set and the any feature template set is greater than a first similarity threshold. The functional implementation of the third determination module 205 can refer to the related description of step 106 above.
Optionally, as shown in fig. 7, the apparatus may further include:
a fourth determining module 206, configured to determine a second similarity between the plurality of features and a plurality of feature templates included in each open source software in a feature library, where the feature library includes the plurality of feature templates included in each open source software in the plurality of open source software. The functional implementation of the fourth determination module 206 can refer to the related description of step 103 above.
A fifth determining module 207, configured to determine at least one candidate open-source software from the multiple open-source software, where second similarities between the multiple features and multiple feature templates included in each candidate open-source software are greater than a second similarity threshold, and the at least one candidate open-source software includes the target open-source software. The functional implementation of the fifth determining module 207 can refer to the related description of step 104.
A sixth determining module 208, configured to determine the number of feature groups, which match any feature template group of each alternative open source software, in the plurality of feature groups. The functional implementation of the sixth determining module 208 can refer to the related description of step 107 above.
Optionally, the grouping module 202 may be configured to:
dividing the plurality of features into a plurality of initial groups, wherein each initial group comprises one feature or at least two features with association;
a plurality of feature groups are determined from the plurality of initial groupings, each feature group including a number of features greater than a number threshold, the number threshold being greater than 1.
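The two-stage grouping above can be sketched as a simple filter over the initial groups. The function name and the default threshold of 2 are assumptions; the only constraint taken from the text is that the number threshold is greater than 1.

```python
def select_feature_groups(initial_groups, number_threshold=2):
    """Keep only the initial groups that contain more features than the threshold."""
    if number_threshold <= 1:
        raise ValueError("the number threshold must be greater than 1")
    return [group for group in initial_groups if len(group) > number_threshold]
```

With a threshold of 2, initial groups holding one or two features are discarded and only groups of three or more features are kept as feature groups.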
In the embodiment of the present application, as can be seen from fig. 5 and the above description, the software detection apparatus is mainly used for implementing two functions of feature extraction and feature matching, and therefore each module included in the software detection apparatus can be divided into a feature extraction unit and a feature matching unit. Wherein, the feature extraction unit may include the extraction module 201 and the grouping module 202. The feature matching unit may include the first to sixth determination modules 203 to 208.
In summary, the embodiment of the present application provides a software detection apparatus, which can divide the plurality of features extracted from the software under test into a plurality of feature groups according to their association relationships, and can perform feature matching at the granularity of feature groups. Therefore, the feature matching process can consider not only the similarity between the features of the software under test and the feature templates of the open source software, but also the association relationships among the features of the software under test and among the feature templates of the open source software, which can effectively improve the accuracy of feature matching, that is, the accuracy of the detected target open source software.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the software detection apparatus and the modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
It should be understood that the software detection apparatus provided in the embodiments of the present application may also be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The software detection method provided by the above method embodiment may also be implemented by software, in which case each module in the software detection apparatus may also be a software module.
Fig. 8 is a schematic structural diagram of another software detection apparatus provided in an embodiment of the present application, which can implement the software detection method provided in the foregoing method embodiment, and can be applied to the server 01 shown in fig. 1. Referring to fig. 8, the apparatus may include: processor 8001, memory 8002, transceiver 8003, and bus 8004. The bus 8004 is used to connect the processor 8001, the memory 8002, and the transceiver 8003. Communication connections with other devices may be made through transceiver 8003 (which may be wired or wireless). The memory 8002 stores therein a computer program for realizing various application functions.
It should be understood that, in the embodiment of the present application, the processor 8001 may be a Central Processing Unit (CPU), and the processor 8001 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or any conventional processor or the like.
Memory 8002 may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. Volatile memory can be random access memory (RAM), which acts as external cache memory. By way of example, but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The bus 8004 may include a power bus, a control bus, a status signal bus, and the like, in addition to the data bus. But for clarity of illustration the various buses are labeled as bus 8004 in the figure.
The processor 8001 is configured to execute a computer program stored in the memory 8002, and the processor 8001 implements the software detection method in the above-described method embodiments by executing the computer program.
The embodiments of the present application also provide a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the steps in the above method embodiments.
Embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the steps in the above-mentioned method embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that contains one or more collections of available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a Solid State Drive (SSD).
It is to be understood that reference herein to "at least one" means one or more and "a plurality" means two or more. In addition, to describe the technical solutions of the embodiments of the present application clearly, terms such as "first" and "second" are used to distinguish between identical or similar items having substantially the same functions and actions. Those skilled in the art will appreciate that the terms "first," "second," etc. do not limit the quantity or the order of execution, nor do they indicate any difference in importance.
The above description is only an optional embodiment of the present application, but the scope of the present application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and these modifications or substitutions shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A method for software inspection, the method comprising:
extracting a plurality of features from the code of the tested software;
dividing the plurality of features into a plurality of feature groups, wherein each feature group comprises at least two features with an association relationship;
if the ratio of the feature groups matched with the feature template group of the target open source software in the plurality of feature groups is larger than a ratio threshold, determining that the tested software references the target open source software;
wherein the target open source software comprises a plurality of feature template groups, and each feature template group comprises at least two feature templates with an association relationship.
2. The method of claim 1, wherein the code of the software under test comprises a plurality of function call chains, each of the function call chains comprising at least one function having a dependency;
the dividing the plurality of features into a plurality of feature groups includes:
and dividing at least two features called by the function included in each function calling chain into a feature group to obtain a plurality of feature groups.
3. The method of claim 1 or 2, wherein the dividing the plurality of features into a plurality of feature groups comprises:
determining a storage address of an instruction for calling each feature in the code of the software under test;
dividing the determined storage addresses of the instructions into a plurality of non-overlapping address ranges;
and dividing at least two features called by instructions whose storage addresses are in the same address range into a feature group to obtain a plurality of feature groups.
4. The method of any of claims 1 to 3, further comprising:
for each of the feature groups in the plurality of feature groups, determining a first similarity of the feature group to each of the feature template groups of the target open source software;
and if the first similarity between the feature group and any feature template group of the target open source software is greater than a first similarity threshold value, determining that the feature group is matched with any feature template group.
5. The method of any of claims 1 to 4, further comprising:
determining a second similarity between the plurality of features and a plurality of feature templates included in each open source software in a feature library, wherein the feature library includes the plurality of feature templates included in each open source software in the plurality of open source software;
determining at least one alternative open source software from the plurality of open source software, wherein second similarity of the plurality of features and a plurality of feature templates included in each alternative open source software is greater than a second similarity threshold, and the at least one alternative open source software includes the target open source software;
and determining the number of feature groups matched with any feature template group of each alternative open source software in the plurality of feature groups.
6. The method of any of claims 1 to 5, wherein said dividing the plurality of features into a plurality of feature groups comprises:
dividing the plurality of features into a plurality of initial groups, wherein each initial group comprises one feature or at least two features with association relationship;
determining a plurality of feature groups from the plurality of initial groupings, each of the feature groups including a number of features greater than a number threshold, the number threshold being greater than 1.
7. A software detection apparatus, characterized in that the apparatus comprises:
the extraction module is used for extracting a plurality of features from the code of the tested software;
a grouping module for dividing the plurality of features into a plurality of feature groups, each of the feature groups including at least two of the features having an association relationship;
the first determination module is used for determining that the tested software refers to the target open-source software if the ratio of the feature groups matched with the feature template group of the target open-source software in the plurality of feature groups is greater than a ratio threshold;
wherein the target open source software comprises a plurality of feature template groups, and each feature template group comprises at least two feature templates with an association relationship.
8. The apparatus of claim 7, wherein the code of the software under test comprises a plurality of function call chains, each of the function call chains comprising at least one function having a dependency; the grouping module is configured to:
and dividing at least two features called by the function included in each function calling chain into a feature group to obtain a plurality of feature groups.
9. The apparatus of claim 7 or 8, wherein the grouping module is configured to:
determining a storage address of an instruction for calling each feature in the code of the software under test;
dividing the determined storage addresses of the instructions into a plurality of non-overlapping address ranges;
and dividing at least two features called by instructions whose storage addresses are in the same address range into a feature group to obtain a plurality of feature groups.
10. The apparatus of any of claims 7 to 9, further comprising:
a second determining module, configured to determine, for each of the feature groups in the plurality of feature groups, a first similarity between the feature group and each of the feature template groups of the target open source software;
and a third determining module, configured to determine that the feature group matches any feature template group of the target open-source software if a first similarity between the feature group and the feature template group is greater than a first similarity threshold.
11. The apparatus of any one of claims 7 to 10, further comprising:
a fourth determining module, configured to determine second similarities between the plurality of features and a plurality of feature templates included in each open source software in a feature library, where the feature library includes the plurality of feature templates included in each open source software in the plurality of open source software;
a fifth determining module, configured to determine at least one alternative open-source software from the multiple open-source software, where second similarities between the multiple features and multiple feature templates included in each alternative open-source software are greater than a second similarity threshold, and the at least one alternative open-source software includes the target open-source software;
and a sixth determining module, configured to determine the number of feature groups, in the plurality of feature groups, that match any one feature template group of each alternative open-source software.
12. The apparatus of any one of claims 7 to 11, wherein the grouping module is configured to:
dividing the plurality of features into a plurality of initial groups, wherein each initial group comprises one feature or at least two features with association relationship;
determining a plurality of feature groups from the plurality of initial groupings, each of the feature groups including a number of features greater than a number threshold, the number threshold being greater than 1.
13. A software detection apparatus, characterized in that the apparatus comprises: memory, processor and computer program stored on the memory and capable of running on the processor, the processor implementing the method according to any one of claims 1 to 6 when executing the computer program.
14. A computer-readable storage medium having stored thereon instructions for execution by a processor to perform the method of any one of claims 1 to 6.
CN202010753824.1A 2020-07-30 2020-07-30 Software detection method and device Pending CN114064116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010753824.1A CN114064116A (en) 2020-07-30 2020-07-30 Software detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010753824.1A CN114064116A (en) 2020-07-30 2020-07-30 Software detection method and device

Publications (1)

Publication Number Publication Date
CN114064116A true CN114064116A (en) 2022-02-18

Family

ID=80227400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010753824.1A Pending CN114064116A (en) 2020-07-30 2020-07-30 Software detection method and device

Country Status (1)

Country Link
CN (1) CN114064116A (en)

Similar Documents

Publication Publication Date Title
US10176323B2 (en) Method, apparatus and terminal for detecting a malware file
CN106796640A (en) Classification malware detection and suppression
CN108459964B (en) Test case selection method, device, equipment and computer readable storage medium
US11048798B2 (en) Method for detecting libraries in program binaries
CN109491763B (en) System deployment method and device and electronic equipment
CN110231994B (en) Memory analysis method, memory analysis device and computer readable storage medium
US20150234700A1 (en) System Level Memory Leak Detection
US11055168B2 (en) Unexpected event detection during execution of an application
CN105357204B (en) Method and device for generating terminal identification information
US9733930B2 (en) Logical level difference detection between software revisions
US20160098563A1 (en) Signatures for software components
CN115562992A (en) File detection method and device, electronic equipment and storage medium
US10366236B2 (en) Software analysis system, software analysis method, and software analysis program
CN111475411A (en) Server problem detection method, system, terminal and storage medium
CN110688096A (en) Method, device, medium and electronic equipment for constructing application program containing plug-in
CN110955434A (en) Software development kit processing method and device, computer equipment and storage medium
CN110795331A (en) Software testing method and device
WO2021139139A1 (en) Permission abnormality detection method and apparatus, computer device, and storage medium
CN116483888A (en) Program evaluation method and device, electronic equipment and computer readable storage medium
US20140137083A1 (en) Instrumenting computer program code by merging template and target code methods
CN114064116A (en) Software detection method and device
CN113342660B (en) File testing method, device, system, electronic equipment and readable storage medium
US11941115B2 (en) Automatic vulnerability detection based on clustering of applications with similar structures and data flows
CN113486359B (en) Method and device for detecting software loopholes, electronic device and storage medium
CN114816772A (en) Debugging method, debugging system and computing device for application running based on compatible layer

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20220222

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Applicant after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.

SE01 Entry into force of request for substantive examination