CN115357897A - Open source software identification method and device - Google Patents

Info

Publication number
CN115357897A
Authority
CN
China
Prior art keywords
source software
target
information
open source
open
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210801205.4A
Other languages
Chinese (zh)
Inventor
王治文
欧建深
李广生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XFusion Digital Technologies Co Ltd
Original Assignee
XFusion Digital Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XFusion Digital Technologies Co Ltd
Priority to CN202210801205.4A
Publication of CN115357897A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management

Abstract

The application discloses an open source software identification method and device. The method comprises the following steps: acquiring a target code to be identified; matching feature information of open source software in an open source software feature library with a target code, and determining target feature information matched with the target code, wherein the feature information comprises at least one of the following items: endpoint characteristic information, bifurcation point characteristic information and invalid characteristic information; the end point characteristic information comprises file information corresponding to the initial version of the open source software and/or file information which is newly added to the non-initial version of the open source software relative to the previous version; the bifurcation point characteristic information comprises file information of a non-initial version of the open-source software modified relative to a previous version; the invalid characteristic information comprises unmodified file information of the open source software from the initial version to the current latest version; determining open source software information corresponding to the target code according to the target characteristic information; the open source software information comprises the version number, name, file directory, file size or hash value of the file of the open source software.

Description

Open source software identification method and device
This application is a divisional application. The application number of the original application is 201711463010.9, the filing date of the original application is December 28, 2017, and the entire content of the original application is incorporated herein by reference.
Technical Field
The present application relates to the field of computer applications, and in particular, to a method and an apparatus for identifying open source software.
Background
Open source software plays an increasingly important role in cloud computing and has become key to cloud computing network security. Closing the loop on vulnerabilities in open source components effectively depends on accurately identifying the open source software and its version. The industry already has mature network security vulnerability libraries, but closing the loop on open source software vulnerabilities presupposes accurately identifying which open source software, and which versions, are used in product code. Because open source code is open, modifiable, and redistributable, open source software exists in many versions and is difficult to identify.
At present, the industry mainly uses file comparison to identify the open source software and versions used in code. File comparison collects as many attributes of open source files as possible, such as hash values, file sizes, and file directories, and then performs a full comparison: all of the collected open source file information is compared with the code, and the open source software and version with the highest similarity are identified from the comparison results.
Because the number of open source software files is huge, identifying open source software by comparing the full set of files takes a great deal of time and is inefficient.
Disclosure of Invention
The embodiment of the application provides an open source software identification method and device, which are used for quickly identifying open source software and versions thereof used in product codes and improving identification efficiency.
In view of this, a first aspect of the present application provides an open source software identification method, including: the identification device acquires target code on which open source software identification is to be performed, matches the feature information in the open source software feature library against the target code, identifies the target feature information that matches the target code, and determines the open source software information corresponding to the target code according to the target feature information;
wherein the characteristic information includes at least one of: endpoint feature information, bifurcation point feature information, invalid feature information; the end point characteristic information comprises file information corresponding to the initial version of the open source software and/or file information which is newly added to the non-initial version of the open source software relative to the previous version; the bifurcation point characteristic information comprises file information of a non-initial version of the open-source software modified relative to a previous version; the invalid feature information comprises unmodified file information of the open source software from the initial version to the current latest version.
In the implementation mode, the identification device can identify the open source software and the version thereof used by the target code according to the characteristic information without comparing the full amount, so that the identification efficiency is improved.
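The three kinds of feature information defined above can be derived from a version history. The following is a minimal sketch, not the patent's implementation: it assumes each version is represented as a mapping from file path to content hash, and the names `build_features`, `Version`, and `Feature` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one version maps file path -> content hash.
Version = Dict[str, str]
Feature = Tuple[str, str, str]  # (version number, file path, hash)


def build_features(history: List[Tuple[str, Version]]):
    """Derive feature information from a history ordered initial -> latest."""
    endpoint: List[Feature] = []
    bifurcation: List[Feature] = []
    prev: Version = {}
    for ver, files in history:
        for path, digest in sorted(files.items()):
            if path not in prev:
                # File of the initial version, or newly added in this version.
                endpoint.append((ver, path, digest))
            elif prev[path] != digest:
                # File modified relative to the previous version.
                bifurcation.append((ver, path, digest))
        prev = files
    # Files that stay unmodified from the initial to the latest version.
    initial = history[0][1]
    invalid = sorted(
        (path, digest)
        for path, digest in initial.items()
        if all(files.get(path) == digest for _, files in history)
    )
    return endpoint, bifurcation, invalid
```

Under these assumptions, a file that is present and unchanged in every version ends up in the invalid set, while each added or modified file is recorded once, against the version that introduced the change.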
With reference to the first aspect of the present application, in a first implementation manner of the first aspect of the present application, the identification device may determine the target feature information as follows: the identification device determines target file information corresponding to the target code, acquires the endpoint feature information of each open source software in the open source software feature library, and then judges whether target endpoint feature information matching the target file information exists in that endpoint feature information; if so, the target endpoint feature information is the target feature information matched with the target code;
correspondingly, the identification device may determine the open source software information corresponding to the target code as follows: the information of the open source software to which the target endpoint feature information belongs (the first open source software) is determined as the open source software information corresponding to the target code.
In the implementation mode, the identification device provides a mode for identifying the open-source software information, and the realizability of the scheme is improved.
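The endpoint matching step can be sketched as follows. This is an illustration under stated assumptions: the target file information is reduced to a set of file hashes, a feature matches when its hash appears in that set, and the function name `match_endpoints` and the library layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Feature = Tuple[str, str, str]  # (version number, file path, hash)


def match_endpoints(
    target_hashes: Set[str], library: Dict[str, List[Feature]]
) -> Dict[str, List[Feature]]:
    """Return, per open source software, the endpoint features found in the target."""
    matches: Dict[str, List[Feature]] = {}
    for name, features in library.items():
        hits = [f for f in features if f[2] in target_hashes]
        if hits:
            # This software is a candidate "first open source software".
            matches[name] = hits
    return matches


# Hypothetical feature library with two packages.
library = {
    "libfoo": [("1.0", "foo.c", "h1"), ("1.1", "util.c", "h2")],
    "libbar": [("2.0", "bar.c", "k1")],
}
found = match_endpoints({"h1", "h2", "zz"}, library)
```

Only the endpoint features are scanned here, which is the source of the efficiency gain over comparing every file of every version.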
With reference to the first implementation manner of the first aspect of the present application, in a second implementation manner of the first aspect of the present application, after the identification device determines the target endpoint feature information, the following process may further be performed: the identification device determines the version number of the first open source software corresponding to each piece of endpoint feature information in the target endpoint feature information and takes the highest of these version numbers as the target version number; it then obtains, from the open source software feature library, the bifurcation point feature information of the first open source software corresponding to the target version number and to the version numbers after the target version number, and judges whether target bifurcation point feature information matching the target file information exists in that bifurcation point feature information; if so, the version number corresponding to the target bifurcation point feature information is the version number of the open source software corresponding to the target code.
In the implementation mode, the identification device can accurately identify the version number of the open source software corresponding to the target code through the feature information of the branch point, and the identification precision is improved.
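The version refinement can be sketched as below. Several choices are assumptions rather than statements of the source: version numbers are dotted integers, matching is by hash, the highest matching bifurcation version is chosen when several match, and the target version number is returned when no bifurcation point feature matches. The names `version_key` and `refine_version` are hypothetical.

```python
from typing import List, Set, Tuple

Feature = Tuple[str, str, str]  # (version number, file path, hash)


def version_key(version: str) -> Tuple[int, ...]:
    """Order dotted versions numerically, so that 1.10 sorts after 1.9."""
    return tuple(int(part) for part in version.split("."))


def refine_version(
    matched_endpoints: List[Feature],
    bifurcations: List[Feature],
    target_hashes: Set[str],
) -> str:
    # Target version number: highest version among the matched endpoint features.
    target_version = max((f[0] for f in matched_endpoints), key=version_key)
    # Only bifurcation points at or after the target version are examined.
    hits = [
        f[0]
        for f in bifurcations
        if version_key(f[0]) >= version_key(target_version)
        and f[2] in target_hashes
    ]
    # Assumption: pick the highest matching version, else keep the target version.
    return max(hits, key=version_key) if hits else target_version
```

Restricting the search to versions at or after the target version number means earlier bifurcation points never need to be compared.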
With reference to the first or second implementation manner of the first aspect of the present application, in a third implementation manner of the first aspect of the present application, after the identifying device determines the target endpoint feature information, the following process may be further performed: the identification device acquires invalid feature information of first open source software in an open source software feature library, judges whether target invalid feature information matched with target file information exists in the invalid feature information of the first open source software, and determines that other invalid feature information except the target invalid feature information in the invalid feature information of the first open source software is modified in a target code if the target invalid feature information exists in the invalid feature information of the first open source software.
In this implementation manner, after the identification device identifies the open-source software information corresponding to the target code, it can be further identified which part of the open-source software is modified by the target code, so that the flexibility of the scheme is improved.
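The invalid-feature check can be sketched as a set difference. The sketch assumes that invalid features are stored as (file path, hash) pairs and that matching is by hash; the function name `modified_invalid_files` is hypothetical.

```python
from typing import List, Optional, Set, Tuple

InvalidFeature = Tuple[str, str]  # (file path, hash)


def modified_invalid_files(
    invalid: List[InvalidFeature], target_hashes: Set[str]
) -> Optional[List[InvalidFeature]]:
    """Return the invalid features that the target code appears to have modified.

    Returns None when no invalid feature matches at all, because in that case
    the step described above draws no conclusion.
    """
    matched = [f for f in invalid if f[1] in target_hashes]
    if not matched:
        return None
    # Files that never change upstream but are absent from the target
    # are taken to have been modified in the target code.
    return [f for f in invalid if f[1] not in target_hashes]


invalid = [("LICENSE", "l1"), ("core.c", "c1"), ("io.c", "i1")]
changed = modified_invalid_files(invalid, {"l1", "i1"})
```

Because these files are identical in every upstream version, a mismatch cannot be explained by a version difference, which is why it points to a local modification.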
With reference to the first aspect of the present application, in a fourth implementation manner of the first aspect, the identification device may determine the target feature information as follows: the identification device determines target file information corresponding to the target code, acquires the bifurcation point feature information of each open source software in the open source software feature library, and judges whether target bifurcation point feature information matching the target file information exists in that bifurcation point feature information; if so, the target bifurcation point feature information is the target feature information matched with the target code;
correspondingly, the identification device may determine the open source software information corresponding to the target code as follows: the identification device determines that the first open source software to which the target bifurcation point feature information belongs, and the version number of the first open source software corresponding to the target bifurcation point feature information, correspond to the target code. That is, the open source software corresponding to the target code is the first open source software, and the version number of the open source software corresponding to the target code is the version number corresponding to the target bifurcation point feature information.
In the implementation mode, the identification device provides another mode for identifying the open-source software information, and the flexibility of the scheme is improved.
With reference to the first implementation manner of the first aspect of the present application, in a fifth implementation manner of the first aspect, when the identification device determines that no target endpoint feature information exists, the identification device may obtain the bifurcation point feature information of each open source software in the open source software feature library and judge whether target bifurcation point feature information matching the target file information exists in that bifurcation point feature information; if so, the target bifurcation point feature information is the target feature information matched with the target code;
correspondingly, the identification device may determine the open source software information corresponding to the target code as follows: the identification device determines that the first open source software to which the target bifurcation point feature information belongs, and the version number of the first open source software corresponding to the target bifurcation point feature information, correspond to the target code. That is, the open source software corresponding to the target code is the first open source software, and the version number of the open source software corresponding to the target code is the version number corresponding to the target bifurcation point feature information.
In the implementation mode, the identification device provides another mode for identifying the open-source software information, and the flexibility of the scheme is improved.
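The fallback from endpoint matching to bifurcation point matching can be sketched as a two-pass lookup. This is an illustration only: the library layout, the hash-based matching, the function name `identify`, and the rule of taking the last listed bifurcation hit (with features assumed to be ordered from oldest to newest version) are all assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

Feature = Tuple[str, str, str]  # (version number, file path, hash)
Library = Dict[str, Dict[str, List[Feature]]]


def identify(
    target_hashes: Set[str], library: Library
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (software name, version or None), or None when nothing matches."""
    # First pass: endpoint feature information.
    for name, feats in library.items():
        if any(h in target_hashes for _, _, h in feats["endpoint"]):
            return name, None  # version refinement is a separate step
    # Second pass, only reached when no endpoint feature matched:
    # bifurcation point feature information gives software and version.
    for name, feats in library.items():
        hits = [ver for ver, _, h in feats["bifurcation"] if h in target_hashes]
        if hits:
            return name, hits[-1]
    return None
```

A target that carries only modified files of a package, and none of its initial or newly added files, is still identified through the second pass.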
A second aspect of the present application provides an identification apparatus, comprising: a first acquisition module, a first determining module, and a second determining module; the first acquisition module is used for acquiring target code to be identified; the first determining module is used for matching the feature information of open source software in the open source software feature library with the target code and determining the target feature information matched with the target code; and the second determining module is used for determining the open source software information corresponding to the target code according to the target feature information.
Wherein the characteristic information includes at least one of: endpoint characteristic information, bifurcation point characteristic information and invalid characteristic information; the end point characteristic information comprises file information corresponding to the initial version of the open source software and/or file information which is added to the non-initial version of the open source software relative to the previous version; the bifurcation point characteristic information comprises file information of a non-initial version of the open-source software modified relative to a previous version; the invalid feature information comprises information of files which are unmodified from the initial version to the current latest version of the open source software.
With reference to the second aspect of the present application, in a first implementation manner of the second aspect of the present application, the first determining module includes: a first acquisition unit, a first determining unit, a first judging unit, and a second determining unit; the second determining module includes: a third determining unit;
the first acquisition unit is used for acquiring the endpoint feature information of each open source software in the open source software feature library; the first determining unit is used for determining target file information corresponding to the target code; the first judging unit is used for judging whether the target endpoint feature information of the first open source software corresponding to the target file information exists in the open source software feature library or not; the second determining unit is used for determining that the target endpoint characteristic information is matched with the target code when the first judging unit determines that the target endpoint characteristic information exists; and the third determining unit is used for determining the information of the first open-source software as the open-source software information corresponding to the target code according to the target endpoint characteristic information.
With reference to the first implementation manner of the second aspect of the present application, in a second implementation manner of the second aspect of the present application, the apparatus further includes: a third determining module, a fourth determining module, a second obtaining module, a first judging module, and a fifth determining module;
the third determining module is used for determining the version number of the first open source software corresponding to each endpoint characteristic information in the target endpoint characteristic information; the fourth determining module is used for determining a target version number in the version numbers of the first open source software, wherein the target version number is the highest version number in the version numbers of the first open source software; determining whether the target version number is the highest version number of all the version numbers corresponding to the first open source software in the open source software feature library, and if so, determining that the target version number is the version number of the open source software corresponding to the target code; the second obtaining module is used for obtaining the target version number of the first open-source software in the open-source software feature library and the bifurcation point feature information corresponding to the version number behind the target version number if the target version number is not the highest version number in all the version numbers corresponding to the first open-source software in the open-source software feature library; the first judging module is used for judging whether target bifurcation point feature information matched with the target file information exists in the bifurcation point feature information corresponding to a target version number of the first open source software and a version number behind the target version number; the fifth determining module is configured to determine, when the first determining module determines that the target branch point feature information exists, that the version number of the first open-source software corresponding to the target branch point feature information is the version number of the open-source software corresponding to the target code.
With reference to the first implementation manner or the second implementation manner of the second aspect of the present application, in a third implementation manner of the second aspect of the present application, the identification apparatus further includes: a third acquisition module, a second judging module, and a sixth determining module;
the third acquisition module is used for acquiring invalid feature information of the first open-source software in the open-source software feature library; the second judging module is used for judging whether target invalid characteristic information matched with the target file information exists in the invalid characteristic information of the first open source software; the sixth determining module is configured to determine that invalid feature information other than the target invalid feature information in the invalid feature information of the first open-source software is modified in the target code when the second determining module determines that the target invalid feature information exists.
With reference to the second aspect of the present application, in a fourth implementation manner of the second aspect of the present application, the first determining module includes: a second acquisition unit, a fourth determining unit, a second judging unit, and a fifth determining unit; the second determining module includes: a sixth determining unit;
the second acquisition unit is used for acquiring the bifurcation point feature information of each open source software in the open source software feature library; the fourth determining unit is used for determining target file information corresponding to the target code; the second judging unit is used for judging whether target bifurcation point feature information matched with the target file information exists in the bifurcation point feature information of each open source software; the fifth determining unit is used for determining that the target bifurcation point feature information is matched with the target code when the second judging unit determines that the target bifurcation point feature information exists; the sixth determining unit is used for determining that the first open source software to which the target bifurcation point feature information belongs, and the version number of the first open source software corresponding to the target bifurcation point feature information, correspond to the target code.
A third aspect of the present application provides an identification apparatus comprising: a processor and a memory;
the memory is used for storing a program; the processor is used for executing the program, which specifically includes the following steps: acquiring target code on which open source software identification is to be performed, matching the feature information in an open source software feature library with the target code, identifying the target feature information matched with the target code, and determining the open source software information corresponding to the target code according to the target feature information;
wherein the characteristic information includes at least one of: endpoint feature information, bifurcation point feature information, invalid feature information; the end point characteristic information comprises file information corresponding to the initial version of the open source software and/or file information which is newly added to the non-initial version of the open source software relative to the previous version; the bifurcation point characteristic information comprises file information of a non-initial version of the open-source software modified relative to a previous version; the invalid feature information comprises information of files which are unmodified from the initial version to the current latest version of the open source software.
With reference to the third aspect of the present application, in a first implementation manner of the third aspect of the present application, the step of identifying, by the processor, the target feature information matched with the target code may specifically include: determining target file information corresponding to the target code, acquiring the endpoint feature information of each open source software in the open source software feature library, and judging whether target endpoint feature information matching the target file information exists in that endpoint feature information; if so, determining that the target endpoint feature information is the target feature information matched with the target code;
correspondingly, the step of determining, by the processor, the open source software information corresponding to the target code may specifically include: determining the information of the open source software to which the target endpoint feature information belongs (the first open source software) as the open source software information corresponding to the target code.
With reference to the first implementation manner of the third aspect of the present application, in a second implementation manner of the third aspect of the present application, after determining the target endpoint feature information, the processor further performs the following steps: determining the version number of the first open source software corresponding to each piece of endpoint feature information in the target endpoint feature information and taking the highest of these version numbers as the target version number; then obtaining, from the open source software feature library, the bifurcation point feature information of the first open source software corresponding to the target version number and to the version numbers after the target version number, and judging whether target bifurcation point feature information matching the target file information exists in that bifurcation point feature information; if so, determining the version number corresponding to the target bifurcation point feature information as the version number of the open source software corresponding to the target code.
With reference to the first or second implementation manner of the third aspect of the present application, in a third implementation manner of the third aspect of the present application, after determining the target endpoint feature information, the processor specifically further performs the following steps: and acquiring invalid feature information of the first open source software in the open source software feature library, judging whether target invalid feature information matched with the target file information exists in the invalid feature information of the first open source software, and if so, determining that other invalid feature information except the target invalid feature information in the invalid feature information of the first open source software is modified in the target code.
With reference to the third aspect of the present application, in a fourth implementation manner of the third aspect of the present application, the step of identifying, by the processor, the target feature information may specifically include: determining target file information corresponding to the target code, acquiring the bifurcation point feature information of each open source software in the open source software feature library, and judging whether target bifurcation point feature information matching the target file information exists in that bifurcation point feature information; if so, determining that the target bifurcation point feature information is the target feature information matched with the target code;
correspondingly, the step of determining, by the processor, the open source software information corresponding to the target code may specifically include: determining that the first open source software to which the target bifurcation point feature information belongs, and the version number of the first open source software corresponding to the target bifurcation point feature information, correspond to the target code. That is, the open source software corresponding to the target code is the first open source software, and the version number of the open source software corresponding to the target code is the version number corresponding to the target bifurcation point feature information.
A fourth aspect of the present application provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method according to the first aspect or any one of the first to fifth implementation manners of the first aspect.
A fifth aspect of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to the first aspect or any one of the first to fifth implementation manners of the first aspect.
According to the technical scheme, the embodiment of the application has the following advantages:
The feature information of open source software in an open source software feature library is matched with the target code, and the open source software information corresponding to the target code is identified according to the target feature information matched with the target code. The feature information may include at least one of: endpoint feature information, bifurcation point feature information, and invalid feature information. The endpoint feature information refers to file information corresponding to the initial version of the open source software and file information newly added in a non-initial version of the open source software relative to the previous version; the bifurcation point feature information refers to file information that is modified in a non-initial version relative to the previous version; and the invalid feature information refers to file information of the open source software that is unmodified from the initial version to the current latest version. The identification device can therefore identify the open source software and the version used by the target code according to the feature information without performing a full comparison, which improves identification efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present application.
FIG. 1 is a schematic diagram of an embodiment of an open source software identification method in an embodiment of the present application;
FIG. 2 is a schematic diagram of another embodiment of an open source software identification method in an embodiment of the present application;
FIG. 3 is a schematic diagram of feature information in an embodiment of the present application;
FIG. 4 is a schematic diagram of target file information in an embodiment of the present application;
FIG. 5 is a schematic diagram of an embodiment of an identification device in an embodiment of the present application;
FIG. 6 is a schematic diagram of another embodiment of an identification device in the embodiment of the present application;
FIG. 7 is a schematic diagram of another embodiment of an identification device in the embodiment of the present application;
FIG. 8 is a schematic diagram of another embodiment of the identification device in the embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the application provides an open source software identification method and device, which are used for quickly identifying open source software and versions thereof used in product codes and improving identification efficiency.
For ease of understanding, the terms used in this application are explained below:
Open source software: software whose source code is available to the public, and whose use, modification, and distribution are not restricted by the license.
Referring to fig. 1, a method for identifying open source software in an embodiment of the present application is described below, where an embodiment of the method for identifying open source software in the embodiment of the present application includes:
101. The identification device acquires a target code to be identified;
In this embodiment, the identification device determines the target code that needs to be identified. The target code may be input by a user on the identification device, actively obtained by the identification device from another device, sent to the identification device by another device, or obtained by the identification device in another way, which is not limited herein.
102. The identification device matches the feature information of the open source software in the open source software feature library with the target code and determines the target feature information that matches the target code;
After determining the target code to be identified, the identification device matches the feature information of each open source software in the open source software feature library with the target code. If the matching succeeds, the feature information that matches the target code is used as the target feature information, and step 103 is executed; if the matching fails, the identification device may notify the user that identification has failed or perform other operations, which is not limited herein.
The feature information of the open source software includes at least one of the following: the endpoint feature information of the open source software, the bifurcation point feature information of the open source software, or the invalid feature information of the open source software.
In this embodiment, for any open-source software in the open-source software feature library, the endpoint feature information of the open-source software refers to file information corresponding to the initial version of the open-source software and/or file information that is newly added to a non-initial version of the open-source software relative to a previous version; the bifurcation point characteristic information of the open source software refers to file information of a non-initial version of the open source software which is modified relative to a previous version; the invalid characteristic information of the open source software refers to the unmodified file information of the open source software from the initial version to the current latest version.
As an optional manner, before the identification device matches the feature information of the open source software in the open source software feature library with the target code, the open source software feature library may first be established. Specifically, the identification device may establish the open source software feature library as follows: acquire a software package of each open source software from a website server; parse the software package to obtain the file information corresponding to the different versions of each open source software; for each open source software, compare the file information of its versions to determine the feature information of that open source software; and establish the open source software feature library according to the feature information. The identification device may also establish the open source software feature library in other ways, which is not limited herein.
As an optional manner, before the identification device matches the feature information of the open source software in the open source software feature library with the target code, the identification device may acquire the open source software feature library from another device, or, when matching is needed, the identification device may call the open source software feature library on another device to perform the matching.
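The comparison of per-version file information described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_features`, the input shape (an ordered list of versions, each mapping a file path to a file hash), and the output layout are all assumptions made for the example.

```python
def build_features(versions):
    """Classify file hashes of one open source software into three feature kinds.

    `versions` is a hypothetical input: a list of (version, {path: file_hash})
    pairs ordered from the initial version to the latest version.
    """
    endpoint = {}     # hash -> version in which the file first appears
    bifurcation = {}  # hash -> version in which an existing file was modified

    first_version, first_files = versions[0]
    for file_hash in first_files.values():
        endpoint.setdefault(file_hash, first_version)

    # Paths whose content still equals the initial version's content.
    unchanged_paths = set(first_files)
    prev_files = first_files
    for version, files in versions[1:]:
        for path, file_hash in files.items():
            if path not in prev_files:
                endpoint.setdefault(file_hash, version)      # newly added file
            elif prev_files[path] != file_hash:
                bifurcation.setdefault(file_hash, version)   # modified file
        unchanged_paths = {
            p for p in unchanged_paths if files.get(p) == first_files[p]
        }
        prev_files = files

    # Files never modified from the initial version to the latest version.
    invalid = {first_files[p] for p in unchanged_paths}
    return {"endpoint": endpoint, "bifurcation": bifurcation, "invalid": invalid}
```

Under this sketch a file that is never modified appears both as an endpoint feature of the initial version and as an invalid feature; a real feature library might store the two sets disjointly.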
103. The identification device determines the open source software information corresponding to the target code according to the target feature information.
After determining the target feature information, the identification device determines the open source software information corresponding to the target code according to the target feature information. In this embodiment, the open source software information may include the version number, name, file directory, file size, file hash value, and the like of the open source software, which is not limited herein.
In this embodiment, the feature information of the open source software in the open source software feature library is matched with a target code, and the open source software information corresponding to the target code is identified according to the target feature information that matches the target code. The feature information may include at least one of: endpoint feature information, bifurcation point feature information, and invalid feature information. The endpoint feature information refers to file information corresponding to the initial version of the open source software and file information newly added in a non-initial version of the open source software relative to the previous version; the bifurcation point feature information refers to file information modified in a non-initial version relative to the previous version; and the invalid feature information refers to file information of the open source software that is not modified from the initial version to the current latest version. According to this feature information, the identification device can identify the open source software used by the target code and its version without performing a full comparison, which improves identification efficiency.
Based on the embodiment corresponding to FIG. 1, the identification device may match the feature information with the target code in multiple ways, some of which are described in detail below. Referring to FIG. 2, another embodiment of the open source software identification method in the embodiments of the present application includes:
201. The identification device acquires a target code to be identified;
In this embodiment, the identification device determines the target code to be identified. The target code may be input by a user on the identification device, actively obtained by the identification device from another device, sent to the identification device by another device, or obtained by the identification device in another way, which is not limited herein.
202. The identification device acquires the endpoint characteristic information of each open source software in the open source software characteristic library and determines target file information corresponding to the target code;
After determining the target code to be identified, the identification device determines each piece of file information corresponding to the target code, and acquires the endpoint feature information of each open source software from the open source software feature library.
It should be understood that, for any open source software: if the open source software feature library stores only the file information corresponding to the initial version of the open source software, the endpoint feature information of the open source software in this embodiment refers to the file information corresponding to the initial version; if the open source software feature library stores the file information corresponding to the initial version and the file information corresponding to non-initial versions, and at least one non-initial version contains newly added file information, the endpoint feature information refers to the file information corresponding to the initial version together with the file information newly added in non-initial versions relative to the previous version; if the open source software feature library does not store the file information corresponding to the initial version but stores the file information corresponding to non-initial versions, the endpoint feature information refers to the file information newly added in non-initial versions relative to the previous version.
As an optional manner, in this embodiment, file information of all versions corresponding to all open-source software is stored in the open-source software feature library, end-point feature information of each open-source software in the open-source software feature library refers to file information corresponding to an initial version of each open-source software, and file information that is newly added to a non-initial version of each open-source software relative to a previous version, bifurcation point feature information of each open-source software in the open-source software feature library refers to file information that is modified from the non-initial version corresponding to each open-source software relative to the previous version, and invalid feature information of each open-source software in the open-source software feature library refers to file information that is not modified from the initial version to a current version of each open-source software.
203. The identification device judges whether the feature library of the open source software has target endpoint feature information of first open source software corresponding to the target file information, if so, step 204 is executed, and if not, step 205 is executed;
After acquiring the endpoint feature information of each open source software, the identification device compares the endpoint feature information with each piece of target file information of the target code. If endpoint feature information that matches the target file information is identified, the identified endpoint feature information is determined as the target endpoint feature information, and the open source software to which it belongs is determined as the first open source software; if no endpoint feature information matches the target file information, step 205 is executed. Specifically, the endpoint feature information of the open source software includes the hash value of each file corresponding to the initial version of the open source software and/or the hash value of each file newly added in a non-initial version of the open source software relative to the previous version; the target file information includes the hash value of each file corresponding to the target code. The identification device judges whether target endpoint feature information of a first open source software corresponding to the target file information exists in the open source software feature library as follows: it compares the hash value of each piece of endpoint feature information in the open source software feature library with the hash value of each piece of target file information and judges whether any endpoint feature information and target file information have the same hash value. If so, that endpoint feature information is determined as the target endpoint feature information that matches the target code; if not, step 205 is executed.
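The hash comparison in step 203 can be illustrated with a short sketch. The function name `match_endpoints` and the data layout (a per-software dictionary mapping an endpoint hash to the version that introduced the file) are assumptions for this example only.

```python
def match_endpoints(target_files, endpoint_library):
    """Find endpoint features whose hash equals the hash of a target file.

    target_files:     {file_path: file_hash} for the target code (hypothetical layout)
    endpoint_library: {software_name: {file_hash: version}} (hypothetical layout)
    Returns {software_name: [(file_path, version), ...]}; empty if nothing matches.
    """
    matches = {}
    for path, file_hash in target_files.items():
        for software, features in endpoint_library.items():
            version = features.get(file_hash)  # dictionary lookup, no full comparison
            if version is not None:
                matches.setdefault(software, []).append((path, version))
    return matches
```

An empty result corresponds to the "if not" branch, in which the flow proceeds to step 205. Keying the library by hash keeps each lookup constant time, which is one plausible way to avoid a full file-by-file comparison.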
For convenience of description, in this embodiment, endpoint feature information that can be matched with the target file information is referred to as target endpoint feature information, and the open-source software to which the target endpoint feature information belongs is referred to as first open-source software.
It should be understood that, in this embodiment, the endpoint characteristic information of the open-source software may further include other file characteristics (such as a file size, a directory where the file is located, and the like), and the identifying device may further determine the target endpoint characteristic information according to other file characteristics, which is not limited in this application.
204. The identification device determines the information of the first open source software as the open source software information corresponding to the target code according to the target endpoint characteristic information;
after the identification device identifies the target endpoint feature information matched with the target code, it can be determined that the open source software corresponding to the target endpoint feature information is the open source software corresponding to the target code, that is, the first open source software is the open source software corresponding to the target code, and the information of the first open source software is the open source software information corresponding to the target code.
As an optional manner, in this embodiment, the open source software feature library contains multiple pieces of target endpoint feature information that match the target code, all of which belong to one open source software (the first open source software). After identifying the target endpoint feature information that matches the target code, the identification device may determine the version number corresponding to each piece of target endpoint feature information. If the target endpoint feature information corresponds to only one version number, that version number is determined as the target version number; if it corresponds to multiple version numbers, the highest of them is determined as the target version number.
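Selecting the target version number can be sketched as below. The sketch assumes version numbers are dot-separated integers such as "1.10.0"; the helper names `parse_version` and `target_version` are hypothetical.

```python
def parse_version(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def target_version(matched_versions):
    """Return the highest version among those of the matched endpoint features."""
    return max(set(matched_versions), key=parse_version)
```

Comparing parsed tuples rather than raw strings matters because, as text, "1.9.0" would sort above "1.10.0".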
If the determined target version number is the highest version number of all the version numbers corresponding to the first open source software recorded in the open source software feature library, the identification device may determine that the target version number is the version number corresponding to the target code, that is, the open source software corresponding to the target code is the first open source software, and the version number corresponding to the target code is the target version number.
If the determined target version number is not the highest of all version numbers of the first open source software recorded in the open source software feature library, the identification device may acquire, from the open source software feature library, the bifurcation point feature information corresponding to the target version number of the first open source software and to the version numbers after the target version number, and then judge whether any of this bifurcation point feature information matches the target file information. If not, the identification device may take the target version number as the version number corresponding to the target code. If so, that bifurcation point feature information matches the target code; for convenience of description, in this embodiment the bifurcation point feature information that matches the target code is referred to as target bifurcation point feature information.
Specifically, the bifurcation point feature information of the open source software may include the hash value of each file modified in a non-initial version of the open source software relative to the previous version. The identification device may determine the target bifurcation point feature information as follows: it compares the hash value of each piece of acquired bifurcation point feature information with the hash value of each piece of target file information and judges whether any bifurcation point feature information and target file information have the same hash value. If so, that bifurcation point feature information is determined as the target bifurcation point feature information that matches the target code.
It should be understood that, in this embodiment, the bifurcation point feature information of the open source software may further include other file features (such as the file size and the directory where the file is located), and the identification device may also determine the target bifurcation point feature information according to these other file features, which is not limited in this application. It should also be understood that the bifurcation point feature information refers to file information of a certain version of the open source software that is modified relative to the previous version, so the target bifurcation point feature information that matches the target code generally identifies a unique version number of the open source software; that is, the target bifurcation point feature information corresponds to only one version number.
After the identifying device identifies the target bifurcation point feature information matched with the target code, it may be determined that the version number corresponding to the target bifurcation point feature information is the version number corresponding to the target code, that is, the open-source software corresponding to the target code is the first open-source software, and the version number corresponding to the target code is the version number corresponding to the target bifurcation point feature information.
It should be further understood that determining the target version number, and acquiring the bifurcation point feature information corresponding to the target version number of the first open source software and to the version numbers after it, amounts to determining all versions of the first open source software that may contain every identified piece of target endpoint feature information and acquiring the bifurcation point feature information corresponding to those versions.
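The refinement of the target version number with bifurcation point features can be sketched as follows. The names and data layout are assumptions, as is the tie-breaking rule: the text says a match generally identifies a unique version, so picking the highest matching version when several match is a choice made for this sketch only.

```python
def parse_version(version):
    """Assumes dot-separated integer version numbers, e.g. '2.0.2'."""
    return tuple(int(part) for part in version.split("."))


def refine_version(target_ver, known_versions, bifurcation_features, target_hashes):
    """Refine the target version using bifurcation point hashes.

    known_versions:       all versions of the first open source software in the library
    bifurcation_features: {version: set of hashes of files modified in that version}
    target_hashes:        set of hashes of the target code's files
    """
    newest = max(known_versions, key=parse_version)
    if target_ver == newest:
        return target_ver  # already the highest recorded version

    # Only the target version and later versions can contain all matched endpoints.
    candidates = [
        v for v in known_versions
        if parse_version(v) >= parse_version(target_ver)
    ]
    matched = [
        v for v in candidates
        if bifurcation_features.get(v, set()) & target_hashes
    ]
    if not matched:
        return target_ver  # no bifurcation point matched: keep the target version
    return max(matched, key=parse_version)  # assumption: prefer the highest match
```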
As an alternative manner, in this embodiment, after the identifying device identifies the target endpoint feature information matched with the target code, the identifying device may acquire the invalid feature information of the first open source software in the open source software feature library, and determine whether there is invalid feature information matched with the target file information in the invalid feature information of the first open source software recorded in the open source software feature library, if yes, these invalid feature information are matched with the target code, and for convenience of description, in this embodiment, the invalid feature information matched with the target code is referred to as target invalid feature information.
Specifically, the invalid feature information of the open source software may include the hash value of each file that is not modified from the initial version to the current latest version. The identification device may determine the target invalid feature information as follows: it compares the hash value of each piece of acquired invalid feature information with the hash value of each piece of target file information and judges whether any invalid feature information and target file information have the same hash value. If so, that invalid feature information is determined as the target invalid feature information that matches the target code.
It should be understood that, in this embodiment, the invalid feature information of the open source software may further include other file features (such as the file size and the directory where the file is located), and the identification device may also determine the target invalid feature information according to these other file features, which is not limited in this application.
After identifying the target invalid feature information that matches the target code, the identification device may determine that the invalid feature information of the first open source software other than the target invalid feature information has been modified in the target code. It should be understood that, once the identification device has identified the target endpoint feature information that matches the target code in the foregoing manner, it can be determined that the target code corresponds to one version of the first open source software, that is, the target code was obtained by editing that version of the first open source software. The invalid feature information of the first open source software refers to file information that is not modified from the initial version to the current latest version, so it is present in every version of the first open source software. If no target invalid feature information matches the target code, all of the invalid feature information of the first open source software was modified in the process of generating the target code. If every piece of invalid feature information of the first open source software matches the target code, none of it was modified in the process of generating the target code. If only part of the invalid feature information of the first open source software matches the target code, the invalid feature information other than the target invalid feature information was modified in the process of generating the target code.
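The three cases above reduce to two set operations. A minimal sketch, with the function name `classify_invalid_features` and the status labels chosen for illustration:

```python
def classify_invalid_features(invalid_hashes, target_hashes):
    """Split a software's invalid features into unmodified and modified parts.

    invalid_hashes: set of hashes of files never modified across versions
    target_hashes:  set of hashes of the target code's files
    Returns (status, unmodified, modified).
    """
    unmodified = invalid_hashes & target_hashes  # target invalid feature information
    modified = invalid_hashes - target_hashes    # changed while producing the target code
    if not unmodified:
        status = "all modified"
    elif not modified:
        status = "none modified"
    else:
        status = "partly modified"
    return status, unmodified, modified
```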
205. The identification device performs other processing.
When the identification device determines that no target endpoint feature information corresponding to the target file information exists in the open source software feature library, the identification device may acquire the bifurcation point feature information of each open source software in the open source software feature library and judge whether any of it matches the target file information. If so, that bifurcation point feature information is determined to match the target code and is the target bifurcation point feature information.
It should be understood that the bifurcation point feature information refers to file information of a certain version of the open source software that is modified relative to the previous version, so the target bifurcation point feature information that matches the target code generally identifies a unique version number of the open source software; that is, the target bifurcation point feature information corresponds to only one version number.
After identifying the target bifurcation point feature information that matches the target code, the identification device may determine that the open source software corresponding to the target bifurcation point feature information (the first open source software) is the open source software corresponding to the target code, and that the version number corresponding to the target bifurcation point feature information is the version number corresponding to the target code.
When the identification device determines that the target endpoint feature information corresponding to the target file information does not exist in the open source software feature library, the identification device may also acquire the invalid feature information of each open source software in the open source software feature library, then judge whether the invalid feature information of each open source software has the target invalid feature information matched with the target file information, and if so, determine that the invalid feature information is matched with the target code.
After the identification device identifies the target invalid characteristic information matched with the target code, the open source software corresponding to the target invalid characteristic information can be determined to be the open source software corresponding to the target code, and the information of the open source software is the open source software information corresponding to the target code.
When the identifying device determines that the target endpoint feature information corresponding to the target file information does not exist in the open-source software feature library, the identifying device may further execute other processes, which is not limited herein.
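One possible shape of the step 205 fallback is sketched below. The text leaves the order open, so trying bifurcation point features before invalid features is an assumption of this sketch, as are the function name and the library layout.

```python
def identify_without_endpoint(target_hashes, library):
    """Fallback when no endpoint feature matches the target code.

    target_hashes: set of hashes of the target code's files
    library: {software: {"bifurcation": {hash: version}, "invalid": set_of_hashes}}
             (hypothetical layout)
    """
    # First try bifurcation point features, which also identify a version.
    for software, features in library.items():
        for file_hash, version in features["bifurcation"].items():
            if file_hash in target_hashes:
                return {"software": software, "version": version}

    # Then try invalid features, which identify the software but not a version.
    for software, features in library.items():
        if features["invalid"] & target_hashes:
            return {"software": software, "version": None}

    return None  # nothing matched; other processing is needed
```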
In some embodiments, after acquiring the target code to be identified, the identification device may instead directly acquire the bifurcation point feature information of each open source software in the open source software feature library, determine the target file information corresponding to the target code, and then judge whether the bifurcation point feature information of each open source software contains target bifurcation point feature information that matches the target file information. When target bifurcation point feature information exists, the identification device may determine the open source software information corresponding to the target code according to it: the open source software corresponding to the target bifurcation point feature information is the open source software corresponding to the target code, and the version number corresponding to the target bifurcation point feature information is the version number corresponding to the target code. When no target bifurcation point feature information exists, the identification device may determine the open source software information corresponding to the target code according to the invalid feature information of each open source software in the open source software feature library; the specific process is similar to step 204 above and is not repeated here.
In some embodiments, after acquiring the target code to be identified, the identification device may instead directly acquire the invalid feature information of each open source software in the open source software feature library, determine the target file information corresponding to the target code, and then judge whether the invalid feature information of each open source software contains target invalid feature information that matches the target file information. When target invalid feature information exists, the identification device may determine the open source software information corresponding to the target code according to it: the open source software corresponding to the target invalid feature information is the open source software corresponding to the target code. When no target invalid feature information exists, the identification device may determine the open source software information corresponding to the target code according to the endpoint feature information and/or bifurcation point feature information of each open source software in the open source software feature library; the specific process is similar to steps 203 and 204 above and is not repeated here.
In this embodiment, the feature information of the open source software in the open source software feature library is matched with a target code, and the open source software information corresponding to the target code is identified according to the target feature information that matches the target code. The feature information may include at least one of: endpoint feature information, bifurcation point feature information, and invalid feature information. The endpoint feature information refers to file information corresponding to the initial version of the open source software and file information newly added in a non-initial version of the open source software relative to the previous version; the bifurcation point feature information refers to file information modified in a non-initial version relative to the previous version; and the invalid feature information refers to file information of the open source software that is not modified from the initial version to the current latest version. According to this feature information, the identification device can identify the open source software used by the target code and its version without performing a full comparison, which improves identification efficiency.
Secondly, the embodiment of the application provides various modes for determining the open source software information corresponding to the target code according to the target characteristic information, and the flexibility of the scheme is improved.
In order to facilitate understanding of the present application, the following describes the open source software identification method in detail in a practical application scenario:
the identification device acquires an open source software package based on an Openstack open source official network, constructs an open source software 'feature library' to record hash values of newly-added files (end point feature information) of non-initial versions of each open source software relative to a previous version, hash values (bifurcation point feature information) of modified files of non-initial versions of each open source software relative to the previous version, and hash values (invalid feature information) of unmodified files of each open source software from the initial version to a current latest version, and is shown in a schematic diagram of feature information of certain open source software in fig. 3.
The identification device obtains the target code to be identified and determines the hash value of each file corresponding to the target code (target file information). Fig. 4 shows the storage path of each file corresponding to the target code and the hash value of each file.
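Computing the target file information can be sketched as below. The text does not name a hash algorithm, so the use of SHA-256 is an assumption, as is the function name `hash_target_files`; any digest works as long as the feature library was built with the same one.

```python
import hashlib
from pathlib import Path
from typing import Dict


def hash_target_files(root: str) -> Dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 hex digest of its content."""
    base = Path(root)
    result: Dict[str, str] = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            # Assumption: files are small enough to read into memory at once.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(base).as_posix()] = digest
    return result
```

The returned mapping plays the role of the table in fig. 4: one storage path and one hash value per file.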
The identification device obtains the hash values corresponding to the endpoint feature information of each open source software in the open source software feature library and compares them with the hash values of the files shown in fig. 4. It finds that the hash value of the file named "ssl.py" in the target code equals the hash value of an endpoint feature (target endpoint feature information) of version 1.0.3 of the murano software (first open source software) in the feature library, and that the hash value of the file named "ext_context.py" in the target code equals the hash value of an endpoint feature (target endpoint feature information) of version 2.0.2 of the murano software. The version range that contains both pieces of target endpoint feature information is therefore version 2.0.2 of the murano software and its subsequent versions.
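The endpoint matching step can be sketched as follows. The names `match_endpoints` and `parse_version`, the index layout (hash value mapped to a software name and version), and the placeholder hash strings are assumptions for illustration; version numbers are assumed to be dotted integers such as "2.0.2".

```python
from typing import Dict, Set, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.0.2' into a comparable tuple (2, 0, 2)."""
    return tuple(int(part) for part in version.split("."))


def match_endpoints(
    target_hashes: Set[str],
    endpoint_index: Dict[str, Tuple[str, str]],
) -> Dict[str, str]:
    """Return, per matched software, the highest version whose endpoint feature matched."""
    best: Dict[str, str] = {}
    for file_hash in target_hashes:
        hit = endpoint_index.get(file_hash)
        if hit is None:
            continue  # this target file matches no endpoint feature
        software, version = hit
        if software not in best or parse_version(version) > parse_version(best[software]):
            best[software] = version
    return best


# Hypothetical data mirroring the example: two endpoint features of murano match.
index = {"hash_ssl": ("murano", "1.0.3"), "hash_ext": ("murano", "2.0.2")}
print(match_endpoints({"hash_ssl", "hash_ext", "hash_unknown"}, index))
```

Taking the highest matched version reflects the reasoning in the example: a file added in version 2.0.2 cannot be present in any earlier version.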
The identification device obtains the hash values corresponding to the bifurcation point features of version 2.0.2 (target version) of the murano software and its subsequent versions (namely, versions 2.0.2 and 3.0.0 of the murano software) and compares them with the hash values of the files corresponding to the target code. Version 2.0.2 has no bifurcation point features, and the hash values of the bifurcation point features of version 3.0.0 do not match the hash value of any file corresponding to the target code, so the identification device determines that the target code corresponds to version 2.0.2 of the murano software.
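Narrowing the version range with bifurcation point features can be sketched as below. The function name `resolve_version` and its parameters are hypothetical. One assumption goes beyond the text: when bifurcation point features of several versions match, the sketch keeps the highest matching version, and when none match it falls back to the target version, as in the example.

```python
from typing import Dict, List, Set


def resolve_version(
    target_hashes: Set[str],
    target_version: str,
    all_versions: List[str],
    bifurcation_by_version: Dict[str, Set[str]],
) -> str:
    """Pick the version in [target_version, latest] supported by bifurcation features.

    all_versions must be sorted in ascending order and contain target_version.
    """
    resolved = target_version
    start = all_versions.index(target_version)
    for version in all_versions[start:]:
        features = bifurcation_by_version.get(version, set())
        if features & target_hashes:
            # A file modified in this version is present in the target code.
            resolved = version
    return resolved
```

If the target version is already the highest version in the feature library, the loop inspects only that version, which corresponds to the case in which no further narrowing is needed.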
The identification device obtains the hash values corresponding to the invalid features of the murano software and compares them with the hash values of the files corresponding to the target code. The hash values of the invalid features of the murano software ("context.py" and "__init__.py") equal the hash values of "context.py" and "__init__.py" in the target code; that is, all invalid features of the murano software match the target code. The identification device can therefore further determine that the target code has not modified "context.py" or "__init__.py", and that all files corresponding to the target code completely match version 2.0.2 of the murano software.
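The invalid feature check can be sketched as follows. The function name `check_invalid_features` and the path-to-hash layout of both arguments are assumptions made for illustration. Invalid features whose hash value is absent from the target code are reported as modified, which, under this sketch, also covers files that were deleted.

```python
from typing import Dict, Set, Tuple


def check_invalid_features(
    target_files: Dict[str, str],
    invalid_features: Dict[str, str],
) -> Tuple[Set[str], Set[str]]:
    """Split the invalid features into (unmodified, modified) relative to the target code."""
    target_hashes = set(target_files.values())
    unmodified = {
        path for path, file_hash in invalid_features.items() if file_hash in target_hashes
    }
    modified = set(invalid_features) - unmodified
    return unmodified, modified
```

When the second set is empty, every invalid feature matches, which is the situation described in the example for "context.py" and "__init__.py".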
The open source software identification method in the embodiments of the present application is described above; the identification apparatus is described below. The identification apparatus may be any computer device, such as a personal computer (PC), a portable computer, or a server that provides an open source software identification service externally. Referring to fig. 5, an embodiment of the identification apparatus in the embodiments of the present application includes:
a first obtaining module 501, configured to obtain a target code to be identified;
a first determining module 502, configured to match feature information of open source software in an open source software feature library with a target code, and determine target feature information matched with the target code;
a second determining module 503, configured to determine, according to the target feature information, open-source software information corresponding to the target code;
wherein the feature information includes at least one of: endpoint feature information, bifurcation point feature information, invalid feature information; the endpoint feature information comprises file information corresponding to the initial version of the open source software and/or file information newly added in a non-initial version of the open source software relative to the previous version; the bifurcation point feature information comprises file information of a non-initial version of the open source software modified relative to the previous version; the invalid feature information comprises file information of the open source software that is unmodified from the initial version to the current latest version.
It should be understood that the flow executed by each module of the identification apparatus corresponding to fig. 5 is similar to the flow of the method described in the embodiment shown in fig. 1, and is not repeated here.
In this embodiment of the application, the first determining module 502 matches feature information of open source software in the open source software feature library with the target code, and the second determining module 503 then identifies the open source software information corresponding to the target code according to the target feature information that matches the target code. The feature information may include at least one of: endpoint feature information, bifurcation point feature information, and invalid feature information. The endpoint feature information refers to file information corresponding to the initial version of the open source software and file information newly added in a non-initial version of the open source software relative to the previous version; the bifurcation point feature information refers to file information modified in a non-initial version relative to the previous version; and the invalid feature information refers to file information that is unmodified from the initial version to the current latest version of the open source software. The identification device can identify the open source software used by the target code, and its version, from this feature information without performing a full comparison, which improves identification efficiency.
For ease of understanding, the following describes the identification apparatus in the present application in detail, and referring to fig. 6, another embodiment of the identification apparatus in the present application includes:
a first obtaining module 601, configured to obtain a target code to be identified;
a first determining module 602, configured to match feature information of open source software in an open source software feature library with a target code, and determine target feature information matched with the target code;
a second determining module 603, configured to determine, according to the target feature information, open-source software information corresponding to the target code;
the feature information includes at least one of: endpoint feature information, bifurcation point feature information, invalid feature information; the end point characteristic information comprises file information corresponding to the initial version of the open source software and/or file information which is newly added to the non-initial version of the open source software relative to the previous version; the bifurcation point characteristic information comprises file information of a non-initial version of the open-source software modified relative to a previous version; the invalid characteristic information comprises unmodified file information of the open source software from the initial version to the current latest version;
the first determining module 602 includes:
a first obtaining unit 6021, configured to obtain endpoint feature information of each open source software in the open source software feature library;
a first determining unit 6022, configured to determine target file information corresponding to the target code;
a first judging unit 6023, configured to judge whether there is target endpoint feature information of the first open-source software corresponding to the target file information in the open-source software feature library;
a second determining unit 6024 configured to determine that the target endpoint feature information matches the target code when the first judging unit determines that the target endpoint feature information exists;
correspondingly, the second determining module 603 includes:
a third determining unit 6031, configured to determine, according to the target endpoint feature information, that the information of the first open-source software is open-source software information corresponding to the target code.
As an optional manner, the identification apparatus may further include:
a third determining module 604, configured to determine a version number of the first open-source software corresponding to each endpoint characteristic information in the target endpoint characteristic information;
a fourth determining module 605, configured to determine a target version number in the version numbers of the first open source software, where the target version number is a highest version number in the version numbers of the first open source software; determining whether the target version number is the highest version number of all the version numbers corresponding to the first open source software in the open source software feature library, and if so, determining that the target version number is the version number of the open source software corresponding to the target code;
a second obtaining module 606, configured to obtain, if the target version number is not the highest version number of all version numbers corresponding to the first open-source software in the open-source software feature library, the target version number of the first open-source software in the open-source software feature library and the bifurcation feature information corresponding to the version number after the target version number;
a first judging module 607, configured to judge whether target bifurcation point feature information matching the target file information exists in the bifurcation point feature information corresponding to the target version number of the first open source software and the version numbers subsequent to the target version number;
a fifth determining module 608, configured to determine, when the first judging module determines that the target bifurcation point feature information exists, that the version number of the first open source software corresponding to the target bifurcation point feature information is the version number of the open source software corresponding to the target code.
As an optional manner, the identification apparatus may further include:
a third obtaining module 609, configured to obtain invalid feature information of the first open source software in the open source software feature library;
a second judging module 610, configured to judge whether target invalid feature information matching the target file information exists in the invalid feature information of the first open source software;
a sixth determining module 611, configured to determine, when the second judging module determines that the target invalid feature information exists, that the invalid feature information of the first open source software other than the target invalid feature information is modified in the target code.
It should be understood that the flow executed by each module in the above-mentioned identification apparatus corresponding to fig. 6 is similar to the flow of the method described in the embodiment shown in fig. 2, and is not repeated here.
In this embodiment of the application, the first determining module 602 matches feature information of open source software in the open source software feature library with the target code, and the second determining module 603 then identifies the open source software information corresponding to the target code according to the target feature information that matches the target code. The feature information may include at least one of: endpoint feature information, bifurcation point feature information, and invalid feature information. The endpoint feature information refers to file information corresponding to the initial version of the open source software and file information newly added in a non-initial version of the open source software relative to the previous version; the bifurcation point feature information refers to file information modified in a non-initial version relative to the previous version; and the invalid feature information refers to file information that is unmodified from the initial version to the current latest version of the open source software. The identification device can identify the open source software used by the target code, and its version, from this feature information without performing a full comparison, which improves identification efficiency.
In addition, in this embodiment, the identification device provides one way of identifying the open source software corresponding to the target code and its version, which improves the realizability of the scheme.
For ease of understanding, the identification apparatus in the present application is described in detail below, and referring to fig. 7, another embodiment of the identification apparatus in the present application includes:
a first obtaining module 701, configured to obtain a target code to be identified;
a first determining module 702, configured to match feature information of open source software in an open source software feature library with a target code, and determine target feature information matched with the target code;
a second determining module 703, configured to determine, according to the target feature information, open-source software information corresponding to the target code;
the feature information includes at least one of: endpoint feature information, bifurcation point feature information, invalid feature information; the end point characteristic information comprises file information corresponding to the initial version of the open source software and/or file information which is added to the non-initial version of the open source software relative to the previous version; the bifurcation point characteristic information comprises file information of a non-initial version of the open-source software modified relative to a previous version; the invalid characteristic information comprises unmodified file information of the open source software from the initial version to the current latest version;
the first determining module 702 includes:
a second obtaining unit 7021, configured to obtain branch point feature information of each open source software in the open source software feature library;
a fourth determining unit 7022, configured to determine target file information corresponding to the target code;
a second judging unit 7023, configured to judge whether target bifurcation point feature information matching the target file information exists in the bifurcation point feature information of each open source software;
a fifth determining unit 7024, configured to determine that the target bifurcation point feature information matches the target code when the second judging unit determines that the target bifurcation point feature information exists;
the second determining module 703 includes:
a sixth determining unit 7031, configured to determine that the first open source software to which the target bifurcation point feature information belongs, and the version number of the first open source software corresponding to the target bifurcation point feature information, correspond to the target code.
In this embodiment, the first determining module 702 matches feature information of open source software in the open source software feature library with the target code, and the second determining module 703 then identifies the open source software information corresponding to the target code according to the target feature information that matches the target code. The feature information may include at least one of: endpoint feature information, bifurcation point feature information, and invalid feature information. The endpoint feature information refers to file information corresponding to the initial version of the open source software and file information newly added in a non-initial version of the open source software relative to the previous version; the bifurcation point feature information refers to file information modified in a non-initial version relative to the previous version; and the invalid feature information refers to file information that is unmodified from the initial version to the current latest version of the open source software. The identification device can identify the open source software used by the target code, and its version, from this feature information without performing a full comparison, which improves identification efficiency.
In addition, in the embodiment, the identification device provides another way for identifying the open source software and the version thereof corresponding to the target code, so that the flexibility of the scheme is improved.
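The identification path of the fig. 7 embodiment, which matches bifurcation point features directly, can be sketched as follows. The function name `identify_by_bifurcation`, the index layout, and the sample values are assumptions made for illustration; because a bifurcation point feature belongs to one specific version, a single match yields both the software and its version number.

```python
from typing import Dict, List, Set, Tuple


def identify_by_bifurcation(
    target_hashes: Set[str],
    bifurcation_index: Dict[str, Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the sorted (software, version) pairs whose bifurcation features match."""
    matches = {
        bifurcation_index[file_hash]
        for file_hash in target_hashes
        if file_hash in bifurcation_index
    }
    return sorted(matches)
```

How to choose among several returned pairs is not covered by this sketch; a caller could, for example, prefer the highest version per software.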
The identification apparatus in the present application has been described above from the perspective of functional modules; it is described below from the perspective of physical hardware. Please refer to fig. 8, which is a schematic structural diagram of the identification apparatus in the present application. The identification apparatus 80 may include an input device 810, an output device 820, a processor 830, and a memory 840.
Memory 840 may include both read-only memory and random access memory and provides instructions and data to processor 830. A portion of the Memory 840 may also include Non-Volatile Random Access Memory (NVRAM).
The memory 840 stores the following elements, executable modules or data structures, or a subset or an extended set thereof:
Operating instructions: including various operating instructions for performing various operations.
Operating system: including various system programs for implementing various basic services and for handling hardware-based tasks.
In this embodiment, the processor 830 is configured to: acquire a target code to be identified, match feature information in an open source software feature library with the target code, determine target feature information matching the target code, and determine open source software information corresponding to the target code according to the target feature information;
wherein the characteristic information includes at least one of: endpoint feature information, bifurcation point feature information, invalid feature information; the end point characteristic information comprises file information corresponding to the initial version of the open source software and/or file information which is added to the non-initial version of the open source software relative to the previous version; the bifurcation point characteristic information comprises file information of a non-initial version of the open-source software modified relative to a previous version; the invalid feature information comprises unmodified file information of the open source software from the initial version to the current latest version.
As an optional manner, when determining the target feature information, the processor 830 is specifically configured to: determine target file information corresponding to the target code, acquire endpoint feature information of each open source software in the open source software feature library, judge whether target endpoint feature information matching the target file information exists in the endpoint feature information, and if so, determine that the target endpoint feature information is the target feature information matching the target code;
correspondingly, when determining the open source software information corresponding to the target code, the processor 830 is specifically configured to: determine the information of the open source software (first open source software) to which the target endpoint feature information belongs as the open source software information corresponding to the target code.
As an optional manner, the processor 830 is further configured to: determine the version number of the first open source software corresponding to each piece of endpoint feature information in the target endpoint feature information, determine the highest of these version numbers as the target version number, then acquire the bifurcation point feature information corresponding to the target version number of the first open source software in the open source software feature library and to the version numbers after the target version number, judge whether target bifurcation point feature information matching the target file information exists in that bifurcation point feature information, and if so, determine the version number corresponding to the target bifurcation point feature information as the version number of the open source software corresponding to the target code.
As an optional manner, the processor 830 is further configured to: acquire invalid feature information of the first open source software in the open source software feature library, judge whether target invalid feature information matching the target file information exists in the invalid feature information of the first open source software, and if so, determine that the invalid feature information of the first open source software other than the target invalid feature information is modified in the target code.
As an optional manner, when determining the target feature information, the processor 830 is specifically configured to: determine target file information corresponding to the target code, acquire bifurcation point feature information of each open source software in the open source software feature library, judge whether target bifurcation point feature information matching the target file information exists in the bifurcation point feature information, and if so, determine that the target bifurcation point feature information is the target feature information matching the target code;
correspondingly, when determining the open source software information corresponding to the target code, the processor 830 is specifically configured to: determine that the first open source software to which the target bifurcation point feature information belongs, and the version number of the first open source software corresponding to the target bifurcation point feature information, correspond to the target code; that is, the open source software corresponding to the target code is the first open source software, and the version number of the open source software corresponding to the target code is the version number corresponding to the target bifurcation point feature information.
The processor 830 controls the operation of the identification apparatus 80; the processor 830 may also be referred to as a central processing unit (CPU). The memory 840 may include read-only memory and random access memory, and provides instructions and data to the processor 830. A portion of the memory 840 may also include NVRAM. In a specific application, the components of the identification apparatus 80 are coupled together by a bus system 850, which may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of illustration, however, the various buses are all labeled as the bus system 850 in the figure.
The method disclosed in the embodiments of the present application may be applied to the processor 830 or implemented by the processor 830. The processor 830 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 830 or by instructions in the form of software. The processor 830 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, or a register. The storage medium is located in the memory 840, and the processor 830 reads the information in the memory 840 and performs the steps of the above method in combination with its hardware.
The related description of fig. 8 can be understood by referring to the related description and effects of the method portion of fig. 2, and will not be described in detail herein.
Embodiments of the present application further provide a computer storage medium for storing computer software instructions for the above-mentioned identification apparatus.
The embodiment of the present application further provides a computer program product, where the computer program product includes computer software instructions, and the computer software instructions may be loaded by a processor to implement the flow in the open source software identification method shown in fig. 1 and fig. 2.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. An open source software identification method, comprising:
acquiring a target code to be identified;
matching feature information of open source software in an open source software feature library with the target code, and determining target feature information matched with the target code, wherein the feature information comprises at least one of the following items: endpoint feature information, bifurcation point feature information, and invalid feature information; the endpoint feature information comprises file information corresponding to an initial version of the open source software and/or file information added in a non-initial version of the open source software relative to a previous version; the bifurcation point feature information comprises file information modified in a non-initial version of the open source software relative to a previous version; the invalid feature information comprises file information of the open source software that is unmodified from the initial version to a current latest version;
and determining open source software information corresponding to the target code according to the target feature information, wherein the open source software information comprises a version number, a name, a file directory, a file size or a hash value of a file of the open source software.
2. The method of claim 1, wherein matching feature information of open source software in an open source software feature library with the target code comprises:
acquiring endpoint feature information of each open source software in the open source software feature library, and determining target file information corresponding to the target code;
judging whether the open source software feature library has target endpoint feature information of first open source software corresponding to the target file information;
if yes, determining that the target endpoint feature information is matched with the target code;
the determining the open source software information corresponding to the target code according to the target feature information comprises:
determining, according to the target endpoint feature information, the information of the first open source software as the open source software information corresponding to the target code.
3. The method of claim 2, wherein after determining that the target endpoint feature information matches the target code, the method further comprises:
determining the version number of the first open source software corresponding to each piece of endpoint feature information in the target endpoint feature information;
determining a target version number among the version numbers of the first open source software, wherein the target version number is the highest version number among the version numbers of the first open source software corresponding to each piece of endpoint feature information in the target endpoint feature information;
determining whether the target version number is the highest version number among all version numbers corresponding to the first open source software in the open source software feature library;
if so, determining the target version number as the version number of the open source software corresponding to the target code;
if not, acquiring, from the open source software feature library, the bifurcation point feature information corresponding to the target version number of the first open source software and to the version numbers after the target version number;
judging whether target bifurcation point feature information matched with the target file information exists in the bifurcation point feature information corresponding to the target version number of the first open source software and to the version numbers after the target version number;
and if so, determining that the version number of the first open source software corresponding to the target bifurcation point feature information is the version number of the open source software corresponding to the target code.
4. The method of claim 2 or 3, wherein after determining that the target endpoint feature information matches the target code, the method further comprises:
acquiring invalid feature information of the first open source software in the open source software feature library;
judging whether target invalid feature information matched with the target file information exists in the invalid feature information of the first open source software;
if yes, determining that the invalid feature information of the first open source software other than the target invalid feature information is modified in the target code.
5. The method of claim 1, wherein matching feature information of open source software in an open source software feature library with the target code comprises:
obtaining bifurcation point feature information of each open source software in the open source software feature library, and determining target file information corresponding to the target code;
judging whether target bifurcation point feature information matched with the target file information exists in the bifurcation point feature information of each open source software;
if yes, determining that the target bifurcation point feature information is matched with the target code;
the determining the open source software information corresponding to the target code according to the target feature information comprises:
determining that the first open source software corresponding to the target bifurcation point feature information, and the version number of the first open source software corresponding to the target bifurcation point feature information, correspond to the target code.
6. An identification device, comprising: a processor and a memory;
the memory is used for storing programs;
the processor is configured to execute the program to perform the following steps:
acquiring a target code to be identified;
matching feature information of open source software in an open source software feature library with the target code, and determining target feature information matched with the target code, wherein the feature information comprises at least one of the following items: endpoint feature information, bifurcation point feature information, and invalid feature information; the endpoint feature information comprises file information corresponding to an initial version of the open source software and/or file information added in a non-initial version of the open source software relative to a previous version; the bifurcation point feature information comprises file information modified in a non-initial version of the open source software relative to a previous version; the invalid feature information comprises file information of the open source software that is unmodified from the initial version to a current latest version;
determining open source software information corresponding to the target code according to the target feature information, wherein the open source software information comprises a version number, a name, a file directory, a file size or a hash value of a file of the open source software.
7. The device of claim 6, wherein matching the feature information of the open source software in the open source software feature library with the target code comprises:
acquiring endpoint feature information of each open source software in the open source software feature library, and determining target file information corresponding to the target code;
judging whether the open source software feature library has target endpoint feature information of first open source software corresponding to the target file information;
if yes, determining that the target endpoint feature information is matched with the target code;
the determining the open source software information corresponding to the target code according to the target feature information comprises:
determining, according to the target endpoint feature information, the information of the first open source software as the open source software information corresponding to the target code.
8. The device of claim 7, wherein after determining that the target endpoint feature information matches the target code, the steps further comprise:
determining the version number of the first open source software corresponding to each piece of endpoint feature information in the target endpoint feature information;
determining a target version number among the version numbers of the first open source software, wherein the target version number is the highest version number among the version numbers of the first open source software corresponding to each piece of endpoint feature information in the target endpoint feature information;
determining whether the target version number is the highest version number among all version numbers corresponding to the first open source software in the open source software feature library;
if so, determining the target version number as the version number of the open source software corresponding to the target code;
if not, acquiring, from the open source software feature library, the bifurcation point feature information corresponding to the target version number of the first open source software and to the version numbers after the target version number;
judging whether target bifurcation point feature information matched with the target file information exists in the bifurcation point feature information corresponding to the target version number of the first open source software and to the version numbers after the target version number;
and if so, determining the version number of the first open source software corresponding to the target bifurcation point feature information as the version number of the open source software corresponding to the target code.
9. The device of claim 7 or 8, wherein after determining that the target endpoint feature information matches the target code, the steps further comprise:
acquiring invalid feature information of the first open source software in the open source software feature library;
judging whether target invalid feature information matched with the target file information exists in the invalid feature information of the first open source software;
if yes, determining that the invalid feature information of the first open source software other than the target invalid feature information is modified in the target code.
10. The device of claim 6, wherein matching the feature information of the open source software in the open source software feature library with the target code comprises:
obtaining bifurcation point feature information of each open source software in the open source software feature library, and determining target file information corresponding to the target code;
judging whether target bifurcation point feature information matched with the target file information exists in the bifurcation point feature information of each open source software;
if yes, determining that the target bifurcation point feature information is matched with the target code;
the determining the open source software information corresponding to the target code according to the target feature information comprises:
determining that the first open source software corresponding to the target bifurcation point feature information, and the version number of the first open source software corresponding to the target bifurcation point feature information, correspond to the target code.
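
As a rough illustration of the matching flow recited in claims 1 to 4 (and mirrored in device claims 6 to 9), the following Python sketch builds one feature library entry and identifies the version of a target code tree. It is not the implementation described in this application. It assumes that file information is a (path, hash) pair, that endpoint feature information covers every file of the initial version plus files added later, and that the highest matching version is chosen when several bifurcation point features match. The names SoftwareFeatures, build_features and identify, as well as the libdemo data, are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical representation of "file information": (relative path, content hash).
FileInfo = tuple[str, str]


@dataclass
class SoftwareFeatures:
    name: str
    versions: list[str]  # version numbers, ordered oldest to newest
    endpoint: dict[FileInfo, int] = field(default_factory=dict)     # file info -> version index
    bifurcation: dict[FileInfo, int] = field(default_factory=dict)  # file info -> version index
    invalid: set[FileInfo] = field(default_factory=set)


def build_features(name, releases):
    """Build one feature library entry from [(version, {path: hash}), ...], oldest first."""
    feats = SoftwareFeatures(name, [version for version, _ in releases])
    initial = releases[0][1]
    for info in initial.items():
        feats.endpoint[info] = 0          # every file of the initial version
    unchanged = set(initial.items())
    for idx in range(1, len(releases)):
        prev, cur = releases[idx - 1][1], releases[idx][1]
        for path, digest in cur.items():
            if path not in prev:
                feats.endpoint.setdefault((path, digest), idx)     # file added in this version
            elif prev[path] != digest:
                feats.bifurcation.setdefault((path, digest), idx)  # file modified in this version
        unchanged &= set(cur.items())
    feats.invalid = unchanged             # identical from the initial to the latest version
    return feats


def identify(target_files, library):
    """Return (name, version, modified_invalid) for the first matching entry, else None."""
    target = set(target_files.items())
    for feats in library:
        hits = [idx for info, idx in feats.endpoint.items() if info in target]
        if not hits:
            continue                      # no endpoint feature matches this software
        version_idx = max(hits)           # highest version among matched endpoint features
        if version_idx < len(feats.versions) - 1:
            # Not the latest known version: check bifurcation point features of this
            # version and later ones. Taking the highest match is an assumption.
            later = [idx for info, idx in feats.bifurcation.items()
                     if idx >= version_idx and info in target]
            if later:
                version_idx = max(later)
        matched_invalid = feats.invalid & target
        # Invalid features absent from the target are treated as locally modified,
        # but only when at least one invalid feature did match.
        modified = feats.invalid - matched_invalid if matched_invalid else set()
        return feats.name, feats.versions[version_idx], modified
    return None


# Made-up release history for a made-up project.
library = [build_features("libdemo", [
    ("1.0", {"LICENSE": "h0", "a.c": "h1", "b.c": "h2"}),
    ("1.1", {"LICENSE": "h0", "a.c": "h1", "b.c": "h3", "c.c": "h4"}),
    ("1.2", {"LICENSE": "h0", "a.c": "h1", "b.c": "h5", "c.c": "h4"}),
])]

# Target code that vendors version 1.1 and patches a.c locally.
result = identify({"LICENSE": "h0", "a.c": "hX", "b.c": "h3", "c.c": "h4"}, library)
print(result)
```

In this made-up history, the added file c.c gives an endpoint match at version 1.1, the modified b.c confirms 1.1 through a bifurcation point feature, and the never-changed a.c is reported as locally modified because its hash no longer matches the invalid feature information. The direct bifurcation point lookup of claims 5 and 10 would skip the endpoint step and search the bifurcation features of every library entry instead.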
CN202210801205.4A 2017-12-28 2017-12-28 Open source software identification method and device Pending CN115357897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210801205.4A CN115357897A (en) 2017-12-28 2017-12-28 Open source software identification method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210801205.4A CN115357897A (en) 2017-12-28 2017-12-28 Open source software identification method and device
CN201711463010.9A CN109977675B (en) 2017-12-28 2017-12-28 Open source software identification method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201711463010.9A Division CN109977675B (en) 2017-12-28 2017-12-28 Open source software identification method and device

Publications (1)

Publication Number Publication Date
CN115357897A true CN115357897A (en) 2022-11-18

Family

ID=67075000

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201711463010.9A Active CN109977675B (en) 2017-12-28 2017-12-28 Open source software identification method and device
CN202210801205.4A Pending CN115357897A (en) 2017-12-28 2017-12-28 Open source software identification method and device

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201711463010.9A Active CN109977675B (en) 2017-12-28 2017-12-28 Open source software identification method and device

Country Status (1)

Country Link
CN (2) CN109977675B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659502B (en) * 2019-09-05 2021-09-28 中国科学院软件研究所 Project version detection method and system based on text information incidence relation analysis
CN112965741A (en) * 2021-02-10 2021-06-15 中国工商银行股份有限公司 Method and device for identifying changed program
CN113360178B (en) * 2021-05-31 2023-05-05 东风商用车有限公司 Method, device and equipment for generating unique software identification code and readable storage medium

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9207933B2 (en) * 2006-10-10 2015-12-08 International Business Machines Corporation Identifying authors of changes between multiple versions of a file
EP1918839A1 (en) * 2006-11-03 2008-05-07 Siemens Aktiengesellschaft Modification of a software version of a control device software for a control device and identification of such a modification
US20080209399A1 (en) * 2007-02-27 2008-08-28 Michael Bonnet Methods and systems for tracking and auditing intellectual property in packages of open source software
US8359655B1 (en) * 2008-10-03 2013-01-22 Pham Andrew T Software code analysis and classification system and method
US8479161B2 (en) * 2009-03-18 2013-07-02 Oracle International Corporation System and method for performing software due diligence using a binary scan engine and parallel pattern matching
CN101751473A (en) * 2009-12-31 2010-06-23 中兴通讯股份有限公司 Method for searching, updating, and synchronizing modification record entries, and data synchronization device
US8498982B1 (en) * 2010-07-07 2013-07-30 Openlogic, Inc. Noise reduction for content matching analysis results for protectable content
CN101901160B (en) * 2010-08-11 2015-06-03 中兴通讯股份有限公司 Packing method and device of version upgrading software package
US9104796B2 (en) * 2012-12-21 2015-08-11 International Business Machines Corporation Correlation of source code with system dump information
CN103559449B (en) * 2013-11-15 2016-09-21 华为技术有限公司 Method and device for detecting code changes
CN104035772B (en) * 2014-06-09 2017-11-14 中国科学院软件研究所 Source code multi version function calling relationship otherness identification method based on static analysis
CN105446723B (en) * 2014-09-02 2018-11-23 国际商业机器公司 Method and apparatus for identifying the semantic differential between source code version
US9639350B2 (en) * 2014-12-15 2017-05-02 Red Hat, Inc. Tagging non-upstream source code
US9436463B2 (en) * 2015-01-12 2016-09-06 WhiteSource Ltd. System and method for checking open source usage
US10289532B2 (en) * 2015-04-08 2019-05-14 Opshub, Inc. Method and system for providing delta code coverage information
CN106815135B (en) * 2015-11-30 2021-04-06 阿里巴巴集团控股有限公司 Vulnerability detection method and device
CN106446691B (en) * 2016-11-24 2019-07-05 工业和信息化部电信研究院 Method and apparatus for detecting vulnerabilities of open source projects integrated or customized in software

Also Published As

Publication number Publication date
CN109977675B (en) 2022-08-16
CN109977675A (en) 2019-07-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination