CN111666101A - Software homologous analysis method and device - Google Patents
Software homologous analysis method and device
- Publication number
- CN111666101A, CN202010335325.0A
- Authority
- CN
- China
- Prior art keywords
- source code
- file
- homologous
- code file
- software
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/74—Reverse engineering; Extracting design information from source code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/75—Structural analysis for program understanding
- G06F8/751—Code clone detection
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Stored Programmes (AREA)
Abstract
Embodiments of the invention provide a software homology analysis method and device. The method comprises: obtaining a source code database that includes the characteristic information and the creation time of each reference source code file; acquiring a target source code file of target software, the target source code file having its own characteristic information; determining alternative homologous files of the target source code file by matching the characteristic information of the reference source code files against the characteristic information of the target source code file; taking the alternative homologous file with the earliest creation time as the final homologous file corresponding to the target source code file; and determining a software homology analysis result according to the final homologous file. The method addresses the analysis errors caused by software propagation and improves the precision of software homology analysis.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a software homology analysis method and device.
Background
With the development of software technology, copying and referencing of software have become common. At the code level, copying a section of code and applying it in a new scenario, with or without modification, is a form of code reuse called code cloning; at the software level, copying and referencing of a software system is called software homology. Software homology analysis examines the source code and functions of the software under test to find identical or similar software and thereby obtain a homology analysis result.
In the software homology analysis method in the prior art, homologous source code files are directly searched, all the homologous source code files are collected to analyze parameter information, and a software homology analysis result is obtained.
The prior art cannot eliminate the analysis errors introduced by software propagation. Software propagation means that the reference relations among software are not one-to-one but one-to-many or many-to-one, and that these reference relations are transitive. Software propagation makes homology analysis difficult. For example, if item A references item C and item B also references item C, then items A and B share a part of the code that comes from item C. If the existing software homology analysis method is then used to analyze item A, similar components will be detected in every item that directly or indirectly references item C, and if all of these items are regarded as sources of the code components of item A, the homology analysis result will contain a large error.
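The propagation problem described above can be illustrated with a small sketch. The reference graph below is hypothetical: items A, B and C come from the example in the text, while item D and the function name `projects_referencing` are assumptions added for illustration only. The sketch collects every item that directly or indirectly references item C, which is the set a naive homology search could report as possible sources.

```python
from collections import deque

# Hypothetical reference graph: each item maps to the items it copies code from.
references = {
    "A": {"C"},
    "B": {"C"},
    "D": {"B"},  # assumed extra item: D references B, so it indirectly references C
    "C": set(),
}


def projects_referencing(origin, references):
    """Return every item that directly or indirectly references `origin`."""
    # Invert the graph so that each item maps to the items that copy from it.
    referenced_by = {name: set() for name in references}
    for project, sources in references.items():
        for source in sources:
            referenced_by.setdefault(source, set()).add(project)

    # Breadth-first walk over the inverted graph to follow transitive references.
    seen, queue = set(), deque([origin])
    while queue:
        current = queue.popleft()
        for dependant in referenced_by.get(current, ()):
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return seen


carriers = projects_referencing("C", references)
```

Every item in `carriers` contains code that originated in item C, so a search that treats all matches as sources would attribute the code of item A to several items rather than to C alone.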
Disclosure of Invention
Embodiments of the present invention provide a method and apparatus for software homology analysis that overcome the above-mentioned problems, or at least partially solve the above-mentioned problems.
In a first aspect, an embodiment of the present invention provides a software homology analysis method, including: obtaining a source code database, the source code database comprising: the characteristic information of the reference source code file and the creation time of the reference source code file; acquiring a target source code file of target software, wherein the target source code file comprises: characteristic information of the target source code file; determining alternative homologous files of the target source code file based on the matching result of the characteristic information of the reference source code file and the characteristic information of the target source code file; taking the corresponding alternative homologous file with the earliest creation time as a final homologous file corresponding to the target source code file; and determining a software homology analysis result according to the final homology file.
In some embodiments, determining the alternative homologous files of the target source code file based on the matching result of the feature information of the reference source code file and the feature information of the target source code file includes: determining a first type of alternative homologous files of the target source code file based on a matching result of the original MD5 characteristic information of the reference source code file and the original MD5 characteristic information of the target source code file; determining a second type of alternative homologous files of the target source code file based on a matching result of the comment-stripped, whitespace-stripped MD5 characteristic information of the reference source code file and the comment-stripped, whitespace-stripped MD5 characteristic information of the target source code file; and determining the alternative homologous files according to the first type alternative homologous files and the second type alternative homologous files.
In some embodiments, the comment-stripped, whitespace-stripped MD5 feature information is generated according to the comment rules of the different programming languages.
In some embodiments, the source code database further comprises: the project name of the reference source code file and the version information of the reference source code file; determining a software homology analysis result according to the final homology file, wherein the determining comprises the following steps: and determining a software homology analysis result according to the creation time of the final homologous file, the project name of the final homologous file and the version information of the final homologous file.
In some embodiments, obtaining the source code database comprises: acquiring the one million top-ranked open source projects on GitHub; extracting the feature information, creation time, project name and version information of the open source projects; taking the source code files in the open source projects as the reference source code files; and constructing the source code database based on the reference source code files.
In some embodiments, determining the software homology analysis result according to the final homologous file includes: the target software comprises a plurality of target source code files; the final homologous files corresponding to the plurality of target source code files are collected, and a software homology analysis result is determined.
In a second aspect, an embodiment of the present invention provides a software homology analysis apparatus, including: a source code database obtaining unit, configured to obtain a source code database, the source code database including the characteristic information of the reference source code file and the creation time of the reference source code file; a target source code file obtaining unit, configured to obtain a target source code file of target software, the target source code file including characteristic information of the target source code file; an alternative homologous file determining unit, configured to determine an alternative homologous file of the target source code file based on a matching result of the characteristic information of the reference source code file and the characteristic information of the target source code file; a final homologous file determining unit, configured to use the candidate homologous file with the earliest creation time as the final homologous file corresponding to the target source code file; and a software homology analysis result determining unit, configured to determine a software homology analysis result according to the final homologous file.
In some embodiments, the alternative homologous file determining unit includes: a first candidate homologous file determining subunit, configured to determine a first type of candidate homologous file of the target source code file based on a matching result of the original MD5 feature information of the reference source code file and the original MD5 feature information of the target source code file; a second alternative homologous file determining subunit, configured to determine a second type of alternative homologous file of the target source code file based on a matching result of the comment-stripped, whitespace-stripped MD5 characteristic information of the reference source code file and the comment-stripped, whitespace-stripped MD5 characteristic information of the target source code file; and a third candidate homologous file determining subunit, configured to determine the candidate homologous file according to the first class of candidate homologous files and the second class of candidate homologous files.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the software homology analysis method provided in any one of the possible implementation schemes of the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the software homology analysis method provided in any one of the possible implementations of the first aspect.
According to the software homologous analysis method, the software homologous analysis device, the electronic device and the non-transitory computer readable storage medium, the alternative homologous file with the earliest corresponding creation time is used as the final homologous file corresponding to the target source code file and further used as the object of software homologous analysis, analysis result errors caused by software propagation are solved in the software homologous analysis process, and the precision of software homologous analysis is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of a software homology analysis method according to an embodiment of the present invention;
FIG. 2 is a flowchart of determining alternative files for software homology analysis according to an embodiment of the present invention;
FIG. 3 is a flowchart of constructing a source code database according to the software homology analysis method of the embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a software homology analysis device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an alternative homologous file determining unit of the software homologous analysis apparatus according to the embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The software homology analysis method according to the embodiment of the present invention is described below with reference to fig. 1 to 3.
As shown in fig. 1, the software homology analysis method according to the embodiment of the present invention includes the following steps S100 to S500.
Step S100, obtaining a source code database, wherein the source code database comprises: the characteristic information of the reference source code file and the creation time of the reference source code file.
It can be understood that the source code database is obtained by extracting reference source code files from a large number of open source projects, and serves as a comparison library for software homology analysis, the source code database includes characteristic information of each reference source code file, the characteristic information is used for providing a mark for each reference source code file, and serves as a basis for retrieval or matching, and the source code database also includes creation time of each reference source code file.
Step S200, obtaining a target source code file of the target software, wherein the target source code file comprises: characteristic information of the target source code file.
It is understood that the target software is the software to be subjected to homology analysis. Target source code files are extracted according to file name suffix, such as .c, .cpp, .java, .js, .php and .py, and each has characteristic information that serves as a marker for the file.
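The suffix-based extraction described above could look like the following minimal sketch. The suffix set is taken from the examples in the text; the function names `is_source_file` and `select_source_files` and the sample paths are assumptions for illustration.

```python
from pathlib import PurePath

# Suffixes listed as examples in the text; a real deployment may use a longer list.
SOURCE_SUFFIXES = {".c", ".cpp", ".java", ".js", ".php", ".py"}


def is_source_file(name):
    """Return True if the file name ends with one of the recognised suffixes."""
    return PurePath(name).suffix.lower() in SOURCE_SUFFIXES


def select_source_files(paths):
    """Keep only the paths that look like source code files, preserving order."""
    return [path for path in paths if is_source_file(path)]


# Hypothetical file listing of a target software package.
listing = ["src/main.c", "README.md", "app/Util.java", "assets/logo.png", "tools/build.PY"]
targets = select_source_files(listing)
```

Lower-casing the suffix is a design choice of this sketch so that names such as `build.PY` are still recognised; the text does not say whether matching is case-sensitive.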
Step S300, determining alternative homologous files of the target source code file based on the matching result of the characteristic information of the reference source code file and the characteristic information of the target source code file.
It should be noted that the source code database may contain reference source code files whose characteristic information matches that of the target source code file. The characteristic information of the target source code file is used to search the source code database, and each matching reference source code file is taken as a candidate homologous file of the target source code file.
Step S400, taking the alternative homologous file with the earliest creation time as the final homologous file corresponding to the target source code file.
It should be noted that step S300 may match multiple candidate homologous files and that, owing to software propagation, these candidates may directly or indirectly reference one another. A time tracing algorithm is therefore applied: the candidate with the earliest creation time is taken as the final homologous file corresponding to the target source code file, which eliminates the error caused by multiple references.
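The time tracing step can be sketched as a minimum over creation times. This is a minimal illustration, not the patent's implementation: the record layout (a dict with hypothetical `project` and `created` keys), the function name `select_final_homolog` and the sample dates are all assumptions.

```python
from datetime import datetime


def select_final_homolog(candidates):
    """Return the candidate with the earliest creation time, or None if there are none."""
    if not candidates:
        return None
    return min(candidates, key=lambda record: record["created"])


# Hypothetical candidates: items A and B both copied a file that first appeared in item C.
candidates = [
    {"project": "A", "created": datetime(2018, 5, 1)},
    {"project": "B", "created": datetime(2019, 2, 14)},
    {"project": "C", "created": datetime(2016, 9, 30)},
]
final = select_final_homolog(candidates)
```

Because item C has the earliest creation time, it is selected as the final homologous file and the later copies in A and B are discarded.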
Step S500, determining a software homology analysis result according to the final homologous file.
That is, the software homology analysis result is determined according to the final homologous file, namely the matched reference source code file with the earliest creation time.
In practical applications, the source code database further includes the project name of the reference source code file and the version information of the reference source code file. Accordingly, determining the software homology analysis result according to the final homologous file includes: determining the software homology analysis result according to the creation time of the final homologous file, the project name of the final homologous file and the version information of the final homologous file.
It can be understood that the final homologous file of the target source code file carries parameters such as creation time, project name and version information. Once the final homologous file has been obtained in step S400, these parameters are analyzed further, and the creation time, project name and version information of the final homologous file are presented as text or as a table to serve as the software homology analysis result.
According to the embodiment of the invention, the corresponding alternative homologous file with the earliest creation time is used as the final homologous file corresponding to the target source code file and further used as the object of the software homologous analysis, so that the analysis result error caused by software propagation is solved and the precision of the software homologous analysis is improved in the software homologous analysis process.
As shown in fig. 2, in some embodiments, step S300, determining alternative homologous files of the target source code file based on the matching result of the characteristic information of the reference source code file and the characteristic information of the target source code file, includes steps S310 to S330.
Step S310, a first type of alternative homologous files of the target source code file are determined based on the matching result of the original MD5 characteristic information of the reference source code file and the original MD5 characteristic information of the target source code file.
It should be noted that MD5 (Message-Digest Algorithm 5) is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value and is commonly used to check the integrity of transmitted messages. The original MD5 feature information of the target source code file is used to search the source code database, and each matching reference source code file is taken as a first type candidate homologous file.
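As a minimal sketch of step S310, the original MD5 value can be computed over the unmodified file bytes with Python's standard `hashlib` module and used as a lookup key. The in-memory dictionary `raw_index` and the function names are assumptions for illustration; the text does not specify how the database is stored or queried.

```python
import hashlib


def raw_md5(data: bytes) -> str:
    """Return the 128-bit MD5 digest of the unmodified file bytes as 32 hex characters."""
    return hashlib.md5(data).hexdigest()


def first_type_candidates(target_bytes, raw_index):
    """Look up reference files whose original MD5 equals that of the target file."""
    return raw_index.get(raw_md5(target_bytes), [])


# Hypothetical index: original MD5 digest -> reference source code files with that digest.
reference_bytes = b"int add(int a, int b) { return a + b; }\n"
raw_index = {raw_md5(reference_bytes): ["C/src/util.c", "A/vendor/util.c"]}

matches = first_type_candidates(reference_bytes, raw_index)
misses = first_type_candidates(b"int sub(int a, int b) { return a - b; }\n", raw_index)
```

An original MD5 match only finds byte-for-byte identical files; any edit, even to a comment, produces a different digest, which is what motivates the second type of matching.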
Step S320, determining a second type of alternative homologous files of the target source code file based on the matching result of the comment-stripped, whitespace-stripped MD5 characteristic information of the reference source code file and the comment-stripped, whitespace-stripped MD5 characteristic information of the target source code file.
The comment-stripped, whitespace-stripped MD5 characteristic information is generated by removing the whitespace characters and comments from the code file, according to the comment rules of its programming language, and computing the MD5 value of the result.
The comment-stripped, whitespace-stripped MD5 characteristic information of the target source code file is used to search the source code database, and each matching reference source code file is taken as a second type alternative homologous file.
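The normalisation behind the second type of matching can be sketched as follows. The regular expressions are simplified assumptions: they cover `/* ... */`, `//` and `#` comments per suffix but do not recognise comment markers inside string literals, so a production implementation would need a proper tokenizer for each language. The names `COMMENT_PATTERNS` and `normalized_md5` are hypothetical.

```python
import hashlib
import re

# Simplified comment rules per file suffix (assumption: string literals are ignored).
C_STYLE = r"/\*.*?\*/|//[^\n]*"
COMMENT_PATTERNS = {
    ".c": C_STYLE,
    ".cpp": C_STYLE,
    ".java": C_STYLE,
    ".js": C_STYLE,
    ".php": C_STYLE + r"|#[^\n]*",
    ".py": r"#[^\n]*",
}


def normalized_md5(source: str, suffix: str) -> str:
    """MD5 of the source text after removing comments and all whitespace characters."""
    pattern = COMMENT_PATTERNS.get(suffix)
    if pattern:
        # DOTALL lets block comments span several lines.
        source = re.sub(pattern, "", source, flags=re.DOTALL)
    source = re.sub(r"\s+", "", source)
    return hashlib.md5(source.encode("utf-8")).hexdigest()


# Two files that differ only in comments and spacing.
original = "int x = 1; // counter\n"
reformatted = "/* header */\nint  x=1;\n"
same = normalized_md5(original, ".c") == normalized_md5(reformatted, ".c")
different = normalized_md5(original, ".c") == normalized_md5("int y = 1;\n", ".c")
```

Both sample files reduce to the same normalised text, so they share a digest even though their original MD5 values differ.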
Step S330, determining alternative homologous files according to the first type alternative homologous files and the second type alternative homologous files.
It can be understood that, on the basis that the first type of candidate homologous file is obtained in step S310 and the second type of candidate homologous file is obtained in step S320, the first type of candidate homologous file and the second type of candidate homologous file are merged to obtain the candidate homologous file.
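Merging the two result sets amounts to a union that keeps each reference file once. The sketch below assumes each match is a dict with a hypothetical `path` key that uniquely identifies a reference source code file; the text does not state how duplicates are identified.

```python
def merge_candidates(first_type, second_type):
    """Union of both match lists, keeping the first record seen for each reference file."""
    merged = {}
    for record in list(first_type) + list(second_type):
        merged.setdefault(record["path"], record)
    return list(merged.values())


# Hypothetical matches: C/src/util.c is found by both the original and the normalised MD5.
first_type = [{"path": "C/src/util.c", "match": "original"}]
second_type = [
    {"path": "C/src/util.c", "match": "normalised"},
    {"path": "B/lib/util.c", "match": "normalised"},
]
alternatives = merge_candidates(first_type, second_type)
```

A byte-identical file also matches after normalisation, so the second list will usually contain the first; de-duplicating keeps such files from being counted twice.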
According to the embodiment of the invention, the source code database is searched separately with the original MD5 characteristic information and with the comment-stripped, whitespace-stripped MD5 characteristic information, so that the search is more accurate and the software homology analysis is more precise.
As shown in fig. 3, in some embodiments, step S100, obtaining a source code database, includes the following steps S110 to S140.
Step S110, acquiring the one million top-ranked open source projects on GitHub.
It is worth mentioning that GitHub is a hosting platform for open source and private software projects; it takes its name from Git, the only version repository format it supports for hosting. Users can easily find a massive amount of open source code on GitHub. Here, the one million top-ranked open source projects on GitHub are obtained.
Step S120, extracting the characteristic information, the creation time, the project name and the version information of the open source projects.
It will be appreciated that each open source project on GitHub has characteristic information, a creation time, a project name and version information, and these parameters are extracted here.
Step S130, taking the source code files in the open source projects as the reference source code files.
Step S140, constructing the source code database based on the reference source code files.
It is understood that the source code files of the open source projects on GitHub serve as the reference source code files mentioned in step S100, and the source code database is constructed on the basis of these reference source code files.
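Steps S120 to S140 can be sketched as building an index from MD5 digest to reference file records. This is a minimal in-memory illustration under stated assumptions: the record keys (`project`, `version`, `created`, `path`, `content`), the function name `build_source_code_database` and the sample data are hypothetical, and only the original MD5 is indexed to keep the example short.

```python
import hashlib
from collections import defaultdict
from datetime import datetime


def build_source_code_database(reference_files):
    """Index reference source code files by the MD5 digest of their content."""
    database = defaultdict(list)
    for ref in reference_files:
        digest = hashlib.md5(ref["content"]).hexdigest()
        # Store the parameters needed later: project name, version and creation time.
        database[digest].append({
            "project": ref["project"],
            "version": ref["version"],
            "created": ref["created"],
            "path": ref["path"],
        })
    return dict(database)


# Hypothetical reference files: item A vendors an unchanged copy of a file from item C.
shared = b"int add(int a, int b) { return a + b; }\n"
reference_files = [
    {"project": "C", "version": "1.0", "created": datetime(2016, 9, 30),
     "path": "src/util.c", "content": shared},
    {"project": "A", "version": "2.3", "created": datetime(2018, 5, 1),
     "path": "vendor/util.c", "content": shared},
]
database = build_source_code_database(reference_files)
```

Grouping records under one digest means a single lookup returns every project that contains the same file, which is the input the time tracing step needs.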
According to the embodiment of the invention, the open source projects are acquired from GitHub, so that the source code database is more convenient and quicker to build, the resulting database is richer, and the software homology analysis is more precise.
In some embodiments, step S500, determining the software homology analysis result according to the final homologous file, includes: the target software comprises a plurality of target source code files; the final homologous files corresponding to the plurality of target source code files are collected, and a software homology analysis result is determined.
It can be understood that the target software is composed of multiple target source code files and that, on the basis of the above embodiment, each target source code file corresponds to one final homologous file. The final homologous files corresponding to the multiple target source code files are summarized here, and their relevant parameters are presented as text or as a table to serve as the software homology analysis result.
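The summarizing step can be sketched as collecting one row per target source code file and counting how many files trace back to each project. The row layout, the per-project count and the function name `summarize` are assumptions for illustration; the text only says that the parameters are presented as text or as a table.

```python
from collections import Counter
from datetime import datetime


def summarize(final_by_target):
    """Build table rows (target, project, version, created) and a per-project file count."""
    rows = []
    for target, final in sorted(final_by_target.items()):
        if final is None:  # no homologous file was found for this target file
            continue
        rows.append((target, final["project"], final["version"], final["created"]))
    per_project = Counter(row[1] for row in rows)
    return rows, per_project


# Hypothetical final homologous files for three target source code files.
final_by_target = {
    "src/util.c": {"project": "C", "version": "1.0", "created": datetime(2016, 9, 30)},
    "src/json.c": {"project": "C", "version": "1.0", "created": datetime(2016, 9, 30)},
    "src/main.c": None,
}
rows, per_project = summarize(final_by_target)
```

The rows can be printed directly as a table, and `per_project` gives a quick view of which upstream projects contribute the most files to the target software.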
According to the embodiment of the invention, the final homologous files of the target software can be completely displayed by summarizing the final homologous files corresponding to the multiple sections of target source code files, so that the information quantity presented by the software homologous analysis result is richer.
To verify the technical effect of the embodiment of the invention, 7 open source projects on GitHub were selected, and software homology analysis was carried out both with and without the time tracing algorithm; the characteristic information used was the original MD5 characteristic information and the MD5 characteristic information computed after removing comments and whitespace characters. The comparison of results is shown in Table 1:
TABLE 1
The software homology analysis device provided by the embodiment of the invention is described below with reference to fig. 4 and 5, and the software homology analysis device described below and the software homology analysis method described above may be referred to correspondingly.
As shown in fig. 4, the software homology analysis apparatus according to the embodiment of the present invention includes a source code database obtaining unit 410, a target source code file obtaining unit 420, an alternative homology file determining unit 430, a final homology file determining unit 440, and a software homology analysis result determining unit 450.
A source code database obtaining unit 410, configured to obtain a source code database, where the source code database includes: the characteristic information of the reference source code file and the creation time of the reference source code file.
A target source code file obtaining unit 420, configured to obtain a target source code file of the target software, where the target source code file includes: characteristic information of the target source code file.
And an alternative homologous file determining unit 430, configured to determine an alternative homologous file of the target source code file based on a matching result of the feature information of the reference source code file and the feature information of the target source code file.
And a final homologous file determining unit 440, configured to use the candidate homologous file with the earliest creation time as the final homologous file corresponding to the target source code file.
And the software homology analysis result determining unit 450 is configured to determine a software homology analysis result according to the final homology file.
The software homology analysis device provided by the embodiment of the invention is used for executing the software homology analysis method, and the specific implementation mode of the software homology analysis device is consistent with the implementation mode of the method, which is not described herein again.
As shown in fig. 5, in some embodiments, the alternative homology file determining unit 430 in the software homology analyzing apparatus includes: a first alternative homologous file determining sub-unit 431, a second alternative homologous file determining sub-unit 432, and a third alternative homologous file determining sub-unit 433.
And the first alternative homologous file determining subunit 431 is used for determining a first type alternative homologous file of the target source code file based on the matching result of the original MD5 characteristic information of the reference source code file and the original MD5 characteristic information of the target source code file.
A second alternative homologous file determining subunit 432, configured to determine a second type of alternative homologous file of the target source code file based on a matching result of the comment-stripped, whitespace-stripped MD5 characteristic information of the reference source code file and the comment-stripped, whitespace-stripped MD5 characteristic information of the target source code file.
The third candidate homologous file determining subunit 433 is configured to determine a candidate homologous file according to the first type of candidate homologous file and the second type of candidate homologous file.
The software homology analysis device provided by the embodiment of the invention is used for executing the software homology analysis method, and the specific implementation mode of the software homology analysis device is consistent with the implementation mode of the method, which is not described herein again.
FIG. 6 illustrates the physical structure of an electronic device. As shown in FIG. 6, the electronic device may include a processor 610, a communication interface 620, a memory 630 and a communication bus 640, wherein the processor 610, the communication interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a software homology analysis method, the method comprising: obtaining a source code database, the source code database comprising: the characteristic information of the reference source code file and the creation time of the reference source code file; acquiring a target source code file of target software, wherein the target source code file comprises: characteristic information of the target source code file; determining alternative homologous files of the target source code file based on the matching result of the characteristic information of the reference source code file and the characteristic information of the target source code file; taking the alternative homologous file with the earliest creation time as the final homologous file corresponding to the target source code file; and determining the software homology analysis result according to the final homologous file.
It should be noted that, when being implemented specifically, the electronic device in this embodiment may be a server, a PC, or other devices, as long as the structure includes the processor 610, the communication interface 620, the memory 630, and the communication bus 640 shown in fig. 6, where the processor 610, the communication interface 620, and the memory 630 complete mutual communication through the communication bus 640, and the processor 610 may call the logic instruction in the memory 630 to execute the above method. The embodiment does not limit the specific implementation form of the electronic device.
In addition, when the logic instructions in the memory 630 are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Further, an embodiment of the present invention discloses a computer program product, the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, when the program instructions are executed by a computer, the computer can execute the software homology analysis method provided by the above method embodiments, the method includes: obtaining a source code database, the source code database comprising: the characteristic information of the reference source code file and the creation time of the reference source code file; acquiring a target source code file of target software, wherein the target source code file comprises: characteristic information of the target source code file; determining alternative homologous files of the target source code file based on the matching result of the characteristic information of the reference source code file and the characteristic information of the target source code file; taking the corresponding alternative homologous file with the earliest creation time as a final homologous file corresponding to the target source code file; and determining the software homology analysis result according to the final homology file.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program performs the software homology analysis method provided in the foregoing embodiments, the method including: obtaining a source code database, the source code database comprising: the characteristic information of the reference source code file and the creation time of the reference source code file; acquiring a target source code file of target software, wherein the target source code file comprises: characteristic information of the target source code file; determining alternative homologous files of the target source code file based on the matching result of the characteristic information of the reference source code file and the characteristic information of the target source code file; taking the alternative homologous file with the earliest creation time as the final homologous file corresponding to the target source code file; and determining the software homology analysis result according to the final homologous file.
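The method steps restated above can be illustrated with a minimal Python sketch. It is not the implementation disclosed in this application: the `ReferenceFile` record, the function names `build_index`, `final_homologous_file` and `analyze`, and the sample paths and feature strings are all hypothetical, and the characteristic information is assumed to be a single string compared by exact equality.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ReferenceFile:
    """One reference source code file in the database (hypothetical schema)."""
    path: str
    feature: str       # characteristic information, assumed to be one string
    created: datetime  # creation time of the reference source code file


def build_index(database: List[ReferenceFile]) -> Dict[str, List[ReferenceFile]]:
    """Group reference files by feature so that matching is a dictionary lookup."""
    index: Dict[str, List[ReferenceFile]] = {}
    for ref in database:
        index.setdefault(ref.feature, []).append(ref)
    return index


def final_homologous_file(
    index: Dict[str, List[ReferenceFile]], target_feature: str
) -> Optional[ReferenceFile]:
    """Return the matching reference file with the earliest creation time."""
    candidates = index.get(target_feature, [])  # alternative homologous files
    if not candidates:
        return None
    return min(candidates, key=lambda ref: ref.created)


def analyze(
    index: Dict[str, List[ReferenceFile]], target_features: Dict[str, str]
) -> Dict[str, Optional[ReferenceFile]]:
    """Map each target file path to its final homologous file, if any."""
    return {
        path: final_homologous_file(index, feature)
        for path, feature in target_features.items()
    }


# Illustrative data only: two copies of the same file and one unrelated file.
database = [
    ReferenceFile("fork/util.c", "abc", datetime(2019, 5, 1)),
    ReferenceFile("upstream/util.c", "abc", datetime(2015, 3, 9)),
    ReferenceFile("other/io.c", "def", datetime(2012, 1, 1)),
]
index = build_index(database)
result = analyze(index, {"src/util.c": "abc", "src/new.c": "zzz"})
for path, origin in result.items():
    print(path, "->", origin.path if origin else "no homologous file found")
```

Selecting the minimum creation time among the matches is what attributes a file that has been copied across several projects to its earliest known appearance rather than to a later copy.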
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A software homology analysis method is characterized by comprising the following steps:
obtaining a source code database, the source code database comprising: the characteristic information of the reference source code file and the creation time of the reference source code file;
acquiring a target source code file of target software, wherein the target source code file comprises: characteristic information of the target source code file;
determining alternative homologous files of the target source code file based on the matching result of the characteristic information of the reference source code file and the characteristic information of the target source code file;
taking the corresponding alternative homologous file with the earliest creation time as a final homologous file corresponding to the target source code file;
and determining a software homology analysis result according to the final homologous file.
2. The software homology analysis method according to claim 1,
the determining, based on a matching result of the feature information of the reference source code file and the feature information of the target source code file, an alternative homologous file of the target source code file includes:
determining a first type of alternative homologous files of the target source code file based on a matching result of the original MD5 characteristic information of the reference source code file and the original MD5 characteristic information of the target source code file;
determining a second type of alternative homologous files of the target source code file based on a matching result of the comment-stripped, whitespace-stripped MD5 characteristic information of the reference source code file and the comment-stripped, whitespace-stripped MD5 characteristic information of the target source code file;
and determining the alternative homologous files according to the first type alternative homologous file and the second type alternative homologous file.
3. The software homology analysis method according to claim 2,
the comment-stripped, whitespace-stripped MD5 characteristic information is generated according to the comment rules of different programming languages.
4. The software homology analysis method according to claim 1,
the source code database further comprises: the project name of the reference source code file and the version information of the reference source code file;
the determining a software homology analysis result according to the final homologous file comprises:
and determining a software homology analysis result according to the creation time of the final homologous file, the project name of the final homologous file and the version information of the final homologous file.
5. The software homology analysis method according to claim 4,
the acquiring of the source code database comprises:
acquiring the one million top-ranked open source projects on GitHub;
extracting feature information, creation time, project name and version information of the open source project;
taking a source code file in the open source project as the reference source code file;
and constructing the source code database based on the reference source code file.
6. The software homology analysis method according to any one of claims 1 to 5,
the determining a software homology analysis result according to the final homologous file comprises:
the target software comprises a plurality of target source code files; the final homologous files corresponding to the plurality of target source code files are aggregated, and the software homology analysis result is determined from them.
7. A software homology analysis apparatus, comprising:
a source code database acquisition unit configured to acquire a source code database, the source code database including: the characteristic information of the reference source code file and the creation time of the reference source code file;
a target source code file obtaining unit, configured to obtain a target source code file of target software, where the target source code file includes: characteristic information of the target source code file;
the alternative homologous file determining unit is used for determining an alternative homologous file of the target source code file based on a matching result of the characteristic information of the reference source code file and the characteristic information of the target source code file;
a final homologous file determining unit, configured to use the candidate homologous file with the earliest creation time as a final homologous file corresponding to the target source code file;
and the software homologous analysis result determining unit is used for determining a software homologous analysis result according to the final homologous file.
8. The software homology analysis device according to claim 7, wherein the alternative homology file determination unit comprises:
a first candidate homologous file determining subunit, configured to determine a first type of candidate homologous file of the target source code file based on a matching result of the original MD5 feature information of the reference source code file and the original MD5 feature information of the target source code file;
a second alternative homologous file determining subunit, configured to determine a second type of alternative homologous file of the target source code file based on a matching result of the comment-stripped, whitespace-stripped MD5 characteristic information of the reference source code file and the comment-stripped, whitespace-stripped MD5 characteristic information of the target source code file;
a third candidate homologous file determining subunit, configured to determine the candidate homologous file according to the first class of candidate homologous files and the second class of candidate homologous files.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the software homology analysis method according to any one of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the software homology analysis method according to any one of claims 1 to 7.
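As an illustration of the two MD5 features recited in claims 2 and 3 (one over the original file content, one computed after comments and whitespace are removed), the following Python sketch may help. It rests on several assumptions not stated in the claims: the `COMMENT_PATTERNS` table, the function names, and the index layout are hypothetical; the regular expressions are deliberately simplified and do not handle comment markers inside string literals; and the two classes of alternative homologous files are combined by set union, which is only one possible reading of "according to the first type and the second type".

```python
import hashlib
import re
from typing import Dict, Set

# Hypothetical, simplified comment rules keyed by file extension.
COMMENT_PATTERNS = {
    ".c": re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL),
    ".java": re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL),
    ".py": re.compile(r"#[^\n]*"),
}


def raw_md5(text: str) -> str:
    """MD5 of the file content exactly as written."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def normalized_md5(text: str, extension: str) -> str:
    """MD5 after removing comments (per language rule) and all whitespace."""
    pattern = COMMENT_PATTERNS.get(extension)
    if pattern is not None:
        text = pattern.sub("", text)
    text = re.sub(r"\s+", "", text)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def candidate_files(
    target_text: str,
    extension: str,
    raw_index: Dict[str, Set[str]],
    normalized_index: Dict[str, Set[str]],
) -> Set[str]:
    """Union of exact matches and matches that differ only in comments or layout."""
    first = raw_index.get(raw_md5(target_text), set())
    second = normalized_index.get(normalized_md5(target_text, extension), set())
    return set(first) | set(second)


# Illustrative inputs: the same function, reformatted and with comments added.
original = "int add(int a, int b) { return a + b; }\n"
reformatted = "/* copied */\nint add(int a,int b)\n{\n    return a + b; // sum\n}\n"

raw_index = {raw_md5(original): {"upstream/add.c"}}
normalized_index = {normalized_md5(original, ".c"): {"upstream/add.c", "fork/add.c"}}
print(candidate_files(reformatted, ".c", raw_index, normalized_index))
```

The raw MD5 catches verbatim copies, while the normalized MD5 still matches a copy whose comments or formatting were changed. A production comment stripper would need a real tokenizer for each language.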
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010335325.0A CN111666101A (en) | 2020-04-24 | 2020-04-24 | Software homologous analysis method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010335325.0A CN111666101A (en) | 2020-04-24 | 2020-04-24 | Software homologous analysis method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111666101A true CN111666101A (en) | 2020-09-15 |
Family
ID=72382867
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010335325.0A Pending CN111666101A (en) | 2020-04-24 | 2020-04-24 | Software homologous analysis method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111666101A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112579155A (en) * | 2021-02-23 | 2021-03-30 | 北京北大软件工程股份有限公司 | Code similarity detection method and device and storage medium |
CN114385231A (en) * | 2021-12-20 | 2022-04-22 | 杭州安恒信息安全技术有限公司 | Data processing method, data processing device, storage medium and electronic equipment |
CN115238102A (en) * | 2022-06-28 | 2022-10-25 | 北京关键科技股份有限公司 | Code data feature extraction and retrieval method and device |
CN115686623A (en) * | 2022-11-03 | 2023-02-03 | 苏州棱镜七彩信息科技有限公司 | Homologous detection method of closed-source software |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160103831A1 (en) * | 2014-10-14 | 2016-04-14 | Adobe Systems Incorporated | Detecting homologies in encrypted and unencrypted documents using fuzzy hashing |
CN106990956A (en) * | 2017-03-10 | 2017-07-28 | 苏州棱镜七彩信息科技有限公司 | Code file clone's detection method based on suffix tree |
CN108229170A (en) * | 2018-02-02 | 2018-06-29 | 中科软评科技(北京)有限公司 | Utilize big data and the software analysis method and device of neural network |
CN109710299A (en) * | 2018-12-14 | 2019-05-03 | 平安普惠企业管理有限公司 | A kind of open source class libraries monitoring method, device, equipment and computer storage medium |
CN110334248A (en) * | 2019-06-26 | 2019-10-15 | 京东数字科技控股有限公司 | A kind of system configuration information treating method and apparatus |
Non-Patent Citations (1)
Title |
---|
李锁 等: "基于代码克隆检测的代码来源分析方法", 《计算机应用与软件》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112579155A (en) * | 2021-02-23 | 2021-03-30 | 北京北大软件工程股份有限公司 | Code similarity detection method and device and storage medium |
CN114385231A (en) * | 2021-12-20 | 2022-04-22 | 杭州安恒信息安全技术有限公司 | Data processing method, data processing device, storage medium and electronic equipment |
CN114385231B (en) * | 2021-12-20 | 2024-05-28 | 杭州安恒信息安全技术有限公司 | Data processing method and device, storage medium and electronic equipment |
CN115238102A (en) * | 2022-06-28 | 2022-10-25 | 北京关键科技股份有限公司 | Code data feature extraction and retrieval method and device |
CN115686623A (en) * | 2022-11-03 | 2023-02-03 | 苏州棱镜七彩信息科技有限公司 | Homologous detection method of closed-source software |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111666101A (en) | Software homologous analysis method and device | |
AU2017101864A4 (en) | Method, device, server and storage apparatus of reviewing SQL | |
CN111507086B (en) | Automatic discovery of translated text locations in localized applications | |
CN108228231B (en) | Visualization drifting method of Git warehouse file annotation system | |
CN110474900B (en) | Game protocol testing method and device | |
CN110851209A (en) | Data processing method and device, electronic equipment and storage medium | |
CN111435367B (en) | Knowledge graph construction method, system, equipment and storage medium | |
US20150186195A1 (en) | Method of analysis application object which computer-executable, server performing the same and storage media storing the same | |
CN111930610B (en) | Software homology detection method, device, equipment and storage medium | |
CN112559526A (en) | Data table export method and device, computer equipment and storage medium | |
JP2017049639A (en) | Evaluation program, procedure manual evaluation method, and evaluation device | |
CN112612810A (en) | Slow SQL statement identification method and system | |
CN117093556A (en) | Log classification method, device, computer equipment and computer readable storage medium | |
JP2018133044A (en) | Webapi execution flow generation device and webapi execution flow generation method | |
CN117033309A (en) | Data conversion method and device, electronic equipment and readable storage medium | |
CN111078671A (en) | Method, device, equipment and medium for modifying data table field | |
JP2006023968A (en) | Unique expression extracting method and device and program to be used for the same | |
KR102153674B1 (en) | A method for classifying sql query, a method for detecting abnormal occurrence, and a computing device | |
JP2016057715A (en) | Graphic type program analyzer | |
CN114816518A (en) | Simhash-based open source component screening and identifying method and system in source code | |
CN110110280B (en) | Curve integral calculation method, device and equipment for coordinates and storage medium | |
CN114579580A (en) | Data storage method and data query method and device | |
CN112540820A (en) | User interface updating method and device and electronic equipment | |
KR20200118965A (en) | A method for classifying sql query, a method for detecting abnormal occurrence, and a computing device | |
CN115718696B (en) | Source code cryptography misuse detection method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200915 |