CN115016795A - Code similarity detection method and device, processor and electronic equipment - Google Patents

Info

Publication number
CN115016795A
Authority
CN
China
Prior art keywords
code
data
unit test
dimension
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210770078.6A
Other languages
Chinese (zh)
Inventor
严海伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210770078.6A
Publication of CN115016795A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G06F8/43: Checking; Contextual analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a code similarity detection method and device, a processor and electronic equipment, and relates to the field of financial technology or other related fields. The method comprises the following steps: acquiring a first code and a second code, wherein the first code and the second code are the codes whose similarity is to be detected; determining a first unit test corresponding to the first code, a second unit test corresponding to the second code, a plurality of dimensions for detecting the similarity between the first code and the second code, and the weight of each dimension; executing the first unit test and the second unit test under each dimension to obtain a plurality of first similarities, wherein each first similarity represents the similarity between the first code and the second code under one of the dimensions; and obtaining the target similarity between the first code and the second code according to the plurality of first similarities and the weight of each dimension. The method and the device address the problem that code similarity detection in the related art performs poorly.

Description

Code similarity detection method and device, processor and electronic equipment
Technical Field
The application relates to the field of financial technology, in particular to a code similarity detection method and device, a processor and electronic equipment.
Background
At present, in a software project, a newly implemented function method (code) is typically introduced and used in only a small number of classes. However, similar methods already exist in the project, which causes code redundancy, so the similar methods need to be replaced.
In the related art, methods for detecting similar function codes mainly include the following three types:
(1) the text-based detection method comprises the steps of preprocessing a code block, removing spaces and the like, and converting codes into characters for comparison;
(2) the lexical (token-based) detection method parses the code into a token sequence and then compares the token sequences, wherein common detection algorithms comprise LCS (longest common subsequence), suffix tree matching and the like;
(3) the syntax-based detection method constructs an abstract syntax tree by lexical and syntactic analysis of the code, and then compares the same or similar subtrees.
However, the first and second methods cannot recognize information such as the syntax and semantics of the program, which results in low detection accuracy. The third method incurs a high cost to construct the syntax tree, and as the scale of the detected code grows, its time and space complexity also grow, so that the detection efficiency is low.
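To make the lexical approach of the related art concrete, the following is a minimal sketch of a token-based comparison using LCS. The tokenizer regular expression, the function names, and the choice to normalise by the longer token sequence are illustrative assumptions, not details taken from the application.

```python
import re


def tokenize(code: str) -> list[str]:
    """Split source text into identifiers, numbers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def token_similarity(code_a: str, code_b: str) -> float:
    """LCS length divided by the longer token sequence (an assumed normalisation)."""
    tokens_a, tokens_b = tokenize(code_a), tokenize(code_b)
    if not tokens_a and not tokens_b:
        return 1.0
    return lcs_length(tokens_a, tokens_b) / max(len(tokens_a), len(tokens_b))
```

Because such a comparison only sees token order, two methods that behave identically but are written differently can still score low, which is the accuracy limitation described above.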
No effective solution has yet been proposed for the problem that code similarity detection in the related art performs poorly.
Disclosure of Invention
The present application mainly aims to provide a method and an apparatus for detecting code similarity, a processor and an electronic device, so as to solve the problem of poor effect of detecting code similarity in the related art.
In order to achieve the above object, according to one aspect of the present application, there is provided a method of detecting code similarity. The method comprises the following steps: acquiring a first code and a second code, wherein the first code and the second code are the codes whose similarity is to be detected; determining a first unit test corresponding to the first code, a second unit test corresponding to the second code, a plurality of dimensions for detecting the similarity between the first code and the second code, and the weight of each dimension; executing the first unit test and the second unit test under each dimension to obtain a plurality of first similarities, wherein each first similarity represents the similarity between the first code and the second code under one of the dimensions; and obtaining the target similarity between the first code and the second code according to the plurality of first similarities and the weight of each dimension.
Further, under each dimension, respectively executing the first unit test and the second unit test, and obtaining a plurality of first similarities includes: executing the first unit test under each dimension to obtain a plurality of first data; converting each first data in the plurality of first data to obtain a plurality of first vectors; executing the second unit test under each dimension to obtain a plurality of second data; performing conversion processing on each second data in the plurality of second data to obtain a plurality of second vectors; and obtaining a plurality of first similarities according to each first vector in the plurality of first vectors and each second vector in the plurality of second vectors.
Further, in each dimension, performing the first unit test to obtain a plurality of first data includes: executing the first unit test under a first dimension of the plurality of dimensions to obtain third data, wherein the third data is used for representing information of parameters of the first code; executing the first unit test in a second dimension of the plurality of dimensions to obtain fourth data, wherein the fourth data is used for representing target information of a target method, the target method is a method in the first unit test and the second unit test, and the target information is at least one of the following: information of the parameters passed into the target method, and an expected result obtained by using the target method; executing the first unit test in a third dimension of the plurality of dimensions to obtain fifth data, wherein the fifth data is used for indicating whether an abnormal condition exists in the process of executing the first unit test; executing the first unit test under a fourth dimension of the plurality of dimensions to obtain sixth data, wherein the sixth data is used for representing a result obtained by executing the first unit test; executing the first unit test in a fifth dimension of the plurality of dimensions to obtain seventh data, wherein the seventh data is used for representing the coverage rate of the first code in the process of executing the first unit test; summarizing the third data, the fourth data, the fifth data, the sixth data and the seventh data to obtain the plurality of first data.
Further, in each dimension, performing the second unit test to obtain a plurality of second data includes: executing the second unit test under the first dimension to obtain eighth data, wherein the eighth data is used for representing information of parameters of the second code; executing the second unit test in the second dimension to obtain ninth data, wherein the ninth data is used for representing the target information of the target method; executing the second unit test in the third dimension to obtain tenth data, wherein the tenth data is used for indicating whether an abnormal condition exists in the process of executing the second unit test; executing the second unit test under the fourth dimension to obtain eleventh data, wherein the eleventh data is used for representing a result obtained by executing the second unit test; executing the second unit test under the fifth dimension to obtain twelfth data, wherein the twelfth data is used for representing the coverage rate of covering the second code in the process of executing the second unit test; summarizing the eighth data, the ninth data, the tenth data, the eleventh data and the twelfth data to obtain the plurality of second data.
Further, obtaining the first code and the second code comprises: determining the code under development in the target project, and taking the code under development as the first code; acquiring a first number of calls and a first return value of the first code when the first code runs in the target project; determining a third code in the target project whose return value is the same as the first return value; acquiring a second number of calls of the third code in the running process; and determining the third code as the second code according to the first number of calls and the second number of calls.
Further, determining the third code as the second code according to the first number of calls and the second number of calls comprises: calculating a difference value between the first calling times and the second calling times; judging whether the difference value is larger than a first preset value or not; and if the difference value is larger than the first preset value, determining that the third code is used as the second code.
Further, after obtaining the target similarity between the first code and the second code according to the plurality of first similarities and the weight of each dimension, the method further includes: judging whether the target similarity is smaller than a second preset value or not; and if the target similarity is smaller than the second preset value, performing replacement processing on the second code.
In order to achieve the above object, according to another aspect of the present application, there is provided a code similarity detecting apparatus. The device includes: the device comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring a first code and a second code, and the first code and the second code are respectively codes of similarity to be detected; a first determining unit, configured to determine a first unit test corresponding to the first code, a second unit test corresponding to the second code, multiple dimensions for detecting similarity between the first code and the second code, and a weight of each dimension, respectively; a first execution unit, configured to execute the first unit test and the second unit test respectively in each dimension to obtain a plurality of first similarities, where each first similarity is used to represent a similarity between the first code and the second code in each dimension; a second determining unit, configured to obtain a target similarity between the first code and the second code according to the multiple first similarities and the weight of each dimension.
Further, the first execution unit includes: the first execution module is used for executing the first unit test under each dimension to obtain a plurality of first data; the first processing module is used for carrying out conversion processing on each first data in the plurality of first data to obtain a plurality of first vectors; the second execution module is used for executing the second unit test under each dimension to obtain a plurality of second data; the second processing module is used for performing conversion processing on each second data in the plurality of second data to obtain a plurality of second vectors; a first determining module, configured to obtain a plurality of first similarities according to each of the plurality of first vectors and each of the plurality of second vectors.
Further, the first execution module includes: a first execution submodule, configured to execute the first unit test in a first dimension of the multiple dimensions, to obtain third data, where the third data is used to represent information of a parameter of the first code; a second execution submodule, configured to execute the first unit test in a second dimension of the multiple dimensions to obtain fourth data, where the fourth data is used to represent target information of a target method, the target method is a method in the first unit test and the second unit test, and the target information is at least one of: information of the parameters passed into the target method, and an expected result obtained by using the target method; a third execution submodule, configured to execute the first unit test in a third dimension of the multiple dimensions to obtain fifth data, where the fifth data is used to indicate whether an abnormal condition exists in a process of executing the first unit test; a fourth execution submodule, configured to execute the first unit test in a fourth dimension of the multiple dimensions, to obtain sixth data, where the sixth data is used to indicate a result obtained by executing the first unit test; a fifth execution submodule, configured to execute the first unit test in a fifth dimension of the multiple dimensions, so as to obtain seventh data, where the seventh data is used to indicate a coverage rate of the first code in a process of executing the first unit test; a first summarizing submodule, configured to summarize the third data, the fourth data, the fifth data, the sixth data, and the seventh data to obtain the plurality of first data.
Further, the second execution module includes: a sixth execution submodule, configured to execute the second unit test in the first dimension to obtain eighth data, where the eighth data is used to represent information of a parameter of the second code; a seventh execution submodule, configured to execute the second unit test in the second dimension to obtain ninth data, where the ninth data is used to represent the target information of the target method; an eighth execution submodule, configured to execute the second unit test in the third dimension to obtain tenth data, where the tenth data is used to indicate whether an abnormal condition exists in a process of executing the second unit test; a ninth execution submodule, configured to execute the second unit test in the fourth dimension to obtain eleventh data, where the eleventh data is used to indicate a result obtained by executing the second unit test; a tenth execution submodule, configured to execute the second unit test in the fifth dimension to obtain twelfth data, where the twelfth data is used to indicate a coverage rate of the second code in a process of executing the second unit test; a second summarizing submodule, configured to summarize the eighth data, the ninth data, the tenth data, the eleventh data, and the twelfth data to obtain the plurality of second data.
Further, the first acquisition unit includes: a second determining module, configured to determine the code under development in the target project and take the code under development as the first code; a second obtaining module, configured to obtain a first number of calls and a first return value of the first code when the first code runs in the target project; a third determining module, configured to determine a third code in the target project whose return value is the same as the first return value; a third obtaining module, configured to obtain a second number of calls of the third code in the running process; and a fourth determining module, configured to determine the third code as the second code according to the first number of calls and the second number of calls.
Further, the fourth determining module includes: the first calculation submodule is used for calculating the difference value between the first calling times and the second calling times; the first judgment submodule is used for judging whether the difference value is larger than a first preset value or not; and the first determining submodule is used for determining the third code as the second code if the difference value is greater than the first preset value.
Further, the apparatus further comprises: a first judging unit, configured to judge whether or not a target similarity between the first code and the second code is smaller than a second preset value after obtaining the target similarity according to the plurality of first similarities and the weight of each dimension; and the first processing unit is used for carrying out replacement processing on the second code if the target similarity is smaller than the second preset value.
In order to achieve the above object, according to another aspect of the present application, there is provided a processor configured to execute a program, where the program executes a method for detecting code similarity according to any one of the above methods.
In order to achieve the above object, according to another aspect of the present application, there is provided an electronic device including one or more processors and a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the code similarity detection method according to any one of the above.
Through the application, the following steps are adopted: acquiring a first code and a second code, wherein the first code and the second code are codes of similarity to be detected respectively; respectively determining a first unit test corresponding to the first code, a second unit test corresponding to the second code, a plurality of dimensions for detecting the similarity between the first code and the second code and the weight of each dimension; respectively executing a first unit test and a second unit test under each dimension to obtain a plurality of first similarities, wherein each first similarity is used for representing the similarity between the first code and the second code under each dimension; according to the multiple first similarities and the weight of each dimension, the target similarity between the first code and the second code is obtained, and the problem that the effect of detecting the code similarity in the related technology is poor is solved. The similarity between the first code and the second code is obtained according to the similarity and the weight of each dimension, so that the effect of detecting the similarity of the codes can be improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flowchart of a method for detecting code similarity according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a device for detecting similarity of codes according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an electronic device provided according to an embodiment of the application.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that relevant information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data that are authorized by the user or sufficiently authorized by various parties. For example, an interface is provided between the system and the relevant user or organization, before obtaining the relevant information, an obtaining request needs to be sent to the user or organization through the interface, and after receiving the consent information fed back by the user or organization, the relevant information is obtained.
For convenience of description, some terms or expressions referred to in the embodiments of the present application are explained below:
Unit testing: a unit test, also called a module test, is testing work that checks the correctness of a program module. A program unit is the smallest testable part of an application. In procedural programming, a unit is a single program, function, procedure, etc.; in object-oriented programming, the smallest unit is a method, including methods in a base class (superclass), an abstract class, or a derived class (subclass).
The present application is described below with reference to preferred implementation steps. FIG. 1 is a flowchart of a method for detecting code similarity according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S101, a first code and a second code are obtained, wherein the first code and the second code are the codes whose similarity is to be detected.
For example, the two pieces of code whose similarity needs to be detected, namely the first code and the second code, can be obtained from a software project.
Step S102, respectively determining a first unit test corresponding to the first code, a second unit test corresponding to the second code, a plurality of dimensions for detecting the similarity between the first code and the second code, and the weight of each dimension.
For example, unit tests corresponding to two codes are found respectively, and then a plurality of dimensions and a weight of each dimension for detecting the similarity between the two codes are determined. For example, the multiple dimensions may be information of incoming parameters of the code, result information obtained by using a method in the unit test, whether an exception exists in the execution process of the unit test, a result obtained by executing the unit test, and a coverage rate when executing the unit test.
Step S103, respectively performing a first unit test and a second unit test in each dimension to obtain a plurality of first similarities, where each first similarity is used to indicate a similarity between the first code and the second code in each dimension.
For example, unit tests corresponding to two codes can be respectively executed in each dimension, and the similarity between the two codes in each dimension is obtained.
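As a rough illustration of Step S103, the sketch below converts the data collected in each dimension into count vectors and compares them. The use of count vectors and cosine similarity is an assumption made for this example; the application only states that the data are converted into vectors and that the similarities are obtained from those vectors. All names are hypothetical.

```python
import math
from collections import Counter


def to_vector(items: list[str], vocabulary: list[str]) -> list[float]:
    """Count how often each vocabulary term occurs in the collected data."""
    counts = Counter(items)
    return [float(counts[term]) for term in vocabulary]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; two empty vectors count as identical."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 and norm_v == 0:
        return 1.0
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def dimension_similarities(
    first_data: dict[str, list[str]], second_data: dict[str, list[str]]
) -> dict[str, float]:
    """Compute one similarity per dimension from the two sets of unit test data."""
    result = {}
    for dimension, first_items in first_data.items():
        second_items = second_data.get(dimension, [])
        vocabulary = sorted(set(first_items) | set(second_items))
        result[dimension] = cosine_similarity(
            to_vector(first_items, vocabulary), to_vector(second_items, vocabulary)
        )
    return result
```

Other vector encodings and distance measures would fit the same structure; only the two helper functions would change.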
Step S104, obtaining the target similarity between the first code and the second code according to the plurality of first similarities and the weight of each dimension.
For example, the similarity between the two codes in each dimension is combined with the weight of that dimension to obtain the final similarity between the two codes.
Through the steps S101 to S104, the unit test corresponding to the first code and the unit test corresponding to the second code are respectively executed in each dimension to obtain a plurality of similarities, and the similarity between the first code and the second code is obtained according to the plurality of similarities and the weight of each dimension, so that the effect of detecting the code similarity can be improved.
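One plausible reading of Step S104 is a weighted average of the per-dimension similarities, sketched below. The weighted-average formula, the dimension names, and the sample numbers are assumptions for illustration; the application does not fix a particular formula in this passage.

```python
def target_similarity(
    similarities: dict[str, float], weights: dict[str, float]
) -> float:
    """Weighted average of per-dimension similarities (assumed aggregation rule)."""
    total_weight = sum(weights[dimension] for dimension in similarities)
    if total_weight <= 0:
        raise ValueError("the weights of the dimensions must sum to a positive value")
    weighted_sum = sum(
        similarities[dimension] * weights[dimension] for dimension in similarities
    )
    return weighted_sum / total_weight


# Hypothetical per-dimension similarities and weights for the five dimensions.
example_similarities = {
    "parameters": 1.0,
    "mock_methods": 0.5,
    "exceptions": 1.0,
    "results": 0.8,
    "coverage": 0.9,
}
example_weights = {
    "parameters": 0.2,
    "mock_methods": 0.2,
    "exceptions": 0.1,
    "results": 0.4,
    "coverage": 0.1,
}
example_target = target_similarity(example_similarities, example_weights)
```

Dividing by the total weight keeps the result in the same range as the inputs even when the weights do not sum to exactly one.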
How the first code and the second code are acquired is also a key point for quickly detecting the similarity between codes. In the method for detecting code similarity provided in the embodiment of the present application, the acquisition of the first code and the second code is further defined and may be implemented by the following technical features: determining the code under development in the target project, and taking the code under development as the first code; acquiring a first number of calls and a first return value of the first code when the first code runs in the target project; determining a third code in the target project whose return value is the same as the first return value; acquiring a second number of calls of the third code in the running process; and determining the third code as the second code according to the first number of calls and the second number of calls.
For example, a new function method is run in a project, and the number of calls and the return value of the new method during running are recorded. A method with the same return value as the new method is then searched for, and the number of calls of the new method during running is compared with the number of calls of the found method. Finally, the found method is determined as the method whose similarity is to be detected, that is, the similarity between the new method and the method with the same return value is detected.
By the scheme, the method for comparing the similarity with the new method can be quickly and accurately determined according to the return value of the method in the engineering.
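The search for a third code with the same return value can be sketched as a filter over recorded runtime data. The `RuntimeRecord` structure and the way the records are collected are hypothetical; the application does not specify how call counts and return values are captured.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RuntimeRecord:
    """Hypothetical record of one method observed while the project runs."""

    method_name: str
    call_count: int
    return_value: Any


def find_same_return_value(
    new_record: RuntimeRecord, records: list[RuntimeRecord]
) -> list[RuntimeRecord]:
    """Return the other methods whose return value equals that of the new method."""
    return [
        record
        for record in records
        if record.method_name != new_record.method_name
        and record.return_value == new_record.return_value
    ]
```

For instance, given a record for a new `checkId` method that returned `True`, the filter keeps every other recorded method that also returned `True` and drops the rest.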
How the second code is determined is also a key point for quickly detecting similarity between codes. In the code similarity detection method provided in the embodiment of the present application, determining the third code as the second code according to the first number of calls and the second number of calls is further defined and may be implemented by the following technical features: calculating a difference value between the first number of calls and the second number of calls; judging whether the difference value is larger than a first preset value; and if the difference value is larger than the first preset value, determining the third code as the second code.
For example, if the number of calls of the new method in the project differs greatly from the number of calls of a method with the same return value, the similarity between the two methods is compared. Specifically, suppose the difference between the number of times the new method is called in the project and the number of times the method with the same return value is called is 10, and the first preset value is set to 5. Since the difference of 10 is greater than the first preset value of 5, the similarity between the new method and the method with the same return value is compared.
By the scheme, the method to be compared with the new method in similarity can be quickly and accurately determined according to the calling times of the method in the engineering.
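The call-count check described above reduces to a threshold comparison, sketched here. Taking the absolute value of the difference is an assumption, since the text only speaks of "a difference value"; the function and parameter names are hypothetical.

```python
def select_second_code(
    first_calls: int, candidate_calls: dict[str, int], first_preset_value: int
) -> list[str]:
    """Keep candidates whose call count differs from the new method's by more than the preset value."""
    return [
        name
        for name, calls in candidate_calls.items()
        if abs(first_calls - calls) > first_preset_value
    ]
```

With the numbers from the example, a new method called 2 times and a candidate called 12 times differ by 10, which exceeds a first preset value of 5, so the candidate is kept for similarity comparison.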
In order to obtain a plurality of first data quickly and accurately, in the detection method for code similarity provided in the embodiment of the present application, the plurality of first data may also be obtained through the following steps: executing a first unit test under a first dimension of the plurality of dimensions to obtain third data, wherein the third data is used for representing information of parameters of the first code; executing the first unit test in a second dimension of the multiple dimensions to obtain fourth data, wherein the fourth data is used for representing target information of a target method, the target method is a method in the first unit test and the second unit test, and the target information is at least one of the following: information of the parameters passed into the target method, and an expected result obtained by using the target method; executing a first unit test in a third dimension of the multiple dimensions to obtain fifth data, wherein the fifth data is used for indicating whether an abnormal condition exists in the process of executing the first unit test; executing the first unit test under a fourth dimension of the plurality of dimensions to obtain sixth data, wherein the sixth data is used for representing a result obtained by executing the first unit test; executing a first unit test under a fifth dimension of the multiple dimensions to obtain seventh data, wherein the seventh data is used for representing the coverage rate of the first code in the process of executing the first unit test; and summarizing the third data, the fourth data, the fifth data, the sixth data and the seventh data to obtain a plurality of first data.
For example, based on the results of running the unit tests, a code sketch is drawn for the new code block and for the found code block, respectively, mainly from the following five aspects:
(1) Parameters: the number and types of the parameters passed to the corresponding source code block (number, type 1, type 2, type 3, ...);
(2) Methods: for each Mock method (a mocked method in the unit test), the number and types of the parameters passed in and the expected result (parameters, types, expected result);
(3) Exceptions: the exceptions produced by executing the unit test (exception 1, exception 2);
(4) Results: the results produced by executing the unit test (result 1, result 2);
(5) Coverage: the coverage of the source code block achieved by executing the unit test (coverage).
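The five aspects listed above can be held in a small record, sketched here in Python. The class name `CodeSketch`, its field names, and the sample values (in particular the coverage value) are assumptions for illustration and do not reproduce the contents of any table in this application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CodeSketch:
    parameter_types: List[str]                  # (1) types of the incoming parameters
    mock_methods: List[Tuple[List[str], str]]   # (2) (parameter types, expected result) per Mock method
    exceptions: List[str]                       # (3) exceptions raised while running the unit test
    results: List[bool]                         # (4) results returned by the unit test runs
    coverage: float                             # (5) fraction of the code block covered

    @property
    def parameter_count(self) -> int:
        # The number of parameters is derived from the list of types.
        return len(self.parameter_types)


# Hypothetical sketch of a method taking one String parameter,
# exercised by two unit test cases with a mocked dependency.
sketch = CodeSketch(
    parameter_types=["String"],
    mock_methods=[(["String"], "1"), (["String"], ";; 1")],
    exceptions=[],
    results=[True, False],
    coverage=1.0,  # assumed value
)
print(sketch.parameter_count, len(sketch.mock_methods))
```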
For example, method 1 may be:
(1) Method name: checkId (String name);
(2) Return value type: boolean;
(3) Implementation: first, the getId method in the appServer class is called with the incoming name to obtain the id information; then the special character set [ -! @ # $% & () + | { }'; ' \ \ [ \\\\/? To! @ # -% (… … & [ + ] { } [; : ""'. And is? ] " is defined; then Pattern p and Matcher m are defined respectively, and the value of m.find() is returned directly, that is, whether the special characters are contained is judged.
In addition, the unit tests of method 1 are mainly as follows:
(1) the getId method of appServer is mocked with Return ("1"), which contains no special character; 1 is then passed in, and the returned flag is true;
(2) the getId method of appServer is mocked with Return (";; 1"), which contains a special character; 1 is then passed in, and the returned flag is false.
Method 1 is then described with data along the above five aspects, as shown in Table 1.
TABLE 1
Figure BDA0003726987920000091
With this scheme, the new method running in the project can be described with data in multiple dimensions, which lays the groundwork for subsequently obtaining the similarity between the two codes.
In order to obtain a plurality of second data quickly and accurately, in the detection method for code similarity provided in the embodiment of the present application, the plurality of second data may also be obtained through the following steps: executing a second unit test under the first dimension to obtain eighth data, wherein the eighth data is used for representing the information of the parameters of the second code; executing a second unit test under a second dimension to obtain ninth data, wherein the ninth data is used for representing target information of a target method; executing a second unit test in a third dimension to obtain tenth data, wherein the tenth data is used for indicating whether an abnormal condition exists in the process of executing the second unit test; executing a second unit test under a fourth dimension to obtain eleventh data, wherein the eleventh data is used for representing a result obtained by executing the second unit test; executing a second unit test under a fifth dimension to obtain twelfth data, wherein the twelfth data is used for representing the coverage rate of covering a second code in the process of executing the second unit test; and summarizing the eighth data, the ninth data, the tenth data, the eleventh data and the twelfth data to obtain a plurality of second data.
For example, method 2 may be:
(1) Method name: checkstring (string);
(2) Return value type: boolean;
(3) Implementation: first, the getId method in the appServer class is called with the incoming name to obtain the id information; then the special character set [ -! @ # $% & () + | { }'; ' \ \ [ \\\\/? To! @ # -% (… … & [ + ] { } [; : ""'. Is there? ] " is defined; then Pattern p and Matcher m are defined respectively, the value of m.find() is obtained first, and true is returned if the value is judged to be true, otherwise false is returned.
In addition, the unit tests of method 2 are mainly as follows:
(1) the getId method of appServer is mocked with Return ("1"), which contains no special character; 1 is then passed in, and the returned flag is true;
(2) the getId method of appServer is mocked with Return ("; 1"), which contains a special character; 1 is then passed in, and the returned flag is false.
Method 2 is then described with data along the above five aspects, as shown in Table 1.
With this scheme, the method whose similarity to the new method is to be detected can be described with data in multiple dimensions, which lays the groundwork for subsequently obtaining the similarity between the two codes.
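Methods 1 and 2 differ only in how the match result is returned, which is why their unit test runs produce the same observations. The following Python sketch mirrors that structure under several assumptions: the character class in `SPECIAL` is a simplified ASCII stand-in for the special character set, `FakeAppServer` is a hypothetical stand-in for the mocked appServer, and the function names are Python renderings of checkId and checkstring. The sketch follows the stated implementation (returning the match result), so the boolean values it prints are not meant to reproduce the flag values listed in the unit tests.

```python
import re

# Assumption: simplified ASCII stand-in for the special character set.
SPECIAL = re.compile(r"[!@#$%&()+|{};:?]")


class FakeAppServer:
    """Hypothetical stand-in for a mocked appServer with a fixed getId result."""

    def __init__(self, fixed_id):
        self.fixed_id = fixed_id

    def get_id(self, name):
        return self.fixed_id


def check_id(app_server, name):
    # Method 1: return the match result directly.
    return SPECIAL.search(app_server.get_id(name)) is not None


def check_string(app_server, name):
    # Method 2: obtain the match result first, then branch on it.
    found = SPECIAL.search(app_server.get_id(name)) is not None
    if found:
        return True
    else:
        return False


observations = []
for mocked_id in ["1", ";;1"]:
    server = FakeAppServer(mocked_id)
    observations.append((check_id(server, "1"), check_string(server, "1")))
print(observations)
```

Because both functions return the same value for every mocked id, a comparison based on run-time observations treats them as functionally equivalent even though their source text differs.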
In order to obtain a plurality of first similarities quickly and accurately, in the detection method for code similarity provided in the embodiment of the present application, the plurality of first similarities may also be obtained through the following steps: executing a first unit test under each dimension to obtain a plurality of first data; converting each first data in the plurality of first data to obtain a plurality of first vectors; executing a second unit test under each dimension to obtain a plurality of second data; converting each second data in the plurality of second data to obtain a plurality of second vectors; a plurality of first similarities are obtained according to each first vector in the plurality of first vectors and each second vector in the plurality of second vectors.
For example, the data obtained for the new method and the data obtained for the method whose similarity to the new method is to be detected are converted into vectors, that is, the data in Table 1 are converted into vectors. The distance between the vectors can then be calculated in each of the five dimensions, and the five results are combined by weight to obtain the similarity between the two codes.
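As a hedged illustration of the per-dimension comparison, the sketch below computes a Euclidean distance between the two vectors of each dimension. The choice of Euclidean distance, the helper names, and the sample vectors are assumptions; the text only states that a distance between vectors may be used and does not fix the vector encoding.

```python
import math


def euclidean_distance(first_vector, second_vector):
    """Distance between two equal-length numeric vectors."""
    if len(first_vector) != len(second_vector):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first_vector, second_vector)))


def per_dimension_distances(first_vectors, second_vectors):
    """Map each dimension name to the distance between the two codes' vectors."""
    return {
        dimension: euclidean_distance(first_vectors[dimension], second_vectors[dimension])
        for dimension in first_vectors
    }


# Hypothetical vectors for two dimensions of two codes.
first_vectors = {"parameter": [1.0, 0.0], "result": [1.0, 0.0]}
second_vectors = {"parameter": [1.0, 0.0], "result": [1.0, 0.5]}
distances = per_dimension_distances(first_vectors, second_vectors)
print(distances)
```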
The specific steps are as follows:
Step 1: convert the data in Table 1 into a vector representation, as shown in Table 2;
TABLE 2
Figure BDA0003726987920000111
Step 2: for the data obtained in step 1, a similarity is calculated in each of the five dimensions (a method of calculating the distance between vectors may be used), and the five per-dimension values are then combined by weight. The calculation formula is as follows:
similarity s = w(parameter) × 0 + w(method) × 0.5 + w(exception) × 0 + w(result) × 0 + w(coverage) × 0 = 0.15.
Only the method dimension contributes in this example, so the result corresponds to w(method) = 0.3.
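The weighted combination in step 2 can be reproduced with a short sketch. The per-dimension values (0, 0.5, 0, 0, 0) and the result 0.15 come from the formula above, and w(method) = 0.3 follows from them; the remaining weights and the function name `target_similarity` are assumptions chosen only so that the weights sum to 1.

```python
def target_similarity(per_dimension, weights):
    """Weighted sum of the per-dimension values."""
    if per_dimension.keys() != weights.keys():
        raise ValueError("dimensions and weights must cover the same keys")
    return sum(weights[d] * per_dimension[d] for d in weights)


per_dimension = {
    "parameter": 0.0,
    "method": 0.5,
    "exception": 0.0,
    "result": 0.0,
    "coverage": 0.0,
}
weights = {
    "parameter": 0.2,   # assumed
    "method": 0.3,      # implied by 0.5 * w(method) = 0.15
    "exception": 0.1,   # assumed
    "result": 0.3,      # assumed
    "coverage": 0.1,    # assumed
}
score = target_similarity(per_dimension, weights)
print(round(score, 2))
```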
In summary, the dynamic detection mechanism based on running unit tests can effectively reduce the time spent searching for similar or duplicate methods and can improve detection efficiency. In addition, because the detection is based on running unit tests, it can be performed without constructing a semantic syntax tree, so the time and space complexity are relatively low. Moreover, the unit test coverage of the code block is as high as 80%, so judging similar or duplicate methods by comparing the parameters of the unit test run with the result data yields higher accuracy.
In order to effectively solve the problem of code redundancy, in the detection method of code similarity provided by the embodiment of the present application, the problem of code redundancy may also be solved through the following steps: after the target similarity between the first code and the second code is obtained according to the plurality of first similarities and the weight of each dimension, judging whether the target similarity is smaller than a second preset value; and if the target similarity is smaller than a second preset value, performing replacement processing on the second code.
For example, it may be set that a calculated similarity of less than 0.5 (the second preset value described above) indicates that the functions of the two methods are highly similar. Because the value may be computed from distances between vectors, a smaller value corresponds to closer behavior. Since the similarity obtained by the above calculation is 0.15, which is less than 0.5, the first code and the second code are highly similar, and it can be determined that the two codes have duplicate or similar functions. The second code is then replaced.
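The decision rule above reduces to a single comparison, sketched here. The function name `should_replace` is hypothetical; the threshold 0.5 and the score 0.15 are the example values from the text.

```python
def should_replace(target_similarity, second_preset_value=0.5):
    """Return True when the second code should undergo replacement processing."""
    # A value strictly below the second preset value indicates duplicate or similar function.
    return target_similarity < second_preset_value


decision = should_replace(0.15)
print(decision)
```

Note that a value equal to the second preset value does not trigger replacement, since the text requires the target similarity to be smaller than the preset value.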
By the scheme, the detection judgment of the similar method is carried out by utilizing a dynamic detection mechanism, and codes with repeated or similar functions are replaced, so that the problem of code redundancy can be effectively solved.
According to the method provided by the embodiment of the present application, for example, a code with the same return value as a new code is found according to the return value of the new code running in a project, and the code whose similarity to the new code is to be detected is then determined according to the number of calls of the new code and the number of calls of the code with the same return value; the unit tests corresponding to the two codes are respectively executed in multiple dimensions to obtain the vector data corresponding to the two codes in each dimension, and the similarity between the two codes in each dimension is obtained from the vector data; finally, the final similarity between the two codes is obtained according to the similarity between the two codes in each dimension and the weight of each dimension.
In summary, in the method for detecting similarity of codes provided in the embodiment of the present application, a first code and a second code are obtained, where the first code and the second code are codes of similarity to be detected, respectively; respectively determining a first unit test corresponding to the first code, a second unit test corresponding to the second code, a plurality of dimensions for detecting the similarity between the first code and the second code and the weight of each dimension; respectively executing a first unit test and a second unit test under each dimension to obtain a plurality of first similarities, wherein each first similarity is used for representing the similarity between the first code and the second code under each dimension; according to the multiple first similarities and the weight of each dimension, the target similarity between the first code and the second code is obtained, and the problem that the effect of detecting the code similarity in the related technology is poor is solved. The unit test corresponding to the first code and the unit test corresponding to the second code are respectively executed under each dimension to obtain a plurality of similarities, and the similarity between the first code and the second code is obtained according to the similarities and the weight of each dimension, so that the effect of detecting the code similarity can be improved.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one shown here.
The embodiment of the present application further provides a device for detecting code similarity, and it should be noted that the device for detecting code similarity according to the embodiment of the present application can be used to execute the method for detecting code similarity according to the embodiment of the present application. The following describes a device for detecting code similarity provided in an embodiment of the present application.
Fig. 2 is a schematic diagram of a device for detecting code similarity according to an embodiment of the present application. As shown in fig. 2, the apparatus includes: a first acquisition unit 201, a first determination unit 202, a first execution unit 203, and a second determination unit 204.
Specifically, the first obtaining unit 201 is configured to obtain a first code and a second code, where the first code and the second code are the codes whose similarity is to be detected;
a first determining unit 202, configured to determine a first unit test corresponding to the first code, a second unit test corresponding to the second code, multiple dimensions for detecting similarity between the first code and the second code, and a weight of each dimension, respectively;
a first execution unit 203, configured to execute a first unit test and a second unit test respectively in each dimension to obtain a plurality of first similarities, where each first similarity is used to represent a similarity between a first code and a second code in each dimension;
a second determining unit 204, configured to obtain a target similarity between the first code and the second code according to the multiple first similarities and the weight of each dimension.
To sum up, the device for detecting similarity of codes according to the embodiment of the present application obtains a first code and a second code through the first obtaining unit 201, where the first code and the second code are codes of similarity to be detected, respectively; the first determining unit 202 determines a first unit test corresponding to the first code, a second unit test corresponding to the second code, a plurality of dimensions for detecting similarity between the first code and the second code, and a weight of each dimension, respectively; the first execution unit 203 respectively executes a first unit test and a second unit test in each dimension to obtain a plurality of first similarities, wherein each first similarity is used for representing the similarity between the first code and the second code in each dimension; the second determining unit 204 obtains the target similarity between the first code and the second code according to the plurality of first similarities and the weight of each dimension, and solves the problem of poor effect of detecting the code similarity in the related art. The unit test corresponding to the first code and the unit test corresponding to the second code are respectively executed under each dimension to obtain a plurality of similarities, and the similarity between the first code and the second code is obtained according to the similarities and the weight of each dimension, so that the effect of detecting the code similarity can be improved.
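One way to picture how the four units cooperate is a small composition of callables, sketched below. The class `SimilarityDetector`, its field names, and the stub lambdas (including the dimension names, weights and per-dimension values they return) are assumptions for illustration only; they show the data flow between units 201 to 204 rather than an actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class SimilarityDetector:
    acquire: Callable[[], Tuple[str, str]]                                # first obtaining unit 201
    determine: Callable[[str, str], Tuple[str, str, Dict[str, float]]]    # first determining unit 202
    execute: Callable[[str, str, Dict[str, float]], Dict[str, float]]     # first execution unit 203
    combine: Callable[[Dict[str, float], Dict[str, float]], float]        # second determining unit 204

    def run(self) -> float:
        first_code, second_code = self.acquire()
        first_test, second_test, weights = self.determine(first_code, second_code)
        per_dimension = self.execute(first_test, second_test, weights)
        return self.combine(per_dimension, weights)


# Stub units with hypothetical values, used only to exercise the data flow.
detector = SimilarityDetector(
    acquire=lambda: ("checkId", "checkstring"),
    determine=lambda first, second: (
        f"test_{first}", f"test_{second}", {"method": 0.3, "result": 0.7}
    ),
    execute=lambda first_test, second_test, weights: {"method": 0.5, "result": 0.0},
    combine=lambda sims, weights: sum(weights[d] * sims[d] for d in weights),
)
score = detector.run()
print(round(score, 2))
```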
Optionally, in the apparatus for detecting code similarity provided in this embodiment of the present application, the first execution unit includes: the first execution module is used for executing the first unit test under each dimension to obtain a plurality of first data; the first processing module is used for carrying out conversion processing on each first data in the plurality of first data to obtain a plurality of first vectors; the second execution module is used for executing the second unit test under each dimension to obtain a plurality of second data; the second processing module is used for performing conversion processing on each second data in the plurality of second data to obtain a plurality of second vectors; the first determining module is used for obtaining a plurality of first similarities according to each first vector in the plurality of first vectors and each second vector in the plurality of second vectors.
Optionally, in the apparatus for detecting code similarity provided in the embodiment of the present application, the first execution module includes: the first execution submodule is used for executing the first unit test under a first dimension of the plurality of dimensions to obtain third data, wherein the third data is used for representing the information of the parameters of the first code; a second execution submodule, configured to execute the first unit test in a second dimension of the multiple dimensions to obtain fourth data, where the fourth data is used to represent target information of a target method, the target method is a method in the first unit test and the second unit test, and the target information is at least one of: information on the parameters passed to the target method, and the expected result obtained by using the target method; the third execution submodule is used for executing the first unit test under a third dimension of the multiple dimensions to obtain fifth data, wherein the fifth data is used for indicating whether an abnormal condition exists in the process of executing the first unit test; the fourth execution submodule is used for executing the first unit test under a fourth dimension of the multiple dimensions to obtain sixth data, wherein the sixth data is used for representing the result obtained by executing the first unit test; the fifth execution submodule is used for executing the first unit test in a fifth dimension of the multiple dimensions to obtain seventh data, wherein the seventh data is used for representing the coverage of the first code achieved in the process of executing the first unit test; and the first summarizing submodule is used for summarizing the third data, the fourth data, the fifth data, the sixth data and the seventh data to obtain a plurality of first data.
Optionally, in the apparatus for detecting code similarity provided in this embodiment of the present application, the second execution module includes: the sixth execution submodule is used for executing the second unit test under the first dimension to obtain eighth data, wherein the eighth data is used for representing the information of the parameters of the second code; the seventh execution submodule is used for executing the second unit test in the second dimension to obtain ninth data, wherein the ninth data is used for representing the target information of the target method; the eighth execution submodule is used for executing the second unit test under the third dimension to obtain tenth data, wherein the tenth data is used for indicating whether an abnormal condition exists in the process of executing the second unit test; the ninth execution submodule is used for executing the second unit test under the fourth dimension to obtain eleventh data, wherein the eleventh data is used for representing the result obtained by executing the second unit test; the tenth execution submodule is used for executing the second unit test in the fifth dimension to obtain twelfth data, wherein the twelfth data is used for representing the coverage of the second code achieved in the process of executing the second unit test; and the second summarizing submodule is used for summarizing the eighth data, the ninth data, the tenth data, the eleventh data and the twelfth data to obtain a plurality of second data.
Optionally, in the apparatus for detecting code similarity provided in the embodiment of the present application, the first obtaining unit includes: a second determining module, configured to determine the code under development in the target project and take the code under development as the first code; a second obtaining module, configured to obtain a first number of calls and a first return value of the first code when the first code runs in the target project; a third determining module, configured to determine a third code in the target project whose return value is the same as the first return value; a third obtaining module, configured to obtain a second number of calls of the third code during running; and a fourth determining module, configured to determine the third code as the second code according to the first number of calls and the second number of calls.
Optionally, in the apparatus for detecting code similarity provided in this embodiment of the present application, the fourth determining module includes: a first calculation submodule, configured to calculate the difference between the first number of calls and the second number of calls; a first judgment submodule, configured to judge whether the difference is greater than a first preset value; and a first determining submodule, configured to determine the third code as the second code if the difference is greater than the first preset value.
Optionally, in the apparatus for detecting similarity of codes provided in the embodiment of the present application, the apparatus further includes: the first judging unit is used for judging whether the target similarity is smaller than a second preset value after the target similarity between the first code and the second code is obtained according to the plurality of first similarities and the weight of each dimension; and the first processing unit is used for carrying out replacement processing on the second code if the target similarity is smaller than a second preset value.
The device for detecting the similarity of the codes comprises a processor and a memory, wherein the first acquiring unit 201, the first determining unit 202, the first executing unit 203, the second determining unit 204 and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the effect of detecting code similarity is improved by adjusting the kernel parameters.
The memory may include, among computer-readable media, volatile memory such as random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The embodiment of the invention provides a processor, which is used for running a program, wherein the detection method of the code similarity is executed when the program runs.
As shown in fig. 3, an embodiment of the present invention provides an electronic device, where the device includes a processor, a memory, and a program stored in the memory and executable on the processor, and the processor executes the program to implement the following steps: acquiring a first code and a second code, wherein the first code and the second code are codes of similarity to be detected respectively; respectively determining a first unit test corresponding to the first code, a second unit test corresponding to the second code, a plurality of dimensions for detecting the similarity between the first code and the second code and the weight of each dimension; respectively executing the first unit test and the second unit test under each dimension to obtain a plurality of first similarities, wherein each first similarity is used for representing the similarity between the first code and the second code under each dimension; and obtaining the target similarity between the first code and the second code according to the plurality of first similarities and the weight of each dimension.
The processor executes the program and further realizes the following steps: respectively executing the first unit test and the second unit test in each dimension to obtain a plurality of first similarities, including: executing the first unit test under each dimension to obtain a plurality of first data; converting each first data in the plurality of first data to obtain a plurality of first vectors; executing the second unit test under each dimension to obtain a plurality of second data; performing conversion processing on each second data in the plurality of second data to obtain a plurality of second vectors; and obtaining a plurality of first similarities according to each first vector in the plurality of first vectors and each second vector in the plurality of second vectors.
The processor executes the program and further realizes the following steps: executing the first unit test in each dimension to obtain a plurality of first data, including: executing the first unit test under a first dimension of the plurality of dimensions to obtain third data, wherein the third data is used for representing information of parameters of the first code; executing the first unit test in a second dimension of the plurality of dimensions to obtain fourth data, wherein the fourth data is used for representing target information of a target method, the target method is a method in the first unit test and the second unit test, and the target information is at least one of the following: information on the parameters passed to the target method, and the expected result obtained by using the target method; executing the first unit test in a third dimension of the plurality of dimensions to obtain fifth data, wherein the fifth data is used for indicating whether an abnormal condition exists in the process of executing the first unit test; executing the first unit test under a fourth dimension of the plurality of dimensions to obtain sixth data, wherein the sixth data is used for representing a result obtained by executing the first unit test; executing the first unit test in a fifth dimension of the plurality of dimensions to obtain seventh data, wherein the seventh data is used for representing the coverage of the first code achieved in the process of executing the first unit test; summarizing the third data, the fourth data, the fifth data, the sixth data and the seventh data to obtain the plurality of first data.
The processor executes the program and further realizes the following steps: performing the second unit test in each dimension, and obtaining a plurality of second data comprises: executing the second unit test under the first dimension to obtain eighth data, wherein the eighth data is used for representing information of parameters of the second code; executing the second unit test in the second dimension to obtain ninth data, wherein the ninth data is used for representing the target information of the target method; executing the second unit test under the third dimension to obtain tenth data, wherein the tenth data is used for indicating whether an abnormal condition exists in the process of executing the second unit test; executing the second unit test under the fourth dimension to obtain eleventh data, wherein the eleventh data is used for representing a result obtained by executing the second unit test; executing the second unit test under the fifth dimension to obtain twelfth data, wherein the twelfth data is used for representing the coverage rate of covering the second code in the process of executing the second unit test; summarizing the eighth data, the ninth data, the tenth data, the eleventh data and the twelfth data to obtain the plurality of second data.
The processor executes the program and further realizes the following steps: acquiring the first code and the second code comprises: determining the code under development in the target project, and taking the code under development as the first code; acquiring a first number of calls and a first return value of the first code when the first code runs in the target project; determining a third code in the target project whose return value is the same as the first return value; acquiring a second number of calls of the third code during running; and determining the third code as the second code according to the first number of calls and the second number of calls.
The processor executes the program and further realizes the following steps: determining the third code as the second code according to the first number of calls and the second number of calls includes: calculating the difference between the first number of calls and the second number of calls; judging whether the difference is greater than a first preset value; and if the difference is greater than the first preset value, determining the third code as the second code.
The processor executes the program and further realizes the following steps: after obtaining a target similarity between the first code and the second code according to the plurality of first similarities and the weight of each dimension, the method further includes: judging whether the target similarity is smaller than a second preset value or not; and if the target similarity is smaller than the second preset value, performing replacement processing on the second code.
The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: acquiring a first code and a second code, wherein the first code and the second code are the codes whose similarity is to be detected; respectively determining a first unit test corresponding to the first code, a second unit test corresponding to the second code, a plurality of dimensions for detecting the similarity between the first code and the second code and the weight of each dimension; respectively executing the first unit test and the second unit test under each dimension to obtain a plurality of first similarities, wherein each first similarity is used for representing the similarity between the first code and the second code under each dimension; and obtaining the target similarity between the first code and the second code according to the plurality of first similarities and the weight of each dimension.
When executed on a data processing device, is further adapted to perform a procedure for initializing the following method steps: respectively executing the first unit test and the second unit test under each dimension to obtain a plurality of first similarities, including: executing the first unit test under each dimension to obtain a plurality of first data; converting each first data in the plurality of first data to obtain a plurality of first vectors; executing the second unit test under each dimension to obtain a plurality of second data; performing conversion processing on each second data in the plurality of second data to obtain a plurality of second vectors; and obtaining a plurality of first similarities according to each first vector in the plurality of first vectors and each second vector in the plurality of second vectors.
When executed on a data processing device, is further adapted to perform a procedure for initializing the following method steps: executing the first unit test in each dimension to obtain a plurality of first data, including: executing the first unit test under a first dimension of the plurality of dimensions to obtain third data, wherein the third data is used for representing information of parameters of the first code; executing the first unit test in a second dimension of the plurality of dimensions to obtain fourth data, wherein the fourth data is used for representing target information of a target method, the target method is a method in the first unit test and the second unit test, and the target information is at least one of the following: information of parameters of the target method and an expected result obtained by using the target method are transmitted; executing the first unit test in a third dimension of the plurality of dimensions to obtain fifth data, wherein the fifth data is used for indicating whether an abnormal condition exists in the process of executing the first unit test; executing the first unit test under a fourth dimension of the plurality of dimensions to obtain sixth data, wherein the sixth data is used for representing a result obtained by executing the first unit test; executing the first unit test in a fifth dimension of the plurality of dimensions to obtain seventh data, wherein the seventh data is used for representing the coverage rate of covering the first code in the process of executing the first unit test; summarizing the third data, the fourth data, the fifth data, the sixth data and the seventh data to obtain the plurality of first data.
When executed on a data processing device, is further adapted to perform a procedure for initializing the following method steps: performing the second unit test in each dimension, and obtaining a plurality of second data comprises: executing the second unit test under the first dimension to obtain eighth data, wherein the eighth data is used for representing information of parameters of the second code; executing the second unit test in the second dimension to obtain ninth data, wherein the ninth data is used for representing the target information of the target method; executing the second unit test in the third dimension to obtain tenth data, wherein the tenth data is used for indicating whether an abnormal condition exists in the process of executing the second unit test; executing the second unit test under the fourth dimension to obtain eleventh data, wherein the eleventh data is used for representing a result obtained by executing the second unit test; executing the second unit test under the fifth dimension to obtain twelfth data, wherein the twelfth data is used for representing the coverage rate of covering the second code in the process of executing the second unit test; summarizing the eighth data, the ninth data, the tenth data, the eleventh data and the twelfth data to obtain the plurality of second data.
When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: acquiring the first code and the second code includes: determining the code under development in the target project, and taking the code under development as the first code; acquiring a first number of calls and a first return value of the first code when the first code runs in the target project; determining a third code in the target project whose return value is the same as the first return value; acquiring a second number of calls of the third code in the running process; and determining, according to the first number of calls and the second number of calls, the third code as the second code.
When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: determining, according to the first number of calls and the second number of calls, the third code as the second code includes: calculating a difference value between the first number of calls and the second number of calls; judging whether the difference value is larger than a first preset value; and if the difference value is larger than the first preset value, determining the third code as the second code.
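The selection of the second code described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation of the application: the names `CodeUnit` and `select_second_code` are hypothetical, the difference value is taken as an absolute difference (the text does not say whether it is signed), and the first matching candidate is returned.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class CodeUnit:
    """A piece of code in the target project (hypothetical structure)."""
    name: str
    return_value: Any
    call_count: int  # number of calls observed while running in the target project


def select_second_code(
    first: CodeUnit, candidates: List[CodeUnit], first_preset_value: int
) -> Optional[CodeUnit]:
    """Pick the third code that becomes the second code, or None if none qualifies."""
    for third in candidates:
        if third.name == first.name:
            continue  # skip the first code itself
        if third.return_value != first.return_value:
            continue  # the third code must have the same return value
        # Assumption: the difference value is the absolute difference of the call counts.
        difference = abs(first.call_count - third.call_count)
        # As stated in the text: the third code is chosen when the difference
        # is larger than the first preset value.
        if difference > first_preset_value:
            return third
    return None


first_code = CodeUnit("new_sum", return_value=3, call_count=10)
project = [
    CodeUnit("other", return_value=7, call_count=100),
    CodeUnit("sum_a", return_value=3, call_count=12),
    CodeUnit("sum_b", return_value=3, call_count=50),
]
second_code = select_second_code(first_code, project, first_preset_value=5)
```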
When executed on a data processing device, the computer program product is further adapted to execute a program initialized with the following method steps: after obtaining the target similarity between the first code and the second code according to the plurality of first similarities and the weight of each dimension, the method further includes: judging whether the target similarity is smaller than a second preset value; and if the target similarity is smaller than the second preset value, performing replacement processing on the second code.
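The passage above combines the per-dimension similarities with the dimension weights and then compares the result with the second preset value. The exact formula is not given in this passage, so the sketch below assumes a normalized weighted sum. The function names `target_similarity` and `needs_replacement` are hypothetical.

```python
from typing import Sequence


def target_similarity(similarities: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-dimension similarities into one score.

    Assumption: a weighted sum normalized by the total weight.
    """
    if len(similarities) != len(weights):
        raise ValueError("one weight is required per dimension")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the weights must sum to a positive value")
    weighted = sum(s * w for s, w in zip(similarities, weights))
    return weighted / total_weight


def needs_replacement(similarity: float, second_preset_value: float) -> bool:
    """Mirror the stated rule: replacement processing is performed when the
    target similarity is smaller than the second preset value."""
    return similarity < second_preset_value


# Hypothetical similarities and weights for five dimensions.
score = target_similarity([1.0, 0.5, 1.0, 1.0, 0.5], [1, 1, 1, 1, 1])
```

Normalizing by the total weight keeps the score in the same range as the individual similarities even when the weights do not sum to 1.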
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method for detecting code similarity, comprising:
acquiring a first code and a second code, wherein the first code and the second code are the codes whose similarity is to be detected;
respectively determining a first unit test corresponding to the first code, a second unit test corresponding to the second code, a plurality of dimensions for detecting the similarity between the first code and the second code and the weight of each dimension;
respectively executing the first unit test and the second unit test under each dimension to obtain a plurality of first similarities, wherein each first similarity is used for representing the similarity between the first code and the second code under each dimension;
and obtaining the target similarity between the first code and the second code according to the plurality of first similarities and the weight of each dimension.
2. The method of claim 1, wherein executing the first unit test and the second unit test respectively in each dimension to obtain a plurality of first similarities comprises:
executing the first unit test under each dimension to obtain a plurality of first data;
converting each first data in the plurality of first data to obtain a plurality of first vectors;
executing the second unit test under each dimension to obtain a plurality of second data;
performing conversion processing on each second data in the plurality of second data to obtain a plurality of second vectors;
and obtaining a plurality of first similarities according to each first vector in the plurality of first vectors and each second vector in the plurality of second vectors.
3. The method of claim 2, wherein performing the first unit test in each dimension to obtain a plurality of first data comprises:
executing the first unit test under a first dimension of the plurality of dimensions to obtain third data, wherein the third data is used for representing information of parameters of the first code;
executing the first unit test in a second dimension of the plurality of dimensions to obtain fourth data, wherein the fourth data is used for representing target information of a target method, the target method is a method in both the first unit test and the second unit test, and the target information is at least one of the following: information about the parameters passed to the target method, and an expected result obtained by using the target method;
executing the first unit test in a third dimension of the plurality of dimensions to obtain fifth data, wherein the fifth data is used for indicating whether an abnormal condition exists in the process of executing the first unit test;
executing the first unit test under a fourth dimension of the plurality of dimensions to obtain sixth data, wherein the sixth data is used for representing a result obtained by executing the first unit test;
executing the first unit test in a fifth dimension of the plurality of dimensions to obtain seventh data, wherein the seventh data is used for representing the coverage rate of covering the first code in the process of executing the first unit test;
summarizing the third data, the fourth data, the fifth data, the sixth data and the seventh data to obtain the plurality of first data.
4. The method of claim 3, wherein performing the second unit test in each dimension to obtain a plurality of second data comprises:
executing the second unit test under the first dimension to obtain eighth data, wherein the eighth data is used for representing information of parameters of the second code;
executing the second unit test under the second dimension to obtain ninth data, wherein the ninth data is used for representing the target information of the target method;
executing the second unit test in the third dimension to obtain tenth data, wherein the tenth data is used for indicating whether an abnormal condition exists in the process of executing the second unit test;
executing the second unit test under the fourth dimension to obtain eleventh data, wherein the eleventh data is used for representing a result obtained by executing the second unit test;
executing the second unit test under the fifth dimension to obtain twelfth data, wherein the twelfth data is used for representing the coverage rate of covering the second code in the process of executing the second unit test;
summarizing the eighth data, the ninth data, the tenth data, the eleventh data and the twelfth data to obtain the plurality of second data.
5. The method of claim 1, wherein obtaining the first code and the second code comprises:
determining code under development in a target project, and taking the code under development as the first code;
acquiring a first number of calls and a first return value of the first code when the first code runs in the target project;
determining a third code in the target project whose return value is the same as the first return value;
acquiring a second number of calls of the third code in the running process;
and determining the third code as the second code according to the first number of calls and the second number of calls.
6. The method of claim 5, wherein determining the third code as the second code according to the first number of calls and the second number of calls comprises:
calculating a difference value between the first number of calls and the second number of calls;
judging whether the difference value is larger than a first preset value;
and if the difference value is larger than the first preset value, determining the third code as the second code.
7. The method of claim 1, wherein after deriving the target similarity between the first code and the second code according to the plurality of first similarities and the weight for each dimension, the method further comprises:
judging whether the target similarity is smaller than a second preset value or not;
and if the target similarity is smaller than the second preset value, performing replacement processing on the second code.
8. A device for detecting similarity between codes, comprising:
a first acquisition unit, configured to acquire a first code and a second code, wherein the first code and the second code are the codes whose similarity is to be detected;
a first determining unit, configured to determine a first unit test corresponding to the first code, a second unit test corresponding to the second code, multiple dimensions for detecting similarity between the first code and the second code, and a weight of each dimension, respectively;
a first execution unit, configured to execute the first unit test and the second unit test respectively in each dimension to obtain a plurality of first similarities, where each first similarity is used to represent a similarity between the first code and the second code in each dimension;
a second determining unit, configured to obtain a target similarity between the first code and the second code according to the multiple first similarities and the weight of each dimension.
9. A processor, configured to execute a program, wherein the program executes the method for detecting similarity between codes according to any one of claims 1 to 7.
10. An electronic device comprising one or more processors and memory storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the code similarity detection method of any one of claims 1 to 7.
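Claim 2 recites converting the data of each dimension into vectors and deriving one similarity per dimension from them. The conversion and the similarity measure are not specified in the claims, so the following Python sketch simply assumes a token-count vector over a shared vocabulary and cosine similarity, pairing the first and second data dimension by dimension. The function names `to_vectors`, `cosine_similarity`, and `dimension_similarities` are hypothetical.

```python
import math
from collections import Counter
from typing import Any, List, Sequence, Tuple


def to_vectors(first_data: Any, second_data: Any) -> Tuple[List[float], List[float]]:
    """Convert two pieces of data into count vectors over a shared vocabulary.

    Assumption: the data are rendered as text and split on whitespace.
    """
    first_tokens = Counter(str(first_data).split())
    second_tokens = Counter(str(second_data).split())
    vocabulary = sorted(set(first_tokens) | set(second_tokens))
    first_vector = [float(first_tokens[t]) for t in vocabulary]
    second_vector = [float(second_tokens[t]) for t in vocabulary]
    return first_vector, second_vector


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dimension_similarities(first_data: Sequence[Any], second_data: Sequence[Any]) -> List[float]:
    """Return one similarity per dimension, pairing the data in dimension order."""
    return [cosine_similarity(*to_vectors(f, s)) for f, s in zip(first_data, second_data)]


# Hypothetical data for two dimensions: parameter information and an abnormal-condition flag.
sims = dimension_similarities(["int a int b", True], ["int a int b", False])
```

Any other conversion that maps both data items of a dimension into the same vector space could be substituted for `to_vectors` without changing the rest of the sketch.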
CN202210770078.6A 2022-07-01 2022-07-01 Code similarity detection method and device, processor and electronic equipment Pending CN115016795A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210770078.6A CN115016795A (en) 2022-07-01 2022-07-01 Code similarity detection method and device, processor and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210770078.6A CN115016795A (en) 2022-07-01 2022-07-01 Code similarity detection method and device, processor and electronic equipment

Publications (1)

Publication Number Publication Date
CN115016795A true CN115016795A (en) 2022-09-06

Family

ID=83078718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210770078.6A Pending CN115016795A (en) 2022-07-01 2022-07-01 Code similarity detection method and device, processor and electronic equipment

Country Status (1)

Country Link
CN (1) CN115016795A (en)

Similar Documents

Publication Publication Date Title
CN107992307B (en) Function compiling method and device
CN112800427B (en) Webshell detection method and device, electronic equipment and storage medium
US11327722B1 (en) Programming language corpus generation
CN103559123A (en) Function call stack analyzing method and device based on VxWorks operation system
CN113961919B (en) Malicious software detection method and device
WO2016130542A1 (en) Code relatives detection
CN111488573A (en) Link library detection method and device, electronic equipment and computer readable storage medium
CN112688966A (en) Webshell detection method, device, medium and equipment
Zuo Defense of Computer Network Viruses Based on Data Mining Technology.
CN103955425A (en) Webpage (WEB) exploring testing device and method
CN110826057A (en) Data processing path analysis method, computer device, and storage medium
CN112527302B (en) Error detection method and device, terminal and storage medium
CN116483888A (en) Program evaluation method and device, electronic equipment and computer readable storage medium
CN115016795A (en) Code similarity detection method and device, processor and electronic equipment
CN110968500A (en) Test case execution method and device
CN111143203B (en) Machine learning method, privacy code determination method, device and electronic equipment
Bluemke et al. Selection of metrics for the defect prediction
CN107015909B (en) Test method and device based on code change analysis
Pócza et al. Cross-language program slicing in the .NET framework
US20160291946A1 (en) Custom class library generation method and apparatus
CN108255802B (en) Universal text parsing architecture and method and device for parsing text based on architecture
CN112612471B (en) Code processing method, device, equipment and storage medium
Tian et al. Bbreglocator: A vulnerability detection system based on bounding box regression
US11356853B1 (en) Detection of malicious mobile apps
Romano et al. Automated WebAssembly Function Purpose Identification With Semantics-Aware Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination