CN111459788A

CN111459788A - Test program plagiarism detection method based on support vector machine

Info

Publication number: CN111459788A
Application number: CN201910055791.0A
Authority: CN
Inventors: 陈振宇; 孙伟松; 孙泽嵩; 王兴亚; 段定
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2019-01-18
Filing date: 2019-01-18
Publication date: 2020-07-28

Abstract

The invention relates to a test program plagiarism detection method based on a support vector machine. First, the program under test and the test programs are cut and statically analyzed to obtain a mapping set of methods under test and a mapping set of test methods. Second, the contestants are traversed pairwise, the similarity of their test segments is calculated, and the results are collected into a similarity set. Then, a suitable kernel function and reference point are selected to build a support vector machine model, which is then optimized. Finally, for other test programs, plagiarism among them is judged by computing their similarity set and inputting it into the support vector machine. The invention aims to fill the gap in test program code similarity detection, improve the accuracy and precision of detecting test program code plagiarism, and help developer testing competitions detect contestants' code plagiarism automatically, thereby eliminating manual inspection, saving labor and time costs, and making the competitions fairer.

Description

Test program plagiarism detection method based on support vector machine

Technical Field

The invention belongs to the field of software test code plagiarism detection, and in particular addresses plagiarism in the test code that contestants submit in developer testing competitions. After the program under test and the test programs are analyzed, a support vector machine model is trained, tested, validated, and optimized on an existing test code data set. Unlabeled test code data is then classified. Whether plagiarism exists between test codes is judged from the resulting classes, which eliminates manual inspection, saves labor and time costs, and makes the competition fairer.

Background

When writing code, software developers commonly copy and paste code from different sources and modify it to suit their purpose. This improves coding efficiency and avoids spending a long time on ideas and code that others have already completed. However, in some cases copying and pasting code has unintended consequences, such as possible infringement of software copyright. In developer testing competitions, contestants also raise their scores by copying other contestants' code and then modifying it to some extent. To guard against such behavior, test program plagiarism detection is indispensable.

For similarity detection of unit test case code, no mature application or tool currently exists in either academia or industry. Moreover, contestants often copy and paste the code of only a few test cases, and, unlike source code, unit test cases are completely independent of one another, with no dependencies between them. Contestants may also modify the test case code to some extent (for example, modifying the word size, or inserting and deleting one or more statements). As a result, applying existing source code similarity detection tools directly to test case code does not truly reflect test code plagiarism, and the accuracy of tools such as code clone detectors and plagiarism checkers suffers.

The support vector machine has advantages in small-sample, nonlinear, and high-dimensional or ultra-high-dimensional pattern recognition problems. It is a classification-boundary-based method that maps points in a low-dimensional space into a high-dimensional space so that they become linearly separable, which allows it to handle interactions between nonlinear features.

Therefore, the invention provides a test program plagiarism detection method based on a support vector machine. The basic idea is as follows: cut the test code into segments, determine the method under test corresponding to each test code segment, and calculate the similarity of test code segments that correspond to the same method under test. The similarities of the test code segments, together with the 'plagiarism or not' result, are taken as input, and the support vector machine performs binary classification of other test code under the 'plagiarism or not' label, thereby improving plagiarism detection accuracy.

With this method, on the basis of plagiarism detection with higher accuracy and precision, developer testing competitions can detect contestants' code plagiarism automatically, which eliminates manual inspection, saves labor and time costs, and makes the competitions fairer.
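
The mapping idea can be illustrated with a minimal sketch. The data, the feature map `feature_map`, and the hand-picked weights in `linear_rule` are all hypothetical and chosen only for illustration; they are not taken from the invention. One-dimensional points whose class depends on |x| cannot be split by a single threshold, but become linearly separable once lifted to (x, x²).

```python
# Hypothetical 1-D data: class +1 when |x| > 1, class -1 otherwise.
# Positives lie on both sides of the negatives, so no single threshold on x works.
points = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
labels = [1 if abs(x) > 1 else -1 for x in points]


def feature_map(x):
    """Lift a 1-D point into 2-D: x -> (x, x^2)."""
    return (x, x * x)


def linear_rule(z, w=(0.0, 1.0), b=-1.0):
    """A linear decision rule in the lifted space: sign(w . z + b)."""
    score = w[0] * z[0] + w[1] * z[1] + b
    return 1 if score > 0 else -1


predictions = [linear_rule(feature_map(x)) for x in points]
print(predictions == labels)
```

A kernel function lets a support vector machine work in such a lifted space without computing the lifted coordinates explicitly.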

Disclosure of Invention

The invention provides a test program plagiarism detection method based on a support vector machine, so as to improve the accuracy and precision of detecting test program code plagiarism and to fill the gap in test program code similarity detection technology.

To achieve the above objective, the invention first uses static analysis to cut and analyze each program under test PUT (Program Under Test) and each test program TP (Test Program) submitted by a contestant, then classifies each segment in TP and calculates its similarity value SV (Similarity Value). The SV and the 'plagiarism or not' label are then taken as input to a support vector machine, which performs binary classification of other test code under the 'plagiarism or not' label.

Specifically, the method comprises the following steps:

1) Program cutting and analysis. Given a program under test PUT, which is a project source program in a developer testing competition, and a test program TP submitted by a contestant Cid (contestant ID) of the competition: first, static analysis is performed on each file under test FUT (File Under Test) in the program under test to obtain the class under test CUT (Class Under Test) and the method under test MUT (Method Under Test) in the FUT. Each MUT is analyzed to obtain the class name CUTN (Class Under Test Name), the method name MUTN (Method Under Test Name), and the parameter type sequence ATL (Arguments Type List) of the method under test. CUTN, MUTN, and ATL are then concatenated to obtain the method string Mstr, and a hash function is used to calculate the Mid corresponding to Mstr. Each Mid and its MUT form a mapping <Mid, MUT>, and all such mappings form the method-under-test mapping set MUTS. Similarly, the test files TF in the test program TP submitted by contestant Cid are located, and the test classes TC and test methods TM in each TF are obtained by static analysis. Each TM is analyzed to obtain the class name, method name, and parameter type sequence of the method under test, which are concatenated and hashed to calculate the Mid of the method under test that the test segment exercises. The mappings <Cid, <Mid, TM>> between contestants and test methods form the test method mapping set TMS.

2) Test program code similarity calculation. Given the method-under-test mapping set MUTS and the test method mapping set TMS obtained in step 1): first, MUTS is traversed to obtain the Mid set Mids. Then the contestants in TMS are traversed pairwise: contestants Cid₁ and Cid₂ and all of their mappings <Mid₁, TM₁> and <Mid₂, TM₂> are taken and traversed to obtain the Mid₁ set Mid₁s and the Mid₂ set Mid₂s. Finally, each Mid that is common to Mid₁s and Mid₂s and is present in Mids is found, and the similarity value SV of the TM₁ and TM₂ mapped to it in Mid₁s and Mid₂s is calculated, giving the contestant method similarity value mapping <Cid₁, Cid₂, <Mid, SV>>. All contestant method similarity value mappings form the set SVS (Similarity Value Set).

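3) Support vector machine model construction. Given the similarity value set SVS obtained in step 2): first, N pairs of contestants are taken from the SVS to obtain their contestant method similarity value mappings <Cid₁, Cid₂, <Mid, SV>>, and whether each of the N pairs plagiarized, P, is judged manually. Then a suitable kernel function kf is selected, a reference point is calculated, and a support vector machine model SVM is established using kf and the reference point. Finally, the SVM is optimized using the maximum likelihood estimation function argmin Loss to obtain the optimized SVM.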

4) Test program plagiarism detection. Given the SVM model obtained in step 3) and the similarity value set SVS obtained in step 2): first, the contestant method similarity value mappings in the similarity value set are traversed and input into the SVM (support vector machine). The output of the SVM is then obtained, namely whether the contestants plagiarized, P.

Further, the specific steps of the step 1) are as follows:

step 1)-1: initial state;

step 1)-2: input the set of programs under test PUT and the set of test programs TP submitted by the contestants Cid of the developer testing competition;

step 1)-3: initialize the method-under-test mapping set MUTS to empty and the test method mapping set TMS to empty;

step 1)-4: judge whether the number of files to be analyzed in PUT is greater than 0; if so, execute step 1)-5, otherwise execute step 1)-9;

step 1)-5: take a file under test FUT from PUT and analyze its class under test CUT and method under test MUT to obtain the class name CUTN, the method name MUTN, and the parameter type sequence ATL of the method;

step 1)-6: concatenate CUTN, MUTN, and ATL to obtain the method string Mstr, and use a hash function to calculate the Mid corresponding to Mstr, expressed by the formula:

Mid＝Hash(Append(CUTN+MUTN+ATL))；

step 1)-7: add <Mid, MUT> to the set MUTS;

step 1)-8: judge whether all FUTs, CUTs, and MUTs in PUT have been traversed; if so, execute step 1)-9, otherwise execute step 1)-5;

step 1)-9: judge whether the number of test program files to be analyzed in TP is greater than 0; if so, execute step 1)-10, otherwise execute step 1)-14;

step 1)-10: take a test file TF from TP and analyze its test class TC and test method TM to obtain the test class name TCN, the test method name TMN, and the parameter type sequence ATL of the method;

step 1)-11: concatenate TCN, TMN, and ATL to obtain the method string Mstr, and use a hash function to calculate the Mid corresponding to Mstr, expressed by the formula:

Mid＝Hash(Append(TCN+TMN+ATL))；

step 1)-12: add <Cid, <Mid, TM>> to the set TMS;

step 1)-13: judge whether all TFs, TCs, and TMs in TP have been traversed; if so, execute step 1)-14, otherwise execute step 1)-10;

step 1)-14: output the sets MUTS and TMS;

step 1)-15: end state.
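
The Mid calculation in steps 1)-5 to 1)-7 can be sketched as follows. This is a minimal illustration under stated assumptions: the static analysis is taken as already done, the sample class and method data are hypothetical, and SHA-256 is an arbitrary choice, since the text only requires a hash function.

```python
import hashlib


def compute_mid(class_name, method_name, arg_types):
    """Mid = Hash(Append(class name + method name + parameter type sequence))."""
    mstr = class_name + method_name + "".join(arg_types)
    # SHA-256 is an assumption; the text does not name a specific hash function.
    return hashlib.sha256(mstr.encode("utf-8")).hexdigest()


# Hypothetical output of static analysis: (CUTN, MUTN, ATL, method source).
methods_under_test = [
    ("Calculator", "add", ["int", "int"],
     "int add(int a, int b) { return a + b; }"),
    ("Calculator", "add", ["double", "double"],
     "double add(double a, double b) { return a + b; }"),
]

muts = {}  # MUTS: maps each Mid to its method under test (MUT)
for cutn, mutn, atl, source in methods_under_test:
    muts[compute_mid(cutn, mutn, atl)] = source

print(len(muts))
```

Including the parameter types in the hashed string keeps overloaded methods apart. A separator between the fields would avoid accidental collisions between different name combinations, but the text does not specify one, so the sketch concatenates directly.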

Further, the specific steps of the step 2) are as follows:

step 2)-1: initial state;

step 2)-2: input the MUTS and TMS obtained in step 1);

step 2)-3: initialize the similarity value set SVS to empty and the Mid set Mids to empty;

step 2)-4: judge whether the number of <Mid, MUT> to be analyzed in MUTS is greater than 0; if so, execute step 2)-5, otherwise execute step 2)-21;

step 2)-5: take one <Mid, MUT> from MUTS and add Mid to the set Mids;

step 2)-6: judge whether MUTS has been fully traversed; if so, execute step 2)-7, otherwise execute step 2)-5;

step 2)-7: judge whether the number of <Cid, <Mid, TM>> to be analyzed in TMS is greater than 0; if so, execute step 2)-8, otherwise execute step 2)-21;

step 2)-8: take two contestants Cid₁ and Cid₂ from TMS, together with all of their mappings <Mid₁, TM₁> and <Mid₂, TM₂>;

step 2)-9: initialize the Mid₁ set Mid₁s to empty and the Mid₂ set Mid₂s to empty;

step 2)-10: judge whether the number of <Mid₁, TM₁> to be analyzed is greater than 0; if so, execute step 2)-11, otherwise execute step 2)-13;

step 2)-11: take one <Mid₁, TM₁> and add Mid₁ to the set Mid₁s;

step 2)-12: judge whether all <Mid₁, TM₁> have been traversed; if so, execute step 2)-13, otherwise execute step 2)-11;

step 2)-13: judge whether the number of <Mid₂, TM₂> to be analyzed is greater than 0; if so, execute step 2)-14, otherwise execute step 2)-16;

step 2)-14: take one <Mid₂, TM₂> and add Mid₂ to the set Mid₂s;

step 2)-15: judge whether all <Mid₂, TM₂> have been traversed; if so, execute step 2)-16, otherwise execute step 2)-14;

step 2)-16: take one Mid from Mids; if Mid₁s contains Mid, execute step 2)-17, otherwise execute step 2)-15;

step 2)-17: if Mid₂s contains Mid, execute step 2)-18, otherwise execute step 2)-15;

step 2)-18: take the TM₁ and TM₂ corresponding to the matching Mid₁ and Mid₂, calculate their similarity value SV to obtain <Cid₁, Cid₂, <Mid, SV>>, and add it to the set SVS;

step 2)-19: judge whether Mids has been fully traversed; if so, execute step 2)-20, otherwise execute step 2)-16;

step 2)-20: judge whether the pairwise comparison of all contestants in TMS is finished; if so, execute step 2)-21, otherwise execute step 2)-8;

step 2)-21: end state.
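
The pairwise traversal above can be sketched compactly with set intersection. This is a simplified illustration under stated assumptions: the text does not name a similarity measure, so `difflib.SequenceMatcher` is used as a stand-in; each contestant is assumed to have at most one test method per Mid; and the contestant names, Mids, and test bodies are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(tm1, tm2):
    """Stand-in similarity value in [0, 1]; the text does not fix a measure."""
    return SequenceMatcher(None, tm1, tm2).ratio()


def build_svs(mids, tms):
    """mids: set of Mid from MUTS. tms: dict Cid -> {Mid: TM source}. Returns SVS."""
    svs = []
    for cid1, cid2 in combinations(sorted(tms), 2):   # every pair of contestants
        mid1s, mid2s = set(tms[cid1]), set(tms[cid2])
        for mid in sorted(mids & mid1s & mid2s):      # common to both and in Mids
            sv = similarity(tms[cid1][mid], tms[cid2][mid])
            svs.append((cid1, cid2, (mid, sv)))
    return svs


# Hypothetical data for illustration only.
mids = {"m1", "m2"}
tms = {
    "alice": {"m1": "assertEquals(3, calc.add(1, 2));",
              "m2": "assertEquals(0, calc.sub(1, 1));"},
    "bob": {"m1": "assertEquals(3, calc.add(1, 2));"},
    "carol": {"m1": "assertTrue(calc.add(2, 2) == 4);", "m3": "fail();"},
}

svs = build_svs(mids, tms)
for entry in svs:
    print(entry)
```

Only test methods that target the same method under test are compared, which is why "m2" and "m3" contribute no entries here: each is covered by a single contestant, and "m3" is also absent from Mids.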

Further, the specific steps of the step 3) are as follows:

step 3)-1: initial state;

step 3)-2: input the SVS obtained in step 2);

step 3)-3: take N pairs of contestants from the SVS to obtain their contestant method similarity value mappings <Cid₁, Cid₂, <Mid, SV>>, and manually judge whether each of the N pairs plagiarized, P;

step 3)-4: select a suitable kernel function kf and calculate a reference point;

step 3)-5: establish a support vector machine model SVM using kf and the reference point;

step 3)-6: optimize the SVM using the maximum likelihood estimation function argmin Loss to obtain the optimized SVM;

step 3)-7: end state.
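
One way to picture the model construction is the sketch below. It is not the construction described in the text: it assumes a linear kernel, uses the regularized hinge loss as the Loss being minimized, and trains by plain per-sample subgradient descent. The feature choice (mean and maximum SV of a contestant pair), the training data, and the hyperparameters are all hypothetical.

```python
def pair_features(sv_values):
    """Hypothetical feature vector for one contestant pair: (mean SV, max SV)."""
    return (sum(sv_values) / len(sv_values), max(sv_values))


def train_linear_svm(samples, labels, lam=0.001, lr=0.05, epochs=2000):
    """Minimise lam/2 * ||w||^2 + hinge loss by per-sample subgradient descent."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regulariser always shrinks w slightly.
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1:
                # Hinge loss is active: push the boundary away from this sample.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(model, x):
    """Return +1 (plagiarism) or -1 (no plagiarism) for a feature vector."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical manually labelled pairs: +1 = plagiarised, -1 = not plagiarised.
samples = [(0.90, 1.00), (0.85, 0.95), (0.95, 1.00),
           (0.30, 0.50), (0.40, 0.60), (0.20, 0.40)]
labels = [1, 1, 1, -1, -1, -1]

model = train_linear_svm(samples, labels)
print(predict(model, pair_features([0.99, 0.85])))
print(predict(model, pair_features([0.45, 0.05])))
```

A nonlinear kernel would replace the dot product with kf evaluated against stored training samples; the linear form is used here only to keep the sketch short.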

Further, the specific steps of the step 4) are as follows:

step 4)-1: initial state;

step 4)-2: input the SVM obtained in step 3) and the SVS obtained in step 2);

step 4)-3: initialize the plagiarizing contestant set PCS to empty;

step 4)-4: judge whether the number of similarity value mappings in the SVS is greater than 0; if so, execute step 4)-5, otherwise execute step 4)-8;

step 4)-5: take one similarity value mapping from the SVS and input it into the SVM;

step 4)-6: obtain the output of the SVM, namely whether the contestants plagiarized, P; if P is true, add the contestant Cid to the set PCS;

step 4)-7: judge whether the SVS has been fully traversed; if so, execute step 4)-8, otherwise execute step 4)-5;

step 4)-8: end state.
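
The detection loop can be sketched as below. Two points are assumptions rather than details from the text: the similarity value mappings are grouped per contestant pair before classification, and `threshold_classifier` is a hypothetical stand-in for the trained SVM so that the example runs on its own.

```python
def detect_plagiarism(svs, classify):
    """Traverse the SVS and return PCS, the set of contestants flagged by classify."""
    by_pair = {}
    for cid1, cid2, (_mid, sv) in svs:
        # Grouping per contestant pair is an assumption of this sketch.
        by_pair.setdefault((cid1, cid2), []).append(sv)

    pcs = set()
    for (cid1, cid2), sv_values in by_pair.items():
        if classify(sv_values):          # P is true for this pair
            pcs.update((cid1, cid2))
    return pcs


def threshold_classifier(sv_values):
    """Hypothetical stand-in for the trained SVM: flag pairs with mean SV > 0.8."""
    return sum(sv_values) / len(sv_values) > 0.8


# Hypothetical similarity value set.
svs = [
    ("alice", "bob", ("m1", 1.0)),
    ("alice", "bob", ("m2", 0.9)),
    ("alice", "carol", ("m1", 0.3)),
    ("bob", "carol", ("m1", 0.35)),
]

pcs = detect_plagiarism(svs, threshold_classifier)
print(sorted(pcs))
```

In practice `classify` would wrap the optimized SVM from step 3), for example by turning the pair's similarity values into a feature vector and checking the sign of the decision function.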

Drawings

FIG. 1 is a flowchart of a test program plagiarism detection method based on a support vector machine in an embodiment of the present invention.

FIG. 2 is a flow chart of program cutting and analysis.

FIG. 3 is a flow chart of test program code similarity calculation.

FIG. 4 is a flow chart of support vector machine model construction.

FIG. 5 is a flow chart of test program plagiarism detection.

Detailed Description

In order to better understand the technical content of the invention, specific examples are illustrated below in conjunction with the accompanying drawings.

FIG. 1 is an overall framework diagram of a test program plagiarism detection method based on a support vector machine according to an embodiment of the present invention.

A test program plagiarism detection method based on a support vector machine is characterized by comprising the following steps.

Step 1, program cutting and analysis: given a project source program in a developer testing competition, namely the program under test, and the test programs submitted by the contestants of the competition, static analysis is performed on each file in the program under test to obtain the classes under test and the methods under test. Each method under test is analyzed to obtain its class name, method name, and parameter type sequence; these are concatenated, a hash function is used to calculate the corresponding Mid, and each Mid is mapped to its method under test to form the method-under-test mapping set. Similarly, the test files in the test program submitted by a contestant are located, and the test classes and test methods are obtained by static analysis. Each test method is analyzed to obtain the class name, method name, and parameter type sequence of the method under test; these are concatenated and hashed to calculate the Mid of the method under test that the test segment exercises, and the mappings between contestants and test methods form the test method mapping set.

Step 2, test program code similarity calculation: given the method-under-test mapping set and the test method mapping set, the former is traversed to obtain the Mid set Mids. The contestants are traversed pairwise, and all mappings of each pair are taken and traversed to obtain the Mid₁ set Mid₁s and the Mid₂ set Mid₂s. Each Mid common to Mid₁s and Mid₂s and present in Mids is found, and the similarity value of the test methods mapped to it in Mid₁s and Mid₂s is calculated to obtain the contestant method similarity value mapping. All contestant method similarity value mappings form the similarity value set.

Step 3, support vector machine model construction: given the similarity value set, N pairs of contestants are taken to obtain their contestant method similarity value mappings, and whether each of the N pairs plagiarized is judged manually. A suitable kernel function is selected, a reference point is calculated, a support vector machine model is established, and the model is optimized using the maximum likelihood estimation function.

Step 4, test program plagiarism detection: given the support vector machine model and the similarity value set, the contestant method similarity value mappings in the set are traversed and input into the support vector machine model, and the output is obtained, namely whether the contestants plagiarized.

FIG. 2 is a flow chart of program cutting and analysis. The specific steps are as follows:

step 1: initial state;

step 2: input the set of programs under test PUT and the set of test programs TP submitted by the contestants Cid of the developer testing competition;

step 3: initialize the method-under-test mapping set MUTS to empty and the test method mapping set TMS to empty;

step 4: judge whether the number of files to be analyzed in PUT is greater than 0; if so, execute step 5, otherwise execute step 9;

step 5: take a file under test FUT from PUT and analyze its class under test CUT and method under test MUT to obtain the class name CUTN, the method name MUTN, and the parameter type sequence ATL of the method;

step 6: concatenate CUTN, MUTN, and ATL to obtain the method string Mstr, and use a hash function to calculate the Mid corresponding to Mstr;

step 7: add <Mid, MUT> to the set MUTS;

step 8: judge whether all FUTs, CUTs, and MUTs in PUT have been traversed; if so, execute step 9, otherwise execute step 5;

step 9: judge whether the number of test program files to be analyzed in TP is greater than 0; if so, execute step 10, otherwise execute step 14;

step 10: take a test file TF from TP and analyze its test class TC and test method TM to obtain the test class name TCN, the test method name TMN, and the parameter type sequence ATL of the method;

step 11: concatenate TCN, TMN, and ATL to obtain the method string Mstr, and use a hash function to calculate the Mid corresponding to Mstr;

step 12: add <Cid, <Mid, TM>> to the set TMS;

step 13: judge whether all TFs, TCs, and TMs in TP have been traversed; if so, execute step 14, otherwise execute step 10;

step 14: output the sets MUTS and TMS;

step 15: end state.

FIG. 3 is a flow chart of similarity calculation for test program code. The method comprises the following specific steps:

step 1: initial state;

step 2: input MUTS and TMS;

step 3: initialize the similarity value set SVS to empty and the Mid set Mids to empty;

step 4: judge whether the number of <Mid, MUT> to be analyzed in MUTS is greater than 0; if so, execute step 5, otherwise execute step 21;

step 5: take one <Mid, MUT> from MUTS and add Mid to the set Mids;

step 6: judge whether MUTS has been fully traversed; if so, execute step 7, otherwise execute step 5;

step 7: judge whether the number of <Cid, <Mid, TM>> to be analyzed in TMS is greater than 0; if so, execute step 8, otherwise execute step 21;

step 8: take two contestants Cid₁ and Cid₂ from TMS, together with all of their mappings <Mid₁, TM₁> and <Mid₂, TM₂>;

step 9: initialize the Mid₁ set Mid₁s to empty and the Mid₂ set Mid₂s to empty;

step 10: judge whether the number of <Mid₁, TM₁> to be analyzed is greater than 0; if so, execute step 11, otherwise execute step 13;

step 11: take one <Mid₁, TM₁> and add Mid₁ to the set Mid₁s;

step 12: judge whether all <Mid₁, TM₁> have been traversed; if so, execute step 13, otherwise execute step 11;

step 13: judge whether the number of <Mid₂, TM₂> to be analyzed is greater than 0; if so, execute step 14, otherwise execute step 16;

step 14: take one <Mid₂, TM₂> and add Mid₂ to the set Mid₂s;

step 15: judge whether all <Mid₂, TM₂> have been traversed; if so, execute step 16, otherwise execute step 14;

step 16: take one Mid from Mids; if Mid₁s contains Mid, execute step 17, otherwise execute step 15;

step 17: if Mid₂s contains Mid, execute step 18, otherwise execute step 15;

step 18: take the TM₁ and TM₂ corresponding to the matching Mid₁ and Mid₂, calculate their similarity value SV to obtain <Cid₁, Cid₂, <Mid, SV>>, and add it to the set SVS;

step 19: judge whether Mids has been fully traversed; if so, execute step 20, otherwise execute step 16;

step 20: judge whether the pairwise comparison of all contestants in TMS is finished; if so, execute step 21, otherwise execute step 8;

step 21: end state.

FIG. 4 is a flow chart of support vector machine model construction. The method comprises the following specific steps:

step 1: initial state;

step 2: input the SVS;

step 3: take N pairs of contestants from the SVS to obtain their contestant method similarity value mappings <Cid₁, Cid₂, <Mid, SV>>, and manually judge whether each of the N pairs plagiarized, P;

step 4: select a suitable kernel function kf and calculate a reference point;

step 5: establish a support vector machine model SVM using kf and the reference point;

step 6: optimize the SVM using the maximum likelihood estimation function argmin Loss to obtain the optimized SVM;

step 7: end state.

FIG. 5 is a flow chart of test program plagiarism detection. The method comprises the following specific steps:

step 1: initial state;

step 2: input the SVM and the SVS;

step 3: initialize the plagiarizing contestant set PCS to empty;

step 4: judge whether the number of similarity value mappings in the SVS is greater than 0; if so, execute step 5, otherwise execute step 8;

step 5: take one similarity value mapping from the SVS and input it into the SVM;

step 6: obtain the output of the SVM, namely whether the contestants plagiarized, P; if P is true, add the contestant Cid to the set PCS;

step 7: judge whether the SVS has been fully traversed; if so, execute step 8, otherwise execute step 5;

step 8: end state.

In conclusion, the invention fills the gap in test program code similarity detection technology and improves the accuracy and precision of detecting test program code plagiarism, thereby helping developer testing competitions detect contestants' code plagiarism automatically, eliminating manual inspection, saving labor and time costs, and making the competitions fairer.

Claims

1. A test program plagiarism detection method based on a support vector machine, characterized in that static analysis is performed on a program under test and on test programs to extract a mapping set of all methods under test and a mapping set of all test methods; based on the method-under-test mapping set and the test method mapping set, similarity calculation is performed on the test program code, the set formed by the calculated similarity value mappings is taken as input, and a support vector machine model is constructed and optimized; finally, the support vector machine judges whether the contestants plagiarized, so that the accuracy and precision of detecting test program code plagiarism are improved, developer testing competitions are helped to detect contestants' code plagiarism automatically, manual inspection is eliminated, labor and time costs are saved, and the competition is made fairer; the method comprises the following steps:

1) program cutting and analysis; given a project source program in a developer testing competition, namely the program under test PUT, and a test program TP submitted by a contestant Cid of the competition; first, static analysis is performed on each file FUT in the program under test to obtain the class under test CUT and the method under test MUT in the FUT; each MUT is analyzed to obtain the class name CUTN, the method name MUTN, and the parameter type sequence ATL of the method under test; CUTN, MUTN, and ATL are concatenated to obtain the method string Mstr, and a hash function is used to calculate the Mid corresponding to Mstr; each Mid and its MUT form a mapping, and all mappings of Mid and MUT form the method-under-test mapping set MUTS; similarly, the test files TF in the test program TP submitted by contestant Cid are located, and the test classes TC and test methods TM in each TF are obtained by static analysis; each TM is then analyzed to obtain the class name TCN, the method name TMN, and the parameter type sequence ATL, which are concatenated and hashed to calculate the Mid of the method under test that the test segment exercises; the mappings between Cid, Mid, and TM form the test method mapping set TMS;

2) test program code similarity calculation; given the method-under-test mapping set MUTS and the test method mapping set TMS obtained in step 1); first, MUTS is traversed to obtain the Mid set Mids; then the contestants in TMS are traversed pairwise, and contestants Cid₁ and Cid₂ and all of their mappings <Mid₁, TM₁> and <Mid₂, TM₂> are taken and traversed to obtain the Mid₁ set Mid₁s and the Mid₂ set Mid₂s; finally, each Mid common to Mid₁s and Mid₂s and present in Mids is found, and the similarity value SV of the TM₁ and TM₂ mapped to it in Mid₁s and Mid₂s is calculated to obtain the contestant method similarity value mapping <Cid₁, Cid₂, <Mid, SV>>; all contestant method similarity value mappings form the set SVS;

3) support vector machine model construction; given the similarity value set SVS obtained in step 2); first, N pairs of contestants are taken from the SVS to obtain their contestant method similarity value mappings <Cid₁, Cid₂, <Mid, SV>>; then a suitable kernel function kf is selected, a reference point is calculated, and a support vector machine model is established using kf and the reference point;

4) test program plagiarism detection; given the SVM model obtained in step 3) and the similarity value set SVS obtained in step 2); first, the contestant method similarity value mappings in the similarity value set are traversed and input into the SVM (support vector machine); then the output of the SVM is obtained, namely whether the contestants plagiarized, P.

2. The test program plagiarism detection method based on a support vector machine according to claim 1, wherein in step 1) the program is cut and analyzed as follows: given a project source program in a developer testing competition, namely the program under test, and the test programs submitted by the contestants of the competition, static analysis is performed on each file in the program under test to obtain the classes under test and the methods under test; each method under test is analyzed to obtain its class name, method name, and parameter type sequence, which are concatenated, and a hash function is used to calculate the corresponding Mid, which is mapped to the method under test to form the method-under-test mapping set; similarly, the test files in the test program submitted by a contestant are located, and the test classes and test methods are obtained by static analysis; each test method is analyzed to obtain the class name, method name, and parameter type sequence of the method under test, which are concatenated and hashed to calculate the Mid of the method under test that the test segment exercises, and the mappings between contestants and test methods form the test method mapping set.

3. The test program plagiarism detection method based on a support vector machine according to claim 1, wherein in step 2) the similarity of the test program code is calculated as follows: given the method-under-test mapping set and the test method mapping set, the Mid set Mids is obtained by traversal; the contestants are traversed pairwise, and all of their mappings are taken and traversed to obtain the Mid₁ set Mid₁s and the Mid₂ set Mid₂s; each Mid common to Mid₁s and Mid₂s and present in Mids is found, and the similarity value of the test methods mapped to it in Mid₁s and Mid₂s is calculated to obtain the contestant method similarity value mapping; all contestant method similarity value mappings form the similarity value set.

4. The test program plagiarism detection method based on a support vector machine according to claim 1, wherein in step 3) the support vector machine model is constructed as follows: given the similarity value set, N pairs of contestants are taken to obtain their contestant method similarity value mappings, and whether each of the N pairs plagiarized is judged manually; a suitable kernel function is selected, a reference point is calculated, a support vector machine model is established, and the model is optimized using the maximum likelihood estimation function.

5. The test program plagiarism detection method based on a support vector machine according to claim 1, wherein in step 4) test program plagiarism is detected as follows: given the support vector machine model and the similarity value set, the contestant method similarity value mappings in the set are traversed and input into the support vector machine model, and the output is obtained, namely whether the contestants plagiarized.