CN111459788A - Test program plagiarism detection method based on support vector machine - Google Patents


Info

Publication number
CN111459788A
Authority
CN
China
Prior art keywords
mid
test
tested
mapping
program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910055791.0A
Other languages
Chinese (zh)
Inventor
陈振宇
孙伟松
孙泽嵩
王兴亚
段定
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201910055791.0A priority Critical patent/CN111459788A/en
Publication of CN111459788A publication Critical patent/CN111459788A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention relates to a test program plagiarism detection method based on a support vector machine. First, the program under test and the test programs are cut and statically analyzed to obtain a mapping set of methods under test and a mapping set of test methods. Second, the contestants are traversed pairwise, the similarity of their test segments is calculated, and the results are collected into a similarity set. Then, a suitable kernel function and reference point are selected to build and optimize a support vector machine model. Finally, for other test programs, plagiarism among them is judged by calculating their similarity set and feeding it to the support vector machine. The invention aims to fill a gap in similarity detection for test program code and to improve the accuracy and precision of detecting test code plagiarism, thereby helping developer test competitions detect contestants' code plagiarism automatically, which removes the manual inspection step, saves labor and time costs, and makes the competitions fairer.

Description

Test program plagiarism detection method based on support vector machine
Technical Field
The invention belongs to the field of plagiarism detection for software test code, and in particular addresses the plagiarism of test code submitted by contestants in current developer test competitions. After the program under test and the test programs are analyzed, a support vector machine is trained, tested, validated and optimized on an existing test code data set. Unlabeled test code data is then classified, and the resulting category indicates whether plagiarism exists between test codes. This removes the manual inspection step, saves labor and time costs, and makes the competition fairer.
Background
When writing code, software developers commonly copy and paste code from different sources and modify it to serve their own purpose. This improves coding efficiency and avoids spending a long time on ideas and code that others have already completed. In some cases, however, copying and pasting code has unintended consequences, such as possible infringement of software copyright. In developer test competitions, contestants also raise their scores by copying and pasting other contestants' code and then modifying it. To guard against such behavior, test program plagiarism detection is indispensable.
For similarity detection of unit test case code, no mature application or tool currently exists in either academia or industry. Moreover, contestants often copy and paste the code of only a few test cases, and unit test cases, unlike source code, are completely independent of one another with no dependencies between them. Contestants may also modify the test case code to some extent (for example, altering the wording, or inserting and deleting one or more statements). As a result, applying existing source code similarity detection tools, such as code clone and plagiarism checkers, directly to test cases does not truly reflect the plagiarism of test code, and their analysis accuracy suffers.
The support vector machine has advantages for pattern recognition problems with small samples, nonlinearity, and high or very high dimensionality. It is a method based on a classification boundary: points in a low-dimensional space are mapped into a high-dimensional space where they become linearly separable, which allows interactions among nonlinear features to be handled.
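The mapping idea in the paragraph above can be illustrated with a toy example. The sketch below is not taken from the patent: the data, the map phi(x) = (x, x^2) and the hand-picked hyperplane are all assumptions chosen only to show that points which no single threshold can split in one dimension become linearly separable after the mapping.

```python
# Toy 1-D data: class +1 sits at both ends, class -1 in the middle,
# so no single threshold on x separates the two classes.
points = [(-2.0, 1), (-1.5, 1), (-0.5, -1), (0.0, -1), (0.5, -1), (1.5, 1), (2.0, 1)]

def feature_map(x):
    """Explicit map phi(x) = (x, x^2) from 1-D into 2-D (illustrative choice)."""
    return (x, x * x)

def linear_decision(z, w=(0.0, 1.0), b=-1.0):
    """Sign of w . z + b for a hand-picked hyperplane (the line x^2 = 1)."""
    score = w[0] * z[0] + w[1] * z[1] + b
    return 1 if score > 0 else -1

def separable_by_threshold(data):
    """Check whether any threshold t, in either orientation, splits the 1-D data."""
    xs = sorted(x for x, _ in data)
    cuts = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    for t in cuts:
        for sign in (1, -1):
            if all((sign if x > t else -sign) == y for x, y in data):
                return True
    return False

# After the mapping, one linear boundary classifies every point correctly.
predictions = [linear_decision(feature_map(x)) for x, _ in points]
```

A kernel function lets a support vector machine work in such a mapped space without computing the mapped coordinates explicitly; the explicit map is used here only to keep the example readable.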
Therefore, the invention provides a test program plagiarism detection method based on a support vector machine. The basic idea is as follows: the test code is cut into segments, the method under test corresponding to each test code segment is determined, and the similarity is calculated between test code segments that correspond to the same method under test. The similarity values of the test code segments, together with the "plagiarism or not" results, are taken as input, and the support vector machine performs a binary "plagiarism or not" classification of other test code, thereby improving plagiarism detection accuracy.
With plagiarism detection of higher accuracy and precision, the method helps developer test competitions detect contestants' code plagiarism automatically, which removes the manual inspection step, saves labor and time costs, and makes the competitions fairer.
Disclosure of Invention
The invention provides a test program plagiarism detection method based on a support vector machine, which improves the accuracy and precision of detecting plagiarism of test program code and fills a gap in test program code similarity detection technology.
To achieve the above objective, the invention first uses static analysis to cut each program under test PUT (Program Under Test) and each test program TP (Test Program) submitted by a contestant, then classifies each segment in TP and calculates its similarity value SV (Similarity Value). The SV and the "plagiarism or not" label are then taken as input to a support vector machine, which performs a binary classification of other test code by that label.
Specifically, the method comprises the following steps:
1) Program cutting and analysis. Given the project source program of a developer test competition, namely the program under test PUT, and the test program TP submitted by a contestant Cid (contestant ID) of the competition: first, each file under test FUT (File Under Test) in the program under test is statically analyzed to obtain the class under test CUT (Class Under Test) and the method under test MUT (Method Under Test) in the FUT; each MUT is analyzed to obtain the class name CUTN (Class Under Test Name), the method name MUTN (Method Under Test Name) and the type sequence ATL (argument type list) of the parameters of the method under test; finally, CUTN, MUTN and ATL are concatenated to obtain the string Mstr of the method under test, and a hash function is applied to Mstr to calculate the corresponding Mid. Each Mid and its MUT form a mapping <Mid, MUT>, and all such mappings form the mapping set MUTS of methods under test. Similarly, the test files TF in the TP submitted by contestant Cid are located, and the test classes TC and test methods TM in each TF are obtained by static analysis; each TM is analyzed to obtain the class name TCN, the method name TMN and the parameter type sequence ATL, which are concatenated and hashed to calculate the Mid that represents the method under test targeted by the test segment. Each Cid, Mid and TM form a mapping <Cid, <Mid, TM>>, and all such mappings form the test method mapping set TMS.
2) Test program code similarity calculation. Given the mapping set MUTS of methods under test and the test method mapping set TMS obtained in step 1): first, MUTS is traversed to obtain the Mid set Mids; then the contestants in TMS are traversed pairwise, taking contestants Cid1 and Cid2 with all of their mappings <Mid1, TM1> and <Mid2, TM2>, which are traversed to obtain the Mid1 set Mid1s and the Mid2 set Mid2s; finally, for each Mid that is common to Mid1s and Mid2s and also present in Mids, the similarity value SV of the test methods TM1 and TM2 mapped to it is calculated, giving the contestant method similarity value mapping <Cid1, Cid2, <Mid, SV>>. All contestant method similarity value mappings form the set SVS (Similarity Value Set).
3) Support vector machine model construction. Given the similarity value set SVS obtained in step 2): first, N pairs of contestants are selected from SVS, their contestant method similarity value mappings <Cid1, Cid2, <Mid, SV>> are obtained, and whether each of the N pairs plagiarized, P, is judged manually; then a suitable kernel function kf is selected, a reference point is calculated, and a support vector machine model SVM is built from kf and the reference point; finally, the SVM is optimized using the maximum likelihood estimation function argmin Loss to obtain the optimized SVM.
4) Test program plagiarism detection. Given the SVM model obtained in step 3) and the similarity value set SVS obtained in step 2): first, the contestant method similarity value mappings in the similarity value set are traversed and input into the SVM; then the output of the SVM is obtained, namely whether the contestants plagiarized, P.
Further, the specific steps of step 1) are as follows:
Step 1)-1: initial state;
Step 1)-2: input the set PUT of programs under test and the set TP of test programs submitted by the contestants Cid of the developer test competition;
Step 1)-3: initialize the set MUTS of methods under test to empty and the set TMS of test methods to empty;
Step 1)-4: judge whether the number of files to be analyzed in PUT is greater than 0; if so, execute step 1)-5, otherwise execute step 1)-9;
Step 1)-5: take a file under test FUT from PUT and analyze its class under test CUT and method under test MUT to obtain the class name CUTN, the method name MUTN and the type sequence ATL of the parameters of the method;
Step 1)-6: concatenate CUTN, MUTN and ATL to obtain the string Mstr of the method under test, and calculate the Mid corresponding to Mstr with a hash function, expressed by the formula:
Mid=Hash(Append(CUTN+MUTN+ATL));
Step 1)-7: add <Mid, MUT> to the set MUTS;
Step 1)-8: judge whether all FUT, CUT and MUT in PUT have been traversed; if so, execute step 1)-9, otherwise execute step 1)-5;
Step 1)-9: judge whether the number of test program files to be analyzed in TP is greater than 0; if so, execute step 1)-10, otherwise execute step 1)-14;
Step 1)-10: take a test file TF from TP and analyze its test class TC and test method TM to obtain the test class name TCN, the test method name TMN and the type sequence ATL of the parameters of the method;
Step 1)-11: concatenate TCN, TMN and ATL to obtain the string Mstr, and calculate the Mid corresponding to Mstr with a hash function, expressed by the formula:
Mid=Hash(Append(TCN+TMN+ATL));
Step 1)-12: add <Cid, <Mid, TM>> to the set TMS;
Step 1)-13: judge whether all TF, TC and TM in TP have been traversed; if so, execute step 1)-14, otherwise execute step 1)-10;
Step 1)-14: output the sets MUTS and TMS;
Step 1)-15: end state.
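The construction of Mid, MUTS and TMS in step 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the patent does not name a hash function, so SHA-1 from `hashlib` is an arbitrary choice here, static analysis is assumed to have already produced the name and type records, and the dictionary layouts and the `Calculator` example are hypothetical.

```python
import hashlib

def method_id(class_name, method_name, arg_types):
    """Mid = Hash(Append(class name + method name + argument type list)).

    SHA-1 is an assumption; the source only says "a hash function".
    """
    mstr = class_name + method_name + "".join(arg_types)  # the string Mstr
    return hashlib.sha1(mstr.encode("utf-8")).hexdigest()

def build_muts(methods_under_test):
    """Build MUTS as {Mid: MUT} from (CUTN, MUTN, ATL, body) records."""
    return {method_id(c, m, a): body for c, m, a, body in methods_under_test}

def build_tms(test_methods):
    """Build TMS as {Cid: {Mid: TM}} from (Cid, class name, method name, ATL, body) records."""
    tms = {}
    for cid, c, m, a, body in test_methods:
        tms.setdefault(cid, {})[method_id(c, m, a)] = body
    return tms

# Hypothetical records, as if produced by static analysis.
muts = build_muts([("Calculator", "add", ["int", "int"], "return a + b;")])
tms = build_tms([
    ("c01", "Calculator", "add", ["int", "int"], "assertEquals(3, calc.add(1, 2));"),
    ("c02", "Calculator", "add", ["int", "int"], "assertEquals(5, calc.add(2, 3));"),
])
```

Because both sides hash the same concatenated string, a test method and the method under test it targets share one Mid, which is what lets the later steps group test segments by method under test.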
Further, the specific steps of step 2) are as follows:
Step 2)-1: initial state;
Step 2)-2: input the MUTS and TMS obtained in step 1);
Step 2)-3: initialize the similarity value set SVS to empty and the Mid set Mids to empty;
Step 2)-4: judge whether the number of <Mid, MUT> to be analyzed in MUTS is greater than 0; if so, execute step 2)-5, otherwise execute step 2)-21;
Step 2)-5: take a <Mid, MUT> from MUTS and add Mid to the set Mids;
Step 2)-6: judge whether the traversal of MUTS is finished; if so, execute step 2)-7, otherwise execute step 2)-5;
Step 2)-7: judge whether the number of <Cid, <Mid, TM>> to be analyzed in TMS is greater than 0; if so, execute step 2)-8, otherwise execute step 2)-21;
Step 2)-8: take two contestants Cid1 and Cid2 from TMS together with all of their mappings <Mid1, TM1> and <Mid2, TM2>;
Step 2)-9: initialize the Mid1 set Mid1s to empty and the Mid2 set Mid2s to empty;
Step 2)-10: judge whether the number of <Mid1, TM1> to be analyzed is greater than 0; if so, execute step 2)-11, otherwise execute step 2)-13;
Step 2)-11: take a <Mid1, TM1> and add Mid1 to the set Mid1s;
Step 2)-12: judge whether all <Mid1, TM1> have been traversed; if so, execute step 2)-13, otherwise execute step 2)-11;
Step 2)-13: judge whether the number of <Mid2, TM2> to be analyzed is greater than 0; if so, execute step 2)-14, otherwise execute step 2)-16;
Step 2)-14: take a <Mid2, TM2> and add Mid2 to the set Mid2s;
Step 2)-15: judge whether all <Mid2, TM2> have been traversed; if so, execute step 2)-16, otherwise execute step 2)-14;
Step 2)-16: take a Mid from Mids; if Mid1s contains Mid, execute step 2)-17, otherwise execute step 2)-15;
Step 2)-17: if Mid2s contains Mid, execute step 2)-18, otherwise execute step 2)-15;
Step 2)-18: take the TM1 and TM2 mapped to this Mid in Mid1s and Mid2s, calculate their similarity value SV to obtain <Cid1, Cid2, <Mid, SV>>, and add it to the set SVS;
Step 2)-19: judge whether Mids has been traversed; if so, execute step 2)-20, otherwise execute step 2)-16;
Step 2)-20: judge whether the pairwise comparison of the contestants in TMS is finished; if so, execute step 2)-21, otherwise execute step 2)-8;
Step 2)-21: end state.
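The pairwise traversal of step 2) can be condensed into a short sketch. Several things here are assumptions rather than details from the patent: MUTS is modelled as `{Mid: MUT}`, TMS as `{Cid: {Mid: TM}}`, each SVS entry is flattened to a `(Cid1, Cid2, Mid, SV)` tuple, and the similarity measure is passed in as a parameter because the source does not specify one. The `exact_match` function is only a placeholder.

```python
from itertools import combinations

def build_svs(muts, tms, similarity):
    """Return SVS as a list of (Cid1, Cid2, Mid, SV) tuples."""
    mids = set(muts)                                   # the set Mids
    svs = []
    for cid1, cid2 in combinations(sorted(tms), 2):    # each pair of contestants once
        mid1s, mid2s = set(tms[cid1]), set(tms[cid2])  # Mid1s and Mid2s
        # Keep only Mids shared by both contestants and present in Mids.
        for mid in sorted(mids & mid1s & mid2s):
            sv = similarity(tms[cid1][mid], tms[cid2][mid])
            svs.append((cid1, cid2, mid, sv))
    return svs

def exact_match(tm1, tm2):
    """Placeholder similarity: 1.0 for identical text, otherwise 0.0."""
    return 1.0 if tm1 == tm2 else 0.0

# Hypothetical data: three contestants, two methods under test (m1, m2).
muts = {"m1": "body of m1", "m2": "body of m2"}
tms = {
    "a": {"m1": "test x", "m2": "test y"},
    "b": {"m1": "test x", "m3": "test z"},   # m3 is not in MUTS and is ignored
    "c": {"m2": "test y"},
}
svs = build_svs(muts, tms, exact_match)
```

Set intersection replaces the explicit membership checks of steps 2)-16 and 2)-17; the result is the same set of compared Mids.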
Further, the specific steps of step 3) are as follows:
Step 3)-1: initial state;
Step 3)-2: input the SVS obtained in step 2);
Step 3)-3: take N pairs of contestants from SVS, obtain their contestant method similarity value mappings <Cid1, Cid2, <Mid, SV>>, and manually judge whether each of the N pairs plagiarized, P;
Step 3)-4: select a suitable kernel function kf and calculate a reference point;
Step 3)-5: build a support vector machine model SVM using kf and the reference point;
Step 3)-6: optimize the SVM using the maximum likelihood estimation function argmin Loss to obtain the optimized SVM;
Step 3)-7: end state.
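Training a classifier on manually labelled similarity values, as in step 3), might look like the sketch below. It deliberately swaps in a simpler technique than the text describes: a linear support vector machine trained by sub-gradient descent on the regularised hinge loss, with no kernel function, no reference point, and not the "maximum likelihood estimation function" named in the source. The single feature per contestant pair (for instance a mean SV), the labels and all hyperparameters are hypothetical.

```python
def train_linear_svm(samples, labels, lam=0.01, epochs=500, lr=0.1):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by sub-gradient descent.

    labels are +1 (plagiarism) or -1 (no plagiarism).
    """
    dim, n = len(samples[0]), len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]      # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:                   # sample violates the margin
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(model, x):
    """Return +1 (plagiarism) or -1 (no plagiarism) for feature vector x."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical labelled pairs: one feature per pair, e.g. its mean SV.
samples = [[0.95], [0.90], [0.85], [0.30], [0.20], [0.10]]
labels = [1, 1, 1, -1, -1, -1]
model = train_linear_svm(samples, labels)
```

In practice an established library implementation with a selectable kernel would normally be preferred; the hand-written trainer is used here only so that the example has no dependencies.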
Further, the specific steps of step 4) are as follows:
Step 4)-1: initial state;
Step 4)-2: input the SVM obtained in step 3) and the SVS obtained in step 2);
Step 4)-3: initialize the plagiarizing contestant set PCS to empty;
Step 4)-4: judge whether the number of similarity value mappings in SVS is greater than 0; if so, execute step 4)-5, otherwise execute step 4)-8;
Step 4)-5: take a similarity value mapping from SVS and input it into the SVM;
Step 4)-6: obtain the output of the SVM, namely whether the contestants plagiarized, P; if P is true, add the contestant Cid to the set PCS;
Step 4)-7: judge whether SVS has been traversed; if so, execute step 4)-8, otherwise execute step 4)-5;
Step 4)-8: end state.
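The detection loop of step 4) reduces to a few lines once a trained classifier is available. In this sketch the classifier is a hypothetical callable, a fixed threshold stands in for the trained SVM, and SVS entries are again assumed to be `(Cid1, Cid2, Mid, SV)` tuples. The text says the contestant Cid is added to PCS without saying which member of the pair; adding both is an assumption.

```python
def detect_plagiarism(svs, classify):
    """Return PCS, the set of contestant IDs flagged by the classifier."""
    pcs = set()
    for cid1, cid2, mid, sv in svs:
        if classify(sv):                 # P is true for this mapping
            pcs.update((cid1, cid2))     # assumption: flag both contestants
    return pcs

def threshold_classifier(sv):
    """Stand-in for the trained SVM: flags similarity values above 0.8."""
    return sv > 0.8

# Hypothetical similarity value set.
svs = [
    ("a", "b", "m1", 0.97),
    ("a", "c", "m2", 0.15),
    ("b", "c", "m1", 0.20),
]
pcs = detect_plagiarism(svs, threshold_classifier)
```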
Drawings
FIG. 1 is a flowchart of a test program plagiarism detection method based on a support vector machine in an embodiment of the present invention.
FIG. 2 is a flow chart of program cutting and analysis.
FIG. 3 is a flow chart of test program code similarity calculation.
FIG. 4 is a flow chart of support vector machine model construction.
FIG. 5 is a flow chart of test program plagiarism detection.
Detailed Description
In order to better understand the technical content of the invention, specific examples are illustrated below in conjunction with the accompanying drawings.
FIG. 1 is an overall framework diagram of a test program plagiarism detection method based on a support vector machine according to an embodiment of the present invention.
A test program plagiarism detection method based on a support vector machine is characterized by comprising the following steps.
Step 1, program cutting and analysis: given the project source program of a developer test competition, namely the program under test, and the test programs submitted by the contestants of the competition, each file in the program under test is statically analyzed to obtain the classes under test and the methods under test. Each method under test is analyzed to obtain its class name, method name and the type sequence of its parameters, which are concatenated and hashed to calculate the corresponding Mid; the Mid and the method under test form a mapping, and these mappings form the mapping set of methods under test. Similarly, the test files in the test program submitted by each contestant are located, and their test classes and test methods are obtained by static analysis. Each test method is analyzed to obtain the class name, method name and parameter type sequence of the method under test, which are concatenated and hashed to calculate the Mid that represents the method under test targeted by the test segment; the contestant, the Mid and the test method form a mapping, and these mappings form the test method mapping set.
Step 2, test program code similarity calculation: given the mapping set of methods under test and the test method mapping set, the former is traversed to obtain the Mid set Mids. The contestants are traversed pairwise, and all mappings of each pair are traversed to obtain the Mid1 set Mid1s and the Mid2 set Mid2s. For each Mid that is common to Mid1s and Mid2s and also present in Mids, the similarity value of the test methods mapped to it in Mid1s and Mid2s is calculated to obtain a contestant method similarity value mapping; all contestant method similarity value mappings form the similarity value set.
Step 3, support vector machine model construction: given the similarity value set, N pairs of contestants are taken and their contestant method similarity value mappings obtained, and whether each of the N pairs plagiarized is judged manually. A suitable kernel function is selected, a reference point is calculated, a support vector machine model is built, and the model is optimized using the maximum likelihood estimation function.
Step 4, test program plagiarism detection: given the support vector machine model and the similarity value set, the contestant method similarity value mappings in the set are traversed and input into the support vector machine model to obtain the output, namely whether the contestants plagiarized.
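The description leaves the similarity measure for two test methods unspecified. One possible choice, assumed here purely for illustration, is a token-level sequence ratio computed with Python's standard `difflib` module; the tokenizer, the function names and the example test methods are all hypothetical.

```python
import difflib
import re

def tokenize(code):
    """Split code into identifiers, numbers and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def similarity_value(tm1, tm2):
    """SV in [0, 1]: ratio of matching tokens between two test methods."""
    matcher = difflib.SequenceMatcher(None, tokenize(tm1), tokenize(tm2), autojunk=False)
    return matcher.ratio()

# Hypothetical test method bodies.
original = "assertEquals(3, calc.add(1, 2));"
respaced = "assertEquals( 3 , calc.add(1,2) );"
unrelated = "assertTrue(list.isEmpty());"

sv_same = similarity_value(original, respaced)      # whitespace changes only
sv_diff = similarity_value(original, unrelated)     # a different test
```

Comparing tokens rather than raw characters makes the value insensitive to whitespace changes, one of the small edits a contestant might make. It does not by itself handle renamed identifiers, which would need an extra normalisation step.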
FIG. 2 is a flow chart of program cutting and analysis. The specific steps are as follows:
Step 1: initial state;
Step 2: input the set PUT of programs under test and the set TP of test programs submitted by the contestants Cid of the developer test competition;
Step 3: initialize the set MUTS of methods under test to empty and the set TMS of test methods to empty;
Step 4: judge whether the number of files to be analyzed in PUT is greater than 0; if so, execute step 5, otherwise execute step 9;
Step 5: take a file under test FUT from PUT and analyze its class under test CUT and method under test MUT to obtain the class name CUTN, the method name MUTN and the type sequence ATL of the parameters of the method;
Step 6: concatenate CUTN, MUTN and ATL to obtain the string Mstr of the method under test, and calculate the Mid corresponding to Mstr with a hash function;
Step 7: add <Mid, MUT> to the set MUTS;
Step 8: judge whether all FUT, CUT and MUT in PUT have been traversed; if so, execute step 9, otherwise execute step 5;
Step 9: judge whether the number of test program files to be analyzed in TP is greater than 0; if so, execute step 10, otherwise execute step 14;
Step 10: take a test file TF from TP and analyze its test class TC and test method TM to obtain the test class name TCN, the test method name TMN and the type sequence ATL of the parameters of the method;
Step 11: concatenate TCN, TMN and ATL to obtain the string Mstr, and calculate the Mid corresponding to Mstr with a hash function;
Step 12: add <Cid, <Mid, TM>> to the set TMS;
Step 13: judge whether all TF, TC and TM in TP have been traversed; if so, execute step 14, otherwise execute step 10;
Step 14: output the sets MUTS and TMS;
Step 15: end state.
FIG. 3 is a flow chart of test program code similarity calculation. The specific steps are as follows:
Step 1: initial state;
Step 2: input MUTS and TMS;
Step 3: initialize the similarity value set SVS to empty and the Mid set Mids to empty;
Step 4: judge whether the number of <Mid, MUT> to be analyzed in MUTS is greater than 0; if so, execute step 5, otherwise execute step 21;
Step 5: take a <Mid, MUT> from MUTS and add Mid to the set Mids;
Step 6: judge whether the traversal of MUTS is finished; if so, execute step 7, otherwise execute step 5;
Step 7: judge whether the number of <Cid, <Mid, TM>> to be analyzed in TMS is greater than 0; if so, execute step 8, otherwise execute step 21;
Step 8: take two contestants Cid1 and Cid2 from TMS together with all of their mappings <Mid1, TM1> and <Mid2, TM2>;
Step 9: initialize the Mid1 set Mid1s to empty and the Mid2 set Mid2s to empty;
Step 10: judge whether the number of <Mid1, TM1> to be analyzed is greater than 0; if so, execute step 11, otherwise execute step 13;
Step 11: take a <Mid1, TM1> and add Mid1 to the set Mid1s;
Step 12: judge whether all <Mid1, TM1> have been traversed; if so, execute step 13, otherwise execute step 11;
Step 13: judge whether the number of <Mid2, TM2> to be analyzed is greater than 0; if so, execute step 14, otherwise execute step 16;
Step 14: take a <Mid2, TM2> and add Mid2 to the set Mid2s;
Step 15: judge whether all <Mid2, TM2> have been traversed; if so, execute step 16, otherwise execute step 14;
Step 16: take a Mid from Mids; if Mid1s contains Mid, execute step 17, otherwise execute step 15;
Step 17: if Mid2s contains Mid, execute step 18, otherwise execute step 15;
Step 18: take the TM1 and TM2 mapped to this Mid in Mid1s and Mid2s, calculate their similarity value SV to obtain <Cid1, Cid2, <Mid, SV>>, and add it to the set SVS;
Step 19: judge whether Mids has been traversed; if so, execute step 20, otherwise execute step 16;
Step 20: judge whether the pairwise comparison of the contestants in TMS is finished; if so, execute step 21, otherwise execute step 8;
Step 21: end state.
FIG. 4 is a flow chart of support vector machine model construction. The specific steps are as follows:
Step 1: initial state;
Step 2: input the SVS;
Step 3: take N pairs of contestants from SVS, obtain their contestant method similarity value mappings <Cid1, Cid2, <Mid, SV>>, and manually judge whether each of the N pairs plagiarized, P;
Step 4: select a suitable kernel function kf and calculate a reference point;
Step 5: build a support vector machine model SVM using kf and the reference point;
Step 6: optimize the SVM using the maximum likelihood estimation function argmin Loss to obtain the optimized SVM;
Step 7: end state.
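The source does not define the "reference point" that is calculated together with the kernel function kf. One possible reading, which is an assumption and not confirmed by the patent, is that reference points act as landmarks for a Gaussian kernel: each input is re-expressed by its kernel similarity to every landmark, and a linear model is then fitted on those features. The function names, the landmarks and the value of `sigma` below are hypothetical.

```python
import math

def gaussian_kernel(x, landmark, sigma=0.2):
    """kf(x, l) = exp(-||x - l||^2 / (2 * sigma^2)); equals 1.0 when x == l."""
    squared_distance = sum((xi - li) ** 2 for xi, li in zip(x, landmark))
    return math.exp(-squared_distance / (2 * sigma ** 2))

def kernel_features(x, landmarks, sigma=0.2):
    """Map x to its kernel similarity against each reference point."""
    return [gaussian_kernel(x, l, sigma) for l in landmarks]

# Hypothetical reference points: one typical "plagiarized" similarity
# value and one typical "independent" similarity value.
landmarks = [[0.9], [0.1]]
features = kernel_features([0.9], landmarks)
```

Under this reading, an input close to the first landmark yields a feature vector near (1, 0), so even a linear boundary in the feature space corresponds to a nonlinear boundary on the original similarity values.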
FIG. 5 is a flow chart of test program plagiarism detection. The specific steps are as follows:
Step 1: initial state;
Step 2: input the SVM and the SVS;
Step 3: initialize the plagiarizing contestant set PCS to empty;
Step 4: judge whether the number of similarity value mappings in SVS is greater than 0; if so, execute step 5, otherwise execute step 8;
Step 5: take a similarity value mapping from SVS and input it into the SVM;
Step 6: obtain the output of the SVM, namely whether the contestants plagiarized, P; if P is true, add the contestant Cid to the set PCS;
Step 7: judge whether SVS has been traversed; if so, execute step 8, otherwise execute step 5;
Step 8: end state.
In conclusion, the invention fills a gap in similarity detection for test program code and improves the accuracy and precision of detecting test code plagiarism, thereby helping developer test competitions detect contestants' code plagiarism automatically, which removes the manual inspection step, saves labor and time costs, and makes the competitions fairer.

Claims (5)

1. A test program plagiarism detection method based on a support vector machine, characterized in that static analysis is performed on a program under test and on test programs to extract a mapping set of all methods under test and a mapping set of all test methods; based on the mapping set of methods under test and the mapping set of test methods, similarity calculation is performed on the test program code, the set formed by the calculated similarity value mappings is taken as input, and a support vector machine model is constructed and optimized; finally, the support vector machine judges whether contestants have plagiarized, so that the accuracy and precision of detecting test program code plagiarism are improved, developer test competitions are helped to detect contestants' code plagiarism automatically, the manual inspection step is removed, labor and time costs are saved, and the competition is made fairer; the method comprises the following steps:
1) program cutting and analysis; given the project source program of a developer test competition, namely the program under test PUT, and the test program TP submitted by a contestant Cid of the competition: each file under test FUT in the program under test is statically analyzed to obtain the class under test CUT and the method under test MUT in the FUT; each MUT is analyzed to obtain the class name CUTN, the method name MUTN and the type sequence ATL of the parameters of the method under test; CUTN, MUTN and ATL are concatenated to obtain the string Mstr of the method under test, and the Mid corresponding to Mstr is calculated with a hash function; each Mid and its MUT form a mapping, and all mappings of Mid and MUT form the mapping set MUTS of methods under test; similarly, the test files TF in the TP submitted by contestant Cid are located, and the test classes TC and test methods TM in each TF are obtained by static analysis; each TM is then analyzed to obtain the class name TCN, the method name TMN and the parameter type sequence ATL, which are concatenated and hashed to calculate the Mid that represents the method under test targeted by the test segment; each Cid, Mid and TM form a mapping, and all such mappings form the test method mapping set TMS;
2) test program code similarity calculation; given the mapping set MUTS of methods under test and the test method mapping set TMS obtained in step 1): first, MUTS is traversed to obtain the Mid set Mids; then the contestants in TMS are traversed pairwise, taking contestants Cid1 and Cid2 with all of their mappings <Mid1, TM1> and <Mid2, TM2>, which are traversed to obtain the Mid1 set Mid1s and the Mid2 set Mid2s; finally, for each Mid that is common to Mid1s and Mid2s and also present in Mids, the similarity value SV of the test methods TM1 and TM2 mapped to it is calculated to obtain the contestant method similarity value mapping <Cid1, Cid2, <Mid, SV>>, and all contestant method similarity value mappings form the set SVS;
3) support vector machine model construction; given the similarity value set SVS obtained in step 2): first, N pairs of contestants are selected from SVS and their contestant method similarity value mappings <Cid1, Cid2, <Mid, SV>> are obtained; then a suitable kernel function kf is selected, a reference point is calculated, and a support vector machine model is built using kf and the reference point;
4) test program plagiarism detection; given the SVM model obtained in step 3) and the similarity value set SVS obtained in step 2): first, the contestant method similarity value mappings in the similarity value set are traversed and input into the SVM; then the output of the SVM is obtained, namely whether the contestants plagiarized, P.
2. The method for detecting plagiarism of a test program based on a support vector machine according to claim 1, wherein in step 1), the program is cut and analyzed; the method comprises the steps that a project source program, namely a program to be tested, in a test competition of a developer and a test program submitted by a player participating in the test competition of the developer are given, and a file in each program to be tested is subjected to static analysis to obtain a class to be tested and a method to be tested; analyzing each test method to obtain the class name, the method name and the type sequence of each parameter of the method to be tested, splicing, further calculating the corresponding Mid by using a hash function, and forming mapping corresponding to the method to be tested to form a mapping set of the method to be tested; similarly, by finding the test file in the test program submitted by the player, the test class and the test method are analyzed by the static analysis technology; analyzing each test method to obtain the class name, method name and type sequence of each parameter of the method to be tested, splicing and calculating Mid which can represent the method to be tested by the test segment by using a hash function, and forming mapping corresponding to players and the test method to form a test method mapping set.
3. The method for detecting plagiarism of a test program based on a support vector machine according to claim 1, wherein in step 2), the similarity of the test program code is calculated as follows: given the mapping set of the methods to be tested and the test method mapping set, the mapping set of the methods to be tested is traversed to obtain the Mid set Mids; the players are traversed pairwise, and for each pair all of their mappings are taken and traversed to obtain the Mid sets Mid1s and Mid2s; the Mids that are common to Mid1s and Mid2s and also present in Mids are found, and the similarity values of the test methods mapped to those Mids in Mid1s and Mid2s are calculated to obtain the player-method similarity value mapping; all player-method similarity value mappings together form the similarity value set.
4. The method for detecting plagiarism of a test program based on a support vector machine according to claim 1, wherein in step 3), the support vector machine model is constructed as follows: given the similarity value set, N pairs of players are taken and their player-method similarity value mappings are obtained, and whether each of the N pairs plagiarized is judged manually; a suitable kernel function is selected, a reference point is calculated, the support vector machine model is established, and the model is optimized using a maximum likelihood estimation function.
5. The method for detecting plagiarism of a test program based on a support vector machine according to claim 1, wherein in step 4), test program plagiarism is detected as follows: given the support vector machine model and the similarity value set, the player-method similarity value mappings in the set are traversed and input into the support vector machine model, and the output result is obtained, namely whether the players plagiarized.
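The four steps recited in claims 1 to 5 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the implementation described by the applicant: the claims name only "a hash function", "a suitable kernel function" and an unspecified similarity measure, so the sketch assumes SHA-1 for the Mid, the ratio from Python's difflib.SequenceMatcher as the similarity value SV, a linear kernel, and a small hinge-loss subgradient-descent trainer in place of a full SVM solver. The function names, the fixed-length feature encoding (one similarity per Mid, 0.0 where a pair shares no test), the sample contestants and the manual labels are all hypothetical. The maximum likelihood optimization of claim 4 is omitted.

```python
import hashlib
from difflib import SequenceMatcher
from itertools import combinations


def mid(class_name, method_name, param_types):
    """Step 1: hash the spliced class name, method name, and parameter types."""
    key = f"{class_name}#{method_name}({','.join(param_types)})"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]  # SHA-1 is an assumption


def similarity_set(mids, tests):
    """Step 2: build {(cid1, cid2): {mid: sv}} for every pair of contestants.

    `tests` maps a contestant id to {mid: source text of the test method}.
    """
    svs = {}
    for c1, c2 in combinations(sorted(tests), 2):
        shared = set(tests[c1]) & set(tests[c2]) & set(mids)
        svs[(c1, c2)] = {
            m: SequenceMatcher(None, tests[c1][m], tests[c2][m]).ratio()
            for m in shared
        }
    return svs


def features(mids, sv_map):
    """Assumed encoding: one similarity per Mid, 0.0 where the pair shares no test."""
    return [sv_map.get(m, 0.0) for m in mids]


def score(model, x):
    """Signed distance-like score of x under a linear model (w, b)."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=500):
    """Step 3: hinge-loss subgradient descent; labels are +1 (plagiarism) or -1."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            violated = y * score((w, b), x) < 1
            w = [wi * (1 - lr * lam) for wi in w]  # L2 regularization shrink
            if violated:  # inside the margin: move the hyperplane toward y
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def plagiarized(model, x):
    """Step 4: report plagiarism when the pair falls on the positive side."""
    return score(model, x) >= 0


# Hypothetical data: two methods under test and three contestants.
push = mid("Stack", "push", ["int"])
pop = mid("Stack", "pop", [])
mids = [push, pop]
tests = {
    "alice": {push: "s.push(1); assertEquals(1, s.size());",
              pop: "s.push(2); assertEquals(2, s.pop());"},
    "bob": {push: "s.push(1); assertEquals(1, s.size());",
            pop: "s.push(2); assertEquals(2, s.pop());"},
    "carol": {push: "for (int i = 0; i < 9; i++) s.push(i); assertFalse(s.isEmpty());"},
}
svs = similarity_set(mids, tests)

# Manual judgments for the N selected pairs (here N = 3, all pairs).
labels = {("alice", "bob"): 1, ("alice", "carol"): -1, ("bob", "carol"): -1}
pairs = sorted(svs)
model = train_linear_svm([features(mids, svs[p]) for p in pairs],
                         [labels[p] for p in pairs])
verdicts = {p: plagiarized(model, features(mids, svs[p])) for p in pairs}
```

In practice the model would be trained on the N manually judged pairs and then applied to the remaining pairs in the SVS; the sketch reuses the training pairs only to keep the example short.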
CN201910055791.0A 2019-01-18 2019-01-18 Test program plagiarism detection method based on support vector machine Pending CN111459788A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910055791.0A CN111459788A (en) 2019-01-18 2019-01-18 Test program plagiarism detection method based on support vector machine

Publications (1)

Publication Number Publication Date
CN111459788A (en) 2020-07-28

Family

ID=71684942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910055791.0A Pending CN111459788A (en) 2019-01-18 2019-01-18 Test program plagiarism detection method based on support vector machine

Country Status (1)

Country Link
CN (1) CN111459788A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598340A (en) * 2021-03-04 2021-04-02 成都飞机工业(集团)有限责任公司 Data model comparison method based on uncertainty support vector machine

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101398758A (en) * 2008-10-30 2009-04-01 北京航空航天大学 Detection method of code copy
CN101976318A (en) * 2010-11-15 2011-02-16 北京理工大学 Detection method of code similarity based on digital fingerprints
US20130339930A1 (en) * 2012-06-18 2013-12-19 South Dakota Board Of Regents Model-based test code generation for software testing
CN104335219A (en) * 2012-03-30 2015-02-04 爱迪德加拿大公司 Securing accessible systems using variable dependent coding
CN105868108A (en) * 2016-03-28 2016-08-17 中国科学院信息工程研究所 Instruction-set-irrelevant binary code similarity detection method based on neural network
CN106960003A (en) * 2017-02-15 2017-07-18 黑龙江工程学院 Plagiarize the query generation method of the retrieval of the source based on machine learning in detection
CN108446540A (en) * 2018-03-19 2018-08-24 中山大学 Program code based on source code multi-tag figure neural network plagiarizes type detection method and system
CN109241706A (en) * 2018-01-16 2019-01-18 西安邮电大学 Software plagiarism detection method based on static birthmark

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
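殷丹平: "Research on CNN-Based Code Similarity Detection and a Code Duplicate-Checking System" *
王卉: "A Method for Detecting the Similarity of C Program Code" *
王曙燕 et al.: "A Static Software Birthmark Extraction Algorithm Based on Multiple Features" *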

Similar Documents

Publication Publication Date Title
CN101866317B (en) Regression test case selection method based on cluster analysis
CN103810200B (en) The database search method of opened protein matter qualification and system thereof
CN102541958A (en) Method, device and computer equipment for identifying short text category information
CN103294594A (en) Test based static analysis misinformation eliminating method
EP3330817A2 (en) Model processing method and apparatus, and machine-readable medium
CN111522942B (en) Training method and device for text classification model, storage medium and computer equipment
CN110059003B (en) Automatic test method, device, electronic equipment and readable storage medium
CN109684190A (en) Software testing device and method
CN112488769B (en) Advertisement putting test method, device, equipment and storage medium
CN109542783B (en) Extended finite-state machine test data generation method based on variable segmentation
CN111459788A (en) Test program plagiarism detection method based on support vector machine
CN111092769A (en) Web fingerprint identification method based on machine learning
CN116028702A (en) Learning resource recommendation method and system and electronic equipment
CN109582575A (en) Game test method and device
CN116341428B (en) Method for constructing reference model, chip verification method and system
CN110543331B (en) Test program plagiarism detection method based on test code segment similarity
CN110472054B (en) Data processing method and device
CN111859539A (en) Finite element automatic attribute and material parameter assigning method based on Tcl or Tk secondary development
CN105373473B (en) CDR accuracys method of testing and test system based on original signaling decoding
CN115809622A (en) Chip simulation acceleration system with automatic optimization configuration function
CN111459787A (en) Test plagiarism detection method based on machine learning
JP5075695B2 (en) Property description coverage measuring apparatus and program
CN112328951B (en) Processing method of experimental data of analysis sample
CN109684615B (en) Pedestrian collision test report generation method and device
CN112069050B (en) Intelligent contract testing method based on multi-objective optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200728