CN113900654A - Code plagiarism detection method and system based on program language teaching practice platform - Google Patents

Info

Publication number
CN113900654A
Authority
CN
China
Prior art keywords
codes
code
plagiarism
similarity
homework
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111043203.5A
Other languages
Chinese (zh)
Inventor
李兆鹏
顾建平
王柏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Zhongke Guochuanggao Trusted Software Co ltd
Original Assignee
Anhui Zhongke Guochuanggao Trusted Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Zhongke Guochuanggao Trusted Software Co ltd
Priority to CN202111043203.5A
Publication of CN113900654A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G06F8/436 Semantic checking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G06Q50/2057 Career enhancement or continuing education service

Abstract

The invention discloses a code plagiarism detection method and system based on a programming language teaching practice platform. The method comprises: acquiring two homework codes, and performing matching comparison based on the content of the homework codes to determine the similarity of the two homework codes; and processing the similarity to obtain a final code plagiarism detection result for the two homework codes, wherein processing the similarity comprises applying a first parameter to the similarity data, the first parameter being generated based on the editing operation characteristics of students when editing the homework codes. By combining the specific use scenario of language teaching practice with the editing operation characteristics of students when editing the homework codes, the method and system further adjust the code text similarity, so that the homework code plagiarism result is more accurate in the teaching scenario.

Description

Code plagiarism detection method and system based on program language teaching practice platform
Technical Field
The invention relates to the technical field of programming language teaching, in particular to a code plagiarism detection method and system based on a programming language teaching practice platform.
Background
With the development of science and technology, intelligent teaching is increasingly applied in colleges and universities. In an actual teaching scenario, students far outnumber teachers, so homework correction takes up a large share of a teacher's total teaching time. Although existing teaching practice platforms provide a homework correction function, teachers still often spend considerable time checking students' homework code for plagiarism. After students submit homework online, the teacher needs to review each submission, and copying homework code is generally prohibited in teaching. The method therefore compares the submitted homework for copied code before correction to obtain likely plagiarism samples, which reduces the teacher's checking time and lightens the teacher's burden.
Most existing inventions analyze the semantics and features of the code, or of the processed code, and detect the renaming of variables and functions, refactoring, and modification of code formatting. They can be roughly divided into two stages: converting the code format and determining code similarity. First, irrelevant items are removed from the code and the code is packaged, using some algorithm, into code to be compared; second, the similarity of the code to be compared is computed to obtain the final code similarity comparison result.
Existing homework code plagiarism detection schemes merely perform a similarity comparison on the homework code and are not combined with a specific use scenario, so the detection result is detached from reality, lacks practical significance, and has no continuity. In other words, existing schemes analyze only code similarity and do not take into account the students' actual behavior, such as daily performance and homework submission time; the resulting data is formulaic and of little reference value to teachers.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a code plagiarism detection method and system based on a programming language teaching practice platform, which not only completes the examination of the similarity of the student homework codes, but also integrates the homework performance and long-term performance of students, and improves the reliability of the final plagiarism result.
In order to achieve the above purpose, the technical solution of the embodiment of the present application is implemented as follows:
In one aspect, an embodiment of the application provides a code plagiarism detection method based on a programming language teaching practice platform, comprising:
acquiring two homework codes, and performing matching comparison based on the content of the homework codes to determine the similarity of the two homework codes;
and processing the similarity to obtain a final code plagiarism detection result for the two homework codes, wherein processing the similarity comprises applying a first parameter to the similarity data, and the first parameter is generated based on the editing operation characteristics of students when editing the homework codes.
In an alternative embodiment, the first parameter is further generated based on the historical plagiarism behavior characteristics of the student and the teacher's plagiarism likelihood score for the student.
In an alternative embodiment, the obtaining of the first parameter includes:
calculating a first plagiarism credibility parameter of a student editing homework process based on the editing operation characteristics of the student when editing homework codes;
calculating a second plagiarism credibility parameter of the student based on the historical plagiarism behavior characteristics of the student;
calculating a third plagiarism credibility parameter of the student based on the plagiarism possibility score of the student by the teacher;
and obtaining a first parameter based on the fusion of the first plagiarism credibility parameter, the second plagiarism credibility parameter and the third plagiarism credibility parameter.
In an alternative embodiment, the editing operation features of the student when editing the homework code include:
keyboard input operations, copy and paste operations, code static analysis operations, homework debugging operations, homework running operations, homework saving operations and homework submitting operations;
the first plagiarism credibility parameter $a_1$ is:
[Formula image not reproduced in this text: Figure BDA0003250231780000021]
where $p_1$ is the number of times the student copied and pasted code, $p_2$ is the static analysis result of the homework code produced by the programming language teaching practice platform, $p_3$ indicates whether an online debugging operation was performed, $p_4$ is the result of the online debugging operation, and $p_5$ is the homework submission time.
In an optional embodiment, processing the similarity to obtain the final code plagiarism detection result $X(M, N)$ of the two homework codes includes:
$X(M, N) = Y(M, N)\,S(A_M, A_N)$, where $M$ is one homework code, $N$ is the other homework code, $Y(M, N)$ is the similarity of homework codes $M$ and $N$, and $A_M$ and $A_N$ are the first parameters of homework codes $M$ and $N$, respectively.
In an alternative embodiment, acquiring the two homework codes includes:
running the homework codes submitted by all students against the test cases corresponding to the homework;
acquiring the test results, packaging homework with the same test result in the same code file, separating the homework by mutually independent namespaces, and adding student information as the namespace name;
acquiring the homework codes submitted by two students from the same code file.
In an alternative embodiment, performing matching comparison based on the content of the homework codes and determining the similarity of the two homework codes includes:
packaging the two homework codes in two mutually independent workspaces;
performing format conversion and irrelevant data cleaning on the homework codes;
compiling words and statements in the homework codes that differ in text but have the same semantics into the same text to obtain intermediate codes;
and comparing the two intermediate codes using an LCS function to obtain the similarity of the two homework codes.
In another aspect, an embodiment of the present application provides a code plagiarism detection system based on a programming language teaching practice platform, including:
a homework code text similarity analysis module, used for acquiring two homework codes, performing matching comparison based on the content of the homework codes and determining the similarity of the two homework codes;
and a homework code plagiarism detection module, used for processing the similarity to obtain a final code plagiarism detection result for the two homework codes, wherein processing the similarity comprises applying a first parameter to the similarity data, and the first parameter is generated based on the editing operation characteristics of students when editing the homework codes.
In another aspect, an embodiment of the present application further provides an electronic device, where the electronic device includes:
a processor;
a memory for storing processor-executable instructions;
the processor executes the executable instructions to realize the code plagiarism detection method based on the programming language teaching practice platform.
In yet another aspect, embodiments of the present application further provide a computer-readable storage medium, on which computer instructions are stored, and when executed by a processor, the computer instructions implement the steps of the code plagiarism detection method based on a program language teaching practice platform.
The code plagiarism detection method and the system based on the program language teaching practice platform have the following beneficial effects:
1. The invention is designed for the scenario of homework submission in language teaching practice courses. The course homework submitted by students is first screened uniformly by a program, and plagiarism comparison detection is completed by a homework code plagiarism detection algorithm. This saves manual review time when teachers correct homework, improves homework correction efficiency, increases the detection accuracy of homework code plagiarism, discourages bad learning habits, and improves teaching quality.
2. Unlike the prior art, which determines whether plagiarism exists only through similarity analysis of the code text, the method, after calculating the code text similarity, further adjusts it by combining the specific teaching practice scenario with the editing operation characteristics of students when editing the homework code, such as copy and paste operations and homework submission time, so that the homework code plagiarism result is more accurate in the teaching scenario.
3. In further adjusting the code text similarity, the method also adds the teacher's judgment of the likelihood of plagiarism by the student and the student's historical plagiarism behavior characteristics, and obtains the final plagiarism result after considering these together, so that the homework code plagiarism result is more accurate in the teaching scenario.
Drawings
FIG. 1 is a flowchart of a code plagiarism detection method based on a programming language teaching practice platform in an embodiment of the present application;
FIG. 2 is a flowchart of a first parameter obtaining method in an embodiment of the present application;
FIG. 3 is a structural diagram of a code plagiarism detection system based on a programming language teaching practice platform in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that references in the specification of the present application to the terms "comprises" and "comprising," and variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
(1) Code similarity: a measure of how similar two pieces of code are, expressed as a percentage.
(2) Code formatting: arranging code into a consistent layout according to a given set of rules.
(3) Code amount: the quantity of code, measured in lines.
(4) Invalid code: code in a program that is never executed.
(5) Redundant code: code that can be removed by optimizing the program while achieving the same purpose, with reduced code amount and improved execution efficiency.
(6) Intermediate code: a syntax-oriented internal representation that is equivalent to the source program and easily translated into a target program.
(7) Online IDE: an integrated development environment for online programming provided by the invention.
(8) Static analysis: a method of analyzing a program without running it.
(10) Machine learning: an interdisciplinary field covering probability theory, statistics, approximation theory and the theory of complex algorithms; it uses computers as tools to simulate the way humans learn, and organizes existing content into knowledge structures to improve learning efficiency.
The present application will be described in further detail with reference to the following drawings and specific embodiments.
In an actual teaching scenario, students far outnumber teachers, so homework correction takes up a large share of a teacher's total teaching time. Although existing teaching practice platforms provide a homework correction function, teachers still often spend considerable time checking students' homework code for plagiarism. The embodiments of the application provide a code plagiarism detection method, system, electronic device and computer-readable storage medium based on a programming language teaching practice platform. Similarity comparison is performed on the homework codes submitted by students to obtain a code similarity comparison result, and this result is then adjusted according to the characteristics of the students and the homework, taking into account factors such as the student's historical plagiarism record, the editing behavior for the current homework and the submission time of the current homework. The teacher's subjective evaluation of the student is also added, and the final plagiarism result is obtained after considering all of these together, so that the homework code plagiarism result is more accurate in the teaching scenario.
The code plagiarism detection method based on the program language teaching practice platform provided by the embodiment of the application comprises the following steps:
Step S1: acquiring two homework codes, performing matching comparison based on the content of the homework codes, and determining the similarity of the two homework codes.
In the embodiment of the application, the matching comparison is performed on the content of the homework codes; words and statements in the text of the two homework codes can be judged for similarity or identity.
Step S2: processing the similarity to obtain a final code plagiarism detection result for the two homework codes, wherein processing the similarity comprises applying a first parameter to the similarity data, and the first parameter is generated based on the editing operation characteristics of students when editing the homework codes.
Unlike the prior art, which determines whether plagiarism exists only through similarity analysis of the code text, the embodiment of the application, after calculating the code text similarity, further adjusts it by combining the specific use scenario with the editing operation characteristics of students when editing the homework code, such as copy and paste operations and homework submission time, so that the homework code plagiarism result is more accurate in the teaching scenario. It can be understood that the first parameter may be a parameter representing the plagiarism credibility of the homework code submitted by a student. The first parameter is applied to the similarity data to obtain the final code plagiarism detection result for the two homework codes, and when the plagiarism detection result exceeds a preset threshold, the two homework codes are judged to involve plagiarism.
While a student works on homework, the editing behavior in the online IDE can be recorded, including keyboard input operations, copy and paste operations, program analysis, code debugging, running and other related operations, together with the time of each user operation. The following points serve as examples:
the number of copy and paste operations is positively correlated with the likelihood of plagiarism;
the number of keyboard inputs and the number of modifications are negatively correlated with the likelihood of plagiarism;
the closer the first submission time is to the deadline, the higher the likelihood of plagiarism; the later the last modification time, the lower the likelihood of plagiarism.
Therefore, in the embodiment of the application, the first parameter incorporates the editing operation characteristics of the student when editing the homework code.
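As an illustration of how the recorded editing behavior could be turned into the features listed above, the following is a minimal sketch. The event labels (`"key"`, `"paste"`, `"modify"`, `"submit"`), the `EditEvent` and `EditFeatures` types and the function name are hypothetical and are not taken from the application; the application does not specify how the online IDE stores its log.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class EditEvent:
    kind: str      # hypothetical labels: "key", "paste", "modify", "submit"
    at: datetime   # time at which the operation happened


@dataclass(frozen=True)
class EditFeatures:
    keystrokes: int
    pastes: int
    modifications: int
    # Seconds between the first submission and the deadline (None if never submitted).
    first_submit_to_deadline_s: Optional[float]
    last_modification: Optional[datetime]


def extract_features(events: Iterable[EditEvent], deadline: datetime) -> EditFeatures:
    """Summarise one student's editing log for one homework assignment."""
    ordered = sorted(events, key=lambda e: e.at)
    submits = [e.at for e in ordered if e.kind == "submit"]
    modifies = [e.at for e in ordered if e.kind == "modify"]
    return EditFeatures(
        keystrokes=sum(e.kind == "key" for e in ordered),
        pastes=sum(e.kind == "paste" for e in ordered),
        modifications=len(modifies),
        first_submit_to_deadline_s=(
            (deadline - submits[0]).total_seconds() if submits else None
        ),
        last_modification=modifies[-1] if modifies else None,
    )
```

Under the correlations stated above, a high `pastes` count or a small `first_submit_to_deadline_s` would raise the plagiarism likelihood, while high `keystrokes` and `modifications` counts would lower it; how these are weighted is left open here.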
The first parameter is further generated based on the historical plagiarism behavior characteristics of the student and the teacher's plagiarism likelihood score for the student.
Specifically, in some embodiments, the first parameter in step S2 further takes into account the historical plagiarism behavior characteristics of the student and the teacher's plagiarism likelihood score for the student. Adding the teacher's judgment of the student and combining it with the other factors yields the final plagiarism result, which is more accurate in the teaching scenario. Here, the teacher's judgment of the student is the teacher's assessment of the student's likelihood of plagiarism based on the student's daily performance.
According to the embodiment of the application, the first parameter combines the student's homework editing operation characteristics, the student's historical plagiarism behavior characteristics and the teacher's judgment of the student. In the language teaching practice scenario, plagiarism comparison is performed on the homework codes submitted by students; factors such as the student's editing behavior, the homework execution result, the homework submission time and the teacher's assessment of the student's actual performance are considered together and combined with the code similarity data, and the final plagiarism result is obtained by algorithmic calculation.
In some embodiments, the obtaining of the first parameter in step S2 includes:
calculating a first plagiarism credibility parameter of a student editing homework process based on the editing operation characteristics of the student when editing homework codes;
calculating a second plagiarism credibility parameter of the student based on the historical plagiarism behavior characteristics of the student;
calculating a third plagiarism credibility parameter of the student based on the plagiarism possibility score of the student by the teacher;
and obtaining a first parameter based on the fusion of the first plagiarism credibility parameter, the second plagiarism credibility parameter and the third plagiarism credibility parameter.
For example, for the second plagiarism credibility parameter of the student, the following steps may be taken:
determining the frequency characteristic of plagiarism behavior and the occurrence pattern of plagiarism behavior based on statistics of when the student's historical plagiarism behavior occurred, wherein the occurrence pattern comprises the probability of plagiarism determined from the statistics of plagiarism in historical homework codes, and the weight of each historical plagiarism behavior; the weight is determined based on the time interval since the previous plagiarism behavior, and in one implementation, the longer the interval since the previous plagiarism behavior, the smaller the weight of the current plagiarism behavior;
and predicting the probability of plagiarism for the current homework code from the historical frequency characteristic and the occurrence pattern of historical plagiarism behavior, and taking the predicted probability as the second plagiarism credibility parameter, wherein the probability prediction may be performed using a neural network model.
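The idea that older plagiarism should count for less can be sketched without a neural network. The example below swaps in a simple exponentially decayed rate as a stand-in for the predictor; it weights each past assignment by how long ago it occurred, which is a simplification of the interval-based weight described above. The function name, the `(days_ago, was_plagiarised)` record format and the 90-day half-life are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def history_score(
    history: Sequence[Tuple[float, bool]],
    half_life_days: float = 90.0,  # assumed value, not from the application
) -> float:
    """Return a decay-weighted plagiarism rate in [0, 1].

    Each history entry is (days_ago, was_plagiarised) for one past assignment.
    An assignment's weight halves every `half_life_days`, so recent behaviour
    dominates the estimate.
    """
    if not history:
        return 0.0
    weighted_hits = 0.0
    total_weight = 0.0
    for days_ago, plagiarised in history:
        weight = 0.5 ** (days_ago / half_life_days)
        weighted_hits += weight * (1.0 if plagiarised else 0.0)
        total_weight += weight
    return weighted_hits / total_weight
```

With this sketch, a student whose only recorded plagiarism is recent receives a higher score than a student whose only recorded plagiarism is old, all else being equal.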
In the embodiment of the application, the second plagiarism credibility parameter of the student is dynamically changed by combining with the historical homework performance of the student, so that the credibility of the final plagiarism result is improved.
Similarly, in the embodiment of the present application, the third plagiarism credibility parameter of the student may also change dynamically based on the student's historical course performance. For example, the teacher evaluates the student comprehensively according to the student's historical classroom performance, including attendance, classroom participation and homework grades, and the teacher's assessment of the student's likelihood of plagiarism is taken as the third plagiarism credibility parameter.
Of course, the second plagiarism credibility parameter and the third plagiarism credibility parameter may be updated once per homework code, or once per period of time, such as every several lessons.
The first parameter $a$, obtained by fusing the first plagiarism credibility parameter $a_1$, the second plagiarism credibility parameter $a_2$ and the third plagiarism credibility parameter $a_3$, may be: $a = \sum (a_1 H_1,\ a_2 H_2,\ a_3 H_3)$
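The fusion formula can be read as a weighted sum. The sketch below assumes that $H_1$, $H_2$ and $H_3$ are scalar weights, which the text does not state explicitly; the default weight values are placeholders chosen only so that the example runs.

```python
from typing import Tuple


def fuse_credibility(
    a1: float,
    a2: float,
    a3: float,
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # assumed H1, H2, H3
) -> float:
    """Fuse the three plagiarism credibility parameters into the first parameter a.

    Implements a = a1*H1 + a2*H2 + a3*H3 under the assumption that the H values
    are plain numeric weights.
    """
    h1, h2, h3 = weights
    return a1 * h1 + a2 * h2 + a3 * h3
```

If the weights sum to 1 and each input lies in [0, 1], the fused value also stays in [0, 1], which keeps it comparable across students.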
In some embodiments, the editing operation features of the student when editing the homework code include:
keyboard input operations, copy and paste operations, code static analysis operations, homework debugging operations, homework running operations, homework saving operations and homework submitting operations;
the first plagiarism credibility parameter $a_1$ is:
[Formula image not reproduced in this text: Figure BDA0003250231780000071]
where $p_1$ is the number of times the student copied and pasted code, $p_2$ is the static analysis result of the homework code produced by the programming language teaching practice platform, $p_3$ indicates whether an online debugging operation was performed, $p_4$ is the result of the online debugging operation, and $p_5$ is the homework submission time.
In the embodiment of the application, the first plagiarism credibility parameter of the student is judged comprehensively from the editing operation characteristics of the student when editing the homework code. Alternatively, the first plagiarism credibility parameter may be obtained by the following steps:
acquiring the editing operation characteristics of all students for historical homework codes, including, for each student's homework editing process, the number of keyboard inputs, the number of copy and paste operations, the number of modifications, the first submission time, the last modification time and the like;
taking the editing operation characteristics of each student in each homework code editing process as training samples and whether each student's homework code was plagiarized as the labels of the training samples, and training a neural network to obtain a prediction model relating the editing operation characteristics of the homework code to plagiarism behavior;
and extracting the editing operation characteristics of the student in each homework code editing process and inputting them into the prediction model to obtain the first plagiarism credibility parameter of the student for that homework code.
Processing the similarity to obtain the final code plagiarism detection result $X(M, N)$ of the two homework codes includes:
$X(M, N) = Y(M, N)\,S(A_M, A_N)$, where $M$ is one homework code, $N$ is the other homework code, $Y(M, N)$ is the similarity of homework codes $M$ and $N$, and $A_M$ and $A_N$ are the first parameters of homework codes $M$ and $N$, respectively.
In the embodiment of the application, the final code plagiarism detection result for the two homework codes is obtained by applying the first parameter to the similarity data, and when the plagiarism detection result exceeds a preset threshold, the two homework codes are judged to involve plagiarism.
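The decision step can be sketched as follows. The text does not define the function $S$ that combines the two first parameters, so this example assumes $S$ takes the larger of the two values; the threshold of 0.6 is likewise a placeholder. Both are assumptions for illustration only.

```python
def combine_first_parameters(a_m: float, a_n: float) -> float:
    """Assumed form of S(A_M, A_N): the larger of the two first parameters."""
    return max(a_m, a_n)


def plagiarism_result(similarity: float, a_m: float, a_n: float) -> float:
    """X(M, N) = Y(M, N) * S(A_M, A_N)."""
    return similarity * combine_first_parameters(a_m, a_n)


def is_plagiarism(
    similarity: float,
    a_m: float,
    a_n: float,
    threshold: float = 0.6,  # hypothetical preset threshold
) -> bool:
    """Flag the pair when the adjusted result exceeds the preset threshold."""
    return plagiarism_result(similarity, a_m, a_n) > threshold
```

The effect is that two highly similar submissions are flagged only when at least one student's behavioral evidence is also strong; a pair with similarity 0.9 but low first parameters stays below the threshold.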
In the embodiment of the present application, acquiring the two homework codes includes:
running the homework codes submitted by all students against the test cases corresponding to the homework;
storing homework with the same test result in the same homework set;
acquiring the homework codes submitted by two students from the same homework set.
In the embodiment of the application, the outcome of the test cases is used as one of the reference factors for judging plagiarism: if two compared homework codes have the same pass rate and the same passing and failing test cases, the probability that the two codes are judged to be plagiarized increases.
In the embodiment of the application, the test case outcomes of each homework code are analyzed statistically, and homework codes with the same outcomes are placed in the same homework set; any two homework codes in the same set have a relatively high likelihood of plagiarism. Specifically, homework with the same test result can be packaged in the same code file, separated by mutually independent namespaces with student information added as the namespace name, so that the similarity comparison can be performed later.
In the embodiment of the application, before the similarity of two homework codes is judged, the homework codes are screened and classified according to their test case outcomes. This avoids directly comparing the homework codes of every pair of students one by one, reduces the computation required for similarity comparison, and improves plagiarism detection efficiency.
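The screening step can be sketched as grouping submissions by their test outcome and generating candidate pairs only inside each group. The function names and the representation of a test result as a tuple of pass/fail flags are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

TestOutcome = Tuple[bool, ...]  # one pass/fail flag per test case (assumed format)


def group_by_test_result(results: Dict[str, TestOutcome]) -> Dict[TestOutcome, List[str]]:
    """Place students whose submissions pass and fail the same test cases in one set."""
    groups: Dict[TestOutcome, List[str]] = defaultdict(list)
    for student, outcome in results.items():
        groups[outcome].append(student)
    return dict(groups)


def candidate_pairs(results: Dict[str, TestOutcome]) -> Iterator[Tuple[str, str]]:
    """Yield only the pairs that share a homework set, instead of all pairs."""
    for students in group_by_test_result(results).values():
        yield from combinations(sorted(students), 2)


if __name__ == "__main__":
    results = {
        "alice": (True, True, False),
        "bob": (True, True, False),
        "carol": (True, True, True),
        "dave": (True, True, False),
    }
    print(list(candidate_pairs(results)))
```

In this example four submissions would need six comparisons if every pair were checked, while grouping leaves three, because the one submission with a different test outcome is never compared with the others.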
In some embodiments, performing matching comparison based on the homework code content and determining the similarity between the two homework codes includes:
packaging the two homework codes in two mutually independent workspaces;
performing format conversion on the homework codes and cleaning irrelevant and invalid data;
compiling words and statements that differ in text but have the same semantics into the same text to obtain intermediate codes;
and comparing the two intermediate codes using an LCS function to obtain the similarity of the two homework codes.
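As a non-limiting illustration, the comparison step may be sketched as follows. The sketch assumes that LCS denotes the longest common subsequence and that the similarity is normalized as twice the LCS length divided by the combined length of the two sequences; the embodiment does not specify the normalization, and the function names are hypothetical.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed normalization: 2 * LCS / (len(a) + len(b)), in the range [0, 1]."""
    if not a and not b:
        return 1.0  # two empty sequences are treated as identical
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))
```

The inputs may be character strings or lists of intermediate-code tokens; identical sequences give 1.0 and sequences with no common element give 0.0.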
In the prior art, most similarity measurement methods used for plagiarism detection of homework codes perform semantic and feature analysis on the code or the processed code, and detect renaming of variables and functions, refactoring, and modification of code formatting. These methods can be roughly divided into two stages: converting the code format and determining the code similarity. First, irrelevant items are removed from the codes, which are then packaged into codes to be compared using a certain algorithm; second, the similarity of the codes to be compared is computed to obtain the final code similarity result.
In the embodiment of the present application, for two job codes to be subjected to similarity calculation:
firstly, processing the text format of each code into a consistent format through a code formatting tool so as to facilitate subsequent plagiarism comparison;
secondly, removing invalid lines in the codes, including comment lines, empty statement lines and the like;
thirdly, performing static analysis on the codes, and deleting invalid codes and redundant codes that may interfere with the code similarity comparison;
fourthly, compiling the homework source codes submitted by the students into intermediate codes, so as to eliminate interference with the similarity comparison caused by differing names, differing but equivalent control statements, and the like;
and fifthly, comparing the character data of the two homework codes one by one based on the intermediate codes to obtain a code similarity result.
Before plagiarism comparison detection is carried out on the homework codes, the embodiment of the application uses static code analysis to eliminate invalid codes and redundant codes, which improves the accuracy of the comparison result. The plagiarism comparison is built on code abstraction and packaging and is a comparison of intermediate-language representations; the similarity comparison is performed after variable-related interference items are eliminated.
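As a non-limiting illustration, part of the preprocessing may be sketched as follows for homework written in Python. The sketch substitutes lexical normalization with the standard `tokenize` module for the compilation into intermediate code described above: it discards comments, blank lines and layout, and replaces identifiers and literals with placeholders. It does not perform the static analysis that removes redundant code, nor does it unify equivalent control statements; the placeholder names `ID`, `STR` and `NUM` are hypothetical.

```python
import io
import keyword
import tokenize
from typing import List

# Token types that carry only comments or layout and are dropped.
_SKIPPED = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalize(source: str) -> List[str]:
    """Turn Python source into a token list that ignores names, comments and layout."""
    tokens: List[str] = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")    # variable and function names become one placeholder
        elif tok.type == tokenize.STRING:
            tokens.append("STR")   # string literal contents are ignored
        elif tok.type == tokenize.NUMBER:
            tokens.append("NUM")   # numeric literal values are ignored
        else:
            tokens.append(tok.string)  # keywords and operators are kept verbatim
    return tokens
```

Two submissions that differ only in naming, spacing and comments yield the same token list, so a subsequent sequence comparison is not affected by such edits.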
The code plagiarism detection system based on the program language teaching practice platform can be arranged in a server or terminal equipment. Because the code plagiarism detection system corresponds to the code plagiarism detection method in the embodiment of the application, and the principle by which the system solves the problem is similar to that of the method, the implementation of the system can refer to the implementation of the method, and the details are not repeated here.
The code plagiarism detection system based on the program language teaching practice platform provided by the embodiment of the application comprises:
the operation code text similarity analysis module is used for acquiring two operation codes, performing matching comparison based on the operation code content and determining the similarity of the two operation codes;
and the homework code plagiarism detection module is used for processing the similarity to obtain a final code plagiarism detection result of the two homework codes, wherein the processing of the similarity comprises acting a first parameter on the similarity data, and the first parameter is generated based on the editing operation characteristics of students when editing the homework codes.
In an alternative embodiment, the job code plagiarism detection module comprises:
the first plagiarism credibility parameter acquiring unit is used for calculating a first plagiarism credibility parameter of a student editing homework process based on the editing operation characteristics of the student when editing homework codes;
the second plagiarism credibility parameter acquiring unit is used for calculating a second plagiarism credibility parameter of the student based on the historical plagiarism behavior characteristics of the student;
the third plagiarism credibility parameter acquiring unit is used for calculating a third plagiarism credibility parameter of the student based on the plagiarism possibility score of the student by the teacher;
and the first parameter acquisition unit is used for obtaining a first parameter based on the fusion of the first plagiarism credibility parameter, the second plagiarism credibility parameter and the third plagiarism credibility parameter.
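As a non-limiting illustration, the fusion performed by the first parameter acquisition unit may be sketched as follows. The text does not state how the three plagiarism credibility parameters are fused, so the weighted average, the default weights and all identifiers below are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CredibilityParameters:
    editing: float  # first parameter, from editing operation characteristics
    history: float  # second parameter, from historical plagiarism behavior
    teacher: float  # third parameter, from the teacher's plagiarism likelihood score


def fuse(params: CredibilityParameters,
         weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Assumed fusion rule: a weighted average of the three parameters."""
    w1, w2, w3 = weights
    total = w1 + w2 + w3
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = w1 * params.editing + w2 * params.history + w3 * params.teacher
    return weighted / total
```

Dividing by the sum of the weights keeps the fused value in the same range as its inputs, so the weights may be tuned per course without rescaling the threshold.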
It should be noted that the division into functional modules described for the code plagiarism detection system of this embodiment is only an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the code plagiarism detection system may be divided into different functional modules to complete all or part of the functions described above.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Correspondingly to the foregoing method embodiment, an embodiment of the present application further provides an electronic device, where the electronic device may be a server, and the electronic device includes:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the code plagiarism detection method based on the program language teaching practice platform by executing the executable instructions.
The processor for data processing may be implemented by a microprocessor, a CPU, a DSP, or an FPGA when executing processing. For the memory, the memory may be a volatile memory or a nonvolatile memory, and may also include both a volatile memory and a nonvolatile memory, and the memory stores therein operation instructions, which may be computer-executable codes, and the operation instructions implement the steps in the flow of the code plagiarism detection method according to the embodiment of the present application.
In correspondence with the above method embodiments, the present application further provides a computer-readable storage medium, on which computer instructions are stored, and when the instructions are executed by a processor, the code plagiarism detection method based on a program language teaching practice platform as described above is implemented.
The computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
The present invention is not limited to the above-described embodiments. Those skilled in the art can make various modifications based on the above conception without creative effort, and such modifications fall within the scope of the present invention.

Claims (10)

1. A code plagiarism detection method based on a program language teaching practice platform, characterized by comprising the following steps:
acquiring two operation codes, and performing matching comparison based on the content of the operation codes to determine the similarity of the two operation codes;
and processing the similarity to obtain a final code plagiarism detection result of the two homework codes, wherein the processing of the similarity comprises acting a first parameter on the similarity data, and the first parameter is generated based on the editing operation characteristics of students when editing the homework codes.
2. The code plagiarism detection method based on a programming language teaching practice platform of claim 1, wherein the generation of the first parameter is further based on historical plagiarism behavior characteristics of the student and a teacher score for plagiarism likelihood of the student.
3. The method for detecting code plagiarism based on a programming language teaching practice platform as claimed in claim 2, wherein the obtaining of the first parameter comprises:
calculating a first plagiarism credibility parameter of a student editing homework process based on the editing operation characteristics of the student when editing homework codes;
calculating a second plagiarism credibility parameter of the student based on the historical plagiarism behavior characteristics of the student;
calculating a third plagiarism credibility parameter of the student based on the plagiarism possibility score of the student by the teacher;
and obtaining a first parameter based on the fusion of the first plagiarism credibility parameter, the second plagiarism credibility parameter and the third plagiarism credibility parameter.
4. The code plagiarism detection method based on a programming language teaching practice platform of claim 3, wherein the editing operation characteristics of the student when editing the homework code comprise:
keyboard input operation, copy and paste operation, code static analysis operation, job debugging operation, job running operation, job saving operation and job submitting operation;
the first plagiarism credibility parameter $a_3$ is:
[Formula image: Figure FDA0003250231770000011]
wherein $p_1$ is the number of times the student copied and pasted code, $p_2$ is the static analysis result of the homework code by the program language teaching practice platform, $p_3$ indicates whether an online debugging operation was performed, $p_4$ is the online debugging operation result, and $p_5$ is the homework submission time.
5. The code plagiarism detection method based on a programming language teaching practice platform according to claim 2, wherein the processing the similarity to obtain a final code plagiarism detection result X (M, N) of two job codes comprises:
$X(M,N) = Y(M,N)\,S(A_M, A_N)$, where $M$ is one job code, $N$ is another job code, $Y(M,N)$ is the similarity of job codes $M$ and $N$, and $A_M$ and $A_N$ are the first parameters of job codes $M$ and $N$, respectively.
6. The code plagiarism detection method based on a programming language teaching practice platform of claim 1, wherein the obtaining of the two job codes comprises:
based on the homework codes submitted by all students, passing the codes through the test cases corresponding to the homework;
acquiring a test result, packaging jobs with the same test result in the same code file, separating the jobs with mutually independent naming spaces, and adding student information as a space name;
the homework codes submitted by two students are obtained in the same code file.
7. The code plagiarism detection method based on a programming language teaching practice platform as claimed in claim 1, wherein the matching comparison based on the content of the job code and the determination of the similarity between the two job codes comprise:
packaging the two operation codes in two mutually independent working spaces;
carrying out format conversion and irrelevant data cleaning on the operation code;
compiling words and sentences with different texts but the same semantics in the operation codes into the same text to obtain intermediate codes;
and carrying out similarity comparison on the two intermediate codes through an LCS function to obtain the similarity of the two operation codes.
8. A code plagiarism detection system based on a program language teaching practice platform, characterized by comprising:
the operation code text similarity analysis module is used for acquiring two operation codes, performing matching comparison based on the operation code content and determining the similarity of the two operation codes;
and the homework code plagiarism detection module is used for processing the similarity to obtain a final code plagiarism detection result of the two homework codes, wherein the processing of the similarity comprises acting a first parameter on the similarity data, and the first parameter is generated based on the editing operation characteristics of students when editing the homework codes.
9. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1-7 by executing the executable instructions.
10. A computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the steps of the method according to any one of claims 1-7.
CN202111043203.5A 2021-09-07 2021-09-07 Code plagiarism detection method and system based on program language teaching practice platform Pending CN113900654A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111043203.5A CN113900654A (en) 2021-09-07 2021-09-07 Code plagiarism detection method and system based on program language teaching practice platform

Publications (1)

Publication Number Publication Date
CN113900654A (en) 2022-01-07

Family

ID=79188712

Country Status (1)

Country Link
CN (1) CN113900654A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114969674A (en) * 2022-04-11 2022-08-30 呼伦贝尔学院 Program code plagiarism detection method and device
CN115268860A (en) * 2022-06-21 2022-11-01 北京浩泰思特科技有限公司 Intelligent teaching diagnosis method and system
CN115268860B (en) * 2022-06-21 2023-04-28 北京浩泰思特科技有限公司 Intelligent teaching diagnosis method and system
CN115145633A (en) * 2022-07-25 2022-10-04 杭州师范大学 Code error automatic detection method based on control flow graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination