CN115345144A - Thesis Duplicate Checking and Quality Improvement Integrated Service System - Google Patents


Info

Publication number
CN115345144A
Authority
CN
China
Prior art keywords
module
detection
submodule
checking
management
Legal status
Pending
Application number
CN202210802753.9A
Other languages
Chinese (zh)
Inventor
杨玉林
陈小明
周宝春
Current Assignee
Hunan Hanma Technology Co ltd
Original Assignee
Hunan Hanma Technology Co ltd
Application filed by Hunan Hanma Technology Co ltd
Priority to CN202210802753.9A
Publication of CN115345144A
Legal status: Pending (current)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F40/00 Handling natural language data
            • G06F40/10 Text processing
              • G06F40/194 Calculation of difference between files
              • G06F40/189 Automatic justification
            • G06F40/20 Natural language analysis
              • G06F40/205 Parsing
                • G06F40/216 Parsing using statistical methods
              • G06F40/279 Recognition of textual entities
                • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a thesis duplicate checking and quality improvement integrated service system, which comprises an institution management subsystem and a student terminal system connected to it. The institution management subsystem comprises, arranged in parallel, an account management module, thesis detection module I, an in-school mutual inspection module, a system configuration module, a statistical analysis module, personal center module I, self-built library module I, and an operation log module. The service system can check graduation theses for duplication and correct their errors, automatically typeset them according to a selected template, and let a school set up a self-built library for checking duplication among its own theses, thereby effectively improving thesis quality.

Description

Thesis Duplicate Checking and Quality Improvement Integrated Service System
Technical Field
The invention relates to the technical field of thesis duplicate checking and detection, and in particular to a thesis duplicate checking and quality improvement integrated service system.
Background
A paper (or thesis) generally refers to an article that carries out research in an academic field and describes the results of that research. It is both a means of studying problems and conducting academic research and a tool for reporting research results and carrying out academic exchange. Papers include annual academic papers, graduation theses, degree theses, scientific papers, achievement papers, and so on.
A graduation thesis is the final stage of study, according to the curriculum, for students at secondary vocational schools, junior colleges, undergraduate colleges, and higher-education self-study examination programs. Students are required to write it independently before graduating, as concentrated training in scientific research.
In academic terms, a thesis is a piece of writing of some scientific value that investigates practical or theoretical problems in a professional field. It is usually scheduled for the final school year (or term) of a student's period of study; under a teacher's guidance, students must choose a topic, carry out the research, and write and submit the thesis.
To combat academic fraud and academic misconduct, thesis duplicate checking systems came into being. However, prior-art systems offer a single function, duplicate checking: they compare a thesis only against a library of uploaded and published papers and cannot check for duplication among the graduation theses of the same school.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a thesis duplicate checking and quality improvement integrated service system, realized by the following technical scheme:
a thesis duplicate checking and quality improvement integrated service system comprises an institution management subsystem and a student terminal system connected to the institution management subsystem;
the institution management subsystem comprises, arranged in parallel, an account management module, thesis detection module I, an in-school mutual inspection module, a system configuration module, a statistical analysis module, personal center module I, self-built library module I, and an operation log module. The account management module manages the school's departments, instructors, majors, and students, and can modify and delete account information and view the management hierarchy; thesis detection module I lets administrators submit theses for detection and issues detection reports; the in-school mutual inspection module packages and uploads student theses and detects whether any theses in the compressed package plagiarize one another; the system configuration module sets detection standards and department permissions; the statistical analysis module compiles detection statistics for departments and students; personal center module I lets administrators view account information, change passwords, and view purchase records; self-built library module I uses school resources as detection resources, enlarging the scope of thesis detection and improving thesis quality; the operation log module shows the specific operation records of the account;
the student terminal system comprises, arranged in parallel, thesis detection module II, a thesis error correction module, a thesis typesetting module, personal center module II, a recharge center module, and self-built library module II. Thesis detection module II lets students submit theses for detection and issues detection reports; the thesis error correction module accepts a thesis for error correction and issues an error correction report; the thesis typesetting module displays all of the school's thesis templates and automatically typesets the thesis according to the template selected; personal center module II lets students view account information, change passwords, and view allocation records; the recharge center module tops up the personal account; and self-built library module II lets students add school resources to their personal self-built library as detection resources, enlarging the scope of thesis detection and improving thesis quality.
Further, the account management module comprises, arranged in parallel, a department management submodule, an instructor management submodule, a major management submodule, a student management submodule, an account deletion submodule, and a management level submodule;
the department management submodule adds department information individually or imports it in batches, allocates and revokes departments' duplicate-check quotas, and views the detection records of every department; the instructor management submodule adds instructor information individually or in batches, allocates and revokes instructors' duplicate-check quotas, and views each instructor's detection records; the major management submodule adds major information individually or in batches, allocates and revokes majors' duplicate-check quotas, and views each major's detection records; the student management submodule adds student information individually or in batches, allocates and revokes students' duplicate-check quotas, and views each student's detection records and whether the account has been activated; the account deletion submodule deletes the account information of inactive departments, instructors, majors, and students; the management level submodule views the school's management hierarchy.
Further, thesis detection module I comprises: submission detection submodule I, which provides a detection entry, accepts uploads of a single document or a compressed package, and automatically calculates the number of articles and the number of checks required; a submission record submodule, connected to submission detection submodule I, for viewing detection records and results and downloading detection reports; a failed report submodule, connected to the submission record submodule, for viewing failed detection reports; and a thesis grouping management submodule, connected to submission detection submodule I, for grouping the detected theses.
Further, the in-school mutual inspection module comprises: a submission mutual inspection submodule for submitting a compressed package containing multiple theses, setting the similarity interval of the theses, and limiting the number of references; a mutual inspection submodule, connected to the submission mutual inspection submodule, that cross-checks the documents in the submitted compressed package against one another; and a mutual inspection record submodule, connected to the mutual inspection submodule, for viewing and extracting mutual inspection records.
Further, the system configuration module comprises: a duplicate-check parameter submodule for setting the similarity interval of theses and limiting the number of references; and a department permission submodule, arranged in parallel with the duplicate-check parameter submodule, for setting department permissions.
Further, the statistical analysis module comprises: a department detection statistics submodule that counts, for each department, the number of students, the number of theses checked, and the pass rate; and a student detection detail submodule, arranged in parallel with the department detection statistics submodule, that lists student information, instructor information, and similarity, and provides access to reports and historical detection records.
Further, thesis detection module II comprises: submission detection submodule II, which provides a detection entry for uploading a single document; a detection submodule, connected to submission detection submodule II, that automatically checks the submitted document for duplication; and a detection report submodule, connected to the detection submodule, for viewing detection records and results and downloading detection reports.
Further, the thesis error correction module comprises: a submission error correction submodule that provides an entry for uploading a single document; an error correction submodule, connected to the submission error correction submodule, that automatically corrects errors in the submitted document; and an error correction report submodule, connected to the error correction submodule, that lists the submission time, number of errors, error correction status, and title, and allows error correction reports to be viewed, downloaded, and deleted.
Further, the thesis typesetting module comprises: a submission typesetting submodule for selecting a template, then selecting the thesis to be typeset and submitting it; a typesetting submodule, connected to the submission typesetting submodule, that automatically typesets the submitted thesis; and a typesetting record submodule, connected to the typesetting submodule, that lists the title, format template, typesetting status, typesetting time, and number of typeset pages, and allows the typeset files to be viewed in PDF and Word versions.
Further, personal center module II comprises: an account information submodule for binding WeChat and a mobile phone number, viewing the numbers of checks, allocations, and purchases, and recharging; a password modification submodule, arranged in parallel with the account information submodule, where entering the old password and a new password completes the password change; and a superior allocation record submodule, arranged in parallel with the password modification submodule, for viewing allocations.
Compared with the prior art, the invention has the following beneficial effects: the system is highly manageable, with a clear business process, a reasonable division of permissions, complete functionality, and simple operation; it makes it easy for schools to control thesis quality, for students to self-check their theses, and for students to efficiently revise their theses into versions that meet school requirements.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and are not intended to limit the invention.
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a block diagram of the thesis duplicate checking and quality improvement integrated service system of the present invention;
FIG. 2 is a block diagram of the account management module in the thesis duplicate checking and quality improvement integrated service system of the present invention;
FIG. 3 is a block diagram of thesis detection module I in the thesis duplicate checking and quality improvement integrated service system of the present invention;
FIG. 4 is a block diagram of the in-school mutual inspection module in the thesis duplicate checking and quality improvement integrated service system of the present invention;
FIG. 5 is a block diagram of the system configuration module in the thesis duplicate checking and quality improvement integrated service system of the present invention;
FIG. 6 is a block diagram of the statistical analysis module in the thesis duplicate checking and quality improvement integrated service system of the present invention;
FIG. 7 is a block diagram of thesis detection module II in the thesis duplicate checking and quality improvement integrated service system of the present invention;
FIG. 8 is a block diagram of the thesis error correction module in the thesis duplicate checking and quality improvement integrated service system of the present invention;
FIG. 9 is a block diagram of the thesis typesetting module in the thesis duplicate checking and quality improvement integrated service system of the present invention;
FIG. 10 is a block diagram of personal center module II in the thesis duplicate checking and quality improvement integrated service system of the present invention.
Reference numerals:
1. institution management subsystem; 11. account management module; 111. department management submodule; 112. instructor management submodule; 113. major management submodule; 114. student management submodule; 115. account deletion submodule; 116. management level submodule; 12. thesis detection module I; 121. submission detection submodule I; 122. submission record submodule; 123. failed report submodule; 124. thesis grouping management submodule; 13. in-school mutual inspection module; 131. submission mutual inspection submodule; 132. mutual inspection submodule; 133. mutual inspection record submodule; 14. system configuration module; 141. duplicate-check parameter submodule; 142. department permission submodule; 15. statistical analysis module; 151. department detection statistics submodule; 152. student detection detail submodule; 16. personal center module I; 17. self-built library module I; 18. operation log module; 2. student terminal system; 21. thesis detection module II; 211. submission detection submodule II; 212. detection submodule; 213. detection report submodule; 22. thesis error correction module; 221. submission error correction submodule; 222. error correction submodule; 223. error correction report submodule; 23. thesis typesetting module; 231. submission typesetting submodule; 232. typesetting submodule; 233. typesetting record submodule; 24. personal center module II; 241. account information submodule; 242. password modification submodule; 243. superior allocation record submodule; 25. recharge center module; 26. self-built library module II.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are merely some, rather than all, of the embodiments of the present invention; all other embodiments that those skilled in the art can obtain from them without creative effort fall within the scope of protection of the present invention.
In the description of the present invention, it should be noted that terms such as "upper", "lower", "inner", "outer", and "top/bottom" indicate orientations or positional relationships based on those shown in the drawings. They are used only for convenience and simplicity of description and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted", "disposed", "sleeved/connected", "connected", and the like are to be construed broadly: for example, "connected" may mean fixedly connected, detachably connected, or integrally connected; mechanically or electrically connected; directly connected or indirectly connected through an intermediate medium; or internal communication between two elements. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
Referring to FIG. 1, a thesis duplicate checking and quality improvement integrated service system according to a preferred embodiment of the present invention comprises an institution management subsystem 1 and a student terminal system 2 connected to the institution management subsystem 1;
the institution management subsystem 1 comprises, arranged in parallel, an account management module 11, thesis detection module I 12, an in-school mutual inspection module 13, a system configuration module 14, a statistical analysis module 15, personal center module I 16, self-built library module I 17, and an operation log module 18. The account management module 11 manages the school's departments, instructors, majors, and students, and can modify and delete account information and view the management hierarchy; thesis detection module I 12 lets administrators submit theses for detection and issues detection reports; the in-school mutual inspection module 13 packages and uploads student theses, where packaging means placing multiple theses in a folder and compressing it into a compressed package, so as to detect whether any theses in the package plagiarize one another; the system configuration module 14 sets detection standards and department permissions; the statistical analysis module 15 compiles detection statistics for departments and students; personal center module I 16 lets administrators view account information, change passwords, and view purchase records;
the self-building library module I17 is used for taking school resources as detection resources, enlarging the scope of paper detection and providing the quality of papers, wherein, documents in a format supporting word and pdf serve as the detection resources, and when a self-building library is newly built, classification is firstly built, then classification is selected, and finally articles in the self-building library are uploaded;
the operation log module 18 shows the specific operation records of the user's own account; each log entry shows the operation IP, time, function, and operation, for example: "*******(3496) 1, 3496";
the student terminal system 2 comprises, arranged in parallel, thesis detection module II 21, a thesis error correction module 22, a thesis typesetting module 23, personal center module II 24, a recharge center module 25, and self-built library module II 26. Thesis detection module II 21 lets students submit theses for detection and issues detection reports; the thesis error correction module 22 accepts a thesis for error correction and issues an error correction report; the thesis typesetting module 23 displays all of the school's thesis templates and automatically typesets the thesis according to the template selected; personal center module II 24 lets students view account information, change passwords, and view allocation records; the recharge center module 25 tops up the user's personal account, supports recharging via WeChat and Alipay, allows the password to be changed by entering the old and new passwords, and shows spending records; self-built library module II 26 lets students add school resources to their personal self-built library as detection resources, enlarging the scope of thesis detection and improving thesis quality; Word and PDF files are supported as detection resources, and to build a new self-built library, a category is first created, then selected, and finally the articles are uploaded;
the service system of this embodiment can check graduation theses for duplication and correct their errors, automatically typeset theses according to the selected template, and let a school set up a self-built library for checking duplication among its own theses, thereby effectively improving thesis quality.
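The self-built library workflow described for modules 17 and 26 (create a category, select it, then upload Word or PDF files) can be sketched as a small in-memory model. This is a minimal illustration under stated assumptions, not the patented implementation: the class `SelfBuiltLibrary`, its method names, and the exact set of accepted extensions (`.doc`, `.docx`, `.pdf`) are all hypothetical.

```python
from pathlib import PurePath

# Assumed set of "Word and PDF" extensions accepted as detection resources.
ALLOWED_SUFFIXES = {".doc", ".docx", ".pdf"}


class SelfBuiltLibrary:
    """Hypothetical model of a self-built library: create category, select it, upload."""

    def __init__(self) -> None:
        self._categories: dict[str, list[str]] = {}

    def create_category(self, name: str) -> None:
        # Step 1: a category must exist before documents can be uploaded into it.
        self._categories.setdefault(name, [])

    def upload(self, category: str, filename: str) -> None:
        # Step 2: the caller must select an existing category.
        if category not in self._categories:
            raise KeyError(f"unknown category: {category}")
        # Step 3: only Word and PDF files are accepted as detection resources.
        if PurePath(filename).suffix.lower() not in ALLOWED_SUFFIXES:
            raise ValueError(f"unsupported format: {filename}")
        self._categories[category].append(filename)

    def documents(self, category: str) -> list[str]:
        # Return a copy so callers cannot mutate the library directly.
        return list(self._categories.get(category, []))
```

For example, `lib.create_category("Class of 2021")` followed by `lib.upload("Class of 2021", "thesis_001.PDF")` succeeds, while uploading `notes.txt` or uploading into a category that was never created raises an error, mirroring the create-select-upload order the text prescribes.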
Referring to FIG. 2, in a further embodiment, the account management module 11 comprises, arranged in parallel, a department management submodule 111, an instructor management submodule 112, a major management submodule 113, a student management submodule 114, an account deletion submodule 115, and a management level submodule 116;
the department management submodule 111 adds department information individually or imports it in batches, allocates and revokes departments' duplicate-check quotas, and views the detection records of every department; the instructor management submodule 112 adds instructor information individually or in batches, allocates and revokes instructors' duplicate-check quotas, and views each instructor's detection records; the major management submodule 113 adds major information individually or in batches, allocates and revokes majors' duplicate-check quotas, and views each major's detection records; the student management submodule 114 adds student information individually or in batches, allocates and revokes students' duplicate-check quotas, and views each student's detection records and whether the account has been activated; the account deletion submodule 115 deletes the account information of inactive departments, instructors, majors, and students; the management level submodule 116 views the school's management hierarchy and, where subordinate accounts exist, allows them to be modified.
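The allocate and revoke operations of submodules 111 to 114 can be pictured as moving a duplicate-check quota down an account hierarchy. The sketch below assumes that revoking reclaims only checks the subordinate has not yet used; `QuotaAccount`, `allocate`, and `revoke` are hypothetical names, and the patent does not say how quotas are stored.

```python
from dataclasses import dataclass, field


@dataclass
class QuotaAccount:
    """Hypothetical account (school, department, instructor, major or student) holding checks."""

    name: str
    balance: int = 0
    children: dict[str, "QuotaAccount"] = field(default_factory=dict)

    def add_child(self, child: "QuotaAccount") -> None:
        self.children[child.name] = child

    def allocate(self, child_name: str, count: int) -> None:
        """Move `count` duplicate checks from this account to a subordinate account."""
        if count <= 0:
            raise ValueError("count must be positive")
        if count > self.balance:
            raise ValueError("not enough checks left to allocate")
        self.balance -= count
        self.children[child_name].balance += count

    def revoke(self, child_name: str, count: int) -> int:
        """Take back up to `count` unused checks; return how many were actually reclaimed."""
        child = self.children[child_name]
        reclaimed = min(count, child.balance)
        child.balance -= reclaimed
        self.balance += reclaimed
        return reclaimed
```

With a school holding 100 checks, allocating 30 to a department leaves 70 at school level; revoking 50 afterwards reclaims only the 30 the department still holds. Capping the reclaim this way avoids negative balances, which is one reasonable reading of "canceling" checks.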
Referring to FIG. 3, in a further embodiment, thesis detection module I 12 comprises: submission detection submodule I 121, which provides a detection entry, accepts uploads of a single document or a compressed package, and automatically calculates the number of articles and the number of checks required; if a self-built library exists, the user can choose during detection whether to compare against it as needed, and theses of current graduates and of the corresponding graduating cohorts can be checked; a submission record submodule 122, connected to submission detection submodule I 121, for viewing detection records and results and downloading detection reports; a failed report submodule 123, connected to the submission record submodule 122, for viewing failed detection reports; and a thesis grouping management submodule 124, connected to submission detection submodule I 121, for grouping the detected theses.
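Submission detection submodule I 121 automatically works out how many articles an upload contains and how many checks it will consume. A minimal sketch, assuming the compressed package is a ZIP archive and that each paper costs a fixed number of checks (neither is stated in the patent); `count_papers` and `required_checks` are hypothetical helpers built on Python's standard `zipfile` module.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed formats that count as a "paper" inside an uploaded package.
DOC_SUFFIXES = {".doc", ".docx", ".pdf"}


def count_papers(zip_bytes: bytes) -> int:
    """Count Word/PDF documents in an uploaded compressed package (ZIP only)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return sum(
            1
            for info in zf.infolist()
            if not info.is_dir()
            # Skip the metadata folder macOS adds when compressing a folder.
            and not info.filename.startswith("__MACOSX/")
            and PurePosixPath(info.filename).suffix.lower() in DOC_SUFFIXES
        )


def required_checks(paper_count: int, checks_per_paper: int = 1) -> int:
    # Assumption: every paper consumes the same number of checks from the quota.
    return paper_count * checks_per_paper
```

Comparing `required_checks(...)` with the uploader's remaining quota before starting detection would let the system refuse an upload up front instead of failing halfway through a package.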
Referring to FIG. 4, in a further embodiment, the in-school mutual inspection module 13 comprises: a submission mutual inspection submodule 131 for submitting a compressed package containing multiple theses, setting the similarity interval of the theses, and limiting the number of references; a mutual inspection submodule 132, connected to the submission mutual inspection submodule 131, that cross-checks against one another the documents in the compressed package submitted through submodule 131; and a mutual inspection record submodule 133, connected to the mutual inspection submodule 132, for viewing the extracted mutual inspection records.
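The patent does not disclose the algorithm that mutual inspection submodule 132 uses. As one common choice, the sketch below compares every pair of papers in a package by the Jaccard similarity of their character n-grams and keeps the pairs whose similarity falls inside the configured interval. The function names, the n-gram length of 5, and the use of Jaccard similarity are all assumptions.

```python
from itertools import combinations


def shingles(text: str, n: int = 5) -> set[str]:
    """Character n-grams of the text with all whitespace removed (also suits Chinese)."""
    compact = "".join(text.split())
    if len(compact) < n:
        # Very short texts become a single shingle so they can still be compared.
        return {compact} if compact else set()
    return {compact[i : i + n] for i in range(len(compact) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mutual_check(
    papers: dict[str, str], low: float, high: float = 1.0, n: int = 5
) -> list[tuple[str, str, float]]:
    """Return (paper_a, paper_b, similarity) for each pair whose similarity lies in [low, high]."""
    grams = {name: shingles(text, n) for name, text in papers.items()}
    flagged = []
    for a, b in combinations(sorted(grams), 2):
        sim = jaccard(grams[a], grams[b])
        if low <= sim <= high:
            flagged.append((a, b, round(sim, 3)))
    # Most similar pairs first.
    return sorted(flagged, key=lambda pair: pair[2], reverse=True)
```

Comparing all pairs costs O(k²) for k papers, which is acceptable for a package of a few hundred theses; much larger collections would typically switch to MinHash with locality-sensitive hashing.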
Referring to FIG. 5, in a further embodiment, the system configuration module 14 comprises: a duplicate-check parameter submodule 141 for setting the similarity interval of theses and limiting the number of references; and a department permission submodule 142, arranged in parallel with the duplicate-check parameter submodule 141, for setting department permissions, specifically: allowing a department to customize the qualifying interval of the total similarity ratio, to choose the detection algorithm, to choose whether the library of previous years' theses participates in detection, and to customize the lower limit on the number of references in student theses.
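The four permissions listed for submodule 142 map naturally onto a per-department configuration object. In this sketch the class `CheckPolicy`, the function `evaluate`, and the default values (a 0 to 30% qualifying interval and at least 15 references) are illustrative assumptions rather than values from the patent; the algorithm and past-library settings are only stored, since the patent does not say how they affect the verdict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckPolicy:
    """Hypothetical per-department settings mirroring the four permissions above."""

    min_total_similarity: float = 0.0   # lower bound of the qualifying interval
    max_total_similarity: float = 0.30  # upper bound of the qualifying interval (assumed default)
    algorithm: str = "ngram"            # selected detection algorithm (stored only)
    include_past_library: bool = True   # compare against previous years' theses (stored only)
    min_references: int = 15            # lower limit on the number of references (assumed default)


def evaluate(policy: CheckPolicy, total_similarity: float, reference_count: int) -> list[str]:
    """Return the reasons a thesis fails the policy; an empty list means it qualifies."""
    problems = []
    if not policy.min_total_similarity <= total_similarity <= policy.max_total_similarity:
        problems.append(
            f"total similarity {total_similarity:.1%} is outside "
            f"[{policy.min_total_similarity:.0%}, {policy.max_total_similarity:.0%}]"
        )
    if reference_count < policy.min_references:
        problems.append(
            f"{reference_count} references; at least {policy.min_references} required"
        )
    return problems
```

Returning a list of reasons rather than a single boolean lets the same call feed both the pass/fail flag and the text of a failed detection report.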
Referring to FIG. 6, in a further embodiment, the statistical analysis module 15 comprises: a department detection statistics submodule 151 that counts, for each department, the number of students, the number of theses checked, and the pass rate, and allows the statistics to be downloaded as an Excel table; and a student detection detail submodule 152, arranged in parallel with the department detection statistics submodule 151, that lists student information, instructor information, and similarity, provides access to reports and historical detection records, and likewise supports download as an Excel table.
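A sketch of what statistics submodule 151 might compute, assuming each detection record carries a department, a student ID, and a pass/fail flag. CSV (which Excel opens directly) stands in for the Excel download, and the pass rate is taken per detection rather than per student; both choices are assumptions, and `department_stats` and `to_csv` are hypothetical names.

```python
import csv
import io
from collections import defaultdict


def department_stats(records):
    """records: iterable of (department, student_id, passed) tuples, one per detection."""
    students = defaultdict(set)
    checks = defaultdict(int)
    passes = defaultdict(int)
    for dept, student_id, passed in records:
        students[dept].add(student_id)
        checks[dept] += 1
        passes[dept] += int(bool(passed))
    return {
        dept: {
            "students": len(students[dept]),
            "checks": checks[dept],
            "pass_rate": passes[dept] / checks[dept],  # per detection, not per student
        }
        for dept in checks
    }


def to_csv(stats) -> str:
    """Render the statistics as CSV text, one row per department."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["department", "students", "checks", "pass_rate"])
    for dept in sorted(stats):
        row = stats[dept]
        writer.writerow([dept, row["students"], row["checks"], f"{row['pass_rate']:.2%}"])
    return buf.getvalue()
```

If the school instead wants the share of students whose latest check passed, the loop would keep only each student's most recent record before counting.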
Referring to FIG. 7, in a further embodiment, thesis detection module II 21 comprises: submission detection submodule II 211, which provides a detection entry for uploading a single document; a detection submodule 212, connected to submission detection submodule II 211, that automatically checks the documents submitted through submission detection submodule II 211; and a detection report submodule 213, connected to the detection submodule 212, for viewing detection records and results and downloading detection reports.
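For illustration only, the automatic check performed by detection submodule 212 can be reduced to a total similarity ratio: the share of the paper's characters that lie in sentences also found in the comparison library. Real systems match fuzzy fragments rather than whole sentences; the exact-match rule, the 8-character minimum, and the names `split_sentences` and `total_similarity` are assumptions.

```python
import re

# Sentence-ending punctuation, Chinese and Western.
_SENTENCE_END = re.compile(r"[。！？!?.;；]+")


def split_sentences(text: str) -> list[str]:
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


def total_similarity(paper: str, library: list[str], min_len: int = 8) -> float:
    """Share of the paper's characters lying in sentences that also appear in the library."""
    # Very short library sentences are ignored so common phrases are not flagged.
    known = {s for doc in library for s in split_sentences(doc) if len(s) >= min_len}
    sentences = split_sentences(paper)
    total = sum(len(s) for s in sentences)
    if total == 0:
        return 0.0
    copied = sum(len(s) for s in sentences if s in known)
    return copied / total
```

Splitting on "." also breaks abbreviations and decimals such as "3.5", so a production splitter would need more care; the resulting ratio is the kind of value a policy's qualifying interval would be compared against.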
Referring to FIG. 8, in a further embodiment, the thesis error correction module 22 comprises: a submission error correction submodule 221 that provides an entry for uploading a single document; an error correction submodule 222, connected to the submission error correction submodule 221, that automatically corrects errors in the documents submitted through it; and an error correction report submodule 223, connected to the error correction submodule 222, that lists the submission time, number of errors, error correction status, and title, and allows error correction reports to be viewed, downloaded, and deleted.
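The patent leaves the error correction rules unspecified. Purely as a stand-in, the sketch below detects one simple error type (a word immediately repeated, such as "the the"), fixes it, and fills a report with the fields that submodule 223 lists: title, submission time, error count, and status. `CorrectionReport`, `correct`, and the status strings are hypothetical.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime

# A word immediately followed by itself, e.g. "the the" (case-insensitive).
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


@dataclass
class CorrectionReport:
    title: str
    submitted_at: datetime
    errors: list[str] = field(default_factory=list)
    status: str = "pending"

    @property
    def error_count(self) -> int:
        return len(self.errors)


def correct(title: str, text: str) -> tuple[str, CorrectionReport]:
    """Return the corrected text and a report listing what was changed."""
    report = CorrectionReport(title=title, submitted_at=datetime.now())
    report.errors = [m.group(0) for m in REPEATED_WORD.finditer(text)]
    # Keep only the first occurrence of each repeated word.
    fixed = REPEATED_WORD.sub(r"\1", text)
    report.status = "corrected"
    return fixed, report
```

The word boundaries keep legitimate sequences such as "the theory" untouched; a real corrector would chain many such rules (or a language model) and append each finding to the same report.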
Referring to fig. 9, in a further embodiment, the thesis typesetting module 23 includes: a submission and typesetting submodule 231, configured to select a template and then select the thesis to be typeset and submit it for typesetting; a typesetting submodule 232, connected to the submission and typesetting submodule 231, configured to automatically typeset the thesis submitted through the submission and typesetting submodule 231; and a typesetting record submodule 233, connected to the typesetting submodule 232, configured to record the title, format template, typesetting status, typesetting time and number of typeset pages, and to allow the typeset files to be viewed in PDF and Word versions.
Referring to fig. 10, in a further embodiment, the second personal center module 24 includes: an account information submodule 241, configured to bind WeChat and a mobile phone number, view the detection count, allocation count and purchase count, and recharge the account; a password modification submodule 242, arranged in parallel with the account information submodule 241, configured to change the password after the old and new passwords are entered; and a superior allocation record submodule 243, arranged in parallel with the password modification submodule 242, configured to view allocations received from the superior level.
The specific process of thesis duplicate checking comprises the following steps:
Text extraction:
The databases used for duplicate-checking comparison mainly comprise a local database and a network database. The local database collects previously published excellent theses; since these are generally published as PDF files, the thesis content is extracted from the PDF files with PDFBox, the open-source tool provided by Apache. The network database consists of document resources crawled from the internet in real time with crawler technology; the document content is extracted after the internet documents have been denoised.
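The source does not say how internet documents are denoised. As one illustrative possibility (an assumption, not the system's method), the Python sketch below uses the standard-library `html.parser` to drop markup and the contents of a hypothetical set of noise tags, keeping only visible text. Apache PDFBox is a Java library, so PDF extraction is not reproduced here:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of assumed 'noise' tags."""

    NOISE_TAGS = {"script", "style", "noscript", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a noise tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.NOISE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.NOISE_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_web_text(html: str) -> str:
    """Return the visible text of an HTML page with noise sections removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Joining chunks with a space is a simplification; for Chinese pages a real pipeline would likely join inline fragments without separators.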
Establishing a fingerprint index library for the theses:
The thesis content extracted in the previous step is segmented appropriately, generally sentence by sentence, and a fingerprint is then computed for each sentence with a hash algorithm. A traditional hash algorithm only ensures that the fingerprints computed from the original content are as uniformly and randomly distributed as possible: if two fingerprints are identical, the original contents are equal with high probability; if two fingerprints differ, this shows only that the original contents are not equal and gives no further information, because even contents differing by a single byte are likely to produce very different fingerprints. A similarity hash (SimHash) algorithm is therefore adopted. It belongs to the family of locality-sensitive hash algorithms, so its fingerprints reflect, to a certain extent, the similarity of the original contents. The main idea of the similarity hash algorithm is dimensionality reduction: a high-dimensional feature vector is mapped to a low-dimensional one, and whether two texts are duplicates or highly similar is determined from the Hamming distance between the two vectors. Empirically, two 64-bit similarity hash fingerprints whose Hamming distance is within 3 indicate highly similar texts; in other words, all similar texts can be found by finding all fingerprints within a Hamming distance of 3. To keep the response time of real-time duplicate checking within a few seconds, however, a multi-level index is built over the massive number of fingerprints in the database by blocking: each 64-bit binary similarity hash fingerprint is divided into 4 blocks of 16 bits each. By the pigeonhole principle, if the Hamming distance between two fingerprints is within 3, at least one of the 4 blocks must be identical in both fingerprints.
Each of the 4 blocks is then used in turn as the leading 16 bits (the lookup key) to build an inverted index for searching.
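The similarity hash described above can be sketched in Python as follows. This is a minimal Charikar-style SimHash under stated assumptions: tokens are weighted equally (production systems often weight them, e.g. by TF-IDF), each token is hashed to 64 bits with `hashlib.blake2b`, and the helper names are hypothetical:

```python
import hashlib

FINGERPRINT_BITS = 64


def _hash64(token: str) -> int:
    """Hash a token to 64 bits using blake2b with an 8-byte digest."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(tokens) -> int:
    """Each token's 64-bit hash votes +1 (bit set) or -1 (bit clear) on every
    bit position; bit i of the fingerprint is 1 when the total vote is positive."""
    votes = [0] * FINGERPRINT_BITS
    for token in tokens:
        h = _hash64(token)
        for i in range(FINGERPRINT_BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def near_duplicate(a: int, b: int, threshold: int = 3) -> bool:
    """Empirical rule from the text: a Hamming distance within 3 means highly similar."""
    return hamming(a, b) <= threshold
```

Two sentences are treated as near-duplicates when `near_duplicate(simhash(tokens_a), simhash(tokens_b))` is true.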
Segmenting the thesis under detection:
The content of the thesis to be detected is segmented appropriately, generally sentence by sentence, and a 64-bit binary fingerprint is then computed for each sentence with the similarity hash algorithm.
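Sentence segmentation can be as simple as a regular expression over sentence-ending punctuation. The Python sketch below is an assumption about what "segmented appropriately" means here: it splits on the Chinese full-width terminators 。！？； and on ASCII `!`, `?`, `;`, and splits on `.` only when it is followed by whitespace or the end of the text, so decimals such as 3.14 stay intact. Each resulting sentence would then be tokenized and fingerprinted with the similarity hash:

```python
import re

# A sentence is the shortest run of characters ending in a terminator:
# full-width 。！？； or ASCII !?; , a '.' followed by whitespace/end, or end of text.
_SENTENCE = re.compile(r".+?(?:[。！？；!?;]|\.(?=\s|$)|$)", re.S)


def split_sentences(text: str):
    """Split thesis text into sentences, dropping empty fragments."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]
```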
Searching the fingerprint database:
Following the blocking rule used when the index was built, each 64-bit binary fingerprint to be searched is divided into 4 blocks of 16 bits each, and each of the 4 blocks is then used to look up matching fingerprints in the database.
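The blocked lookup can be sketched with one inverted index per block position, which is equivalent to letting each block serve in turn as the leading 16 bits. In this hypothetical Python sketch (class and method names are illustrative), every fingerprint sharing at least one block with the query is a candidate, and candidates are then filtered by the full Hamming distance, since sharing a block is necessary but not sufficient for a distance within 3:

```python
from collections import defaultdict

BLOCKS, BLOCK_BITS = 4, 16
MASK = (1 << BLOCK_BITS) - 1


def blocks(fp: int):
    """Split a 64-bit fingerprint into four 16-bit blocks, high block first."""
    return [(fp >> (BLOCK_BITS * (BLOCKS - 1 - i))) & MASK for i in range(BLOCKS)]


class FingerprintIndex:
    def __init__(self):
        # One inverted index per block position: block value -> set of sentence ids.
        self._tables = [defaultdict(set) for _ in range(BLOCKS)]
        self._fps = {}

    def add(self, sentence_id, fp: int):
        self._fps[sentence_id] = fp
        for pos, value in enumerate(blocks(fp)):
            self._tables[pos][value].add(sentence_id)

    def search(self, fp: int, max_distance: int = 3):
        """Collect candidates sharing any block (pigeonhole), then verify the distance."""
        candidates = set()
        for pos, value in enumerate(blocks(fp)):
            candidates |= self._tables[pos].get(value, set())
        return sorted(
            sid for sid in candidates
            if bin(self._fps[sid] ^ fp).count("1") <= max_distance
        )
```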
Text content comparison:
The corresponding texts are loaded according to the similar fingerprints found in the database and segmented into words to obtain a series of feature vectors; the distance between the feature vectors is then calculated, and the similarity of the two texts is judged from that distance. The similarity is computed as a weighted sum of an edit-distance similarity and the Jaccard similarity coefficient. The edit distance is the minimum number of editing operations required to convert one character string into another; the permitted operations are replacing one character with another, inserting a character, and deleting a character, and the larger the distance between two strings, the more they differ. The Jaccard similarity coefficient compares the similarity and difference between finite sample sets: the proportion of the intersection of two sets A and B within their union is called the Jaccard similarity coefficient of the two sets, denoted J(A, B), and a larger value indicates more similar samples:
$J(A,B) = \frac{|A \cap B|}{|A \cup B|}$
1. Let the edit distance of sentences a and b be editDistance(a, b). The similarity based on the edit distance is
$\mathrm{editSimilar}(a,b) = 1 - \frac{\mathrm{editDistance}(a,b)}{\max(|a|,\,|b|)}$
where |a| and |b| are the lengths of sentences a and b.
2. Segmenting sentences a and b into words gives word sets A and B, and the similarity based on the Jaccard coefficient is
$\mathrm{jaccardSimilar}(a,b) = \frac{|A \cap B|}{|A \cup B|}$
Taking both the edit distance and the Jaccard similarity coefficient into account, a weight factor for the edit distance, $0 \le \mathrm{factor} \le 1$, is introduced, and the similarity of sentences a and b is calculated as:
$\mathrm{similar}(a,b) = \mathrm{factor} \times \mathrm{editSimilar}(a,b) + (1 - \mathrm{factor}) \times \mathrm{jaccardSimilar}(a,b)$
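The weighted sentence similarity maps directly onto code. A minimal Python sketch in which `edit_similar`, `jaccard_similar` and `similar` mirror editSimilar, jaccardSimilar and similar(a, b); it assumes the word lists for the Jaccard term come from a separate word segmenter (for Chinese text), uses a hypothetical default `factor=0.5`, and treats two empty inputs as identical (a convention chosen here). The edit distance is the standard Levenshtein dynamic program with substitution, insertion and deletion:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def edit_similar(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def jaccard_similar(words_a, words_b) -> float:
    A, B = set(words_a), set(words_b)
    union = A | B
    return 1.0 if not union else len(A & B) / len(union)


def similar(a, b, words_a, words_b, factor=0.5):
    """Weighted sum of edit-distance similarity and Jaccard similarity."""
    if not 0 <= factor <= 1:
        raise ValueError("factor must lie in [0, 1]")
    return factor * edit_similar(a, b) + (1 - factor) * jaccard_similar(words_a, words_b)
```

For instance, "kitten" and "sitting" have an edit distance of 3, so their edit-distance similarity is 1 - 3/7 = 4/7.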
Calculating the overall similarity:
$\text{overall similarity} = \dfrac{\text{number of similar words}}{\text{number of detected words}}$
Non-text parts automatically identified by the system (such as the table of contents, titles, formulas, charts and references) do not participate in detection, so the number of detected words is generally slightly smaller than the word count of the thesis.
$\text{number of similar words} = w_1 s_1 + w_2 s_2 + \cdots$, where $w_i$ is the word count of sentence i and $s_i$ is the similarity of sentence i.
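A short Python sketch of the overall-similarity calculation, assuming each detected sentence is given as a `(word_count, similarity)` pair after non-text parts have been removed, and that the number of detected words is the sum of these word counts; `overall_similarity` is a hypothetical name:

```python
def overall_similarity(sentences):
    """sentences: list of (word_count, similarity) pairs for the detected sentences.
    Returns (number of similar words, overall similarity)."""
    detected_words = sum(w for w, _ in sentences)
    similar_words = sum(w * s for w, s in sentences)
    ratio = similar_words / detected_words if detected_words else 0.0
    return similar_words, ratio
```

For sentences of 20, 30 and 50 words with similarities 1.0, 0.5 and 0.0, the number of similar words is 35 and the overall similarity is 35%.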
Generating the detection report:
Based on the similarity results of the previous step, a highly readable web-page report is generated by loading a web-page template with FreeMarker. FreeMarker is a template engine, that is, a general-purpose tool that generates output text (such as HTML web pages) from a template and the changing data filled into it; it allows a program to separate the interface from the data, and business code from logic code.
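FreeMarker is a Java template engine, so it is not reproduced here. The Python sketch below illustrates the same separation of template and data with the standard-library `string.Template` as a stand-in; the template text and the `render_report` function are hypothetical. Data values are HTML-escaped before substitution:

```python
from html import escape
from string import Template

# Hypothetical page template (a stand-in for the FreeMarker template).
PAGE = Template("""<html><body>
<h1>Detection report: $title</h1>
<p>Overall similarity: $overall</p>
<table>$rows</table>
</body></html>""")
ROW = Template("<tr><td>$sentence</td><td>$similarity</td></tr>")


def render_report(title, overall, matches):
    """matches: list of (sentence, similarity) pairs; data stay separate from the template."""
    rows = "".join(
        ROW.substitute(sentence=escape(s), similarity=f"{sim:.1%}") for s, sim in matches
    )
    return PAGE.substitute(title=escape(title), overall=f"{overall:.1%}", rows=rows)
```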
In practical application, the service system can be developed in a mainstream language such as Java, requires no client software to be installed, and can run on mainstream operating systems such as Linux and Windows. It is mainly applied at the thesis duplicate-checking stage in schools: the school management logs into the institution management subsystem 1 with an account number and password, and the school's students use the student terminal system 2.
The school management can place the theses of students in the colleges or majors under its administration into the same folder and compress them into a compressed package, which is uploaded through the first submission detection submodule 121. The first submission detection submodule 121 decompresses the package and identifies the theses in it so that they can be duplicate-checked against the thesis library. Detection records and results can be viewed, and detection reports downloaded, through the first submission record submodule 122; statistical reports on detected theses that fail to meet the requirements can be viewed in the unqualified report submodule 123; and the thesis grouping management submodule 124 automatically groups the detected theses. The school management can also upload a compressed package to the submission mutual inspection submodule 131 so that all theses in the package are duplicate-checked against one another, and the mutual inspection records and results can be viewed in the mutual inspection record submodule 133. In addition, the school management can set detection standards and department permissions, set the detection conditions for self-built library theses, and build a self-built library through the first self-built library module 17, using the school's own resources as detection resources to enlarge the scope of thesis detection and improve thesis quality.
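The in-school mutual inspection compares every thesis in an uploaded package with every other one. A minimal Python sketch, assuming the package is a ZIP archive whose theses have already been converted to UTF-8 `.txt` files and that the caller supplies a document-level similarity function; `mutual_check` and the default threshold are hypothetical. Comparing all pairs costs O(n²) comparisons, which is manageable for one college's package; for larger collections, a fingerprint index such as the one described above avoids comparing every pair:

```python
import io
import zipfile
from itertools import combinations


def mutual_check(zip_bytes: bytes, similarity, threshold: float = 0.3):
    """Compare every pair of theses inside a compressed package.
    similarity: callable(text_a, text_b) -> float in [0, 1], supplied by the caller.
    Returns (name_a, name_b, score) for pairs at or above the threshold, highest first."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        texts = {
            name: zf.read(name).decode("utf-8")
            for name in zf.namelist()
            if name.endswith(".txt")
        }
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(texts.items()), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            flagged.append((name_a, name_b, score))
    return sorted(flagged, key=lambda item: item[2], reverse=True)
```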
School students can submit a single thesis document through the second submission detection submodule 211; the detection submodule 212 automatically performs duplicate checking on the submitted document, and the results can be viewed in the detection report submodule 213, so that students can revise their theses according to the detection results. Once a student's thesis meets the school's duplicate-checking requirements, it can be uploaded through the submission error correction submodule 221, and the error correction submodule 222 automatically corrects it, mainly fixing wrongly written characters. The error correction report submodule 223 records the submission time, number of errors, error correction status and title, and allows error correction reports to be viewed, downloaded and deleted. After checking and correcting the thesis according to the error correction report, the student can browse the thesis templates that meet the school's requirements in the submission and typesetting submodule 231, select a template, and submit the thesis document to be typeset; the typesetting submodule 232 then automatically typesets the submitted thesis according to the template, and the typesetting record submodule 233 records the title, format template, typesetting status, typesetting time and number of pages, and allows the typeset files to be viewed in PDF and Word versions. Students can also build their own self-built library through the second self-built library module 26, which uses school resources as detection resources to enlarge the scope of thesis detection and improve thesis quality.
In this way, graduation theses can be duplicate-checked, error-corrected and automatically typeset according to a selected template, and a self-built library can be set up within the school for checking duplication among the school's own theses. This effectively improves thesis quality, makes it convenient for the school to control thesis quality, and allows students to self-check their theses so that they can be revised efficiently into versions that meet the school's requirements.
The above are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the present invention, according to its technical solution and inventive concept, shall fall within the protection scope of the present invention.

Claims (10)

1. A paper review and quality improvement integrated service system, characterized by comprising:
an institution management subsystem and a student terminal system connected to the institution management subsystem;
the institution management subsystem comprises: an account management module, a first thesis detection module, an in-school mutual inspection module, a system configuration module, a statistical analysis module, a first personal center module, a first self-built library module and an operation log module, which are arranged in parallel;
the account management module is used for managing the school's departments, instructors, majors and students, and for modifying, deleting and viewing account information and management levels;
the first thesis detection module is used by the management to submit theses for detection and issue detection reports;
the in-school mutual inspection module is used for packaging student theses into a compressed package, uploading it, and detecting whether the theses in the compressed package plagiarize one another;
the system configuration module is used for setting detection standards and department permissions;
the statistical analysis module is used for compiling statistics on the detection status of departments and students;
the first personal center module is used by the management to view account information, change passwords and view purchase records;
the first self-built library module is used for taking school resources as detection resources, thereby enlarging the scope of thesis detection and improving thesis quality;
the operation log module is used for viewing the specific operation records of accounts;
the student terminal system comprises: a second thesis detection module, a thesis error correction module, a thesis typesetting module, a second personal center module, a recharge center module and a second self-built library module, which are arranged in parallel;
the second thesis detection module is used by students to submit theses for detection and issue detection reports;
the thesis error correction module is used for submitting theses for error correction and issuing error correction reports;
the thesis typesetting module is used for displaying all of the school's thesis templates, from which a template can be selected for automatic typesetting;
the second personal center module is used by students to view account information, change passwords and view allocation records;
the recharge center module is used by users to recharge their personal accounts;
and the second self-built library module is used by students to add school resources to a personal self-built library as detection resources, thereby enlarging the scope of thesis detection and improving thesis quality.
2. The paper review and quality improvement integrated service system according to claim 1, wherein the account management module comprises: a department management submodule, an instructor management submodule, a major management submodule, a student management submodule, an account deletion submodule and a management level submodule, which are arranged in parallel;
the department management submodule is used for adding department information individually or importing it in batches, allocating and revoking duplicate-check counts for departments, and viewing the detection records of each department;
the instructor management submodule is used for adding instructor information individually or importing it in batches, allocating and revoking duplicate-check counts for instructors, and viewing the detection records of each instructor;
the major management submodule is used for adding major information individually or importing it in batches, allocating and revoking duplicate-check counts for majors, and viewing the detection records of each major;
the student management submodule is used for adding student information individually or importing it in batches, allocating and revoking duplicate-check counts for students, and viewing each student's detection records and whether the account has been activated;
the account deletion submodule is used for deleting the account information of non-activated departments, instructors, majors and students;
and the management level submodule is used for viewing the school's management levels.
3. The paper review and quality improvement integrated service system according to claim 1, wherein the first thesis detection module comprises: a first submission detection submodule, used for providing a detection entry for uploading a single document or a compressed package, with the system automatically calculating the number of theses and the required number of detections;
a first submission record submodule, connected to the first submission detection submodule, used for viewing detection records and results and downloading detection reports;
an unqualified report submodule, connected to the first submission record submodule, used for viewing unqualified detection reports;
and a thesis grouping management submodule, connected to the first submission detection submodule, used for grouping the detected theses.
4. The paper review and quality improvement integrated service system according to claim 1, wherein the in-school mutual inspection module comprises: a submission mutual inspection submodule, used for submitting a compressed package containing multiple theses, setting the similarity interval of the theses, and limiting the number of references;
a mutual inspection submodule, connected to the submission mutual inspection submodule, used for mutually checking the documents in the compressed package submitted through the submission mutual inspection submodule;
and a mutual inspection record submodule, connected to the mutual inspection submodule, used for viewing and retrieving mutual inspection records.
5. The paper review and quality improvement integrated service system according to claim 1, wherein the system configuration module comprises: a duplicate checking parameter submodule, used for setting the similarity interval of theses and limiting the number of references;
and a department permission submodule, arranged in parallel with the duplicate checking parameter submodule, used for setting department permissions.
6. The paper review and quality improvement integrated service system according to claim 1, wherein the statistical analysis module comprises: a department detection statistics submodule, used for counting the number of students, the number of detections and the pass rate of each department;
and a student detection details submodule, arranged in parallel with the department detection statistics submodule, used for recording student information, instructor information and similarity, and for viewing reports and historical detection records.
7. The paper review and quality improvement integrated service system according to claim 1, wherein the second thesis detection module comprises: a second submission detection submodule, used for providing a detection entry for uploading a single document;
a detection submodule, connected to the second submission detection submodule, used for automatically performing duplicate checking on the document submitted through the second submission detection submodule;
and a detection report submodule, connected to the detection submodule, used for viewing detection records and results and downloading detection reports.
8. The paper review and quality improvement integrated service system according to claim 1, wherein the thesis error correction module comprises: a submission error correction submodule, used for providing a detection entry for uploading a single document;
an error correction submodule, connected to the submission error correction submodule, used for automatically correcting errors in the document submitted through the submission error correction submodule;
and an error correction report submodule, connected to the error correction submodule, used for recording the submission time, number of errors, error correction status and title, and for viewing, downloading and deleting error correction reports.
9. The paper review and quality improvement integrated service system according to claim 1, wherein the thesis typesetting module comprises: a submission and typesetting submodule, used for selecting a template and then selecting the thesis to be typeset and submitting it for typesetting;
a typesetting submodule, connected to the submission and typesetting submodule, used for automatically typesetting the thesis submitted through the submission and typesetting submodule;
and a typesetting record submodule, connected to the typesetting submodule, used for recording the title, format template, typesetting status, typesetting time and number of typeset pages, and for viewing the typeset files in PDF and Word versions.
10. The paper review and quality improvement integrated service system according to claim 1, wherein the second personal center module comprises: an account information submodule, used for binding WeChat and a mobile phone number, viewing the detection count, allocation count and purchase count, and recharging;
a password modification submodule, arranged in parallel with the account information submodule, used for changing the password after the old and new passwords are entered;
and a superior allocation record submodule, arranged in parallel with the password modification submodule, used for viewing allocations.
CN202210802753.9A 2022-07-07 2022-07-07 Thesis is looked for duplicate and is carried matter integration service system Pending CN115345144A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210802753.9A CN115345144A (en) 2022-07-07 2022-07-07 Thesis is looked for duplicate and is carried matter integration service system

Publications (1)

Publication Number Publication Date
CN115345144A true CN115345144A (en) 2022-11-15

Family

ID=83947613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210802753.9A Pending CN115345144A (en) 2022-07-07 2022-07-07 Thesis is looked for duplicate and is carried matter integration service system

Country Status (1)

Country Link
CN (1) CN115345144A (en)

Similar Documents

Publication Publication Date Title
CN106886509B (en) Automatic detection method for academic paper format
Vittorini et al. An AI-based system for formative and summative assessment in data science courses
WO2018051233A1 (en) Electronic document management using classification taxonomy
CN110119395B (en) Method for realizing association processing of data standard and data quality based on metadata in big data management
CN109614375B (en) Data storage system based on personal computer
CN112801530A (en) Intelligent review system based on semantic splitting and working method
CN112052396A (en) Course matching method, system, computer equipment and storage medium
CN111143556A (en) Software function point automatic counting method, device, medium and electronic equipment
CN115687647A (en) Notarization document generation method and device, electronic equipment and storage medium
Bicevskis et al. Data quality evaluation: a comparative analysis of company registers' open data in four European countries.
Liaqat et al. Plagiarism detection in java code
JP5766438B2 (en) Method and system for click-through function in electronic media
US20220406210A1 (en) Automatic generation of lectures derived from generic, educational or scientific contents, fitting specified parameters
CN115345144A (en) Thesis is looked for duplicate and is carried matter integration service system
Cleland et al. Using databases in medical education research: AMEE Guide No. 77
Peer et al. New curation software: Step-by-step preparation of social science data and code for publication and preservation
Marco Jr et al. Theses and Capstone Projects Plagiarism Checker using Kolmogorov Complexity Algorithm
CN115098581B (en) Method, device and equipment for storing numerical heterogeneous data and storage medium
CN111143337B (en) Method for improving data quality in product data management system
Suroto et al. Design of a Marriage Service Management Application at the Pasar Jambi Religious Affairs Office
Oghenerume et al. Comparative analysis of item statistics of WASSCE and NECO SSCE 2022 Data Processing multiple choice tests using item response theory
Jia et al. Investigating Risk-Management Disclosure: Manual Analysis Versus Computer-Based Analysis
KARSHIYEVICH Basic principles of creating software system to control and correct errors in text
AWAL et al. FACULTY PUBLICATION MANAGEMENT SYSTEM
KR20230061782A (en) Intellectual property data platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination