CN101957809A

CN101957809A - Anti-plagiarism method

Info

Publication number: CN101957809A
Application number: CN 201010506555
Authority: CN
Inventors: 江潮
Original assignee: TRANSN (BEIJING) INFORMATION TECHNOLOGY Co Ltd
Current assignee: TRANSN (BEIJING) INFORMATION TECHNOLOGY Co Ltd
Priority date: 2010-10-14
Filing date: 2010-10-14
Publication date: 2011-01-26

Abstract

The invention discloses an anti-plagiarism method comprising the following steps: inputting a file to be detected and extracting keywords from it; calling a search engine to search for the keywords, retrieving the source file of the search result page, and obtaining the matching result for the keywords to derive their matching rate; and, when the matching rate meets a preset matching rate, marking the content in the file to be detected that corresponds to the keywords according to a set marking mode. The method solves the problem of identifying plagiarism in article content: only the article content needs to be input, and content that already exists on the network can be distinguished from the author's own content in a short time.

Description

An anti-plagiarism method

Technical field

The present invention relates to file content recognition technology, and specifically to an anti-plagiarism method.

Background technology

With the spread of the Internet, people can look things up online immediately whenever they need to solve a problem. To counter the plagiarism this enables, many websites have tried all sorts of measures. A common one is to disable the right mouse button with a page script: without the right-click menu, the "Copy" command cannot be found. This measure is easy to circumvent, for example by using the keyboard shortcut Ctrl+C; or by pressing the left and right mouse buttons at the same time and then dismissing the dialog box with the left button; or by viewing the web page source file directly.

The same problem arises in the publishing industry, where plagiarism has become the biggest problem at present. To save effort when writing a book, many authors directly use content found online. This behavior often infringes the original author's copyright, yet the prior art has no way to identify and prevent such plagiarism.

Summary of the invention

The technical problem solved by the present invention is to provide an anti-plagiarism method that solves the problem of identifying plagiarism in article content.

The technical solution is as follows:

An anti-plagiarism method, comprising:

inputting a file to be detected, and extracting keywords from the file to be detected;

calling a search engine to search for the keywords, retrieving the source file of the search result page, obtaining the matching result for the keywords, and deriving the matching rate of the keywords;

when the matching rate meets a preset matching rate, marking the content in the file to be detected that corresponds to the keywords according to a set marking mode.

Further: the format of the file to be detected is converted to text format, the text-format file is split into sentences or segments, and the resulting sentences or segments are used as the keywords.

Further: the set marking mode is chosen from adjusting the font size, making the font bold, underlining the font, or changing the font color.

Further: when the matching rate is greater than 50%, content in the file to be detected that is identical to the keyword is marked with a first preset color; when the matching rate is between 30% and 50%, content in the file to be detected that is identical to the keyword is marked with a second preset color.

Further: the file to be detected is a complete Word document or a passage in a Word document.

Further: after the matching rate has been obtained from the search results, the keywords in the Word document are marked according to the preset matching rate and font color settings; to do this, the interface provided by the Word software is called to mark the keywords in the Word document.

Further: it is judged whether all content of the file currently being detected has been searched; when the file to be detected has not been fully searched, the search engine is called and the keywords are used to detect the remaining content of the file; when detection of the file is finished, the file to be detected is saved and closed.

Further: the matching rate = (number of characters in the keyword's matching result × 100 / number of characters in the keyword) %.

The technical effects of the technical solution of the present invention include:

The invention solves the problem of identifying plagiarism in article content. Only the article content needs to be input, and within a short time it can be determined which content already exists online and which is the author's own.

Description of drawings

Fig. 1 is the main flow chart of the present invention;

Fig. 2 is a schematic diagram of the results of a search with the search engine Baidu in the present invention;

Fig. 3 is a schematic diagram of retrieving the source file of the Baidu search results in the present invention;

Fig. 4 is a schematic diagram of the document after marking in the present invention.

Embodiment

The present invention retrieves the source file of a search engine result page, obtains the matching result for the search keywords from that source file, and then uses the matching rate to determine whether text content has been plagiarized. A source file is a collection of source code, and source code is a set of meaningful characters that implement a specific function (program development code).

The technical solution of the present invention is described in detail below with reference to the accompanying drawings and a preferred embodiment.

Fig. 1 shows the main flow chart of the present invention.

Step S101: input the file to be detected; the Word format is used as the file format;

The file to be detected may be a complete Word document or a passage of text in a Word document.

Step S102: file format conversion: convert the Word-format file to text format (a txt file);

The main purpose of this conversion is to make the text easier to process. Processing the Word content directly, sentence by sentence, would be very inefficient; once the content is converted to plain text, the subsequent sentence splitting is convenient.

Step S103: split the txt file into sentences or segments; the resulting sentences or segments serve as the search keywords;
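The splitting step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `split_sentences` and `split_paragraphs` and the chosen set of sentence-ending punctuation marks are assumptions, since the source does not say which delimiters are used.

```python
import re

# Assumed delimiter set: common Chinese and Western sentence-ending marks.
# The patent does not specify which punctuation ends a "sentence".
_SENTENCE = re.compile(r"[^。！？!?；;\n]+[。！？!?；;]?")


def split_sentences(text: str) -> list[str]:
    """Split plain text into sentences, keeping the closing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def split_paragraphs(text: str) -> list[str]:
    """Split plain text into segments (one per non-empty line)."""
    return [p.strip() for p in text.splitlines() if p.strip()]


if __name__ == "__main__":
    sample = "今天天气很好。我们去公园吧！\nIs this copied? Maybe;"
    for sentence in split_sentences(sample):
        print(sentence)
```

Each returned sentence or segment would then be used as one search keyword in the next step.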

Step S104: use a search engine to search with the sentences or segments of the txt file as keywords; if there are search results, go to step S105; otherwise end the analysis;

The technical solution of the present invention is applicable to any search engine; for example, the content of the txt file can be submitted to Baidu (http://www.baidu.com) for searching.
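As a hedged sketch of how one sentence might be turned into a search request: the `/s` path and the `wd` query parameter below are assumptions about Baidu's URL scheme, not something the patent states, and `build_search_url` is a hypothetical helper name. The sketch only builds the URL; fetching the page is left out.

```python
from urllib.parse import urlencode


def build_search_url(keyword: str, base: str = "http://www.baidu.com/s") -> str:
    """Build a search URL for one keyword (sentence or segment).

    Assumption: the engine accepts the query in a `wd` parameter;
    another engine would need a different base URL and parameter name.
    """
    return base + "?" + urlencode({"wd": keyword})


if __name__ == "__main__":
    print(build_search_url("anti plagiarism"))
```

Because the method is described as engine-agnostic, the base URL is kept as a parameter rather than hard-coded.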

Fig. 2 is a schematic diagram of the results of a search with the search engine Baidu. When searching with Baidu, results with a high matching rate are ranked first; at the same time, the keywords in the result content are highlighted, with the matched keywords marked in red.

Step S105: parse the search results; if the retrieved content contains marked keywords, go to step S106 for further analysis; otherwise go to step S108;

Step S106: calculate the matching rate;

Fig. 3 is a schematic diagram of retrieving the source file of the Baidu search results. The searched keywords can be seen in it.

Each sentence of the split txt file is queried on the network, and the keywords are then extracted from the query results: the regular expression <em>(.*?)</em> retrieves the matching result for the search keywords, from which the matching rate is calculated.

Retrieve the HTML source file of the search result page. Analyzing the HTML source of the result page shows that the parts enclosed in <em></em> tags are the keywords; on this basis, the regular expression <em>(.*?)</em> retrieves the matching result for the search keywords.
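The extraction can be illustrated with Python's `re` module, using the regular expression given in the text. The function name `extract_matches` and the sample HTML are illustrative assumptions; real result pages are more complex than this fragment.

```python
import re

# The pattern from the description: a non-greedy capture between <em> tags.
# re.DOTALL is an added assumption so that a highlight spanning a line
# break is still captured.
EM_PATTERN = re.compile(r"<em>(.*?)</em>", re.DOTALL)


def extract_matches(html: str) -> list[str]:
    """Return the text fragments the search engine highlighted with <em>."""
    return EM_PATTERN.findall(html)


if __name__ == "__main__":
    page = "<div>这是<em>抄袭</em>的<em>内容</em></div>"
    print(extract_matches(page))
```

The non-greedy `.*?` matters here: a greedy `.*` would capture everything from the first `<em>` to the last `</em>` as a single fragment.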

Calculate the matching rate of the keyword:

matching rate = (number of characters in the keyword's matching result × 100 / number of characters in the keyword) %.
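A small worked sketch of the formula follows. The patent does not say how the characters of the matching result are counted when several fragments are highlighted, so this version makes an explicit assumption: it counts the characters of the keyword that are covered by at least one highlighted fragment, which keeps the rate between 0 and 100. The name `matching_rate` is hypothetical.

```python
def matching_rate(keyword: str, fragments: list[str]) -> float:
    """Percentage of the keyword's characters covered by matched fragments.

    Assumption: "matching result character count" is taken as the number
    of keyword characters covered by any highlighted fragment.
    """
    if not keyword:
        return 0.0
    covered = [False] * len(keyword)
    for fragment in set(fragments):
        if not fragment:
            continue
        start = keyword.find(fragment)
        while start != -1:
            for i in range(start, start + len(fragment)):
                covered[i] = True
            start = keyword.find(fragment, start + 1)
    return sum(covered) * 100 / len(keyword)


if __name__ == "__main__":
    # 6 of 10 characters are covered, so the rate is 60%.
    print(matching_rate("abcdefghij", ["abc", "efg"]))
```

For example, a 10-character keyword with 6 matched characters gives 6 × 100 / 10 = 60%.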

Step S107: evaluate the matching rate of the keyword; when the matching rate is greater than 50%, mark the content in the Word document that is identical to the keyword with the first preset color;

Step S108: in the Word document, mark the content identical to keywords whose matching rate is between 30% and 50% with the second preset color;

After the matching rate has been calculated from the search results, the keywords need to be marked in the Word document according to the preset matching rate and font color settings; to do this, the interface provided by the Word software is called to mark the keywords in the Word document. The matching rates and colors can be customized. The Word document can also be marked in other ways, for example by adjusting the font size, using bold, or underlining.
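The threshold logic of steps S107 and S108 might be sketched as below. The 50% and 30% thresholds come from the text; the color names `"red"` and `"orange"` and the function name `choose_mark` are placeholders, since the patent only speaks of a first and a second preset color. Applying the color through the Word interface is outside this sketch.

```python
from typing import Optional


def choose_mark(rate: float,
                high: float = 50.0,
                low: float = 30.0,
                first_color: str = "red",
                second_color: str = "orange") -> Optional[str]:
    """Map a matching rate (in percent) to a marking color, or None.

    Rates above `high` get the first preset color, rates from `low` to
    `high` inclusive get the second, and anything lower is left unmarked.
    Thresholds and colors are parameters because the text says both are
    customizable.
    """
    if rate > high:
        return first_color
    if low <= rate <= high:
        return second_color
    return None


if __name__ == "__main__":
    for rate in (80.0, 50.0, 30.0, 10.0):
        print(rate, choose_mark(rate))
```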

Fig. 4 is a schematic diagram of the document after marking. As can be seen from the figure, the relevant keywords have been marked.

Step S109: judge whether all content of the Word document currently being detected has been searched; if content remains to be searched, that is, the document has not yet been fully searched, go to step S104; if all content of the document has been searched, go to step S110;

Step S110: save and close the Word document;

When the search is complete, the marking of the full Word document is also complete; that is, the full text of the Word document has been marked according to the preset matching rates and colors.
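The loop formed by steps S104 to S109 can be sketched as follows. To keep the example self-contained and free of network access, searching, extraction, and rate calculation are folded into a caller-supplied function `rate_of`, which is a hypothetical stand-in; it returns `None` when a sentence yields no search results. Treating such a sentence as unmarked and moving on is an assumption about how the flow handles that case.

```python
from typing import Callable, Optional


def detect_document(
    sentences: list[str],
    rate_of: Callable[[str], Optional[float]],
) -> list[tuple[str, Optional[str]]]:
    """Label every sentence as "first", "second", or None (unmarked)."""
    marks: list[tuple[str, Optional[str]]] = []
    for sentence in sentences:       # S109: repeat until everything is searched
        rate = rate_of(sentence)     # S104 to S106: search and compute the rate
        if rate is None:
            label = None             # no search results for this sentence
        elif rate > 50:
            label = "first"          # S107: first preset color
        elif rate >= 30:
            label = "second"         # S108: second preset color
        else:
            label = None
        marks.append((sentence, label))
    return marks


if __name__ == "__main__":
    fake_rates = {"copied text": 90.0, "partly copied": 40.0}
    print(detect_document(["copied text", "partly copied", "original"],
                          fake_rates.get))
```

Passing the search step in as a function keeps the control flow independent of any particular search engine, in line with the statement that the method works with any engine.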

Step S111: generate a report; processing is complete.

The report is generated from the marked text. It shows at a glance which content in the Word document is plagiarized, which content is possibly plagiarized, and which is the author's own.
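One possible shape for such a report is a per-category share of the document's characters. This is only a sketch: the patent does not describe the report's layout, and the function name `build_report`, the labels `"first"` and `"second"`, and the category names are assumptions.

```python
from typing import Optional

# Assumed mapping from marking label to report category.
_CATEGORIES = {
    "first": "plagiarized",
    "second": "possibly plagiarized",
    None: "author's own",
}


def build_report(marks: list[tuple[str, Optional[str]]]) -> dict[str, float]:
    """Return the percentage of characters that falls in each category."""
    counts = {name: 0 for name in _CATEGORIES.values()}
    for sentence, label in marks:
        counts[_CATEGORIES[label]] += len(sentence)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count * 100 / total for name, count in counts.items()}


if __name__ == "__main__":
    marked = [("aaaa", "first"), ("bb", "second"), ("cccc", None)]
    for category, share in build_report(marked).items():
        print(f"{category}: {share:.1f}%")
```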

Claims

1. An anti-plagiarism method, comprising:

inputting a file to be detected, and extracting keywords from the file to be detected;

2. The anti-plagiarism method of claim 1, characterized in that: the format of the file to be detected is converted to text format, the text-format file is split into sentences or segments, and the resulting sentences or segments are used as the keywords.

3. The anti-plagiarism method of claim 1 or 2, characterized in that: the set marking mode is chosen from adjusting the font size, making the font bold, underlining the font, or changing the font color.

4. The anti-plagiarism method of claim 1 or 2, characterized in that: when the matching rate is greater than 50%, content in the file to be detected that is identical to the keyword is marked with a first preset color; when the matching rate is between 30% and 50%, content in the file to be detected that is identical to the keyword is marked with a second preset color.

5. The anti-plagiarism method of claim 1 or 2, characterized in that: the file to be detected is a complete Word document or a passage in a Word document.

6. The anti-plagiarism method of claim 5, characterized in that: after the matching rate has been obtained from the search results, the keywords in the Word document are marked according to the preset matching rate and font color settings; to do this, the interface provided by the Word software is called to mark the keywords in the Word document.

7. The anti-plagiarism method of claim 1 or 2, characterized in that: it is judged whether all content of the file currently being detected has been searched; when the file to be detected has not been fully searched, the search engine is called and the keywords are used to detect the remaining content of the file; when detection of the file is finished, the file to be detected is saved and closed.

8. The anti-plagiarism method of claim 1, characterized in that: the matching rate = (number of characters in the keyword's matching result × 100 / number of characters in the keyword) %.