CN107885808B

CN107885808B - Shared resource file anti-cheating method

Info

Publication number: CN107885808B
Application number: CN201711070780.7A
Authority: CN
Inventors: 李禹江; 何渔; 吴豪
Original assignee: Sichuan Winshare Education Science & Technology Co ltd
Current assignee: Sichuan Winshare Education Science & Technology Co ltd
Priority date: 2017-11-03
Filing date: 2017-11-03
Publication date: 2021-03-30
Anticipated expiration: 2037-11-03
Also published as: CN107885808A

Abstract

The invention discloses a shared resource file anti-cheating method comprising the following steps: S1: converting each file to be stored into a PDF file and uploading the converted PDF file to a resource stock library; S2: using Lucene to obtain the path information of the resource stock library from a database, retrieving the stock resource files through that path information, loading them into document objects, tokenizing the stock resource files, and creating an index; S3: randomly extracting content segments from a new shared resource file by sampling 3 times, in which the shared resource file is loaded, its total character length T and the content segment step length S are obtained, and the random-number range C = T − S is constructed. The method shortens the time needed to judge whether a shared resource is a cheating submission and improves overall efficiency. It also prevents similar files from entering the resource library, saving storage space.

Description

Shared resource file anti-cheating method

Technical Field

The invention relates to a file anti-cheating method, in particular to a shared resource file anti-cheating method.

Background

With the rapid development of network technology, people can share their own resource files. Where sharing is paid, a small number of users have been found to download files shared by others, make slight changes, share them again, and illegitimately collect the reward. If cheating with shared resource files cannot be effectively prevented, the following problems arise:

1. Increased collection costs for shared resources.

2. Wasted storage space due to similar resource files.

3. Increased selection costs for users retrieving resource files, who must choose among similar files.

Disclosure of Invention

The invention aims to solve technical problems such as the high collection cost of shared resources, the storage space wasted by similar resource files, and excessive processing time. It provides a method that reduces server consumption, quickly obtains the similarity between a new shared resource file and the stored resource files, and prevents cheating with shared resource files.

The invention is realized by the following technical scheme:

A shared resource file anti-cheating method comprises the following steps:

S1: converting each file to be stored into a PDF file, and uploading the converted PDF file to a resource stock library;

S2: using Lucene to obtain the path information of the resource stock library from a database, retrieving the stock resource files through that path information, loading them into document objects, tokenizing the stock resource files, and creating an index;

S3: randomly extracting content segments from the new shared resource file, taking at least 3 samples: loading the shared resource file, obtaining its total character length T and the content segment step length S, and constructing the random-number range C = T − S;

S4: if C ≤ 0, the entire content of the shared file is used as the sample segment; if C > 0, generating a random number K within the range C, taking the content segment from K to K + S, and repeating the sampling of step S3 until the number of content segments equals N;

S5: using each of the N sampled content segments as a search keyword, performing the N searches in the search engine, and temporarily storing the search results;

S6: analyzing the N sets of search results and calculating, for each file, its hit count H over the N searches, where H is incremented by 1 each time the file appears in a search result;

S7: obtaining the list of similar stock resource files and their number Fn by comparing each file's hit count H with the number of content segments N: if the hit rate R = H/N satisfies R ≥ 60%, the file is a similar stock resource file.
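Steps S3 and S4 amount to drawing N fixed-length windows at random offsets from the shared file's text. The following Python sketch is illustrative only: it assumes the text has already been extracted from the PDF, assumes K is drawn uniformly from the integers 0 to C inclusive (the source says only that C is the limit), takes S = 10 and N = 3 as defaults from claim 1, and uses a hypothetical function name `sample_segments`.

```python
import random
from typing import List, Optional


def sample_segments(
    text: str, n: int = 3, step: int = 10, rng: Optional[random.Random] = None
) -> List[str]:
    """Draw n random content segments of `step` characters (steps S3-S4).

    Hypothetical helper; assumes K is uniform over 0..C inclusive.
    """
    if rng is None:
        rng = random.Random()
    total = len(text)          # T: total character length
    limit = total - step       # C = T - S: largest valid start offset
    if limit <= 0:
        # Text no longer than one step: the whole content is the sample.
        return [text]
    segments = []
    while len(segments) < n:   # stop once N segments are collected
        k = rng.randint(0, limit)           # random start K within C
        segments.append(text[k:k + step])   # segment from K to K + S
    return segments


courseware = "0123456789" * 20   # a 200-character stand-in for courseware text
print(sample_segments(courseware, rng=random.Random(42)))
```

Drawing K from 0 to C inclusive guarantees that every window K..K+S lies fully inside the text, so each sample has exactly S characters.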

To prevent cheating with shared resource files, the prior art uses file content processing to compute the similarity between a new shared resource file and the stock resource files with a vector space model. If the similarity exceeds a judgment value, the new shared resource file is judged to be a cheating file and is not allowed into the resource library. This approach consumes a great deal of server resources for the similarity computation, and identification takes longer and longer as the volume of stock resources grows.
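For contrast, the prior-art approach can be sketched as a vector space model comparison. This is an illustrative stand-in, not the specific prior-art system: it assumes character-bigram term frequencies, cosine similarity, and a hypothetical judgment value of 0.8. What it shows is that every stored file must be vectorized and compared with the new file, so the cost grows with the size of the stock library.

```python
import math
from collections import Counter
from typing import List


def char_bigrams(text: str) -> Counter:
    """Term-frequency vector over overlapping character bigrams (assumed tokenization)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def vsm_is_cheating(new_text: str, stock_texts: List[str], threshold: float = 0.8) -> bool:
    """Prior-art style check: compare the new file with EVERY stored file.

    The loop is linear in len(stock_texts); 0.8 is a hypothetical judgment value.
    """
    new_vec = char_bigrams(new_text)
    return any(cosine(new_vec, char_bigrams(t)) >= threshold for t in stock_texts)
```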

Further, in step S1 all files to be stored are converted into PDF files by a converter. Storing the shared file content as PDF and comparing content fragments on that basis makes online viewing of the files easier, and the characters can be quickly recognized and processed by character recognition software such as OCR (optical character recognition) tools.

Further, the database in step S2 is a MySQL database. Compared with large databases such as Oracle, DB2 and SQL Server, MySQL is smaller in scale and more limited in functionality, but the invention needs only simple storage, and because MySQL is open source, a stable, free website system can be built without significant expense (apart from labor costs).
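The database's role in step S2 is simply to hold the storage path of each stock file so that the indexer can locate it. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the table `stock_resource`, its columns, and the helper functions are hypothetical. With a MySQL driver the SQL would be essentially the same, although the parameter placeholder syntax differs between drivers.

```python
import sqlite3
from typing import List, Tuple


def create_catalog(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per PDF in the resource stock library.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stock_resource ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " pdf_path TEXT NOT NULL UNIQUE)"
    )


def add_resource(conn: sqlite3.Connection, title: str, pdf_path: str) -> int:
    """Record a stored PDF and return its row id."""
    cur = conn.execute(
        "INSERT INTO stock_resource (title, pdf_path) VALUES (?, ?)",
        (title, pdf_path),
    )
    conn.commit()
    return cur.lastrowid


def stock_paths(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """Path information the indexer reads in step S2 to load each stored file."""
    return conn.execute(
        "SELECT id, pdf_path FROM stock_resource ORDER BY id"
    ).fetchall()
```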

Further, the search result in the step S6 is a file list corresponding to the content segment.

Further, Lucene in step S2 is an open-source search engine library, through which full-text retrieval can be implemented in the target system.

Further, Lucene analyzes and tokenizes the documents to build the index.
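Lucene itself is a Java library, so the following is not its API but a minimal Python illustration of what an inverted index contributes to this method: each stock file is tokenized, each token maps to the set of files containing it, and a content segment is answered by intersecting posting lists and then confirming the exact phrase. Overlapping character bigrams are an assumed tokenization (unspaced Chinese text has no word boundaries to split on), and `SegmentIndex` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Set


def bigrams(text: str) -> Set[str]:
    """Overlapping character bigrams: an assumed tokenizer for unspaced text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


class SegmentIndex:
    """Tiny in-memory inverted index standing in for a Lucene index (hypothetical)."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # token -> file ids
        self.docs: Dict[int, str] = {}                          # file id -> text

    def add(self, doc_id: int, text: str) -> None:
        """Tokenize a stock file and add it to every matching posting list."""
        self.docs[doc_id] = text
        for gram in bigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, segment: str) -> List[int]:
        """Return ids of stored files that contain the segment verbatim."""
        grams = bigrams(segment)
        if not grams:
            return []
        # Intersect posting lists to get candidates, then verify the exact phrase.
        candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        return sorted(d for d in candidates if segment in self.docs[d])
```

Because a query touches only the posting lists for the segment's own tokens, its cost depends on how many files share those tokens rather than on comparing the new file with every stored file.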

The key point of the invention is to randomly sample the content of the shared resource file to obtain content segments, search the stock resource file list with the search engine service, find the stock resources corresponding to the shared resource file through the relationship among the shared resource file, its content segments and the matching stock resource file lists, and thereby judge whether the shared resource is a cheating submission. This shortens the time needed for that judgment and improves overall efficiency. It also prevents similar files from entering the resource library, saving storage space.

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. the shared resource file anti-cheating method reduces server consumption, quickly obtains the similarity between a new shared resource file and the stock resource files, and prevents cheating with shared resource files;

2. the shared resource file anti-cheating method keeps the overall server cost low and effectively saves storage space.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:

FIG. 1 is a flow chart of the system of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.

Example one

As shown in FIG. 1, the shared resource file anti-cheating method of the present invention comprises the following steps:

S1: converting each file to be stored into a PDF file, and uploading the converted PDF file to a resource stock library;

S2: using Lucene to obtain the path information of the resource stock library from a database, retrieving the stock resource files through that path information, loading them into document objects, tokenizing the stock resource files, and creating an index;

S3: randomly extracting content segments from the new shared resource file, taking at least 3 samples: loading the shared resource file, obtaining its total character length T and the content segment step length S, and constructing the random-number range C = T − S;

S4: if C ≤ 0, the entire content of the shared file is used as the sample segment; if C > 0, generating a random number K within the range C, taking the content segment from K to K + S, and repeating the sampling of step S3 until the number of content segments equals N;

S5: using each of the N sampled content segments as a search keyword, performing the N searches in the search engine, and temporarily storing the search results;

S6: analyzing the N sets of search results and calculating, for each file, its hit count H over the N searches, where H is incremented by 1 each time the file appears in a search result;

S7: obtaining the list of similar stock resource files and their number Fn by comparing each file's hit count H with the number of content segments N: if the hit rate R = H/N satisfies R ≥ 60%, the file is a similar stock resource file.
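Steps S5 to S7 reduce to counting, for each stock file, how many of the N segment searches returned it. In this sketch the search engine is passed in as a callable that returns file identifiers (an assumption, so any index, including a Lucene-backed service, could be plugged in), a file is counted at most once per search, and 60% is the judgment value from the description. The function name `similar_stock_files` is hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def similar_stock_files(
    segments: List[str],
    search: Callable[[str], Iterable[str]],
    threshold: float = 0.6,
) -> Dict[str, float]:
    """Steps S5-S7: return {file_id: hit rate R} for files with R = H/N >= threshold."""
    n = len(segments)                  # N: number of sampled segments
    hits: Counter = Counter()          # H per stock file
    for seg in segments:               # S5: one search per segment
        results = set(search(seg))     # S6: a file counts at most once per search
        hits.update(results)
    # S7: hit rate R = H / N compared with the judgment value
    return {f: h / n for f, h in hits.items() if h / n >= threshold}
```

Fn is then the length of the returned dictionary, and its keys form the list of similar stock resource files returned to the user.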

Take an existing educational resource content service center as an example: it is a system for uploading, managing, searching, viewing and downloading educational resources. Users can share original materials, and when a share succeeds, an online bonus is paid according to the quality of the shared document.

For example, a primary school Chinese teacher wants to share teaching courseware with the educational resource content service center, whose resource system includes the shared resource file anti-cheating system. After the teacher opens the system, enters the resource sharing function and selects the courseware file to share, the system extracts 3 samples from the file content, namely characters 30 to 40 (N1) <' content parallel refute and solve newly >, characters 100 to 110 (N2) < fishing fire chimes are alive and dyed >, and characters 200 to 210 (N3) < Jiangfeng fishing fire indicates Jiangfeng Danfeng Cheng et Shen >, and sends the samples to the server. The server searches the resource content library in parallel with the 3 samples (N1 to N3) and obtains 3 files for N1, 5 files for N2 and 4 files for N3. Counting how often each file recurs among these 12 results, the server finds that 1 file appears 3 times (hit rate 100%), 2 files appear 2 times (hit rate 2/3 ≈ 66.7%), and the remaining files appear once (hit rate 1/3 ≈ 33.3%). From these statistics, the number Fn of stock files similar to the file being shared is 3; the server returns the stock resource list to the teacher's client and informs the user that the resource file already exists and cannot be shared.
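The arithmetic of this example can be checked with a few lines of Python. The file names A, B, C and X1 to X5 are hypothetical; the source gives only the counts (3, 5 and 4 results, one file found three times and two files found twice), and the lists below are one assignment consistent with those counts.

```python
from collections import Counter

# Hypothetical result lists reproducing the counts in the example:
# N1 -> 3 files, N2 -> 5 files, N3 -> 4 files (12 results in total).
results = {
    "N1": ["A", "B", "X1"],
    "N2": ["A", "B", "C", "X2", "X3"],
    "N3": ["A", "C", "X4", "X5"],
}
N = len(results)                                     # 3 sampled segments
H = Counter(f for files in results.values() for f in files)
R = {f: h / N for f, h in H.items()}                 # hit rate per file
similar = sorted(f for f, r in R.items() if r >= 0.6)

print(sum(len(v) for v in results.values()))         # 12
print(R["A"], round(R["B"], 3), round(R["X1"], 3))   # 1.0 0.667 0.333
print(similar, len(similar))                         # ['A', 'B', 'C'] 3  (Fn = 3)
```

The five files found only once fall below the 60% judgment value, which is why Fn is 3 even though 8 distinct files were returned.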

Example two

In this embodiment, refinements and component choices are made on the basis of the first embodiment: in step S1 all files to be stored are converted into PDF files by a converter. Storing the shared file content as PDF and comparing content fragments on that basis makes online viewing of the files easier, and the characters can be quickly recognized and processed by character recognition software such as OCR (optical character recognition) tools.

The database in step S2 is a MySQL database. Compared with large databases such as Oracle, DB2 and SQL Server, MySQL is smaller in scale and more limited in functionality, but the invention needs only simple storage, and because MySQL is open source, a stable, free website system can be built without significant expense (apart from labor costs). The search result in step S6 is the file list corresponding to each content segment. Lucene in step S2 is an open-source search engine library through which full-text retrieval can be implemented in the target system, and Lucene analyzes and tokenizes the documents to build the index.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A shared resource file anti-cheating method, characterized in that said method comprises the steps of:

S1: converting each file to be stored into a PDF file, and uploading the converted PDF file to a resource stock library;

S2: using Lucene to obtain the path information of the resource stock library from a database, retrieving the stock resource files through that path information, loading them into document objects, tokenizing the stock resource files, and creating an index;

S3: randomly extracting content segments from a new shared resource file, wherein the number of sampled segments N ≥ 3: loading the shared resource file, obtaining its total character length T and the content segment step length S = 10, and constructing the random-number range C = T − S;

S4: if C ≤ 0, the entire content of the shared file is used as the sample segment; if C > 0, generating a random number K within the range C, acquiring the content segment from K to K + S, and repeating the sampling of step S3 until the number of content segments equals N;

S5: using the N sampled content segments as search keywords, performing N searches in the search engine, and temporarily storing the search results;

S6: analyzing the N sets of search results and calculating, for each file, its hit count H over the N searches, wherein H is incremented by 1 each time the file appears in a search result;

S7: obtaining a list of similar stock resource files and their number Fn by comparing each file's hit count H with the number of content segments N, wherein if the hit rate R = H/N satisfies R ≥ 60%, the file is a similar stock resource file.

2. The shared resource file anti-cheating method according to claim 1, wherein the file to be stored in step S1 is entirely converted into a PDF file by a converter.

3. The shared resource file anti-cheating method according to claim 1, wherein said database in step S2 is a MySQL database.

4. The method of claim 1, wherein the search result in step S6 is a file list corresponding to the content segment.

5. The method of claim 1, wherein Lucene in step S2 is an open-source search engine library, and full-text search can be realized in a target system through Lucene.

6. The shared resource file anti-cheating method according to claim 5, wherein said Lucene analyzes and tokenizes documents to build an index.