CN107885808B - Shared resource file anti-cheating method - Google Patents

Info

Publication number
CN107885808B
CN107885808B CN201711070780.7A CN201711070780A CN107885808B CN 107885808 B CN107885808 B CN 107885808B CN 201711070780 A CN201711070780 A CN 201711070780A CN 107885808 B CN107885808 B CN 107885808B
Authority
CN
China
Prior art keywords
file
resource
resource file
shared resource
stock
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711070780.7A
Other languages
Chinese (zh)
Other versions
CN107885808A (en)
Inventor
李禹江
何渔
吴豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Winshare Education Science & Technology Co ltd
Original Assignee
Sichuan Winshare Education Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Winshare Education Science & Technology Co ltd
Priority to CN201711070780.7A
Publication of CN107885808A
Application granted
Publication of CN107885808B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a shared resource file anti-cheating method comprising the following steps. S1: convert the file to be stored into a PDF format file and upload the converted PDF format file to a resource stock library. S2: Lucene obtains path information for the resource stock library through a database and retrieves the resource files in the library through that path information; Lucene then loads the files, constructs document objects, performs word segmentation on the stock resource files, and creates an index file. S3: randomly extract content segments from the new shared resource file, with at least 3 samples (N >= 3): load the shared resource file, obtain its total character length T and the content segment step length S, and construct the random number set C = T - S. The method shortens the time needed to judge whether a shared resource is a cheating submission and improves overall efficiency. It also keeps similar files out of the resource library, which saves storage space.

Description

Shared resource file anti-cheating method
Technical Field
The invention relates to a file anti-cheating method, in particular to a shared resource file anti-cheating method.
Background
With the rapid development of network technology, people can share their own resource files. Where sharing is paid, a small number of people have been found to download files shared by others, alter them slightly, share them again, and thereby obtain rewards illegitimately. If cheating with shared resource files cannot be effectively prevented, the following problems arise:
1. The cost of collecting shared resources increases.
2. Similar resource files waste storage space.
3. Similar resource files increase the selection cost for people searching for resource files.
Disclosure of Invention
The technical problems to be solved by the invention are the high cost of collecting shared resources, the storage space wasted by similar resource files, and excessive time consumption. The invention aims to provide a shared resource file anti-cheating method that reduces server consumption, quickly determines the similarity between a new shared resource file and the stock resource files, and prevents cheating with shared resource files.
The invention is realized by the following technical scheme:
A shared resource file anti-cheating method, the method comprising the following steps:
S1: converting the file to be stored into a PDF format file, and uploading the converted PDF format file to a resource stock library;
S2: obtaining, by Lucene, path information of the resource stock library through a database, obtaining the resource files in the resource stock library through the path information, loading the files and constructing document objects by Lucene, performing word segmentation on the stock resource files, and creating an index file;
S3: randomly extracting content segments of the new shared resource file, the number of sampled segments being N >= 3: loading the shared resource file, obtaining its total character length T and the content segment step length S, and constructing a random number set C = total character length T - step length S;
S4: if C <= 0, the entire content of the shared file is the sampled segment content; if C > 0, generating a random number K with the random number set C as the limit, obtaining the content segment from K to K + S, repeating step S3, and stopping sampling when the number of content segments equals N;
S5: using the N sampled content segments as search keywords, searching the search engine N times and temporarily storing the search results;
S6: analyzing the N search results and calculating, for each file, the number of hits H in the N searches, H being increased by 1 each time the file appears in a search result;
S7: obtaining the list of similar stock resource files and their number Fn by comparing the hit count H of each file with the number of content segments N: with the hit rate R = H/N, a file with R >= 60% is a similar stock resource file.
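The sampling in steps S3 and S4 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `sample_segments`, the use of Python's `random` module, and the choice to return the whole text as a single segment when C <= 0 are assumptions made for this sketch.

```python
import random


def sample_segments(text, n=3, step=10, rng=None):
    """Draw n random content segments of `step` characters (steps S3-S4)."""
    rng = rng or random.Random()
    total = len(text)      # T: total character length
    limit = total - step   # C = T - S: largest valid start offset
    if limit <= 0:
        # The file is no longer than one segment, so the whole
        # content serves as the sample (assumed: a single segment).
        return [text]
    segments = []
    while len(segments) < n:
        k = rng.randint(0, limit)           # K: random start offset in [0, C]
        segments.append(text[k:k + step])   # characters K .. K + S
    return segments
```

Bounding K by C = T - S guarantees that every segment has the full step length, so each sample is a complete search keyword rather than a truncated tail of the file.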
To prevent cheating with shared resource files, the prior art uses file content processing to calculate the similarity between a new shared resource file and the stock resource files with a vector space model. If the file similarity exceeds a decision threshold, the new shared resource file is judged to be a cheating file and is not allowed into the resource library. This approach consumes a great deal of server resources on judging file similarity, and the similarity identification process takes longer and longer as the number of stock resources grows.
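For contrast, the prior-art comparison can be pictured with a small vector space model sketch. The patent does not specify the weighting or tokenization used, so the raw term counts, whitespace tokenization, and the names `cosine_similarity` and `most_similar` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two texts using raw term-count vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def most_similar(new_text, stock):
    """Compare the new file against every stock file, one by one."""
    # One full-document comparison per stock file: the work grows
    # linearly with the size of the library.
    return max(
        ((cosine_similarity(new_text, text), name)
         for name, text in stock.items()),
        default=(0.0, None),
    )
```

Because every stock file is compared in full, the cost per upload rises with the library size, which is the drawback the paragraph above describes.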
Further, the file to be stored in step S1 is converted in its entirety into a PDF format file by a converter. Storing and sharing file content in PDF format, and comparing content segments in that format, makes it easier to view the files online, and during comparison the text can be quickly recognized and processed with character recognition software such as OCR (optical character recognition) software.
Further, the database in step S2 is a MySQL database. Compared with other large databases such as Oracle, DB2, and SQL Server, MySQL has the disadvantages of smaller scale and limited functionality, but the invention only needs simple storage, and MySQL is an open-source database, so a stable and free website system can be built with it without spending much money (apart from labor costs).
Further, the search result in step S6 is the file list corresponding to a content segment.
Further, Lucene in step S2 is an open-source search library, through which full-text retrieval can be implemented in the target system.
Further, Lucene analyzes the documents and performs word segmentation to build the index.
The key point of the invention is to randomly sample the content of the shared resource file to obtain content segments, search for lists of stock resource files with the search engine service, and use the relation among the shared resource file, its content segments, and the corresponding stock resource file lists to find the stock resources that match the shared resource file and judge whether the shared resource is a cheating submission. This shortens the time needed to make that judgment and improves overall efficiency. It also keeps similar files out of the resource library, which saves storage space.
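The relation between content segments, per-segment search results, and the final decision can be sketched as below. The `search` callable stands in for the search engine service; the substring-based `make_substring_search` helper and both function names are hypothetical and only make the sketch runnable without a real index.

```python
from collections import Counter


def find_similar_files(segments, search, threshold=0.6):
    """Return {file: hit rate R} for stock files with R = H / N >= threshold."""
    hits = Counter()
    for segment in segments:
        # set(): a file counts at most once per search (H += 1).
        hits.update(set(search(segment)))
    n = len(segments)
    return {f: h / n for f, h in hits.items() if h / n >= threshold}


def make_substring_search(stock):
    """Toy stand-in for the search engine: exact substring match."""
    def search(segment):
        return [name for name, text in stock.items() if segment in text]
    return search
```

A hypothetical call would be `find_similar_files(segments, make_substring_search(stock))`, where `stock` maps file names to extracted text. Only N short queries are issued per upload, however large the library is, which is where the method saves time compared with comparing whole documents.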
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The shared resource file anti-cheating method reduces server consumption, quickly determines the similarity between a new shared resource file and the stock resource files, and prevents cheating with shared resource files.
2. The shared resource file anti-cheating method keeps the overall server usage cost low and effectively saves storage space.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flow chart of the system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example one
As shown in FIG. 1, the shared resource file anti-cheating method of the present invention includes the following steps:
S1: converting the file to be stored into a PDF format file, and uploading the converted PDF format file to a resource stock library;
S2: obtaining, by Lucene, path information of the resource stock library through a database, obtaining the resource files in the resource stock library through the path information, loading the files and constructing document objects by Lucene, performing word segmentation on the stock resource files, and creating an index file;
S3: randomly extracting content segments of the new shared resource file, the number of sampled segments being N >= 3: loading the shared resource file, obtaining its total character length T and the content segment step length S, and constructing a random number set C = total character length T - step length S;
S4: if C <= 0, the entire content of the shared file is the sampled segment content; if C > 0, generating a random number K with the random number set C as the limit, obtaining the content segment from K to K + S, repeating step S3, and stopping sampling when the number of content segments equals N;
S5: using the N sampled content segments as search keywords, searching the search engine N times and temporarily storing the search results;
S6: analyzing the N search results and calculating, for each file, the number of hits H in the N searches, H being increased by 1 each time the file appears in a search result;
S7: obtaining the list of similar stock resource files and their number Fn by comparing the hit count H of each file with the number of content segments N: with the hit rate R = H/N, a file with R >= 60% is a similar stock resource file.
To prevent cheating with shared resource files, the prior art uses file content processing to calculate the similarity between a new shared resource file and the stock resource files with a vector space model. If the file similarity exceeds a decision threshold, the new shared resource file is judged to be a cheating file and is not allowed into the resource library. This approach consumes a great deal of server resources on judging file similarity, and the similarity identification process takes longer and longer as the number of stock resources grows.
Take an existing educational resource content service center as an example. The center is a system for uploading, managing, searching, viewing, and downloading educational resources. Users can share original material, and if the sharing succeeds, an online bonus is issued according to the quality of the shared document.
For example, a primary school Chinese teacher wants to share teaching courseware with an educational resource content service center whose resource system has the shared resource file anti-cheating system built in. The teacher opens the system, enters the resource sharing function, and selects the courseware file to share. The system then draws 3 samples from the content of the courseware file: the 30th to 40th characters (N1) <' content parallel refute and solve newly >, the 100th to 110th characters (N2) < fishing fire chimes are alive and dyed >, and the 200th to 210th characters (N3) < Jiangfeng fishing fire indicates Jiangfeng Danfeng Cheng et Shen >, and sends them to the server. The server searches the resource content library in parallel for the 3 sampled segments (N1 to N3). The search results are N1: 3 files, N2: 5 files, and N3: 4 files. The server counts how often each file recurs across the 12 returned results: 1 file appears 3 times, a hit rate of 100%; 2 files appear 2 times, a hit rate of 66.7%; each of the other files appears once, a hit rate of 33.3%. From these statistics, the number Fn of stock resource files similar to the file to be shared is 3. The server returns the stock resource list information to the teacher's client and notifies the user that the resource file already exists and cannot be shared.
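The arithmetic of this example can be checked with a few lines of Python. The file labels A to H are hypothetical; they are chosen only so that the three result lists contain 3, 5, and 4 files with the overlaps described above.

```python
from collections import Counter

# Hypothetical result lists matching the example: 3, 5 and 4 files.
results = {
    "N1": ["A", "B", "D"],
    "N2": ["A", "B", "C", "E", "F"],
    "N3": ["A", "C", "G", "H"],
}

n = len(results)                                   # N = 3 searches
hits = Counter(f for files in results.values() for f in files)  # H per file
rates = {f: h / n for f, h in hits.items()}        # R = H / N
similar = sorted(f for f, r in rates.items() if r >= 0.6)
fn = len(similar)                                  # Fn: similar stock files
```

File A reaches R = 3/3, files B and C reach R = 2/3, and the remaining five files stay at R = 1/3, so exactly three files pass the 60% threshold.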
Example two
This embodiment refines the first embodiment with specific component choices. The file to be stored in step S1 is converted in its entirety into a PDF format file by a converter. Storing and sharing file content in PDF format, and comparing content segments in that format, makes it easier to view the files online, and during comparison the text can be quickly recognized and processed with character recognition software such as OCR (optical character recognition) software.
The database in step S2 is a MySQL database. Compared with other large databases such as Oracle, DB2, and SQL Server, MySQL has the disadvantages of smaller scale and limited functionality, but the invention only needs simple storage, and MySQL is an open-source database, so a stable and free website system can be built with it without spending much money (apart from labor costs). The search result in step S6 is the file list corresponding to a content segment. Lucene in step S2 is an open-source search library, through which full-text retrieval can be implemented in the target system. Lucene analyzes the documents and performs word segmentation to build the index.
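To make the indexing idea concrete without depending on Lucene itself, the following Python sketch builds a toy inverted index. It is a stand-in, not Lucene's API: the names `build_index` and `search`, the whitespace tokenization, and the in-memory `documents` mapping (path to extracted text, as might be loaded via the path information in the database) are assumptions for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Map each term to the set of file paths that contain it."""
    index = defaultdict(set)
    for path, text in documents.items():
        for term in text.lower().split():   # naive word segmentation
            index[term].add(path)
    return index


def search(index, query):
    """Return the paths that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)
```

A real deployment would rely on an analyzer suited to Chinese text for word segmentation; whitespace splitting is used here only to keep the sketch short.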
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. A shared resource file anti-cheating method, characterized in that said method comprises the steps of:
S1: converting the file to be stored into a PDF format file, and uploading the converted PDF format file to a resource stock library;
S2: obtaining, by Lucene, path information of the resource stock library through a database, obtaining the resource files in the resource stock library through the path information, loading the files and constructing document objects by Lucene, performing word segmentation on the stock resource files, and creating an index file;
S3: randomly extracting content segments of a new shared resource file, the number of sampled segments being N >= 3: loading the shared resource file, obtaining its total character length T and the content segment step length S = 10, and constructing a random number set C = total character length T - step length S;
S4: if C <= 0, the entire content of the shared file is the sampled segment content; if C > 0, generating a random number K with the random number set C as the limit, obtaining the content segment from K to K + S, repeating step S3, and stopping sampling when the number of content segments equals N;
S5: using the N sampled content segments as search keywords, searching the search engine N times and temporarily storing the search results;
S6: analyzing the N search results and calculating, for each file, the number of hits H in the N searches, H being increased by 1 each time the file appears in a search result;
S7: obtaining the list of similar stock resource files and their number Fn by comparing the hit count H of each file with the number of content segments N, wherein the hit rate R = H/N, and a file with R >= 60% is a similar stock resource file.
2. The shared resource file anti-cheating method according to claim 1, wherein the file to be stored in step S1 is converted in its entirety into a PDF format file by a converter.
3. The shared resource file anti-cheating method according to claim 1, wherein the database in step S2 is a MySQL database.
4. The shared resource file anti-cheating method according to claim 1, wherein the search result in step S6 is the file list corresponding to a content segment.
5. The shared resource file anti-cheating method according to claim 1, wherein Lucene in step S2 is an open-source search library, through which full-text retrieval can be implemented in a target system.
6. The shared resource file anti-cheating method according to claim 5, wherein Lucene analyzes the documents and performs word segmentation to build the index.
CN201711070780.7A 2017-11-03 2017-11-03 Shared resource file anti-cheating method Active CN107885808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711070780.7A CN107885808B (en) 2017-11-03 2017-11-03 Shared resource file anti-cheating method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711070780.7A CN107885808B (en) 2017-11-03 2017-11-03 Shared resource file anti-cheating method

Publications (2)

Publication Number Publication Date
CN107885808A CN107885808A (en) 2018-04-06
CN107885808B CN107885808B (en) 2021-03-30

Family

ID=61778734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711070780.7A Active CN107885808B (en) 2017-11-03 2017-11-03 Shared resource file anti-cheating method

Country Status (1)

Country Link
CN (1) CN107885808B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109032954B (en) * 2018-08-16 2022-04-05 五八有限公司 User selection method and device for A/B test, storage medium and terminal

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095258A (en) * 2014-05-08 2015-11-25 腾讯科技(北京)有限公司 Media information sorting method and apparatus and media information recommendation system
CN106909609A (en) * 2017-01-09 2017-06-30 北方工业大学 Method for determining similar character strings, method and system for searching duplicate files

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10303716B2 (en) * 2014-01-31 2019-05-28 Nbcuniversal Media, Llc Fingerprint-defined segment-based content delivery

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
高校实验教学中基于文件的防作弊技术;胡艳;《科教文汇》;20141031;全文 *

Also Published As

Publication number Publication date
CN107885808A (en) 2018-04-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant