CN107885808B - Shared resource file anti-cheating method - Google Patents
- Publication number
- CN107885808B
- Authority
- CN
- China
- Prior art keywords
- file
- resource
- resource file
- shared resource
- stock
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/134—Distributed indices
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a shared resource file anti-cheating method comprising the following steps. S1: convert the file to be stored into a PDF file and upload the converted PDF file to the stock resource library. S2: Lucene obtains the path information of the stock resource library from a database, retrieves the stock resource files via that path information, loads them and constructs document objects, segments the stock resource files into words, and creates an index file. S3: randomly extract content segments from the new shared resource file, sampling N times (N >= 3); load the shared resource file, obtain its total character length T and the content segment step length S, and construct the random number set C = T - S. The method shortens the time needed to judge whether a shared resource is a cheating file and improves overall efficiency. It also keeps similar files out of the resource library, saving storage space.
Description
Technical Field
The invention relates to a file anti-cheating method, in particular to a shared resource file anti-cheating method.
Background
With the rapid development of network technology, people can share their own resource files online. Where sharing is paid, a small number of people have been found to download files shared by others, alter them slightly, share them again, and thereby obtain rewards illegitimately. If such cheating with shared resource files cannot be effectively prevented, the following problems arise:
1. The cost of collecting shared resources increases.
2. Similar resource files waste storage space.
3. Similar resource files increase the selection cost for people searching for resource files.
Disclosure of Invention
The invention addresses the technical problems of high collection cost for shared resources, storage space wasted on similar resource files, and excessive time consumption. Its aim is to provide a shared resource file anti-cheating method that reduces server consumption, quickly obtains the similarity between a new shared resource file and the stock resource files, and prevents cheating with shared resource files.
The invention is realized by the following technical solution:
A shared resource file anti-cheating method, the method comprising the following steps:
S1: converting the file to be stored into a PDF file, and uploading the converted PDF file to the stock resource library;
S2: Lucene obtaining the path information of the stock resource library from a database, retrieving the stock resource files via the path information, loading them and constructing document objects, segmenting the stock resource files into words, and creating an index file;
S3: randomly extracting content segments from the new shared resource file, sampling N times (N >= 3); loading the shared resource file, obtaining its total character length T and the content segment step length S, and constructing the random number set C = T - S;
S4: if C <= 0, the entire content of the shared file is taken as the sampled segment; if C > 0, generating a random number K with the random number set C as its limit, taking the content segment from K to K + S, repeating step S3, and stopping sampling when the number of content segments equals N;
S5: using the N sampled content segments as search keywords, searching the search engine N times and temporarily storing the search results;
S6: analyzing the N search results and calculating, for each file, the number of hits H over the N searches, H being incremented by 1 each time the file appears in a search result;
S7: obtaining the list of similar stock resource files and their number Fn by comparing each file's hit count H with the number of content segments N: with hit rate R = H/N, a file whose hit rate satisfies R >= 60% is a similar stock resource file.
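The sampling in steps S3 and S4 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `sample_fragments`, the optional `rng` argument, and the choice of a uniformly distributed start offset are assumptions, and the default step of 10 characters is taken from claim 1.

```python
import random

def sample_fragments(text, n=3, step=10, rng=None):
    """Draw n content segments of length `step` at random offsets (steps S3-S4)."""
    if rng is None:
        rng = random.Random()
    t = len(text)        # total character length T
    c = t - step         # limit C = T - S for the start offset
    if c <= 0:
        # The file is no longer than one segment: its whole content is the sample.
        return [text]
    fragments = []
    while len(fragments) < n:
        k = rng.randint(0, c)              # random start offset K in [0, C] (assumed uniform)
        fragments.append(text[k:k + step])  # content segment from K to K + S
    return fragments
```

Passing a seeded `random.Random` makes the sampling reproducible for testing; in production an unseeded generator keeps the sampled positions unpredictable to the uploader.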
To prevent cheating with shared resource files, the prior art uses file content processing to calculate the similarity between a new shared resource file and the stock resource files with a vector space model. If the similarity exceeds a decision threshold, the new shared resource file is judged to be a cheating file and is not admitted to the resource library. This approach consumes a large amount of server resources for judging file similarity, and the similarity check takes longer and longer as the number of stock resources grows.
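For contrast, a vector space comparison of the kind attributed to the prior art might look like the sketch below. The patent gives no details of that technique, so everything here is an assumption for illustration: term-frequency vectors over whitespace tokens, cosine similarity as the measure, and a hypothetical threshold of 0.8.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between term-frequency vectors of two texts."""
    va, vb = Counter(doc_a.split()), Counter(doc_b.split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_duplicate(new_doc, stock_docs, threshold=0.8):
    # Every stock file is compared in full, so the work grows with library size.
    return any(cosine_similarity(new_doc, d) >= threshold for d in stock_docs)
```

The loop in `is_duplicate` shows why the check slows down as the stock grows: each upload is compared against every stored document.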
Further, the file to be stored in step S1 is converted in its entirety into a PDF file by a converter. Storing and sharing file content as PDF, and comparing content segments in that format, makes online viewing of the files easier; during comparison, the text can be quickly recognized and processed by character recognition software such as OCR (optical character recognition) software.
Further, the database in step S2 is a MySQL database. Compared with large databases such as Oracle, DB2 and SQL Server, MySQL is smaller in scale and more limited in function, but the invention needs only simple storage, and MySQL is open source, so a stable website system can be built with it at no licensing cost (labor cost aside).
Further, the search result in step S6 is the list of files corresponding to each content segment.
Further, Lucene in step S2 is an open-source search engine library with which full-text retrieval can be implemented in the target system.
Further, Lucene analyzes the documents and segments them into words to build the index.
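The idea behind the index built in step S2 can be illustrated with a toy inverted index. This is a stand-in written in plain Python, not Lucene's API: the names `build_index` and `search` are hypothetical, and whitespace tokenization is an assumption (Chinese text would need a proper word segmenter).

```python
from collections import defaultdict

def build_index(stock_files):
    """Map each token to the set of stock-file paths that contain it."""
    index = defaultdict(set)
    for path, text in stock_files.items():
        for token in text.lower().split():
            index[token].add(path)
    return index

def search(index, query):
    """Return the paths of files that contain every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

Because a lookup touches only the posting sets of the query tokens, search cost depends on the query rather than on a full scan of every stock file.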
The key point of the invention is to randomly sample the content of the shared resource file to obtain content segments, search for lists of stock resource files with the search engine service, use the relation among the shared resource file, its content segments and the corresponding stock resource file lists to find the stock resources that correspond to the shared resource file, and thereby judge whether the shared resource is a cheating file. This shortens the time needed for that judgment and improves overall efficiency. It also keeps similar files out of the resource library, saving storage space.
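The quantities used in steps S3 to S7 can be summarized compactly. The symbols T, S, C, K, N, H and R are those of the method; writing K as uniformly distributed and indexing the segments by i are assumptions made here for clarity.

```latex
\begin{aligned}
C &= T - S, \\
K_i &\sim \mathrm{Uniform}\{0, 1, \dots, C\} \quad (C > 0), \qquad i = 1, \dots, N, \\
\text{segment}_i &= \text{content}[K_i,\; K_i + S), \\
R(f) &= \frac{H(f)}{N}, \qquad f \text{ is a similar stock file if } R(f) \ge 60\% .
\end{aligned}
```

With the minimum of N = 3 samples, a stock file therefore needs at least 2 hits to be flagged, since 2/3 is above the 60% threshold and 1/3 is not.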
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The shared resource file anti-cheating method reduces server consumption, quickly obtains the similarity between a new shared resource file and the stock resource files, and prevents cheating with shared resource files;
2. The shared resource file anti-cheating method keeps the overall server cost low and effectively saves storage space.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flow chart of the system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example one
As shown in FIG. 1, the shared resource file anti-cheating method of the present invention includes the following steps:
S1: converting the file to be stored into a PDF file, and uploading the converted PDF file to the stock resource library;
S2: Lucene obtaining the path information of the stock resource library from a database, retrieving the stock resource files via the path information, loading them and constructing document objects, segmenting the stock resource files into words, and creating an index file;
S3: randomly extracting content segments from the new shared resource file, sampling N times (N >= 3); loading the shared resource file, obtaining its total character length T and the content segment step length S, and constructing the random number set C = T - S;
S4: if C <= 0, the entire content of the shared file is taken as the sampled segment; if C > 0, generating a random number K with the random number set C as its limit, taking the content segment from K to K + S, repeating step S3, and stopping sampling when the number of content segments equals N;
S5: using the N sampled content segments as search keywords, searching the search engine N times and temporarily storing the search results;
S6: analyzing the N search results and calculating, for each file, the number of hits H over the N searches, H being incremented by 1 each time the file appears in a search result;
S7: obtaining the list of similar stock resource files and their number Fn by comparing each file's hit count H with the number of content segments N: with hit rate R = H/N, a file whose hit rate satisfies R >= 60% is a similar stock resource file.
Take an existing educational resource content service center as an example: it is a system for uploading, managing, searching, viewing and downloading educational resources. Users can share their original materials, and when a share succeeds, an online reward is issued according to the quality of the shared document.
For example, a primary school Chinese teacher wants to share teaching courseware with the educational resource content service center, whose resource system has the shared resource file anti-cheating system in place. After the teacher opens the system, enters the resource sharing function and selects the courseware file to be shared, the system draws 3 samples from the content of the courseware file: characters 30 to 40 (N1) <content parallel refute and solve newly>, characters 100 to 110 (N2) <fishing fire chimes are alive and dyed>, and characters 200 to 210 (N3) <Jiangfeng fishing fire indicates Jiangfeng Danfeng Cheng et Shen>, and sends them to the server. The server searches the resource content library in parallel with the 3 sampled segments (N1 to N3), and the search results are N1: 3 files, N2: 5 files and N3: 4 files. The server counts how often each file recurs among the 12 search result entries: 1 file appears 3 times, a hit rate of 100%; 2 files appear 2 times, a hit rate of 66.7%; the remaining files appear once, a hit rate of 33.3%. From these statistics, the number Fn of stock resource files similar to the file to be shared is 3. The server returns the list of those stock resources to the teacher's client and informs the user that the resource file already exists and cannot be shared.
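The hit counting of steps S6 and S7 can be replayed on the numbers of this example. The sketch below is illustrative only: the function name `similar_stock_files` and the file names `f1` to `f8` are hypothetical, chosen so that the three searches return 3, 5 and 4 files with one file hit three times and two files hit twice.

```python
from collections import Counter

def similar_stock_files(results_per_segment, threshold=0.6):
    """Count hits H per file over N searches; keep files with R = H/N >= threshold."""
    n = len(results_per_segment)
    hits = Counter()
    for result in results_per_segment:
        hits.update(set(result))      # a file counts at most once per search
    rates = {f: h / n for f, h in hits.items()}
    similar = sorted(f for f, r in rates.items() if r >= threshold)
    return rates, similar

# Hypothetical file names arranged to match the counts in the example.
results = [
    ["f1", "f2", "f4"],                # N1: 3 files
    ["f1", "f2", "f3", "f5", "f6"],    # N2: 5 files
    ["f1", "f3", "f7", "f8"],          # N3: 4 files
]
rates, similar = similar_stock_files(results)
```

Here `similar` holds the three files whose hit rate reaches 60%, so Fn = 3, matching the count reported in the example.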
Example two
In this embodiment, optimization and component selection are carried out on the basis of the first embodiment. The file to be stored in step S1 is converted in its entirety into a PDF file by a converter. Storing and sharing file content as PDF, and comparing content segments in that format, makes online viewing of the files easier; during comparison, the text can be quickly recognized and processed by character recognition software such as OCR (optical character recognition) software.
The database in step S2 is a MySQL database. Compared with large databases such as Oracle, DB2 and SQL Server, MySQL is smaller in scale and more limited in function, but the invention needs only simple storage, and MySQL is open source, so a stable website system can be built with it at no licensing cost (labor cost aside). The search result in step S6 is the list of files corresponding to each content segment. Lucene in step S2 is an open-source search engine library with which full-text retrieval can be implemented in the target system. Lucene analyzes the documents and segments them into words to build the index.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (6)
1. A shared resource file anti-cheating method, characterized in that said method comprises the steps of:
S1: converting the file to be stored into a PDF file, and uploading the converted PDF file to the stock resource library;
S2: Lucene obtaining the path information of the stock resource library from a database, retrieving the stock resource files via the path information, loading them and constructing document objects, segmenting the stock resource files into words, and creating an index file;
S3: randomly extracting content segments from a new shared resource file, wherein the number of segments sampled is N >= 3; loading the shared resource file, obtaining its total character length T and the content segment step length S = 10, and constructing a random number set C = T - S;
S4: if C <= 0, the entire content of the shared file is the sampled segment content; if C > 0, generating a random number K with the random number set C as its limit, taking the content segment from K to K + S, repeating step S3, and stopping sampling when the number of content segments equals N;
S5: using the N sampled content segments as search keywords, searching the search engine N times and temporarily storing the search results;
S6: analyzing the N search results and calculating, for each file, the number of hits H over the N searches, H being incremented by 1 each time the file appears in a search result;
S7: obtaining the list of similar stock resource files and their number Fn by comparing each file's hit count H with the number of content segments N, wherein the hit rate R = H/N and a file with hit rate R >= 60% is a similar stock resource file.
2. The shared resource file anti-cheating method according to claim 1, wherein the file to be stored in step S1 is converted in its entirety into a PDF file by a converter.
3. The shared resource file anti-cheating method according to claim 1, wherein the database in step S2 is a MySQL database.
4. The shared resource file anti-cheating method according to claim 1, wherein the search result in step S6 is the list of files corresponding to each content segment.
5. The shared resource file anti-cheating method according to claim 1, wherein Lucene in step S2 is an open-source search engine library, and full-text retrieval can be implemented in a target system through Lucene.
6. The shared resource file anti-cheating method according to claim 5, wherein Lucene analyzes the documents and segments them into words to build the index.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711070780.7A CN107885808B (en) | 2017-11-03 | 2017-11-03 | Shared resource file anti-cheating method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711070780.7A CN107885808B (en) | 2017-11-03 | 2017-11-03 | Shared resource file anti-cheating method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107885808A (en) | 2018-04-06 |
CN107885808B (en) | 2021-03-30 |
Family
ID=61778734
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711070780.7A Active CN107885808B (en) | 2017-11-03 | 2017-11-03 | Shared resource file anti-cheating method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107885808B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109032954B (en) * | 2018-08-16 | 2022-04-05 | 五八有限公司 | User selection method and device for A/B test, storage medium and terminal |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105095258A (en) * | 2014-05-08 | 2015-11-25 | 腾讯科技(北京)有限公司 | Media information sorting method and apparatus and media information recommendation system |
CN106909609A (en) * | 2017-01-09 | 2017-06-30 | 北方工业大学 | Method for determining similar character strings, method and system for searching duplicate files |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10303716B2 (en) * | 2014-01-31 | 2019-05-28 | Nbcuniversal Media, Llc | Fingerprint-defined segment-based content delivery |
Non-Patent Citations (1)
Title |
---|
File-based anti-cheating techniques in university laboratory teaching; Hu Yan (胡艳); 《科教文汇》; 2014-10-31; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN107885808A (en) | 2018-04-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106649818B (en) | Application search intention identification method and device, application search method and server | |
CN111105209B (en) | Job resume matching method and device suitable for person post matching recommendation system | |
US9798831B2 (en) | Processing data in a MapReduce framework | |
US20190058609A1 (en) | Method and apparatus for pushing information based on artificial intelligence | |
US20170235726A1 (en) | Information identification and extraction | |
WO2023108980A1 (en) | Information push method and device based on text adversarial sample | |
CN103793434A (en) | Content-based image search method and device | |
CN112883734B (en) | Block chain security event public opinion monitoring method and system | |
CN107480200A (en) | Word mask method, device, server and the storage medium of word-based label | |
WO2022068543A1 (en) | Multimedia content publishing method and apparatus, and electronic device and storage medium | |
Wu et al. | Extracting topics based on Word2Vec and improved Jaccard similarity coefficient | |
CN112087667A (en) | Information processing method and device and computer storage medium | |
Huang et al. | A Low‐Cost Named Entity Recognition Research Based on Active Learning | |
McKenzie et al. | Of Oxen and Birds: Is Yik Yak a useful new data source in the geosocial zoo or just another Twitter? | |
Zhao et al. | Text sentiment analysis algorithm optimization and platform development in social network | |
WO2015084757A1 (en) | Systems and methods for processing data stored in a database | |
CN108897819B (en) | Data searching method and device | |
CN107885808B (en) | Shared resource file anti-cheating method | |
Mazloom et al. | Few-example video event retrieval using tag propagation | |
CN110580301A (en) | efficient trademark retrieval method, system and platform | |
US9547701B2 (en) | Method of discovering and exploring feature knowledge | |
Chen et al. | Research on clustering analysis of Internet public opinion | |
Kordumova et al. | Exploring the long tail of social media tags | |
Zhang et al. | A system for extracting top-k lists from the web | |
Brambilla et al. | On the quest for changing knowledge |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||