CN107885808A - Shared resource file anti-cheating method - Google Patents
- Publication number
- CN107885808A
- Authority
- CN
- China
- Prior art keywords
- file
- shared resource
- resource file
- lucene
- resources
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/134—Distributed indices
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a shared resource file anti-cheating method comprising the following steps. S1: convert files to be stored into PDF format and upload the converted PDF files to a resource repository. S2: Lucene obtains the path information of the resource repository from a database, uses that path information to fetch the resource files in the repository, loads them and builds document objects, segments the stored resource files, and creates index files. S3: randomly sample content fragments of a new shared resource file, taking N >= 3 samples: load the shared resource file, obtain its total character length T, set the content-fragment step length S = 10, and construct the random-number range C = T - S. The method shortens the time needed to judge whether a shared resource is a cheating copy and improves overall efficiency. At the same time it keeps similar documents out of the resource repository, saving storage space.
Description
Technical field
The present invention relates to a file anti-cheating method, and in particular to an anti-cheating method for shared resource files.
Background art
With the rapid development of network technology, everyone can share their own resource files. Under paid sharing, it has been found that a small number of people download files that others have already shared, make minor changes, and share them again to obtain rewards illegitimately. If shared resource files cannot be effectively protected against such cheating, the following problems arise:
1. The cost of compiling shared resources increases.
2. Similar resource files waste storage space.
3. Similar resource files increase the cost for people obtaining resources of choosing among them.
Summary of the invention
The technical problems to be solved by the invention are the high compilation cost of shared resources, the waste of storage space caused by similar resource files, and long processing times. The purpose of the invention is to propose a method that both reduces server load and quickly obtains the similarity between a new shared resource file and the stored resource files, thereby preventing cheating with shared resource files.
The present invention is achieved through the following technical solution:
A shared resource file anti-cheating method comprises the following steps:
S1: Convert files to be stored into PDF format and upload the converted PDF files to the resource repository.
S2: Lucene obtains the path information of the resource repository from a database and uses it to fetch the resource files in the repository; Lucene loads them, builds document objects, segments the stored resource files, and creates index files.
S3: Randomly sample content fragments of the new shared resource file, taking N >= 3 samples: load the shared resource file, obtain its total character length T, set the content-fragment step length S = 10, and construct the random-number range C = T - S.
S4: If C <= 0, use the entire content of the shared file as the sampled fragment. If C > 0, generate a random number K bounded by C, take the content fragment from character K to character K + S, and repeat step S3; stop sampling when the number of content fragments equals N.
S5: Use the N sampled content fragments as search keywords and run N searches in the search engine, keeping the results as temporary retrieval results.
S6: Analyze the N retrieval results and compute each file's hit count H over the N searches: each time a file appears in a search result, its hit count H increases by 1.
S7: Obtain the list and the number Fn of similar stored resource files by comparing each file's hit count H with the number of content fragments N: if the hit rate R = H/N satisfies R >= 60%, the file is a similar stored resource file.
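Steps S3 and S4 can be illustrated with a minimal Python sketch. It assumes the shared file's text has already been extracted into a string; the function name `sample_fragments`, the use of Python's `random` module, and returning the whole text as a single fragment when C <= 0 are illustrative assumptions rather than details fixed by the method.

```python
import random
from typing import List, Optional


def sample_fragments(text: str, n: int = 3, step: int = 10,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Hypothetical sketch of steps S3-S4: draw N random fragments of S characters."""
    if n < 3:
        raise ValueError("the method samples N >= 3 fragments")
    if rng is None:
        rng = random.Random()
    total = len(text)          # T: total character length of the shared file
    c = total - step           # C = T - S: upper bound for the random start K
    if c <= 0:
        # The file is no longer than one fragment: use its entire content.
        return [text]
    fragments: List[str] = []
    while len(fragments) < n:  # stop once the fragment count equals N
        k = rng.randint(0, c)                # random start K in [0, C]
        fragments.append(text[k:k + step])   # characters K .. K+S (end exclusive)
    return fragments
```

Passing a seeded generator such as `random.Random(42)` makes the sample reproducible for testing. Fragments may overlap, since the method does not require distinct start positions.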
Prior art now utilizes " vector to prevent shared resource file cheating using file content treatment technology
Spatial model " calculates the similarity of new shared resource file and storage resources file.If file similarity has exceeded decision content,
Then judge that new shared resource file does not allow access into resources bank for file of practising fraud, file.Judgement of the technology to file similarity
A large amount of server resources can be expended.And increase with storage resources quantity, the identification process of file similarity can increasingly be grown.
Further, the files to be stored in step S1 are converted to PDF format by a converter. PDF is used for storing and sharing file content and for comparing content fragments because PDF files can be viewed online conveniently and, by comparison, their text can be recognized and processed quickly with character-recognition software such as OCR.
Further, the database in step S2 is a MySQL database. Compared with other large databases such as Oracle, DB2, and SQL Server, MySQL has its shortcomings, such as small scale and limited functionality. However, the invention needs only simple storage, and MySQL is an open-source database, so a stable, free website system can be built this way at no cost apart from labor.
Further, the retrieval result in step S6 is the list of files corresponding to a content fragment.
Further, Lucene in step S2 is an open-source search engine, through which full-text search can be implemented in the target system.
Further, Lucene analyzes the documents and segments them to build the index.
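The role Lucene plays here (segmenting stored documents, building an index, and returning the list of files that match a fragment) can be imitated with a toy in-memory inverted index. This is a hypothetical stand-in, not the Lucene API: the class name `FragmentIndex`, the overlapping character-bigram segmentation (similar in spirit to the bigrams produced by Lucene's CJK analysis), and the final exact-substring check are all assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


class FragmentIndex:
    """Hypothetical in-memory stand-in for the Lucene index of step S2 (not the Lucene API)."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        self.postings: DefaultDict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _bigrams(text: str) -> Set[str]:
        # Overlapping character bigrams used as index terms.
        return {text[i:i + 2] for i in range(len(text) - 1)}

    def add(self, file_id: str, text: str) -> None:
        """Segment a stored resource file and add its bigrams to the inverted index."""
        self.docs[file_id] = text
        for gram in self._bigrams(text):
            self.postings[gram].add(file_id)

    def search(self, fragment: str) -> List[str]:
        """Return the sorted list of stored files whose text contains `fragment`."""
        grams = self._bigrams(fragment)
        if grams:
            # Candidates must contain every bigram of the fragment.
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        else:
            candidates = set(self.docs)  # fragment shorter than 2 characters: scan all
        # Confirm the exact phrase, similar in spirit to a phrase query.
        return sorted(f for f in candidates if fragment in self.docs[f])
```

Indexing the stored files once and then calling `search` once per sampled fragment yields one file list per fragment, which is the kind of per-fragment retrieval result that steps S5 and S6 work with.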
The key point of the invention is to randomly sample the content of a shared resource file to obtain content fragments, use a search-engine service to search the list of stored resource files, and use the relationships among the shared resource file, its content fragments, and the corresponding lists of stored resource files to find the stored resources that correspond to the shared resource file, thereby judging whether the shared resource is a cheating copy. This shortens the time needed for that judgment and improves overall efficiency, while keeping similar documents out of the resource repository and saving storage space.
Compared with the prior art, the present invention has the following advantages and benefits:
1. The shared resource file anti-cheating method of the present invention both reduces server load and quickly obtains the similarity between a new shared resource file and the stored resource files, preventing cheating with shared resource files;
2. The shared resource file anti-cheating method of the present invention keeps the overall server cost low and saves storage space effectively.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the embodiments of the present invention and form part of this application; they do not limit the embodiments of the invention. In the drawings:
Fig. 1 is a flow chart of the system of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and the drawings. The exemplary embodiments of the invention and their descriptions serve only to explain the invention and do not limit it.
Embodiment one
As shown in Fig. 1, the shared resource file anti-cheating method of the present invention comprises the following steps:
S1: Convert files to be stored into PDF format and upload the converted PDF files to the resource repository.
S2: Lucene obtains the path information of the resource repository from a database and uses it to fetch the resource files in the repository; Lucene loads them, builds document objects, segments the stored resource files, and creates index files.
S3: Randomly sample content fragments of the new shared resource file, taking N >= 3 samples: load the shared resource file, obtain its total character length T, set the content-fragment step length S = 10, and construct the random-number range C = T - S.
S4: If C <= 0, use the entire content of the shared file as the sampled fragment. If C > 0, generate a random number K bounded by C, take the content fragment from character K to character K + S, and repeat step S3; stop sampling when the number of content fragments equals N.
S5: Use the N sampled content fragments as search keywords and run N searches in the search engine, keeping the results as temporary retrieval results.
S6: Analyze the N retrieval results and compute each file's hit count H over the N searches: each time a file appears in a search result, its hit count H increases by 1.
S7: Obtain the list and the number Fn of similar stored resource files by comparing each file's hit count H with the number of content fragments N: if the hit rate R = H/N satisfies R >= 60%, the file is a similar stored resource file.
Prior art now utilizes " vector to prevent shared resource file cheating using file content treatment technology
Spatial model " calculates the similarity of new shared resource file and storage resources file.If file similarity has exceeded decision content,
Then judge that new shared resource file does not allow access into resources bank for file of practising fraud, file.Judgement of the technology to file similarity
A large amount of server resources can be expended.And increase with storage resources quantity, the identification process of file similarity can increasingly be grown.
Take an existing educational resource content service center as an example. The center is a system for uploading, managing, searching, viewing, and downloading educational resources. Users can share original materials there; if a share succeeds, an online reward is paid according to the quality of the shared document.
For example, a primary-school Chinese teacher wants to share a teaching courseware file on the educational resource content service center, whose resource system has the shared resource file anti-cheating system built in. After the teacher opens the system, enters the sharing function, and selects the courseware file to share, the system takes 3 samples from the content of the courseware file: characters 30 to 40 (N1) <"content is known from experience parallel refutes new explanation">, characters 100 to 110 (N2) <"lights on fishing boats stroke is just vibrant just to be contaminated">, and characters 200 to 210 counted from the end (N3) <"river maple lights on fishing boats refer to red maple along the river and worried">, and sends them to the server. The server searches the resource content library in parallel with the 3 sampled fragments (N1 to N3). The retrieval results are N1: 3 files, N2: 5 files, N3: 4 files. The server counts how often each file repeats among these 12 results: 1 file appears 3 times (hit rate 100%); 2 files appear 2 times (hit rate 66.6%); every other file appears once (hit rate 33%). Since the first three files reach the 60% threshold, the number of similar stored resource files for the file to be shared is Fn = 3. The server returns the list of stored resources to the teacher's side and informs the user that the resource file already exists and cannot be shared.
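The hit counting of steps S6 and S7 can be checked against this example with a short Python sketch. The file IDs A to H are hypothetical placeholders chosen only so that the per-fragment counts (3, 5, 4) and the repetition statistics (one file three times, two files twice, five files once) match the example; counting each file at most once per search is an assumption.

```python
from collections import Counter
from typing import Dict, List


def similar_files(results: List[List[str]], threshold: float = 0.6) -> Dict[str, float]:
    """Steps S6-S7: hit count H per file over N result lists; keep R = H/N >= threshold."""
    n = len(results)                  # N: one result list per sampled fragment
    hits: Counter = Counter()
    for file_list in results:
        hits.update(set(file_list))   # assumption: at most one hit per file per search
    return {f: h / n for f, h in hits.items() if h / n >= threshold}


# Hypothetical file IDs reproducing the counts of the worked example:
# N1 -> 3 files, N2 -> 5 files, N3 -> 4 files (12 results in total).
results = [
    ["A", "B", "C"],
    ["A", "B", "D", "E", "F"],
    ["A", "C", "G", "H"],
]
similar = similar_files(results)
print(len(similar), sorted(similar))  # Fn and the IDs of the similar stored files
```

File A reaches R = 3/3, files B and C reach R = 2/3 (about 66.7%, above 60%), and D to H stay at 1/3, so Fn = 3 as in the example.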
Embodiment two
This embodiment refines the choice of components on the basis of Embodiment one. The files to be stored in step S1 are converted to PDF format by a converter. PDF is used for storing and sharing file content and for comparing content fragments because PDF files can be viewed online conveniently and, by comparison, their text can be recognized and processed quickly with character-recognition software such as OCR.
The database in step S2 is a MySQL database. Compared with other large databases such as Oracle, DB2, and SQL Server, MySQL has its shortcomings, such as small scale and limited functionality, but the invention needs only simple storage, and MySQL is an open-source database, so a stable, free website system can be built this way at no cost apart from labor. The retrieval result in step S6 is the list of files corresponding to a content fragment. Lucene in step S2 is an open-source search engine, through which full-text search can be implemented in the target system. Lucene analyzes the documents and segments them to build the index.
The specific embodiments described above further explain the purpose, technical solution, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the invention and are not intended to limit its scope of protection; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (6)
1. A shared resource file anti-cheating method, characterized in that the method comprises the following steps:
S1: converting files to be stored into PDF format, and uploading the converted PDF files to a resource repository;
S2: Lucene obtaining the path information of the resource repository from a database, obtaining the resource files in the resource repository by using the path information, loading them and building document objects, segmenting the stored resource files, and creating index files;
S3: randomly sampling content fragments of a new shared resource file, with N >= 3 samples: loading the shared resource file, obtaining its total character length T, setting the content-fragment step length S = 10, and constructing the random-number range C = T - S;
S4: if C <= 0, taking the entire content of the shared file as the sampled fragment; if C > 0, generating a random number K bounded by C, obtaining the content fragment between K and K + S, and repeating step S3, sampling being stopped when the number of content fragments equals N;
S5: using the N sampled content fragments as search keywords to perform N searches in a search engine and obtain temporary retrieval results;
S6: analyzing the N retrieval results and calculating the hit count H of each file over the N searches, the hit count H increasing by 1 each time the file appears in a search result;
S7: obtaining the list and the number Fn of similar stored resource files by comparing the file hit count H with the number of content fragments N, wherein if the hit rate R = H/N satisfies R >= 60%, the file is a similar stored resource file.
2. The shared resource file anti-cheating method according to claim 1, characterized in that the files to be stored in step S1 are converted to PDF format by a converter.
3. The shared resource file anti-cheating method according to claim 1, characterized in that the database in step S2 is a MySQL database.
4. The shared resource file anti-cheating method according to claim 1, characterized in that the retrieval result in step S6 is the list of files corresponding to a content fragment.
5. The shared resource file anti-cheating method according to claim 1, characterized in that Lucene in step S2 is an open-source search engine, through which full-text search can be implemented in the target system.
6. The shared resource file anti-cheating method according to claim 5, characterized in that Lucene analyzes the documents and segments them to build an index.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711070780.7A CN107885808B (en) | 2017-11-03 | 2017-11-03 | Shared resource file anti-cheating method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711070780.7A CN107885808B (en) | 2017-11-03 | 2017-11-03 | Shared resource file anti-cheating method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107885808A (en) | 2018-04-06 |
CN107885808B (en) | 2021-03-30 |
Family
ID=61778734
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711070780.7A Active CN107885808B (en) | 2017-11-03 | 2017-11-03 | Shared resource file anti-cheating method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107885808B (en) |
2017
- 2017-11-03: CN application CN201711070780.7A (CN107885808B), status Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150220635A1 (en) * | 2014-01-31 | 2015-08-06 | Nbcuniversal Media, Llc | Fingerprint-defined segment-based content delivery |
CN105095258A (en) * | 2014-05-08 | 2015-11-25 | 腾讯科技(北京)有限公司 (Tencent Technology (Beijing) Co., Ltd.) | Media information sorting method and apparatus and media information recommendation system |
CN106909609A (en) * | 2017-01-09 | 2017-06-30 | 北方工业大学 (North China University of Technology) | Method for determining similar character strings, method and system for searching duplicate files |
Non-Patent Citations (1)
Title |
---|
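胡艳 (Hu Yan): "高校实验教学中基于文件的防作弊技术" ("File-based anti-cheating technology in university experimental teaching"), 《科教文汇》 (Kejiao Wenhui) * |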
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109032954A (en) * | 2018-08-16 | 2018-12-18 | 五八有限公司 (58 Co., Ltd.) | User selection method, device, storage medium and terminal for A/B testing |
CN109032954B (en) * | 2018-08-16 | 2022-04-05 | 五八有限公司 (58 Co., Ltd.) | User selection method and device for A/B test, storage medium and terminal |
Also Published As
Publication number | Publication date |
---|---|
CN107885808B (en) | 2021-03-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103744981B (en) | System for automatic classification analysis for website based on website content | |
CN107566376A (en) | One kind threatens information generation method, apparatus and system | |
CN102332025A (en) | Intelligent vertical search method and system | |
CN103678436B (en) | Information processing system and information processing method | |
CN106951422A (en) | The method and apparatus of webpage training, the method and apparatus of search intention identification | |
CN107291780A (en) | A kind of user comment information methods of exhibiting and device | |
CN106844530A (en) | Training method and device of a kind of question and answer to disaggregated model | |
CN106845265A (en) | A kind of document security level automatic identifying method | |
CN110222328B (en) | Method, device and equipment for labeling participles and parts of speech based on neural network and storage medium | |
CN108763529A (en) | A kind of intelligent search method, device and computer readable storage medium | |
CN106940726A (en) | The intention automatic generation method and terminal of a kind of knowledge based network | |
Zou et al. | LDA-TM: A two-step approach to Twitter topic data clustering | |
CN107122478A (en) | A kind of method based on keyword extraction much-talked-about topic | |
CN103309857B (en) | A kind of taxonomy determines method and apparatus | |
CN110532450A (en) | A kind of Theme Crawler of Content method based on improvement shark search | |
CN104537028A (en) | Webpage information processing method and device | |
Galkin et al. | An open challenge for inductive link prediction on knowledge graphs | |
CN107885808A (en) | Shared resource file anti-cheating method | |
CN109344400A (en) | A kind of judgment method and device of document storage | |
Xu et al. | Generating risk maps for evolution analysis of societal risk events | |
Meel et al. | A contemporary survey of machine learning techniques for fake news identification | |
Sekiya et al. | Investigation on university websites for semi-automated syllabus crawling | |
CN104063514A (en) | Vertical search method | |
Caruana et al. | An Analysis of the Relationship between Words within the Voynich Manuscript. | |
CN111858918A (en) | News classification method and device, network element and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||