CN107885808A - Shared resource file anti-cheating method - Google Patents

Publication number
CN107885808A
Authority
CN
China
Prior art keywords
file
shared resource
resource file
luncene
resources
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711070780.7A
Other languages
Chinese (zh)
Other versions
CN107885808B (en)
Inventor
李禹江
何渔
吴豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Wenxuan Education Science & Technology Co Ltd
Original Assignee
Sichuan Wenxuan Education Science & Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Wenxuan Education Science & Technology Co Ltd
Priority to CN201711070780.7A
Publication of CN107885808A
Application granted
Publication of CN107885808B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an anti-cheating method for shared resource files, comprising the following steps. S1: convert each file to be stored into a PDF file and upload the converted PDF file to the resource repository. S2: Lucene obtains the path information of the resource repository from a database, uses that path information to fetch the resource files from the repository, loads them and builds document objects, tokenizes the stored resource files, and creates an index file. S3: randomly sample content fragments from the newly shared resource file, taking N >= 3 fragments; load the shared resource file, obtain its total character length T, set the fragment step length S = 10, and compute the random-number bound C = T - S. The method shortens the time needed to judge whether a shared resource is a cheating submission and improves overall efficiency. It also keeps similar documents out of the resource repository, which saves storage space.

Description

Shared resource file anti-cheating method
Technical field
The present invention relates to a file anti-cheating method, and in particular to an anti-cheating method for shared resource files.
Background
With the rapid development of network technology, anyone can share their own resource files. Under paid sharing schemes, a small number of people have been found to download files that others have shared, make minor changes, and share them again to obtain rewards illegitimately. If shared resource files are not effectively protected against such cheating, the following problems arise:
1. The cost of building up the shared resources increases.
2. Similar resource files waste storage space.
3. Similar resource files raise the cost, for users who retrieve resources, of choosing among them.
Summary of the invention
The technical problem to be solved by the invention is the high cost of building up shared resources, the storage space wasted by similar resource files, and the long time needed to check them. The purpose of the invention is to propose a method that both reduces server load and quickly obtains the similarity between a newly shared resource file and the stored resource files, thereby preventing cheating with shared resource files.
The present invention is achieved through the following technical solutions:
The shared resource file anti-cheating method comprises the following steps:
S1: Convert each file to be stored into a PDF file, and upload the converted PDF file to the resource repository.
S2: Lucene obtains the path information of the resource repository from a database and uses it to fetch the resource files from the repository; Lucene loads them and builds document objects, tokenizes the stored resource files, and creates an index file.
S3: Randomly sample content fragments from the newly shared resource file, taking N >= 3 fragments. Load the shared resource file, obtain its total character length T, set the fragment step length S = 10, and compute the random-number bound C = T - S.
S4: If C <= 0, take the entire content of the shared file as the sampled fragment. If C > 0, generate a random number K bounded by C and take the content between positions K and K + S as a fragment; repeat the sampling of step S3 and stop when the number of fragments equals N.
S5: Use each of the N sampled fragments as a search keyword, run N searches in the search engine, and temporarily store the N retrieval results.
S6: Analyze the N retrieval results and compute, for each file, its hit count H over the N searches: each time a file appears in a search result, its H increases by 1.
S7: Obtain the list of similar stored resource files and their number Fn by comparing each file's hit count H with the fragment count N: with hit rate R = H / N, a file is a similar stored resource file if R >= 60%.
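The sampling in steps S3 and S4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `sample_fragments`, the 0-based offsets, the inclusive range for K, and the injectable `rng` parameter are all choices made here. The text does not say whether repeated fragments are excluded, so this sketch allows them.

```python
import random

def sample_fragments(text, n=3, step=10, rng=None):
    """Draw n fragments of `step` characters from `text` (steps S3 and S4)."""
    if n < 3:
        raise ValueError("the method requires at least 3 fragments")
    rng = rng or random.Random()
    upper = len(text) - step          # C = T - S
    if upper <= 0:
        # The file is no longer than one fragment: the whole content is the sample.
        return [text]
    fragments = []
    while len(fragments) < n:
        k = rng.randint(0, upper)     # assumed range for K: 0 <= K <= C
        fragments.append(text[k:k + step])
    return fragments
```

Passing a seeded `random.Random` makes the sampling reproducible in tests, while the default gives a fresh sample for each upload.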
To prevent cheating with shared resource files, the prior art applies file content processing techniques, at present using a "vector space model" to compute the similarity between the newly shared resource file and the stored resource files. If the similarity exceeds a decision threshold, the newly shared file is judged to be a cheating file and is not admitted to the resource repository. Judging file similarity in this way consumes a large amount of server resources, and as the number of stored resources grows, the similarity check takes longer and longer.
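For contrast, a vector-space comparison of the kind attributed to the prior art might look like the sketch below. The text does not specify the features or weighting used, so character bigram counts and cosine similarity are assumptions made here, as are all function names. The point of the sketch is that the new file is compared against every stored file, so the work grows with the size of the repository.

```python
import math
from collections import Counter

def char_bigrams(text):
    """Represent a text as a bag of character bigrams (an assumed feature choice)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(new_text, stored):
    """Compare the new file against every stored file: one full comparison per file."""
    new_vec = char_bigrams(new_text)
    return max(
        ((name, cosine_similarity(new_vec, char_bigrams(body)))
         for name, body in stored.items()),
        key=lambda pair: pair[1],
    )
```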
Further, in step S1 the file to be stored is converted as a whole into a PDF file by a converter. PDF files are used for storing and sharing file content and for comparing content fragments because PDF files are better suited to online viewing and, during comparison, their text can be recognized quickly with text recognition software such as OCR.
Further, the database in step S2 is a MySQL database. Compared with other large databases such as Oracle, DB2 and SQL Server, MySQL has its own shortcomings, such as smaller scale and limited functionality. The invention only needs simple storage, however, and MySQL is an open-source database, so this approach makes it possible to set up a stable, free web system without spending anything apart from labor costs.
Further, the retrieval result in step S6 is the list of files corresponding to each content fragment.
Further, the Lucene in step S2 is an open-source search engine; with Lucene, full-text search can be implemented in the target system.
Further, Lucene analyzes the documents, tokenizes them, and builds the index.
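To make the roles of the index (step S2) and the fragment search (step S5) concrete, here is a toy character-level inverted index in Python. It is only a stand-in for a full-text engine and does not use or imitate the Lucene API: the class `FragmentIndex`, its methods, and the exact-substring matching rule are assumptions made for illustration, whereas a real engine tokenizes with an analyzer and ranks its matches.

```python
from collections import defaultdict

class FragmentIndex:
    """Toy character-level inverted index; a stand-in for a real full-text engine."""

    def __init__(self):
        self._postings = defaultdict(set)   # character -> names of files containing it
        self._texts = {}                    # file name -> extracted text

    def add(self, name, text):
        """Index one stored file under its name."""
        self._texts[name] = text
        for ch in set(text):
            self._postings[ch].add(name)

    def search(self, fragment):
        """Return the sorted names of stored files that contain `fragment` verbatim."""
        if not fragment:
            return []
        # Candidate files must contain every character of the fragment.
        candidates = set.intersection(
            *(self._postings.get(ch, set()) for ch in set(fragment))
        )
        # Confirm that the characters occur contiguously and in order.
        return sorted(name for name in candidates if fragment in self._texts[name])
```

Each call to `search` returns the file list for one fragment, which is the per-fragment retrieval result that step S6 then analyzes.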
The key point of the invention is to sample content from the shared resource file at random to obtain content fragments, use the search engine service to look up lists of stored resource files, and use the relationship between the shared resource file, its content fragments, and the corresponding lists of stored resource files to find the stored resources that correspond to the shared resource file, and thus judge whether the shared resource is a cheating submission. This shortens the time needed for that judgment and improves overall efficiency. It also keeps similar documents out of the resource repository, saving storage space.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The shared resource file anti-cheating method of the invention reduces server load while quickly obtaining the similarity between a newly shared resource file and the stored resource files, preventing cheating with shared resource files;
2. The shared resource file anti-cheating method of the invention keeps the overall cost of using the server low and effectively saves storage space.
Brief description of the drawings
The accompanying drawing described here is provided for further understanding of the embodiments of the invention and forms part of this application; it does not limit the embodiments of the invention. In the drawing:
Fig. 1 is a flow chart of the system of the present invention.
Detailed description
To make the purpose, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the embodiments and the accompanying drawing. The exemplary embodiments and their descriptions serve only to explain the invention and do not limit it.
Embodiment one
As shown in Fig. 1, the shared resource file anti-cheating method of the invention comprises the following steps:
S1: Convert each file to be stored into a PDF file, and upload the converted PDF file to the resource repository.
S2: Lucene obtains the path information of the resource repository from a database and uses it to fetch the resource files from the repository; Lucene loads them and builds document objects, tokenizes the stored resource files, and creates an index file.
S3: Randomly sample content fragments from the newly shared resource file, taking N >= 3 fragments. Load the shared resource file, obtain its total character length T, set the fragment step length S = 10, and compute the random-number bound C = T - S.
S4: If C <= 0, take the entire content of the shared file as the sampled fragment. If C > 0, generate a random number K bounded by C and take the content between positions K and K + S as a fragment; repeat the sampling of step S3 and stop when the number of fragments equals N.
S5: Use each of the N sampled fragments as a search keyword, run N searches in the search engine, and temporarily store the N retrieval results.
S6: Analyze the N retrieval results and compute, for each file, its hit count H over the N searches: each time a file appears in a search result, its H increases by 1.
S7: Obtain the list of similar stored resource files and their number Fn by comparing each file's hit count H with the fragment count N: with hit rate R = H / N, a file is a similar stored resource file if R >= 60%.
Take an existing educational resource content service center as an example. The service center is a system for uploading, managing, searching, viewing, and downloading educational resources. Users can share original materials; when sharing succeeds, a bonus is paid online according to the quality of the shared document.
For example, a primary school Chinese teacher wants to share a piece of teaching courseware with the educational resource content service center, whose resource system has the shared resource file anti-cheating system built in. After the teacher opens the system, enters the resource sharing function, and selects the courseware file to share, the system extracts 3 samples from the content of the courseware file: characters 30 to 40 counted from the start (N1) <"content is known from experience parallel refutes new explanation">, characters 100 to 110 (N2) <Lights on fishing boats stroke is just vibrant just to be contaminated>, and characters 200 to 210 counted from the end (N3) <River maple lights on fishing boats refer to red maple along the river and worried>. The samples are sent to the server, which searches the resource repository with the 3 sampled fragments (N1 to N3) in parallel. The results are N1: 3 files, N2: 5 files, N3: 4 files. The server counts how often each file recurs among the 12 retrieved entries: 1 file appears 3 times, a hit rate of 100%; 2 files appear 2 times, a hit rate of 66.6%; the remaining files appear once, a hit rate of 33%. From these statistics, the number of similar stored resource files for the file to be shared is Fn = 3. The server returns the list of those stored resources to the teacher's client and informs the user that the resource file already exists and cannot be shared.
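The statistics in this example (steps S6 and S7) can be reproduced with a short Python sketch. The function name `similar_files` and the file names `f1` to `f8` are hypothetical; the names are chosen only so that the three result lists have 3, 5 and 4 entries with the overlaps described above.

```python
from collections import Counter

def similar_files(results, threshold=0.6):
    """Count hits per file over the N result lists and keep files with R = H / N >= threshold."""
    n = len(results)
    hits = Counter()
    for file_list in results:
        hits.update(set(file_list))       # a file counts at most once per search
    return {name: h / n for name, h in hits.items() if h / n >= threshold}

# Hypothetical file names reproducing the counts in the example above.
results = [
    ["f1", "f2", "f4"],                   # N1: 3 files
    ["f1", "f2", "f3", "f5", "f6"],       # N2: 5 files
    ["f1", "f3", "f7", "f8"],             # N3: 4 files
]
similar = similar_files(results)          # f1 at 3/3, f2 and f3 at 2/3
```

With these inputs one file is hit in all three searches, two files in two searches, and five files in one search, so three files reach the 60% threshold, which matches Fn = 3.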
Embodiment two
This embodiment refines the choices made in embodiment one. In step S1 the file to be stored is converted as a whole into a PDF file by a converter. PDF files are used for storing and sharing file content and for comparing content fragments because PDF files are better suited to online viewing and, during comparison, their text can be recognized quickly with text recognition software such as OCR.
The database in step S2 is a MySQL database. Compared with other large databases such as Oracle, DB2 and SQL Server, MySQL has its own shortcomings, such as smaller scale and limited functionality. The invention only needs simple storage, however, and MySQL is an open-source database, so this approach makes it possible to set up a stable, free web system without spending anything apart from labor costs. The retrieval result in step S6 is the list of files corresponding to each content fragment. The Lucene in step S2 is an open-source search engine; with Lucene, full-text search can be implemented in the target system. Lucene analyzes the documents, tokenizes them, and builds the index.
The embodiments above describe the purpose, technical solution and beneficial effects of the invention in further detail. It should be understood that the foregoing are only embodiments of the invention and are not intended to limit its scope of protection; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (6)

1. A shared resource file anti-cheating method, characterized in that the method comprises the following steps:
S1: Convert each file to be stored into a PDF file, and upload the converted PDF file to the resource repository;
S2: Lucene obtains the path information of the resource repository from a database and uses it to fetch the resource files from the repository; Lucene loads them and builds document objects, tokenizes the stored resource files, and creates an index file;
S3: Randomly sample content fragments from the newly shared resource file, taking N >= 3 fragments; load the shared resource file, obtain its total character length T, set the fragment step length S = 10, and compute the random-number bound C = T - S;
S4: If C <= 0, take the entire content of the shared file as the sampled fragment; if C > 0, generate a random number K bounded by C and take the content between positions K and K + S as a fragment; repeat the sampling of step S3 and stop when the number of fragments equals N;
S5: Use each of the N sampled fragments as a search keyword, run N searches in the search engine, and temporarily store the N retrieval results;
S6: Analyze the N retrieval results and compute, for each file, its hit count H over the N searches: each time a file appears in a search result, its H increases by 1;
S7: Obtain the list of similar stored resource files and their number Fn by comparing each file's hit count H with the fragment count N: with hit rate R = H / N, a file is a similar stored resource file if R >= 60%.
2. The shared resource file anti-cheating method according to claim 1, characterized in that the file to be stored in step S1 is converted as a whole into a PDF file by a converter.
3. The shared resource file anti-cheating method according to claim 1, characterized in that the database in step S2 is a MySQL database.
4. The shared resource file anti-cheating method according to claim 1, characterized in that the retrieval result in step S6 is the list of files corresponding to each content fragment.
5. The shared resource file anti-cheating method according to claim 1, characterized in that the Lucene in step S2 is an open-source search engine, and full-text search can be implemented in the target system with Lucene.
6. The shared resource file anti-cheating method according to claim 5, characterized in that Lucene analyzes the documents, tokenizes them, and builds the index.
CN201711070780.7A 2017-11-03 2017-11-03 Shared resource file anti-cheating method Active CN107885808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711070780.7A CN107885808B (en) 2017-11-03 2017-11-03 Shared resource file anti-cheating method

Publications (2)

Publication Number Publication Date
CN107885808A true CN107885808A (en) 2018-04-06
CN107885808B CN107885808B (en) 2021-03-30

Family

ID=61778734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711070780.7A Active CN107885808B (en) 2017-11-03 2017-11-03 Shared resource file anti-cheating method

Country Status (1)

Country Link
CN (1) CN107885808B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109032954A (en) * 2018-08-16 2018-12-18 五八有限公司 A kind of user's choosing method, device, storage medium and the terminal of A/B test
CN109032954B (en) * 2018-08-16 2022-04-05 五八有限公司 User selection method and device for A/B test, storage medium and terminal

Also Published As

Publication number Publication date
CN107885808B (en) 2021-03-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant