CN105590068A - File fingerprint check method and device - Google Patents

Publication number
CN105590068A
Authority
CN
China
Prior art keywords
fingerprint
file
server
piecemeal
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510997054.4A
Other languages
Chinese (zh)
Inventor
朱细智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Beijing Qianxin Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Beijing Qianxin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd and Beijing Qianxin Technology Co Ltd
Priority to CN201510997054.4A
Publication of CN105590068A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/64: Protecting data integrity, e.g. using checksums, certificates or signatures

Abstract

The invention discloses a file fingerprint check method and device, relating to the technical field of information. By means of the file fingerprint check method and device, the file fingerprint check efficiency can be increased. The method comprises the following steps: obtaining load state information corresponding to each fingerprint server in a current distributed network at first while receiving a file to be checked; then, determining the fingerprint server, the load state information of which accords with a preset condition; and finally, distributing the file to be checked to the fingerprint server, the load state information of which accords with a preset condition, such that file fingerprint check is conveniently carried out.

Description

File fingerprint check method and device
Technical field
The present invention relates to the field of information technology, and in particular to a file fingerprint check method and device.
Background
With the development of information technology, network information security is receiving more and more attention. When a user downloads file data from the network, it is also necessary to ensure that the file data has not been modified by others, for example by adding a Trojan horse, a virus or an unofficial plug-in, and that it has not been damaged during the download.
At present, this can be confirmed by file fingerprint check technology, with the corresponding operations carried out on a fingerprint server. However, when a large amount of file data needs to be checked, the fingerprint server has to process a large amount of file data at the same moment, which increases the load pressure on the fingerprint server and in turn reduces the efficiency of the file fingerprint check.
Summary of the invention
In view of this, the invention provides a file fingerprint check method and device, whose main purpose is to improve the efficiency of the file fingerprint check.
According to one aspect of the invention, a file fingerprint check method is provided, the method comprising:
when a file to be checked is received, obtaining the load state information corresponding to each fingerprint server in the current distributed network;
determining the fingerprint server whose load state information meets a preset condition;
distributing the file to be checked to the fingerprint server whose load state information meets the preset condition, so that the file fingerprint check can be carried out.
According to another aspect of the invention, a file fingerprint check device is provided, the device comprising:
an acquiring unit, configured to obtain, when a file to be checked is received, the load state information corresponding to each fingerprint server in the current distributed network;
a determining unit, configured to determine the fingerprint server whose load state information, as obtained by the acquiring unit, meets a preset condition;
an allocation unit, configured to distribute the file to be checked to the fingerprint server whose load state information meets the preset condition, so that the file fingerprint check can be carried out.
With the above technical solution, the technical solution provided by the embodiments of the invention has at least the following advantages:
With the file fingerprint check method and device provided by the invention, when a file to be checked is received, the load state information corresponding to each fingerprint server in the current distributed network is first obtained; the fingerprint server whose load state information meets the preset condition is then determined; finally, the file to be checked is distributed to that fingerprint server so that the file fingerprint check can be carried out. Compared with the prior art, distributing the file to be checked to a fingerprint server whose current load state information meets the preset condition allows the fingerprint servers to be used reasonably according to the load-balancing principle, improves the efficiency of the file fingerprint check, and relieves the load pressure caused by overusing a single fingerprint server.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the description, and so that the above and other objects, features and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
Fig. 1 shows a schematic flow chart of a file fingerprint check method provided by an embodiment of the invention;
Fig. 2 shows a schematic flow chart of another file fingerprint check method provided by an embodiment of the invention;
Fig. 3 shows a schematic structural diagram of a file fingerprint check device provided by an embodiment of the invention;
Fig. 4 shows a schematic structural diagram of another file fingerprint check device provided by an embodiment of the invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
An embodiment of the invention provides a file fingerprint check method. As shown in Fig. 1, the method comprises:
101. When a file to be checked is received, obtain the load state information corresponding to each fingerprint server in the current distributed network.
The load state information may include the load value corresponding to the fingerprint server, and may also include the load condition corresponding to the fingerprint server, and so on.
102. Determine the fingerprint server whose load state information meets the preset condition.
The preset condition may be configured according to actual requirements or may be a system default; the embodiment of the invention does not limit this.
For example, when the load state information includes the load value corresponding to the fingerprint server, the preset condition may be configured as "smallest load value". When the load state information corresponding to each fingerprint server in the current distributed network has been received, the fingerprint server with the smallest load value is determined from among them.
As another example, when the load state information includes the load condition corresponding to the fingerprint server, the preset condition may be configured as "the load condition is a not-fully-loaded state". When the load state information corresponding to each fingerprint server in the current distributed network has been received, a fingerprint server whose load condition is the not-fully-loaded state is determined from among them.
103. Distribute the file to be checked to the fingerprint server whose load state information meets the preset condition, so that the file fingerprint check can be carried out.
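The two example preset conditions in step 102 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the LoadState record, its field names, the server identifiers and the unit of the load value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class LoadState:
    """Hypothetical load report for one fingerprint server."""
    server_id: str
    load_value: float    # assumed unit: number of files currently queued
    fully_loaded: bool   # True when the server reports a fully loaded state


def pick_by_min_load(states: Sequence[LoadState]) -> Optional[LoadState]:
    """Preset condition 'smallest load value'."""
    return min(states, key=lambda s: s.load_value, default=None)


def pick_not_fully_loaded(states: Sequence[LoadState]) -> Optional[LoadState]:
    """Preset condition 'load condition is not fully loaded' (first match)."""
    return next((s for s in states if not s.fully_loaded), None)


# Illustrative data only.
states = [
    LoadState("fp-1", load_value=12, fully_loaded=True),
    LoadState("fp-2", load_value=3, fully_loaded=False),
    LoadState("fp-3", load_value=7, fully_loaded=False),
]
chosen = pick_by_min_load(states)
```

Both helpers return None when no server qualifies, which leaves the caller free to queue the file or retry; the source does not say what happens in that case.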
With the file fingerprint check method provided by the embodiment of the invention, when a file to be checked is received, the load state information corresponding to each fingerprint server in the current distributed network is first obtained; the fingerprint server whose load state information meets the preset condition is then determined; finally, the file to be checked is distributed to that fingerprint server so that the file fingerprint check can be carried out. Compared with the prior art, distributing the file to be checked to a fingerprint server whose current load state information meets the preset condition allows the fingerprint servers to be used reasonably according to the load-balancing principle, improves the efficiency of the file fingerprint check, and relieves the load pressure caused by overusing a single fingerprint server.
An embodiment of the invention provides another file fingerprint check method. As shown in Fig. 2, the method comprises:
201. When a file to be checked is received, obtain the load state information corresponding to each fingerprint server in the current distributed network.
The load state information may include the load value corresponding to the fingerprint server, and may also include the load condition corresponding to the fingerprint server.
For the embodiment of the invention, before step 201 the method comprises: configuring multiple fingerprint servers in the distributed network. Specifically, the number of fingerprint servers may be configured according to actual requirements.
202. Determine the fingerprint server whose load state information meets the preset condition.
For the embodiment of the invention, step 202 may specifically comprise: determining the fingerprint server with the smallest load value in the current distributed network.
For the embodiment of the invention, step 202 may also specifically comprise: determining a fingerprint server in the current distributed network whose load condition is the not-fully-loaded state.
203. Distribute the file to be checked to the fingerprint server whose load state information meets the preset condition.
For the embodiment of the invention, distributing the file to be checked to the fingerprint server whose load state information meets the preset condition allows the fingerprint servers to be used reasonably according to the load-balancing principle and improves the efficiency of the file fingerprint check.
204. Extract the file content information from the file to be checked and preprocess it.
The file content information includes the file content of the file to be checked.
For the embodiment of the invention, the file content information may be extracted from the file to be checked by generalized processing, that is, templated processing, and the extracted file content information is in a preset Unicode encoding. Specifically, a preset file content extraction function may be used to extract the file content of the file to be checked, where the preset file content extraction function may be configured according to the user's actual requirements. A preset character-encoding detection function is then used to identify the encoding of the extracted file content, and a preset encoding conversion library is used to convert the extracted file content to the preset Unicode encoding, which completes the extraction of the file content information from the file to be checked.
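As a rough illustration of the "detect the encoding, then convert to Unicode" step, the sketch below tries a short list of candidate encodings in order and returns the first that decodes cleanly. The function name to_unicode and the candidate list are assumptions for the example; the source does not name its detection function or conversion library, and a production system would likely use a dedicated detector.

```python
from typing import Sequence, Tuple


def to_unicode(raw: bytes,
               candidates: Sequence[str] = ("utf-8", "gb18030")) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes `raw`.

    Trial decoding is a simple stand-in for a real encoding detector.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the content")


text, detected = to_unicode("手机质量很好".encode("gb18030"))
```

Trying UTF-8 first is a deliberate ordering choice: it rejects most non-UTF-8 byte sequences, whereas permissive legacy encodings would accept almost anything.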
For the embodiment of the invention, preprocessing the extracted file content information may include: removing punctuation marks from the file content, removing meaningless words, performing Chinese word segmentation on the file content, and so on.
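The preprocessing steps can be sketched with the standard library alone. The stop-word list below is hypothetical, and the whitespace split is only a placeholder for a real Chinese word segmenter, which the source does not identify.

```python
import unicodedata
from typing import FrozenSet, List

# Hypothetical list of "meaningless" words; a real deployment would load its own.
STOP_WORDS: FrozenSet[str] = frozenset({"the", "a", "的", "了"})


def strip_punctuation(text: str) -> str:
    """Drop every character whose Unicode category is punctuation (P*)."""
    return "".join(ch for ch in text
                   if not unicodedata.category(ch).startswith("P"))


def preprocess(text: str, stop_words: FrozenSet[str] = STOP_WORDS) -> List[str]:
    """Remove punctuation, split into tokens, and drop stop words.

    str.split() stands in for Chinese word segmentation here.
    """
    tokens = strip_punctuation(text).split()
    return [token for token in tokens if token not in stop_words]


tokens = preprocess("the phone, a good one!")
```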
205. Extract the fingerprint information from the preprocessed file content information.
The fingerprint information is a feature that can uniquely identify a file; every file has its own fingerprint information. A fingerprint is in essence a mapping of the file content to a number, such that different content maps to different numbers, much like a person's fingerprint. File fingerprints play an important role in many areas, such as protecting data integrity and identification.
For the embodiment of the invention, step 205 specifically comprises: extracting the fingerprint information from the preprocessed file content information by means of a preset Karp-Rabin function, where the preset Karp-Rabin function is a function written according to the Karp-Rabin algorithm.
Specifically, extracting the fingerprint information from the preprocessed file content information by means of the preset Karp-Rabin function comprises: dividing the preprocessed file content information by means of a preset K-gram function to obtain multiple fingerprint blocks; and calculating the hash value corresponding to each fingerprint block as the extracted fingerprint information.
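The division into fingerprint blocks can be sketched as a sliding window of k characters that advances one character at a time. The function name k_grams is an assumption; the source only refers to a "preset K-gram function".

```python
from typing import List


def k_grams(text: str, k: int) -> List[str]:
    """Return every contiguous block of k characters, in document order."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [text[i:i + k] for i in range(len(text) - k + 1)]


blocks = k_grams("手机质量很好", 5)
```

A text shorter than k yields no blocks in this sketch; how the source treats that case is not stated.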
Specifically, calculating the hash value corresponding to each fingerprint block as the extracted fingerprint information comprises: calculating the hash value corresponding to each fingerprint block in the order of the blocks' positions in the file content information; when the hash value of a fingerprint block at any position other than the first needs to be calculated, obtaining the hash value corresponding to the previous fingerprint block; calculating the hash value corresponding to the first character of the previous fingerprint block and the hash value corresponding to the last character of the fingerprint block at that position; and calculating the difference between the hash value of the previous fingerprint block and the hash value of its first character, then adding the hash value corresponding to the last character of the fingerprint block at that position, to obtain the hash value corresponding to the fingerprint block at that position.
For example, for a piece of file content "手机质量很好" ("the phone quality is very good") and a preset block size k of 5, the content is divided into two blocks, h1 and h2, corresponding to the two content fragments "手机质量很" and "机质量很好". The hash value corresponding to h1 is calculated by the formula h1 = A×2^4 + B×2^3 + C×2^2 + D×2^1 + E×2^0, where A to E are the Unicode numbers of the characters "手", "机", "质", "量" and "很". When the hash value corresponding to h2 is calculated, the difference between the hash value of h1 and the hash value corresponding to the character "手" is calculated and then added to the hash value corresponding to the character "好", which gives the hash value corresponding to h2. Finally, the hash values corresponding to h1 and h2 are used as the extracted fingerprint information.
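The worked example can be reproduced with a short rolling-hash sketch. The text describes the update only as "subtract the outgoing character's hash, then add the incoming character's hash". The code below assumes the usual Karp-Rabin reading of that step: the outgoing character is weighted by base^(k-1), and the remainder is multiplied by the base before the incoming character is added, so that every block's value agrees with the direct formula for h1. The base of 2 comes from the example; the function names and the absence of a modulus are assumptions.

```python
from typing import List

BASE = 2  # base used in the worked example


def direct_hash(block: str, base: int = BASE) -> int:
    """h = c0*base^(k-1) + c1*base^(k-2) + ... + c(k-1)*base^0."""
    k = len(block)
    return sum(ord(ch) * base ** (k - 1 - i) for i, ch in enumerate(block))


def rolling_hashes(text: str, k: int, base: int = BASE) -> List[int]:
    """Hash every k-character block, reusing the previous block's hash."""
    if k <= 0 or len(text) < k:
        return []
    current = direct_hash(text[:k], base)
    hashes = [current]
    high = base ** (k - 1)                    # weight of the outgoing character
    for start in range(1, len(text) - k + 1):
        outgoing = ord(text[start - 1]) * high
        incoming = ord(text[start + k - 1])
        # Assumed standard update: drop the first char, shift, append the last.
        current = (current - outgoing) * base + incoming
        hashes.append(current)
    return hashes


h1, h2 = rolling_hashes("手机质量很好", 5)
```

Each update costs a constant number of operations regardless of k, which is the point of reusing the previous block's hash instead of recomputing the sum.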
206. Detect whether the similarity between the fingerprint information and the fingerprint information in a preset fingerprint library is greater than or equal to a preset threshold.
The preset fingerprint library stores fingerprint information used to identify files with similar content. The preset threshold may be configured according to actual requirements, for example 70% or 90%.
207. If the similarity is greater than or equal to the preset threshold, determine that the file to be checked is a content-similar file.
For example, suppose the preset threshold is 75%. The file content information of file A is extracted and preprocessed, and the fingerprint information is then extracted from the preprocessed file content information. When the similarity between this fingerprint information and the fingerprint information in the preset fingerprint library is detected to be 80%, file A is determined to be a content-similar file; when the similarity is detected to be 40%, file A is determined not to be a content-similar file.
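The threshold test in steps 206 and 207 can be sketched as below. The source does not define how similarity is computed, so the measure used here, the share of the file's distinct fingerprints that also appear in the library, is purely an assumption for illustration, as are the function names.

```python
from typing import Iterable


def similarity(file_fingerprints: Iterable[int],
               library_fingerprints: Iterable[int]) -> float:
    """Assumed measure: fraction of the file's fingerprints found in the library."""
    file_set = set(file_fingerprints)
    if not file_set:
        return 0.0
    return len(file_set & set(library_fingerprints)) / len(file_set)


def is_content_similar(file_fingerprints: Iterable[int],
                       library_fingerprints: Iterable[int],
                       threshold: float = 0.75) -> bool:
    """Step 207: similar when the similarity reaches the preset threshold."""
    return similarity(file_fingerprints, library_fingerprints) >= threshold


# Illustrative fingerprints: 4 of 5 match in the first case, 2 of 5 in the second.
high_match = is_content_similar([1, 2, 3, 4, 5], [1, 2, 3, 4, 9])
low_match = is_content_similar([1, 2, 3, 4, 5], [1, 2])
```

With a threshold of 0.75 the two cases mirror the 80% and 40% outcomes in the example for file A.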
With the other file fingerprint check method provided by the embodiment of the invention, when a file to be checked is received, the load state information corresponding to each fingerprint server in the current distributed network is first obtained; the fingerprint server whose load state information meets the preset condition is then determined; finally, the file to be checked is distributed to that fingerprint server so that the file fingerprint check can be carried out. Compared with the prior art, distributing the file to be checked to a fingerprint server whose current load state information meets the preset condition allows the fingerprint servers to be used reasonably according to the load-balancing principle, improves the efficiency of the file fingerprint check, and relieves the load pressure caused by overusing a single fingerprint server.
Further, as a specific implementation of the method shown in Fig. 1, an embodiment of the invention provides a file fingerprint check device. As shown in Fig. 3, the device comprises: an acquiring unit 31, a determining unit 32 and an allocation unit 33.
The acquiring unit 31 may be configured to obtain, when a file to be checked is received, the load state information corresponding to each fingerprint server in the current distributed network.
The determining unit 32 may be configured to determine the fingerprint server whose load state information, as obtained by the acquiring unit 31, meets the preset condition.
The allocation unit 33 may be configured to distribute the file to be checked to the fingerprint server whose load state information meets the preset condition, so that the file fingerprint check can be carried out.
It should be noted that, for other descriptions of the functional units of the file fingerprint check device provided by the embodiment of the invention, reference may be made to the corresponding description of Fig. 1, which is not repeated here.
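One way to picture the three units of Fig. 3 is as three methods on a single object, wired together when a file arrives. The class name, the callables passed to the constructor and the choice of "smallest load value" as the preset condition are all assumptions for this sketch.

```python
from typing import Callable, Dict, Optional


class FingerprintCheckDevice:
    """Hypothetical composition of acquiring, determining and allocation units."""

    def __init__(self,
                 read_loads: Callable[[], Dict[str, float]],
                 send_file: Callable[[str, bytes], None]) -> None:
        self._read_loads = read_loads   # returns {server_id: load_value}
        self._send_file = send_file     # delivers the file to one server

    def acquire(self) -> Dict[str, float]:
        """Acquiring unit: load state of every fingerprint server."""
        return self._read_loads()

    def determine(self, loads: Dict[str, float]) -> Optional[str]:
        """Determining unit: assumed preset condition is the smallest load value."""
        return min(loads, key=loads.get) if loads else None

    def allocate(self, server_id: str, payload: bytes) -> None:
        """Allocation unit: hand the file to the chosen server."""
        self._send_file(server_id, payload)

    def on_file_received(self, payload: bytes) -> Optional[str]:
        server_id = self.determine(self.acquire())
        if server_id is not None:
            self.allocate(server_id, payload)
        return server_id


# In-memory stand-ins for the network side, for illustration only.
sent = []
device = FingerprintCheckDevice(
    read_loads=lambda: {"fp-1": 9.0, "fp-2": 2.0},
    send_file=lambda server_id, payload: sent.append((server_id, payload)),
)
target = device.on_file_received(b"example")
```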
With the file fingerprint check device provided by the embodiment of the invention, when a file to be checked is received, the load state information corresponding to each fingerprint server in the current distributed network is first obtained; the fingerprint server whose load state information meets the preset condition is then determined; finally, the file to be checked is distributed to that fingerprint server so that the file fingerprint check can be carried out. Compared with the prior art, distributing the file to be checked to a fingerprint server whose current load state information meets the preset condition allows the fingerprint servers to be used reasonably according to the load-balancing principle, improves the efficiency of the file fingerprint check, and relieves the load pressure caused by overusing a single fingerprint server.
Further, as a specific implementation of the method shown in Fig. 2, an embodiment of the invention provides another file fingerprint check device. As shown in Fig. 4, the device comprises: an acquiring unit 41, a determining unit 42 and an allocation unit 43.
The acquiring unit 41 may be configured to obtain, when a file to be checked is received, the load state information corresponding to each fingerprint server in the current distributed network.
The determining unit 42 may be configured to determine the fingerprint server whose load state information, as obtained by the acquiring unit 41, meets the preset condition.
The allocation unit 43 may be configured to distribute the file to be checked to the fingerprint server whose load state information meets the preset condition, so that the file fingerprint check can be carried out.
Optionally, the load state information includes the load value corresponding to the fingerprint server.
The determining unit 42 may specifically be configured to determine the fingerprint server with the smallest load value in the current distributed network.
Optionally, the load state information includes the load condition corresponding to the fingerprint server.
The determining unit 42 may specifically be configured to determine a fingerprint server in the current distributed network whose load condition is the not-fully-loaded state.
Further, the device also comprises: a configuration unit 44.
The configuration unit 44 may be configured to configure multiple fingerprint servers in the distributed network.
Further, the device also comprises: an extraction unit 45, a preprocessing unit 46 and a detecting unit 47.
The extraction unit 45 may be configured to extract the file content information from the file to be checked.
The preprocessing unit 46 may be configured to preprocess the file content information extracted from the file to be checked by the extraction unit 45.
The extraction unit 45 may also be configured to extract the fingerprint information from the preprocessed file content information.
The detecting unit 47 may be configured to detect whether the similarity between the fingerprint information extracted by the extraction unit 45 and the fingerprint information in the preset fingerprint library is greater than or equal to the preset threshold.
The determining unit 42 may also be configured to determine that the file to be checked is a content-similar file if the detecting unit 47 detects that the similarity is greater than or equal to the preset threshold.
The extraction unit 45 may specifically be configured to extract the fingerprint information from the preprocessed file content information by means of a preset Karp-Rabin function.
Further, the extraction unit 45 comprises: a dividing module 451 and a computing module 452.
The dividing module 451 may be configured to divide the preprocessed file content information by means of a preset K-gram function to obtain multiple fingerprint blocks.
The computing module 452 may be configured to calculate the hash value corresponding to each fingerprint block as the extracted fingerprint information.
The computing module 452 may specifically be configured to calculate the hash value corresponding to each fingerprint block in the order of the blocks' positions in the file content information.
The computing module 452 may specifically also be configured to obtain the hash value corresponding to the previous fingerprint block when the hash value of a fingerprint block at any position other than the first needs to be calculated.
The computing module 452 may specifically also be configured to calculate the hash value corresponding to the first character of the previous fingerprint block and the hash value corresponding to the last character of the fingerprint block at that position.
The computing module 452 may specifically also be configured to calculate the difference between the hash value of the previous fingerprint block and the hash value of its first character, and then add the hash value corresponding to the last character of the fingerprint block at that position, to obtain the hash value corresponding to the fingerprint block at that position.
It should be noted that, for other descriptions of the functional units of the other file fingerprint check device provided by the embodiment of the invention, reference may be made to the corresponding description of Fig. 2, which is not repeated here.
With the other file fingerprint check device provided by the embodiment of the invention, when a file to be checked is received, the load state information corresponding to each fingerprint server in the current distributed network is first obtained; the fingerprint server whose load state information meets the preset condition is then determined; finally, the file to be checked is distributed to that fingerprint server so that the file fingerprint check can be carried out. Compared with the prior art, distributing the file to be checked to a fingerprint server whose current load state information meets the preset condition allows the fingerprint servers to be used reasonably according to the load-balancing principle, improves the efficiency of the file fingerprint check, and relieves the load pressure caused by overusing a single fingerprint server.
The embodiments of the invention disclose:
A1. A file fingerprint check method, characterized by comprising:
when a file to be checked is received, obtaining the load state information corresponding to each fingerprint server in the current distributed network;
determining the fingerprint server whose load state information meets a preset condition;
distributing the file to be checked to the fingerprint server whose load state information meets the preset condition, so that the file fingerprint check can be carried out.
A2. The file fingerprint check method according to A1, characterized in that the load state information includes the load value corresponding to the fingerprint server, and determining the fingerprint server whose load state information meets the preset condition comprises:
determining the fingerprint server with the smallest load value in the current distributed network.
A3. The file fingerprint check method according to A1, characterized in that the load state information includes the load condition corresponding to the fingerprint server, and determining the fingerprint server whose load state information meets the preset condition comprises:
determining a fingerprint server in the current distributed network whose load condition is the not-fully-loaded state.
A4. The file fingerprint check method according to A1, characterized in that, before obtaining the load state information corresponding to each fingerprint server in the current distributed network when a file to be checked is received, the method further comprises:
configuring multiple fingerprint servers in the distributed network.
A5. The file fingerprint check method according to A1, characterized in that, after distributing the file to be checked to the fingerprint server whose load state information meets the preset condition, the method further comprises:
extracting the file content information from the file to be checked and preprocessing it;
extracting the fingerprint information from the preprocessed file content information;
detecting whether the similarity between the fingerprint information and the fingerprint information in a preset fingerprint library is greater than or equal to a preset threshold;
if the similarity is greater than or equal to the preset threshold, determining that the file to be checked is a content-similar file.
A6, according to the file fingerprint method of calibration described in A5, it is characterized in that, described extraction is located in advanceFinger print information in document content information after reason comprises:
By preset Karp-Rabin function, extract the fingerprint letter in pretreated document content informationBreath.
A7, according to the file fingerprint method of calibration described in A6, it is characterized in that, described by presetKarp-Rabin function, the finger print information extracting in pretreated document content information comprises:
By preset K-gram function, described pretreated document content information is divided, obtainedMultiple fingerprint piecemeals;
Calculate cryptographic Hash corresponding to each fingerprint piecemeal, as the finger print information extracting.
A8. The file fingerprint check method according to A7, characterized in that calculating the hash value corresponding to each fingerprint block as the extracted fingerprint information comprises:
Calculating the hash value corresponding to each fingerprint block in the order of the blocks' positions in the file content information;
When the hash value of a fingerprint block at any position other than the first is to be calculated, obtaining the hash value corresponding to the previous fingerprint block;
Calculating the hash value corresponding to the first character of the previous fingerprint block, and the hash value corresponding to the last character of the fingerprint block at the position other than the first;
Calculating the difference between the hash value corresponding to the previous fingerprint block and the hash value corresponding to its first character, then adding the hash value corresponding to the last character of the fingerprint block at the position other than the first, to obtain the hash value corresponding to that fingerprint block.
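The incremental update in A8 can be sketched with the standard polynomial Karp-Rabin rolling hash. The claim states only the subtraction of the outgoing character's contribution and the addition of the incoming character's contribution; the multiplication by a base between those two steps comes from the textbook formulation and is an assumption here, as are the constants `BASE` and `MOD` and both function names.

```python
BASE = 257            # assumed base, not taken from the patent
MOD = 1_000_000_007   # assumed modulus, not taken from the patent


def direct_hash(block):
    """Polynomial hash of one block, computed from scratch."""
    h = 0
    for ch in block:
        h = (h * BASE + ord(ch)) % MOD
    return h


def rolling_hashes(text, k):
    """Hash every k-character block, deriving each hash from the previous one."""
    if k <= 0 or len(text) < k:
        return []
    high = pow(BASE, k - 1, MOD)   # weight of a block's first character
    h = direct_hash(text[:k])      # only the first block is hashed from scratch
    hashes = [h]
    for i in range(1, len(text) - k + 1):
        leaving = ord(text[i - 1]) * high % MOD   # first character of the previous block
        entering = ord(text[i + k - 1])           # last character of the current block
        h = ((h - leaving) * BASE + entering) % MOD
        hashes.append(h)
    return hashes
```

Each update takes constant time regardless of k, so hashing all blocks of an n-character text costs O(n) rather than O(n·k). The rolling results agree with `direct_hash` applied to each block separately.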
B9. A file fingerprint check device, characterized by comprising:
An acquiring unit, configured to obtain, when a file to be checked is received, the load state information corresponding to each fingerprint server in the current distributed network;
A determining unit, configured to determine the fingerprint server whose load state information, as obtained by the acquiring unit, meets a preset condition;
An allocation unit, configured to distribute the file to be checked to the fingerprint server whose load state information meets the preset condition, so that the file fingerprint check can be carried out.
B10. The file fingerprint check device according to B9, characterized in that the load state information includes the load value corresponding to each fingerprint server,
The determining unit being specifically configured to determine the fingerprint server with the smallest load value in the current distributed network.
B11. The file fingerprint check device according to B9, characterized in that the load state information includes the load state corresponding to each fingerprint server,
The determining unit being specifically configured to determine a fingerprint server in the current distributed network whose load state is the not-fully-loaded state.
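The two selection rules of the determining unit (smallest load value in B10, not fully loaded in B11) can be sketched as follows. The `ServerLoad` record, its `capacity` field, and the function names are hypothetical; the text above does not say how load is measured or what counts as fully loaded.

```python
from dataclasses import dataclass


@dataclass
class ServerLoad:
    name: str
    load: int       # assumed load value, e.g. number of files queued
    capacity: int   # assumed load at which the server counts as fully loaded


def pick_min_load(servers):
    """B10-style rule: the server with the smallest load value, or None if there are none."""
    return min(servers, key=lambda s: s.load) if servers else None


def pick_not_full(servers):
    """B11-style rule: the first server that is not fully loaded, or None."""
    for server in servers:
        if server.load < server.capacity:
            return server
    return None
```

The two rules can select different servers for the same snapshot: given loads of 7, 3 and 10 against a capacity of 10 each, the minimum-load rule picks the second server, while the not-fully-loaded rule as sketched picks the first.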
B12. The file fingerprint check device according to B9, characterized in that the device further comprises:
A configuration unit, configured to configure multiple fingerprint servers in the distributed network.
B13. The file fingerprint check device according to B9, characterized in that the device further comprises:
An extraction unit, configured to extract the file content information from the file to be checked;
A preprocessing unit, configured to preprocess the file content information of the file to be checked extracted by the extraction unit;
The extraction unit being further configured to extract fingerprint information from the preprocessed file content information;
A detecting unit, configured to detect whether the similarity between the fingerprint information extracted by the extraction unit and the fingerprint information in a preset fingerprint library is greater than or equal to a preset threshold;
The determining unit being further configured to determine that the file to be checked is a content-similar file if the detecting unit detects that the similarity is greater than or equal to the preset threshold.
B14. The file fingerprint check device according to B13, characterized in that
The extraction unit is specifically configured to extract the fingerprint information from the preprocessed file content information by means of a preset Karp-Rabin function.
B15. The file fingerprint check device according to B14, characterized in that the extraction unit comprises:
A dividing module, configured to divide the preprocessed file content information by means of a preset K-gram function to obtain multiple fingerprint blocks;
A computing module, configured to calculate the hash value corresponding to each fingerprint block as the extracted fingerprint information.
B16. The file fingerprint check device according to B15, characterized in that
The computing module is specifically configured to calculate the hash value corresponding to each fingerprint block in the order of the blocks' positions in the file content information;
The computing module is further specifically configured to obtain the hash value corresponding to the previous fingerprint block when the hash value of a fingerprint block at any position other than the first is to be calculated;
The computing module is further specifically configured to calculate the hash value corresponding to the first character of the previous fingerprint block, and the hash value corresponding to the last character of the fingerprint block at the position other than the first;
The computing module is further specifically configured to calculate the difference between the hash value corresponding to the previous fingerprint block and the hash value corresponding to its first character, then add the hash value corresponding to the last character of the fingerprint block at the position other than the first, to obtain the hash value corresponding to that fingerprint block.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
It can be understood that related features of the above method and device may be cross-referenced. In addition, "first", "second" and the like in the above embodiments serve to distinguish the embodiments and do not indicate the relative merits of each embodiment.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided herein. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the file fingerprint check method and device according to embodiments of the invention. The invention may also be implemented as an apparatus or device program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.

Claims (10)

1. A file fingerprint check method, characterized by comprising:
When a file to be checked is received, obtaining the load state information corresponding to each fingerprint server in the current distributed network;
Determining the fingerprint server whose load state information meets a preset condition;
Distributing the file to be checked to the fingerprint server whose load state information meets the preset condition, so that the file fingerprint check can be carried out.
2. The file fingerprint check method according to claim 1, characterized in that the load state information includes the load value corresponding to each fingerprint server, and determining the fingerprint server whose load state information meets the preset condition comprises:
Determining the fingerprint server with the smallest load value in the current distributed network.
3. The file fingerprint check method according to claim 1, characterized in that the load state information includes the load state corresponding to each fingerprint server, and determining the fingerprint server whose load state information meets the preset condition comprises:
Determining a fingerprint server in the current distributed network whose load state is the not-fully-loaded state.
4. The file fingerprint check method according to claim 1, characterized in that, before the load state information corresponding to each fingerprint server in the current distributed network is obtained upon receiving the file to be checked, the method further comprises:
Configuring multiple fingerprint servers in the distributed network.
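The sequence in claims 1 and 4 (configure several fingerprint servers, read every server's load when a file arrives, then assign the file) can be sketched as a small in-process dispatcher. The `Dispatcher` class, its method names, and the use of queue length as the load value are illustrative assumptions; a real deployment would query the servers over the network.

```python
class Dispatcher:
    """Toy dispatcher: each configured server is modelled as an in-memory queue."""

    def __init__(self, server_names):
        # Claim 4: several fingerprint servers are configured before any file arrives.
        if len(server_names) < 2:
            raise ValueError("configure at least two fingerprint servers")
        self.queues = {name: [] for name in server_names}

    def load_state(self):
        """Load state information: here simply the number of files queued per server."""
        return {name: len(queue) for name, queue in self.queues.items()}

    def submit(self, file_id):
        """On receipt, read all loads, pick the least loaded server, and assign the file."""
        loads = self.load_state()
        target = min(loads, key=loads.get)
        self.queues[target].append(file_id)
        return target
```

Because the load is read afresh on every submission, consecutive files spread across the servers instead of piling up on one. Ties go to the server configured first, which is a property of this sketch rather than anything the claims require.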
5. The file fingerprint check method according to claim 1, characterized in that, after the file to be checked is distributed to the fingerprint server whose load state information meets the preset condition, the method further comprises:
Extracting the file content information from the file to be checked and preprocessing it;
Extracting fingerprint information from the preprocessed file content information;
Detecting whether the similarity between the extracted fingerprint information and the fingerprint information in a preset fingerprint library is greater than or equal to a preset threshold;
If the similarity is greater than or equal to the preset threshold, determining that the file to be checked is a content-similar file.
6. The file fingerprint check method according to claim 5, characterized in that extracting fingerprint information from the preprocessed file content information comprises:
Extracting the fingerprint information from the preprocessed file content information by means of a preset Karp-Rabin function.
7. The file fingerprint check method according to claim 6, characterized in that extracting the fingerprint information from the preprocessed file content information by means of the preset Karp-Rabin function comprises:
Dividing the preprocessed file content information by means of a preset K-gram function to obtain multiple fingerprint blocks;
Calculating the hash value corresponding to each fingerprint block as the extracted fingerprint information.
8. The file fingerprint check method according to claim 7, characterized in that calculating the hash value corresponding to each fingerprint block as the extracted fingerprint information comprises:
Calculating the hash value corresponding to each fingerprint block in the order of the blocks' positions in the file content information;
When the hash value of a fingerprint block at any position other than the first is to be calculated, obtaining the hash value corresponding to the previous fingerprint block;
Calculating the hash value corresponding to the first character of the previous fingerprint block, and the hash value corresponding to the last character of the fingerprint block at the position other than the first;
Calculating the difference between the hash value corresponding to the previous fingerprint block and the hash value corresponding to its first character, then adding the hash value corresponding to the last character of the fingerprint block at the position other than the first, to obtain the hash value corresponding to that fingerprint block.
9. A file fingerprint check device, characterized by comprising:
An acquiring unit, configured to obtain, when a file to be checked is received, the load state information corresponding to each fingerprint server in the current distributed network;
A determining unit, configured to determine the fingerprint server whose load state information, as obtained by the acquiring unit, meets a preset condition;
An allocation unit, configured to distribute the file to be checked to the fingerprint server whose load state information meets the preset condition, so that the file fingerprint check can be carried out.
10. The file fingerprint check device according to claim 9, characterized in that the load state information includes the load value corresponding to each fingerprint server,
The determining unit being specifically configured to determine the fingerprint server with the smallest load value in the current distributed network.
CN201510997054.4A 2015-12-25 2015-12-25 File fingerprint check method and device Pending CN105590068A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510997054.4A CN105590068A (en) 2015-12-25 2015-12-25 File fingerprint check method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510997054.4A CN105590068A (en) 2015-12-25 2015-12-25 File fingerprint check method and device

Publications (1)

Publication Number Publication Date
CN105590068A true CN105590068A (en) 2016-05-18

Family

ID=55929641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510997054.4A Pending CN105590068A (en) 2015-12-25 2015-12-25 File fingerprint check method and device

Country Status (1)

Country Link
CN (1) CN105590068A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102195950A (en) * 2010-03-16 2011-09-21 无锡指网生物识别科技有限公司 Fingerprint identification platform based on cloud computation
CN102542964A (en) * 2011-12-27 2012-07-04 深圳创维数字技术股份有限公司 Method and device for displaying characters on vacuum fluorescent display
CN103020174A (en) * 2012-11-28 2013-04-03 华为技术有限公司 Similarity analysis method, device and system
CN103634308A (en) * 2013-11-19 2014-03-12 北京奇虎科技有限公司 Safety detection method and device for instant messaging tool
CN103839246A (en) * 2012-11-20 2014-06-04 腾讯科技(深圳)有限公司 Method and apparatus for obtaining image contour line
CN104462157A (en) * 2013-09-24 2015-03-25 北大方正集团有限公司 Method and device for secondary structuralizing of text data
CN104699833A (en) * 2015-03-31 2015-06-10 北京奇艺世纪科技有限公司 Picture presentation method, picture storage method, picture presentation device and picture storage device
CN105094130A (en) * 2015-07-29 2015-11-25 广东省自动化研究所 AGV (Automatic Guided Vehicle) navigation method and device constructed by laser guidance map
CN105138918A (en) * 2015-09-01 2015-12-09 百度在线网络技术(北京)有限公司 Recognition method and device for secure file

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
朱扬勇等: "序列数据相似性查询技术研究综述", 《计算机研究与发展》 *
李香云等: "Winnowing算法在作业剽窃检测中的应用", 《安徽科技学院学报》 *
李香云等: "基于JSP的《C语言》作业在线提交批改系统设计与实现", 《安徽科技学院学报》 *
王柠等: "基于指纹和推导模型的泄密信息检测方案", 《燕山大学学报》 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160518