CN104317823A - Method for carrying out data detection by utilizing data fingerprints - Google Patents

Info

Publication number
CN104317823A
Authority
CN
China
Prior art keywords
data
cryptographic hash
fingerprint sequence
tested
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410515557.9A
Other languages
Chinese (zh)
Other versions
CN104317823B (en)
Inventor
刘水
丁世杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING HOLYSTONE TECHNOLOGY CO., LTD.
Original Assignee
BEIJING HELI SITENG TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING HELI SITENG TECHNOLOGY CO LTD
Priority to CN201410515557.9A
Publication of CN104317823A
Application granted
Publication of CN104317823B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/9014: Indexing; Data structures therefor; Storage structures; hash tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Collating Specific Patterns (AREA)
  • Storage Device Security (AREA)

Abstract

The invention relates to the field of data processing, and in particular to a method for data detection using data fingerprints. The method generates a fingerprint sequence from data to be detected according to a preset first algorithm, generates a corresponding fingerprint sequence from standard data using a second algorithm, and then determines whether the fingerprint sequence to be detected meets a preset requirement by comparing whether the hash values in the two sequences match. Because both fingerprint sequences used for comparison are hash value sequences, even a small difference between them shows up directly in the hash value sequences, that is, the generated hash values differ. The degree of matching between the data to be detected and the standard data can therefore still be detected even if the standard data has been modified, which overcomes the deficiency of the prior art.

Description

A method for data detection using data fingerprints
Technical field
The present invention relates to the field of data processing, and in particular to a method for data detection using data fingerprints.
Background
With the rapid development of information technology, the concept of "big data" has become widely known. (Big data, also called massive data, refers to data sets so large that current mainstream software tools cannot capture, manage, process, and organize them within a reasonable time into information that helps enterprises make better business decisions.) Big data brings with it the need to retrieve and compare massive amounts of data. Comparing massive data, or retrieving a certain amount of data, can only be done by computer; manual retrieval is practically impossible.
A common retrieval method takes the character string of the data to be detected (the character string to be detected) and searches for it in a database of stored data (the database contains stored character strings). The principle of retrieval is to check whether the character string of the data to be detected exists in the database of stored data. For example, if the database contains the 7 letters A-G and the character string of the data to be detected is BC, comparison shows that BC exists in the database.
Traditional data detection relies on string matching and regular expression matching. These detection methods have certain limitations: they can only detect string-type data, and they fail once words have simply been added to or removed from the data to be detected. For example, if the database contains the 7 letters A-G in order and the character string of the data to be detected is BEC, which merely inserts one letter into BC, the string can no longer be retrieved correctly.
In summary, with existing data detection methods, detection fails as soon as the character string to be detected differs even slightly from the stored character string.
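The limitation described above can be illustrated with a minimal sketch. The function name `naive_detect` and the variable `stored` are hypothetical and are not part of the patent; the data is the A-G example from the text.

```python
# Hypothetical illustration of exact string matching, as used by traditional detection.
stored = "ABCDEFG"  # stored character string: the 7 letters A-G in order


def naive_detect(candidate: str, database: str) -> bool:
    """Return True only if the candidate occurs verbatim in the database string."""
    return candidate in database


print(naive_detect("BC", stored))   # the exact fragment BC is found
print(naive_detect("BEC", stored))  # one inserted letter defeats the exact match
```

The second call fails even though BEC differs from BC by a single inserted letter, which is the weakness the fingerprint approach is meant to address.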
Summary of the invention
The object of the present invention is to provide a method for data detection using data fingerprints, and a device for data detection using data fingerprints, to solve the above problem.
An embodiment of the present invention provides a method for data detection using data fingerprints, comprising:
Obtaining data to be detected, and generating a fingerprint sequence to be detected from the data to be detected according to a preset first algorithm, the fingerprint sequence to be detected comprising hash values of multiple pieces of data to be detected;
Comparing whether the hash values of the multiple pieces of data to be detected in the fingerprint sequence to be detected match the hash values of multiple pieces of standard data in a standard fingerprint sequence, the standard fingerprint sequence being generated from standard data by a second algorithm and comprising hash values of multiple pieces of standard data;
If so, the data to be detected meets the preset requirement.
Preferably, comparing whether the hash values of the multiple pieces of data to be detected in the fingerprint sequence to be detected match the hash values of the multiple pieces of standard data in the standard fingerprint sequence comprises:
Looking up the hash value of the standard data corresponding to the hash value of each piece of data to be detected;
Obtaining the position offset corresponding to each hash value of the standard data that is found, the position offset identifying the position of each hash value of the standard data in an original database, the original database storing the standard fingerprint sequence;
Generating a position offset sequence, the order of the position offsets in the position offset sequence corresponding to the positions, in the fingerprint sequence to be detected, of the hash values of the standard data;
Comparing whether the difference between adjacent position offsets is less than a preset detection value.
Preferably, the method further comprises:
Dividing the standard data into multiple standard data fragments according to a preset fragment length;
Generating a standard fingerprint sequence fragment from each standard data fragment according to the second algorithm, each standard fingerprint sequence fragment comprising hash values of multiple pieces of standard data;
Combining the standard fingerprint sequence fragments into the standard fingerprint sequence.
Preferably, the first algorithm and the second algorithm are the same hash algorithm or the same mapping algorithm.
Preferably, the method further comprises:
Building the original database from the multiple standard fingerprint sequence fragments;
Building a runtime database from the original database, the primary key of the runtime database being the fingerprint data;
Recording the position offset, in the original database, of each hash value that serves as a primary key;
Looking up, in the runtime database, the position offsets corresponding to the hash values of the data to be detected, and generating the position offset sequence.
Preferably, the hash values of the multiple pieces of standard data corresponding to each hash value to be detected are looked up by:
Searching by primary key index.
Preferably, the detection precision of the data to be detected is obtained, and the detection value is determined according to the detection precision.
An embodiment of the present invention also provides a data detection device, comprising:
A first generation module, configured to obtain data to be detected and to generate a fingerprint sequence to be detected from the data to be detected according to a preset first algorithm, the fingerprint sequence to be detected comprising hash values of multiple pieces of data to be detected;
A comparison module, configured to compare whether the hash values of the multiple pieces of data to be detected in the fingerprint sequence to be detected match the hash values of multiple pieces of standard data in a standard fingerprint sequence, the standard fingerprint sequence being generated from standard data by a second algorithm and comprising hash values of multiple pieces of standard data;
An authentication module, configured to certify that the data to be detected meets the preset requirement if the result of the comparison module is yes.
Preferably, the comparison module comprises:
A lookup unit, configured to look up the hash value of the standard data corresponding to the hash value of each piece of data to be detected;
An acquisition unit, configured to obtain the position offset corresponding to each hash value of the standard data that is found, the position offset identifying the position of each hash value of the standard data in an original database, the original database storing the standard fingerprint sequence;
A generation unit, configured to generate a position offset sequence, the order of the position offsets in the position offset sequence corresponding to the positions, in the fingerprint sequence to be detected, of the hash values of the standard data;
A comparison unit, configured to compare whether the difference between adjacent position offsets is less than a preset detection value.
Preferably, the device further comprises:
A segmentation module, configured to divide the standard data into multiple standard data fragments according to a preset fragment length;
A second generation module, configured to generate a standard fingerprint sequence fragment from each standard data fragment according to the second algorithm, each standard fingerprint sequence fragment comprising hash values of multiple pieces of standard data;
A combination module, configured to combine the standard fingerprint sequence fragments into the standard fingerprint sequence.
In the prior art, when string data is detected, if the standard data in the database (the data that serves as the standard for detecting other data) has been modified, for example by adding characters between data items while keeping the original order, string matching can no longer be performed effectively, or matching by regular expression fails, and data detection fails. By contrast, the method for data detection using data fingerprints provided by the embodiments of the present invention generates a fingerprint sequence from the data to be detected according to a preset first algorithm, generates a corresponding fingerprint sequence from the standard data using a second algorithm, compares whether the hash values in the two sequences match, and thereby determines whether the fingerprint sequence to be detected meets the preset requirement. Because the first and second fingerprint sequences used for comparison are both hash value sequences, even a very small difference between the two shows up directly in the hash value sequences, that is, the generated hash values differ. Matching by hash value also makes it possible to check effectively whether the continuity of the hash values of the data to be detected matches the standard data items. Even if the standard data has been modified, the degree of matching between the data to be detected and the standard data can still be tested effectively, which overcomes the deficiency of the prior art.
Brief description of the drawings
Fig. 1 shows the basic flow of a method for data detection using data fingerprints according to an embodiment of the present invention;
Fig. 2 shows the detailed flow of the comparison step of the method for data detection using data fingerprints according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of data segmentation in the method for data detection using data fingerprints according to an embodiment of the present invention;
Fig. 4 shows a schematic diagram of hash value matching via the original database in the method for data detection using data fingerprints according to an embodiment of the present invention;
Fig. 5 shows a schematic diagram of hash value matching via the runtime database in the method for data detection using data fingerprints according to an embodiment of the present invention;
Fig. 6 shows the module connection diagram of the data detection device according to an embodiment of the present invention.
Detailed description
The present invention is described in further detail below through specific embodiments and with reference to the accompanying drawings. Embodiment 1 of the present invention provides a method for data detection using data fingerprints which, as shown in Fig. 1, comprises the following steps:
S101, obtaining data to be detected, and generating a fingerprint sequence to be detected from the data to be detected according to a preset first algorithm;
S102, comparing whether the hash values of the multiple pieces of data to be detected in the fingerprint sequence to be detected match the hash values of multiple pieces of standard data in a standard fingerprint sequence;
S103, certifying that the data to be detected meets the preset requirement.
In step S101, the fingerprint sequence to be detected comprises hash values of multiple pieces of data to be detected. The preset first algorithm may be a hash algorithm, a mapping algorithm, or any other algorithm that can produce hash values. The hash values in the fingerprint sequence to be detected are obtained by computation on the data to be detected; that is, information such as the character strings or codes of the data to be detected is converted into a mapping code, and this mapping code can be used to compare whether two sets of mapping codes are similar.
In step S102, the standard fingerprint sequence is generated from standard data by the second algorithm and comprises hash values of multiple pieces of standard data. Note that, so that the hash values of the data to be detected can be matched effectively against the hash values of the standard data, the first algorithm and the second algorithm should use the same type of algorithm or the same algorithmic formula; that is, the hash values produced by mapping identical information such as words, character strings, or codes, or the arrangement of those hash values, should be identical. Note also that the first algorithm and the second algorithm referred to here should be understood as the entire algorithm that produces the data used for comparison, this data being the basis for deciding whether two sets of data match. Specifically, the first and second algorithms may be, for example, a hash algorithm or a mapping algorithm; likewise, the generated hash value should be understood as a mapping value, hash value, or other value usable for comparison, with different data producing different mapping values.
In step S103, after the two sets of hash values have been compared, it can be confirmed whether they match, and hence whether the data to be detected meets the preset requirement. The preset requirement refers to a specific comparison requirement such as a precision requirement or a degree-of-overlap requirement. For example, it may require that more than 80% of the hash values in the fingerprint sequence generated from the data to be detected can be found in the sequence formed by the hash values of the standard data. As another example, it may require that the hash values in the fingerprint sequence to be detected are arranged in the same order as the hash values in the standard fingerprint sequence. As a further example, the two requirements may be combined to confirm whether the two sequences match, in other words whether the data to be detected meets the preset requirement. The standard data can be obtained in advance from a reliable data source.
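The 80% requirement mentioned for step S103 can be sketched as follows. This is only an illustration under stated assumptions: the function names `coverage` and `meets_requirement`, the integer hash values, and the default threshold parameter are hypothetical, and the patent does not prescribe this implementation.

```python
def coverage(candidate_hashes, standard_hashes):
    """Fraction of candidate hash values that also occur in the standard sequence."""
    if not candidate_hashes:
        return 0.0
    standard_set = set(standard_hashes)
    hits = sum(1 for h in candidate_hashes if h in standard_set)
    return hits / len(candidate_hashes)


def meets_requirement(candidate_hashes, standard_hashes, threshold=0.8):
    """True when more than `threshold` of the candidate hash values are found."""
    return coverage(candidate_hashes, standard_hashes) > threshold


# Hypothetical hash values standing in for a standard fingerprint sequence.
standard = [101, 102, 103, 104, 105, 106]
print(meets_requirement([101, 102, 103, 104, 105, 999], standard))  # 5 of 6 found
print(meets_requirement([101, 102, 998, 999], standard))            # 2 of 4 found
```

This sketch checks membership only; the ordering requirement also mentioned in the text would need the position information of the standard hash values.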
Specifically, as shown in Fig. 2, step S102 can be refined into the following steps:
S1021, looking up the hash value of the standard data corresponding to the hash value of each piece of data to be detected;
S1022, obtaining the position offset corresponding to each hash value of the standard data that is found, the position offset identifying the position of each hash value of the standard data in the original database;
S1023, generating a position offset sequence, the order of the position offsets in the position offset sequence corresponding to the positions, in the fingerprint sequence to be detected, of the hash values of the standard data;
S1024, comparing whether the difference between adjacent position offsets is less than a preset detection value.
In step S1021, the hash value of the standard data corresponding to the hash value of each piece of data to be detected must first be found. "Corresponding" here means that the values are identical, or that they are produced from the same original data by the same algorithm (hash algorithm or mapping algorithm). For example, if the number produced by the hash algorithm for the word "I" is 34567, then whether that word appears in the data to be detected or in the standard data, the hash algorithm always produces 34567; so by checking whether 34567 is present in the standard data, it can be confirmed whether the data to be detected is consistent with the standard data. On this basis, to increase the confidentiality or security of the data, after the hash values of the standard data have been computed, an encryption algorithm or another conversion method can be applied to them as an encryption operation or a secondary mapping. For example, if the hash value produced for a piece of text is 12345 and the conversion formula (X+1)*2 is used, where X is the hash value obtained, the result of the conversion is (12345+1)*2=24692. That is, if the hash value of a field of the data to be detected, after mapping or hashing, is 12345, then the number to search for in the hash value sequence of the standard data is 24692. The newly generated number 24692 may also be stored in another sequence or in a database container. By the same reasoning, an encryption algorithm achieves the same effect: using, for example, symmetric or asymmetric encryption, the hash values of the standard data are first encrypted; when data to be detected needs to be checked, the encrypted hash values are decrypted and then matched against the data to be detected. Alternatively, the data to be detected is encrypted in the same way, and matching is performed entirely on encrypted hash values. Note that different hash values should have different, that is unique, encrypted representations, to prevent false matches.
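The secondary mapping described above can be worked through in a few lines. The names `secondary_map`, `stored`, and `found_in_standard` are hypothetical; the formula (X+1)*2 and the values 12345 and 24692 come from the text.

```python
def secondary_map(x: int) -> int:
    """Conversion formula from the text: (X + 1) * 2."""
    return (x + 1) * 2


# Hypothetical hash values of the standard data, kept only in converted form.
standard_hashes = [12345, 34567]
stored = {secondary_map(h) for h in standard_hashes}


def found_in_standard(candidate_hash: int) -> bool:
    """Convert the candidate hash the same way, then look it up."""
    return secondary_map(candidate_hash) in stored


print(secondary_map(12345))      # 24692, as in the text
print(found_in_standard(12345))  # the converted value is present
```

Because (X+1)*2 maps different integers to different results, it satisfies the uniqueness condition stated in the text; a conversion that merged distinct hash values would introduce false matches.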
In step S1022, the position offset corresponding to each hash value of the standard data that is found is obtained. The position offset identifies the position of each hash value of the standard data in the original database, and the original database stores the standard fingerprint sequence.
Each hash value has a fixed position in the sequence. The position offset records the offset of a given hash value relative to a reference entry (usually the first entry), so the position of that hash value in the original database can be identified from the offset. In the hash value sequence, therefore, the position offset corresponding to each hash value represents the position of each hash value of the standard data in the original database. A sequence of position offsets can thus be obtained, which describes the position of each hash value within the sequence formed by the hash values of the standard data.
In step S1023, building on step S1022, the multiple position offsets obtained are formed into a position offset sequence. "Multiple" means that each hash value of the standard data corresponds to one position offset; since a piece of data is described by multiple hash values, multiple position offsets are produced accordingly. These position offsets are then formed into one sequence, the position offset sequence.
In step S1024, as shown in Fig. 4, it is compared whether the difference between adjacent position offsets is less than the preset detection value. As stated above, a position offset indicates the position of a hash value in the original database, so whether the position offsets are continuous indicates whether the order of a passage in the data to be detected is the same as in the standard data; if it is the same, the data to be detected can be considered to meet the requirement. Further, if in the data to be detected the position offset corresponding to each hash value stands in a progressive relation to the position offset corresponding to the previous or next hash value, in other words if the difference between them is 1 or some other small value, then the data order in the data to be detected is the same as, or close to, the data order of the standard data. It can then be determined that the two match, and hence that the data to be detected meets the preset requirement, which completes the data detection.
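Steps S1021 to S1024 can be sketched as below. This is a minimal illustration, not the patented implementation: the function names, the integer hash values, the use of list indices as position offsets, and the choice to take the first occurrence of a repeated hash value are all assumptions. The sketch also assumes offsets must increase, which reflects the "progressive relation" in the text.

```python
def offset_sequence(candidate_hashes, standard_hashes):
    """S1021-S1023: look up each candidate hash and collect its position offset.

    The position offset is taken to be the index of the first occurrence of the
    hash value in the standard sequence (an assumption of this sketch).
    Candidate hashes that do not occur in the standard sequence are skipped.
    """
    first_offset = {}
    for offset, h in enumerate(standard_hashes):
        first_offset.setdefault(h, offset)
    return [first_offset[h] for h in candidate_hashes if h in first_offset]


def adjacent_within(offsets, detection_value):
    """S1024: every adjacent difference is positive and below the detection value."""
    return all(0 < b - a < detection_value for a, b in zip(offsets, offsets[1:]))


# Hypothetical hash values standing in for a standard fingerprint sequence.
standard = [10, 11, 12, 13, 14, 15]
offsets = offset_sequence([12, 13, 14], standard)
print(offsets)                                      # [2, 3, 4]
print(adjacent_within(offsets, detection_value=2))  # consecutive offsets pass
print(adjacent_within(offset_sequence([12, 15], standard), detection_value=2))
```

With a detection value of 2, only strictly consecutive offsets pass; a larger detection value tolerates small insertions in the standard data.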
To generate fingerprints from the standard data, the same algorithm as the first algorithm must be used. In addition, for the sake of retrieval precision and accuracy, the standard data can be divided into multiple fragments, each of which can serve as a reference during retrieval. The specific steps are as follows:
Dividing the standard data into multiple standard data fragments according to a preset fragment length;
Generating a standard fingerprint sequence fragment from each standard data fragment according to the second algorithm, each standard fingerprint sequence fragment comprising hash values of multiple pieces of standard data;
Combining the standard fingerprint sequence fragments into the standard fingerprint sequence.
Specifically, as shown in Fig. 3, the fragment length is set to n (each fragment has length n after segmentation). For data of length m (m >= n), a fragment of length n is taken starting at each successive data unit, finally yielding m-n+1 fragments.
For each fragment of length n, its hash or mapping value is computed, finally yielding a hash value sequence h_1, h_2, ..., h_{m-n+1}. These sequence fragments are then combined into a complete standard fingerprint sequence. As shown in the figure, segmentation can be carried out with overlapping fragments or with non-overlapping fragments. Overlapping segmentation can improve detection precision.
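The segmentation described above can be sketched as a sliding window. The function names are hypothetical, and the choice of SHA-256 truncated to 16 hex digits is only an assumption for illustration; the patent does not name a specific hash function. The sketch assumes n >= 1.

```python
import hashlib


def split_fragments(data: str, n: int, overlapping: bool = True) -> list:
    """Cut data into fragments of length n (n >= 1).

    Overlapping mode starts a fragment at every data unit, giving m-n+1
    fragments for data of length m; non-overlapping mode steps by n.
    """
    step = 1 if overlapping else n
    return [data[i:i + n] for i in range(0, len(data) - n + 1, step)]


def fragment_hash(fragment: str) -> str:
    """Hash one fragment (SHA-256 is an assumed stand-in for the second algorithm)."""
    return hashlib.sha256(fragment.encode("utf-8")).hexdigest()[:16]


def standard_fingerprint(data: str, n: int) -> list:
    """Hash value sequence h_1 ... h_{m-n+1} for overlapping fragments."""
    return [fragment_hash(f) for f in split_fragments(data, n)]


fragments = split_fragments("ABCDEFG", 3)
print(fragments)                                       # ['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
print(split_fragments("ABCDEFG", 3, overlapping=False))  # ['ABC', 'DEF']
print(len(standard_fingerprint("ABCDEFG", 3)))         # 5, i.e. m-n+1 with m=7, n=3
```

In overlapping mode every position starts a fragment, so a candidate passage produces the same fragments as the standard data wherever it begins, which is consistent with the text's remark that overlapping segmentation improves precision.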
Note that, for the method for data detection using data fingerprints to work properly, the first algorithm and the second algorithm are the same hash algorithm or the same mapping algorithm, which guarantees that identical data yields identical hash values or mapping values after the algorithm is applied.
To look up, in the original database, the hash values of the standard data corresponding to the hash values of the data to be detected, a traditional method such as binary search can be used; to improve lookup efficiency, however, lookup by primary key can be adopted. The specific procedure, as shown in Fig. 5, is as follows:
Building the original database from the multiple standard fingerprint sequence fragments;
Building a runtime database from the original database, the primary key of the runtime database being the fingerprint data;
Recording the position offset, in the original database, of each hash value that serves as a primary key;
Looking up, in the runtime database, the position offsets corresponding to the hash values of the data to be detected, and generating the position offset sequence.
Once a runtime database with the fingerprint data as its primary key has been built, retrieval can be performed by primary key lookup, which greatly increases retrieval speed. After the runtime database has been built, the position offsets corresponding to the hash values of the data to be detected are looked up in the runtime database, for speed. That is, the hash values of the multiple pieces of standard data corresponding to each hash value to be detected are looked up by:
Searching by primary key index.
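The runtime database can be approximated in memory by a dictionary keyed on the fingerprint. This is a sketch under assumptions: the function names are hypothetical, a Python dict stands in for a database table with a primary key index, and list indices stand in for position offsets in the original database.

```python
from collections import defaultdict


def build_runtime_db(original_db):
    """Map each hash value (the key) to its position offsets in the original database."""
    runtime = defaultdict(list)
    for offset, h in enumerate(original_db):
        runtime[h].append(offset)
    return dict(runtime)


def lookup_offsets(runtime_db, candidate_hashes):
    """Build the position offset sequence by key lookup.

    Takes the first recorded offset for each hash value (an assumption of this
    sketch) and skips hash values that are absent from the runtime database.
    """
    return [runtime_db[h][0] for h in candidate_hashes if h in runtime_db]


# Hypothetical original database: the standard fingerprint sequence in order.
original = [7001, 7002, 7003, 7002]
runtime = build_runtime_db(original)
print(runtime[7002])                                # [1, 3]
print(lookup_offsets(runtime, [7003, 9999, 7001]))  # [2, 0]
```

Each lookup is a single keyed access rather than a search through the ordered sequence, which is the efficiency gain the text attributes to the primary key.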
To adapt the method for data detection using data fingerprints to a particular use, the detection precision of the data to be detected can first be obtained, and the detection value determined according to that precision.
That is, if in the fingerprint sequence to be detected the difference between the position offsets corresponding to adjacent hash values is too large, the data order of the data to be detected differs to some extent from the data order of the standard data. Note, however, that simply reducing the detection value raises the detection precision, but higher precision also causes a certain proportion of false reports. A data segment in the data to be detected may add or modify only a little content (such as unimportant function words) relative to the corresponding segment of the standard data, yet the raised precision causes the data to be detected to fail the preset requirement, which is unreasonable. The detection value therefore needs to be decided in light of the specific detection precision required for the data to be detected.
Note that, in addition to the detection value, whether the data to be detected matches the standard data can also be judged by whether the differences of X consecutive position offsets exceed the preset detection value. Suppose the position offsets of some data to be detected are: 123, 124, 125, 126, 130, 131, 132. The difference between 126 and 130 is 4, but after 130 the differences return to normal. This shows that the standard data has slightly more content than the data to be detected, namely the content between 126 and 130. Only a small part differs and all the other position offsets are continuous, so the data to be detected can be understood to match the standard data. It is of course also possible that a hash value of the data to be detected does not exist among the hash values of the standard data; such a hash value is usually skipped.
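One possible reading of the "X consecutive position offsets" rule is sketched below, using the offsets from the example in the text. The function names and the interpretation (a mismatch is declared only when X adjacent differences in a row fail the detection value) are assumptions; the patent text does not spell out the exact rule.

```python
def longest_gap_run(offsets, detection_value):
    """Length of the longest run of consecutive adjacent differences that are
    not positive and below the detection value."""
    longest = run = 0
    for a, b in zip(offsets, offsets[1:]):
        if 0 < b - a < detection_value:
            run = 0
        else:
            run += 1
            longest = max(longest, run)
    return longest


def matches(offsets, detection_value, x):
    """Match unless x or more consecutive differences fail the detection value."""
    return longest_gap_run(offsets, detection_value) < x


offsets = [123, 124, 125, 126, 130, 131, 132]
print(longest_gap_run(offsets, detection_value=2))  # 1: only the step 126 -> 130
print(matches(offsets, detection_value=2, x=2))     # a single gap is tolerated
print(matches([1, 5, 9, 13], detection_value=2, x=2))
```

With X set to 2, the isolated jump from 126 to 130 is tolerated, while a sequence whose differences are all too large is rejected.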
The method for data detection using data fingerprints provided by the present invention generates a fingerprint sequence from the data to be detected according to a preset first algorithm, generates a corresponding fingerprint sequence from the standard data using a second algorithm, compares whether the hash values in the two sequences match, and thereby determines whether the fingerprint sequence to be detected meets the preset requirement. Because the first and second fingerprint sequences used for comparison are both hash value sequences, even a very small difference between the two shows up directly in the hash value sequences, that is, the generated hash values differ. Matching by hash value also makes it possible to check effectively whether the continuity of the hash values of the data to be detected matches the standard data items, so the degree of matching between the data to be detected and the standard data can still be tested effectively even if the standard data has been modified. In addition, because a runtime database is built whose primary key is the fingerprint data, retrieval by the hash values of the data to be detected is faster. Whether the data to be detected matches the standard data can further be judged by whether the differences of X consecutive position offsets exceed a preset value. The deficiency of the prior art is thereby better overcome.
The embodiment of the present invention 2 provides data detection device, as shown in Figure 6, comprising:
First generation module 301, for obtaining data to be tested, and according to the first algorithm preset, data to be tested are generated fingerprint sequence to be detected, fingerprint sequence to be detected comprises the cryptographic hash of multiple data to be tested;
Contrast module 302, whether the cryptographic hash for contrasting the multiple normal datas in the cryptographic hash of the multiple data to be tested in fingerprint sequence to be detected and standard fingerprint sequence matches, standard fingerprint sequence is generated through the second algorithm by normal data, and standard fingerprint sequence comprises the cryptographic hash of multiple normal data;
Authentication module 303, if contrast module 302 is judged as YES, then meets default requirement for certification data to be tested.
Contrast module 302 comprises:
Search unit, for searching the cryptographic hash of the normal data corresponding with the cryptographic hash of each data to be tested;
Acquiring unit, for obtain each normal data found cryptographic hash corresponding to position offset, position offset is for identifying the position of cryptographic hash in raw data base of each normal data, and raw data base is used for storage standard fingerprint sequence;
Generation unit, for generating position offset sequence, the order of position offset in position offset sequence is corresponding with the position of cryptographic hash in fingerprint sequence to be detected of normal data;
Contrast unit, whether the difference for the adjacent position offset of comparison is less than default detection numerical value.
A segmentation module, configured to divide the standard data into multiple standard data fragments according to a preset fragment length;
A second generation module, configured to generate a standard fingerprint sequence fragment from each standard data fragment according to the second algorithm, where each standard fingerprint sequence fragment comprises hash values of multiple pieces of standard data;
A combination module, configured to combine the standard fingerprint sequence fragments into the standard fingerprint sequence.
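A minimal Python sketch of the segmentation, second generation, and combination modules is given below. The use of SHA-256 as the second algorithm, the fixed chunk size inside each fragment, and the function names are assumptions for illustration only; the text does not specify them.

```python
import hashlib
from typing import List


def split_into_fragments(standard_data: bytes, fragment_length: int) -> List[bytes]:
    """Segmentation module: cut the standard data into fragments of a preset length."""
    return [
        standard_data[i:i + fragment_length]
        for i in range(0, len(standard_data), fragment_length)
    ]


def fragment_fingerprints(fragment: bytes, chunk_size: int) -> List[str]:
    """Second generation module: hash each chunk of one fragment (SHA-256 assumed)."""
    return [
        hashlib.sha256(fragment[i:i + chunk_size]).hexdigest()
        for i in range(0, len(fragment), chunk_size)
    ]


def standard_fingerprint_sequence(
    standard_data: bytes, fragment_length: int, chunk_size: int
) -> List[str]:
    """Combination module: concatenate the fragment sequences in fragment order."""
    sequence: List[str] = []
    for fragment in split_into_fragments(standard_data, fragment_length):
        sequence.extend(fragment_fingerprints(fragment, chunk_size))
    return sequence
```

Because each fragment is fingerprinted independently, the fragments could also be processed in parallel and combined afterwards, provided the fragment order is preserved.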
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing describes only the preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (7)

1. A method for carrying out data detection by utilizing data fingerprints, characterized by comprising:
Obtaining data to be tested, and generating a fingerprint sequence to be detected from the data to be tested according to a preset first algorithm, the fingerprint sequence to be detected comprising hash values of multiple pieces of data to be tested;
Comparing whether the hash values of the multiple pieces of data to be tested in the fingerprint sequence to be detected match the hash values of multiple pieces of standard data in a standard fingerprint sequence, the standard fingerprint sequence being generated from standard data by a second algorithm and comprising the hash values of the multiple pieces of standard data;
If so, determining that the data to be tested meets a preset requirement.
2. The method according to claim 1, characterized in that comparing whether the hash values of the multiple pieces of data to be tested in the fingerprint sequence to be detected match the hash values of the multiple pieces of standard data in the standard fingerprint sequence comprises:
Looking up the hash value of the standard data corresponding to the hash value of each piece of data to be tested;
Obtaining the position offset corresponding to each hash value of the standard data that is found, the position offset identifying the position of that hash value in an original database, and the original database storing the standard fingerprint sequence;
Generating a position offset sequence, the order of the position offsets in the position offset sequence corresponding to the positions, in the fingerprint sequence to be detected, of the hash values of the standard data;
Checking whether the difference between adjacent position offsets is less than a preset detection value.
3. The method according to claim 2, characterized by further comprising:
Dividing the standard data into multiple standard data fragments according to a preset fragment length;
Generating a standard fingerprint sequence fragment from each standard data fragment according to the second algorithm, each standard fingerprint sequence fragment comprising hash values of multiple pieces of standard data;
Combining the standard fingerprint sequence fragments into the standard fingerprint sequence.
4. The method according to claim 1, characterized in that the first algorithm and the second algorithm are the same hash algorithm or mapping algorithm.
5. The method according to claim 3, characterized by further comprising:
Building an original database from the multiple standard fingerprint sequence fragments;
Building a runtime database from the original database, the primary key of the runtime database being the fingerprint data;
Recording, for the primary key of each hash value, its position offset in the original database;
Looking up, in the runtime database, the position offsets corresponding to the hash values of the data to be tested, and generating the position offset sequence.
6. The method according to claim 5, characterized in that the method for looking up the hash values of the multiple pieces of standard data corresponding to each hash value to be detected is:
Looking up by means of a primary key index.
7. The method according to claim 2, characterized in that a detection accuracy for the data to be tested is obtained, and the detection value is determined according to the detection accuracy.
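The runtime database of claims 5 and 6, keyed by fingerprint and queried through the primary key index, might be sketched as follows. This is an illustration under assumptions, not the claimed implementation: SQLite (Python's standard sqlite3 module) stands in for the unspecified database, the table and column names are hypothetical, list indices serve as position offsets, and a repeated hash value keeps its first offset.

```python
import sqlite3
from typing import List, Optional


def build_runtime_db(standard_fingerprints: List[str]) -> sqlite3.Connection:
    """Create an in-memory runtime database whose primary key is the fingerprint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runtime ("
        "fingerprint TEXT PRIMARY KEY, "
        "position_offset INTEGER NOT NULL)"
    )
    # INSERT OR IGNORE keeps the first offset when a hash value repeats (an assumption).
    conn.executemany(
        "INSERT OR IGNORE INTO runtime (fingerprint, position_offset) VALUES (?, ?)",
        [(h, offset) for offset, h in enumerate(standard_fingerprints)],
    )
    conn.commit()
    return conn


def lookup_offsets(conn: sqlite3.Connection, detected: List[str]) -> List[Optional[int]]:
    """Look up each detected hash value by primary key; None marks a miss."""
    offsets: List[Optional[int]] = []
    for h in detected:
        row = conn.execute(
            "SELECT position_offset FROM runtime WHERE fingerprint = ?", (h,)
        ).fetchone()
        offsets.append(row[0] if row else None)
    return offsets
```

The returned list is the position offset sequence in the order of the fingerprint sequence to be detected; how the detection value of claim 7 is derived from the detection accuracy is not specified in the text and is therefore not modelled here.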
CN201410515557.9A 2014-09-30 2014-09-30 Method for carrying out data detection by utilizing data fingerprints Active CN104317823B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410515557.9A CN104317823B (en) 2014-09-30 2014-09-30 Method for carrying out data detection by utilizing data fingerprints

Publications (2)

Publication Number Publication Date
CN104317823A (en) 2015-01-28
CN104317823B (en) 2016-03-16

Family

ID=52373055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410515557.9A Active CN104317823B (en) 2014-09-30 2014-09-30 Method for carrying out data detection by utilizing data fingerprints

Country Status (1)

Country Link
CN (1) CN104317823B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101882216A (en) * 2009-05-08 2010-11-10 成都市华为赛门铁克科技有限公司 Method, device and electronic equipment for structuring data fingerprint
CN101882141A (en) * 2009-05-08 2010-11-10 北京众志和达信息技术有限公司 Method and system for implementing repeated data deletion
CN103870514A (en) * 2012-12-18 2014-06-18 华为技术有限公司 Repeating data deleting method and device
CN103970744A (en) * 2013-01-25 2014-08-06 华中科技大学 Extendible repeated data detection method
CN103473278A (en) * 2013-08-28 2013-12-25 苏州天永备网络科技有限公司 Repeating data processing technology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
周斌 等: "布隆过滤器在重复数据删除中的应用", 《电脑知识与技术》 *
马晓旭 等: "基于重复数据删除的快速文件归档方法", 《四川大学学报( 工程科学版)》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608205A (en) * 2015-12-25 2016-05-25 北京奇虎科技有限公司 Fingerprint verification method and device for structural data
CN105608205B (en) * 2015-12-25 2019-05-14 北京奇虎科技有限公司 The finger-mark check method and device of structural data
CN106060026A (en) * 2016-05-24 2016-10-26 杭州华三通信技术有限公司 Information detection method and information detection device
CN106101060A (en) * 2016-05-24 2016-11-09 杭州华三通信技术有限公司 A kind of information detecting method and device
CN106060026B (en) * 2016-05-24 2020-05-22 新华三技术有限公司 Information detection method and device
CN108400996A (en) * 2017-02-04 2018-08-14 中兴通讯股份有限公司 A kind of sending method of data, device and data gateway
CN108062399A (en) * 2017-12-21 2018-05-22 新华三大数据技术有限公司 Data processing method and device
CN111026740A (en) * 2019-12-03 2020-04-17 厦门市美亚柏科信息股份有限公司 Data reconciliation method, system and data system based on data fingerprints
CN111026740B (en) * 2019-12-03 2022-07-12 厦门市美亚柏科信息股份有限公司 Data reconciliation method and system based on data fingerprints
CN111598576A (en) * 2020-05-22 2020-08-28 支付宝(杭州)信息技术有限公司 Privacy-protecting image information processing method and device

Also Published As

Publication number Publication date
CN104317823B (en) 2016-03-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: BEIJING AIXIU XIN AN TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: BEIJING HELI SITENG TECHNOLOGY CO., LTD.

Effective date: 20150514

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20150514

Address after: 100038 Beijing City, Haidian District No. 2 North cellular Zhongsheng Building 9 to 10 layer 915A

Applicant after: BEIJING HOLYSTONE TECHNOLOGY CO., LTD.

Address before: 101407 Beijing city Huairou District Yanqi Economic Development Zone No. 888 floor 1 South No. 1

Applicant before: Beijing Heli Siteng Technology Co.,Ltd.

C14 Grant of patent or utility model
GR01 Patent grant