Summary of the invention
The object of the present invention is to provide a method for detecting data using data fingerprints and a device for detecting data using data fingerprints, so as to solve the above problems.
An embodiment of the present invention provides a method for detecting data using data fingerprints, comprising:
Obtaining data to be detected, and generating a fingerprint sequence to be detected from the data to be detected according to a preset first algorithm, the fingerprint sequence to be detected comprising multiple hash values of the data to be detected;
Comparing whether the multiple hash values of the data to be detected in the fingerprint sequence to be detected match multiple hash values of standard data in a standard fingerprint sequence, the standard fingerprint sequence being generated from the standard data by a second algorithm and comprising multiple hash values of the standard data;
If so, the data to be detected meets a preset requirement.
Preferably, comparing whether the multiple hash values of the data to be detected in the fingerprint sequence to be detected match the multiple hash values of the standard data in the standard fingerprint sequence comprises:
Looking up the hash value of the standard data that corresponds to each hash value of the data to be detected;
Obtaining the position offset corresponding to each hash value of the standard data that is found, the position offset identifying the position of that hash value in an original database, and the original database being used to store the standard fingerprint sequence;
Generating a position offset sequence, the order of the position offsets in the position offset sequence corresponding to the positions of the matching hash values in the fingerprint sequence to be detected;
Comparing whether the difference between adjacent position offsets is less than a preset detection value.
Preferably, the method further comprises:
Dividing the standard data into multiple standard data fragments according to a preset fragment length;
Generating a standard fingerprint sequence fragment from each standard data fragment according to the second algorithm, the standard fingerprint sequence fragment comprising multiple hash values of the standard data;
Combining the standard fingerprint sequence fragments into the standard fingerprint sequence.
Preferably, the first algorithm and the second algorithm are the same hash algorithm or the same mapping algorithm.
Preferably, the method further comprises:
Establishing the original database from the multiple standard fingerprint sequence fragments;
Establishing a runtime database from the original database, the primary key of the runtime database being the fingerprint data;
Recording the position offset, in the original database, of the hash value that serves as each primary key;
Looking up, in the runtime database, the position offsets corresponding to the hash values of the data to be detected, and generating the position offset sequence.
Preferably, the hash values of the standard data corresponding to each hash value to be detected are looked up as follows:
The lookup is performed by primary key index.
Preferably, the detection precision required for the data to be detected is obtained, and the detection value is determined according to the detection precision.
An embodiment of the present invention further provides a data detection device, comprising:
A first generation module, configured to obtain data to be detected and to generate a fingerprint sequence to be detected from the data to be detected according to a preset first algorithm, the fingerprint sequence to be detected comprising multiple hash values of the data to be detected;
A comparison module, configured to compare whether the multiple hash values of the data to be detected in the fingerprint sequence to be detected match multiple hash values of standard data in a standard fingerprint sequence, the standard fingerprint sequence being generated from the standard data by a second algorithm and comprising multiple hash values of the standard data;
An authentication module, configured to confirm that the data to be detected meets a preset requirement if the comparison module determines that the hash values match.
Preferably, the comparison module comprises:
A lookup unit, configured to look up the hash value of the standard data that corresponds to each hash value of the data to be detected;
An acquiring unit, configured to obtain the position offset corresponding to each hash value of the standard data that is found, the position offset identifying the position of that hash value in an original database, and the original database being used to store the standard fingerprint sequence;
A generation unit, configured to generate a position offset sequence, the order of the position offsets in the position offset sequence corresponding to the positions of the matching hash values in the fingerprint sequence to be detected;
A comparison unit, configured to compare whether the difference between adjacent position offsets is less than a preset detection value.
Preferably, the device further comprises:
A segmentation module, configured to divide the standard data into multiple standard data fragments according to a preset fragment length;
A second generation module, configured to generate a standard fingerprint sequence fragment from each standard data fragment according to the second algorithm, the standard fingerprint sequence fragment comprising multiple hash values of the standard data;
A combination module, configured to combine the standard fingerprint sequence fragments into the standard fingerprint sequence.
In the prior art, when string data is detected, modifying the standard data in the database (the data that serves as the standard against which other data is detected), for example by inserting further characters between data items while keeping the original data order, can make string matching ineffective or make regular-expression matching fail, so that data detection fails. In contrast, the method for detecting data using data fingerprints provided by the embodiments of the present invention generates a fingerprint sequence to be detected from the data to be detected according to a preset first algorithm, generates a standard fingerprint sequence from the standard data using a second algorithm, and compares whether the hash values in the two sequences match, thereby confirming whether the fingerprint sequence to be detected meets the preset requirement. Because both fingerprint sequences being compared are sequences of hash values, even a very small difference between the two shows up directly in the hash value sequences, in that the hash values produced are not identical. Matching by hash value also makes it possible to check effectively whether the continuity of the hash values of the data to be detected matches that of the standard data items. Even if the standard data has been modified, the degree of matching between the data to be detected and the standard data can still be measured effectively, which overcomes the deficiencies of the prior art.
Embodiment
The present invention is described in further detail below through specific embodiments and with reference to the accompanying drawings. Embodiment 1 of the present invention provides a method for detecting data using data fingerprints which, as shown in Figure 1, comprises the following steps:
S101, obtaining data to be detected, and generating a fingerprint sequence to be detected from the data to be detected according to a preset first algorithm;
S102, comparing whether the multiple hash values of the data to be detected in the fingerprint sequence to be detected match the multiple hash values of standard data in a standard fingerprint sequence;
S103, if so, confirming that the data to be detected meets a preset requirement.
In step S101, the fingerprint sequence to be detected comprises multiple hash values of the data to be detected. The preset first algorithm may be a hash algorithm, a mapping algorithm, or any other algorithm that can produce hash values. The hash values in the fingerprint sequence to be detected are obtained by computation on the data to be detected; that is, they may be mapping codes obtained by transforming information such as the character strings or codes of the data to be detected, and such mapping codes can be used to compare whether two groups of mapping codes are similar.
In step S102, the standard fingerprint sequence is generated from the standard data by the second algorithm, and the standard fingerprint sequence comprises multiple hash values of the standard data. It should be noted that, in order for the hash values of the data to be detected to be matched effectively against the hash values of the standard data, the first algorithm and the second algorithm should use the same algorithm type or the same algorithmic formula; that is, for the same information, such as the same words, character strings, or codes, the hash values or arrangements they produce should be identical. It should also be noted that the first algorithm and the second algorithm mentioned here are to be understood as the complete algorithms that produce the data used for comparison, and the data used for comparison is the basis for deciding whether two groups of data match. Specifically, the first algorithm and the second algorithm may be, for example, a hash algorithm or a mapping algorithm. Likewise, the hash values produced are to be understood as mapping values, hash values, or other values that can serve for comparison, with different data producing different mapping values.
In step S103, once the two groups of hash values have been compared, it can be confirmed whether they match, and it can then be judged whether the data to be detected meets the preset requirement. The preset requirement refers to a concrete comparison requirement such as a precision requirement or an overlap requirement. For example, it may be required that more than 80% of the hash values in the fingerprint sequence produced from the data to be detected can be found in the sequence formed by the hash values of the standard data. As another example, it may be checked whether the hash values in the fingerprint sequence to be detected are arranged in the same order as the hash values in the standard fingerprint sequence. As a further example, the two criteria above may be combined to confirm whether the two match, in other words whether the data to be detected meets the preset requirement. The standard data may be obtained in advance from a reliable data source.
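The 80% criterion mentioned above can be sketched as follows. This is a minimal illustration only: the function names `match_ratio` and `meets_requirement`, and the default threshold of 0.8, are assumptions made for the example and are not prescribed by the method.

```python
def match_ratio(detected_hashes, standard_hashes):
    """Fraction of hash values to be detected that also occur in the standard sequence."""
    if not detected_hashes:
        return 0.0
    standard_set = set(standard_hashes)  # set membership keeps each lookup cheap
    hits = sum(1 for h in detected_hashes if h in standard_set)
    return hits / len(detected_hashes)


def meets_requirement(detected_hashes, standard_hashes, min_ratio=0.8):
    """Hypothetical preset requirement: at least min_ratio of the hash values are found."""
    return match_ratio(detected_hashes, standard_hashes) >= min_ratio
```

Whether the boundary value itself counts as passing ("more than" versus "at least") is a choice left to the implementer; the sketch treats it as passing.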
Specifically, as shown in Figure 2, step S102 can be refined into the following steps:
S1021, looking up the hash value of the standard data that corresponds to each hash value of the data to be detected;
S1022, obtaining the position offset corresponding to each hash value of the standard data that is found, the position offset identifying the position of that hash value in the original database;
S1023, generating a position offset sequence, the order of the position offsets in the position offset sequence corresponding to the positions of the matching hash values in the fingerprint sequence to be detected;
S1024, comparing whether the difference between adjacent position offsets is less than a preset detection value.
In step S1021, the hash value of the standard data that corresponds to each hash value of the data to be detected must first be found. "Corresponding" here means that the values are identical, or that they are produced from the same original data by the same algorithm (a hash algorithm or a mapping algorithm). For example, if the hash algorithm produces 34567 for a given word (say, a two-character word), then whether that word appears in the data to be detected or in the standard data, the hash algorithm will generate 34567 for it; by looking up whether 34567 exists in the standard data, it can therefore be confirmed whether the data to be detected is consistent with the standard data. On this basis, in order to increase the confidentiality or security of the data, the hash values of the standard data may, once calculated, be further processed by an encryption algorithm, another conversion method, or a secondary mapping. For example, suppose the hash value produced for a certain passage of text is 12345 and the conversion formula is (X+1)*2, where X is the hash value obtained; the result of the conversion is then (12345+1)*2=24692. In other words, if the hash value obtained for a certain field of the data to be detected after the mapping or hash algorithm is 12345, then the value that should be looked up in the hash value sequence of the standard data is 24692. The newly produced value 24692 may also be stored in another sequence or in a database storage container. For the same reason, an encryption algorithm can achieve the same effect. For example, using symmetric or asymmetric encryption, the hash values of the standard data are first encrypted; when data to be detected needs to be checked, the encrypted hash values are decrypted and then matched against the data to be detected. Alternatively, the data to be detected is encrypted in the same way and the matching is performed entirely on encrypted hash values. It should be noted that different hash values should have different encrypted representations, that is, each encrypted representation should be unique, so as to prevent false matches.
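The secondary-mapping example above, with the conversion formula (X+1)*2, can be sketched as follows. The function and variable names are illustrative assumptions; only the formula and the values 12345 and 24692 come from the text.

```python
def secondary_map(hash_value):
    """Conversion formula from the example: (X + 1) * 2."""
    return (hash_value + 1) * 2


# Hypothetical stored standard sequence: only the converted values are kept.
standard_hashes = [12345, 777, 4242]
stored_values = {secondary_map(h) for h in standard_hashes}


def is_present(detected_hash):
    """Apply the same conversion to a hash value to be detected before looking it up."""
    return secondary_map(detected_hash) in stored_values
```

Because the formula is one-to-one on integers, different hash values keep different converted values, which is the uniqueness property the text asks for.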
In step S1022, the position offset corresponding to each hash value of the standard data that is found is obtained. The position offset identifies the position of that hash value in the original database, and the original database is used to store the standard fingerprint sequence.
Each hash value has a fixed position in the sequence. The position offset records the offset of a given hash value relative to a reference value (generally the first value), and this offset identifies where that value is located in the original database. In the hash value sequence formed by this group of values, the position offset corresponding to each hash value therefore represents the position of each hash value of the standard data in the original database. A sequence of position offsets can thus be obtained, which describes the position of each hash value within the sequence formed by the hash values of the standard data.
In step S1023, on the basis of step S1022, the multiple position offsets obtained are formed into a position offset sequence. "Multiple" here means that each hash value of the standard data corresponds to one position offset; since a piece of data is described by multiple hash values, multiple position offsets are produced accordingly. These position offsets are then formed into one sequence, namely the position offset sequence.
In step S1024, as shown in Figure 4, it is compared whether the difference between adjacent position offsets is less than the preset detection value. As described above, a position offset indicates the position of a hash value in the original database, so whether the position offsets are consecutive indicates whether the order of a given passage of text in the data to be detected is the same as in the standard data; if it is the same, the data to be detected can be considered to meet the requirement. Further, if the position offset corresponding to each hash value in the data to be detected stands in a progressive relation to the position offset of the previous or next hash value, in other words if the difference from the position offset of the previous or next hash value is 1 or another small value, then the data sequence in the data to be detected is identical, or at least close, to the data sequence of the standard data. It can then be determined that the two match, and hence that the data to be detected meets the preset requirement, which completes the data detection.
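Steps S1021 to S1024 can be sketched together as follows. This is a simplified illustration under stated assumptions: the original database is modelled as a plain list, the offset of a hash value is taken to be its index in that list, the first occurrence wins when a hash value repeats, and the absolute difference between adjacent offsets is compared. None of these choices is prescribed by the method, and all function names are hypothetical.

```python
def build_offset_table(standard_hashes):
    """Map each standard hash value to its position offset (index of first occurrence)."""
    table = {}
    for offset, h in enumerate(standard_hashes):
        table.setdefault(h, offset)
    return table


def position_offset_sequence(detected_hashes, offset_table):
    """S1021-S1023: look up each detected hash value and collect the offsets in detection order.
    Hash values absent from the standard data are skipped."""
    return [offset_table[h] for h in detected_hashes if h in offset_table]


def adjacent_differences_ok(offsets, detection_value):
    """S1024: every pair of adjacent offsets must differ by less than the preset detection value."""
    return all(abs(b - a) < detection_value for a, b in zip(offsets, offsets[1:]))
```

With a detection value of 2, only strictly consecutive offsets pass; a larger detection value tolerates small insertions in the standard data.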
To generate fingerprints from the standard data, the same algorithm as the first algorithm must be used. In addition, in view of the precision and accuracy of retrieval, the standard data may be divided into multiple fragments, each of which can be referenced during retrieval. The specific steps are as follows:
Dividing the standard data into multiple standard data fragments according to a preset fragment length;
Generating a standard fingerprint sequence fragment from each standard data fragment according to the second algorithm, the standard fingerprint sequence fragment comprising multiple hash values of the standard data;
Combining the standard fingerprint sequence fragments into the standard fingerprint sequence.
Specifically, as shown in Figure 3, the fragment length is set to n (that is, each fragment has length n after segmentation). For data of length m (m>=n), a fragment of length n is taken starting at each successive data unit, which finally yields m-n+1 fragments.
For each fragment of length n, its hash or mapping value is calculated, which finally yields a hash value sequence h1, h2, ..., h(m-n+1). These sequence fragments are then combined into one complete standard fingerprint sequence. As shown in the figure, segmentation may be carried out so that fragments overlap (repeated segmentation) or so that they do not overlap (non-repeated segmentation). Overlapping segmentation can improve the detection precision.
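The overlapping segmentation above, which produces m-n+1 fragments from data of length m, can be sketched as follows. CRC-32 from the Python standard library is used purely as a stand-in hash; the text does not name a particular hash algorithm, and the function name `fingerprint_sequence` is an assumption made for this example.

```python
import zlib


def fingerprint_sequence(data, n):
    """Hash every length-n window of a bytes object, sliding one unit at a time.

    Returns h1 ... h(m-n+1) for data of length m, or an empty list if m < n.
    """
    if n <= 0 or len(data) < n:
        return []
    return [zlib.crc32(data[i:i + n]) for i in range(len(data) - n + 1)]


# Example: m = 8 and n = 3 give 8 - 3 + 1 = 6 hash values.
sequence = fingerprint_sequence(b"abcdefgh", 3)
```

Applying the same function to both the standard data and the data to be detected ensures that identical fragments produce identical hash values, as the method requires.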
It should be noted that, for the method for detecting data using data fingerprints to work properly, the first algorithm and the second algorithm are the same hash algorithm or the same mapping algorithm, which ensures that identical data yields identical hash values or mapping values after the algorithm is applied.
To look up, in the original database, the hash value of the standard data that corresponds to a hash value of the data to be detected, conventional methods such as binary search can be used; however, to improve the efficiency of the lookup, a primary key can be set and used for the lookup. The specific process, shown in Figure 5, is as follows:
Establishing the original database from the multiple standard fingerprint sequence fragments;
Establishing a runtime database from the original database, the primary key of the runtime database being the fingerprint data;
Recording the position offset, in the original database, of the hash value that serves as each primary key;
Looking up, in the runtime database, the position offsets corresponding to the hash values of the data to be detected, and generating the position offset sequence.
Once the runtime database with the fingerprint data as its primary key has been established, retrieval can be performed by primary key lookup, which greatly increases retrieval speed. After the runtime database is established, the position offsets corresponding to the hash values of the data to be detected are looked up in the runtime database to improve speed. That is, the hash values of the standard data corresponding to each hash value to be detected are looked up as follows:
The lookup is performed by primary key index.
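A runtime database keyed by fingerprint can be sketched with an in-memory SQLite database from the Python standard library. The table name `runtime`, the column names, and the rule that the first occurrence of a repeated fingerprint keeps its offset are assumptions made for this example, not details given in the text.

```python
import sqlite3


def build_runtime_db(standard_fingerprints):
    """Create a table whose primary key is the fingerprint and whose value is its position offset."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runtime ("
        "fingerprint INTEGER PRIMARY KEY, "
        "position_offset INTEGER NOT NULL)"
    )
    # INSERT OR IGNORE keeps the first offset when a fingerprint occurs more than once.
    conn.executemany(
        "INSERT OR IGNORE INTO runtime (fingerprint, position_offset) VALUES (?, ?)",
        ((fp, offset) for offset, fp in enumerate(standard_fingerprints)),
    )
    conn.commit()
    return conn


def lookup_offsets(conn, detected_fingerprints):
    """Look up each detected fingerprint by primary key and collect the offsets found."""
    offsets = []
    for fp in detected_fingerprints:
        row = conn.execute(
            "SELECT position_offset FROM runtime WHERE fingerprint = ?", (fp,)
        ).fetchone()
        if row is not None:
            offsets.append(row[0])
    return offsets
```

An ordinary in-memory dictionary would serve the same purpose for small data sets; a database is shown here only because the text speaks of a primary key index.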
To tailor the method for detecting data using data fingerprints to a particular use, the detection precision required for the data to be detected may first be obtained, and the detection value determined according to that precision.
That is, if the difference between the position offsets corresponding to adjacent hash values in the fingerprint sequence to be detected is too large, the data sequence of the data to be detected differs to some extent from the data sequence of the standard data. It should be noted, however, that while simply reducing the detection value improves detection precision, the higher precision also causes a certain proportion of false reports. For example, a data segment in the data to be detected may have only a small amount of content added or modified relative to the corresponding segment of the standard data (such as unimportant function words), yet because of the higher precision the data to be detected fails the preset requirement, which is unreasonable. The detection value therefore needs to be decided with specific regard to the detection precision required for the data to be detected.
It should be noted that, in addition to the detection value, whether the data to be detected matches the standard data can also be judged by setting a number X of consecutive position offset differences that exceed the preset detection value. For example, suppose a certain set of position offsets for the data to be detected is: 123, 124, 125, 126, 130, 131, 132. There is a difference of 4 between 126 and 130, but the differences return to normal after 130. This shows that the standard data contains a little more content than the data to be detected, namely the content between 126 and 130. Only a small part differs, and all the other position offsets are consecutive, so the data to be detected can be understood to match the standard data. Of course, a hash value of the data to be detected may also be absent from the hash values of the standard data; such a hash value is usually skipped.
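The tolerance rule in the example above can be sketched as follows. The text leaves the exact rule open, so this sketch adopts one possible reading: a match is rejected only when at least X adjacent differences in a row reach or exceed the detection value. The function names and that interpretation are assumptions.

```python
def longest_break_run(offsets, detection_value):
    """Length of the longest run of consecutive adjacent differences >= detection_value."""
    longest = current = 0
    for a, b in zip(offsets, offsets[1:]):
        if abs(b - a) >= detection_value:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # the sequence has returned to normal differences
    return longest


def matches_with_tolerance(offsets, detection_value, x):
    """Accept the data unless x or more consecutive differences reach the detection value."""
    return longest_break_run(offsets, detection_value) < x


# Offsets from the example: a single jump of 4 between 126 and 130.
example_offsets = [123, 124, 125, 126, 130, 131, 132]
```

With a detection value of 2 and X = 2, the example sequence is accepted because it contains only one isolated break.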
The method for detecting data using data fingerprints provided by the present invention generates a fingerprint sequence to be detected from the data to be detected according to a preset first algorithm, generates a standard fingerprint sequence from the standard data using a second algorithm, and compares whether the hash values in the two sequences match, thereby confirming whether the fingerprint sequence to be detected meets the preset requirement. Because both fingerprint sequences being compared are sequences of hash values, even a very small difference between the two shows up directly in the hash value sequences, in that the hash values produced are not identical. Matching by hash value also makes it possible to check effectively whether the continuity of the hash values of the data to be detected matches that of the standard data items, so that even if the standard data has been modified, the degree of matching between the data to be detected and the standard data can still be measured effectively. In addition, because a runtime database is established whose primary key is the fingerprint data, retrieval by the hash values of the data to be detected is faster. Whether the data to be detected matches the standard data can further be judged by setting a number X of consecutive position offset differences that exceed the preset value. The deficiencies of the prior art are thereby better overcome.
Embodiment 2 of the present invention provides a data detection device which, as shown in Figure 6, comprises:
A first generation module 301, configured to obtain data to be detected and to generate a fingerprint sequence to be detected from the data to be detected according to a preset first algorithm, the fingerprint sequence to be detected comprising multiple hash values of the data to be detected;
A comparison module 302, configured to compare whether the multiple hash values of the data to be detected in the fingerprint sequence to be detected match multiple hash values of standard data in a standard fingerprint sequence, the standard fingerprint sequence being generated from the standard data by a second algorithm and comprising multiple hash values of the standard data;
An authentication module 303, configured to confirm that the data to be detected meets a preset requirement if the comparison module 302 determines that the hash values match.
The comparison module 302 comprises:
A lookup unit, configured to look up the hash value of the standard data that corresponds to each hash value of the data to be detected;
An acquiring unit, configured to obtain the position offset corresponding to each hash value of the standard data that is found, the position offset identifying the position of that hash value in an original database, and the original database being used to store the standard fingerprint sequence;
A generation unit, configured to generate a position offset sequence, the order of the position offsets in the position offset sequence corresponding to the positions of the matching hash values in the fingerprint sequence to be detected;
A comparison unit, configured to compare whether the difference between adjacent position offsets is less than a preset detection value.
The device may further comprise:
A segmentation module, configured to divide the standard data into multiple standard data fragments according to a preset fragment length;
A second generation module, configured to generate a standard fingerprint sequence fragment from each standard data fragment according to the second algorithm, the standard fingerprint sequence fragment comprising multiple hash values of the standard data;
A combination module, configured to combine the standard fingerprint sequence fragments into the standard fingerprint sequence.
Obviously, those skilled in the art should understand that each module or step of the present invention described above can be implemented with a general-purpose computing device. The modules or steps can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is thus not restricted to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For a person skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.