CN105844214B - Information fingerprint extraction method based on multipath depth coding in bit space - Google Patents

Publication number
CN105844214B
Authority
CN
China
Prior art keywords
bit
fingerprint
information
row
chain
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610119377.8A
Other languages
Chinese (zh)
Other versions
CN105844214A (en
Inventor
杨灿
任思璇
韩国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12 Fingerprints or palmprints
    • G06V40/1347 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12 Fingerprints or palmprints
    • G06V40/1365 Matching; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses an information fingerprint extraction method based on multipath depth coding in bit space, comprising the following steps. Step 1: construct a bit window. Step 2: construct a bit plane. Step 3: reduce dimensionality to construct bit chains. Step 4: encode each bit chain: traverse the dimension-reduced bit chain constructed above and count, in order, the lengths of the runs of consecutive 0s and consecutive 1s, which form a new decimal sequence; binarize this decimal sequence to obtain a new binary bit chain whose first bit is non-zero; repeat the traversal-and-count operation on the resulting bit chain, iterating until the newly obtained decimal sequence contains only one element; the value of that final element and the number of loop iterations are recorded as a feature value of the information fingerprint feature space. Step 5: compare fingerprints. The invention has advantages such as improving the efficiency of identical-information detection.

Description

Information fingerprint extraction method based on multipath depth coding in bit space
Technical field
The present invention relates to computer and communication technology, and in particular to an information fingerprint extraction method based on multipath depth coding in bit space.
Background
An ideal information fingerprint follows two basic principles: first, the less fingerprint information extracted from the original data the better, so as to save the space occupied by the fingerprint itself; second, fingerprint comparison must be able to determine the consistency of the information accurately. Information fingerprint extraction and comparison techniques mainly address two classes of problem: 1) if the fingerprints are inconsistent, conclude that the content under test is inconsistent with the original content; 2) if the fingerprints are consistent, conclude that the content under test is consistent with the original content. Existing information fingerprint extraction techniques mainly include MD5 fingerprints for data and, for images, fingerprint types such as histograms, feature values, and feature sampling points; for information comparison there is the Bloom filter technique. These existing methods work well on the first class of problem but have not performed well on the second. When current fingerprint techniques handle the second class of problem, matching fingerprints are necessary but not sufficient to establish that the content is truly identical; this weakness forces a large amount of computation to compare the original information, and the fingerprint comparison itself becomes redundant. A new fingerprint extraction technique is therefore needed to improve the ability to solve the second class of problem; the need became more urgent after the internationally popular MD5 was successfully broken by Chinese scientists.
On this basis, the present invention provides an information fingerprint extraction method based on multipath depth coding in bit space (MDB for short) that meets the consistency requirement for information judgment: not only do inconsistent fingerprints establish that the content is certainly inconsistent, but whenever the fingerprints are consistent the content is consistent with high probability. To improve the efficiency of fingerprint comparison, the invention also provides fast fingerprint comparison techniques based on this method (MDB-M1 and MDB-M2), offering a new approach to fast information comparison and information retrieval. The invention is also significant for redundant-content detection in fields such as data transmission and storage (big data in particular), cache mechanisms of information-centric networks, content-centric networking (CCN), and grain communication.
Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing an information fingerprint extraction method based on multipath depth coding in bit space; the method is an information fingerprint extraction and comparison method based on multipath depth coding of the bit space (MDB).
The object of the present invention can be achieved through the following technical solution: an information fingerprint extraction method based on multipath depth coding in bit space, comprising the following steps:
Step 1: construct a bit window;
Step 2: construct a bit plane: re-segment the data into bit windows of a given width and arrange the windows into a bit plane;
Step 3: reduce dimensionality to construct bit chains: perform dimension-reducing arrangement along several traversal paths to construct different bit chains BC, and record the code of the corresponding arrangement mode;
Step 4: encode each bit chain: traverse the dimension-reduced bit chain constructed above and count, in order, the lengths of the runs of consecutive 0s and consecutive 1s, which form a new decimal sequence; binarize this decimal sequence to obtain a new binary bit chain whose first bit is non-zero; repeat the traversal-and-count operation on the resulting bit chain, iterating until the newly obtained decimal sequence contains only one element; the value of that final element and the number of loop iterations are recorded as a feature value of the information fingerprint feature space;
Step 5: fingerprint comparison.
In step 1, the bit window is constructed from the significant bits that uniquely characterize the original information byte stream. The size of the bit window is any positive integer m not greater than the total number of bits of the original information (Tb); the value of m may vary for different systems or byte streams, in particular integer powers of 2 such as m = 8, 16, 32, or 64. In step 2, the original information byte stream is divided into windows of m bits, which are arranged side by side one after another, with the lowest position numbered 0 and the highest m-1, giving a bit plane of m rows * n columns; n is the quotient of Tb divided by m, and if Tb is not divisible by m the remainder is either omitted or padded with 0s.
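As an illustration of steps 1 and 2, the following Python sketch splits a byte stream into m-bit windows and lays them out as a bit plane of m rows and n columns. The function name build_bit_plane, the MSB-first bit order within each byte, and the placement of bit i of each window in row i are assumptions made for this example; the text fixes only the window width, the shape of the plane, and the omit-or-pad handling of the remainder.

```python
def build_bit_plane(data: bytes, m: int = 8, pad: bool = True) -> list[list[int]]:
    """Arrange the bits of `data` as an m-row by n-column plane, one window per column."""
    # Assumption: bits are read MSB-first within each byte.
    bits = [(byte >> (7 - k)) & 1 for byte in data for k in range(8)]
    tb = len(bits)  # Tb, the total number of bits
    if not 1 <= m <= tb:
        raise ValueError("m must be a positive integer not greater than Tb")
    remainder = tb % m
    if remainder:
        if pad:
            bits += [0] * (m - remainder)  # pad the last window with 0s
        else:
            bits = bits[: tb - remainder]  # omit the incomplete window
    n = len(bits) // m
    # Assumption: window j becomes column j, and bit i of a window sits in row i.
    return [[bits[j * m + i] for j in range(n)] for i in range(m)]


plane = build_bit_plane(bytes([0b10110000]), m=4)
for row in plane:
    print(row)
```

With m=4 the single byte 10110000 yields two windows, 1011 and 0000, which become the two columns of a 4-row plane.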
In step 3, dimension-reducing arrangement is performed along the following traversal paths to construct different bit chains, and the code of the corresponding arrangement mode is recorded:
A) With m as the fixed step size, start from the high-order position of the first row and order each row from its high-order column to its low-order column; the tail bit of a lower-order row is joined to the head bit of the next higher-order row, converting the two-dimensional bit plane into a one-dimensional bit chain of length m*n; record the code value of this ordering mode;
B) With m as the fixed step size, start from the tail position of the first row and order each row from its low-order column to its high-order column; the high-order bit of a lower-order row is joined to the tail bit of the next higher-order row, converting the two-dimensional bit plane into a one-dimensional bit chain of length m*n; record the code value of this ordering mode;
C) With n as the fixed step size, start from the high-order column of the first row and order each column from its low-order row to its high-order row; the low-order row bit of a lower-order column is joined to the high-order row bit of the next higher-order column, converting the two-dimensional bit plane into a one-dimensional bit chain of length n*m; record the code value of this ordering mode;
D) With n as the fixed step size, start from the tail column of the first row and order each column from its low-order row to its high-order row; the high-order row bit of a lower-order column is joined to the low-order row bit of the next higher-order column, converting the two-dimensional bit plane into a one-dimensional bit chain of length n*m; record the code value of this ordering mode;
E) With n as the fixed step size, start from the tail column of the first row and order the column from its low-order row to its high-order row; the high-order row bit of a lower-order column is joined to the high-order row bit of the next higher-order column, until the lowest-order row bit of that column is joined to the lowest-order row bit of the following column, and so on, converting the two-dimensional bit plane into a one-dimensional bit chain of length n*m; record the code value of this ordering mode;
F) With n as the fixed step size, start from the first column of the last row and order the column from its high-order row to its low-order row; the low-order row bit of a higher-order column is joined to the low-order row bit of the next lower-order column, until the lowest-order row bit of that column is joined to the lowest-order row bit of the following column, and so on, converting the two-dimensional bit plane into a one-dimensional bit chain of length n*m; record the code value of this ordering mode.
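The six traversal modes can be sketched in Python as follows. This is one plausible reading of the modes, not a definitive one: the wording on "high-order" and "low-order" rows and columns is ambiguous, so the sketch assumes that the first two modes read row by row (forward and reversed), the next two read column by column (from the first and from the last column), and the last two snake through the columns, reversing direction at each column boundary. The function name traverse and the mode numbers 0 to 5 are hypothetical.

```python
def traverse(plane: list[list[int]], mode: int) -> str:
    """Flatten an m-row by n-column bit plane into a bit chain along one of six paths."""
    m, n = len(plane), len(plane[0])
    rows, cols = range(m), range(n)
    if mode == 0:    # assumed mode a: row by row, first column to last
        cells = [(i, j) for i in rows for j in cols]
    elif mode == 1:  # assumed mode b: row by row, last column to first
        cells = [(i, j) for i in rows for j in reversed(cols)]
    elif mode == 2:  # assumed mode c: column by column, starting at the first column
        cells = [(i, j) for j in cols for i in rows]
    elif mode == 3:  # assumed mode d: column by column, starting at the last column
        cells = [(i, j) for j in reversed(cols) for i in rows]
    elif mode == 4:  # assumed mode e: snake through columns from the last column, top first
        cells = []
        for k, j in enumerate(reversed(cols)):
            order = rows if k % 2 == 0 else reversed(rows)
            cells.extend((i, j) for i in order)
    elif mode == 5:  # assumed mode f: snake through columns from the first column, bottom first
        cells = []
        for k, j in enumerate(cols):
            order = reversed(rows) if k % 2 == 0 else rows
            cells.extend((i, j) for i in order)
    else:
        raise ValueError("mode must be in 0..5")
    return "".join(str(plane[i][j]) for i, j in cells)


example_plane = [[1, 0, 1],
                 [0, 1, 1]]
for mode in range(6):
    print(mode, traverse(example_plane, mode))
```

Each mode visits every cell exactly once, so all six chains have length m*n and differ only in bit order; that is why they can expose different run structure in the same data.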
Step 4 comprises the following steps:
Step 41: preliminary coding of the bit chain. Apply first-bit coding to the dimension-reduced bit chain, that is, record the first bit of the bit chain: if it is 0, the code is 0; if it is 1, the code is 1. The first-bit code indicates whether the bit chain starts with 0 or 1;
Step 42: apply simplified run-length coding to the dimension-reduced bit chain, that is, count from beginning to end the number of times 0/1 occurs consecutively, obtaining the non-zero integer sequence corresponding to the bit chain; set the initial coding depth value; then carry out depth coding;
Step 43: apply depth coding to the bit chain;
The method of depth coding the bit chain comprises the following steps:
Step A: binarize the resulting non-zero integer sequence, omitting the 0s in front of the first 1 of each integer's binary representation, to form a new 0/1 bit chain;
Step B: count the length of this bit chain and add 1 to the loop iteration depth value;
Step C: substitute the newly obtained bit chain for the original bit chain and apply simplified run-length coding again, counting the numbers of consecutive 0/1 occurrences to form a new non-zero integer sequence.
Steps A to C are executed in a loop until the length of the newly obtained non-zero integer sequence is 1, i.e. there is only one count value; this value is recorded as the termination code value, and the final coding depth value, i.e. the number of loop iterations, is recorded as well.
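Steps 41 to 43 and the loop over steps A to C can be sketched in Python as follows. The names run_lengths and depth_code are hypothetical, the initial coding depth is assumed to be 1, and the binarized integers are assumed to be concatenated in order. The max_depth guard is also an assumption, added only so that the sketch cannot loop forever on an unexpected input.

```python
def run_lengths(bits: str) -> list[int]:
    """Simplified run-length coding: lengths of consecutive runs of equal bits."""
    if not bits:
        raise ValueError("bit chain must not be empty")
    runs, count = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


def depth_code(bits: str, max_depth: int = 10_000) -> tuple[int, int, int]:
    """Return (first-bit value, termination code value, coding depth value)."""
    if set(bits) - {"0", "1"}:
        raise ValueError("bit chain may contain only '0' and '1'")
    runs = run_lengths(bits)          # step 42 (also rejects an empty chain)
    first_bit = int(bits[0])          # step 41
    depth = 1                         # assumed initial coding depth
    while len(runs) > 1:
        if depth >= max_depth:        # safety guard, not part of the described method
            raise RuntimeError("depth limit reached")
        # Step A: binary form of each count, without leading zeros, concatenated.
        bits = "".join(format(r, "b") for r in runs)
        depth += 1                    # step B
        runs = run_lengths(bits)      # step C
    return first_bit, runs[0], depth


print(depth_code("110"))   # runs 2,1 -> chain 101 -> runs 1,1,1 -> chain 111 -> runs 3
```

Because every count is at least 1, each binarized integer starts with 1, which is why every bit chain after the first iteration has a non-zero first bit.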
Steps 41 to 43 are repeated for the bit chains constructed by the different ordering modes until the triples <first-bit value, termination code value, coding depth value> of all six ordering modes are obtained, i.e. one triple for each of the first through sixth ordering modes; the feature space of the original information formed by these six triples is the fingerprint of the information.
The bit-by-bit comparison method for the generated information fingerprints comprises the following steps:
(1) First-bit comparison: for the same bit chain construction mode, the corresponding first-bit values are compared first; if any pair of corresponding first-bit values differ, the two fingerprints do not match and the comparison ends; if they are identical, go to the next step;
(2) Code value comparison: if the first-bit values are identical, the corresponding termination code values are compared; if any pair of corresponding termination code values differ, the two fingerprints do not match and the comparison ends; if they are identical, go to the next step;
(3) Coding depth value comparison: if the first-bit values and the termination code values are all identical, the corresponding coding depth values, i.e. the iteration counts, are compared; if any pair of corresponding coding depth values differ, the two fingerprints do not match and the comparison ends; if they are identical, the two fingerprints are considered to match.
The joint unified comparison of the generated information fingerprints proceeds as follows: the triples constructed by the first ordering mode are compared according to the bit-by-bit comparison method; if they are identical, the triples constructed by the second ordering mode are compared in the same way, and so on through the sixth ordering mode. If the triples formed by paths constructed under the same ordering mode are consistent for all six ordering modes, i.e. the fingerprints are entirely consistent, the information is considered consistent; otherwise the information is considered inconsistent.
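The comparison order described above (first bit, then code value, then depth, mode by mode, stopping at the first mismatch) can be sketched as follows. The type alias Triple and the function names are hypothetical, and the fingerprint values in the usage example are made up purely to exercise the code.

```python
from typing import Sequence, Tuple

# (first-bit value, termination code value, coding depth value)
Triple = Tuple[int, int, int]


def compare_triples(a: Triple, b: Triple) -> bool:
    """Compare one pair of triples field by field, stopping at the first mismatch."""
    for field_a, field_b in zip(a, b):  # order: first bit, code value, depth
        if field_a != field_b:
            return False
    return True


def compare_fingerprints(fp_a: Sequence[Triple], fp_b: Sequence[Triple]) -> bool:
    """Compare two fingerprints mode by mode; all() stops at the first failing mode."""
    if len(fp_a) != len(fp_b):
        return False
    return all(compare_triples(a, b) for a, b in zip(fp_a, fp_b))


# Made-up fingerprints with six triples each, one per ordering mode.
fp1 = [(1, 3, 3), (0, 4, 1), (1, 2, 2), (1, 3, 5), (0, 2, 4), (1, 5, 7)]
fp2 = list(fp1)
fp3 = fp1[:5] + [(1, 5, 8)]  # differs only in the depth of the sixth mode
print(compare_fingerprints(fp1, fp2), compare_fingerprints(fp1, fp3))
```

Checking the one-bit first-bit value before the larger fields means that many mismatches are rejected after a single cheap comparison.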
The object of the present invention can also be achieved through the following technical solution: an information fingerprint extraction method based on multipath depth coding in bit space, comprising the following steps:
S1, construct bit windows: bit windows are constructed from the significant bits that uniquely characterize the original information byte stream. The width of a bit window wb of the byte stream is m bits, where m is not greater than the total number of bits Tb of the original information, so there are n=ceil(Tb/m) windows wb in total; if Tb is not divisible by m, the width of the last window is the remainder, i.e. Tb-(ceil(Tb/m)-1)*m, where ceil(Tb/m) denotes Tb/m rounded up. For ease of operation, m is generally an integer multiple of 4 or 8.
S2, construct the bit plane: the byte stream windows wb, each m bits wide, are arranged side by side one after another, with the lowest position numbered 0 and the highest m-1, giving a bit plane of m rows * n columns.
S3, reduce dimensionality to construct bit chains: perform dimension-reducing arrangement along several traversal paths to construct different bit chains BC, and record the code of the corresponding arrangement mode:
A) With m as the fixed step size, start from the high-order position of the first row and order each row from its high-order column to its low-order column; the tail bit of a lower-order row is joined to the head bit of the next higher-order row, converting the two-dimensional bit plane into a one-dimensional bit chain BC0 of length m*n. The code value of this ordering mode RM is recorded as 0;
B) With m as the fixed step size, start from the tail position of the first row and order each row from its low-order column to its high-order column; the high-order bit of a lower-order row is joined to the tail bit of the next higher-order row, converting the two-dimensional bit plane into a one-dimensional bit chain BC1 of length m*n. The code value of this ordering mode RM is recorded as 1;
C) With n as the fixed step size, start from the high-order column of the first row and order each column from its low-order row to its high-order row; the low-order row bit of a lower-order column is joined to the high-order row bit of the next higher-order column, converting the two-dimensional bit plane into a one-dimensional bit chain BC2 of length n*m. The code value of this ordering mode RM is recorded as 2;
D) With n as the fixed step size, start from the tail column of the first row and order each column from its low-order row to its high-order row; the high-order row bit of a lower-order column is joined to the low-order row bit of the next higher-order column, converting the two-dimensional bit plane into a one-dimensional bit chain BC3 of length n*m. The code value of this ordering mode RM is recorded as 3;
E) With n as the fixed step size, start from the tail column of the first row and order the column from its low-order row to its high-order row; the high-order row bit of a lower-order column is joined to the high-order row bit of the next higher-order column, until the lowest-order row bit of that column is joined to the lowest-order row bit of the following column, and so on, converting the two-dimensional bit plane into a one-dimensional bit chain BC4 of length n*m. The code value of this ordering mode RM is recorded as 4;
F) With n as the fixed step size, start from the first column of the last row and order the column from its high-order row to its low-order row; the low-order row bit of a higher-order column is joined to the low-order row bit of the next lower-order column, until the lowest-order row bit of that column is joined to the lowest-order row bit of the following column, and so on, converting the two-dimensional bit plane into a one-dimensional bit chain BC5 of length n*m. The code value of this ordering mode RM is recorded as 5.
S4, first-bit coding: apply first-bit coding Fb to the above bit chain, i.e. record the first bit of the bit chain: if it is 0, the code is 0; if it is 1, the code is 1. The first-bit code indicates whether the bit chain starts with 0 or 1.
S5, simplified run-length coding: apply simplified run-length coding to the bit chain, i.e. count from beginning to end the number of times 0/1 occurs consecutively, obtaining the non-zero integer sequence BRLC corresponding to the bit chain, and set the BRLC depth value depth_of_BRLC=1;
S6, depth coding: binarize the resulting sequence BRLC, omitting the 0s in front of the first 1 of each integer's binary representation, to form a new 0/1 bit chain NBC; count the length Length_of_BRLC of this bit chain; increment the loop iteration depth value, depth_of_BRLC+=1; then apply simplified run-length coding to the new bit chain NBC, counting the numbers of consecutive 0/1 occurrences to form a new BRLC.
S7, repeat step S6 until the newly obtained Length_of_BRLC is 1.
S8, record depth_of_BRLC at loop termination and the finally obtained BRLC value.
S9, fingerprint extraction: steps S3 to S8 are repeated for each of the six bit chains constructed from the original information as described in step S3, giving 6 triples <Fb, BRLC, depth_of_BRLC>; these 6 triples (6*3 feature elements) constitute the fingerprint of the information.
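As a worked illustration of steps S4 to S8, consider the short bit chain 110. The trace assumes that each integer of BRLC is written in binary without leading zeros and that the results are concatenated in order, and it takes Length_of_BRLC to mean the number of elements in BRLC, so that iteration stops once BRLC holds a single count:

    S4: Fb = 1
    S5: BRLC = 2, 1                depth_of_BRLC = 1
    S6: NBC = 10 1 = 101           depth_of_BRLC = 2    BRLC = 1, 1, 1
    S6: NBC = 1 1 1 = 111          depth_of_BRLC = 3    BRLC = 3
    S7: BRLC has one element, so the loop stops
    S8: triple <Fb, BRLC, depth_of_BRLC> = <1, 3, 3>

Under the same assumptions, a chain that consists of a single run, such as 0000, stops immediately with the triple <0, 4, 1>.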
For fingerprint comparison, the MDB of the present invention provides two comparison methods: method one (MDB-M1) is bit-by-bit comparison, and method two (MDB-M2) is joint unified comparison. Method one, MDB-M1, compares the fingerprints extracted through step S9 of the above process bit by bit, and comprises the following steps:
S10, first-bit comparison: for the same bit chain construction mode RM, the corresponding Fb values are compared first; if any pair of corresponding Fb values differ, the two fingerprints do not match and the comparison ends; if they are identical, go to the next step;
S11, code value comparison: if Fb is identical, the corresponding BRLC values are compared; if any pair of corresponding BRLC values differ, the two fingerprints do not match and the comparison ends; if they are identical, go to the next step;
S12, coding depth value comparison: if BRLC is identical, the corresponding depth_of_BRLC values are compared; if any pair of corresponding depth_of_BRLC values differ, the two fingerprints do not match and the comparison ends; if they are identical, the two fingerprints are considered to match.
S13, if the triples of the first path construction of step S3 are identical under comparison steps S10 to S12, comparison of the triples of the second path construction begins, and so on through the sixth path. If all are consistent, the information is considered consistent; if there is any difference, the information is inconsistent.
The technical features of method two (MDB-M2), which performs unified comparison, are as follows:
S14, the compressed bit chain CIFB value of the 6*3 information fingerprint elements generated by the fingerprint extraction step S9 above is compared directly bit by bit; if any bit differs, the information is considered inconsistent and the comparison terminates; if all bits are consistent, the information is considered consistent.
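A minimal sketch of the unified comparison follows. The text does not specify how the six triples are packed into the bit chain CIFB, so the fixed-width layout used here (1 byte for Fb, 8 bytes for the BRLC value, 4 bytes for depth_of_BRLC, big-endian) is purely an assumption, as are the function names pack_fingerprint and unified_match and the example values.

```python
import struct

# Hypothetical layout per triple: Fb (1 byte), BRLC value (8 bytes), depth (4 bytes).
TRIPLE_FORMAT = ">BQI"


def pack_fingerprint(triples: list[tuple[int, int, int]]) -> bytes:
    """Serialize the triples into one byte string standing in for the CIFB bit chain."""
    # struct.pack raises struct.error if a value does not fit its assumed field width.
    return b"".join(struct.pack(TRIPLE_FORMAT, fb, value, depth)
                    for fb, value, depth in triples)


def unified_match(fp_a: list[tuple[int, int, int]],
                  fp_b: list[tuple[int, int, int]]) -> bool:
    """Compare two fingerprints as whole byte strings instead of field by field."""
    return pack_fingerprint(fp_a) == pack_fingerprint(fp_b)


# Made-up example values, one triple per ordering mode.
fp = [(1, 3, 3), (0, 4, 1), (1, 2, 2), (1, 3, 5), (0, 2, 4), (1, 5, 7)]
print(len(pack_fingerprint(fp)), unified_match(fp, list(fp)))
```

With this assumed layout each triple occupies 13 bytes, so a six-triple fingerprint is a 78-byte string that can be compared in a single operation.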
Steps S1 to S5 above are the basic steps of the proposed information fingerprint extraction method based on multipath depth coding in bit space. To further improve the reliability and/or applicability of the system, S6 to S10 provide advanced extension steps, which can be combined with one or more of the modes of the basic step S3 to form bit fingerprint extraction schemes of different precision. The 6 modes described in S3 can also be extended to any deterministic bit chain formation pattern that traverses the bit space, such as spiraling from the outside in, spiraling outward from the center, or a ZigZag pattern, without affecting the essence of the invention. Because a 0/1 bit chain contains only the two elements 0 and 1, the simplified run-length coding proposed by the invention needs only 1 unit in place of the three basic units of classical run-length coding (symbol, repetition count, position): the alternation of 0 and 1 together with the sequential arrangement implies both the symbol and the position, so during bit chain construction only the repetition counts need to be recorded in order, which saves RLC overhead. The termination condition of the loop BRLC coding described in step S7 may also be a given threshold on depth_of_BRLC or on the BRLC value; this change likewise does not affect the essence of the invention. S10 to S12 are the fingerprint comparison steps. The first-bit comparison serves for fast fingerprint detection; the size and number of the other comparison parameters have already been reduced by the fingerprint extraction process above, so the amount of information compared is very small and comparison efficiency can be greatly improved. Moreover, because the fingerprint extraction method operates directly on each bit, implicitly captures bit position information and periodic variation, and builds 6 dimensions through S3, comparison can solve the second class of problem with very high probability.
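The claim that a single unit per run suffices for a 0/1 chain can be illustrated with a round trip: given only the first bit and the ordered run counts, the chain is recoverable, because the symbol alternates from run to run and the position follows from the order. The function names encode_runs and decode_runs are hypothetical, and itertools.groupby is used here only as a compact way to find the runs.

```python
from itertools import groupby


def encode_runs(bits: str) -> tuple[int, list[int]]:
    """Return the first bit and the lengths of the consecutive runs."""
    if not bits:
        raise ValueError("bit chain must not be empty")
    return int(bits[0]), [len(list(group)) for _, group in groupby(bits)]


def decode_runs(first_bit: int, runs: list[int]) -> str:
    """Rebuild the bit chain: the symbol flips after every run."""
    out, bit = [], first_bit
    for count in runs:
        out.append(str(bit) * count)
        bit ^= 1  # 0 and 1 alternate, so the symbol never needs to be stored
    return "".join(out)


first_bit, runs = encode_runs("001000")
print(first_bit, runs, decode_runs(first_bit, runs))
```

Only this first run-length stage is reversible in this way; the final triple keeps just the first bit, the last count, and the depth, so it serves as a fingerprint rather than as a compressed copy of the data.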
In summary, the main steps of the information fingerprint extraction method based on multipath depth coding in bit space provided by the present invention include: construction of the bit window, bit plane, and bit chains; first-bit coding; simplified run-length coding; depth coding (iterative simplified run-length coding on the binary representation); fingerprint extraction (six triples); fingerprint compression; and fingerprint comparison (first bit, code value, coding depth). The main function of the invention is to reduce the amount of information compared and to improve comparison accuracy; it is mainly applied to the extraction and comparison of information fingerprints in the network and communication fields.
Changes to the choice of starting position for bit chain construction, adjustments to the traversal mode, and variations in the feature values do not change the essence of the invention; such variants are substantially the same as the present invention.
The present invention segments the original information bit stream by a given window and reconstructs it as a bit plane, then performs a 0/1 traversal count, obtaining in order the lengths of the runs of consecutive 0s and consecutive 1s to form a new decimal sequence; the decimal sequence is binarized to obtain a new binary bit chain whose first bit is non-zero, and the traversal-and-count operation is repeated on it; iteration continues by this rule until the newly obtained decimal sequence contains only one element; the value of that final element is recorded as the code value and the number of loop iterations as the coding depth value. Using different bit plane construction methods and traversal modes, the method is repeated to obtain a series of different code values and coding depth values, which together constitute the complete features of the information fingerprint. The information fingerprint is compared using the bit-by-bit comparison method and the overall comparison method.
Compared with the prior art, the present invention has the following advantages and effects:
1. The focus of the invention is extracting the information fingerprint through multipath depth coding, including construction of the bit chains, first-bit coding and simplified run-length coding of the bit chains, and fingerprint comparison.
2. The idea behind the proposed multipath depth coding based on bit space originates from the observation that, in a computer, any information can be stored in binary form, and in storage 0s or 1s often occur in consecutive runs or periodically; the information being transmitted therefore carries a certain amount of redundancy. The idea of the invention is to re-encode the bits that carry substantial redundancy so as to reduce the number of bits; when comparing information, only the extracted fingerprints need to be compared, without extensive examination of the original information, which greatly improves the efficiency of information comparison. Although many methods exist for extracting and checking information fingerprints, to date there has been no related research or patent on extracting and comparing information fingerprints based on bit space. The proposed method can reduce a large amount of redundancy and can be applied to identical-information detection in information transmission. In addition, to reduce the redundancy within the information, the invention also proposes depth coding, which encodes the information iteratively and improves the efficiency of identical-information detection.
3. The core of the proposed information fingerprint extraction method based on multipath depth coding in bit space is to reconstruct the original information into a bit plane in the form of bit windows, construct bit chains by dimension reduction along different chosen paths, and then re-encode those bit chains with first-bit coding and simplified run-length coding. For fingerprint comparison, the compressed bit chains are compared using the two fast fingerprint detection methods proposed by the invention. Compared with traditional information fingerprint comparison methods, this method compares less information and discriminates the original information more accurately. The method needs no overly complex coding scheme, can re-encode existing information fingerprints, and can be combined seamlessly with existing fingerprint extraction methods; it has broad application prospects in the network and communication fields. It is significant for the transmission and migration of big data, and all the more so for big data with large amounts of redundancy.
Brief description of the drawings
Fig. 1 is a schematic diagram of the basic working principle of the MDB of the invention.
Fig. 2 is a schematic diagram of the detailed working flow of the MDB of the invention.
Fig. 3a shows the bit chain constructed from Magic(3) by mode a in Embodiment 1 of the invention.
Fig. 3b shows the bit chain constructed from Magic(3) by mode b in Embodiment 1 of the invention.
Fig. 3c shows the bit chain constructed from Magic(3) by mode c in Embodiment 1 of the invention.
Fig. 3d shows the bit chain constructed from Magic(3) by mode d in Embodiment 1 of the invention.
Fig. 3e shows the bit chain constructed from Magic(3) by mode e in Embodiment 1 of the invention.
Fig. 3f shows the bit chain constructed from Magic(3) by mode f in Embodiment 1 of the invention.
Fig. 4a is a flow chart of fingerprint comparison by the bit-by-bit comparison method of the invention.
Fig. 4b is a flow chart of fingerprint comparison by the whole comparison method of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the embodiment and the drawings, but embodiments of the invention are not limited thereto.
Embodiment
Fig. 1 shows the whole process of the information fingerprint extraction method of multipath depth coding based on bit space proposed by the invention. First, the bit space and the bit plane are constructed. Bit chains are then constructed according to the different path selection modes. Initial coding and depth-extension coding are applied to each bit chain: initial coding consists of first-bit coding and simplified run-length coding of the bit chain, and depth-extension coding iterates the coding on the result of the initial coding until the length of the bit chain is 1. After fingerprint extraction, the fingerprints are compared.
Fig. 2 shows the specific fingerprint extraction method proposed by the invention. Bit chains are constructed from the bit plane in the different sorting modes, and the triples finally produced by the sorting modes are combined into a 6*3 fingerprint. This fingerprint is compressed to give the finally extracted fingerprint CIFB.
Fig. 4a shows the specific workflow of the bit-by-bit fingerprint comparison proposed by the invention. The first-bit comparison is performed first: for the same bit chain construction mode RM, the corresponding Fb values are compared with priority. If any corresponding Fb value differs, the two fingerprints do not match and the comparison ends. If they are identical, the corresponding BRLC values are compared; if any corresponding BRLC value differs, the two fingerprints do not match and the comparison ends. If they are identical, the corresponding depth_of_BRLC values are compared; if any corresponding depth_of_BRLC value differs, the two fingerprints do not match and the comparison ends. If they are identical, the two fingerprints are considered to match.
Fig. 4b shows the specific workflow of the whole fingerprint comparison proposed by the invention. The bit chain CIFB values of the compressed 6*3 information fingerprints are compared directly bit by bit. If any bit differs, the information is considered inconsistent and the comparison terminates; if all bits are consistent, the information is considered consistent.
In this example, the object to be processed is a standard Magic(3) matrix; for convenience and because of page-size limits, only a 3*3 matrix is used. The matrix is given in the original as a figure, which is not reproduced here. After dimension reduction, the matrix becomes a one-dimensional sequence: (8 3 4 1 5 9 6 7 2). Binarizing the decimal elements of this sequence gives a binary 0/1 bit chain: (1000 0011 0100 0001 0101 1001 0110 0111 0010). For the bit chain of Fig. 3c, the chain is segmented with a bit window of size m=4 and rearranged to obtain the bit plane, which is also given in the original as a figure. From this bit plane, six different bit chains are reconstructed using different starting positions and traversal paths:
Sorting mode RM=0 (Fig. 3a): (100000110100000101011001011001110010)
Sorting mode RM=1 (Fig. 3b): (000111000010100010101001011011100100)
Sorting mode RM=2 (Fig. 3c): (100001000001010110010000111010111010)
Sorting mode RM=3 (Fig. 3d): (010111010010000111001010110100001000)
Sorting mode RM=4 (Fig. 3e): (100001000011010100010000111010111010)
Sorting mode RM=5 (Fig. 3f): (000100001001010110111000010010111010)
The detailed depth information fingerprint extraction processes of modes a to f of the proposed method are shown in Figs. 3a to 3f. The fingerprint feature codes output by these processes are:
Mode a (Fig. 3a): <1,6,8>
Mode b (Fig. 3b): <0,4,11>
Mode c (Fig. 3c): <1,6,6>
Mode d (Fig. 3d): <1,3,11>
Mode e (Fig. 3e): <1,5,7>
Mode f (Fig. 3f): <1,4,8>
Combining the feature codes generated by the six modes, the information fingerprint of the original data is:
<1,6,8; 0,4,11; 1,6,6; 1,3,11; 1,5,7; 1,4,8>.
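The starting point of this example, the one-dimensional sequence and its 0/1 bit chain, can be checked with a short sketch. The matrix literal is an assumption: it is the commonly used 3*3 magic square, which is consistent with the sequence given in the text when read column by column. The function name `to_bit_chain` is hypothetical.

```python
# Assumption: the standard 3x3 magic square; the source shows the matrix only as a figure.
MAGIC3 = [[8, 1, 6],
          [3, 5, 7],
          [4, 9, 2]]


def to_bit_chain(matrix, width=4):
    """Flatten a matrix column by column, then write each element as `width` bits."""
    columns = range(len(matrix[0]))
    flat = [row[c] for c in columns for row in matrix]  # column-major order
    chain = "".join(format(value, f"0{width}b") for value in flat)
    return flat, chain


sequence, chain = to_bit_chain(MAGIC3)
print(sequence)
print(chain)
```

With a 4-bit width, each element of the sequence becomes one bit window of size m=4, so the chain produced here is the one listed for sorting mode RM=0.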
The specific fingerprint comparison methods are shown in Fig. 4a and Fig. 4b: Fig. 4a is the bit-by-bit comparison flow and Fig. 4b is the whole comparison flow. In this example, the bit-by-bit comparison of Fig. 4a proceeds as follows. First, the first-bit values of the feature fingerprints to be compared are compared one by one, that is, <1;0;1;1;1;1> is compared in order. If these are consistent, <6;4;6;3;5;4> is compared in order. If these are also consistent, <8,11,6,11,7,8> is compared in order. If all are consistent, the fingerprints are considered consistent; if any value differs, the fingerprints are considered inconsistent.
In this example, the whole comparison of Fig. 4b proceeds as follows. First, the bit plane is constructed and traversed in mode (a), the feature value of one information fingerprint is generated, and it is compared with the corresponding feature value of the fingerprint to be compared. With the results of this example, the first item compared is <1,6,8>. If the comparison is consistent, the second value <0,4,11> is taken, and so on until all feature values have been compared. If all are consistent, the information fingerprints of the two are considered consistent; if any element differs, the two fingerprints are considered inconsistent.
The above embodiment is a preferred embodiment of the invention, but embodiments of the invention are not limited by it. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the invention shall be an equivalent replacement and is included within the scope of protection of the invention.
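The two comparison orders described in this example can be sketched as follows. This is a minimal illustration under the assumption that a fingerprint is held as a list of six (Fb, BRLC, depth) triples; the names `match_by_field` and `match_by_triple` are hypothetical and do not come from the source. Both functions return the same verdict; they differ only in which values are inspected first, and therefore in how early a mismatch is found.

```python
# Assumed representation: one (Fb, BRLC, depth) triple per sorting mode, in mode order.
EXAMPLE = [(1, 6, 8), (0, 4, 11), (1, 6, 6), (1, 3, 11), (1, 5, 7), (1, 4, 8)]


def match_by_field(a, b):
    """Compare all first bits, then all terminal values, then all depths."""
    if len(a) != len(b):
        return False
    for field in range(3):            # 0: Fb, 1: BRLC, 2: depth
        for triple_a, triple_b in zip(a, b):
            if triple_a[field] != triple_b[field]:
                return False          # stop at the first mismatch
    return True


def match_by_triple(a, b):
    """Compare the triples one sorting mode at a time, stopping at the first mismatch."""
    if len(a) != len(b):
        return False
    for triple_a, triple_b in zip(a, b):
        if triple_a != triple_b:
            return False
    return True


other = list(EXAMPLE)
other[3] = (1, 3, 10)                 # change one depth value
print(match_by_field(EXAMPLE, list(EXAMPLE)), match_by_field(EXAMPLE, other))
print(match_by_triple(EXAMPLE, list(EXAMPLE)), match_by_triple(EXAMPLE, other))
```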

Claims (9)

1. An information fingerprint extraction method of multipath depth coding based on bit space, characterized by comprising the following steps:
Step 1: constructing bit windows;
Step 2: constructing a bit plane: the bit windows are re-segmented according to a certain bit width and arranged into a bit plane;
Step 3: constructing bit chains by dimension reduction: dimension-reduction arrangement is performed along multiple traversal paths to construct different bit chains BC, and the code of the corresponding arrangement mode is recorded;
Step 4: encoding the bit chains: each dimension-reduced bit chain constructed above is traversed and counted, obtaining in order the numbers of consecutive 0s and consecutive 1s, which form a new decimal sequence; the resulting decimal sequence is binarized again to obtain a new binary bit chain whose first bit is non-zero; the traversal and counting operation is repeated on the resulting bit chain, iterating in a loop until the newly obtained decimal sequence contains only one element; the value of the finally obtained element and the number of loop iterations are recorded as one feature value of the information fingerprint feature space;
Step 5: fingerprint comparison.
2. The information fingerprint extraction method of multipath depth coding based on bit space according to claim 1, characterized in that: in step 1, the bit windows are constructed from the effective bits that uniquely characterize the original information byte stream, and the size of a bit window is any positive integer m not greater than the total number of bits (Tb) of the original information; the value of m may vary for different systems or byte streams; in step 2, the original information byte stream is divided into windows of width m bits, which are arranged side by side one after another, with the lowest bit numbered 0 and the highest bit numbered m-1, giving a bit plane of m rows * n columns; n is the quotient of Tb divided by m, and if Tb is not divisible by m the remainder is either omitted or padded with 0.
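The window and bit-plane construction of this claim can be sketched as below. This is an illustration only: the function names are hypothetical, padding on the right is an assumption (the text does not say on which side the 0s are added), and the orientation chosen here (each window is one column, top row holds the first bit of every window) is inferred from the bit chains listed in the embodiment.

```python
def split_windows(bits, m, pad=True):
    """Cut a 0/1 string of Tb bits into consecutive m-bit windows.

    If Tb is not a multiple of m, the remainder is zero-padded (pad=True)
    or omitted (pad=False), the two options named in the claim.
    """
    if m <= 0 or m > len(bits):
        raise ValueError("m must be a positive integer not greater than Tb")
    remainder = len(bits) % m
    if remainder:
        if pad:
            bits = bits + "0" * (m - remainder)    # assumed: pad on the right
        else:
            bits = bits[: len(bits) - remainder]   # drop the incomplete window
    return [bits[i:i + m] for i in range(0, len(bits), m)]


def bit_plane(windows):
    """Place the windows side by side as columns and return the m rows."""
    m = len(windows[0])
    return ["".join(window[r] for window in windows) for r in range(m)]


windows = split_windows("100000110100000101011001011001110010", 4)
for row in bit_plane(windows):
    print(row)
```

For the 36-bit chain of the embodiment and m=4 this gives n=9 windows and a plane of 4 rows by 9 columns.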
3. The information fingerprint extraction method of multipath depth coding based on bit space according to claim 2, characterized in that, in step 3, dimension-reduction arrangement is performed along the following traversal paths to construct different bit chains, and the code of the corresponding arrangement mode is recorded:
The first sorting mode: taking m as a fixed step, starting from the high-order position of the first row of the bit plane, dimension-reduction sorting proceeds from the high-order column to the low-order column of that row; the tail bit of the low-order row is connected to the first bit of the high-order row; the two-dimensional bit plane is converted into a one-dimensional bit chain of length m*n, and the code value of this sorting mode is recorded;
The second sorting mode: taking m as a fixed step, starting from the tail position of the first row of the bit plane, dimension-reduction sorting proceeds from the low-order column to the high-order column of that row; the high-order bit of the low-order row is connected to the tail bit of the high-order row; the two-dimensional bit plane is converted into a one-dimensional bit chain of length m*n, and the code value of this sorting mode is recorded;
The third sorting mode: taking n as a fixed step, starting from the high-order column of the first row of the bit plane, dimension-reduction sorting proceeds from the low-order row to the high-order row of that column; the low-order row bit of the low-order column is connected to the high-order row bit of the high-order column; the two-dimensional bit plane is converted into a one-dimensional bit chain of length n*m, and the code value of this sorting mode is recorded;
The fourth sorting mode: taking n as a fixed step, starting from the tail column of the first row of the bit plane, dimension-reduction sorting proceeds from the low-order row to the high-order row of that column; the high-order row bit of the low-order column is connected to the low-order row bit of the high-order column; the two-dimensional bit plane is converted into a one-dimensional bit chain of length n*m, and the code value of this sorting mode is recorded;
The fifth sorting mode: taking n as a fixed step, starting from the tail column of the first row of the bit plane, dimension-reduction sorting proceeds from the low-order row to the high-order row of that column; the high-order row bit of the low-order column is connected to the high-order row bit of the high-order column, until the lowest-order row bit of the low-order column is connected to the lowest-order row bit of the high-order column, and so on; the two-dimensional bit plane is converted into a one-dimensional bit chain of length n*m, and the code value of this sorting mode is recorded;
The sixth sorting mode: taking n as a fixed step, starting from the first column of the last row of the bit plane, dimension-reduction sorting proceeds from the high-order row to the low-order row of that column; the low-order row bit of the high-order column is connected to the low-order row bit of the low-order column, until the lowest-order row bit of the high-order column is connected to the lowest-order row bit of the low-order column, and so on; the two-dimensional bit plane is converted into a one-dimensional bit chain of length n*m, and the code value of this sorting mode is recorded.
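The six traversals can be illustrated with the sketch below. The mapping from sorting-mode number to traversal path is not taken from the claim wording; it is inferred from the six bit chains listed for RM=0 to RM=5 in the Magic(3) embodiment, and the row and column naming inside the code is the sketch's own. The function name `build_chains` is hypothetical.

```python
def build_chains(windows):
    """Return the six one-dimensional bit chains, keyed by sorting mode RM."""
    m = len(windows[0])
    # rows[0] collects the first (highest) bit of every window, rows[m-1] the last.
    rows = ["".join(window[r] for window in windows) for r in range(m)]
    return {
        0: "".join(windows),                          # window by window, high bit first
        1: "".join(w[::-1] for w in windows),         # window by window, low bit first
        2: "".join(rows),                             # row by row, top row first
        3: "".join(reversed(rows)),                   # row by row, bottom row first
        4: "".join(row if i % 2 == 0 else row[::-1]   # serpentine, first row left to right
                   for i, row in enumerate(rows)),
        5: "".join(row[::-1] if i % 2 == 0 else row   # serpentine, first row right to left
                   for i, row in enumerate(rows)),
    }


# Windows of the embodiment (m = 4, n = 9).
WINDOWS = ["1000", "0011", "0100", "0001", "0101", "1001", "0110", "0111", "0010"]
chains = build_chains(WINDOWS)
for mode, chain in chains.items():
    print(mode, chain)
```

Every chain has length m*n, so no bit of the plane is lost or duplicated; only the reading order changes, which is what lets the six modes yield six different run-length profiles of the same data.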
4. The information fingerprint extraction method of multipath depth coding based on bit space according to claim 3, characterized in that step 4 comprises the following steps:
Step 41: preliminary coding of the bit chain: first-bit coding is applied to the dimension-reduced bit chain, that is, the first bit of the bit chain is recorded; if it is 0, the code is 0, and if it is 1, the code is 1; the first-bit code indicates whether the corresponding bit chain starts with 0 or 1;
Step 42: simplified run-length coding is applied to the dimension-reduced bit chain, that is, the numbers of consecutive occurrences of 0/1 are counted in order from beginning to end, giving a sequence of non-zero integers corresponding to the bit chain; the initial coding depth value is set, and depth coding is then carried out;
Step 43: depth coding is applied to the bit chain;
The method of depth coding the bit chain comprises the following steps:
Step A: the resulting non-zero integer sequence is binarized again, omitting the 0s in front of the first 1 in the binary representation of each integer, to form a new 0/1 bit chain;
Step B: the length of this bit chain is counted, and the loop iteration depth value is incremented by 1;
Step C: the newly obtained bit chain replaces the original bit chain and simplified run-length coding is applied again; the numbers of consecutive occurrences of 0/1 are counted to form a new non-zero integer sequence.
5. The information fingerprint extraction method of multipath depth coding based on bit space according to claim 4, characterized in that steps A to C are executed in a loop until the length of the newly obtained non-zero integer sequence is 1, that is, only one count value remains; this value is recorded as the terminal code value, and the final coding depth value, that is, the number of loop iterations, is recorded at the same time.
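Steps 41 to 43 and the loop of steps A to C can be sketched as follows. This is one reading of the claim text, not a confirmed reproduction of the original procedure: the initial depth value of 0, the stop test on the run-length sequence, and the iteration guard `max_rounds` are assumptions, and the function names are hypothetical. The worked steps of the embodiment are given only in figures, so this sketch is not guaranteed to reproduce the triples reported there.

```python
from itertools import groupby


def run_lengths(chain):
    """Simplified run-length coding: lengths of the consecutive 0/1 runs, in order."""
    return [len(list(group)) for _, group in groupby(chain)]


def depth_encode(chain, max_rounds=1000):
    """Return (first bit, terminal code value, coding depth) for a 0/1 string."""
    if not chain:
        raise ValueError("empty bit chain")
    first_bit = int(chain[0])                 # step 41: first-bit coding
    counts = run_lengths(chain)               # step 42: simplified run-length coding
    depth = 0                                 # assumed initial coding depth value
    while len(counts) > 1:                    # stop when only one count value remains
        if depth >= max_rounds:               # safety guard, not part of the claim
            raise RuntimeError("depth coding did not terminate")
        # Step A: binarize each count without leading zeros and concatenate.
        chain = "".join(format(count, "b") for count in counts)
        depth += 1                            # step B: one more iteration
        counts = run_lengths(chain)           # step C: run-length code the new chain
    return first_bit, counts[0], depth


print(run_lengths("100000110100000101011001011001110010"))
print(depth_encode("100111"))
```

Because a count k needs at most k bits (fewer for k of 3 or more), the chain never grows from one round to the next, which is why the iteration tends to shrink long redundant chains quickly.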
6. The information fingerprint extraction method of multipath depth coding based on bit space according to claim 5, characterized in that steps 41 to 43 are run again on the bit chains constructed by the different sorting modes until the triples <first-bit value, terminal code value, coding depth value> of six sorting modes are obtained; the triples of the six sorting modes comprise the triple constructed by the first sorting mode, the triple constructed by the second sorting mode, the triple constructed by the third sorting mode, the triple constructed by the fourth sorting mode, the triple constructed by the fifth sorting mode, and the triple constructed by the sixth sorting mode; the six triples form the feature space of the original information, which is the fingerprint of the information.
7. The information fingerprint extraction method of multipath depth coding based on bit space according to claim 6, characterized in that the bit-by-bit comparison method for the generated fingerprint of the information comprises the following steps:
(1) First-bit comparison: for the same bit chain construction mode, the corresponding first-bit values are compared with priority; if any corresponding first-bit value differs, the two fingerprints do not match and the comparison ends; if they are identical, the next step is taken;
(2) Code value comparison: if the first-bit values are identical, the corresponding final code values are compared; if any corresponding final code value differs, the two fingerprints do not match and the comparison ends; if they are identical, the next step is taken;
(3) Coding depth value comparison: if the first-bit values and the final code values are all identical, the corresponding coding depth values, that is, the iteration counts, are compared; if any corresponding coding depth value differs, the two fingerprints do not match and the comparison ends; if they are identical, the two fingerprints are considered to match.
8. The information fingerprint extraction method of multipath depth coding based on bit space according to claim 7, characterized by a joint unified comparison of the generated fingerprint of the information; the method of joint unified comparison is: the triple constructed by the first sorting mode is compared by the bit-by-bit comparison method of the fingerprint of the information; if identical, the triples constructed by the second, third, fourth, fifth, and sixth sorting modes are compared in turn by the same bit-by-bit comparison method, each only if the preceding comparison is identical; if the triples formed by all six sorting modes, constructed along paths of the same sorting mode, are consistent, the fingerprints are entirely consistent and the information is considered consistent; otherwise the information is considered inconsistent.
9. The information fingerprint extraction method of multipath depth coding based on bit space according to claim 2, characterized in that the value of m is an integer power of 2.
CN201610119377.8A 2016-03-02 2016-03-02 A kind of information fingerprint extracting method of the multipath depth coding based on bit space Active CN105844214B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610119377.8A CN105844214B (en) 2016-03-02 2016-03-02 A kind of information fingerprint extracting method of the multipath depth coding based on bit space

Publications (2)

Publication Number Publication Date
CN105844214A CN105844214A (en) 2016-08-10
CN105844214B true CN105844214B (en) 2019-06-21

Family

ID=56586862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610119377.8A Active CN105844214B (en) 2016-03-02 2016-03-02 A kind of information fingerprint extracting method of the multipath depth coding based on bit space

Country Status (1)

Country Link
CN (1) CN105844214B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111464187B (en) * 2020-04-17 2023-04-28 北京百瑞互联技术有限公司 Host control interface command event coding method, storage medium and computer equipment
CN115470508B (en) * 2022-11-02 2023-01-31 北京点聚信息技术有限公司 Format file vectorization encryption method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7098815B1 (en) * 2005-03-25 2006-08-29 Orbital Data Corporation Method and apparatus for efficient compression
CN102323934A (en) * 2011-08-31 2012-01-18 深圳市彩讯科技有限公司 Mail fingerprint extraction method based on sliding window and mail similarity judging method
CN102354354A (en) * 2011-09-28 2012-02-15 辽宁国兴科技有限公司 Information fingerprint technique based picture password generation and authentication method
CN103258156A (en) * 2013-04-11 2013-08-21 杭州电子科技大学 Method for generating secret key on basis of fingerprint characteristics
CN103425639A (en) * 2013-09-06 2013-12-04 广州一呼百应网络技术有限公司 Similar information identifying method based on information fingerprints

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Biometric key generation with a parametric linear classifier;Alper Kanak 等;《2009 IEEE 17th Signal Processing and Communications Applications Conference》;20090626;全文
基于数字水印的可追踪电子文档保护系统研究与实现;周星;《中国优秀硕士学位论文全文数据库 信息科技辑》;20130315;第2013卷(第03期);全文

Also Published As

Publication number Publication date
CN105844214A (en) 2016-08-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant