CN109918398A

CN109918398A - A blockchain-based data search and classification method

Info

Publication number: CN109918398A
Application number: CN201910142088.3A
Authority: CN
Inventors: 符安文
Original assignee: Individual
Current assignee: Individual
Priority date: 2019-02-26
Filing date: 2019-02-26
Publication date: 2019-06-21

Abstract

The invention discloses a blockchain-based data search and classification method comprising the following steps. S1: each object t in the chain data of a data space is defined as having m attributes, the i-th attribute of t has a corresponding score s_i(t), and each object t has a unique identifier OID. S2: the objects and their attribute scores are stored in separate lists L₁, L₂, …, L_m, one list per attribute. S3: the lists L₁, L₂, …, L_m are initialized as m TF tables, denoted h₁, h₂, …, h_m, and a location index p_i is set for each of the m TF tables. S4: the next object identifier after the location index is accessed; that is, using the location index in table h_i, the (p_i + 1)-th object identifier in h_i is accessed directly. S5: an optimized location index point is generated from the indexes p_i and (p_i + 1), and the data is searched through the optimized location index point.

Description

A blockchain-based data search and classification method

Technical field

The present invention relates to a data search and classification method, and in particular to a blockchain-based data search and classification method.

Background art

A public chain is a blockchain that anyone in the world can join at any time to read data, send transactions that can be confirmed, and compete for bookkeeping. Public chains are generally considered fully decentralized, because no person or institution can control or tamper with the reading and writing of their data. A public chain generally uses a token mechanism to encourage participants to compete for bookkeeping and thereby secure the data. Bitcoin and Ethereum are typical public chains. Main features: users are protected from the influence of developers, all data is public by default, and the barrier to access is low.

A private chain is a blockchain whose write permission is controlled by a single organization or institution. The eligibility of participating nodes can be strictly limited. Because the participating nodes are limited and controllable, a private chain can often offer very fast transactions, better privacy protection, and lower transaction costs, is not easily attacked maliciously, and can meet requirements of the financial industry such as identity authentication. Compared with a centralized database, a private chain can prevent a single node within the institution from deliberately concealing or tampering with data. Even if an error occurs, its source can be found quickly, so many large financial companies prefer private chain technology. Main features: privacy is better protected, transaction costs are greatly reduced, and transactions are faster.

A consortium chain is a blockchain jointly managed by several institutions, each of which runs one or more nodes. Only the institutions within the system are permitted to read and write its data and send transactions, and they record the transaction data jointly. Private chains and consortium chains differ in the design of privacy permissions, and the permission design of a consortium chain is usually more complex.

A permissioned chain is a blockchain in which every participating node must be licensed; unlicensed nodes cannot access the system. Private chains and consortium chains are therefore both permissioned chains. Note that some permissioned chains have no token mechanism, because no token is needed to encourage nodes to compete for bookkeeping.

A side chain is itself a blockchain. It can verify data from other blockchains and allows Bitcoin and other assets to be transferred between blockchains, forming a new, open development platform. Side chains can give a blockchain better performance and privacy protection. They can also be extended to support various assets, such as stocks, bonds, and real-world or virtual-world currencies, and can add functions such as smart contracts, secure processing, and real-world property registration.

Summary of the invention

The technical problem to be solved by the present invention is that current data search and classification approaches in blockchains cannot find target data accurately, or take too long to find it, which places a heavy burden on the whole system. The object of the present invention is to provide a blockchain-based data search and classification method that solves the problems of excessively long data classification and search times and long system response times in the current technology.

The present invention is achieved through the following technical solutions:

A blockchain-based data search and classification method, comprising the following steps. S1: each object t in the chain data of a data space is defined as having m attributes, the i-th attribute of t has a corresponding score s_i(t), and each object t has a unique identifier OID. S2: the objects and their attribute scores are stored in separate lists L₁, L₂, …, L_m, one list per attribute. S3: the lists L₁, L₂, …, L_m are initialized as m TF tables, denoted h₁, h₂, …, h_m, and a location index p_i is set for each of the m TF tables. S4: the next object identifier after the location index is accessed; that is, using the location index in table h_i, the (p_i + 1)-th object identifier in h_i is accessed directly. S5: an optimized location index point is generated from the indexes p_i and (p_i + 1), and the data is searched through the optimized location index point.

At present, the Bloom filter (BF) is very widely used in the field of data search and classification. It is an algorithm for processing massive data whose core idea is to use multiple different hash functions to resolve the "collisions" produced for each data item. Although it has great advantages in processing massive data, using the algorithm alone for data processing leads to a certain rate of false positives and makes deletion difficult. The reason is as follows: once the number of input elements n, the size of the bit array m, and the number of hash functions k are determined, the error rate is lowest when k = (ln 2) × (m/n). For an error rate no greater than E, m must be at least n × log₂(1/E) to represent any set of n elements. Therefore, when the number of hash functions falls outside this range, the error rate rises significantly. In the prior art, a redundant preliminary step can be used to classify the data, and the data in the resulting set is then handed to the Bloom filter for screening of the massive data.
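The two relations quoted above can be checked numerically. The following Python sketch is an illustration only: the function names are hypothetical, and the false-positive estimate (1 − e^(−kn/m))^k is the standard textbook approximation, which is not stated in this specification.

```python
import math

def optimal_hash_count(m_bits: int, n_items: int) -> float:
    """k = (ln 2) * (m / n), the hash count that minimizes the error rate."""
    return math.log(2) * m_bits / n_items

def min_bits(n_items: int, max_error: float) -> float:
    """Lower bound m >= n * log2(1 / E) for an error rate of at most E."""
    return n_items * math.log2(1 / max_error)

def false_positive_rate(m_bits: int, n_items: int, k: int) -> float:
    """Textbook approximation (assumption, not from the specification)."""
    return (1 - math.exp(-k * n_items / m_bits)) ** k

# With m = 1000 bits and n = 100 elements, the optimum is close to k = 7;
# choosing far fewer or far more hash functions raises the error rate.
for k in (2, 7, 20):
    print(k, false_positive_rate(1000, 100, k))
```

The loop shows the effect described in the text: the estimated error rate at k = 7 is lower than at k = 2 or k = 20.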

The technique used in this specification performs data search and classification through a three-dimensional stepping Bloom filter. To guarantee search speed, each object of the chain data in the blockchain has many attributes, and a score is determined for each object attribute. The scores form lists according to the data volume of the object attributes, and the Bloom filter can conveniently search the list data through the location index on each list. Because the list data is classified in advance, the data need not be classified again when it is queried at the back end, which speeds up the search algorithm, reduces the system's response time when processing data, and makes the whole data search more accurate. Moreover, because an optimized position can be obtained from the preset location index, the next search for data of the same kind is faster, so the whole system can still maintain efficient data search after a large number of operations.

Further, the stored list L_i is a data set of n members containing the score of the i-th dimension attribute of each object identified by an OID: L_i = {(OID, s_i(t)) : t ∈ DS}, where DS is the data space with n objects.

Further, the scores of the objects in a list lie in the interval [0, 1], and the objects are arranged in the list in descending order of score.
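As an illustration of how the lists L₁, L₂, …, L_m might be built, the Python sketch below assumes the objects are given as a mapping from OID to a tuple of m scores. The function name and the input layout are hypothetical and are not specified in this document.

```python
from typing import Dict, List, Tuple

def build_attribute_lists(
    objects: Dict[int, Tuple[float, ...]]
) -> List[List[Tuple[int, float]]]:
    """Return one list per attribute, each holding (OID, s_i(t)) pairs
    sorted in descending order of score."""
    m = len(next(iter(objects.values())))
    lists = []
    for i in range(m):
        entries = []
        for oid, scores in objects.items():
            if not 0.0 <= scores[i] <= 1.0:
                raise ValueError("scores must lie in the interval [0, 1]")
            entries.append((oid, scores[i]))
        # Descending order of score, as required for each list L_i.
        entries.sort(key=lambda entry: entry[1], reverse=True)
        lists.append(entries)
    return lists

# Hypothetical data: three objects with m = 2 attributes each.
example_lists = build_attribute_lists(
    {1: (0.2, 0.9), 2: (0.7, 0.1), 3: (0.5, 0.5)}
)
```

The position of an entry within each sorted list then serves as its location index.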

Further, the TF table initialized in step S3 is a three-dimensional stepping Bloom filter table formed from (OID, s_i(t)) and the location index P_i corresponding to the OID.

Further, for 1 ≤ i ≤ m, the three-dimensional stepping Bloom filter table is divided into 3 levels: h_i = {n, {ec_i1, ec_i2, ec_i3}, {m₁, m₂, m₃}, {k₁, k₂, k₃}}, where ec_i1, ec_i2, ec_i3 denote the additional I/O overhead incurred when a query fails for an element of each of the 3 levels, m₁, m₂, m₃ denote the lengths of the vectors V of the 3 levels of Bloom filters, and k₁, k₂, k₃ denote the numbers of hash functions corresponding to the 3 levels of elements.
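The parameter tuple h_i can be pictured as a small record. The Python sketch below is one possible representation; the class name, field types, and the sample values (including the overhead figures) are assumptions chosen for illustration, not values given in this specification.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class TFTableParams:
    """Parameters of one TF table h_i = {n, {ec}, {m}, {k}}."""
    n: int                          # number of objects in the data space
    ec: Tuple[float, float, float]  # extra I/O cost on a failed query, per level
    m: Tuple[int, int, int]         # bit-vector length of each level's filter
    k: Tuple[int, int, int]         # number of hash functions per level

    def level(self, j: int) -> Tuple[float, int, int]:
        """Return (ec, m, k) for level j, where j is 0, 1, or 2."""
        return self.ec[j], self.m[j], self.k[j]

# Illustrative values only: 8-bit vectors, 2 or 3 hash functions per level.
params = TFTableParams(n=5, ec=(1.0, 1.0, 1.0), m=(8, 8, 8), k=(2, 3, 3))
```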

Compared with the prior art, the present invention has the following advantages and beneficial effects:

1. In the blockchain-based data search and classification method of the present invention, a list is set up in advance. The list provides a better set for the Bloom filter used in subsequent processing, so that objects for which a Bloom filter search would have a higher error rate can be sorted out and the objects in the list can be searched quickly;

2. The blockchain-based data search and classification method of the present invention speeds up the search algorithm, reduces the system's response time when processing data, and makes the whole data search more accurate;

3. In the blockchain-based data search and classification method of the present invention, an optimized position can be obtained from the preset location index, so the next search for data of the same kind is faster and the whole system can still maintain efficient data search after a large number of operations.

Brief description of the drawings

The drawings described herein provide a further understanding of the embodiments of the present invention and form a part of this application; they do not limit the embodiments of the present invention. In the drawings:

Fig. 1 is a schematic diagram of the example in Embodiment two of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the embodiments and the drawing. The exemplary embodiments of the invention and their descriptions serve only to explain the invention and do not limit it.

Embodiment one

The blockchain-based data search and classification method of the present invention comprises the following steps. S1: each object t in the chain data of a data space is defined as having m attributes, the i-th attribute of t has a corresponding score s_i(t), and each object t has a unique identifier OID. S2: the objects and their attribute scores are stored in separate lists L₁, L₂, …, L_m, one list per attribute. S3: the lists L₁, L₂, …, L_m are initialized as m TF tables, denoted h₁, h₂, …, h_m, and a location index p_i is set for each of the m TF tables. S4: the next object identifier after the location index is accessed; that is, using the location index in table h_i, the (p_i + 1)-th object identifier in h_i is accessed directly. S5: an optimized location index point is generated from the indexes p_i and (p_i + 1), and the data is searched through the optimized location index point.
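Step S4 can be read as a cursor that moves one position at a time through a sorted list. The Python sketch below shows that reading only; the class name is hypothetical, and starting the location index at −1 (nothing accessed yet) is an assumption that the text does not state.

```python
from typing import List, Optional, Tuple

class ListCursor:
    """Sequential access to one sorted attribute list via a location index p."""

    def __init__(self, entries: List[Tuple[int, float]]) -> None:
        self.entries = entries
        self.p = -1  # assumption: no position has been accessed yet

    def next_entry(self) -> Optional[Tuple[int, float]]:
        """Access the entry at position p + 1 and advance p (step S4)."""
        if self.p + 1 >= len(self.entries):
            return None  # list exhausted
        self.p += 1
        return self.entries[self.p]

# Hypothetical list of (OID, score) pairs in descending order of score.
cursor = ListCursor([(8, 0.70), (6, 0.68), (7, 0.37)])
first = cursor.next_entry()   # entry at position 0
second = cursor.next_entry()  # entry at position 1
```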

The technique used in this specification performs data search and classification through a three-dimensional stepping Bloom filter. To guarantee search speed, each object of the chain data in the blockchain has many attributes, and a score is determined for each object attribute. The scores form lists according to the data volume of the object attributes, and the Bloom filter can conveniently search the list data through the location index on each list. Because the list data is classified in advance, the data need not be classified again when it is queried at the back end, which speeds up the search algorithm, reduces the system's response time when processing data, and makes the whole data search more accurate. Moreover, because an optimized position can be obtained from the preset location index, the next search for data of the same kind is faster, so the whole system can still maintain efficient data search after a large number of operations.

OID technology is widely applied in the modern Internet of Things but has not yet been used in the field of blockchain data search. OID is an identifier mechanism jointly proposed in the 1980s by international standardization organizations including ISO/IEC and ITU-T. It uses a hierarchical tree structure to name any type of object globally, unambiguously, and uniquely. Used as the object identifier of data objects, it effectively distinguishes one data object from another and gives each object t a unique identifier. Based on these identifiers, the different lists L₁, L₂, …, L_m can be built according to the different classifications. These lists make it convenient for the subsequent Bloom filter to perform data search. To allow fast searching, each list has a location index as its entry point, the scores of the objects in a list lie in the interval [0, 1], and the objects are arranged in the list in descending order of score, as shown in Table 1.

P_i	(OID, S₁(t))	(OID, S₂(t))	(OID, S₃(t))
0	(8, 0.70)	(9, 0.68)	(4, 0.50)
1	(6, 0.68)	(7, 0.47)	(3, 0.39)
2	(7, 0.37)	(11, 0.35)	(6, 0.28)
3	(5, 0.26)	(3, 0.24)	(1, 0.27)
4	(2, 0.25)	(6, 0.13)	(7, 0.16)

Table 1: Example of list files for 3 attributes
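For readers who prefer code to tables, Table 1 can be written as a Python mapping from attribute number to its sorted list, from which a (PI, OID, score) triple is read directly. The names TABLE_1 and triple are hypothetical and used only for this illustration.

```python
from typing import Dict, List, Tuple

# Table 1: attribute number -> list of (OID, score), indexed by position P_i.
TABLE_1: Dict[int, List[Tuple[int, float]]] = {
    1: [(8, 0.70), (6, 0.68), (7, 0.37), (5, 0.26), (2, 0.25)],
    2: [(9, 0.68), (7, 0.47), (11, 0.35), (3, 0.24), (6, 0.13)],
    3: [(4, 0.50), (3, 0.39), (6, 0.28), (1, 0.27), (7, 0.16)],
}

def triple(table: Dict[int, List[Tuple[int, float]]],
           attribute: int, pi: int) -> Tuple[int, int, float]:
    """Return the (PI, OID, score) triple at position pi of one attribute list."""
    oid, score = table[attribute][pi]
    return (pi, oid, score)

# Row P_i = 4 of the second attribute list.
element = triple(TABLE_1, 2, 4)
```

Here element holds the position index 4, the OID 6, and the score 0.13 read from the last row of the second attribute column.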

During this data search, the location index PI is determined first. For example, consider a two-dimensional space DS = {x₁, x₂, …, x_n}. Let object x_i ∈ DS; if x_(i+1) is the (i+1)-th tuple of DS, then the position corresponding to x_(i+1) is its location index PI.

The optimized location index is determined as follows. During one data scan, let the set of location indexes accessed in attribute list L_i be {0, 1, 2, 3, …, i, j}, where 0 ≤ i ≤ j ≤ n. If i + 1 < j, that is, all data in L_i from location index 0 to i has been accessed while index (i + 1) has not, then i can be regarded as the optimal location index for this access.
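The rule above amounts to finding the end of the unbroken run of accessed positions that starts at 0. The Python sketch below is a minimal illustration; the function name is hypothetical, and returning −1 when position 0 has not been accessed is an assumption not covered by the text.

```python
from typing import Set

def optimal_location_index(accessed: Set[int]) -> int:
    """Return the largest i such that positions 0..i have all been accessed.

    Returns -1 when position 0 itself has not been accessed (assumption).
    """
    i = -1
    while (i + 1) in accessed:
        i += 1
    return i

# Positions 0..3 were scanned in order, and position 7 was reached directly.
best = optimal_location_index({0, 1, 2, 3, 7})
```

In this example i = 3 and j = 7, so i + 1 < j holds and position 3 is taken as the optimal location index.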

The stored list L_i is a data set of n members containing the score of the i-th dimension attribute of each object identified by an OID: L_i = {(OID, s_i(t)) : t ∈ DS}, where DS is the data space with n objects. The TF table initialized in step S3 is a three-dimensional stepping Bloom filter table (h_i) formed from (OID, s_i(t)) and the location index PI corresponding to the OID. For 1 ≤ i ≤ m, the three-dimensional stepping Bloom filter table is divided into 3 levels: h_i = {n, {ec_i1, ec_i2, ec_i3}, {m₁, m₂, m₃}, {k₁, k₂, k₃}}, where ec_i1, ec_i2, ec_i3 denote the additional I/O overhead incurred when a query fails for an element of each of the 3 levels, m₁, m₂, m₃ denote the lengths of the vectors V of the 3 levels of Bloom filters, and k₁, k₂, k₃ denote the numbers of hash functions corresponding to the 3 levels of elements.

Embodiment two

As shown in Fig. 1, to help those skilled in the art understand, this embodiment describes the Bloom filter table of the above embodiment in detail. The 3-dimensional element {4, 6, 0.13}, formed from the 5th group of data in Table 1 together with its PI, is used as the mapped data set to illustrate how a TF table supports the storage and querying of triplet information. Suppose the TF table uses 3 Bloom filters BF₁, BF₂, BF₃ to represent PI, OID, and s_i(t) respectively, and let m₁ = m₂ = m₃ = 8 bits. For ease of understanding, only 3 hash functions are used to illustrate the process: g₁(z) = 100z mod 8, g₂(z) = (100z + 3) mod 8, g₃(z) = (100z + 5) mod 8. BF₁ uses g₁(z) and g₂(z), while BF₂ and BF₃ use all 3 functions for storage and querying.

When PI = 4, g₁(4) = (100 × 4) mod 8 = 0 and g₂(4) = (100 × 4 + 3) mod 8 = 3, so BF₁[0] and BF₁[3] are set. When OID = 6, g₁(6) = (100 × 6) mod 8 = 0, g₂(6) = (100 × 6 + 3) mod 8 = 3, and g₃(6) = (100 × 6 + 5) mod 8 = 5, so BF₂[0], BF₂[3], and BF₂[5] are set. Similarly, when s_i(t) = 0.13 is inserted, BF₃[0], BF₃[2], and BF₃[5] are set.
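The arithmetic of this example can be reproduced with a few lines of Python. The sketch below uses the three hash functions given above; rounding 100z to the nearest integer before taking the remainder is an assumption added to keep the floating-point score 0.13 exact, and the helper name set_bits is hypothetical.

```python
def g1(z: float) -> int:
    return round(100 * z) % 8        # g1(z) = 100z mod 8

def g2(z: float) -> int:
    return (round(100 * z) + 3) % 8  # g2(z) = (100z + 3) mod 8

def g3(z: float) -> int:
    return (round(100 * z) + 5) % 8  # g3(z) = (100z + 5) mod 8

def set_bits(z: float, hashes) -> list:
    """Bit positions that inserting z sets in an 8-bit filter."""
    return sorted({h(z) for h in hashes})

bf1_bits = set_bits(4, (g1, g2))         # PI = 4 in BF1
bf2_bits = set_bits(6, (g1, g2, g3))     # OID = 6 in BF2
bf3_bits = set_bits(0.13, (g1, g2, g3))  # score 0.13 in BF3
```

The three results are the positions 0 and 3 for BF₁, positions 0, 3, and 5 for BF₂, and positions 0, 2, and 5 for BF₃, matching the bits listed in the text.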

When an access operation is executed, for example accessing the object with OID = 6 in h_i, the TKBFP algorithm first applies the hash operations in BF₂ of h_i to query whether the object exists. If it exists, the algorithm extracts the location index P_i from BF₁ and obtains the local score from BF₃, and then computes the global score.
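The existence check in BF₂ can be sketched in Python as follows. This is an illustration under assumptions: the BloomFilter class and its method names are hypothetical, rounding 100z before the modulo is an added safeguard for floating-point input, and the sketch covers only the membership step, since the text does not specify how the location index and the score are read back from BF₁ and BF₃.

```python
M_BITS = 8  # m1 = m2 = m3 = 8 bits, as in this embodiment

def g1(z): return round(100 * z) % M_BITS
def g2(z): return (round(100 * z) + 3) % M_BITS
def g3(z): return (round(100 * z) + 5) % M_BITS

class BloomFilter:
    """Minimal bit-array Bloom filter (hypothetical helper)."""

    def __init__(self, hashes, m_bits=M_BITS):
        self.bits = [0] * m_bits
        self.hashes = hashes

    def add(self, z):
        for h in self.hashes:
            self.bits[h(z)] = 1

    def might_contain(self, z):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[h(z)] for h in self.hashes)

# Store the triple {4, 6, 0.13}: PI in BF1, OID in BF2, score in BF3.
bf1 = BloomFilter((g1, g2))
bf2 = BloomFilter((g1, g2, g3))
bf3 = BloomFilter((g1, g2, g3))
bf1.add(4)
bf2.add(6)
bf3.add(0.13)

found_6 = bf2.might_contain(6)  # the stored OID
found_7 = bf2.might_contain(7)  # an OID that was never stored
found_8 = bf2.might_contain(8)  # never stored, but hashes to the same bits
```

With these toy parameters OID 7 is correctly rejected, while OID 8 maps to the same three bits as OID 6 and is reported as possibly present. This is the false-positive behaviour of Bloom filters noted in the summary, made very visible here by the 8-bit vectors.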

The specific embodiments described above explain the objectives, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A blockchain-based data search and classification method, characterized in that the method comprises the following steps:

S1: each object t in the chain data of a data space is defined as having m attributes, the i-th attribute of t has a corresponding score s_i(t), and each object t has a unique identifier OID;

S2: the objects and their attribute scores are stored in separate lists L₁, L₂, …, L_m, one list per attribute;

S3: the lists L₁, L₂, …, L_m are initialized as m TF tables, denoted h₁, h₂, …, h_m, and a location index p_i is set for each of the m TF tables;

S4: the next object identifier after the location index is accessed; that is, using the location index in table h_i, the (p_i + 1)-th object identifier in h_i is accessed directly;

S5: an optimized location index point is generated from the indexes p_i and (p_i + 1), and the data is searched through the optimized location index point.

2. The blockchain-based data search and classification method according to claim 1, characterized in that the stored list L_i is a data set of n members containing the score of the i-th dimension attribute of each object identified by an OID, L_i = {(OID, s_i(t)) : t ∈ DS}, where DS is the data space with n objects.

3. The blockchain-based data search and classification method according to claim 1, characterized in that the scores of the objects in a list lie in the interval [0, 1], and the objects are arranged in the list in descending order of score.

4. The blockchain-based data search and classification method according to claim 1, characterized in that the TF table initialized in step S3 is a three-dimensional stepping Bloom filter table formed from (OID, s_i(t)) and the location index P_i corresponding to the OID.

5. The blockchain-based data search and classification method according to claim 1, characterized in that, for 1 ≤ i ≤ m, the three-dimensional stepping Bloom filter table is divided into 3 levels, h_i = {n, {ec_i1, ec_i2, ec_i3}, {m₁, m₂, m₃}, {k₁, k₂, k₃}}, where ec_i1, ec_i2, ec_i3 denote the additional I/O overhead incurred when a query fails for an element of each of the 3 levels, m₁, m₂, m₃ denote the lengths of the vectors V of the 3 levels of Bloom filters, and k₁, k₂, k₃ denote the numbers of hash functions corresponding to the 3 levels of elements.