CN112416941A - Block chain-based rapid data retrieval method and system - Google Patents
- Publication number
- CN112416941A (CN202011369688.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- time
- block
- stored
- theta
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
- G06F21/645—Protecting data integrity, e.g. using checksums, certificates or signatures using a third party
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2358—Change logging, detection, and notification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2143—Clearing memory, e.g. to prevent the data from being stolen
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Business, Economics & Management (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Bioethics (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of blockchains and discloses a blockchain-based rapid data retrieval method comprising the following steps: acquiring data to be stored, and encoding the data to be stored using a pipeline-based data encoding scheme to obtain encoded data; storing the encoded data in a blockchain, and storing a copy of the data to be stored in a blockchain node using a time-based copy storage method; slicing the data stored in the blockchain using an erasure-code-based data slicing method to obtain erasure-coded slice data; constructing a time-series index using a time-series-based index construction method; and quickly retrieving the data using the time-series index of the data. The invention also provides a blockchain-based rapid data retrieval system. The invention enables rapid retrieval of data.
Description
Technical Field
The invention relates to the technical field of blockchains, and in particular to a blockchain-based rapid data retrieval method and a blockchain-based rapid data retrieval system.
Background
With the rapid spread of social networks, smart hardware, the mobile internet, and the Internet of Things, the value latent in big data is becoming increasingly visible, and a new era that places greater emphasis on data value and data openness is arriving. Accordingly, business, scientific research, public services, and other fields have an urgent need for the open sharing of big data. However, because a secure and trustworthy data-sharing environment is lacking, big data is still stored and controlled separately by government agencies, commercial enterprises, research institutions, and even individuals. The resulting data silos seriously impede the sharing and opening of big data.
Blockchain has attracted attention across industries because of characteristics such as decentralized trust and full distribution, and it offers a way to break down the barriers to big-data sharing and achieve trusted data interconnection. As a decentralized, distributed shared database, an existing blockchain requires every node to store the complete block data. As the number of nodes in the system and the complexity of transactions grow, each node needs more and more local storage space for block data, which becomes a bottleneck for blockchains in practical applications. Moreover, current blockchain schemes do not support temporal data processing, and sequential access to block files in the blockchain prevents efficient query processing.
In view of this, how to optimize the way data is stored in a blockchain so as to achieve more efficient data retrieval is a problem that those skilled in the art urgently need to solve.
Disclosure of Invention
The invention provides a blockchain-based rapid data retrieval method. Data is encoded with a pipeline-based data encoding scheme and the encoded data is stored in the blockchain; the locally stored block data is sliced so that the block data can be reconstructed, which optimizes blockchain storage. In addition, a temporal index of the block data is built with a time-based index construction algorithm, which reduces the number of accesses to the block data and the database and enables more efficient data retrieval.
To achieve the above object, the invention provides a blockchain-based rapid data retrieval method, comprising:
acquiring data to be stored, and encoding the data to be stored using a pipeline-based data encoding scheme to obtain encoded data;
storing the encoded data in a blockchain, and storing a copy of the data to be stored in a blockchain node using a time-based copy storage method;
slicing the data stored in the blockchain using an erasure-code-based data slicing method to obtain erasure-coded slice data;
constructing a time-series index using a time-series-based index construction method; and
quickly retrieving the data using the time-series index of the data.
Optionally, encoding the data to be stored using the pipeline-based data encoding scheme comprises:
the pipeline-based data encoding scheme means that the encoding and decoding computations are run on different blocks in a pipelined manner, that is, for the data to be stored $o_1, o_2, \ldots, o_k$ and the corresponding storage blocks $h_1, h_2, \ldots, h_n$:
during encoding, the corresponding storage blocks $h_i$ respectively encode the data to be stored $o_1, o_2, \ldots, o_k$ to obtain the encoded data $c_1, c_2, \ldots, c_n$, where $n > k$;
during decoding, the storage block $h_1$ holding the encoded data $c_1$ performs a decoding operation on $c_1$ to obtain a decoding intermediate block $i_1$ and sends $i_1$ to the storage block $h_2$ holding the encoded data $c_2$; storage block $h_2$ performs a decoding operation on $c_2$ and $i_1$ to obtain a decoding intermediate block $i_2$; and so on, until the final storage block $h_n$ obtains the final decoded data $i_n$;
In one embodiment of the invention, a classical systematic (8,4) code is used to encode the data $O = (o_1, o_2, o_3, o_4)$ into the 8-dimensional encoded data $C = (c_1, \ldots, c_8)$, where the coding algorithm employs a finite field algorithm. After encoding, storage block $h_i$ stores the encoded data $c_i$. During decoding, each storage block $h_i$ sends its encoded data $c_i$ to a decoding node $n_i$, $i = 1, \ldots, 8$. The decoding node $n_1$ holding the first encoded data performs a linear operation on $c_1$ and sends the result to the decoding node $n_2$ holding the second encoded data $c_2$; $n_2$ combines the result received from $n_1$ with $c_2$ and sends its result to the decoding node $n_3$ holding the third encoded data $c_3$. Decoding proceeds in this pipelined fashion until the original data is obtained from decoding node $n_8$.
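The pipelined decoding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's finite-field algorithm: arithmetic is done modulo the prime 257, the parity rows of the systematic (8,4) generator form a Cauchy matrix, and the names `generator_matrix`, `invert`, `encode`, and `pipeline_decode` are hypothetical. The sketch decodes from any four of the eight symbols, whereas the embodiment passes the intermediate block through all eight decoding nodes.

```python
# Sketch only: GF(257) and the Cauchy parity rows are assumptions for
# illustration; the patent does not state which finite field it uses.
P = 257        # hypothetical prime modulus
K, N = 4, 8    # the (8,4) systematic code of the embodiment


def generator_matrix():
    """N x K generator: identity rows (systematic part) plus Cauchy parity rows."""
    identity = [[int(r == c) for c in range(K)] for r in range(K)]
    parity = [[pow(x + y, P - 2, P) for y in range(K, 2 * K)]
              for x in range(N - K)]
    return identity + parity


def invert(matrix):
    """Invert a square matrix over GF(P) by Gauss-Jordan elimination."""
    size = len(matrix)
    aug = [list(row) + [int(r == c) for c in range(size)]
           for r, row in enumerate(matrix)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P
                          for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def encode(data):
    """Encode K data symbols into N coded symbols c_1..c_N."""
    return [sum(g * d for g, d in zip(row, data)) % P
            for row in generator_matrix()]


def pipeline_decode(symbols):
    """Decode from {node_index: c_i} by passing an intermediate block along a chain."""
    nodes = sorted(symbols)[:K]
    G = generator_matrix()
    D = invert([G[i] for i in nodes])      # D times the selected rows of G is I
    intermediate = [0] * K                 # block handed from node to node
    for j, node in enumerate(nodes):
        # Each node adds only its own contribution c_node * D[:, j].
        intermediate = [(acc + D[r][j] * symbols[node]) % P
                        for r, acc in enumerate(intermediate)]
    return intermediate                    # final intermediate block = original data


data = [10, 20, 30, 40]
coded = encode(data)
recovered = pipeline_decode({i: coded[i] for i in (1, 4, 6, 7)})
```

Each node in the chain sees only its own symbol and the running intermediate block, which is the property the pipeline relies on.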
Optionally, storing the copy of the data to be stored in a blockchain node using the time-based copy storage method comprises:
in detail, when a user uploads or recovers data for the first time, the invention stores a copy of the data in the last blockchain node that participated in the encoding or decoding process and treats the data as hot data, that is, data the user is likely to read again within a short time;
1) when the copy is stored, a deletion clock $t$ and a threshold $T$ are set at the same time and stored in the blockchain node together with the copy; the clock value $t$ is zero at the start and increases with time; in this process the computation overhead drops to zero while the storage overhead increases by four blocks, so computation overhead is reduced at the expense of storage overhead;
2) if the user reads the data again while $t < T$, the copy of the data can be obtained directly from that last blockchain node, and the clock value is reset to zero;
3) if $t \ge T$, the copy stored in the node is deleted and the storage space is released; that is, if the user has not read the object for longer than the threshold, the data is regarded as cold data that the user will not read for a long time, and its copy is deleted to release the storage space.
The threshold $T$ is determined by the time interval $t$ between two accesses to the data by the user and by the currently available network resources. Accordingly, the invention denotes the available computing resources of the network by $C$ and the available storage space of the network by $S$, and updates the threshold each time the user accesses the data:
wherein:
$t$ is the time interval between two accesses to the same data by the user;
$C_{old}$, $C_{new}$ are the available computing resource values of the network at the user's first and second access to the data, respectively;
$S_{old}$, $S_{new}$ are the available storage space resource values of the network at the user's first and second access to the data, respectively;
$\alpha, \beta \in [0, 1]$, $\alpha + \beta = 1$, where $\alpha$ and $\beta$ denote the importance to the blockchain network of computing resources and storage resources, respectively; in one embodiment of the invention, $\alpha = 0.4$ and $\beta = 0.6$.
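The hot/cold copy handling in steps 1) to 3) can be sketched as a small store with a deletion clock. This is an illustrative sketch, not the patent's implementation: the class and method names are hypothetical, the threshold update rule is left as a caller-supplied function because the update formula is not reproduced in this text, and deletion is checked lazily on the next read rather than by a background timer.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Replica:
    data: bytes
    threshold: float      # T: how long an unread copy may stay on the node
    clock_start: float    # instant at which the deletion clock t was last zeroed


class ReplicaStore:
    """Hot/cold copy store kept on one node (hypothetical helper)."""

    def __init__(self, initial_threshold: float,
                 update_threshold: Callable[[float, float], float],
                 now: Callable[[], float] = time.monotonic) -> None:
        self._initial = initial_threshold
        self._update = update_threshold   # assumed rule: (T, t) -> new T
        self._now = now
        self._replicas: Dict[str, Replica] = {}

    def put(self, key: str, data: bytes) -> None:
        # Step 1: store the copy together with clock t = 0 and threshold T.
        self._replicas[key] = Replica(data, self._initial, self._now())

    def read(self, key: str) -> Optional[bytes]:
        replica = self._replicas.get(key)
        if replica is None:
            return None                   # caller falls back to full decoding
        t = self._now() - replica.clock_start
        if t >= replica.threshold:
            # Step 3: cold data, delete the copy and release the space.
            del self._replicas[key]
            return None
        # Step 2: hot data, serve the copy, refresh T, reset the clock.
        replica.threshold = self._update(replica.threshold, t)
        replica.clock_start = self._now()
        return replica.data

    def holds(self, key: str) -> bool:
        return key in self._replicas


# Usage with a placeholder rule that leaves T unchanged.
store = ReplicaStore(initial_threshold=60.0, update_threshold=lambda T, t: T)
store.put("block-42", b"payload")
```

A `None` result from `read` signals that the copy is absent or has gone cold, so the caller would recover the data through the pipelined decoding path instead.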
Optionally, slicing the data stored in the blockchain using the erasure-code-based data slicing method comprises:
2) performing a matrix multiplication of the data slices with a preset erasure-code-based segmentation matrix to obtain the erasure-coded slice data:
wherein:
is a matrix value in the coding matrix; in a specific embodiment of the invention, the coding matrix is a Cauchy-matrix-based coding matrix, given by:
$x_i$, $y_i$ are elements of the Galois field, where $m > n$;
3) because $m > n$ in the $m \times n$ coding matrix, there are more erasure-coded slices than original data slices; the invention saves storage by deleting part of the erasure-coded slices, and each node may choose how many data slices to retain according to its own local storage capacity; in a specific embodiment of the invention, to keep the distribution of data slices across the whole network stable, nodes delete coded slices at random; if the coding matrix $G$ of the system is of order $m \times n$, the original block data is divided into $n$ data slices, and a given node retains on average $k$ data slices per block after coding, with corresponding storage space optimization efficiency $\eta$, then:
4) after coding and deletion, a node stores only coded slices of the transaction information, so when it needs the complete block information it must obtain enough coded data slices from other nodes; network transmission consumes the node's bandwidth, and a block can be completely reconstructed once the number of its coded data slices is greater than or equal to the number of columns $n$ of the coding matrix; therefore, the minimum amount of data $Q$ that other nodes must transmit for the node to recover a block is:
$$Q = (n - k) \cdot p$$
wherein:
$n$ is the number of data slices into which the original block data is divided;
$k$ is the average number of data slices the node retains per block;
$p$ is the amount of data in each slice.
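The slicing, random deletion, and recovery steps above can be sketched as follows. This is a minimal illustration under stated assumptions: it works modulo the prime 257 instead of a binary Galois field, it uses the standard Cauchy form $1/(x_i + y_j)$ with $x_i = i$ and $y_j = m + j$ (the patent's own matrix is not reproduced in this text), and all function names are hypothetical.

```python
import random

P = 257  # hypothetical prime field; production erasure codes often use GF(2^8)


def cauchy_matrix(m, n):
    """m x n Cauchy matrix with entries 1 / (x_i + y_j) over GF(P)."""
    return [[pow(i + (m + j), P - 2, P) for j in range(n)] for i in range(m)]


def encode_slices(slices, m):
    """Multiply the coding matrix by n data slices to get m coded slices."""
    n, width = len(slices), len(slices[0])
    G = cauchy_matrix(m, n)
    return [[sum(G[r][c] * slices[c][s] for c in range(n)) % P
             for s in range(width)] for r in range(m)]


def reconstruct(coded, m, n):
    """Recover the n data slices from any n coded slices {row_index: slice}."""
    rows = sorted(coded)[:n]
    G = cauchy_matrix(m, n)
    aug = [G[r] + list(coded[r]) for r in rows]   # solve G_sub * X = coded
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def transfer_amount(n, k, p):
    """Q = (n - k) * p: minimum data a node must fetch to rebuild a block."""
    return (n - k) * p


# A block split into n = 4 slices of p = 3 symbols, coded into m = 6 slices.
block = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
m, n, k = 6, 4, 2
coded = encode_slices(block, m)

kept = random.Random(0).sample(range(m), k)          # node keeps k slices at random
missing = [r for r in range(m) if r not in kept][: n - k]
available = {r: coded[r] for r in kept + missing}    # k local + (n - k) fetched
rebuilt = reconstruct(available, m, n)
```

Because every square submatrix of a Cauchy matrix is invertible, any $n$ of the $m$ coded slices suffice here, which is why the node only needs to fetch $n - k$ slices.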
Optionally, the flow of the time-series-based index construction method is as follows:
1) set an initial time $t_1$ as the start time of time-series index construction, and let the start time of the next construction be $t_2$; for each slice data $k_i$, within the time $[t_1, t_2]$, divide time into non-overlapping adjacent time periods $\theta(k) = \{\theta_1, \theta_2, \ldots, \theta_m\}$, where $m$ is the number of time periods;
2) use $Get(C\langle k_i, \theta_n \rangle)$ to obtain the state $\varepsilon(k_i, \theta_n)$ corresponding to the $i$-th slice data $k_i$ in time period $\theta_n$, and update the current time-series index state, that is, add the current data $data_{n+1}$ to $\varepsilon(k_i, \theta_n)$ and at the same time update the data value corresponding to $k_i$ from $data_n$ to $data_{n+1}$;
3) if $\theta_n$ satisfies the time division condition, commit $\langle (k_i, \theta_n), \varepsilon(k_i, \theta_n) \rangle$ to a block file, update the index into the historical data, and create a new time interval $\theta_{n+1}$. In an embodiment of the invention, the time division condition is a dynamic interval division condition: the size of a time interval is determined by both the elapsed time and the amount of slice data. A fixed time interval value and a fixed slice data quantity value are set, and the determination of a time interval must satisfy either of the following two conditions: first, when the time interval equals the fixed value, the number of slice data must equal or exceed the prescribed slice data amount; second, when the number of slice data equals the fixed value, the time interval must be equal to or greater than the fixed value. This avoids forming too much or too little index data within a given time period $\theta$.
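The construction flow above can be sketched as an in-memory index that seals an interval once the dynamic division condition is met. This is an illustrative sketch under assumptions: the two conditions are read together as "the interval is at least the fixed span and holds at least the fixed number of records", the `sealed` list stands in for committed block files, and the class and attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Record = Tuple[float, Any]                      # (timestamp, value)
Sealed = Tuple[float, float, List[Record]]      # (theta start, theta end, state)


@dataclass
class OpenInterval:
    start: float
    records: List[Record] = field(default_factory=list)


class TimeSeriesIndex:
    def __init__(self, fixed_span: float, fixed_count: int) -> None:
        self.fixed_span = fixed_span            # fixed time interval value
        self.fixed_count = fixed_count          # fixed slice data quantity value
        self.open: Dict[str, OpenInterval] = {}
        self.sealed: Dict[str, List[Sealed]] = {}
        self.current: Dict[str, Any] = {}

    def put(self, key: str, timestamp: float, value: Any) -> None:
        interval = self.open.setdefault(key, OpenInterval(start=timestamp))
        interval.records.append((timestamp, value))   # add data to eps(k_i, theta_n)
        self.current[key] = value                     # latest value for k_i
        long_enough = timestamp - interval.start >= self.fixed_span
        full_enough = len(interval.records) >= self.fixed_count
        if long_enough and full_enough:
            # Commit the interval and its state; the next put opens theta_{n+1}.
            self.sealed.setdefault(key, []).append(
                (interval.start, timestamp, interval.records))
            del self.open[key]


index = TimeSeriesIndex(fixed_span=10, fixed_count=3)
for ts, value in [(0, "a"), (4, "b"), (8, "c"), (12, "d"), (13, "e"), (30, "f")]:
    index.put("k", ts, value)
```

Requiring both a minimum span and a minimum record count keeps quiet keys from producing many tiny intervals and busy keys from producing very short ones.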
Optionally, quickly retrieving the data using the time-series index of the data comprises:
to retrieve all data related to data $k$ within a time interval $\tau$, first query the time intervals $\theta(k)$ corresponding to $k$ through the returned iterator, and compute the intervals of $\theta(k)$ that stand in a time-series connection relation or a time-series containment relation with the target query interval $\tau$; this set of intervals is denoted $o(\theta(k), \tau)$, and the first and last $\theta$ in $o(\theta(k), \tau)$ are in a time-series connection relation. The time-series connection relation is defined on temporal intervals $\theta_i$ and $\theta_j$: when its condition holds, $\theta_i$ and $\theta_j$ are said to be in a time-series connection relation. The time-series containment relation is likewise defined on temporal intervals $\theta_i$ and $\theta_j$: when its condition holds, $\theta_i$ and $\theta_j$ are said to be in a time-series containment relation;
for each $\theta$ contained in $o(\theta(k), \tau)$, execute the call $\langle k, \theta \rangle$ and parse the block file through the returned iterator; for intervals in a time-series containment relation, add the data parsed by the iterator directly to the result set; for intervals in a time-series connection relation, traverse the data returned by the iterator and remove the data not within the interval $\tau$; finally, output the result set as the data retrieval result.
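The retrieval procedure above can be sketched as a scan over sealed intervals. This is an illustrative sketch under assumptions: since the formal definitions are not reproduced in this text, "containment" is read as an interval lying entirely inside $\tau$ and "connection" as an interval that only partly overlaps $\tau$; the function name `query` and the tuple layout are hypothetical.

```python
from typing import Any, List, Tuple

Record = Tuple[float, Any]                      # (timestamp, value)
Sealed = Tuple[float, float, List[Record]]      # (theta start, theta end, records)


def query(sealed: List[Sealed], tau_start: float, tau_end: float) -> List[Record]:
    """Return all records of one key whose timestamps fall within tau."""
    result: List[Record] = []
    for start, end, records in sealed:
        if end < tau_start or start > tau_end:
            continue                            # disjoint: the file is never opened
        if tau_start <= start and end <= tau_end:
            result.extend(records)              # containment: take everything
        else:
            # Connection (partial overlap): filter out records outside tau.
            result.extend((ts, v) for ts, v in records
                          if tau_start <= ts <= tau_end)
    return result


sealed_intervals = [
    (0, 9, [(0, "a"), (5, "b"), (9, "c")]),
    (10, 19, [(10, "d"), (15, "e")]),
    (20, 29, [(20, "f"), (27, "g")]),
    (30, 39, [(30, "h")]),
]
hits = query(sealed_intervals, 5, 27)
```

Only the first and last overlapping intervals need record-level filtering; intervals fully inside the query range are copied without inspection, and disjoint intervals are skipped entirely.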
In addition, to achieve the above object, the present invention further provides a block chain-based fast data retrieval system, including:
a data acquisition device, configured to acquire data to be stored and encode the data to be stored using the pipeline-based data encoding scheme;
a data processor, configured to store the encoded data in the blockchain and store a copy of the data to be stored in a blockchain node using the time-based copy storage method, and to slice the data stored in the blockchain using the erasure-code-based data slicing method to obtain erasure-coded slice data;
and a data retrieval device, configured to construct a time-series index for the slice data using the time-series-based index construction method and to quickly retrieve the data using the time-series index of the data.
In addition, to achieve the above object, the invention also provides a computer-readable storage medium storing data retrieval program instructions that are executable by one or more processors to implement the steps of the blockchain-based rapid data retrieval method described above.
Compared with the prior art, the invention provides a block chain-based rapid data retrieval method, which has the following advantages:
First, the invention provides a pipeline-based data encoding scheme. During encoding, the corresponding storage blocks respectively encode the data to be stored $o_1, o_2, \ldots, o_k$ to obtain the encoded data $c_1, c_2, \ldots, c_n$, where $n > k$. During decoding, the storage block $h_1$ holding the encoded data $c_1$ performs a decoding operation on $c_1$ to obtain a decoding intermediate block $i_1$ and sends $i_1$ to the storage block $h_2$ holding the encoded data $c_2$; storage block $h_2$ performs a decoding operation on $c_2$ and $i_1$ to obtain a decoding intermediate block $i_2$; and so on, until the final storage block $h_n$ obtains the final decoded data $i_n$. During pipelined encoding or decoding, no messages are transmitted between different blocks, and the current block cannot obtain and does not know the information of other blocks. This ensures anonymity between different blocks within a blockchain node: if one block in the node is attacked, the data of the other blocks in the node is not affected, which secures the data stored in the blockchain.
The invention also provides a time-based copy storage method, in which a time threshold $T$ is determined from the available computing resource value $C$ and the storage space resource value $S$ of the current blockchain network, and $T$ is updated each time a user accesses data in the blockchain:
wherein: $T$ denotes the threshold before the update; $T'$ denotes the updated threshold; $\alpha, \beta \in [0, 1]$, $\alpha + \beta = 1$, where $\alpha$ and $\beta$ denote the importance to the network of computing resources and storage resources, respectively. When a user uploads or recovers data for the first time, a copy of the data is stored in the last blockchain node that participated in the encoding or decoding process, and the data is treated as hot data that the user is likely to read again within a short time. If the user reads the data again while the clock value $t < T$, the copy can be obtained directly from that last blockchain node and the clock value is reset to zero; in this process the computation overhead drops to zero while the storage overhead increases by four blocks, so blockchain computation overhead is reduced at the expense of storage overhead. If $t \ge T$, the copy stored in the node is deleted and the storage space is released; that is, if the user has not read the object for longer than the threshold, the data is regarded as cold data that the user will not read for a long time, and its copy is deleted to release the storage space, which reduces blockchain storage overhead and achieves more efficient blockchain data storage.
Current blockchain schemes do not support temporal data processing, and sequential access to block files in the blockchain prevents efficient query processing. The invention therefore provides a time-series-based index construction method. An initial time $t_1$ is set as the start time of time-series index construction, and the start time of the next construction is set as $t_2$; for each slice data $k_i$, within the time $[t_1, t_2]$, time is divided into non-overlapping adjacent time periods $\theta(k) = \{\theta_1, \theta_2, \ldots, \theta_m\}$, where $m$ is the number of time periods. $Get(C\langle k_i, \theta_n \rangle)$ is used to obtain the state $\varepsilon(k_i, \theta_n)$ corresponding to the $i$-th slice data $k_i$ in time period $\theta_n$, and the current time-series index state is updated, that is, the current data $data_{n+1}$ is added to $\varepsilon(k_i, \theta_n)$ and the data value corresponding to $k_i$ is updated from $data_n$ to $data_{n+1}$. If $\theta_n$ satisfies the time division condition, $\langle (k_i, \theta_n), \varepsilon(k_i, \theta_n) \rangle$ is committed to a block file, the index is updated into the historical data, and a new time interval $\theta_{n+1}$ is created.
In the data retrieval process, to retrieve all data related to data $k$ within a time interval $\tau$, the time intervals $\theta(k)$ corresponding to $k$ are first queried through the returned iterator, and the intervals of $\theta(k)$ that stand in a time-series connection relation or a time-series containment relation with the target query interval $\tau$ are computed; this set of intervals is denoted $o(\theta(k), \tau)$, and the first and last $\theta$ in $o(\theta(k), \tau)$ are in a time-series connection relation. For each $\theta$ contained in $o(\theta(k), \tau)$, the call $\langle k, \theta \rangle$ is executed and the block file is parsed through the returned iterator; for intervals in a containment relation, the data parsed by the iterator is added directly to the result set, and for intervals in a connection relation, the data returned by the iterator is traversed to remove the data not within the interval $\tau$; finally, the result set is output as the data retrieval result. With a traditional retrieval method, querying the state of data $k$ within the time interval $[t_1, t_2]$ requires full access to that time interval; with the time-series index, only the files of $o(\theta(k), \tau)$ need to be deserialized. Building the time-series index at reasonable time intervals therefore greatly reduces the number of file accesses and effectively improves data retrieval efficiency.
Drawings
Fig. 1 is a schematic flowchart of a block chain-based fast data retrieval method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a block chain-based fast data retrieval system according to an embodiment of the present invention;
The implementation, functional features, and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Data is encoded with a pipeline-based data encoding scheme and the encoded data is stored in the blockchain; the locally stored block data is sliced so that the block data can be reconstructed, which optimizes blockchain storage. In addition, a temporal index of the block data is built with a time-based index construction algorithm, which reduces the number of accesses to the block data and the database and enables more efficient data retrieval. Fig. 1 is a schematic diagram illustrating a blockchain-based fast data retrieval method according to an embodiment of the present invention.
In this embodiment, the fast data retrieval method based on the block chain includes:
S1: acquire the data to be stored, and encode the data to be stored using a pipeline-based data encoding scheme to obtain encoded data.
First, the invention acquires the data to be stored and encodes it using the pipeline-based data encoding scheme. The pipeline-based data encoding scheme means that the encoding and decoding computations are run on different blocks in a pipelined manner, that is, for the data to be stored $o_1, o_2, \ldots, o_k$ and the corresponding storage blocks $h_1, h_2, \ldots, h_n$:
during encoding, the corresponding storage blocks $h_i$ respectively encode the data to be stored $o_1, o_2, \ldots, o_k$ to obtain the encoded data $c_1, c_2, \ldots, c_n$, where $n > k$;
during decoding, the storage block $h_1$ holding the encoded data $c_1$ performs a decoding operation on $c_1$ to obtain a decoding intermediate block $i_1$ and sends $i_1$ to the storage block $h_2$ holding the encoded data $c_2$; storage block $h_2$ performs a decoding operation on $c_2$ and $i_1$ to obtain a decoding intermediate block $i_2$; and so on, until the final storage block $h_n$ obtains the final decoded data $i_n$;
In one embodiment of the invention, a classical systematic (8,4) code is used to encode the data $O = (o_1, o_2, o_3, o_4)$ into the 8-dimensional encoded data $C = (c_1, \ldots, c_8)$, where the coding algorithm employs a finite field algorithm. After encoding, storage block $h_i$ stores the encoded data $c_i$. During decoding, each storage block $h_i$ sends its encoded data $c_i$ to a decoding node $n_i$, $i = 1, \ldots, 8$. The decoding node $n_1$ holding the first encoded data performs a linear operation on $c_1$ and sends the result to the decoding node $n_2$ holding the second encoded data $c_2$; $n_2$ combines the result received from $n_1$ with $c_2$ and sends its result to the decoding node $n_3$ holding the third encoded data $c_3$. Decoding proceeds in this pipelined fashion until the original data is obtained from decoding node $n_8$.
S2: store the encoded data in the blockchain, and store a copy of the data to be stored in a blockchain node using a time-based copy storage method.
Further, the invention stores the encoded data in the corresponding blockchain blocks; in one embodiment of the invention, the encoded data $c_1, c_2, \ldots, c_n$ are stored in the corresponding blocks $h_i$, where $i = 1, \ldots, n$;
further, the invention stores the copy of the data to be stored in a blockchain node using the time-based copy storage method; in detail, when a user uploads or recovers data for the first time, the invention stores a copy of the data in the last blockchain node that participated in the encoding or decoding process and treats the data as hot data, that is, data the user is likely to read again within a short time;
the time-based copy storage method comprises the following steps:
1) when the copy is stored, a deletion clock $t$ and a threshold $T$ are set at the same time and stored in the blockchain node together with the copy; the clock value $t$ is zero at the start and increases with time; in this process the computation overhead drops to zero while the storage overhead increases by four blocks, so computation overhead is reduced at the expense of storage overhead;
2) if the user reads the data again while $t < T$, the copy of the data can be obtained directly from that last blockchain node, and the clock value is reset to zero;
3) if $t \ge T$, the copy stored in the node is deleted and the storage space is released; that is, if the user has not read the object for longer than the threshold, the data is regarded as cold data that the user will not read for a long time, and its copy is deleted to release the storage space.
The size determining factor of the threshold value T comprises the time interval T of two accesses of the data by the user and the existing available network resources. In contrast, the present invention sets the value of the network available computing resource to be C, sets the value of the network available storage space resource to be S, and updates the threshold value when the user accesses the data each time:
wherein:
t is the time interval between two accesses of the same data by the user;
C_old, C_new are the available computing resource values of the network at the user's first and second access of the data, respectively;
S_old, S_new are the available storage space resource values of the network at the user's first and second access of the data, respectively;
α, β ∈ [0, 1] and α + β = 1, where α and β respectively denote the importance of computing resources and storage resources to the blockchain network; in one embodiment of the present invention, α = 0.4 and β = 0.6.
S3, slicing the data stored in the blockchain by using an erasure-code-based data slicing method to obtain erasure-code-based slice data.
Further, the invention slices the data stored in the blockchain by using an erasure-code-based data slicing method, which proceeds as follows:
2) performing a matrix multiplication of the data slices with a preset erasure-code-based segmentation matrix to obtain the erasure-code-based slice data:
wherein:
is a matrix value in the coding matrix; in a specific embodiment of the present invention, the coding matrix adopted is a Cauchy-matrix-based coding matrix, and the Cauchy-matrix-based coding matrix is:
x_i, y_i are elements of the Galois field, where m > n;
3) Because m is larger than n in the m × n coding matrix, there are more erasure-code-based slices than original data slices. The invention saves storage by deleting part of the erasure-code-based slice data, and each node can choose how many data slices to retain according to its local storage capacity; in a specific embodiment of the invention, to keep the distribution of data slices across the whole network stable, the nodes delete coded slices at random; if the coding matrix G of the system is of order m × n, the original block data is divided into n data slices, and a node retains on average k data slices per block after coding, with corresponding storage space optimization efficiency η, then:
4) after coding and deletion, a node retains only part of the coded slices of each block, so when the node needs the complete block information it must acquire enough coded data slices from other nodes; network transmission occupies the network bandwidth resources of a node, and a block can be completely reconstructed once the number of its coded data slices is greater than or equal to the number n of columns of the coding matrix; therefore, the minimum amount of data Q that other nodes must transmit for the node to recover a block is:
Q=(n-k)*p
wherein:
n is the number of data slices into which the original block data is divided;
k is the average number of data slices that a given node retains per block;
p is the amount of data in each slice.
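As an illustration of steps 2) to 4), the following Python sketch encodes n data slices into m coded slices with a Cauchy coding matrix and rebuilds the block from any n of them. Several points are assumptions rather than the embodiment: arithmetic is done in the prime field GF(257) instead of a Galois field GF(2^w), the matrix entries are taken as 1/(x_i + y_j) with x_i = i and y_j = m + j (one standard Cauchy construction), and the function names `cauchy_matrix`, `encode`, `decode`, and `transfer_needed` are hypothetical.

```python
from typing import Dict, List

P = 257  # prime modulus; assumption: a prime field stands in for GF(2^w)


def cauchy_matrix(m: int, n: int) -> List[List[int]]:
    """m x n matrix with entries 1 / (x_i + y_j) mod P, x_i = i, y_j = m + j."""
    assert 2 * m + n - 2 < P, "x_i + y_j must stay non-zero mod P"
    return [[pow(i + m + j, -1, P) for j in range(n)] for i in range(m)]


def encode(slices: List[List[int]], m: int) -> List[List[int]]:
    """Multiply the coding matrix by the n data slices to get m coded slices."""
    n, width = len(slices), len(slices[0])
    g = cauchy_matrix(m, n)
    return [[sum(g[i][j] * slices[j][c] for j in range(n)) % P
             for c in range(width)] for i in range(m)]


def decode(kept: Dict[int, List[int]], m: int, n: int) -> List[List[int]]:
    """Rebuild the n data slices from any n coded slices (row index -> slice)."""
    rows = sorted(kept)[:n]
    assert len(rows) == n, "need at least n coded slices"
    g = cauchy_matrix(m, n)
    # Augmented system [G_sub | coded], solved by Gauss-Jordan elimination mod P.
    aug = [g[r][:] + kept[r][:] for r in rows]
    for col in range(n):
        # Every square submatrix of a Cauchy matrix is invertible, so a pivot exists.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def transfer_needed(n: int, k: int, p: int) -> int:
    """Q = (n - k) * p: minimum data a node must fetch to rebuild one block."""
    return (n - k) * p
```

For example, with n = 3, m = 5 and slices of p = 4 bytes, a node that kept k = 2 coded slices needs one more slice from its peers, that is, Q = (3 - 2) * 4 = 4 bytes, before `decode` can restore the block.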
S4, constructing a time sequence index for the slice data by using a time-sequence-based index construction method.
Further, the invention constructs the time sequence index for the slice data by using a time sequence-based index construction method, wherein the time sequence-based index construction method comprises the following steps:
1) setting an initial time t_1 as the start time of the time sequence index construction and setting the start time of the next construction as t_2; for each slice data k_i, dividing the time within [t_1, t_2] into non-overlapping adjacent time periods θ(k) = {θ_1, θ_2, ..., θ_m}, where m is the number of divided time periods;
2) using Get(<k_i, θ_n>) to acquire the state ε(k_i, θ_n) corresponding to the i-th slice data k_i in the time period θ_n, and updating the current time sequence index state, that is, adding the current data data_{n+1} to ε(k_i, θ_n) and at the same time updating the data value corresponding to k_i from data_n to data_{n+1};
3) if θ_n satisfies the time division condition, submitting <(k_i, θ_n), ε(k_i, θ_n)> to the block file, turning the index into historical data, and creating a new time interval θ_{n+1}. In an embodiment of the present invention, the time division condition is a dynamic interval division condition, that is, the size of the time interval is determined by weighing both the elapsed time and the amount of slice data. A fixed value is set for the time interval and for the slice data amount, and a time interval is closed in either of the following two cases: first, when the time interval reaches its fixed value, the number of slice data must equal or exceed the prescribed slice data amount; second, when the number of slice data reaches its fixed value, the time interval must equal or exceed its fixed value. This avoids forming too much or too little index data within a given time period θ.
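The index construction steps can be sketched in Python as follows. This is one reading of the dynamic interval division condition, not the embodiment's code: a period θ_n is assumed to be closed once both its duration has reached `min_span` and it holds at least `min_count` entries. The names `Period`, `TimeSeriesIndex`, `put`, `min_span`, and `min_count` are hypothetical, and the in-memory `history` dictionary merely stands in for the block file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Period:
    start: float
    end: float
    entries: List[Tuple[float, bytes]] = field(default_factory=list)


class TimeSeriesIndex:
    """Hypothetical index: one open period per key plus committed history."""

    def __init__(self, t1: float, min_span: float, min_count: int) -> None:
        self.t1 = t1                # initial time t_1
        self.min_span = min_span    # fixed value for the time interval (assumed)
        self.min_count = min_count  # fixed value for the slice data amount (assumed)
        self.open: Dict[str, Period] = {}           # current period per key
        self.history: Dict[str, List[Period]] = {}  # stand-in for the block file

    def put(self, key: str, ts: float, data: bytes) -> None:
        # Step 2: fetch the state of the current period and add the new data.
        period = self.open.setdefault(key, Period(start=self.t1, end=self.t1))
        period.entries.append((ts, data))
        period.end = ts
        # Step 3: close the period only when both conditions hold.
        long_enough = ts - period.start >= self.min_span
        full_enough = len(period.entries) >= self.min_count
        if long_enough and full_enough:
            self.history.setdefault(key, []).append(period)
            self.open[key] = Period(start=ts, end=ts)  # adjacent, non-overlapping
```

Requiring both conditions keeps short bursts and long quiet stretches from producing periods with too little or too much index data, which is the stated purpose of the dynamic division.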
S5, quickly retrieving the data by using the time sequence index of the data.
Furthermore, the invention realizes rapid retrieval of the data in the blockchain according to the time sequence index of the data;
to retrieve all data related to data k within a time interval τ, first the time periods θ(k) corresponding to data k are queried through the returned iterator, and the intervals of θ(k) that have a time sequence connection relation or a time sequence inclusion relation with the target query interval τ are computed; these intervals are denoted o(θ(k), τ), and the first θ and the last θ in o(θ(k), τ) are in a time sequence connection relation; the time sequence connection relation and the time sequence inclusion relation are each defined by a condition on a pair of temporal intervals θ_i and θ_j: when the respective condition holds, θ_i and θ_j are said to be in a time sequence connection relation or in a time sequence inclusion relation;
for each θ contained in o(θ(k), τ), a call on <k, θ> is executed, and the block file is parsed through the returned iterator; for intervals in a time sequence inclusion relation, the data parsed by the iterator is added directly to the result set; for intervals in a time sequence connection relation, the data returned by the iterator is traversed to remove data not within the interval τ; finally, the result set is output as the data retrieval result.
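The retrieval procedure can be sketched in Python under explicit assumptions. Because the formal conditions for the two relations are not available here, the sketch assumes that a period is in an inclusion relation when it lies entirely inside τ, and in a connection relation when it only partly overlaps τ. The names `Period` and `query` are hypothetical, and a plain list of periods stands in for the iterator over the block file.

```python
from typing import List, NamedTuple, Tuple


class Period(NamedTuple):
    start: float
    end: float
    entries: List[Tuple[float, str]]  # (timestamp, data)


def query(periods: List[Period], tau_start: float,
          tau_end: float) -> List[Tuple[float, str]]:
    """Collect all entries of one key that fall inside [tau_start, tau_end]."""
    result: List[Tuple[float, str]] = []
    for p in periods:
        if p.end < tau_start or p.start > tau_end:
            continue  # period has no relation with the query interval
        if tau_start <= p.start and p.end <= tau_end:
            # Assumed inclusion relation: every entry qualifies, no filtering.
            result.extend(p.entries)
        else:
            # Assumed connection relation: boundary period, filter per entry.
            result.extend(e for e in p.entries if tau_start <= e[0] <= tau_end)
    return result
```

Only the boundary periods are scanned entry by entry; fully contained periods are copied wholesale, which is where the time sequence index saves work compared with scanning every record.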
The following describes an embodiment of the present invention through an algorithm experiment that tests the processing method of the invention. The test environment of the algorithm is as follows: the operating system is Linux CentOS 6.9 and the memory is 16 GB; the comparison methods are a data retrieval method based on hash index storage, a data retrieval method based on inverted index storage, and an index-free data retrieval method.
In the algorithm experiment, 5 GB of data is collected, the comparison algorithms and the algorithm proposed by the invention are used for storage and retrieval, and the time required to complete retrieval is used as the evaluation index of the data retrieval method.
According to the experimental results, the retrieval time of the data retrieval method based on hash index storage is 1.2 s, the retrieval time of the data retrieval method based on inverted index storage is 0.68 s, and the retrieval time of the index-free data retrieval method is 0.72 s.
The invention also provides a block chain-based rapid data retrieval system. Fig. 2 is a schematic diagram illustrating an internal structure of a block chain-based fast data retrieval system according to an embodiment of the present invention.
In the present embodiment, the block chain-based fast data retrieval system 1 at least includes a data acquisition device 11, a data processor 12, a data retrieval device 13, a communication bus 14, and a network interface 15.
The data acquisition device 11 may be a terminal device such as a PC (Personal Computer), a smart phone, a tablet computer, or a mobile computer, or may be a server.
The data processor 12 includes at least one type of readable storage medium including flash memory, hard disks, multi-media cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, and the like. The data processor 12 may in some embodiments be an internal storage unit of the blockchain based fast data retrieval system 1, for example a hard disk of the blockchain based fast data retrieval system 1. The data processor 12 may also be an external storage device of the block chain based fast data retrieval system 1 in other embodiments, such as a plug-in hard disk provided on the block chain based fast data retrieval system 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the data processor 12 may also comprise both an internal storage unit and an external storage device of the blockchain based fast data retrieval system 1. The data processor 12 can be used not only to store application software installed in the block chain based fast data retrieval system 1 and various kinds of data, but also to temporarily store data that has been output or will be output.
The data retrieval device 13 may in some embodiments be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip, and is used to execute program code stored in the data processor 12 or to process data, for example to execute the data retrieval program instructions.
The communication bus 14 is used to enable connection communication between these components.
The network interface 15 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface), and is typically used to establish a communication link between the system 1 and other electronic devices.
Optionally, the system 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the block chain based fast data retrieval system 1 and for displaying a visualized user interface, among others.
While FIG. 2 shows only the block chain-based fast data retrieval system 1 with components 11-15, those skilled in the art will appreciate that the structure shown in FIG. 2 does not limit the block chain-based fast data retrieval system 1, which may include fewer or more components than shown, combine certain components, or use a different arrangement of components.
In the embodiment of the system 1 shown in FIG. 2, the data processor 12 stores data retrieval program instructions; the steps performed when the data retrieval device 13 executes the data retrieval program instructions stored in the data processor 12 are the same as those of the block chain-based fast data retrieval method and are not described again here.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium having stored thereon data retrieval program instructions, which are executable by one or more processors to implement the following operations:
acquiring data to be stored, and encoding the data to be stored by using a pipeline-based data encoding scheme to obtain encoded data;
storing the encoded data into a blockchain, and storing a copy of the data to be stored in a blockchain node by using a time-based copy storage method;
slicing the data stored in the blockchain by using an erasure-code-based data slicing method to obtain erasure-code-based slice data;
constructing a time sequence index for the slice data by using a time-sequence-based index construction method;
and quickly retrieving the data by using the time sequence index of the data.
It should be noted that the numbering of the above embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (8)
1. A block chain-based fast data retrieval method is characterized in that the method comprises the following steps:
acquiring data to be stored, and encoding the data to be stored by using a pipeline-based data encoding scheme to obtain encoded data;
storing the encoded data into a blockchain, and storing a copy of the data to be stored in a blockchain node by using a time-based copy storage method;
slicing the data stored in the blockchain by using an erasure-code-based data slicing method to obtain erasure-code-based slice data;
constructing a time sequence index by using an index construction method based on time sequence;
and quickly retrieving the data by utilizing the time sequence index of the data.
2. The method as claimed in claim 1, wherein encoding the data to be stored by using the pipeline-based data encoding scheme comprises:
the pipeline-based data encoding scheme runs the encoding and decoding computations on different blocks in a pipelined manner, namely, for data to be stored o_1, o_2, ..., o_k and the corresponding storage blocks h_1, h_2, ..., h_n:
when encoding, the corresponding storage blocks h_i respectively encode the data to be stored o_1, o_2, ..., o_k to obtain the respective encoded data c_1, c_2, ..., c_n, where n > k;
when decoding, storage block h_1, which holds encoded data c_1, performs a decoding operation on c_1 to obtain a decoding intermediate block i_1 and sends i_1 to storage block h_2, which holds encoded data c_2; storage block h_2 performs a decoding operation on c_2 and i_1 to obtain a decoding intermediate block i_2; and so on, until storage block h_n obtains the final decoded data i_n.
3. The method as claimed in claim 2, wherein storing the copy of the data to be stored in the blockchain node by using the time-based copy storage method comprises:
1) when the copy is stored, a deletion clock t and a threshold T are set at the same time and stored in the blockchain node together with the copy; the clock value t is zero at the beginning and increases with time;
2) if the user reads the data again while t < T, the copy of the data is obtained directly from the last blockchain node, and the clock value t is reset to zero;
3) if t ≥ T, the copy stored in the node is deleted and the storage space is released; that is, if the user has not read the data for longer than the threshold, the data is regarded as cold data that the user will not read for a long time, and its copy is deleted to release the storage space.
The threshold T is determined by the time interval t between two accesses of the data by the user and by the currently available network resources; with the value of the available computing resources of the network denoted by C and the value of the available storage space resources of the network denoted by S, the threshold is updated each time the user accesses the data:
wherein:
T represents the threshold before the update;
T' represents the updated threshold;
t is the time interval between two accesses of the same data by the user;
C_old, C_new are the available computing resource values of the network at the user's first and second access of the data, respectively;
S_old, S_new are the available storage space resource values of the network at the user's first and second access of the data, respectively;
α, β ∈ [0, 1], α + β = 1, and α = 0.4 and β = 0.6 are set.
4. The method as claimed in claim 3, wherein slicing the data stored in the blockchain by using the erasure-code-based data slicing method comprises:
2) performing a matrix multiplication of the data slices with a preset erasure-code-based segmentation matrix to obtain the erasure-code-based slice data:
wherein:
3) the nodes randomly delete coded slices according to their different local storage capacities, wherein nodes with smaller local storage capacity delete more slice data;
4) because network transmission occupies the network bandwidth resources of a node, and a block can be completely reconstructed once the number of its coded data slices is greater than or equal to the number n of columns of the coding matrix, the minimum amount of data Q that other nodes must transmit for a node to recover a block is:
Q=(n-k)*p
wherein:
n is the number of data slices into which the original block data is divided;
k is the average number of data slices that a given node retains per block;
p is the amount of data in each slice.
5. The method according to claim 4, wherein the time-series-based index building method comprises the following steps:
1) setting an initial time t_1 as the start time of the time sequence index construction and setting the start time of the next construction as t_2; for each slice data k_i, dividing the time within [t_1, t_2] into non-overlapping adjacent time periods θ(k) = {θ_1, θ_2, ..., θ_m}, where m is the number of divided time periods;
2) using Get(<k_i, θ_n>) to acquire the state ε(k_i, θ_n) corresponding to the i-th slice data k_i in the time period θ_n, and updating the current time sequence index state, that is, adding the current data data_{n+1} to ε(k_i, θ_n) and at the same time updating the data value corresponding to k_i from data_n to data_{n+1};
3) if θ_n satisfies the time division condition, submitting <(k_i, θ_n), ε(k_i, θ_n)> to the block file, turning the index into historical data, and creating a new time interval θ_{n+1}.
6. The method as claimed in claim 5, wherein the fast retrieving of data by using the time sequence index of data comprises:
to retrieve all data related to data k within a time interval τ, first the time periods θ(k) corresponding to data k are queried through the returned iterator, and the intervals of θ(k) that have a time sequence connection relation or a time sequence inclusion relation with the target query interval τ are computed; these intervals are denoted o(θ(k), τ), and the first θ and the last θ in o(θ(k), τ) are in a time sequence connection relation;
for each θ contained in o(θ(k), τ), a call on <k, θ> is executed, and the block file is parsed through the returned iterator; for intervals in a time sequence inclusion relation, the data parsed by the iterator is added directly to the result set; for intervals in a time sequence connection relation, the data returned by the iterator is traversed to remove data not within the interval τ; finally, the result set is output as the data retrieval result.
7. A block chain based fast data retrieval system, the system comprising:
the data acquisition device is used for acquiring data to be stored and encoding the data to be stored by using a pipeline-based data encoding scheme;
the data processor is used for storing the encoded data into the blockchain and storing a copy of the data to be stored in a blockchain node by using a time-based copy storage method, and for slicing the data stored in the blockchain by using an erasure-code-based data slicing method to obtain erasure-code-based slice data;
and the data retrieval device is used for constructing the time sequence index for the slice data by using the time sequence-based index construction method, so that the data is quickly retrieved by using the time sequence index of the data.
8. A computer readable storage medium having stored thereon data retrieval program instructions executable by one or more processors to implement the steps of the block chain-based fast data retrieval method as claimed in any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011369688.2A CN112416941A (en) | 2020-11-30 | 2020-11-30 | Block chain-based rapid data retrieval method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112416941A (en) | 2021-02-26 |
Family
ID=74829336
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011369688.2A Withdrawn CN112416941A (en) | 2020-11-30 | 2020-11-30 | Block chain-based rapid data retrieval method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112416941A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112799607A (en) * | 2021-04-12 | 2021-05-14 | 骊阳(广东)节能科技股份有限公司 | Data storage method for partitioned storage according to data size |
CN113138989A (en) * | 2021-03-12 | 2021-07-20 | 莘上信息技术(上海)有限公司 | Block chain data retrieval method and device |
CN113641755A (en) * | 2021-07-19 | 2021-11-12 | 江苏大学 | Incremental slice analysis method for UTXO block chain |
CN113886115A (en) * | 2021-09-09 | 2022-01-04 | 上海智能网联汽车技术中心有限公司 | Block chain Byzantine fault-tolerant method and system based on vehicle-road cooperation |
CN113641755B (en) * | 2021-07-19 | 2024-09-27 | 江苏大学 | Incremental slice analysis method of UTXO block chain |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113138989A (en) * | 2021-03-12 | 2021-07-20 | 莘上信息技术(上海)有限公司 | Block chain data retrieval method and device |
CN112799607A (en) * | 2021-04-12 | 2021-05-14 | 骊阳(广东)节能科技股份有限公司 | Data storage method for partitioned storage according to data size |
CN113641755A (en) * | 2021-07-19 | 2021-11-12 | 江苏大学 | Incremental slice analysis method for UTXO block chain |
CN113641755B (en) * | 2021-07-19 | 2024-09-27 | 江苏大学 | Incremental slice analysis method of UTXO block chain |
CN113886115A (en) * | 2021-09-09 | 2022-01-04 | 上海智能网联汽车技术中心有限公司 | Block chain Byzantine fault-tolerant method and system based on vehicle-road cooperation |
CN113886115B (en) * | 2021-09-09 | 2024-02-20 | 上海智能网联汽车技术中心有限公司 | Block chain Bayesian fault tolerance method and system based on vehicle-road cooperation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20210226 |