CN112416941A - Block chain-based rapid data retrieval method and system - Google Patents


Info

Publication number
CN112416941A
Authority
CN
China
Prior art keywords
data
time
block
stored
theta
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011369688.2A
Other languages
Chinese (zh)
Inventor
肖玉连
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202011369688.2A
Publication of CN112416941A
Withdrawn (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G06F21/645 Protecting data integrity, e.g. using checksums, certificates or signatures using a third party
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2143 Clearing memory, e.g. to prevent the data from being stolen

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of blockchain and discloses a blockchain-based rapid data retrieval method comprising the following steps: acquiring data to be stored and encoding it with a pipeline-based data encoding scheme to obtain encoded data; storing the encoded data in a blockchain and storing a copy of the data to be stored in a blockchain node by using a time-based copy storage method; slicing the data stored in the blockchain by using an erasure-code-based data slicing method to obtain erasure-coded slice data; constructing a time-series index by using a time-series-based index construction method; and rapidly retrieving the data by using its time-series index. The invention also provides a blockchain-based rapid data retrieval system. The invention thereby enables rapid retrieval of data stored in a blockchain.

Description

Block chain-based rapid data retrieval method and system
Technical Field
The invention relates to the technical field of blockchain, and in particular to a blockchain-based rapid data retrieval method and a blockchain-based rapid data retrieval system.
Background
With the rapid spread of social networks, intelligent hardware, the mobile internet and the Internet of Things, the value latent in big data is becoming increasingly apparent, and a new era that places greater emphasis on data value and data openness is arriving. Accordingly, business, scientific research, public services and other fields have an urgent need for the open sharing of big data. However, because a secure and trusted data-sharing environment is lacking, big data remains stored and controlled separately by government agencies, enterprises, research institutions and even individuals, forming data silos that seriously hinder the sharing and opening of big data.
Blockchain has attracted attention across industries because of characteristics such as decentralized trust and full distribution, and its emergence makes it possible to break down the barriers to big-data sharing and achieve trusted data interconnection. However, an existing blockchain, as a decentralized, distributed shared database, requires every node to store the complete block data. As the number of nodes in the system and the complexity of transactions grow, nodes need ever more local storage space for block data, which has become a bottleneck for practical blockchain applications. Meanwhile, current blockchain schemes do not support temporal data processing, and the sequential, block-file-based access in a blockchain prevents efficient query processing.
In view of this, how to optimize the way data is stored in a blockchain so as to achieve more efficient data retrieval is a problem that those skilled in the art urgently need to solve.
Disclosure of Invention
The invention provides a blockchain-based rapid data retrieval method in which data are encoded with a pipeline-based data encoding scheme and the encoded data are stored in a blockchain; the locally stored block data are then sliced so that the block data are reconstructed, optimizing blockchain storage. Meanwhile, a temporal index of the block data is built with a time-based index construction algorithm, which reduces the number of accesses to the block data and the database and achieves more efficient data retrieval.
To achieve the above object, the present invention provides a blockchain-based rapid data retrieval method, comprising:
acquiring data to be stored, and encoding the data to be stored by using a pipeline-based data encoding scheme to obtain encoded data;
storing the encoded data in a blockchain, and storing a copy of the data to be stored in a blockchain node by using a time-based copy storage method;
slicing the data stored in the blockchain by using an erasure-code-based data slicing method to obtain erasure-coded slice data;
constructing a time-series index by using a time-series-based index construction method; and
rapidly retrieving the data by using the time-series index of the data.
Optionally, encoding the data to be stored by using the pipeline-based data encoding scheme includes:
in the pipeline-based data encoding scheme, the encoding and decoding computations run on different blocks in a pipelined manner; that is, the data to be stored $o_1, o_2, \dots, o_k$ correspond to storage blocks $h_1, h_2, \dots, h_n$;
during encoding, each storage block $h_i$ encodes the data to be stored $o_1, o_2, \dots, o_k$, yielding the encoded data $c_1, c_2, \dots, c_n$, where $n > k$;
during decoding, the storage block $h_1$ holding the encoded data $c_1$ decodes $c_1$ to obtain a decoding intermediate block $i_1$ and sends $i_1$ to the storage block $h_2$ holding the encoded data $c_2$; storage block $h_2$ decodes from $c_2$ and $i_1$ to obtain the decoding intermediate block $i_2$; continuing in this way, the final decoded data $i_n$ are obtained at block $h_n$.
In one embodiment of the invention, a classical (8,4) systematic code is used to encode the data $O = (o_1, o_2, o_3, o_4)$ into 8-dimensional encoded data $C = (c_1, \dots, c_8)$, where the encoding uses finite-field arithmetic:
[Equation image BDA0002805699660000021]
After encoding is completed, storage block $h_i$ stores the encoded data $c_i$. During decoding, each storage block $h_i$ sends its encoded data $c_i$ to decoding node $n_i$, $i = 1, \dots, 8$. The decoding node $n_1$, which holds the first encoded datum $c_1$, performs a linear operation on $c_1$ and sends the result to decoding node $n_2$, which holds the second encoded datum $c_2$; $n_2$ combines the result received from $n_1$ with $c_2$ and sends its result to decoding node $n_3$, which holds the third encoded datum $c_3$. Decoding proceeds in this pipelined manner, and the original data are finally obtained from decoding node $n_8$.
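The pipelined decoding above can be illustrated with a small sketch. It rests on assumptions not stated in the patent (whose finite-field formula survives only as an image): arithmetic in the prime field GF(257), a systematic generator whose parity rows come from a Cauchy matrix, and decoding from any four of the eight coded symbols. The names `encode`, `mat_inv` and `pipeline_decode` are hypothetical. Each node adds only its own contribution to the intermediate block it receives, so the last node in the chain ends up holding the original data. Unlike the embodiment, which passes the intermediate block through all eight decoding nodes, this sketch passes it only through the four nodes whose symbols are used, which is already enough to recover $O$.

```python
# Sketch of a systematic (8,4) code with pipelined decoding.
# Assumptions (hypothetical, not from the patent): arithmetic in GF(257),
# parity rows taken from a Cauchy matrix, decoding from any 4 coded symbols.
from itertools import combinations

P = 257      # prime field size
K, N = 4, 8  # (n, k) = (8, 4)

def inv(a: int) -> int:
    """Multiplicative inverse in GF(P) via Fermat's little theorem."""
    return pow(a % P, P - 2, P)

# Generator matrix: identity rows (c_i = o_i) followed by Cauchy rows
# 1 / (x - y) with x = 4..7 and y = 0..3, so any 4 rows are invertible.
G = [[int(r == c) for c in range(K)] for r in range(K)] + [
    [inv(x - y) for y in range(K)] for x in range(K, N)
]

def encode(o: list[int]) -> list[int]:
    """Storage block h_i computes its own coded symbol c_i = G[i] . o."""
    return [sum(g * v for g, v in zip(row, o)) % P for row in G]

def mat_inv(a: list[list[int]]) -> list[list[int]]:
    """Gauss-Jordan inversion of a square matrix over GF(P)."""
    n = len(a)
    m = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] % P)
        m[col], m[piv] = m[piv], m[col]
        scale = inv(m[col][col])
        m[col] = [v * scale % P for v in m[col]]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(vr - f * vc) % P for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def pipeline_decode(coded: list[int], chosen: list[int]) -> list[int]:
    """Pass an intermediate block through the nodes holding coded[chosen]."""
    d = mat_inv([G[i] for i in chosen])  # o = d . (c at the chosen positions)
    inter = [0] * K                      # empty intermediate block
    for j, i in enumerate(chosen):
        # Node j sees only its own symbol c_i and the incoming block.
        inter = [(acc + d[row][j] * coded[i]) % P for row, acc in enumerate(inter)]
    return inter                         # the last node holds the original data

data = [10, 20, 30, 40]
coded = encode(data)
print(pipeline_decode(coded, [4, 5, 6, 7]))  # [10, 20, 30, 40]
# Any 4 of the 8 coded symbols are enough to recover the data.
ok = all(pipeline_decode(coded, list(s)) == data for s in combinations(range(N), K))
print(ok)  # True
```

The Cauchy-based parity rows are one common way to make every 4-row subset of the generator invertible; the patent's actual generator may differ.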
Optionally, storing a copy of the data to be stored in a blockchain node by using the time-based copy storage method includes:
when a user uploads or recovers data for the first time, the invention stores a copy of the data in the last blockchain node that participated in the encoding and decoding process and treats the data as hot data, which the user may read again shortly;
1) when the copy is stored, a deletion clock $t$ and a threshold $T$ are set and stored in the blockchain node together with the copy; the clock value $t$ starts at zero and increases over time. In this scheme the computation cost of a re-read is reduced to zero while the storage cost increases by four blocks, so computation cost is reduced at the expense of storage cost;
2) if the user reads the data again while $t < T$, the copy of the data is obtained directly from the last blockchain node and the clock value is reset to zero;
3) if $t \ge T$, the copy stored in the node is deleted and its storage space is released; that is, if the user has not read the object for longer than the threshold, the data are regarded as cold data that the user will not read for a long time, and the copy is deleted to free storage space.
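Steps 1) to 3) can be sketched as a small cache that keeps a deletion clock per copy. The sketch assumes the threshold $T$ is supplied by the caller, because the threshold-update formula is only available as an image and is not modeled here; `CopyStore`, `HotCopy` and their methods are hypothetical names. Instead of running a timer, the clock $t$ is derived as the time elapsed since the copy was stored or last read, and expired copies are evicted lazily on access.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HotCopy:
    data: bytes
    reset_at: float   # time at which the deletion clock t was last zero
    threshold: float  # deletion threshold T (its update rule is not modeled)

    def clock(self, now: float) -> float:
        """Deletion clock t: starts at zero and grows with time."""
        return now - self.reset_at

class CopyStore:
    """Hypothetical copy cache on the last node of the encode/decode pipeline."""

    def __init__(self) -> None:
        self.copies: dict[str, HotCopy] = {}

    def put(self, key: str, data: bytes, now: float, threshold: float) -> None:
        # Step 1: store the copy together with a fresh clock and threshold T.
        self.copies[key] = HotCopy(data, now, threshold)

    def evict(self, now: float) -> None:
        # Step 3: copies with t >= T are cold; delete them to free space.
        for key in [k for k, c in self.copies.items() if c.clock(now) >= c.threshold]:
            del self.copies[key]

    def read(self, key: str, now: float) -> Optional[bytes]:
        self.evict(now)
        copy = self.copies.get(key)
        if copy is None:
            return None       # cold data: fall back to pipelined decoding
        copy.reset_at = now   # Step 2: t < T, serve the copy and reset t to 0
        return copy.data

store = CopyStore()
store.put("obj", b"payload", now=0.0, threshold=10.0)
print(store.read("obj", now=5.0))   # b'payload' (t = 5 < 10, clock reset)
print(store.read("obj", now=14.0))  # b'payload' (t = 9 < 10, clock reset)
print(store.read("obj", now=24.0))  # None (t = 10 >= 10, copy deleted)
```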
The threshold $T$ is determined by the time interval $t$ between two accesses of the data by the user and by the currently available network resources. Accordingly, the invention denotes the available network computing resources by $C$ and the available network storage resources by $S$, and updates the threshold each time the user accesses the data:
[Equation image BDA0002805699660000031]
where:
$t$ is the time interval between two accesses of the same data by the user;
$C_{old}$, $C_{new}$ are the available network computing resource values at the user's first and second access to the data, respectively;
$S_{old}$, $S_{new}$ are the available network storage resource values at the user's first and second access to the data, respectively;
$\alpha, \beta \in [0, 1]$, $\alpha + \beta = 1$, where $\alpha$ and $\beta$ denote the importance of computing resources and storage resources, respectively, to the blockchain network; in one embodiment of the present invention, $\alpha = 0.4$ and $\beta = 0.6$.
Optionally, slicing the data stored in the blockchain by using the erasure-code-based data slicing method includes:
1) the $i$-th block $B_i$ of the blockchain is divided equally into
[Equation image BDA0002805699660000032]
a total of $k$ data slices;
2) the data slices are multiplied by a preset erasure-code-based coding matrix to obtain the erasure-coded slice data:
[Equation image BDA0002805699660000033]
where:
[Equation image BDA0002805699660000034]
is an entry of the coding matrix; in a specific embodiment of the present invention, the coding matrix is based on a Cauchy matrix:
[Equation image BDA0002805699660000035]
$x_i$, $y_i$ are elements of a Galois field, and $m > n$;
[Equation image BDA0002805699660000036]
is the $k$-th data slice of the $i$-th block of the blockchain;
[Equation image BDA0002805699660000041]
is the $r$-th erasure-coded slice of the $i$-th block of the blockchain, where $r > k$;
3) because the $m \times n$ coding matrix has $m > n$, there are more erasure-coded slices than original data slices. The invention saves storage by deleting some of the erasure-coded slices, and each node can choose how many slices to retain according to its local storage capacity. In a specific embodiment of the present invention, nodes delete coded slices at random so that data slices remain evenly distributed across the whole network. If the system coding matrix $G$ is $m \times n$, the original block data is divided into $n$ data slices; if, after encoding, a node retains on average $k$ data slices per block and the corresponding storage optimization efficiency is $\eta$, then:
[Equation image BDA0002805699660000042]
4) after encoding and deletion, a node no longer stores the transaction information in the clear; when it needs the complete block, it must obtain enough coded data slices from other nodes. Network transmission consumes the node's bandwidth, and a block can be fully reconstructed once the number of its coded data slices is greater than or equal to the number $n$ of columns of the coding matrix; therefore, the minimum amount of data $Q$ that other nodes must transmit for the node to recover a block is:
$Q = (n - k) \cdot p$
where:
$n$ is the number of data slices into which the original block data is divided;
$k$ is the average number of data slices the node retains per block;
$p$ is the amount of data in each slice.
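The slicing, random retention and recovery steps above can be sketched as follows. The sketch makes several assumptions. It uses the prime field GF(257) in place of the binary Galois field a real system would likely use. It takes entries $g_{ij} = 1/(x_i - y_j)$ with disjoint $x$ and $y$ sets, which is a standard Cauchy construction; in characteristic 2 the difference equals the sum. All function names are hypothetical. Because every square submatrix of a Cauchy matrix is nonsingular, any $n$ of the $m$ coded slices rebuild the block. A node that kept $k$ slices therefore fetches $n - k$ more, i.e. $Q = (n - k) \cdot p$ symbols.

```python
import random

P = 257  # assumed prime field; practical systems often use GF(2^8)

def inv(a: int) -> int:
    return pow(a % P, P - 2, P)

def cauchy(m: int, n: int) -> list[list[int]]:
    """m x n Cauchy matrix g[i][j] = 1/(x_i - y_j), x = n..n+m-1, y = 0..n-1."""
    return [[inv(n + i - j) for j in range(n)] for i in range(m)]

def encode_block(slices: list[list[int]], m: int) -> list[list[int]]:
    """Multiply the n data slices (p symbols each) by the m x n matrix."""
    n, p = len(slices), len(slices[0])
    g = cauchy(m, n)
    return [[sum(g[r][j] * slices[j][s] for j in range(n)) % P for s in range(p)]
            for r in range(m)]

def solve(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Gauss-Jordan: solve a @ x = b over GF(P) (a square and nonsingular)."""
    n = len(a)
    aug = [a[i][:] + b[i][:] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = inv(aug[col][col])
        aug[col] = [v * scale % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(vr - f * vc) % P for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def reconstruct(pieces: dict[int, list[int]], m: int, n: int) -> list[list[int]]:
    """Rebuild the n data slices from any n coded slices {row index: slice}."""
    rows = sorted(pieces)[:n]
    g = cauchy(m, n)
    return solve([g[r] for r in rows], [pieces[r] for r in rows])

def transfer_needed(n: int, k: int, p: int) -> int:
    """Q = (n - k) * p: data peers must send to rebuild one block."""
    return (n - k) * p

rng = random.Random(7)
n, m, p, k = 4, 6, 3, 2                       # hypothetical sizes, m > n
block = [[rng.randrange(P) for _ in range(p)] for _ in range(n)]
coded = encode_block(block, m)
local = {i: coded[i] for i in rng.sample(range(m), k)}   # randomly retained
fetched = [i for i in range(m) if i not in local][: n - k]
pieces = {**local, **{i: coded[i] for i in fetched}}
print(reconstruct(pieces, m, n) == block)  # True
print(transfer_needed(n, k, p))            # 6
```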
Optionally, the time-series-based index construction method proceeds as follows:
1) set an initial time $t_1$ as the start time of time-series index construction and $t_2$ as the start time of the next construction; for each slice data $k_i$ within $[t_1, t_2]$, divide time into non-overlapping adjacent time periods $\theta(k) = \{\theta_1, \theta_2, \dots, \theta_m\}$, where $m$ is the number of time periods;
2) use $Get(\langle k_i, \theta_n \rangle)$ to obtain the state $\varepsilon(k_i, \theta_n)$ of the $i$-th slice data $k_i$ in time period $\theta_n$, and update the current time-series index state: add the current $data_{n+1}$ to $\varepsilon(k_i, \theta_n)$ and update the data value of $k_i$ from $data_n$ to $data_{n+1}$;
3) if $\theta_n$ satisfies the time division condition, commit $\langle (k_i, \theta_n), \varepsilon(k_i, \theta_n) \rangle$ to a block file, update the index into the historical data, and create a new time interval $\theta_{n+1}$. In an embodiment of the present invention, the time division condition is a dynamic interval division condition: the size of a time interval is determined by both elapsed time and the amount of slice data. A fixed time interval and a fixed slice-data quantity are set, and a time interval must satisfy one of the following two conditions: first, when the interval reaches the fixed time value, the number of slice data must be equal to or greater than the prescribed quantity; second, when the number of slice data reaches the fixed quantity, the interval must be equal to or longer than the fixed time value. This avoids forming too much or too little index data within any time period $\theta$.
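A minimal in-memory sketch of steps 1) to 3) follows. It assumes one reading of the dynamic division condition: both conditions amount to requiring that an interval span at least a fixed time `span` and hold at least a fixed number `count` of updates before it is closed (both parameter names are hypothetical). Committed intervals stand in for block files, `open_intervals.get` plays the role of $Get(\langle k_i, \theta_n \rangle)$, and `TemporalIndex`, `Interval` and `put` are hypothetical names. Intervals produced this way are consecutive and non-overlapping; the next interval opens at the next update of the key.

```python
from dataclasses import dataclass, field

@dataclass
class Interval:
    """One time period theta_n of a key and its state epsilon(k, theta_n)."""
    key: str
    start: float
    end: float
    states: list[tuple[float, object]] = field(default_factory=list)

class TemporalIndex:
    def __init__(self, span: float, count: int) -> None:
        self.span, self.count = span, count      # division thresholds (assumed)
        self.open_intervals: dict[str, Interval] = {}
        self.committed: list[Interval] = []      # stands in for block files
        self.current: dict[str, object] = {}     # latest value of each key

    def put(self, key: str, ts: float, value: object) -> None:
        theta = self.open_intervals.get(key)     # Get(<k_i, theta_n>)
        if theta is None:                        # open a new interval
            theta = self.open_intervals[key] = Interval(key, ts, ts)
        theta.states.append((ts, value))         # add data_{n+1} to epsilon
        theta.end = ts
        self.current[key] = value                # data_n -> data_{n+1}
        if ts - theta.start >= self.span and len(theta.states) >= self.count:
            self.committed.append(theta)         # commit to a block file
            del self.open_intervals[key]         # next update opens theta_{n+1}

idx = TemporalIndex(span=10, count=3)
updates = [(0, "a"), (4, "b"), (8, "c"), (12, "d"),
           (13, "e"), (14, "f"), (30, "g"), (31, "h")]
for ts, v in updates:
    idx.put("k", ts, v)
print([(iv.start, iv.end) for iv in idx.committed])  # [(0, 12), (13, 30)]
```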
Optionally, rapidly retrieving the data by using its time-series index includes:
to retrieve all data related to data $k$ within a time interval $\tau$, first query the time intervals $\theta(k)$ corresponding to $k$ through the returned iterator, and compute the intervals in $\theta(k)$ that have a temporal connection relationship or a temporal inclusion relationship with the target query interval $\tau$. This set is denoted $o(\theta(k), \tau)$, and its first and last $\theta$ are in a temporal connection relationship. The temporal connection relationship is defined for temporal intervals $\theta_i$ and $\theta_j$: if
[Equation image BDA0002805699660000051]
[Equation image BDA0002805699660000052]
then $\theta_i$ and $\theta_j$ are in a temporal connection relationship. The temporal inclusion relationship is defined for temporal intervals $\theta_i$ and $\theta_j$: if
[Equation image BDA0002805699660000053]
then $\theta_i$ and $\theta_j$ are in a temporal inclusion relationship;
for each $\theta$ in $o(\theta(k), \tau)$, call $Get(\langle k, \theta \rangle)$ and parse the block file through the returned iterator. For intervals in a temporal inclusion relationship, add the data parsed by the iterator directly to the result set; for intervals in a temporal connection relationship, traverse the data returned by the iterator and discard data outside $\tau$. Finally, output the result set as the data retrieval result.
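The retrieval step can be sketched as a scan over one key's committed intervals, each standing for one block file. The sketch assumes intervals are closed ranges `(start, end, states)` sorted by time, `states` are `(timestamp, value)` pairs, and the query interval is $\tau = [a, b]$; `query` is a hypothetical name. Intervals disjoint from $\tau$ are never deserialized. Intervals contained in $\tau$ (inclusion) are copied whole, while the partially overlapping intervals at either end (connection) are filtered.

```python
Interval = tuple[float, float, list[tuple[float, str]]]

def query(intervals: list[Interval], a: float, b: float) -> tuple[list[tuple[float, str]], int]:
    """Return the states of one key with a <= timestamp <= b, plus files read."""
    result: list[tuple[float, str]] = []
    files_read = 0
    for start, end, states in intervals:
        if end < a or start > b:
            continue                     # not in o(theta(k), tau): file never opened
        files_read += 1                  # deserialize this block file
        if a <= start and end <= b:
            result.extend(states)        # inclusion: every state lies in tau
        else:
            # connection: the interval straddles a boundary of tau, so filter
            result.extend(s for s in states if a <= s[0] <= b)
    return result, files_read

intervals = [
    (0, 12, [(0, "a"), (4, "b"), (8, "c"), (12, "d")]),
    (13, 30, [(13, "e"), (14, "f"), (30, "g")]),
    (31, 45, [(31, "h"), (40, "i"), (45, "j")]),
    (50, 60, [(50, "k"), (60, "l")]),
]
states, files = query(intervals, 5, 35)
print([v for _, v in states], files)  # ['c', 'd', 'e', 'f', 'g', 'h'] 3
```

Only three of the four block files are opened here, which is the saving the index is meant to provide.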
In addition, to achieve the above object, the present invention further provides a blockchain-based rapid data retrieval system, comprising:
a data acquisition device, configured to acquire data to be stored and encode it by using a pipeline-based data encoding scheme;
a data processor, configured to store the encoded data in the blockchain and store a copy of the data to be stored in a blockchain node by using a time-based copy storage method, and to slice the data stored in the blockchain by using an erasure-code-based data slicing method to obtain erasure-coded slice data; and
a data retrieval device, configured to construct a time-series index for the slice data by using the time-series-based index construction method and to rapidly retrieve the data by using that index.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium storing data retrieval program instructions that can be executed by one or more processors to implement the steps of the blockchain-based rapid data retrieval method described above.
Compared with the prior art, the blockchain-based rapid data retrieval method provided by the invention has the following advantages:
First, the invention provides a pipeline-based data encoding scheme. During encoding, each storage block encodes the data to be stored $o_1, o_2, \dots, o_k$, yielding the encoded data $c_1, c_2, \dots, c_n$, where $n > k$. During decoding, the storage block $h_1$ holding $c_1$ decodes $c_1$ to obtain a decoding intermediate block $i_1$ and sends it to the storage block $h_2$ holding $c_2$; $h_2$ decodes from $c_2$ and $i_1$ to obtain the intermediate block $i_2$; continuing in this way, block $h_n$ finally obtains the decoded data $i_n$. During pipelined encoding or decoding, no messages are exchanged between different blocks, so a block cannot obtain, and does not know, the information held by other blocks. This keeps the blocks in a blockchain node anonymous to one another: if one block in a blockchain node is attacked, the data of the other blocks in that node are not affected, which ensures the security of the data stored in the blockchain.
The invention also provides a time-based copy storage method, in which a time threshold $T$ is determined from the available computing resource value $C$ and storage resource value $S$ of the current blockchain network and is updated each time a user accesses data in the blockchain:
[Equation image BDA0002805699660000061]
where $T$ is the threshold before the update; $T'$ is the updated threshold; $\alpha, \beta \in [0, 1]$, $\alpha + \beta = 1$, and $\alpha$, $\beta$ represent the importance of computing resources and storage resources, respectively, to the network. When a user uploads or recovers data for the first time, a copy of the data is stored in the last blockchain node that participated in the encoding and decoding process, and the data are treated as hot data that the user may read again shortly. If the user reads the data again while the clock $t < T$, the copy is obtained directly from the last blockchain node and the clock value is reset to zero; in this process the computation overhead is reduced to zero while the storage overhead increases by four blocks, so blockchain computation overhead is reduced at the expense of storage overhead. If $t \ge T$, the copy stored in the node is deleted and its storage space is released: if the user has not read the object for longer than the threshold, the data are regarded as cold data that will not be read for a long time, and their copy is deleted to free storage space. This reduces blockchain storage overhead and makes blockchain data storage more efficient.
Because current blockchain schemes do not support temporal data processing, the sequential, block-file-based access in a blockchain prevents efficient query processing. The invention therefore provides a time-series-based index construction method. An initial time $t_1$ is set as the start time of index construction and $t_2$ as the start time of the next construction; for each slice data $k_i$ within $[t_1, t_2]$, time is divided into non-overlapping adjacent time periods $\theta(k) = \{\theta_1, \theta_2, \dots, \theta_m\}$, where $m$ is the number of periods. $Get(\langle k_i, \theta_n \rangle)$ obtains the state $\varepsilon(k_i, \theta_n)$ of the $i$-th slice data $k_i$ in period $\theta_n$, and the current index state is updated by adding $data_{n+1}$ to $\varepsilon(k_i, \theta_n)$ and updating the value of $k_i$ from $data_n$ to $data_{n+1}$. If $\theta_n$ satisfies the time division condition, $\langle (k_i, \theta_n), \varepsilon(k_i, \theta_n) \rangle$ is committed to a block file, the index is updated into the historical data, and a new time interval $\theta_{n+1}$ is created. During retrieval, to find all data related to data $k$ within a time interval $\tau$, the time intervals $\theta(k)$ of $k$ are first queried through the returned iterator, and the intervals having a temporal connection or temporal inclusion relationship with the target query interval $\tau$ are computed and denoted $o(\theta(k), \tau)$, whose first and last $\theta$ are in a temporal connection relationship. For each $\theta$ in $o(\theta(k), \tau)$, $Get(\langle k, \theta \rangle)$ is executed and the block file is parsed through the returned iterator; data from intervals in a temporal inclusion relationship are added directly to the result set, while data from intervals in a temporal connection relationship are traversed to remove data outside $\tau$. Finally, the result set is output as the data retrieval result. With a conventional retrieval method, querying the states of data $k$ within the time interval $[t_1, t_2]$ requires accessing every block file in that interval; with the time-series index, only the files in $o(\theta(k), \tau)$ need to be deserialized. Building the time-series index over reasonably sized time intervals therefore greatly reduces the number of file accesses and effectively improves data retrieval efficiency.
Drawings
Fig. 1 is a schematic flowchart of a blockchain-based rapid data retrieval method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a blockchain-based rapid data retrieval system according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Data are encoded with a pipeline-based data encoding scheme and the encoded data are stored in a blockchain; the locally stored block data are then sliced so that the block data are reconstructed, optimizing blockchain storage. Meanwhile, a temporal index of the block data is built with a time-based index construction algorithm, which reduces the number of accesses to the block data and the database and achieves more efficient data retrieval. Fig. 1 is a schematic diagram of a blockchain-based rapid data retrieval method according to an embodiment of the present invention.
In this embodiment, the fast data retrieval method based on the block chain includes:
S1: acquire the data to be stored, and encode the data to be stored by using a pipeline-based data encoding scheme to obtain encoded data.
First, the invention acquires the data to be stored and encodes it with the pipeline-based data encoding scheme, in which the encoding and decoding computations run on different blocks in a pipelined manner; that is, the data to be stored $o_1, o_2, \dots, o_k$ correspond to storage blocks $h_1, h_2, \dots, h_n$;
during encoding, each storage block $h_i$ encodes the data to be stored $o_1, o_2, \dots, o_k$, yielding the encoded data $c_1, c_2, \dots, c_n$, where $n > k$;
during decoding, the storage block $h_1$ holding the encoded data $c_1$ decodes $c_1$ to obtain a decoding intermediate block $i_1$ and sends $i_1$ to the storage block $h_2$ holding the encoded data $c_2$; storage block $h_2$ decodes from $c_2$ and $i_1$ to obtain the decoding intermediate block $i_2$; continuing in this way, the final decoded data $i_n$ are obtained at block $h_n$.
In one embodiment of the invention, a classical (8,4) systematic code is used to encode the data $O = (o_1, o_2, o_3, o_4)$ into 8-dimensional encoded data $C = (c_1, \dots, c_8)$, where the encoding uses finite-field arithmetic:
[Equation image BDA0002805699660000071]
After encoding is completed, storage block $h_i$ stores the encoded data $c_i$. During decoding, each storage block $h_i$ sends its encoded data $c_i$ to decoding node $n_i$, $i = 1, \dots, 8$. The decoding node $n_1$, which holds the first encoded datum $c_1$, performs a linear operation on $c_1$ and sends the result to decoding node $n_2$, which holds the second encoded datum $c_2$; $n_2$ combines the result received from $n_1$ with $c_2$ and sends its result to decoding node $n_3$, which holds the third encoded datum $c_3$. Decoding proceeds in this pipelined manner, and the original data are finally obtained from decoding node $n_8$.
S2: store the encoded data in the blockchain, and store a copy of the data to be stored in a blockchain node by using a time-based copy storage method.
Further, the invention stores the encoded data in the corresponding blocks of the blockchain; in one embodiment of the invention, the encoded data $c_1, c_2, \dots, c_n$ are stored in the corresponding blocks $h_i$, where $i = 1, \dots, n$;
further, the invention stores a copy of the data to be stored in a blockchain node by using a time-based copy storage method: when a user uploads or recovers data for the first time, the invention stores a copy of the data in the last blockchain node that participated in the encoding and decoding process and treats the data as hot data that the user may read again shortly;
the time-based copy storage method comprises the following steps:
1) when the copy is stored, a deletion clock t and a threshold T are set and stored on the blockchain node together with the copy; the clock value t starts at zero and increases with time. In this process the computation cost of a repeat read drops to zero while the storage cost increases by four blocks, so computation cost is reduced at the expense of storage cost;
2) if the user reads the data again while t < T, the copy is read directly from the last blockchain node and the clock value is reset to zero;
3) if t ≥ T, the copy stored on the node is deleted to release storage space; that is, if the user has not read the data for longer than the threshold, the data is regarded as cold data that the user will not read for a long time, and its copy is deleted to free storage space.
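A minimal in-memory sketch of the three-step copy storage method follows. It assumes time is passed in explicitly as `now` and that the clock t is the time elapsed since the copy was stored or last read. The class and method names are hypothetical, and because the patent's threshold-update formula is given only as an image, the sketch accepts a caller-supplied `update_threshold(T, t)` hook whose default leaves T unchanged.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class ReplicaEntry:
    data: bytes
    threshold: float   # T
    last_reset: float  # moment at which the deletion clock t was last set to zero

@dataclass
class ReplicaStore:
    """Copies kept on the last blockchain node of the encode/decode pipeline.

    `update_threshold` is a hypothetical hook (old T, interval t) -> new T; the
    patent's rule using C_old, C_new, S_old, S_new, alpha and beta is not modeled.
    """
    default_threshold: float
    update_threshold: Callable[[float, float], float] = lambda T, t: T
    entries: Dict[str, ReplicaEntry] = field(default_factory=dict)

    def put(self, key: str, data: bytes, now: float) -> None:
        # Step 1: store the copy together with clock t = 0 and threshold T.
        self.entries[key] = ReplicaEntry(data, self.default_threshold, now)

    def get(self, key: str, now: float) -> Optional[bytes]:
        entry = self.entries.get(key)
        if entry is None:
            return None                  # no copy: the data must be decoded from the chain
        t = now - entry.last_reset       # current value of the deletion clock
        if t >= entry.threshold:
            del self.entries[key]        # Step 3: cold data, release the space
            return None
        entry.threshold = self.update_threshold(entry.threshold, t)
        entry.last_reset = now           # Step 2: hot read, reset the clock to zero
        return entry.data

    def sweep(self, now: float) -> None:
        """Delete every copy whose clock has reached its threshold."""
        expired = [k for k, e in self.entries.items() if now - e.last_reset >= e.threshold]
        for key in expired:
            del self.entries[key]
```

Checking expiry lazily in `get` and eagerly in `sweep` are two ways to realize step 3; a node could run `sweep` periodically so that unread copies do not linger.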
The size of the threshold T is determined by the interval t between two accesses of the data by the user and by the currently available network resources. To this end, the network's available computing resource is denoted C and its available storage-space resource S, and the threshold is updated each time the user accesses the data:
[Formula image: update rule giving the new threshold T′ from T, t, C_old, C_new, S_old, S_new, α and β]
where:
T and T′ are the threshold before and after the update;
t is the time interval between two accesses of the same data by the user;
C_old and C_new are the network's available computing resources at the user's first and second access of the data, respectively;
S_old and S_new are the network's available storage-space resources at the user's first and second access of the data, respectively;
α, β ∈ [0, 1] with α + β = 1 are weights expressing the importance of computing resources and storage resources, respectively, to the blockchain network; in one embodiment of the invention, α = 0.4 and β = 0.6.
S3. Slice the data stored in the blockchain using an erasure-code-based data slicing method to obtain erasure-code-based slice data.
Further, the data stored in the blockchain is sliced using the erasure-code-based data slicing method, which proceeds as follows:
1) the i-th block B_i of the blockchain is divided equally into a total of k data slices, b_i^1, b_i^2, ..., b_i^k;
2) the data slices are multiplied by a preset erasure-code coding matrix to obtain the erasure-code-based slice data:
[Formula image: product of the coding matrix and the data slices of block B_i, giving its erasure-coded slices]
where:
g denotes an entry of the coding matrix; in a specific embodiment of the invention, the coding matrix is based on a Cauchy matrix:
[Formula image: Cauchy-matrix-based coding matrix]
x_i and y_i are elements of a Galois field, and the coding matrix is of order m × n with m > n;
b_i^k is the k-th data slice of the i-th block of the blockchain;
e_i^r is the r-th erasure-code-based slice of the i-th block of the blockchain, where r > k;
3) because m > n in the m × n coding matrix, there are more erasure-coded slices than original data slices. Storage is saved by deleting some of the erasure-coded slices, and each node can choose how many slices to keep according to its local storage capacity. In a specific embodiment of the invention, to keep the data slices evenly distributed across the whole network, nodes delete coded slices at random. If the system's coding matrix G is of order m × n, the original block data is divided into n data slices; if a node retains on average k data slices per block after encoding, and the corresponding storage-space optimization efficiency is η, then:
[Formula image: storage-space optimization efficiency η]
4) after encoding and deletion, each node still stores the transaction information in plaintext; when a node needs the complete block, it must obtain enough coded data slices from other nodes. Network transmission consumes the node's bandwidth, and a block can be fully reconstructed once the number of coded data slices available for it is greater than or equal to the number n of columns of the coding matrix. The minimum amount of data Q that a node must receive from other nodes to recover a block is therefore:
Q = (n − k) × p
where:
n is the number of data slices into which the original block data is divided;
k is the average number of data slices a given node retains per block;
p is the amount of data in each slice.
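The slicing, random deletion, reconstruction and transfer-cost steps can be sketched as below. This is an illustrative sketch, not the patent's implementation: it uses the prime field GF(257) in place of the unspecified Galois field, a Cauchy matrix with entries 1/(x_r − y_j) built from hypothetical disjoint sets x and y, and the notation of steps 3) and 4) (n data slices per block, k retained, m coded). Because every square submatrix of a Cauchy matrix is nonsingular, any n of the m coded slices suffice to rebuild the block. All function names are hypothetical.

```python
import random

P = 257  # ASSUMPTION: prime field GF(257) stands in for the patent's Galois field

def inv(a):
    """Multiplicative inverse in GF(P)."""
    return pow(a, P - 2, P)

def cauchy(m, n):
    """m x n Cauchy matrix g[r][j] = 1/(x_r - y_j), x = 0..m-1, y = m..m+n-1 (disjoint)."""
    return [[inv((x - y) % P) for y in range(m, m + n)] for x in range(m)]

def encode_block(slices, G):
    """Coded slice r = sum_j g[r][j] * b_j, applied symbol by symbol."""
    length = len(slices[0])
    return [[sum(g * s[t] for g, s in zip(row, slices)) % P for t in range(length)]
            for row in G]

def randomly_retain(coded, keep, rng=random):
    """Model a node that deletes coded slices at random and keeps `keep` of them."""
    return {r: coded[r] for r in rng.sample(range(len(coded)), keep)}

def solve(A, b):
    """Solve A x = b over GF(P) by Gauss-Jordan elimination (A square, invertible)."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        s = inv(M[col][col])
        M[col] = [v * s % P for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(v - f * w) % P for v, w in zip(M[r], M[col])]
    return [row[n] for row in M]

def reconstruct(held, G, n):
    """Rebuild the n data slices from any n coded slices given as {row index: slice}."""
    rows = sorted(held)[:n]
    A = [G[r] for r in rows]
    length = len(held[rows[0]])
    cols = [solve(A, [held[r][t] for r in rows]) for t in range(length)]
    return [[cols[t][j] for t in range(length)] for j in range(n)]

def min_transfer(n, k, p):
    """Q = (n - k) * p, floored at zero when the node already holds n or more slices."""
    return max(n - k, 0) * p
```

For instance, with m = 6 and n = 4, keeping any four coded slices rebuilds the block exactly, and a node that retains k = 2 slices of p = 1024 bytes each must fetch Q = (4 − 2) × 1024 = 2048 bytes.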
S4. Construct a time-series index for the slice data using a time-series-based index construction method.
Further, the time-series index for the slice data is built with a time-series-based index construction method, which comprises the following steps:
1) set an initial time t_1 as the start time of the time-series index construction and t_2 as the start time of the next construction; for each slice data k_i within [t_1, t_2], divide time into non-overlapping adjacent periods θ(k) = {θ_1, θ_2, ..., θ_m}, where m is the number of periods;
2) use Get(<k_i, θ_n>) to obtain the state ε(k_i, θ_n) of the i-th slice data k_i in period θ_n and update the current time-series index state: the current data data_{n+1} is added to ε(k_i, θ_n), and the data value of k_i is updated from data_n to data_{n+1};
3) if θ_n satisfies the time-division condition, submit <(k_i, θ_n), ε(k_i, θ_n)> to the block file, move the index into historical data, and create a new time interval θ_{n+1}. In one embodiment of the invention, the time-division condition is a dynamic interval-division condition: the size of an interval is determined jointly by elapsed time and by the amount of slice data. With fixed values set for both the interval length and the slice-data amount, an interval is closed in either of two cases: first, when the interval length reaches its fixed value, the number of slice data must be equal to or greater than the prescribed amount; second, when the number of slice data reaches its fixed value, the interval length must be equal to or greater than its fixed value. This avoids forming too much or too little index data within any single period θ.
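The index construction steps can be sketched as follows. The sketch assumes one open interval θ_n per key, records stored as (timestamp, value) pairs, and an in-memory dictionary standing in for the block file. It reads the dynamic division condition as "close θ_n once it spans at least `min_span` time units and holds at least `min_count` records"; both thresholds, like the class names, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

@dataclass
class Interval:
    start: float
    end: float = float("inf")                                      # open until committed
    records: List[Tuple[float, Any]] = field(default_factory=list)  # state epsilon(k_i, theta_n)

@dataclass
class TimeSeriesIndex:
    """Per-key sequence of consecutive, non-overlapping intervals theta_1, theta_2, ..."""
    min_span: float
    min_count: int
    current: Dict[str, Interval] = field(default_factory=dict)        # open theta_n per key
    history: Dict[str, List[Interval]] = field(default_factory=dict)  # committed "block file"
    latest: Dict[str, Any] = field(default_factory=dict)              # data value per key

    def put(self, key: str, ts: float, value: Any) -> None:
        theta = self.current.setdefault(key, Interval(start=ts))  # Get(<k_i, theta_n>)
        theta.records.append((ts, value))                         # add data_{n+1} to epsilon
        self.latest[key] = value                                  # data_n -> data_{n+1}
        # ASSUMED division condition: both the span and the count thresholds are met.
        if ts - theta.start >= self.min_span and len(theta.records) >= self.min_count:
            theta.end = ts                                        # commit <(k_i, theta_n), epsilon>
            self.history.setdefault(key, []).append(theta)
            del self.current[key]                                 # theta_{n+1} opens on next put
```

Requiring both thresholds keeps a committed interval from being too short or too sparse; a production index would also cap the interval size to bound the amount of data per θ.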
S5. Retrieve the data quickly using its time-series index.
Further, the time-series index of the data enables fast retrieval of the data in the blockchain;
to retrieve all data related to data k within a time interval τ, first query the time intervals θ(k) corresponding to k through the returned iterator, and compute the intervals of θ(k) that have a temporal-connection or temporal-containment relation with the target query interval τ; these are denoted o(θ(k), τ), and the first and last θ in o(θ(k), τ) are in a temporal-connection relation. For time-state intervals θ_i and θ_j, the temporal-connection relation is defined as follows: if
[Formula images: conditions for temporal connection between θ_i and θ_j]
then θ_i and θ_j are in a temporal-connection relation. The temporal-containment relation is defined as follows: if
[Formula image: condition for temporal containment between θ_i and θ_j]
then θ_i and θ_j are in a temporal-containment relation;
for each θ contained in o(θ(k), τ), call Get(<k, θ>) and parse the block file through the returned iterator. For intervals in a temporal-containment relation, the data parsed by the iterator is added directly to the result set; for intervals in a temporal-connection relation, the data returned by the iterator is traversed and any data outside τ is removed. Finally, the result set is output as the data retrieval result.
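A sketch of the retrieval step over the committed intervals of one key follows. Since the exact connection and containment conditions are given only as images, it assumes that containment means θ lies entirely within τ and that connection means θ only partially overlaps τ. Intervals are represented as hypothetical (start, end, records) tuples, and `query` is a hypothetical name.

```python
from typing import Any, List, Tuple

Record = Tuple[float, Any]
Theta = Tuple[float, float, List[Record]]  # (start, end, records) of one committed interval

def query(intervals: List[Theta], tau: Tuple[float, float]) -> List[Record]:
    """Return all records of one key whose timestamps fall within tau = [lo, hi].

    ASSUMPTION: "containment" = theta entirely inside tau; "connection" = theta
    only partially overlaps tau (typically the first and last theta in o(theta(k), tau)).
    """
    lo, hi = tau
    result: List[Record] = []
    for start, end, records in intervals:
        if end < lo or start > hi:
            continue                                   # theta is not in o(theta(k), tau)
        if lo <= start and end <= hi:
            result.extend(records)                     # containment: take every record
        else:
            result.extend(r for r in records if lo <= r[0] <= hi)  # connection: filter
    return result
```

With intervals [0, 9], [10, 19] and [20, 29] and τ = [5, 20], the middle interval is contained and returned whole, while the first and last are filtered record by record. Only the boundary intervals pay the per-record filtering cost.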
The embodiments of the invention are further described below through an algorithm experiment testing the proposed method. The hardware test environment is Linux CentOS 6.9 with 16 GB of memory; the comparison methods are a data retrieval method based on hash-index storage, a data retrieval method based on inverted-index storage, and an index-free data retrieval method.
In the experiment, 5 GB of data is collected and stored and retrieved with the comparison algorithms and with the proposed algorithm, and the time needed to complete retrieval is used as the evaluation metric of each data retrieval method.
According to the experimental results, the retrieval time is 1.2 s for the hash-index method, 0.68 s for the inverted-index method and 0.72 s for the index-free method.
The invention also provides a blockchain-based fast data retrieval system. Fig. 2 is a schematic diagram of the internal structure of the blockchain-based fast data retrieval system according to an embodiment of the invention.
In this embodiment, the blockchain-based fast data retrieval system 1 comprises at least a data acquisition device 11, a data processor 12, a data retrieval device 13, a communication bus 14 and a network interface 15.
The data acquisition device 11 may be a PC (Personal Computer), a terminal device such as a smart phone, a tablet Computer, or a mobile Computer, or may be a server.
The data processor 12 includes at least one type of readable storage medium including flash memory, hard disks, multi-media cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, and the like. The data processor 12 may in some embodiments be an internal storage unit of the blockchain based fast data retrieval system 1, for example a hard disk of the blockchain based fast data retrieval system 1. The data processor 12 may also be an external storage device of the block chain based fast data retrieval system 1 in other embodiments, such as a plug-in hard disk provided on the block chain based fast data retrieval system 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the data processor 12 may also comprise both an internal storage unit and an external storage device of the blockchain based fast data retrieval system 1. The data processor 12 can be used not only to store application software installed in the block chain based fast data retrieval system 1 and various kinds of data, but also to temporarily store data that has been output or will be output.
The data retrieval device 13 may in some embodiments be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip, and is used to run program code stored in the data processor 12 or to process data, for example to execute the data retrieval program instructions.
The communication bus 14 is used to enable connection communication between these components.
The network interface 15 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), and is typically used to establish a communication link between the system 1 and other electronic devices.
Optionally, the system 1 may further comprise a user interface, which may include a display and an input unit such as a keyboard, and optionally a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be called a display screen or display unit, is used to display information processed in the blockchain-based fast data retrieval system 1 and to present a visual user interface.
Although Fig. 2 only shows the blockchain-based fast data retrieval system 1 with components 11 to 15, those skilled in the art will appreciate that the structure shown in Fig. 2 does not limit the blockchain-based fast data retrieval system 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
In the embodiment of the apparatus 1 shown in Fig. 2, data retrieval program instructions are stored in the data processor 12; the steps by which the data retrieval device 13 executes these instructions are the same as those of the blockchain-based fast data retrieval method described above and are not repeated here.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium having stored thereon data retrieval program instructions, which are executable by one or more processors to implement the following operations:
acquiring data to be stored, and encoding it with a pipeline-based data coding scheme to obtain encoded data;
storing the encoded data in a blockchain, and storing a copy of the data to be stored on a blockchain node using a time-based copy storage method;
slicing the data stored in the blockchain using an erasure-code-based data slicing method to obtain erasure-code-based slice data;
constructing a time-series index for the slice data using a time-series-based index construction method; and
quickly retrieving the data using its time-series index.
It should be noted that the numbering of the above embodiments of the invention is for description only and does not indicate that any embodiment is better than another. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A block chain-based fast data retrieval method is characterized in that the method comprises the following steps:
acquiring data to be stored, and encoding it with a pipeline-based data coding scheme to obtain encoded data;
storing the encoded data in a blockchain, and storing a copy of the data to be stored on a blockchain node using a time-based copy storage method;
slicing the data stored in the block chain by using an erasure code-based data slicing mode to obtain erasure code-based sliced data;
constructing a time sequence index by using an index construction method based on time sequence;
and quickly retrieving the data by utilizing the time sequence index of the data.
2. The method as claimed in claim 1, wherein the encoding process of the data to be stored by using the pipeline-based data encoding scheme includes:
the pipeline-based data coding scheme means that the encoding and decoding computations are run on different blocks in a pipelined manner, that is, the data to be stored, o_1, o_2, ..., o_k, has corresponding storage blocks h_1, h_2, ..., h_n;
during encoding, each storage block h_i encodes the data to be stored, o_1, o_2, ..., o_k, to obtain its own encoded data, giving c_1, c_2, ..., c_n, where n > k;
during decoding, storage block h_1, which holds encoded data c_1, decodes c_1 to obtain an intermediate decoding block i_1 and sends i_1 to storage block h_2, which holds c_2; h_2 decodes c_2 together with i_1 to obtain intermediate block i_2; and so on, until the last block h_n obtains the final decoded data.
3. The method as claimed in claim 2, wherein storing the copy of the data to be stored on a blockchain node using the time-based copy storage method comprises:
1) when the copy is stored, a deletion clock t and a threshold T are set and stored on the blockchain node together with the copy; the clock value t starts at zero and increases with time;
2) if the user reads the data again while t < T, the copy is read directly from the last blockchain node and the clock value is reset to zero;
3) if t ≥ T, the copy stored on the node is deleted to release storage space; that is, if the user has not read the data for longer than the threshold, the data is regarded as cold data that the user will not read for a long time, and its copy is deleted to free storage space;
the size of the threshold T is determined by the interval t between two accesses of the data by the user and by the currently available network resources; with the network's available computing resource denoted C and its available storage-space resource denoted S, the threshold is updated each time the user accesses the data:
[Formula image: update rule giving T′ from T, t, C_old, C_new, S_old, S_new, α and β]
where:
T is the threshold before the update;
T′ is the threshold after the update;
t is the time interval between two accesses of the same data by the user;
C_old and C_new are the network's available computing resources at the user's first and second access of the data, respectively;
S_old and S_new are the network's available storage-space resources at the user's first and second access of the data, respectively;
α, β ∈ [0, 1], α + β = 1, and α = 0.4, β = 0.6 are set.
4. The method as claimed in claim 3, wherein slicing the data stored in the blockchain using the erasure-code-based data slicing method comprises:
1) the i-th block B_i of the blockchain is divided equally into a total of k data slices, b_i^1, b_i^2, ..., b_i^k;
2) the data slices are multiplied by a preset erasure-code coding matrix to obtain the erasure-code-based slice data:
[Formula image: product of the coding matrix and the data slices of block B_i, giving its erasure-coded slices]
where:
g denotes an entry of the coding matrix;
b_i^k is the k-th data slice of the i-th block of the blockchain;
e_i^r is the r-th erasure-code-based slice of the i-th block of the blockchain, where r > k;
3) according to their local storage capacities, nodes randomly delete coded slices, with nodes of lower local storage capacity deleting more slice data;
4) network transmission consumes the node's bandwidth, and a block can be fully reconstructed once the number of coded data slices available for it is greater than or equal to the number n of columns of the coding matrix; the minimum amount of data Q that a node must receive from other nodes to recover a block is therefore:
Q = (n − k) × p
where:
n is the number of data slices into which the original block data is divided;
k is the average number of data slices a given node retains per block;
p is the amount of data in each slice.
5. The method according to claim 4, wherein the time-series-based index building method comprises the following steps:
1) set an initial time t_1 as the start time of the time-series index construction and t_2 as the start time of the next construction; for each slice data k_i within [t_1, t_2], divide time into non-overlapping adjacent periods θ(k) = {θ_1, θ_2, ..., θ_m}, where m is the number of periods;
2) use Get(<k_i, θ_n>) to obtain the state ε(k_i, θ_n) of the i-th slice data k_i in period θ_n and update the current time-series index state: the current data data_{n+1} is added to ε(k_i, θ_n), and the data value of k_i is updated from data_n to data_{n+1};
3) if θ_n satisfies the time-division condition, submit <(k_i, θ_n), ε(k_i, θ_n)> to the block file, move the index into historical data, and create a new time interval θ_{n+1}.
6. The method as claimed in claim 5, wherein quickly retrieving the data using its time-series index comprises:
to retrieve all data related to data k within a time interval τ, first querying the time intervals θ(k) corresponding to k through the returned iterator, and computing the intervals of θ(k) that have a temporal-connection or temporal-containment relation with the target query interval τ; these are denoted o(θ(k), τ), and the first and last θ in o(θ(k), τ) are in a temporal-connection relation;
for each θ contained in o(θ(k), τ), calling Get(<k, θ>) and parsing the block file through the returned iterator; for intervals in a temporal-containment relation, adding the data parsed by the iterator directly to the result set, and for intervals in a temporal-connection relation, traversing the data returned by the iterator and removing any data outside τ; and finally outputting the result set as the data retrieval result.
7. A block chain based fast data retrieval system, the system comprising:
a data acquisition device, configured to acquire data to be stored and to encode it with a pipeline-based data coding scheme;
a data processor, configured to store the encoded data in the blockchain and to store a copy of the data to be stored on a blockchain node using a time-based copy storage method, and further to slice the data stored in the blockchain using an erasure-code-based data slicing method to obtain erasure-code-based slice data; and
a data retrieval device, configured to construct a time-series index for the slice data using a time-series-based index construction method and to quickly retrieve the data using its time-series index.
8. A computer-readable storage medium having stored thereon data retrieval program instructions executable by one or more processors to implement the steps of the blockchain-based fast data retrieval method as claimed in any one of claims 1 to 6.
CN202011369688.2A 2020-11-30 2020-11-30 Block chain-based rapid data retrieval method and system Withdrawn CN112416941A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011369688.2A CN112416941A (en) 2020-11-30 2020-11-30 Block chain-based rapid data retrieval method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011369688.2A CN112416941A (en) 2020-11-30 2020-11-30 Block chain-based rapid data retrieval method and system

Publications (1)

Publication Number Publication Date
CN112416941A (en) 2021-02-26

Family

ID=74829336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011369688.2A Withdrawn CN112416941A (en) 2020-11-30 2020-11-30 Block chain-based rapid data retrieval method and system

Country Status (1)

Country Link
CN (1) CN112416941A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113138989A (en) * 2021-03-12 2021-07-20 莘上信息技术(上海)有限公司 Block chain data retrieval method and device
CN112799607A (en) * 2021-04-12 2021-05-14 骊阳(广东)节能科技股份有限公司 Data storage method for partitioned storage according to data size
CN113641755A (en) * 2021-07-19 2021-11-12 江苏大学 Incremental slice analysis method for UTXO block chain
CN113641755B (en) * 2021-07-19 2024-09-27 江苏大学 Incremental slice analysis method of UTXO block chain
CN113886115A (en) * 2021-09-09 2022-01-04 上海智能网联汽车技术中心有限公司 Block chain Byzantine fault-tolerant method and system based on vehicle-road cooperation
CN113886115B (en) * 2021-09-09 2024-02-20 上海智能网联汽车技术中心有限公司 Block chain Bayesian fault tolerance method and system based on vehicle-road cooperation

Similar Documents

Publication Publication Date Title
CN112416941A (en) Block chain-based rapid data retrieval method and system
US8380680B2 (en) Piecemeal list prefetch
CN110879854B (en) Searching data using superset tree data structures
US20130124488A1 (en) Method and system for managing and querying large graphs
Deorowicz FQSqueezer: k-mer-based compression of sequencing data
Caro et al. Data structures for temporal graphs based on compact sequence representations
CN112182004B (en) Method, device, computer equipment and storage medium for checking data in real time
CN110059129A (en) Date storage method, device and electronic equipment
US20200125493A1 (en) Pattern-Aware Prefetching Using Parallel Log-Structured File System
US10503713B1 (en) Criterion-based retention of data object versions
CN110895591B (en) Method and device for positioning self-lifting point
CN107315753B (en) Paging method and device across multiple databases
CN113886332B (en) Large file difference comparison method and device, computer equipment and storage medium
Hur et al. Performance analysis of automatic storage/retrieval systems by stochastic modelling
CN115129981A (en) Information recommendation method, device, equipment and storage medium
Borysenko et al. The Fibonacci numeral system for computer vision
CN114281817A (en) Data cleaning method and device, computer equipment and storage medium
US11170000B2 (en) Parallel map and reduce on hash chains
CN113448957A (en) Data query method and device
Economou et al. On the stationary distribution of the GI X/MY/1 queueing system
CN114663073B (en) Abnormal node discovery method and related equipment thereof
CN112256801B (en) Method, system and storage medium for extracting key entity in entity relation diagram
CN116680263A (en) Data cleaning method, device, computer equipment and storage medium
WO2024016766A1 (en) Transaction processing method and apparatus, device, storage medium, and program product
US20240012857A1 (en) Asserted Relationships Matching in an Identity Graph Data Structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210226