CN113901131A - Index-based on-chain data query method and device
- Publication number
- CN113901131A; CN202111027751.9A
- Authority
- CN
- China
- Prior art keywords
- data
- abstract
- operation record
- index
- chain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides an index-based on-chain data query method and device. The method comprises: constructing a digest dictionary tree index for each extracted data digest, packaging the data digest to be uploaded to the chain together with the constructed digest dictionary tree index into a block structure, and uploading the block to the chain; constructing a data operation record for each operation on the original data, constructing or updating a data operation record chain index for the constructed record, packaging the data operation record and the data operation record chain index into a block structure, and uploading the block to the chain, wherein the data operation record chain links all operation records of the same data in a chain structure, and the address of the first node of the data operation record chain is stored in the corresponding node of the digest dictionary tree; in response to an original data query request, acquiring the original data by executing digest dictionary tree retrieval; and, in response to a data operation record query request, acquiring the historical operation records of the data digest by executing digest dictionary tree retrieval followed by operation record chain retrieval.
Description
Technical Field
The invention relates to the technical fields of data retrieval, blockchain, and computing, and in particular to an index-based on-chain data query method and device.
Background
Blockchain is an emerging information technology that combines several techniques, including distributed data storage and cryptography, and is characterized by decentralization, openness, and tamper resistance. As decentralized applications proliferate, blockchains have attracted wide attention, and the demand for querying the data stored on them keeps growing. The distributed storage and tamper resistance of blockchain technology make it well suited to the secure sharing of design data and to reliable traceability schemes. A traditional blockchain, however, executes a data query in two steps: it first traverses all existing blocks in the chain sequentially, and then scans every data record in each block to check whether the record satisfies the query. This query mode is inefficient and struggles to serve blockchain applications that query data frequently, such as data sharing and data tracing.
Currently, most domestic and international research on blockchain applications focuses on using the existing properties of the blockchain to solve the problems of fair sharing and reliable storage in data sharing or tracing for specific scenarios; few works study the low efficiency of data query or data retrieval in blockchain sharing platforms.
Therefore, how to query and trace on-chain data quickly and efficiently, given the low query efficiency of traditional blockchains, remains a problem to be solved.
Disclosure of Invention
Aiming at the problems in the prior art, the invention aims to provide an index-based on-chain data query method and device so as to realize fast and efficient query and source tracing of on-chain data (such as original data and historical operation records thereof).
In one aspect of the present invention, an index-based on-chain data query method is provided, which includes the following steps:
an original data uplink step, used for extracting a data digest for each piece of original data to be uploaded to the chain, constructing or updating a digest dictionary tree index for the extracted data digest, packaging the data digest and the constructed digest dictionary tree index into a block structure, and uploading the block to the chain, wherein the digest dictionary tree is constructed by taking the hash values of the on-chain data digests distributed in different blocks as key values;
a data operation record generation and uplink step, used for constructing a data operation record for each original data operation, constructing or updating a data operation record chain index for the constructed record, packaging the data operation record and the data operation record chain index into a block structure, and uploading the block to the chain, wherein the data operation record chain links all operation records of the same data in a chain structure, and the address of the first node of the data operation record chain is stored in the corresponding node of the digest dictionary tree;
an original data query step, used for extracting the hash value of the data digest to be queried from a received original data query request of a user, acquiring the root node of the latest digest dictionary tree from the latest block of the blockchain network, acquiring the storage address of the data digest by executing digest dictionary tree retrieval, acquiring the data digest according to that storage address, and acquiring the original data by parsing the data digest and accessing the original data access address it contains; and
a data operation record query step, used for, after receiving a data operation record query request of a user, extracting the hash value of the associated data digest from the request, acquiring the root node of the latest digest dictionary tree from the latest block of the blockchain network, acquiring the head node address of the operation record chain by executing digest dictionary tree retrieval based on the extracted hash value, executing operation record chain retrieval from that head node address to acquire the storage addresses of all operation records of the data digest, and acquiring the historical operation records of the data digest from the blockchain network according to the acquired storage addresses.
In some embodiments of the present invention, the nodes of the digest dictionary tree include index nodes and data nodes. Each index node stores an index key and a pointer array, and each pointer in the array points to the next node on a different path. Each data node stores an index key, a pointer to the data node that corresponds to the previous version of the current data digest in the digest dictionary tree, the storage address of the data digest corresponding to the node's index key, and the address of the first node of the data operation record chain corresponding to the node's index key. The data operation record chain is a singly linked list whose elements are operation record arrays; it comprises index nodes and data nodes, and each index node of the chain stores an index pointer and a data pointer array. Each data node of the data operation record chain includes at least some of the following fields: a data digest hash value, a data operator, a timestamp, and an operation type.
In some embodiments of the present invention, the value of the index key in an index node is a common prefix of data digest hash values, and the value of the index key in a data node is the complete hash value of the data digest.
In some embodiments of the present invention, the original data uplink step comprises: collecting the original data to be uploaded to the chain, and constructing a data digest for each piece of original data; constructing a digest dictionary tree index for each data digest to be uploaded by constructing or updating the digest dictionary tree; packaging the data digests to be uploaded and the digest dictionary tree index data into a block structure, and serializing the block structure into a newly created block file; and distributing the newly created block file to each node in the network for storage.
In some embodiments of the present invention, constructing a digest dictionary tree index for each data digest to be uploaded by constructing or updating the digest dictionary tree includes: calculating the hash value of each data digest and creating a data node for it, wherein the index key of the data node is the hash value of the current data digest; acquiring the current digest dictionary tree, and constructing a new digest dictionary tree if none exists; determining whether each data digest is newly added; if it is not newly added, acquiring the hash value of the previous version of the data digest, searching the current digest dictionary tree for the data node corresponding to that historical version, and assigning the address of that data node to the pointer of the current data node; if it is newly added, setting the pointer of the current data node to null; and inserting each newly created data node into the current digest dictionary tree.
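The indexing procedure above can be sketched as follows. This is a minimal illustration and not the patented implementation: a plain Python dict stands in for the digest dictionary tree, SHA-256 is an assumed hash function, a node's key doubles as its "address", and the names `DataNode`, `digest_hash` and `index_digest` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DataNode:
    key: str                              # complete hash value of the data digest
    abstract_addr: str                    # storage address of the data digest
    pre_version: Optional[str] = None     # address of the previous version's data node
    op_record_addr: Optional[str] = None  # head of the operation record chain


def digest_hash(digest: bytes) -> str:
    # Assumption: SHA-256; the source does not name a hash function.
    return hashlib.sha256(digest).hexdigest()


def index_digest(index: Dict[str, DataNode], digest: bytes, abstract_addr: str,
                 previous: Optional[bytes] = None) -> DataNode:
    """Create a data node for `digest` and insert it into the index."""
    node = DataNode(key=digest_hash(digest), abstract_addr=abstract_addr)
    if previous is not None:
        # Not a newly added digest: link to the data node of the previous version.
        old = index.get(digest_hash(previous))
        if old is None:
            raise KeyError("previous version is not indexed")
        node.pre_version = old.key
    # For a newly added digest, pre_version simply stays None.
    index[node.key] = node
    return node


index: Dict[str, DataNode] = {}
v1 = index_digest(index, b"digest of F, v1", "block_1/abstract_0")
v2 = index_digest(index, b"digest of F, v2", "block_5/abstract_3",
                  previous=b"digest of F, v1")
print(v2.pre_version == v1.key)  # True
```

Following `pre_version` from the newest node walks back through the version history of a data digest.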
In some embodiments of the present invention, the data operation record generating and uplink step comprises: monitoring the operation behavior of original data on a chain, and constructing a data operation record for the operation of the original data; constructing a data operation record chain index for each data operation record to be chained by constructing or updating an operation record chain; packaging the operation records to be linked and the index data into a block structure, and then serializing the block structure into a newly-built block file; and distributing and storing the newly-built block file to each node in the network.
In some embodiments of the present invention, the original data query step comprises: after receiving an original data query request of a user, extracting the hash value of the data digest to be queried; acquiring the latest block file in the current network and extracting the root node of the digest dictionary tree from it; acquiring the storage address of the data digest to be queried by searching the digest dictionary tree with the extracted hash value; acquiring the data digest from the block according to that storage address; and extracting the original data access address from the data digest and acquiring the original data by accessing that address;
the data operation record query step comprises: extracting the hash value of the data digest to be queried from the data operation record query request; acquiring the latest block file in the current network and extracting the root node of the digest dictionary tree from it; acquiring the head address of the operation record chain of the data to be queried by searching the digest dictionary tree with the extracted hash value; acquiring the storage addresses of all operation records of the data digest by traversing the operation record chain; and acquiring the operation records from the blocks according to those storage addresses.
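The two query flows can be sketched as below, under loud assumptions: `digest_index` stands in for the digest dictionary tree reachable from the root node in the latest block, `block_store` for on-chain storage, and `raw_store` for wherever the original data lives. All addresses, dictionary keys such as `rawDataAddr`, and the function names are hypothetical.

```python
from typing import List, Optional

# Hypothetical stand-ins for the digest dictionary tree and storage layers.
digest_index = {
    "h_F": {"abstractAddr": "blk7/abs0", "opRecordAddr": "blk9/idx0"},
}
block_store = {
    "blk7/abs0": {"rawDataAddr": "store://F"},
    "blk9/idx0": {"preNode": "blk8/idx0", "records": ["blk9/op0", "blk9/op1"]},
    "blk8/idx0": {"preNode": None, "records": ["blk8/op0"]},
    "blk9/op0": {"type": "read"},
    "blk9/op1": {"type": "update"},
    "blk8/op0": {"type": "create"},
}
raw_store = {"store://F": b"original data F"}


def query_raw_data(digest_hash: str) -> Optional[bytes]:
    leaf = digest_index.get(digest_hash)            # digest dictionary tree retrieval
    if leaf is None:
        return None
    abstract = block_store[leaf["abstractAddr"]]    # fetch the data digest
    return raw_store[abstract["rawDataAddr"]]       # follow the access address


def query_operation_records(digest_hash: str) -> List[dict]:
    leaf = digest_index.get(digest_hash)            # digest dictionary tree retrieval
    if leaf is None:
        return []
    found = []
    addr = leaf["opRecordAddr"]                     # head of the operation record chain
    while addr is not None:                         # operation record chain retrieval
        node = block_store[addr]
        found.extend(block_store[a] for a in node["records"])
        addr = node["preNode"]
    return found


print(query_raw_data("h_F"))
print([r["type"] for r in query_operation_records("h_F")])
```

Neither function scans blocks sequentially: each lookup follows stored addresses only.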
In some embodiments of the present invention, constructing a data operation record chain index for each data operation record to be uploaded by constructing or updating the operation record chain includes: classifying the constructed data operation records by the original data they operate on, so that the operation records of the same original data fall into one class; creating an index node for each operation class, and assigning the addresses of all operation records in the class to the data pointer array field of that index node; calculating the hash value of the data digest corresponding to the current operation class, acquiring the current head address of the operation record chain of that data digest by searching the digest dictionary tree, and assigning that head address to the index pointer field of the current index node; and writing the address of the newly created index node into the corresponding node of the digest dictionary tree as the new first node of the data operation record chain, thereby completing the index for the newly added data operation records.
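A minimal sketch of this construction, assuming in-memory dicts: `heads` plays the role of the chain head address kept in the digest dictionary tree, `nodes` maps index node addresses to nodes, and the address format as well as the names `ChainIndexNode` and `index_operation_records` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ChainIndexNode:
    pre_node: Optional[str]  # index pointer: previous head of the chain
    records: List[str]       # data pointer array: record addresses in this block


def index_operation_records(heads: Dict[str, str],
                            nodes: Dict[str, ChainIndexNode],
                            block_id: str,
                            new_records: List[Tuple[str, str]]) -> None:
    """new_records holds (digest_hash, record_address) pairs for one block."""
    # 1. Classify the records by the data they operate on.
    by_data: Dict[str, List[str]] = defaultdict(list)
    for digest, record_addr in new_records:
        by_data[digest].append(record_addr)
    for digest, record_addrs in by_data.items():
        # 2. One index node per class, pointing at all records of the class,
        # 3. with its index pointer set to the current chain head (if any).
        node_addr = f"{block_id}/index/{digest}"  # hypothetical address scheme
        nodes[node_addr] = ChainIndexNode(pre_node=heads.get(digest),
                                          records=record_addrs)
        # 4. The new index node becomes the head stored for this digest.
        heads[digest] = node_addr


heads: Dict[str, str] = {}
nodes: Dict[str, ChainIndexNode] = {}
index_operation_records(heads, nodes, "block_8", [("hashF", "block_8/op_0")])
index_operation_records(heads, nodes, "block_9", [("hashF", "block_9/op_0"),
                                                  ("hashG", "block_9/op_1"),
                                                  ("hashF", "block_9/op_2")])
print(heads["hashF"], nodes[heads["hashF"]])
```

Because each new index node is prepended, the chain head always refers to the most recent block that touched the data.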
In another aspect of the present invention, an index-based on-chain data query apparatus is provided, which includes a processor and a memory, the memory storing computer instructions, the processor being configured to execute the computer instructions stored in the memory, and the apparatus implementing the steps of the method when the computer instructions are executed by the processor.
In a further aspect of the invention, a computer storage medium is also provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth above.
The index-based on-chain data query method and device of the invention address the low query efficiency of traditional blockchains in blockchain-based data sharing and data tracing scenarios. Data digests and operation records are constructed for the original data and the operations on it and are stored on the chain, and indexes over the on-chain data digests and data operation records are built by means of the digest dictionary tree and the data operation record chain, so that the original data and its historical operation records can be queried and traced quickly and efficiently.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the specific details set forth above, and that these and other objects that can be achieved with the present invention will be more clearly understood from the detailed description that follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
fig. 1 is a schematic diagram illustrating a logical structure of a data operation log chain according to an embodiment of the present invention.
Fig. 2 is an organization structure of uplink data in a conventional blockchain.
FIG. 3 is a diagram illustrating the relationship and organization structure of a data summary and a summary dictionary tree according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the relationship and organization structure between the operation record of the original data F in the block chain and the operation record chain according to the embodiment of the present invention.
Fig. 5 is a schematic diagram of a storage structure of blocks in a block chain.
FIG. 6 is a schematic block diagram of an index-based on-chain data access (query) device in an embodiment of the present invention.
FIG. 7 is a schematic block diagram of an on-chain data query service module in an embodiment of the invention.
Fig. 8 is a block diagram of a data uplink service module according to an embodiment of the invention.
FIG. 9 is a schematic block diagram of an index retrieval module according to an embodiment of the present invention.
FIG. 10 is a schematic block diagram of an index building module in an embodiment of the present invention.
FIG. 11 is a block diagram of a block lookup module according to an embodiment of the invention.
FIG. 12 is a block diagram of a block storage module according to an embodiment of the invention.
Fig. 13 is a logical structure example of a dictionary tree.
FIG. 14 is a logical structure example of a digest dictionary tree in an embodiment of the present invention.
FIG. 15 is a flowchart illustrating an index-based on-chain data access method according to an embodiment of the present invention.
FIG. 16 is a diagram illustrating a process for uplink of original data according to an embodiment of the present invention.
FIG. 17 is a flowchart illustrating a query of raw data according to an embodiment of the invention.
FIG. 18 is a diagram illustrating a data operation record structure and an uplink procedure according to an embodiment of the present invention.
Fig. 19 is a flowchart illustrating a data operation record query method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
Traditional blockchains query data inefficiently and struggle to meet the frequent query demands of data sharing and data tracing scenarios. To address this, the invention constructs data digests and operation records for the original data and the operations on it and uploads them to the chain, and provides a design method for on-chain data indexes based on those data digests and operation records: by designing a digest dictionary tree and a data operation record chain, indexes are built over the data digests and data operation records in the blockchain, which speeds up the acquisition of on-chain data digests and operation records and enables fast, efficient query and tracing of the original data and its historical operation records. Building on this index design method, the invention further provides an index-based on-chain data access method and device, and describes the index-based on-chain data access flow in detail, namely the uplink and query flows for data digests and operation records.
The function and design method of the on-chain data index of the embodiment of the present invention are described below. The embodiment designs two different index structures, a digest dictionary tree and a data operation record chain, for two different query requirements: data digests and data operation records. The digest dictionary tree builds a centralized index over all data digests in the blockchain, which speeds up data digest queries; the data operation record chain links the different operation records of the same data across different blocks, which speeds up queries of the operation records of a data digest. The design method is described in detail below.
The method for designing the on-chain data index comprises the following steps of S1 and S2:
In step S1, a digest dictionary tree is constructed.
More specifically, in this step, the digest dictionary tree is constructed by using the hash values of the uplink data digests distributed in different blocks as key values.
A dictionary tree is a special multi-way tree, the content of the keys (keys) of which is usually a character string. In the trie, the position of the node is determined by the content of the key, and the value associated with each key is generally stored in the node corresponding to the last letter of the key. For example, an example of a dictionary tree composed of keys (am, bad, be, so) is shown in fig. 13.
The dictionary tree has the following main advantages: 1) lookup cost depends only on the length of the key being looked up; 2) for a given set of keys, the constructed dictionary tree is identical regardless of insertion order (construction consistency); 3) the dictionary tree supports local updates well: when a key changes, only the nodes on the path from the root node to the node holding that key need to change, and the other nodes in the tree stay the same.
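The construction-consistency property can be checked with a small sketch. It uses nested Python dicts as trie nodes and the example keys (am, bad, be, so); the `"$"` end-of-key marker and the function names are illustrative assumptions, not part of the source.

```python
from itertools import permutations

END = "$"  # assumed sentinel marking the node where a key ends


def insert(trie: dict, key: str, value) -> None:
    """Walk one character per level; store the value at the last character."""
    node = trie
    for ch in key:
        node = node.setdefault(ch, {})
    node[END] = value


def lookup(trie: dict, key: str):
    """Cost is proportional to len(key), independent of how many keys exist."""
    node = trie
    for ch in key:
        if ch not in node:
            return None
        node = node[ch]
    return node.get(END)


def build(keys) -> dict:
    trie: dict = {}
    for k in keys:
        insert(trie, k, k.upper())
    return trie


keys = ["am", "bad", "be", "so"]
tries = [build(order) for order in permutations(keys)]
# Every insertion order yields the same tree (construction consistency).
print(all(t == tries[0] for t in tries))  # True
print(lookup(tries[0], "be"))             # BE
print(lookup(tries[0], "b"))              # None
```

All 24 insertion orders produce equal structures, which is the property the scheme relies on to keep the index state identical across network nodes.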
In the embodiment of the invention, the dictionary tree is constructed by taking the hash values of the uplink data digests distributed in different blocks as key values, the consistency of all data states in each network node is ensured by utilizing the construction consistency of the dictionary tree, and meanwhile, the efficient retrieval of the uplink data digests is realized by virtue of the characteristic that the dictionary tree query efficiency is only related to the key length.
Function of the constructed abstract dictionary tree
In a conventional blockchain network, the blockchain is formed by linking a set of blocks through hash pointers, and the data on the network is stored in the blocks. Thus, to query an on-chain data record, each block and each data record in the block must be traversed sequentially until the record being queried is found. Sequential traversal is a very inefficient way to query data and cannot meet the on-chain data query demands of current blockchain application scenarios.
To address this problem, the embodiment of the invention designs a digest dictionary tree: the hash value of each on-chain data digest is computed, and a dictionary tree is constructed with the hash values of all data digests as keys. This builds a centralized index over all data digests in the blockchain and speeds up data digest queries.
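For contrast, the sequential traversal described above looks roughly like this. The block shape (`{"height": ..., "records": [...]}`) and the function name `sequential_query` are assumptions made for illustration only.

```python
from typing import Callable, List, Optional, Tuple


def sequential_query(chain: List[dict],
                     matches: Callable[[dict], bool]) -> Tuple[Optional[dict], int]:
    """Return the first matching record and the number of records inspected."""
    inspected = 0
    for block in chain:                  # traverse every block in order
        for record in block["records"]:  # scan every record in the block
            inspected += 1
            if matches(record):
                return record, inspected
    return None, inspected


# Four blocks with three records each (hypothetical data).
chain = [
    {"height": h, "records": [{"id": f"r{h}-{i}"} for i in range(3)]}
    for h in range(4)
]
record, inspected = sequential_query(chain, lambda r: r["id"] == "r3-2")
print(record, inspected)  # {'id': 'r3-2'} 12
```

In the worst case every record on the chain is inspected, so the cost grows with the total amount of on-chain data rather than with the length of the key.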
The nodes of the abstract dictionary tree designed by the embodiment of the invention are mainly divided into two types, namely index nodes (non-leaf nodes) and data nodes (leaf nodes). The storage structure of the nodes of the digest dictionary tree will be described below.
(1) The data structure of the index node defines:
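TABLE 1 Data structure definition of digest dictionary tree index nodes
Field | Description |
key | Index key whose value is a common prefix of data digest hash values |
next | Pointer array; each pointer points to the next node of a different path |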
As shown in Table 1, an index node in the digest dictionary tree contains two fields, "key" and "next". The "key" field is the index key; its value is a common prefix of data digest hash values and is used to assist queries. The "next" field is an array of pointers, each pointing to the next node of a different path.
(2) Data node data structure definition:
TABLE 2 Data structure definition of digest dictionary tree data nodes
Field | Description |
key | Index key whose value is the complete hash value of the data digest |
preVersion | Pointer field whose value is the address of the data node corresponding to the previous version of the data digest |
abstractAddr | Value field whose value is the storage address of the data digest corresponding to the hash value |
opRecordAddr | Value field whose value is the storage address of the first node of the data operation record chain corresponding to the data digest |
As shown in Table 2, a data node in the digest dictionary tree contains four fields: "key", "preVersion", "abstractAddr", and "opRecordAddr". "key" is the index key, and its value is the complete hash value of the data digest. "preVersion" is a pointer whose value is the address of the data node that corresponds to the previous version of the current data digest in the digest dictionary tree; it tracks the update history of the data digest. "abstractAddr" records the storage address of the data digest corresponding to the key and is used to fetch the data digest quickly. "opRecordAddr" records the address of the first node of the data operation record chain corresponding to the key and is mainly used to fetch the historical operation records of the data digest quickly.
(3) Example digest dictionary Tree:
The correspondence between the data digests and the digest dictionary tree is illustrated with the data in Table 3 below:
TABLE 3 Data digest information
Data digest address | Hash value | Head node address of the data operation record chain |
abstract_1 | 0x123456789a | node_1 |
abstract_2 | 0x123456789b | node_2 |
abstract_3 | 0x123456788a | node_3 |
abstract_4 | 0x223456789a | node_4 |
Table 3 lists the information of four data digests in the network: the storage address of each data digest (abstract_*), the hash value of the data digest, and the head node address of the corresponding data operation record chain (node_*), where abstract_2 is the updated version of abstract_1. The storage structure of the digest dictionary tree corresponding to the data digests in Table 3 is shown in Fig. 14.
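The rows of Table 3 can be loaded into a small digest dictionary tree as a sketch. Two simplifications are assumed here: the tree branches on one hexadecimal character per level instead of compressing common prefixes, and Python object references stand in for node addresses. The class and function names are hypothetical, and all keys are assumed to have equal length.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class DataNode:
    key: str                                   # complete hash value
    abstract_addr: str                         # abstractAddr
    op_record_addr: str                        # opRecordAddr
    pre_version: Optional["DataNode"] = None   # preVersion


@dataclass
class IndexNode:
    key: str = ""                              # prefix shared by all keys below
    next: Dict[str, Union["IndexNode", DataNode]] = field(default_factory=dict)


def insert(root: IndexNode, leaf: DataNode) -> None:
    node = root
    for i, ch in enumerate(leaf.key[:-1]):
        node = node.next.setdefault(ch, IndexNode(key=leaf.key[:i + 1]))
    node.next[leaf.key[-1]] = leaf


def lookup(root: IndexNode, key: str) -> Optional[DataNode]:
    node = root
    for ch in key:
        if not isinstance(node, IndexNode) or ch not in node.next:
            return None
        node = node.next[ch]
    return node if isinstance(node, DataNode) else None


# Rows of Table 3, with the "0x" prefix of each hash value dropped.
rows = [
    ("abstract_1", "123456789a", "node_1"),
    ("abstract_2", "123456789b", "node_2"),
    ("abstract_3", "123456788a", "node_3"),
    ("abstract_4", "223456789a", "node_4"),
]
root = IndexNode()
for addr, hash_value, head in rows:
    insert(root, DataNode(key=hash_value, abstract_addr=addr, op_record_addr=head))

# abstract_2 is the updated version of abstract_1.
lookup(root, "123456789b").pre_version = lookup(root, "123456789a")

hit = lookup(root, "123456789b")
print(hit.abstract_addr, hit.op_record_addr, hit.pre_version.abstract_addr)
```

A lookup touches at most ten nodes here (one per hexadecimal character), regardless of how many digests are indexed.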
In step S2, a data operation record chain is constructed.
In many blockchain scenarios, users frequently need to query or trace on-chain data in order to track data operation records. In a traditional blockchain, querying any piece of on-chain data is a sequential traversal, which is time-consuming and cannot support frequent queries of the historical operation records of the data. To solve this problem, the embodiments of the present invention design a data operation record chain: all operation records of the same data in different blocks are linked by a chain structure, and the address of the first node of that chain is stored in the leaf node of the digest dictionary tree that corresponds to the digest of the original data. This establishes an association between the data digest and its data operation records and thereby speeds up queries of the historical operation records of the original data. The design (construction) of the data operation record chain is described below.
The data operation record chain is a singly linked list. The data in the linked list is represented by nodes, and each node comprises an element and a pointer: the element is the storage unit that holds the data, and the pointer is the address data that connects the nodes. The main advantage of the chain structure is that it is simple to implement while still ensuring that all the data is connected in series.
In the invention, the element of a data operation record chain node is an operation record array that records all operation records of the same data within one block, which aggregates the operation records of that data inside the block; the pointer is the address of the previous node, which aggregates the operation records of the same data across different blocks. Through this chain structure, the embodiment of the invention logically achieves "centralized" storage of all operation records of the same data on the blockchain and enables efficient retrieval of on-chain data operation records.
The nodes of the data operation record chain designed by the embodiment of the invention are mainly divided into two types, namely index nodes and data nodes. The storage structure is as follows:
(1) index node structure definition
TABLE 4 data operation record chain index node structure
As shown in Table 4, an index node in the data operation record chain contains two fields: preNode and records. The value of the preNode field is the address of the previous index node of the same data, which links the historical operation records of the same data across different blocks. The records field is an array of data pointers holding the physical storage addresses of all operation records of the data in the current block (i.e., the addresses of the operation record data nodes), which aggregates the different operation records of the same data within one block.
(2) Data node structure definition
TABLE 5 data structure of data operation record chain data node
Field | Description |
abstract | Hash value of data abstract for identifying data abstract corresponding to the data operation record |
operator | The data operator represents the identity of the data operator who operates the data summary |
timestamp | A time stamp indicating the time of this operation of the data summary |
opType | The operation type indicates the operation type (new, change, query) of the data operation |
As shown in Table 5, a data node in the data operation record chain contains four fields: abstract, operator, timestamp, and opType. The abstract field records the hash value of the data digest associated with the operation record; the operator field records the identity of the party that performed the operation on the data digest; the opType field records the type of the operation; and the timestamp field records the time of the operation.
(3) Data operation record chain example
The correspondence relationship between the data summary operation record and the data operation record chain is shown in the following table.
TABLE 6 data sharing information Table
Data digest operation record | Block number | Data digest | Operator | Operation type | Date |
data_1 | i | 0x123456789a | A | New | 20210401 |
data_2 | i | 0x123456789a | B | Query | 20210402 |
data_3 | j | 0x123456789a | C | Query | 20210403 |
data_4 | j | 0x123456789a | D | Query | 20210404 |
Table 6 shows all operation records of the data digest (0x123456789a). The data digest has been operated on four times in the whole network: node A uploaded the data digest, and then nodes B, C, and D obtained it by query. The four operation records are distributed across block i and block j, so the data operation record chain corresponding to all operation records of the data digest (0x123456789a) is as shown in fig. 1.
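The example of Table 6 can be sketched as follows, purely for illustration. The sketch assumes a flat dictionary `storage` standing in for physical block addresses; the names `index_i`, `index_j`, and `history` are hypothetical, and the operator of the second record is taken to be node B as stated in the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class OpRecord:
    """Data node of the operation record chain (fields as in Table 5)."""
    abstract: str   # hash value of the data digest
    operator: str   # identity of the operating party
    timestamp: str  # time of the operation
    opType: str     # new, change, or query


@dataclass
class IndexNode:
    """Index node of the operation record chain (fields as in Table 4)."""
    preNode: Optional[str]  # address of the previous index node, None for the oldest
    records: List[str]      # addresses of this data item's records within one block


# Hypothetical address space: block i holds data_1 and data_2, block j holds data_3 and data_4.
storage: Dict[str, Union[OpRecord, IndexNode]] = {
    "data_1": OpRecord("0x123456789a", "A", "20210401", "new"),
    "data_2": OpRecord("0x123456789a", "B", "20210402", "query"),
    "data_3": OpRecord("0x123456789a", "C", "20210403", "query"),
    "data_4": OpRecord("0x123456789a", "D", "20210404", "query"),
    "index_i": IndexNode(None, ["data_1", "data_2"]),
    "index_j": IndexNode("index_i", ["data_3", "data_4"]),
}


def history(head_addr: Optional[str]) -> List[OpRecord]:
    """Follow preNode pointers from the head node and return records oldest first."""
    groups = []
    addr = head_addr
    while addr is not None:
        node = storage[addr]
        groups.append([storage[a] for a in node.records])
        addr = node.preNode
    # Index nodes are visited newest block first, so reverse the block order.
    return [rec for group in reversed(groups) for rec in group]


operators = [rec.operator for rec in history("index_j")]
```

Only two index nodes are visited to collect all four records, whereas a sequential scan would have to inspect every block between block i and block j.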
The structure of uplink data in a conventional blockchain is shown in fig. 2. In existing blockchain systems, data is stored with the block as the basic logical unit. A block can be logically divided into a block header and a data body: the block header mainly stores basic information about the block, such as the hash value of the current block, the hash value of the previous block, and the root node of the Merkle tree, while the data body stores the actual uplink data. However, existing blockchain systems do not strictly prescribe or constrain how uplink data is organized within the data body. The invention therefore constructs index structures for the uplinked data digests and data operation records to enable fast and efficient queries of the original data and the data operation records.
In the embodiment of the present invention, the relationship and organization structure of the data digests and the digest dictionary tree are shown in fig. 3. In terms of the relationship of on-chain data, the original data, the data digests, and the data nodes in the digest dictionary tree are in one-to-one correspondence: for each piece of original data, one corresponding data digest is stored in the blockchain, and one corresponding data node exists in the digest dictionary tree. In terms of the logical storage structure of on-chain data, the blockchain does not store the original data directly; instead, it stores the data digest of the original data and the digest dictionary tree in the data body of the block. The digest dictionary tree is an index tree built from the hash values of the data digests in the blockchain, and the address of its root node is stored in the block header so that the digest dictionary tree can be obtained quickly when an original data query is performed.
Fig. 4 shows the relationship and organization structure of the operation record and the operation record chain of the original data in the blockchain in the embodiment of the invention. From the relationship of data on the chain, firstly, the original data, the data abstract and the operation record chain have a one-to-one correspondence relationship, that is, each original data corresponds to one data abstract and also corresponds to one operation record chain (as shown in the figure, the original data has one operation record chain belonging to the original data); in addition, the operation record chain index node and the data operation record are in a one-to-many relationship, that is, in the embodiment of the present invention, an index node of an operation record chain is constructed for all operation records of the same original data in the block (as shown in fig. 4, the first node of the operation record chain stores the storage addresses of all operation records of the data in the block). In terms of a logical storage structure of linked data, the embodiment of the present invention stores a data operation record and an operation record chain into a data body of a block, and simultaneously stores a first node address of the data operation record chain into a data node of a data summary corresponding to the data operation record chain in a summary dictionary tree, so as to quickly query all historical operation records of corresponding original data through hash of the data summary.
Fig. 5 shows the logical organization and storage of the blocks in the blockchain network, i.e., how the blocks described above at the logical level are stored in the blockchain network. In terms of logical structure, the blocks in the blockchain are arranged in chronological order and are linked in series by recording the hash of the previous block. In terms of storage structure, each block is stored as a file in a LevelDB database; each block corresponds to one block file, and each block file realizes the chain structure by storing the hash address of the previous block file.
The following describes the index-based on-chain data efficient access device and the functions of the modules within the device.
FIG. 6 is a schematic block diagram of an index-based on-chain data access apparatus according to an embodiment of the present invention. As shown in fig. 6, the index-based on-chain data access apparatus provided in the embodiment of the present invention mainly includes the following modules: a data uplink service module, a data query service module, an index construction module, an index retrieval module, a block storage module, a block query module, and an infrastructure. The functions of the modules are as follows:
(1) Data uplink service module: the entry point that provides users with uplink services for original data and data operation records;
(2) Data query service module (also called on-chain data query service module): the entry point that provides users with query services for original data and data operation records;
(3) Index construction module: provides index construction for the original data digests and data operation records to be uplinked;
(4) Index retrieval module: mainly provides retrieval over the two data indexes, namely the digest dictionary tree and the operation record chain;
(5) Block storage module: mainly implements the construction and storage of blocks. The module receives the data to be uplinked and the data indexes, integrates and packages them into a block structure, serializes the block structure into a block file, and finally distributes and stores the block file in the LevelDB database of each node of the network through the infrastructure;
(6) Block query module: mainly implements the acquisition and parsing of blocks. The module first obtains the current latest valid block file through the infrastructure, then deserializes the file into a block structure, and finally parses the block structure to obtain the uplink data and index data;
(7) Infrastructure (basic network facilities): the underlying network communication module, i.e., the underlying distributed network system physically composed of all nodes, which mainly transmits block files among the nodes in the network.
In some embodiments of the present invention, the data query service module may include an original data query component and a data operation record query component. The structure of the data query service module and its interaction logic with external modules are shown in fig. 7. The original data query component provides the original data query service for users. First, the component receives the digest hash value of the original data entered by the user and, through the digest dictionary tree retrieval component in the index retrieval module of the on-chain data access apparatus, obtains the hit node of the data to be queried in the digest dictionary tree. It then extracts the storage address of the data digest from the hit node and uses that address to obtain the data digest from the uplink data in the block. Finally, it extracts the access address of the original data from the data digest and obtains the original data through that access address. The data operation record query component provides the data digest operation record query service for users. First, the component receives a data operation record query request from the user and extracts the relevant data digest hash value from the request. Next, the digest dictionary tree retrieval component of the index retrieval module obtains the hit node for that hash value in the digest dictionary tree, and the address of the head node of the data operation record chain is extracted from the hit node. Then, the storage addresses of all historical operation records of the data digest are obtained through the operation record chain retrieval component in the index retrieval module. Finally, the corresponding data operation records are obtained from the uplink data in the blocks through those storage addresses, which completes the user's query of the on-chain data operation records.
In some embodiments of the present invention, the data uplink service module may include an original data uplink component and a data operation record uplink component. The structure of the data uplink service module and its interaction logic with external modules are shown in fig. 8. The original data uplink component provides the original data uplink service for users. First, the component receives the data uploaded by the user and extracts a data digest; it then passes the data digest to be uplinked to the index construction module and the block storage module for the subsequent construction of indexes and blocks. The data operation record uplink component provides the data digest operation record uplink service for users. Uplinking of a data operation record can be triggered automatically when original data is stored or queried: the component first constructs a data operation record and then passes the data operation record to be uplinked to the index construction module and the block storage module for the subsequent construction of indexes and blocks.
In some embodiments of the invention, the index retrieval module may include a digest dictionary tree retrieval component and an operation record chain retrieval component. The structure of the index retrieval module and its interaction logic with external modules are shown in fig. 9. The digest dictionary tree retrieval component mainly implements retrieval over the digest dictionary tree. When a query of original data or of operation records is executed, the component searches the digest dictionary tree to obtain the hit node of the data digest to be retrieved, from which an upper-layer module can obtain the storage address of the data digest or the address of the head node of the data digest operation record chain. The operation record chain retrieval component mainly implements retrieval over the operation record chain. When a data operation record query is executed, the component retrieves the storage addresses of all historical operation records of the data digest through the operation record chain corresponding to the data digest.
In some embodiments of the invention, the index building module may include: a summary dictionary tree construction component and an operation record chain construction component. Fig. 10 shows the structure of the index building module and the interaction logic between the module and the external module, and the summary dictionary tree building component is mainly used for building or updating the summary dictionary tree during chaining of the original data. During the execution of the original data chaining, the summary dictionary tree construction component collects the newly added data summary constructed by the data chaining service module and calculates the hash value for the newly added data summary, and then constructs a summary dictionary tree for the newly added data summary or inserts the summary dictionary tree into the existing summary dictionary tree, thereby constructing an index for the newly added data summary. The operation record chain constructing component is mainly used for realizing the construction or the updating of the operation record chain when the data operation record is linked up. When data operation record uplink is executed, the operation record chain construction component collects the newly added data operation record constructed by the data uplink service module, and constructs an operation record chain for the newly added data operation record or inserts the operation record chain into the existing operation record chain, thereby constructing an index for the newly added data operation record.
In some embodiments of the present invention, the block query module may include a block acquisition component and a block parsing component. The structure of the block query module and its interaction logic with external modules are shown in fig. 11. The block acquisition component mainly implements the acquisition of block files and the extraction of blocks: it obtains the latest valid block file in the current network through the infrastructure and then deserializes the block file into a block structure, so as to assist the user in performing uplink or query operations. The block parsing component mainly implements the parsing of the block structure: it parses the block structure output by the block acquisition component and separates the uplink data (original data digests or operation records) from the index data (digest dictionary tree index or operation record chain index) in the block structure, so as to assist the user in performing uplink or query operations.
In some embodiments of the present invention, the block storage module may include a block construction component, a local node block storage component, and an other-node block storage component. The structure of the block storage module and its interaction logic with external modules are shown in fig. 12. The block construction component mainly implements block construction and serialization: it first receives the data to be uplinked and the data indexes and packages them into a block structure, and then serializes the block structure into a block file, thereby aggregating the uplink data and the index data. The local node block storage component mainly implements local storage of newly created block files: it stores the block file newly created by the block construction component in the local LevelDB database and then publishes the block file to other nodes through the infrastructure, so that the block files of all nodes remain consistent. The other-node block storage component mainly stores block files published by other nodes: it receives such block files through the infrastructure, checks their validity, and then stores them in the local LevelDB database, so that the block files stored by each node in the network remain consistent.
In some embodiments of the present invention, the infrastructure is the underlying network communication module, i.e., the underlying distributed network system physically composed of the nodes, and mainly provides underlying support for communication among nodes in the blockchain network. The infrastructure module follows the basic network communication architecture of existing blockchain networks and mainly implements the transmission and synchronization of block files among the nodes. This module is not the focus of the present invention and is therefore not described further here.
The index-based data uplink method and the on-chain data query method of the present invention are described below.
Based on the design of the two index structures (the digest dictionary tree and the operation record chain) and the index-based on-chain data query apparatus described above, the index-based on-chain data query method can be realized by constructing and using the two index structures. Fig. 15 is a flowchart of an index-based on-chain data access method according to an embodiment of the present invention. As shown in fig. 15, the method includes the following steps: step S110, an original data uplink step; step S120, an original data query step; step S130, a data operation record generation and uplink step; and step S140, a data operation record query step.
Step S110, performing original data uplink based on the constructed abstract dictionary tree.
In the embodiment of the present invention, original data uplink refers to the following process: the blockchain system collects the original data to be uplinked that users have uploaded (the specific quantity may be configured according to the block file size and the system collection time), extracts a data digest for each piece of original data, constructs or updates the digest dictionary tree index for the extracted data digests, packages the data digests to be uplinked and the constructed digest dictionary tree index into a block structure, serializes the block structure into a block file, and finally stores and distributes the newly created block file to the databases of the nodes of the network through the infrastructure. As shown in fig. 16, the original data uplink method includes the following steps S111-S114:
Step S111, collecting the original data to be uplinked, and constructing a data digest for each piece of original data.
In the embodiment of the present invention, in consideration of the fact that the original data to be uplink are different in format and the storage capacity of a single block stored in a block chain is limited, it is preferable to construct a data summary of the original data to be uplink according to a predetermined field extraction rule, and the specific implementation steps are as follows:
(1.1) collecting a certain amount of original data to be uplinked according to the set block size and the system collection time;
(1.2) extracting the corresponding fields and values from each received piece of original data according to the set original data uplink format to construct an uplink data digest. The original data uplink format mainly comprises fields such as the data hash value, the data provider address, the data upload timestamp, and the original data access address.
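Step (1.2) can be sketched as follows, purely for illustration. The function name `build_digest`, the dictionary key names, the use of SHA-256 as the hash function, and the example address `store://example/1` are all assumptions of this sketch; the description above only names the four kinds of fields.

```python
import hashlib
from typing import Dict, Union


def build_digest(raw: bytes, provider_addr: str, upload_ts: int,
                 access_addr: str) -> Dict[str, Union[str, int]]:
    """Extract the fields of the original data uplink format into a data digest."""
    return {
        "dataHash": hashlib.sha256(raw).hexdigest(),  # data hash value (SHA-256 assumed)
        "providerAddr": provider_addr,                # data provider address
        "uploadTimestamp": upload_ts,                 # data upload timestamp
        "accessAddr": access_addr,                    # where the original data can be fetched
    }


digest = build_digest(b"example payload", "provider-A", 1617235200, "store://example/1")
```

Because only these fixed-size fields go into the block, the size of a data digest does not depend on the size or format of the original data.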
In step S112, a summary dictionary tree index is constructed for each summary of the data to be linked by constructing or updating the summary dictionary tree.
In this step, the summary dictionary tree is constructed or updated, which means that the summary dictionary tree is updated according to the original data to be linked under the condition that the summary dictionary tree already exists, and a new summary dictionary tree is created according to the original data to be linked under the condition that the summary dictionary tree does not exist currently.
In the embodiment of the present invention, after the data to be uplinked has been collected, an index is constructed for the data digests output in step S111 by constructing or updating the digest dictionary tree, in order to accelerate access to the data digests. The specific implementation steps are as follows:
(2.1) calculating the hash value of each data digest and creating a new data node for it, wherein the key of the new data node is the hash value of the data digest;
(2.2) acquiring the current digest dictionary tree and judging whether it is empty; if no digest dictionary tree exists (the current digest dictionary tree is empty), constructing a new digest dictionary tree; if a digest dictionary tree already exists, using it directly;
(2.3) judging whether each data digest is a newly added digest; if not, acquiring the hash value of the previous version of the data digest, searching the current digest dictionary tree to obtain the data node corresponding to that historical version, and assigning the address of that data node to the preVersion field of the current data node; if so, leaving the preVersion field empty;
(2.4) inserting each newly created data node into the current digest dictionary tree, thereby completing the construction of the digest dictionary tree index by updating the digest dictionary tree.
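Steps (2.1) to (2.4) can be sketched as follows, under stated assumptions: the dictionary tree is modelled with nested Python dictionaries, the digest hash is SHA-256 over a canonical JSON encoding, and preVersion holds an object reference instead of a storage address. The function names are hypothetical.

```python
import hashlib
import json
from typing import Optional, Tuple


def digest_hash(digest: dict) -> str:
    """Step 2.1: hash of a data digest (SHA-256 over sorted-key JSON is an assumption)."""
    return hashlib.sha256(json.dumps(digest, sort_keys=True).encode("utf-8")).hexdigest()


def new_trie_node() -> dict:
    return {"children": {}, "data": None}


def find(root: dict, key: str) -> Optional[dict]:
    node = root
    for ch in key:
        node = node["children"].get(ch)
        if node is None:
            return None
    return node["data"]


def index_digest(root: Optional[dict], digest: dict, abstract_addr: str,
                 prev_hash: Optional[str] = None) -> Tuple[dict, dict]:
    # Step 2.2: build a new tree only when none exists yet.
    if root is None:
        root = new_trie_node()
    # Step 2.1: the key of the new data node is the digest hash.
    data_node = {"key": digest_hash(digest), "preVersion": None,
                 "abstractAddr": abstract_addr, "opRecordAddr": None}
    # Step 2.3: an updated digest points back at the data node of its previous version.
    if prev_hash is not None:
        data_node["preVersion"] = find(root, prev_hash)
    # Step 2.4: insert the data node along the path spelled by its key.
    node = root
    for ch in data_node["key"]:
        node = node["children"].setdefault(ch, new_trie_node())
    node["data"] = data_node
    return root, data_node


root, v1 = index_digest(None, {"dataHash": "aa"}, "abstract_1")
root, v2 = index_digest(root, {"dataHash": "bb"}, "abstract_2", prev_hash=v1["key"])
```

Here `v2` plays the role of an updated digest, so following its preVersion reference leads back to `v1`, while `v1` itself has an empty preVersion field.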
Step S113, pack the summary of data to be linked and the index data into a block structure, and then serialize the block structure into a file.
Considering that the original block chain is stored in a block file, the embodiment of the present invention stores uplink data and indexes into a block structure, and then serializes the block structure into a file, and the specific implementation steps are as follows:
(3.1) integrating and packaging the data digests to be uplinked output in step S111 and the digest dictionary tree index constructed in step S112 into a block structure;
(3.2) adding the necessary verification information to the packaged block structure, such as the hash value of the previous block, the hash value of the current block, and the storage location of the root node of the digest dictionary tree (its offset within the block);
(3.3) serializing the block structure, which comprises the uplink data digests, the digest dictionary tree index, and the verification information, into a file whose name uses "block" as its suffix.
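Steps (3.1) to (3.3) can be sketched as follows. JSON as the serialization format, SHA-256 as the block hash, and all key and function names are assumptions made for illustration; the `trieRoot` entry is a stand-in for the in-block offset of the root node.

```python
import hashlib
import json


def block_hash(header: dict, body: dict) -> str:
    """Hash over the header (without its own hash field) and the data body."""
    unsigned = {k: v for k, v in header.items() if k != "hash"}
    payload = json.dumps({"header": unsigned, "body": body}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_block(prev_hash: str, digests: list, trie_index: dict) -> dict:
    body = {"digests": digests, "index": trie_index}       # step 3.1: package data and index
    header = {"prevHash": prev_hash, "trieRoot": "index"}  # step 3.2: verification information
    header["hash"] = block_hash(header, body)
    return {"header": header, "body": body}


def serialize(block: dict) -> bytes:
    return json.dumps(block, sort_keys=True).encode("utf-8")  # step 3.3: bytes for the block file


def deserialize(data: bytes) -> dict:
    return json.loads(data.decode("utf-8"))


def is_valid(block: dict) -> bool:
    """Recompute the block hash, as a receiving node might do before storing the file."""
    return block["header"]["hash"] == block_hash(block["header"], block["body"])


block = build_block("00" * 32, [{"dataHash": "ab"}], {"children": {}, "data": None})
restored = deserialize(serialize(block))
```

The bytes returned by `serialize` would be written to the block file; any later change to the data body makes `is_valid` return False because the stored hash no longer matches.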
Step S114, distributing and storing the newly created block file to each node in the network through the infrastructure.
In the blockchain network, data is stored and backed up in each network node in the form of blockfiles, so that the data can achieve consensus in the whole network. Similarly, the step of block file distribution and storage in the uplink data in the embodiment of the present invention may specifically include: and distributing the block file generated in the step S113 to other nodes in the network through the basic block chain network, and storing the block file in the LevelDB database after the other nodes receive the block file and verify the validity.
Step S120, performing original data query based on the digest dictionary tree.
In the embodiment of the present invention, original data query refers to the following process: after receiving an original data query request from a user, the system extracts the digest hash value of the data to be queried, acquires the latest block file from the blockchain network and deserializes it into a block structure, searches the digest dictionary tree with the extracted digest hash value to obtain the storage address of the data digest, obtains the data digest directly from that storage address, and finally obtains the original data by parsing the data digest and accessing the original data access address it contains. The original data query process includes the following steps:
Step S121, based on the original data query request, obtaining the latest block file in the current network and extracting the root node of the digest dictionary tree from the latest block file.
More specifically, in this step, after receiving an original data query request from a user, extracting a hash value of a summary of data to be queried; and acquiring the latest block file in the current network and extracting the root node of the abstract dictionary tree from the latest block file.
In the invention, the data and the data index on the chain are stored in the form of blocks, and the blocks are stored in each node of the block chain network in the form of files, so that the latest block file is acquired from the block chain network and the latest index information is acquired from the latest block file when the query operation is executed, and the specific implementation steps are as follows:
(1.1) judging whether the block files stored on the local node include the latest block file, and if not, requesting the latest block file through the infrastructure from a node that holds it;
(1.2) deserializing the latest block file into a block structure by the block parsing component, and then acquiring the root node of the digest dictionary tree from the block structure.
Step S122, searching the digest dictionary tree based on the extracted data digest hash value to obtain the storage address of the data digest to be queried.
In the invention, the digest dictionary tree is a tree-shaped index structure that records the storage locations of the data digests dispersed across the blocks. The storage location of a data digest can be found quickly by searching the digest dictionary tree. The specific implementation steps are as follows:
(2.1) starting from the root node of the digest dictionary tree, judging whether the current node is the data node corresponding to the hash of the data to be queried, and if not, acquiring the next node according to the node index pointer;
(2.2) repeating the above step until the corresponding data node is found; if no such node exists, the data digest has not been uplinked and the process ends; otherwise, continuing;
(2.3) parsing the data node found in the above steps and acquiring the storage address of the data digest to be queried.
Step S123, obtain the data summary to be checked from the block according to the data summary storage address.
According to the data digest storage address output in step S122, the data digest to be checked can be directly obtained from the corresponding block structure and returned.
Step S124, extracting the original data access address in the data summary, and obtaining the original data by accessing the address.
The data digest acquired in step S123 is parsed, the access address of the original data is extracted from it, and the original data is then obtained by accessing that address.
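Steps S122 to S124 can be sketched end to end as follows. The two dictionaries `on_chain` and `off_chain` are hypothetical stand-ins for block storage and for wherever the original data is actually kept, the dictionary tree is modelled with nested dictionaries, and all names and the example address are assumptions of this sketch.

```python
from typing import Dict, List, Optional


def make_trie(data_nodes: List[dict]) -> dict:
    """Helper for the example: build a character-wise dictionary tree from data nodes."""
    root = {"children": {}, "data": None}
    for dn in data_nodes:
        node = root
        for ch in dn["key"]:
            node = node["children"].setdefault(ch, {"children": {}, "data": None})
        node["data"] = dn
    return root


def lookup(root: dict, digest_hash: str) -> Optional[dict]:
    """Step S122: walk from the root, following one index pointer per character."""
    node = root
    for ch in digest_hash:
        node = node["children"].get(ch)
        if node is None:
            return None  # the data digest has not been uplinked
    return node["data"]


def query_original(root: dict, on_chain: Dict[str, dict],
                   off_chain: Dict[str, bytes], digest_hash: str) -> Optional[bytes]:
    data_node = lookup(root, digest_hash)
    if data_node is None:
        return None
    digest = on_chain[data_node["abstractAddr"]]  # step S123: fetch the digest by address
    return off_chain[digest["accessAddr"]]        # step S124: follow the access address


root = make_trie([{"key": "0x123456789a", "abstractAddr": "abstract_1"}])
on_chain = {"abstract_1": {"dataHash": "0x123456789a", "accessAddr": "store://example/1"}}
off_chain = {"store://example/1": b"original data"}
result = query_original(root, on_chain, off_chain, "0x123456789a")
```

The query touches the index once and then performs two direct address lookups, so no block has to be scanned sequentially.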
Step S130, a data operation record generation and uplink step.
In the embodiment of the invention, data operation records are constructed automatically by the system when operations such as uplink or query of original data are executed. The generation and uplink of data operation records involves the following: the system automatically constructs a data operation record for each original data operation, constructs an index for the operation records (by constructing or updating the data operation record chain), packages the operation records and the index data into a block structure, serializes the block structure into a block file, and stores and distributes the newly created block file to the databases of the nodes of the network through the infrastructure. The data operation record generation and uplink step specifically comprises the following steps:
Step S131, monitoring the operation behavior on the original data on the chain, and constructing a data operation record for each operation on the original data.
In the invention, the data operation record uplink component monitors the operation behavior of original data through a monitoring mechanism, and when an original data uplink or inquiry operation event exists, the data operation record uplink component automatically collects the event and constructs a data operation record for the event according to a preset format, and the specific implementation steps are as follows:
(1.1) collecting a fixed number of original data operation events within a fixed time period according to the set block size and the set monitoring time period;
(1.2) constructing a data operation record for each original data operation event collected in the above step according to the set data operation record uplink format. The data operation record uplink format mainly comprises fields such as the hash value of the operated original data, the identity of the data operating party, and the data operation time.
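Steps (1.1) and (1.2) can be sketched as follows. The event dictionaries, their key names, and the function `build_records` are hypothetical; the sketch caps a batch by record count only and leaves out the monitoring time period for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, List

ALLOWED_TYPES = {"new", "change", "query"}


@dataclass(frozen=True)
class OpRecord:
    abstract: str   # hash value of the operated data digest
    operator: str   # identity of the data operating party
    timestamp: int  # data operation time
    opType: str     # new, change, or query


def build_records(events: Iterable[dict], max_records: int) -> List[OpRecord]:
    """Turn monitored events into operation records, at most max_records per block."""
    records: List[OpRecord] = []
    for ev in events:
        if len(records) >= max_records:
            break  # step 1.1: the batch is bounded by the configured block size
        if ev["opType"] not in ALLOWED_TYPES:
            raise ValueError(f"unknown operation type: {ev['opType']}")
        # Step 1.2: map each event onto the operation record uplink format.
        records.append(OpRecord(ev["digestHash"], ev["operator"], ev["time"], ev["opType"]))
    return records


events = [
    {"digestHash": "0x123456789a", "operator": "A", "time": 20210401, "opType": "new"},
    {"digestHash": "0x123456789a", "operator": "B", "time": 20210402, "opType": "query"},
    {"digestHash": "0x123456789a", "operator": "C", "time": 20210403, "opType": "query"},
]
batch = build_records(events, max_records=2)
```

Events beyond the cap are simply left for the next batch in this sketch; how leftover events are carried over is not specified in the description.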
Step S132, construct or update an operation record chain, and construct a data operation record chain index for each data operation record to be linked.
In the present invention, after the data operation events have been collected and the data operation records constructed, a data operation record chain index needs to be constructed or updated for the data operation records output in step S131, by constructing or updating an operation record chain, in order to speed up access to the historical operation records of the original data. The specific implementation steps are as follows:
(2.1) classifying the data operation records output in the step S131 according to the operated original data, that is, all operation records of the same original data are classified into one class and recorded as one operation class, and performing the following operations on each original data operation class;
(2.2) creating an index node for each original data operation class, and assigning the addresses (intra-block offsets) of all operation records in the operation class to the records field of the index node;
(2.3) calculating the hash value of the data abstract corresponding to the data operation class, acquiring the head address of the operation record chain of that data abstract by retrieving the abstract dictionary tree, and assigning the head address to the preNode field of the current index node;
(2.4) updating the address of the newly built index node into the corresponding node of the abstract dictionary tree as the new first node of the data operation record chain, thereby completing the index construction for the newly added data operation records.
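Steps (2.1) to (2.4) can be sketched as follows. This is a simplified illustration under stated assumptions: a plain dictionary `chain_heads` stands in for the data nodes of the abstract dictionary tree, addresses are hypothetical strings of the form `"<block>:rec:<offset>"` and `"<block>:idx:<n>"`, and the data abstract hash of each record is assumed to be already computed. Only the field names `records` and `preNode` (written `pre_node` here) come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ChainIndexNode:
    """Index node of the operation record chain."""
    records: List[str]        # addresses of the operation records in this block
    pre_node: Optional[str]   # address of the previous chain head, None if none


def update_record_chains(block_height: int,
                         new_records: List[Tuple[str, int]],
                         chain_heads: Dict[str, str],
                         index_store: Dict[str, ChainIndexNode]) -> None:
    """Build one index node per operated data abstract and make it the chain head.

    new_records: (data abstract hash, intra-block offset) pairs for this block.
    chain_heads: stand-in for the abstract dictionary tree (hash -> head address).
    index_store: stand-in for on-chain storage of index nodes (address -> node).
    """
    # (2.1) group the records of the same original data into one operation class
    classes: Dict[str, List[str]] = {}
    for abstract_hash, offset in new_records:
        classes.setdefault(abstract_hash, []).append(f"{block_height}:rec:{offset}")

    for i, (abstract_hash, record_addrs) in enumerate(classes.items()):
        # (2.2) the new index node holds all record addresses of the class;
        # (2.3) its pre_node points at the current chain head, if any
        node = ChainIndexNode(records=record_addrs,
                              pre_node=chain_heads.get(abstract_hash))
        node_addr = f"{block_height}:idx:{i}"
        index_store[node_addr] = node
        # (2.4) the new node becomes the first node of the chain
        chain_heads[abstract_hash] = node_addr
```

Because each new index node is inserted at the head, the chain lists the most recent block first, and appending never requires rewriting index nodes stored in earlier blocks.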
Step S133, packing the operation records to be uplinked and the index data into a block structure, and then serializing the block structure into a newly created block file.
This step is very similar to the block construction and serialization step (step S113) in the original data chaining step, and is therefore not described again here.
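As a rough illustration of packing and serialization, the sketch below places the operation records and index nodes in a block body, stores a hash of the body in the block header, and serializes the whole block to bytes. The JSON layout, the header field names, and the function names are all hypothetical; the actual block file format of step S113 is not given in this section.

```python
import hashlib
import json
from typing import Any, Dict, List


def _canonical(obj: Any) -> bytes:
    """Deterministic JSON encoding so that the same body always hashes the same."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def serialize_block(height: int, prev_hash: str,
                    records: List[Dict[str, Any]],
                    index_nodes: Dict[str, Any]) -> bytes:
    """Pack records and index data into a block structure and serialize it."""
    body = {"records": records, "index": index_nodes}
    header = {
        "height": height,
        "prev_hash": prev_hash,
        "body_hash": hashlib.sha256(_canonical(body)).hexdigest(),
    }
    return _canonical({"header": header, "body": body})


def deserialize_block(raw: bytes) -> Dict[str, Any]:
    """Restore the block structure and check the body against the header hash."""
    block = json.loads(raw.decode("utf-8"))
    expected = block["header"]["body_hash"]
    if hashlib.sha256(_canonical(block["body"])).hexdigest() != expected:
        raise ValueError("block body does not match header hash")
    return block
```

The same pair of functions also covers the deserialization mentioned in the query steps, where the latest block file is read back into a block structure.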
Step S134, distributing and storing the newly built block file to each node in the network through the infrastructure.
This step is very similar to the block distribution and storage step (step S114) in the original data uplink step, and is therefore not described again here.
Step S140, a data operation record query step.
In the embodiment of the present invention, the data operation record query is a process that: extracts the hash value of the data abstract to be queried from a received data operation record query request of a user; obtains the latest block file from the block chain network and deserializes it into a block structure; searches the abstract dictionary tree with the extracted hash value to obtain the first node address of the operation record chain; searches the operation record chain to obtain the storage addresses of all operation records of the data abstract; and finally obtains all historical operation records of the data abstract from the block chain network according to the obtained storage addresses. Step S140 specifically comprises the following steps:
Step S141, extracting the hash value of the data abstract to be queried from the data operation record query request, obtaining the latest block file in the current network, and extracting the root node of the abstract dictionary tree from the latest block file.
This step is identical to step S121 of the original data query step, in which the latest block file is obtained and the root node of the abstract dictionary tree is extracted, and is therefore not described again.
Step S142, retrieving the abstract dictionary tree based on the extracted data abstract hash value to obtain the head address of the operation record chain to be queried.
In the invention, the data node of the abstract dictionary tree records the storage address of the data abstract and the address of the first node of the operation record chain corresponding to the data abstract, so that the storage position of the first node of the operation record chain can be obtained by searching the abstract dictionary tree, and the specific implementation steps are as follows:
(2.1) starting from the root node of the abstract dictionary tree, judging whether the current node is the data node corresponding to the hash value of the data abstract to be queried, and if not, acquiring the next node according to the node index pointer;
(2.2) repeating step (2.1) until the corresponding data node is found; if no such node exists, the data abstract has not been uplinked and the process ends; otherwise, continuing to execute;
(2.3) parsing the data node retrieved in step (2.2), and acquiring the storage position of the head node of the operation record chain of the data abstract to be queried.
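The retrieval in steps (2.1) to (2.3) can be sketched with a simplified trie. The sketch assumes one character of the hash per tree level, fixed-length hash keys, and in-memory object references in place of on-chain address pointers; the class and function names are hypothetical. The node contents follow the description: index nodes hold a key prefix and pointers to next nodes, while data nodes hold the full hash, the data abstract storage address, and the operation record chain head address.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class DataNode:
    key: str                        # full hash value of the data abstract
    abstract_addr: str              # storage address of the data abstract
    chain_head_addr: Optional[str]  # first node of its operation record chain
    prev_version: Optional["DataNode"] = None  # data node of the previous version


@dataclass
class IndexNode:
    key: str                        # common prefix shared by the keys below
    children: Dict[str, Union["IndexNode", DataNode]] = field(default_factory=dict)


def insert(root: IndexNode, node: DataNode) -> None:
    """Insert a data node; assumes all keys have the same length."""
    cur = root
    last = len(node.key) - 1
    for depth, ch in enumerate(node.key):
        if depth == last:
            cur.children[ch] = node
            return
        nxt = cur.children.get(ch)
        if not isinstance(nxt, IndexNode):
            nxt = IndexNode(key=node.key[:depth + 1])
            cur.children[ch] = nxt
        cur = nxt


def find_data_node(root: IndexNode, abstract_hash: str) -> Optional[DataNode]:
    """Steps (2.1)-(2.2): walk from the root until the data node is found."""
    cur: Union[IndexNode, DataNode, None] = root
    depth = 0
    while cur is not None:
        if isinstance(cur, DataNode):
            return cur if cur.key == abstract_hash else None
        if depth >= len(abstract_hash):
            return None
        cur = cur.children.get(abstract_hash[depth])
        depth += 1
    return None  # no such node: the data abstract has not been uplinked
```

Step (2.3) then amounts to reading `chain_head_addr` from the returned data node. The lookup cost depends on the length of the hash rather than on the number of blocks, which is the reason for indexing abstracts across blocks in one tree.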
Step S143, acquiring the storage addresses of all operation records of the data abstract to be queried by retrieving the operation record chain.
In the embodiment of the invention, the operation record chain is a chain structure that links the data operation records dispersed across blocks by address pointers, so the storage positions of all operation records of a data abstract can be found quickly by retrieving the operation record chain. The specific implementation steps are as follows:
(3.1) starting from the head node of the operation record chain, traversing the index nodes in the order of the address pointers, and executing the following step on each node;
(3.2) parsing the records field of each index node, extracting the data operation record storage addresses from it, and adding them to the operation record result set;
(3.3) returning the complete set of data operation record storage addresses.
Step S144, acquiring the operation records from the blocks according to the storage addresses.
According to the set of data operation record storage addresses output in step S143, all the historical operation records of the data abstract to be queried can be obtained directly from the corresponding block structures and returned.
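Steps S143 and S144 can be sketched as a linked-list walk followed by direct lookups. Two dictionaries stand in for on-chain storage of index nodes and operation records, and the string addresses are hypothetical; the cycle check is a defensive addition that the text does not mention.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChainIndexNode:
    records: List[str]        # storage addresses of operation records
    pre_node: Optional[str]   # address of the next (older) index node


def collect_record_addresses(head_addr: Optional[str],
                             index_store: Dict[str, ChainIndexNode]) -> List[str]:
    """Step S143: follow pre_node pointers from the chain head and gather addresses."""
    addresses: List[str] = []
    visited = set()
    cur = head_addr
    while cur is not None:
        if cur in visited:
            raise ValueError("cycle detected in operation record chain")
        visited.add(cur)
        node = index_store[cur]
        addresses.extend(node.records)   # (3.2) add to the result set
        cur = node.pre_node              # (3.1) move to the next index node
    return addresses                     # (3.3) complete address set


def fetch_records(addresses: List[str], record_store: Dict[str, Any]) -> List[Any]:
    """Step S144: read each operation record at its storage address."""
    return [record_store[addr] for addr in addresses]
```

The number of index nodes visited equals the number of blocks that contain operations on the queried data, so blocks without relevant records are never scanned.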
The above steps of the embodiment of the present invention are not limited to the illustrated order; for example, the order of step S120 and step S130 may be interchanged.
In short, by designing the abstract dictionary tree, the invention optimizes how data abstracts are organized on the chain, builds a centralized index over data abstracts distributed across blocks, and achieves efficient query of on-chain data abstracts. By designing the data operation record chain, it optimizes how data operation records are organized on the chain and indexes them through the chain structure, achieving efficient query of on-chain data operation records. By storing the first node address of the data operation record chain in the abstract dictionary tree, the invention further establishes the association from a data abstract to its data operation records, which improves the query efficiency of the data operation records.
In summary, the index-based on-chain data access method according to the embodiment of the present invention has the following beneficial effects:
(1) aiming at the problem that the traditional block chain data query efficiency is low and the query requirement of the uplink data abstract cannot be met, the embodiment of the invention constructs indexes for all uplink data abstracts in the block chain by designing the abstract dictionary tree and stores the index root node into the block head of the latest block so as to realize the efficient query of the data abstract.
(2) Aiming at the problem that the traditional block chain data query efficiency is low and the query requirement of the uplink data operation record cannot be met, the embodiment of the invention designs the data operation record chain, links all the operation records of the same data in a chain structure, stores the first node address of the operation records into the node corresponding to the abstract dictionary tree (the specific node refers to the node of the data abstract associated with the operation record in the abstract dictionary tree), and establishes the association relationship from the data abstract to the data operation record so as to realize the efficient query of the data operation record.
(3) The embodiment of the invention provides an index-based method and device for efficient query of on-chain data, provides an approach and scheme for optimizing data query and retrieval, and on that basis provides an overall design framework and procedure for querying on-chain data abstracts and data operation records, yielding a block chain system that better fits practical block chain application scenarios.
For the query of original data, indexes are constructed for all data abstracts in the block chain by designing an abstract dictionary tree, which accelerates the acquisition of on-chain data abstracts and enables fast and efficient query of the original data. For the query of historical data operation records, the invention designs a data operation record chain that links all operation records of the same data in a chain structure, stores the address of the first node of the chain in the corresponding node of the abstract dictionary tree (namely, the node of the data abstract associated with the operation records), and establishes the association from the data abstract to the data operation records, thereby improving the query efficiency of the data operation records.
The index-based on-chain data efficient query device comprises a processor and a memory, wherein computer instructions are stored in the memory, the processor is used for executing the computer instructions stored in the memory, and when the computer instructions are executed by the processor, the device implements the steps of the index-based on-chain data query method.
Embodiments of the present invention further provide a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the index-based on-chain data efficient query method. The computer-readable storage medium may be a tangible storage medium such as an optical disk, a USB flash drive, a floppy disk, or a hard disk.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein may be implemented as hardware, software, or combinations of both. Whether this is done in hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. An index-based on-chain data query method is characterized by comprising the following steps:
an original data chaining step, which is used for extracting a data abstract for each original data to be chained, constructing or updating an abstract dictionary tree index for the extracted data abstract, packaging the data abstract to be chained and the constructed abstract dictionary tree index into a block structure and performing chaining processing, wherein the abstract dictionary tree is constructed by taking the hash value of the chained data abstract distributed in different blocks as a key value;
a data operation record generating and chaining step, configured to construct a data operation record for an original data operation, construct or update a data operation record chain index for the constructed data operation record, package the data operation record and the data operation record chain index into a block structure, and perform chaining processing, where the data operation record chain is designed to link all operation records of the same data in a chain structure, and a first node address of the data operation record chain is stored in a corresponding node in the abstract dictionary tree;
an original data query step, which is used for extracting the hash value of the data abstract to be queried from a received original data query request of a user, acquiring the root node of the latest abstract dictionary tree from the latest block of the block chain network, acquiring the abstract storage address by executing abstract dictionary tree retrieval, acquiring the data abstract according to the acquired storage address, and acquiring the original data by parsing the access address of the original data in the data abstract and accessing that address; and
a data operation record query step, which is used for, after receiving a data operation record query request of a user, extracting the hash value of the associated data abstract from the request, acquiring the root node of the latest abstract dictionary tree from the latest block of the block chain network, acquiring the head node address of the operation record chain by executing abstract dictionary tree retrieval based on the extracted hash value of the data abstract, executing operation record chain retrieval based on the acquired head node address to acquire the storage addresses of all operation records of the data abstract, and acquiring the historical operation records of the data abstract from the block chain network according to the acquired storage addresses.
2. The method of claim 1,
the nodes of the abstract dictionary tree comprise index nodes and data nodes; an index node stores an index key and a pointer array, and each pointer in the pointer array points to the next node on a different path; a data node stores an index key, a pointer to the address of the data node in the abstract dictionary tree that corresponds to the previous version of the current data abstract, the storage address of the data abstract corresponding to the index key of the current node, and the address of the first node of the operation record chain of the data abstract corresponding to the index key of the current node;
the data operation record chain is of a singly linked list structure, the elements of the data operation record chain are operation record arrays, the data operation record chain comprises index nodes and data nodes, and an index pointer and a data pointer array are stored in each index node of the data operation record chain; each data node of the data operation record chain comprises at least part of the following field information: a data abstract hash value, a data operator, a timestamp, and an operation type.
3. The method of claim 2, wherein the value of the index key in the index node is a common prefix of the hash values of the data abstracts, and the value of the index key in the data node is the full hash value of the data abstract.
4. The method of claim 2 or 3, wherein the original data chaining step comprises:
collecting original data to be chained, and constructing a data abstract for each original data;
constructing an abstract dictionary tree index for each data abstract to be chained by constructing or updating the abstract dictionary tree;
packaging the data abstracts to be chained and the abstract dictionary tree index data into a block structure, and serializing the block structure into a newly built block file; and
distributing and storing the newly built block file to each node in the network.
5. The method of claim 4, wherein constructing an abstract dictionary tree index for each data abstract to be chained by constructing or updating the abstract dictionary tree comprises:
calculating the hash value of each data abstract, and creating a data node for the data abstract, wherein the value of an index key of the data node is the hash value of the current data abstract;
acquiring a current abstract dictionary tree, and constructing the abstract dictionary tree under the condition that the current abstract dictionary tree does not exist;
determining whether each data abstract is a newly added abstract; if not, acquiring the hash value of the previous version of the data abstract, searching the current abstract dictionary tree to acquire the data node corresponding to that historical version, and assigning the address of that data node to the pointer of the current data node; if it is a newly added abstract, setting the pointer of the current data node to null; and inserting each newly created data node into the current abstract dictionary tree.
6. The method of claim 1, wherein the data operation record generating and chaining step comprises:
monitoring the operation behavior of original data on the chain, and constructing a data operation record for each original data operation;
constructing a data operation record chain index for each data operation record to be chained by constructing or updating an operation record chain;
packaging the operation records to be chained and the index data into a block structure, and then serializing the block structure into a newly built block file; and
distributing and storing the newly built block file to each node in the network.
7. The method of claim 2,
the original data query step comprises: after receiving an original data query request of a user, extracting the hash value of the data abstract to be queried;
acquiring the latest block file in the current network and extracting the root node of the abstract dictionary tree from the latest block file;
retrieving the abstract dictionary tree based on the extracted hash value to obtain the storage address of the data abstract to be queried;
acquiring the data abstract to be queried from the block according to the data abstract storage address; and
extracting the original data access address from the data abstract, and acquiring the original data by accessing the address;
the data operation record query step comprises:
extracting, from the data operation record query request, the hash value of the data abstract to be queried;
acquiring the latest block file in the current network and extracting the root node of the abstract dictionary tree from the latest block file;
retrieving the abstract dictionary tree based on the extracted data abstract hash value to obtain the head address of the operation record chain to be queried;
acquiring the storage addresses of all operation records of the data abstract to be queried by retrieving the operation record chain; and
acquiring the operation records from the blocks according to the storage addresses.
8. The method of claim 7, wherein constructing a data operation record chain index for each data operation record to be chained by constructing or updating an operation record chain comprises:
classifying the constructed data operation records according to the operated original data, so that the operation records of the same original data are classified into one class;
creating an index node for each original data operation class, and assigning all operation record addresses in the operation class to the data pointer array field of the index node;
calculating the hash value of the data abstract corresponding to the current data operation class, acquiring the head address of the operation record chain of the data abstract by retrieving the abstract dictionary tree, and assigning the head address to the index pointer field of the current index node; and
updating the address of the newly built index node into the corresponding data node of the abstract dictionary tree as the new first node of the data operation record chain, thereby completing the index construction for the newly added data operation records.
9. An index-based on-chain data query device comprising a processor and a memory, wherein the memory has computer instructions stored therein, the processor is used for executing the computer instructions stored in the memory, and when the computer instructions are executed by the processor, the device implements the steps of the method according to any one of claims 1 to 8.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111027751.9A CN113901131B (en) | 2021-09-02 | 2021-09-02 | Index-based on-chain data query method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113901131A (en) | 2022-01-07 |
CN113901131B (en) | 2024-06-07 |
Family
ID=79188488
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111027751.9A (Active, CN113901131B) | Index-based on-chain data query method and device | 2021-09-02 | 2021-09-02 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113901131B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019117651A1 (en) * | 2017-12-13 | 2019-06-20 | 서강대학교 산학협력단 | Search method using data structure for supporting multiple search in blockchain-based iot environment, and device according to method |
CN109165224A (en) * | 2018-08-24 | 2019-01-08 | 东北大学 | A kind of indexing means being directed to keyword key on block chain database |
CN111488614A (en) * | 2020-04-08 | 2020-08-04 | 北京瑞策科技有限公司 | Digital identity storage method and device based on service data block chain |
CN112800065A (en) * | 2021-02-09 | 2021-05-14 | 北京工业大学 | Efficient data retrieval method based on improved block storage structure |
CN113127562A (en) * | 2021-03-30 | 2021-07-16 | 河南九域腾龙信息工程有限公司 | Low-redundancy block chain data storage and retrieval method and system |
CN113139100A (en) * | 2021-04-27 | 2021-07-20 | 中国科学院计算技术研究所 | Network flow real-time indexing method and system |
Non-Patent Citations (1)
Title |
---|
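Chen Cai; Yang Fangchun; Su Sen; Shuang Kai: "Topology optimization and adjustment of peer-to-peer networks considering the autonomous characteristics of nodes", Journal of Beijing University of Posts and Telecommunications, no. 01, 15 February 2011 (2011-02-15) * |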
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114385996A (en) * | 2022-01-10 | 2022-04-22 | 北京新华夏信息技术有限公司 | Block chain consensus method and system based on node identity hierarchical management |
CN116521698A (en) * | 2023-05-09 | 2023-08-01 | 重庆数字城市科技有限公司 | Data uplink method and system based on abstract information |
CN116521698B (en) * | 2023-05-09 | 2024-09-13 | 重庆数字城市科技有限公司 | Data uplink method and system based on abstract information |
CN117997510A (en) * | 2024-02-01 | 2024-05-07 | 交通运输部公路科学研究所 | Block chain-based integrated travel data sharing method and system |
CN118332049A (en) * | 2024-06-12 | 2024-07-12 | 中科山水(北京)科技信息有限公司 | Ecological resource data synchronization method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN113901131B (en) | 2024-06-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |