CN113704248A - Block chain query optimization method based on external index

Info

Publication number: CN113704248A
Application number: CN202110784835.0A
Authority: CN
Inventors: 郭少勇; 阮琳娜; 亓峰; 马圳江; 王科特
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2021-07-12
Filing date: 2021-07-12
Publication date: 2021-11-26
Anticipated expiration: 2041-07-12
Also published as: CN113704248B

Abstract

The invention provides a block chain query optimization method based on external indexes, which comprises the following steps: creating a hierarchical index based on the specified transaction attributes; the hierarchical index comprises a first-layer index and a second-layer index, the first-layer index is a bitmap index consisting of block numbers and index items corresponding to each block number, the index items are represented by bitmaps, 1 and 0 in the bitmap respectively represent transactions which contain and do not contain corresponding attribute value ranges in the blocks, the second-layer index is a tree-type index consisting of an index tree corresponding to each block, and leaf nodes of the index tree store attribute values of specified transaction attributes contained in the blocks and pointers of transaction storage positions corresponding to the attribute values; and performing block chain transaction query based on the hierarchical index. By the block chain query optimization method based on the external index, the efficiency of range query on specific attributes can be improved, and therefore query performance of the block chain is optimized.

Description

Block chain query optimization method based on external index

Technical Field

The invention relates to the technical field of block chains, in particular to a block chain query optimization method based on external indexes.

Background

As blockchains develop, they are applied in more and more fields, and the demands placed on their query capability grow accordingly.

However, most of current blockchain systems use a K-V (Key-Value) model for query, and the limited expression capability of the K-V model results in insufficient semantic description of transaction data and difficulty in supporting complex queries, so that query semantic richness and query efficiency supported by the current blockchain are very limited.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a block chain query optimization method based on an external index.

In a first aspect, the present invention provides a block chain query optimization method based on an external index, including:

creating a hierarchical index based on the specified transaction attributes; the hierarchical index comprises a first-layer index and a second-layer index, wherein the first-layer index is a bitmap index consisting of block numbers and index items corresponding to each block number, the index items are represented by bitmaps, 1 and 0 in the bitmap respectively represent transactions containing and not containing corresponding attribute value ranges in the blocks, the second-layer index is a tree index consisting of an index tree corresponding to each block, and leaf nodes of the index tree store attribute values of the specified transaction attributes contained in the blocks and pointers pointing to transaction storage positions corresponding to the attribute values;

and performing block chain transaction query based on the hierarchical index.

Optionally, the index item is generated based on the attribute values in the block and an equal-depth histogram of the corresponding attribute.

Optionally, the index tree is a B⁺ tree.

Optionally, the method further comprises:

if a leaf node of the index tree is determined to be full, directly generating a new leaf node as the rightmost leaf node, which stores the newly added attribute value in the block and a pointer to the transaction storage position corresponding to the newly added attribute value.

Optionally, the blockchain transaction query is a trace back query;

the conducting of the block chain transaction query based on the hierarchical index comprises:

acquiring a time window condition, a query attribute and an attribute value corresponding to the query attribute for querying the target transaction;

determining a set of blocks that satisfy the time window condition;

determining a target block set containing the target transaction from the block sets meeting the time window condition based on the first-layer index corresponding to the query attribute and the attribute value corresponding to the query attribute;

and traversing and querying each block in the target block set based on the second-layer index corresponding to the query attribute and the attribute value corresponding to the query attribute to obtain a transaction pointer set pointing to a query result.

Optionally, the blockchain transaction query is an on-chain connection query;

acquiring a time window condition, a first connection table, a second connection table and connection attributes for on-chain connection query;

determining a set of blocks that satisfy the time window condition;

respectively determining a first block set containing transactions in the first connection table and a second block set containing transactions in the second connection table from the block sets meeting the time window condition on the basis of the first layer index corresponding to the connection attribute;

if it is determined that the attribute value range corresponding to the connection attribute in the first block and the attribute value range corresponding to the connection attribute in the second block have an intersection, performing sorting, merging and connecting between the first block and the second block based on the second-layer index corresponding to the connection attribute to obtain a query result;

wherein the first block belongs to the first set of blocks and the second block belongs to the second set of blocks.

Optionally, the blockchain transaction query is a federated query;

acquiring a time window condition, connection attributes, an on-chain connection table and a third block set containing transactions in an off-chain connection table for joint query;

determining a set of blocks that satisfy the time window condition;

determining a fourth block set containing transactions in the on-chain connection table from the block sets meeting the time window condition based on the first layer index corresponding to the connection attribute;

if it is determined that the attribute value range corresponding to the connection attribute in the third block and the attribute value range corresponding to the connection attribute in the fourth block have an intersection, performing sorting, merging and connecting between the third block and the fourth block based on the second-layer index corresponding to the connection attribute to obtain a query result;

wherein the third block belongs to the third set of blocks and the fourth block belongs to the fourth set of blocks.

In a second aspect, the present invention further provides an external index-based device for optimizing a block chain query, including:

a creation module to create a hierarchical index based on specified transaction attributes; the hierarchical index comprises a first-layer index and a second-layer index, wherein the first-layer index is a bitmap index consisting of block numbers and index items corresponding to each block number, the index items are represented by bitmaps, 1 and 0 in the bitmap respectively represent transactions containing and not containing corresponding attribute value ranges in the blocks, the second-layer index is a tree index consisting of an index tree corresponding to each block, and leaf nodes of the index tree store attribute values of the specified transaction attributes contained in the blocks and pointers pointing to transaction storage positions corresponding to the attribute values;

and the query module is used for carrying out block chain transaction query based on the hierarchical index.

In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the external index-based block chain query optimization method according to the first aspect.

In a fourth aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the external index-based block chain query optimization method according to the first aspect.

According to the block chain query optimization method based on the external index, the specific hierarchical index combining the bitmap index and the tree index structure is established on the designated transaction attribute, so that the efficiency of range query on the specific attribute can be accelerated, and the query performance of the block chain is optimized.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a flowchart illustrating a block chain query optimization method based on external indexes according to the present invention;

FIG. 2 is a schematic diagram of the design of an index structure provided by the present invention;

FIG. 3 is a schematic flow chart of a trace-back query provided by the present invention;

FIG. 4 is a schematic flow chart of a link query provided by the present invention;

FIG. 5 is a schematic flow diagram of a federated query provided by the present invention;

FIG. 6 is a schematic diagram of a database design provided by the present invention;

FIG. 7 is a graph comparing Q1 query performance at different data set sizes provided by the present invention;

FIG. 8 is a comparison graph of Q1 query performance at different result set sizes provided by the present invention;

FIG. 9 is a comparison graph of Q3 query performance at different result set sizes provided by the present invention;

FIG. 10 is a graph comparing Q4 query performance at different data set sizes provided by the present invention;

FIG. 11 is a schematic structural diagram of an external index-based blockchain query optimization apparatus according to the present invention;

FIG. 12 is a schematic structural diagram of an electronic device provided in the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The semantic richness and query efficiency of current blockchain queries are very limited, and data query faces the following problems. First, data semantics are not rich enough: most blockchain systems use a K-V model with limited expressive power, so the semantic description of transaction data is insufficient and complex queries are difficult to support. Second, data operations are insufficient: existing systems usually store block data in a K-V database and support only simple access patterns, so although transactions are structured, relational operations on transaction data cannot be supported. Some systems import block data into an off-chain database for querying, but data migration brings extra system overhead and storing a copy of the data brings extra storage cost. Finally, data integration is difficult: to reduce storage and network overhead or to protect privacy, existing schemes store large-scale or private data in an off-chain database and store only data digests on the blockchain, but this leaves information about the same entity stored both on and off the chain, which makes data integration difficult. Because of data consistency and data privacy concerns, a user cannot access off-chain data through a smart contract, and importing blockchain data into an off-chain database for querying introduces extra storage overhead.

To address the insufficient relational semantic description and the limited query types supported by current blockchain systems, the invention introduces relational semantics into the blockchain system and models block data with a relational model, so that block data supports complex queries composed of basic relational operators such as selection, projection and connection (join). The system stores only one copy of the data, so the query processing capability of the blockchain system is improved without extra storage space.

To address the low efficiency of relational queries on the blockchain, the invention designs a tree index, a bitmap index and a hierarchical index to improve data access. Based on these index structures, relational queries on the blockchain, such as trace-back queries and on-chain connection queries, can be optimized. In addition, an on-chain/off-chain connection operator can be implemented to support queries for data integration.

Fig. 1 is a schematic flowchart of a block chain query optimization method based on external indexes, as shown in fig. 1, the method includes the following steps:

step 100, establishing a hierarchical index based on the specified transaction attribute; the hierarchical index comprises a first-layer index and a second-layer index, the first-layer index is a bitmap index consisting of block numbers and index items corresponding to each block number, the index items are represented by bitmaps, 1 and 0 in the bitmap respectively represent transactions which contain and do not contain corresponding attribute value ranges in the blocks, the second-layer index is a tree-type index consisting of an index tree corresponding to each block, and leaf nodes of the index tree store attribute values of specified transaction attributes contained in the blocks and pointers of transaction storage positions corresponding to the attribute values;

step 101, performing a block chain transaction query based on the hierarchical index.

Specifically, the lack of data semantics makes it difficult for the blockchain system to support complex queries, so data semantics must first be added in order to support rich queries on blockchain data. In fact, transactions are structured, and a typical transaction contains two kinds of attributes: system attributes and custom attributes. The system attributes include the transaction hash, the transaction sender and the like, and the custom attributes include smart contract parameters and the like. Transactions that invoke the same kind of operation have the same structure, so transactions of the same type are described by a uniform relational schema. The transaction structure is determined by declaring a table schema, and an attribute type can be character, number or another type. Table attributes fall into two categories: system attributes and application attributes. System attributes, such as transaction type and transaction sender, are added automatically when a table schema is created; application attributes are explicitly specified by the user and are typically parameters related to the application. A user may send a special transaction to create a table schema, and the table schema information is synchronized between the nodes through a consensus algorithm.

In practical applications, data is usually stored in both the blockchain system and a database. Data stored in the blockchain system, also called on-chain data, is organized in a blockchain structure. Large-scale data is managed by a local database; it is also called off-chain data and is stored in off-chain relational tables. Accordingly, queries are divided into on-chain queries, which query the data stored in the blockchain, and on-chain/off-chain joint queries, which query on-chain and off-chain data at the same time.

In order to improve the query performance of block-oriented storage, the invention designs index structures to accelerate data access, mainly for three basic operations: (1) given a block number, transaction number or timestamp, obtain the block; (2) obtain all transaction data belonging to a given data table; (3) obtain the transactions that satisfy a given query condition on a certain attribute. Designing an index structure for a blockchain system faces several challenges. First, index updates affect the write performance of the system: a single block may cause a large number of index updates, which affects both write and read performance. Second, queries based on time windows must be supported: the blockchain stores all historical transactions, and a user may query the transactions within a particular period. The invention therefore proposes write-efficient index structures: a tree index, a bitmap index, and a hierarchical index that uses both together.

FIG. 2 is a schematic design diagram of an index structure provided in the present invention. As shown in FIG. 2, the index structure includes a block index, a table-level bitmap index (table bitmap index), and a hierarchical index (level index).

The block index is a tree index containing block number, transaction number and time stamp information, and all block information on the block chain can be recorded by using the block index.

The block index may be a B⁺ tree whose search key is a triplet (bid, tid, Ts), where bid is the block number, tid is the number of the first transaction in the block (a transaction number uniquely identifies a transaction and is represented here by an incrementing integer), and Ts is the timestamp of the block, for example (n, j, T_sn) as shown in FIG. 2. Leaf nodes store pointers to block storage locations, for example p_n as shown in FIG. 2. For any two blocks b_i and b_j, if block b_i was packed earlier than b_j, then the block number, transaction number and timestamp of b_i are all smaller than those of b_j.

The generation of a new block triggers an update of the block index. Since the bid, tid and Ts of new blocks increase as the chain grows, the search key of a new block is always appended directly to the rightmost leaf node. Unlike a conventional B⁺ tree, in the embodiment of the invention, if that leaf node is full, the block index directly generates a new leaf node as the rightmost leaf node to store the search key. This avoids the waste of storage resources caused by leaf node splitting and gives higher update efficiency.
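The append-only behaviour described above can be illustrated with a minimal Python sketch. It models only the leaf level of the block index; the class name `BlockIndex`, the leaf capacity of 4 and the method names are assumptions made for this illustration and do not come from the patent. A real B⁺ tree would also maintain internal nodes, which are omitted here, so the time-window lookup simply runs a binary search over the flattened leaf entries.

```python
from bisect import bisect_left, bisect_right

LEAF_CAPACITY = 4  # hypothetical leaf size, chosen only for illustration


class BlockIndex:
    """Leaf level of an append-only, B+-tree-like block index (sketch)."""

    def __init__(self, capacity=LEAF_CAPACITY):
        self.capacity = capacity
        self.leaves = [[]]  # each leaf holds (bid, tid, ts, pointer) entries

    def append(self, bid, tid, ts, pointer):
        # bid, tid and ts only grow, so a new key always belongs in the
        # rightmost leaf.
        if len(self.leaves[-1]) == self.capacity:
            # Full leaf: open a fresh rightmost leaf instead of splitting.
            self.leaves.append([])
        self.leaves[-1].append((bid, tid, ts, pointer))

    def search_time_window(self, start, end):
        """Return the block numbers whose timestamp lies in [start, end]."""
        entries = [e for leaf in self.leaves for e in leaf]
        timestamps = [e[2] for e in entries]  # already sorted by construction
        lo = bisect_left(timestamps, start)
        hi = bisect_right(timestamps, end)
        return [e[0] for e in entries[lo:hi]]


index = BlockIndex()
for bid in range(1, 11):
    index.append(bid, tid=(bid - 1) * 100, ts=1000 + 10 * bid, pointer=f"p_{bid}")

print(len(index.leaves))
print(index.search_time_window(1030, 1060))
```

With ten blocks and a capacity of 4, every leaf except the last stays completely full, which is the storage benefit the text attributes to avoiding splits.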

A bitmap is a practical data structure with low overhead: it can carry rich meaning while occupying little space. Each bit of a bitmap is associated with the data distribution in one block of the chain; by giving the bitmap different meanings, useless blocks can be filtered out and query efficiency is improved.

For example, a bitmap is given the meaning of a relational table (table), and the table forms a corresponding bitmap on the block chain, such as the table-level bitmap index shown in FIG. 2. If there is relation table data in the k block, the k bit of the bitmap is 1, otherwise it is 0.

Bitmaps may also be built to query on system attributes. In addition to this, it can also be established on the common attribute values.

The generation of a new block triggers an update of the bitmap index. If the block is the i-th block in the system and contains a transaction of a certain relation table, the i-th bit of the bitmap corresponding to that relation table is set to 1; otherwise it is set to 0. The update overhead of the bitmap index is therefore only one set operation. For a query request on a designated relation table, the blocks containing transactions of that table can be found by looking up the bits set to 1 in the bitmap corresponding to the table, which avoids scanning all blocks.
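As a rough illustration of the table-level bitmap index, the following Python sketch keeps one integer per relation table and uses it as a bit set. The class name `TableBitmapIndex`, the table names and the 0-based block numbering are assumptions made for this example only.

```python
class TableBitmapIndex:
    """One bitmap per relation table; bit k is 1 if block k holds its data."""

    def __init__(self):
        self.bitmaps = {}    # table name -> int used as a bit set
        self.num_blocks = 0  # blocks are numbered from 0 in this sketch

    def on_new_block(self, tables_in_block):
        # Updating the index costs one bit-set operation per table present
        # in the new block; absent tables implicitly keep a 0 bit.
        k = self.num_blocks
        for table in tables_in_block:
            self.bitmaps[table] = self.bitmaps.get(table, 0) | (1 << k)
        self.num_blocks += 1

    def blocks_containing(self, table):
        bits = self.bitmaps.get(table, 0)
        return [k for k in range(self.num_blocks) if (bits >> k) & 1]


table_index = TableBitmapIndex()
for tables in [{"donate"}, {"transfer"}, {"donate", "transfer"}, set()]:
    table_index.on_new_block(tables)

print(table_index.blocks_containing("donate"))
print(table_index.blocks_containing("transfer"))
```

A query on the hypothetical table "donate" then only needs to read blocks 0 and 2 instead of scanning all four blocks.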

The hierarchical index can be established on the designated transaction attribute, and the index is established according to the attribute value, so that the speed of range query on the specific attribute can be accelerated.

The hierarchical index comprises a first-layer index and a second-layer index. The first-layer index is a bitmap index consisting of block numbers and the index item corresponding to each block number. Optionally, the index item may be generated based on the attribute values in the block and an equal-depth histogram of the corresponding attribute. For a discrete attribute, each bucket in the equal-depth histogram represents one attribute value; for a continuous attribute, the equal-depth histogram is generated according to the distribution of attribute values and each bucket represents a range of attribute values, for example bucket₁ shown in FIG. 2 represents attribute values in the range 0-300. Each block corresponds to one index item, which is represented by a bitmap and records the data ranges present in the block: 1 and 0 in the bitmap respectively indicate that the block does or does not contain transactions in the corresponding attribute value range. For example, for block₁ shown in FIG. 2, the corresponding bitmap is 1001001101; the 1st bit is 1, meaning that block₁ contains transactions in the attribute value range of bucket₁ (i.e., 0-300), and the 2nd bit is 0, meaning that block₁ does not contain transactions in the attribute value range of bucket₂ (i.e., 300-600).
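The construction of a first-layer index item can be sketched in Python as follows. Only the first two bucket ranges (0-300 and 300-600) and the bitmap 1001001101 come from the text; the remaining bucket edges, the half-open treatment of bucket boundaries, the sample attribute values and the function names are assumptions chosen so that the example reproduces that bitmap. In practice the edges would be taken from an equal-depth histogram of the attribute rather than fixed at equal widths.

```python
from bisect import bisect_right

# Hypothetical bucket edges: ten buckets, [0, 300), [300, 600), ..., [2700, 3000].
EDGES = list(range(0, 3001, 300))


def bucket_of(value, edges=EDGES):
    """Return the 0-based bucket number that contains the value."""
    idx = bisect_right(edges, value) - 1
    return max(0, min(idx, len(edges) - 2))  # clamp out-of-range values


def index_item(values, edges=EDGES):
    """Build the bitmap of one block from the attribute values it contains."""
    bits = ["0"] * (len(edges) - 1)
    for v in values:
        bits[bucket_of(v, edges)] = "1"
    return "".join(bits)


block_1_values = [120, 1000, 1900, 2200, 2950]  # made-up attribute values
print(index_item(block_1_values))
```

With these made-up values the script prints 1001001101, matching the bitmap given for block₁: bucket₁ is hit by the value 120, while no value falls into bucket₂.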

The second layer of the hierarchical index is a tree index composed of the index trees corresponding to each block. Each block corresponds to one index tree established on the index column, and the leaf nodes of the index tree store the attribute values of the specified transaction attribute contained in the block and pointers to the transaction storage positions corresponding to those attribute values. The purpose of creating the second-layer index is to avoid scanning the whole block. Optionally, the index tree may be a B⁺ tree.
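A minimal Python sketch of a per-block second-layer index is given below. It replaces the B⁺ tree with a sorted list of (attribute value, pointer) pairs, which corresponds to the leaf level only; the class name `BlockAttrIndex`, the attribute name `amount` and the string pointers are hypothetical.

```python
from bisect import bisect_left, bisect_right


class BlockAttrIndex:
    """Sorted (attribute value, pointer) pairs for one block (leaf level only)."""

    def __init__(self, transactions, attr):
        # transactions: list of (pointer, record) pairs stored in this block
        pairs = sorted((record[attr], pointer) for pointer, record in transactions)
        self.keys = [key for key, _ in pairs]
        self.pointers = [pointer for _, pointer in pairs]

    def range_query(self, low, high):
        """Return pointers of transactions whose value lies in [low, high]."""
        lo = bisect_left(self.keys, low)
        hi = bisect_right(self.keys, high)
        return self.pointers[lo:hi]


block_txs = [
    ("tx0", {"amount": 250}),
    ("tx1", {"amount": 40}),
    ("tx2", {"amount": 980}),
    ("tx3", {"amount": 250}),
]
block_index = BlockAttrIndex(block_txs, "amount")
print(block_index.range_query(100, 300))
```

The range lookup touches only the matching entries, so the block itself does not have to be scanned to locate the transactions.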

Optionally, in this embodiment of the present invention, the second-layer index of the hierarchical index may handle full leaf nodes in the same way as the block index: if a leaf node of the index tree is determined to be full, a new leaf node is directly generated as the rightmost leaf node to store the newly added attribute value in the block and a pointer to the transaction storage position corresponding to the newly added attribute value.

In the above hierarchical index, the depth of the equal-depth histogram may be established according to a specific scene, and the equal-depth histogram and the hierarchical index may be established according to a specific relation table and data. The index is generally stored in the memory, but when the memory occupied by the index is too large, the early index structure can be stored on the disk, and the recent index structure is cached in the memory.

According to the block chain query optimization method based on the external index, the specific hierarchical index combining the bitmap index and the tree index structure is created on the designated transaction attribute, so that the efficiency of range query on the specific attribute can be improved, and the query performance of the block chain is optimized.

Optionally, the blockchain transaction query is a trace back query;

the block chain transaction query based on the hierarchical index comprises the following steps:

determining a block set meeting a time window condition;

determining a target block set containing target transactions from the block sets meeting the time window condition based on the first-layer index corresponding to the query attribute and the attribute value corresponding to the query attribute;

and traversing and querying each block in the target block set based on the second-layer index corresponding to the query attribute and the attribute value corresponding to the query attribute to obtain a transaction pointer set pointing to the query result.

Specifically, a trace-back query may be implemented by scanning all blocks and checking all transactions in each block, but this approach is inefficient. Query performance can be optimized with the bitmap index: for example, to trace all transactions sent by a certain sender, the blocks containing that sender's transactions are obtained through the bitmap, and only those blocks are scanned to obtain the query result, which avoids scanning all blocks. In the embodiment of the invention, the hierarchical index makes the query more efficient still.

Specifically, in the embodiment of the present invention, a trace-back query is performed based on the hierarchical index. First, the time window condition, the query attribute and the attribute value corresponding to the query attribute for querying the target transaction may be obtained from the query statement or query condition of the trace-back query, where there may be one or more query attributes. Then, the block set satisfying the time window condition may be obtained through a time query index structure; for example, the set of blocks containing transactions in the specified time window can be determined quickly based on the block index. On the basis of this block set, the first-layer index corresponding to the query attribute is queried with the attribute value corresponding to the query attribute to obtain the target block set containing the target transactions; if there are multiple query attributes, the block sets obtained for each query attribute from the first-layer index are intersected to obtain the target block set. Finally, the second-layer index corresponding to the query attribute is queried with the attribute value corresponding to the query attribute for each block in the target block set to obtain a transaction pointer set; the pointers in this set point to the query result, so the transactions are read from disk according to the pointers and the query result is output.

FIG. 3 is a schematic flowchart of a trace-back query provided by the present invention. As shown in FIG. 3, the process of performing a trace-back query Q = <Trace, (SenID, o), (Tname, p), [c, e]> based on the hierarchical index (where Trace denotes the trace-back query, SenID denotes that the query attribute is the transaction sender and o is the corresponding attribute value, Tname denotes that the query attribute is the transaction type and p is the corresponding attribute value, and [c, e] denotes the time window) includes the following steps:

step 300, starting;

specifically, hierarchical indexes I_d and I_n are created on the system fields SenID and Tname, respectively, and cover all transactions within the blockchain system.

Step 301, obtaining a set B by querying a block index through time;

specifically, the Search function queries the block index according to the time window and obtains a set B, where B includes i if the block i is in the time window.

Step 302, searching I_d and I_n to obtain the sets B' and B'';

specifically, the firstLevelBitmap function searches I_d and I_n to obtain the sets B' and B'': if block j contains a transaction sent by o and j is in B, then j is in B'; if block j contains a transaction of transaction type p and j is in B, then j is in B''.

Step 303, intersecting B' and B'' to obtain the set of blocks satisfying the attribute conditions;

specifically, intersecting B' and B ″ may result in a block that is within a specified time window and contains transactions with a SenID attribute value of o and a Tname attribute value of p.

Step 304, searching the second-layer index to obtain a query result;

specifically, the SecondLevel function obtains the query result by searching the second-layer index: for each block, I_d and I_n are queried with 'SenID = o' and 'Tname = p' as the respective query conditions, yielding one transaction pointer set for each.

Step 305, obtaining a transaction set and outputting the result;

specifically, the intersection of the transaction pointer sets includes pointers pointing to the query result, and the Read function reads transactions from the disk according to the pointers and outputs the query result.

Step 306, end.
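Steps 301 to 305 can be walked through with a small Python sketch. It is a simplified model rather than the patented implementation: the first-layer index is represented as the set of attribute values present in each block (for a discrete attribute this carries the same information as one bit per value), the second-layer index is a per-block dictionary from attribute value to transaction pointers, and a pointer is a (block number, position) pair. The sample data and the function names `build_index` and `trace` are invented for this example.

```python
# Each transaction is (sender, transaction type); ts is the block timestamp.
blocks = [
    {"bid": 0, "ts": 100, "txs": [("alice", "donate"), ("bob", "transfer")]},
    {"bid": 1, "ts": 200, "txs": [("alice", "transfer"), ("carol", "donate")]},
    {"bid": 2, "ts": 300, "txs": [("alice", "donate"), ("alice", "transfer")]},
    {"bid": 3, "ts": 400, "txs": [("alice", "donate")]},
]


def build_index(blocks, field):
    """Return (first_layer, second_layer) for one transaction field."""
    first, second = {}, {}
    for block in blocks:
        per_value = {}
        for pos, tx in enumerate(block["txs"]):
            per_value.setdefault(tx[field], set()).add((block["bid"], pos))
        first[block["bid"]] = set(per_value)  # values present in the block
        second[block["bid"]] = per_value      # value -> transaction pointers
    return first, second


def trace(blocks, idx_d, idx_n, sender, tname, window):
    c, e = window
    B = {b["bid"] for b in blocks if c <= b["ts"] <= e}  # step 301
    B1 = {j for j in B if sender in idx_d[0][j]}         # step 302: B'
    B2 = {j for j in B if tname in idx_n[0][j]}          # step 302: B''
    pointers = set()
    for j in sorted(B1 & B2):                            # step 303
        # Steps 304-305: intersect the per-attribute pointer sets.
        pointers |= idx_d[1][j][sender] & idx_n[1][j][tname]
    return sorted(pointers)


I_d = build_index(blocks, 0)  # index on the sender field
I_n = build_index(blocks, 1)  # index on the transaction type field
print(trace(blocks, I_d, I_n, "alice", "donate", (100, 300)))
```

Block 1 passes the first-layer filter for both attributes, yet contributes nothing, because its matching sender and matching type belong to different transactions. This is why the pointer sets from the second-layer index are intersected in step 305.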

Optionally, the blockchain transaction query is an on-chain connection query;

determining a block set meeting a time window condition;

respectively determining a first block set containing transactions in a first connection table and a second block set containing transactions in a second connection table from block sets meeting time window conditions based on the first layer indexes corresponding to the connection attributes;

the first block belongs to the first block set, and the second block belongs to the second block set.

Specifically, traditional connection (join) algorithms, such as the nested-loop join, cannot be applied directly to a blockchain, because scattered data storage makes the query overhead huge. The embodiment of the invention can implement a single-scan hash connection algorithm based on the hierarchical index: for a connection query on table r and table s, all blocks are first scanned, transaction partitions are built with a hash function and a hash index is built for each partition; then, for each transaction t in a partition, the hash index is probed to obtain all transactions that satisfy the connection condition with t. With a bitmap index, the performance of the connection query improves because only blocks containing transactions of r or s need to be read. With the hierarchical index, query performance can be made higher still.
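For comparison, the hash-based connection described in this paragraph can be sketched as a plain in-memory hash join in Python. The partitioning step is omitted for brevity, and the transaction records, the attribute name `donor` and the function name `hash_join` are made up for this example.

```python
from collections import defaultdict


def hash_join(r_txs, s_txs, attr):
    """Equi-join two lists of transaction records on one attribute."""
    # Build phase: hash index on the connection attribute of table r.
    table = defaultdict(list)
    for t_r in r_txs:
        table[t_r[attr]].append(t_r)
    # Probe phase: look up each transaction of table s in the hash index.
    return [(t_r, t_s) for t_s in s_txs for t_r in table.get(t_s[attr], [])]


r_txs = [{"id": "r0", "donor": "d1"}, {"id": "r1", "donor": "d2"}]
s_txs = [
    {"id": "s0", "donor": "d2"},
    {"id": "s1", "donor": "d3"},
    {"id": "s2", "donor": "d2"},
]
pairs = [(t_r["id"], t_s["id"]) for t_r, t_s in hash_join(r_txs, s_txs, "donor")]
print(pairs)
```

Without an index, `r_txs` and `s_txs` would have to be collected by scanning every block, which is the cost the bitmap and hierarchical indexes are meant to reduce.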

Specifically, in the embodiment of the present invention, link query is performed based on the hierarchical index, and first, a time window condition, a first link table, a second link table, and a link attribute for link query may be obtained according to a query statement or a query condition; then, a block set meeting the time window condition can be obtained through the time query index structure, for example, a block set of a transaction containing a specified time window can be quickly determined based on the block index; on the basis of the block set, based on the first layer index corresponding to the connection attribute, a first block set containing transactions in the first connection table and a second block set containing transactions in the second connection table can be obtained; if the first block belongs to the first block set, the second block belongs to the second block set, and the attribute value range of the corresponding connection attribute in the first block and the attribute value range of the corresponding connection attribute in the second block have intersection, the first block and the second block may generate a connection query result, so that the query result may be obtained by performing sorting, merging and connecting between the first block and the second block based on the second-layer index corresponding to the connection attribute.

Fig. 4 is a schematic flow chart of the on-chain join query provided by the present invention. As shown in fig. 4, the process of performing the on-chain join query Q = (On, r, s, r.attr = s.attr, [c, e]) based on the hierarchical index (where On denotes an on-chain join query, r and s denote the join tables, = denotes the equi-join condition, [c, e] denotes the time window, and attr denotes the join attribute) includes the following steps:

Step 400, start;

specifically, hierarchical indexes I_r and I_s are created on the attribute attr of table r and table s, respectively.

Step 401, querying a block index to obtain a set B;

specifically, the Search function queries the block index and obtains a set B; if block i is within the time window, i is contained in B.

Step 402, searching the first-layer indexes of I_r and I_s to obtain the blocks within the time window that contain transactions of tables r and s, recorded as sets B' and B″;

specifically, the firstLevelBitmap function queries the first-layer indexes of I_r and I_s to obtain the blocks that are within the time window and contain transactions belonging to table r or table s, which are recorded in B' and B″, respectively.

Step 403, evaluating ¬(k.u < m.l ∨ k.l > m.u);

specifically, for each element i in B' and each element j in B″, let e_ri and e_sj denote the index entry of block i in I_r and the index entry of block j in I_s, respectively, and let k and m denote buckets in the equal-depth histogram, each with lower and upper bounds (l, u). If the Intersect function returns TRUE, i.e. there exist k ∈ e_ri and m ∈ e_sj such that ¬(k.u < m.l ∨ k.l > m.u) is true, blocks i and j may produce join query results. Here k.u is the upper bound u of bucket k, m.l is the lower bound l of bucket m, k.l is the lower bound l of bucket k, and m.u is the upper bound u of bucket m.

Step 404, performing a sort-merge join between blocks i and j based on the second-layer index, and outputting the result;

specifically, the sort-merge join on blocks i and j is performed by the SortMergeJoin function based on the second-layer index, and the query result is finally output.

Step 405, end.
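The range-intersection test of step 403 can be sketched as follows. This is an illustrative reading rather than the patented code: it assumes that each first-layer index entry has already been decoded into the list of histogram buckets whose bit is 1, that each bucket is a pair (l, u) of lower and upper bounds, and the names `intersect` and `candidate_pairs` are hypothetical.

```python
from typing import Dict, List, Tuple

Bucket = Tuple[float, float]   # (l, u): lower and upper bound of a bucket

def intersect(e_ri: List[Bucket], e_sj: List[Bucket]) -> bool:
    """True if some bucket k of e_ri overlaps some bucket m of e_sj,
    i.e. not (k.u < m.l or k.l > m.u)."""
    return any(not (k_u < m_l or k_l > m_u)
               for (k_l, k_u) in e_ri
               for (m_l, m_u) in e_sj)

def candidate_pairs(entries_r: Dict[int, List[Bucket]],
                    entries_s: Dict[int, List[Bucket]]):
    """Return the block pairs (i, j) that may produce join results.

    entries_r / entries_s map a block number to the buckets marked 1 in
    that block's first-layer index entry (an assumed decoded form).
    """
    return [(i, j)
            for i, e_ri in sorted(entries_r.items())
            for j, e_sj in sorted(entries_s.items())
            if intersect(e_ri, e_sj)]
```

Only the pairs returned by `candidate_pairs` would go on to the sort-merge join of step 404; all other block pairs are skipped without reading their transactions.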

Optionally, the blockchain transaction query is a joint query;

determining a block set meeting a time window condition;

if it is determined that the attribute value range of the corresponding connection attribute in the third block and the attribute value range of the corresponding connection attribute in the fourth block have an intersection, performing sorting, merging and connecting between the third block and the fourth block based on the second-layer index corresponding to the connection attribute to obtain a query result;

and the third block belongs to the third block set, and the fourth block belongs to the fourth block set.

Specifically, in the embodiment of the present invention, the joint on-chain and off-chain query may also implement a one-pass hash join algorithm based on the hierarchical index. First, the data of the off-chain relation table s is acquired from the off-chain database; then the blocks are scanned to acquire the data of the on-chain relation table r; finally, the query result is obtained through the hash join algorithm. Using the hierarchical index optimizes query performance, because only the blocks containing transactions that belong to relation table r are read.

Specifically, in the embodiment of the present invention, the joint query is performed based on the hierarchical index as follows. First, a time window condition, a join attribute, and an on-chain join table used for the joint query may be obtained from the query statement or query condition, and a third block set containing the transactions of the off-chain join table is obtained through the off-chain database. Then, the set of blocks satisfying the time window condition may be obtained through the time query index structure; for example, the set of blocks containing transactions within the specified time window can be determined quickly based on the block index. From this block set, a fourth block set containing transactions of the on-chain join table may be obtained based on the first-layer index corresponding to the join attribute. If a third block belongs to the third block set, a fourth block belongs to the fourth block set, and the attribute value range of the join attribute in the third block intersects that in the fourth block, the two blocks may produce join results; the query result may then be obtained by performing a sort-merge join between the third block and the fourth block based on the second-layer index corresponding to the join attribute.

Fig. 5 is a schematic flow chart of the joint query provided by the present invention. As shown in fig. 5, the process of performing the joint query Q = (Onoff, r, s, r.attr = s.attr, [c, e]) based on the hierarchical index (where Onoff denotes a joint query, r denotes the on-chain join table, s denotes the off-chain join table, = denotes the equi-join condition, [c, e] denotes the time window, and attr denotes the join attribute) includes the following steps:

Step 500, start;

specifically, a hierarchical index I_r is established on the attribute r.attr.

Step 501, querying the time-based tree index to obtain a set B;

specifically, the Search function queries the block index according to the time window and obtains a set B; if block i is within the time window, i is included in B.

Step 502, sorting the off-chain data by the s.attr attribute;

specifically, before the off-chain data is transmitted to the block chain, it is sorted by the s.attr attribute.

Step 503, filtering out blocks that cannot contain join results according to [S_min, S_max];

specifically, after the sort operation, blocks that cannot contain join results can be filtered out using the range [S_min, S_max] of the s.attr attribute values of the off-chain data, which yields the block set B' of transactions whose s.attr attribute values lie within [S_min, S_max].

Step 504, obtaining all blocks which are in the time window and contain the data of the table r through the first layer index;

specifically, the firstLevelBitmap function queries the first-layer index of I_r to obtain the blocks that are within the time window and contain transactions belonging to table r, which are recorded in B″.

Step 505, evaluating ¬(k.u < S_min ∨ k.l > S_max);

specifically, for each element d in B' and each element i in B″, let e_ri denote the index entry of block i in I_r, and let k denote a bucket in the equal-depth histogram with lower and upper bounds (l, u). If the Intersect function returns TRUE, i.e. there exists k ∈ e_ri such that ¬(k.u < S_min ∨ k.l > S_max) is true, blocks d and i may produce join query results. Here k.u is the upper bound u of bucket k and k.l is the lower bound l of bucket k.

Step 506, performing a sort-merge join between block i and block d based on the second-layer index, and outputting the result;

specifically, if the Intersect function returns TRUE, block i participates in the join. A sort-merge join between blocks i and d is performed by the SortMergeJoin function using the second-layer index, and the result is finally output.

Step 507, end.
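Steps 502 to 505 can be sketched as follows. This is a minimal illustration under stated assumptions: the off-chain data is reduced to a plain list of s.attr values, each first-layer index entry is assumed to be decoded into the (l, u) buckets whose bit is 1, and the names `overlaps_range` and `blocks_to_join` are hypothetical.

```python
def overlaps_range(e_ri, s_min, s_max):
    """True if some bucket k = (l, u) of e_ri satisfies
    not (k.u < s_min or k.l > s_max)."""
    return any(not (k_u < s_min or k_l > s_max) for (k_l, k_u) in e_ri)

def blocks_to_join(entries_r, off_chain_values):
    """Select the on-chain blocks that may join with the off-chain data.

    entries_r maps a block number to the buckets marked 1 in that block's
    first-layer index entry (an assumed decoded form).
    """
    values = sorted(off_chain_values)        # step 502: sort by s.attr
    if not values:
        return []
    s_min, s_max = values[0], values[-1]     # step 503: range [S_min, S_max]
    return [i for i, e_ri in sorted(entries_r.items())
            if overlaps_range(e_ri, s_min, s_max)]   # step 505
```

Blocks whose buckets all fall outside [S_min, S_max] are never read, which is the source of the I/O saving described for the joint query.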

The query performance provided by the invention is described below by a specific experimental environment.

Fig. 6 is a schematic diagram of a database design provided by the present invention, and as shown in fig. 6, the database includes a database for storing data on a block chain and a database for storing data under the chain, and the query efficiency of different index structures is illustrated by comparing different query performances tested in the same experimental environment.

The test statements are divided into four categories, Q1, Q2, Q3 and Q4, as follows:

Q1 and Q2 test the performance of traceback queries. Q1 is a one-dimensional traceback query that traces all transactions issued by the charity "org 1", whereas Q2 is a two-dimensional traceback query that traces the Transfer transactions issued by "org 1". Q3 and Q4 test the performance of queries on the chain. Q3 is a range query on Donate. Q4 is a joint query that retrieves the details of the donors who made donations. Query performance is compared by query response time.

Test data set: in all experiments, the block size was set to 4MB and the transaction size was set to 300 bytes. The experiments first tested query performance at different result set sizes and data set sizes. To test query performance at different data set sizes, the size of the query result set was fixed and the number of queried blocks was increased from 500 in steps of 500 blocks. To test query performance at different result set sizes, the number of blocks was fixed at 1000 and the result set size was varied. Transactions follow a uniform or Gaussian distribution across blocks, so that query performance can be tested under different data distributions.
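As a rough sanity check on these settings, the upper bound on the number of transactions per block can be worked out directly. The calculation below assumes that 4MB means 4 × 1024 × 1024 bytes and ignores block headers and any per-transaction overhead; both are assumptions, since the text does not state them.

```python
BLOCK_SIZE_BYTES = 4 * 1024 * 1024   # assumption: 1 MB = 1024 * 1024 bytes
TX_SIZE_BYTES = 300                  # transaction size stated in the text

def max_transactions_per_block(block_size=BLOCK_SIZE_BYTES, tx_size=TX_SIZE_BYTES):
    """Upper bound on transactions per block, ignoring header overhead."""
    return block_size // tx_size

print(max_transactions_per_block())          # 13981
print(1000 * max_transactions_per_block())   # 13981000 for 1000 full blocks
```

Under these assumptions a result set of 10000 transactions is smaller than the capacity of a single block, which gives a sense of how selective the tested queries are relative to a full scan.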

SU, SG, BU, BG, LU, and LG represent query executions under uniform distribution of transaction data and gaussian distribution based on block scanning, table-level bitmap indexing, and hierarchical indexing, respectively, where SU represents query execution under uniform distribution of transaction data based on block scanning, SG represents query execution under gaussian distribution of transaction data based on block scanning, BU represents query execution under uniform distribution of transaction data based on table-level bitmap indexing, BG represents query execution under gaussian distribution of transaction data based on table-level bitmap indexing, LU represents query execution under uniform distribution of transaction data based on hierarchical indexing, and LG represents query execution under gaussian distribution of transaction data based on hierarchical indexing. The performance of the traceback query when a single-level index or a two-level index is used under the condition of uniform distribution and Gaussian distribution of transaction data is represented by SIU, SIG, TIU and TIG, wherein the SIU represents the performance of the traceback query when the single-level index is used under the condition of uniform distribution of the transaction data, the SIG represents the performance of the traceback query when the single-level index is used under the condition of Gaussian distribution of the transaction data, the TIU represents the performance of the traceback query when the two-level index is used under the condition of uniform distribution of the transaction data, and the TIG represents the performance of the traceback query when the two-level index is used under the condition of Gaussian distribution of the transaction data.

Traceback query: FIG. 7 is a comparison graph of Q1 query performance at different data set sizes provided by the present invention, and FIG. 8 is a comparison graph of Q1 query performance at different result set sizes provided by the present invention. In fig. 7, the result set size was fixed at 10000. Query processing based on the hierarchical index is significantly better than the other two methods in all cases, because the hierarchical-index-based method only needs to read the query results directly from disk and the result set here is relatively small. Query processing based on the table-level bitmap index must read every block that contains query results, while the block-scan-based query must read every block in the block chain; both incur relatively high overhead.

Next, the performance of the Q2 query was tested. SIU and SIG perform the two-dimensional traceback query using a single hierarchical index built on SenID, whereas TIU and TIG perform it using two hierarchical indexes built on SenID and Tname. The results show that query processing based on hierarchical indexing is clearly superior to the other two methods in all cases.

Range query: the depth of the histogram in this set of experiments was set to 100, and there were 10000 Donate transactions. FIG. 9 is a comparison graph of Q3 query performance at different result set sizes provided by the present invention. In fig. 9, as the result set grows from 1000 to 10000 transactions, the difference between the three methods becomes smaller, because the performance of the hierarchical-index-based query method decreases as disk I/O (Input/Output) increases. The transactions are also distributed over more blocks, so the performance of the bitmap-index-based query method degrades as well.

Joint query: this experiment tested the joint query based on the hierarchical index. FIG. 10 is a comparison graph of Q4 query performance at different data set sizes provided by the present invention. As shown in fig. 10, the hierarchical-index-based approach outperforms the other two approaches because it only needs to read the blocks whose value ranges intersect the range of the off-chain data, and reading data through the second-layer index further improves query performance.

The external index-based block chain query optimization device provided by the invention is described below, and the external index-based block chain query optimization device described below and the external index-based block chain query optimization method described above can be referred to correspondingly.

Fig. 11 is a schematic structural diagram of the block chain query optimization device based on external index provided in the present invention, as shown in fig. 11, the device includes:

a creation module 1100 for creating a hierarchical index based on specified transaction attributes; the hierarchical index comprises a first-layer index and a second-layer index, the first-layer index is a bitmap index consisting of block numbers and index items corresponding to each block number, the index items are represented by bitmaps, 1 and 0 in the bitmap respectively represent transactions which contain and do not contain corresponding attribute value ranges in the blocks, the second-layer index is a tree-type index consisting of an index tree corresponding to each block, and leaf nodes of the index tree store attribute values of specified transaction attributes contained in the blocks and pointers of transaction storage positions corresponding to the attribute values;

the query module 1110 is configured to perform a blockchain transaction query based on the hierarchical index.
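The two-layer structure built by the creation module can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: a single equal-depth histogram is computed over all blocks (whether the histogram is global or per block is not fixed here), a sorted list of (value, position) pairs stands in for the leaf level of each block's index tree, and the function names are hypothetical.

```python
def equal_depth_buckets(values, n_buckets):
    """Split the values into buckets of roughly equal count; return (l, u) bounds."""
    vals = sorted(values)
    if not vals:
        return []
    size = -(-len(vals) // n_buckets)   # ceiling division
    return [(vals[p], vals[min(p + size, len(vals)) - 1])
            for p in range(0, len(vals), size)]

def build_hierarchical_index(blocks, attr, n_buckets):
    """blocks: {block_no: [transaction dict, ...]}.

    Returns (buckets, first_layer, second_layer), where first_layer maps a
    block number to its bitmap and second_layer maps it to sorted leaf entries.
    """
    all_values = [tx[attr] for txs in blocks.values() for tx in txs]
    buckets = equal_depth_buckets(all_values, n_buckets)
    first_layer, second_layer = {}, {}
    for block_no, txs in blocks.items():
        bitmap = [0] * len(buckets)
        for tx in txs:
            for b, (l, u) in enumerate(buckets):
                if l <= tx[attr] <= u:
                    bitmap[b] = 1     # block holds a transaction in this range
        first_layer[block_no] = bitmap
        # Sorted (value, position) pairs stand in for index tree leaf entries;
        # the position within the block plays the role of the storage pointer.
        second_layer[block_no] = sorted((tx[attr], pos) for pos, tx in enumerate(txs))
    return buckets, first_layer, second_layer
```

A query first consults `first_layer` to discard blocks whose bitmap has no 1 in a relevant bucket, and only then descends into `second_layer` for the surviving blocks.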

Optionally, the index tree is a B⁺ tree.

Optionally, the apparatus further comprises:

the generating module 1120 is configured to, if it is determined that a leaf node of the index tree is full, directly generate a new leaf node as the rightmost leaf node to store a new attribute value in the block and a pointer to the transaction storage location corresponding to the new attribute value.
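The behaviour of the generating module can be sketched as follows, reading the passage as "when the current leaf is full, a new rightmost leaf is created instead of splitting an existing one". The sketch models only the leaf level and assumes that attribute values arrive in non-decreasing order, which is an assumption made here so that appending on the right keeps the leaves sorted; the class name `LeafChain` and the `capacity` parameter are hypothetical.

```python
class LeafChain:
    """Leaf level of an append-only index tree (inner nodes omitted)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.leaves = [[]]   # each leaf is a list of (value, pointer) pairs

    def append(self, value, pointer):
        """Store a new attribute value and its transaction pointer."""
        rightmost = self.leaves[-1]
        if len(rightmost) >= self.capacity:
            # The leaf is full: open a new rightmost leaf rather than
            # splitting and redistributing the existing entries.
            rightmost = []
            self.leaves.append(rightmost)
        rightmost.append((value, pointer))
```

Avoiding node splits keeps every existing leaf untouched, which suits data that is only ever appended.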

Optionally, the blockchain transaction query is a traceback query;

the query module 1110 is configured to: acquire a time window condition, a query attribute, and an attribute value corresponding to the query attribute for querying the target transaction; determine a block set meeting the time window condition; determine, from the block set meeting the time window condition, a target block set containing the target transactions based on the first-layer index corresponding to the query attribute and the attribute value corresponding to the query attribute; and traverse each block in the target block set based on the second-layer index corresponding to the query attribute and the attribute value corresponding to the query attribute to obtain a set of transaction pointers pointing to the query result.
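The traceback steps listed for the query module can be sketched as follows. This is a minimal illustration under stated assumptions: block timestamps are given as a dictionary, the first-layer index maps each block to a bitmap over shared (l, u) buckets, the second-layer index maps each block to a sorted list of (value, pointer) pairs standing in for index tree leaves, and the function name `traceback_query` is hypothetical.

```python
import bisect

def traceback_query(block_times, buckets, first_layer, second_layer, window, value):
    """Return (block number, transaction pointer) pairs for `value`."""
    start, end = window
    # 1. Blocks whose timestamp satisfies the time window condition.
    in_window = sorted(b for b, t in block_times.items() if start <= t <= end)
    # 2. Histogram buckets whose range covers the queried attribute value.
    hit = [n for n, (l, u) in enumerate(buckets) if l <= value <= u]
    # 3. First-layer filter: keep blocks with a 1 in at least one hit bucket.
    targets = [b for b in in_window if any(first_layer[b][n] for n in hit)]
    # 4. Second-layer lookup: binary search in the sorted leaf entries.
    pointers = []
    for b in targets:
        leaves = second_layer[b]
        pos = bisect.bisect_left(leaves, (value,))
        while pos < len(leaves) and leaves[pos][0] == value:
            pointers.append((b, leaves[pos][1]))
            pos += 1
    return pointers
```

The bitmap test in step 3 can admit a block that has the bucket but not the exact value; such false positives are resolved in step 4, where the lookup simply finds no matching leaf entry.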

Optionally, the blockchain transaction query is an on-chain join query;

the query module 1110 is configured to: acquire a time window condition, a first join table, a second join table, and a join attribute for the on-chain join query; determine a block set meeting the time window condition; determine, from the block set meeting the time window condition, a first block set containing transactions of the first join table and a second block set containing transactions of the second join table based on the first-layer index corresponding to the join attribute; and, if it is determined that the attribute value range of the join attribute in a first block and the attribute value range of the join attribute in a second block have an intersection, perform a sort-merge join between the first block and the second block based on the second-layer index corresponding to the join attribute to obtain a query result; wherein the first block belongs to the first block set, and the second block belongs to the second block set.

Optionally, the blockchain transaction query is a joint query;

the query module 1110 is configured to: acquire a time window condition, a join attribute, an on-chain join table, and a third block set containing transactions of an off-chain join table for the joint query; determine a block set meeting the time window condition; determine, from the block set meeting the time window condition, a fourth block set containing transactions of the on-chain join table based on the first-layer index corresponding to the join attribute; and, if it is determined that the attribute value range of the join attribute in a third block and the attribute value range of the join attribute in a fourth block have an intersection, perform a sort-merge join between the third block and the fourth block based on the second-layer index corresponding to the join attribute to obtain a query result; wherein the third block belongs to the third block set, and the fourth block belongs to the fourth block set.

It should be noted that, the apparatus provided in the present invention can implement all the method steps implemented by the method embodiments and achieve the same technical effects, and detailed descriptions of the same parts and beneficial effects as the method embodiments in this embodiment are omitted here.

Fig. 12 is a schematic structural diagram of an electronic device provided in the present invention. As shown in fig. 12, the electronic device may include: a processor 1210, a communications interface 1220, a memory 1230, and a communication bus 1240, wherein the processor 1210, the communications interface 1220, and the memory 1230 communicate with each other via the communication bus 1240. The processor 1210 may call logic instructions in the memory 1230 to perform the steps of any of the external index-based blockchain query optimization methods provided by the embodiments, for example: creating a hierarchical index based on the specified transaction attributes; the hierarchical index comprises a first-layer index and a second-layer index, the first-layer index is a bitmap index consisting of block numbers and index items corresponding to each block number, the index items are represented by bitmaps, 1 and 0 in the bitmap respectively represent transactions which contain and do not contain corresponding attribute value ranges in the blocks, the second-layer index is a tree-type index consisting of an index tree corresponding to each block, and leaf nodes of the index tree store attribute values of specified transaction attributes contained in the blocks and pointers of transaction storage positions corresponding to the attribute values; and performing block chain transaction query based on the hierarchical index.

In addition, the logic instructions in the memory 1230 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention further provides a computer program product, which includes a computer program stored on a non-transitory computer readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can perform the steps of any one of the external index-based blockchain query optimization methods provided in the above embodiments, for example: creating a hierarchical index based on the specified transaction attributes; the hierarchical index comprises a first-layer index and a second-layer index, the first-layer index is a bitmap index consisting of block numbers and index items corresponding to each block number, the index items are represented by bitmaps, 1 and 0 in the bitmap respectively represent transactions which contain and do not contain corresponding attribute value ranges in the blocks, the second-layer index is a tree-type index consisting of an index tree corresponding to each block, and leaf nodes of the index tree store attribute values of specified transaction attributes contained in the blocks and pointers of transaction storage positions corresponding to the attribute values; and performing block chain transaction query based on the hierarchical index.

In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of any one of the external index-based block chain query optimization methods provided in the foregoing embodiments, for example: creating a hierarchical index based on the specified transaction attributes; the hierarchical index comprises a first-layer index and a second-layer index, the first-layer index is a bitmap index consisting of block numbers and index items corresponding to each block number, the index items are represented by bitmaps, 1 and 0 in the bitmap respectively represent transactions which contain and do not contain corresponding attribute value ranges in the blocks, the second-layer index is a tree-type index consisting of an index tree corresponding to each block, and leaf nodes of the index tree store attribute values of specified transaction attributes contained in the blocks and pointers of transaction storage positions corresponding to the attribute values; and performing block chain transaction query based on the hierarchical index.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A block chain query optimization method based on external indexes is characterized by comprising the following steps:

and performing block chain transaction query based on the hierarchical index.

2. The method of claim 1, wherein the index entries are generated based on equal depth histograms of attribute values and corresponding attributes in the blocks.

3. The method of claim 1, wherein the index tree is a B⁺ tree.

4. The method of claim 1, wherein the method further comprises:

5. The external index-based blockchain query optimization method according to any one of claims 1 to 4, wherein the blockchain transaction query is a traceback query;

determining a set of blocks that satisfy the time window condition;

6. The external index-based blockchain query optimization method according to any one of claims 1 to 4, wherein the blockchain transaction query is an on-chain connection query;

determining a set of blocks that satisfy the time window condition;

7. The external index-based blockchain query optimization method according to any one of claims 1 to 4, wherein the blockchain transaction query is a joint query;

determining a set of blocks that satisfy the time window condition;

8. An apparatus for optimizing a blockchain query based on an external index, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for optimizing external index-based blockchain query according to any one of claims 1 to 7.

10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the external index-based blockchain query optimization method according to any one of claims 1 to 7.