CN114003607A - Block chain data storage method and device - Google Patents

Publication number
CN114003607A
Authority
CN
China
Prior art keywords
data
key
block chain
area
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111337636.1A
Other languages
Chinese (zh)
Inventor
王晓亮
张亚宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xita Technology Co ltd
Original Assignee
Beijing Xita Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xita Technology Co ltd
Priority to CN202111337636.1A
Publication of CN114003607A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2228: Indexing structures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23: Updating
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471: Distributed queries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and device for storing blockchain data are disclosed. The method is applied to a node device of a blockchain, where the storage service configured on the node device comprises an index area and a data area, and the data area is a storage area that records information by sequential writing. The method comprises: writing first blockchain data to be stored into the data area; calculating a hash value of the first blockchain data; and forming a key-value pair with a preset part or all of the hash value as the key and the offset address of the first blockchain data in the data area as the value, and storing the key-value pair in a first key-value table of the index area. With this scheme, blockchain data is stored across the index area and the data area, which can significantly improve the random read performance of blockchain data.

Description

Block chain data storage method and device
Technical Field
The present disclosure relates to the field of blockchain technologies, and in particular, to a method and an apparatus for storing blockchain data.
Background
Blockchain technology, also called distributed ledger technology, is an emerging technology in which multiple computing devices jointly participate in "accounting" and jointly maintain a complete distributed database. Because it is decentralized and transparent, lets every computing device participate in database records, and allows data to be synchronized quickly between computing devices, blockchain technology has been widely used in many fields.
Generally, blockchain data may be organized in a tree structure such as a Merkle tree, and the nodes of the tree structure may be stored as key-value pairs in a key-value database such as LevelDB or RocksDB. However, blockchain scenarios involve a large volume of random reads, and key-value databases usually perform poorly on random reads. A storage method that improves the random read performance of blockchain data is therefore needed.
Disclosure of Invention
In view of the above, the present specification discloses a method and an apparatus for storing blockchain data.
According to a first aspect of the embodiments of the present specification, a method for storing blockchain data is disclosed. The method is applied to a node device of a blockchain, where the storage service configured on the node device includes an index area and a data area, and the data area is a storage area in which information is recorded by sequential writing. The method comprises the following steps:
writing first blockchain data to be stored into the data area;
calculating a hash value of the first blockchain data;
and forming a key-value pair with a preset part or all of the hash value as the key and the offset address of the first blockchain data in the data area as the value, and storing the key-value pair in a first key-value table of the index area.
Optionally, the method further includes:
acquiring a data query instruction, wherein the data query instruction carries a hash value of second block chain data to be queried;
querying, in the first key-value table, a target key-value pair that matches the hash value of the second blockchain data, where matching means that the preset part or all of the hash value of the second blockchain data is the same as the key of the target key-value pair;
and reading, from the data area, the target blockchain data at the target offset address stored in the value of the target key-value pair, and returning the target blockchain data as the queried second blockchain data.
Optionally, a second key-value table that supports storing multiple values under the same key name is also preset in the index area;
using a preset part or all of the hash value as a key includes: using a prefix or suffix of the hash value as the key;
after forming the key-value pair, the method further comprises:
determining whether existing data is present in the storage region of the first key-value table that matches the prefix or suffix of the hash value; and if so, storing both the key-value pair and the existing data in the storage region of the second key-value table that matches the prefix or suffix of the hash value.
Optionally, the method further includes:
determining whether the storage region of the second key-value table that matches the prefix or suffix of the hash value is empty; and if not, appending the key-value pair to that storage region of the second key-value table.
Optionally, when no target key-value pair matching the hash value of the second blockchain data can be found in the first key-value table, the method further includes:
querying, in the second key-value table, at least two candidate key-value pairs that match the hash value prefix or suffix of the second blockchain data;
reading the corresponding at least two pieces of candidate blockchain data from the data area based on the offset addresses recorded in the at least two candidate key-value pairs;
and returning, as the queried second blockchain data, the candidate blockchain data whose hash value matches the full hash value of the second blockchain data.
Optionally, the index area is located in a memory of the node device.
Optionally, the data format used by the data area is a log file of the Kafka system.
According to a second aspect of the embodiments of the present specification, a storage apparatus for blockchain data is disclosed, which is applied to a node device of a blockchain, where a storage service configured by the node device includes an index area and a data area, where the data area is a storage area for recording information using a sequential write mode; the device comprises:
the first writing module is used for writing the first block chain data to be stored into the data area;
the calculating module is used for calculating a hash value of the first block chain data;
and the second writing module is used for forming a key value pair by taking the preset part or all of the hash value as a key and the offset address of the first block chain data in the data area as a value, and storing the formed key value pair into the first key value table of the index area.
According to a third aspect of the embodiments of the present specification, a computer device is disclosed, which comprises at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any of the above embodiments when executing the program.
According to a fourth aspect of embodiments herein, a computer-readable storage medium is disclosed, on which a computer program is stored, which, when executed by a processor, implements the method of any of the above embodiments.
In the above technical solution, on the one hand, the storage service configured on the blockchain node device includes an index area and a data area, and the data area records data by sequential writing. After blockchain data is stored with this scheme, the offset address of the requested blockchain data in the data area can be obtained quickly from the first key-value table of the index area at read time, and the data can then be fetched directly at that offset, so random read efficiency can be significantly improved compared with a conventional key-value database;
on the other hand, since blockchain data normally needs to be written only once and is not subsequently modified or deleted, using sequential writing for the data area can further increase the write speed of blockchain data compared with a conventional key-value database.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with this specification and together with the description, serve to explain the principles.
FIG. 1 is a diagram illustrating an example tree structure of blockchain data in the related art;
FIG. 2 is a flowchart illustrating a method for storing blockchain data according to an embodiment of the present disclosure;
FIG. 3 is a diagram illustrating an example of the logic for querying using two index tables;
FIG. 4 is a diagram illustrating an exemplary structure of a blockchain data storage device according to the present specification;
FIG. 5 is a diagram illustrating an exemplary configuration of a computer device for storing blockchain data according to the present specification.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in one or more embodiments of the present disclosure, the technical solutions in one or more embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in one or more embodiments of the present disclosure. It is to be understood that the described embodiments are only a few, and not all embodiments. All other embodiments that can be derived by one of ordinary skill in the art from one or more embodiments of the disclosure without making any creative effort shall fall within the scope of the disclosure.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of systems and methods consistent with certain aspects of the present description, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present specification. The word "if" as used herein may be interpreted as "upon", "when", or "in response to a determination", depending on the context.
Blockchain technology, also called distributed ledger technology, is an emerging technology in which multiple computing devices jointly participate in "accounting" and jointly maintain a complete distributed database. Because it is decentralized and transparent, lets every computing device participate in database records, and allows data to be synchronized quickly between computing devices, blockchain technology is widely used in many fields.
In blockchain technology, many blockchain models use Merkle trees, or data structures based on Merkle trees, to store and maintain data. Taking Ethereum as an example, Ethereum uses the MPT tree (a Merkle tree variant whose full name is Merkle Patricia Tree, where a Patricia tree can be regarded as a path-compressed trie, or dictionary tree) as the data organization form for organizing and managing important data such as account states and transaction information.
Ethereum designs three MPT trees for the data that needs to be stored and maintained in the blockchain: the MPT state tree, the MPT transaction tree, and the MPT receipt tree. In addition to these three MPT trees, there is also a Storage tree built from the storage content of contract accounts.
The MPT state tree is an MPT tree organized from the account state data of all accounts in the blockchain; the MPT transaction tree is an MPT tree organized from the transaction data in the blockchain; the MPT receipt tree is an MPT tree organized from the transaction receipts generated for each transaction after the transactions in a block are executed. The hash values of the root nodes of the MPT state tree, the MPT transaction tree, and the MPT receipt tree are ultimately added to the block header of the corresponding block.
The MPT transaction tree and the MPT receipt tree correspond to blocks; that is, each block has its own MPT transaction tree and MPT receipt tree. The MPT state tree is a global MPT tree that does not correspond to a specific block but covers the account state data of all accounts in the blockchain.
The organized MPT transaction tree, MPT receipt tree, and MPT state tree are stored in a key-value database (such as LevelDB) that uses a multi-level data storage structure. Referring to FIG. 1, which is a schematic diagram of a tree structure shown in this specification: the nodes of this tree structure include not only the data (Data) but also the hash value (Hash) corresponding to that data. Therefore, when a key-value database is used to store the tree structure, the hash value of a node is typically used as the key and the data of the node as the value, and the two are stored in association.
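As a rough illustration of this conventional layout, the following Python sketch stores the nodes of a tiny two-leaf hash tree in an ordinary dictionary keyed by node hash. It is a simplified binary Merkle tree rather than an MPT, SHA-256 is an assumed choice of hash function, and the names `node_hash` and `kv_store` are hypothetical stand-ins for a real key-value database such as LevelDB.

```python
import hashlib


def node_hash(data: bytes) -> bytes:
    """SHA-256 of a node's data (an assumed choice of hash function)."""
    return hashlib.sha256(data).digest()


# Stand-in for a key-value database: key = node hash, value = node data.
kv_store = {}

# Leaf nodes hold raw data.
leaves = [b"tx1", b"tx2"]
leaf_hashes = []
for leaf in leaves:
    kv_store[node_hash(leaf)] = leaf
    leaf_hashes.append(node_hash(leaf))

# The parent node's data is the concatenation of its children's hashes.
root_data = leaf_hashes[0] + leaf_hashes[1]
root_hash = node_hash(root_data)
kv_store[root_hash] = root_data

# Walking from the root to a leaf costs one lookup by hash per level,
# and each lookup is a random read against the store.
left_child_hash = kv_store[root_hash][:32]
left_leaf = kv_store[left_child_hash]
```

Because every level of the walk is a separate lookup by hash, the read pattern against the underlying store is essentially random.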
However, blockchain scenarios involve a large volume of random reads, and key-value databases usually perform poorly on random reads. A storage method that improves the random read performance of blockchain data is therefore needed.
Based on this, the present specification proposes a technical solution that establishes an index area and a data area separately, records blockchain data in the data area, and builds in the index area an index for quickly looking up the offset address of the blockchain data in the data area.
In implementation, the data stored in the index area may include a first key-value table, in which each key consists of all or part of the hash value of a piece of blockchain data and each value is the offset address of that blockchain data in the data area; the data area may be a storage area in which information is recorded by sequential writing.
In the above technical solution, on the one hand, the storage service configured on the blockchain node device includes an index area and a data area, and the data area records data by sequential writing. After blockchain data is stored with this scheme, the offset address of the requested blockchain data in the data area can be obtained quickly from the first key-value table of the index area at read time, and the data can then be fetched directly at that offset, so random read efficiency can be significantly improved compared with a conventional key-value database;
on the other hand, since blockchain data normally needs to be written only once and is not subsequently modified or deleted, using sequential writing for the data area can further increase the write speed of blockchain data compared with a conventional key-value database. The present specification is described below with reference to specific embodiments and specific application scenarios.
Referring to FIG. 2, FIG. 2 is a flowchart illustrating a method for storing blockchain data according to an embodiment of the present disclosure. The method may be applied to a node device of a blockchain, where the storage service configured on the node device includes an index area and a data area, and the data area is a storage area that records information by sequential writing. The method may comprise the following steps:
S201, writing first blockchain data to be stored into the data area;
S202, calculating a hash value of the first blockchain data;
S203, forming a key-value pair with a preset part or all of the hash value as the key and the offset address of the first blockchain data in the data area as the value, and storing the key-value pair in a first key-value table of the index area.
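The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the data area is modeled as an in-memory `bytearray`, the first key-value table as a `dict`, the hash is SHA-256, the key is the first 4 bytes of the digest, and each record carries a 4-byte length prefix (a framing choice the specification does not prescribe). The function name `store_block_data` is hypothetical.

```python
import hashlib

KEY_BYTES = 4  # assumption: the "preset part" is the first 4 bytes of the hash


def store_block_data(data_area: bytearray, first_table: dict, payload: bytes) -> int:
    """Append payload to the data area and index it; return its offset address."""
    # S201: sequential write, the record goes at the current end of the data area.
    offset = len(data_area)
    # Assumed framing: 4-byte big-endian length prefix so the record can be read back.
    data_area += len(payload).to_bytes(4, "big") + payload
    # S202: hash of the blockchain data.
    digest = hashlib.sha256(payload).digest()
    # S203: key = preset part of the hash, value = offset address.
    first_table[digest[:KEY_BYTES]] = offset
    return offset


data_area = bytearray()
first_table = {}
off_a = store_block_data(data_area, first_table, b"tx: alice -> bob")
off_b = store_block_data(data_area, first_table, b"tx: bob -> carol")
```

Each record is only ever appended, so the write path never seeks backwards in the data area; the index update is a single dictionary assignment.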
The blockchain may be any type of blockchain in which the corresponding data is accessed by hash value. Blockchains are generally divided into three types: public chains (Public Blockchain), private chains (Private Blockchain), and consortium chains (Consortium Blockchain). Combinations of these types are also possible, such as private chain + consortium chain, consortium chain + public chain, and so on.
Of these, the public chain is the most decentralized. Public chains are represented by Bitcoin and Ethereum; participants that join a public chain (also called nodes in the blockchain) can read the data records on the chain, take part in transactions, compete for the right to record new blocks, and so on. Each node can also freely join or leave the network and perform the related operations.
A private chain is the opposite: write permission on the network is controlled by a single organization or institution, and read permission on the data is specified by that organization. Briefly, a private chain may be a weakly centralized system with strict restrictions on nodes and a small number of nodes. This type of blockchain is better suited to internal use within a particular institution.
A consortium chain is a blockchain that sits between a public chain and a private chain and can achieve "partial decentralization". Each node in a consortium chain typically corresponds to a real-world organization or institution; nodes join the network by authorization, form an alliance of stakeholders, and jointly maintain the operation of the blockchain.
Those skilled in the art can apply the technical solutions described in the present specification to the above various types of blockchains according to specific needs.
The blockchain data may include data to be recorded in a blockchain, such as unspent transaction outputs (UTXO), the MPT state tree, the MPT transaction tree, and the MPT receipt tree. It can be understood that, in some cases, in order to use computing resources sensibly, some nodes in a blockchain are not full nodes that store all blockchain data but light nodes that store only the block headers and the transaction details relevant to themselves. A light node can use a Merkle proof to judge whether a transaction is in the current blockchain transaction list, but it actually stores less of the blockchain data addressed by the above technical solution. A person skilled in the art may therefore apply this solution to full nodes, which need to store more data, where the improvement in random read performance is more pronounced.
The data area may be any storage area in which information is recorded by sequential writing, such as a sequentially written text file, binary stream file, or log file. A data area that records information by sequential writing generally has high sequential write performance but, used on its own, poor random read performance. However, those skilled in the art will recognize that, when it is paired with an index table that records offset addresses, the data area can indirectly achieve high random read performance, because each read goes directly to the recorded offset address.
In one illustrated embodiment, the data format used by the data area is the log file of the Kafka system. Kafka is a high-throughput distributed publish-subscribe messaging system written in Scala and Java; its log files usually record information by sequential writing and support reading by offset address.
Using Kafka log files as the data area satisfies the property required of the data area in this technical solution and, where a Kafka system is already deployed in the blockchain system, allows the existing system interfaces to be reused, which reduces the cost of redundant development.
The index area can use any storage form. Generally, the index area occupies far less actual storage space than the data area, so the data in the index area can be stored on a small-capacity, high-performance, higher-cost storage device, while the data in the data area can be stored on a large-capacity, ordinary-performance, lower-cost storage device, thereby balancing system performance and cost.
In one illustrated embodiment, the index area may be located in the memory of the blockchain node device. Memory reads and writes are far faster than those of solid-state drives and mechanical hard disks, so placing the index area in the memory of the blockchain node device can significantly speed up reads and writes of the index area. And because the index area does not occupy much storage space, it does not take up an excessive share of the device's memory; this is a reasonable memory cost and usually does not lead to memory shortage.
In this specification, the node device of the blockchain may first write the first blockchain data to be stored into the data area. Specifically, the first blockchain data to be stored may be data that the node device obtains through the Internet or another input channel and that needs to be written into the blockchain ledger; at the data operation level, the node device may write the first blockchain data into the data area by sequential writing. For example, if the data area uses a plain-text log format, the first blockchain data may be appended to the end of the plain-text log when it is written into the data area.
In this specification, the node device of the blockchain may calculate a hash value of the first blockchain data. A hash algorithm is generally one-way and collision-resistant, so it is generally considered that two pieces of data with different hash values are different, and that two pieces of data with the same hash value are, with high probability, the same data. Since the schemes described in this specification rely on the common properties of hash algorithms rather than on the characteristics of a particular one, the specific hash algorithm need not be strictly limited. In practice, taking calculation speed or security into account, a person skilled in the art may select a suitable algorithm from the many existing hash algorithms, or design a hash algorithm, according to specific requirements.
For example, the SHA-256 algorithm, which produces a 256-bit hash value, is common in the blockchain field; therefore, to avoid redundant computation, the node device of the blockchain may obtain the SHA-256 hash value of the first blockchain data when calculating its hash value.
In this specification, the node device of the blockchain may form a key-value pair with a preset part or all of the hash value as the key and the offset address of the first blockchain data in the data area as the value, and store the key-value pair in the first key-value table of the index area.
It is to be understood that, in some scenarios, the key in the first key-value table need not be the full hash value of the first blockchain data. For example, a hash value obtained with the SHA-256 algorithm is 256 bits long; compared with a shorter hash value, its collision probability is lower, but it places higher demands on storage and query performance. Therefore, when the first key-value table of the index area is built, the key may be the full hash value of the first blockchain data or a part of it, such as its first several bits or last several bits.
Generally, the longer the hash value used as the key, the lower the probability of a hash collision, but the more resources storage and comparison consume, that is, the higher the cost. A person skilled in the art can therefore choose the number of key bits according to specific requirements. Continuing the SHA-256 example above, the 4-byte (32-bit) prefix of the hash value may be used as the key, which balances performance and cost.
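To make the trade-off between key length and collision probability concrete, the sketch below evaluates the standard birthday-bound approximation, P ≈ 1 − exp(−n(n−1)/2^(b+1)), for n stored items and b-bit keys. It assumes the key bits are uniformly distributed, which is a reasonable model for a prefix of a cryptographic hash; the function name `collision_probability` and the item count of 100,000 are illustrative choices, not figures from the specification.

```python
import math


def collision_probability(num_items: int, key_bits: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = num_items * (num_items - 1) / 2 ** (key_bits + 1)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)


p32 = collision_probability(100_000, 32)
p64 = collision_probability(100_000, 64)
p256 = collision_probability(100_000, 256)
for bits, p in ((32, p32), (64, p64), (256, p256)):
    print(bits, p)
```

Under these assumptions, at least one collision among 100,000 items is more likely than not with 32-bit keys, while it is negligible with 64-bit or full 256-bit keys, which is why a scheme that shortens the key also needs a way to handle collisions.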
For example, assume that the resulting SHA-256 hash value is as follows:
0x3a6fed5fc11392b3ee9f81caf017b48640d7458766a8eb0382899a605b41f2b9
Then the first 4 bytes (32 bits), 0x3a6fed5f, can be taken as the key stored in the index area, and the value corresponding to the key may be the offset address of the corresponding first blockchain data in the data area. For example, assuming that 12345678 bytes of data already exist in the data area before the first blockchain data is written, the offset address of the first blockchain data in the data area may be 12345678. It is understood that the specific form of the offset address, such as its number base and step size, can be set by the developer according to specific requirements and need not be limited in this specification.
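The worked example can be reproduced in a few lines of Python. The sketch takes the example hash value and the assumed 12345678 bytes of existing data from the text; representing the key as a big-endian integer and the offset as a byte count are assumptions made for illustration.

```python
# Example SHA-256 hash value from the text, without the 0x prefix.
digest_hex = "3a6fed5fc11392b3ee9f81caf017b48640d7458766a8eb0382899a605b41f2b9"
digest = bytes.fromhex(digest_hex)

# Key: the first 4 bytes (32 bits) of the hash, read as a big-endian integer.
key = int.from_bytes(digest[:4], "big")

# Value: the offset address, equal to the number of bytes already in the data area.
existing_bytes = 12345678
offset = existing_bytes

first_table = {key: offset}
print(hex(key), offset)  # prints: 0x3a6fed5f 12345678
```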
At this point, the first blockchain data is stored in the joint storage structure composed of the index area and the data area. In addition, for the second blockchain data already stored in the joint storage structure, the present specification also discloses method steps for acquiring the second blockchain data from the joint storage structure based on the hash value of the second blockchain data.
In an embodiment shown in this specification, the blockchain node device may first obtain a data query instruction, where the data query instruction carries the hash value of the second blockchain data to be queried. The blockchain node device may then query, in the first key value table, a target key value pair that matches the hash value of the second blockchain data, where the match is established when the preset part or all of the hash value of the second blockchain data is the same as the key of the target key value pair. Finally, the device reads from the data area the target blockchain data corresponding to the target offset address stored in the value of the target key value pair, and returns the target blockchain data as the second blockchain data to be queried.
That is to say, for second blockchain data already stored in the joint storage structure, the blockchain node device may first find the corresponding target key value pair in the index area according to the hash value of the second blockchain data, and read from that pair the target offset address of the second blockchain data in the data area. It may then acquire the target blockchain data, that is, the second blockchain data itself, from the data area by an offset read at the target offset address.
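The query path described above can be sketched as follows. The specification does not state how the length of a record in the data area is determined, so this sketch assumes a hypothetical 4-byte big-endian length header in front of each record; the helper names `append_record`, `read_record`, and `query` are likewise illustrative.

```python
import hashlib
import struct

PREFIX_BYTES = 4  # assumption: keys are the first 4 bytes of the SHA-256 digest


def append_record(data_area: bytearray, payload: bytes) -> int:
    """Append a length-prefixed record and return its offset address."""
    offset = len(data_area)
    data_area.extend(struct.pack(">I", len(payload)))  # hypothetical length header
    data_area.extend(payload)
    return offset


def read_record(data_area: bytearray, offset: int) -> bytes:
    """Read the record that starts at the given offset address."""
    (length,) = struct.unpack_from(">I", data_area, offset)
    start = offset + 4
    return bytes(data_area[start:start + length])


def query(first_table: dict, data_area: bytearray, digest: bytes):
    """Look up the hash prefix in the index, then do an offset read."""
    offset = first_table.get(digest[:PREFIX_BYTES])
    if offset is None:
        return None
    return read_record(data_area, offset)


# Store two records, then fetch each one by its hash value.
data_area = bytearray()
first_table = {}
for payload in (b"block A", b"block B"):
    digest = hashlib.sha256(payload).digest()
    first_table[digest[:PREFIX_BYTES]] = append_record(data_area, payload)

found = query(first_table, data_area, hashlib.sha256(b"block B").digest())
```

The index lookup costs one hash table access and the data read costs one seek, which is the random-read behaviour the scheme aims for.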
It is understood that, where the keys in the first key value table use truncated rather than complete hash values, the probability of a key collision increases significantly; for example, assume two original hash values as follows:
0x3a6fed5fc11392b3ee9f81caf017b48640d7458766a8eb0382899a605b41f2b9
0x3a6fed5fa8eb0382899a605b41f2b9c11392b3ee9f81caf017b48640d7458766
If the key in the first key value table uses the first 32 bits of the hash value, the blockchain data corresponding to these two different hash values will map to the same key. To solve this problem, the storage scheme may optionally include a collision handling step.
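To make the collision concrete, the sketch below checks that the two example hash values share their first 32 bits, and estimates how likely at least one prefix collision is among n stored items. The estimate uses the standard birthday approximation and assumes uniformly distributed prefixes; the function names are illustrative.

```python
import math

KEY_BITS = 32  # width of the truncated key used in the example

HASH_A = "0x3a6fed5fc11392b3ee9f81caf017b48640d7458766a8eb0382899a605b41f2b9"
HASH_B = "0x3a6fed5fa8eb0382899a605b41f2b9c11392b3ee9f81caf017b48640d7458766"


def strip_0x(hex_hash: str) -> str:
    """Drop a leading '0x' or '0X' if present."""
    return hex_hash[2:] if hex_hash.lower().startswith("0x") else hex_hash


def same_prefix(hash_a: str, hash_b: str, bits: int = KEY_BITS) -> bool:
    """True if both hex strings agree on their first `bits` bits."""
    hex_chars = bits // 4  # 4 bits per hex character
    return strip_0x(hash_a)[:hex_chars].lower() == strip_0x(hash_b)[:hex_chars].lower()


def collision_probability(n: int, bits: int = KEY_BITS) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2 * 2**bits))."""
    return -math.expm1(-n * (n - 1) / (2 * 2 ** bits))


collides = same_prefix(HASH_A, HASH_B)          # same 32-bit prefix, different hashes
p_half = collision_probability(77163)           # roughly one half for 32-bit keys
```

Under this approximation, a 32-bit key already has about an even chance of at least one collision once a few tens of thousands of items are stored, which is why a collision handling step is worth having.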
In an embodiment shown in this specification, a second key value table is further preset in the index area. Unlike the first key value table, in which one key name corresponds to exactly one value, the second key value table supports storing multiple values under the same key name, that is, one key name may correspond to multiple values; for example, a key name may correspond to a vector, an array, or a predetermined structure. It is understood that the first key value table may be a map of type Int32-Int32, and the second key value table may be a map of type Int32-Int32[]. However, the first key value table and the second key value table presented in this specification may take any key value structure suited to the development environment, such as a HashMap structure.
In the case where the prefix or suffix of the hash value is used as a key, if it is determined that existing data already occupies the storage area in the first key value table that matches the prefix or suffix of the hash value corresponding to the first blockchain data, the key value pair and the existing data may both be stored in the storage area in the second key value table that matches the prefix or suffix of the hash value. That is, the existing data is migrated from the first key value table to the second key value table, and the new data is stored alongside it. Generally, a key value table that supports only one value per key name, such as the first key value table, works more efficiently than a key value table that supports multiple values per key name, such as the second key value table; because only colliding entries are moved to the second table, the above scheme can handle the hash prefix or suffix collision problem without significantly affecting system efficiency.
In one embodiment shown, the method further comprises determining whether the storage area in the second key value table that matches the prefix or suffix of the hash value is empty. If it is not empty, data matching the prefix or suffix of the hash value already exists in the second key value table; in this case, the key value pair may be appended to the storage area in the second key value table that matches the prefix or suffix of the hash value.
For example, assuming that the storage area corresponding to the hash prefix 0x3a6fed5f in the second key value table is not empty, this shows that the offset addresses of at least two blockchain data with the hash prefix 0x3a6fed5f are already stored in that storage area; therefore, the key value pair to be stored may be appended to the storage area corresponding to the hash prefix 0x3a6fed5f in the second key value table.
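The storage-side collision handling described above can be sketched as follows, with a `dict` of integers standing in for the first key value table (Int32-Int32) and a `dict` of lists standing in for the second (Int32-Int32[]). The function name `put` and the order of the two checks are assumptions made for illustration.

```python
def put(first_table: dict, second_table: dict, key: int, offset: int) -> None:
    """Store (key, offset), moving colliding entries to the second table."""
    bucket = second_table.get(key)
    if bucket:
        # The second table already holds this key: append the new offset.
        bucket.append(offset)
    elif key in first_table:
        # First collision on this key: migrate the existing entry together
        # with the new one, leaving nothing behind in the first table.
        second_table[key] = [first_table.pop(key), offset]
    else:
        # No collision: the single-value table is enough.
        first_table[key] = offset


first_table, second_table = {}, {}
put(first_table, second_table, 0x3A6FED5F, 100)  # stored in the first table
put(first_table, second_table, 0x3A6FED5F, 200)  # collision: both move to the second table
put(first_table, second_table, 0x3A6FED5F, 300)  # appended to the existing bucket
put(first_table, second_table, 0x00000001, 400)  # unrelated key stays in the first table
```

Keeping non-colliding keys in the single-value table means the common case never touches the list-valued structure.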
It is understood that other schemes may also be adopted to determine whether the index area already contains existing data that conflicts with the key value pair of the first blockchain data currently to be stored. For example, when a collision occurs, a preset mark may be added at the corresponding position in the first key value table; if the mark is later found at that position, it is determined that a hash prefix or suffix collision has previously occurred there, and the key value pair corresponding to the first blockchain data to be stored may be written directly into the second key value table. Those skilled in the art can design alternative schemes for handling hash collisions according to specific requirements.
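A minimal sketch of the marking variant follows. The reserved value `COLLIDED = -1` is a hypothetical choice of preset mark, valid here only because real offset addresses are assumed to be non-negative; the function name is also illustrative.

```python
COLLIDED = -1  # hypothetical preset mark; real offsets are assumed to be >= 0


def put_with_mark(first_table: dict, second_table: dict, key: int, offset: int) -> None:
    """Store (key, offset), leaving a mark in the first table after a collision."""
    existing = first_table.get(key)
    if existing is None:
        # Key not seen before: store it in the single-value table.
        first_table[key] = offset
    elif existing == COLLIDED:
        # Mark found: a collision happened earlier, go straight to the second table.
        second_table[key].append(offset)
    else:
        # First collision: move the old offset, store the new one, and leave the mark.
        second_table[key] = [existing, offset]
        first_table[key] = COLLIDED


first_table, second_table = {}, {}
put_with_mark(first_table, second_table, 0x3A6FED5F, 0)
put_with_mark(first_table, second_table, 0x3A6FED5F, 64)
put_with_mark(first_table, second_table, 0x3A6FED5F, 128)
```

With the mark in place, a reader or writer learns from a single lookup in the first table whether the second table needs to be consulted.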
In an embodiment shown, after the above processing scheme for prefix or suffix collisions is applied, if no target key value pair matching the hash value of the second blockchain data can be found in the first key value table during a data query, at least two candidate key value pairs matching the hash value prefix or suffix of the second blockchain data may further be queried from the second key value table. The corresponding at least two candidate blockchain data are then read from the data area based on the offset addresses recorded in the at least two candidate key value pairs, and the candidate whose hash value matches the full hash value of the second blockchain data is returned as the second blockchain data to be queried.
Referring to fig. 3, fig. 3 illustrates an example of the query logic using two index tables. In this example, two hash values share the prefix 0x3a6fed5f and therefore collide. The blockchain node device may extract the hash value "0x3a6fed5fc11392b3ee9f81caf017b48640d7458766a8eb0382899a605b41f2b9" of the second blockchain data to be queried from the data query request, and obtain from it the hash prefix 0x3a6fed5f. Because the conflicting data was migrated from the first key value table to the second key value table in the storage stage, only a null value can be found in the first key value table of the index area, whereas two candidate key value pairs corresponding to the hash prefix 0x3a6fed5f can be found in the second key value table. This yields two offset addresses, from which two candidate blockchain data are read from the data area. The complete hash values of the two candidates are then calculated, giving two hash values with the same prefix but different subsequent content. These two hash values are compared with the hash value of the second blockchain data carried in the data query instruction, and the candidate whose hash value matches is the second blockchain data to be queried.
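The two-table query logic can be sketched as below. As in any such sketch, the record framing (a 4-byte length header) and the helper names are assumptions; the `prefix_bytes` parameter is added only so that a collision can be demonstrated with a short prefix.

```python
import hashlib
import struct


def append_record(data_area: bytearray, payload: bytes) -> int:
    """Append a length-prefixed record (hypothetical framing) and return its offset."""
    offset = len(data_area)
    data_area.extend(struct.pack(">I", len(payload)))
    data_area.extend(payload)
    return offset


def read_record(data_area: bytearray, offset: int) -> bytes:
    """Read the record stored at the given offset address."""
    (length,) = struct.unpack_from(">I", data_area, offset)
    return bytes(data_area[offset + 4:offset + 4 + length])


def lookup(first_table, second_table, data_area, digest, prefix_bytes=4):
    """Query the first table, then fall back to the second with a full-hash check."""
    key = digest[:prefix_bytes]
    offset = first_table.get(key)
    if offset is not None:
        return read_record(data_area, offset)
    # Fallback: several candidates share the prefix, so recompute each
    # candidate's complete hash and compare it with the requested hash.
    for candidate_offset in second_table.get(key, []):
        candidate = read_record(data_area, candidate_offset)
        if hashlib.sha256(candidate).digest() == digest:
            return candidate
    return None
```

A call such as `lookup(first_table, second_table, data_area, digest)` returns the stored bytes or `None`. The text only requires the full-hash comparison on the second-table path; an implementation could also verify hits from the first table in the same way, at the cost of one extra hash computation per query.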
The foregoing is all embodiments of the present specification directed to the method for storing blockchain data. Based on the above embodiments, it can be seen that the method for storing blockchain data described in the present specification can significantly improve the random read efficiency compared to the conventional key value pair database, and can further improve the write speed of blockchain data.
The present specification also provides an embodiment of a storage apparatus for corresponding blockchain data as follows:
the present specification provides a storage apparatus for blockchain data, which is applied to a node device of a blockchain, where a storage service configured by the node device includes an index area and a data area, where the data area is a storage area for recording information using a sequential write-in manner; an example of the structure of the device for storing block chain data is shown in fig. 4, and the device may include the following modules:
a first writing module 401, configured to write the first blockchain data to be stored into the data area;
a calculating module 402, configured to calculate a hash value of the first blockchain data;
a second writing module 403, configured to form a key value pair by using a preset part or all of the hash value as a key and the offset address of the first blockchain data in the data area as a value, and to store the formed key value pair in the first key value table of the index area.
Embodiments of the present specification further provide a computer device, which at least includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the foregoing storage method of blockchain data when executing the program.
Fig. 5 is a schematic diagram illustrating a more specific hardware structure of a computing device according to an embodiment of the present disclosure, where the computing device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Embodiments of the present specification further provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the foregoing storage method for blockchain data.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
From the above description of the embodiments, it is clear to those skilled in the art that the embodiments of the present disclosure can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments of the present specification.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in the present specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiment is described relatively briefly because it is substantially similar to the method embodiment, and reference may be made to the corresponding descriptions of the method embodiment for relevant points. The above-described apparatus embodiments are merely illustrative. The modules described as separate components may or may not be physically separate, and when the embodiments of the present disclosure are implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the scheme of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The foregoing is only a specific implementation of the embodiments of the present disclosure. It should be noted that those skilled in the art can make a number of improvements and refinements without departing from the principle of the embodiments of the present disclosure, and these improvements and refinements should also be regarded as falling within the protection scope of the embodiments of the present disclosure.

Claims (10)

1. A storage method of block chain data is applied to node equipment of a block chain, wherein a storage service configured by the node equipment comprises an index area and a data area, wherein the data area is a storage area for recording information by using a sequential writing mode; the method comprises the following steps:
writing first block chain data to be stored into the data area;
calculating a hash value of the first blockchain data;
and taking the preset part or all of the hash value as a key, taking the offset address of the first block chain data in the data area as a value, forming a key value pair, and storing the formed key value pair into a first key value table of the index area.
2. The method of claim 1, further comprising:
acquiring a data query instruction, wherein the data query instruction carries a hash value of second block chain data to be queried;
querying, in the first key value table, a target key value pair that matches the hash value of the second blockchain data; wherein the matching comprises: the preset part or all of the content of the hash value of the second block chain data is the same as the key of the target key value pair;
and reading target block chain data corresponding to the target offset address stored in the value of the target key value pair from the data area, and returning the target block chain data as second block chain data to be inquired.
3. The method according to claim 2, wherein a second key value table that supports storing multiple values under the same key name is preset in the index area;
the using a preset part or all of the hash value as a key includes: using the prefix or suffix of the hash value as a key;
after composing the key-value pair, the method further comprises:
determining whether existing data exists in a storage region in the first key value table that matches a prefix or a suffix of the hash value; and if so, storing the key-value pairs and the existing data into a storage area matched with the prefix or suffix of the hash value in the second key-value table.
4. The method of claim 3, further comprising:
determining whether a storage area in the second key value table that matches a prefix or a suffix of the hash value is empty; and if not, appending the key-value pair to the storage area matched with the prefix or suffix of the hash value in the second key value table.
5. The method of claim 4, wherein, in the event that a target key-value pair matching the hash value of the second blockchain data cannot be queried from within the first key-value table, the method further comprises:
querying, within the second key value table, at least two candidate key value pairs that match a hash value prefix or suffix of the second blockchain data;
reading corresponding at least two alternative block chain data from the data area based on the offset addresses recorded in the at least two alternative key value pairs respectively;
and returning the candidate block chain data of which the hash value is matched with the full text of the hash value of the second block chain data in the at least two candidate block chain data as the second block chain data to be inquired.
6. The method of claim 4, wherein the index area is located in a memory of the node device.
7. The method of claim 1, wherein the data format used by the data area is a log file of the Kafka system.
8. A storage device of block chain data is applied to node equipment of a block chain, wherein a storage service configured by the node equipment comprises an index area and a data area, wherein the data area is a storage area for recording information by using a sequential writing mode; the device comprises:
the first writing module is used for writing the first block chain data to be stored into the data area;
the calculating module is used for calculating a hash value of the first block chain data;
and the second writing module is used for forming a key value pair by taking the preset part or all of the hash value as a key and the offset address of the first block chain data in the data area as a value, and storing the formed key value pair into the first key value table of the index area.
9. A computer device comprising at least a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
CN202111337636.1A 2021-11-12 2021-11-12 Block chain data storage method and device Pending CN114003607A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111337636.1A CN114003607A (en) 2021-11-12 2021-11-12 Block chain data storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111337636.1A CN114003607A (en) 2021-11-12 2021-11-12 Block chain data storage method and device

Publications (1)

Publication Number Publication Date
CN114003607A true CN114003607A (en) 2022-02-01

Family

ID=79928694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111337636.1A Pending CN114003607A (en) 2021-11-12 2021-11-12 Block chain data storage method and device

Country Status (1)

Country Link
CN (1) CN114003607A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination