Detailed Description
For a better understanding of the technical solutions, the embodiments of the present specification are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments and the specific features in them are detailed descriptions of the technical solutions of the present specification rather than limitations on them, and that the embodiments and their technical features may be combined with each other where no conflict arises.
Referring to fig. 1, a schematic view of an application scenario of the data storage control method according to an embodiment of the present specification is shown. The client 10 communicates with the blockchain system 20, and the blockchain system 20 receives data from the client 10 and writes the data to the chain (the "uplink" operation).
A blockchain is a technical system that is jointly maintained by multiple parties, uses cryptography to secure transmission and access, and provides consistent, tamper-resistant, and non-repudiable data storage. A typical blockchain stores data in a chain-of-blocks structure.
Data on the blockchain differs in its real-time requirements and can be classified into real-time data and non-real-time data. Real-time data generally refers to business data, for example, data with a high real-time requirement for transaction processing. Non-real-time data may include evidence storage data and state description data: evidence storage data is data used for indexing or marking the position of the business data, such as key/value (kv) data, and state description data is data describing state information of the business data.
The data uplink process includes three stages: an acceptance stage, a consensus stage, and a storage stage. In the acceptance stage, the data to be put on the chain is received and accepted by a blockchain node in the blockchain network. In the consensus stage, after that node has accepted the data, the other blockchain nodes in the network take part in consensus processing on the data, and the data enters the storage stage only after passing consensus. In the storage stage, the blockchain nodes write the data that has passed consensus to the chain.
The control method for data storage provided in the embodiments of the present specification is used to control storage of non-real-time data for a blockchain system.
In a first aspect, an embodiment of the present specification provides a data storage control method, which is used for controlling storage of non-real-time data for a blockchain system. Referring to fig. 2, the method for controlling data storage includes S201 to S205.
S201: Receive non-real-time data.
As mentioned above, non-real-time data is data with a low real-time requirement and includes evidence storage data and state description data.
Taking evidence storage data as an example, receiving may proceed as follows: the blockchain system receives, from the client, evidence storage data for blockchain business data. Evidence storage data may be understood as data for indexing or marking the location of business data, including, for example, key/value (kv) data. The blockchain evidence storage service is an important service form in blockchain technology; it archives submitted transaction data and later provides proof of existence based on the transaction hash.
S202: Verify the non-real-time data.
To ensure that data transmitted between two blockchain nodes is correct and unaltered, the data needs to be verified. A transaction hash (or block hash) is the hash value obtained by applying a specific hash algorithm to a transaction (or a block header) in blockchain technology, and is commonly used for retrieving and verifying related information.
The process of verification includes, for example:
at the node that records the transaction (denoted node 001), obtaining the transaction information (including evidence storage data or other data such as transaction data) and calculating hash value 001 of the transaction information; signing hash value 001 with the private key of node 001 to generate a signature; and broadcasting "transaction information + signature";
at the node that verifies the transaction (denoted node 002), obtaining the broadcast "transaction information + signature"; decrypting the signature with the public key of node 001 to obtain hash value 001; performing a hash calculation on the transaction information to obtain hash value 002; and comparing hash value 001 with hash value 002. If they are consistent, the verification passes; otherwise, the verification fails.
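The sign-then-compare check described above can be sketched as follows. This is a minimal illustration only: it uses textbook RSA with a toy key built from two known Mersenne primes so that it runs with the Python standard library, which is an assumption of this sketch and not something the specification prescribes. The names `tx_hash`, `sign`, and `verify` are hypothetical, and a real node would use a vetted signature library and key sizes.

```python
import hashlib

# Toy RSA key for node 001 (assumption for illustration; far too small for real use).
P = 2**61 - 1                              # known Mersenne prime
Q = 2**89 - 1                              # known Mersenne prime
N = P * Q                                  # public modulus
E = 65537                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # private exponent (needs Python 3.8+)


def tx_hash(tx_info: bytes) -> int:
    """Hash the transaction information and reduce it below the modulus."""
    return int.from_bytes(hashlib.sha256(tx_info).digest(), "big") % N


def sign(tx_info: bytes) -> int:
    """Node 001: compute hash value 001 and sign it with the private key."""
    return pow(tx_hash(tx_info), D, N)


def verify(tx_info: bytes, signature: int) -> bool:
    """Node 002: recover hash value 001 and compare it with hash value 002."""
    hash_001 = pow(signature, E, N)        # recovered with node 001's public key
    hash_002 = tx_hash(tx_info)            # recomputed from the broadcast data
    return hash_001 == hash_002


tx = b'{"key": "tx-42", "value": "evidence"}'
sig = sign(tx)                             # broadcast content is (tx, sig)
print(verify(tx, sig))                     # untampered broadcast
print(verify(tx + b"!", sig))              # transaction information altered in transit
```

Any change to the transaction information changes hash value 002, so the comparison fails even though the signature itself is untouched.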
In the embodiments of the present specification, this applies both to the node that records the transaction and to the node that verifies it: whichever node it is, as soon as the verification passes, step S203 is executed and the operation of storing the non-real-time data starts immediately.
S203: After the verification passes, perform the following two steps asynchronously:
S203a: Store the non-real-time data in a database of the blockchain system.
S203b: Perform consensus processing on the non-real-time data.
A core part of blockchain technology is its ledger database. A conventional database uses a client-server (CS) network structure, in which users can modify the data. Control of the database rests with a central authority, such as a company or an institution, which grants access to the database after verifying the client's identity; a conventional database therefore bears clear marks of a centralized service. A blockchain database is different: it is composed of many distributed, decentralized nodes. All nodes participate in data management, any data added to the ledger database is confirmed by the nodes, and the ledger is open and transparent to all nodes. In Bitcoin, for example, transaction data can be added to the ledger only after consensus is reached and the nodes have confirmed it, at which point it enters a block. The consensus algorithm secures the network and makes it tamper-resistant. Besides competitive proof of work (PoW), consensus mechanisms include proof of stake (PoS), delegated proof of stake (DPoS), and others.
In the related art, the blockchain system processes non-real-time data synchronously: it first performs consensus processing similar to S203b above and, only once the non-real-time data has passed consensus among the nodes in the blockchain system, performs a storage operation similar to S203a above, thereby persisting the non-real-time data to disk. The storage operation in the related art is thus gated by the consensus process, which creates a performance bottleneck for storing non-real-time data and limits the data throughput of the blockchain system.
The present specification processes non-real-time data asynchronously: S203a and S203b are executed asynchronously and do not affect each other. Each node in the blockchain system can store the non-real-time data directly in the database by executing S203a, without waiting for the consensus result of S203b for that data. This achieves efficient logical parallelism in the cluster system and significantly improves system throughput, and is particularly suitable for a consortium blockchain with a distributed architecture.
In an alternative implementation, immediate storage of the non-real-time data may be implemented as follows:
(1) creating an asynchronous commit thread;
(2) when the verification passes, starting the asynchronous commit thread immediately and inserting the non-real-time data into the database of the blockchain system.
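A minimal sketch of this arrangement, using Python threads, is given below. The in-memory dictionary standing in for the database, the two-thirds voting rule standing in for consensus, and the names `store`, `run_consensus`, and `on_verified` are all assumptions made for illustration; for brevity the commit thread is created and started in the same function that is called once verification has passed.

```python
import threading
import time

database = {}                      # stand-in for the blockchain system's database
db_lock = threading.Lock()
consensus_results = {}             # tx_hash -> True/False once consensus finishes


def store(tx_hash, data):
    """S203a: insert the non-real-time data without waiting for consensus."""
    with db_lock:
        database[tx_hash] = data


def run_consensus(tx_hash, votes):
    """S203b: placeholder consensus; passes if more than 2/3 of votes approve."""
    time.sleep(0.01)               # simulated network round trips
    consensus_results[tx_hash] = 3 * sum(votes) > 2 * len(votes)


def on_verified(tx_hash, data, votes):
    """Called once S202 has passed: start both steps; neither waits on the other."""
    commit = threading.Thread(target=store, args=(tx_hash, data))
    consensus = threading.Thread(target=run_consensus, args=(tx_hash, votes))
    commit.start()
    consensus.start()
    return commit, consensus


commit, consensus = on_verified("tx-1", b"evidence", [True, True, True, False])
commit.join()                      # the write completes on its own thread
consensus.join()                   # consensus finishes independently
print(database, consensus_results)
```

Because the write never blocks on `run_consensus`, data whose consensus later fails is still present in the database, which is why a separate clean-up step is needed.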
S204: Determine abnormal data according to the consensus processing result, and manage the storage of the abnormal data in the database.
Because the embodiments of the present specification store all non-real-time data in the database asynchronously, without regard to the consensus result, data that has been stored in the database but failed consensus can be managed according to the consensus result to ensure the validity of the data.
In an alternative, after the verification passes, consensus processing is performed while the non-real-time data is being stored. Abnormal data is then determined according to the consensus result, and its storage in the database is managed.
The consensus mechanism is a core mechanism in blockchain technology. In a blockchain, "consensus" means that the participants share a common understanding of the blockchain state. Because the blockchain is decentralized, any decision, state, or change requires all nodes (participants) to use some mechanism together to reach the same understanding; this is the consensus mechanism of the blockchain, also called the consensus algorithm. The consensus algorithm in the embodiments of the present specification includes, but is not limited to: proof of work (PoW), proof of stake (PoS), delegated proof of stake (DPoS), practical Byzantine fault tolerance (PBFT), delegated Byzantine fault tolerance (dBFT), and the like.
To remove abnormal data that failed consensus, the abnormal data is first determined according to the consensus result, and abnormal mark information corresponding to the abnormal data is inserted into the database; then, at an appropriate time (for example, when the database is idle), the abnormal data is identified and deleted according to the abnormal mark information.
In an alternative, determining abnormal data based on the consensus processing result includes:
(1) inserting, into the database, abnormal mark information corresponding to the non-real-time data of a failed block, that is, a block for which consensus or the uplink operation failed;
(2) determining the corresponding abnormal data in the database according to the abnormal mark information.
The abnormal mark information is flag information for marking abnormal non-real-time data. It includes, for example, an abnormal hash value; alternatively, it may include the block height of the failed block in addition to the abnormal hash value. The block height is added so that hash collisions across a large range of data can be avoided.
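As a rough sketch of how abnormal mark information might be recorded and matched, the following keys each stored record by the pair (hash value, block height). The dictionary layout, the sample records, and the names `AbnormalMark`, `mark_failed_block`, and `find_abnormal` are hypothetical; the repeated hash `aa11` is contrived to show why the block height helps.

```python
from typing import NamedTuple


class AbnormalMark(NamedTuple):
    """Abnormal mark information: abnormal hash value plus failed block height."""
    tx_hash: str
    block_height: int


# Stand-in database keyed by (tx_hash, block_height).
database = {
    ("aa11", 100): b"evidence A",
    ("bb22", 101): b"evidence B",
    ("aa11", 205): b"evidence C",   # same hash value, different block
}
marks = set()                       # abnormal mark information kept alongside the data


def mark_failed_block(block_height, tx_hashes):
    """Step (1): insert a mark for every record in a block that failed."""
    for tx_hash in tx_hashes:
        marks.add(AbnormalMark(tx_hash, block_height))


def find_abnormal(db, abnormal_marks):
    """Step (2): determine the abnormal records from the marks."""
    return [key for key in db if AbnormalMark(*key) in abnormal_marks]


mark_failed_block(100, ["aa11"])
print(find_abnormal(database, marks))
```

With the hash value alone, the record in block 205 would also be treated as abnormal; including the block height keeps it out of the result.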
In an alternative, managing the storage of the abnormal data in the database may be: deleting the abnormal data when the blockchain system is idle, within a preset time period, or when triggered by a preset event.
For example, whether the system is idle can be determined by monitoring the system memory state, and the abnormal data is deleted while the system is idle; a deletion time period may also be preset, for example, cleaning abnormal data once a week during a fixed period (such as a period of time in the morning); alternatively, deletion of the abnormal data is started when a certain event occurs (for example, insufficient storage capacity of the database).
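The three triggers can be combined in a small policy function, sketched below. The concrete thresholds (no pending tasks as the idle signal, a Sunday 02:00 to 04:00 window, 90% database usage) and the names `should_clean` and `clean` are assumptions chosen for illustration; the specification does not fix any of them.

```python
from datetime import datetime, time


def should_clean(now, pending_tasks, db_used_ratio):
    """Return True if any of the three deletion triggers applies."""
    idle = pending_tasks == 0                          # system idle (assumed signal)
    in_window = (now.weekday() == 6                    # preset weekly period: Sunday
                 and time(2, 0) <= now.time() < time(4, 0))
    nearly_full = db_used_ratio >= 0.9                 # preset event: low capacity
    return idle or in_window or nearly_full


def clean(db, marks):
    """Delete every marked record from db, clear the marks, return the count."""
    removed = 0
    for key in list(marks):
        if db.pop(key, None) is not None:
            removed += 1
        marks.discard(key)
    return removed


db = {"aa11": b"failed consensus", "bb22": b"ok"}
marks = {"aa11"}
if should_clean(datetime(2024, 1, 7, 3, 0), pending_tasks=5, db_used_ratio=0.2):
    clean(db, marks)                                   # runs: Sunday, inside the window
print(db)
```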
In an alternative, different nodes may be controlled to execute S203a (storing the non-real-time data in the database) and S203b (performing consensus processing on the non-real-time data) respectively. Handling storage and consensus on different nodes (physical devices) improves processing efficiency: for example, storage control is performed on node A and consensus processing on node B. Storage control depends closely on the input/output (I/O) performance of the computer, whereas consensus processing depends closely on its CPU throughput, so a physical device better suited to each step can be selected to handle it exclusively, without the two affecting each other. Of course, the two steps may also be placed on the same physical device.
It can be seen that, to achieve efficient storage and higher system throughput, the embodiments of the present specification start storing the non-real-time data in the database immediately after data verification passes, rather than after other operations (such as consensus processing or virtual machine processing) are completed. Breaking with the original synchronous execution mode (consensus ordering first, then persisting the data to disk) achieves efficient logical parallelism in the cluster system and thus significantly improves system throughput, which is particularly suitable for a consortium blockchain with a distributed architecture.
Referring to fig. 3, a schematic diagram of an example of a data storage control method according to the first aspect of the embodiments of the present disclosure is shown.
In this embodiment, the storage of evidence storage data is described by taking a distributed kv database of a blockchain system as an example. Distributed kv (key/value), also called distributed key-value storage, refers to a kv database system maintained by a cluster of computers, in which data is stored on each node of the network rather than in a single kv store.
First, in step 301, the blockchain system receives evidence storage data from the client. In step 302, the evidence storage data is verified. Steps 303a and 303b are then executed asynchronously, with no required execution order or dependency between them: step 303a inserts the evidence storage data into the kv database, and while step 303a is running, step 303b may perform consensus processing on the evidence storage data at the same time. In step 304, the data that has undergone consensus processing is computed by a virtual machine. In step 305, the kv database is updated according to the abnormal mark information. In step 306, the kv database asynchronously cleans up the abnormal evidence storage data according to the abnormal mark information; for example, when the system is idle, a utility program asynchronously deletes the invalid Tx-hash key-value pairs from the transaction database according to the local abnormal mark information.
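The flow of steps 301 to 306 can be condensed into the following sketch. It is illustrative only: a plain dictionary stands in for the distributed kv database, verification is reduced to a hash comparison, consensus is a placeholder two-thirds rule, the virtual machine computation of step 304 is omitted, and all function and variable names are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

kv_db = {}                                  # stand-in for the distributed kv database
abnormal_marks = set()                      # local abnormal mark information


def verify(record):
    """Step 302 (simplified): the Tx-hash must match the stored value."""
    return hashlib.sha256(record["value"]).hexdigest() == record["tx_hash"]


def insert_kv(record):
    """Step 303a: write the Tx-hash key-value pair."""
    kv_db[record["tx_hash"]] = record["value"]


def consensus(record, approvals, nodes):
    """Step 303b: placeholder rule, more than 2/3 of nodes must approve."""
    return 3 * approvals > 2 * nodes


def process(record, approvals, nodes=4):
    """Steps 301 to 305 for one record received from the client."""
    if not verify(record):
        return False
    with ThreadPoolExecutor(max_workers=2) as pool:
        write = pool.submit(insert_kv, record)             # 303a and 303b are
        agreed = pool.submit(consensus, record, approvals, nodes)  # submitted together
        write.result()
        ok = agreed.result()
    if not ok:
        abnormal_marks.add(record["tx_hash"])              # step 305
    return ok


def clean_when_idle():
    """Step 306: delete invalid Tx-hash key-value pairs, then drop the marks."""
    while abnormal_marks:
        kv_db.pop(abnormal_marks.pop(), None)


good = {"value": b"doc-1", "tx_hash": hashlib.sha256(b"doc-1").hexdigest()}
bad = {"value": b"doc-2", "tx_hash": hashlib.sha256(b"doc-2").hexdigest()}
process(good, approvals=3)
process(bad, approvals=1)                   # stored first, marked after consensus fails
clean_when_idle()
print(sorted(kv_db.values()))
```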
It can be seen that, in this embodiment, for evidence-storage transactions whose data is a transaction-hash key-value pair, the original synchronous execution mode (consensus ordering first, then persisting the data to disk) is abandoned: after the hash calculation, the data is immediately and asynchronously submitted to the distributed kv system for writing. The local consensus process and the distributed kv write thus run asynchronously and concurrently, achieving efficient logical concurrency in the cluster system and significantly improving system throughput. Transactions that fail consensus are recorded in a local file, and a utility program later removes them from the database in a batch, asynchronously, when the system is idle. The embodiments of the present specification are applicable to consortium (permissioned) blockchain systems, and in particular to consortium blockchain systems with a distributed architecture.
In a second aspect, based on the same inventive concept, an embodiment of the present specification provides a data storage control apparatus for controlling storage of non-real-time data for a blockchain system. Referring to fig. 4, the apparatus includes:
a data receiving unit 401, configured to receive non-real-time data;
a verification unit 402, configured to verify the non-real-time data;
an asynchronous execution unit 403, configured to, after the verification passes, asynchronously execute the following two steps: storing the non-real-time data in a database of the blockchain system, and performing consensus processing on the non-real-time data.
In an alternative, the apparatus further includes:
an asynchronous commit thread creating unit 404 for creating an asynchronous commit thread;
the asynchronous execution unit 403 inserts the non-real-time data into the database of the blockchain system by starting the asynchronous commit thread.
In an alternative, the apparatus further includes:
an abnormal data determination unit 405 configured to determine abnormal data according to the consensus processing result;
an exception management unit 406, configured to manage storage of the exception data in the database.
In an optional manner, the abnormal data determining unit 405 is specifically configured to: insert, into the database, abnormal mark information corresponding to the non-real-time data of a failed block for which consensus or the uplink operation failed; and determine the corresponding abnormal data in the database according to the abnormal mark information.
In an optional manner, the exception flag information includes an exception hash value, or the exception flag information includes an exception hash value and block height information of the failed block.
In an optional manner, the exception management unit 406 is specifically configured to: delete the abnormal data when the blockchain system is idle, within a preset time period, or when triggered by a preset event.
In an alternative, the operation of storing the non-real-time data in the database of the blockchain system and the operation of performing consensus processing on the non-real-time data are performed by different nodes.
In an alternative, the non-real-time data includes evidence storage data or state description data.
In a third aspect, based on the same inventive concept as the data storage control method in the foregoing embodiments, the present invention further provides a server, as shown in fig. 5, including a memory 504, a processor 502, and a computer program stored on the memory 504 and executable on the processor 502, wherein the processor 502 implements the steps of any one of the foregoing data storage control methods when executing the program.
Fig. 5 shows a bus architecture (represented by bus 500). Bus 500 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors, represented by processor 502, and memory, represented by memory 504. The bus 500 may also link together various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface 506 provides an interface between the bus 500 and the receiver 501 and transmitter 503. The receiver 501 and the transmitter 503 may be the same element, i.e. a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 502 is responsible for managing the bus 500 and general processing, and the memory 504 may be used for storing data used by the processor 502 in performing operations.
In a fourth aspect, based on the inventive concept of the data storage control method as in the previous embodiments, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of any one of the above-described data storage control methods.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present specification have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all changes and modifications that fall within the scope of the specification.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present specification without departing from the spirit and scope of the specification. Thus, if such modifications and variations of the present specification fall within the scope of the claims of the present specification and their equivalents, the specification is intended to include such modifications and variations.