CN115378788A

CN115378788A - Block chain performance self-adaptive optimization method based on hierarchical consensus and reinforcement learning

Info

Publication number: CN115378788A
Application number: CN202211004846.3A
Authority: CN
Inventors: 王孟鑫; 陈世展
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2022-08-22
Filing date: 2022-08-22
Publication date: 2022-11-22
Anticipated expiration: 2042-08-22
Also published as: CN115378788B

Abstract

The invention discloses a blockchain performance adaptive optimization method based on layered consensus and reinforcement learning. The method is built on a reinforcement learning decision module comprising a layered consensus module and a network security evaluation module, where the layered consensus module is a consensus algorithm module based on a network layering model and includes a trust evaluation submodule. The optimization method comprises the following steps: (1) dividing the nodes in the consensus process into a main consensus group and a sub consensus group cluster through a network node layering module, the sub consensus group cluster comprising a plurality of sub consensus groups; (2) evaluating the behavior of each node in the consensus process through a trust model in the trust evaluation submodule, thereby realizing trust evaluation and trust election; (3) calculating the security constraint on the grouping quantity and the security constraint on the time delay through the network security evaluation module; (4) realizing blockchain performance adaptive optimization through the reinforcement learning decision module. The invention optimizes network performance while ensuring the security of the blockchain network and realizes adaptive adjustment.

Description

Block chain performance self-adaptive optimization method based on hierarchical consensus and reinforcement learning

Technical Field

The invention relates to the technical field of block chains, in particular to a consensus algorithm based on network layering and a block chain performance optimization method based on deep reinforcement learning.

Background

The traditional blockchain architecture suffers from insufficient scalability: as the complexity of block verification and the size of the blockchain increase, transaction processing slows down and the blockchain becomes more difficult to manage. When a blockchain network supports an Internet of Things network that generates large amounts of data, the blockchain network must offer high throughput in order to process the mass of data transactions caused by the growing number of Internet of Things devices. Blockchain technology faces a trilemma: decentralization, security, and scalability cannot all be achieved at once. Pursuing higher blockchain performance can sacrifice security and create hidden dangers for the network. It is therefore very meaningful to design a scheme that improves blockchain scalability while ensuring security.

Existing blockchain performance optimization techniques fall into two kinds of solutions, on-chain and off-chain. To preserve the decentralization and security of the blockchain, only on-chain optimization methods are considered here. They are mainly divided into three categories:

First, adjusting blockchain parameters: network performance is improved by increasing the block size and reducing the time interval between produced blocks (the block output time). Second, improving the consensus process: consensus is the process by which the nodes of the whole network reach agreement, and the voting process by which they do so is optimized to reduce communication delay and improve efficiency. Third, transaction parallelization (such as sharding): transactions in the blockchain network are processed in parallel, and blockchain throughput is improved by increasing the number of shards. In addition, some hybrid solutions combine the first and third methods: reinforcement learning is introduced into the sharded network, and the blockchain network parameters are adjusted according to the security limits of the blockchain network, achieving adaptive adjustment.

Each of these existing techniques has limitations. Because of the performance limits of network nodes, the block size cannot be enlarged without restraint, so the parameter adjustment method has a significant bottleneck; moreover, reducing the block output time or increasing the block size lengthens ledger synchronization across the whole network and may cause forks. For Byzantine fault-tolerant consensus algorithms, improvements to the consensus process tend to be specific to particular scenarios, and optimizing the consensus process only reduces communication delay, so its effect on the performance of the whole blockchain network is not obvious. Sharding schemes can significantly improve blockchain scalability and transaction throughput, but current attempts are limited to cryptocurrency, and a sharded blockchain network is difficult to realize in general application scenarios.

The existing hybrid solution for the Internet of Things introduces reinforcement learning into a sharded network, sets the security limits of the blockchain network in a feedback function, adjusts the number of shards and the blockchain parameters, and adaptively adjusts network performance according to the security state. However, it is based on the PBFT (Practical Byzantine Fault Tolerance) consensus algorithm: when the number of nodes in the blockchain network increases, the communication traffic between nodes increases rapidly, which puts great pressure on network bandwidth and causes rapid degradation of system performance, so the PBFT algorithm has difficulty handling large-scale Internet of Things transactions.
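To make the bandwidth pressure concrete, the sketch below counts the messages of one consensus round under a simplified textbook model of PBFT (one pre-prepare broadcast followed by all-to-all prepare and commit phases) and contrasts it with a leader-relayed pattern in which replicas exchange messages only with a leader. The function names and the four-phase leader-relayed model are illustrative assumptions, not figures taken from this disclosure.

```python
def pbft_messages(n: int) -> int:
    """Rough per-round message count for textbook PBFT with n nodes."""
    pre_prepare = n - 1            # primary -> every replica
    prepare = (n - 1) * (n - 1)    # each replica -> every other node
    commit = n * (n - 1)           # every node -> every other node
    return pre_prepare + prepare + commit


def leader_relayed_messages(n: int, phases: int = 4) -> int:
    """Assumed model: in each phase the leader sends to n - 1 replicas
    and each replica answers the leader once."""
    return phases * 2 * (n - 1)


if __name__ == "__main__":
    for n in (4, 16, 64, 100):
        print(n, pbft_messages(n), leader_relayed_messages(n))
```

Under this model the all-to-all count grows quadratically with the number of nodes, while the leader-relayed count grows linearly.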

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provide a block chain performance adaptive optimization method based on hierarchical consensus and reinforcement learning.

The purpose of the invention is realized by the following technical scheme:

a reinforcement learning decision-making module applied to a block chain network comprises a layered consensus module and a network security evaluation module, wherein the layered consensus module is a consensus algorithm module based on a network layered model and comprises a trust evaluation sub-module, the layered consensus module is used for dividing nodes in the block chain network into a main consensus group and a sub consensus group cluster, and the sub consensus group cluster comprises a plurality of sub consensus groups; the consensus algorithm in the layered consensus module can reduce the complexity of consensus communication and achieve the consensus of the whole network more quickly;

the trust evaluation submodule is used for realizing trust evaluation and trust election; a trust model is introduced into the trust evaluation sub-module to evaluate the behavior of each node in the consensus process; if the node malicious behavior is detected, the trust model can reduce the trust value of the node, otherwise, the trust value of the node can be improved; after each round of consensus, all the nodes update the node trust values and the node state information, and the consensus groups are adjusted according to the trust values, and the corresponding nodes are elected to become leader nodes or main nodes by taking the trust values as election standards;

the network security evaluation module calculates the security constraints of the blockchain network from two angles, grouping quantity and time delay; it acquires the blockchain consensus history and estimates the proportion of malicious nodes in the current blockchain network by calculating the inconsistency of the consensus history, from which it calculates the security constraint on the grouping quantity; a round of consensus must be finished within a limited number of consecutive block intervals, from which it calculates the security constraint on the time delay;

the reinforcement learning neural network in the reinforcement learning decision module uses two agents: the agent for the main consensus group adopts a strict grouping constraint as a constraint condition of the reward function, and the agent for the sub consensus groups adopts a loose grouping constraint; the reinforcement learning decision module takes as its state space the blockchain network environment information consisting of the data transmission rate among nodes, the node performance, and the consensus history; it takes as its action space the parameters consisting of block size, block output time, and the number of nodes in the consensus group; on the premise of meeting the security constraints of the blockchain network, blockchain performance adaptive optimization is realized by calculating the blockchain performance parameters.

The invention also provides a block chain performance self-adaptive optimization method based on hierarchical consensus and reinforcement learning, which comprises the following steps:

(1) Dividing nodes in the consensus process into a main consensus group and a sub consensus group cluster through a network node layering module, wherein the sub consensus group cluster comprises a plurality of sub consensus groups; the consensus algorithm used in the network node layering module replaces the all-to-all broadcast of the PBFT algorithm with layered consensus and introduces a pipelined HotStuff algorithm, completing preliminary consensus in each sub consensus group and then final consensus in the main consensus group; the consensus algorithm consists of four stages: pre-preparation, preparation, submission, and determination;

(2) Evaluating the behavior of each node in the consensus process through a trust model in the trust evaluation sub-module; if the node malicious behavior is detected, the trust model can reduce the trust value of the node, otherwise, the trust value of the node can be improved; after each round of consensus, all the nodes update the node trust values and the node state information, and the consensus groups are adjusted according to the trust values, and the corresponding nodes are elected to become leader nodes or main nodes by taking the trust values as election standards;

(3) Acquiring the blockchain consensus history through the network security evaluation module, and estimating the proportion of malicious nodes in the current blockchain network by calculating the inconsistency values of the consensus history, so as to calculate the security constraint on the grouping quantity; a round of consensus must be completed within a limited number of consecutive block intervals, from which the security constraint on the time delay is calculated;

(4) Realizing blockchain performance adaptive optimization through the reinforcement learning decision module: the reinforcement learning neural network uses two agents, each adopting a D3QN network; the agent for the main consensus group adopts a strict grouping constraint as a constraint condition of the reward function, and the agent for the sub consensus groups adopts a loose grouping constraint; the reinforcement learning decision module takes as its state space the blockchain network environment information consisting of the data transmission rate among nodes, the node performance, and the consensus history; it takes as its action space the parameters consisting of block size, block output time, and the number of nodes in the consensus group; on the premise of meeting the security constraints of the blockchain network, blockchain performance adaptive optimization is realized by calculating the blockchain performance parameters.

Further, the step (1) is specifically as follows:

(101) In the initial stage of the consensus algorithm, a trust value evaluation module selects a main node in each consensus group, and a client sends a request to the main node;

(102) When each round of consensus starts, the main node collects view-change messages sent by a sufficient number of replica nodes, each containing the pre-preparation signature with the greatest height on the sending node; the main node forwards the request to all nodes and sends a pre-preparation message that contains a pre-preparation signature; this step is the pre-preparation stage of the main consensus group;

(103) After the replica nodes in each sub consensus group receive the pre-preparation message, they verify the validity of the signature in the pre-preparation message and the validity of the view, and then send a confirmation message to the leader node; steps (102) and (103) are the pre-preparation phase of the sub consensus group;

(104) This step is entered when the leader node in the sub consensus group has collected a sufficient number of signatures; having received the pre-preparation message, the leader node aggregates the collected signatures into a pre-preparation signature, and then sends a preparation message carrying the aggregated pre-preparation signature to the replica nodes in the group;

(105) The replica nodes in each consensus group receive the preparation message from the leader node and, after verification, send a preparation vote message; steps (104) and (105) are the preparation phase of the sub consensus group;

(106) This step is entered when the leader nodes in the sub consensus groups have collected a sufficient number of signatures; the preparation signatures of this stage are aggregated, and the first leader node then sends a submission message containing the preparation signature to the other leader nodes;

(107) The other leader nodes receive the submission message and send submission voting messages to the master node after verification; steps (106) and (107) are the commit phase of the master consensus group;

(108) When the main node of the first consensus group has collected a sufficient number of submission messages, it aggregates them into a submission signature, attaches the submission signature to a determination message, and sends it to all other nodes;

(109) When the other nodes receive the determination message, they execute the transaction referenced by the submission signature and then increment the view number; finally, a reply message is sent to the client, completing the current round of consensus and starting the next round.
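The collect-then-aggregate pattern that recurs in steps (104), (106), and (108) can be sketched as follows. The disclosure does not state how many signatures count as sufficient, so the quorum of strictly more than two thirds of the group is an assumption, as are the names `quorum_size` and `VoteCollector`; the sorted tuple merely stands in for a real aggregate signature scheme.

```python
from dataclasses import dataclass, field


def quorum_size(group_size: int) -> int:
    """Assumed threshold: strictly more than two thirds of the group."""
    return (2 * group_size) // 3 + 1


@dataclass
class VoteCollector:
    """Hypothetical helper a leader could use to gather signed votes."""
    group_size: int
    votes: dict = field(default_factory=dict)  # node_id -> signature

    def add_vote(self, node_id: int, signature: str):
        """Record one vote; return the aggregate once a quorum is reached."""
        self.votes.setdefault(node_id, signature)  # ignore duplicate votes
        if len(self.votes) >= quorum_size(self.group_size):
            # Stand-in for an aggregate signature: signatures sorted by node id.
            return tuple(sig for _, sig in sorted(self.votes.items()))
        return None


if __name__ == "__main__":
    collector = VoteCollector(group_size=4)
    for node_id in (1, 2, 0):
        print(node_id, collector.add_vote(node_id, f"sig{node_id}"))
```

A leader would attach the returned aggregate to the message that opens the next phase; until the quorum is reached, `add_vote` returns `None` and the phase does not advance.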

Further, in the step (2), the trust value of the nodes in the trust model is set to be [0,1], and the higher the numerical value is, the higher the credibility is; the trust model divides the trust value into different intervals, and each interval represents a node state; and setting a node state conversion mode based on the trust value.

Further, the node state conversion mode is as follows: when the blockchain network first starts running, the node state is normal; when a node has generated valid blocks several times and its trust value is greater than a threshold alpha, it is upgraded to the trusted state; if a node behaves abnormally, its state changes to limited; if the trust value of a node falls below a threshold beta, it enters the malicious state; whatever state a node is in, if it sends inconsistent voting information to different nodes during the consensus process, it is degraded directly to a malicious node; after a node generates a valid block or votes consistently with the majority of nodes during the consensus process, its trust value continues to increase; finally, after each round of consensus, all nodes update the node trust values and state information.
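A minimal sketch of these transitions is given below. The threshold values, the fixed trust step, and the event names are hypothetical; the sketch also simplifies "several valid blocks" to a single valid block combined with the alpha threshold, resets trust to zero on inconsistent voting, and treats the malicious state as permanent, none of which is specified in the text.

```python
from enum import Enum


class NodeState(Enum):
    NORMAL = "normal"
    TRUSTED = "trusted"
    LIMITED = "limited"
    MALICIOUS = "malicious"


ALPHA = 0.8  # assumed promotion threshold (alpha in the text)
BETA = 0.2   # assumed demotion threshold (beta in the text)
STEP = 0.1   # assumed trust change per observed event


def update_node(state: NodeState, trust: float, event: str):
    """Return (new_state, new_trust) after one observed event.

    event is one of "valid_block", "consistent_vote", "abnormal",
    or "inconsistent_vote".
    """
    if event == "inconsistent_vote":
        # Conflicting votes degrade the node directly, whatever its state.
        return NodeState.MALICIOUS, 0.0
    if event in ("valid_block", "consistent_vote"):
        trust = min(1.0, trust + STEP)
    elif event == "abnormal":
        trust = max(0.0, trust - STEP)
        if state is not NodeState.MALICIOUS:
            state = NodeState.LIMITED
    else:
        raise ValueError(f"unknown event: {event}")

    if trust < BETA:
        state = NodeState.MALICIOUS
    elif (event == "valid_block" and trust > ALPHA
          and state is not NodeState.MALICIOUS):
        state = NodeState.TRUSTED
    return state, trust


if __name__ == "__main__":
    state, trust = NodeState.NORMAL, 0.75
    for event in ("valid_block", "abnormal", "inconsistent_vote"):
        state, trust = update_node(state, trust, event)
        print(event, state.value, round(trust, 2))
```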

Further, the inconsistency values of the consensus history in step (3) are calculated using normalized entropy; an inconsistency value measures the uncertainty across the different probabilities of the consensus state. First, an inconsistency value is calculated in the main consensus group and in each sub consensus group of the sub consensus group cluster; then, blockchain network security is computed by averaging the normalized entropy values of all consensus groups.

The calculation of the security constraint on the grouping quantity in step (3) is specifically as follows:
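One way to read this calculation is sketched below. It assumes that each group's history splits into exactly two opinions, a minority and a majority whose proportions sum to 1, so that the base-2 entropy is already normalized to the range [0, 1], and it assumes a plain arithmetic mean over the main group and all sub groups. Both assumptions, and the function names, are illustrative rather than taken from the disclosure.

```python
import math


def normalized_entropy(p_minor: float) -> float:
    """Binary entropy (base 2) of a minority/majority vote split."""
    p_major = 1.0 - p_minor
    total = 0.0
    for p in (p_minor, p_major):
        if p > 0.0:  # the term p * log2(p) tends to 0 as p tends to 0
            total -= p * math.log2(p)
    return total


def overall_inconsistency(main_minor: float, sub_minors: list) -> float:
    """Average the entropy of the main group and of every sub group."""
    values = [normalized_entropy(main_minor)]
    values += [normalized_entropy(p) for p in sub_minors]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Unanimous main group; one of three sub groups is evenly split.
    print(overall_inconsistency(0.0, [0.5, 0.0, 0.0]))
```

A unanimous group contributes 0 and an evenly split group contributes 1, so the average rises as more groups record divided votes.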

(301) The network security evaluation module collects the history records of each round of consensus in the main consensus group and the sub consensus group, and when malicious nodes exist in the consensus groups, the consensus histories are inconsistent;

(302) The consensus inconsistency value measures the uncertainty between the probabilities of consensus success and consensus failure. An entropy value $I_{\text{sub consensus group}}$ is computed in each sub consensus group of the sub consensus group cluster. [The formula is not reproduced in this text.] It is defined in terms of two quantities:

the minority consensus opinion proportion in the i-th sub consensus group of the sub consensus group cluster, which represents the secondary consensus, namely the ratio of the number of minority votes to the number of valid votes in the sub consensus group;

the majority consensus opinion proportion in the i-th sub consensus group of the sub consensus group cluster, which represents the main consensus, namely the ratio of the number of votes cast by the majority of nodes to the number of valid votes in the sub consensus group.

(303): An entropy value $I_{\text{main consensus group}}$ is calculated in the main consensus group. [The formula is not reproduced in this text.] It is defined in terms of two quantities:

the minority consensus opinion proportion in the main consensus group;

the majority consensus opinion proportion in the main consensus group;

(304): All the normalized entropy values are averaged to obtain a total consensus trust U. [The formula is not reproduced in this text.] In the formula, m is the number of sub consensus groups in the sub consensus group cluster, and $I_i$ is the consensus inconsistency entropy value of the i-th sub consensus group in the sub consensus group cluster.

U represents an inconsistency index of the overall consensus process of the blockchain network and is also the probability of consensus failure in the blockchain network; it is calculated from the consensus history collected in step (301). The probability p of malicious nodes in the network is estimated from U. [The formula is not reproduced in this text.] It uses an intermediate variable for calculating the probability of malicious nodes.

(305): A looser security constraint $S_1$ is adopted for the main consensus group:

$S_1: k < \dfrac{2N}{3Np + 1}$

where N is the total number of blockchain network nodes and p is the probability of malicious nodes in the blockchain network.

A stricter security constraint $S_2$ is adopted for the sub consensus groups. First, the number K of sub consensus groups in the sub consensus group cluster is calculated:

$S_2: K < \dfrac{N(1 - 3p) - 1}{3Np + 1}$

In the network hierarchical model, the main consensus group agent passes the number of nodes of the main consensus group to the sub consensus group agent, which then calculates the constraint on the number of nodes k in each sub consensus group under $S_2$:

$k < \dfrac{N - K}{K - 1}$

where K is the number of sub consensus groups in the sub consensus group cluster.
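The three bounds in step (305) can be evaluated directly once N and p are known. The sketch below transcribes the inequalities as written; the function names are hypothetical, and the guard for K ≤ 1 is an added assumption because the last bound is undefined there.

```python
def main_group_bound(n_total: int, p: float) -> float:
    """Upper bound on k under S1: k < 2N / (3Np + 1)."""
    return 2 * n_total / (3 * n_total * p + 1)


def sub_group_count_bound(n_total: int, p: float) -> float:
    """Upper bound on K under S2: K < (N(1 - 3p) - 1) / (3Np + 1)."""
    return (n_total * (1 - 3 * p) - 1) / (3 * n_total * p + 1)


def sub_group_node_bound(n_total: int, k_groups: int) -> float:
    """Upper bound on k under S2: k < (N - K) / (K - 1)."""
    if k_groups <= 1:
        raise ValueError("the bound is undefined for K <= 1")
    return (n_total - k_groups) / (k_groups - 1)


def satisfies_s1(k: int, n_total: int, p: float) -> bool:
    """Check a candidate main-group size against S1."""
    return k < main_group_bound(n_total, p)


if __name__ == "__main__":
    n_total, p = 100, 0.1
    print(main_group_bound(n_total, p))       # 200 / 31
    print(sub_group_count_bound(n_total, p))  # 69 / 31
    print(sub_group_node_bound(n_total, 2))   # 98 / 1
```

For example, with N = 100 and p = 0.1 the S1 bound is 200/31, so a main consensus group of 6 nodes satisfies it and one of 7 does not.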

Further, the safety constraint of calculating the time delay in step (3) is specifically as follows:

The blockchain network delay comprises the block production delay (the block interval), the message transmission delay, and the consensus communication delay. Neglecting the message transmission delay, the blockchain network delay is expressed as:

$T_{\text{delay}} = T_{\text{block}} + T_{\text{consensus}}$

Blockchain consensus should be completed within a limited number of consecutive block intervals, so the security constraint on the delay is:

$T_{\text{delay}} \le \omega \times T_{\text{block}}$, where $\omega$ is a positive integer.
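Substituting the first expression into the second shows that the constraint is equivalent to requiring the consensus delay to fit within ω − 1 block intervals. A small check along these lines might look as follows; the function name and the example values are illustrative only.

```python
def delay_constraint_ok(t_block: float, t_consensus: float, omega: int) -> bool:
    """Check T_block + T_consensus <= omega * T_block."""
    if omega < 1:
        raise ValueError("omega must be a positive integer")
    return t_block + t_consensus <= omega * t_block


if __name__ == "__main__":
    # With a 10 s block interval and omega = 2, consensus may take up to 10 s.
    print(delay_constraint_ok(10.0, 5.0, 2))
    print(delay_constraint_ok(10.0, 15.0, 2))
```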

Further, the step (4) is specifically as follows:

(401) Define the state space S of the reinforcement learning neural network, with the following parameters:

$S^t = [R, C, H, p]^t$

where $R^t$ is the data transmission rate between different nodes; $C^t$ is the node performance, namely computing power; $H^t$ is the consensus history; and $p^t$ is the evaluation of the current blockchain network security calculated by the network security evaluation module, specifically the estimated probability of malicious nodes among the total number of nodes;

(402) Define the action space A of the reinforcement learning neural network, with the following parameters:

$A^t = [B, T, k]^t$

where $B^t$ is the block size; $T^t$ is the block production interval, namely the block output time; and $k^t$ depends on the agent: for the main consensus group, $k^t$ represents the number of nodes of the main consensus group; for the sub consensus groups, $k^t$ represents the upper limit on the number of nodes in each sub consensus group;

(403) Define the reward function. The goal is to maximize the throughput of the blockchain network while the reinforcement learning adaptively satisfies the delay constraint and the security constraint of the blockchain network. In summary:

Objective: $\max Q(S, A)$

Constraint 1: $T_{\text{total}} = T_{\text{block}} + T_{\text{consensus}} < \omega \cdot T_{\text{block}}$

Constraint 2: the security constraint $S_1$ or $S_2$, depending on the agent.

On the premise of meeting constraints 1 and 2, the reinforcement learning decision module continuously selects the action with the highest Q value, that is, it keeps increasing the block size, reducing the block output time, and increasing the value of k so as to improve the performance of the blockchain network. Constraint 1 states that the total delay of the blockchain network, namely the sum of the block production delay and the consensus delay, must be smaller than a certain number of block production delays. Constraint 2 states that different agents employ different security constraints, $S_1$ and $S_2$, which are calculated by the network security evaluation module;

(404) Making an agent decision; after the parameters of the reinforcement learning neural network are defined, a reinforcement learning decision module extracts parameter information in a state space S from a block chain network environment, then inputs the parameter information into an intelligent agent adopting a D3QN network, and selects an action A with the maximum Q value and interacts with the block chain network environment on the premise of meeting the limiting conditions;

(405) Train the parameters of the reinforcement learning neural network. While continuously making decisions, the agent generates a series of experience data, which is first stored in an experience replay pool; this reduces the correlation between samples and lowers the variance of the updates. To train the D3QN network, small batches of experience data are drawn at random from the experience replay pool.
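Steps (403) and (405) can be illustrated with a constrained reward and a uniform replay pool. The disclosure does not define the throughput term or what happens when a constraint is violated, so the ratio of block size to block interval and the zero reward on violation are assumptions, as are all names used here.

```python
import random
from collections import deque


def reward(block_size, block_interval, t_consensus, k, k_bound, omega):
    """Assumed reward: a throughput proxy if both constraints hold, else 0."""
    delay_ok = block_interval + t_consensus < omega * block_interval
    security_ok = k < k_bound
    if not (delay_ok and security_ok):
        return 0.0
    return block_size / block_interval  # assumed proxy, e.g. bytes per second


class ReplayPool:
    """Fixed-capacity experience replay pool with uniform random sampling."""

    def __init__(self, capacity: int):
        self._items = deque(maxlen=capacity)  # oldest entries drop out first

    def add(self, state, action, reward_value, next_state):
        self._items.append((state, action, reward_value, next_state))

    def sample(self, batch_size: int):
        """Draw a mini-batch without replacement."""
        return random.sample(list(self._items), batch_size)

    def __len__(self):
        return len(self._items)


if __name__ == "__main__":
    pool = ReplayPool(capacity=3)
    for step in range(5):
        r = reward(1000.0, 10.0, 5.0, k=5, k_bound=6.45, omega=2)
        pool.add(step, "action", r, step + 1)
    print(len(pool), pool.sample(2))
```

Sampling uniformly from the pool, rather than training on consecutive transitions, is what breaks the correlation between samples.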

The invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method for adaptively optimizing performance of a blockchain based on hierarchical consensus and reinforcement learning when executing the program.

Compared with the prior art, the technical scheme of the invention has the following beneficial effects:

1. according to the invention, a consensus algorithm based on layered consensus is constructed, and a streamlined HotStuff consensus algorithm is introduced, so that compared with a PBFT algorithm, the communication complexity between nodes in the consensus process is obviously reduced, and the block chain network delay is reduced.

2. A trust model is introduced into the hierarchical consensus module, in the hierarchical consensus algorithm, the trust value of the nodes is evaluated through node behaviors, and the nodes with higher trust values are elected to become a leader node and a main node, so that the consensus failure probability is remarkably reduced, and the safety of the block chain network is improved.

3. The method has the advantages that the reinforcement learning neural network is introduced to achieve self-adaptive adjustment of network performance, two intelligent agents are constructed according to a network layering model to optimize the performance of the block chain with different safety constraints, the network performance of the block chain is optimized while the network safety is guaranteed, and the self-adaptive adjustment effect is achieved.

Drawings

Fig. 1 is a schematic structural diagram of interaction between a reinforcement learning decision module and a blockchain network.

Fig. 2 is a schematic diagram of a network layer model in the present embodiment.

Fig. 3 is a schematic diagram of the consensus algorithm in the hierarchical consensus module in this embodiment.

Fig. 4 is a flow chart of the consensus algorithm.

Fig. 5 is a schematic diagram of node state transition.

Fig. 6 is a flow diagram of the algorithm for calculating the security constraint on the grouping quantity in the network security evaluation module.

Fig. 7 is a state diagram of the interaction of the reinforcement learning neural network model and the blockchain network.

Detailed Description

The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.

As shown in fig. 1, the embodiment provides a reinforcement learning decision module applied to a blockchain network, and the reinforcement learning decision module interacts with the blockchain network to achieve adaptive performance optimization. The reinforcement learning decision module comprises a layered consensus module and a network security evaluation module, wherein the layered consensus module comprises a trust evaluation submodule.

A layered consensus module: the module is a consensus algorithm module based on a network hierarchical model. The module divides nodes in the block chain network into a main consensus group and a sub-consensus group cluster, wherein the sub-consensus group cluster comprises a certain number of sub-consensus groups. The consensus algorithm provided by the module can obviously reduce the complexity of consensus communication and achieve the consensus of the whole network more quickly.

A trust evaluation sub-module: the module has trust evaluation and trust election functions. The module introduces a trust model to evaluate the behavior of each node in the consensus process. If the node malicious behavior is detected, the trust model can reduce the trust value of the node, otherwise, the trust value of the node can be improved. On the basis, a node state conversion scheme based on the trust value is also provided. After each round of consensus, all nodes update the node trust values and the node state information, and the consensus group is adjusted according to the trust values, and the nodes with higher trust values are elected to become leader nodes or main nodes. The module reduces the risk that the malicious node becomes a leader node or a main node, and improves the reliability of the consensus algorithm.

A network security evaluation module: the module calculates the security constraints of the blockchain network from the perspectives of grouping quantity and time delay. The module acquires the blockchain consensus history and estimates the proportion of malicious nodes in the current blockchain network by calculating the inconsistency of the consensus history, from which it calculates the security constraint on the grouping quantity; a round of consensus must be completed within a limited number of consecutive block intervals, from which it calculates the security constraint on the time delay.

A reinforcement learning decision module: the module realizes blockchain performance adaptive optimization. The reinforcement learning neural network uses two agents, both of which adopt D3QN (Dueling Double Deep Q-Network) networks; the main difference is that the agent for the main consensus group adopts a stricter grouping constraint as the constraint condition of the reward function, while the agent for the sub consensus groups adopts a looser grouping constraint. The reinforcement learning decision module takes the data transmission rate between nodes, the node performance, the consensus history, and other blockchain network environment information as its state space, and takes parameters such as block size, block output time, and the number of nodes in the consensus group as its action space. The module calculates reasonable blockchain performance parameters on the premise of meeting the blockchain network security constraints, and thereby realizes blockchain performance adaptive optimization.
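D3QN combines two standard ideas: a dueling head, which splits the Q value into a state value and per-action advantages, and a double DQN target, which selects the next action with the online network but evaluates it with the target network. The sketch below shows both computations on plain Python lists; it is a generic illustration of these techniques, not the network architecture used in this embodiment, and the function names are hypothetical.

```python
def dueling_q_values(state_value: float, advantages: list) -> list:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def double_dqn_target(reward: float, gamma: float,
                      q_online_next: list, q_target_next: list,
                      done: bool = False) -> float:
    """Double DQN target: pick the action online, evaluate it with the target."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    print(dueling_q_values(1.0, [1.0, 2.0, 3.0]))
    print(double_dqn_target(1.0, 0.5, [0.1, 0.9], [4.0, 2.0]))
```

Decoupling action selection from action evaluation in this way is intended to reduce the overestimation of Q values that a single network tends to produce.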

Based on the reinforcement learning decision module, the embodiment provides a block chain performance adaptive optimization method based on hierarchical consensus and reinforcement learning, which specifically includes the following steps:

1. the network hierarchical model in the hierarchical consensus module is shown in fig. 2, and the network hierarchical model divides nodes in the consensus process into a main consensus group and a sub consensus group cluster, wherein the sub consensus group cluster comprises a plurality of sub consensus groups.

To avoid cumbersome communication, the consensus algorithm in this module replaces the all-to-all broadcast of the PBFT algorithm with layered consensus and introduces a pipelined HotStuff algorithm: preliminary consensus is completed in each sub consensus group, and final consensus is then completed in the main consensus group. The specific process of the consensus algorithm in this embodiment is shown in fig. 3; the algorithm comprises four stages: pre-preparation, preparation, submission, and determination. In the figure, the consensus nodes are divided into 1 main consensus group and 3 sub consensus groups. The client is a client server, replica node 0 is the master node, and replica nodes 1, 5, and 9 are leader nodes.

The flow chart of the consensus algorithm is shown in fig. 4. A client initiates a request, and the master node forwards the request to all nodes; this starts the consensus step of the main consensus group and at the same time serves as the pre-preparation step of the sub consensus groups, after which the sub consensus groups carry out their consensus. Once the sub consensus groups have completed their two rounds of communication and voting, the main consensus group completes the remaining round of communication and voting; the master node then enters the final determination stage and finally confirms the consensus result to the client.
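The two-level flow can be sketched as a pair of quorum checks: each sub consensus group decides locally, and the leaders' results are then combined in the main consensus group. The sketch simplifies the main consensus group to the set of leaders only and assumes a quorum of strictly more than two thirds at both levels; neither detail is stated in the text, and the function names are hypothetical.

```python
def quorum(n: int) -> int:
    """Assumed threshold: strictly more than two thirds of n voters."""
    return (2 * n) // 3 + 1


def group_decision(votes: list) -> bool:
    """True if the approving votes in one group reach the quorum."""
    return sum(votes) >= quorum(len(votes))


def hierarchical_decision(sub_group_votes: list) -> bool:
    """Preliminary consensus per sub group, then final consensus of leaders."""
    leader_votes = [group_decision(votes) for votes in sub_group_votes]
    return group_decision(leader_votes)


if __name__ == "__main__":
    groups = [
        [True, True, True, False],
        [True, True, True, True],
        [True, True, True, False],
    ]
    print(hierarchical_decision(groups))
```

Each node votes only inside its own group, so the all-to-all exchange is confined to the much smaller set of leaders.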

Step 1.1: in the initial stage of the consensus algorithm, a trust value evaluation module in each consensus group elects a main node, and then a client sends a request to the main node;

Step 1.2: at the beginning of each round of consensus, the master node collects a sufficient number of view-change messages from the replica nodes, each of which carries the highest pre-prepare signature known to the sending node. The master node then forwards the request to all nodes and sends a pre-prepare message, which contains a pre-prepare signature. Step 1.2 is the pre-preparation stage of the main consensus group;

Step 1.3: in each sub-consensus group, after a replica node in the group receives the pre-prepare message, it verifies the validity of the signature in the message and the validity of the view, and then sends a confirmation message to the leader node. Steps 1.2 and 1.3 are the pre-preparation stage of the sub-consensus groups;

Step 1.4: this step begins when the leader node of a sub-consensus group has collected enough signatures. The leader node aggregates the signatures collected in response to the pre-prepare message into a pre-prepare signature, and then sends a prepare message, with the aggregated pre-prepare signature attached, to the replica nodes in its group;

Step 1.5: the replica nodes in each sub-consensus group receive the prepare message from the leader node and, after verification, send a prepare-vote message. Steps 1.4 and 1.5 are the preparation stage of the sub-consensus groups;

Step 1.6: this step begins when the leader node of a sub-consensus group has collected enough signatures. The leader node aggregates them into the prepare signature for this stage, and the first leader node then sends a submission message, which contains the prepare signature, to the other leader nodes;

Step 1.7: the other leader nodes receive the submission message and, after verification, send submission-vote messages to the master node. Steps 1.6 and 1.7 are the submission stage of the main consensus group;

Step 1.8: when the master node has collected enough submission-vote messages, it aggregates them into a submission signature, attaches the submission signature to a determination message, and sends the message to all other nodes;

Step 1.9: when the other nodes receive the determination message, they execute the transaction to which the submission signature points and then increment the view number. Finally, a reply message is sent to the client, which completes the current round of consensus and starts the next round.
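The two-level voting of steps 1.3 to 1.8 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the embodiment does not say how many signatures are "enough", so the sketch assumes the usual Byzantine quorum of more than two thirds of a group, and it replaces signature aggregation with simple vote counting. The names `quorum`, `aggregate` and `hierarchical_round` are hypothetical.

```python
from __future__ import annotations


def quorum(n: int) -> int:
    """Assumed quorum size: strictly more than two thirds of n members."""
    return (2 * n) // 3 + 1


def aggregate(votes: list[bool]) -> bool:
    """Stand-in for signature aggregation: succeeds if enough votes approve."""
    return sum(votes) >= quorum(len(votes))


def hierarchical_round(sub_groups: dict[int, list[bool]]) -> bool:
    """Run one simplified round of layered consensus.

    sub_groups maps each leader node id to the approve/reject votes cast
    inside its sub-consensus group. Each leader first aggregates its own
    group (preliminary consensus); the leaders' results are then aggregated
    in the main consensus group (final consensus).
    """
    leader_votes = [aggregate(votes) for votes in sub_groups.values()]
    return aggregate(leader_votes)


# Layout loosely following fig. 3: leaders 1, 5 and 9, four voters per group
# (the group size is an assumption made for this example).
groups = {1: [True] * 4, 5: [True, True, True, False], 9: [True] * 4}
decided = hierarchical_round(groups)
```

In this sketch each replica only communicates with its own leader, and only the leaders communicate at the main level, which is the reduction in broadcast traffic that the layered scheme aims for.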

2. The network hierarchical model enhances the scalability of the blockchain network but also makes it more vulnerable to malicious attacks. A trust model is therefore introduced into the trust evaluation submodule to evaluate the behavior of each node in the consensus process and assign it a trust value, and nodes with high credibility are elected as leader nodes and as the master node. The trust model defines the states of the consensus nodes as listed in table 1.

TABLE 1 consensus node status Table

In the trust model, the node trust value lies in the interval [0,1]; the larger the value, the higher the credibility. The trust model divides this range into several intervals, each of which represents a node state. On the basis of this definition of node states, the embodiment provides a trust-value-based node state conversion scheme, as shown in fig. 5. If malicious behavior by a node is detected, the trust value of the node is reduced; otherwise, it is increased.

Node state conversion refers to a change in node state and depends mainly on how the node behaves in the consensus process. When the blockchain network first starts running, every node is in the normal state. When a node has generated valid blocks multiple times and its trust value exceeds a threshold α, it can be upgraded to the trusted state. Nodes in a better state have more opportunities to be selected as master nodes. If a node behaves abnormally, for example when a node in the trusted or normal state generates an invalid block, its trust value decreases and its state changes to the limited state. If the trust value of a node falls below a threshold β, the node enters the malicious state. In addition, regardless of its current state, a node that sends inconsistent voting messages to different nodes during a consensus process is directly downgraded to a malicious node. Conversely, when a node generates a valid block or votes consistently with the majority of nodes in the consensus process, its trust value continues to increase, and its state is upgraded accordingly once the trust value reaches the corresponding threshold. Finally, after each round of consensus, all nodes update the node trust values and state information.
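The state conversion described above can be sketched in Python as an interval-based classifier. This is a minimal sketch, not the scheme of fig. 5 itself: the numeric values of α and β, the boundary `GAMMA` between the limited and normal states, the step size, and the initial trust value are all hypothetical, since the embodiment does not disclose them. Resetting the trust value to 0 after inconsistent voting is likewise an assumption used to model the direct downgrade.

```python
from dataclasses import dataclass

ALPHA = 0.75   # hypothetical threshold above which a node is trusted
GAMMA = 0.375  # hypothetical boundary between the limited and normal states
BETA = 0.25    # hypothetical threshold below which a node is malicious
STEP = 0.125   # hypothetical trust change per consensus round


@dataclass
class Node:
    trust: float = 0.5      # hypothetical initial trust value
    state: str = "normal"   # every node starts in the normal state


def classify(trust: float) -> str:
    """Map a trust value in [0, 1] to the state of its interval."""
    if trust < BETA:
        return "malicious"
    if trust < GAMMA:
        return "limited"
    if trust > ALPHA:
        return "trusted"
    return "normal"


def update(node: Node, behaved_well: bool, equivocated: bool = False) -> Node:
    """Update trust and state after one round of consensus."""
    if equivocated:
        # Inconsistent votes sent to different nodes: direct downgrade.
        node.trust = 0.0
    elif behaved_well:
        node.trust = min(1.0, node.trust + STEP)
    else:
        node.trust = max(0.0, node.trust - STEP)
    node.state = classify(node.trust)
    return node


honest = Node()
for _ in range(3):          # three rounds with valid blocks or majority votes
    update(honest, behaved_well=True)
```

After three well-behaved rounds the trust value of `honest` is 0.875, which lies above the assumed α, so the node is classified as trusted.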

3. In the network security evaluation module, the security constraints of the blockchain network are calculated from two aspects: the number of groups and the time delay.

For the security constraint on the number of groups, the network security evaluation module acquires the consensus history of the blockchain and estimates the proportion of malicious nodes in the current blockchain network by calculating the inconsistency value of the consensus history, from which the security constraint on the number of groups is calculated. The inconsistency value is calculated as a normalized entropy, which measures the uncertainty over the possible consensus outcomes. It is first calculated within each consensus group, that is, in each sub-consensus group and in the main consensus group. The network security is then calculated by averaging the normalized entropy values of all consensus groups; the flow is shown in fig. 6. Specifically:

Step 3.1: the network security evaluation module collects the history of each round of consensus in the main consensus group and the sub-consensus groups. When a consensus group contains malicious nodes, its consensus history is inconsistent (that is, the votes are a mixture of approvals and rejections of the new block);

Step 3.2: the consensus inconsistency is calculated as a normalized entropy value, which measures the uncertainty between the outcomes of consensus success and consensus failure. For example, if a consensus on block verification (i.e., on whether the block is valid or invalid) ends with equal numbers of approving and rejecting votes, the normalized entropy value is 1. Conversely, if a consensus ends with a unanimous vote (all votes approve or all votes reject), the normalized entropy value is 0. Entropy values are first calculated in each sub-consensus group of the sub-consensus group cluster, as follows:

the consensus opinion proportion (secondary consensus) of the minority group in the ith sub-consensus group in the sub-consensus group cluster is the ratio of the voting number of the minority group to the effective voting number in the sub-consensus group;

the consensus opinion proportion (main consensus) of the majority in the ith sub-consensus group in the sub-consensus group cluster, i.e. the ratio of the number of votes cast by the majority of nodes to the number of valid votes in the consensus group; the voting opinion of the majority is also the voting opinion of the whole sub-consensus group,

is given by

Step 3.3: compute the entropy value I_{main} in the main consensus group; the formula is as follows:

wherein,

the consensus opinion proportion of a few groups in the main consensus group;

step 3.4: averaging all the normalized entropy values to obtain a total consensus trust U, wherein the formula is as follows:

m: the number of sub-consensus groups in the sub-consensus group cluster; I_i: the consensus inconsistency entropy value of the ith sub-consensus group in the sub-consensus group cluster;

U represents the inconsistency index of the overall consensus process of the blockchain network; it is also the probability that consensus fails in the blockchain network, and is calculated from the consensus history collected in step 3.1. Although the nodes of the main consensus group in the hierarchical blockchain network have higher trust values than those of the sub-consensus groups, all consensus groups should be weighted equally when the probability of malicious nodes in the whole network is estimated. The probability p of malicious nodes in the network can therefore be estimated from U, and the formula is as follows:

wherein,

calculating an intermediate variable of the probability of the malicious node;
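Steps 3.2 to 3.4 can be illustrated with a short Python sketch. It assumes that the normalized entropy is the two-outcome Shannon entropy with base-2 logarithm, which matches the boundary cases stated in step 3.2 (1 for an evenly split vote, 0 for a unanimous vote), and that the majority proportion is one minus the minority proportion. It further assumes that U is the plain average over the m sub-consensus groups and the main consensus group. The function names are hypothetical, and the mapping from U to p is not sketched here.

```python
import math


def normalized_entropy(minority_votes: int, valid_votes: int) -> float:
    """Two-outcome entropy (base 2) of the vote split in one consensus group."""
    if valid_votes <= 0:
        raise ValueError("valid_votes must be positive")
    p_minor = minority_votes / valid_votes
    p_major = 1.0 - p_minor          # assumed: the two proportions sum to 1
    entropy = 0.0
    for p in (p_minor, p_major):
        if p > 0:                    # the term for p = 0 is taken as 0
            entropy -= p * math.log2(p)
    return entropy


def overall_inconsistency(sub_entropies: list, main_entropy: float) -> float:
    """Average the entropies of all consensus groups with equal weight."""
    return (sum(sub_entropies) + main_entropy) / (len(sub_entropies) + 1)


# Three sub-consensus groups: one evenly split, two unanimous; the main
# consensus group is unanimous as well (illustrative vote counts).
subs = [normalized_entropy(2, 4), normalized_entropy(0, 4), normalized_entropy(0, 4)]
U = overall_inconsistency(subs, normalized_entropy(0, 4))
```

With these illustrative votes only one of the four groups is split, so the averaged inconsistency index U is 0.25.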

Step 3.5: for the main consensus group, because the node trust values are higher, a looser security constraint S_1 is adopted; the formula is as follows:

S_1: k < 2N/(3Np+1)

N: total number of blockchain network nodes; p: probability of malicious nodes in the network;

For the sub-consensus groups, because the node trust values are lower, a stricter security constraint S_2 is adopted. First, the number K of sub-consensus groups is calculated; the formula is as follows:

S_2: K < (N(1-3p)-1)/(3Np+1)

In the network hierarchical model, the main consensus group agent passes the number of nodes in the main consensus group to the sub-consensus group agent, after which the constraint S_2 on the number of nodes k in each sub-consensus group can be calculated; the formula is as follows:

k < (N-K)/(K-1)

K: number of sub-consensus groups.
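The three bounds of step 3.5 can be evaluated directly. The following Python sketch transcribes the formulas as printed above; the function names and the example values N = 100 and p = 0.1 are hypothetical and are chosen only to show the arithmetic.

```python
import math


def main_group_bound(n_total: int, p: float) -> float:
    """Right-hand side of S_1: k < 2N / (3Np + 1)."""
    return 2 * n_total / (3 * n_total * p + 1)


def sub_group_count_bound(n_total: int, p: float) -> float:
    """Right-hand side of S_2: K < (N(1 - 3p) - 1) / (3Np + 1)."""
    return (n_total * (1 - 3 * p) - 1) / (3 * n_total * p + 1)


def sub_group_size_bound(n_total: int, k_groups: int) -> float:
    """Right-hand side of k < (N - K) / (K - 1); requires K > 1."""
    if k_groups <= 1:
        raise ValueError("K must be greater than 1")
    return (n_total - k_groups) / (k_groups - 1)


def largest_int_below(bound: float) -> int:
    """Largest integer that is strictly smaller than the bound."""
    return math.ceil(bound) - 1


N, p = 100, 0.1                                         # illustrative values
k_main_max = largest_int_below(main_group_bound(N, p))      # 200/31 is about 6.45
K_max = largest_int_below(sub_group_count_bound(N, p))      # 69/31 is about 2.23
k_sub_bound = sub_group_size_bound(N, K_max)
```

For these values the main consensus group may contain at most 6 nodes and at most 2 sub-consensus groups are allowed, which illustrates how quickly the admissible group count shrinks as the estimated malicious-node probability grows.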

For the security constraint on the time delay, the network security evaluation module requires that a round of consensus be completed within a limited number of consecutive block intervals, and derives the security constraint on the time delay from this requirement.

The delay of the blockchain network is the time that elapses from the submission of a transaction to its completion, i.e., the interval from the moment the client initiates the transaction to the moment the block containing the transaction is committed to the blockchain through the consensus process. Transaction processing in the blockchain network comprises two stages: first the master node generates a block, and then the generated block goes through consensus among the consensus groups and is finally appended to the blockchain. In this process, the delay of the blockchain network consists of the block production delay (i.e., the block interval), the message transmission delay and the consensus communication delay; the message transmission delay is very short and can be ignored. The blockchain network delay can therefore be expressed as:

T_{delay} = T_{block} + T_{consensus}

Internet of Things devices usually expect a transaction to complete within a short time, so a device needs to receive the final result of the transaction within a certain time; otherwise, the transaction is regarded as timed out. To meet the delay requirement of the Internet of Things network, blockchain consensus should be completed within a limited number of consecutive block intervals, so the security constraint on the delay is:

T_{delay} ≤ ω × T_{block}, where ω is a positive integer
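Substituting the delay decomposition into the constraint shows what it demands of the consensus stage alone. The derivation below uses the symbols T_{delay}, T_{block} and T_{consensus} for the total delay, the block interval and the consensus delay; the numeric values in the example are hypothetical.

```latex
T_{delay} = T_{block} + T_{consensus} \le \omega \, T_{block}
\quad\Longrightarrow\quad
T_{consensus} \le (\omega - 1)\, T_{block}

% Example (hypothetical values): \omega = 3,\; T_{block} = 2\,\mathrm{s}
% \Longrightarrow T_{consensus} \le 4\,\mathrm{s}
```

In particular, ω = 1 would leave no time for consensus at all, so in practice the constraint is only meaningful for ω of at least 2.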

4. The reinforcement learning decision module implements the adaptive optimization of blockchain performance. The reinforcement learning neural network uses two agents, both of which adopt a D3QN (Dueling Double Deep Q-Network). Each agent extracts state space information from the blockchain network environment, decides on the optimal action, and interacts with the network environment, thereby achieving adaptive optimization of network performance. The interaction between the reinforcement learning neural network model and the blockchain network is shown in fig. 7.

The reinforcement learning neural network uses two agents to perform decision optimization for the main consensus group and the sub-consensus groups of the network hierarchical model respectively. The main difference between them is that the agent for the main consensus group uses the stricter grouping constraint as the constraint condition of its reward function, whereas the agent for the sub-consensus groups uses the looser grouping constraint.

After each round of interaction between the reinforcement learning decision module and the blockchain environment, the resulting intermediate reinforcement learning parameters are first stored in an experience replay pool and then passed into the next round of the reinforcement learning process, which alleviates the correlation between samples and reduces the variance of the updates. The specific steps are as follows:

step 4.1: defining a state space S of the reinforcement learning neural network, wherein the parameters are as follows:

S^t = [R, C, H, p]^t

R^t: data transmission rate between different nodes; C^t: performance (computing power) of the nodes; H^t: the consensus history; p^t: the evaluation of the current blockchain network security calculated by the network security evaluation module, specifically the estimated probability of malicious nodes among all nodes;

step 4.2: defining an action space A of the reinforcement learning neural network, wherein the parameters are as follows:

A^t = [B, T, k]^t

B^t: the block size; T^t: the block production interval (block-out time); k^t: for the main consensus group, k^t represents the number of nodes in the main consensus group; for the sub-consensus groups, k^t represents the upper limit of the number of nodes in each sub-consensus group;

Step 4.3: define the reward function. The reward function is designed to maximize the throughput (TPS) of the blockchain network, while the delay constraint and the security constraint of the blockchain network are satisfied adaptively through reinforcement learning. The reward function can be summarized as follows:

Objective: max Q(S, A)

Constraint 1: T_{total} = T_{block} + T_{consensus} < ω·T_{block}

Constraint 2:

Provided that constraints 1 and 2 are satisfied, the reinforcement learning decision module continuously selects the action with the highest Q value, that is, it keeps increasing the block size, reducing the block production interval, and increasing the value of k, so as to improve the performance of the blockchain network. Constraint 1 bounds the total system delay: the block production delay plus the consensus delay should be less than a constant number of block production intervals. This is because, in Internet of Things networks, smart devices usually want to receive the final result of a transaction within a short time, so a block needs to be published and verified within a limited number of consecutive block intervals. Constraint 2 indicates that the different agents use different security constraints, S_1 and S_2, which are calculated by the network security evaluation module.
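A constrained reward of this kind can be sketched in Python as follows. The sketch rests on several assumptions that are not part of the embodiment: throughput is estimated as the number of transactions per block divided by the block interval, the average transaction size `tx_size_bytes` is a hypothetical parameter, and an action that violates either constraint simply receives a reward of zero. The argument `k_bound` stands for the right-hand side of S_1 or S_2, depending on the agent.

```python
def reward(block_size_bytes: float,
           block_interval_s: float,
           consensus_delay_s: float,
           k: int,
           k_bound: float,
           omega: int,
           tx_size_bytes: float = 200.0) -> float:
    """Throughput-based reward that is zero when a constraint is violated."""
    # Constraint 1: block interval plus consensus delay < omega block intervals.
    total_delay = block_interval_s + consensus_delay_s
    if total_delay >= omega * block_interval_s:
        return 0.0
    # Constraint 2: the group-size action must respect the security bound.
    if k >= k_bound:
        return 0.0
    # Assumed throughput estimate: transactions per block / block interval.
    return (block_size_bytes / tx_size_bytes) / block_interval_s


# 1 MB blocks every 2 s, 1 s of consensus delay, 5 nodes against a bound of 6.45.
r_ok = reward(1_000_000, 2.0, 1.0, k=5, k_bound=6.45, omega=2, tx_size_bytes=500)
# Same action, but consensus now takes 2 s, so the delay constraint fails.
r_late = reward(1_000_000, 2.0, 2.0, k=5, k_bound=6.45, omega=2, tx_size_bytes=500)
```

A zero reward is only one way to encode the constraints; a negative penalty or an action mask would serve the same purpose.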

Step 4.4: agent decision-making. After the parameters of the reinforcement learning neural network have been defined, the reinforcement learning decision module extracts the parameter information of the state space S from the layered blockchain network environment and inputs it into the D3QN network; subject to the constraints, it then selects the action A with the maximum Q value and applies it to the blockchain network environment.
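The decision of step 4.4 can be sketched in Python in two parts. The first function shows the standard dueling aggregation by which a dueling network combines a state value with per-action advantages into Q values; the second selects the feasible action with the largest Q value. The list `feasible` is a hypothetical mask that would be obtained by checking each candidate action against the constraints; the numbers are illustrative and no neural network is involved.

```python
def dueling_q(value: float, advantages: list) -> list:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean of A(s, .)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def select_action(q_values: list, feasible: list) -> int:
    """Return the index of the feasible action with the highest Q value."""
    candidates = [i for i, ok in enumerate(feasible) if ok]
    if not candidates:
        raise ValueError("no action satisfies the constraints")
    return max(candidates, key=lambda i: q_values[i])


q = dueling_q(1.0, [1.0, 2.0, 3.0])          # -> [0.0, 1.0, 2.0]
# The third action has the highest Q value but violates a constraint.
chosen = select_action(q, [True, True, False])
```

Here the action with index 1 is chosen, because the action with the highest Q value is excluded by the mask.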

Step 4.5: train the reinforcement learning neural network. As the agent makes decisions continuously, it generates a series of experience data, which is first stored in the experience replay pool so as to alleviate the correlation between samples and reduce the variance of the updates. During training, batches of experience are drawn at random from the experience replay pool, and the reinforcement learning neural network is trained intermittently on these samples to update the agent network.
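The experience replay pool and the training target can be sketched in Python as follows. The sketch assumes a fixed-capacity pool that discards the oldest experience first and uses the standard Double DQN target, in which the online network chooses the next action and the target network evaluates it. The class and function names, the capacity, and the discount factor are hypothetical.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity experience replay pool with uniform random sampling."""

    def __init__(self, capacity: int, seed=None):
        self._buffer = deque(maxlen=capacity)   # oldest items drop out first
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state) -> None:
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int) -> list:
        """Draw a random mini-batch without replacement."""
        items = list(self._buffer)
        return self._rng.sample(items, min(batch_size, len(items)))

    def __len__(self) -> int:
        return len(self._buffer)


def double_dqn_target(reward: float, gamma: float,
                      q_online_next: list, q_target_next: list) -> float:
    """Online network picks the next action, target network evaluates it."""
    best = max(range(len(q_online_next)), key=lambda i: q_online_next[i])
    return reward + gamma * q_target_next[best]


pool = ReplayPool(capacity=3, seed=0)
for step in range(5):                       # only the last 3 steps are kept
    pool.push(state=step, action=0, reward=1.0, next_state=step + 1)
batch = pool.sample(2)
target = double_dqn_target(1.0, 0.5, [0.1, 0.9], [4.0, 2.0])
```

Sampling uniformly from the pool, rather than training on consecutive steps, is what breaks the correlation between samples mentioned in step 4.5.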

The present invention is not limited to the above-described embodiments. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the present invention, and the above specific embodiments are merely illustrative and not restrictive. Those skilled in the art can make various changes in form and details without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A reinforcement learning decision-making module applied to a block chain network is characterized by comprising a layering consensus module and a network security evaluation module, wherein the layering consensus module is a consensus algorithm module based on a network layering model and comprises a trust evaluation sub-module, the layering consensus module is used for dividing nodes in the block chain network into a main consensus group cluster and a sub consensus group cluster, and the sub consensus group cluster comprises a plurality of sub consensus groups; the consensus algorithm in the layered consensus module can reduce the complexity of consensus communication and achieve the consensus of the whole network more quickly;

the trust evaluation submodule is used for realizing trust evaluation and trust election; a trust model is introduced into the trust evaluation sub-module to evaluate the behavior of each node in the consensus process; if the malicious behavior of the node is detected, the trust model can reduce the trust value of the node, otherwise, the trust value of the node can be improved; after each round of consensus, all the nodes update the node trust values and the node state information, and the consensus groups are adjusted according to the trust values, and the corresponding nodes are elected to become leader nodes or main nodes by taking the trust values as election standards;

the network security evaluation module calculates the security constraint of the block chain network from the two angles of packet quantity and time delay; the network security evaluation module estimates the proportion of malicious nodes in the current blockchain network by acquiring the block chain consensus historical information and calculating the inconsistency of the consensus history so as to calculate the security constraint of the packet number; finishing a round of consensus by limiting the interval time of a limited number of continuous blocks, and finally calculating the security constraint of the time delay;

the reinforcement learning neural network in the reinforcement learning decision module uses two agents, wherein the agent for the main consensus group adopts strict grouping constraint as constraint conditions of an excitation function, and the agent for the sub-consensus group adopts loose grouping constraint; the reinforcement learning decision module takes the block chain network environment information consisting of the data transmission rate among the nodes, the node performance and the consensus history as a state space; taking parameters consisting of block size, block output time and the number of nodes in the consensus group as an action space; under the premise of meeting the safety constraint of the block chain network, the block chain performance self-adaptive optimization is realized by calculating the block chain performance parameters.

2. A block chain performance adaptive optimization method based on hierarchical consensus and reinforcement learning, wherein the reinforcement learning decision module in claim 1 is configured to:

(2) Evaluating the behavior of each node in the consensus process through a trust model in the trust evaluation sub-module; if the malicious behavior of the node is detected, the trust model can reduce the trust value of the node, otherwise, the trust value of the node can be improved; after each round of consensus, all the nodes update the node trust values and the node state information, and the consensus groups are adjusted according to the trust values, and the corresponding nodes are elected to become leader nodes or main nodes by taking the trust values as election standards;

3. The method for adaptively optimizing the performance of the blockchain based on the hierarchical consensus and the reinforcement learning as claimed in claim 2, wherein the step (1) is as follows:

(101) In the initial stage of the consensus algorithm, a trust value evaluation module in each consensus group selects a main node, and a client sends a request to the main node;

(105) The replica nodes in each consensus group receive the prepare message from the leader node and, after verification, send a prepare-vote message; steps (104) and (105) are the preparation phase of the sub-consensus groups;

(107) Other leader nodes receive the submission message, and send submission voting messages to the master node after verification; steps (106) and (107) are the commit phase of the master consensus group;

(109) When the other nodes receive the determination message, the transaction pointed to by the submission signature is executed, and then the view number is incremented; finally, a reply message is sent to the client to finish the current round of consensus and start the next round of consensus.

4. The method for optimizing the performance of the blockchain based on the hierarchical consensus and the reinforcement learning according to claim 2, wherein in the step (2), the trust value of the node in the trust model is set to [0,1], and the higher the value is, the higher the confidence level is; the trust model divides the trust value into different intervals, and each interval represents a node state; and setting a node state conversion mode based on the trust value.

5. The method of claim 3, wherein the node state transformation method comprises: when the block chain network just runs, the node state is normal; when the node generates an effective block for a plurality of times and the trust value is greater than a threshold value alpha, upgrading to a trusted state; if the node has abnormal behaviors, the node state is changed into a limited state; if the node trust value is lower than a threshold value beta, the node becomes a malicious state; no matter what state the node is in, if inconsistent voting messages are sent to different nodes in the consensus process, directly degrading the node into a malicious node; after the node generates the effective block or is consistent with the voting information of most nodes in the consensus process, the trust value can be continuously improved; and finally, after each round of consensus, all the nodes update the node trust value and the state information.

6. The adaptive optimization method for blockchain performance based on hierarchical consensus and reinforcement learning as claimed in claim 2, wherein the inconsistency of the consensus history in step (3) is calculated using a normalized entropy, and the inconsistency is a measure of uncertainty of different probabilities of the consensus states; firstly, calculating an inconsistency value in each sub-consensus group and the main consensus group in the sub-consensus group cluster; then, blockchain network security is computed by averaging normalized entropy values for all consensus groups.

7. The method for adaptively optimizing the performance of the blockchain based on the hierarchical consensus and the reinforcement learning as claimed in claim 2, wherein the security constraint for calculating the number of the packets in the step (3) is as follows:

wherein,

the consensus opinion proportion of the minority in the ith sub-consensus group in the sub-consensus group cluster, which represents the secondary consensus, namely the ratio of the number of minority votes to the number of valid votes in the sub-consensus group;

is given by the value of

(303): compute the entropy value I_{main} in the main consensus group; the formula is as follows:

wherein,

consensus opinion proportions of a few groups in the master consensus group;

wherein,

calculating an intermediate variable of the probability of the malicious node;

S_1: k < 2N/(3Np+1)

S_2: K < (N(1-3p)-1)/(3Np+1)

k < (N-K)/(K-1)

8. The method for adaptively optimizing the performance of the blockchain based on the hierarchical consensus and the reinforcement learning as claimed in claim 2, wherein the security constraint of the computation delay in the step (3) is specifically as follows:

the delay of the blockchain network includes the production delay of the block, i.e., the block interval, the message transmission delay and the consensus communication delay, wherein the message transmission delay is ignored, and then the blockchain network delay is expressed as:

T_{delay} = T_{block} + T_{consensus}

Blockchain consensus should be completed within a limited number of consecutive block intervals, so the security constraint on the delay is as follows:

T_{delay} ≤ ω × T_{block}, where ω is a positive integer.

9. The method for adaptively optimizing the performance of the blockchain based on the hierarchical consensus and the reinforcement learning as claimed in claim 2, wherein the step (4) is as follows:

S^t = [R, C, H, p]^t

wherein R^t: data transmission rate between different nodes; C^t: the performance of the nodes, i.e. their computing power; H^t: the consensus history; p^t: the evaluation of the current blockchain network security calculated by the network security evaluation module, specifically the estimated probability of malicious nodes among all nodes;

A^t = [B, T, k]^t

wherein B^t: the block size; T^t: the block production interval, i.e. the block-out time; k^t: for the main consensus group, k^t represents the number of nodes in the main consensus group; for the sub-consensus groups, k^t represents the upper limit of the number of nodes in each sub-consensus group;

Objective: max Q(S, A)

Constraint 1: T_{total} = T_{block} + T_{consensus} < ω·T_{block}

Constraint 2:

provided that constraints 1 and 2 are satisfied, the reinforcement learning decision module continuously selects the action with the highest Q value, that is, it keeps increasing the block size, reducing the block production interval, and increasing the value of k, so as to improve the performance of the blockchain network; constraint 1 bounds the total delay of the blockchain network, namely the block production delay plus the consensus delay is less than a limited number of block production intervals; constraint 2 indicates that the different agents use different security constraints, S_1 and S_2, which are calculated by the network security evaluation module;

(405) Training the parameters of the reinforcement learning neural network; as the agent makes decisions continuously, it generates a series of experience data, which is first stored in an experience replay pool, so that the correlation between samples is alleviated and the variance of the updates is reduced; to train the D3QN network, small batches of experience data are drawn at random from the experience replay pool.

10. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for adaptive optimization of blockchain performance based on hierarchical consensus and reinforcement learning according to any one of claims 2 to 9.