CN115102867A - Block chain fragmentation system performance optimization method combined with deep reinforcement learning - Google Patents
Block chain fragmentation system performance optimization method combined with deep reinforcement learning
- Publication number
- CN115102867A (application CN202210505118.4A)
- Authority
- CN
- China
- Prior art keywords: block chain, behavior, time, state, block
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H04L41/142: Network analysis or design using statistical or mathematical methods
- H04L41/14: Network analysis or design
- H04L41/16: Arrangements for maintenance, administration or management of data switching networks using machine learning or artificial intelligence
- G06N3/02, G06N3/08: Neural networks; learning methods
- H04L67/10: Protocols in which an application is distributed across nodes in the network
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The performance optimization method for a blockchain sharding system combined with deep reinforcement learning formulates the blockchain shard-selection problem as a Markov decision process model consisting of four parts: system state, action, reward, and value function. Solving the model means continuously selecting the optimal action in a dynamic blockchain sharding environment so that the throughput of the sharding system is maximized. By continuously exploring and learning the complex relationship between the block size, the block generation time, the number of shards, and the sharding system, the BDQSB algorithm selects the most suitable sharding strategy according to the transmission rates between nodes, the computing capability of the nodes, the consensus history of the nodes, and the probability of malicious nodes, thereby improving the performance of the blockchain sharding system. Compared with other schemes, the invention improves the performance of the blockchain sharding system, solves the action-space explosion problem, and reduces the time cost of neural network training.
Description
Technical Field
The invention belongs to the technical field of data management and evidence storage, relates to intelligent control of blockchain system sharding, and in particular to a method for optimizing the performance of a blockchain sharding system in combination with deep reinforcement learning.
Background
Blockchain sharding means partitioning the nodes of a blockchain system into different shards. Transaction processing capability, and thus blockchain performance, is improved by having the nodes within each shard process transactions in parallel.
A blockchain can be sharded with a static optimization method, in which the sharding strategy stays fixed once the sharding technique is adopted. However, the state of a blockchain system changes constantly, so a static optimization method does not fit a dynamic blockchain environment.
Dynamic optimization methods are now applied to blockchain sharding, for example using a deep reinforcement learning algorithm to provide a sharding strategy dynamically. Based on the current state of the blockchain system, the reinforcement learning algorithm outputs the optimal sharding strategy for that state so that the throughput of the blockchain system is maximized.
A dynamic optimization method derives the sharding strategy from the dynamic blockchain environment and is therefore better suited to a dynamic blockchain system than a static method. Existing work that adds deep reinforcement learning to a blockchain sharding system mostly uses the DQN (Deep Q-Network) algorithm to overcome the shortcomings of static sharding strategies and the state-space explosion problem, but a DQN-based method cannot solve the action-space explosion caused by combining actions once the action dimensions grow.
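As an illustration of this combinatorial blow-up (a minimal sketch with hypothetical discretizations, not taken from the patent): a DQN must enumerate every joint action, while a branching architecture only needs one output head per action dimension.

```python
# Hypothetical discretizations of the three action dimensions (illustrative only).
block_sizes = 8          # candidate block sizes B
block_intervals = 8      # candidate block generation times TI
shard_counts = 8         # candidate numbers of shards K

# A plain DQN needs one Q-value output per joint action (B, TI, K).
dqn_outputs = block_sizes * block_intervals * shard_counts

# A branching network (BDQ) needs one output head per dimension,
# so the number of Q-value outputs grows additively, not multiplicatively.
bdq_outputs = block_sizes + block_intervals + shard_counts

print(f"DQN joint-action outputs: {dqn_outputs}")   # 512
print(f"BDQ branched outputs:     {bdq_outputs}")   # 24
```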
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a performance optimization method for a blockchain sharding system combined with deep reinforcement learning, which combines the BDQ algorithm with blockchain sharding in a dynamic blockchain environment to solve the action-space explosion caused by action combination after the action dimensions grow, and thereby address the problem of low blockchain throughput.
To achieve this purpose, the invention adopts the following technical solution:
The performance optimization method for the blockchain sharding system combined with deep reinforcement learning comprises the following steps:
The system state S_t at time t is defined as the set of transmission rates R_t between nodes, the set of computing capabilities C_t of the nodes, the set of node consensus histories H_t, and the probability P_t of malicious nodes;
The action space A comprises the block size B, the block generation time TI, and the number of blockchain shards K;
The reward R_{t+1} represents the reward obtained after the blockchain sharding system executes an action at time t, i.e., the return obtained by acting in system state S_t, namely the number of transactions processed by the blockchain per second;
The Markov decision process model is summarized as: in the system state S_t at any time t, the optimal action is selected so that the cumulative reward of the system is maximized, i.e.

max E[ Σ_{t=0}^{∞} γ_t · R_{t+1} ]

subject to a_t ∈ A,

where a_t is the action taken by the system at time t and γ_t is the attenuation factor applied to R_{t+1} at time t;
And step 3, the model is solved with the deep reinforcement learning BDQ algorithm: by continuously exploring and learning the complex relationship between the throughput of the blockchain system and the block size, block generation time, and number of shards, the system is finally sharded according to the number of shards, and the nodes within each shard process transactions in parallel according to the block size and block generation time, so that the number of transactions processed by the blockchain is maximized.
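A minimal sketch of how the state and action defined above could be represented in code (the container names and field types are assumptions for illustration, not part of the patent):

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class ShardingState:
    """System state S_t = {R_t, C_t, H_t, P_t}."""
    link_rates: Dict[Tuple[int, int], float]  # R_t: transmission rate of link (i, j)
    compute: List[float]                      # C_t: computing capability of each node
    consensus_history: List[int]              # H_t: 1 = verification judged invalid, 0 = valid
    p_malicious: float                        # P_t: probability of malicious nodes

@dataclass
class ShardingAction:
    """Action a_t = (B, TI, K) drawn from the action space A."""
    block_size: float        # B, e.g. in MB
    block_interval: float    # TI, block generation time in seconds
    num_shards: int          # K, number of blockchain shards
```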
Compared with the prior art, the invention has the following beneficial effects:
the algorithm uses a deep reinforcement learning BDQ algorithm according to the dynamically changed block chain system environment to provide an optimal slicing strategy for the block chain system, and changes the original DQN algorithm into the BDQ algorithm. The invention can solve the problem that the neural network is difficult to train caused by behavior space explosion, and can reduce the time cost of neural network training. Compared with other schemes, the invention can improve the performance of the block chain fragmentation system, solve the problem of behavior space explosion and reduce the time cost of neural network training.
Drawings
Fig. 1 shows the structure of the blockchain sharding simulation system.
Fig. 2 shows the neural network structure of the BDQ algorithm.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
The invention is a performance improvement method for a blockchain sharding system combined with deep reinforcement learning. It formulates the blockchain sharding problem as a Markov decision process model, uses the deep reinforcement learning BDQ algorithm as the core of the sharding strategy selection algorithm, and designs an optimal blockchain sharding selection strategy based on deep reinforcement learning (BDQSB). The solution of the constructed model is to continuously select the optimal action over a sequence of system states so that the cumulative system reward is maximized and the blockchain throughput is ultimately improved. Compared with other schemes, the invention improves the performance of the blockchain sharding system, solves the action-space explosion problem, and reduces the time cost of neural network training.
Fig. 1 shows the structure of the blockchain sharding simulation system, which contains N nodes; there is a transmission rate between every pair of nodes, each node has a computing capability, and malicious nodes may be present among the nodes. The sharding process of the system is as follows. According to the number of shards K in the action, the nodes of the directory committee are selected first; the number of nodes in the directory committee is denoted C. The remaining N - C nodes outside the directory committee are then partitioned into shards: each node is assigned to a shard according to the last L bits of its node ID, where L = log2(K), and both node IDs and shard numbers are binary-coded strings. After sharding is complete, the blockchain system has K shards and transactions are distributed to different shards for processing. The nodes within a shard pack transactions into blocks of size B and broadcast the blocks to the other nodes in the shard for consensus; the consensus process produces the consensus history H. Each of the K shards sends its verified block to the directory committee, which packs the K blocks into a final block and broadcasts it to the other directory committee nodes for final consensus, again producing consensus history. The probability P of malicious nodes in the blockchain can be calculated from the intra-shard consensus history and the directory committee consensus history. After this process ends, the state of the blockchain system changes.
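A minimal sketch of the shard-assignment rule described above (the committee-selection rule and the helper name assign_shards are assumptions for illustration; the patent does not fix how the C committee nodes are chosen):

```python
import math

def assign_shards(node_ids, num_shards, committee_size):
    """Split nodes into a directory committee and K shards by the last L bits of the node ID."""
    L = int(math.log2(num_shards))                 # L = log2(K)
    committee = node_ids[:committee_size]          # assumed: first C nodes form the committee
    others = node_ids[committee_size:]             # the remaining N - C nodes are sharded
    shards = {k: [] for k in range(num_shards)}
    for nid in others:
        shard_no = nid & ((1 << L) - 1)            # last L bits of the binary-coded node ID
        shards[shard_no].append(nid)
    return committee, shards

# Example: 16 nodes, K = 4 shards, a 4-node directory committee.
committee, shards = assign_shards(list(range(16)), num_shards=4, committee_size=4)
print(committee)   # [0, 1, 2, 3]
print(shards)      # nodes 4..15 grouped by their last 2 bits
```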
Fig. 2 shows the neural network structure of the BDQ (Branching Dueling Q-Network) algorithm. Existing research that applies deep reinforcement learning to a blockchain sharding system mostly uses the DQN algorithm. Compared with the traditional DQN, the BDQ algorithm introduces a new neural network structure: the action space has one network branch per sub-action, plus a shared decision module. The BDQ algorithm gives each individual action dimension a degree of independence, which provides good scalability. The BDQ algorithm takes the blockchain state S_t = (R_t, C_t, H_t, P_t) as input, abstracts the state through the shared decision module (the hidden layers of the neural network), and splits the output into two kinds of branches: a state branch and the action branches. Each action branch outputs an advantage function for its sub-action, and the state branch outputs the state value function V(S_t). The advantage function of a sub-action is combined with the state value function to obtain the Q function of that sub-action; when the blockchain sharding system makes a decision, it selects each sub-action according to its output Q value.
The performance optimization method specifically comprises the following steps:
1. Establish a Markov decision process model for the blockchain sharding problem; the model consists of the following four parts:
State space: the state space is defined by the system state S_t, i.e., the set of transmission rates R_t between nodes, the set of computing capabilities C_t of the nodes, the set of node consensus histories H_t, and the probability P_t of malicious nodes:

S_t = {R_t, C_t, H_t, P_t}
where R_t = {R_{i,j}}, i, j ∈ N, and R_{i,j} is the transmission rate of the link between node i and node j; C_t = {C_i}, i ∈ N, where C_i is the computing resource of blockchain node i; H_t = {H_i}, where H_i is the consensus history of node i, H_i = 1 or H_i = 0: H_i = 1 indicates that node i's block verification is invalid, and H_i = 0 indicates that node i's block verification is valid; P_t is calculated from the consensus history.
The behavior space is as follows: the behavior space is represented as a, and includes a block size B, a block-out time TI, and a block chain fragmentation number K, and the formula is:
A={B,TI,K}
Reward function: the immediate reward R_{t+1} represents the reward obtained after the blockchain sharding system executes an action at time t, i.e., the return obtained by acting in system state S_t, namely the number of transactions processed by the blockchain per second:

R_{t+1} = K · (B - B_H) / (b · TI)

where B_H is the size of the block header and b is the average size of a transaction. (B - B_H) is the amount of transaction data carried by each block, and (B - B_H)/b, the transaction payload divided by the average transaction size, is the number of transactions processed in each shard. K · (B - B_H)/b is the total number of transactions processed by the K shards, and dividing this total by the block generation time TI gives the number of transactions processed per second by the blockchain, i.e., the transaction throughput.
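A quick numeric sketch of this reward (all parameter values are hypothetical, chosen only to illustrate the formula):

```python
def throughput(num_shards, block_size, header_size, avg_tx_size, block_interval):
    """R_{t+1} = K * (B - B_H) / (b * TI): transactions processed per second."""
    tx_per_block = (block_size - header_size) / avg_tx_size   # transactions per block
    return num_shards * tx_per_block / block_interval          # across K shards, per second

# Hypothetical values: K = 4 shards, 4 MB blocks, 0.05 MB header,
# 0.002 MB average transaction, one block every 8 seconds.
print(throughput(num_shards=4, block_size=4.0, header_size=0.05,
                 avg_tx_size=0.002, block_interval=8.0))   # ≈ 987.5 TPS
```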
Value function: the value function is defined as Q(S_t, a_t):

Q(S_t, a_t) = E[ Σ_{y=0}^{∞} γ^y · R_{t+y+1} | S_t, a_t ]

The value function Q(S_t, a_t) is also called the Q function. a_t ∈ A is the action taken by the system at time t, E[·] is the expectation, y is a future offset relative to time t, and R_{t+y+1} is the reward obtained after the system takes an action at time t + y. γ is the attenuation (discount) factor, 0 ≤ γ < 1, expressing how much weight future rewards receive when an action is taken in a given state, i.e., the influence of the environment; γ^y, the y-th power of γ, is the discount applied to R_{t+y+1}.
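A minimal sketch of this discounted cumulative reward for a finite trajectory (the reward sequence below is hypothetical):

```python
def discounted_return(rewards, gamma):
    """Sum_{y=0}^{T-1} gamma**y * rewards[y]: a finite-horizon estimate of Q(S_t, a_t)."""
    return sum((gamma ** y) * r for y, r in enumerate(rewards))

# Hypothetical per-step throughput rewards R_{t+1}, R_{t+2}, ... with gamma = 0.9.
print(discounted_return([900.0, 950.0, 1000.0, 980.0], gamma=0.9))
```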
Thus, the established Markov decision process model can be summarized as: in the system state S_t at any time t, select the optimal action so that the cumulative reward of the system is maximized; the model formula is

max E[ Σ_{t=0}^{∞} γ_t · R_{t+1} ]

subject to a_t ∈ A,

where γ_t is the attenuation factor applied to R_{t+1} at time t.
2. Model solution and solution algorithm
a. Solution of the model: the solution is obtained by computing the optimal value function; with the optimal value function, the optimal action that maximizes the cumulative reward can be selected in the system state S_t at any time t. The optimal value function is computed as

Q*(S_t, a_t) = E[ R_{t+1} + γ · max_{a_{t+1}} Q*(S_{t+1}, a_{t+1}) ]

and, at any time t, the optimal action is selected as

a_t* = argmax_{a_t} Q*(S_t, a_t)

where Q*(S_t, a_t) is the optimal value function, S_{t+1} is the system state at time t + 1, and a_{t+1} is any of the actions the system may take at time t + 1, i.e., an action in the action space A.
b. Solution algorithm: compute the optimal value function and select the optimal action at each decision so that the cumulative reward is maximized.
The invention selects the deep reinforcement learning BDQ algorithm as the solution algorithm. The algorithm accumulates sample records (S_t, a_t, R_{t+1}, S_{t+1}) through continuous decisions and uses them to train a neural network so that the network approximates the value function; the optimal action is then selected so that the cumulative reward of the model is maximized, where R_{t+1} is the reward obtained after the system takes action a_t and S_{t+1} is the system state at time t + 1.
During neural network training, the BDQ algorithm uses the new neural network structure shown in Fig. 2: the network has one branch per sub-action of the action space, i.e., the network branches correspond one-to-one to the sub-actions of the action space A, plus a shared decision module (the hidden layers of the neural network). The BDQ algorithm gives each independent action dimension a degree of independence and therefore scales well. The BDQ algorithm feeds the blockchain state S_t = (R_t, C_t, H_t, P_t) at time t into the neural network, abstracts the state through the shared decision module, and splits the output into two kinds of branches: a state branch and the action branches. Each action branch outputs the advantage function of its sub-action, namely the advantage function A_1(S_t, a_1) of the block size B, the advantage function A_2(S_t, a_2) of the block generation time TI, and the advantage function A_3(S_t, a_3) of the number of shards K; the state branch outputs the state value function V(S_t). The advantage function of a sub-action is combined with the state value function to obtain the value function of that sub-action, and when the blockchain sharding system makes a decision it selects each sub-action according to its output Q value.
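A minimal PyTorch sketch of such a branching network (the layer sizes, the class name BranchingQNetwork, and the use of PyTorch are assumptions for illustration; the patent does not specify the hidden dimensions):

```python
import torch
import torch.nn as nn

class BranchingQNetwork(nn.Module):
    """Shared decision module + one state-value branch + one advantage branch per sub-action."""
    def __init__(self, state_dim, branch_sizes, hidden=128):
        super().__init__()
        # Shared decision module (hidden layers) that abstracts the state S_t.
        self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, hidden), nn.ReLU())
        # State branch: outputs the state value V(S_t).
        self.value = nn.Linear(hidden, 1)
        # Action branches: one advantage head A_d(S_t, a_d) per sub-action (B, TI, K).
        self.advantages = nn.ModuleList(nn.Linear(hidden, n) for n in branch_sizes)

    def forward(self, state):
        h = self.shared(state)
        v = self.value(h)                                  # V(S_t), shape (batch, 1)
        # Q_d(S_t, a_d) = V(S_t) + A_d(S_t, a_d), the combination rule stated in the patent.
        return [v + adv(h) for adv in self.advantages]     # one Q vector per branch

# Example: a 10-dimensional state, with 8 block sizes, 8 block intervals, 8 shard counts.
net = BranchingQNetwork(state_dim=10, branch_sizes=[8, 8, 8])
q_per_branch = net(torch.randn(1, 10))
action = [q.argmax(dim=1).item() for q in q_per_branch]   # one sub-action index per branch
print(action)
```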
The neural network is updated by randomly sampling a minibatch of experience from the experience pool and updating the network parameters by gradient descent. The BDQ algorithm updates the network with the loss function

L = E[ (1/D) · Σ_d ( y_d - Q_d(S_t, a_d) )^2 ]

where D is the number of action branches and the target y_d is defined as

y_d = R_{t+1} + γ · Q_d^-( S_{t+1}, argmax_{a_d'} Q_d(S_{t+1}, a_d') )

that is, the online Q_d network selects, for state S_{t+1}, the sub-action with the largest Q value, and the target network Q_d^- then evaluates the Q value of that state-action pair. In the BDQ algorithm the value function is composed of the state value function V(S_t) and the advantage function A_d(S_t, a_d) of the action:

Q_d(S_t, a_d) = V(S_t) + A_d(S_t, a_d)
The BDQ algorithm uses two neural networks with the same structure: an online network that is updated in real time and a target network that is updated every C steps by copying the online network parameter values to the target network.
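A minimal PyTorch sketch of this per-branch target and loss (the tensor shapes and batch values are hypothetical; in practice the per-branch losses would be averaged over all D branches before the gradient step):

```python
import torch
import torch.nn.functional as F

def branch_loss(q_online_sa, q_online_next, q_target_next, reward, gamma):
    """y_d = R + gamma * Q_d^-(S', argmax_a' Q_d(S', a')); loss = MSE(Q_d(S, a_d), y_d)."""
    best_next = q_online_next.argmax(dim=1, keepdim=True)       # online net picks a_d'
    target_q = q_target_next.gather(1, best_next).squeeze(1)     # target net evaluates it
    y_d = reward + gamma * target_q                              # TD target for this branch
    return F.mse_loss(q_online_sa, y_d.detach())

# Hypothetical batch of 4 transitions for one 8-way branch.
q_online_sa   = torch.randn(4)        # Q_d(S_t, a_d) for the actions actually taken
q_online_next = torch.randn(4, 8)     # online Q_d(S_{t+1}, ·)
q_target_next = torch.randn(4, 8)     # target Q_d^-(S_{t+1}, ·)
reward        = torch.rand(4) * 1000  # throughput rewards R_{t+1}
print(branch_loss(q_online_sa, q_online_next, q_target_next, reward, gamma=0.99))
```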
3. By continuously exploring and learning the complex relationship between the throughput of the blockchain system and the block size, block generation time, and number of shards, the system is finally sharded according to the number of shards, and the nodes within each shard process transactions in parallel according to the block size and block generation time, so that the number of transactions processed by the blockchain is maximized and blockchain performance is improved.
The operating logic of the BDQ-based intelligent control for performance optimization is as follows:
1) Initialize an experience replay pool D of size N; the replay pool D stores the system state S_t of the blockchain sharding system at time t, the action a_t, the reward R_{t+1}, and the system state S_{t+1} of the blockchain system at the next time;
2) Initialize two networks with the same structure, an online network and a target network, whose weights are θ and θ^- respectively;
3) Set the initial time t = 0, and denote the time index of a sample record in the experience replay pool D by τ;
4) Set the initial exploration probability to ε, the amount by which the exploration rate decreases with t to δ_ε, and the minimum exploration probability to ε_min;
5) Start the loop;
6) Obtain the state S_t = {R_t, C_t, H_t, P_t} of the blockchain system at the current time t;
7) Select an action a_t with an ε-greedy strategy: with probability ε choose a random action from the action space A, otherwise choose a_t = argmax_a Q(S_t, a);
8) Execute action a_t: the blockchain system performs sharding, processes transactions in parallel within the shards, packs the transactions into blocks, reaches consensus on the blocks, and sends the blocks that pass consensus to the directory committee for final packing and consensus; obtain the next system state S_{t+1} of the blockchain and calculate R_{t+1};
9) Store (S_t, a_t, R_{t+1}, S_{t+1}) into the cache array;
10) Randomly draw Y sample records (S_τ, a_τ, R_{τ+1}, S_{τ+1}) from the cache array;
11) Compute the sample Q values from the Y records using Q_d(S_τ, a_d) = V(S_τ) + A_d(S_τ, a_d);
12) Update the neural network with the loss function L = E[ (1/D) · Σ_d ( y_d - Q_d(S_τ, a_d) )^2 ];
13) Update the exploration probability ε to max(ε - δ_ε, ε_min), i.e., decay it by δ_ε but never below ε_min;
14) If t mod C = 0, copy the online network parameters to the target network (θ^- ← θ);
15) Increase the time t by 1;
16) End the loop.
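A condensed, runnable sketch of this loop on a stand-in environment (the toy environment toy_env_step, the class BranchingQNet, the layer sizes, and all hyper-parameters are assumptions for illustration; only the control flow mirrors steps 1)-16)):

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, BRANCHES, GAMMA, C_STEPS = 8, [4, 4, 4], 0.99, 50

class BranchingQNet(nn.Module):
    """Shared hidden layer, a state-value head, and one advantage head per sub-action."""
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU())
        self.value = nn.Linear(64, 1)
        self.adv = nn.ModuleList(nn.Linear(64, n) for n in BRANCHES)

    def forward(self, s):
        h = self.shared(s)
        return [self.value(h) + a(h) for a in self.adv]    # Q_d = V + A_d per branch

def toy_env_step(state, action):
    """Stand-in for the sharding simulator: returns a random next state and reward."""
    return torch.rand(STATE_DIM), random.uniform(0.0, 1000.0)

online, target = BranchingQNet(), BranchingQNet()
target.load_state_dict(online.state_dict())
optimizer = torch.optim.Adam(online.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                               # experience replay pool D
eps, eps_min, eps_decay = 1.0, 0.05, 1e-3
state = torch.rand(STATE_DIM)

for t in range(2_000):
    # 7) epsilon-greedy selection of one sub-action per branch.
    with torch.no_grad():
        qs = online(state.unsqueeze(0))
    action = [random.randrange(n) if random.random() < eps else q.argmax(dim=1).item()
              for n, q in zip(BRANCHES, qs)]
    # 8)-9) act in the (toy) environment and store the transition.
    next_state, reward = toy_env_step(state, action)
    replay.append((state, action, reward, next_state))
    # 10)-12) sample a minibatch and take a gradient step on the branch-averaged loss.
    if len(replay) >= 64:
        batch = random.sample(replay, 64)
        s = torch.stack([b[0] for b in batch])
        a = torch.tensor([b[1] for b in batch])             # (64, 3) sub-action indices
        r = torch.tensor([b[2] for b in batch])
        s2 = torch.stack([b[3] for b in batch])
        q_now, q_next, q_next_t = online(s), online(s2), target(s2)
        loss = 0.0
        for d in range(len(BRANCHES)):
            q_sa = q_now[d].gather(1, a[:, d:d + 1]).squeeze(1)
            best = q_next[d].argmax(dim=1, keepdim=True)
            y_d = r + GAMMA * q_next_t[d].gather(1, best).squeeze(1)
            loss = loss + F.mse_loss(q_sa, y_d.detach())
        optimizer.zero_grad()
        (loss / len(BRANCHES)).backward()
        optimizer.step()
    # 13)-15) decay exploration, refresh the target network every C steps, advance time.
    eps = max(eps - eps_decay, eps_min)
    if t % C_STEPS == 0:
        target.load_state_dict(online.state_dict())
    state = next_state
```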
The invention provides a specific embodiment of blockchain sharding optimization as follows:
The blockchain sharding system has 200 nodes. The blockchain system state S_t at time t is input into the neural network of the BDQ algorithm, which outputs Q values for the three sub-actions B, TI, and K; the sub-action with the largest Q value in each branch is selected to form the action a_t to be executed by the blockchain sharding system. Suppose action a_t sets the number of shards to K = 4. According to this action, the blockchain sharding system first selects the nodes of the directory committee, whose size is C = 20, and then partitions the remaining N - C = 180 nodes into 4 shards of 45 nodes each. After sharding is complete, transactions are distributed to the different shards for processing, and the nodes within each shard pack transactions into blocks of size B = 4 according to the block generation time TI = 8. Because the blockchain nodes are divided into 4 different shards, the 4 shards process transactions in parallel and the throughput of the blockchain sharding system is improved. The BDQ algorithm provides the sharding strategy for the dynamic blockchain sharding system in real time and improves its performance. Compared with the DQN algorithm, the BDQ algorithm improves blockchain throughput.
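A quick check of the arithmetic in this embodiment (the committee size C = 20 follows from N - C = 180 stated above):

```python
# Worked numbers for the embodiment: N = 200 nodes, K = 4 shards, committee of C = 20 nodes.
N, K, C = 200, 4, 20
sharded_nodes = N - C            # 180 nodes outside the directory committee
per_shard = sharded_nodes // K   # 45 nodes in each of the 4 shards
print(sharded_nodes, per_shard)  # 180 45
```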
Claims (9)
1. A performance optimization method for a blockchain sharding system combined with deep reinforcement learning, characterized by comprising the following steps:
step 1, a blockchain simulation system comprises N nodes; there is a transmission rate between every pair of nodes, each node has a computing capability, and malicious nodes exist among the nodes;
step 2, establishing a Markov decision process model for the blockchain sharding problem, the model consisting of four parts: the system state S_t, the action space A, the reward R_{t+1}, and the value function Q(S_t, a_t);
the system state S_t at time t is defined as the set of transmission rates R_t between nodes, the set of computing capabilities C_t of the nodes, the set of node consensus histories H_t, and the probability P_t of malicious nodes;
the action space A comprises the block size B, the block generation time TI, and the number of blockchain shards K;
the reward R_{t+1} represents the reward obtained after the blockchain sharding system executes an action at time t, i.e., the return obtained by acting in system state S_t, namely the number of transactions processed by the blockchain per second;
the Markov decision process model is summarized as: in the system state S_t at any time t, the optimal action is selected so that the cumulative reward of the system is maximized, i.e.

max E[ Σ_{t=0}^{∞} γ_t · R_{t+1} ]

subject to a_t ∈ A,

where a_t is the action taken by the system at time t and γ_t is the attenuation factor applied to R_{t+1} at time t;
and step 3, solving the model with the deep reinforcement learning BDQ algorithm, continuously exploring and learning the complex relationship between the throughput of the blockchain system and the block size, block generation time, and number of shards, finally sharding according to the number of shards, and having the nodes within each shard process transactions in parallel according to the block size and block generation time, so that the number of transactions processed by the blockchain is maximized.
2. The performance optimization method for a blockchain sharding system combined with deep reinforcement learning of claim 1, characterized in that, in the system state S_t, R_t = {R_{i,j}}, i, j ∈ N, where R_{i,j} is the transmission rate of the link between node i and node j; C_t = {C_i}, i ∈ N, where C_i is the computing resource of node i; H_t = {H_i}, where H_i is the consensus history of node i, H_i = 1 or H_i = 0: H_i = 1 indicates that node i's block verification is invalid, and H_i = 0 indicates that node i's block verification is valid; and P_t is calculated from the consensus history.
3. The performance optimization method for a blockchain sharding system combined with deep reinforcement learning of claim 1, characterized in that the reward R_{t+1} is given by

R_{t+1} = K · (B - B_H) / (b · TI)

where B_H is the size of the block header and b is the average size of a transaction.
4. The performance optimization method for a blockchain sharding system combined with deep reinforcement learning of claim 1, characterized in that the value function Q(S_t, a_t) is given by

Q(S_t, a_t) = E[ Σ_{y=0}^{∞} γ^y · R_{t+y+1} | S_t, a_t ]

where E[·] is the expectation, y is a future offset relative to time t, R_{t+y+1} is the reward obtained after the system takes an action at time t + y, γ is the attenuation factor, 0 ≤ γ < 1, expressing how much weight future rewards receive when an action is taken in a given state, i.e., the influence of the environment, and γ^y is the discount applied to R_{t+y+1}.
5. The performance optimization method for a blockchain sharding system combined with deep reinforcement learning of claim 1, characterized in that, in step 2, the optimal value function is obtained through calculation, i.e., the optimal action that maximizes the cumulative reward is selected in the system state S_t at any time t according to the optimal value function, the optimal value function being computed as

Q*(S_t, a_t) = E[ R_{t+1} + γ · max_{a_{t+1}} Q*(S_{t+1}, a_{t+1}) ]

and, at any time t, the optimal action being selected as

a_t* = argmax_{a_t} Q*(S_t, a_t)

where Q*(S_t, a_t) is the optimal value function, S_{t+1} is the system state at time t + 1, and a_{t+1} is any of the actions the system may take at time t + 1, i.e., an action in the action space A.
6. The performance optimization method for a blockchain sharding system combined with deep reinforcement learning of claim 1, characterized in that the BDQ algorithm accumulates sample records (S_t, a_t, R_{t+1}, S_{t+1}) through continuous decisions and uses them to train the neural network, so that the neural network approximates the value function and the optimal action is then selected such that the cumulative reward of the model is maximized, where S_{t+1} represents the system state at time t + 1.
7. The performance optimization method for a blockchain sharding system combined with deep reinforcement learning of claim 1 or 6, characterized in that, in the neural network of the BDQ algorithm, the network branches correspond one-to-one to the sub-actions of the action space A, and there is a shared decision module, namely the hidden layers of the neural network; the system state S_t = {R_t, C_t, H_t, P_t} at time t is abstracted through the shared decision module, and the output is split into two kinds of branches, a state branch and the action branches; each action branch outputs the advantage function of its sub-action, namely the advantage function A_1(S_t, a_1) of the block size B, the advantage function A_2(S_t, a_2) of the block generation time TI, and the advantage function A_3(S_t, a_3) of the number of shards K; the state branch outputs the state value function V(S_t); the advantage function of a sub-action is combined with the state value function to obtain the value function of that sub-action, and when the blockchain sharding system makes a decision it selects each sub-action according to its output Q value.
8. The performance optimization method for a blockchain sharding system combined with deep reinforcement learning of claim 7, characterized in that the neural network is updated by randomly sampling a minibatch of experience from the experience pool and updating the network parameters by gradient descent, the BDQ algorithm updating the network with the loss function

L = E[ (1/D) · Σ_d ( y_d - Q_d(S_t, a_d) )^2 ]

where D is the number of action branches and y_d is defined as

y_d = R_{t+1} + γ · Q_d^-( S_{t+1}, argmax_{a_d'} Q_d(S_{t+1}, a_d') )

that is, the online Q_d network selects, for state S_{t+1}, the sub-action with the largest Q value, and the target network then evaluates the Q value of that state-action pair; the value function in the BDQ algorithm is composed of the state value function V(S_t) and the advantage function A_d(S_t, a_d) of the action:

Q_d(S_t, a_d) = V(S_t) + A_d(S_t, a_d)

The BDQ algorithm comprises two neural networks with the same structure: an online network that is updated in real time and a target network that is updated every C steps by assigning the online network parameter values to the target network.
9. The performance optimization method for a blockchain sharding system combined with deep reinforcement learning of claim 1, characterized in that the operating logic for performance optimization based on the BDQ algorithm is as follows:
1) Initialize an experience replay pool D of size N; the replay pool D stores the system state S_t of the blockchain sharding system at time t, the action a_t, the reward R_{t+1}, and the system state S_{t+1} of the blockchain system at the next time;
2) Initialize two networks with the same structure, an online network and a target network, whose weights are θ and θ^- respectively;
3) Set the initial time t = 0, and denote the time index of a sample record in the experience replay pool D by τ;
4) Set the initial exploration probability to ε, the amount by which the exploration rate decreases with t to δ_ε, and the minimum exploration probability to ε_min;
5) Start the loop;
6) Obtain the state S_t = {R_t, C_t, H_t, P_t} of the blockchain system at the current time t;
7) Select an action a_t with an ε-greedy strategy: with probability ε choose a random action from the action space A, otherwise choose a_t = argmax_a Q(S_t, a);
8) Execute action a_t: the blockchain system performs sharding, processes transactions in parallel within the shards, packs the transactions into blocks, reaches consensus on the blocks, and sends the blocks that pass consensus to the directory committee for final packing and consensus; obtain the next system state S_{t+1} of the blockchain and calculate R_{t+1};
9) Store (S_t, a_t, R_{t+1}, S_{t+1}) into the cache array;
10) Randomly draw Y sample records (S_τ, a_τ, R_{τ+1}, S_{τ+1}) from the cache array;
11) Compute the sample Q values from the Y records using Q_d(S_τ, a_d) = V(S_τ) + A_d(S_τ, a_d);
12) Update the neural network with the loss function L = E[ (1/D) · Σ_d ( y_d - Q_d(S_τ, a_d) )^2 ];
13) Update the exploration probability ε to max(ε - δ_ε, ε_min), i.e., decay it by δ_ε but never below ε_min;
14) If t mod C = 0, copy the online network parameters to the target network (θ^- ← θ);
15) Increase the time t by 1;
16) End the loop.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210505118.4A (CN115102867B) | 2022-05-10 | 2022-05-10 | Block chain slicing system performance optimization method combining deep reinforcement learning
Publications (2)

Publication Number | Publication Date
---|---
CN115102867A | 2022-09-23
CN115102867B | 2023-04-25
Family
ID=83287942
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005064868A1 (en) * | 2003-11-25 | 2005-07-14 | Freescale Semiconductor, Inc. | Network message processing using pattern matching |
CN110389591A (en) * | 2019-08-29 | 2019-10-29 | 哈尔滨工程大学 | A kind of paths planning method based on DBQ algorithm |
CN111132175A (en) * | 2019-12-18 | 2020-05-08 | 西安电子科技大学 | Cooperative computing unloading and resource allocation method and application |
CN112261674A (en) * | 2020-09-30 | 2021-01-22 | 北京邮电大学 | Performance optimization method of Internet of things scene based on mobile edge calculation and block chain collaborative enabling |
EP3985579A1 (en) * | 2020-10-14 | 2022-04-20 | Bayerische Motoren Werke Aktiengesellschaft | Regional batching technique with reinforcement learning based decision controller for shared autonomous mobility fleet |
CN113361706A (en) * | 2021-05-18 | 2021-09-07 | 深圳大数点科技有限公司 | Data processing method and system combining artificial intelligence application and block chain |
CN113297310A (en) * | 2021-06-15 | 2021-08-24 | 广东工业大学 | Method for selecting block chain fragmentation verifier in Internet of things |
CN113570039A (en) * | 2021-07-22 | 2021-10-29 | 同济大学 | Optimized consensus block chain system based on reinforcement learning |
CN113645702A (en) * | 2021-07-30 | 2021-11-12 | 同济大学 | Internet of things system supporting block chain and optimized by strategy gradient technology |
Non-Patent Citations (4)
- HANG SHUAI et al.: "Branching Dueling Q-Network-Based Online Scheduling of a Microgrid With Distributed Energy Storage Systems"
- SHIJING YUAN et al.: "Sharding for Blockchain based Mobile Edge Computing System: A Deep Reinforcement Learning Approach"
- 宋琪杰, 陈铁明, 陈园, 马栋捷, 翁正秋: "Research on consensus mechanism optimization for blockchains oriented to the Internet of Things" (面向物联网区块链的共识机制优化研究)
- 曾晶晶: "Analysis and evaluation of several vulnerabilities of blockchain application systems" (区块链应用系统若干脆弱性分析与评测)
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116702583A (en) * | 2023-04-20 | 2023-09-05 | 北京科技大学 | Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning |
CN116702583B (en) * | 2023-04-20 | 2024-03-19 | 北京科技大学 | Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning |
CN116506444A (en) * | 2023-06-28 | 2023-07-28 | 北京科技大学 | Block chain stable slicing method based on deep reinforcement learning and reputation mechanism |
CN116506444B (en) * | 2023-06-28 | 2023-10-17 | 北京科技大学 | Block chain stable slicing method based on deep reinforcement learning and reputation mechanism |
Also Published As
Publication number | Publication date |
---|---|
CN115102867B (en) | 2023-04-25 |
Legal Events

Date | Code | Title
---|---|---
| PB01 | Publication
| SE01 | Entry into force of request for substantive examination
| GR01 | Patent grant