CN115102867B - Block chain slicing system performance optimization method combining deep reinforcement learning - Google Patents
- Publication number: CN115102867B
- Application number: CN202210505118.4A (CN202210505118A)
- Authority
- CN
- China
- Prior art keywords
- time
- block chain
- behavior
- slicing
- nodes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H04L41/142 — Network analysis or design using statistical or mathematical methods
- H04L41/14 — Network analysis or design
- H04L41/16 — Network maintenance, administration or management using machine learning or artificial intelligence
- G06N3/02, G06N3/08 — Neural networks; learning methods
- H04L67/10 — Protocols in which an application is distributed across nodes in the network
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The blockchain sharding system performance optimization method combined with deep reinforcement learning formulates the shard-selection problem as a Markov decision process model consisting of four parts: system state, behavior, reward and cost function. Solving the model means continuously selecting the optimal behavior in the dynamic blockchain sharding environment so as to maximize the throughput of the sharding system. By continuously exploring and learning the complex relation between the block size, the block interval, the number of shards and the sharding system, the BDQSB algorithm selects the most suitable sharding strategy according to the transmission rate between nodes, the computing capability of the nodes, the consensus history of the nodes and the probability of malicious nodes, and thereby improves the performance of the blockchain sharding system. Compared with other schemes, the invention further improves the performance of the blockchain sharding system, solves the behavior-space explosion problem, and reduces the training time cost of the neural network.
Description
Technical Field
The invention belongs to the technical field of data management and evidence storage, relates to intelligent control of blockchain system sharding, and particularly relates to a blockchain sharding system performance optimization method combined with deep reinforcement learning.
Background
Blockchain sharding partitions the nodes of a blockchain system into different shards. The nodes within each shard process transactions in parallel, which improves the transaction-processing capability, i.e., the performance, of the blockchain.
A blockchain can be sharded with a static optimization method, in which the sharding strategy remains fixed once the sharding technique is adopted. However, the blockchain system changes over time, so a static optimization method does not fit the dynamic blockchain environment.
Currently, dynamic optimization methods are used to shard the blockchain system; for example, a deep reinforcement learning algorithm dynamically provides the sharding strategy. According to the current system state of the blockchain, the reinforcement learning algorithm provides the optimal sharding strategy for that state, so that the throughput of the blockchain system is maximized.
A dynamic optimization method provides the sharding strategy according to the dynamic blockchain system environment and therefore suits a dynamic blockchain system better than a static method. Most existing work that adds deep reinforcement learning to a blockchain sharding system uses the DQN (Deep Q-Network) algorithm to overcome the drawbacks of static sharding strategies and the state-space explosion problem, but a DQN-based method cannot solve the behavior-space explosion caused by combining behaviors once the behavior dimensions expand.
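The scale of this problem can be seen with a toy count (the sub-behavior sizes below are hypothetical, not from the patent): a flat DQN needs one Q-value output per joint combination of the sub-behaviors, while a branching architecture such as BDQ needs only one output head per behavior dimension.

```python
# Hypothetical candidate counts for the three sub-behaviors B, TI, K.
n_block_sizes = 16   # candidate block sizes B
n_intervals = 16     # candidate block intervals TI
n_shard_counts = 8   # candidate shard counts K

# A flat DQN needs one Q-value output per joint action (B, TI, K):
dqn_outputs = n_block_sizes * n_intervals * n_shard_counts   # multiplicative

# A branching network (BDQ) needs one output head per sub-behavior dimension:
bdq_outputs = n_block_sizes + n_intervals + n_shard_counts   # additive

print(dqn_outputs, bdq_outputs)  # 2048 vs 40 outputs
```

Adding one more behavior dimension multiplies the DQN output count but only adds to the BDQ count, which is why the branching structure sidesteps the behavior-space explosion.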
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a performance optimization method for a blockchain sharding system combined with deep reinforcement learning, which combines the blockchain sharding technique with the deep reinforcement learning BDQ algorithm in a dynamic blockchain environment, so as to solve the behavior-space explosion caused by behavior combination after the behavior dimensions expand, and thereby the problem of low blockchain throughput.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
The blockchain sharding system performance optimization method combined with deep reinforcement learning comprises the following steps:
The system state S_t at time t is defined as the set of transmission rates R_t between nodes, the set of node computing capabilities C_t, the set of node consensus histories H_t, and the probability P_t of malicious nodes;
The behavior space A comprises the block size B, the block interval TI and the number of blockchain shards K;
The reward R_{t+1} represents the reward obtained after the blockchain sharding system executes a behavior at time t, i.e., the benefit obtained by taking the behavior in the system state S_t at time t, namely the number of transactions processed by the blockchain per second;
the Markov decision process model is summarized as: system state S at arbitrary time t t And then, the system accumulated rewards are maximized by selecting the optimal behavior, wherein the formula is as follows:
constrained to
wherein ,at For the action taken by the system at time t, gamma t Is time t R t+1 Attenuation factor of (2);
Step 3: solve the model with the deep reinforcement learning BDQ algorithm. By continuously exploring and learning the complex relation between the throughput of the blockchain system and the block size, the block interval and the number of shards, sharding is finally performed according to the number of blockchain shards, and the nodes within each shard process transactions in parallel according to the block size and the block interval, so that the number of transactions processed by the blockchain is maximized.
Compared with the prior art, the invention has the beneficial effects that:
the algorithm provides an optimal slicing strategy for the block chain system by using a deep reinforcement learning BDQ algorithm according to the dynamically changed block chain system environment, and changes the original DQN algorithm into the BDQ algorithm. The invention can solve the problem that the neural network is difficult to train caused by the action space explosion, and can reduce the time cost of the neural network training. Compared with other schemes, the invention can further improve the performance of the block chain slicing system, solve the problem of behavior space explosion and reduce the training time cost of the neural network.
Drawings
FIG. 1 is a block chain slice simulation system block diagram.
Fig. 2 is a neural network structure diagram of the BDQ algorithm.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings and examples.
The invention relates to a method for improving the performance of a blockchain sharding system combined with deep reinforcement learning. It establishes a Markov decision process model for the blockchain sharding problem, uses the deep reinforcement learning BDQ algorithm as the core of the shard-strategy selection algorithm, and designs an optimal shard-selection strategy based on deep reinforcement learning (Branching Dueling Q-Network Shard-Based blockchain, BDQSB). Solving the constructed model means continuously selecting the optimal behavior over a series of system states so that the cumulative system reward is maximized and, finally, the throughput of the blockchain is improved. Compared with other schemes, the invention further improves the performance of the blockchain sharding system, solves the behavior-space explosion problem, and reduces the training time cost of the neural network.
FIG. 1 is the architecture diagram of the blockchain sharding simulation system. The simulation system comprises N nodes; every pair of nodes has a transmission rate, every node has computing power, and malicious nodes exist among the nodes. The sharding process of the system is as follows. According to the number of shards K in the behavior, the nodes of the directory committee are selected first; let the number of directory-committee nodes be C. The remaining N − C nodes outside the directory committee are then sharded: each node is assigned to a shard according to the last L bits of its node ID, where L = log₂K and the node IDs and shard numbers are binary-coded. After sharding is completed, the blockchain system has K shards, and transactions are distributed to the different shards for processing. The nodes within a shard package transactions into blocks of size B and broadcast them to the other nodes in the shard for consensus; a consensus history H is generated during the consensus process. Each of the K shards sends the block verified within the shard to the directory committee, which finally packages the K blocks into a final block and broadcasts it to the other committee nodes for final consensus, again producing a consensus history. The probability P of malicious nodes in the blockchain can be calculated from the intra-shard consensus histories and the consensus history of the directory committee. After this process finishes, the state of the blockchain system has changed.
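The ID-based partition step can be sketched as follows, assuming integer node IDs and a power-of-two shard count K (a minimal illustration, not the patent's implementation):

```python
import math

def assign_shards(node_ids, k):
    """Assign each node to one of k shards using the last L = log2(k) bits
    of its ID, so nodes whose IDs share those bits land in the same shard."""
    l_bits = int(math.log2(k))
    mask = (1 << l_bits) - 1          # selects the last L bits of the ID
    shards = {s: [] for s in range(k)}
    for nid in node_ids:
        shards[nid & mask].append(nid)
    return shards

# 16 nodes, K = 4 shards: IDs ending in the same 2 bits share a shard.
shards = assign_shards(range(16), k=4)
```

With these inputs, shard 0 holds the IDs 0, 4, 8, 12 (all ending in binary 00), and the 16 nodes are spread evenly across the 4 shards.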
Fig. 2 is the neural network structure diagram of the BDQ (Branching Dueling Q-Network) algorithm. Existing research applying deep reinforcement learning to blockchain sharding systems mostly uses the DQN algorithm; compared with traditional DQN, the BDQ algorithm provides a new neural network structure in which the several sub-behaviors of the behavior space correspond to several network branches plus one shared decision module. The BDQ algorithm gives each independent behavior dimension a certain degree of independence and scales well. The BDQ algorithm feeds the blockchain state S_t = (R_t, C_t, H_t, P_t) into the network; the state is abstracted by the shared decision module (the hidden layers of the neural network) and the output splits into two kinds of branches: a state branch and behavior branches. Each behavior branch outputs the advantage function of one sub-behavior, and the state branch outputs the state-value function V(S_t). The advantage function of a sub-behavior is combined with the state-value function to obtain the Q-function of that sub-behavior, and when the blockchain sharding system makes a decision it selects the corresponding behavior according to the Q-value of each sub-behavior.
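The branching structure can be sketched in a few lines of NumPy (layer sizes and random weights are illustrative, not the patent's network; the combination Q_d = V + A_d follows the formula given later in the description):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 4 state features summarizing (R_t, C_t, H_t, P_t),
# one shared hidden layer, and three branches for B, TI and K.
STATE_DIM, HIDDEN = 4, 32
BRANCH_SIZES = [16, 16, 8]           # candidate values per sub-behavior

W_shared = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN))
W_value = rng.normal(scale=0.1, size=(HIDDEN, 1))
W_branch = [rng.normal(scale=0.1, size=(HIDDEN, n)) for n in BRANCH_SIZES]

def forward(state):
    """Branching dueling forward pass: shared decision module -> state-value
    branch V(S_t) plus one advantage branch A_d per sub-behavior, combined
    as Q_d(S_t, a_d) = V(S_t) + A_d(S_t, a_d)."""
    h = np.maximum(0.0, state @ W_shared)     # shared decision module (ReLU)
    v = h @ W_value                           # state-value branch, shape (1,)
    return [v + (h @ W) for W in W_branch]    # Q-values of each sub-behavior

state = rng.normal(size=STATE_DIM)
q_values = forward(state)
# One sub-action per dimension: indices into the B, TI and K candidate lists.
action = [int(np.argmax(q)) for q in q_values]
```

The three argmax indices together form the joint behavior (B, TI, K), so decision cost grows with the sum, not the product, of the branch sizes.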
The performance optimization method specifically comprises the following steps:
1. establishing a Markov decision process model for the block chain slicing problem, wherein the model comprises the following four parts:
State space: the state space is defined by the system state S_t, i.e., the set of transmission rates R_t between nodes, the set of node computing capabilities C_t, the set of node consensus histories H_t, and the probability P_t of malicious nodes:

    S_t = {R_t, C_t, H_t, P_t}

where R_t = {R_{i,j}}, i, j ∈ N, and R_{i,j} is the transmission rate of the link between node i and node j; C_t = {C_i}, i ∈ N, where C_i is the computing resource of blockchain node i; H_t = {H_i}, where H_i is the consensus history of node i, with H_i = 1 or H_i = 0: H_i = 1 indicates that node i's block verification was not legal, and H_i = 0 indicates that node i's block verification was valid; P_t is calculated from the consensus history.
Behavioral space: the behavior space is denoted as a, which includes a block size B, a block out time TI, and a blockchain slicing number K, with the formula:
A={B,TI,K}
Reward function: the reward R_{t+1} represents the reward obtained after the blockchain sharding system executes a behavior at time t, i.e., the benefit obtained by taking the behavior in the system state S_t at time t, namely the number of transactions processed by the blockchain per second:

    R_{t+1} = K · (B − B_H) / (b · TI)

where B_H is the size of the block header and b is the average size of a transaction. (B − B_H) is the size of the transactions carried by each block, and (B − B_H)/b — the transaction payload divided by the average transaction size — is the number of transactions per shard. K · (B − B_H)/b is the total number of transactions across the K shards; dividing it by the block interval TI gives the number of transactions processed by the blockchain per second, i.e., the transaction throughput.
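The reward can be computed directly from this formula; the concrete sizes below are hypothetical (block and transaction sizes in MB, interval in seconds), chosen only to show the arithmetic:

```python
def reward(k, block_size, header_size, avg_tx_size, block_interval):
    """R_{t+1} = K * (B - B_H) / (b * TI): transactions per second
    across K shards, i.e. the throughput of the sharding system."""
    tx_per_shard = (block_size - header_size) / avg_tx_size
    return k * tx_per_shard / block_interval

# e.g. 4 shards, 4 MB blocks, 0.008 MB headers, 0.002 MB transactions, 8 s:
r = reward(k=4, block_size=4.0, header_size=0.008,
           avg_tx_size=0.002, block_interval=8.0)  # 998 transactions/s
```

Doubling K doubles the reward, while increasing TI lowers it, which is exactly the trade-off the agent explores.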
Cost function: defined as Q (S) t ,a t ) The formula is as follows:
cost function Q (S) t ,a t ) Also known as Q function, a t E a is the action taken by the system at time t,as a desired function, y is the future time relative to time t, R t+y+1 Representing rewards obtained after the system takes action at the time t+y, wherein gamma represents attenuation factors, and represents the degree of importance of taking action in a certain state to future rewards of the system, namely environmental influence, wherein gamma is more than or equal to 0 and less than 1 y Y to the power of γ, is the time R of t+y t+y+1 Is a factor of attenuation of (a).
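The expectation above is taken over trajectories; for one concrete reward sequence the discounted sum inside it can be computed as follows (illustrative values):

```python
def discounted_return(rewards, gamma):
    """sum_{y>=0} gamma**y * R_{t+y+1}: the cumulative discounted reward
    whose expectation the cost function Q(S_t, a_t) estimates."""
    return sum(gamma ** y * r for y, r in enumerate(rewards))

# With gamma = 0.5 and three equal future rewards: 1 + 0.5 + 0.25 = 1.75.
g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

A γ close to 0 makes the system short-sighted (only R_{t+1} matters), while γ close to 1 weighs long-run throughput heavily.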
Thus, the established Markov decision process model can be summarized as follows: in the system state S_t at any time t, the optimal behavior is selected so that the cumulative system reward is maximized:

    max E[ Σ_{t=0}^{∞} γ^t · R_{t+1} ]

subject to a_t ∈ A,

where γ^t is the decay factor applied to R_{t+1} at time t.
2. Model solution and solving algorithm
a. Solving the model means computing the optimal cost function; according to the optimal cost function, the optimal behavior is selected in the system state S_t at any time t so that the cumulative reward is maximized. The optimal cost function is

    Q*(S_t, a_t) = E[ R_{t+1} + γ · max_{a_{t+1}} Q*(S_{t+1}, a_{t+1}) ]

and at any time t the optimal behavior is selected as

    a_t* = argmax_{a_t ∈ A} Q*(S_t, a_t)

where Q*(S_t, a_t) is the optimal cost function, S_{t+1} is the system state at time t + 1, and a_{t+1} is any of the behaviors the system may take at time t + 1, i.e., a behavior in the behavior space A.
b. The optimal cost function is obtained by calculation and the optimal behavior is selected in each decision, so that the cumulative reward is maximized.
As the solution algorithm, the invention selects the deep reinforcement learning BDQ algorithm, which accumulates sample records (S_t, a_t, R_{t+1}, S_{t+1}) through continuous decisions and uses them to train a neural network so that the network approximates the cost function, thereby selecting the optimal behavior and maximizing the cumulative reward of the model. Here R_{t+1} is the reward obtained after the system takes a_t, and S_{t+1} is the system state at time t + 1.
During neural network training, the BDQ algorithm provides a new neural network structure, shown in Fig. 2: the several sub-behaviors of the behavior space correspond one-to-one to the network branches, and the BDQ algorithm has a shared decision module (the hidden layers of the neural network). The BDQ algorithm gives each independent behavior dimension a certain degree of independence and scales well. The BDQ algorithm feeds the blockchain state S_t = (R_t, C_t, H_t, P_t) at time t into the network; the state is abstracted by the shared decision module and then split into two kinds of branches, the state branch and the behavior branches. The behavior branches output the advantage function of each sub-behavior, i.e., the advantage function A_1(S_t, a_1) of the block size B, the advantage function A_2(S_t, a_2) of the block interval TI, and the advantage function A_3(S_t, a_3) of the number of shards K; the state branch outputs the state-value function V(S_t). The advantage function of each sub-behavior is combined with the state-value function to obtain the value function of that sub-behavior, and when the blockchain sharding system makes a decision it selects the corresponding behavior according to each output sub-behavior Q-value.
The neural network is updated by randomly sampling minibatch-sized experiences from the experience pool and updating the network parameters by gradient descent. The BDQ algorithm updates the loss function as

    L = E[ mean_d ( y_d − Q_d(S_t, a_d) )² ]

where the mean runs over the behavior branches d and the target y_d is defined as

    y_d = R_{t+1} + γ · Q_d⁻( S_{t+1}, argmax_{a_d'} Q_d(S_{t+1}, a_d') )

i.e., in the online network Q_d the sub-behavior a_d with the maximum Q-value in state S_{t+1} is selected, and then the Q-value for that state-behavior pair is taken from the target network Q_d⁻. The cost function in the BDQ algorithm consists of the state-value function V(S_t) and the behavior advantage function A_d(S_t, a_d):

    Q_d(S_t, a_d) = V(S_t) + A_d(S_t, a_d)
The BDQ algorithm maintains two neural networks with the same structure: the online network is updated in real time, while the target network is updated once every C steps by assigning the online network's parameter values to the target network.
3. By continuously exploring and learning the complex relation between the throughput of the blockchain system and the block size, the block interval and the number of shards, sharding is finally performed according to the number of blockchain shards; the nodes within each shard process transactions in parallel according to the block size and the block interval, so that the number of transactions processed by the blockchain is maximized and the performance of the blockchain is improved.
The running logic of the intelligent performance-optimization control based on the BDQ algorithm is as follows:
1) Initialize an experience replay pool D of size N; the pool stores the system state S_t of the blockchain sharding system at time t, the behavior a_t, the reward R_{t+1}, and the system state S_{t+1} of the blockchain system at the next time;
2) Initialize two networks with the same structure, the online network and the target network, with weights θ and θ⁻ respectively;
3) Set the initial time t = 0, and denote by τ the time of a sample record in the experience replay pool D;
4) Initialize the exploration probability ε of behaviors, the amount Δ_ε by which the exploration rate decreases with t, and the minimum exploration probability ε_min;
5) Start the loop body;
6) Obtain the current blockchain system state S_t = {R_t, C_t, H_t, P_t} at time t;
7) Select a behavior a_t using the ε-greedy policy: with probability ε choose a random behavior, otherwise choose the behavior with the maximum Q-value;
8) Execute behavior a_t: the blockchain system shards, processes transactions in parallel within the shards, packages the transactions into blocks, runs consensus on the blocks, and sends the blocks that pass consensus to the directory committee for final packaging and consensus; then obtain the blockchain system state S_{t+1} at the next time and calculate R_{t+1};
9) Store (S_t, a_t, R_{t+1}, S_{t+1}) into the cache array;
10) Randomly draw Y sample records (S_τ, a_τ, R_{τ+1}, S_{τ+1}) from the cache array;
11) Using the Y records, compute the Q sample values as Q_d(S_τ, a_d) = V(S_τ) + A_d(S_τ, a_d);
12) Update the neural network by gradient descent on the loss L = E[ mean_d ( y_d − Q_d(S_t, a_d) )² ];
13) Update the exploration probability: ε ← max(ε − Δ_ε, ε_min);
14) If t mod C = 0, the target network copies the online network parameters;
15) Increase the time t by 1;
16) End the loop body.
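The steps above can be sketched as the following loop; `env`, `online_q` and `target_q` are hypothetical stand-ins for the blockchain sharding environment and the two BDQ networks (illustrative only, not the patent's implementation):

```python
import random
from collections import deque

def train(env, online_q, target_q, episodes, buffer_size=1000, batch=32,
          eps=1.0, d_eps=0.01, eps_min=0.05, c=10, gamma=0.9):
    """Sketch of the control loop in steps 1)-16): epsilon-greedy behavior
    selection, experience replay, and periodic target-network copy.
    Returns the final exploration probability."""
    replay = deque(maxlen=buffer_size)        # experience replay pool D
    state = env.reset()                       # initial system state
    for t in range(episodes):
        # 7) epsilon-greedy behavior selection
        if random.random() < eps:
            action = env.sample_action()
        else:
            action = online_q.best_action(state)
        # 8) execute: shard, process transactions in parallel, run consensus
        next_state, reward = env.step(action)
        # 9) store (S_t, a_t, R_{t+1}, S_{t+1})
        replay.append((state, action, reward, next_state))
        # 10)-12) sample a minibatch and update by gradient descent
        if len(replay) >= batch:
            online_q.update(random.sample(replay, batch), target_q, gamma)
        # 13) decay exploration, floored at eps_min
        eps = max(eps - d_eps, eps_min)
        # 14) copy online parameters into the target network every c steps
        if t % c == 0:
            target_q.copy_from(online_q)
        state = next_state
    return eps
```

The decaying ε trades early exploration of sharding strategies for late exploitation of the learned Q-values.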
The invention provides a specific blockchain slicing optimization embodiment, which comprises the following steps:
The blockchain sharding system has 200 nodes. The state S_t of the blockchain system at time t is input into the neural network of the BDQ algorithm, which outputs the Q-values of the three sub-behaviors B, TI and K; the sub-behaviors with the largest Q-values form the behavior a_t to be executed by the blockchain sharding system. Suppose that, according to the number of shards K = 4 in behavior a_t, the blockchain sharding system first selects the directory-committee nodes; the remaining N − C = 180 nodes outside the directory committee are then divided into 4 shards of 45 nodes each. After sharding, transactions are distributed to the different shards for processing, and the nodes within each shard package transactions into blocks of size B = 4 according to the block interval TI = 8. The blockchain nodes are divided into 4 different shards, so the 4 shards process transactions in parallel and the throughput of the blockchain sharding system is improved. The BDQ algorithm provides the sharding strategy for the dynamically changing blockchain sharding system in real time, improving its performance; compared with using the DQN algorithm, the BDQ algorithm yields higher blockchain throughput.
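As a quick arithmetic check of this embodiment's numbers (the committee-size formula itself is given in the description above; here only the stated totals are used):

```python
# N = 200 nodes in total, K = 4 shards, and N - C = 180 nodes remain
# outside the directory committee, which implies a committee of C = 20.
N, K = 200, 4
sharded = 180                     # N - C nodes to be sharded, as stated
committee = N - sharded           # directory-committee size C
nodes_per_shard = sharded // K    # nodes processing transactions in parallel

print(committee, nodes_per_shard)  # 20 committee nodes, 45 nodes per shard
```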
Claims (1)
1. The block chain slicing system performance optimization method combined with deep reinforcement learning is characterized by comprising the following steps of:
Step 1: the blockchain simulation system comprises N nodes; every pair of nodes has a transmission rate, every node has computing power, and malicious nodes exist among the nodes;
Step 2: establish a Markov decision process model for the blockchain sharding problem; the model consists of four parts: the system state S_t, the behavior space A, the reward R_{t+1}, and the cost function Q(S_t, a_t);
The system state S_t at time t is defined as the set of transmission rates R_t between nodes, the set of node computing capabilities C_t, the set of node consensus histories H_t, and the probability P_t of malicious nodes:

    S_t = {R_t, C_t, H_t, P_t}

where R_t = {R_{i,j}}, i, j ∈ N, and R_{i,j} is the transmission rate of the link between node i and node j; C_t = {C_i}, i ∈ N, where C_i is the computing resource of blockchain node i; H_t = {H_i}, where H_i is the consensus history of node i, with H_i = 1 or H_i = 0: H_i = 1 indicates that node i's block verification was not legal, and H_i = 0 indicates that node i's block verification was valid; P_t is calculated from the consensus history;
The behavior space A comprises the block size B, the block interval TI and the number of blockchain shards K:

    A = {B, TI, K}
The reward R_{t+1} represents the reward obtained after the blockchain sharding system executes a behavior at time t, i.e., the benefit obtained by taking the behavior in the system state S_t at time t, namely the number of transactions processed by the blockchain per second:

    R_{t+1} = K · (B − B_H) / (b · TI)

where B_H is the size of the block header and b is the average size of a transaction; (B − B_H) is the size of the transactions carried by each block, (B − B_H)/b is the number of transactions per shard, and K · (B − B_H)/b is the total number of transactions across the K shards; dividing the total number of transactions by the block interval gives the number of transactions processed by the blockchain per second, i.e., the transaction throughput;
The cost function Q(S_t, a_t) is

    Q(S_t, a_t) = E[ Σ_{y=0}^{∞} γ^y · R_{t+y+1} | S_t, a_t ]

where a_t ∈ A is the behavior taken by the system at time t, E[·] is the expectation, y is the time offset into the future relative to time t, and R_{t+y+1} is the reward obtained after the system takes a behavior at time t + y; γ, with 0 ≤ γ < 1, is the decay factor and expresses how much taking a behavior in a given state values the future rewards of the system, i.e., the environmental influence, and γ^y is the decay factor applied to R_{t+y+1};
The Markov decision process model is summarized as follows: in the system state S_t at any time t, the optimal behavior is selected so that the cumulative system reward is maximized:

    max E[ Σ_{t=0}^{∞} γ^t · R_{t+1} ]

subject to a_t ∈ A,

where a_t is the behavior taken by the system at time t and γ^t is the decay factor applied to R_{t+1} at time t;
wherein the optimal cost function is obtained by calculation, i.e., according to the optimal cost function the optimal behavior is selected in the system state S_t at any time t so that the cumulative reward is maximized; the optimal cost function is

    Q*(S_t, a_t) = E[ R_{t+1} + γ · max_{a_{t+1}} Q*(S_{t+1}, a_{t+1}) ]

and at any time t the optimal behavior is selected as

    a_t* = argmax_{a_t ∈ A} Q*(S_t, a_t)

where Q*(S_t, a_t) is the optimal cost function, S_{t+1} is the system state at time t + 1, and a_{t+1} is any of the behaviors the system may take at time t + 1, i.e., a behavior in the behavior space A;
Step 3: solve the model with the deep reinforcement learning BDQ algorithm; by continuously exploring and learning the complex relation between the throughput of the blockchain system and the block size, the block interval and the number of shards, sharding is finally performed according to the number of blockchain shards, and the nodes within each shard process transactions in parallel according to the block size and the block interval, so that the number of transactions processed by the blockchain is maximized;
The running logic of the performance optimization based on the BDQ algorithm is as follows:
1) Initialize an experience replay pool D of size N; the pool D stores the system state $S_t$ of the blockchain slicing system at time t, the behavior $a_t$, the reward $R_{t+1}$, and the system state $S_{t+1}$ of the blockchain system at the next time step;
2) Initialize two networks with the same structure, an online network and a target network, whose weights are $\theta$ and $\theta^{-}$ respectively;
3) Set the initial time t = 0, and denote by τ the time index of a sample record in the experience replay pool D;
4) Initialize the exploration probability ε of behaviors, the amount Δε by which the exploration rate decreases with t, and the minimum exploration probability ε_min;
5) Start the loop body;
6) Acquire the blockchain system state at the current time t, $S_t = \{R_t, C_t, H_t, P_t\}$;
7) Select a behavior using the ε-greedy policy: with probability ε choose a random behavior from the action space A, and otherwise choose the behavior with the maximum Q value;
8) Execute behavior $a_t$: the blockchain system performs slicing, the slices process transactions in parallel and package them into blocks, consensus is performed on the blocks, and blocks passing consensus are sent to the directory committee for final packaging and consensus; then obtain the system state $S_{t+1}$ of the blockchain system at the next time step and calculate $R_{t+1}$;
9) Store $(S_t, a_t, R_{t+1}, S_{t+1})$ into the experience replay pool D;
10) Randomly extract Y sample records $(S_\tau, a_\tau, R_{\tau+1}, S_{\tau+1})$ from the experience replay pool D;
11) Using the Y records, calculate the Q sample values as follows:
$$Q_d(S_\tau, a_d) = V(S_\tau) + A_d(S_\tau, a_d)$$
12) Update the neural network by gradient descent using the loss function:

$$L(\theta) = \mathbb{E}\!\left[\frac{1}{3}\sum_{d=1}^{3}\big(y_d - Q_d(S_\tau, a_d)\big)^{2}\right]$$

where the sum runs over the three sub-behavior branches;
13) Update the exploration probability: ε ← max(ε − Δε, ε_min), i.e. ε decreases by Δε per step but never falls below the minimum exploration probability ε_min;
14) If t mod C = 0, copy the online network parameters into the target network, i.e. $\theta^{-} \leftarrow \theta$;
15) Increase the time t by 1;
16) End the loop body;
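Steps 7) and 13) above, ε-greedy behavior selection and exploration decay, can be sketched as follows; this is a minimal illustration and the helper names are ours:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    # Step 7: with probability epsilon explore a random behavior,
    # otherwise exploit the behavior with the maximum Q value.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

def decay_epsilon(epsilon, delta_eps, eps_min):
    # Step 13: epsilon shrinks by delta_eps per step, floored at eps_min.
    return max(epsilon - delta_eps, eps_min)
```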
the BDQ algorithm accumulates sample records $(S_t, a_t, R_{t+1}, S_{t+1})$ and trains the neural network with these samples so that the neural network approximates the cost function, thereby selecting the optimal behavior that maximizes the cumulative reward of the model, where $S_{t+1}$ is the system state at time t+1;
the neural network of the BDQ algorithm has network branches in one-to-one correspondence with the sub-behaviors of the behavior space A, together with a shared decision module, namely the hidden layers of the neural network. The system state at time t, $S_t = \{R_t, C_t, H_t, P_t\}$, is input into the neural network, and the state is abstracted by the shared decision module; the output then splits into two kinds of branches, a state branch and behavior branches. The behavior branches output the dominance function of each sub-behavior, namely the dominance function $A_1(S_t, a_1)$ of the block size B, the dominance function $A_2(S_t, a_2)$ of the out-block time TI, and the dominance function $A_3(S_t, a_3)$ of the number of blockchain slices K; the state branch outputs the state value function $V(S_t)$. The dominance function of each sub-behavior is combined with the state value function to obtain the value function of that sub-behavior, and when the blockchain slicing system makes a decision, it selects the corresponding behavior according to the output Q value of each sub-behavior;
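A minimal sketch of the branch combination just described: the state value V(S_t) is added to each branch's dominance values to form per-branch Q values, and each sub-behavior (block size B, out-block time TI, slice count K) is chosen by its own argmax. The list-based representation is our simplification of the network outputs:

```python
def branch_decision(v, advantages):
    """Combine the state value V(S_t) with each branch's dominance
    function A_d(S_t, a_d) to get per-branch Q values, then pick the
    argmax sub-behavior in every branch independently.

    advantages: one list per sub-behavior branch, e.g.
    [A_1 for block size B, A_2 for out-block time TI, A_3 for slices K].
    """
    q_per_branch = [[v + a for a in branch] for branch in advantages]
    choice = [max(range(len(q)), key=lambda i: q[i]) for q in q_per_branch]
    return q_per_branch, choice
```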
the updating process of the neural network is to randomly extract experiences of minibatch size from the experience pool and update the neural network parameters by gradient descent; the update formula of the BDQ algorithm for the loss function is:

$$L(\theta) = \mathbb{E}\!\left[\frac{1}{3}\sum_{d=1}^{3}\big(y_d - Q_d(S_t, a_d)\big)^{2}\right]$$
wherein $y_d$ is defined as:

$$y_d = R_{t+1} + \gamma\, Q_d^{\theta^{-}}\!\Big(S_{t+1}, \arg\max_{a_d \in A_d} Q_d^{\theta}(S_{t+1}, a_d)\Big)$$

which expresses that the sub-behavior $a_d$ with the maximum Q value in state $S_{t+1}$ is selected according to the online network $Q_d^{\theta}$, and the Q value corresponding to that state-behavior pair is then taken from the target network $Q_d^{\theta^{-}}$; the cost function in the BDQ algorithm is composed of the state value function $V(S_t)$ and the dominance function $A_d(S_t, a_d)$ of the behavior, the cost function being
$$Q_d(S_t, a_d) = V(S_t) + A_d(S_t, a_d)$$
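The per-branch target y_d defined above can be computed as in this sketch, with the online and target networks' Q values for S_{t+1} passed in as plain lists (an illustrative simplification of the two networks):

```python
def bdq_branch_target(r, gamma, online_q_next, target_q_next):
    # y_d = R + gamma * Q_d^target(S_{t+1}, argmax_a Q_d^online(S_{t+1}, a)):
    # the online network picks the sub-behavior, the target network scores it.
    a_star = max(range(len(online_q_next)), key=lambda a: online_q_next[a])
    return r + gamma * target_q_next[a_star]
```

Decoupling selection (online network) from evaluation (target network) in this way reduces the overestimation bias of a plain max-based target.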
Two neural networks with the same structure exist in the BDQ algorithm: the online network is updated in real time, while the target network is updated once every C steps by assigning the online network parameter values to the target network.
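The periodic target-network update can be sketched as a parameter copy every C steps; list-valued parameters stand in for the network weights here:

```python
def sync_target(step, c, online_params, target_params):
    # Every C steps, assign the online network parameters (theta)
    # to the target network (theta-minus); otherwise leave it unchanged.
    if step % c == 0:
        target_params[:] = list(online_params)
    return target_params
```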
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210505118.4A CN115102867B (en) | 2022-05-10 | 2022-05-10 | Block chain slicing system performance optimization method combining deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115102867A CN115102867A (en) | 2022-09-23 |
CN115102867B true CN115102867B (en) | 2023-04-25 |
Family
ID=83287942
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210505118.4A Active CN115102867B (en) | 2022-05-10 | 2022-05-10 | Block chain slicing system performance optimization method combining deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115102867B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116702583B (en) * | 2023-04-20 | 2024-03-19 | 北京科技大学 | Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning |
CN116506444B (en) * | 2023-06-28 | 2023-10-17 | 北京科技大学 | Block chain stable slicing method based on deep reinforcement learning and reputation mechanism |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112261674A (en) * | 2020-09-30 | 2021-01-22 | 北京邮电大学 | Performance optimization method of Internet of things scene based on mobile edge calculation and block chain collaborative enabling |
CN113645702A (en) * | 2021-07-30 | 2021-11-12 | 同济大学 | Internet of things system supporting block chain and optimized by strategy gradient technology |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1860743A (en) * | 2003-11-25 | 2006-11-08 | 飞思卡尔半导体公司 | Network message processing using pattern matching |
CN110389591A (en) * | 2019-08-29 | 2019-10-29 | 哈尔滨工程大学 | A kind of paths planning method based on DBQ algorithm |
CN111132175B (en) * | 2019-12-18 | 2022-04-05 | 西安电子科技大学 | Cooperative computing unloading and resource allocation method and application |
EP3985579A1 (en) * | 2020-10-14 | 2022-04-20 | Bayerische Motoren Werke Aktiengesellschaft | Regional batching technique with reinforcement learning based decision controller for shared autonomous mobility fleet |
CN113361706A (en) * | 2021-05-18 | 2021-09-07 | 深圳大数点科技有限公司 | Data processing method and system combining artificial intelligence application and block chain |
CN113297310B (en) * | 2021-06-15 | 2023-03-21 | 广东工业大学 | Method for selecting block chain fragmentation verifier in Internet of things |
CN113570039B (en) * | 2021-07-22 | 2024-02-06 | 同济大学 | Block chain system based on reinforcement learning optimization consensus |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115102867B (en) | Block chain slicing system performance optimization method combining deep reinforcement learning | |
WO2021155713A1 (en) | Weight grafting model fusion-based facial recognition method, and related device | |
CN112491818B (en) | Power grid transmission line defense method based on multi-agent deep reinforcement learning | |
CN109388565B (en) | Software system performance optimization method based on generating type countermeasure network | |
Mahfoud | Finite Markov chain models of an alternative selection strategy for the genetic algorithm | |
CN112884149B (en) | Random sensitivity ST-SM-based deep neural network pruning method and system | |
CN113691594B (en) | Method for solving data imbalance problem in federal learning based on second derivative | |
CN113254719B (en) | Online social network information propagation method based on status theory | |
CN109145107B (en) | Theme extraction method, device, medium and equipment based on convolutional neural network | |
CN115374853A (en) | Asynchronous federal learning method and system based on T-Step polymerization algorithm | |
CN106372101A (en) | Video recommendation method and apparatus | |
CN116362329A (en) | Cluster federation learning method and device integrating parameter optimization | |
CN115437795A (en) | Video memory recalculation optimization method and system for heterogeneous GPU cluster load perception | |
CN111626404A (en) | Deep network model compression training method based on generation of antagonistic neural network | |
CN116975778A (en) | Social network information propagation influence prediction method based on information cascading | |
CN116055209A (en) | Network attack detection method based on deep reinforcement learning | |
CN115829029A (en) | Channel attention-based self-distillation implementation method | |
CN108388942A (en) | Information intelligent processing method based on big data | |
CN108417204A (en) | Information security processing method based on big data | |
CN113342474B (en) | Method, equipment and storage medium for predicting customer flow and training model | |
CN114611721A (en) | Federal learning method, device, equipment and medium based on partitioned block chain | |
CN107256425B (en) | Random weight network generalization capability improvement method and device | |
CN111210009A (en) | Information entropy-based multi-model adaptive deep neural network filter grafting method, device and system and storage medium | |
CN113763167B (en) | Blacklist mining method based on complex network | |
CN116702583B (en) | Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||