CN116702583B - Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning - Google Patents

Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning

Info

Publication number
CN116702583B
CN116702583B
Authority
CN
China
Prior art keywords
block
nodes
internet
block chain
things
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310428183.6A
Other languages
Chinese (zh)
Other versions
CN116702583A (en)
Inventor
罗熊
马铃
李耀宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB filed Critical University of Science and Technology Beijing USTB
Priority to CN202310428183.6A priority Critical patent/CN116702583B/en
Publication of CN116702583A publication Critical patent/CN116702583A/en
Application granted granted Critical
Publication of CN116702583B publication Critical patent/CN116702583B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • G06F18/295Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00Details relating to CAD techniques
    • G06F2111/08Probabilistic or stochastic CAD
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a device for optimizing blockchain performance under the Internet of Things based on deep reinforcement learning, and relates to the technical field of the Internet of Things. The method comprises: initializing a blockchain simulation system in an Internet of Things scenario; constructing a performance optimization model of the blockchain simulation system, wherein the performance optimization model is built as a Markov decision process model; and solving the performance optimization model with a deep reinforcement learning algorithm to obtain the optimal scalability configuration of the blockchain simulation system in the Internet of Things scenario. Based on the average transaction size, the computing resources of the nodes and the transmission rates between nodes, the invention uses a double deep Q-network algorithm to dynamically adjust the number of shards, the block size and the block interval, and obtains the optimal scalability configuration of the blockchain system without sacrificing other necessary performance indicators. Blockchain technology is introduced into the Internet of Things, performance optimization of the blockchain system is realized based on a deep reinforcement learning algorithm, and the requirements of the Internet of Things for high security and high efficiency are met.

Description

Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning
Technical Field
The invention relates to the technical field of the Internet of things, in particular to a method and a device for optimizing the performance of a blockchain under the Internet of things based on deep reinforcement learning.
Background
The Internet of Things extends and expands the traditional Internet: it is a vast network formed by connecting various information-sensing devices to the Internet. It is widely applied in traditional industries such as logistics and industrial production as well as in emerging fields such as smart homes and smart healthcare. The traditional Internet of Things mostly consists of distributed devices and centralized data-processing nodes. However, with the rapid development of mobile communication technologies represented by 5G, the surge of computing tasks in Internet of Things scenarios exposes data to security risks and incurs higher network latency and operating costs.
The rise of blockchain technology provides an effective solution to these problems. The first application of blockchain was Bitcoin, which ensures data security and efficiency by enabling anonymous, trusted transactions without intermediaries. Its impact, however, extends far beyond cryptocurrency. A blockchain is essentially a distributed storage system consisting of a sequence of time-stamped blocks maintained as a P2P (peer-to-peer) ledger. It therefore has the characteristics of decentralization, anonymity, security and tamper resistance. In Internet of Things scenarios, storing data securely and reliably on a blockchain guarantees the data security and processing efficiency of the whole system and enables more efficient data storage, exchange and management. An efficient consensus algorithm is key to applying blockchain technology to the Internet of Things. The PoW (Proof of Work) consensus algorithm is the earliest and most secure public-chain consensus algorithm; it is decentralized and highly secure and can meet the high-security requirement of the Internet of Things. However, PoW has a significant drawback: every node in the network must compute hash values over the block header, which wastes a great deal of resources. To solve this problem, the PoDL (Proof of Deep Learning) consensus algorithm was proposed, which replaces the meaningless hash collision with a deep learning task and thus avoids wasting resources. How to provide the scalability necessary to meet the high transaction throughput requirements of the Internet of Things nevertheless remains a challenge.
Currently, methods for improving the scalability of blockchain systems can be divided into on-chain and off-chain approaches. The first on-chain approach is sharding, which divides blockchain nodes into different shards so that each shard can process transactions in parallel. Another on-chain approach is parameter tuning, which improves system performance by adjusting parameters such as the block size and the block interval. Off-chain methods mainly adopt multi-chain techniques, reducing the load and redundancy of the main chain by migrating part of its workload to side chains. However, off-chain methods rely on an incompletely distributed, locally offline system in which malicious nodes can easily collude and link erroneous blocks into the system, thereby reducing the security and performance of the blockchain.
Blockchain systems are subject to the well-known trilemma: a blockchain system can simultaneously possess only two of the three properties of decentralization, security and scalability. The Bitcoin system, based on the PoW consensus algorithm, prioritizes decentralization and security and therefore sacrifices scalability. Conversely, most blockchain platforms in Internet of Things scenarios enhance scalability only by sacrificing performance indicators such as security and latency.
Blockchain applications in Internet of Things scenarios are dynamic and high-dimensional, and DRL (Deep Reinforcement Learning) algorithms have natural advantages in solving such complex optimization and decision problems. Deep reinforcement learning combines the perception capability of deep learning with the decision-making capability of reinforcement learning, and directly controls an agent's behavior by learning from high-dimensional perceptual input. Currently, most common optimization strategies employ the DQN (Deep Q Network) algorithm. However, DQN suffers from overestimation, i.e., the estimated value is larger than the true value. Therefore, finding the optimal scalability configuration of a blockchain system with a suitable DRL algorithm, without sacrificing other performance indicators, is of great significance.
Disclosure of Invention
The invention addresses the problem of finding the optimal scalability configuration of a blockchain system with a suitable DRL algorithm without sacrificing other performance indicators.
In order to solve the technical problems, the invention provides the following technical scheme:
In one aspect, the invention provides a method for optimizing the performance of a blockchain under the Internet of Things based on deep reinforcement learning, which is implemented by an electronic device and comprises the following steps:
s1, initializing a block chain simulation system in an Internet of things scene.
S2, constructing a performance optimization model of the block chain simulation system according to the block chain simulation system; wherein the performance optimization model is built as a Markov decision process model.
And S3, solving the performance optimization model by adopting a deep reinforcement learning algorithm to obtain the optimal expandability configuration of the blockchain simulation system in the scene of the Internet of things.
Optionally, the initializing the blockchain simulation system in the internet of things scene in S1 includes:
setting the total number N of nodes, the number F of malicious nodes and the average transaction size X of a blockchain simulation system in the scene of the Internet of things.
The N nodes all have computing resources, and a data path exists among all the N nodes.
The N nodes are divided into K slices, each of the K slices containing one full node to generate a block.
Optionally, the Markov decision process in S2 is a five-tuple (S, A, P, R, γ).
Wherein S is the set of states, and the state at decision time t is s_t = [X, C, D]_t, in which X denotes the average transaction size, C = {C_i} denotes the computing resources of node i, and D = {D_{i,j}} denotes the data transmission rate between node i and node j.
A is the set of actions, and the action at decision time t is a_t = [K, S_B, T_B]_t, in which K denotes the number of shards, S_B denotes the block size, and T_B denotes the block interval.
P is the state transition matrix, R is the reward function, and γ ∈ [0, 1] is the discount factor.
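For concreteness, the state and action defined above can be encoded as plain containers. The following Python sketch is illustrative only; the field names are assumptions made for readability and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class State:
    """State s_t = [X, C, D]_t of the performance-optimization MDP (illustrative)."""
    avg_tx_size: float                       # X: average transaction size
    compute: Dict[int, float]                # C = {C_i}: computing resource of node i
    rate: Dict[Tuple[int, int], float]       # D = {D_ij}: transmission rate between nodes i and j

@dataclass
class Action:
    """Action a_t = [K, S_B, T_B]_t: the scalability configuration applied at time t."""
    num_shards: int        # K: number of shards
    block_size: int        # S_B: block size (bytes per block)
    block_interval: float  # T_B: average time between blocks
```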
Optionally, the objective function of the performance optimization model of the blockchain simulation system in S2 is the expected cumulative discounted reward, as shown in the following formula (1):
max E[ Σ_t γ^t · r_t(s_t, a_t) ]   (1)
wherein E is the expectation operator, γ^t is the discount factor at decision time t, r_t is the reward generated by selecting action a_t in state s_t, s_t is the state at decision time t, and a_t is the action at decision time t.
Optionally, solving the performance optimization model in S3 by using a deep reinforcement learning algorithm includes:
s31, initializing an experience playback pool B, a current network and a target network.
S32, initializing the parameters of the deep reinforcement learning algorithm; wherein the parameters include the exploration probability ε and the maximum number of rounds T.
S33, starting the loop and initializing the state s_t.
S34, feeding the state s_t into the current network as input and selecting action a_t with an ε-greedy policy.
S35, executing action a_t in state s_t to obtain the new state s_{t+1} and the reward r_t.
S36, storing the quadruple (s_t, a_t, r_t, s_{t+1}) in the experience replay pool B.
S37, randomly sampling a batch of experience tuples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool B for learning, calculating the target value y_i, and updating the parameter ω of the current network by gradient back-propagation.
S38, setting a fixed interval C and, after every C iterations, copying the parameter ω of the current network to the target network to update the target network parameter ω⁻.
S39, repeating steps S33 to S38 until the maximum number of rounds T is reached, then ending the loop.
Optionally, the current network and the target network in S31 have the same network structure.
The parameters of the current network and the target network are ω and ω⁻, respectively.
Optionally, executing action a_t in state s_t in S35 to obtain the new state s_{t+1} and the reward r_t further comprises:
completing the consensus verification with the S-PoDL (Separate Proof of Deep Learning, a sharded deep-learning proof) consensus algorithm.
Optionally, storing the quadruple (s_t, a_t, r_t, s_{t+1}) in the experience replay pool B in S36 further comprises:
when the experience information stored in the experience replay pool B reaches the maximum capacity and new experience information arrives, popping and deleting the experience information that entered the pool first so that the new experience information can be recorded.
Optionally, the target value y_i calculated in S37 is given by the following formula (2):
y_i = r_i + γ · Q(s_{i+1}, argmax_a Q(s_{i+1}, a; ω); ω⁻)   (2)
wherein r_i is the reward generated by selecting action a_i in state s_i, γ is the discount factor, s_{i+1} is the state at decision time i+1, and a_i is the action at decision time i.
In another aspect, the invention provides a device for optimizing the performance of a blockchain under the Internet of Things based on deep reinforcement learning, which is applied to implement the above method and comprises:
and the initialization module is used for initializing the blockchain simulation system in the scene of the Internet of things.
The building module is used for building a performance optimization model of the block chain simulation system according to the block chain simulation system; wherein the performance optimization model is built as a Markov decision process model.
And the output module is used for solving the performance optimization model by adopting a deep reinforcement learning algorithm to obtain the optimal expandability configuration of the blockchain simulation system in the scene of the Internet of things.
Optionally, the initialization module is further configured to:
setting the total number of nodes N, the number of malicious nodes F and the average transaction size X of the blockchain simulation system in the Internet of Things scenario.
The N nodes all have computing resources, and a data path exists among all the N nodes.
The N nodes are divided into K slices, each of the K slices containing one full node to generate a block.
Optionally, the Markov decision process is a five-tuple (S, A, P, R, γ).
Wherein S is the set of states, and the state at decision time t is s_t = [X, C, D]_t, in which X denotes the average transaction size, C = {C_i} denotes the computing resources of node i, and D = {D_{i,j}} denotes the data transmission rate between node i and node j.
A is the set of actions, and the action at decision time t is a_t = [K, S_B, T_B]_t, in which K denotes the number of shards, S_B denotes the block size, and T_B denotes the block interval.
P is the state transition matrix, R is the reward function, and γ ∈ [0, 1] is the discount factor.
Optionally, the objective function of the performance optimization model of the blockchain simulation system is the expected cumulative discounted reward, as shown in the following formula (1):
max E[ Σ_t γ^t · r_t(s_t, a_t) ]   (1)
wherein E is the expectation operator, γ^t is the discount factor at decision time t, r_t is the reward generated by selecting action a_t in state s_t, s_t is the state at decision time t, and a_t is the action at decision time t.
Optionally, the output module is further configured to:
s31, initializing an experience playback pool B, a current network and a target network.
S32, initializing the parameters of the deep reinforcement learning algorithm; wherein the parameters include the exploration probability ε and the maximum number of rounds T.
S33, starting the loop and initializing the state s_t.
S34, feeding the state s_t into the current network as input and selecting action a_t with an ε-greedy policy.
S35, executing action a_t in state s_t to obtain the new state s_{t+1} and the reward r_t.
S36, storing the quadruple (s_t, a_t, r_t, s_{t+1}) in the experience replay pool B.
S37, randomly sampling a batch of experience tuples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool B for learning, calculating the target value y_i, and updating the parameter ω of the current network by gradient back-propagation.
S38, setting a fixed interval C and, after every C iterations, copying the parameter ω of the current network to the target network to update the target network parameter ω⁻.
S39, repeating steps S33 to S38 until the maximum number of rounds T is reached, then ending the loop.
Optionally, the network structures of the current network and the target network are the same.
The parameters of the current network and the target network are ω and ω⁻, respectively.
Optionally, the output module is further configured to:
completing the consensus verification by adopting the S-PoDL consensus algorithm.
Optionally, the output module is further configured to:
when the experience information stored in the experience replay pool B reaches the maximum capacity and new experience information arrives, popping and deleting the experience information that entered the pool first so that the new experience information can be recorded.
Optionally, the target value y_i is calculated as shown in the following formula (2):
y_i = r_i + γ · Q(s_{i+1}, argmax_a Q(s_{i+1}, a; ω); ω⁻)   (2)
wherein r_i is the reward generated by selecting action a_i in state s_i, γ is the discount factor, s_{i+1} is the state at decision time i+1, and a_i is the action at decision time i.
In one aspect, an electronic device is provided, which includes a processor and a memory, where the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the above-mentioned deep reinforcement learning-based method for optimizing the performance of a blockchain under the internet of things.
In one aspect, a computer readable storage medium is provided, in which at least one instruction is stored, and the at least one instruction is loaded and executed by a processor to implement the above-mentioned method for optimizing the performance of a blockchain under the internet of things based on deep reinforcement learning.
Compared with the prior art, the technical scheme has at least the following beneficial effects:
according to the scheme, the block chain system performance optimization method based on deep reinforcement learning in the Internet of things scene is provided. Specifically, the invention quantifies the performance of the blockchain system in the scene of the Internet of things from three aspects of expandability, safety and time delay, and obtains a more comprehensive optimization scheme. Then, the performance of the block chain system is improved by adopting a slicing mechanism and a parameter adjustment technology, and the high expandability requirement of the Internet of things system is met. In order to obtain optimal expandability configuration without sacrificing other necessary performance indexes, the invention adopts a DDQN (Double Deep Q Network, double-depth Q network) algorithm to dynamically optimize the performance of the system, and the algorithm uses different networks to calculate target values, decouples the selection and evaluation of actions, and solves the inherent overestimation problem of the DQN.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a block chain performance optimization method under the Internet of things based on deep reinforcement learning provided by the embodiment of the invention;
fig. 2 is a schematic diagram of a block chain system in an internet of things scenario according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an architecture of an S-PoDL consensus algorithm according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a deep reinforcement learning algorithm according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of a performance optimization method according to an embodiment of the present invention;
FIG. 6 is a block diagram of a block chain performance optimization device based on deep reinforcement learning under the Internet of things, provided by an embodiment of the invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without creative efforts, based on the described embodiments of the present invention fall within the protection scope of the present invention.
As shown in Fig. 1, an embodiment of the invention provides a method for optimizing blockchain performance under the Internet of Things based on deep reinforcement learning. As illustrated by the flow chart in Fig. 1, the processing flow of the method can comprise the following steps:
Fig. 2 shows that the blockchain system in the Internet of Things scenario of the present invention comprises an Internet of Things network and a blockchain system. In the Internet of Things, intelligent devices such as sensors, monitoring devices and personal terminals are responsible for collecting data and connecting to other devices for data sharing. The hierarchical structure of the Internet of Things can be divided into three layers from bottom to top: the perception layer, the network layer and the application layer. The perception layer is mainly responsible for data collection, and the network layer uses wireless or wired networks to store and share the data information from the perception layer. The application layer processes the obtained data through a cloud computing platform and provides users with data-based applications. Consequently, many kinds of transactions exist in the Internet of Things network, such as data storage, processing and sharing.
With the explosive growth of Internet of Things data, blockchain systems that process transactions securely and reliably are widely used. In the blockchain system, sharding is adopted to process a large number of transactions in parallel, which improves the processing efficiency of Internet of Things data. When a transaction generated by the Internet of Things is securely transmitted to the blockchain system and stored in the distributed ledger, the blockchain completes the transaction request through the following steps. First, all nodes in the blockchain system are partitioned into different shards, each containing one full node to generate blocks. Second, after each shard completes consensus verification using the S-PoDL consensus algorithm, the new block is linked into the blockchain network.
As shown in fig. 3, the S-PoDL consensus algorithm of the present invention is split into two phases.
Stage 1: First, the model requester publishes multiple deep learning models and training sets to the nodes in different shards. The purpose of the model requester is to obtain an optimal training model, so the present invention assumes that the model requester is honest. Next, all nodes start training the model without having obtained the test set, which effectively prevents overfitting. After a node completes training, it generates a block header according to the rules of the underlying blockchain system. Finally, the node submits the block header to the corresponding full node.
Stage 2: First, the model requester publishes the test set to the nodes in different shards; each node calculates the accuracy of its trained model and generates a block. Next, the node submits the block, containing the accuracy, together with the trained model to the full node. The full node then verifies the validity of the block by comparing the hash values submitted in the two stages; in this process, the full node ignores any model for which no block header was received in Stage 1. The eligible blocks are then sorted in descending order of accuracy. Finally, the full node verifies the submitted accuracies in that order and accepts the first block that passes verification.
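To make the two stages concrete, the following Python sketch outlines the control flow described above for a single shard. It is only an illustration under assumed interfaces: `train_fn`, `test_fn` and `header_hash` are hypothetical placeholders, and a real implementation would involve networking, signatures and the block rules of the underlying blockchain.

```python
import hashlib
import pickle
from typing import Callable, Dict, List, Tuple

def header_hash(node: str, model: object) -> str:
    """Hypothetical block-header hash over the node id and its trained model."""
    return hashlib.sha256(node.encode() + pickle.dumps(model)).hexdigest()

def s_podl_round(shard_nodes: List[str],
                 train_fn: Callable[[str], object],
                 test_fn: Callable[[object], float]) -> Tuple[str, float]:
    """One S-PoDL consensus round inside a single shard (illustrative only)."""
    # Stage 1: every node trains on the published training set (no test set yet)
    # and submits a block header to the shard's full node.
    models: Dict[str, object] = {}
    stage1_headers: Dict[str, str] = {}
    for node in shard_nodes:
        models[node] = train_fn(node)
        stage1_headers[node] = header_hash(node, models[node])

    # Stage 2: the test set is released; each node reports its accuracy and a block.
    # The full node keeps only blocks whose hash matches the Stage-1 submission.
    candidates = []
    for node in shard_nodes:
        accuracy = test_fn(models[node])
        if stage1_headers.get(node) != header_hash(node, models[node]):
            continue  # no (or mismatched) Stage-1 header: ignore this model
        candidates.append((accuracy, node))

    # Sort eligible blocks by accuracy in descending order and accept the first
    # one whose claimed accuracy the full node can verify.
    for accuracy, node in sorted(candidates, reverse=True):
        if abs(test_fn(models[node]) - accuracy) < 1e-9:  # full node re-checks the claim
            return node, accuracy
    raise RuntimeError("no valid block produced in this round")
```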
Referring to fig. 4 and 5, the method for optimizing the performance of the blockchain under the internet of things based on the deep reinforcement learning provided by the invention specifically comprises the following steps:
s1, initializing a block chain simulation system in an Internet of things scene.
Optionally, the step S1 may include:
setting the total number N of nodes, the number F of malicious nodes and the average transaction size X of a blockchain simulation system in the scene of the Internet of things.
The N nodes all have computing resources, and a data path exists among all the N nodes.
The N nodes are divided into K slices, each of the K slices containing one full node to generate a block.
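As a concrete illustration of this initialization, a toy simulation setup might look like the sketch below; the numeric ranges for computing resources and transmission rates are assumptions made for the simulation and are not specified by the patent.

```python
import random
from typing import Dict, List, Tuple

def init_simulation(n_nodes: int, n_malicious: int, avg_tx_size: float, k_shards: int) -> dict:
    """Initialize a toy blockchain simulation: N nodes, F malicious nodes, K shards."""
    nodes = list(range(n_nodes))
    malicious = set(random.sample(nodes, n_malicious))

    # Every node has some computing resource, and a data path exists between all node pairs.
    compute: Dict[int, float] = {i: random.uniform(1.0, 10.0) for i in nodes}
    rate: Dict[Tuple[int, int], float] = {(i, j): random.uniform(1.0, 100.0)
                                          for i in nodes for j in nodes if i != j}

    # Divide the N nodes into K shards; the first node of each shard acts as its full node.
    shards: List[List[int]] = [nodes[s::k_shards] for s in range(k_shards)]
    full_nodes = [shard[0] for shard in shards]

    return {"avg_tx_size": avg_tx_size, "malicious": malicious, "compute": compute,
            "rate": rate, "shards": shards, "full_nodes": full_nodes}

# Example: 100 nodes, 10 of them malicious, 512-byte average transactions, 4 shards.
cfg = init_simulation(n_nodes=100, n_malicious=10, avg_tx_size=512.0, k_shards=4)
```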
S2, constructing a performance optimization model of the block chain simulation system according to the block chain simulation system.
Wherein the performance optimization model is built as a Markov decision process model.
Optionally, the Markov decision process in S2 is a five-tuple (S, A, P, R, γ).
Wherein S is the set of states, and the state at decision time t is s_t = [X, C, D]_t, in which X denotes the average transaction size, C = {C_i} denotes the computing resources of node i, and D = {D_{i,j}} denotes the data transmission rate between node i and node j.
A is the set of actions, and the action at decision time t is a_t = [K, S_B, T_B]_t, in which K denotes the number of shards, S_B denotes the block size (the number of bytes contained in each block), and T_B denotes the block interval (the average time required to generate a new block).
P is the state transition matrix, satisfying P(t) = Pr[D_{i,j}(t+1) = D_d | D_{i,j}(t) = D_c], with D_c, D_d ∈ D.
R is the reward function; at decision time t, r_t denotes the reward generated by selecting action a_t in state s_t. At decision time t, the reward of a shard configuration that satisfies the constraints is defined by the following equation (1):
γ is the discount factor, with γ ∈ [0, 1].
Further, the action decisions in the Markov decision process have the Markov property, i.e., the next state s_{t+1} depends only on the current state s_t and is independent of the history states, satisfying formula (2):
P(s_{t+1} | s_t) = P(s_{t+1} | s_t, s_{t-1}, ..., s_1, s_0)   (2)
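The transition matrix P can be simulated by treating each link's transmission rate D_{i,j} as a finite-state Markov chain. The sketch below shows one way to sample the next rate level; the rate levels and the transition matrix values are illustrative assumptions only.

```python
import numpy as np

# Assumed discrete rate levels D = {D_1, D_2, D_3} and a row-stochastic matrix P,
# where P[c, d] = Pr[D_ij(t+1) = D_d | D_ij(t) = D_c].
rate_levels = np.array([10.0, 50.0, 100.0])
P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

def next_rate_index(current: int, rng: np.random.Generator) -> int:
    """Sample the index of the next rate level for one link."""
    return int(rng.choice(len(rate_levels), p=P[current]))

rng = np.random.default_rng(0)
idx = 1                                   # D_ij(t) = 50.0
idx = next_rate_index(idx, rng)           # D_ij(t+1) drawn from row P[1, :]
print(rate_levels[idx])
```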
Optionally, the optimization objective of the model is to maximize the scalability of the blockchain system without sacrificing security and latency, and the objective function is set to the following formula (3):
max_{a∈A} Q(s, a)   (3)
wherein Q(s, a) is the action-value function, which can be expressed as the following formula (4):
Q(s, a) = E[ Σ_t γ^t · r_t(s_t, a_t) | s_0 = s, a_0 = a ]   (4)
wherein E is the expectation operator, γ^t is the discount factor at decision time t, r_t is the reward generated by selecting action a_t in state s_t, s_t is the state at decision time t, and a_t is the action at decision time t.
And S3, solving the performance optimization model by adopting a deep reinforcement learning algorithm to obtain the optimal expandability configuration of the blockchain simulation system in the scene of the Internet of things.
In one possible implementation, a deep reinforcement learning algorithm is used to solve the model according to the optimization objective. Currently, most of the common optimization strategies employ DQN algorithms. However, the DQN algorithm has a problem of overestimation, i.e., the estimated value is larger than the true value. Therefore, the invention adopts the DDQN algorithm to dynamically optimize the performance of the system, the algorithm uses different networks to calculate the target value, the selection and the evaluation of the action are decoupled, and the inherent overestimation problem of the DQN is solved. Solving the model using the deep reinforcement learning algorithm may include the following steps S31-S39:
s31, initializing an experience playback pool B, a current network and a target network.
Optionally, the current network and the target network in S31 have the same network structure.
The parameters of the current network and the target network are ω and ω⁻, respectively.
S32, initializing parameters of a deep reinforcement learning algorithm; wherein the parameters include the exploration probability epsilon and the maximum round number T.
S33, starting the loop and initializing the state s_t.
S34, feeding the state s_t into the current network as input and selecting action a_t with an ε-greedy policy.
In a possible implementation, the blockchain environment in the Internet of Things scenario provides the current network with the state s_t ∈ S at decision time t, i.e., the average transaction size, the computing resources of the nodes and the data transmission rates between the nodes. The current network outputs all possible actions and their corresponding values, and the ε-greedy strategy of formula (5) selects action a_t: with probability 1 − ε the action with the largest estimated value, a_t = argmax_a Q(s_t, a; ω), is chosen, and with probability ε a random action is chosen for exploration.
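A minimal sketch of this ε-greedy selection, assuming the current network is available as a callable `q_network(state)` that returns one estimated value per discrete candidate action (an assumed interface, not part of the patent):

```python
import numpy as np

def select_action(q_network, state, epsilon: float, n_actions: int,
                  rng: np.random.Generator) -> int:
    """epsilon-greedy selection over a discretized (K, S_B, T_B) action space."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # explore: pick a random configuration
    q_values = q_network(state)               # exploit: estimated value of every action
    return int(np.argmax(q_values))
```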
S35, executing action a_t in state s_t to obtain the new state s_{t+1} and the reward r_t.
In a possible implementation, executing action a_t means selecting the number of shards, the block size and the block interval. After action a_t is executed, the S-PoDL consensus algorithm is adopted to complete the consensus verification, and its multi-stage collaborative strategy avoids wasting resources.
S36, storing the quadruple (s_t, a_t, r_t, s_{t+1}) in the experience replay pool B.
In a possible implementation, the experience information stored in the experience replay pool B consists of the state s_t, the action a_t, the reward r_t and the new state s_{t+1} at decision time t. When the stored experience information reaches the maximum capacity of the pool and new experience information arrives, the experience information that entered the pool first is popped and deleted so that the new experience information can be recorded.
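This first-in-first-out behaviour is exactly what a bounded deque provides; a minimal Python sketch of such an experience replay pool follows (capacity and field layout are illustrative).

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool B with FIFO eviction once its capacity is reached."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)   # oldest quadruple is dropped automatically

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        """Randomly draw a batch of experience quadruples for learning (step S37)."""
        return random.sample(self.buffer, batch_size)

pool = ReplayPool(capacity=10_000)
```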
S37, randomly sampling a batch of experience tuples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool B for learning, calculating the target value y_i, and updating the parameter ω of the current network by gradient back-propagation.
Optionally, the target value y_i calculated in S37 is given by the following formula (6):
y_i = r_i + γ · Q(s_{i+1}, argmax_a Q(s_{i+1}, a; ω); ω⁻)   (6)
wherein r_i is the reward generated by selecting action a_i in state s_i, γ is the discount factor, s_{i+1} is the state at decision time i+1, and a_i is the action at decision time i.
In a possible implementation, the parameter ω of the current network is updated in a gradient back-propagation manner, and the loss function is defined as the following formula (7):
L = || y_i − Q(s_i, a_i; ω) ||²   (7)
S38, setting a fixed interval C and, after every C iterations, copying the parameter ω of the current network to the target network to update the target network parameter ω⁻.
S39, repeating steps S33 to S38 until the maximum number of rounds T is reached, then ending the loop.
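The following PyTorch sketch ties steps S34 to S38 together for one update: it computes the DDQN target of formula (6), the loss of formula (7), and the periodic copy of ω into the target network. Network sizes, the learning rate and the batch handling are illustrative assumptions, not values taken from the patent.

```python
import copy
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Maps a state vector to one Q-value per discrete (K, S_B, T_B) combination."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_actions))

    def forward(self, x):
        return self.net(x)

def ddqn_update(current, target, optimizer, batch, gamma: float) -> float:
    """One gradient step using the DDQN target (formula (6)) and squared loss (formula (7))."""
    states, actions, rewards, next_states = batch            # mini-batch sampled from pool B
    q_sa = current(states).gather(1, actions.unsqueeze(1)).squeeze(1)      # Q(s_i, a_i; w)
    with torch.no_grad():
        best_a = current(next_states).argmax(dim=1, keepdim=True)          # argmax_a Q(s_{i+1}, a; w)
        y = rewards + gamma * target(next_states).gather(1, best_a).squeeze(1)  # evaluated with w-
    loss = nn.functional.mse_loss(q_sa, y)                    # mean of ||y_i - Q(s_i, a_i; w)||^2
    optimizer.zero_grad()
    loss.backward()                                           # gradient back-propagation on w
    optimizer.step()
    return loss.item()

# Illustrative setup: the target network starts as a copy of the current network.
state_dim, n_actions = 8, 27
current = QNet(state_dim, n_actions)
target = copy.deepcopy(current)
optimizer = torch.optim.Adam(current.parameters(), lr=1e-3)

C = 100  # every C iterations (step S38), copy the current parameters w into the target network
def sync_target(step: int) -> None:
    if step % C == 0:
        target.load_state_dict(current.state_dict())
```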
The invention provides a method for optimizing blockchain system performance based on deep reinforcement learning in Internet of Things scenarios; it adopts the DDQN algorithm to search for the optimal scalability configuration of the blockchain system in the Internet of Things scenario while taking necessary performance indicators such as system security and latency into account, thereby meeting the high-security and high-efficiency requirements of Internet of Things applications.
The embodiment of the invention provides a method for optimizing blockchain system performance based on deep reinforcement learning in Internet of Things scenarios. Specifically, the invention quantifies the performance of the blockchain system in the Internet of Things scenario from three aspects, namely scalability, security and latency, yielding a more comprehensive optimization scheme. A sharding mechanism and parameter tuning are then adopted to improve the performance of the blockchain system and meet the high-scalability requirement of the Internet of Things system. To obtain the optimal scalability configuration without sacrificing other necessary performance indicators, the invention uses the DDQN algorithm to dynamically optimize system performance; the algorithm uses different networks to compute the target value, decoupling action selection from action evaluation and resolving the overestimation problem inherent in DQN.
As shown in fig. 6, an embodiment of the present invention provides a device 600 for optimizing performance of a blockchain under the internet of things based on deep reinforcement learning, where the device 600 is applied to implement a method for optimizing performance of a blockchain under the internet of things based on deep reinforcement learning, and the device 600 includes:
an initialization module 610 is configured to initialize a blockchain simulation system in an internet of things scenario.
A construction module 620, configured to construct a performance optimization model of the blockchain simulation system according to the blockchain simulation system; wherein the performance optimization model is built as a Markov decision process model.
And the output module 630 is configured to solve the performance optimization model by using a deep reinforcement learning algorithm, so as to obtain an optimal extensibility configuration of the blockchain simulation system in the scene of the internet of things.
Optionally, the initialization module 610 is further configured to:
setting the total number N of nodes, the number F of malicious nodes and the average transaction size X of a blockchain simulation system in the scene of the Internet of things.
The N nodes all have computing resources, and a data path exists among all the N nodes.
The N nodes are divided into K slices, each of the K slices containing one full node to generate a block.
Optionally, the Markov decision process is a five-tuple (S, A, P, R, γ).
Wherein S is the set of states, and the state at decision time t is s_t = [X, C, D]_t, in which X denotes the average transaction size, C = {C_i} denotes the computing resources of node i, and D = {D_{i,j}} denotes the data transmission rate between node i and node j.
A is the set of actions, and the action at decision time t is a_t = [K, S_B, T_B]_t, in which K denotes the number of shards, S_B denotes the block size, and T_B denotes the block interval.
P is the state transition matrix, R is the reward function, and γ ∈ [0, 1] is the discount factor.
Optionally, the objective function of the performance optimization model of the blockchain simulation system is the expected cumulative discounted reward, as shown in the following formula (1):
max E[ Σ_t γ^t · r_t(s_t, a_t) ]   (1)
wherein E is the expectation operator, γ^t is the discount factor at decision time t, r_t is the reward generated by selecting action a_t in state s_t, s_t is the state at decision time t, and a_t is the action at decision time t.
Optionally, the output module 630 is further configured to:
s31, initializing an experience playback pool B, a current network and a target network.
S32, initializing the parameters of the deep reinforcement learning algorithm; wherein the parameters include the exploration probability ε and the maximum number of rounds T.
S33, starting the loop and initializing the state s_t.
S34, feeding the state s_t into the current network as input and selecting action a_t with an ε-greedy policy.
S35, executing action a_t in state s_t to obtain the new state s_{t+1} and the reward r_t.
S36, storing the quadruple (s_t, a_t, r_t, s_{t+1}) in the experience replay pool B.
S37, randomly sampling a batch of experience tuples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool B for learning, calculating the target value y_i, and updating the parameter ω of the current network by gradient back-propagation.
S38, setting a fixed interval C and, after every C iterations, copying the parameter ω of the current network to the target network to update the target network parameter ω⁻.
S39, repeating steps S33 to S38 until the maximum number of rounds T is reached, then ending the loop.
Optionally, the network structures of the current network and the target network are the same.
The parameters of the current network and the target network are ω and ω⁻, respectively.
Optionally, the output module 630 is further configured to:
completing the consensus verification by adopting the slice deep learning proof (S-PoDL) consensus algorithm.
Optionally, the output module 630 is further configured to:
when the experience information stored in the experience replay pool B reaches the maximum capacity and new experience information arrives, popping and deleting the experience information that entered the pool first so that the new experience information can be recorded.
Optionally, the target value y_i is calculated as shown in the following formula (2):
y_i = r_i + γ · Q(s_{i+1}, argmax_a Q(s_{i+1}, a; ω); ω⁻)   (2)
wherein r_i is the reward generated by selecting action a_i in state s_i, γ is the discount factor, s_{i+1} is the state at decision time i+1, and a_i is the action at decision time i.
The embodiment of the invention provides a method for optimizing blockchain system performance based on deep reinforcement learning in Internet of Things scenarios. Specifically, the invention quantifies the performance of the blockchain system in the Internet of Things scenario from three aspects, namely scalability, security and latency, yielding a more comprehensive optimization scheme. A sharding mechanism and parameter tuning are then adopted to improve the performance of the blockchain system and meet the high-scalability requirement of the Internet of Things system. To obtain the optimal scalability configuration without sacrificing other necessary performance indicators, the invention uses the DDQN algorithm to dynamically optimize system performance; the algorithm uses different networks to compute the target value, decoupling action selection from action evaluation and resolving the overestimation problem inherent in DQN.
Fig. 7 is a schematic structural diagram of an electronic device 700 according to an embodiment of the present invention, where the electronic device 700 may have a relatively large difference due to different configurations or performances, and may include one or more processors (central processing units, CPU) 701 and one or more memories 702, where at least one instruction is stored in the memories 702, and the at least one instruction is loaded and executed by the processors 701 to implement the following method for optimizing the performance of the blockchain under the internet of things based on deep reinforcement learning:
s1, initializing a block chain simulation system in an Internet of things scene.
S2, constructing a performance optimization model of the block chain simulation system according to the block chain simulation system; wherein the performance optimization model is built as a Markov decision process model.
And S3, solving the performance optimization model by adopting a deep reinforcement learning algorithm to obtain the optimal expandability configuration of the blockchain simulation system in the scene of the Internet of things.
In an exemplary embodiment, a computer readable storage medium, such as a memory including instructions executable by a processor in a terminal to perform the above-described deep reinforcement learning based method of blockchain performance optimization under the internet of things, is also provided. For example, the computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (2)

1. The method for optimizing the performance of the block chain under the Internet of things based on deep reinforcement learning is characterized by comprising the following steps:
s1, constructing a block chain system based on a slice deep learning proof S-PoDL consensus algorithm in an Internet of things scene; wherein, the performance evaluation index system of the block chain system comprises: scalability, security, and latency;
s2, constructing a Markov performance optimization model according to the block chain system;
s3, solving the performance optimization model by adopting a double-depth Q network DDQN algorithm to obtain the optimal expandability configuration of the block chain system in the Internet of things scene;
the slice deep learning in S1 proves an S-PoDL consensus algorithm, which comprises the following steps:
s11, dividing the N nodes in the blockchain system into K shards, wherein each shard contains one full node to generate blocks, and the number of nodes in each shard is N/K;
s12, distributing the training tasks of the deep learning model to the nodes of each shard;
s13, verifying the validity and accuracy of the block by the full node, and receiving the first verified block;
the step S12 of assigning the training task of the deep learning model to the node of each patch includes:
the model requester issues a plurality of deep learning models and training sets to nodes in different segments for obtaining an optimal deep learning model, wherein the model requester is set as honest; n nodes in the blockchain system start training the model under the condition that the test set is not obtained; after the node finishes training, generating a block head according to the rule of the bottom layer block chain system; the node submits the block header to the corresponding full node;
the full node in S13 verifies the validity and accuracy of the block and accepts the first verified block, including:
the model requester issues the test set to nodes in different fragments, and each node calculates the precision of the deep learning model; the nodes submit blocks and training models containing precision to the whole nodes; the full node verifies the validity of the block by comparing the hash values submitted in the two stages; sequencing the blocks with effectiveness according to the descending order of precision; the full node sequentially verifies the precision submitted by the blocks and receives the first verified block;
wherein the two phases comprise: distributing training tasks of the deep learning model to nodes of each patch and verifying validity and accuracy of the block by all nodes;
and in the step S2, constructing a Markov performance optimization model according to the blockchain system, wherein the method comprises the following steps of:
the Markov performance optimization model comprises: state space S (t), action space a (t), reward function R;
the state space S (t) includes an average transaction size, the computing resource c= { C of the node i Data transmission rate d= { D between nodes } i,j -the state space S (t), as shown in the following formula (1):
S(t) = [X, C, D]_t   (1)
the action space A(t) comprises the number of shards K, the block size S_B and the block interval T_B; the action space A(t) is represented by the following formula (2):
A(t) = [K, S_B, T_B]_t   (2)
the reward function is represented by the following formula (3):
wherein r_t(s_t, a_t) represents the reward generated by selecting action a_t in state s_t, S_B represents the block size, i.e., the number of bytes contained in each block, which determines how many transactions a block contains, T_B represents the block interval, i.e., the average time required by the block producer to generate a new block, which reflects the block release rate, and X represents the average transaction size;
the step S3 of solving the performance optimization model by adopting a double-depth Q network DDQN algorithm comprises the following steps:
S31, initializing the parameter ω of the current Q network, initializing the parameter ω' = ω of the target Q network, and emptying the experience replay pool B;
S32, initializing the exploration probability ε, the time interval C and the maximum number of rounds T;
S33, setting the initial time slot t = 0 and initializing the state s_t;
S34, feeding the state s_t into the current Q network as input and selecting action a_t with an ε-greedy policy;
S35, executing the action a_t in the state s_t to obtain the new state s_{t+1} and the reward r_t;
S36, storing the quadruple (s_t, a_t, r_t, s_{t+1}) in the experience replay pool B;
S37, randomly sampling q experience tuples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool B for learning, calculating the target Q value y_i, and updating the parameter ω of the current network by gradient back-propagation, where y_i is given by the following formula (4):
y_i = r_i + γ · Q(s_{i+1}, argmax_a Q(s_{i+1}, a; ω); ω⁻)   (4)
S38, after every C iterations, setting the parameter ω' = ω of the target Q network;
S39, setting s_t = s_{t+1} and t = t + 1, and returning to step S34 until the maximum number of rounds T is reached.
2. A device for optimizing the performance of a blockchain under the Internet of Things based on deep reinforcement learning, characterized in that the device comprises:
the initialization module is used for constructing a blockchain system based on a slice deep learning proof S-PoDL consensus algorithm in the scene of the Internet of things; wherein, the performance evaluation index system of the block chain system comprises: scalability, security, and latency;
the building module is used for building a Markov performance optimization model according to the block chain system;
the output module is used for solving the performance optimization model by adopting a double-depth Q network DDQN algorithm to obtain the optimal expandability configuration of the block chain system in the scene of the Internet of things;
the slice deep learning proves an S-PoDL consensus algorithm, which comprises the following steps:
s11, dividing the N nodes in the blockchain system into K shards, wherein each shard contains one full node to generate blocks, and the number of nodes in each shard is N/K;
s12, distributing the training tasks of the deep learning model to the nodes of each shard;
s13, verifying the validity and accuracy of the block by the full node, and receiving the first verified block;
the step S12 of assigning the training task of the deep learning model to the node of each patch includes:
the model requester issues a plurality of deep learning models and training sets to nodes in different segments for obtaining an optimal deep learning model, wherein the model requester is set as honest; n nodes in the blockchain system start training the model under the condition that the test set is not obtained; after the node finishes training, generating a block head according to the rule of the bottom layer block chain system; the node submits the block header to the corresponding full node;
the full node in S13 verifies the validity and accuracy of the block and accepts the first verified block, including:
the model requester issues the test set to nodes in different fragments, and each node calculates the precision of the deep learning model; the nodes submit blocks and training models containing precision to the whole nodes; the full node verifies the validity of the block by comparing the hash values submitted in the two stages; sequencing the blocks with effectiveness according to the descending order of precision; the full node sequentially verifies the precision submitted by the blocks and receives the first verified block;
wherein the two phases comprise: distributing training tasks of the deep learning model to nodes of each patch and verifying validity and accuracy of the block by all nodes;
the establishing a Markov performance optimization model according to the blockchain system comprises the following steps:
the Markov performance optimization model comprises: state space S (t), action space a (t), reward function R;
the state space S (t) includes an average transaction size, the computing resource c= { C of the node i Data transmission rate d= { D between nodes } i,j -the state space S (t), as shown in the following formula (1):
S(t) = [X, C, D]_t   (1)
the action space A(t) comprises the number of shards K, the block size S_B and the block interval T_B; the action space A(t) is represented by the following formula (2):
A(t) = [K, S_B, T_B]_t   (2)
the reward function is represented by the following formula (3):
wherein r_t(s_t, a_t) represents the reward generated by selecting action a_t in state s_t, S_B represents the block size, i.e., the number of bytes contained in each block, which determines how many transactions a block contains, T_B represents the block interval, i.e., the average time required by the block producer to generate a new block, which reflects the block release rate, and X represents the average transaction size;
the method for solving the performance optimization model by adopting the double-depth Q network DDQN algorithm comprises the following steps:
S31, initializing the parameter ω of the current Q network, initializing the parameter ω' = ω of the target Q network, and emptying the experience replay pool B;
S32, initializing the exploration probability ε, the time interval C and the maximum number of rounds T;
S33, setting the initial time slot t = 0 and initializing the state s_t;
S34, feeding the state s_t into the current Q network as input and selecting action a_t with an ε-greedy policy;
S35, executing the action a_t in the state s_t to obtain the new state s_{t+1} and the reward r_t;
S36, storing the quadruple (s_t, a_t, r_t, s_{t+1}) in the experience replay pool B;
S37, randomly sampling q experience tuples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool B for learning, calculating the target Q value y_i, and updating the parameter ω of the current network by gradient back-propagation, where y_i is given by the following formula (4):
y_i = r_i + γ · Q(s_{i+1}, argmax_a Q(s_{i+1}, a; ω); ω⁻)   (4)
S38, after every C iterations, setting the parameter ω' = ω of the target Q network;
S39, setting s_t = s_{t+1} and t = t + 1, and returning to step S34 until the maximum number of rounds T is reached.
CN202310428183.6A 2023-04-20 2023-04-20 Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning Active CN116702583B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310428183.6A CN116702583B (en) 2023-04-20 2023-04-20 Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310428183.6A CN116702583B (en) 2023-04-20 2023-04-20 Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN116702583A CN116702583A (en) 2023-09-05
CN116702583B true CN116702583B (en) 2024-03-19

Family

ID=87830071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310428183.6A Active CN116702583B (en) 2023-04-20 2023-04-20 Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN116702583B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115102867A (en) * 2022-05-10 2022-09-23 内蒙古工业大学 Block chain fragmentation system performance optimization method combined with deep reinforcement learning
CN115935442A (en) * 2022-12-09 2023-04-07 湖南天河国云科技有限公司 Block chain performance optimization method based on multi-agent deep reinforcement learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115102867A (en) * 2022-05-10 2022-09-23 内蒙古工业大学 Block chain fragmentation system performance optimization method combined with deep reinforcement learning
CN115935442A (en) * 2022-12-09 2023-04-07 湖南天河国云科技有限公司 Block chain performance optimization method based on multi-agent deep reinforcement learning

Also Published As

Publication number Publication date
CN116702583A (en) 2023-09-05

Similar Documents

Publication Publication Date Title
CN115935442A (en) Block chain performance optimization method based on multi-agent deep reinforcement learning
EP4350572A1 (en) Method, apparatus and system for generating neural network model, devices, medium and program product
CN113485826B (en) Load balancing method and system for edge server
CN112491818B (en) Power grid transmission line defense method based on multi-agent deep reinforcement learning
CN114626547A (en) Group collaborative learning method based on block chain
CN111800274B (en) Verifiable calculation energy consumption optimization method based on block chain
CN116541106B (en) Computing task unloading method, computing device and storage medium
CN115102867B (en) Block chain slicing system performance optimization method combining deep reinforcement learning
CN114330754A (en) Strategy model training method, device and equipment
CN114760308B (en) Edge calculation unloading method and device
CN115481441A (en) Difference privacy protection method and device for federal learning
TWI763120B (en) Computer-implemented method of an execution device, system for performing a software-implementated application and apparatus for generating an action selection policy for a software-implementated application
TWI770671B (en) Method for generating action selection policies, system and device for generating action selection policies for software-implemented application
CN116702583B (en) Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning
CN111340623A (en) Data storage method and device
CN112312299A (en) Service unloading method, device and system
CN111461188A (en) Target service control method, device, computing equipment and storage medium
CN116486192A (en) Federal learning method and system based on deep reinforcement learning
CN114997400A (en) Neural network acceleration reasoning method
TWI757971B (en) Determining action selection policies of an execution device
CN114995157A (en) Anti-synchronization optimization control method of multi-agent system under cooperative competition relationship
CN106033434A (en) Virtual asset data replica processing method based on data size and popularity
CN116506444B (en) Block chain stable slicing method based on deep reinforcement learning and reputation mechanism
CN117812564B (en) Federal learning method, device, equipment and medium applied to Internet of vehicles
CN115190135B (en) Distributed storage system and copy selection method thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant