CN116702583B - Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning - Google Patents
Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning
- Publication number
- CN116702583B (application CN202310428183.6A)
- Authority
- CN
- China
- Prior art keywords
- block
- nodes
- internet
- block chain
- things
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/08—Probabilistic or stochastic CAD
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a method and a device for optimizing the performance of a block chain under the Internet of things based on deep reinforcement learning, and relates to the technical field of the Internet of things. The method comprises the following steps: initializing a block chain simulation system in an Internet of things scene; constructing a performance optimization model of the block chain simulation system, wherein the performance optimization model is built as a Markov decision process model; and solving the performance optimization model by adopting a deep reinforcement learning algorithm to obtain the optimal expandability configuration of the block chain simulation system in the Internet of things scene. According to the average transaction size, the computing resources of the nodes and the transmission rates between the nodes, the invention adopts a double deep Q network (DDQN) algorithm to dynamically adjust the number of fragments, the block size and the block interval, and obtains the optimal expandability configuration of the block chain system without sacrificing other necessary performance indexes. The block chain technology is introduced into the field of the Internet of things, the performance optimization of the block chain system is realized based on a deep reinforcement learning algorithm, and the requirements of the Internet of things for high security and high efficiency are met.
Description
Technical Field
The invention relates to the technical field of the Internet of things, in particular to a method and a device for optimizing the performance of a blockchain under the Internet of things based on deep reinforcement learning.
Background
The Internet of things is a network that extends and expands on the basis of the traditional Internet; it is a huge network formed by combining various information sensing devices with the Internet. The Internet of things is widely applied in traditional industries such as logistics transportation and industrial production, as well as in emerging fields such as smart home and smart healthcare. The traditional Internet of things mostly consists of distributed devices and centralized data processing nodes. However, with the rapid development of mobile communication technologies represented by 5G, the rapid growth of computing tasks in Internet of things scenarios exposes data to security risks and leads to higher network latency and operating costs.
The rise of blockchain technology provides an effective solution to the problems of the traditional Internet of things. The first application of blockchain was Bitcoin, which ensures the security and efficiency of data by enabling anonymous and trusted transactions that eliminate intermediaries. However, its impact extends far beyond the field of cryptocurrency. A blockchain is essentially a distributed storage system consisting of a plurality of time-stamped blocks maintained as a P2P (Peer to Peer) ledger. Therefore, it has the characteristics of decentralization, anonymity, security, tamper resistance and the like. In the Internet of things scene, data is stored safely and reliably through blockchain technology, which ensures the data security and processing efficiency of the whole system and enables more efficient data storage, data exchange and data management. An efficient consensus algorithm is the key to applying blockchain technology to the Internet of things. The PoW (Proof of Work) consensus algorithm is the earliest and safest public chain consensus algorithm; it has the characteristics of decentralization and high security and can meet the high-security requirements of the Internet of things. However, the PoW consensus algorithm has a significant drawback: since each node in the network needs to calculate the hash value of the block header, it causes a large waste of resources. To solve this problem, the PoDL (Proof of Deep Learning) consensus algorithm was proposed, which replaces the conventional hash collision with a deep learning computing task, thereby effectively avoiding meaningless waste of resources. However, how to provide the scalability necessary to meet the high transaction throughput requirements of the Internet of things remains a challenge.
Currently, methods for improving the scalability of blockchain systems can be divided into two modes: on-chain and off-chain. The first on-chain approach is the sharding technique, which divides blockchain nodes into different shards, each of which can process transactions in parallel. Another on-chain approach is parameter adjustment, which improves system performance by adjusting parameters such as the block size and the block interval. The off-chain approach mainly adopts a multi-chain technique and reduces the load and redundancy of the main chain by migrating part of the main chain's transactions to other sub-chains. However, the off-chain approach relies on an incompletely distributed, local off-line system in which malicious nodes can easily collude with each other and link an incorrect blockchain to the system, thereby reducing the security level and performance of the blockchain.
There is a well-known trilemma in blockchain systems, namely that a blockchain system can simultaneously possess at most two of the three properties of decentralization, security and scalability. The Bitcoin system based on the PoW consensus algorithm gives priority to decentralization and security, thereby sacrificing scalability. However, most blockchain platforms in Internet of things scenarios enhance the scalability of the system only by sacrificing performance metrics such as security and latency.
Blockchain applications in Internet of things scenarios are dynamic and high-dimensional, and DRL (Deep Reinforcement Learning) algorithms have natural advantages in solving the complex optimization and decision problems of Internet of things applications. Deep reinforcement learning combines the perception capability of deep learning with the decision-making capability of reinforcement learning, and directly controls the behavior of an agent by learning from high-dimensional perceptual input. Currently, most common optimization strategies employ the DQN (Deep Q Network) algorithm. However, the DQN algorithm suffers from overestimation, i.e., the estimated value is larger than the true value. Therefore, finding the optimal expandability configuration of a blockchain system with a suitable DRL algorithm, without sacrificing other performance indexes, is of great significance.
Disclosure of Invention
The invention is directed to the problem of finding the optimal expandability configuration of a blockchain system by using a suitable DRL algorithm without sacrificing other performance indexes.
In order to solve the technical problems, the invention provides the following technical scheme:
in one aspect, the invention provides a method for optimizing the performance of a blockchain under the internet of things based on deep reinforcement learning, which is realized by electronic equipment, and comprises the following steps:
s1, initializing a block chain simulation system in an Internet of things scene.
S2, constructing a performance optimization model of the block chain simulation system according to the block chain simulation system; wherein the performance optimization model is built as a Markov decision process model.
And S3, solving the performance optimization model by adopting a deep reinforcement learning algorithm to obtain the optimal expandability configuration of the blockchain simulation system in the scene of the Internet of things.
Optionally, the initializing the blockchain simulation system in the internet of things scene in S1 includes:
setting the total number N of nodes, the number F of malicious nodes and the average transaction size X of a blockchain simulation system in the scene of the Internet of things.
The N nodes all have computing resources, and a data path exists among all the N nodes.
The N nodes are divided into K slices, each of the K slices containing one full node to generate a block.
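As an illustration of this initialization step, the following sketch sets up such a simulated system in Python; the class name, the uniform random ranges for the computing resources and transmission rates, and the rule that the first node of each shard acts as the full node are assumptions made only for exposition and are not prescribed by the method itself.

```python
import numpy as np

# Illustrative sketch of step S1; parameter ranges and helper names are assumed.
class BlockchainSimConfig:
    def __init__(self, n_nodes, n_malicious, avg_tx_size, n_shards, seed=0):
        rng = np.random.default_rng(seed)
        self.N = n_nodes                 # total number of nodes
        self.F = n_malicious             # number of malicious nodes
        self.X = avg_tx_size             # average transaction size
        self.K = n_shards                # number of fragments (shards)
        # every node i holds a computing resource C_i
        self.C = rng.uniform(1.0, 10.0, size=n_nodes)
        # a data path with transmission rate D_ij exists between every pair of nodes
        self.D = rng.uniform(1.0, 20.0, size=(n_nodes, n_nodes))
        np.fill_diagonal(self.D, 0.0)
        # split the N nodes into K shards; one node per shard is marked as the
        # full node that generates the block
        self.shards = np.array_split(np.arange(n_nodes), n_shards)
        self.full_nodes = [int(shard[0]) for shard in self.shards]

config = BlockchainSimConfig(n_nodes=100, n_malicious=10, avg_tx_size=500, n_shards=4)
```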
Optionally, the Markov decision process in S2 is a five-tuple (S, A, P, R, γ).
Wherein S is the set of states, and the state at decision time t is s_t = [X, C, D]_t, where X represents the average transaction size, C = {C_i} represents the computing resources of the nodes (C_i being the computing resource of node i), and D = {D_{i,j}} represents the data transmission rates between the nodes (D_{i,j} being the data transmission rate between node i and node j).
A is the set of actions, and the action at decision time t is a_t = [K, S_B, T_B]_t, where K represents the number of fragments, S_B represents the block size, and T_B represents the block interval.
P is the state transition matrix, R is the reward function, and γ ∈ [0, 1] is the attenuation coefficient.
Optionally, the objective function of the performance optimization model of the blockchain simulation system in S2 is shown in the following formula (1):
E[ Σ_{t=0}^{∞} γ^t · r_t(s_t, a_t) ]    (1)
wherein E is the expectation function, γ^t is the attenuation coefficient at decision time t, r_t is the reward generated by selecting action a_t in state s_t, s_t is the state at decision time t, and a_t is the action at decision time t.
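For concreteness, the following sketch encodes the state s_t = [X, C, D]_t, the action a_t = [K, S_B, T_B]_t and the discount coefficient in Python; the candidate values for K, S_B and T_B are placeholder assumptions and not values taken from the method.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class State:             # s_t = [X, C, D]_t
    X: float             # average transaction size
    C: list              # computing resources C_i of the nodes
    D: list              # transmission rates D_ij between nodes

@dataclass
class Action:            # a_t = [K, S_B, T_B]_t
    K: int               # number of fragments
    S_B: int             # block size
    T_B: float           # block interval

# discrete action space as the Cartesian product of candidate settings (placeholders)
K_SET, SB_SET, TB_SET = [2, 4, 8], [1_000_000, 2_000_000, 4_000_000], [1.0, 2.0, 4.0]
ACTIONS = [Action(k, sb, tb) for k, sb, tb in product(K_SET, SB_SET, TB_SET)]
GAMMA = 0.9              # attenuation coefficient gamma in [0, 1]
```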
Optionally, solving the performance optimization model in S3 by using a deep reinforcement learning algorithm includes:
s31, initializing an experience playback pool B, a current network and a target network.
S32, initializing parameters of a deep reinforcement learning algorithm; wherein the parameters include the exploration probability epsilon and the maximum round number T.
S33, starting a loop body and initializing the state s_t.
S34, taking the state s_t as the input of the current network, and using the ε-greedy policy to select an action a_t.
S35, executing the action a_t in the state s_t to obtain a new state s_{t+1} and a reward r_t.
S36, storing the quadruple (s_t, a_t, r_t, s_{t+1}) into the experience playback pool B.
S37, randomly extracting a batch of experience information (s_i, a_i, r_i, s_{i+1}) from the experience playback pool B for learning, calculating the target value y_i, and updating the parameter ω of the current network by gradient back-propagation.
S38, setting a fixed time interval C, and after every C iterations are completed, copying the parameter ω of the current network to the target network to update the parameter ω⁻ of the target network.
S39, repeatedly executing steps S33 to S38 until the maximum round number T is reached, and ending the loop body.
Optionally, the current network and the target network in S31 have the same network structure.
The parameters of the current network and the target network are ω and ω⁻ respectively.
Optionally, executing the action a_t in the state s_t in S35 to obtain a new state s_{t+1} and a reward r_t further comprises:
adopting the S-PoDL (Separate Proof of Deep Learning, sharded deep learning proof) consensus algorithm to complete the consensus verification.
Optionally, storing the quadruple (s_t, a_t, r_t, s_{t+1}) into the experience playback pool B in S36 further comprises:
when the experience information stored in the experience playback pool B reaches the maximum storage amount and new experience information arrives, the experience information which is first entered into the experience playback pool is popped up and deleted to record the new experience information.
Optionally, the target value y_i calculated in S37 is shown in the following formula (2):
y_i = r_i + γ·Q(s_{i+1}, argmax_a Q(s_{i+1}, a; ω); ω⁻)    (2)
wherein r_i is the reward generated by selecting action a_i in state s_i, γ is the attenuation coefficient, s_{i+1} is the state at decision time i+1, and a_i is the action at decision time i.
On the other hand, the invention provides a device for optimizing the performance of the block chain under the Internet of things based on deep reinforcement learning, which is applied to realizing a method for optimizing the performance of the block chain under the Internet of things based on the deep reinforcement learning, and comprises the following steps:
and the initialization module is used for initializing the blockchain simulation system in the scene of the Internet of things.
The building module is used for building a performance optimization model of the block chain simulation system according to the block chain simulation system; wherein the performance optimization model is built as a Markov decision process model.
And the output module is used for solving the performance optimization model by adopting a deep reinforcement learning algorithm to obtain the optimal expandability configuration of the blockchain simulation system in the scene of the Internet of things.
Optionally, the initialization module is further configured to:
setting total number N of nodes, number F of malicious nodes and average transaction size X of block chain simulation system in Internet of things scene 。
The N nodes all have computing resources, and a data path exists among all the N nodes.
The N nodes are divided into K slices, each of the K slices containing one full node to generate a block.
Optionally, the Markov decision process is a five-tuple (S, A, P, R, γ).
Wherein S is the set of states, and the state at decision time t is s_t = [X, C, D]_t, where X represents the average transaction size, C = {C_i} represents the computing resources of the nodes, and D = {D_{i,j}} represents the data transmission rate between node i and node j.
A is the set of actions, and the action at decision time t is a_t = [K, S_B, T_B]_t, where K represents the number of fragments, S_B represents the block size, and T_B represents the block interval.
P is the state transition matrix, R is the reward function, and γ ∈ [0, 1] is the attenuation coefficient.
Optionally, the objective function of the performance optimization model of the blockchain simulation system is shown in the following formula (1):
E[ Σ_{t=0}^{∞} γ^t · r_t(s_t, a_t) ]    (1)
wherein E is the expectation function, γ^t is the attenuation coefficient at decision time t, r_t is the reward generated by selecting action a_t in state s_t, s_t is the state at decision time t, and a_t is the action at decision time t.
Optionally, the output module is further configured to:
s31, initializing an experience playback pool B, a current network and a target network.
S32, initializing parameters of a deep reinforcement learning algorithm; wherein the parameters include the exploration probability epsilon and the maximum round number T.
S33, starting a loop body and initializing the state s_t.
S34, taking the state s_t as the input of the current network, and using the ε-greedy policy to select an action a_t.
S35, executing the action a_t in the state s_t to obtain a new state s_{t+1} and a reward r_t.
S36, storing the quadruple (s_t, a_t, r_t, s_{t+1}) into the experience playback pool B.
S37, randomly extracting a batch of experience information (s_i, a_i, r_i, s_{i+1}) from the experience playback pool B for learning, calculating the target value y_i, and updating the parameter ω of the current network by gradient back-propagation.
S38, setting a fixed time interval C, and after every C iterations are completed, copying the parameter ω of the current network to the target network to update the parameter ω⁻ of the target network.
S39, repeatedly executing steps S33 to S38 until the maximum round number T is reached, and ending the loop body.
Optionally, the network structures of the current network and the target network are the same.
The parameters of the current network and the target network are ω and ω⁻ respectively.
Optionally, the output module is further configured to:
and adopting an S-PoDL consensus algorithm to complete the consensus verification.
Optionally, the output module is further configured to:
when the experience information stored in the experience playback pool B reaches the maximum storage amount and new experience information arrives, the experience information which is first entered into the experience playback pool is popped up and deleted to record the new experience information.
Optionally, the target value y_i is calculated as shown in the following formula (2):
y_i = r_i + γ·Q(s_{i+1}, argmax_a Q(s_{i+1}, a; ω); ω⁻)    (2)
wherein r_i is the reward generated by selecting action a_i in state s_i, γ is the attenuation coefficient, s_{i+1} is the state at decision time i+1, and a_i is the action at decision time i.
In one aspect, an electronic device is provided, which includes a processor and a memory, where the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the above-mentioned deep reinforcement learning-based method for optimizing the performance of a blockchain under the internet of things.
In one aspect, a computer readable storage medium is provided, in which at least one instruction is stored, and the at least one instruction is loaded and executed by a processor to implement the above-mentioned method for optimizing the performance of a blockchain under the internet of things based on deep reinforcement learning.
Compared with the prior art, the technical scheme has at least the following beneficial effects:
according to the scheme, the block chain system performance optimization method based on deep reinforcement learning in the Internet of things scene is provided. Specifically, the invention quantifies the performance of the blockchain system in the scene of the Internet of things from three aspects of expandability, safety and time delay, and obtains a more comprehensive optimization scheme. Then, the performance of the block chain system is improved by adopting a slicing mechanism and a parameter adjustment technology, and the high expandability requirement of the Internet of things system is met. In order to obtain optimal expandability configuration without sacrificing other necessary performance indexes, the invention adopts a DDQN (Double Deep Q Network, double-depth Q network) algorithm to dynamically optimize the performance of the system, and the algorithm uses different networks to calculate target values, decouples the selection and evaluation of actions, and solves the inherent overestimation problem of the DQN.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a block chain performance optimization method under the Internet of things based on deep reinforcement learning provided by the embodiment of the invention;
fig. 2 is a schematic diagram of a block chain system in an internet of things scenario according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an architecture of an S-PoDL consensus algorithm according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a deep reinforcement learning algorithm according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of a performance optimization method according to an embodiment of the present invention;
FIG. 6 is a block diagram of a block chain performance optimization device based on deep reinforcement learning under the Internet of things, provided by an embodiment of the invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without creative efforts, based on the described embodiments of the present invention fall within the protection scope of the present invention.
As shown in fig. 1, the embodiment of the invention provides a method for optimizing the performance of a blockchain under the internet of things based on deep reinforcement learning. The block chain performance optimization method flow chart under the internet of things based on the deep reinforcement learning as shown in fig. 1, the processing flow of the method can comprise the following steps:
Fig. 2 shows the blockchain system in the Internet of things scenario of the present invention, which includes an Internet of things network and a blockchain system. In the Internet of things, intelligent devices refer to sensors, monitoring devices, personal terminals and the like, and are responsible for data collection and for connecting with other devices for data sharing. The hierarchical structure of the Internet of things can be divided into three layers from bottom to top: a perception layer, a network layer and an application layer. The perception layer is mainly responsible for data collection, and the network layer uses a wireless or wired network to store and share the data information from the perception layer. At the top of the structure, the application layer processes the obtained data through a cloud computing platform, which provides users with data-based applications. Therefore, multiple kinds of transactions exist in the Internet of things network, such as data storage, processing, sharing, and so on.
With the proliferation of data of the internet of things, blockchain systems that safely and reliably process transactions are widely used. In the block chain system, a large number of transactions are processed in parallel by adopting a slicing technology, so that the processing efficiency of the data of the Internet of things is improved. When a transaction generated by the internet of things is securely transmitted to the blockchain system and stored in the distributed ledger, the blockchain will complete the transaction request by the following steps. First, all nodes in a blockchain system are partitioned into different slices, each containing one full node to generate a block. Second, after each slice completes the consensus verification using the S-PoDL consensus algorithm, the new block is linked to the blockchain network.
As shown in fig. 3, the S-PoDL consensus algorithm of the present invention is split into two phases.
Stage 1: First, the model requester publishes multiple deep learning models and training sets to the nodes in different shards. The purpose of the model requester is to obtain an optimal training model, so the present invention assumes that the model requester is honest. Next, all nodes start training the models without obtaining the test set, which can effectively prevent overfitting. After a node completes training, it generates a block header according to the rules of the underlying blockchain system. Finally, the node submits the block header to the corresponding full node.
Stage 2: first, the model requestor publishes the test set to nodes in different shards, each node calculates the accuracy of the training model and generates a block. Next, the node submits a block and training model containing precision to the full node. The full node then verifies the validity of the block by comparing the hash values submitted in the two phases. In this process, the full node will ignore the model that did not receive the block header in phase 1. Then, the eligible blocks are sorted in descending order of precision. And finally, sequentially verifying the precision submitted by the blocks by the full nodes, and receiving the first verified block.
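The two-phase flow can be sketched as follows; the sketch is made self-contained by replacing real model training and block construction with stand-ins (a random accuracy value and a SHA-256 digest of a header string), so only the control flow of the consensus — header commitment in phase 1, accuracy-sorted verification and acceptance of the first valid block in phase 2 — mirrors the description above.

```python
import hashlib
import random

def s_podl_round(shard_node_ids, seed=0):
    rng = random.Random(seed)

    # Phase 1: each node trains (simulated here) and submits a block-header hash
    phase1_headers, trained = {}, {}
    for node in shard_node_ids:
        accuracy = rng.uniform(0.5, 0.99)          # stand-in for model accuracy
        trained[node] = accuracy
        header = f"header|{node}|{accuracy:.6f}"
        phase1_headers[node] = hashlib.sha256(header.encode()).hexdigest()

    # Phase 2: nodes submit blocks containing the claimed accuracy; the full node
    # keeps only blocks whose hash matches the phase-1 commitment
    candidates = []
    for node, accuracy in trained.items():
        block = {"node": node, "accuracy": accuracy,
                 "header": f"header|{node}|{accuracy:.6f}"}
        resubmitted_hash = hashlib.sha256(block["header"].encode()).hexdigest()
        if phase1_headers.get(node) == resubmitted_hash:
            candidates.append(block)

    # sort the valid blocks by accuracy in descending order and accept the first one
    candidates.sort(key=lambda b: b["accuracy"], reverse=True)
    return candidates[0] if candidates else None

winning_block = s_podl_round(shard_node_ids=[0, 1, 2, 3])
```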
Referring to fig. 4 and 5, the method for optimizing the performance of the blockchain under the internet of things based on the deep reinforcement learning provided by the invention specifically comprises the following steps:
s1, initializing a block chain simulation system in an Internet of things scene.
Optionally, the step S1 may include:
setting the total number N of nodes, the number F of malicious nodes and the average transaction size X of a blockchain simulation system in the scene of the Internet of things.
The N nodes all have computing resources, and a data path exists among all the N nodes.
The N nodes are divided into K slices, each of the K slices containing one full node to generate a block.
S2, constructing a performance optimization model of the block chain simulation system according to the block chain simulation system.
Wherein the performance optimization model is built as a Markov decision process model.
Optionally, the Markov decision process in S2 is a five-tuple (S, A, P, R, γ).
Wherein S is the set of states, and the state at decision time t is s_t = [X, C, D]_t, where X represents the average transaction size, C = {C_i} represents the computing resources of the nodes (C_i being the computing resource of node i), and D = {D_{i,j}} represents the data transmission rates between the nodes (D_{i,j} being the data transmission rate between node i and node j).
A is the set of actions, and the action at decision time t is a_t = [K, S_B, T_B]_t, where K represents the number of fragments, S_B represents the block size (the number of bytes contained in each block), and T_B represents the block interval (the average time required to generate a new block).
P is the state transition matrix, satisfying: P(t) = Pr[D_{i,j}(t+1) = D_d | D_{i,j}(t) = D_c], where D_c, D_d ∈ D.
R is the reward function; at decision time t, r_t represents the reward generated by selecting action a_t in state s_t. At decision time t, the reward for a fragment that meets the constraint is defined as the following equation (1):
γ is the attenuation coefficient and γ ∈ [0, 1].
Further, the action decisions in the Markov decision process have the Markov property, i.e. the next state s_{t+1} depends only on the current state s_t and is independent of the historical states, satisfying formula (2):
P(s_{t+1} | s_t) = P(s_{t+1} | s_t, s_{t-1}, ..., s_1, s_0)    (2)
Optionally, the optimization objective of the model is to maximize the scalability of the blockchain system without sacrificing security and latency, and the objective function is set to the following formula (3):
max_a Q(s, a)    (3)
wherein Q(s, a) is the action value function, which can be expressed as the following formula (4):
Q(s, a) = E[ Σ_{t=0}^{∞} γ^t · r_t(s_t, a_t) ]    (4)
wherein E is the expectation function, γ^t is the attenuation coefficient at decision time t, r_t is the reward generated by selecting action a_t in state s_t, s_t is the state at decision time t, and a_t is the action at decision time t.
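As a small numerical illustration of formula (4), the discounted cumulative reward of a trajectory, and hence a Monte-Carlo estimate of Q(s, a), can be computed as below; the reward values are made-up sample data.

```python
def discounted_return(rewards, gamma=0.9):
    # sum of gamma^t * r_t over one trajectory
    g, factor = 0.0, 1.0
    for r in rewards:
        g += factor * r
        factor *= gamma
    return g

# two sampled reward trajectories (illustrative numbers only)
trajectories = [[1.2, 0.8, 1.5], [0.9, 1.1, 1.3]]
q_estimate = sum(discounted_return(tr) for tr in trajectories) / len(trajectories)
```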
And S3, solving the performance optimization model by adopting a deep reinforcement learning algorithm to obtain the optimal expandability configuration of the blockchain simulation system in the scene of the Internet of things.
In one possible implementation, a deep reinforcement learning algorithm is used to solve the model according to the optimization objective. Currently, most of the common optimization strategies employ DQN algorithms. However, the DQN algorithm has a problem of overestimation, i.e., the estimated value is larger than the true value. Therefore, the invention adopts the DDQN algorithm to dynamically optimize the performance of the system, the algorithm uses different networks to calculate the target value, the selection and the evaluation of the action are decoupled, and the inherent overestimation problem of the DQN is solved. Solving the model using the deep reinforcement learning algorithm may include the following steps S31-S39:
s31, initializing an experience playback pool B, a current network and a target network.
Optionally, the current network and the target network in S31 have the same network structure.
The parameters of the current network and the target network are ω and ω⁻ respectively.
S32, initializing parameters of a deep reinforcement learning algorithm; wherein the parameters include the exploration probability epsilon and the maximum round number T.
S33, starting a loop body and initializing the state s_t.
S34, taking the state s_t as the input of the current network, and using the ε-greedy policy to select an action a_t.
In a possible implementation, the blockchain environment in the Internet of things scenario provides the current network with the state s_t at decision time t, s_t ∈ S, i.e. the average transaction size, the computing resources of the nodes and the data transmission rates between the nodes. The current network outputs all possible actions and the corresponding values, and the ε-greedy strategy is adopted to select action a_t, as shown in formula (5):
a_t = argmax_a Q(s_t, a; ω) with probability 1 − ε, or a random action from A with probability ε    (5)
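A minimal sketch of this ε-greedy selection, assuming PyTorch and a current Q-network that maps a flattened state vector to one value per discrete action, is given below; the function and variable names are illustrative.

```python
import random
import torch

def select_action(q_net, state_vec, epsilon, n_actions):
    # explore with probability epsilon, otherwise act greedily on Q(s_t, a; omega)
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state_vec, dtype=torch.float32))
        return int(q_values.argmax().item())
```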
S35, executing the action a_t in the state s_t to obtain a new state s_{t+1} and a reward r_t.
In a possible embodiment, executing action a_t means selecting the number of slices, the block size and the block interval. After action a_t is executed, the S-PoDL consensus algorithm is adopted to complete the consensus verification, which avoids the waste of resources by using a multi-stage collaborative optimization strategy.
S36, storing the quadruple (s_t, a_t, r_t, s_{t+1}) into the experience playback pool B.
In a possible implementation, the experience information stored in the experience playback pool B is the state s_t at decision time t, the action a_t, the reward r_t and the new state s_{t+1}. When the stored experience information reaches the maximum storage amount of the experience playback pool and new experience information arrives, the experience information that first entered the experience playback pool is popped up and deleted to record the new experience information.
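A sketch of such an experience playback pool is shown below: a bounded FIFO buffer in which, once the maximum storage amount is reached, storing a new quadruple automatically drops the oldest one.

```python
from collections import deque
import random

class ReplayPool:
    def __init__(self, capacity):
        # deque with maxlen evicts the oldest entry when a new one is appended
        self.buffer = deque(maxlen=capacity)

    def store(self, s_t, a_t, r_t, s_next):
        self.buffer.append((s_t, a_t, r_t, s_next))

    def sample(self, batch_size):
        # uniformly sample a batch of experience information
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```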
S37, randomly extracting a batch of experience information (s_i, a_i, r_i, s_{i+1}) from the experience playback pool B for learning, calculating the target value y_i, and updating the parameter ω of the current network by gradient back-propagation.
Optionally, the target value y_i calculated in S37 is shown in the following formula (6):
y_i = r_i + γ·Q(s_{i+1}, argmax_a Q(s_{i+1}, a; ω); ω⁻)    (6)
wherein r_i is the reward generated by selecting action a_i in state s_i, γ is the attenuation coefficient, s_{i+1} is the state at decision time i+1, and a_i is the action at decision time i.
In a possible implementation, the parameter ω of the current network is updated by gradient back-propagation, and the loss function is defined as the following formula (7):
L = ||y_i − Q(s_i, a_i; ω)||²    (7)
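The following sketch shows one such update step under the assumption that the networks are PyTorch modules: the current network selects the next action, the target network evaluates it (formula (6)), and ω is updated by back-propagating the squared error of formula (7); network construction and the optimizer are left to the caller.

```python
import torch
import torch.nn.functional as F

def ddqn_update(q_net, target_net, optimizer, batch, gamma=0.9):
    states, actions, rewards, next_states = zip(*batch)
    states = torch.as_tensor(states, dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(next_states, dtype=torch.float32)

    with torch.no_grad():
        # action selection with the current network ...
        next_actions = q_net(next_states).argmax(dim=1, keepdim=True)
        # ... and evaluation with the target network (formula (6))
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * next_q

    q_pred = q_net(states).gather(1, actions).squeeze(1)
    loss = F.mse_loss(q_pred, targets)        # squared TD error, formula (7)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```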
S38, setting a fixed time interval C, and after every C iterations are completed, copying the parameter ω of the current network to the target network to update the parameter ω⁻ of the target network.
S39, repeatedly executing steps S33 to S38 until the maximum round number T is reached, and ending the loop body.
The invention provides a block chain system performance optimization method based on deep reinforcement learning in an Internet of things scene, which adopts a DDQN algorithm to search the optimal expandability configuration of the block chain system in the Internet of things scene on the premise of considering the necessary performance indexes such as the safety and the time delay of the system, thereby realizing the requirements of high safety and high efficiency of the application of the Internet of things.
In the embodiment of the invention, a block chain system performance optimization method based on deep reinforcement learning in an Internet of things scene is provided. Specifically, the invention quantifies the performance of the blockchain system in the scene of the Internet of things from three aspects of expandability, safety and time delay, and obtains a more comprehensive optimization scheme. Then, the performance of the block chain system is improved by adopting a slicing mechanism and a parameter adjustment technology, and the high expandability requirement of the Internet of things system is met. In order to obtain optimal expandability configuration without sacrificing other necessary performance indexes, the invention adopts a DDQN algorithm to dynamically optimize the performance of the system, and the algorithm uses different networks to calculate target values, so that the selection and evaluation of actions are decoupled, and the inherent overestimation problem of DQN is solved.
As shown in fig. 6, an embodiment of the present invention provides a device 600 for optimizing performance of a blockchain under the internet of things based on deep reinforcement learning, where the device 600 is applied to implement a method for optimizing performance of a blockchain under the internet of things based on deep reinforcement learning, and the device 600 includes:
an initialization module 610 is configured to initialize a blockchain simulation system in an internet of things scenario.
A construction module 620, configured to construct a performance optimization model of the blockchain simulation system according to the blockchain simulation system; wherein the performance optimization model is built as a Markov decision process model.
And the output module 630 is configured to solve the performance optimization model by using a deep reinforcement learning algorithm, so as to obtain an optimal extensibility configuration of the blockchain simulation system in the scene of the internet of things.
Optionally, the initialization module 610 is further configured to:
setting the total number N of nodes, the number F of malicious nodes and the average transaction size X of a blockchain simulation system in the scene of the Internet of things.
The N nodes all have computing resources, and a data path exists among all the N nodes.
The N nodes are divided into K slices, each of the K slices containing one full node to generate a block.
Optionally, the Markov decision process is a five-tuple (S, A, P, R, γ).
Wherein S is the set of states, and the state at decision time t is s_t = [X, C, D]_t, where X represents the average transaction size, C = {C_i} represents the computing resources of the nodes, and D = {D_{i,j}} represents the data transmission rate between node i and node j.
A is the set of actions, and the action at decision time t is a_t = [K, S_B, T_B]_t, where K represents the number of fragments, S_B represents the block size, and T_B represents the block interval.
P is the state transition matrix, R is the reward function, and γ ∈ [0, 1] is the attenuation coefficient.
Optionally, the objective function of the performance optimization model of the blockchain simulation system is shown in the following formula (1):
E[ Σ_{t=0}^{∞} γ^t · r_t(s_t, a_t) ]    (1)
wherein E is the expectation function, γ^t is the attenuation coefficient at decision time t, r_t is the reward generated by selecting action a_t in state s_t, s_t is the state at decision time t, and a_t is the action at decision time t.
Optionally, the output module 630 is further configured to:
s31, initializing an experience playback pool B, a current network and a target network.
S32, initializing parameters of a deep reinforcement learning algorithm; wherein the parameters include the exploration probability epsilon and the maximum round number T.
S33, starting a loop body and initializing the state s_t.
S34, taking the state s_t as the input of the current network, and using the ε-greedy policy to select an action a_t.
S35, executing the action a_t in the state s_t to obtain a new state s_{t+1} and a reward r_t.
S36, storing the quadruple (s_t, a_t, r_t, s_{t+1}) into the experience playback pool B.
S37, randomly extracting a batch of experience information (s_i, a_i, r_i, s_{i+1}) from the experience playback pool B for learning, calculating the target value y_i, and updating the parameter ω of the current network by gradient back-propagation.
S38, setting a fixed time interval C, and after every C iterations are completed, copying the parameter ω of the current network to the target network to update the parameter ω⁻ of the target network.
S39, repeatedly executing steps S33 to S38 until the maximum round number T is reached, and ending the loop body.
Optionally, the network structures of the current network and the target network are the same.
The parameters of the current network and the target network are ω and ω⁻ respectively.
Optionally, the output module 630 is further configured to:
and adopting the sharded deep learning proof (S-PoDL) consensus algorithm to complete the consensus verification.
Optionally, the output module 630 is further configured to:
when the experience information stored in the experience playback pool B reaches the maximum storage amount and new experience information arrives, the experience information which is first entered into the experience playback pool is popped up and deleted to record the new experience information.
Optionally, the target value y_i is calculated as shown in the following formula (2):
y_i = r_i + γ·Q(s_{i+1}, argmax_a Q(s_{i+1}, a; ω); ω⁻)    (2)
wherein r_i is the reward generated by selecting action a_i in state s_i, γ is the attenuation coefficient, s_{i+1} is the state at decision time i+1, and a_i is the action at decision time i.
In the embodiment of the invention, a block chain system performance optimization method based on deep reinforcement learning in an Internet of things scene is provided. Specifically, the invention quantifies the performance of the blockchain system in the scene of the Internet of things from three aspects of expandability, safety and time delay, and obtains a more comprehensive optimization scheme. Then, the performance of the block chain system is improved by adopting a slicing mechanism and a parameter adjustment technology, and the high expandability requirement of the Internet of things system is met. In order to obtain optimal expandability configuration without sacrificing other necessary performance indexes, the invention adopts a DDQN algorithm to dynamically optimize the performance of the system, and the algorithm uses different networks to calculate target values, so that the selection and evaluation of actions are decoupled, and the inherent overestimation problem of DQN is solved.
Fig. 7 is a schematic structural diagram of an electronic device 700 according to an embodiment of the present invention, where the electronic device 700 may have a relatively large difference due to different configurations or performances, and may include one or more processors (central processing units, CPU) 701 and one or more memories 702, where at least one instruction is stored in the memories 702, and the at least one instruction is loaded and executed by the processors 701 to implement the following method for optimizing the performance of the blockchain under the internet of things based on deep reinforcement learning:
s1, initializing a block chain simulation system in an Internet of things scene.
S2, constructing a performance optimization model of the block chain simulation system according to the block chain simulation system; wherein the performance optimization model is built as a Markov decision process model.
And S3, solving the performance optimization model by adopting a deep reinforcement learning algorithm to obtain the optimal expandability configuration of the blockchain simulation system in the scene of the Internet of things.
In an exemplary embodiment, a computer readable storage medium, such as a memory including instructions executable by a processor in a terminal to perform the above-described deep reinforcement learning based method of blockchain performance optimization under the internet of things, is also provided. For example, the computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.
Claims (2)
1. The method for optimizing the performance of the block chain under the Internet of things based on deep reinforcement learning is characterized by comprising the following steps:
s1, constructing a block chain system based on a slice deep learning proof S-PoDL consensus algorithm in an Internet of things scene; wherein, the performance evaluation index system of the block chain system comprises: scalability, security, and latency;
s2, constructing a Markov performance optimization model according to the block chain system;
s3, solving the performance optimization model by adopting a double-depth Q network DDQN algorithm to obtain the optimal expandability configuration of the block chain system in the Internet of things scene;
the slice deep learning in S1 proves an S-PoDL consensus algorithm, which comprises the following steps:
s11, dividing N nodes in a block chain system into K fragments, wherein each fragment contains a full node to generate a block, and the number of the nodes in each fragment is N/K;
s12, distributing training tasks of the deep learning model to nodes of each patch;
s13, verifying the validity and accuracy of the block by the full node, and receiving the first verified block;
the step S12 of assigning the training task of the deep learning model to the node of each patch includes:
the model requester issues a plurality of deep learning models and training sets to nodes in different segments for obtaining an optimal deep learning model, wherein the model requester is set as honest; n nodes in the blockchain system start training the model under the condition that the test set is not obtained; after the node finishes training, generating a block head according to the rule of the bottom layer block chain system; the node submits the block header to the corresponding full node;
the full node in S13 verifies the validity and accuracy of the block and accepts the first verified block, including:
the model requester issues the test set to nodes in different fragments, and each node calculates the precision of the deep learning model; the nodes submit blocks and training models containing precision to the whole nodes; the full node verifies the validity of the block by comparing the hash values submitted in the two stages; sequencing the blocks with effectiveness according to the descending order of precision; the full node sequentially verifies the precision submitted by the blocks and receives the first verified block;
wherein the two phases comprise: distributing training tasks of the deep learning model to nodes of each patch and verifying validity and accuracy of the block by all nodes;
and in the step S2, constructing a Markov performance optimization model according to the blockchain system, wherein the method comprises the following steps of:
the Markov performance optimization model comprises: state space S (t), action space a (t), reward function R;
the state space S(t) includes the average transaction size, the computing resources C = {C_i} of the nodes, and the data transmission rates D = {D_{i,j}} between the nodes; the state space S(t) is shown in the following formula (1):
S(t) = [X, C, D]_t    (1)
the action space A(t) includes the number of fragments K, the block size S_B and the block interval T_B; the action space A(t) is represented by the following formula (2):
A(t) = [K, S_B, T_B]_t    (2)
the reward function is represented by the following formula (3):
wherein r_t(s_t, a_t) represents the reward generated by selecting action a_t in state s_t, S_B represents the block size, i.e. the number of bytes contained in each block, which determines how many transactions are contained in a block, T_B represents the block interval, i.e. the average time required by the block producer to generate a new block, which reflects the block release rate, and X represents the average transaction size;
the step S3 of solving the performance optimization model by adopting a double-depth Q network DDQN algorithm comprises the following steps:
s31, initializing a parameter omega of a current Q network, initializing a parameter omega' =omega of a target Q network, and emptying an experience playback pool B;
s32, initializing a search probability epsilon, a time interval C and a maximum round number T;
S33, setting an initial time slot t = 0 and initializing the state s_t;
S34, taking the state s_t as the input of the current Q network, and selecting an action a_t by adopting the ε-greedy policy;
S35, executing the action a_t in the state s_t to obtain a new state s_{t+1} and a reward r_t;
S36, storing the quadruple (s_t, a_t, r_t, s_{t+1}) into the experience playback pool B;
S37, randomly extracting q pieces of experience information (s_i, a_i, r_i, s_{i+1}) from the experience playback pool B for learning, calculating the target Q value y_i, and updating the parameter ω of the current network by gradient back-propagation, y_i being shown in the following formula (4):
y_i = r_i + γ·Q(s_{i+1}, argmax_a Q(s_{i+1}, a; ω); ω⁻)    (4)
S38, after every C iterations are completed, setting the parameter ω' = ω of the target Q network;
S39, letting s_t = s_{t+1} and t = t + 1, and going to step S34 until the maximum round number T is reached.
2. The utility model provides a thing networking lower block chain performance optimizing device based on degree of depth reinforcement study which characterized in that, the device includes:
the initialization module is used for constructing a blockchain system based on a slice deep learning proof S-PoDL consensus algorithm in the scene of the Internet of things; wherein, the performance evaluation index system of the block chain system comprises: scalability, security, and latency;
the building module is used for building a Markov performance optimization model according to the block chain system;
the output module is used for solving the performance optimization model by adopting a double-depth Q network DDQN algorithm to obtain the optimal expandability configuration of the block chain system in the scene of the Internet of things;
the slice deep learning proves an S-PoDL consensus algorithm, which comprises the following steps:
s11, dividing N nodes in a block chain system into K fragments, wherein each fragment contains a full node to generate a block, and the number of the nodes in each fragment is N/K;
s12, distributing training tasks of the deep learning model to nodes of each patch;
s13, verifying the validity and accuracy of the block by the full node, and receiving the first verified block;
the step S12 of assigning the training task of the deep learning model to the node of each patch includes:
the model requester issues a plurality of deep learning models and training sets to nodes in different segments for obtaining an optimal deep learning model, wherein the model requester is set as honest; n nodes in the blockchain system start training the model under the condition that the test set is not obtained; after the node finishes training, generating a block head according to the rule of the bottom layer block chain system; the node submits the block header to the corresponding full node;
the full node in S13 verifies the validity and accuracy of the block and accepts the first verified block, including:
the model requester issues the test set to nodes in different fragments, and each node calculates the precision of the deep learning model; the nodes submit blocks and training models containing precision to the whole nodes; the full node verifies the validity of the block by comparing the hash values submitted in the two stages; sequencing the blocks with effectiveness according to the descending order of precision; the full node sequentially verifies the precision submitted by the blocks and receives the first verified block;
wherein the two phases comprise: distributing training tasks of the deep learning model to nodes of each patch and verifying validity and accuracy of the block by all nodes;
the establishing a Markov performance optimization model according to the blockchain system comprises the following steps:
the Markov performance optimization model comprises: state space S (t), action space a (t), reward function R;
the state space S(t) includes the average transaction size, the computing resources C = {C_i} of the nodes, and the data transmission rates D = {D_{i,j}} between the nodes; the state space S(t) is shown in the following formula (1):
S(t) = [X, C, D]_t    (1)
the action space A(t) includes the number of fragments K, the block size S_B and the block interval T_B; the action space A(t) is represented by the following formula (2):
A(t) = [K, S_B, T_B]_t    (2)
the reward function is represented by the following formula (3):
wherein r_t(s_t, a_t) represents the reward generated by selecting action a_t in state s_t, S_B represents the block size, i.e. the number of bytes contained in each block, which determines how many transactions are contained in a block, T_B represents the block interval, i.e. the average time required by the block producer to generate a new block, which reflects the block release rate, and X represents the average transaction size;
the method for solving the performance optimization model by adopting the double-depth Q network DDQN algorithm comprises the following steps:
s31, initializing a parameter omega of a current Q network, initializing a parameter omega' =omega of a target Q network, and emptying an experience playback pool B;
s32, initializing a search probability epsilon, a time interval C and a maximum round number T;
S33, setting an initial time slot t = 0 and initializing the state s_t;
S34, taking the state s_t as the input of the current Q network, and selecting an action a_t by adopting the ε-greedy policy;
S35, executing the action a_t in the state s_t to obtain a new state s_{t+1} and a reward r_t;
S36, storing the quadruple (s_t, a_t, r_t, s_{t+1}) into the experience playback pool B;
S37, randomly extracting q pieces of experience information (s_i, a_i, r_i, s_{i+1}) from the experience playback pool B for learning, calculating the target Q value y_i, and updating the parameter ω of the current network by gradient back-propagation, y_i being shown in the following formula (4):
y_i = r_i + γ·Q(s_{i+1}, argmax_a Q(s_{i+1}, a; ω); ω⁻)    (4)
S38, after every C iterations are completed, setting the parameter ω' = ω of the target Q network;
S39, letting s_t = s_{t+1} and t = t + 1, and going to step S34 until the maximum round number T is reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310428183.6A CN116702583B (en) | 2023-04-20 | 2023-04-20 | Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116702583A CN116702583A (en) | 2023-09-05 |
CN116702583B true CN116702583B (en) | 2024-03-19 |
Family
ID=87830071
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310428183.6A Active CN116702583B (en) | 2023-04-20 | 2023-04-20 | Method and device for optimizing performance of block chain under Internet of things based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116702583B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115102867A (en) * | 2022-05-10 | 2022-09-23 | 内蒙古工业大学 | Block chain fragmentation system performance optimization method combined with deep reinforcement learning |
CN115935442A (en) * | 2022-12-09 | 2023-04-07 | 湖南天河国云科技有限公司 | Block chain performance optimization method based on multi-agent deep reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN116702583A (en) | 2023-09-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |