CN112579194B

CN112579194B - Blockchain consensus task offloading method and device based on latency and transaction throughput

Info

Publication number: CN112579194B
Application number: CN202011359457.3A
Authority: CN
Inventors: 安致嫄; 徐思雅; 舒新建; 吴利杰; 郭少勇; 刘岩; 王雷; 秦晓阳; 廖博娴; 王春迎; 王得全
Original assignee: State Grid Corp of China SGCC; Beijing University of Posts and Telecommunications; Information and Telecommunication Branch of State Grid Henan Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; Beijing University of Posts and Telecommunications; Information and Telecommunication Branch of State Grid Henan Electric Power Co Ltd
Priority date: 2020-11-27
Filing date: 2020-11-27
Publication date: 2023-04-07
Anticipated expiration: 2040-11-27
Also published as: CN112579194A

Abstract

The embodiment of the invention provides a blockchain consensus task offloading method and device based on latency and transaction throughput. The method comprises: based on a preset Markov Decision Process (MDP), obtaining the current channel condition at the current moment, the currently available computing resources of the MEC server, and the current trust value of each second mobile device as the current Markov state, where the MDP comprises a preset state space, a preset reward function, and a preset action space; and inputting the current Markov state into a preset asynchronous advantage actor-critic (A3C) model, so that the A3C model calculates a reward based on the reward function, and obtains and outputs the target action corresponding to the current Markov state based on the reward. The embodiment of the invention can improve transaction throughput while reducing the processing latency of blockchain consensus tasks in the blockchain system.

Description

Blockchain consensus task offloading method and device based on latency and transaction throughput

Technical Field

The invention relates to the technical field of block chains, in particular to a block chain consensus task unloading method and device based on time delay and transaction throughput.

Background

In a blockchain system, data usually needs to be written into a new block, and the authenticity of the block must be verified once the new block is obtained. This process generates a blockchain consensus task, which is usually computationally intensive. Blockchain systems are generally deployed on mobile devices, whose computing power is limited, so processing a blockchain consensus task on the device takes a long time. Therefore, after a blockchain consensus task is generated, it can be offloaded to a device with greater computing power to speed up its processing.

In the prior art, one of two offloading rules is usually selected. The first rule offloads the blockchain consensus task to an MEC (Mobile Edge Computing) server; the second offloads the blockchain consensus task to a plurality of other mobile devices.

Various methods have been proposed in the prior art for selecting the offloading rule. However, the algorithms they use to select the rule converge slowly, and they do not take processing performance into account when selecting the rule, which results in a long processing latency for blockchain consensus tasks and a low transaction throughput in the blockchain system.

Disclosure of Invention

The embodiment of the invention aims to provide a method and a device for unloading a block chain consensus task based on time delay and transaction throughput, so as to improve the transaction throughput on the basis of reducing the processing time delay of the block chain consensus task in a block chain system.

The specific technical scheme is as follows:

In a first aspect of the embodiments of the present invention, a method for offloading a blockchain consensus task based on latency and transaction throughput is provided, where the method is applied to a first mobile device in a blockchain system, and the blockchain system further includes: a plurality of mobile edge computing (MEC) servers and a plurality of second mobile devices, wherein the first mobile device is a mobile device for generating block data; the method comprises the following steps:

acquiring a block chain consensus task; the block chain consensus task is a task generated by writing target transaction data into a block and verifying the authenticity of the block data;

based on a preset Markov Decision Process (MDP), obtaining the current channel condition at the current moment, the currently available computing resources of each MEC server and the current trust value of each second mobile device as a current Markov state; the MDP comprises: a preset state space, a preset reward function and a preset action space, wherein the state space comprises: channel conditions, available computing resources of the MEC servers, and trust values of the respective second mobile devices, and the action space comprises: an offloading decision, a transmission power allocation, a block size, and a block interval time; the reward function is set in advance based on the principles of minimizing the processing latency of the offloaded blockchain consensus task and maximizing the transaction throughput of the first mobile device;

inputting the current Markov state into a preset asynchronous advantage actor-critic (A3C) model, so that the A3C model calculates a reward based on the reward function, and obtains and outputs a target action corresponding to the current Markov state based on the reward, wherein the target action comprises a target offloading rule, a target transmission power allocation, a target block size and a target block interval time;

when the target unloading rule is that the block chain consensus task is not unloaded, processing the block chain consensus task to obtain a first processing result;

when the target unloading rule is that a block chain consensus task is unloaded to an MEC server, sending the block chain consensus task to the MEC server so that the MEC server receives the block chain consensus task, processes the block chain consensus task, obtains a second processing result, and returns the second processing result to the first mobile device; the first mobile equipment receives a second processing result returned by the MEC server;

when the target unloading rule is that the block chain consensus task is unloaded to a plurality of second mobile devices, splitting the block chain consensus task into a preset number of subtasks and sending the subtasks to the plurality of second mobile devices; enabling the plurality of second mobile devices to receive the subtasks sent by the first mobile device, process the subtasks, and return the subtask processing results to the first mobile device;

and receiving a plurality of subtask processing results returned by the plurality of second mobile devices, and generating a third processing result.

In a second aspect of the embodiments of the present invention, there is provided a device for offloading a blockchain consensus task based on latency and transaction throughput, which is applied to a first mobile device in a blockchain system, where the blockchain system further includes: a plurality of mobile edge computing (MEC) servers and a plurality of second mobile devices, wherein the first mobile device is a mobile device for generating block data; the device comprises:

the block chain consensus task acquisition module is used for acquiring a block chain consensus task; the block chain consensus task is a task generated by writing target transaction data into a block and verifying the authenticity of the block data;

a current Markov state obtaining module, configured to obtain, based on a preset Markov Decision Process (MDP), a current channel condition at a current time, the currently available computing resources of each MEC server, and a current trust value of each second mobile device, as a current Markov state; the MDP comprises: a preset state space, a preset reward function and a preset action space, wherein the state space comprises: channel conditions, available computing resources of the MEC servers, and trust values of the respective second mobile devices, and the action space comprises: an offloading decision, a transmission power allocation, a block size, and a block interval time; the reward function is set in advance based on the principles of minimizing the processing latency of the offloaded blockchain consensus task and maximizing the transaction throughput of the first mobile device;

a target action obtaining module, configured to input the current Markov state into a preset asynchronous advantage actor-critic (A3C) model, so that the A3C model calculates a reward based on the reward function, and obtains and outputs a target action corresponding to the current Markov state based on the reward, where the target action includes a target offloading rule, a target transmission power allocation, a target block size, and a target block interval time;

the block chain consensus task processing module is used for processing the block chain consensus task to obtain a first processing result when the target unloading rule is that the block chain consensus task is not unloaded;

the block chain consensus task sending module is used for sending the block chain consensus task to the MEC server when the target unloading rule is that the block chain consensus task is unloaded to the MEC server, so that the MEC server receives the block chain consensus task, processes the block chain consensus task, obtains a second processing result, and returns the second processing result to the first mobile device;

the block chain consensus task splitting module is used for splitting the block chain consensus task into a preset number of subtasks and sending the subtasks to a plurality of second mobile devices when the target unloading rule is that the block chain consensus task is unloaded to the second mobile devices; enabling the plurality of second mobile devices to receive the subtasks sent by the first mobile device, process the subtasks, and return the subtask processing results to the first mobile device;

a second processing result receiving module, configured to receive a second processing result returned by the MEC server;

and the third processing result generation module is used for receiving a plurality of subtask processing results returned by the plurality of second mobile devices and generating a third processing result.

In a third aspect of the embodiments of the present invention, a mobile device is provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus; the memory is used for storing a computer program; and the processor is used for implementing any one of the above blockchain consensus task offloading methods based on latency and transaction throughput when executing the program stored in the memory.

The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method for offloading the block chain consensus task based on latency and transaction throughput is implemented as described in any one of the above.

Embodiments of the present invention further provide a computer program product containing instructions, which when run on a computer, cause the computer to execute any one of the above methods for offloading a blockchain consensus task based on latency and transaction throughput.

The embodiment of the invention has the following beneficial effects:

according to the block chain consensus task unloading method and device based on time delay and transaction throughput, provided by the embodiment of the invention, as the reward function in the MDP is set in advance based on the principle of minimization of unloaded block chain consensus task processing time delay and maximization of transaction throughput of the first mobile equipment, in the A3C model, reward tends to be maximized through multiple training, and target actions are obtained and output. Therefore, the target action is the action corresponding to the reward of minimizing the processing delay of the unloaded blockchain consensus task and maximizing the transaction throughput of the first mobile device, and therefore, the embodiment of the invention can ensure that the transaction throughput is improved on the basis of reducing the processing delay of the blockchain consensus task in the blockchain system. Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic flowchart of a blockchain consensus task offloading method based on latency and transaction throughput according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a blockchain system according to an embodiment of the present invention;

FIG. 3a is a diagram illustrating the transition relationship between actions and states in the MDP;

FIG. 3b is a specific flowchart of step S103 in the embodiment shown in FIG. 1;

FIG. 4 is a schematic structural diagram of the A3C model;

FIG. 5 is a graph showing the relationship between the average power and the average reward in a simulation experiment;

FIG. 6 is a graph showing the relationship between block interval and average reward in a simulation experiment;

FIG. 7 is a diagram illustrating the relationship between block size and average reward in a simulation experiment;

FIG. 8 is a diagram illustrating the relationship between the CPU cycle frequency of the MEC server and the average reward in a simulation experiment;

FIG. 9 is a schematic diagram of the relationship between the CPU cycle frequency of the MEC server, the total latency and the average transaction throughput in a simulation experiment;

FIG. 10 is a schematic structural diagram of a blockchain consensus task offloading device based on latency and transaction throughput according to an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of a mobile device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

As shown in fig. 1, an embodiment of the present invention provides a method for offloading a blockchain consensus task based on latency and transaction throughput, which is applied to a first mobile device 201 in a blockchain system shown in fig. 2, where the blockchain system further includes: a plurality of MEC servers 202 and a plurality of second mobile devices 203, wherein the first mobile device 201 is a mobile device for generating block data. The MEC servers 202 are connected by wired links, and each MEC server 202 has the computing power to handle blockchain consensus tasks.

In the blockchain system, the mobile device may be a user terminal, such as a mobile phone, a computer, etc. The blockchain system can be applied to energy transactions and other transactions, when the blockchain system is applied to energy transactions, energy transactions can be carried out between every two mobile devices, and the products of the transactions can be energy products such as natural gas, petroleum and the like. When a user performs an energy transaction on a certain mobile device, the mobile device is the first mobile device 201, and the mobile device records transaction data of the energy transaction process. A blockchain application may be installed in each mobile device in advance, and the blockchain application may record energy transaction data performed in the blockchain system, that is, generate a block corresponding to the transaction process.

As shown in fig. 1, the method for offloading block chain consensus task based on latency and transaction throughput according to the embodiment of the present invention may include:

s101, acquiring a block chain consensus task.

The blockchain consensus task is a task generated by writing target transaction data into a block and verifying the authenticity of the block data. After a user performs an energy transaction on the first mobile device, the first mobile device needs to write the data of the energy transaction into one of its blocks. Before doing so, it must first calculate the hash value of the block to be generated. Specifically, an existing hash calculation method may be used to calculate the hash value from the target transaction data; for example, SHA-256 (Secure Hash Algorithm 256) may be used. After the hash value is obtained, the first mobile device may write the hash value together with the current target transaction data into a new block, and verify the authenticity of the block in cooperation with the second mobile devices.
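As a minimal sketch of the hashing step described above, the following Python computes a SHA-256 digest over the target transaction data and stores it alongside that data in a new block. The JSON serialization, the field names `transactions` and `hash`, and the helper names are assumptions made for illustration; the source does not specify a block layout.

```python
import hashlib
import json


def hash_transactions(transactions: list) -> str:
    """Return the SHA-256 hex digest of the target transaction data."""
    # Serialize deterministically so identical data always yields the same hash.
    payload = json.dumps(transactions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_block(transactions: list) -> dict:
    """Write the hash value and the transaction data together into a new block.

    The dictionary layout is hypothetical, not taken from the source.
    """
    return {"transactions": transactions, "hash": hash_transactions(transactions)}


def verify_block(block: dict) -> bool:
    """Recompute the hash and compare it with the stored one."""
    return block["hash"] == hash_transactions(block["transactions"])
```

A verifying device could call `verify_block` on a received block; tampering with any transaction changes the recomputed digest and the check fails.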

S102, based on a preset MDP (Markov Decision Process), obtaining a current channel condition at the current time, a current available computing resource of each MEC server, and a current trust value of each second mobile device as a current Markov state.

The current channel condition may include: channel gain between the first mobile device and the MEC server; and a channel gain between the first mobile device and each second mobile device. The current available computing resource of the MEC server represents the remaining computing resource of the MEC server after processing the block chain consensus task unloaded to the MEC server at the current moment, and each MEC server can compute the available computing resource in real time and send the computing resource to the first mobile device.

The trust value is an index of how credible one entity considers another entity to be at a given moment. It can be calculated from the number of successful transactions, the number of failed transactions, and the number of transactions with an uncertain result in the historical transaction records of each second mobile device. Each second mobile device may calculate its own trust value and send it to the first mobile device; alternatively, the first mobile device may obtain the historical transaction records of each second mobile device and calculate the trust value of each second mobile device from them. The obtained current channel condition at the current time, the currently available computing resources of the respective MEC servers, and the current trust values of the respective second mobile devices may be taken as the current Markov state.
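The source names the three inputs to the trust value but does not give the formula. The sketch below is therefore only one possible instantiation, assuming a subjective-logic style estimate in which uncertain outcomes count partially toward trust through a hypothetical `base_rate` parameter.

```python
def trust_value(successes: int, failures: int, uncertain: int,
                base_rate: float = 0.5) -> float:
    """Estimate a trust value in [0, 1] from a device's transaction history.

    Assumed formula (not from the source): belief + base_rate * uncertainty,
    where belief and uncertainty are the shares of successful and uncertain
    transactions respectively.
    """
    total = successes + failures + uncertain
    if total == 0:
        # No history at all: fall back to the neutral base rate.
        return base_rate
    belief = successes / total
    uncertainty = uncertain / total
    return belief + base_rate * uncertainty
```

With this choice, a device with only successful transactions scores 1.0, one with only failures scores 0.0, and an unknown device starts at the base rate.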

The MDP comprises: a preset state space, a preset reward function, and a preset action space. The MDP may be defined by the tuple $\langle S, A, P, r \rangle$, where $S$ represents the set of states in the state space, $A$ represents the set of actions in the action space, $P$ represents the state transition probability, and $r$ represents the reward.

The state space comprises: channel conditions, available computing resources of the MEC servers, and trust values of the respective second mobile devices. A Markov state in the state space may be represented as

$$s(t) = \{G(t),\ C(t),\ D^{trust}(t)\}$$

where $s(t)$ represents the current Markov state. $G(t)$ represents the set of channel conditions at the current time, with $G(t) = \{g_n(t),\ g_{n,k}(t)\}$, where $g_n(t)$ represents the current channel gain between the $n$-th first mobile device and the MEC server, and $g_{n,k}(t)$ represents the current channel gain between the $n$-th first mobile device and the $k$-th second mobile device. $C(t) = \{C_1(t),\ C_2(t),\ \ldots,\ C_{n1}(t)\}$, where $C_{n1}(t)$ represents the currently available computing resources of the $n1$-th MEC server. $D^{trust}(t)$ represents the current set of trust values of the second mobile devices.

After performing action a (t), the probability of the state space transitioning from the current state s (t) to the next state s (t + 1) can be expressed as:

where f is a predetermined state transition probability density function.

The action space $A(t)$ includes: the offloading decision $a(t)$, the transmission power allocation $P(t)$, the block size $S_b(t)$, and the block interval time $T_b(t)$. The action set may be defined as $A(t) = [a(t),\ P(t),\ S_b(t),\ T_b(t)]$. The offloading decision may be expressed as

$$a(t) = \{a_1(t),\ a_2(t),\ \ldots,\ a_N(t)\}$$

where $a_N(t)$ represents the offloading decision of the $N$-th first mobile device. Since a blockchain system usually includes a plurality of first mobile devices, and offloading decisions may need to be made for several of them at a given time, $N$ may be set to the number of first mobile devices that need an offloading decision.

In the embodiment of the present invention, a plurality of preset offloading rules may be set, namely: the first mobile device does not offload the blockchain consensus task; the first mobile device offloads the blockchain consensus task to the MEC server; and the first mobile device offloads the blockchain consensus task to a plurality of second mobile devices. The offloading decision in the action space represents the offloading rule that is finally selected.

The transmission power allocation $P(t)$ may be expressed as:

$$P(t) = \{P_{total,1}(t),\ P_{total,2}(t),\ \ldots,\ P_{total,N}(t)\}$$

where $P_{total,N}(t)$ represents the total transmission power assigned to the $N$-th first mobile device. When the offloading decision is not to offload the blockchain consensus task, the transmission power allocation is 0. When the offloading decision is to offload the blockchain consensus task to the MEC server, the transmission power allocation represents the power allocated for transmitting the blockchain consensus task. When the offloading decision is to offload the blockchain consensus task to the plurality of second mobile devices, the blockchain consensus task needs to be split into a preset number of subtasks; the transmission power allocation therefore includes the transmission power allocated for transmitting each subtask to its second mobile device, and the total transmission power allocated for transmitting all of the subtasks.

The block size represents the data size of the block generated for the current target transaction data, and is constrained by a preset block size threshold. The block interval time represents the time required to generate a new block, and is constrained by a preset block interval time threshold.
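The state and action definitions above can be sketched as plain data structures. This is an illustrative layout only: the class and field names are hypothetical, and treating both thresholds as upper bounds on positive values is an assumption, since the source only says that preset thresholds exist.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class OffloadRule(Enum):
    LOCAL = 0    # do not offload
    MEC = 1      # offload to an MEC server
    DEVICES = 2  # offload to several second mobile devices


@dataclass
class MarkovState:
    channel_gains: List[float]   # G(t)
    mec_resources: List[float]   # C(t)
    trust_values: List[float]    # D^trust(t)

    def as_vector(self) -> List[float]:
        """Flatten the state so it can be fed to a model."""
        return [*self.channel_gains, *self.mec_resources, *self.trust_values]


@dataclass
class Action:
    offload: OffloadRule
    tx_power: float        # P(t)
    block_size: float      # S_b(t)
    block_interval: float  # T_b(t)

    def is_valid(self, max_block_size: float, max_block_interval: float) -> bool:
        """Check the constraints; upper-bound thresholds are an assumption."""
        if self.offload is OffloadRule.LOCAL and self.tx_power != 0:
            return False  # the source sets the power allocation to 0 when not offloading
        if self.offload is not OffloadRule.LOCAL and self.tx_power <= 0:
            return False
        return (0 < self.block_size <= max_block_size
                and 0 < self.block_interval <= max_block_interval)
```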

The reward function is set in advance based on the principle that the unloaded blockchain consensus task processing delay is minimized and the transaction throughput of the first mobile device is maximized.
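The source states the two principles behind the reward function but not its formula. As a hedged illustration, the sketch below assumes a weighted difference between transaction throughput and processing latency. The throughput model (transactions per block divided by the block interval), the weights, and all function and parameter names are assumptions, not the patented reward.

```python
import math


def transaction_throughput(block_size: float, avg_tx_size: float,
                           block_interval: float) -> float:
    """Transactions per second, assuming each block is filled with
    average-sized transactions (an illustrative model)."""
    if avg_tx_size <= 0 or block_interval <= 0:
        raise ValueError("sizes and intervals must be positive")
    return math.floor(block_size / avg_tx_size) / block_interval


def reward(throughput: float, latency: float,
           w_throughput: float = 1.0, w_latency: float = 1.0) -> float:
    """Higher throughput raises the reward; higher latency lowers it."""
    return w_throughput * throughput - w_latency * latency
```

Under this form, an action that shortens the processing latency or raises the throughput always increases the reward, which matches the stated optimization direction.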

S103, inputting the current Markov state into a preset A3C (Asynchronous Advantage Actor-Critic) model, so that the A3C model calculates a reward based on the reward function, and obtains and outputs a target action corresponding to the current Markov state based on the reward.

In an embodiment of the invention, the current Markov state may be input into the A3C model. The A3C model calculates a reward based on the reward function, and obtains and outputs a target action corresponding to the current Markov state based on the reward, the target action including a target offloading rule, a target transmission power allocation, a target block size, and a target block interval time.

And S104, when the target unloading rule is that the block chain consensus task is not unloaded, processing the block chain consensus task to obtain a first processing result.

When the target offloading rule is that the blockchain consensus task is not offloaded, the first mobile device processes the blockchain consensus task itself, that is, it calculates a hash value according to a preset hash calculation method and obtains a first processing result, where the first processing result is the calculated hash value.

S105, when the target offloading rule is to offload the blockchain consensus task to the MEC server, sending the blockchain consensus task to the MEC server, and then executing step S107.

When the target unloading rule is to unload the block chain consensus task to the MEC server, the first mobile device may select an MEC server with the largest currently available computing resource according to the currently available computing resources of each MEC server, and send the block chain consensus task to the MEC server, so that the MEC server receives the block chain consensus task, processes the block chain consensus task, obtains a second processing result, and returns the second processing result to the first mobile device.
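The server selection in this step amounts to taking the maximum over the reported resources. A minimal sketch, assuming the first mobile device holds a mapping from hypothetical server identifiers to their currently available computing resources:

```python
def select_mec_server(available_resources: dict) -> str:
    """Return the id of the MEC server with the most available resources."""
    if not available_resources:
        raise ValueError("no MEC server reported its available resources")
    # max() over the keys, ranked by each server's reported resources.
    return max(available_resources, key=available_resources.get)
```

For example, `select_mec_server({"mec-a": 2.0, "mec-b": 3.5})` selects `"mec-b"`.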

And S106, when the target unloading rule is to unload the block chain consensus task to a plurality of second mobile devices, splitting the block chain consensus task into a preset number of subtasks and sending the subtasks to the plurality of second mobile devices, and then executing the step S108.

When the target offloading rule is to offload the blockchain consensus task to the plurality of second mobile devices, the first mobile device may split the blockchain consensus task into a preset number of subtasks, select from the plurality of second mobile devices a preset number of second mobile devices whose current trust values meet a trust value threshold, and send the subtasks to the selected second mobile devices, so that those second mobile devices receive the subtasks sent by the first mobile device, process the subtasks, and return the subtask processing results to the first mobile device. The trust value threshold may be a value set empirically in advance, and may be 0.5, for example.
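The selection and splitting described above can be sketched as follows. The source does not say how the task is split or how to choose among eligible devices, so splitting the payload into contiguous byte chunks, treating "meets the threshold" as greater than or equal, and preferring the highest trust values are all assumptions.

```python
def select_trusted_devices(trust: dict, count: int, threshold: float = 0.5) -> list:
    """Pick `count` devices whose trust value meets the threshold."""
    eligible = [dev for dev, value in trust.items() if value >= threshold]
    if len(eligible) < count:
        raise ValueError("not enough devices meet the trust value threshold")
    # Prefer the most trusted devices (an assumed tie-breaking policy).
    eligible.sort(key=lambda dev: trust[dev], reverse=True)
    return eligible[:count]


def split_task(payload: bytes, count: int) -> list:
    """Split a task payload into `count` contiguous subtasks."""
    if count <= 0:
        raise ValueError("count must be positive")
    chunk = -(-len(payload) // count)  # ceiling division
    return [payload[i * chunk:(i + 1) * chunk] for i in range(count)]


def assign_subtasks(payload: bytes, trust: dict, count: int) -> dict:
    """Map each selected device to the subtask it should process."""
    devices = select_trusted_devices(trust, count)
    return dict(zip(devices, split_task(payload, count)))
```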

And S107, receiving a second processing result returned by the MEC server.

Since the second processing result calculated by the MEC server is a hash value, the first mobile device obtains the hash value directly upon receiving the second processing result returned by the MEC server 202.

And S108, receiving a plurality of subtask processing results returned by the plurality of second mobile devices, and generating a third processing result.

Since the second mobile devices respectively process the subtasks to obtain the subtask processing results, after receiving the subtask processing results returned by each second mobile device, the first mobile device needs to generate a third processing result according to the plurality of subtask processing results to obtain the hash value.
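Steps S104 to S108 together form a three-way dispatch on the target offloading rule. The sketch below shows only that control flow. The remote calls are replaced by local stand-ins, and the way partial results are combined into the third processing result (hashing the concatenated partial digests) is an assumption, since the source does not describe the combination step.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def split_evenly(task: bytes, parts: int) -> list:
    size = -(-len(task) // parts)  # ceiling division
    return [task[i * size:(i + 1) * size] for i in range(parts)]


def combine(partials: list) -> str:
    """Assumed combination rule: hash the concatenated partial digests."""
    return sha256_hex("".join(partials).encode("ascii"))


def execute(rule: str, task: bytes, device_count: int = 2) -> str:
    """Dispatch a consensus task according to the target offloading rule."""
    if rule == "local":    # S104: process on the first mobile device
        return sha256_hex(task)
    if rule == "mec":      # S105/S107: stand-in for the MEC server round trip
        return sha256_hex(task)
    if rule == "devices":  # S106/S108: split, process each subtask, combine
        if device_count < 1:
            raise ValueError("device_count must be at least 1")
        partials = [sha256_hex(sub) for sub in split_evenly(task, device_count)]
        return combine(partials)
    raise ValueError(f"unknown offloading rule: {rule}")
```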

According to the block chain consensus task unloading method based on time delay and transaction throughput, the first mobile device obtains the current channel condition at the current moment, the current available computing resources of an MEC server and the current trust value of each second mobile device as the current Markov state based on the preset MDP; inputting the current Markov state into a preset A3C model, so that the A3C model calculates reward based on a reward function, and obtains and outputs a target action corresponding to the current Markov state based on the reward. Since the reward function in the MDP is set in advance based on the principles of minimization of offloaded blockchain consensus task processing delay and maximization of transaction throughput of the first mobile device, in the A3C model, the reward can be maximized through multiple training, and the target action can be obtained and output. Therefore, the target action is the action corresponding to the reward of minimizing the processing delay of the unloaded blockchain consensus task and maximizing the transaction throughput of the first mobile device, and therefore, the embodiment of the invention can ensure that the transaction throughput is improved on the basis of reducing the processing delay of the blockchain consensus task in the blockchain system.

As an optional implementation manner of the embodiment of the present invention, as shown in fig. 3a, the MDP further includes: the transition relation between the action and the state characterizes that: in different states, the current state transitions into different states by performing different actions. For example, the current state of the state space is state 1, when action 1 is performed, the state space may transition from state 1 to state 2, and then when action 3 is performed, the state space may transition from state 2 to state 4; the current state of the state space is state 1, and when action 2 is performed, the state space may transition from state 1 to state 3.
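The example transitions in this paragraph can be written as a lookup table. The sketch treats each transition as deterministic for simplicity, whereas the MDP itself describes transitions probabilistically; the state and action labels are simply those used in the example.

```python
# (current state, action) -> next state, taken from the example above.
TRANSITIONS = {
    ("state1", "action1"): "state2",
    ("state1", "action2"): "state3",
    ("state2", "action3"): "state4",
}


def step(state: str, action: str) -> str:
    """Return the next state, or raise if the pair is not in the relation."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"no transition for {action} in {state}") from None


def run(start: str, actions: list) -> str:
    """Apply a sequence of actions and return the final state."""
    state = start
    for action in actions:
        state = step(state, action)
    return state
```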

As shown in fig. 3b, step S103 in the embodiment of fig. 1, namely inputting the current Markov state into the preset asynchronous advantage actor-critic (A3C) model so that the A3C model calculates the reward based on the reward function and obtains and outputs the target action corresponding to the current Markov state based on the reward, includes:

s301, inputting the current Markov state into the A3C model.

As shown in fig. 4, the A3C model includes a global network and a plurality of local networks. The global network includes a neural network comprising an actor network and a critic network, and each local network has the same network structure as the global network. When a state is input, a policy function and a state value function are obtained. The policy function is used to execute the current action in combination with the A3C current state and the current model parameters; the state value function is used to calculate the advantage function value by substituting the corresponding parameters. The current Markov state may be input into the global network and the plurality of local networks.

S302, setting the initial model parameters in the A3C model as the current model parameters.

In this step, an actor network initial model parameter of the global network in the A3C model may be set as a current actor network model parameter; and setting the critic network initial model parameters of the global network as the current critic network model parameters. In the initial state of the A3C model, the current model parameters are the same as the initial model parameters, and then the model parameters of the local network may be updated with the model parameters of the global network.
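The initialization and synchronization in this step can be sketched with plain dictionaries standing in for network weights. The parameter layout (`actor` and `critic` lists of floats) and the function names are hypothetical; a real A3C implementation would hold tensors in a deep learning framework.

```python
import copy


def init_current_parameters(initial: dict) -> dict:
    """Set the current global parameters to the initial model parameters."""
    return copy.deepcopy(initial)


def sync_local_networks(global_params: dict, num_local: int) -> list:
    """Give every local network its own copy of the global parameters."""
    # Deep copies keep later local updates from leaking into the global network.
    return [copy.deepcopy(global_params) for _ in range(num_local)]
```

Each local network then trains on its own copy, which is the usual reason for copying rather than sharing the parameter objects.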

A global iteration number threshold, a local iteration number threshold, an attenuation factor, and an agent number threshold may also be input. In the initial state of the A3C model, the current number of agents is set to the initial value of the number of agents, for example 1, and then step S303 is performed. The agent number threshold is the same as the number of local networks.

S303, the current Markov state is taken as the A3C current state.

S304, based on the current state of the A3C, the current model parameters and the preset strategy function, the current action is executed.

The policy function $\pi(A_t, s_t; \theta')$ may be a function preset in the local network of the A3C model; it characterizes performing action $A_t$ in state $s_t$ and is a function of the model parameter $\theta'$ of the actor network in the local network. Because the MDP further includes the transition relation between actions and states, the current action can be executed based on the A3C current state, the current model parameters, and the preset policy function. Specifically, the probability of executing each action can be determined according to the transition relation and the state transition function, and the action with the highest probability is selected as the current action. The current action includes: the current offload rule, the current transmission power allocation, the current block size, and the current block interval time.

It should be noted that, after the current offload rule is determined, if the current offload rule is to offload the blockchain consensus task to multiple second mobile devices, a preset number of second mobile devices may be determined as second mobile devices for processing the subtasks according to the current trust value of each second mobile device. And if the current unloading rule is to unload the blockchain consensus task to the MEC server, determining a target MEC server according to the available computing resources of each MEC server.
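The two selection rules above can be sketched as follows. This is a minimal illustration under assumptions that the source does not state: the devices with the highest trust values are preferred, and the MEC server with the most available computing resources is chosen. The function names and the dictionary layout are hypothetical.

```python
def select_second_devices(trust, count):
    """Pick `count` device ids with the highest trust values (assumed rule).

    `trust` maps a device id to its current trust value.
    """
    ranked = sorted(trust, key=lambda dev: trust[dev], reverse=True)
    return ranked[:count]


def select_mec_server(available):
    """Pick the MEC server with the most available computing resources (assumed rule).

    `available` maps a server id to its available CPU cycles per second.
    """
    return max(available, key=lambda srv: available[srv])
```

Other policies, such as a minimum trust threshold, would fit the same interface; the source only says the choice is made "according to" the trust values and the available computing resources.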

As an optional implementation manner in the embodiment of the present invention, after the current action is executed, it may be determined whether the current action meets a preset condition, where the preset condition includes:

(C1): $a_n(t) \in \{0, 1, 2\}$

(C2): $0 \le P_{tot,n}(t) \le P_T$

(C3): $T_{tot,n}(t) \le \omega \times T_b(t)$

(C4): $\rho(k) \times D_1(t) \le L_k(t)$

in the formula, $a_n(t)$ denotes the current offload rule; $P_{tot,n}(t)$ denotes the currently allocated total transmission power; $P_T$ denotes the system preset total transmission power; $T_{tot,n}(t)$ represents the current total delay; $\omega$ represents a preset weight factor for the block interval time, with $\omega > 1$; $T_b(t)$ represents the current block interval time; $\rho(k)$ represents the preset proportion of the blockchain consensus task occupied by the subtask to be sent to the kth second mobile device; $D_1(t)$ represents the data amount of the blockchain consensus task; and $L_k(t)$ represents the current link capacity of the connection between the first mobile device and the kth second mobile device.

Wherein 0, 1 and 2 in C1 are identifiers of three preset offload rules, respectively, identifier 0 indicates that the blockchain consensus task is not offloaded, identifier 1 indicates that the blockchain consensus task is offloaded to the MEC server, and identifier 2 indicates that the blockchain consensus task is offloaded to a plurality of second mobile devices, so that the C1 expression is used to determine whether the current offload rule is one of the three preset offload rules.

C2 indicates that the currently allocated total transmission power cannot exceed the system preset total transmission power. When the current offload rule is not to offload the blockchain consensus task, the currently allocated total transmission power is 0. When the current offload rule is to offload the blockchain consensus task to the MEC server, the currently allocated total transmission power is the transmission power allocated for sending the blockchain consensus task to the MEC server. When the current offload rule is to offload the blockchain consensus task to the plurality of second mobile devices, the currently allocated total transmission power is the sum of the transmission powers allocated to the second mobile devices.

C3 indicates that the current total delay needs to be less than or equal to the product of the current block interval time and ω, where ω represents a preset weight factor for the block interval time and ω > 1; ω may be a value preset empirically or experimentally.

C4 indicates that when the current offload rule is to offload the blockchain consensus task to multiple second mobile devices, the data amount of the subtask sent by the first mobile device to each second mobile device cannot exceed the current link capacity of the first mobile device and each second mobile device.
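As a minimal sketch, the four preset conditions can be checked in one function. The class and parameter names are hypothetical, and the form used for C4 (the share of the task sent to each device, multiplied by the task data amount, must not exceed that device's link capacity) is an assumption based on the description of C4 above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Action:
    """Illustrative container; field names are not from the source."""
    offload_rule: int      # a_n(t): 0 = no offload, 1 = MEC server, 2 = second mobile devices
    total_tx_power: float  # P_tot,n(t)
    total_delay: float     # T_tot,n(t)
    block_interval: float  # T_b(t)


def satisfies_constraints(
    action: Action,
    p_max: float,                         # P_T
    omega: float,                         # weight factor, omega > 1
    task_bits: float = 0.0,               # D_1(t)
    shares: Sequence[float] = (),         # rho(k) for each selected device
    link_capacity: Sequence[float] = (),  # L_k(t) for each selected device
) -> bool:
    """Return True only if C1, C2, C3 and C4 all hold."""
    c1 = action.offload_rule in (0, 1, 2)
    c2 = 0.0 <= action.total_tx_power <= p_max
    c3 = action.total_delay <= omega * action.block_interval
    # C4 only applies when offloading to second mobile devices (assumed form).
    c4 = True
    if action.offload_rule == 2:
        c4 = all(rho * task_bits <= cap for rho, cap in zip(shares, link_capacity))
    return c1 and c2 and c3 and c4
```

A caller would typically evaluate this check before computing the reward, so that an action violating any condition receives the preset reward value instead.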

If the current action fails to meet any one of the preset conditions, the current reward is set to a preset numerical value, which may be 0. If the current action simultaneously satisfies C1, C2, C3, and C4 of the preset conditions, the following steps may be performed:

S305, obtaining the current reward according to the current offload rule, the current transmission power allocation, the current block size, and the current block interval time in the current action, together with the reward function.

The reward function depends on the offload rule, the transmission power allocation, the block size, and the block interval time, so the current reward can be calculated according to the current offload rule, the current transmission power allocation, the current block size, and the current block interval time in the current action, together with the reward function.

The above process can be expressed as:

$$r_t = \begin{cases} O(t), & \text{if C1-C4 are satisfied} \\ 0, & \text{otherwise} \end{cases}$$

in the formula, $r_t$ represents the current reward, $O(t)$ represents the reward function, "if C1-C4 are satisfied" indicates that the current action simultaneously satisfies C1, C2, C3, and C4 of the preset conditions, and "otherwise" indicates that the current action fails to meet at least one of the preset conditions.

S306, based on the current A3C state, the current action and the transition relation between the action and the state, obtaining the next A3C state. As shown in fig. 3a, in the current A3C state, when a current action is selected, the next state can be determined according to the transition relationship between the actions and the states.

S307, judging whether the next A3C state is an A3C termination state or not according to whether the current reward meets a preset reward condition or not, and judging whether the times of executing the current action meet a preset action execution time threshold or not.

The preset reward condition can be that the current reward is the same as the previous reward, or the difference value between the current reward and the previous reward is within a preset range; or the difference value between the current reward and the reward of the preset times before the current reward is in a preset range. If the current reward meets the preset reward condition, the next A3C state is an A3C termination state, otherwise, the next A3C state is not the A3C termination state.

Whether the number of times of executing the current action meets a preset action execution number threshold value can also be judged, and the action execution number threshold value can be a local iteration number threshold value.
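The two stopping tests of step S307 can be sketched as follows. This is an illustration only: the function names, the `tolerance` parameter (standing in for the "preset range"), and the `lookback` parameter (standing in for the "preset times before the current reward") are assumptions.

```python
def reward_converged(rewards, tolerance=0.0, lookback=1):
    """True if the latest reward is within `tolerance` of the reward `lookback` steps earlier.

    With tolerance=0.0 and lookback=1 this is the "same as the previous reward" case.
    """
    if len(rewards) <= lookback:
        return False  # not enough history to compare
    return abs(rewards[-1] - rewards[-1 - lookback]) <= tolerance


def should_stop(rewards, steps_done, max_steps, tolerance=0.0, lookback=1):
    """Stop when the reward condition holds or the action count reaches the threshold."""
    return reward_converged(rewards, tolerance, lookback) or steps_done >= max_steps
```

Here `max_steps` plays the role of the action execution number threshold, which the text says may be the local iteration number threshold.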

S308, if the next A3C state is not the A3C termination state and the number of times of executing the current action does not meet the preset action execution number threshold, taking the next A3C state as the current A3C state, and returning to step S304 to execute the current action based on the current A3C state, the current model parameters, and the preset policy function.

If the next A3C state is not the A3C terminated state and the number of times the current action is performed does not satisfy the preset number of times the action is performed threshold, it indicates that the local network does not converge, and therefore, the step of S304 may be returned.

S309, if the next A3C state is an A3C terminated state, or the number of times of executing the current action satisfies a preset action number threshold, calculating an advantage function value corresponding to each A3C state based on the reward corresponding to each A3C state and a preset advantage function calculation formula.

If the next A3C state is an A3C terminated state, or the number of times of executing the current action satisfies a preset action execution number threshold, it indicates that the local network has converged, and therefore, an advantage function value may be calculated, and before calculating the advantage function value, a policy value corresponding to each action needs to be calculated first, and then the advantage function value is calculated using the policy values. The strategy value corresponding to each action is calculated by adopting the following expression:

in the formula, $R_t(\theta_v')$ denotes the policy value; $V(s_t; \theta_v')$ denotes the state value function associated with the state $s_t$ and the model parameter $\theta_v'$ of the critic network in the local network; $\gamma$ denotes the attenuation factor; $r_{t+1}$ represents the reward corresponding to the $(t+1)$-th executed action; $V(s_{t+k_1}; \theta_v')$ denotes the state value function associated with the state $s_{t+k_1}$ and the model parameter $\theta_v'$ of the critic network in the local network; $\theta'$ represents the model parameter of the actor network in the local network; $\theta_v'$ represents the model parameter of the critic network in the local network; and $k_1$ indicates the number of sampling steps.

As can be seen from the above expression, calculating the policy value corresponding to each action requires using the policy value corresponding to the next action, and therefore the policy value corresponding to the last action needs to be calculated. The strategy value corresponding to the last action is calculated by adopting the following expression:

$$R_t(\theta_v') = \begin{cases} 0, & \text{if } s_t \text{ is the terminal state} \\ V(s_t; \theta_v'), & \text{otherwise} \end{cases}$$

The meaning of the above formula is: when the state corresponding to the last action is the terminal state, the policy value corresponding to the last action is set to 0; when the state corresponding to the last action is not the terminal state, the policy value corresponding to the last action is determined as the state value function $V(s_t; \theta_v')$ associated with the state $s_t$ and the model parameter $\theta_v'$ of the critic network in the local network, where $s_t$ may also be understood herein as the termination state.

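The advantage function expression in the A3C model may be:

$$A(A_t, s_t; \theta', \theta_v') = R_t(\theta_v') - V(s_t; \theta_v')$$

in the formula, $A(A_t, s_t; \theta', \theta_v')$ denotes the advantage function value, $R_t(\theta_v')$ denotes the policy value, and $V(s_t; \theta_v')$ denotes the state value function associated with the state $s_t$ and the model parameter $\theta_v'$ of the critic network in the local network.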
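The backward computation described above, in which each policy value depends on the policy value of the next action, can be sketched as below. The sketch uses the standard A3C bootstrapped return, $R \leftarrow r + \gamma R$, which is an assumption here because the exact expression in the source is not reproduced in this text; the function names and the reward indexing are likewise illustrative.

```python
def bootstrapped_returns(rewards, bootstrap_value, gamma):
    """Compute returns backwards: R_i = rewards[i] + gamma * R_{i+1}.

    `bootstrap_value` is the value used after the last action:
    0.0 if the last state is terminal, otherwise the critic's V(s_last).
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


def advantages(returns, values):
    """Advantage per step: return minus the critic's state value estimate."""
    return [r - v for r, v in zip(returns, values)]
```

With three rewards of 1.0, a terminal last state, and an attenuation factor of 0.5, the returns are 1.75, 1.5, and 1.0, which shows how later rewards are discounted into earlier steps.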

S310, selecting the maximum value from the advantage function values corresponding to the respective A3C states as the target advantage function value. After the advantage function values corresponding to the A3C states are obtained, the maximum value can be selected as the target advantage function value.

S311, calculating a loss function value by using a preset loss function calculation formula and the target advantage function value.

Since the global network includes the actor network and the critic network, the loss function values of the actor network and the critic network can be calculated separately. Specifically, the actor network loss function value is calculated by using a preset actor network loss function calculation formula and the target advantage function value, where the actor network loss function calculation formula is:

$$J_\pi(\theta') = \log \pi(A_t, s_t; \theta') \, A(A_t, s_t; \theta', \theta_v') + c \, H(\pi(s_t; \theta'))$$

in the formula, $J_\pi(\theta')$ represents the actor network loss function value, $\pi(A_t, s_t; \theta')$ represents the policy function, $A(A_t, s_t; \theta', \theta_v')$ represents the target advantage function value, $c$ represents a preset hyper-parameter, and $H(\pi(s_t; \theta'))$ represents a preset entropy function.

The critic network loss function value is calculated by using a preset critic network loss function calculation formula and the target advantage function value, where the critic network loss function calculation formula is:

$$J_v(\theta_v') = A(A_t, s_t; \theta', \theta_v')^2 = \left(R_t(\theta_v') - V(s_t; \theta_v')\right)^2$$

in the formula, $J_v(\theta_v')$ denotes the critic network loss function value, and $A(A_t, s_t; \theta', \theta_v')$ denotes the target advantage function value.
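The two loss expressions can be evaluated numerically as in the following sketch. It assumes a discrete action distribution and uses the Shannon entropy for the "preset entropy function", which the source does not specify; the function names are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy of a discrete policy distribution (assumed entropy function)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def actor_loss_value(probs, action_index, advantage, c):
    """J_pi = log pi(A_t, s_t) * A + c * H(pi(s_t))."""
    return math.log(probs[action_index]) * advantage + c * entropy(probs)


def critic_loss_value(return_value, state_value):
    """J_v = (R_t - V(s_t))^2, i.e. the squared advantage."""
    return (return_value - state_value) ** 2
```

The entropy term grows when the policy spreads probability over more actions, so a positive `c` rewards exploration; how strongly is a tuning choice.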

S312, a cumulative gradient is calculated by using a preset cumulative gradient calculation formula and the loss function value.

The cumulative gradient of the actor network in the local network can be calculated by using a preset actor network cumulative gradient calculation formula and the actor network loss function value, where the actor network cumulative gradient calculation formula is:

where the computed quantity represents the cumulative gradient of the actor network in the local network.

The cumulative gradient of the critic network in the local network can be calculated by using a preset critic network cumulative gradient calculation formula and the critic network loss function value, where the critic network cumulative gradient calculation formula is:

where the computed quantity represents the cumulative gradient of the critic network in the local network.

S313, updating the current model parameters by using the cumulative gradient.

The current model parameters of the actor network in the global network can be updated by using the accumulated gradient of the actor network in the local network, and the current model parameters of the critic network in the global network can be updated by using the accumulated gradient of the critic network in the local network.
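The accumulate-then-update pattern of steps S312 and S313 can be sketched with plain lists of parameters. The update rule shown (a plain gradient step with a learning rate) and all function names are assumptions; the source does not state which optimizer is used, and the sign of the step depends on whether the quantity is treated as a loss to minimize or an objective to maximize.

```python
def accumulate(total, grad):
    """Add one step's gradient to the running local accumulator."""
    return [t + g for t, g in zip(total, grad)]


def apply_to_global(global_params, accumulated, learning_rate):
    """Gradient-descent step on the global parameters (assumed update rule)."""
    return [p - learning_rate * g for p, g in zip(global_params, accumulated)]


def sync_local(global_params):
    """Copy the global parameters into a local network."""
    return list(global_params)
```

In use, a local network would call `accumulate` once per stored step, push the result with `apply_to_global`, and then refresh its own copy with `sync_local` before the next round. The actor and critic networks would each keep their own accumulator and learning rate.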

S314, judging whether the current iteration number is greater than or equal to a preset iteration number threshold.

Whether the current iteration number is greater than or equal to the preset iteration number threshold is judged in order to determine whether the A3C model has converged. The preset iteration number threshold may be the global iteration number threshold.

If yes, the A3C model has converged, and step S315 is executed to output the target action corresponding to the last A3C state in the global network.

If not, the A3C model has not converged; the iterative process is entered again and the process returns to step S303, in which the current Markov state is taken as the A3C current state. The current number of agents is increased by a preset number, which may be 1, to obtain the next number of agents. In addition, the current model parameters of each local network can be updated with the current model parameters of the global network.

As an alternative implementation manner of the present invention, the step S305 of the embodiment flow shown in fig. 3b, obtaining the current reward according to the current offload rule, the current transmission power allocation, the current block size, the current block interval time and the reward function in the current action, may include:

Step one, calculating the current total delay by using a first preset expression, where the first preset expression is:

$$T_{tot,n}(t) = \begin{cases} T^{(L)}(t), & a_n(t) = 0 \\ T^{(A)}(t), & a_n(t) = 1 \\ T^{(D)}(t), & a_n(t) = 2 \end{cases}$$

in the formula, $T_{tot,n}(t)$ represents the current total delay of the nth first mobile device; $a_n(t)$ denotes the current offload rule, $a_n(t) \in \{0, 1, 2\}$, where the value 0 corresponds to not offloading the blockchain consensus task, the value 1 corresponds to offloading the blockchain consensus task to the MEC server, and the value 2 corresponds to offloading the blockchain consensus task to a plurality of second mobile devices; $T^{(L)}(t)$ represents the current first mobile device processing delay; $T^{(A)}(t)$ represents the current MEC server processing delay; $T^{(D)}(t)$ represents the current second mobile device processing delay; and $t$ represents the current number of actions.

Step two, calculating the current reward by using the reward function, where the reward function is:

where $O(t)$ represents the current reward; $\omega_1$ represents a preset weight factor of the transaction throughput, $0 < \omega_1 < 1$; $\Psi(t)$ represents the current transaction throughput; $\omega_2$ represents a preset coefficient of the total delay; $T_{tot,n}(t)$ represents the current total delay of the nth first mobile device; and $N$ represents the number of first mobile devices. The current transaction throughput is calculated based on the current block size and the current block interval time in the current action. It should be noted that, because the transaction throughput and the total delay are not of the same order of magnitude, a preset coefficient $\omega_2$ with $0 < \omega_2 < 1$ may be set for the current total delay; the preset coefficient can be calculated from the quotient between the actually calculated total delay and transaction throughput, or can be obtained by experiment.
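Step one above can be sketched as a simple selection, assuming that the total delay equals the processing delay of whichever option the offload rule selects. The function and parameter names are hypothetical.

```python
def total_delay(offload_rule, local_delay, mec_delay, d2d_delay):
    """Return T_tot,n(t) for the chosen offload rule.

    offload_rule: 0 = process locally, 1 = MEC server, 2 = second mobile devices.
    local_delay, mec_delay, d2d_delay stand for T^(L)(t), T^(A)(t), T^(D)(t).
    """
    delays = {0: local_delay, 1: mec_delay, 2: d2d_delay}
    if offload_rule not in delays:
        raise ValueError(f"unknown offload rule: {offload_rule}")
    return delays[offload_rule]
```

The reward function itself is not sketched here, because its exact form is not reproduced in this text.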

As an optional implementation manner of the embodiment of the present invention, the processing delay of the current first mobile device may be calculated by using the following expression:

in the formula, $T^{(L,e)}(t)$ represents the current first mobile device processing delay for the first mobile device to process the blockchain consensus task; $D_1(t)$ represents the data amount of the blockchain consensus task; $X_t$ represents the computational workload of the blockchain consensus task; and the remaining term represents the computing capability of the second mobile device, in CPU cycles/s. It should be noted that, since both the first mobile device and the second mobile device can be user terminals, the computing capabilities of the first mobile device and the second mobile device are the same.

As an optional implementation manner of the embodiment of the present invention, the current MEC server processing delay may be calculated in the following manner: Step one, calculating, by using a second preset expression, the current transmission rate at which the first mobile device sends the blockchain consensus task to the MEC server, where the second preset expression is:

in the formula, the expression yields the current transmission rate at which the first mobile device sends the blockchain consensus task to the MEC server; $B$ represents the channel bandwidth; $P_n(t)$ represents the currently allocated transmission power when the current offload rule is to offload the blockchain consensus task to the MEC server; $g_n(t)$ represents the channel gain between the first mobile device and the MEC server; and $\sigma_n(t)$ represents the channel noise variance between the first mobile device and the MEC server.

Step two, calculating, by using a third preset expression, the current transmission delay for the first mobile device to send the blockchain consensus task to the MEC server:

in the formula, $T^{(A,u)}(t)$ represents the current transmission delay for the first mobile device to send the blockchain consensus task to the MEC server; $D_1(t)$ represents the data amount of the blockchain consensus task; and the remaining term represents the current transmission rate at which the first mobile device sends the blockchain consensus task to the MEC server.

Step three, calculating the sum of the current transmission delay for the first mobile device to send the blockchain consensus task to the MEC server, the current execution delay for the MEC server to process the blockchain consensus task, and the current queuing delay of the blockchain consensus task waiting for processing in the MEC server, and taking the sum as the current MEC server processing delay.

In the embodiment of the present invention, the current execution delay for the MEC server to process the blockchain consensus task may be calculated by using the following expression:

$$T^{(A,e)}(t) = \frac{D_1(t) \, X_t}{f_t^A}$$

in the formula, $T^{(A,e)}(t)$ represents the current execution delay for the MEC server to process the blockchain consensus task; $D_1(t)$ represents the data amount of the blockchain consensus task; $X_t$ represents the computational workload of the blockchain consensus task; and $f_t^A$ represents the computing capability of the MEC server, in CPU cycles/s.

The current queuing delay of the blockchain consensus task waiting for processing in the MEC server can be calculated by using the following expression:

$$T^{(A,q)}(t) = \frac{Q_t}{f_t^A}$$

in the formula, $T^{(A,q)}(t)$ represents the current queuing delay of the blockchain consensus task waiting for processing in the MEC server; $Q_t$ represents the total number of CPU cycles of the blockchain consensus tasks waiting to be processed in the MEC server; and $f_t^A$ represents the computing capability of the MEC server.

In the embodiment of the invention, the processing delay of the current MEC server is calculated, the current transmission delay generated in the process of sending the block chain consensus task to the MEC server by the first mobile equipment, the current execution delay generated in the process of processing the block chain consensus task by the MEC server are fully considered, and the current queuing delay generated when the block chain consensus task is queued in the MEC server after the block chain consensus task is sent to the MEC server by the first mobile equipment is further considered, so that the calculated processing delay of the current MEC server is more accurate.
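The three-part sum can be sketched as follows. The sketch makes several assumptions that the text does not confirm: the computational workload is expressed in CPU cycles per bit, the transmission delay is the data amount divided by the transmission rate, and the transmission rate is supplied by the caller rather than computed from the channel model. All names are illustrative.

```python
def mec_processing_delay(task_bits, cycles_per_bit, mec_cycles_per_s,
                         queued_cycles, tx_rate_bps):
    """Current MEC server processing delay = transmission + execution + queuing.

    task_bits        : D_1(t), data amount of the consensus task
    cycles_per_bit   : X_t, computational workload (assumed unit: cycles per bit)
    mec_cycles_per_s : f_t^A, MEC server computing capability
    queued_cycles    : Q_t, CPU cycles of tasks already waiting at the server
    tx_rate_bps      : transmission rate from the first mobile device to the server
    """
    if mec_cycles_per_s <= 0 or tx_rate_bps <= 0:
        raise ValueError("computing capability and transmission rate must be positive")
    transmission = task_bits / tx_rate_bps
    execution = task_bits * cycles_per_bit / mec_cycles_per_s
    queuing = queued_cycles / mec_cycles_per_s
    return transmission + execution + queuing
```

Keeping the three terms separate makes it easy to see which one dominates: a heavily loaded server raises only the queuing term, while a poor channel raises only the transmission term.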

As an optional implementation manner of the embodiment of the present invention, the current second mobile device processing delay may be calculated as follows: Step one, calculating, by using a fourth preset expression, the current transmission rate at which the first mobile device sends the subtasks to each second mobile device, where the second mobile devices receiving the subtasks are determined according to the current trust values of the second mobile devices, and the fourth preset expression is:

in the formula, the expression yields the current transmission rate at which the first mobile device sends the subtask to the kth second mobile device; $B$ represents the channel bandwidth; the expression also involves the current trust value of the kth second mobile device; $P_{n,k}(t)$ represents the currently allocated transmission power when the current offload rule is to offload the blockchain consensus task to the plurality of second mobile devices; $g_{n,k}(t)$ denotes the channel gain between the first mobile device and the kth second mobile device; and $\sigma_{n,k}(t)$ represents the noise variance between the first mobile device and the kth second mobile device.

Step two, calculating, by using a fifth preset expression, the current transmission delay for the first mobile device to send the subtasks to the plurality of second mobile devices:

in the formula, $T^{(D,u)}(t, k)$ represents the current transmission delay for the first mobile device to send the subtask to the kth second mobile device; $D_1(t)$ represents the data amount of the blockchain consensus task; and the remaining term represents the current transmission rate at which the first mobile device sends the subtask to the kth second mobile device.

Step three, calculating the sum of the current transmission delay for the first mobile device to send the subtasks to each second mobile device and the current execution delay for the second mobile device to process the subtasks, and taking the sum as the current second mobile device processing delay. The current execution delay for the second mobile device to process the subtask can be calculated by using the following expression:

in the formula, $T^{(D,e)}(t, k_t)$ represents the current execution delay for the second mobile device to process the subtask; $D_1(t)$ represents the data amount of the blockchain consensus task; $X_t$ represents the computational workload of the blockchain consensus task; and the remaining term represents the computing capability of the second mobile device, in CPU cycles/s.

In the embodiment of the invention, when the processing delay of the current second mobile equipment is calculated, the current transmission delay generated in the process of sending the subtask to the second mobile equipment by the first mobile equipment and the current execution delay generated in the process of processing the subtask by the second mobile equipment are fully considered, so that the calculated processing delay of the current second mobile equipment is more accurate.

As an optional implementation manner of the embodiment of the present invention, the current transaction throughput is calculated by using a sixth preset expression, where the sixth preset expression is:

$$\Psi(t) = \frac{\lfloor S_b(t) / \chi \rfloor}{T_b(t)}$$

where $\Psi(t)$ represents the current transaction throughput, $\chi$ represents the data size of the target transaction data, $S_b(t)$ denotes the current block size, $T_b(t)$ represents the current block interval time, and $\lfloor S_b(t) / \chi \rfloor$ denotes the largest integer less than or equal to $S_b(t) / \chi$.
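As a small worked example, the transaction throughput can be computed as the number of whole transactions that fit in one block divided by the block interval time. This form is inferred from the symbol definitions above, and the function and parameter names are illustrative.

```python
import math


def transaction_throughput(block_size, tx_size, block_interval):
    """Psi(t) = floor(S_b(t) / chi) / T_b(t).

    block_size     : S_b(t), current block size
    tx_size        : chi, data size of one target transaction (same unit as block_size)
    block_interval : T_b(t), current block interval time in seconds
    """
    if tx_size <= 0 or block_interval <= 0:
        raise ValueError("transaction size and block interval must be positive")
    return math.floor(block_size / tx_size) / block_interval
```

For a block of 1100 units, transactions of 200 units, and a 1 s interval, five whole transactions fit, so the throughput is 5 transactions per second; doubling the interval halves it.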

The performance of the blockchain consensus task offloading method based on delay and transaction throughput according to the embodiment of the present invention is described below by simulation experiments. The simulation parameters are set as follows: the transmission noise power density is -174 dBm/Hz, the bandwidth is 180 kHz, the data amount of the target transaction data is 200 kb, the weight factor of the transaction throughput is $\omega_1 = 0.5$, the preset coefficient is $\omega_2 = 0.2$, the learning rate of the actor network is $\eta_a = 0.001$, and the learning rate of the critic network is $\eta_c = 0.01$. To verify the effectiveness of the method, the following three comparison models are established: a random resource allocation model, in which the first mobile device randomly selects an action from the action space; a model that offloads the blockchain consensus task to the MEC server, in which the blockchain consensus task of each first mobile device is offloaded only to the MEC server; and a fixed blockchain task size model, in which only the offload decision, the transmission power allocation, and the block interval time are variables in the action of each first mobile device, and the block size is a fixed value.

Fig. 5 shows the relationship between the transmission power allocation and the average reward. It can be seen that the average reward increases as the allocated transmission power increases. In addition, the blockchain consensus task offloading method based on delay and transaction throughput provided by the embodiment of the invention has the largest average reward, that is, its performance is superior to that of the other three models. As can be seen from fig. 5, when the sum of the allocated transmission powers is greater than 0.7, the average reward of all four models increases slowly. The reason is that, as the allocated transmission power increases, the communication overhead, such as energy consumption, also increases and affects the total delay, which in turn slows the increase of the average reward. Fig. 6 shows the relationship between the block interval time and the average reward. As can be seen from fig. 6, the average reward of each of the four models decreases as the block interval time increases. The reason is that, with the data amount of the target transaction data and the block size unchanged, the transaction throughput decreases as the block interval time increases.

Fig. 7 shows the relationship between the block size and the average reward. It can be seen that, except for the fixed blockchain task size model, the average reward of the other three models increases as the block size increases. Fig. 8 shows the relationship between the CPU cycle frequency of the MEC server and the average reward. As can be seen from fig. 8, when the weight factor of the transaction throughput is constant, the average reward increases as the CPU cycle frequency increases. In addition, the average reward increases as the weight factor of the transaction throughput increases, indicating that the average reward is more strongly affected by the average transaction throughput.

Fig. 9 shows the relationship between CPU cycle frequency of MEC server and the total latency and average transaction throughput. A trade-off between overall latency and transaction throughput may be achieved on the basis of fig. 9. Simulation results show that compared with a model for unloading the blockchain consensus task to the MEC server, the average time delay of the blockchain consensus task unloading method based on the time delay and the transaction throughput can be reduced by 3.4%, and the average transaction throughput is improved by 52.4%. Compared with a fixed block chain task size model, the block chain consensus task unloading method based on the time delay and the transaction throughput of the embodiment of the invention has the advantages that the average time delay is reduced by 1.7%, and the average transaction throughput is improved by 12.1%.

Referring to fig. 10, an embodiment of the present invention provides a blockchain consensus task offloading device based on delay and transaction throughput, which corresponds to the process shown in fig. 1 and is applied to a first mobile device 201 in the blockchain system shown in fig. 2. The blockchain system further includes a plurality of MEC servers 202 and a plurality of second mobile devices 203, where the first mobile device 201 is a mobile device that generates block data.

The block chain consensus task unloading device based on time delay and transaction throughput provided by the embodiment of the invention comprises:

a blockchain consensus task obtaining module 1001, configured to obtain a blockchain consensus task, the blockchain consensus task being a task generated by writing target transaction data into a block and verifying the authenticity of the block data;

a current Markov state obtaining module 1002, configured to obtain, based on a preset Markov decision process (MDP), the current channel condition at the current moment, the currently available computing resources of each MEC server, and the current trust value of each second mobile device as a current Markov state; the MDP comprises a preset state space, a preset reward function, and a preset action space, wherein the state space comprises channel conditions, available computing resources of the MEC servers, and trust values of the respective second mobile devices; the action space comprises an offloading decision, a transmission power allocation, a block size, and a block interval time; and the reward function is preset on the principle of minimizing the processing delay of the offloaded blockchain consensus task and maximizing the transaction throughput of the first mobile device;

a target action obtaining module 1003, configured to input the current Markov state into a preset asynchronous advantage actor-critic (A3C) model, so that the A3C model calculates a reward based on the reward function and, based on the reward, obtains and outputs a target action corresponding to the current Markov state, the target action comprising a target offloading rule, a target transmission power allocation, a target block size, and a target block interval time;

a blockchain consensus task processing module 1004, configured to process the blockchain consensus task to obtain a first processing result when the target offloading rule is not to offload the blockchain consensus task;

a blockchain consensus task sending module 1005, configured to send the blockchain consensus task to the MEC server when the target offloading rule is to offload the blockchain consensus task to the MEC server, so that the MEC server receives the blockchain consensus task, processes it to obtain a second processing result, and returns the second processing result to the first mobile device;

a blockchain consensus task splitting module 1006, configured to split the blockchain consensus task into a preset number of subtasks and send the subtasks to the plurality of second mobile devices when the target offloading rule is to offload the blockchain consensus task to the plurality of second mobile devices, so that the plurality of second mobile devices receive the subtasks sent by the first mobile device, process the subtasks to obtain subtask processing results, and return the subtask processing results to the first mobile device;

a second processing result receiving module 1007, configured to receive the second processing result returned by the MEC server; and

a third processing result generating module 1008, configured to receive the plurality of subtask processing results returned by the plurality of second mobile devices and generate a third processing result.
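By way of non-limiting illustration, the branching performed by modules 1004 to 1008 on the target offloading rule may be sketched as follows. The rule values 0, 1, and 2 follow the definition of a_n(t) in this description; the helper names (`split_task`, `dispatch`), the byte-string task payload, the callables standing in for the MEC server and the second mobile devices, and the concatenation used to form the third processing result are all assumptions made for the sketch and are not specified by the embodiment.

```python
from enum import IntEnum
from typing import Callable, List


class OffloadRule(IntEnum):
    """Values follow a_n(t) in the description: 0, 1, or 2."""
    NO_OFFLOAD = 0   # process on the first mobile device
    TO_MEC = 1       # offload to the MEC server
    TO_DEVICES = 2   # split and offload to second mobile devices


def split_task(task: bytes, parts: int) -> List[bytes]:
    """Split a payload into `parts` contiguous subtasks (hypothetical helper)."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size = -(-len(task) // parts)  # ceiling division
    return [task[i * size:(i + 1) * size] for i in range(parts)]


def dispatch(task: bytes,
             rule: OffloadRule,
             process_local: Callable[[bytes], bytes],
             send_to_mec: Callable[[bytes], bytes],
             send_to_device: Callable[[int, bytes], bytes],
             num_subtasks: int) -> bytes:
    """Route the consensus task according to the target offloading rule."""
    if rule == OffloadRule.NO_OFFLOAD:
        return process_local(task)            # first processing result
    if rule == OffloadRule.TO_MEC:
        return send_to_mec(task)              # second processing result
    if rule == OffloadRule.TO_DEVICES:
        subtasks = split_task(task, num_subtasks)
        partial = [send_to_device(k, sub) for k, sub in enumerate(subtasks)]
        # ASSUMPTION: the third processing result is the concatenation
        # of the subtask results, in subtask order.
        return b"".join(partial)
    raise ValueError(f"unknown offloading rule: {rule}")
```

In this sketch the three callables abstract away the wireless transmission, so the same routing logic can be exercised without a network.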

As an optional implementation of the embodiment of the present invention, the MDP further includes transition relationships between actions and states. The target action obtaining module 1003 includes:

a current Markov state input sub-module, configured to input the current Markov state into the A3C model;

a model parameter setting sub-module, configured to set the initial model parameters in the A3C model as the current model parameters;

an A3C current state setting sub-module, configured to take the current Markov state as the A3C current state;

a current action execution sub-module, configured to execute a current action based on the A3C current state, the current model parameters, and a preset policy function, the current action comprising a current offloading rule, a current transmission power allocation, a current block size, and a current block interval time;

a current reward obtaining sub-module, configured to obtain a current reward according to the current offloading rule, the current transmission power allocation, the current block size, and the current block interval time in the current action, and the reward function;

an A3C state obtaining sub-module, configured to obtain the next A3C state based on the A3C current state, the current action, and the transition relationships between actions and states;

a reward judging sub-module, configured to judge, according to whether the current reward meets a preset reward condition, whether the next A3C state is an A3C termination state, and to judge whether the number of times the current action has been executed meets a preset action execution count threshold;

a triggering sub-module, configured to, if the next A3C state is not the A3C termination state and the number of times the current action has been executed does not meet the preset action execution count threshold, take the next A3C state as the A3C current state and trigger the current action execution sub-module to execute the current action based on the A3C current state, the current model parameters, and the preset policy function;

an advantage function calculation sub-module, configured to, if the next A3C state is the A3C termination state or the number of times the current action has been executed meets the preset action execution count threshold, calculate the advantage function value corresponding to each A3C state based on the reward corresponding to each A3C state and a preset advantage function calculation formula;

a target advantage function selection sub-module, configured to select the maximum of the advantage function values corresponding to the A3C states as a target advantage function value;

a loss function value calculation sub-module, configured to calculate a loss function value by using a preset loss function calculation formula and the target advantage function value;

an accumulated gradient calculation sub-module, configured to calculate an accumulated gradient by using a preset accumulated gradient calculation formula and the loss function value;

a model parameter updating sub-module, configured to update the current model parameters by using the accumulated gradient; and

an iteration count judging sub-module, configured to judge whether the current iteration count is greater than or equal to a preset iteration count threshold; if so, the A3C model is deemed to have converged and the target action corresponding to the last A3C state is output; if not, the A3C current state setting sub-module is triggered to take the current Markov state as the A3C current state.
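By way of non-limiting illustration, the rollout and advantage steps carried out by the above sub-modules may be sketched as follows. The preset advantage function calculation formula is not reproduced in this part of the description, so the sketch assumes the n-step form commonly used with A3C, namely the discounted return minus a state-value estimate; the discount factor `gamma`, the value estimates `values`, the bootstrap value, and all function names are assumptions introduced for the sketch.

```python
from typing import Callable, List, Tuple


def rollout(state,
            policy: Callable,
            step: Callable,
            is_terminal: Callable,
            max_steps: int) -> Tuple[List[tuple], object]:
    """Execute actions until a termination state is reached or the
    action execution count threshold (max_steps) is met."""
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)                    # current action
        reward, next_state = step(state, action)  # current reward, next state
        trajectory.append((state, action, reward))
        state = next_state
        if is_terminal(state, reward):
            break
    return trajectory, state


def n_step_advantages(rewards: List[float],
                      values: List[float],
                      bootstrap: float,
                      gamma: float) -> List[float]:
    """ASSUMED formula: A_i = R_i - V(s_i), where
    R_i = r_i + gamma * R_{i+1} and the return after the last step
    equals `bootstrap`."""
    advantages = []
    ret = bootstrap
    for r, v in zip(reversed(rewards), reversed(values)):
        ret = r + gamma * ret
        advantages.append(ret - v)
    advantages.reverse()
    return advantages


def target_advantage(advantages: List[float]) -> float:
    """Select the maximum advantage value, as the target advantage
    function selection sub-module does."""
    return max(advantages)
```

The loss calculation, the accumulated gradient, and the parameter update depend on the preset formulas and on the network structure, and are therefore left out of the sketch.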

As an optional implementation of the embodiment of the present invention, the current reward obtaining sub-module includes a unit configured to calculate the current total delay by using a first preset expression, the first preset expression being:

In the formula, $T_{tot,n}(t)$ represents the current total delay of the nth first mobile device; $a_n(t)$ represents the current offloading rule, $a_n(t) \in \{0,1,2\}$, where $a_n(t)=0$ means that the blockchain consensus task is not offloaded, $a_n(t)=1$ means that the blockchain consensus task is offloaded to the MEC server, and $a_n(t)=2$ means that the blockchain consensus task is offloaded to a plurality of second mobile devices; $T^{(L)}(t)$ represents the current first mobile device processing delay; $T^{(A)}(t)$ represents the current MEC server processing delay; $T^{(D)}(t)$ represents the current second mobile device processing delay; and $t$ represents the current action count.

A current reward calculation unit for calculating a current reward using a reward function, the reward function being:

where $O(t)$ represents the current reward; $\omega_1$ represents a preset weighting coefficient for the transaction throughput, $0<\omega_1<1$; $\Psi(t)$ represents the current transaction throughput; $\omega_2$ represents a preset coefficient for the total delay; $T_{tot,n}(t)$ represents the current total delay of the nth first mobile device; and $N$ represents the number of first mobile devices. The current transaction throughput is calculated based on the current block size and the current block interval time in the current action.
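By way of non-limiting illustration, the two calculations above may be sketched as follows. The selection of the total delay by the offloading rule follows directly from the symbol definitions. The exact form of the reward function is not reproduced here, so the sketch assumes a weighted throughput term minus a weighted average delay over the N first mobile devices, which is one form consistent with the stated principle of maximizing throughput and minimizing delay; the function names are likewise assumptions.

```python
from typing import Sequence


def total_delay(rule: int, t_local: float, t_mec: float, t_d2d: float) -> float:
    """Pick T_tot,n(t) according to a_n(t):
    0 -> T^(L)(t), 1 -> T^(A)(t), 2 -> T^(D)(t)."""
    delays = {0: t_local, 1: t_mec, 2: t_d2d}
    if rule not in delays:
        raise ValueError("offloading rule must be 0, 1, or 2")
    return delays[rule]


def reward(throughput: float,
           delays: Sequence[float],
           w1: float,
           w2: float) -> float:
    """ASSUMED form: O(t) = w1 * Psi(t) - w2 * (1/N) * sum_n T_tot,n(t)."""
    if not 0.0 < w1 < 1.0:
        raise ValueError("w1 must satisfy 0 < w1 < 1")
    if not delays:
        raise ValueError("at least one first mobile device is required")
    return w1 * throughput - w2 * sum(delays) / len(delays)
```

Under this assumed form, a larger $\omega_2$ makes the model favor offloading rules with a short total delay, while a larger $\omega_1$ favors block sizes and block interval times that raise throughput.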

As an optional implementation of the embodiment of the present invention, the blockchain consensus task offloading device based on delay and transaction throughput provided by the embodiment of the present invention further includes a current action judging module, configured to judge whether the current action meets preset conditions, the preset conditions comprising:

(C1): $a_n(t) \in \{0,1,2\}$

(C2): $0 \le P_{tot,n}(t) \le P_T$

(C3): $T_{tot,n}(t) \le \omega \times T_b(t)$

In the formulas, $a_n(t)$ represents the current offloading rule; $P_{tot,n}(t)$ represents the currently allocated total transmission power; $P_T$ represents the preset total transmission power of the system; $T_{tot,n}(t)$ represents the current total delay; $\omega$ represents a preset weighting coefficient for the block interval time, $\omega>1$; $T_b(t)$ represents the current block interval time; $\rho(k)$ represents the preset proportion of the blockchain consensus task made up by the subtask to be sent to the kth second mobile device; $D_1(t)$ represents the data amount of the blockchain consensus task; and $L_k(t)$ represents the current link capacity of the connection between the first mobile device and the kth second mobile device.

The current reward obtaining sub-module is specifically configured to: if the current action simultaneously meets C1, C2, C3, and C4 of the preset conditions, execute the step of calculating the total delay by using the first preset expression; and if the current action fails to meet any one of the preset conditions, set the current reward to a preset value.
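By way of non-limiting illustration, the feasibility check and the fallback to a preset reward value may be sketched as follows. Conditions C1 to C3 are taken from the expressions above. The expression for C4 is not reproduced in this part of the description; the sketch assumes, from the definitions of $\rho(k)$, $D_1(t)$, and $L_k(t)$, that C4 requires each subtask's data amount $\rho(k) D_1(t)$ not to exceed the link capacity $L_k(t)$. That form, and the function and parameter names, are assumptions.

```python
from typing import Callable, Sequence


def action_is_feasible(rule: int,
                       p_total: float,
                       p_max: float,
                       t_total: float,
                       omega: float,
                       t_block: float,
                       shares: Sequence[float],
                       data_amount: float,
                       link_capacity: Sequence[float]) -> bool:
    """Return True only if the action meets C1, C2, C3, and C4 together."""
    if omega <= 1.0:
        raise ValueError("omega must be greater than 1")
    c1 = rule in (0, 1, 2)                 # C1: valid offloading rule
    c2 = 0.0 <= p_total <= p_max           # C2: power budget
    c3 = t_total <= omega * t_block        # C3: delay bounded by block interval
    # ASSUMED form of C4: rho(k) * D_1(t) <= L_k(t) for every k.
    c4 = all(rho * data_amount <= cap
             for rho, cap in zip(shares, link_capacity))
    return c1 and c2 and c3 and c4


def guarded_reward(feasible: bool,
                   compute_reward: Callable[[], float],
                   preset_value: float) -> float:
    """Compute the reward only for feasible actions; otherwise return
    the preset value."""
    return compute_reward() if feasible else preset_value
```

Choosing a preset value well below the rewards of feasible actions would steer the policy away from infeasible actions during training; the embodiment does not state the value used.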

As an optional implementation of the embodiment of the present invention, the blockchain consensus task offloading device based on delay and transaction throughput provided by the embodiment of the present invention further includes a first current transmission rate calculation module, configured to calculate, by using a second preset expression, the current transmission rate at which the first mobile device sends the blockchain consensus task to the MEC server, the second preset expression being:

In the formula, the computed quantity is the current transmission rate at which the first mobile device sends the blockchain consensus task to the MEC server; $B$ represents the channel bandwidth; $P_n(t)$ represents the transmission power currently allocated when the current offloading rule is to offload the blockchain consensus task to the MEC server; $g_n(t)$ represents the channel gain between the first mobile device and the MEC server; and $\sigma_n(t)$ represents the channel noise variance between the first mobile device and the MEC server.

A first current transmission delay calculation module, configured to calculate, by using a third preset expression, the current transmission delay for the first mobile device to send the blockchain consensus task to the MEC server, the third preset expression being:

In the formula, $T^{(A,u)}(t)$ represents the current transmission delay for the first mobile device to send the blockchain consensus task to the MEC server, and $D_1(t)$ represents the data amount of the blockchain consensus task.

A current MEC server processing delay calculation module, configured to calculate the sum of the current transmission delay for the first mobile device to send the blockchain consensus task to the MEC server, the current execution delay for the MEC server to process the blockchain consensus task, and the current queuing delay of the blockchain consensus task waiting to be processed in the MEC server, and to take the sum as the current MEC server processing delay.
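By way of non-limiting illustration, the MEC offloading delay may be sketched as follows. The summation of transmission, execution, and queuing delays is as described above. The second and third preset expressions are not reproduced in this part of the description; the sketch assumes a Shannon-capacity rate with the signal-to-noise ratio taken as $P_n(t) g_n(t)$ divided by the noise term, and a transmission delay equal to the data amount divided by the rate. Both forms, the units, and the function names are assumptions.

```python
import math


def transmission_rate(bandwidth_hz: float,
                      tx_power: float,
                      channel_gain: float,
                      noise: float) -> float:
    """ASSUMED form: R = B * log2(1 + P * g / noise), in bits per second."""
    if noise <= 0.0:
        raise ValueError("noise must be positive")
    return bandwidth_hz * math.log2(1.0 + tx_power * channel_gain / noise)


def transmission_delay(data_bits: float, rate_bps: float) -> float:
    """ASSUMED form: T^(A,u)(t) = D_1(t) / R."""
    if rate_bps <= 0.0:
        raise ValueError("rate must be positive")
    return data_bits / rate_bps


def mec_processing_delay(tx_delay: float,
                         exec_delay: float,
                         queue_delay: float) -> float:
    """T^(A)(t): transmission + execution + queuing delay."""
    return tx_delay + exec_delay + queue_delay
```

For example, with a 1 MHz channel and a signal-to-noise ratio of 3, the assumed rate is 2 Mbit/s, so a 4 Mbit task takes 2 s to transmit before the execution and queuing delays are added.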

As an optional implementation of the embodiment of the present invention, the blockchain consensus task offloading device based on delay and transaction throughput provided by the embodiment of the present invention further includes a second current transmission rate calculation module, configured to calculate, by using a fourth preset expression, the current transmission rate at which the first mobile device sends the subtask to each second mobile device, wherein the second mobile devices that receive the subtasks are determined according to the current trust value of each second mobile device, the fourth preset expression being:

In the formula, the computed quantity is the current transmission rate at which the first mobile device sends the subtask to the kth second mobile device; $B$ represents the channel bandwidth; the formula further includes the current trust value of the kth second mobile device; $P_{n,k}(t)$ represents the transmission power currently allocated when the current offloading rule is to offload the blockchain consensus task to the plurality of second mobile devices; $g_{n,k}(t)$ represents the channel gain between the first mobile device and the kth second mobile device; and $\sigma_{n,k}(t)$ represents the noise variance between the first mobile device and the kth second mobile device.

A second current transmission delay calculation module, configured to calculate, by using a fifth preset expression, current transmission delays for the first mobile device to send the subtasks to the plurality of second mobile devices:

A current second mobile device processing delay calculation module, configured to calculate the sum of the current transmission delay for the first mobile device to send the subtask to each second mobile device and the current execution delay for the second mobile device to process the subtask, and to take the sum as the current second mobile device processing delay.
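By way of non-limiting illustration, the selection of receiving devices and the resulting delay may be sketched as follows. The description states that the receiving second mobile devices are determined according to their current trust values and that the delay for a device is its transmission delay plus its execution delay. The sketch assumes that the devices with the highest trust values are chosen and that, because the subtasks run in parallel, the overall delay is that of the slowest device; both choices, and the function names, are assumptions.

```python
from typing import Dict, List, Sequence


def select_receivers(trust: Dict[str, float], count: int) -> List[str]:
    """ASSUMPTION: pick the `count` devices with the highest trust values."""
    if count < 1:
        raise ValueError("count must be at least 1")
    ranked = sorted(trust, key=trust.get, reverse=True)
    return ranked[:count]


def second_device_delay(tx_delays: Sequence[float],
                        exec_delays: Sequence[float]) -> float:
    """Per device: transmission delay + execution delay.
    ASSUMPTION: subtasks run in parallel, so T^(D)(t) is the maximum
    per-device sum."""
    if len(tx_delays) != len(exec_delays) or not tx_delays:
        raise ValueError("need one transmission and one execution delay per device")
    return max(tx + ex for tx, ex in zip(tx_delays, exec_delays))
```

How the trust value enters the fourth preset expression itself is not modeled here, since that expression is not reproduced in this part of the description.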

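The current transaction throughput is calculated by using a sixth preset expression, where $\Psi(t)$ represents the current transaction throughput, $\chi$ represents the data amount of the target transaction data, $S_b(t)$ represents the current block size, and $T_b(t)$ represents the current block interval time.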
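By way of non-limiting illustration, the throughput calculation may be sketched as follows. The sixth preset expression is not reproduced in this part of the description; the sketch assumes that the throughput is the number of whole transactions fitting in one block, $\lfloor S_b(t)/\chi \rfloor$, divided by the block interval time $T_b(t)$. That form, the units, and the function name are assumptions.

```python
def transaction_throughput(block_size: int,
                           tx_size: int,
                           block_interval: float) -> float:
    """ASSUMED form: Psi(t) = floor(S_b(t) / chi) / T_b(t),
    in transactions per second when T_b(t) is in seconds."""
    if tx_size <= 0 or block_interval <= 0:
        raise ValueError("tx_size and block_interval must be positive")
    return (block_size // tx_size) / block_interval
```

Under this assumed form, a 1,000,000-byte block of 250-byte transactions issued every 2 s yields 2000 transactions per second, which shows why the block size and the block interval time are part of the action space.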

An embodiment of the present invention further provides a mobile device, as shown in Fig. 11, including a processor 1101, a communication interface 1102, a memory 1103, and a communication bus 1104, where the processor 1101, the communication interface 1102, and the memory 1103 communicate with one another through the communication bus 1104; the memory 1103 is configured to store a computer program; and the processor 1101 is configured to implement steps S101 to S108 shown in Fig. 1 when executing the program stored in the memory 1103. The communication bus of the above mobile device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single thick line in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used for communication between the mobile device and other devices. The memory may include a random access memory (RAM) or a non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.

The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above blockchain consensus task offloading methods based on delay and transaction throughput. In yet another embodiment of the present invention, a computer program product containing instructions is also provided which, when run on a computer, causes the computer to perform the blockchain consensus task offloading method based on delay and transaction throughput of any of the above embodiments.

In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element. The embodiments in this specification are described in an interrelated manner; the same or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief, and for relevant points reference may be made to the corresponding description of the method embodiment. The above description covers only the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A blockchain consensus task offloading method based on delay and transaction throughput, applied to a first mobile device in a blockchain system, the blockchain system further comprising a plurality of mobile edge computing (MEC) servers and a plurality of second mobile devices, wherein the first mobile device is a mobile device that generates block data; the method comprises the following steps:

based on a preset Markov decision process (MDP), obtaining the current channel condition at the current moment, the currently available computing resources of each MEC server, and the current trust value of each second mobile device as a current Markov state; the MDP comprises a preset state space, a preset reward function, and a preset action space, wherein the state space comprises channel conditions, available computing resources of the MEC servers, and trust values of the respective second mobile devices; the action space comprises an offloading decision, a transmission power allocation, a block size, and a block interval time; and the reward function is preset on the principle of minimizing the processing delay of the offloaded blockchain consensus task and maximizing the transaction throughput of the first mobile device;

when the target offloading rule is to offload the blockchain consensus task to an MEC server, sending the blockchain consensus task to the MEC server, so that the MEC server receives the blockchain consensus task, processes it to obtain a second processing result, and returns the second processing result to the first mobile device; and receiving, by the first mobile device, the second processing result returned by the MEC server;

2. The method of claim 1, wherein the MDP further comprises: transition relationships between actions and states;

the step of inputting the current Markov state into a preset asynchronous advantage actor-critic (A3C) model, so that the A3C model calculates a reward based on the reward function and, based on the reward, obtains and outputs a target action corresponding to the current Markov state, includes:

inputting the current Markov state into the A3C model;

setting initial model parameters in the A3C model as current model parameters;

taking the current Markov state as an A3C current state;

executing a current action based on the A3C current state, the current model parameters, and a preset policy function, wherein the current action comprises a current offloading rule, a current transmission power allocation, a current block size, and a current block interval time;

obtaining a current reward according to the current offloading rule, the current transmission power allocation, the current block size, and the current block interval time in the current action, and the reward function;

obtaining a next A3C state based on the A3C current state, the current action, and the transition relationships between actions and states;

judging, according to whether the current reward meets a preset reward condition, whether the next A3C state is an A3C termination state, and judging whether the number of times the current action has been executed meets a preset action execution count threshold;

if the next A3C state is not the A3C termination state and the number of times the current action has been executed does not meet the preset action execution count threshold, taking the next A3C state as the A3C current state, and returning to the step of executing the current action based on the A3C current state, the current model parameters, and the preset policy function;

if the next A3C state is the A3C termination state, or the number of times the current action has been executed meets the preset action execution count threshold, calculating an advantage function value corresponding to each A3C state based on the reward corresponding to each A3C state and a preset advantage function calculation formula;

selecting a maximum value from the advantage function values corresponding to the A3C states as a target advantage function value;

calculating a loss function value by using a preset loss function calculation formula and the target advantage function value;

calculating an accumulated gradient by using a preset accumulated gradient calculation formula and the loss function value;

updating the current model parameters with the accumulated gradient;

and judging whether the current iteration count is greater than or equal to a preset iteration count threshold; if so, determining that the A3C model has converged and outputting the target action corresponding to the last A3C state; otherwise, returning to the step of taking the current Markov state as the A3C current state.

3. The method of claim 2,

the step of obtaining a current reward according to the current offload rule, the current transmission power allocation, the current block size, the current block interval time, and the reward function in the current action includes:

calculating the current total time delay by using a first preset expression, wherein the first preset expression is as follows:

in the formula, $T_{tot,n}(t)$ represents the current total delay of the nth first mobile device; $a_n(t)$ represents the current offloading rule, $a_n(t) \in \{0,1,2\}$, where $a_n(t)=0$ means that the blockchain consensus task is not offloaded, $a_n(t)=1$ means that the blockchain consensus task is offloaded to the MEC server, and $a_n(t)=2$ means that the blockchain consensus task is offloaded to a plurality of the second mobile devices; $T^{(L)}(t)$ represents the current first mobile device processing delay; $T^{(A)}(t)$ represents the current MEC server processing delay; $T^{(D)}(t)$ represents the current second mobile device processing delay; and $t$ represents the current action count;

calculating the current reward using the reward function, the reward function being:

where $O(t)$ represents the current reward; $\omega_1$ represents a preset weighting coefficient for the transaction throughput, $0<\omega_1<1$; $\Psi(t)$ represents the current transaction throughput; $\omega_2$ represents a preset coefficient for the total delay; $T_{tot,n}(t)$ represents the current total delay of the nth first mobile device; and $N$ represents the number of said first mobile devices; wherein the current transaction throughput is calculated based on the current block size and the current block interval time in the current action.

4. The method of claim 3, wherein prior to the step of obtaining a current reward according to the current offloading rule, the current transmission power allocation, the current block size, and the current block interval time in the current action, and the reward function, the method further comprises:

judging whether the current action meets preset conditions, the preset conditions comprising:

(C1): $a_n(t) \in \{0,1,2\}$

(C2): $0 \le P_{tot,n}(t) \le P_T$

(C3): $T_{tot,n}(t) \le \omega \times T_b(t)$

(C4):

in the formulas, $a_n(t)$ represents the current offloading rule; $P_{tot,n}(t)$ represents the currently allocated total transmission power; $P_T$ represents the preset total transmission power of the system; $T_{tot,n}(t)$ represents the current total delay; $\omega$ represents a preset weighting coefficient for the block interval time, $\omega>1$; $T_b(t)$ represents the current block interval time; $\rho(k)$ represents the preset proportion of the blockchain consensus task made up by the subtask to be sent to the kth second mobile device; $D_1(t)$ represents the data amount of the blockchain consensus task; and $L_k(t)$ represents the current link capacity of the connection between the first mobile device and the kth second mobile device;

if the current action simultaneously meets C1, C2, C3, and C4 of the preset conditions, executing the step of calculating the total delay by using the first preset expression;

and if the current action fails to meet any one of the preset conditions, setting the current reward to a preset value.

5. The method of claim 3, wherein before the step of calculating the current total delay using the first preset expression, the method further comprises:

calculating, by using a second preset expression, the current transmission rate at which the first mobile device sends the blockchain consensus task to the MEC server, the second preset expression being:

in the formula, the computed quantity is the current transmission rate at which the first mobile device sends the blockchain consensus task to the MEC server; $B$ represents the channel bandwidth; $P_n(t)$ represents the transmission power currently allocated when the current offloading rule is to offload the blockchain consensus task to the MEC server; $g_n(t)$ represents the channel gain between the first mobile device and the MEC server; and $\sigma_n(t)$ represents the channel noise variance between the first mobile device and the MEC server;

calculating the current transmission delay of the first mobile device for sending the block chain consensus task to the MEC server by using a third preset expression:

in the formula, $T^{(A,u)}(t)$ represents the current transmission delay for the first mobile device to send the blockchain consensus task to the MEC server, $D_1(t)$ represents the data amount of the blockchain consensus task, and the remaining term represents the current transmission rate at which the first mobile device sends the blockchain consensus task to the MEC server;

calculating the sum of the current transmission delay for the first mobile device to send the blockchain consensus task to the MEC server, the current execution delay for the MEC server to process the blockchain consensus task, and the current queuing delay of the blockchain consensus task waiting to be processed in the MEC server, and taking the sum as the current MEC server processing delay.

6. The method of claim 3, wherein before the step of calculating the current total delay using the first preset expression, the method further comprises:

calculating, by using a fourth preset expression, the current transmission rate at which the first mobile device sends the subtasks to each second mobile device, wherein the second mobile devices that receive the subtasks are determined according to the current trust values of the respective second mobile devices, the fourth preset expression being:

in the formula, the computed quantity is the current transmission rate at which the first mobile device sends the subtask to the kth second mobile device; $B$ represents the channel bandwidth; the formula further includes the current trust value of the kth second mobile device; $P_{n,k}(t)$ represents the transmission power currently allocated when the current offloading rule is to offload the blockchain consensus task to a plurality of the second mobile devices; $g_{n,k}(t)$ represents the channel gain between the first mobile device and the kth second mobile device; and $\sigma_{n,k}(t)$ represents the noise variance between the first mobile device and the kth second mobile device;

calculating the current transmission time delay of the first mobile device for sending the subtasks to the plurality of second mobile devices by using a fifth preset expression:

in the formula, $T^{(D,u)}(t,k)$ represents the current transmission delay for the first mobile device to send the subtask to the kth second mobile device, $D_1(t)$ represents the data amount of the blockchain consensus task, and the remaining term represents the current transmission rate at which the first mobile device sends the subtask to the kth second mobile device;

and calculating the sum of the current transmission delay for the first mobile device to send the blockchain consensus task to each second mobile device and the current execution delay for the second mobile device to process the blockchain consensus task, and taking the sum as the current second mobile device processing delay.

7. The method of claim 3, wherein the current transaction throughput is calculated using a sixth preset expression, the sixth preset expression being:

where $\Psi(t)$ represents the current transaction throughput, $\chi$ represents the data amount of the target transaction data, $S_b(t)$ represents the current block size, and $T_b(t)$ represents the current block interval time.

8. A blockchain consensus task offloading device based on delay and transaction throughput, applied to a first mobile device in a blockchain system, the blockchain system further comprising a plurality of mobile edge computing (MEC) servers and a plurality of second mobile devices, wherein the first mobile device is a mobile device that generates block data; the device comprises:

9. A mobile device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of any of claims 1 to 7 when executing a program stored in the memory.

10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 7.