CN112202928B

CN112202928B - Credible unloading cooperative node selection system and method for sensing edge cloud block chain network

Info

Publication number: CN112202928B
Application number: CN202011276468.5A
Authority: CN
Inventors: 刘建华; 沈士根; 方朝曦; 黄龙军; 李琪; 冯晟; 方曙琴
Original assignee: University of Shaoxing
Current assignee: University of Shaoxing
Priority date: 2020-11-16
Filing date: 2020-11-16
Publication date: 2022-05-17
Anticipated expiration: 2040-11-16
Also published as: CN112202928A

Abstract

The invention discloses a system and a method for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network. The system includes sensing cloud edge nodes and blocks created in the edge nodes; the edge nodes and the blockchain form an edge DAG blockchain network. The method comprises the following steps: (1) acquiring a training task issued by a training task issuing node; (2) taking at least ε+1 edge nodes as candidate computation task offload transaction nodes, obtaining the cost function C_{v,h} of each candidate node, and registering the nodes in the DAG blockchain; (3) using reinforcement learning to compute an optimization strategy for the task offload path in the DAG blockchain according to the transaction state of each edge node, and formulating the offload path action set according to the optimization strategy. For multi-hop cooperative offloading of computation tasks, the invention establishes a multi-hop cooperative offloading model based on an edge DAG blockchain; the nodes participating in cooperative offloading are registered in the edge DAG blockchain network and cooperatively complete multi-hop distributed federated learning tasks.

Description

Credible unloading cooperative node selection system and method for sensing edge cloud block chain network

Technical Field

The invention belongs to the technical field of the Internet of Things, and particularly relates to a system and a method for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network.

Background

To reduce the dependence of computation task offloading in the sensing edge cloud on the remote cloud, computation tasks are offloaded among edge nodes in a multi-hop cooperative manner to complete model training. The multi-hop cooperative offloading process comprises two stages, computation task transmission and distributed model training. It offers good scalability and strong robustness, supports distributed federated learning of computation tasks, and protects the private data of nodes. However, as the number of hops and nodes increases, optimizing the quality of service of trusted computation task offloading faces a number of challenges.

Model training based on multi-hop computation task offloading can effectively avoid single points of failure, makes full use of the local data of edge nodes to train the model in a distributed manner, and can effectively improve the performance of federated learning. However, distributed federated learning by means of multi-hop computation task offloading faces security issues. Because edge nodes are selfish, the training of a computation task may fail to reach the expected training accuracy, or malicious nodes may modify the trained model to deceive cooperating nodes and mislead the next-hop node into continuing inefficient training. This makes the behavior of edge nodes participating in the cooperation untrustworthy, prevents low-delay trusted cooperation, and reduces the quality of service of computation task offloading and distributed federated learning. A key challenge is therefore how to balance offloading delay against trusted cooperation in the multi-hop offloading decisions of edge nodes, so as to improve the quality of service of the multi-hop computation task offloading path. In response to this challenge, researchers have proposed several methods for cooperative offloading of computation tasks. Yan et al. consider the task graph of a single-user edge computing system and propose a reinforcement learning framework to optimize the offloading decision of tasks at local or edge nodes together with resource allocation, but this scheme does not consider the multi-hop computation task offloading scenario ("Offloading and Resource Allocation With General Task Graph in Mobile Edge Computing: A Deep Reinforcement Learning Approach," in IEEE Transactions on Wireless Communications, vol. 19, no. 8, pp. 5404-5419, Aug. 2020).
Hong et al. model the optimization of the computation task offload path, which includes edge nodes and cloud nodes, as a multi-hop computation offloading game and propose a QoS-aware distributed algorithm, but do not consider the trust problem of cooperative offloading between nodes ("Multi-Hop Cooperative Computation Offloading for Industrial IoT-Edge-Cloud Computing Environments," in IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 12, pp. 2759-2774, 1 Dec. 2019). L. Xiao et al. propose a blockchain-based trust mechanism to resist selfish edge attacks and spoofed record attacks, and enhance the security of computation task offloading between mobile devices and edge nodes by establishing reputation, but do not secure multi-hop computation task offloading between edge nodes ("A Reinforcement Learning and Blockchain-Based Trust Mechanism for Edge Networks," in IEEE Transactions on Communications, vol. 68, no. 9, pp. 5460-5470, Sept. 2020). These research schemes also suffer from the following deficiencies:

(1) The proposed solutions give little consideration to multi-hop computation task offloading and cooperative training among multi-hop edge nodes. They consider only the performance of single-hop computation task offloading from the sensing device to the edge nodes, and cannot support multi-hop distributed federated learning. Their applicability to multi-hop distributed cooperative offloading of computation tasks is therefore limited.

(2) The proposed solutions do not use blockchain technology to achieve trusted cooperative offloading of multi-hop computation tasks. In particular, as the number of nodes and hops in multi-hop computation task offloading increases, the decision space of trusted cooperation and offloading delay among the nodes grows, and the existing solutions provide no corresponding processing method.

(3) The existing solutions do not consider an intelligent attacker who attacks the multi-hop computation task offloading nodes by means such as increasing computation time and modifying the model, and provide no method for selecting trusted cooperative nodes in multi-hop computation task offloading against this type of attack.

Disclosure of Invention

To overcome the above deficiencies, the invention provides a DAG-blockchain-based multi-hop computation task offloading method for the sensing edge cloud environment, which achieves low-delay trusted cooperative offloading while taking into account untrustworthy behaviors of malicious edge nodes such as increasing computation time and modifying the model.

To achieve the above object, according to one aspect of the present invention, a system for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network is provided. The system includes sensing cloud edge nodes and blocks created in the edge nodes; the edge nodes and the blockchain form an edge DAG blockchain network G_b = (V_b, E_b), where V_b is the set of edge blockchain nodes participating in a computation task offload transaction, which act as transaction request nodes and transaction response nodes when the computation task is offloaded, and E_b is the set of transaction connections; the transaction connection established at the h-th hop is denoted τ, meaning that the two parties transact according to a preset smart contract;

the block of each edge node immutably stores the model produced by the completed training task, the training time, and the model size;

the blockchain network is used for executing actions according to an optimization strategy, so that the transaction response node selects, from all transaction request nodes v = {v_k}, the transaction request node with the highest mapping probability for the action value a_{v,k} = 1 as the cooperative node and establishes a transaction connection with it; the transaction response node then serves as the transaction request node of the next hop; the model, training duration, and model size with which each node completes the training task are recorded, and the trust level of each node is updated.

Preferably, in the system for selecting trusted offload cooperative nodes in the sensing edge cloud blockchain network, the transaction request node is used for initiating a transaction request φ_j = (D_j, Y_j, Υ_j) to other sensing cloud edge nodes and for updating its trust level when it receives confirmation of the transaction request; in the transaction request φ_j = (D_j, Y_j, Υ_j), D_j is the model size, in bits; Y_j is the resources spent to complete the training task requested by the transaction; and Υ_j is the number of bitcoins per unit of edge blockchain node resource consumed in training the model.

The transaction response node is used for reversely judging the trustworthiness of a transaction according to the smart contract when it receives the transaction request: when the transaction offloaded by the transaction request fails to satisfy the conditions in the smart contract, it judges the trustworthiness to be low and rejects the transaction request; otherwise, it confirms the transaction request and sends the confirmation, the smart contract, and the required number of bitcoins to the transaction request node;

the smart contract is SC = {l(t) | t ∈ [t_min, t_max]}, where l(t) is the probability that the model training time t expected by the transaction response node falls within the trusted interval; the higher the value of l(t), the higher the degree of compliance of the computation task offload transaction with the smart contract; t_min and t_max are the lower and upper limits of the trusted interval of the training time.
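The reverse judgment against the smart contract can be illustrated with a minimal Python sketch. The text does not say how l(t) is computed, so the empirical estimate below (the fraction of observed training times inside [t_min, t_max]) and the acceptance threshold `min_compliance` are assumptions made purely for illustration, as are the function names.

```python
from typing import Sequence


def compliance_probability(training_times: Sequence[float],
                           t_min: float, t_max: float) -> float:
    """Assumed empirical estimate of l(t): the share of observed training
    times that fall inside the trusted interval [t_min, t_max]."""
    if not training_times:
        raise ValueError("need at least one observed training time")
    inside = sum(1 for t in training_times if t_min <= t <= t_max)
    return inside / len(training_times)


def confirm_transaction(l_t: float, min_compliance: float) -> bool:
    """Hypothetical reverse judgment by the response node: confirm the
    request only when contract compliance is high enough, else reject."""
    return l_t >= min_compliance


# Three of the four observed training times fall inside [0.5, 3.5].
l_t = compliance_probability([1.0, 2.0, 3.0, 10.0], t_min=0.5, t_max=3.5)
accepted = confirm_transaction(l_t, min_compliance=0.7)
```

A node that inflates its training time pushes observations outside the interval, lowers l(t), and is rejected once l(t) drops below the chosen threshold.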

Preferably, the system for selecting trusted offload cooperative nodes in the sensing edge cloud blockchain network comprises a policy network; the policy network is used for solving the optimization strategy according to the current state of the DAG blockchain network, preferably has a model-free reinforcement learning structure, and is preferably a DNN.

Preferably, in the system for selecting trusted offload cooperative nodes in the sensing edge cloud blockchain network, the input of the policy network is the transaction states of all transaction request nodes v = {v_k} observed by the current transaction response node v_{k+1}. The state s_{v,k} of transaction request node v_k represents the transaction state of v_k when it initiates the transaction and has two components: a smart contract state, which indicates whether the smart contract is complied with or violated; and a delay state, which indicates whether the computation task offload delay on the h-th hop transaction connection is short or long. The delay is long when the computation task offload transmission delay exceeds a preset computation task offload transmission delay threshold, and short otherwise;

the output of the policy network is the probability P(a_t = a_{v,k} | s_t = s_{v,k}, θ_t = θ) of mapping the state s_{v,k} of each transaction request node v_k to the action a_{v,k}, where θ is the offload strategy parameter of the policy network; the optimization strategy established from θ is π*(a_{v,k} | s_{v,k}) = P(a_t = a_{v,k} | s_t = s_{v,k}, θ_t = θ);
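The two-component transaction state can be sketched as follows. The 0/1 encoding and the name `transaction_state` are assumptions for illustration; the text only states that the state records contract compliance and whether the offload delay is short or long relative to a preset threshold.

```python
from typing import Tuple


def transaction_state(contract_complied: bool,
                      transmission_delay: float,
                      delay_threshold: float) -> Tuple[int, int]:
    """Build s_{v,k} as (contract state, delay state).

    Assumed encoding: contract state 1 = complied, 0 = violated;
    delay state 1 = long (delay above threshold), 0 = short.
    """
    contract_state = 1 if contract_complied else 0
    delay_state = 1 if transmission_delay > delay_threshold else 0
    return (contract_state, delay_state)


# A compliant node with a short delay, and a violating node with a long one.
good = transaction_state(True, transmission_delay=0.2, delay_threshold=0.5)
bad = transaction_state(False, transmission_delay=0.9, delay_threshold=0.5)
```

With this encoding each request node is in one of four discrete states, which keeps the policy network input small.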

The reward function used for training the policy network is r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}, where C_{v,h} is the cost function of the sensing cloud edge node. Preferably, the performance function of the multi-hop computation task offload strategy parameter θ is trained and updated by stochastic gradient descent.

To accelerate the training of the policy network, a value network is added to the update of the multi-hop computation task offload strategy parameter θ. In the update equation for θ, ξ_p is the learning rate; G = γr_1 + γ^2 r_2 + ... is the discounted return, where r_1, r_2, ... are historical instantaneous rewards read from the experience cache and γ is the discount factor; and the value of the estimated value function is preferably estimated by the value network.

Preferably, the system for selecting trusted offload cooperative nodes in the sensing edge cloud blockchain network comprises a value network, preferably a DNN, whose input is the transaction state of the transaction response node and whose output is a value estimate. In the update equation for its network parameters, ξ_v is the learning rate.

The value network is updated iteratively using a squared-error loss function.
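The update equations themselves are not reproduced in this text, so the following Python sketch swaps in the textbook policy gradient update with a learned baseline, which is one common way to combine a policy network, a value network, a discounted return G, and learning rates ξ_p and ξ_v. It is not presented as the patent's equations. The linear value function, the logistic policy, and the feature map `features` are all assumptions that stand in for the DNNs.

```python
import math
from typing import List, Sequence, Tuple

State = Tuple[int, int]  # assumed (contract state, delay state)


def features(state: State) -> List[float]:
    """Hypothetical feature map: bias term plus the two state components."""
    return [1.0, float(state[0]), float(state[1])]


def select_probability(theta: Sequence[float], state: State) -> float:
    """P(a = 1 | s, theta) under an assumed logistic policy."""
    z = sum(t * f for t, f in zip(theta, features(state)))
    return 1.0 / (1.0 + math.exp(-z))


def value(w: Sequence[float], state: State) -> float:
    """Assumed linear value estimate standing in for the value network."""
    return sum(wi * f for wi, f in zip(w, features(state)))


def update(theta, w, state, action, G, xi_p, xi_v):
    """One stochastic gradient step on both parameter sets.

    Value step: descend the squared error (G - V(s; w))^2.
    Policy step: ascend (G - V(s; w)) * grad log pi(a | s, theta).
    """
    phi = features(state)
    p = select_probability(theta, state)
    delta = G - value(w, state)  # return minus baseline
    new_w = [wi + xi_v * delta * f for wi, f in zip(w, phi)]
    new_theta = [t + xi_p * delta * (action - p) * f for t, f in zip(theta, phi)]
    return new_theta, new_w


theta, w = update([0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                  state=(1, 0), action=1, G=-2.0, xi_p=0.1, xi_v=0.5)
```

Because rewards are negative costs, a selection that led to a worse-than-expected return lowers the probability of selecting a node in that state again.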

According to another aspect of the present invention, a method for selecting a trusted offload cooperative node in a sensing edge cloud block chain network is provided, which includes the following steps:

(1) acquiring the training task Γ_n = {w_n} issued by a training task issuing node, the maximum number of training hops ε that it sets, the tolerance parameter λ_d for delay time in the task offload transaction process, and the trust tolerance parameter λ_s of the task smart contract;

(2) taking at least ε+1 edge nodes as candidate computation task offload transaction nodes, obtaining the cost function C_{v,h} of each candidate node, and registering the nodes in the DAG blockchain;

(3) according to the transaction state s_{v,k} of each edge node in the DAG blockchain obtained in step (1), using reinforcement learning to compute the optimization strategy for the task offload path; formulating the action set of the offload path according to the optimization strategy; and establishing a transaction connection τ that conforms to the smart contract between the transaction request node and the transaction response node of each hop, thereby forming the task offload path;

wherein, in the optimization strategy, Pr denotes the probability of mapping the state s_{v,k} to the action a_{v,k}; the action set is the set of optimal confirmation-selection actions taken by the transaction response node v_{k+1} toward the transaction request nodes v_k in the computation task offload transaction; the action a_{v,k} = 0 means that the transaction request node is not selected as a collaborator, and a_{v,k} = 1 means that it is selected as a collaborator; the state-action pair {a_{v,k} | s_{v,k}} denotes the confirmation-selection action a_{v,k} of the transaction response node v_{k+1} given the state s_{v,k} of the transaction request node v_k.

Preferably, in the method for selecting trusted offload cooperative nodes in the sensing edge cloud blockchain network, for the transaction φ_j = (D_j, Y_j, Υ_j), D_j denotes the model size, in bits; Y_j denotes the resources that must be spent to complete the training task; and Υ_j denotes the bitcoins per unit of edge blockchain node resource consumed in training the model. The cost function C_{v,h} of a sensing cloud edge node is the computation task offload transaction cost on the h-th hop transaction connection when the node, acting as transaction response node v_{k+1}, selects transaction request node v_k. It comprises a delay term and a trust tolerance term, whose quantities are defined as follows:

λ_d is the tolerance parameter for delay time in the multi-hop computation task offload transaction process, and λ_s is the trust tolerance parameter of the smart contract;

the delay term includes the computation task offload transmission delay;

x_{k+1,k} indicates whether the model trained by transaction request node v_k is confirmed, accepted, and offloaded to transaction response node v_{k+1} for processing: x_{k+1,k} = 1 means that the computation task of the transaction request node is offloaded to the transaction response node for processing, and otherwise x_{k+1,k} = 0;

in the available offload transmission rate of the computation task on transaction connection τ, B denotes the bandwidth, p_k the transmission power, σ^2 the noise power, and g_k the channel gain, which characterizes the transmission loss from the transaction request node to the transaction response node;

in the execution time of the task in the h-th hop offload, L_j is the total computational load and f_c is the service rate of each CPU core, which is a configurable variable;

the task queue waiting time of all nodes in the h-th hop transaction connection is determined from the amount of resources those nodes require to process the tasks in the queue and from f_c, the service rate of each CPU core;

in the average arrival rate of task offloading, M denotes the number of offloads in the h-th hop transaction connection; x_j = 1 indicates a successful offload, and otherwise x_j = 0; I_{*} is an indicator function, with I_{*} = 1 if the condition is true and I_{*} = 0 otherwise; the task amount z_h already present in the current edge DAG blockchain transaction node follows a Poisson distribution with service parameter δ_h;

Φ_h = 1 - l(t), where l(t) denotes the probability that the model training time t expected by the transaction response node falls within the trusted interval; the greater the value of l(t), the greater the degree of compliance of the computation task offload transaction on the transaction connection with the smart contract.
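The cost formula itself is not reproduced in this text, so the Python sketch below only illustrates how the listed quantities could fit together. Three things are assumptions rather than statements of the source: the rate is taken to be the Shannon capacity B·log2(1 + p_k·g_k/σ^2), the transmission delay is taken to be D_j divided by that rate, and the cost is taken to be the weighted sum λ_d·(delay) + λ_s·Φ_h. The function names are hypothetical.

```python
import math


def transmission_rate(bandwidth: float, tx_power: float,
                      channel_gain: float, noise_power: float) -> float:
    """Assumed Shannon-capacity rate on the transaction connection (bits/s)."""
    return bandwidth * math.log2(1.0 + tx_power * channel_gain / noise_power)


def hop_cost(model_bits: float, rate: float, load: float, f_c: float,
             queue_wait: float, l_t: float,
             lambda_d: float, lambda_s: float) -> float:
    """Assumed weighted-sum form of C_{v,h} for one hop."""
    transmission_delay = model_bits / rate  # time to send D_j bits
    execution_time = load / f_c             # L_j at service rate f_c
    delay = transmission_delay + execution_time + queue_wait
    phi_h = 1.0 - l_t                       # trust term, Phi_h = 1 - l(t)
    return lambda_d * delay + lambda_s * phi_h


rate = transmission_rate(bandwidth=1e6, tx_power=1.0,
                         channel_gain=3.0, noise_power=1.0)
cost = hop_cost(model_bits=2e6, rate=rate, load=4e9, f_c=2e9,
                queue_wait=0.5, l_t=0.9, lambda_d=1.0, lambda_s=10.0)
```

Raising λ_s relative to λ_d makes the selection favor contract-compliant nodes even when they are slower, which is the delay versus trust trade-off the cost function is meant to express.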

Preferably, in the method for selecting trusted offload cooperative nodes in the sensing edge cloud blockchain network, the objective of the reinforcement learning in step (3) is to minimize the computation task offload transaction cost on the premise of meeting the delay sensitivity requirement of the computation task offload and complying with the smart contract, written as:

MTOR: min C_o

a_v = {a_{v,1}, a_{v,2}, ..., a_{v,ε}}

SC = {l(t) | t ∈ [t_min, t_max]}

where C_o is the computation task offload transaction cost, a_v = {a_{v,1}, a_{v,2}, ..., a_{v,ε}} is the action set, and SC = {l(t) | t ∈ [t_min, t_max]} is the smart contract.

The reinforcement learning uses an accumulated reward function in which r_h denotes the instantaneous reward function of each hop and γ is the discount factor, with r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}.
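A short Python sketch of the discounted return follows. It uses the form G = γr_1 + γ^2 r_2 + ... that appears elsewhere in this description, with each per-hop reward equal to the negative hop cost; the function names are illustrative only.

```python
from typing import List, Sequence


def rewards_from_costs(costs: Sequence[float]) -> List[float]:
    """Instantaneous reward of each hop: r_h = -C_{v,h}."""
    return [-c for c in costs]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """G = gamma * r_1 + gamma^2 * r_2 + ... over the hops of one path."""
    return sum(gamma ** h * r for h, r in enumerate(rewards, start=1))


# Two hops with costs 1.0 and 2.0 and discount factor 0.5.
G = discounted_return(rewards_from_costs([1.0, 2.0]), gamma=0.5)
```

Since every reward is a negative cost, maximizing G is the same as minimizing the discounted transaction cost accumulated along the offload path.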

Preferably, the strategy optimization is treated as a Markov process, and a greedy algorithm is used to obtain the action strategy π*(s_{v,k}) that maximizes the instantaneous reward function of each hop; the optimization strategy is obtained from the action strategy of the h-th hop. In the corresponding expression, P_T is the transition probability, γ is the discount factor, and V(s_{v,k+1} | π*) is the state value function under the optimal strategy π*.

Preferably, in the method for selecting trusted offload cooperative nodes in the sensing edge cloud blockchain network, step (3) solves the optimization strategy by a model-free reinforcement learning algorithm, with the following specific steps:

(3-1) initializing the task offload parameter θ to obtain the current policy network, that is, taking the most recently updated task offload parameter θ as the task offload parameter θ of the current policy network;

(3-2) for each hop of computation task learning, the current transaction response node v_{k+1} of the task offload observes and collects the transaction states s_{v,k} of the transaction request nodes v_k; the current policy network is used to compute the action strategy π*(s_{v,k}) between all current transaction request nodes v_k and the transaction response node v_{k+1}, and the instantaneous reward r_h is estimated, thereby determining the action a_{v,k} that selects one of the transaction request nodes v_k as the cooperative node; the transaction response node is then updated to be the transaction request node of the next hop and the experience cache is updated; this is repeated until the maximum hop count is reached, and the per-hop action strategies π*(s_{v,k}) obtained together compose the optimization strategy.

Computing, with the current policy network, the optimization strategy π*(s_{v,k}) between the current transaction request nodes v_k and the transaction response node is specifically as follows:

when the state of transaction request node v_k is s_{v,k}, the probability that the transaction response node takes action a_{v,k} is:

π(a_{v,k} | s_{v,k}) = P(a_t = a_{v,k} | s_t = s_{v,k}, θ_t = θ)

where P is the probability that the action taken is a_{v,k} in state s_{v,k}, and θ is the policy network parameter.

Preferably, the policy network employs a DNN architecture.

According to the optimization strategy π(s_{v,k}), the response node of the current hop reversely selects the cooperative node as follows: the transaction request node with the highest probability of a_{v,k} = 1 is selected as the cooperative node, and the transaction request initiated by it is confirmed.
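The reverse selection at the response node amounts to an argmax over the selection probabilities, as in this minimal Python sketch. The node identifiers and the dictionary representation are hypothetical; in the described system the probabilities would come from the policy network.

```python
from typing import Dict, Tuple


def select_cooperative_node(
        select_probs: Dict[str, float]) -> Tuple[str, Dict[str, int]]:
    """Pick the request node with the highest P(a_{v,k} = 1 | s_{v,k}).

    Returns the chosen node and the action set, where 1 marks the node
    selected as collaborator and 0 marks every other request node.
    """
    if not select_probs:
        raise ValueError("no transaction request nodes observed")
    chosen = max(select_probs, key=lambda node: select_probs[node])
    actions = {node: int(node == chosen) for node in select_probs}
    return chosen, actions


chosen, actions = select_cooperative_node({"v1": 0.2, "v2": 0.7, "v3": 0.1})
```

Exactly one request node receives action value 1 per hop, so each hop adds a single transaction connection to the offload path.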

The instantaneous reward is estimated as r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}.

Updating the node to be the transaction request node specifically includes: the selected transaction request node updates the processing time of the computation task in its node block, and the transaction response node acts as the transaction request node to issue the next-hop transaction request.

Preferably, in the method for selecting trusted offload cooperative nodes in the sensing edge cloud blockchain network, updating the experience cache in step (3) specifically means recording in the experience cache the transaction request node state, the transaction response node state, the action value, and the instantaneous reward r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}.
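One possible shape for the experience cache is sketched below in Python. The four recorded fields follow the text; the bounded capacity, the class names, and the tuple state type are assumptions added for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple

State = Tuple[int, int]  # assumed (contract state, delay state)


@dataclass(frozen=True)
class Experience:
    request_state: State   # state of the transaction request node
    response_state: State  # state of the transaction response node
    action: int            # action value a_{v,k}
    reward: float          # instantaneous reward r_h = -C_{v,h}


class ExperienceCache:
    """Bounded buffer of per-hop experiences (capacity is an assumption)."""

    def __init__(self, capacity: int) -> None:
        self._buffer: Deque[Experience] = deque(maxlen=capacity)

    def record(self, request_state: State, response_state: State,
               action: int, cost: float) -> None:
        """Store one hop, converting the hop cost into a reward."""
        self._buffer.append(
            Experience(request_state, response_state, action, -cost))

    def rewards(self) -> List[float]:
        """Historical instantaneous rewards, oldest first."""
        return [e.reward for e in self._buffer]

    def __len__(self) -> int:
        return len(self._buffer)


cache = ExperienceCache(capacity=2)
for hop_cost_value in (1.0, 2.0, 3.0):
    cache.record((1, 0), (1, 0), action=1, cost=hop_cost_value)
```

With a capacity of 2, the oldest of the three recorded hops is discarded, and the stored rewards can be fed directly into the discounted return used for the parameter updates.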

Preferably, the method for selecting trusted offload cooperative nodes in the sensing edge cloud blockchain network further comprises step (4): updating the learning parameters of the value function and the task offload parameter θ according to the experience cache;

updating the learning parameters of the value function specifically includes: iteratively updating them using the squared error, with an update equation in which ξ_v is the learning rate and the loss function is computed from the output of the value function.

Updating the task offload parameter θ specifically includes: according to the data recorded in the experience cache, training and updating the performance function of the multi-hop computation task offload strategy parameter θ by stochastic gradient descent. In the update, ξ_p is the learning rate; G = γr_1 + γ^2 r_2 + ... is the discounted return, computed from the historical instantaneous rewards stored in the experience cache, where γ is the discount factor; and the value of the estimated value function is preferably estimated by the value network.

In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:

For the multi-hop cooperative computation task offloading scenario, a multi-hop cooperative offloading model based on an edge DAG blockchain is established; the nodes participating in cooperative offloading are registered in the edge DAG blockchain network and cooperatively complete multi-hop distributed federated learning tasks.

To achieve low-delay, trusted multi-hop cooperative offloading of computation tasks, a multi-hop computation task offload delay cost function and a smart contract model are established in the edge DAG blockchain network.

To solve the problem of confirming and selecting transaction nodes in a multi-hop computation task offload path, the invention models the problem as a Markov decision process for reverse transaction request node selection based on the DAG blockchain, and further provides a reinforcement-learning-based algorithm for selecting cooperative nodes in multi-hop computation task offloading.

Drawings

Fig. 1 is a schematic structural diagram of the trusted offload cooperative node selection system in a sensing edge cloud blockchain network provided by the present invention;

Fig. 2 is a schematic structural diagram of the trusted offload cooperative node selection system in a sensing edge cloud blockchain network according to an embodiment of the present invention.

The same reference numbers are used throughout the drawings to refer to the same or like elements or structures.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The system for selecting trusted offload cooperative nodes in the sensing edge cloud blockchain network, as shown in Fig. 1, includes sensing cloud edge nodes and blocks created in the edge nodes; the edge nodes and the blockchain form an edge DAG blockchain network G_b = (V_b, E_b), where V_b is the set of edge blockchain nodes participating in a computation task offload transaction, which act as transaction request nodes and transaction response nodes when the computation task is offloaded, and E_b is the set of transaction connections; the transaction connection established at the h-th hop is denoted τ, meaning that the two parties may transact only according to a preset smart contract. The system preferably further includes a policy network and, more preferably, a value network;

the block of each edge node immutably stores the model produced by the completed training task, the training time, and the model size;

the transaction request node is used for initiating a transaction request φ_j = (D_j, Y_j, Υ_j) to other sensing cloud edge nodes and for updating its trust level when it receives confirmation of the transaction request; in the transaction request φ_j = (D_j, Y_j, Υ_j), D_j is the model size, in bits; Y_j is the resources spent to complete the training task requested by the transaction; and Υ_j is the number of bitcoins per unit of edge blockchain node resource consumed in training the model.

The transaction response node is used for reversely judging the trustworthiness of a transaction according to the smart contract when it receives the transaction request: when the transaction offloaded by the transaction request fails to satisfy the conditions in the smart contract, it judges the trustworthiness to be low and rejects the transaction request; otherwise, it confirms the transaction request and sends the confirmation, the smart contract, and the required number of bitcoins to the transaction request node.

The optimization strategy is preferably obtained by solving a reinforcement learning model, and can preferably be solved with a policy network, as shown in Fig. 2.

The policy network is used for solving the optimization strategy according to the current state of the DAG blockchain network, preferably has a model-free reinforcement learning structure, and is preferably a DNN; specifically:

the input of the policy network is the transaction states of all transaction request nodes v = {v_k} observed by the current transaction response node v_{k+1}. The state s_{v,k} of transaction request node v_k represents the transaction state of v_k when it initiates the transaction and has two components: a smart contract state, which indicates whether the smart contract is complied with or violated; and a delay state, which indicates whether the computation task offload delay is short or long. The delay is long when the computation task offload transmission delay exceeds a preset computation task offload transmission delay threshold, and short otherwise;

The reward function used for training the policy network is r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}, where C_{v,h} is the cost function of the sensing cloud edge node. Preferably, the performance function of the multi-hop computation task offload strategy parameter θ is trained and updated by stochastic gradient descent. In the update, ξ_p is the learning rate; G = γr_1 + γ^2 r_2 + ... is the discounted return, where r_1, r_2, ... are historical instantaneous rewards read from the experience cache and γ is the discount factor; and the value of the estimated value function is preferably estimated by the value network.

The value network, preferably a DNN, takes the transaction state of the transaction response node as input and outputs a value estimate. In the update equation for its network parameters, ξ_v is the learning rate.

The invention provides a DAG-blockchain-based multi-hop computation task offloading method for the sensing edge cloud environment, which achieves low-delay trusted cooperative offloading while taking into account untrustworthy behaviors of malicious edge nodes such as increasing computation time and modifying the model.

The invention provides a method for selecting a trusted unloading cooperative node in a sensing edge cloud block chain network, which comprises the following steps:

(2) taking at least ε+1 edge nodes as candidate computation task offload transaction nodes, obtaining the cost function C_{v,h} of each candidate node, and registering the nodes in the DAG blockchain;

for the transaction φ_j = (D_j, Y_j, Υ_j), D_j denotes the model size, in bits; Y_j denotes the resources that must be spent to complete the training task; and Υ_j denotes the bitcoins per unit of edge blockchain node resource consumed in training the model. The cost function C_{v,h} of a sensing cloud edge node is the computation task offload transaction cost on the h-th hop transaction connection when the node, acting as transaction response node v_{k+1}, selects transaction request node v_k. It comprises a delay term and a trust tolerance term, whose quantities are defined as follows:

λ_d is the tolerance parameter for delay time in the multi-hop computation task offload transaction process, and λ_s is the trust tolerance parameter of the smart contract;

the delay term includes the computation task offload transmission delay;

in the available offload transmission rate of the computation task on transaction connection τ, B denotes the bandwidth, p_k the transmission power, σ^2 the noise power, and g_k the channel gain, which characterizes the transmission loss from the transaction request node to the transaction response node;

the delay term also includes the task queue waiting time of all nodes in the h-th hop transaction connection;

in the average arrival rate of task offloading, M denotes the number of offloads in the h-th hop transaction connection; x_j = 1 indicates a successful offload, and otherwise x_j = 0; I_{*} is an indicator function, with I_{*} = 1 if the condition is true and I_{*} = 0 otherwise; the task amount z_h already present in the current edge DAG blockchain transaction node follows a Poisson distribution with service parameter δ_h.

The edge DAG blockchain G_b = (V_b, E_b) is a directed task graph, where V_b is the set of edge blockchain nodes participating in a computation task offloading transaction, which act as transaction request nodes and transaction response nodes during offloading, and E_b is the set of transaction connections between participants, i.e., the two parties transact according to a preset smart contract.

According to the optimization strategy

the offload path action set

is formulated, and a transaction connection τ complying with the smart contract is established between the transaction request node and the transaction response node of each hop, so as to form a task offload path;

wherein the optimization strategy

is the set of optimal confirmation-selection actions taken by transaction response node v_{k+1} toward transaction request node v_k in a computation task offloading transaction; the action

Meaning that the transaction requesting node is not selected as a collaborator,

The goal of reinforcement learning is to minimize the computation task offloading transaction cost while meeting the delay sensitivity requirement of computation task offloading and complying with the smart contract. This is denoted as:

MTOR:minC_o

a_v＝{a_v,1,a_v,2,...,a_v,ε}

SC＝{l(t)|t∈[t_min,t_max]}

where C_o is the computation task offloading transaction cost, a_v = {a_{v,1}, a_{v,2}, ..., a_{v,ε}} is the action set, and SC = {l(t) | t ∈ [t_min, t_max]} is the smart contract.

solving optimization strategies preferably by model-free reinforcement learning algorithms

The method comprises the following specific steps:

(3-1) Initializing the task offloading parameter θ to obtain the current policy network, i.e., taking the most recently updated task offloading parameter θ as the parameter θ of the current policy network;

π(a_v,k|s_v,k)＝P(a_t＝a_v,k|s_t＝s_v,k,θ_t＝θ)

Preferably, the policy network employs a DNN architecture.

The instantaneous reward is estimated as follows: r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}.

Updating the node as a transaction request node specifically comprises: the selected transaction request node updates the processing time of the computation task in its node block, and the transaction response node is taken as the transaction request node for the next-hop transaction request.

Updating the experience cache specifically comprises: recording in the experience cache the transaction request node state, the transaction response node state, the action value, and the instantaneous reward r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}.

(4) Updating, according to the experience cache, the learning parameter of the value function and the task offloading parameter θ;

The learning parameter of the value function is updated as follows:

where ξ_v is the learning rate, and the loss function is

is the output of the value function.

where ξ_p is the learning rate; G = γr₁ + γ²r₂ + ... is the discounted return, computed from the historical instantaneous rewards stored in the experience cache; and γ is the discount factor. The value of the estimated value function is preferably estimated by a parameterized value network.

For the multi-hop computation task cooperative offloading scenario, the invention establishes a multi-hop computation task cooperative offloading model based on an edge DAG blockchain. Nodes participating in cooperative offloading are registered in the edge DAG blockchain network and cooperatively complete multi-hop distributed federated learning tasks.

The invention designs a method for selecting trusted cooperative nodes in multi-hop computation task offloading by combining blockchain technology with a reinforcement learning algorithm. The method first establishes an edge DAG blockchain network according to the directed acyclic graph (DAG) of multi-hop cooperative computation task offloading. The problem of selecting trusted cooperative nodes in multi-hop computation task offloading is then formalized as a Markov decision process. Taking into account the dynamics of the edge nodes' offloading transaction connections and the selfishness of nodes, the invention proposes a reinforcement-learning-based algorithm, combined with blockchain technology, for selecting multi-hop computation task offloading transaction nodes, so that trusted cooperative offloading nodes are selected and the trusted quality of service of multi-hop computation task offloading is improved.

The following are examples:

The system for selecting a trusted offloading cooperative node in a sensing edge cloud blockchain network comprises sensing cloud edge nodes and blocks created in the edge nodes. The edge nodes and the blockchain form an edge DAG blockchain G_b = (V_b, E_b), where V_b is the set of edge blockchain nodes participating in computation task offloading transactions, which act as transaction request nodes and transaction response nodes during offloading, and E_b is the set of transaction connections, the connection established at the h-th hop being τ; that is, only parties observing the preset contract can transact. The system also includes a policy network and a value network;

The transaction response node is used, upon receiving a transaction request, to reversely judge the credibility of the transaction according to the smart contract. When the offloaded transaction fails to satisfy the conditions of the smart contract, the node judges the credibility to be low and rejects the transaction request; otherwise, it confirms the transaction request and sends the transaction request confirmation, together with the number of bitcoins required by the smart contract, to the transaction request node.

The smart contract is SC = {l(t) | t ∈ [t_min, t_max]}, where l(t) is the probability that the model training time t expected by the transaction response node falls within the credible interval; the larger the value of l(t), the higher the smart contract compliance of the computation task offloading transaction. t_min and t_max are the lower and upper limits of the credible interval of the training time.

is mapped, and the transaction request node with the highest probability is taken as the cooperative node to establish the transaction connection; the transaction response node then serves as the transaction request node of the next hop; the model, training time, and model size at completion of each node's training task are recorded; and the trust degree of each node is updated.

The optimization strategy is solved with a reinforcement learning model, using a policy network.

The policy network is used to solve the optimization strategy according to the current state of the DAG blockchain network. It is a model-free reinforcement learning structure and adopts a DNN. Specifically:

The input to the policy network is the transaction state of all transaction request nodes v = {v_k} observed by the current transaction response node v_{k+1}, where k is the transaction response node subscript and its maximum value equals the maximum hop count. The state s_{v,k} of transaction request node v_k represents the transaction state when v_k initiates a transaction, where

represents the state of the smart contract,

indicates compliance with the smart contract,

indicates a violation of the smart contract,

indicates whether the delay time is short or long; when

holds, the delay time is long, otherwise it is short;

is the computation task offload transmission delay, and

is the preset computation task offload transmission delay threshold;

The reward function used to train the policy network is r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}, where C_{v,h} is the cost function of the sensing cloud edge node. The performance function of the multi-hop computation task offloading policy parameter θ is trained and updated by stochastic gradient descent, specifically as follows:

where ξ_p is the learning rate; G = γr₁ + γ²r₂ + ... is the discounted return; r₁, r₂, ... are the historical instantaneous rewards read from the experience cache; and γ is the discount factor. The value of the estimated value function is preferably estimated by a parameterized value network.

The value network is a DNN network, the input of the value network is the transaction state of the transaction response node, and the output of the value network is a value

With network parameters of

The update equation is as follows:

where ξ_v is the learning rate.

Because offloading computation tasks to remote cloud nodes incurs high delay, the computation tasks of sensing devices are offloaded to edge nodes to reduce the offloading cost. The invention considers edge nodes that process a computation task Γ_n = {w_n} in a distributed, multi-hop cooperative manner, where w_n represents the model to be trained. When computation task Γ_n is offloaded to multiple edge nodes, each next-hop edge node that is to receive model w_n first confirms the workload proof of the model training; after confirmation, the model is offloaded to that edge node, which continues the training. For a computation task Γ_n, n edge nodes form a multi-hop computation task offload path and participate in the training in succession. The multi-hop computation task offloading process can therefore be represented as a directed task graph G_a = (V_a, E_a), where V_a represents the edge nodes, which are associated with the trust context of task offloading, such as execution time, queue time, and model training results, and E_a represents the offload connections between edge nodes, which are related to the offload transmission rate and transmission time between edge nodes. To record the training result of each edge node and prevent malicious nodes from modifying trained model data, the invention defines a DAG blockchain on the basis of the task graph. Blocks are created in the edge nodes, and the edge nodes and the blockchain are integrated into an edge DAG blockchain network, whose nodes are called edge blockchain nodes. The network is modeled as a directed graph G_b = (V_b, E_b), as shown in FIG. 1, where V_b represents the edge blockchain nodes participating in a computation task offloading transaction, each of which can take two roles: computation task offloading transaction request node and transaction response node. E_b represents the transaction connections between participants; two parties can establish a transaction connection only by following a given smart contract.

After a computation task offloading request node has trained a model, the model is stored in a block so that it cannot be altered, and the training duration and model size are automatically recorded in the block. The node then initiates a computation task offloading transaction to a transaction response node. On receiving the offloaded transaction request, the responder first reversely confirms, according to the smart contract, whether the transaction is trustworthy. If the transaction response node finds that the offloaded transaction cannot satisfy the conditions of the smart contract, the transaction confirmation fails and the responder goes on to confirm other transaction request nodes. Otherwise, the transaction is confirmed, the transaction response node sends a certain number of bitcoins to the transaction request node as a reward for the model, and the trust degree of the transaction request node is updated.

The method for selecting the trusted offload cooperative node in the sensing edge cloud block chain network provided by the embodiment comprises the following steps:

where λ_d is the tolerance parameter for delay time in multi-hop computation task offloading transactions, and λ_s is the credibility tolerance parameter of the smart contract;

is the computation task offload transmission delay,

is the task queue waiting time of all nodes in the h-th hop transaction connection,

Φ_h = 1 - l(t), where l(t) represents the probability that the model training time t expected by the transaction response node falls within the credible interval; the greater the value of l(t), the higher the smart contract compliance of the computation task offloading transaction on the transaction connection.

Smart contract and delay cost estimation for multi-hop computation task offloading in the edge DAG blockchain network:

In the edge DAG blockchain network graph G_b, a computation task offloading transaction request node initiates a transaction request φ_j = (D_j, Y_j, Υ_j) to an adjacent edge blockchain node, where D_j represents the size of the model in bits, Y_j represents the resources that must be spent to complete the training task, and Υ_j represents the bitcoins per unit resource of the edge blockchain node consumed by training the model. After the transaction response node, as the buyer of the model, confirms the training result, it sends a certain number of bitcoins to the transaction request node to compensate for its resource consumption. When the training task issuing node sets the maximum number of training hops to ε, at least ε+1 nodes in the edge DAG blockchain network must participate in the multi-hop computation task offloading. However, destructive behaviors of intelligent attackers (such as delaying computation or modifying models) make transaction connections, computation behaviors, and training results in the edge blockchain network untrustworthy, which increases the number of failed transactions between request and response nodes and reduces transaction trust. Ultimately, the trusted lifetime of the edge DAG blockchain network shrinks as node trust decays. To extend the trusted lifetime of the network, the smart contract is triggered when a transaction response node confirms a computation task offloading transaction request node. Only a transaction that satisfies the smart contract counts as a trusted transaction, and the transaction response node reversely selects the transaction request node as a cooperative node on the computation task offload path.
The invention takes the model training time as the workload proof of the edge DAG blockchain nodes. The smart contract for computation task offloading transactions is therefore defined as SC = {l(t) | t ∈ [t_min, t_max]}, where l(t) represents the probability that the model training time t expected by the transaction response node falls within the credible interval; the larger the value of l(t), the higher the smart contract compliance of the computation task offloading transaction. t_min and t_max are parameters that can be set according to the training goal. When the transaction response node receives a transaction request, it looks up the model training time recorded in the block and checks whether it meets the credible interval requirement of the smart contract; if not, the computation task offloading transaction cannot proceed. Within a one-hop computation task offloading transaction connection, the set of transaction request nodes that a transaction response node may choose to confirm is defined as v = {v_k}. In the computation task offloading transaction process, the invention considers the offload transmission delay time, the execution time of the computation task on the edge DAG blockchain network node, and the waiting time of the task queue.
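As a rough illustration of the smart contract check described above, the sketch below tests whether a recorded training time lies in [t_min, t_max] and derives Φ_h = 1 - l(t). The source does not give the form of l(t); estimating it as an empirical frequency over training times recorded in blocks is an assumption, and all function and parameter names are hypothetical.

```python
from typing import Sequence


def contract_satisfied(training_time: float, t_min: float, t_max: float) -> bool:
    """A transaction is admissible only if its recorded training time
    lies inside the credible interval [t_min, t_max]."""
    return t_min <= training_time <= t_max


def compliance_probability(recorded_times: Sequence[float],
                           t_min: float, t_max: float) -> float:
    """Estimate l(t) as the share of recorded training times inside the interval.

    Assumption: an empirical frequency stands in for l(t); the patent text
    does not fix its functional form.
    """
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")
    if not recorded_times:
        return 0.0  # no evidence yet: treat the node as not compliant
    inside = sum(1 for t in recorded_times if contract_satisfied(t, t_min, t_max))
    return inside / len(recorded_times)


def credibility_penalty(l_t: float) -> float:
    """Phi_h = 1 - l(t), as stated in the text."""
    return 1.0 - l_t
```

For example, a node with recorded training times 1, 2, 3 and 10 and a credible interval [1, 3] would have three of four times inside the interval under this estimator.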

The computation task offload transmission delay time is calculated as follows:

In an edge DAG blockchain network, multiple transaction request nodes initiate confirmation requests to a response node, and changes in the state of the confirmation request channel delay the offload transmission of the computation task. To calculate the time a transaction request node needs to offload a computation task to the response node, the available offload transmission rate of the computation task on a transaction connection τ is defined as:

where B represents the bandwidth, p_k the transmission power, σ² the noise power, and g_k the channel gain, which indicates the transmission loss from the transaction request node to the response node. The computation task offload transmission delay time from the transaction request node to the response node is therefore:

where x_{k+1,k} indicates whether the model trained by transaction request node v_k is confirmed, accepted, and offloaded to transaction response node v_{k+1} for processing: x_{k+1,k} = 1 means the computation task of the transaction request node is offloaded to the transaction response node for processing, and x_{k+1,k} = 0 otherwise. The transmission and reception time can thus be calculated as

The transmission time over the entire computation task offload path is then
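The rate and delay definitions in this subsection can be sketched as follows. The equations themselves are not reproduced in this text, so the Shannon-capacity form B·log2(1 + p_k·g_k/σ²), the per-hop delay x_{k+1,k}·D_j divided by the rate, and the summation over hops are assumptions inferred from the listed symbols; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def offload_rate(bandwidth_hz: float, tx_power: float,
                 channel_gain: float, noise_power: float) -> float:
    """Available offload rate in bit/s, assuming the Shannon form
    B * log2(1 + p_k * g_k / sigma^2)."""
    if noise_power <= 0:
        raise ValueError("noise power must be positive")
    return bandwidth_hz * math.log2(1.0 + tx_power * channel_gain / noise_power)


def hop_delay(model_bits: float, rate_bps: float, accepted: int) -> float:
    """Transmission delay of one hop; `accepted` plays the role of x_{k+1,k}."""
    if accepted not in (0, 1):
        raise ValueError("accepted must be 0 or 1")
    return accepted * model_bits / rate_bps


def path_delay(hops: Sequence[Tuple[float, float, int]]) -> float:
    """Sum of per-hop delays; each hop is (model_bits, rate_bps, accepted)."""
    return sum(hop_delay(bits, rate, x) for bits, rate, x in hops)
```

A rejected hop (x_{k+1,k} = 0) contributes no transmission delay in this sketch, since no model is transferred.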

The edge DAG block chain node task execution time is calculated as follows:

In multi-hop computation task offloading, each edge DAG blockchain node must complete a workload proof through the model training task. The invention assumes that an edge DAG blockchain node has χ CPU cores. The execution time of the task is

where L_j is the total computational load, and f_c is the service rate of each CPU core, which is a configurable variable.
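A minimal sketch of the execution time, assuming the load L_j is spread evenly over the χ cores so that the time is L_j / (χ · f_c). This closed form is an assumption, since the equation is not reproduced here; units (for example CPU cycles and cycles per second) are likewise illustrative.

```python
def execution_time(total_load: float, cores: int, service_rate: float) -> float:
    """Execution time assuming L_j / (chi * f_c): load split evenly over chi cores."""
    if cores <= 0 or service_rate <= 0:
        raise ValueError("cores and service_rate must be positive")
    return total_load / (cores * service_rate)
```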

The edge DAG blockchain node task queue waiting time is calculated as follows:

Since a node participating in transactions receives tasks offloaded by multiple transaction request nodes, the computation task offloading delay time of the edge DAG blockchain network also depends on the amount of tasks in the current node's receive queue. The amount of tasks z_h already present in the current edge DAG blockchain transaction node follows a Poisson distribution with parameter δ_h, i.e.
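For reference, a Poisson-distributed task amount z_h with parameter δ_h has the standard probability mass function below; the counting index n is introduced here only for illustration and does not appear in the source.

```latex
\Pr(z_h = n) = \frac{\delta_h^{\,n}\, e^{-\delta_h}}{n!}, \qquad n = 0, 1, 2, \dots
```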

From this, it can be calculated that the average arrival rate of the computation task offload is:

where M represents the number of offloads in the h-th hop transaction connection; x_j = 1 indicates a successful offload and x_j = 0 otherwise; and I_{*} is an indicator function, with I_{*} = 1 if the condition is true and I_{*} = 0 otherwise. Because the task currently being processed needs a certain training time to complete, each training task arriving in the queue must wait until the tasks already being processed are finished. The waiting time of the task queue of all nodes in the h-th hop transaction connection is therefore

where

represents the number of resources required by all nodes in the h-th hop transaction connection to process the tasks in the queue. The total delay time of a computation task offloading transaction is therefore:

According to an optimization strategy

Set of actions to formulate offload paths

Establishing a transaction connection τ complying with the smart contract between the transaction request node and the transaction response node of each hop, so as to form a task offload path;

wherein the optimization strategy

Meaning that the transaction requesting node is not selected as a collaborator,

means that the transaction request node is selected as a collaborator. The state-action pair {a_{v,k} | s_{v,k}} denotes the confirmation-selection action a_{v,k} of transaction response node v_{k+1} given state s_{v,k} of transaction request node v_k.

MTOR:minC_o

a_v＝{a_v,1,a_v,2,...,a_v,ε}

SC＝{l(t)|t∈[t_min,t_max]}

where C_o is the computation task offloading transaction cost, a_v = {a_{v,1}, a_{v,2}, ..., a_{v,ε}} is the action set, and SC = {l(t) | t ∈ [t_min, t_max]} is the smart contract.

Using a greedy algorithm, the policy optimization is treated as a Markov process: the action policy π*(s_{v,k}) that maximizes the instantaneous reward function at each hop is obtained, and the action policy of the h-th hop that yields the optimization strategy is written as:

where P_T is the transition probability, γ is the discount factor, and V(s_{v,k+1} | π*) is the state value function under the optimal policy π*, defined as:

The invention designs a DAG-blockchain-based method for selecting trusted cooperative nodes in multi-hop computation task offloading, which constructs a trusted transaction path by selecting trusted computation task offloading nodes. In a DAG-blockchain-based edge network, whether a transaction response node accepts a task offloaded by a transaction request node depends on confirming the trustworthiness of the preceding transaction request node. The reverse selection of transaction request nodes in multi-hop computation task offloading can therefore be modeled as a Markov decision process, defined as a tuple Θ_M = (S, A_{v,k}, Pr, C_{v,h}), where

1) S: S = {s_{v,k} ∈ S | S = s_{v,1}, s_{v,2}, ..., s_{v,n}} denotes the state space of transactions between edge DAG blockchain nodes; s_{v,k} represents the transaction state when transaction request node v_k initiates a transaction,

where

represents the state of the smart contract,

indicates compliance with the smart contract,

indicates a violation of the smart contract,

indicates whether the delay time is short or long; when

holds, the delay time is long, otherwise it is short;

is the computation task offload transmission delay, and

is the preset computation task offload transmission delay threshold. In a computation task offloading transaction, transaction response node v_{k+1} confirms transaction request node v_k; once the confirmation passes, v_{k+1} becomes the request node of the next-hop computation task offloading transaction.
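The two-component transaction state described above (smart contract status and delay status) can be sketched as a small data structure. The field names, and the reading that the delay is "long" when the transmission delay exceeds the preset threshold, are assumptions; the source's comparison expression is not reproduced in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionState:
    """State s_{v,k} of a transaction request node (field names are illustrative)."""
    contract_ok: bool  # True: complies with the smart contract; False: violates it
    delay_long: bool   # True: transmission delay exceeds the preset threshold


def observe_state(training_time: float, t_min: float, t_max: float,
                  tx_delay: float, delay_threshold: float) -> TransactionState:
    """Build the state a response node would observe from recorded block data."""
    return TransactionState(
        contract_ok=t_min <= training_time <= t_max,
        delay_long=tx_delay > delay_threshold,  # assumed comparison direction
    )
```

Making the state immutable and hashable lets it serve directly as a key in an experience cache or a tabular policy, which is a design convenience rather than something the source prescribes.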

2)A_v,k：

represents the possible action space, where a_{v,k} represents the confirmation-selection action taken by transaction response node v_{k+1} toward transaction request node v_k in a computation task offloading transaction,

means that the transaction request node is not selected as a collaborator, and

means that the transaction request node is selected as a collaborator. The state-action pair {a_{v,k} | s_{v,k}} denotes the confirmation-selection action a_{v,k} of transaction response node v_{k+1} given state s_{v,k} of transaction request node v_k.

3) Pr: represents the mapping probability from state s_{v,k} to action a_{v,k}. In an untrusted computation task offloading environment, the goal of the response node in a computation task offloading transaction is to obtain an optimization strategy π*, i.e., the mapping probability from s_{v,k} to a_{v,k}. Following the optimization strategy π*, transaction response node v_{k+1} can find the optimal confirmation-selection action

The optimization strategy for the transaction connection set on the computation task unloading path is

The optimal confirmation selection action set of the transaction response node is as follows:

4) C_{v,h} represents the computation task offloading transaction cost on the h-th hop transaction connection between transaction request node v_k and transaction response node v_{k+1}. It comprises time delay and credibility tolerance, and can be calculated as:

where λ_d represents the tolerance parameter for delay time in multi-hop computation task offloading transactions, λ_s represents the credibility tolerance parameter of the smart contract, and Φ_h = 1 - l(t). For an offloaded computation task, once the maximum number of training hops is set, the confirmation-selection action set of the transaction nodes on the trusted computation task offload path is

The multi-hop path formed by the elements of the confirmation-selection action set should satisfy the delay sensitivity requirement of computation task offloading and the smart contract condition, while minimizing the computation task offloading transaction cost.
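A hedged sketch of the per-hop cost follows. The text states only that C_{v,h} comprises time delay and credibility tolerance, weighted by λ_d and λ_s, with Φ_h = 1 - l(t); the weighted-sum form λ_d · delay + λ_s · Φ_h and the path cost as a sum over hops are assumptions, and the function names are hypothetical.

```python
from typing import Iterable


def offload_cost(delay: float, l_t: float,
                 lambda_d: float, lambda_s: float) -> float:
    """C_{v,h}, assuming the weighted sum lambda_d * delay + lambda_s * (1 - l(t)).

    `delay` stands for the total delay of the hop (transmission, execution
    and queue waiting time).
    """
    if not 0.0 <= l_t <= 1.0:
        raise ValueError("l(t) is a probability and must lie in [0, 1]")
    phi_h = 1.0 - l_t  # credibility term, as stated in the text
    return lambda_d * delay + lambda_s * phi_h


def path_cost(hop_costs: Iterable[float]) -> float:
    """Total cost C_o of an offload path, assumed to be the sum of per-hop costs."""
    return sum(hop_costs)
```

Under this form, a fully compliant node (l(t) = 1) pays only the delay term, while a larger λ_s penalizes low-compliance nodes more heavily.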

In reinforcement learning, the search for the optimization strategy can be modeled as a Markov decision process. The invention extends Θ_M to Θ_RL = (S, A_{v,k}, P_T, r_h, γ), where S and A_{v,k} are the state space and action space of Θ_M, P_T is the transition probability, r_h is the instantaneous reward function, and γ is the discount factor. Reinforcement learning is used to select cooperative transaction request nodes so as to obtain the confirmation-selection action set of the transaction nodes on the trusted computation task offload path

In the confirmation-selection process for transaction nodes on the multi-hop computation task offload path, transaction response node v_{k+1} first observes the current transaction states s_{v,k} of all transaction request nodes v_k and selects a transaction connection τ on which to perform a confirmation-selection action a_{v,k}. Transaction response node v_{k+1} then obtains a reward r_h. Using a greedy search strategy, the transaction response node randomly selects an action a_{v,k} ~ π(s_{v,k}) to confirm the transaction connection τ of the first-arriving transaction request node. The edge DAG blockchain network updates the state of its transaction nodes through the transition probability P_T(s_{v,k+1} | s_{v,k}, a_{v,k}). Transaction response node v_{k+1} then obtains an instantaneous reward r_h on transaction connection τ, which evaluates the effectiveness of the confirmation-selection action it has taken. If the delay is short and the smart contract condition is satisfied, the transaction response node first sends a certain number of bitcoins to transaction request node v_k; v_{k+1} then receives the model sent by v_k and starts training it with local data. After training for a certain time, v_{k+1} initiates a transaction request to v_{k+2}; node v_{k+2} observes the transaction state s_{v,k+1} of node v_{k+1} and performs confirmation-selection action a_{v,k+1}. Cooperative transaction request nodes are selected in this way until the maximum hop count is reached, at which point the node selection process ends. The goal of the reinforcement learning participant is to maximize the reward of each transaction. Thus, within one distributed federated learning task transaction, a complete trusted multi-hop computation task offload path can be discovered using reinforcement learning.

From formula (5), the instantaneous reward r_h obtained by transaction response node v_{k+1} on the h-th hop transaction connection is:

r_h(s_v,k+1,a_v,k,s_v,k)＝-C_v,h (7)
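Equation (7) and the discounted return G = γr₁ + γ²r₂ + ... quoted elsewhere in the text can be sketched as follows. The function names are hypothetical; the return follows the form quoted in the text, in which the first reward is already multiplied by γ.

```python
from typing import Sequence


def instant_reward(cost: float) -> float:
    """r_h = -C_{v,h}, equation (7): lower transaction cost means higher reward."""
    return -cost


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """G = gamma*r_1 + gamma^2*r_2 + ..., matching the form quoted in the text."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return sum(gamma ** (i + 1) * r for i, r in enumerate(rewards))
```

A small γ makes the selection of the current transaction request node dominate the return, while γ close to 1 gives later hops nearly equal weight.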

Accordingly, in one distributed federated learning task transaction, the cumulative reward of policy π brought by the confirmation selections of the transaction response nodes can be expressed as follows:

where γ is the discount factor of each hop transaction, indicating how important the selection of future transaction request nodes is to the selection of the current transaction request node. Once the computation task offloading reaches the preset maximum number of hops, the computation task offloading transaction stops. During multi-hop computation task offloading, the reinforcement learning participant records in the block the optimal computation task offloading transaction path

and the reward of each confirmation selection by the transaction response node. From the cumulative reward, the state value function starting from state s_{v,1} and following policy π can be defined as:

The online multi-hop computation task offloading method selects the optimal policy π* that maximizes the value function of every state, namely

Equation (10) uses the transition probability and the reward function to solve for π*(s), but modeling the transition probability and the reward function accurately is very difficult. In addition, changes in the transaction connection channel and the smart contract state are affected by the resource allocation and credibility tolerance of the edge DAG blockchain nodes. If the multi-hop computation task offloading transaction path is long, the transaction state space of the edge DAG blockchain nodes becomes complex and huge. The online computation task offloading decision problem of the invention can therefore be solved with a model-free reinforcement learning algorithm. In the proposed method, the policy is parameterized by a vector θ. At time t, when the state of transaction request node v_k is s_{v,k}, the probability that the transaction response node takes action a_{v,k} is:

π(a_v,k|s_v,k)＝P(a_t＝a_v,k|s_t＝s_v,k,θ_t＝θ) (11)

To learn the multi-hop computation task offloading policy parameter, the performance function of the policy parameter θ is defined as follows:

To maximize the reward of the edge DAG blockchain nodes, the transaction response node updates the parameter θ of L(θ) by stochastic gradient descent. The update equation of the parameter θ is:

where ξ_p is the learning rate. From the policy gradient theorem, one obtains:

where q_π(s_{v,k}, a_{v,k}) is the state-action value function of policy π, and G = γr₁ + γ²r₂ + ... is the discounted return. The parameter θ is updated using equation (14):

To further improve learning performance, the invention uses the Actor-Critic method to approximate both the policy and the value function: the value function is learned and used as the critic to update the policy. Let the estimated value function at state s_{v,k} be

where

is a learned parameter. The update equation for the policy parameter θ then becomes:

where, for the estimated value function

the parameter

is also updated, as follows:

where ξ_v is the learning rate; the update iterates on the squared error, with loss function

where

Since neural networks can approximate complex functions, the invention uses DNNs to learn the policy and value functions, thereby establishing a policy network and a value network. In the edge DAG blockchain network environment, the reinforcement-learning-based method for selecting trusted cooperative nodes in multi-hop computation task offloading therefore consists of two parts, as shown in FIG. 2: the Actor policy network, which updates the policy, and the Critic value network, which evaluates the value function and updates the policy.
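The Actor-Critic update can be illustrated with a deliberately small sketch. It replaces the DNNs of the invention with a linear-softmax actor and a linear critic so that the gradients can be written by hand; this substitution, the critic parameter name `w`, the feature vector, and the function names are all assumptions. The update rules are the textbook policy-gradient-with-baseline form, which may differ in detail from the patent's unreproduced equations.

```python
import math
from typing import List, Tuple


def softmax_policy(theta: List[List[float]], features: List[float]) -> List[float]:
    """pi(a | s, theta) for a linear-softmax actor; theta[a] is the weight row of action a."""
    logits = [sum(wt * x for wt, x in zip(row, features)) for row in theta]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def value_estimate(w: List[float], features: List[float]) -> float:
    """Linear critic v(s, w) = w . x(s)."""
    return sum(wi * x for wi, x in zip(w, features))


def actor_critic_step(theta: List[List[float]], w: List[float],
                      features: List[float], action: int, G: float,
                      xi_p: float, xi_v: float) -> Tuple[List[List[float]], List[float]]:
    """One update using the critic as baseline.

    delta  = G - v(s, w)
    w     <- w     + xi_v * delta * grad_w v(s, w)        (squared-error critic)
    theta <- theta + xi_p * delta * grad_theta ln pi(a | s, theta)
    """
    delta = G - value_estimate(w, features)
    probs = softmax_policy(theta, features)
    new_w = [wi + xi_v * delta * x for wi, x in zip(w, features)]
    new_theta = []
    for a, row in enumerate(theta):
        # d ln pi(action|s) / d theta[a] = (1[a == action] - pi(a|s)) * x
        coeff = (1.0 if a == action else 0.0) - probs[a]
        new_theta.append([wt + xi_p * delta * coeff * x for wt, x in zip(row, features)])
    return new_theta, new_w
```

With two actions (0: do not select, 1: select), an action that yields a return below the critic's estimate becomes less probable after the step, which is the behavior the critic-as-baseline scheme is meant to produce.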

A model-free reinforcement learning algorithm is used to solve for the optimization strategy

through the following specific steps:

(3-2) For each hop of computation task learning, the current transaction response node v_k+1 of the task offload observes and collects the transaction states s_v,k of the transaction request nodes v_k. The current policy network is used to compute the action policy π^*(s_v,k) over all current transaction request nodes v_k and the transaction response node v_k+1, and the instantaneous reward r_h is estimated, thereby determining the action a_v,k that selects one of the transaction request nodes v_k as the cooperative node. That node is updated to be the transaction response node and the experience cache is updated; this repeats until the maximum hop count is reached, and the per-hop action policies π^*(s_v,k) compose the optimization strategy

Using the current policy network to compute the optimization strategy π^*(s_v,k) of the current transaction request node v_k and all transaction response nodes specifically comprises:

π(a_v,k|s_v,k)＝P(a_t＝a_v,k|s_t＝s_v,k,θ_t＝θ)

The policy network employs a DNN architecture.
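By way of non-limiting illustration, the mapping from a request node's state to a selection probability, followed by choosing the most probable candidate, can be sketched as below. The sketch assumes a single hidden layer, a sigmoid output read as P(a = 1 | s, θ), and a two-element state vector (here a smart-contract compliance flag and a long-delay flag); the layout of `theta` and the names `policy_prob` and `select_cooperative_node` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def policy_prob(state, theta):
    """Probability of selecting a request node given its state.

    theta is an assumed parameter layout for a one-hidden-layer network:
    {"W1": hidden weight rows, "b1": hidden biases,
     "w2": output weights, "b2": output bias}.
    """
    hidden = [math.tanh(sum(w * s for w, s in zip(row, state)) + b)
              for row, b in zip(theta["W1"], theta["b1"])]
    logit = sum(w * h for w, h in zip(theta["w2"], hidden)) + theta["b2"]
    return sigmoid(logit)


def select_cooperative_node(candidate_states, theta):
    """Return the candidate with the highest selection probability.

    candidate_states maps a node identifier to its state vector.
    """
    probs = {node: policy_prob(state, theta)
             for node, state in candidate_states.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Example: v1 complies with the contract and has short delay; v2 does not.
example_theta = {"W1": [[1.0, -1.0]], "b1": [0.0], "w2": [2.0], "b2": 0.0}
example_states = {"v1": [1.0, 0.0], "v2": [0.0, 1.0]}
chosen, chosen_probs = select_cooperative_node(example_states, example_theta)
```

In this toy parameterization the compliant, low-delay candidate receives the higher probability and is therefore chosen.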

Updating the experience cache specifically comprises: recording in the experience cache the transaction request node state, the transaction response node state, the action value, and the instant reward r_h(s_v,k+1, a_v,k, s_v,k)＝-C_v,h.
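By way of non-limiting illustration, such an experience cache can be sketched as a bounded buffer of records whose reward is the negated cost. The fixed capacity, the tuple encoding of states, and the names `Experience` and `ExperienceCache` are assumptions made for this sketch only.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence, Tuple


@dataclass(frozen=True)
class Experience:
    """One cached hop: both node states, the action, and the reward."""
    request_state: Tuple[float, ...]
    response_state: Tuple[float, ...]
    action: int
    reward: float


class ExperienceCache:
    """Bounded cache; the oldest record is dropped once capacity is reached."""

    def __init__(self, capacity: int = 1000) -> None:
        self._items: Deque[Experience] = deque(maxlen=capacity)

    def record(self, request_state: Sequence[float],
               response_state: Sequence[float],
               action: int, cost: float) -> Experience:
        # The instant reward is the negated transaction cost: r_h = -C_v,h.
        item = Experience(tuple(request_state), tuple(response_state),
                          action, -float(cost))
        self._items.append(item)
        return item

    def rewards(self) -> List[float]:
        """Rewards in insertion order, oldest first."""
        return [item.reward for item in self._items]

    def __len__(self) -> int:
        return len(self._items)
```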

This can be expressed as the following algorithm:

Algorithm 1: Multi-hop computation task offloading transaction node confirmation and selection mechanism. Input: edge DAG blockchain nodes, cost function, computation task offloading transaction request node, maximum hop count ε, and reinforcement learning rates {ξ_p, ξ_v}.

Output: the set of multi-hop computation task offloading transaction nodes.

(4) According to the experience cache, update the learning parameter of the value function

and the task offloading parameter θ;

the learning parameter of the value function is updated as:

where ξ_v is the learning rate and the loss function is

is the output of the value function.

where ξ_p is the learning rate; G = γr₁ + γ²r₂ + ... is the discounted return, computed from the historical instant rewards stored in the experience cache; γ is the discount factor; the value of the estimated value function

is preferably estimated, using the parameter

by the value network.
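By way of non-limiting illustration, the discounted return G = γr₁ + γ²r₂ + ... can be computed from the cached instant rewards as sketched below. The sketch follows the series exactly as written in the text, so the first reward is already weighted by γ; the function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return G = gamma*r_1 + gamma^2*r_2 + ... for rewards in hop order."""
    total = 0.0
    for hop, reward in enumerate(rewards, start=1):
        total += (gamma ** hop) * reward
    return total


# Two hops with costs 1 and 2 (rewards -1 and -2) and discount factor 0.5.
example_return = discounted_return([-1.0, -2.0], 0.5)
```

Because each reward is a negated cost, a less negative return corresponds to a cheaper offloading path.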

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A system for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network, characterized by comprising sensing cloud edge nodes and blocks created in the edge nodes, wherein the edge nodes and the blockchain form an edge DAG blockchain network G_b＝(V_b, E_b), where V_b denotes the edge blockchain nodes participating in a computation task offloading transaction, which act as transaction request nodes and transaction response nodes when the computation task is offloaded; E_b denotes the transaction connections, the transaction connection established at the h-th hop being τ, meaning that the two parties transact according to the preset smart contract;

The transaction request node with the highest mapping probability is selected as the cooperative node and a transaction connection is established with it; the transaction response node serves as the transaction request node of the next hop; the model, training duration, and model size of each node's completed training task are recorded, and the trust level of each node is updated;

the optimization strategy is obtained by solving a reinforcement learning model, specifically by means of a policy network; the policy network is used to solve for the optimization strategy according to the current state of the DAG blockchain network;

the input to the policy network is: the transaction states of all transaction request nodes v = {v_k} observed by the current transaction response node v_k+1; the state s_v,k of transaction request node v_k represents the transaction state of v_k when it initiates the transaction, where

represents the state of the smart contract,

indicates compliance with the smart contract,

indicates a violation of the smart contract,

indicates whether, in the h-th hop transaction connection, the computation task offloading delay

is short or long: when

holds, the delay is long; otherwise, the delay is short;

is the computation task offloading transmission delay,

is the preset computation task offloading transmission delay threshold;

the output of the policy network is: the mapping probability P(a_t＝a_v,k|s_t＝s_v,k, θ_t＝θ) from the state s_v,k of each transaction request node v_k to action a_v,k, where θ is the offloading policy parameter of the policy network, and the optimization strategy π^*(a_v,k|s_v,k)＝P(a_t＝a_v,k|s_t＝s_v,k, θ_t＝θ) is established according to θ.

2. The system for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network according to claim 1, wherein the transaction request node is configured to initiate a transaction request φ_j＝(D_j, Y_j, Υ_j) to other sensing cloud edge nodes and to update the trust level upon receiving confirmation of the transaction request; in the transaction request φ_j＝(D_j, Y_j, Υ_j), D_j is the model size in bits; Y_j is the resources spent to complete the training task requested by the transaction; Υ_j is the number of bitcoins per unit of edge blockchain node resource consumed in training the model;

the transaction response node is configured to, upon receiving a transaction request, reversely judge the credibility of the transaction according to the smart contract: when the transaction offloaded by the transaction request cannot satisfy the conditions in the smart contract, it judges the credibility to be low and rejects the transaction request; otherwise, it confirms the transaction request and sends the transaction request confirmation, the smart contract, and the required number of bitcoins to the transaction request node;

the smart contract is SC＝{l(t)|t∈[t_min, t_max]}, where l(t) is the probability that the model training time t expected by the transaction response node falls within the trusted interval; the higher the value of l(t), the higher the degree of compliance with the smart contract for the computation task offloading transaction; t_min and t_max are the lower and upper limits of the confidence interval of the training time.

3. The system for trusted offload collaboration node selection in a sensor edge cloud blockchain network of claim 1, wherein the policy network is a model-free reinforcement learning architecture.

4. The system for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network according to claim 1, wherein the reward function used for training the policy network is r_h(s_v,k+1, a_v,k, s_v,k)＝-C_v,h, where C_v,h is the cost function of the sensing cloud edge node.

5. The system for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network according to claim 4, wherein the performance function of the multi-hop computation task offloading policy parameter θ is trained and updated using stochastic gradient descent, the performance function being specifically:

where ξ_p is the learning rate; G = γr₁ + γ²r₂ + ... is the discounted return, with the historical instant rewards r₁, r₂, ... read from the experience cache; γ is the discount factor; the value of the estimated value function

is estimated, using the parameter

by the value network.

6. The system of claim 5, wherein the input of the value network is the transaction state of the transaction response node and its output is the value

with the network parameter

whose update equation is as follows:

where ξ_v is the learning rate;

7. A method for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network, characterized by comprising the following steps:

(2) taking at least ε+1 edge nodes as candidate computation task offloading transaction nodes, obtaining their cost function C_v,h, and registering them in the DAG blockchain;

(3) according to the transaction state s_v,k of each edge node in the DAG blockchain obtained in step (1), using reinforcement learning to plan the optimization strategy of the computation task offloading path

and, according to the optimization strategy

formulating the offloading path action set

a transaction connection τ conforming to the smart contract is established between the transaction request node and the transaction response node at each hop, thereby forming the task offloading path;

wherein, in the optimization strategy

Pr denotes the mapping probability from state s_v,k to action a_v,k,

indicates that the transaction request node is not selected as a collaborator,

indicates that the transaction request node is selected as a collaborator; the state-action pair {a_v,k|s_v,k} denotes the confirmation-selection action a_v,k of transaction response node v_k+1 given the state s_v,k of transaction request node v_k.

8. The method for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network according to claim 7, wherein, for a transaction φ_j＝(D_j, Y_j, Υ_j), D_j denotes the model size in bits; Y_j denotes the resources required to complete the training task; Υ_j denotes the number of bitcoins per unit of edge blockchain node resource consumed in training the model; the cost function C_v,h of a sensing cloud edge node is the computation task offloading transaction cost of the h-th hop transaction connection when the node acts as transaction response node v_k+1 and selects transaction request node v_k; it includes delay and credibility tolerance, and is calculated as follows:

is the computation task offloading transmission delay,

x_k+1,k indicates whether the model trained by transaction request node v_k is confirmed, accepted, and offloaded to transaction response node v_k+1 for processing: x_k+1,k＝1 indicates that the computation task of the transaction request node is offloaded to the transaction response node for processing; otherwise x_k+1,k＝0,

is the achievable computation task offloading transmission rate on transaction connection τ, where B denotes the bandwidth, p_k the transmission power, σ² the noise power, and g_k the channel gain, which reflects the transmission loss from the transaction request node to the response node;

is the task queue waiting time of all nodes in the h-th hop transaction connection,

is the average arrival rate of task offloading, where M denotes the number of offloads in the h-th hop transaction connection; x_j＝1 indicates a successful offload, otherwise x_j＝0; I_{*} is an indicator function: I_{*}＝1 if the condition holds, otherwise I_{*}＝0; the task amount z_h already present at the current edge DAG blockchain transaction node follows a Poisson distribution with parameter δ_h, i.e.

9. The method for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network according to claim 7, wherein the objective of the reinforcement learning in step (3) is to minimize the computation task offloading transaction cost while meeting the delay-sensitivity requirement of the computation task offloading and complying with the smart contract, expressed as:

MTOR：min C_o

a_v＝{a_v，1，a_v，2，...，a_v，ε}

SC＝{l(t)|t∈[t_min，t_max]}

where C_o is the computation task offloading transaction cost, a_v＝{a_v,1, a_v,2, ..., a_v,ε} is the action set, and SC＝{l(t)|t∈[t_min, t_max]} is the smart contract;

where r_h denotes the instantaneous reward function of each hop and γ is the discount factor, with r_h(s_v,k+1, a_v,k, s_v,k)＝-C_v,h;

Treating the strategy optimization as a Markov process and adopting a greedy algorithm, the action policy π^*(s_v,k) that maximizes the instantaneous reward function at each hop is obtained; recording the action policy of the h-th hop, the optimization strategy is obtained as:

where P_T is the transition probability, γ is the discount factor, and V(s_v,k+1|π^*) is the state value function under the optimal policy π^*, defined as:

updating the experience cache specifically comprises: recording in the experience cache the transaction request node state, the transaction response node state, the action value, and the instant reward r_h(s_v,k+1, a_v,k, s_v,k)＝-C_v,h.

10. The method for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network according to claim 8, wherein step (3) uses a model-free reinforcement learning algorithm to solve for the optimization strategy

through the following specific steps:

(3-2) for each hop of computation task learning, the current transaction response node v_k+1 of the task offload observes and collects the transaction states s_v,k of the transaction request nodes v_k; the current policy network is used to compute the action policy π^*(s_v,k) over all current transaction request nodes v_k and the transaction response node v_k+1, and the instantaneous reward r_h is estimated, thereby determining the action a_v,k that selects one of the transaction request nodes v_k as the cooperative node; that node is updated to be the transaction response node and the experience cache is updated; this repeats until the maximum hop count is reached, and the per-hop action policies π^*(s_v,k) compose the optimization strategy

using the current policy network to compute the optimization strategy π^*(s_v,k) of the current transaction request node v_k and all transaction response nodes specifically comprises:

when the state of transaction request node v_k is s_v,k, the probability that the transaction response node takes action a_v,k is:

π(a_v,k|s_v,k)＝P(a_t＝a_v,k|s_t＝s_v,k, θ_t＝θ)

where P is the probability that action a_v,k is taken in state s_v,k, and θ is the policy network parameter;

according to the optimization strategy π(s_v,k), the current-hop response node reversely selects the cooperative node as follows: the transaction request node with the highest probability of a_v,k＝1 is selected as the cooperative node, and the transaction request it initiated is confirmed;

the instant reward is estimated as follows: r_h(s_v,k+1, a_v,k, s_v,k)＝-C_v,h;

11. The method for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network according to claim 8, further comprising the step of: (4) according to the experience cache, updating the learning parameter of the value function

and the task offloading parameter θ;

the learning parameter of the value function is updated as:

where ξ_v is the learning rate and the loss function is

is the output of the value function;

updating the task offloading parameter θ specifically comprises: according to the data recorded in the experience cache, training and updating the performance function of the multi-hop computation task offloading policy parameter θ using stochastic gradient descent, the performance function being specifically:

to accelerate training of the policy network, a value network is added to update the multi-hop computation task offloading policy parameter θ; the update equation of the policy parameter θ is:

the value of which is preferably estimated, using the parameter

by the value network.