CN112202928B - Credible unloading cooperative node selection system and method for sensing edge cloud block chain network - Google Patents
- Publication number
- CN112202928B (application CN202011276468.5A)
- Authority
- CN
- China
- Prior art keywords
- transaction
- node
- task
- hop
- nodes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16Y—INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
- G16Y10/00—Economic sectors
- G16Y10/75—Information technology; Communication
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16Y—INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
- G16Y40/00—IoT characterised by the purpose of the information processing
- G16Y40/50—Safety; Security of things, users, data or systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/50—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using hash chains, e.g. blockchains or hash trees
Abstract
The invention discloses a system and a method for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network. The system includes sensor cloud edge nodes and blocks created in the edge nodes, the edge nodes and the blockchain forming an edge DAG blockchain network. The method comprises the following steps: (1) acquiring a training task issued by a training task issuing node; (2) taking at least $\varepsilon + 1$ edge nodes as candidate computation task offloading transaction nodes, obtaining the cost function $C_{v,h}$ of these nodes, and registering them in the DAG blockchain; (3) using reinforcement learning to compute an optimization policy for the task offloading path in the DAG blockchain according to the transaction state of each edge node, and formulating the offloading path action set according to the optimization policy. For cooperative offloading of multi-hop computation tasks, the invention establishes a cooperative offloading model based on an edge DAG blockchain, in which the nodes participating in cooperative offloading are registered in the edge DAG blockchain network and cooperatively complete multi-hop distributed federated learning tasks.
Description
Technical Field
The invention belongs to the technical field of the Internet of Things, and particularly relates to a system and a method for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network.
Background
To reduce the dependence of computation task offloading in the sensor edge cloud on the remote cloud, computation tasks are offloaded among edge nodes through multi-hop cooperation to complete model training. The multi-hop cooperative offloading process comprises two stages, computation task transmission and distributed model training. It scales well and is robust, and it can better support distributed federated learning of computation tasks while protecting the private data of nodes. However, as the number of hops and nodes increases, optimizing the quality of service of trusted computation task offloading faces a number of challenges.
Model training based on multi-hop computation task offloading can effectively avoid single points of failure, makes full use of the local data of edge nodes to train the model in a distributed manner, and can effectively improve the performance of federated learning. However, distributed federated learning by means of multi-hop computation task offloading faces security issues. Because edge nodes are selfish, the training of a computation task may fail to reach the expected training accuracy, or malicious nodes may modify the trained model to deceive their cooperating nodes and mislead the next-hop node into continuing inefficient training. This makes the behavior of the participating edge nodes untrustworthy, prevents low-delay trusted cooperation, and reduces the quality of service of computation task offloading and distributed federated learning. The key challenge is therefore how to balance the offloading delay of the multi-hop computation task against trusted cooperation when edge nodes make cooperation decisions, so as to improve the quality of service of the multi-hop offloading path. To address this challenge, researchers have proposed several methods for cooperative offloading of computation tasks. Yan et al. consider the task graph of a single-user edge computing system and propose a reinforcement learning framework to optimize the task offloading decision between local and edge nodes together with resource allocation, but this scheme does not consider the multi-hop computation task offloading scenario ("Offloading and resource allocation with general task graph in mobile edge computing: A deep reinforcement learning approach," in IEEE Transactions on Wireless Communications, vol. 19, no. 8, pp. 5404-5419, Aug. 2020).
Hong et al. model the optimization of the computation task offloading path across edge nodes and cloud nodes as a multi-hop computation task offloading game and propose a QoS-aware distributed algorithm, but do not consider the trust problem of cooperative offloading between nodes ("Multi-hop cooperative computation offloading for industrial IoT-edge-cloud computing environments," in IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 12, pp. 2759-2774, 1 Dec. 2019). L. Xiao et al. propose a blockchain-based trust mechanism to resist selfish edge attacks and spoofing record attacks, and enhance the security of computation task offloading between mobile devices and edge nodes by establishing reputation, but do not secure multi-hop computation task offloading among edge nodes ("A reinforcement learning and blockchain-based trust mechanism for edge networks," in IEEE Transactions on Communications, vol. 68, no. 9, pp. 5460-5470, Sept. 2020). These research schemes also suffer from the following deficiencies:
(1) The proposed solutions give little consideration to multi-hop computation task offloading and cooperative training among multi-hop edge nodes. They consider only the single-hop offloading performance from the sensing device to the edge node and cannot support multi-hop distributed federated learning, so they are of limited use for cooperative offloading of multi-hop distributed computation tasks.
(2) The proposed solutions do not use blockchain technology to achieve trusted cooperative offloading of multi-hop computation tasks. In particular, as the number of nodes and hops increases, the decision space of trusted cooperation and offloading delay among nodes grows, and the existing solutions provide no corresponding processing method.
(3) The existing solutions do not consider an intelligent attacker who attacks multi-hop computation task offloading nodes by means such as prolonging computation time and modifying the model, and provide no trusted cooperative node selection method for multi-hop computation task offloading against this type of attack.
Disclosure of Invention
To overcome the above deficiencies, the invention provides a multi-hop computation task offloading method based on a DAG blockchain in a sensor edge cloud environment, which achieves low-delay trusted cooperative offloading while accounting for untrustworthy behaviors of malicious edge nodes such as prolonging computation time and modifying the model.
To achieve the above object, according to one aspect of the present invention, a system for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network is provided. The system includes sensor cloud edge nodes and blocks created in the edge nodes, where the edge nodes and the blockchain form an edge DAG blockchain network $G_b = (V_b, E_b)$, in which $V_b$ is the set of edge blockchain nodes that participate in a computation task offloading transaction and serve as transaction request nodes and transaction response nodes during offloading, and $E_b$ is the set of transaction connections, the connection established at the h-th hop being denoted $\tau$, meaning that the two parties transact according to the preset smart contract;
the block of each edge node immutably stores the model produced by the training task, the training time, and the model size;
the blockchain network is used to execute actions according to the optimization policy, so that the transaction response node selects, from all transaction request nodes $v = \{v_k\}$, the transaction request node whose action value $a_{v,k} = 1$ has the highest mapping probability as the cooperative node and establishes a transaction connection with it; the transaction response node then acts as the transaction request node of the next hop, the model, training duration, and model size resulting from each node's training task are recorded, and the trust of each node is updated.
Preferably, in the system for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network, the transaction request node is configured to initiate a transaction request $\phi_j = (D_j, Y_j, \Upsilon_j)$ to other sensor cloud edge nodes and to update the trust level when it receives confirmation of the transaction request. In the transaction request $\phi_j = (D_j, Y_j, \Upsilon_j)$, $D_j$ is the model size in bits, $Y_j$ is the resources spent to complete the training task requested by the transaction, and $\Upsilon_j$ is the number of bitcoins per unit resource of the edge blockchain node consumed in training the model.
The transaction response node is configured, on receiving a transaction request, to judge in turn the trustworthiness of the transaction according to the smart contract. When the transaction offloaded by the transaction request fails to satisfy the conditions in the smart contract, the node judges the trustworthiness to be low and rejects the transaction request; otherwise it confirms the transaction request and sends the confirmation, the smart contract, and the required number of bitcoins to the transaction request node;
the smart contract is $SC = \{l(t) \mid t \in [t_{\min}, t_{\max}]\}$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the trusted interval; the larger the value of $l(t)$, the higher the compliance of the computation task offloading transaction with the smart contract; $t_{\min}$ and $t_{\max}$ are the lower and upper limits of the trusted interval for the training time.
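The compliance check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not say how $l(t)$ is computed, so the sketch assumes the expected training time is normally distributed, and the names `contract_compliance`, `respond_to_request`, and the threshold `min_compliance` are hypothetical.

```python
import math


def normal_cdf(x: float, mean: float, std: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def contract_compliance(t_min: float, t_max: float, mean: float, std: float) -> float:
    """l(t): probability that the training time falls inside [t_min, t_max].

    Assumption (not from the source): training time ~ Normal(mean, std).
    """
    return normal_cdf(t_max, mean, std) - normal_cdf(t_min, mean, std)


def respond_to_request(l_t: float, min_compliance: float = 0.5) -> str:
    """Transaction response node decision: confirm when compliance is high enough.

    min_compliance is a hypothetical threshold chosen for illustration.
    """
    return "confirm" if l_t >= min_compliance else "reject"
```

A request whose expected training time sits well inside the trusted interval yields a high $l(t)$ and is confirmed; one whose expected time lies far outside it, for example because a malicious node prolongs computation, yields a low $l(t)$ and is rejected.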
Preferably, the system for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network comprises a policy network; the policy network is used to solve for the optimization policy according to the current state of the DAG blockchain network, and preferably has a model-free reinforcement learning structure, preferably a DNN.
Preferably, in the system for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network, the input of the policy network is the transaction states of all transaction request nodes $v = \{v_k\}$ observed by the current transaction response node $v_{k+1}$. The state $s_{v,k}$ of transaction request node $v_k$ represents its transaction state at the time it initiates the transaction and has two components: a smart contract component, which indicates whether the node complies with or violates the smart contract, and a delay component, which indicates whether the computation task offloading delay on the h-th hop transaction connection is short or long. The delay is long when the computation task offloading transmission latency exceeds a preset transmission delay threshold, and short otherwise;
the output of the policy network is the mapping probability $P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$ from the state $s_{v,k}$ of each transaction request node $v_k$ to the action $a_{v,k}$, where $\theta$ is the offloading policy parameter of the policy network; the optimization policy is established from $\theta$ as $\pi^*(a_{v,k} \mid s_{v,k}) = P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$;
The reward function used to train the policy network is $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$, where $C_{v,h}$ is the cost function of the sensor cloud edge node. Preferably, the multi-hop computation task offloading policy parameter $\theta$ is trained and updated by stochastic gradient descent on a performance function, which is specifically as follows:
To accelerate training of the policy network, a value network is added when updating the multi-hop computation task offloading policy parameter $\theta$; the update equation for $\theta$ is:
where $\xi_p$ is the learning rate; $G = \gamma r_1 + \gamma^2 r_2 + \ldots$ is the discounted return; $r_1, r_2, \ldots$ are the historical instant rewards read from the experience cache; and $\gamma$ is the discount factor. The value of the estimated value function is preferably estimated by a parameterized value network.
Preferably, the system for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network comprises a value network, preferably a DNN, whose input is the transaction state of the transaction response node and whose output is the estimated value; its network parameters are updated by the following equation:
where $\xi_v$ is the learning rate.
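As a hedged sketch only, one standard form consistent with the symbols defined here ($\theta$, $\xi_p$, $\xi_v$, $G$, and a value network used to speed up policy training) is the policy gradient with a learned value baseline. The symbol $\varphi$ for the value network parameters and the notation $\hat{V}$ for the estimated value function are placeholders introduced for illustration; whether the patent's equations take exactly this form is an assumption.

```latex
% Performance function: expected discounted return under the policy
J(\theta) = \mathbb{E}_{\pi_\theta}\left[ G \right],
\qquad G = \gamma r_1 + \gamma^2 r_2 + \ldots

% Policy update with the value network as baseline
\theta \leftarrow \theta
  + \xi_p \left( G - \hat{V}(s_{v,k}; \varphi) \right)
    \nabla_\theta \ln P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)

% Value network update (gradient step on the squared error)
\varphi \leftarrow \varphi
  + \xi_v \left( G - \hat{V}(s_{v,k}; \varphi) \right)
    \nabla_\varphi \hat{V}(s_{v,k}; \varphi)
```

Subtracting $\hat{V}$ from $G$ leaves the expected gradient unchanged but reduces its variance, which is the usual reason a value network accelerates policy training.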
According to another aspect of the present invention, a method for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network is provided, which includes the following steps:
(1) acquiring the training task $\gamma_n = \{w_n\}$ issued by the training task issuing node, the maximum number of training hops $\varepsilon$ it sets, the tolerance parameter $\lambda_d$ for delay time in the task offloading transaction process, and the trust tolerance parameter $\lambda_s$ of the task smart contract;
(2) taking at least $\varepsilon + 1$ edge nodes as candidate computation task offloading transaction nodes, obtaining the cost function $C_{v,h}$ of these nodes, and registering them in the DAG blockchain;
(3) according to the transaction state $s_{v,k}$ of each edge node in the DAG blockchain obtained in step (1), using reinforcement learning to compute the optimization policy $\pi^*$ of the task offloading path, formulating the offloading path action set $a_v$ according to the optimization policy, and establishing, between the transaction request node and the transaction response node of each hop, a transaction connection $\tau$ that complies with the smart contract, thereby forming the task offloading path;
where, in the optimization policy, Pr denotes the mapping probability from state $s_{v,k}$ to action $a_{v,k}$; the policy is the optimal set of confirmation and selection actions taken by the transaction response node $v_{k+1}$ toward the transaction request nodes $v_k$ in the computation task offloading transaction; the action $a_{v,k} = 0$ means that the transaction request node is not selected as a collaborator, and $a_{v,k} = 1$ means that it is selected as a collaborator; the state-action pair $\{a_{v,k} \mid s_{v,k}\}$ denotes the confirmation and selection action $a_{v,k}$ of the transaction response node $v_{k+1}$ given the state $s_{v,k}$ of the transaction request node $v_k$.
Preferably, in the method for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network, the transaction is $\phi_j = (D_j, Y_j, \Upsilon_j)$, where $D_j$ is the model size in bits, $Y_j$ is the resources to be spent to complete the training task, and $\Upsilon_j$ is the number of bitcoins per unit resource of the edge blockchain node consumed in training the model. The cost function $C_{v,h}$ of a sensor cloud edge node is the computation task offloading transaction cost on the h-th hop transaction connection when the node, acting as transaction response node $v_{k+1}$, selects transaction request node $v_k$; it comprises delay and trust tolerance terms and is calculated as follows:
where $\lambda_d$ is the tolerance parameter for delay time in the multi-hop computation task offloading transaction process and $\lambda_s$ is the trust tolerance parameter of the smart contract. The remaining quantities are: the computation task offloading transmission latency; $x_{k+1,k}$, which indicates whether the model trained by transaction request node $v_k$ is confirmed, accepted, and offloaded to transaction response node $v_{k+1}$ for processing, with $x_{k+1,k} = 1$ if the computation task of the transaction request node is offloaded to the transaction response node for processing and $x_{k+1,k} = 0$ otherwise; the available offloading transfer rate of the task on transaction connection $\tau$, where $B$ is the bandwidth, $p_k$ the transmission power, $\sigma^2$ the noise power, and $g_k$ the channel gain reflecting the transmission loss from the transaction request node to the response node; the execution time of the task in the h-th hop offloading, where $L_j$ is the total computational load and $f_c$, the service rate of each CPU core, is a configurable variable; and the task queue waiting time of all nodes in the h-th hop transaction connection, which involves the number of resources these nodes require to process the tasks in the queue, the service rate $f_c$ of each CPU core, the average arrival rate of task offloading, and the number $M$ of offloads in the h-th hop transaction connection. $x_j = 1$ indicates a successful offload; otherwise $x_j = 0$.
$I_{\{*\}}$ is an indicator function: $I_{\{*\}} = 1$ if the condition is true, and $I_{\{*\}} = 0$ otherwise. The amount of tasks $z_h$ already present in the current edge DAG blockchain transaction node follows a Poisson distribution with service parameter $\delta_h$. $\Phi_h = 1 - l(t)$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the trusted interval; the larger the value of $l(t)$, the higher the compliance of the computation task offloading transaction with the smart contract on the transaction connection.
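A per-hop cost of this shape can be sketched as below. The sketch is illustrative and rests on stated assumptions: the transfer rate is taken to be the Shannon capacity $B \log_2(1 + p_k g_k / \sigma^2)$, which matches the listed symbols but is not confirmed by the text; the cost is taken to be a weighted sum of total delay and $\Phi_h = 1 - l(t)$; and the queue waiting time is passed in as a number rather than derived from the Poisson model. The function names are hypothetical.

```python
import math


def transfer_rate(bandwidth_hz: float, tx_power: float,
                  channel_gain: float, noise_power: float) -> float:
    """Available offloading rate on a transaction connection.

    Assumption: Shannon capacity B * log2(1 + p_k * g_k / sigma^2).
    """
    return bandwidth_hz * math.log2(1.0 + tx_power * channel_gain / noise_power)


def hop_cost(model_bits: float, rate_bps: float, load: float, cpu_rate: float,
             queue_wait: float, l_t: float, lambda_d: float, lambda_s: float,
             offloaded: int = 1) -> float:
    """Hypothetical C_{v,h}: lambda_d * delay + lambda_s * (1 - l(t))."""
    transmission = offloaded * model_bits / rate_bps  # x_{k+1,k} * D_j / rate
    execution = load / cpu_rate                       # L_j / f_c
    delay = transmission + execution + queue_wait
    phi = 1.0 - l_t                                   # Phi_h = 1 - l(t)
    return lambda_d * delay + lambda_s * phi
```

With this form, raising $\lambda_s$ makes contract violations dominate the cost, while raising $\lambda_d$ favors request nodes on fast, lightly loaded connections.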
Preferably, in the method for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network, the objective of the reinforcement learning in step (3) is to minimize the computation task offloading transaction cost while meeting the delay sensitivity requirement of computation task offloading and complying with the smart contract, written as:
MTOR: $\min C_o$
$a_v = \{a_{v,1}, a_{v,2}, \ldots, a_{v,\varepsilon}\}$
$SC = \{l(t) \mid t \in [t_{\min}, t_{\max}]\}$
where $C_o$ is the computation task offloading transaction cost, $a_v = \{a_{v,1}, a_{v,2}, \ldots, a_{v,\varepsilon}\}$ is the action set, and $SC = \{l(t) \mid t \in [t_{\min}, t_{\max}]\}$ is the smart contract.
The cumulative reward function used by the reinforcement learning is as follows:
where $r_h$ is the instantaneous reward function of each hop and $\gamma$ is the discount factor, with $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$.
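As a minimal sketch, assuming the cumulative reward takes the same discounted form $G = \gamma r_1 + \gamma^2 r_2 + \ldots$ that this document gives for the discounted return, it can be computed from the per-hop costs as follows. The function names are hypothetical.

```python
from typing import Sequence


def hop_reward(cost: float) -> float:
    """Instant reward of one hop: r_h = -C_{v,h}."""
    return -cost


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """G = gamma * r_1 + gamma^2 * r_2 + ... over the hops of one path."""
    total = 0.0
    for hop, reward in enumerate(rewards, start=1):
        total += (gamma ** hop) * reward
    return total
```

Because every reward is a negated cost, maximizing this return is the same as minimizing the discounted offloading transaction cost along the path.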
Preferably, a greedy algorithm is adopted, treating the policy optimization as a Markov process, to obtain the action policy $\pi^*(s_{v,k})$ that maximizes the instantaneous reward function of each hop; the optimization policy obtained for the h-th hop is written as:
where $P_T$ is the transition probability, $\gamma$ is the discount factor, and $V(s_{v,k+1} \mid \pi^*)$ is the state value function under the optimal policy $\pi^*$, defined as:
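For orientation, the textbook Bellman optimality form that uses exactly these ingredients ($P_T$, $\gamma$, $r_h$, and $V(\cdot \mid \pi^*)$) is shown below. It is offered as an assumption about the general shape of the two equations, in the common convention where the reward of the current hop is undiscounted, and is not a reconstruction of the patent's own formulas.

```latex
% Greedy action policy at hop h
\pi^*(s_{v,k}) = \arg\max_{a_{v,k}}
  \sum_{s_{v,k+1}} P_T(s_{v,k+1} \mid s_{v,k}, a_{v,k})
  \left[ r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) + \gamma \, V(s_{v,k+1} \mid \pi^*) \right]

% State value function under the optimal policy
V(s_{v,k} \mid \pi^*) = \mathbb{E}\left[ \sum_{i \ge 0} \gamma^{i} r_{h+i}
  \,\middle|\, s_{v,k}, \pi^* \right]
```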
Preferably, in the method for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network, step (3) solves for the optimization policy $\pi^*$ with a model-free reinforcement learning algorithm, in the following specific steps:
(3-1) initializing the task offloading parameter $\theta$ to obtain the current policy network, that is, taking the most recently updated task offloading parameter $\theta$ as the parameter of the current policy network;
(3-2) for each hop of computation task learning, the current transaction response node $v_{k+1}$ of the task offloading observes and collects the transaction states $s_{v,k}$ of the transaction request nodes $v_k$; the current policy network is used to compute the action policy $\pi^*(s_{v,k})$ between all current transaction request nodes $v_k$ and the transaction response node $v_{k+1}$ and to estimate the instant reward $r_h$, thereby determining the action $a_{v,k}$ that selects one of the transaction request nodes $v_k$ as the cooperative node; the node is then updated as a transaction request node and the experience cache is updated; this is repeated until the maximum number of hops is reached, and the per-hop action policies $\pi^*(s_{v,k})$ together make up the optimization policy.
Using the current policy network to compute the optimization policy $\pi^*(s_{v,k})$ between the current transaction request nodes $v_k$ and the transaction response node specifically includes:
when the state of transaction request node $v_k$ is $s_{v,k}$, the probability that the transaction response node takes action $a_{v,k}$ is:
$\pi(a_{v,k} \mid s_{v,k}) = P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$
where $P$ is the probability that action $a_{v,k}$ is taken in state $s_{v,k}$, and $\theta$ is the policy network parameter.
Preferably, the policy network employs a DNN architecture.
The specific step by which the optimization policy $\pi(s_{v,k})$ selects the cooperative node in turn at the current-hop response node is: select the transaction request node with the highest probability of $a_{v,k} = 1$ as the cooperative node and confirm the transaction request it initiated.
The instant reward is estimated as $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$.
Updating the node as a transaction request node specifically includes: the selected transaction request node updates the processing time of the computation task in its node block, and the transaction response node then acts as the transaction request node to issue the next-hop transaction request.
Preferably, in the method for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network, updating the experience cache in step (3) specifically means recording in the experience cache the transaction request node state, the transaction response node state, the action value, and the instant reward $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$.
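One hop of this selection loop can be sketched as follows. The data layout is an assumption made for illustration: the policy network's outputs are given as a dictionary of probabilities $P(a_{v,k} = 1 \mid s_{v,k})$ per request node, states are tuples, and `Experience`, `select_cooperative_node`, and `run_hop` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Experience:
    """One experience cache record for a hop."""
    request_state: Tuple[int, int]
    response_state: Tuple[int, int]
    action: int
    reward: float


def select_cooperative_node(select_probs: Dict[str, float]) -> str:
    """Pick the request node with the highest probability of a_{v,k} = 1."""
    return max(select_probs, key=select_probs.get)


def run_hop(select_probs: Dict[str, float],
            request_states: Dict[str, Tuple[int, int]],
            response_state: Tuple[int, int],
            costs: Dict[str, float],
            cache: List[Experience]) -> str:
    """Select the cooperative node, record the experience, return the node."""
    chosen = select_cooperative_node(select_probs)
    cache.append(Experience(
        request_state=request_states[chosen],
        response_state=response_state,
        action=1,
        reward=-costs[chosen],  # r_h = -C_{v,h}
    ))
    return chosen
```

Calling `run_hop` once per hop, up to the maximum hop count $\varepsilon$, and switching the roles of the nodes between calls as described above yields the offloading path, while `cache` accumulates the records used later for training.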
Preferably, the method for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network further includes step (4): updating the learning parameter of the value function and the task offloading parameter $\theta$ according to the experience cache;
updating the learning parameter of the value function specifically includes: iteratively updating it using the squared error, with the following update equation:
Updating the task offloading parameter $\theta$ specifically includes: according to the data recorded in the experience cache, training and updating the multi-hop computation task offloading policy parameter $\theta$ by stochastic gradient descent on a performance function, which is specifically as follows:
To accelerate training of the policy network, a value network is added when updating the multi-hop computation task offloading policy parameter $\theta$; the update equation for $\theta$ is:
where $\xi_p$ is the learning rate; $G = \gamma r_1 + \gamma^2 r_2 + \ldots$ is the discounted return, computed from the historical instant rewards stored in the experience cache; and $\gamma$ is the discount factor. The value of the estimated value function is preferably estimated by a parameterized value network.
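The following sketch shows one update step of a policy gradient with a value baseline, the general technique this step describes. It deliberately swaps the DNNs for a logistic policy and a linear value function over a two-component state (contract compliance, short delay) so that the gradients can be written by hand; that substitution, the feature layout, and all function names are assumptions made for illustration.

```python
import math
from typing import List, Tuple

State = Tuple[int, int]  # (complies with contract, delay is short)


def features(state: State) -> List[float]:
    """Bias term plus the two binary state components."""
    return [1.0, float(state[0]), float(state[1])]


def select_prob(theta: List[float], state: State) -> float:
    """P(a = 1 | s, theta) under a logistic stand-in for the policy network."""
    z = sum(t * x for t, x in zip(theta, features(state)))
    return 1.0 / (1.0 + math.exp(-z))


def value(phi: List[float], state: State) -> float:
    """Linear stand-in for the value network."""
    return sum(w * x for w, x in zip(phi, features(state)))


def update(theta: List[float], phi: List[float], state: State, action: int,
           g: float, lr_policy: float, lr_value: float):
    """One step: theta += xi_p * (G - V) * grad ln pi; phi += xi_v * (G - V) * grad V."""
    x = features(state)
    advantage = g - value(phi, state)
    p = select_prob(theta, state)
    # For a logistic policy, grad ln pi(a|s) = (a - p) * x.
    new_theta = [t + lr_policy * advantage * (action - p) * xi
                 for t, xi in zip(theta, x)]
    # Gradient step on the squared error (G - V)^2 / 2.
    new_phi = [w + lr_value * advantage * xi for w, xi in zip(phi, x)]
    return new_theta, new_phi
```

When the return $G$ observed after selecting a node is worse than the value estimate, the advantage is negative, so the probability of selecting a node in that state drops and the value estimate moves toward $G$.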
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
For the multi-hop computation task cooperative offloading scenario, a cooperative offloading model based on an edge DAG blockchain is established; the nodes participating in cooperative offloading are registered in the edge DAG blockchain network and cooperatively complete multi-hop distributed federated learning tasks.
To achieve low-delay, trusted cooperative offloading of multi-hop computation tasks, a multi-hop computation task offloading delay cost function and a smart contract model are established in the edge DAG blockchain network.
To solve the problem of confirming and selecting transaction nodes in a multi-hop computation task offloading path, the invention models the problem as a Markov decision process in which transaction request nodes are selected in turn by the response node over the DAG blockchain, and further provides a reinforcement-learning-based cooperative node selection algorithm for multi-hop computation task offloading.
Drawings
Fig. 1 is a schematic structural diagram of the trusted offloading cooperative node selection system in a sensor edge cloud blockchain network provided by the present invention;
Fig. 2 is a schematic structural diagram of the trusted offloading cooperative node selection system in a sensor edge cloud blockchain network according to an embodiment of the present invention.
The same reference numbers will be used throughout the drawings to refer to the same or like elements or structures, wherein:
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The system for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network, as shown in Fig. 1, includes sensor cloud edge nodes and blocks created in the edge nodes, where the edge nodes and the blockchain form an edge DAG blockchain network $G_b = (V_b, E_b)$, in which $V_b$ is the set of edge blockchain nodes that participate in a computation task offloading transaction and serve as transaction request nodes and transaction response nodes during offloading, and $E_b$ is the set of transaction connections, the connection established at the h-th hop being denoted $\tau$, meaning that the two parties may transact only in compliance with the preset smart contract; the system preferably further includes a policy network and, more preferably, a value network;
the block of each edge node immutably stores the model produced by the training task, the training time, and the model size;
the transaction request node is configured to initiate a transaction request $\phi_j = (D_j, Y_j, \Upsilon_j)$ to other sensor cloud edge nodes and to update the trust level when it receives confirmation of the transaction request. In the transaction request $\phi_j = (D_j, Y_j, \Upsilon_j)$, $D_j$ is the model size in bits, $Y_j$ is the resources spent to complete the training task requested by the transaction, and $\Upsilon_j$ is the number of bitcoins per unit resource of the edge blockchain node consumed in training the model.
The transaction response node is configured, on receiving a transaction request, to judge in turn the trustworthiness of the transaction according to the smart contract. When the transaction offloaded by the transaction request fails to satisfy the conditions in the smart contract, the node judges the trustworthiness to be low and rejects the transaction request; otherwise it confirms the transaction request and sends the confirmation, the smart contract, and the required number of bitcoins to the transaction request node.
The smart contract is $SC = \{l(t) \mid t \in [t_{\min}, t_{\max}]\}$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the trusted interval; the larger the value of $l(t)$, the higher the compliance of the computation task offloading transaction with the smart contract; $t_{\min}$ and $t_{\max}$ are the lower and upper limits of the trusted interval for the training time.
The blockchain network is used to execute actions according to the optimization policy, so that the transaction response node selects, from all transaction request nodes $v = \{v_k\}$, the transaction request node whose action value $a_{v,k} = 1$ has the highest mapping probability as the cooperative node and establishes a transaction connection with it; the transaction response node then acts as the transaction request node of the next hop, the model, training duration, and model size resulting from each node's training task are recorded, and the trust of each node is updated.
The optimization strategy is preferably obtained by solving using an enhanced learning model, and preferably can be solved by adopting a strategy network, as shown in fig. 2.
The strategy network is used for solving an optimization strategy according to the current state of the DAG block chain network, preferably a model-free reinforcement learning structure, preferably a DNN network; specifically, the method comprises the following steps:
the input of the policy network is the transaction states of all transaction request nodes $v = \{v_k\}$ observed by the current transaction response node $v_{k+1}$. The state $s_{v,k}$ of transaction request node $v_k$ is its transaction state at the time it initiates the transaction, and has two components: the smart-contract state, which indicates whether the transaction complies with or violates the smart contract, and the delay state, which indicates whether the computation task offloading delay on the $h$-th hop of the transaction connection is short or long. The delay is long when the computation task offloading transmission delay exceeds a preset transmission delay threshold, and short otherwise;
the output of the policy network is, for each transaction request node $v_k$, the mapping probability $P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$ from state $s_{v,k}$ to action $a_{v,k}$, where $\theta$ is the offloading policy parameter of the policy network; the optimization policy is established from $\theta$ as $\pi^*(a_{v,k} \mid s_{v,k}) = P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$;
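The input/output contract of the policy network can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the state is encoded as a pair `(contract_ok, delay_long)` of 0/1 flags, the network has a single small hidden layer, and the names `init_policy`, `select_probability`, and `action_distribution` are hypothetical. The text only specifies that a DNN maps a state to action probabilities.

```python
import math
import random

def init_policy(n_inputs=2, n_hidden=4, seed=0):
    """Random small weights for a one-hidden-layer network (illustrative sizes)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return {"w1": w1, "w2": w2}

def select_probability(theta, state):
    """P(a_t = 1 | s_t = state, theta): probability of selecting this request node."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, state))) for row in theta["w1"]]
    logit = sum(w * h for w, h in zip(theta["w2"], hidden))
    return 1.0 / (1.0 + math.exp(-logit))

def action_distribution(theta, state):
    """Mapping probabilities for both actions: a=0 (not selected), a=1 (selected)."""
    p1 = select_probability(theta, state)
    return {0: 1.0 - p1, 1: p1}

theta = init_policy()
# Assumed encoding: (1, 0) = complies with the contract, short delay.
dist = action_distribution(theta, (1, 0))
```

The two probabilities always sum to one, so the response node can compare candidates by `dist[1]` alone.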
The reward function used to train the policy network is $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$, where $C_{v,h}$ is the cost function of the sensing cloud edge node. Preferably, stochastic gradient descent is used to train and update the performance function of the multi-hop computation task offloading policy parameter $\theta$; the performance function is specifically as follows:
To speed up training of the policy network, a value network is added to update the multi-hop computation task offloading policy parameter $\theta$; the policy parameter $\theta$ is updated by the following equation:
where $\xi_p$ is the learning rate; $G = \gamma r_1 + \gamma^2 r_2 + \cdots$ is the discounted return, in which $r_1, r_2, \ldots$ are the historical instantaneous rewards read from the experience cache and $\gamma$ is the discount factor. The value of the estimated value function is preferably estimated by a parameterized value network.
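As a worked example of the discounted return $G = \gamma r_1 + \gamma^2 r_2 + \cdots$ with rewards $r_h = -C_{v,h}$, the following sketch evaluates $G$ for three hops. The cost values and the function name `discounted_return` are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """G = gamma*r_1 + gamma^2*r_2 + ... for rewards [r_1, r_2, ...]."""
    return sum(gamma ** (i + 1) * r for i, r in enumerate(rewards))

# Rewards are negative costs, r_h = -C_{v,h}; the cost values are illustrative.
costs = [2.0, 1.0, 4.0]
rewards = [-c for c in costs]
G = discounted_return(rewards, gamma=0.5)  # 0.5*(-2) + 0.25*(-1) + 0.125*(-4)
```

With these numbers $G = -1.75$; cheaper hops early in the path raise the return the most because later rewards are discounted more heavily.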
The value network, preferably a DNN, takes the transaction state of the transaction response node as input and outputs an estimate of the state value; its network parameters are updated by the following equation:
where $\xi_v$ is the learning rate.
The invention provides a multi-hop computation task offloading method based on a DAG blockchain in a sensing edge cloud environment, which achieves low-delay, trusted cooperative offloading while accounting for untrustworthy behaviors of malicious edge nodes such as prolonging computation time and modifying models.
The invention provides a method for selecting a trusted unloading cooperative node in a sensing edge cloud block chain network, which comprises the following steps:
(1) acquiring the training task $\Gamma_n = \{w_n\}$ issued by the training task issuing node, the maximum number of training hops $\varepsilon$ set by that node, the tolerance parameter $\lambda_d$ of the delay time in the task offloading transaction, and the credibility tolerance parameter $\lambda_s$ of the task smart contract;
(2) taking at least $\varepsilon + 1$ edge nodes as candidate computation task offloading transaction nodes, obtaining the cost function $C_{v,h}$ of each, and registering them in the DAG blockchain;
for a transaction $\phi_j = (D_j, Y_j, \Upsilon_j)$, $D_j$ is the size of the model in bits, $Y_j$ is the resources that must be spent to complete the training task, and $\Upsilon_j$ is the bitcoins per unit of edge blockchain node resource consumed in training the model. The cost function $C_{v,h}$ of a sensing cloud edge node is the computation task offloading transaction cost incurred on the $h$-th hop of the transaction connection when the node, acting as transaction response node $v_{k+1}$, selects transaction request node $v_k$; it comprises the delay and the credibility tolerance and is calculated as follows:
where $\lambda_d$ is the tolerance parameter of the delay time in the multi-hop computation task offloading transaction and $\lambda_s$ is the credibility tolerance parameter of the smart contract. The remaining quantities are: the computation task offloading transmission delay; $x_{k+1,k}$, which indicates whether the model trained by transaction request node $v_k$ is confirmed, accepted, and offloaded to transaction response node $v_{k+1}$ for processing, with $x_{k+1,k} = 1$ if the computation task of the transaction request node is offloaded to the transaction response node and $x_{k+1,k} = 0$ otherwise; the available offloading transmission rate of the computation task on a transaction connection $\tau$, with bandwidth $B$, transmission power $p_k$, noise power $\sigma^2$, and channel gain $g_k$, which indicates the transmission loss from the transaction request node to the response node; the execution time of the task offloaded on the $h$-th hop, with total computational load $L_j$ and service rate $f_c$ of each CPU core, which is a configurable variable; the task queue waiting time of all nodes in the $h$-th hop transaction connection; the amount of resources that all nodes in the $h$-th hop transaction connection need to process the tasks in the queue; the average arrival rate of task offloading; and $M$, the number of offloads in the $h$-th hop transaction connection. $x_j = 1$ indicates a successful offload and $x_j = 0$ otherwise.
$I_{\{*\}}$ is an indicator function: $I_{\{*\}} = 1$ if the condition is true and $I_{\{*\}} = 0$ otherwise. The amount of tasks $z_h$ already present in the current edge DAG blockchain transaction node follows a Poisson distribution with service parameter $\delta_h$, i.e. $z_h \sim \mathrm{Poisson}(\delta_h)$. $\Phi_h = 1 - l(t)$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the credible interval; the larger the value of $l(t)$, the higher the compliance of the computation task offloading transaction on the transaction connection with the smart contract.
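How the two tolerance parameters trade delay against contract compliance can be sketched as follows. The weighted-sum form $C_{v,h} = \lambda_d \cdot \text{delay} + \lambda_s \cdot \Phi_h$ is an assumption chosen to match the description (a cost comprising delay and credibility tolerance); it is not a quotation of the cost equation. Only $\Phi_h = 1 - l(t)$ and $r_h = -C_{v,h}$ are taken from the text, and the function names are hypothetical.

```python
def contract_risk(l_t):
    """Phi_h = 1 - l(t), as defined in the text."""
    if not 0.0 <= l_t <= 1.0:
        raise ValueError("l(t) is a probability and must lie in [0, 1]")
    return 1.0 - l_t

def offload_cost(total_delay, l_t, lambda_d, lambda_s):
    """Assumed form: C_{v,h} = lambda_d * delay + lambda_s * Phi_h."""
    return lambda_d * total_delay + lambda_s * contract_risk(l_t)

# Illustrative numbers: 2 s total delay, 75% chance of a credible training time.
cost = offload_cost(total_delay=2.0, l_t=0.75, lambda_d=0.5, lambda_s=2.0)
reward = -cost  # instantaneous reward r_h = -C_{v,h}
```

Under this assumed form, raising $\lambda_s$ penalizes candidates with a low $l(t)$ more strongly, while raising $\lambda_d$ favors low-delay candidates.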
The edge DAG blockchain $G_b = (V_b, E_b)$ is a directed task graph, where $V_b$ are the edge blockchain nodes participating in a computation task offloading transaction, which act as transaction request nodes and transaction response nodes when a computation task is offloaded, and $E_b$ are the transaction connections between participants; the two parties transact according to the preset smart contract.
(3) according to the transaction state $s_{v,k}$ of each edge node in the DAG blockchain obtained in step (1), planning an optimization policy for the computation task offloading path by reinforcement learning; formulating the action set of the offloading path according to the optimization policy; and establishing, between the transaction request node and the transaction response node of each hop, a transaction connection $\tau$ that complies with the smart contract, so as to form the task offloading path;
where, in the optimization policy, $\Pr$ denotes the mapping probability from state $s_{v,k}$ to action $a_{v,k}$; the action set is the optimal set of confirmation selection actions that the transaction response node $v_{k+1}$ takes toward the transaction request nodes $v_k$ in the computation task offloading transaction; the action $a_{v,k} = 0$ means that the transaction request node is not selected as a collaborator, and $a_{v,k} = 1$ means that it is selected; the state-action pair $\{a_{v,k} \mid s_{v,k}\}$ denotes the confirmation selection action $a_{v,k}$ of the transaction response node $v_{k+1}$ given the state $s_{v,k}$ of the transaction request node $v_k$.
The goal of the reinforcement learning is to minimize the computation task offloading transaction cost while meeting the delay sensitivity requirement of the offloading and complying with the smart contract, written as:
$$\mathrm{MTOR}: \ \min C_o$$
$$a_v = \{a_{v,1}, a_{v,2}, \ldots, a_{v,\varepsilon}\}$$
$$SC = \{\, l(t) \mid t \in [t_{\min}, t_{\max}] \,\}$$
where $C_o$ is the computation task offloading transaction cost, $a_v = \{a_{v,1}, a_{v,2}, \ldots, a_{v,\varepsilon}\}$ is the action set, and $SC = \{\, l(t) \mid t \in [t_{\min}, t_{\max}] \,\}$ is the smart contract.
The accumulated reward function adopted by the reinforcement learning is as follows:
where $r_h$ is the instantaneous reward function of each hop and $\gamma$ is the discount factor, with $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$.
Preferably, the policy optimization is treated as a Markov process and a greedy algorithm is used to obtain, for each hop, the action policy $\pi^*(s_{v,k})$ that maximizes the instantaneous reward function; the optimization policy obtained from the action policy of the $h$-th hop is written as:
where $P_T$ is the transition probability, $\gamma$ is the discount factor, and $V(s_{v,k+1} \mid \pi^*)$ is the state value function under the optimal policy $\pi^*$, defined as:
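A form consistent with the symbols defined here, assuming the standard Bellman optimality formulation of a Markov decision process (an assumption, not a verbatim reproduction of the patent's equations), would be:

$$\pi^*(s_{v,k}) = \arg\max_{a_{v,k}} \sum_{s_{v,k+1}} P_T(s_{v,k+1} \mid s_{v,k}, a_{v,k}) \left[ r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) + \gamma\, V(s_{v,k+1} \mid \pi^*) \right]$$

$$V(s_{v,k} \mid \pi^*) = \max_{a_{v,k}} \sum_{s_{v,k+1}} P_T(s_{v,k+1} \mid s_{v,k}, a_{v,k}) \left[ r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) + \gamma\, V(s_{v,k+1} \mid \pi^*) \right]$$

In this reading the conditioning of $P_T$ on the current state and action is assumed; the text names only the transition probability $P_T$, the discount factor $\gamma$, and the state value function.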
The optimization policy is preferably solved by a model-free reinforcement learning algorithm, with the following specific steps:
(3-1) initializing the task offloading parameter $\theta$ to obtain the current policy network, that is, using the most recently updated task offloading parameter $\theta$ as the parameter of the current policy network;
(3-2) for each hop of the computation task offloading, the current transaction response node $v_{k+1}$ observes and collects the transaction states $s_{v,k}$ of the transaction request nodes $v_k$; the current policy network is used to compute the action policy $\pi^*(s_{v,k})$ between all current transaction request nodes $v_k$ and the transaction response node $v_{k+1}$ and to estimate the instantaneous reward $r_h$, thereby determining the action $a_{v,k}$ that selects one of the transaction request nodes $v_k$ as the cooperative node; the transaction response node is then updated to be the transaction request node of the next hop and the experience cache is updated. This is repeated until the maximum hop count is reached, and the per-hop action policies $\pi^*(s_{v,k})$ compose the optimization policy.
Computing, with the current policy network, the action policy $\pi^*(s_{v,k})$ between the current transaction request nodes $v_k$ and the transaction response node is specifically as follows:
when the state of transaction request node $v_k$ is $s_{v,k}$, the probability of the action $a_{v,k}$ taken by the transaction response node is:
$$\pi(a_{v,k} \mid s_{v,k}) = P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$$
where $P$ is the probability that action $a_{v,k}$ is taken in state $s_{v,k}$, and $\theta$ is the policy network parameter.
Preferably, the policy network employs a DNN architecture.
Under the optimization policy $\pi(s_{v,k})$, the response node of the current hop reverse-selects the cooperative node as follows: the transaction request node with the highest probability of $a_{v,k} = 1$ is selected as the cooperative node, and the transaction request it initiated is confirmed.
The instantaneous reward is estimated as: $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$.
Updating the node to a transaction request node specifically comprises: the selected transaction request node updates the processing time of the computation task in its node block, and the transaction response node acts as the transaction request node for the next-hop transaction request.
Updating the experience cache specifically comprises: recording, in the experience cache, the transaction request node state, the transaction response node state, the action value, and the instantaneous reward $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$.
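The per-hop procedure of step (3-2) can be sketched as a loop. This is a simplified illustration under stated assumptions: each hop offers a fixed list of candidates, the state is a hypothetical `(contract_ok, delay_long)` pair, the policy is passed in as a plain function standing in for the policy network, and the cache entry stores only the chosen node's state, the action, and the reward (the response node state is omitted for brevity). All names are hypothetical.

```python
def select_cooperative_node(candidates, select_prob):
    """Return the candidate with the highest probability of action a = 1."""
    return max(candidates, key=lambda node: select_prob(node["state"]))

def run_hops(candidates_per_hop, select_prob, cost_of):
    """One pass over the hops; returns the chosen path and the experience cache."""
    path, cache = [], []
    for hop, candidates in enumerate(candidates_per_hop, start=1):
        chosen = select_cooperative_node(candidates, select_prob)
        reward = -cost_of(chosen)  # r_h = -C_{v,h}
        cache.append({"hop": hop, "state": chosen["state"],
                      "action": 1, "reward": reward})
        path.append(chosen["name"])
    return path, cache

def toy_prob(state):
    """Stand-in policy: prefers contract compliance, then short delay."""
    contract_ok, delay_long = state
    return 0.1 + 0.6 * contract_ok + 0.2 * (1 - delay_long)

hops = [
    [{"name": "v1", "state": (1, 1), "cost": 3.0},
     {"name": "v2", "state": (1, 0), "cost": 1.0}],
    [{"name": "v3", "state": (0, 0), "cost": 5.0},
     {"name": "v4", "state": (1, 1), "cost": 2.0}],
]
path, cache = run_hops(hops, toy_prob, lambda node: node["cost"])
```

In this toy run the response node picks `v2` on the first hop and `v4` on the second, and the cache holds the rewards that step (4) later uses to update the networks.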
(4) updating the learning parameters of the value function and the task offloading parameter $\theta$ according to the experience cache;
updating the learning parameters of the value function specifically comprises: iteratively updating them using the squared error, with the following update equation:
where $\xi_v$ is the learning rate; the loss function is the squared error with respect to the output of the value function.
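A squared-error update of this kind can be sketched with a linear value function standing in for the value network. The linear form, the use of the discounted return $G$ as the regression target, and the names `value` and `update_value` are assumptions for illustration; the text states only that the parameters are updated iteratively from the squared error with learning rate $\xi_v$.

```python
def value(phi, state):
    """Linear stand-in for the value network: V(s) = phi . s."""
    return sum(p * x for p, x in zip(phi, state))

def update_value(phi, state, target, xi_v):
    """One gradient step on the squared error (target - V(s))^2 / 2."""
    error = target - value(phi, state)
    return [p + xi_v * error * x for p, x in zip(phi, state)]

phi = [0.0, 0.0]
state, G = (1.0, 0.0), -1.75  # illustrative state and discounted return
for _ in range(200):
    phi = update_value(phi, state, G, xi_v=0.1)
```

Each step shrinks the error by a factor of $1 - \xi_v$ for this state, so the estimate converges to the target return.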
Updating the task offloading parameter $\theta$ specifically comprises: according to the data recorded in the experience cache, using stochastic gradient descent to train and update the performance function of the multi-hop computation task offloading policy parameter $\theta$; the performance function is specifically as follows:
To speed up training of the policy network, a value network is added to update the multi-hop computation task offloading policy parameter $\theta$; the policy parameter $\theta$ is updated by the following equation:
where $\xi_p$ is the learning rate; $G = \gamma r_1 + \gamma^2 r_2 + \cdots$ is the discounted return calculated from the historical instantaneous rewards stored in the experience cache, and $\gamma$ is the discount factor. The value of the estimated value function is preferably estimated by a parameterized value network.
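One update rule that fits this description, assuming the standard policy gradient with a value-network baseline (an assumption; the symbol $\hat{V}$ for the value estimate is introduced here for illustration), would be:

$$\theta \leftarrow \theta + \xi_p \left( G - \hat{V}(s_{v,k}) \right) \nabla_\theta \log \pi(a_{v,k} \mid s_{v,k}; \theta)$$

Subtracting the value estimate from the discounted return $G$ reduces the variance of the gradient estimate, which is the usual reason a value network speeds up policy training.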
For the multi-hop computation task cooperative offloading scenario, a multi-hop computation task cooperative offloading model based on an edge DAG blockchain is established: the nodes participating in cooperative offloading are registered in the edge DAG blockchain network and cooperatively complete a multi-hop distributed federated learning task.
To achieve low-delay, trusted multi-hop cooperative offloading, a delay cost function for multi-hop computation task offloading and a smart contract model are established in the edge DAG blockchain network.
To solve the problem of confirming and selecting transaction nodes on a multi-hop computation task offloading path, the invention models the problem as a Markov decision process for reverse selection of transaction request nodes based on the DAG blockchain, and further provides a reinforcement-learning-based cooperative node selection algorithm for multi-hop computation task offloading.
The invention designs a trusted cooperative node selection method for multi-hop computation task offloading by combining blockchain technology with a reinforcement learning algorithm. The method first establishes an edge DAG blockchain network from the directed acyclic graph (DAG) of multi-hop cooperative computation task offloading. It then formalizes the selection of trusted cooperative nodes in multi-hop computation task offloading as a Markov decision process. Taking into account the dynamics of the edge nodes' computation task offloading transaction connections and the selfishness of the nodes, the invention combines blockchain technology with reinforcement learning to provide a transaction node selection algorithm for multi-hop computation task offloading that selects trusted cooperative offloading nodes, thereby improving the trusted quality of service of multi-hop computation task offloading.
The following are examples:
The system for selecting a trusted offloading cooperative node in the sensing edge cloud blockchain network comprises sensing cloud edge nodes and blocks created in the edge nodes; the edge nodes and the blockchain form an edge DAG blockchain $G_b = (V_b, E_b)$, where $V_b$ are the edge blockchain nodes participating in a computation task offloading transaction, which act as transaction request nodes and transaction response nodes when a computation task is offloaded, and $E_b$ are the transaction connections; the transaction connection established at the $h$-th hop is $\tau$, and the two parties can transact only by following the preset contract. The system also comprises a policy network and a value network;
the block of an edge node immutably stores the model obtained on completing the training task, the training duration, and the model size;
the transaction request node is used for initiating a transaction request $\phi_j = (D_j, Y_j, \Upsilon_j)$ to other sensing cloud edge nodes and for updating its trust level when it receives confirmation of the transaction request. In the transaction request $\phi_j = (D_j, Y_j, \Upsilon_j)$, $D_j$ is the model size in bits, $Y_j$ is the resources spent to complete the training task requested by the transaction, and $\Upsilon_j$ is the number of bitcoins per unit of edge blockchain node resource consumed in training the model.
The transaction response node is used for reverse-verifying the credibility of a transaction against the smart contract when it receives a transaction request: if the transaction offloaded by the request fails to satisfy the conditions in the smart contract, the node judges the credibility to be low and rejects the request; otherwise it confirms the request and sends the transaction request confirmation and the required number of bitcoins to the transaction request node.
The smart contract is $SC = \{\, l(t) \mid t \in [t_{\min}, t_{\max}] \,\}$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the credible interval; the larger the value of $l(t)$, the higher the compliance of the computation task offloading transaction with the smart contract. $t_{\min}$ and $t_{\max}$ are the lower and upper limits of the credible interval of the training time.
The blockchain network is used for executing actions according to an optimization policy, so that the transaction response node selects, from all transaction request nodes $v = \{v_k\}$, the transaction request node with the highest mapping probability for the action value $a_{v,k} = 1$ as the cooperative node and establishes a transaction connection with it; the transaction response node then serves as the transaction request node of the next hop. The network also records, for each node, the model obtained on completing the training task, the training duration, and the model size, and updates the trust level of each node.
The optimization policy is obtained by solving a reinforcement learning model with a policy network.
The policy network solves for the optimization policy according to the current state of the DAG blockchain network; it is a model-free reinforcement learning structure and uses a DNN. Specifically:
the input of the policy network is the transaction states of all transaction request nodes $v = \{v_k\}$ observed by the current transaction response node $v_{k+1}$, where $k$ is the node index and its maximum value equals the maximum number of hops. The state $s_{v,k}$ of transaction request node $v_k$ is its transaction state at the time it initiates the transaction, and has two components: the smart-contract state, which indicates whether the transaction complies with or violates the smart contract, and the delay state, which indicates whether the computation task offloading delay on the $h$-th hop of the transaction connection is short or long. The delay is long when the computation task offloading transmission delay exceeds a preset transmission delay threshold, and short otherwise;
the output of the policy network is, for each transaction request node $v_k$, the mapping probability $P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$ from state $s_{v,k}$ to action $a_{v,k}$, where $\theta$ is the offloading policy parameter of the policy network; the optimization policy is established from $\theta$ as $\pi^*(a_{v,k} \mid s_{v,k}) = P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$;
The reward function used to train the policy network is $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$, where $C_{v,h}$ is the cost function of the sensing cloud edge node. Stochastic gradient descent is used to train and update the performance function of the multi-hop computation task offloading policy parameter $\theta$; the performance function is specifically as follows:
To speed up training of the policy network, a value network is added to update the multi-hop computation task offloading policy parameter $\theta$; the policy parameter $\theta$ is updated by the following equation:
where $\xi_p$ is the learning rate; $G = \gamma r_1 + \gamma^2 r_2 + \cdots$ is the discounted return, in which $r_1, r_2, \ldots$ are the historical instantaneous rewards read from the experience cache and $\gamma$ is the discount factor. The value of the estimated value function is preferably estimated by a parameterized value network.
The value network is a DNN that takes the transaction state of the transaction response node as input and outputs an estimate of the state value; its network parameters are updated by the following equation:
where $\xi_v$ is the learning rate.
Because offloading a computation task to a remote cloud node incurs high delay, the computation tasks of sensing devices are offloaded to edge nodes to reduce the offloading cost, and the invention considers edge nodes that process the computation task $\Gamma_n = \{w_n\}$ in a distributed manner through multi-hop cooperation, where $w_n$ is the model to be trained. When the computation task $\Gamma_n$ is offloaded to multiple edge nodes, for each model $w_n$ to be received, the next-hop edge node first confirms the proof of work of the model training; after confirmation, the model is offloaded to that edge node, which continues the training. For a computation task $\Gamma_n$, $N$ edge nodes form a multi-hop computation task offloading path and take part in the training one after another. The multi-hop computation task offloading process can therefore be represented as a directed task graph $G_a = (V_a, E_a)$, where $V_a$ are the edge nodes, which are associated with the trust context of task offloading, such as execution time, queuing time, and model training results, and $E_a$ are the offloading connections between edge nodes, which are related to the computation task offloading transmission rate and transmission time between edge nodes. To record the training result of each edge node and prevent malicious nodes from modifying the trained model data, the invention defines a DAG blockchain on top of the task graph: blocks are created in the edge nodes, the edge nodes and the blockchain are integrated into an edge DAG blockchain network whose nodes are called edge blockchain nodes, and the whole is modeled as the directed graph $G_b = (V_b, E_b)$ of the integrated DAG blockchain, as shown in FIG. 1, where $V_b$ are the edge blockchain nodes participating in a computation task offloading transaction, each of which has two roles: computation task offloading transaction request node and transaction response node. $E_b$ are the transaction connections between participants; two parties can establish a transaction connection only by following a given smart contract.
After a computation task offloading request node has trained a model, the model is stored in a block so that it cannot be changed, and the training duration and the model size are automatically recorded in the block as well. The node then initiates a computation task offloading transaction to the transaction response node. When a transaction responder receives a transaction request from a transaction request node, it first reverse-confirms, according to the smart contract, whether the transaction is trustworthy. If the transaction response node finds that the offloaded transaction cannot satisfy the conditions in the smart contract, the transaction confirmation fails and other transaction request nodes are selected for confirmation. Otherwise the transaction is confirmed, the transaction response node sends a certain number of bitcoins to the transaction request node as a reward for the model, and the trust level of the transaction request node is updated.
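The confirmation flow described above can be sketched as follows. The sketch makes two assumptions that go beyond the text: immutability is modeled by comparing a SHA-256 digest of the block contents against a previously recorded digest, and the smart-contract condition is reduced to the training time lying in $[t_{\min}, t_{\max}]$. The class and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingBlock:
    """What a request node records after training (field names are illustrative)."""
    model_bytes: bytes
    training_time: float  # seconds

    @property
    def model_size_bits(self):
        return 8 * len(self.model_bytes)

    def digest(self):
        """Hash over the model and its metadata; any later change alters it."""
        h = hashlib.sha256()
        h.update(self.model_bytes)
        h.update(repr(self.training_time).encode())
        return h.hexdigest()

def confirm_transaction(block, recorded_digest, t_min, t_max):
    """Reverse confirmation by the response node: True means the request is accepted."""
    untampered = block.digest() == recorded_digest
    in_interval = t_min <= block.training_time <= t_max
    return untampered and in_interval

block = TrainingBlock(b"weights", 12.0)
recorded = block.digest()
accepted = confirm_transaction(block, recorded, t_min=10.0, t_max=20.0)
tampered = confirm_transaction(TrainingBlock(b"weightz", 12.0), recorded, 10.0, 20.0)
slow = TrainingBlock(b"weights", 45.0)
too_slow = confirm_transaction(slow, slow.digest(), 10.0, 20.0)
```

Here the honest block is accepted, while a modified model and an overlong training time are both rejected, mirroring the two untrustworthy behaviors the text names.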
The method for selecting the trusted offload cooperative node in the sensing edge cloud block chain network provided by the embodiment comprises the following steps:
(1) acquiring the training task $\Gamma_n = \{w_n\}$ issued by the training task issuing node, the maximum number of training hops $\varepsilon$ set by that node, the tolerance parameter $\lambda_d$ of the delay time in the task offloading transaction, and the credibility tolerance parameter $\lambda_s$ of the task smart contract;
(2) taking at least $\varepsilon + 1$ edge nodes as candidate computation task offloading transaction nodes, obtaining the cost function $C_{v,h}$ of each, and registering them in the DAG blockchain;
for a transaction $\phi_j = (D_j, Y_j, \Upsilon_j)$, $D_j$ is the size of the model in bits, $Y_j$ is the resources that must be spent to complete the training task, and $\Upsilon_j$ is the bitcoins per unit of edge blockchain node resource consumed in training the model. The cost function $C_{v,h}$ of a sensing cloud edge node is the computation task offloading transaction cost incurred on the $h$-th hop of the transaction connection when the node, acting as transaction response node $v_{k+1}$, selects transaction request node $v_k$; it comprises the delay and the credibility tolerance and is calculated as follows:
where $\lambda_d$ is the tolerance parameter of the delay time in the multi-hop computation task offloading transaction and $\lambda_s$ is the credibility tolerance parameter of the smart contract. The remaining quantities are: the computation task offloading transmission delay; $x_{k+1,k}$, which indicates whether the model trained by transaction request node $v_k$ is confirmed, accepted, and offloaded to transaction response node $v_{k+1}$ for processing, with $x_{k+1,k} = 1$ if the computation task of the transaction request node is offloaded to the transaction response node and $x_{k+1,k} = 0$ otherwise; the available offloading transmission rate of the computation task on a transaction connection $\tau$, with bandwidth $B$, transmission power $p_k$, noise power $\sigma^2$, and channel gain $g_k$, which indicates the transmission loss from the transaction request node to the response node; the execution time of the task offloaded on the $h$-th hop, with total computational load $L_j$ and service rate $f_c$ of each CPU core, which is a configurable variable; the task queue waiting time of all nodes in the $h$-th hop transaction connection; the amount of resources that all nodes in the $h$-th hop transaction connection need to process the tasks in the queue; the average arrival rate of task offloading; and $M$, the number of offloads in the $h$-th hop transaction connection. $x_j = 1$ indicates a successful offload and $x_j = 0$ otherwise.
$I_{\{*\}}$ is an indicator function: $I_{\{*\}} = 1$ if the condition is true and $I_{\{*\}} = 0$ otherwise. The amount of tasks $z_h$ already present in the current edge DAG blockchain transaction node follows a Poisson distribution with service parameter $\delta_h$, i.e. $z_h \sim \mathrm{Poisson}(\delta_h)$. $\Phi_h = 1 - l(t)$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the credible interval; the larger the value of $l(t)$, the higher the compliance of the computation task offloading transaction on the transaction connection with the smart contract.
Smart contract and delay cost estimation for multi-hop computation task offloading in the edge DAG blockchain network:
In the edge DAG blockchain network graph $G_b$, a computation task offloading transaction request node initiates a transaction request $\phi_j = (D_j, Y_j, \Upsilon_j)$ to an adjacent edge blockchain node, where $D_j$ is the size of the model in bits, $Y_j$ is the resources that must be spent to complete the training task, and $\Upsilon_j$ is the bitcoins per unit of edge blockchain node resource consumed in training the model. After the transaction response node, as buyer of the model, confirms the training result, it sends a certain number of bitcoins to the transaction request node to compensate its resource consumption. When the training task issuing node sets the maximum number of training hops to $\varepsilon$, the edge DAG blockchain network participating in multi-hop computation task offloading needs at least $\varepsilon + 1$ nodes. However, destructive behaviors of intelligent attackers (such as prolonging computation time or modifying models) make the transaction connections, computation behaviors, and training results in the edge blockchain network untrustworthy, so transactions between computation task offloading request and response nodes fail more often and transaction trust declines. Eventually the trusted lifetime of the edge DAG blockchain network shortens as node trust decays. To extend the trusted lifetime of the edge DAG blockchain network, the smart contract is triggered when a transaction response node confirms a computation task offloading transaction request node. Only a transaction that satisfies the smart contract is then treated as a trusted transaction, and the transaction response node reverse-selects the transaction request node as a cooperative node on the computation task offloading path.
The invention takes the model training time as the proof of work of the edge DAG blockchain nodes. The smart contract of a computation task offloading transaction is therefore defined as $SC = \{\, l(t) \mid t \in [t_{\min}, t_{\max}] \,\}$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the credible interval; the larger the value of $l(t)$, the higher the compliance of the computation task offloading transaction with the smart contract. $t_{\min}$ and $t_{\max}$ are parameters that can be set according to the training goal. When the transaction response node receives a transaction request, it looks up the model training time recorded in the block and checks whether it satisfies the credible interval of the training time in the smart contract; if it does not, the computation task offloading transaction cannot proceed. Within a one-hop computation task offloading transaction connection, the set of transaction request nodes that a transaction response node may choose to confirm is defined as $v = \{v_k\}$. In the computation task offloading transaction, the invention considers the offloading transmission delay, the execution time of the computation task on the edge DAG blockchain network nodes, and the task queue waiting time.
The computation task offloading transmission delay is calculated as follows:
In an edge DAG blockchain network, multiple transaction request nodes initiate confirmation requests to response nodes, and changes in the state of the confirmation request channel delay the transmission of the offloaded computation task. To calculate the time a transaction request node needs to offload a computation task to a response node, the available offloading transmission rate of the computation task on a transaction connection $\tau$ is defined as:
where $B$ is the bandwidth, $p_k$ the transmission power, $\sigma^2$ the noise power, and $g_k$ the channel gain, which indicates the transmission loss from the transaction request node to the response node. The computation task offloading transmission delay from the transaction request node to the response node is therefore:
where $x_{k+1,k}$ indicates whether the model trained by transaction request node $v_k$ is confirmed, accepted, and offloaded to transaction response node $v_{k+1}$ for processing: $x_{k+1,k} = 1$ if the computation task of the transaction request node is offloaded to the transaction response node, and $x_{k+1,k} = 0$ otherwise. From this the transmission and reception time can be calculated, and from it the transmission time over the whole computation task offloading path.
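A rate and delay calculation using these symbols can be sketched as below. The Shannon-type rate $B \log_2(1 + p_k g_k / \sigma^2)$ and the delay $x_{k+1,k} D_j / \text{rate}$ are assumptions that are consistent with the listed symbols; they are not quoted from the patent's equations. The function names and numeric values are illustrative.

```python
import math

def offload_rate(bandwidth_hz, p_k, g_k, noise_power):
    """Assumed Shannon-type rate: B * log2(1 + p_k * g_k / sigma^2), in bit/s."""
    return bandwidth_hz * math.log2(1.0 + p_k * g_k / noise_power)

def transmission_delay(d_j_bits, rate_bps, x):
    """Delay of sending a D_j-bit model; zero when x_{k+1,k} = 0 (not offloaded)."""
    return x * d_j_bits / rate_bps

# Illustrative link: 1 MHz bandwidth and a signal-to-noise ratio of 3.
rate = offload_rate(bandwidth_hz=1e6, p_k=3.0, g_k=1.0, noise_power=1.0)
delay = transmission_delay(d_j_bits=4e6, rate_bps=rate, x=1)
```

With these numbers the rate is 2 Mbit/s and a 4 Mbit model takes 2 s to transmit; a lower channel gain $g_k$ lowers the rate and lengthens the delay.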
The task execution time of an edge DAG blockchain node is calculated as follows:
In multi-hop computing task offloading, each edge DAG blockchain node must complete its proof of work through a model training task. The invention assumes that an edge DAG blockchain node has $\chi$ CPU cores. The task execution time is determined by $L_j$, the total computational load, and $f_c$, the service rate of each CPU core, which is a configurable variable.
The task queue waiting time of an edge DAG blockchain node is calculated as follows:
Because a node participating in a transaction receives tasks offloaded by multiple transaction request nodes, the computing task offloading delay of the edge DAG blockchain network also depends on the number of tasks in the current node's receive queue. The number of tasks $z_h$ already present in the current edge DAG blockchain transaction node follows a Poisson distribution with service parameter $\delta_h$, i.e., $z_h \sim \mathrm{Poisson}(\delta_h)$. From this, the average arrival rate of offloaded computing tasks can be calculated as:
where $M$ is the number of offloads in the $h$-th hop transaction connection; $x_j = 1$ indicates a successful offload and $x_j = 0$ otherwise; and $I_{\{*\}}$ is an indicator function, with $I_{\{*\}} = 1$ if the condition is true and $I_{\{*\}} = 0$ otherwise. Because the task currently being processed needs a certain training time to complete, each training task arriving in the queue must wait for the tasks already in progress to finish before it is processed. The task queue waiting time of all nodes in the $h$-th hop transaction connection is determined by the amount of resources that all nodes in that connection need to process the tasks in the queue. The total delay of the computing task offloading transaction is therefore:
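The three delay components can be combined in a short sketch. The equation images are not available, so the forms below are assumptions: execution time is taken as $L_j / (\chi f_c)$, queue waiting time as the queued load divided by the same aggregate service rate, and the total delay as the plain sum of transmission, execution and waiting time. All function and parameter names are hypothetical.

```python
def execution_time(load, cores, core_rate):
    """Assumed form L_j / (chi * f_c): total load shared by chi cores of rate f_c."""
    return load / (cores * core_rate)


def queue_wait_time(queued_load, cores, core_rate):
    """Assumed form: resources needed by queued tasks over the aggregate service rate."""
    return queued_load / (cores * core_rate)


def total_offload_delay(transmission_delay, load, queued_load, cores, core_rate):
    """Assumed total delay of one hop: transmission + execution + queue waiting."""
    return (
        transmission_delay
        + execution_time(load, cores, core_rate)
        + queue_wait_time(queued_load, cores, core_rate)
    )
```

For a node with 4 cores at 2e9 cycles per second, a task of 8e9 cycles runs for 1 second and a backlog of 4e9 cycles adds 0.5 seconds, so a hop with a 2-second transmission delay totals 3.5 seconds under these assumptions.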
The edge DAG blockchain $G_b = (V_b, E_b)$ is a directed task graph, where $V_b$ is the set of edge blockchain nodes participating in a computing task offloading transaction, which act as transaction request nodes and transaction response nodes during offloading, and $E_b$ is the set of transaction connections between the participants, over which the two parties transact according to the preset smart contract.
(3) Based on the transaction state $s_{v,k}$ of each edge node in the DAG blockchain obtained in step (1), use reinforcement learning to plan the optimization strategy for the computing task offloading path; formulate the action set of the offloading path according to the optimization strategy; and establish a transaction connection $\tau$ that complies with the smart contract between the transaction request node and the transaction response node of each hop, thereby forming the task offloading path;
where, in the optimization strategy, $Pr$ is the mapping probability from state $s_{v,k}$ to action $a_{v,k}$; the action set is the optimal set of confirmation selection actions that transaction response node $v_{k+1}$ takes toward transaction request node $v_k$ in the computing task offloading transaction; $a_{v,k} = 0$ means that the transaction request node is not selected as a collaborator, and $a_{v,k} = 1$ means that it is selected as a collaborator; and the state-action pair $\{a_{v,k} \mid s_{v,k}\}$ denotes the confirmation selection action $a_{v,k}$ of transaction response node $v_{k+1}$ given the state $s_{v,k}$ of transaction request node $v_k$.
The goal of reinforcement learning is to minimize the computing task offloading transaction cost while meeting the delay sensitivity requirement of computing task offloading and complying with the smart contract. This is written as:
MTOR: $\min C_o$
$a_v = \{a_{v,1}, a_{v,2}, \ldots, a_{v,\varepsilon}\}$
$SC = \{ l(t) \mid t \in [t_{min}, t_{max}] \}$
where $C_o$ is the computing task offloading transaction cost, $a_v = \{a_{v,1}, a_{v,2}, \ldots, a_{v,\varepsilon}\}$ is the action set, and $SC = \{ l(t) \mid t \in [t_{min}, t_{max}] \}$ is the smart contract.
The cumulative reward function used by reinforcement learning is:
where $r_h$ is the instantaneous reward function of each hop, $\gamma$ is the discount factor, and $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$.
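As a small worked example of the cumulative reward, the sketch below assumes the discounting convention $G = \gamma r_1 + \gamma^2 r_2 + \ldots$ that the description uses later for the discounted return, with $r_h = -C_{v,h}$. The function name and the representation of the path as a list of per-hop costs are illustrative assumptions.

```python
def discounted_return(hop_costs, gamma):
    """Sum of gamma**h * r_h over hops h = 1, 2, ..., with r_h = -C_{v,h}."""
    return sum((gamma ** h) * (-cost) for h, cost in enumerate(hop_costs, start=1))
```

For two hops with costs 2.0 and 4.0 and a discount factor of 0.5, the rewards are -2.0 and -4.0, and the discounted sum is 0.5 * (-2.0) + 0.25 * (-4.0) = -2.0. A lower-cost path therefore yields a larger (less negative) cumulative reward.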
Using a greedy algorithm, strategy optimization is treated as a Markov process: the action strategy $\pi^*(s_{v,k})$ that maximizes the instantaneous reward function of each hop is obtained, and the optimization strategy built from the action strategy of the $h$-th hop is written as:
where $P_T$ is the transition probability, $\gamma$ is the discount factor, and $V(s_{v,k+1} \mid \pi^*)$ is the state value function under the optimal strategy $\pi^*$, defined as:
The invention provides a method for selecting trusted cooperative nodes in multi-hop computing task offloading based on a DAG blockchain, which builds a trusted transaction path by selecting trusted computing task offloading nodes. In a DAG-blockchain-based edge network, whether a transaction response node accepts a task offloaded from a transaction request node depends on confirming the trustworthiness of that preceding transaction request node. The backward selection of transaction request nodes in multi-hop computing task offloading can therefore be modeled as a Markov decision process, defined as the tuple $\Theta_M = (S, A_{v,k}, Pr, C_{v,h})$, where:
1) $S$: $S = \{ s_{v,k} \in S \mid S = s_{v,1}, s_{v,2}, \ldots, s_{v,n} \}$ denotes the state space of transactions between edge DAG blockchain nodes, and $s_{v,k}$ is the transaction state of transaction request node $v_k$ when the transaction is initiated. The state has two components. The smart contract component indicates whether the smart contract is complied with or violated. The delay component indicates whether the computing task offloading delay on the $h$-th hop transaction connection is short or long: the delay is long when the computing task offloading transmission delay exceeds the preset transmission delay threshold, and short otherwise. In a computing task offloading transaction, transaction response node $v_{k+1}$ confirms transaction request node $v_k$; once the confirmation passes, $v_{k+1}$ becomes the request node of the next-hop computing task offloading transaction.
2) $A_{v,k}$: the space of possible actions. $a_{v,k}$ is the confirmation selection action that transaction response node $v_{k+1}$ takes toward transaction request node $v_k$ in a computing task offloading transaction: $a_{v,k} = 0$ means that the transaction request node is not selected as a collaborator, and $a_{v,k} = 1$ means that it is selected as a collaborator. The state-action pair $\{a_{v,k} \mid s_{v,k}\}$ denotes the confirmation selection action $a_{v,k}$ of transaction response node $v_{k+1}$ given the state $s_{v,k}$ of transaction request node $v_k$.
3) $Pr$: the mapping probability from state $s_{v,k}$ to action $a_{v,k}$. In an untrusted computing task offloading environment, the goal of a response node in a computing task offloading transaction is to obtain the optimization strategy $\pi^*$, that is, the mapping probability from $s_{v,k}$ to $a_{v,k}$. Following $\pi^*$, transaction response node $v_{k+1}$ can find the optimal confirmation selection action. The optimization strategy for the set of transaction connections on the computing task offloading path and the optimal confirmation selection action set of the transaction response nodes are as follows:
4) $C_{v,h}$: the computing task offloading transaction cost on the $h$-th hop transaction connection between transaction request node $v_k$ and transaction response node $v_{k+1}$. It comprises the delay and the trust tolerance, and can be calculated as:
where $\lambda_d$ is the tolerance parameter for the delay in the multi-hop computing task offloading transaction, $\lambda_s$ is the trust tolerance parameter of the smart contract, and $\Phi_h = 1 - l(t)$. For an offloaded computing task, once a maximum number of training hops has been set, the multi-hop path formed by the elements of the transaction nodes' confirmation selection action set on the trusted offloading path should meet the delay sensitivity requirement and the smart contract conditions of computing task offloading while minimizing the computing task offloading transaction cost.
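The cost equation itself is missing from this text. A minimal sketch, assuming the cost is the weighted sum of the hop's total delay and the contract violation term $\Phi_h = 1 - l(t)$, with weights $\lambda_d$ and $\lambda_s$, is shown below. The weighted-sum form and the function name are assumptions; only the symbols come from the description.

```python
def offload_cost(total_delay, compliance_prob, lambda_d, lambda_s):
    """Assumed form C_{v,h} = lambda_d * delay + lambda_s * (1 - l(t))."""
    phi_h = 1.0 - compliance_prob  # Phi_h = 1 - l(t), as defined in the description
    return lambda_d * total_delay + lambda_s * phi_h


def instant_reward(total_delay, compliance_prob, lambda_d, lambda_s):
    """Per-hop reward r_h = -C_{v,h}."""
    return -offload_cost(total_delay, compliance_prob, lambda_d, lambda_s)
```

With a delay of 3.5, $l(t) = 0.75$, $\lambda_d = 2$ and $\lambda_s = 4$, the cost is 2 * 3.5 + 4 * 0.25 = 8.0 and the instantaneous reward is -8.0. Raising $\lambda_s$ relative to $\lambda_d$ penalizes unreliable nodes more heavily than slow ones.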
In reinforcement learning, the search for the optimization strategy can be modeled as a Markov decision process, so the invention extends $\Theta_M$ to $\Theta_{RL} = (S, A_{v,k}, P_T, r_h, \gamma)$, where $S$ and $A_{v,k}$ are the state space and action space of $\Theta_M$, $P_T$ is the transition probability, $r_h$ is the instantaneous reward function, and $\gamma$ is the discount factor. Reinforcement learning is used to select cooperative transaction request nodes and thereby obtain the transaction nodes' confirmation selection action set on the trusted computing task offloading path. During confirmation and selection of the transaction nodes on a multi-hop computing task offloading path, transaction response node $v_{k+1}$ first observes the current transaction state $s_{v,k}$ of every transaction request node $v_k$ and selects a transaction connection $\tau$ on whose transaction state it performs the confirmation selection action $a_{v,k}$. Transaction response node $v_{k+1}$ then earns a reward $r_h$. Using a greedy search strategy, the transaction response node randomly selects an action $a_{v,k} \sim \pi(s_{v,k})$ to confirm the transaction connection $\tau$ of the first transaction request node to arrive. The edge DAG blockchain network updates the state of its transaction nodes through the transition probability $P_T(s_{v,k+1} \mid s_{v,k}, a_{v,k})$. At that point, transaction response node $v_{k+1}$ receives the instantaneous reward $r_h$ of transaction connection $\tau$, which evaluates how effective its confirmation selection action was.
If the delay is short and the smart contract conditions are met, the transaction response node first sends a certain number of bitcoins to transaction request node $v_k$. Then $v_{k+1}$ receives the model sent by $v_k$ and starts training it with local data. After training for a certain time, $v_{k+1}$ initiates a transaction request to $v_{k+2}$; node $v_{k+2}$ observes the transaction state $s_{v,k+1}$ of node $v_{k+1}$ and performs the confirmation selection action $a_{v,k+1}$. Cooperative transaction request nodes are selected in this way until the maximum hop count is reached, at which point the node selection process ends. The goal of the reinforcement learning participant is to maximize the reward of each transaction. Thus, within one distributed federated learning task transaction, reinforcement learning can discover a complete trusted multi-hop computing task offloading path.
From formula (5), the instantaneous reward $r_h$ that transaction response node $v_{k+1}$ obtains on the transaction connection of the $h$-th hop is:
$r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$ (7)
Accordingly, in a distributed federated learning task transaction, the cumulative reward of strategy $\pi$ produced by the transaction response nodes' confirmation selections can be expressed as:
where $\gamma$ is the discount factor of each hop transaction, which indicates how much the selection of future transaction request nodes matters to the selection of the current one. The computing task offloading transaction stops once offloading reaches the preset maximum hop count. During multi-hop computing task offloading, the reinforcement learning participant records in the block the optimal computing task offloading transaction path and the reward of each confirmation selection made by a transaction response node. From the cumulative reward, the state value function starting from state $s_{v,1}$ under strategy $\pi$ can be defined as:
The online multi-hop computing task offloading method selects the optimal strategy $\pi^*$ that maximizes the value function of every state, namely:
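The equations for the cumulative reward, the state value function and the optimal strategy are not reproduced in this text. A plausible reconstruction is given below as a sketch only. It assumes the discounting convention $G = \gamma r_1 + \gamma^2 r_2 + \ldots$ used later in the description and the maximum hop count $\varepsilon$ as the horizon; the symbols $R^{\pi}$ and $V^{\pi}$ are assumed names, and the numbering (8) to (10) is inferred from the surrounding references to equations (7), (10) and (11).

```latex
\begin{align}
R^{\pi} &= \sum_{h=1}^{\varepsilon} \gamma^{h}\, r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) \tag{8} \\
V^{\pi}(s_{v,1}) &= \mathbb{E}_{\pi}\!\left[\, \sum_{h=1}^{\varepsilon} \gamma^{h}\, r_h \;\middle|\; s_{v,1} \right] \tag{9} \\
\pi^{*} &= \arg\max_{\pi} V^{\pi}(s_{v,k}) \quad \text{for every } s_{v,k} \in S \tag{10}
\end{align}
```

Because $r_h = -C_{v,h}$, maximizing the value function in this form is equivalent to minimizing the expected discounted offloading cost along the path.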
Solving equation (10) for $\pi^*(s)$ requires the transition probabilities and the reward function, which are very difficult to model accurately. In addition, changes in the transaction connection channel and in the smart contract state are affected by the resource allocation and trust tolerance of the edge DAG blockchain nodes, and if the multi-hop computing task offloading transaction path is long, the transaction state space of the edge DAG blockchain nodes becomes large and complex. The online computing task offloading decision problem posed by the invention can therefore be solved with a model-free reinforcement learning algorithm. In the proposed method the policy is parameterized by a vector $\theta$. At time $t$, when the state of transaction request node $v_k$ is $s_{v,k}$, the probability that the transaction response node takes action $a_{v,k}$ is:
$\pi(a_{v,k} \mid s_{v,k}) = P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$ (11)
To learn the multi-hop computing task offloading policy parameter $\theta$, its performance function is defined as:
To maximize the reward of the edge DAG blockchain nodes, the transaction response node updates the parameter $\theta$ of $L(\theta)$ with the stochastic gradient descent method. The update equation for $\theta$ is:
where $\xi_p$ is the learning rate. From the policy gradient theorem, one obtains:
where $q_\pi(s_{v,k}, a_{v,k})$ is the state-action value function of strategy $\pi$, and $G = \gamma r_1 + \gamma^2 r_2 + \ldots$ is the discounted return. The parameter $\theta$ is then updated using equation (14):
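The update equations in this passage are not reproduced in the text. The chain they describe matches the standard policy gradient (REINFORCE) derivation, sketched below under that assumption. The step is written as an ascent on $L(\theta)$ because the stated aim is to maximize the reward; the exact form and numbering of the patent's own equations may differ.

```latex
\begin{align}
\theta_{t+1} &= \theta_t + \xi_p \, \nabla_\theta L(\theta_t) \\
\nabla_\theta L(\theta) &\propto \mathbb{E}_{\pi}\!\left[\, q_\pi(s_{v,k}, a_{v,k}) \, \nabla_\theta \ln \pi(a_{v,k} \mid s_{v,k}, \theta) \right] \\
\theta_{t+1} &= \theta_t + \xi_p \, G \, \nabla_\theta \ln \pi(a_{v,k} \mid s_{v,k}, \theta_t)
\end{align}
```

The last line replaces the unknown $q_\pi$ with the sampled discounted return $G$, which is an unbiased estimate of it under the policy $\pi$.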
To further improve learning performance, the invention uses an actor-critic method that approximates both the policy and the value function: the value function is learned and then used as the critic when updating the policy. Let the value function estimated in state $s_{v,k}$ be parameterized by a learned parameter; the update equation for the policy parameter $\theta$ then becomes:
where $\xi_v$ is the learning rate of the value function parameter, which is updated iteratively using the squared error as the loss function. Since neural networks can approximate complex functions, the invention uses DNNs to learn the policy and the value function, thereby establishing a policy network and a value network. Hence, in an edge DAG blockchain network environment, the reinforcement-learning-based method for selecting trusted cooperative nodes in multi-hop computing task offloading consists of two parts, as shown in FIG. 2: the actor, a policy network that updates the policy, and the critic, a value network that evaluates the value function and guides the policy update.
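One actor-critic update step can be sketched as follows. This is an illustration under simplifying assumptions, not the patent's networks: the DNN policy is replaced by a logistic model over a state feature vector, the DNN value network by a linear model, and the critic's squared-error gradient is taken with the constant factor absorbed into the learning rate. All function names are hypothetical; `lr_p` and `lr_v` stand for $\xi_p$ and $\xi_v$.

```python
import math


def select_probability(theta, state):
    """Logistic stand-in for the policy network: pi(a = 1 | state, theta)."""
    z = sum(t * s for t, s in zip(theta, state))
    return 1.0 / (1.0 + math.exp(-z))


def value_estimate(w, state):
    """Linear stand-in for the value network: estimated value of the state."""
    return sum(wi * si for wi, si in zip(w, state))


def actor_critic_step(theta, w, state, action, discounted_return, lr_p, lr_v):
    """One update of the policy parameters theta and the value parameters w."""
    p = select_probability(theta, state)
    # Advantage: discounted return G minus the critic's baseline estimate.
    advantage = discounted_return - value_estimate(w, state)
    # Gradient of ln pi(action | state) for a logistic policy is (action - p) * state.
    grad_log_pi = [(action - p) * s for s in state]
    new_theta = [t + lr_p * advantage * g for t, g in zip(theta, grad_log_pi)]
    # Critic: gradient step on the squared error (G - V(state))^2.
    new_w = [wi + lr_v * advantage * si for wi, si in zip(w, state)]
    return new_theta, new_w
```

Starting from zero parameters, selecting a node in state `[1.0, 0.0]` and observing a return above the baseline raises the probability of selecting a node in that state again, which is the intended effect of using the critic as a baseline.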
The specific steps for solving the optimization strategy with the model-free reinforcement learning algorithm are as follows:
(3-1) Initialize the task offloading parameter $\theta$ to obtain the current policy network, that is, take the most recently updated task offloading parameter $\theta$ as the task offloading parameter of the current policy network;
(3-2) For each hop of the computing task learning, the current transaction response node $v_{k+1}$ of the task offloading observes and collects the transaction states $s_{v,k}$ of the transaction request nodes $v_k$; uses the current policy network to compute the action strategy $\pi^*(s_{v,k})$ between all current transaction request nodes $v_k$ and the transaction response node $v_{k+1}$; estimates the instantaneous reward $r_h$; and thereby determines the action $a_{v,k}$ that selects one of the transaction request nodes $v_k$ as the cooperative node. The transaction response node is then updated to act as the transaction request node of the next hop, and the experience cache is updated. This repeats until the maximum hop count is reached, and the per-hop action strategies $\pi^*(s_{v,k})$ together form the optimization strategy.
The optimization strategy $\pi^*(s_{v,k})$ of the current transaction request node $v_k$ and all transaction response nodes is computed with the current policy network as follows:
when the state of transaction request node $v_k$ is $s_{v,k}$, the probability that the transaction response node takes action $a_{v,k}$ is:
$\pi(a_{v,k} \mid s_{v,k}) = P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$
where $P$ is the probability that action $a_{v,k}$ is taken in state $s_{v,k}$, and $\theta$ is the policy network parameter.
The policy network employs a DNN architecture.
Under the optimization strategy $\pi(s_{v,k})$, the response node of the current hop backward-selects the cooperative node as follows: it selects the transaction request node with the highest probability of $a_{v,k} = 1$ as the cooperative node and confirms the transaction request that node initiated.
The instantaneous reward is estimated as $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$.
Updating the node to act as a transaction request node is specifically as follows: the selected transaction request node updates the processing time of the computing task in its node block, and the transaction response node becomes the transaction request node that issues the next-hop transaction request.
Updating the experience cache is specifically as follows: the state of the transaction request node, the state of the transaction response node, the action value, and the instantaneous reward $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$ are recorded in the experience cache.
This can be expressed as the following algorithm:
Algorithm 1: Multi-hop computing task offloading transaction node confirmation selection mechanism
Input: edge DAG blockchain nodes, cost function, computing task offloading transaction request nodes, maximum hop count $\varepsilon$, and reinforcement learning rates $\{\xi_p, \xi_v\}$.
Output: the set of multi-hop computing task offloading transaction nodes.
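The body of Algorithm 1 is not reproduced in this text. Based on steps (3-1) and (3-2) above, the selection loop can be sketched roughly as follows. The data layout (one dictionary of candidate request nodes per hop), the callables `select_prob` and `hop_cost` that stand in for the policy network and the cost function $C_{v,h}$, and the tuple format of the cache records are all assumptions made for illustration. States are encoded here as hypothetical pairs `(contract_ok, delay_long)`.

```python
from collections import deque


def run_offload_episode(candidates_per_hop, select_prob, hop_cost, max_hops, cache):
    """Select one cooperative node per hop, up to max_hops, and log the experience.

    candidates_per_hop: list of dicts {node_id: state}, one dict per hop.
    select_prob(state): stand-in for the policy output pi(a = 1 | state).
    hop_cost(state): stand-in for the cost function C_{v,h}.
    cache: deque that receives (hop, node_id, state, action, reward) records.
    """
    path = []
    for hop, candidates in enumerate(candidates_per_hop[:max_hops], start=1):
        if not candidates:
            break  # no request node to confirm on this hop
        # Pick the request node with the highest probability of action a = 1.
        chosen = max(candidates, key=lambda node: select_prob(candidates[node]))
        state = candidates[chosen]
        # Instantaneous reward r_h = -C_{v,h} for the confirmed connection.
        cache.append((hop, chosen, state, 1, -hop_cost(state)))
        path.append(chosen)
    return path
```

A usage example under these assumptions: with `cache = deque(maxlen=100)`, a policy stand-in that favors compliant, short-delay states, and a maximum of two hops, calling `run_offload_episode` returns the list of selected node identifiers, and the cache holds one record per hop that can later feed the parameter updates of step (4).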
(4) Update the learning parameter of the value function and the task offloading parameter $\theta$ according to the experience cache;
Updating the learning parameter of the value function is specifically as follows: it is updated iteratively using the squared error, with the following update equation:
Updating the task offloading parameter $\theta$ is specifically as follows: using the data recorded in the experience cache, the performance function of the multi-hop computing task offloading policy parameter $\theta$ is trained and updated with the stochastic gradient descent method. The performance function is:
To speed up training of the policy network, a value network is added to update the multi-hop computing task offloading policy parameter $\theta$, which is updated with the equation:
where $\xi_p$ is the learning rate; $G = \gamma r_1 + \gamma^2 r_2 + \ldots$ is the discounted return, computed from the historical instantaneous rewards stored in the experience cache; and $\gamma$ is the discount factor. The estimated value function is preferably evaluated by a parameterized value network.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (11)
1. A system for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network, characterized in that it comprises sensing cloud edge nodes and blocks created in the edge nodes, wherein the edge nodes and the blockchain form an edge DAG blockchain network $G_b = (V_b, E_b)$, where $V_b$ is the set of edge blockchain nodes participating in a computing task offloading transaction, which act as transaction request nodes and transaction response nodes during offloading, and $E_b$ is the set of transaction connections between the participants, the transaction connection established at the $h$-th hop being $\tau$, over which the two parties transact according to the preset smart contract;
the block of an edge node immutably stores the model produced by completing the training task, the training duration, and the model size;
the blockchain network is configured to execute actions according to an optimization strategy, such that the transaction response node selects, from all transaction request nodes $v = \{v_k\}$, the transaction request node whose selection action value has the highest mapping probability as the cooperative node and establishes a transaction connection with it; the transaction response node then serves as the transaction request node of the next hop; and the model produced by each node's training task, the training duration, and the model size are recorded and the trust level of each node is updated;
the optimization strategy is obtained by solving a reinforcement learning model, specifically by means of a policy network; the policy network is configured to solve for the optimization strategy according to the current state of the DAG blockchain network;
the input of the policy network is the transaction states of all transaction request nodes $v = \{v_k\}$ observed by the current transaction response node $v_{k+1}$; the state $s_{v,k}$ of transaction request node $v_k$ is its transaction state when the transaction is initiated and has two components: a smart contract component, which indicates whether the smart contract is complied with or violated, and a delay component, which indicates whether the computing task offloading delay on the $h$-th hop transaction connection is short or long, the delay being long when the computing task offloading transmission delay exceeds the preset transmission delay threshold and short otherwise;
the output of the policy network is the mapping probability $P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$ from the state $s_{v,k}$ of each transaction request node $v_k$ to the action $a_{v,k}$, where $\theta$ is the offloading policy parameter of the policy network, and the optimization strategy is established from $\theta$ as $\pi^*(a_{v,k} \mid s_{v,k}) = P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$.
2. The system for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network according to claim 1, characterized in that the transaction request node is configured to initiate a transaction request $\phi_j = (D_j, Y_j, \Upsilon_j)$ to other sensing cloud edge nodes and to update its trust level when it receives confirmation of the transaction request; in the transaction request $\phi_j = (D_j, Y_j, \Upsilon_j)$, $D_j$ is the model size in bits, $Y_j$ is the resources spent to complete the training task requested by the transaction, and $\Upsilon_j$ is the number of bitcoins per unit of edge blockchain node resources consumed for training the model;
the transaction response node is configured, on receiving a transaction request, to backward-judge the trustworthiness of the transaction according to the smart contract: when the transaction offloaded by the request cannot satisfy the conditions of the smart contract, it judges the trustworthiness to be low and rejects the transaction request; otherwise, it confirms the transaction request and sends the confirmation, together with the number of bitcoins required by the smart contract, to the transaction request node;
the smart contract is $SC = \{ l(t) \mid t \in [t_{min}, t_{max}] \}$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the trusted interval, a larger $l(t)$ indicating a higher degree of compliance with the smart contract of the computing task offloading transaction; $t_{min}$ and $t_{max}$ are the lower and upper limits of the trusted interval for the training time.
3. The system for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network according to claim 1, characterized in that the policy network is a model-free reinforcement learning architecture.
4. The system for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network according to claim 1, characterized in that the reward function used to train the policy network is $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$, where $C_{v,h}$ is the cost function of the sensing cloud edge node.
5. The system for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network according to claim 4, characterized in that the performance function of the multi-hop computing task offloading policy parameter $\theta$ is trained and updated with the stochastic gradient descent method, the performance function being:
to speed up training of the policy network, a value network is added to update the multi-hop computing task offloading policy parameter $\theta$, which is updated with the equation:
where $\xi_p$ is the learning rate; $G = \gamma r_1 + \gamma^2 r_2 + \ldots$ is the discounted return; $r_1, r_2, \ldots$ are the historical instantaneous rewards read from the experience cache; $\gamma$ is the discount factor; and the estimated value function is evaluated by a parameterized value network.
6. The system according to claim 5, characterized in that the input of the value network is the transaction state of the transaction response node, its output is the estimated value, and its network parameter is updated with the following equation:
where $\xi_v$ is the learning rate.
7. A method for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network, characterized by comprising the following steps:
(1) acquire the training task $\Gamma_n = \{w_n\}$ issued by the training task issuing node, the maximum number of training hops $\varepsilon$ set for it, the tolerance parameter $\lambda_d$ for the delay in the task offloading transaction, and the trust tolerance parameter $\lambda_s$ of the task's smart contract;
(2) take at least $\varepsilon + 1$ edge nodes as candidate computing task offloading transaction nodes, obtain their cost function $C_{v,h}$, and register them in the DAG blockchain;
(3) based on the transaction state $s_{v,k}$ of each edge node in the DAG blockchain obtained in step (1), use reinforcement learning to plan the optimization strategy for the computing task offloading path; formulate the action set of the offloading path according to the optimization strategy; and establish a transaction connection $\tau$ that complies with the smart contract between the transaction request node and the transaction response node of each hop, thereby forming the task offloading path;
where, in the optimization strategy, $Pr$ is the mapping probability from state $s_{v,k}$ to action $a_{v,k}$; the action set is the optimal set of confirmation selection actions that transaction response node $v_{k+1}$ takes toward transaction request node $v_k$ in the computing task offloading transaction; $a_{v,k} = 0$ means that the transaction request node is not selected as a collaborator, and $a_{v,k} = 1$ means that it is selected as a collaborator; and the state-action pair $\{a_{v,k} \mid s_{v,k}\}$ denotes the confirmation selection action $a_{v,k}$ of transaction response node $v_{k+1}$ given the state $s_{v,k}$ of transaction request node $v_k$.
8. The method for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network according to claim 7, characterized in that, for a transaction $\phi_j = (D_j, Y_j, \Upsilon_j)$, $D_j$ is the model size in bits, $Y_j$ is the resources that must be spent to complete the training task, and $\Upsilon_j$ is the bitcoins per unit of edge blockchain node resources consumed for training the model; the cost function $C_{v,h}$ of a sensing cloud edge node is the computing task offloading transaction cost on the $h$-th hop transaction connection when the node acts as transaction response node $v_{k+1}$ and selects transaction request node $v_k$; it comprises the delay and the trust tolerance, and is calculated as follows:
where $\lambda_d$ is the tolerance parameter for the delay in the multi-hop computing task offloading transaction and $\lambda_s$ is the trust tolerance parameter of the smart contract; the computing task offloading transmission delay depends on $x_{k+1,k}$, which indicates whether the model trained by transaction request node $v_k$ is confirmed, accepted, and offloaded to transaction response node $v_{k+1}$ for processing, $x_{k+1,k} = 1$ meaning that the computing task of the transaction request node is offloaded to the transaction response node for processing and $x_{k+1,k} = 0$ otherwise, and on the available offloading transmission rate of the computing task on transaction connection $\tau$, in which $B$ is the bandwidth, $p_k$ is the transmission power, $\sigma^2$ is the noise power, and $g_k$ is the channel gain, which captures the transmission loss from the transaction request node to the response node; the execution time of the task in the $h$-th hop offloading depends on $L_j$, the total computational load, and $f_c$, the service rate of each CPU core, which is a configurable variable; the task queue waiting time of all nodes in the $h$-th hop transaction connection depends on the amount of resources that all nodes in that connection need to process the tasks in the queue and on $f_c$; in the average arrival rate of offloaded tasks, $M$ is the number of offloads in the $h$-th hop transaction connection, $x_j = 1$ indicates a successful offload and $x_j = 0$ otherwise, and $I_{\{*\}}$ is an indicator function equal to 1 if the condition is true and 0 otherwise; the number of tasks $z_h$ already present in the current edge DAG blockchain transaction node follows a Poisson distribution with service parameter $\delta_h$; and $\Phi_h = 1 - l(t)$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the trusted interval, a larger $l(t)$ indicating a higher degree of compliance with the smart contract of the computing task offloading transaction on the transaction connection.
9. The method for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network according to claim 7, characterized in that the goal of the reinforcement learning in step (3) is to minimize the computing task offloading transaction cost while meeting the delay sensitivity requirement of computing task offloading and complying with the smart contract; this is written as:
MTOR: $\min C_o$
$a_v = \{a_{v,1}, a_{v,2}, \ldots, a_{v,\varepsilon}\}$
$SC = \{ l(t) \mid t \in [t_{min}, t_{max}] \}$
where $C_o$ is the computing task offloading transaction cost, $a_v = \{a_{v,1}, a_{v,2}, \ldots, a_{v,\varepsilon}\}$ is the action set, and $SC = \{ l(t) \mid t \in [t_{min}, t_{max}] \}$ is the smart contract;
the cumulative reward function used by the reinforcement learning is:
where $r_h$ is the instantaneous reward function of each hop, $\gamma$ is the discount factor, and $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$;
using a greedy algorithm, strategy optimization is treated as a Markov process: the action strategy $\pi^*(s_{v,k})$ that maximizes the instantaneous reward function of each hop is obtained, and the optimization strategy built from the action strategy of the $h$-th hop is written as:
where $P_T$ is the transition probability, $\gamma$ is the discount factor, and $V(s_{v,k+1} \mid \pi^*)$ is the state value function under the optimal strategy $\pi^*$, defined as:
updating the experience cache is specifically as follows: the state of the transaction request node, the state of the transaction response node, the action value, and the instantaneous reward $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$ are recorded in the experience cache.
10. The method for selecting the trusted offload cooperative node in the sensor edge cloud blockchain network according to claim 8, wherein the step (3) solves the optimization strategy through a model-free reinforcement learning algorithmThe method comprises the following specific steps:
(3-1) initializing a task unloading parameter theta to obtain the current policy network, namely, taking the last updated task unloading parameter theta as the task unloading parameter theta of the current policy network;
(3-2) for each hop of computing-task learning, the current transaction response node $v_{k+1}$ of the task offloading observes and collects the transaction state $s_{v,k}$ of the transaction request nodes $v_k$; the current policy network is used to compute the action policy $\pi^*(s_{v,k})$ of all current transaction request nodes $v_k$ and the transaction response node $v_{k+1}$, and the instantaneous reward $r_h$ is estimated, thereby determining the action $a_{v,k}$ that selects one of the transaction request nodes $v_k$ as the cooperative node; the selected node is updated to be the transaction response node and the experience cache is updated; this is repeated until the maximum hop count is reached, and the per-hop action policies $\pi^*(s_{v,k})$ so obtained compose the optimization policy;
Using the current policy network to compute the action policy $\pi^*(s_{v,k})$ of the current transaction request node $v_k$ and all transaction response nodes specifically comprises:
when the state of the transaction request node $v_k$ is $s_{v,k}$, the probability that the transaction response node takes action $a_{v,k}$ is:
$\pi(a_{v,k} \mid s_{v,k}) = P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$
wherein $P$ is the probability that action $a_{v,k}$ is taken in state $s_{v,k}$, and $\theta$ is the policy network parameter;
the specific step by which the current-hop response node reversely selects the cooperative node according to the policy $\pi(s_{v,k})$ is: the transaction request node with the highest probability of $a_{v,k} = 1$ is selected as the cooperative node, and the transaction request initiated by that node is confirmed;
the instantaneous reward is estimated as: $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$;
updating the node as a transaction request node specifically comprises: the selected transaction request node updates the processing time of the computing task in its node block, and the transaction response node is taken as the transaction request node to carry out the next-hop transaction request.
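The reverse selection in claim 10 (choosing the transaction request node whose probability of $a_{v,k} = 1$ is highest) can be sketched as follows. The claim does not specify the form of the policy network, so the logistic model used here, together with the function names and the feature-vector representation of node states, is purely an illustrative assumption.

```python
import math
from typing import Dict, Sequence


def accept_probability(state: Sequence[float], theta: Sequence[float]) -> float:
    """P(a = 1 | s, theta) under a hypothetical logistic policy (an assumption)."""
    z = sum(w * x for w, x in zip(theta, state))
    # Numerically stable sigmoid: avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def select_cooperative_node(candidates: Dict[str, Sequence[float]],
                            theta: Sequence[float]) -> str:
    """Return the candidate request node with the highest P(a = 1 | s, theta)."""
    probabilities = {node: accept_probability(state, theta)
                     for node, state in candidates.items()}
    return max(probabilities, key=probabilities.get)
```

For example, with `theta = (2.0, -1.0)` and candidates `{"n1": (1.0, 0.0), "n2": (0.0, 1.0)}`, node `n1` has the larger acceptance probability and would be the one whose transaction request is confirmed.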
11. The method for selecting a trusted offloading cooperative node in a sensor edge cloud blockchain network as claimed in claim 8, comprising the step of: (4) updating the learning parameter of the value function and the task offloading parameter θ according to the experience cache;
updating the learning parameter of the value function specifically comprises: iteratively updating with the squared error, using the following update equation:
updating the task offloading parameter θ specifically comprises: according to the data recorded in the experience cache, training and updating the performance function of the multi-hop computing-task offloading policy parameter θ by stochastic gradient descent, wherein the performance function is specifically:
to accelerate the training of the policy network, a value network is added for updating the multi-hop computing-task offloading policy parameter θ; the update equation of the policy parameter θ is:
wherein $\xi_p$ is the learning rate; $G = \gamma r_1 + \gamma^2 r_2 + \dots$ is the discounted return cost calculated from the historical instantaneous rewards stored in the experience cache, $\gamma$ being the discount factor; and the value of the estimated value function is preferably estimated by a parameterized value network.
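As a rough illustration of the quantities named in claim 11, the sketch below computes the discounted return $G = \gamma r_1 + \gamma^2 r_2 + \dots$ from stored rewards and applies one parameter step. The update equation itself is not reproduced in this text, so the step shown is the standard policy-gradient update with a value baseline, swapped in as an assumption; `policy_step`, `grad_log_pi`, and `v_hat` are hypothetical names.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """G = gamma * r_1 + gamma^2 * r_2 + ..., with rewards ordered r_1, r_2, ..."""
    return sum(gamma ** (i + 1) * r for i, r in enumerate(rewards))


def policy_step(theta: Sequence[float], grad_log_pi: Sequence[float],
                g: float, v_hat: float, xi_p: float) -> List[float]:
    """One baseline-adjusted policy-gradient step (assumed form, not the patent's equation).

    theta        current policy parameters
    grad_log_pi  gradient of log pi(a | s, theta) with respect to theta
    g            discounted return computed from the experience cache
    v_hat        value-network estimate used as the baseline
    xi_p         learning rate
    """
    advantage = g - v_hat
    return [t + xi_p * advantage * d for t, d in zip(theta, grad_log_pi)]
```

Subtracting the value estimate from the return reduces the variance of the gradient estimate, which is the usual reason for pairing a value network with a policy network.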
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011276468.5A CN112202928B (en) | 2020-11-16 | 2020-11-16 | Credible unloading cooperative node selection system and method for sensing edge cloud block chain network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112202928A (en) | 2021-01-08 |
CN112202928B (en) | 2022-05-17 |
Family
ID=74033564
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011276468.5A Active CN112202928B (en) | 2020-11-16 | 2020-11-16 | Credible unloading cooperative node selection system and method for sensing edge cloud block chain network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112202928B (en) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112887272B (en) * | 2021-01-12 | 2022-06-28 | 绍兴文理学院 | Device and method for controlling ore excavation attack surface in sensing edge cloud task unloading |
CN112804107B (en) * | 2021-01-28 | 2023-04-28 | 南京邮电大学 | Hierarchical federal learning method for self-adaptive control of energy consumption of Internet of things equipment |
CN112783662A (en) * | 2021-02-18 | 2021-05-11 | 绍兴文理学院 | CPU resource trusted sharing system in sensing edge cloud task unloading of integrated block chain |
CN113052331A (en) * | 2021-02-19 | 2021-06-29 | 北京航空航天大学 | Block chain-based Internet of things personalized federal learning method |
CN113222118B (en) * | 2021-05-19 | 2022-09-09 | 北京百度网讯科技有限公司 | Neural network training method, apparatus, electronic device, medium, and program product |
CN113344255B (en) * | 2021-05-21 | 2024-03-19 | 北京工业大学 | Vehicle-mounted network application data transmission and charging optimization method based on mobile edge calculation and block chain |
CN113419849A (en) * | 2021-06-04 | 2021-09-21 | 国网河北省电力有限公司信息通信分公司 | Edge computing node selection method and terminal equipment |
CN113676954B (en) * | 2021-07-12 | 2023-07-18 | 中山大学 | Large-scale user task unloading method, device, computer equipment and storage medium |
CN113537518B (en) * | 2021-07-19 | 2022-09-30 | 哈尔滨工业大学 | Model training method and device based on federal learning, equipment and storage medium |
CN113570039B (en) * | 2021-07-22 | 2024-02-06 | 同济大学 | Block chain system based on reinforcement learning optimization consensus |
CN113645702B (en) * | 2021-07-30 | 2022-06-03 | 同济大学 | Internet of things system supporting block chain and optimized by strategy gradient technology |
CN113590328B (en) * | 2021-08-02 | 2023-06-27 | 重庆大学 | Edge computing service interaction method and system based on block chain |
CN114172558B (en) * | 2021-11-24 | 2024-01-19 | 上海大学 | Task unloading method based on edge calculation and unmanned aerial vehicle cluster cooperation in vehicle network |
CN113887748B (en) * | 2021-12-07 | 2022-03-01 | 浙江师范大学 | Online federal learning task allocation method and device, and federal learning method and system |
CN114301911B (en) * | 2021-12-17 | 2023-08-04 | 杭州谐云科技有限公司 | Task management method and system based on edge-to-edge coordination |
CN115022894B (en) * | 2022-06-08 | 2023-12-19 | 西安交通大学 | Task unloading and computing resource allocation method and system for low-orbit satellite network |
CN115756873B (en) * | 2022-12-15 | 2023-10-13 | 北京交通大学 | Mobile edge computing and unloading method and platform based on federation reinforcement learning |
CN116978509B (en) * | 2023-09-22 | 2023-12-19 | 山东百康云网络科技有限公司 | Electronic prescription circulation method |
CN117610644B (en) * | 2024-01-19 | 2024-04-16 | 南京邮电大学 | Federal learning optimization method based on block chain |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020044353A1 (en) * | 2018-08-30 | 2020-03-05 | Telefonaktiebolaget Lm Ericsson (Publ) | System and method for collaborative task offloading automation in smart containers |
CN111124531A (en) * | 2019-11-25 | 2020-05-08 | 哈尔滨工业大学 | Dynamic unloading method for calculation tasks based on energy consumption and delay balance in vehicle fog calculation |
CN111274035A (en) * | 2020-01-20 | 2020-06-12 | 长沙市源本信息科技有限公司 | Resource scheduling method and device in edge computing environment and computer equipment |
CN111447512A (en) * | 2020-03-09 | 2020-07-24 | 重庆邮电大学 | Energy-saving method for edge cloud unloading |
CN111835827A (en) * | 2020-06-11 | 2020-10-27 | 北京邮电大学 | Internet of things edge computing task unloading method and system |
Non-Patent Citations (4)
Title |
---|
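A Markov Detection Tree-Based Centralized Scheme to Automatically Identify Malicious Webpages on Cloud Platforms; JIANHUA LIU, SHIGEN SHEN, MENGDA XU, XIN WANG, MINGLU LI; IEEE Access; 2018-12-27; Vol. 6; full text * |
Delay-tolerant data transmission based on the Internet of Vehicles and mobile edge computing; Li Meng et al.; Journal of Beijing University of Technology (《北京工业大学学报》); 2018-01-22; No. 04; full text * |
Social-attribute-aware task scheduling strategy for edge computing; Wang Ruyan et al.; Journal of Electronics & Information Technology (《电子与信息学报》); 2020-01-15; No. 01; full text * |
Modeling of a trusted cooperative service strategy for edge computing; Yue Guangxue et al.; Journal of Computer Research and Development (《计算机研究与发展》); 2020-05-15; No. 05; full text * |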
Also Published As
Publication number | Publication date |
---|---|
CN112202928A (en) | 2021-01-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112202928B (en) | Credible unloading cooperative node selection system and method for sensing edge cloud block chain network | |
Kang et al. | Incentive mechanism for reliable federated learning: A joint optimization approach to combining reputation and contract theory | |
CN112348204B (en) | Safe sharing method for marine Internet of things data under edge computing framework based on federal learning and block chain technology | |
Asheralieva et al. | Reputation-based coalition formation for secure self-organized and scalable sharding in iot blockchains with mobile-edge computing | |
Zhong et al. | On designing incentive-compatible routing and forwarding protocols in wireless ad-hoc networks: an integrated approach using game theoretical and cryptographic techniques | |
Zou et al. | Reputation-based regional federated learning for knowledge trading in blockchain-enhanced IoV | |
Wang et al. | A novel reputation-aware client selection scheme for federated learning within mobile environments | |
CN113660668B (en) | Seamless trusted cross-domain routing system of heterogeneous converged network and control method thereof | |
Kong et al. | A reliable and efficient task offloading strategy based on multifeedback trust mechanism for IoT edge computing | |
Xu et al. | Deep reinforcement learning assisted edge-terminal collaborative offloading algorithm of blockchain computing tasks for energy Internet | |
CN111262947A (en) | Calculation-intensive data state updating implementation method based on mobile edge calculation | |
CN113626104B (en) | Multi-objective optimization unloading strategy based on deep reinforcement learning under edge cloud architecture | |
Fu et al. | An incentive mechanism of incorporating supervision game for federated learning in autonomous driving | |
CN116566838A (en) | Internet of vehicles task unloading and content caching method with cooperative blockchain and edge calculation | |
Sethi et al. | FedDOVe: A Federated Deep Q-learning-based Offloading for Vehicular fog computing | |
CN115034390A (en) | Deep learning model reasoning acceleration method based on cloud edge-side cooperation | |
Lan et al. | Deep reinforcement learning for computation offloading and caching in fog-based vehicular networks | |
CN116669111A (en) | Mobile edge computing task unloading method based on blockchain | |
CN112783662A (en) | CPU resource trusted sharing system in sensing edge cloud task unloading of integrated block chain | |
Raja et al. | A Trusted distributed routing scheme for wireless sensor networks using block chain and jelly fish search optimizer based deep generative adversarial neural network (Deep-GANN) technique | |
Zhang et al. | Multiaccess edge integrated networking for Internet of Vehicles: A blockchain-based deep compressed cooperative learning approach | |
Jain et al. | Blockchain enabled trusted task offloading scheme for fog computing: A deep reinforcement learning approach | |
Wang et al. | Eidls: An edge-intelligence-based distributed learning system over internet of things | |
Shaodong et al. | Multi-step reinforcement learning-based offloading for vehicle edge computing | |
CN112910716B (en) | Mobile fog calculation loss joint optimization system and method based on distributed DNN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||