CN112202928B - Credible unloading cooperative node selection system and method for sensing edge cloud block chain network - Google Patents


Publication number
CN112202928B
Authority
CN
China
Prior art keywords
transaction
node
task
hop
nodes
Prior art date
Legal status
Active
Application number
CN202011276468.5A
Other languages
Chinese (zh)
Other versions
CN112202928A (en)
Inventor
刘建华
沈士根
方朝曦
黄龙军
李琪
冯晟
方曙琴
Current Assignee
University of Shaoxing
Original Assignee
University of Shaoxing
Priority date
Filing date
Publication date
Application filed by University of Shaoxing
Priority to CN202011276468.5A
Publication of CN112202928A
Application granted
Publication of CN112202928B
Legal status: Active (current)
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y10/00 Economic sectors
    • G16Y10/75 Information technology; Communication
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y40/00 IoT characterised by the purpose of the information processing
    • G16Y40/50 Safety; Security of things, users, data or systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/50 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using hash chains, e.g. blockchains or hash trees

Abstract

The invention discloses a system and method for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network. The system comprises sensor cloud edge nodes and blocks created in the edge nodes; the edge nodes and the blockchain form an edge DAG blockchain network. The method comprises the following steps: (1) obtaining a training task issued by a training-task issuing node; (2) taking at least $\varepsilon+1$ edge nodes as candidate computation task offloading transaction nodes, obtaining their cost functions $C_{v,h}$ and registering them in the DAG blockchain; (3) using reinforcement learning to plan an optimization strategy for the task offloading path in the DAG blockchain according to the transaction state of each edge node, and formulating the offloading-path action set according to that strategy. For multi-hop cooperative offloading of computation tasks, the invention establishes a multi-hop cooperative offloading model based on an edge DAG blockchain: nodes that participate in cooperative offloading are registered in the edge DAG blockchain network and cooperatively complete multi-hop distributed federated learning tasks.

Description

Credible unloading cooperative node selection system and method for sensing edge cloud block chain network
Technical Field
The invention belongs to the technical field of the Internet of Things, and in particular relates to a system and method for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network.
Background
To reduce the dependence of computation task offloading in the sensor edge cloud on the remote cloud, computation tasks are offloaded among edge nodes in a multi-hop cooperative manner to complete model training. Multi-hop cooperative offloading comprises two stages, computation task transmission and distributed model training. It scales well and is robust, better supports distributed federated learning of computation tasks, and protects the private data of nodes. However, as the number of hops and nodes grows, optimizing the quality of service of trusted computation task offloading becomes increasingly challenging.
Model training based on multi-hop computation task offloading effectively avoids single points of failure, makes full use of the local data of edge nodes to train the model in a distributed way, and can effectively improve the performance of federated learning. However, distributed federated learning via multi-hop computation task offloading faces security issues. Because edge nodes are selfish, training of a computation task may fail to reach the expected accuracy, or a malicious node may modify the trained model to deceive cooperating nodes and mislead the next-hop node into continuing inefficient training. This makes the behavior of edge nodes participating in cooperation unreliable, prevents low-delay trusted cooperation, and reduces the quality of service of computation task offloading and distributed federated learning. A key challenge is therefore how to balance, in cooperative decisions, the multi-hop offloading delay of edge nodes against trusted collaboration, so as to improve the quality of service of the multi-hop computation task offloading path. To address this challenge, researchers have proposed several methods for cooperative offloading of computation tasks. Yan et al. consider the task graph of a single-user edge computing system and propose a reinforcement learning framework that jointly optimizes whether tasks run locally or on edge nodes and how resources are allocated, but the scheme does not consider multi-hop computation task offloading scenarios ("Offloading and Resource Allocation With General Task Graph in Mobile Edge Computing: A Deep Reinforcement Learning Approach," IEEE Transactions on Wireless Communications, vol. 19, no. 8, pp. 5404-5419, Aug. 2020).
Hong et al. model the optimization of computation task offloading paths that include edge and cloud nodes as a multi-hop computation offloading game and propose a QoS-aware distributed algorithm, but they do not consider trust in cooperative offloading between nodes ("Multi-Hop Cooperative Computation Offloading for Industrial IoT-Edge-Cloud Computing Environments," IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 12, pp. 2759-2774, 1 Dec. 2019). L. Xiao et al. propose a blockchain-based trust mechanism that resists selfish edge attacks and record-spoofing attacks and enhances the security of computation offloading between mobile devices and edge nodes by building reputation, but they do not secure multi-hop computation task offloading between edge nodes ("A Reinforcement Learning and Blockchain-Based Trust Mechanism for Edge Networks," IEEE Transactions on Communications, vol. 68, no. 9, pp. 5460-5470, Sept. 2020). These existing schemes share the following shortcomings:
(1) The existing solutions give little consideration to multi-hop computation task offloading and cooperative training among multi-hop edge nodes; they consider only single-hop offloading performance from sensing devices to edge nodes and cannot support multi-hop distributed federated learning. Their applicability to multi-hop distributed cooperative offloading of computation tasks is therefore limited.
(2) The existing solutions do not use blockchain technology to achieve trusted cooperative offloading of multi-hop computation tasks. In particular, as the number of nodes and hops in multi-hop offloading grows, the decision space over trusted cooperation and offloading delay among nodes expands, and the existing solutions offer no way to handle it.
(3) The existing solutions do not consider intelligent attackers who attack multi-hop offloading nodes by, for example, inflating computation time or modifying the model, and provide no method for selecting trusted cooperative nodes in multi-hop computation task offloading against this type of attack.
Disclosure of Invention
To overcome these shortcomings, the invention provides a DAG-blockchain-based multi-hop computation task offloading method for the sensor edge cloud environment, which achieves low-delay trusted cooperative offloading while accounting for untrustworthy behaviors of malicious edge nodes such as inflating computation time and modifying models.
To achieve the above object, according to one aspect of the invention, a trusted offloading cooperative node selection system for a sensor edge cloud blockchain network is provided. The system comprises sensor cloud edge nodes and blocks created in the edge nodes; the edge nodes and the blockchain form an edge DAG blockchain network $G_b=(V_b,E_b)$, where $V_b$ is the set of edge blockchain nodes that participate in computation task offloading transactions and act as transaction request nodes and transaction response nodes during offloading, and $E_b$ is the set of transaction connections; the transaction connection established at the h-th hop is $\tau$, i.e., the two parties transact according to a preset smart contract;
each edge node's block immutably stores the model, training time and model size of the completed training task;
the blockchain network executes actions according to the optimization strategy, so that the transaction response node selects, from all transaction request nodes $v=\{v_k\}$, the transaction request node whose action value $a_{v,k}=1$ has the highest mapping probability as the cooperative node and establishes a transaction connection with it; the transaction response node then acts as the transaction request node of the next hop, the model, training duration and model size of each node's completed training task are recorded, and the trust of each node is updated.
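The immutable per-node record described above can be pictured as a hash-linked block in a DAG. The sketch below is illustrative only: it assumes SHA-256 over a canonical JSON encoding, stores a digest of the model rather than the model itself, and lets each block list the hashes of earlier blocks it references (the DAG edges). `ModelBlock` and `EdgeDag` are hypothetical names; the patent does not specify a block format, hash function or consensus rule.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelBlock:
    """Hypothetical record of one completed training task (frozen = immutable)."""
    node_id: str
    model_digest: str       # hash of the trained model; the model itself is stored elsewhere
    training_time_s: float  # training duration
    model_size_bits: int    # model size D_j in bits
    parents: tuple = ()     # hashes of earlier blocks this block references (DAG edges)

    def block_hash(self) -> str:
        # Canonical JSON (sorted keys) so identical content always hashes identically.
        payload = json.dumps(
            {
                "node_id": self.node_id,
                "model_digest": self.model_digest,
                "training_time_s": self.training_time_s,
                "model_size_bits": self.model_size_bits,
                "parents": list(self.parents),
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class EdgeDag:
    """Append-only block store; a block may only reference blocks already present."""

    def __init__(self):
        self.blocks = {}

    def append(self, block: ModelBlock) -> str:
        missing = [p for p in block.parents if p not in self.blocks]
        if missing:
            raise ValueError(f"unknown parent block(s): {missing}")
        digest = block.block_hash()
        self.blocks[digest] = block
        return digest
```

Because the hash covers the training time and model size, any later edit to a recorded value produces a different block hash, which is what makes tampering detectable.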
Preferably, in the trusted offloading cooperative node selection system, the transaction request node initiates a transaction request $\phi_j=(D_j,Y_j,\gamma_j)$ to another sensor cloud edge node and updates its trust level when it receives confirmation of the request; in $\phi_j$, $D_j$ is the model size in bits, $Y_j$ is the resources spent to complete the training task requested by the transaction, and $\gamma_j$ is the number of bitcoins per unit resource of the edge blockchain node consumed to train the model.
The transaction response node, on receiving a transaction request, judges the credibility of the transaction in reverse according to the smart contract: if the offloading transaction in the request fails to satisfy the conditions of the smart contract, it judges the credibility to be low and rejects the request; otherwise it confirms the request and sends the confirmation, the smart contract and the required number of bitcoins to the transaction request node;
the smart contract is $SC=\{l(t)\mid t\in[t_{min},t_{max}]\}$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the credible interval; the larger $l(t)$, the more closely the computation task offloading transaction complies with the smart contract; $t_{min}$ and $t_{max}$ are the lower and upper limits of the confidence interval of the training time.
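A minimal sketch of the request tuple $\phi_j$ and the response node's contract check, under stated assumptions: $l(t)$ is estimated as the empirical fraction of a node's observed training times inside $[t_{min}, t_{max}]$, the payment is assumed to be $Y_j \gamma_j$, and the acceptance threshold `l_min` is a hypothetical parameter that the text does not give.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionRequest:
    """phi_j = (D_j, Y_j, gamma_j) as described in the text."""
    model_size_bits: int       # D_j
    resources: float           # Y_j, resources needed for the training task
    coins_per_resource: float  # gamma_j, bitcoins per unit resource

    def price(self) -> float:
        # Assumption: total payment is Y_j * gamma_j.
        return self.resources * self.coins_per_resource


@dataclass(frozen=True)
class SmartContract:
    """SC = {l(t) | t in [t_min, t_max]}."""
    t_min: float
    t_max: float

    def l(self, training_times) -> float:
        # Assumption: l(t) is the empirical fraction of observed training
        # times that fall inside the credible interval [t_min, t_max].
        times = list(training_times)
        if not times:
            return 0.0
        inside = sum(self.t_min <= t <= self.t_max for t in times)
        return inside / len(times)


def respond(contract: SmartContract, history, l_min: float = 0.8) -> bool:
    """Response-node check: confirm only if l(t) reaches the hypothetical l_min."""
    return contract.l(history) >= l_min
```

Estimating $l(t)$ from history is one possible reading; the patent only states that $l(t)$ is a probability over the credible interval.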
Preferably, the trusted offloading cooperative node selection system comprises a policy network that solves the optimization strategy from the current state of the DAG blockchain network; the policy network preferably uses a model-free reinforcement learning structure and is preferably a DNN.
Preferably, in the trusted offloading cooperative node selection system, the input of the policy network is the transaction state of all transaction request nodes $v=\{v_k\}$ observed by the current transaction response node $v_{k+1}$. The state $s_{v,k}$ of transaction request node $v_k$ represents its transaction status at the moment it initiates the transaction, and is defined by [equation not reproduced]. Its components are a smart-contract state, which takes one value when the smart contract is complied with and another when it is violated, and a delay indicator stating whether the computation task offloading delay on the h-th hop of the transaction connection is short or long: the delay is long when the computation task offloading transmission latency exceeds the preset offloading transmission delay threshold, and short otherwise.
The output of the policy network is the probability $P(a_t=a_{v,k}\mid s_t=s_{v,k},\theta_t=\theta)$ that maps each transaction request node's state $s_{v,k}$ to action $a_{v,k}$, where $\theta$ is the offloading strategy parameter of the policy network; from $\theta$ the optimization strategy $\pi^*(a_{v,k}\mid s_{v,k})=P(a_t=a_{v,k}\mid s_t=s_{v,k},\theta_t=\theta)$ is established.
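The state encoding and policy output can be illustrated with a small stand-in. Assumptions: the smart-contract state and the delay indicator are coded as 0/1, contract compliance is judged by a hypothetical threshold `l_min` on $l(t)$, and a softmax over linear scores $\theta \cdot s_{v,k}$ replaces the DNN policy so that probabilities over the candidates sum to one. The patent itself gives only the per-node mapping probability $P(a_t=a_{v,k}\mid s_t=s_{v,k},\theta_t=\theta)$.

```python
import math
from typing import NamedTuple


class TxState(NamedTuple):
    """Hypothetical numeric encoding of s_{v,k}."""
    complies: int    # 1 = complies with the smart contract, 0 = violates it (assumed coding)
    long_delay: int  # 1 = offload delay is long, 0 = short (assumed coding)


def encode_state(l_value, l_min, trans_latency_s, threshold_s):
    """Build s_{v,k} from a contract score and the hop's transmission latency."""
    complies = 1 if l_value >= l_min else 0
    # "Long" delay: transmission latency above the preset threshold.
    long_delay = 1 if trans_latency_s > threshold_s else 0
    return TxState(complies, long_delay)


def selection_probabilities(states, theta):
    """Softmax over linear scores theta . s, a stand-in for the DNN policy."""
    scores = [sum(w * x for w, x in zip(theta, s)) for s in states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_cooperator(states, theta):
    """Action vector: a_{v,k} = 1 for the most probable candidate, 0 for the rest."""
    probs = selection_probabilities(states, theta)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1 if k == best else 0 for k in range(len(probs))]
```

Normalizing across candidates makes "pick the highest probability" a plain argmax; an independent sigmoid per node would also be consistent with the text.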
The reward function used to train the policy network is $r_h(s_{v,k+1},a_{v,k},s_{v,k})=-C_{v,h}$, where $C_{v,h}$ is the cost function of the sensor cloud edge node. Preferably, the performance function of the multi-hop computation task offloading strategy parameter $\theta$ is trained and updated by stochastic gradient descent; the performance function is:
[equation not reproduced]
To speed up training of the policy network, a value network is added to update the multi-hop computation task offloading strategy parameter $\theta$, which is updated by the equation:
[equation not reproduced]
where $\xi_p$ is the learning rate, $G=\gamma r_1+\gamma^2 r_2+\cdots$ is the discounted return, $r_1,r_2,\ldots$ are historical instant rewards read from the cache, and $\gamma$ is the discount factor. The estimated value function in this equation is preferably evaluated by a value network with its own parameters.
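The update equation for $\theta$ survives only as an image, so the sketch below uses the standard REINFORCE-with-baseline form, $\theta \leftarrow \theta + \xi_p (G - \hat V(s)) \nabla_\theta \log \pi(a \mid s;\theta)$. This is consistent with the listed symbols ($\xi_p$, $G$, an estimated value used as a baseline) but is an assumption, not the patent's stated formula. The policy is a hypothetical linear-softmax stand-in for the DNN, and $G$ follows the text's definition $\gamma r_1 + \gamma^2 r_2 + \cdots$ with $r_h = -C_{v,h}$.

```python
import math


def discounted_return(rewards, gamma):
    """G = gamma*r1 + gamma^2*r2 + ..., following the definition in the text."""
    return sum(gamma ** (h + 1) * r for h, r in enumerate(rewards))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(z - m) for z in scores]
    s = sum(exps)
    return [e / s for e in exps]


def policy_update(theta, states, chosen, G, baseline, xi_p):
    """One REINFORCE-with-baseline step for a linear-softmax policy (assumed form).

    theta <- theta + xi_p * (G - baseline) * grad log pi(chosen | states; theta)
    """
    scores = [sum(w * x for w, x in zip(theta, s)) for s in states]
    probs = softmax(scores)
    dim = len(theta)
    # For a softmax over linear scores: grad log pi_i = x_i - sum_k pi_k * x_k
    expected = [sum(p * s[d] for p, s in zip(probs, states)) for d in range(dim)]
    grad = [states[chosen][d] - expected[d] for d in range(dim)]
    advantage = G - baseline
    return [w + xi_p * advantage * g for w, g in zip(theta, grad)]
```

A return better than the baseline raises the probability of the chosen cooperator; a worse one lowers it, which is the variance-reducing role the value network plays here.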
Preferably, the trusted offloading cooperative node selection system comprises a value network, preferably a DNN, whose input is the transaction state of the transaction response node and whose output is the estimated value; its network parameters are updated by the equation:
[equation not reproduced]
where $\xi_v$ is the learning rate.
The value network is updated iteratively using the squared error, with the loss function:
[equation not reproduced]
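A sketch of the squared-error value update, assuming a linear value function $\hat V(s;\omega) = \omega \cdot s$ in place of the DNN value network; $\omega$ is a placeholder name for the value-network parameters, whose symbol is not recoverable from the text. Gradient descent on $\tfrac12 (G - \hat V)^2$ gives $\omega \leftarrow \omega + \xi_v (G - \hat V) s$.

```python
def value_estimate(omega, state):
    """Linear stand-in for the value network: V(s; omega) = omega . s."""
    return sum(w * x for w, x in zip(omega, state))


def value_update(omega, state, G, xi_v):
    """One gradient step on the squared error 0.5 * (G - V(s; omega))**2.

    The gradient w.r.t. omega is -(G - V) * s, so descent adds xi_v * (G - V) * s.
    """
    error = G - value_estimate(omega, state)
    return [w + xi_v * error * x for w, x in zip(omega, state)]
```

Repeating the step on the same state drives $\hat V(s)$ toward the observed return $G$, provided $\xi_v$ is small relative to $1/\lVert s\rVert^2$.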
According to another aspect of the invention, a method for selecting trusted offloading cooperative nodes in a sensor edge cloud blockchain network is provided, comprising the following steps:
(1) obtaining the training task $\Gamma_n=\{w_n\}$ issued by the training-task issuing node, together with the maximum number of training hops $\varepsilon$ it sets, the tolerance parameter $\lambda_d$ for delay time during the task offloading transaction, and the credibility tolerance parameter $\lambda_s$ of the task smart contract;
(2) taking at least $\varepsilon+1$ edge nodes as candidate computation task offloading transaction nodes, obtaining their cost functions $C_{v,h}$, and registering them in the DAG blockchain;
(3) according to the transaction state $s_{v,k}$ of each edge node in the DAG blockchain obtained in step (1), using reinforcement learning to plan the optimization strategy $\pi^*$ of the computation task offloading path; formulating the offloading-path action set $a_v=\{a_{v,1},a_{v,2},\ldots,a_{v,\varepsilon}\}$ according to $\pi^*$; and establishing, at each hop, a transaction connection $\tau$ that complies with the smart contract between the transaction request node and the transaction response node, thereby forming the task offloading path;
where, in the optimization strategy $\pi^*$ [equation not reproduced], Pr denotes the probability of mapping state $s_{v,k}$ to action $a_{v,k}$; $a_v$ is the optimal confirmation-selection action set of transaction response node $v_{k+1}$ over transaction request nodes $v_k$ in the computation task offloading transaction; the action taken is $a_{v,k}\in\{0,1\}$, where $a_{v,k}=0$ means the transaction request node is not selected as a collaborator and $a_{v,k}=1$ means it is selected; the state-action pair $\{a_{v,k}\mid s_{v,k}\}$ denotes the confirmation-selection action $a_{v,k}$ of transaction response node $v_{k+1}$ conditioned on the state $s_{v,k}$ of transaction request node $v_k$.
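Step (3) can be pictured as a loop over at most $\varepsilon$ hops in which each response node sets $a_{v,k}=1$ for its most probable request node and 0 for the rest, recording one transaction connection $\tau$ per hop. The input layout (for each hop, a response node plus a mapping from candidate request nodes to selection probabilities) is a hypothetical simplification; contract confirmation and bitcoin transfer are omitted.

```python
def build_offload_path(hops, max_hops):
    """Form an offload path hop by hop (hypothetical input layout).

    hops: list of (response_node, {request_node: selection_probability}), one per hop.
    Returns the per-hop action sets a_v and the transaction connections
    (h, request_node, response_node).
    """
    actions, connections = [], []
    for h, (responder, candidates) in enumerate(hops[:max_hops], start=1):
        chosen = max(candidates, key=candidates.get)
        # a_{v,k} = 1 for the chosen request node, 0 for every other candidate.
        actions.append({node: int(node == chosen) for node in candidates})
        connections.append((h, chosen, responder))
    return actions, connections
```

The `max_hops` cut-off plays the role of $\varepsilon$: hops beyond it are ignored even if candidates are available.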
Preferably, in the method, for transaction $\phi_j=(D_j,Y_j,\gamma_j)$, $D_j$ is the model size in bits, $Y_j$ is the resources needed to complete the training task, and $\gamma_j$ is the number of bitcoins per unit resource of the edge blockchain node consumed by training the model. The cost function $C_{v,h}$ of a sensor cloud edge node is the computation task offloading transaction cost on the h-th hop transaction connection after that node, acting as transaction response node $v_{k+1}$, selects transaction request node $v_k$; it covers both delay and credibility tolerance and is computed as:
[equation not reproduced]
where $\lambda_d$ is the tolerance parameter for delay time during the multi-hop computation task offloading transaction and $\lambda_s$ is the credibility tolerance parameter of the smart contract.
The computation task offloading transmission latency is given by [equation not reproduced], where $x_{k+1,k}$ indicates whether the model trained by transaction request node $v_k$ is confirmed to be accepted and offloaded to transaction response node $v_{k+1}$ for processing: $x_{k+1,k}=1$ means the computation task of the request node is offloaded to the response node, otherwise $x_{k+1,k}=0$.
The offloading transfer rate available for the task on transaction connection $\tau$ is given by [equation not reproduced], where $B$ is the bandwidth, $p_k$ the transmission power, $\sigma^2$ the noise power, and $g_k$ the channel gain, which reflects the transmission loss from the transaction request node to the response node.
The execution time of the task offloaded at the h-th hop is given by [equation not reproduced], where $L_j$ is the total computational load and $f_c$ is the service rate of each CPU core, a configurable variable.
The task-queue waiting time of all nodes on the h-th hop transaction connection is given by [equation not reproduced]; it depends on the number of resources that all nodes on that connection need to process the tasks in their queues and on the per-core service rate $f_c$. The average arrival rate of task offloading is given by [equation not reproduced], where $M$ is the number of offloads on the h-th hop transaction connection, $x_j=1$ indicates a successful offload and $x_j=0$ otherwise, and $I_{\{*\}}$ is an indicator function equal to 1 if the condition holds and 0 otherwise. The number of tasks $z_h$ already present at the current edge DAG blockchain transaction node follows a Poisson distribution with service parameter $\delta_h$, i.e. $P(z_h=z)=\delta_h^{z}e^{-\delta_h}/z!$.
Finally, $\Phi_h=1-l(t)$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the confidence interval; the larger $l(t)$, the more closely the computation task offloading transaction on the connection complies with the smart contract.
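The closed forms of $C_{v,h}$ and its delay terms appear only as images. The sketch below fills them with common assumptions that match the listed symbols, and should be read as one plausible instance rather than the patent's formulas: a Shannon-capacity rate $R = B\log_2(1 + p_k g_k/\sigma^2)$, transmission delay $x_{k+1,k} D_j / R$, execution delay $L_j / f_c$, queue delay equal to the already-queued work divided by $f_c$, and $C_{v,h} = \lambda_d \cdot \text{delay} + \lambda_s \Phi_h$ with $\Phi_h = 1 - l(t)$. The Poisson probability mass function for $z_h$ is included as stated in the text.

```python
import math


def offload_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Assumed Shannon-capacity form: R = B * log2(1 + p_k * g_k / sigma^2), in bit/s."""
    return bandwidth_hz * math.log2(1 + tx_power_w * channel_gain / noise_power_w)


def poisson_pmf(z, delta):
    """P(z_h = z) for a Poisson-distributed task count with parameter delta_h."""
    return delta ** z * math.exp(-delta) / math.factorial(z)


def hop_cost(D_bits, x, rate_bps, L_cycles, f_c, queued_cycles,
             l_value, lambda_d, lambda_s):
    """Per-hop cost C_{v,h}, assuming a weighted sum of delay and distrust.

    delay    = transmission + execution + queue waiting (all in seconds)
    distrust = Phi_h = 1 - l(t)
    """
    t_trans = x * D_bits / rate_bps   # paid only if the task is offloaded (x = 1)
    t_exec = L_cycles / f_c           # total computational load / per-core service rate
    t_wait = queued_cycles / f_c      # work already queued on the hop, same service rate
    phi = 1.0 - l_value
    return lambda_d * (t_trans + t_exec + t_wait) + lambda_s * phi
```

The instant reward used for training would then be $r_h = -C_{v,h}$, so a cheaper, more trustworthy hop earns a higher reward.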
Preferably, in step (3) of the method, the objective of reinforcement learning is to minimize the computation task offloading transaction cost while meeting the delay-sensitivity requirement of computation task offloading and complying with the smart contract, written as:
MTOR: $\min C_o$
subject to: [delay constraint not reproduced]
$a_v=\{a_{v,1},a_{v,2},\ldots,a_{v,\varepsilon}\}$
$SC=\{l(t)\mid t\in[t_{min},t_{max}]\}$
where $C_o$ is the computation task offloading transaction cost, $a_v=\{a_{v,1},a_{v,2},\ldots,a_{v,\varepsilon}\}$ is the action set, and $SC=\{l(t)\mid t\in[t_{min},t_{max}]\}$ is the smart contract.
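For very small instances, MTOR can be checked by brute force, which also shows why a learned policy is used: the number of candidate paths grows as the product of the per-hop candidate counts. The sketch assumes the unreproduced constraint is a total-delay budget and treats contract compliance as $l(t) \ge$ `l_min` at every hop; both `delay_budget` and `l_min` are hypothetical parameters.

```python
from itertools import product


def solve_mtor_bruteforce(candidates_per_hop, delay_budget, l_min):
    """Exhaustively pick one cooperator per hop minimizing total cost C_o.

    candidates_per_hop: list (one per hop) of dicts node -> (cost, delay_s, l_value).
    Assumed constraints: total delay <= delay_budget and l(t) >= l_min at every hop.
    Returns (best_cost, path) or (None, None) if nothing is feasible.
    """
    best_cost, best_path = None, None
    for path in product(*[list(c.items()) for c in candidates_per_hop]):
        total_delay = sum(d for _, (_, d, _) in path)
        if total_delay > delay_budget:
            continue  # violates the (assumed) delay-sensitivity constraint
        if any(l < l_min for _, (_, _, l) in path):
            continue  # violates the smart contract at some hop
        cost = sum(c for _, (c, _, _) in path)
        if best_cost is None or cost < best_cost:
            best_cost, best_path = cost, [node for node, _ in path]
    return best_cost, best_path
```

With $n$ candidates per hop and $\varepsilon$ hops this enumerates $n^{\varepsilon}$ paths, which is only practical as a correctness check for a learned policy on toy inputs.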
The cumulative reward function used by reinforcement learning is:
[equation not reproduced]
where $r_h$ is the instantaneous reward of each hop, $r_h(s_{v,k+1},a_{v,k},s_{v,k})=-C_{v,h}$, and $\gamma$ is the discount factor.
Preferably, a greedy algorithm treats strategy optimization as a Markov process and obtains, at each hop, the action strategy $\pi^*(s_{v,k})$ that maximizes the instantaneous reward; the action strategy of the h-th hop used to obtain the optimization strategy is written as:
[equation not reproduced]
where $P_T$ is the transition probability, $\gamma$ is the discount factor, and $V(s_{v,k+1},\pi^*)$ is the state value function under the optimal strategy $\pi^*$, defined as:
[equation not reproduced]
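The per-hop greedy step reads as a one-step Bellman lookahead. Assuming the unreproduced formula has the usual form $\pi^*(s) = \arg\max_a [\, r(s,a) + \gamma \sum_{s'} P_T(s' \mid s,a) V(s') \,]$, a toy version with tabular inputs looks like this; the function name and the dictionary representation of $P_T$ and $V$ are illustrative assumptions.

```python
def greedy_action(state, actions, reward, transition, value, gamma):
    """pi*(s) = argmax_a [ r(s, a) + gamma * sum_{s'} P_T(s' | s, a) * V(s') ].

    reward(s, a) -> float; transition(s, a) -> {next_state: probability};
    value: {state: V(state)}. All inputs are hypothetical toy structures.
    """
    def q(a):
        # Expected one-step return of taking action a in the given state.
        return reward(state, a) + gamma * sum(
            p * value[s2] for s2, p in transition(state, a).items())

    return max(actions, key=q)
```

With $\gamma = 0$ this reduces to picking the action with the best instantaneous reward, which is the purely greedy reading of the text; a positive $\gamma$ also weighs where the path goes next.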
Preferably, in step (3) of the method, the optimization strategy $\pi^*$ is solved by a model-free reinforcement learning algorithm, specifically as follows:
(3-1) Initialize the task offloading parameter $\theta$ to obtain the current policy network, i.e., use the most recently updated $\theta$ as the parameter of the current policy network;
(3-2) For each hop of computation task learning, the current transaction response node $v_{k+1}$ of task offloading observes and collects the transaction states $s_{v,k}$ of the transaction request nodes $v_k$. The current policy network computes the action strategy $\pi^*(s_{v,k})$ between all current transaction request nodes $v_k$ and transaction response node $v_{k+1}$ and estimates the instant reward $r_h$, thereby determining action $a_{v,k}$, which selects one of the transaction request nodes $v_k$ as the cooperative node; the node is updated to a transaction response node and the experience cache is updated. This repeats until the maximum number of hops is reached, and the per-hop action strategies $\pi^*(s_{v,k})$ together form the optimization strategy $\pi^*$.
Computing, with the current policy network, the optimization strategy $\pi^*(s_{v,k})$ for the current transaction request nodes $v_k$ and all transaction response nodes specifically comprises:
when the state of transaction request node $v_k$ is $s_{v,k}$, the probability that the transaction response node takes action $a_{v,k}$ is
$\pi(a_{v,k}\mid s_{v,k})=P(a_t=a_{v,k}\mid s_t=s_{v,k},\theta_t=\theta)$
where $P$ is the probability that action $a_{v,k}$ is taken in state $s_{v,k}$, and $\theta$ is the policy network parameter.
Preferably, the policy network employs a DNN architecture.
With the optimization strategy $\pi(s_{v,k})$, the response node of the current hop selects the cooperative node in reverse as follows: the transaction request node with the highest probability of $a_{v,k}=1$ is selected as the cooperative node, and the transaction request it initiated is confirmed.
The instant reward is estimated as $r_h(s_{v,k+1},a_{v,k},s_{v,k})=-C_{v,h}$.
Updating the node as the transaction request node specifically comprises: the selected transaction request node updates the processing time of the computation task in its block, and the transaction response node acts as the transaction request node to issue the next-hop transaction request.
Preferably, in step (3) of the method, updating the experience cache specifically comprises recording in the experience cache the transaction request node state, the transaction response node state, the action value, and the instant reward $r_h(s_{v,k+1},a_{v,k},s_{v,k})=-C_{v,h}$.
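An experience cache that records the fields listed above can be a bounded FIFO queue. The capacity and the `Transition` field names are hypothetical; the reward is stored as $-C_{v,h}$ as the text specifies, and the stored rewards are what the discounted return $G$ would be computed from.

```python
from collections import deque, namedtuple

# One cached hop: request-node state, response-node state, action value, instant reward.
Transition = namedtuple("Transition", "request_state response_state action reward")


class ExperienceCache:
    """Bounded FIFO cache of per-hop transitions (capacity is a hypothetical choice)."""

    def __init__(self, capacity=10_000):
        # deque with maxlen silently drops the oldest entry once full.
        self.buffer = deque(maxlen=capacity)

    def record(self, request_state, response_state, action, cost):
        # The instant reward is the negative hop cost: r_h = -C_{v,h}.
        self.buffer.append(Transition(request_state, response_state, action, -cost))

    def rewards(self):
        return [t.reward for t in self.buffer]
```

Dropping the oldest transitions keeps the cache focused on recent node behavior, which matters when a node's trustworthiness changes over time.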
Preferably, in step (4) of the method, the learning parameters of the value function and the task offloading parameter $\theta$ are updated according to the experience cache;
updating the learning parameters of the value function specifically comprises iterative updating with the squared error, using the update equation:
[equation not reproduced]
where $\xi_v$ is the learning rate, and the loss function is:
[equation not reproduced]
in which the estimated value is the output of the value function.
Updating the task offloading parameter $\theta$ specifically comprises: according to the data recorded in the experience cache, training and updating the performance function of the multi-hop computation task offloading strategy parameter $\theta$ by stochastic gradient descent; the performance function is:
[equation not reproduced]
To speed up training of the policy network, a value network is added to update the multi-hop computation task offloading strategy parameter $\theta$, which is updated by the equation:
[equation not reproduced]
where $\xi_p$ is the learning rate, $G=\gamma r_1+\gamma^2 r_2+\cdots$ is the discounted return computed from the historical instant rewards stored in the experience cache, and $\gamma$ is the discount factor. The estimated value function in this equation is preferably evaluated by a value network with its own parameters.
In general, compared with the prior art, the above technical solution of the invention achieves the following beneficial effects:
For multi-hop cooperative computation task offloading scenarios, a multi-hop cooperative offloading model based on an edge DAG blockchain is established; nodes participating in cooperative offloading are registered in the edge DAG blockchain network and cooperatively complete multi-hop distributed federated learning tasks.
To achieve low-delay, trusted multi-hop cooperative offloading, a multi-hop computation task offloading delay cost function and a smart contract model are established in the edge DAG blockchain network.
To solve the confirmation and selection of transaction nodes along the multi-hop offloading path, the invention models the problem as a Markov decision process for reverse transaction request node selection based on the DAG blockchain, and further provides a reinforcement-learning-based cooperative node selection algorithm for multi-hop computation task offloading.
Drawings
Fig. 1 is a schematic structural diagram of the trusted offloading cooperative node selection system in a sensor edge cloud blockchain network provided by the invention;
Fig. 2 is a schematic structural diagram of the trusted offloading cooperative node selection system in a sensor edge cloud blockchain network according to an embodiment of the invention.
The same reference numbers will be used throughout the drawings to refer to the same or like elements or structures, wherein:
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The system for selecting the trusted offload cooperative node in the sensing edge cloud block chain network, as shown in fig. 1, includes sensing cloud edge nodes and blocks created in the edge nodes, where the edge nodes and the block chain form an edge DAG block chain network Gb=(Vb,Eb) In which V isbThe edge block chain nodes which participate in the unloading transaction of a computing task are used as transaction request nodes and transaction response nodes when the computing task is unloaded; ebThe transaction connection established for the h hop is tau, namely, both parties which can only carry out transaction according to a preset contract are observed; preferably, the policy network, more preferably, the value network;
the block of each edge node immutably stores the model produced by completing the training task, together with the training time and the model size;
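The immutability of stored models rests on content hashing. As a minimal sketch, assuming SHA-256 and a JSON-encoded block header (neither is specified in the text), a block can commit to the trained model's digest, its training time and its size, and reference several parent blocks, which is what makes the ledger a DAG rather than a single chain. `EdgeBlock` and its field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeBlock:
    """Hypothetical block layout for one edge node's training result."""
    node_id: str
    model_bytes: bytes        # serialized trained model (format assumed)
    training_time_s: float    # training duration recorded in the block
    parents: tuple = ()       # hashes of parent blocks: a DAG, not a single chain

    @property
    def model_size_bits(self) -> int:
        # The text measures model size D_j in bits.
        return len(self.model_bytes) * 8

    @property
    def block_hash(self) -> str:
        # The header commits to the model digest, the metadata and the parent
        # links, so altering any stored field yields a different block hash.
        header = {
            "node": self.node_id,
            "model_sha256": hashlib.sha256(self.model_bytes).hexdigest(),
            "training_time_s": self.training_time_s,
            "model_size_bits": self.model_size_bits,
            "parents": list(self.parents),
        }
        encoded = json.dumps(header, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()


genesis = EdgeBlock("v1", b"weights-v1", 12.5)
child = EdgeBlock("v2", b"weights-v2", 9.0, parents=(genesis.block_hash,))
tampered = EdgeBlock("v1", b"weights-XX", 12.5)
print(genesis.block_hash == tampered.block_hash)  # False: tampering is detectable
```

Because a child stores its parents' hashes, rewriting an earlier model would also invalidate every block that references it.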
the transaction request node initiates a transaction request $\phi_j=(D_j,Y_j,\gamma_j)$ to other sensing-cloud edge nodes and updates its trust level when the request is confirmed. Here $D_j$ is the model size in bits, $Y_j$ is the amount of resources spent to complete the requested training task, and $\gamma_j$ is the number of bitcoins per unit of edge-blockchain-node resource consumed in training the model.
On receiving a transaction request, the transaction response node reversely judges the credibility of the transaction according to the smart contract. If the offloaded transaction fails to satisfy the conditions of the smart contract, its credibility is judged to be low and the request is rejected; otherwise the request is confirmed, and the confirmation is sent to the transaction request node together with the required number of bitcoins.
The smart contract is $SC=\{l(t)\mid t\in[t_{\min},t_{\max}]\}$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the confidence interval; the larger $l(t)$, the more closely the computation-task offloading transaction complies with the smart contract. $t_{\min}$ and $t_{\max}$ are the lower and upper limits of the confidence interval of the training time.
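The text does not say how $l(t)$ is computed. As a sketch only, assuming the response node models the expected training time as normally distributed with mean `mu` and standard deviation `sigma`, $l(t)$ is the probability mass inside $[t_{\min},t_{\max}]$, and the reverse verification accepts a request when $l(t)$ reaches a hypothetical threshold `l_min`. The payment rule $Y_j\gamma_j$ (resources times bitcoins per unit resource) is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class TransactionRequest:
    """phi_j = (D_j, Y_j, gamma_j) from the text."""
    model_size_bits: float     # D_j
    resources: float           # Y_j
    coins_per_resource: float  # gamma_j


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def contract_compliance(mu: float, sigma: float, t_min: float, t_max: float) -> float:
    """l(t): probability that the training time lies in [t_min, t_max].

    Assumption: the expected training time is modelled as Normal(mu, sigma).
    """
    return normal_cdf(t_max, mu, sigma) - normal_cdf(t_min, mu, sigma)


def verify_request(req, mu, sigma, t_min, t_max, l_min=0.9):
    """Reverse verification at the transaction response node.

    l_min is a hypothetical acceptance threshold. Returns (confirmed, payment).
    """
    l_t = contract_compliance(mu, sigma, t_min, t_max)
    if l_t < l_min:
        return False, 0.0  # low credibility: reject the request
    # Assumed payment rule: resources spent times bitcoins per unit resource.
    return True, req.resources * req.coins_per_resource


req = TransactionRequest(model_size_bits=8e6, resources=100.0, coins_per_resource=0.02)
print(verify_request(req, mu=10.0, sigma=1.0, t_min=7.0, t_max=13.0))
print(verify_request(req, mu=20.0, sigma=1.0, t_min=7.0, t_max=13.0))
```

With the interval at plus or minus three standard deviations around the mean, $l(t)\approx 0.997$ and the request is confirmed; shifting the expected time to 20 pushes $l(t)$ to nearly zero and the request is rejected.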
The blockchain network executes actions according to the optimal policy: from all transaction request nodes $v=\{v_k\}$, the transaction response node selects as its cooperative node the request node with the highest probability of being mapped to the action value $a_{v,k}=1$, and establishes a transaction connection with it. The transaction response node then becomes the transaction request node of the next hop; the model, training duration and model size produced by each node's training task are recorded, and the trust of each node is updated.
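The reverse selection at a single hop can be sketched as follows, assuming the policy has already produced $P(a_{v,k}=1\mid s_{v,k})$ for every observed request node. Following the workflow in which a failed confirmation makes the responder move on to other request nodes, candidates are tried in descending probability and skipped if they fail the smart-contract check. `select_cooperative_node` and the `passes_contract` callback are hypothetical names.

```python
from typing import Callable, Dict, Optional


def select_cooperative_node(
    prob_select: Dict[str, float],
    passes_contract: Callable[[str], bool],
) -> Optional[str]:
    """Reverse selection of a cooperative node at one hop.

    prob_select maps a candidate request-node id to P(a_{v,k}=1 | s_{v,k}).
    Candidates are tried from most to least probable; one whose offloaded
    transaction fails the smart-contract check is skipped. Returns None
    when no candidate passes.
    """
    for node in sorted(prob_select, key=prob_select.get, reverse=True):
        if passes_contract(node):
            return node
    return None


probs = {"v1": 0.2, "v2": 0.7, "v3": 0.5}
print(select_cooperative_node(probs, lambda n: n != "v2"))  # v3
```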
The optimal policy is preferably obtained with a reinforcement learning model, and preferably solved with a policy network, as shown in Fig. 2.
The policy network solves for the optimal policy from the current state of the DAG blockchain network. It preferably uses a model-free reinforcement learning structure, preferably a DNN. Specifically:
the input to the policy network is the transaction state of all transaction request nodes $v=\{v_k\}$ observed by the current transaction response node $v_{k+1}$. The state $s_{v,k}$ of transaction request node $v_k$ is its transaction state at the time it initiates the transaction and comprises a smart-contract state, which indicates either compliance with or violation of the smart contract, and a delay state, which indicates whether the computation-task offloading delay on hop $h$ of the transaction connection is long or short. The delay is regarded as long when the computation-task offloading transmission latency exceeds the preset computation-task offloading transmission-delay threshold, and short otherwise;
the output of the policy network is the probability $P(a_t=a_{v,k}\mid s_t=s_{v,k},\theta_t=\theta)$ of mapping the state $s_{v,k}$ of each transaction request node $v_k$ to the action $a_{v,k}$, where $\theta$ is the offloading policy parameter of the policy network; from $\theta$ the optimal policy $\pi^*(a_{v,k}\mid s_{v,k})=P(a_t=a_{v,k}\mid s_t=s_{v,k},\theta_t=\theta)$ is established;
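To make the input and output concrete, here is a sketch that encodes $s_{v,k}$ as two binary features plus a bias and maps it to $P(a_{v,k}=1\mid s_{v,k};\theta)$. The 0/1 encoding is an assumption (the original symbols are formula images), and the one-layer logistic model is a deliberately simple stand-in for the DNN policy network the text describes. `encode_state` and `prob_select` are hypothetical names.

```python
import math
from typing import List, Sequence


def encode_state(complies: bool, tx_latency: float, latency_threshold: float) -> List[float]:
    """Hypothetical numeric encoding of s_{v,k}: [bias, contract bit, delay bit].

    contract bit: 1.0 if the request node complies with the smart contract.
    delay bit: 1.0 ("long") if the offload transmission latency exceeds the
    preset threshold, else 0.0 ("short").
    """
    return [1.0, 1.0 if complies else 0.0, 1.0 if tx_latency > latency_threshold else 0.0]


def prob_select(theta: Sequence[float], state: Sequence[float]) -> float:
    """P(a_{v,k} = 1 | s_{v,k}; theta) from a logistic (one-layer) policy."""
    z = sum(w * x for w, x in zip(theta, state))
    return 1.0 / (1.0 + math.exp(-z))


theta = [0.0, 2.0, -2.0]  # rewards compliance, penalizes long delay
good = encode_state(True, tx_latency=0.02, latency_threshold=0.05)
bad = encode_state(False, tx_latency=0.10, latency_threshold=0.05)
print(prob_select(theta, good), prob_select(theta, bad))
```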
The reward function used to train the policy network is $r_h(s_{v,k+1},a_{v,k},s_{v,k})=-C_{v,h}$, where $C_{v,h}$ is the cost function of the sensing-cloud edge node. Preferably, the multi-hop computation-task offloading policy parameter $\theta$ is trained by stochastic gradient descent on the following performance function:
[formula image]
To speed up training of the policy network, a value network is added to the update of the multi-hop computation-task offloading policy parameter $\theta$, which is updated by the equation:
[formula image]
where $\xi_p$ is the learning rate; $G=\gamma r_1+\gamma^2 r_2+\cdots$ is the discounted return, with $r_1,r_2,\ldots$ the historical instantaneous rewards read from the experience cache; and $\gamma$ is the discount factor. The estimated value function [formula image] is preferably estimated by a value network with parameters [formula image].
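The update equation itself is a formula image. Its ingredients (learning rate $\xi_p$, discounted return $G$, a value-network estimate) match the standard REINFORCE-with-baseline form $\theta\leftarrow\theta+\xi_p\,(G-\hat V(s_{v,k}))\,\nabla_\theta\log\pi(a_{v,k}\mid s_{v,k};\theta)$, which the sketch below assumes; $\hat V$ is our notation. It uses a logistic policy so the gradient has the closed form $(a-p)\,s$.

```python
import math
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """G = gamma*r_1 + gamma^2*r_2 + ... (powers start at 1, as written in the text)."""
    return sum(gamma ** (i + 1) * r for i, r in enumerate(rewards))


def policy_gradient_step(
    theta: Sequence[float],
    state: Sequence[float],
    action: int,
    G: float,
    baseline: float,
    lr: float,
) -> List[float]:
    """One REINFORCE-with-baseline step for a logistic policy (assumed form).

    For P(a=1|s) = sigmoid(theta . s), grad_theta log pi(a|s) = (a - p) * s,
    so the ascent step is theta + lr * (G - baseline) * (a - p) * s.
    """
    z = sum(w * x for w, x in zip(theta, state))
    p = 1.0 / (1.0 + math.exp(-z))
    advantage = G - baseline  # baseline = value-network estimate V_hat(s)
    return [w + lr * advantage * (action - p) * x for w, x in zip(theta, state)]


G = discounted_return([-1.0, -2.0], gamma=0.5)  # rewards are negative costs
theta = policy_gradient_step([0.0, 0.0, 0.0], [1.0, 1.0, 0.0], action=1, G=G, baseline=-3.0, lr=0.1)
print(G, theta)
```

Subtracting the baseline leaves the expected gradient unchanged but reduces its variance, which is the usual reason a value network speeds up policy training.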
The value network, preferably a DNN, takes the transaction state of the transaction response node as input and outputs a value estimate [formula image]; its network parameters [formula image] are updated by:
[formula image]
where $\xi_v$ is the learning rate. The value network is updated iteratively on the squared error, with the loss function
[formula image]
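The value-network loss is given only as an image, but the text says it is the squared error. A minimal sketch, assuming a linear value function $\hat V(s)=w\cdot s$ in place of the DNN and the loss $(G-\hat V(s))^2$, shows the gradient-descent update and its convergence toward the observed return. `value_estimate` and `value_step` are hypothetical names.

```python
from typing import List, Sequence


def value_estimate(w: Sequence[float], state: Sequence[float]) -> float:
    """Linear stand-in for the value network's output V_hat(s)."""
    return sum(wi * xi for wi, xi in zip(w, state))


def value_step(w: Sequence[float], state: Sequence[float], G: float, lr: float) -> List[float]:
    """One gradient-descent step on the squared error (G - V_hat(s))^2.

    The gradient of 0.5*(G - w.s)^2 is -(G - w.s)*s, so descent adds
    lr*(G - V_hat)*s; the factor 2 of the unhalved loss is folded into lr.
    """
    err = G - value_estimate(w, state)
    return [wi + lr * err * xi for wi, xi in zip(w, state)]


w, s = [0.0, 0.0], [1.0, 1.0]
for _ in range(50):
    w = value_step(w, s, G=-2.0, lr=0.25)
print(value_estimate(w, s))  # approaches the return G = -2.0
```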
The invention provides a DAG-blockchain-based multi-hop computation-task offloading method for the sensing edge-cloud environment that achieves low-delay, trusted cooperative offloading while accounting for untrustworthy behaviors of malicious edge nodes, such as inflating computation time and modifying models.
The invention provides a method for selecting trusted offloading cooperative nodes in a sensing edge-cloud blockchain network, comprising the following steps:
(1) Obtain the training task $\Gamma_n=\{w_n\}$ issued by the training-task issuing node, together with the maximum number of training hops $\varepsilon$ it sets, the delay-time tolerance parameter $\lambda_d$ for the task offloading transaction process, and the credibility tolerance parameter $\lambda_s$ of the task smart contract;
(2) Take at least $\varepsilon+1$ edge nodes as candidate computation-task offloading transaction nodes, obtain their cost functions $C_{v,h}$, and register them in the DAG blockchain;
for a transaction $\phi_j=(D_j,Y_j,\gamma_j)$, $D_j$ is the model size in bits, $Y_j$ is the resources needed to complete the training task, and $\gamma_j$ is the number of bitcoins per unit of edge-blockchain-node resource consumed by model training. The cost function $C_{v,h}$ of a sensing-cloud edge node is the computation-task offloading transaction cost on hop $h$ of the transaction connection after the node, acting as transaction response node $v_{k+1}$, selects transaction request node $v_k$; it comprises delay and credibility tolerance and is calculated as follows:
[formula image]
where $\lambda_d$ is the delay-time tolerance parameter in multi-hop computation-task offloading transactions and $\lambda_s$ is the credibility tolerance parameter of the smart contract;
[formula image] is the computation-task offloading transmission latency, computed as [formula image], where $x_{k+1,k}$ indicates whether the model trained by transaction request node $v_k$ is confirmed, accepted and offloaded to transaction response node $v_{k+1}$ for processing: $x_{k+1,k}=1$ means that the computation task of the transaction request node is offloaded to the transaction response node for processing, and $x_{k+1,k}=0$ otherwise;
[formula image] is the offloading transmission rate available to the task on transaction connection $\tau$, where $B$ is the bandwidth, $p_k$ is the transmission power, $\sigma^2$ is the noise power, and $g_k$ is the channel gain, which reflects the transmission loss from the transaction request node to the response node;
[formula image] is the execution time of the task offloaded at hop $h$, where $L_j$ is the total computational load and $f_c$ is the service rate of each CPU core, a configurable variable;
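The rate, transmission-latency and execution-time formulas are images, so the forms below are assumptions chosen to match the listed symbols: the Shannon capacity $R=B\log_2(1+p_k g_k/\sigma^2)$, transmission latency $x_{k+1,k}D_j/R$, and execution time $L_j/f_c$. The function names are hypothetical.

```python
import math


def offload_rate(bandwidth_hz: float, tx_power_w: float, channel_gain: float, noise_power_w: float) -> float:
    """Assumed Shannon form: R = B * log2(1 + p_k * g_k / sigma^2), in bits/s."""
    return bandwidth_hz * math.log2(1.0 + tx_power_w * channel_gain / noise_power_w)


def transmission_latency(x: int, model_size_bits: float, rate_bps: float) -> float:
    """Assumed form: x_{k+1,k} * D_j / R (zero when the model is not offloaded)."""
    return x * model_size_bits / rate_bps


def execution_time(total_load_cycles: float, core_rate_hz: float) -> float:
    """Assumed form: L_j / f_c."""
    return total_load_cycles / core_rate_hz


rate = offload_rate(1e6, 1.0, 3.0, 1.0)      # 1 MHz at SNR 3 -> 2 Mbit/s
print(rate, transmission_latency(1, 8e6, rate), execution_time(2e9, 1e9))
```

For example, an 8 Mbit model over a 2 Mbit/s connection takes 4 s to transfer, and 2e9 cycles at 1 GHz take 2 s to execute.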
[formula image] is the task-queue waiting time of all nodes in the hop-$h$ transaction connection, where [formula image] is the amount of resources required by all nodes in the hop-$h$ transaction connection to process the tasks in their queues, $f_c$ is the service rate of each CPU core (a configurable variable), and [formula image] is the average arrival rate of task offloading, with $M$ the number of offloads in the hop-$h$ transaction connection. $x_j=1$ indicates a successful offload and $x_j=0$ otherwise. $I_{\{*\}}$ is an indicator function: $I_{\{*\}}=1$ if the condition holds and $I_{\{*\}}=0$ otherwise. The number of tasks $z_h$ already present at the current edge DAG blockchain transaction node follows a Poisson distribution with parameter $\delta_h$, i.e. $z_h\sim\mathrm{Poisson}(\delta_h)$.
$\Phi_h=1-l(t)$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the confidence interval; the larger $l(t)$, the more closely the computation-task offloading transaction on the transaction connection complies with the smart contract.
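The cost formula is an image, but its stated ingredients are a delay term weighted by $\lambda_d$ and a credibility term weighted by $\lambda_s$, with $\Phi_h=1-l(t)$. As a sketch, assuming the simplest additive combination $C_{v,h}=\lambda_d\,(T^{tr}+T^{ex}+T^{q})+\lambda_s\,\Phi_h$ (the delay symbols are ours), the per-hop cost and the instantaneous reward $r_h=-C_{v,h}$ become:

```python
def hop_cost(t_tr: float, t_ex: float, t_queue: float, l_t: float,
             lambda_d: float, lambda_s: float) -> float:
    """Hypothetical C_{v,h}: weighted total delay plus weighted contract-violation term.

    Phi_h = 1 - l(t) as defined in the text; the additive weighting by
    lambda_d and lambda_s is an assumption.
    """
    phi_h = 1.0 - l_t
    return lambda_d * (t_tr + t_ex + t_queue) + lambda_s * phi_h


def hop_reward(cost: float) -> float:
    """Instantaneous reward r_h = -C_{v,h}, as stated in the text."""
    return -cost


c = hop_cost(t_tr=4.0, t_ex=2.0, t_queue=1.0, l_t=0.75, lambda_d=0.5, lambda_s=2.0)
print(c, hop_reward(c))  # 0.5*7 + 2*0.25 = 4.0, reward -4.0
```

Raising $\lambda_s$ relative to $\lambda_d$ makes the learner trade extra delay for more contract-compliant (trusted) partners.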
The edge DAG blockchain $G_b=(V_b,E_b)$ is a directed task graph, where $V_b$ is the set of edge blockchain nodes participating in computation-task offloading transactions, each acting as a transaction request node or a transaction response node during offloading, and $E_b$ is the set of transaction connections between participants, i.e. the two parties transact according to a preset smart contract.
(3) Based on the transaction state $s_{v,k}$ of each edge node in the DAG blockchain obtained in step (1), plan the optimal policy $\pi^*$ for the computation-task offloading path by reinforcement learning; according to $\pi^*$, formulate the action set $a_v=\{a_{v,1},a_{v,2},\ldots,a_{v,\varepsilon}\}$ of the offloading path, and establish a smart-contract-compliant transaction connection $\tau$ between the transaction request node and the transaction response node of each hop, thereby forming the task offloading path;
where the optimal policy is $\pi^*(a_{v,k}\mid s_{v,k})=\Pr(a_{v,k}\mid s_{v,k})$, with $\Pr$ the probability of mapping state $s_{v,k}$ to action $a_{v,k}$; $a_v$ is the optimal confirmation-selection action set adopted by transaction response node $v_{k+1}$ toward transaction request nodes $v_k$ in a computation-task offloading transaction; each action $a_{v,k}\in\{0,1\}$, where $a_{v,k}=0$ means the transaction request node is not selected as a collaborator and $a_{v,k}=1$ means it is selected. The state-action pair $\{a_{v,k}\mid s_{v,k}\}$ denotes the confirmation-selection action $a_{v,k}$ of transaction response node $v_{k+1}$ given the state $s_{v,k}$ of transaction request node $v_k$.
The goal of reinforcement learning is to minimize the computation-task offloading transaction cost while meeting the delay-sensitivity requirement of computation-task offloading and complying with the smart contract. This is written as:
$$\mathrm{MTOR}:\ \min\ C_o$$
[formula image]
$$a_v=\{a_{v,1},a_{v,2},\ldots,a_{v,\varepsilon}\}$$
$$SC=\{l(t)\mid t\in[t_{\min},t_{\max}]\}$$
where $C_o$ is the computation-task offloading transaction cost, $a_v=\{a_{v,1},a_{v,2},\ldots,a_{v,\varepsilon}\}$ is the action set, and $SC=\{l(t)\mid t\in[t_{\min},t_{\max}]\}$ is the smart contract.
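For very small networks the MTOR objective can be solved exactly, which gives a reference answer to compare the learned policy against. The sketch below assumes that a path of $\varepsilon$ hops visits $\varepsilon+1$ distinct nodes, that each hop has a known cost and compliance value $l(t)$, and that a hop is admissible only if $l(t)$ reaches a hypothetical threshold `l_min`. Enumeration is exponential, so this is a check for toy instances, not a replacement for the reinforcement-learning method.

```python
import itertools
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]


def best_offload_path(
    nodes: List[str],
    hops: int,
    cost: Dict[Edge, float],
    compliance: Dict[Edge, float],
    l_min: float,
) -> Tuple[Optional[List[str]], float]:
    """Exhaustive search for a minimum-cost path of `hops` hops (hops + 1 distinct nodes).

    An edge is usable only if it appears in `cost` and its compliance l(t)
    is at least l_min. Returns (path, total cost), or (None, inf) if no
    compliant path exists.
    """
    best_path, best_cost = None, float("inf")
    for path in itertools.permutations(nodes, hops + 1):
        edges = list(zip(path, path[1:]))
        if any(e not in cost or compliance.get(e, 0.0) < l_min for e in edges):
            continue  # missing connection or smart-contract violation
        total = sum(cost[e] for e in edges)
        if total < best_cost:
            best_path, best_cost = list(path), total
    return best_path, best_cost


cost = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 0.5, ("c", "b"): 0.2}
compliance = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0, ("c", "b"): 0.5}
print(best_offload_path(["a", "b", "c"], 2, cost, compliance, l_min=0.9))
```

The cheapest path a, c, b is rejected because its last hop violates the contract, so the compliant path a, b, c is returned.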
The cumulative reward function used by the reinforcement learning is:
[formula image]
where $r_h$ is the instantaneous reward of each hop and $\gamma$ is the discount factor, with $r_h(s_{v,k+1},a_{v,k},s_{v,k})=-C_{v,h}$.
Preferably, a greedy algorithm treats the policy optimization as a Markov process and obtains, for each hop, the action policy $\pi^*(s_{v,k})$ that maximizes the instantaneous reward; the hop-$h$ action policy from which the optimal policy is obtained is written as:
[formula image]
where $P_T$ is the transition probability, $\gamma$ is the discount factor, and $V(s_{v,k+1})$ is the state-value function under the optimal policy $\pi^*$, defined as:
[formula image]
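The per-hop rule combines an immediate reward, a transition probability $P_T$, a discount factor and the next-state value, which is the standard Bellman-optimality lookahead $\pi^*(s)=\arg\max_a\,[\,r(s,a)+\gamma\sum_{s'}P_T(s'\mid s,a)\,V(s')\,]$. The sketch assumes that form (the original formula is an image); `greedy_action` and its callback arguments are hypothetical.

```python
def greedy_action(state, actions, reward, transition, value, gamma):
    """Return argmax_a [ r(s, a) + gamma * sum_{s'} P_T(s'|s, a) * V(s') ].

    reward(s, a) -> float; transition(s, a) -> {s': probability};
    value maps each next state s' to V(s').
    """
    def q(a):
        expected_next = sum(p * value[s2] for s2, p in transition(state, a).items())
        return reward(state, a) + gamma * expected_next

    return max(actions, key=q)


# Toy hop: selecting (a=1) costs 1 now but reaches a good state;
# not selecting (a=0) is free now but leads to a poor state.
reward = lambda s, a: -1.0 if a == 1 else 0.0
transition = lambda s, a: {"next": 1.0} if a == 1 else {"stuck": 1.0}
print(greedy_action("s0", [0, 1], reward, transition, {"next": 0.0, "stuck": -10.0}, 0.9))
```

Here $Q(0)=-9$ and $Q(1)=-1$, so selecting the node is preferred despite its immediate cost.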
The optimal policy $\pi^*$ is preferably solved with a model-free reinforcement learning algorithm, with the following specific steps:
(3-1) Initialize the task offloading parameter $\theta$ to obtain the current policy network, i.e. take the most recently updated $\theta$ as the parameter of the current policy network;
(3-2) For each hop of computation-task learning, the current transaction response node $v_{k+1}$ of task offloading observes and collects the transaction states $s_{v,k}$ of the transaction request nodes $v_k$, uses the current policy network to compute the action policy $\pi^*(s_{v,k})$ between all current transaction request nodes $v_k$ and the transaction response node $v_{k+1}$, and estimates the instantaneous reward $r_h$, thereby determining the action $a_{v,k}$ that selects one of the transaction request nodes $v_k$ as the cooperative node. The node roles are then updated for the next hop and the experience cache is updated. This repeats until the maximum hop count is reached, and the per-hop action policies $\pi^*(s_{v,k})$ compose the optimal policy $\pi^*$.
Computing, with the current policy network, the optimal policy $\pi^*(s_{v,k})$ between the current transaction request node $v_k$ and all transaction response nodes specifically comprises:
when the state of transaction request node $v_k$ is $s_{v,k}$, the probability that the transaction response node takes action $a_{v,k}$ is:
$$\pi(a_{v,k}\mid s_{v,k})=P(a_t=a_{v,k}\mid s_t=s_{v,k},\theta_t=\theta)$$
where $P$ is the probability that, in state $s_{v,k}$, the action taken is $a_{v,k}$, and $\theta$ is the policy network parameter.
Preferably, the policy network employs a DNN architecture.
Reversely selecting the cooperative node at the response node of the current hop according to the optimal policy $\pi(s_{v,k})$ specifically comprises: selecting the transaction request node with the highest probability of $a_{v,k}=1$ as the cooperative node and confirming the transaction request it initiated.
The instantaneous reward is estimated as $r_h(s_{v,k+1},a_{v,k},s_{v,k})=-C_{v,h}$.
Updating the node as a transaction request node specifically comprises: the selected transaction request node updates the processing time of the computation task in its block, and the transaction response node becomes the transaction request node for the next-hop transaction request.
Updating the experience cache specifically comprises recording in the experience cache the transaction request node state, the transaction response node state, the action value, and the instantaneous reward $r_h(s_{v,k+1},a_{v,k},s_{v,k})=-C_{v,h}$.
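Steps (3-1) and (3-2) can be sketched as an episode loop that fills the experience cache. This is a simplified illustration under stated assumptions: the callbacks `observe` and `cost` are hypothetical, a logistic model stands in for the DNN policy, the highest-probability candidate is taken at every hop (so there is no exploration), and the chosen node is assumed to carry the path forward to the next hop.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Experience:
    """One experience-cache record: request state, nodes, action, reward."""
    request_state: List[float]
    responder: str
    chosen: str
    action: int
    reward: float


def select_prob(theta: Sequence[float], state: Sequence[float]) -> float:
    """Logistic stand-in for the policy network: P(a_{v,k}=1 | s_{v,k})."""
    z = sum(w * x for w, x in zip(theta, state))
    return 1.0 / (1.0 + math.exp(-z))


def run_episode(
    start: str,
    observe: Callable[[str, set], Dict[str, List[float]]],
    cost: Callable[[str, str], float],
    theta: Sequence[float],
    max_hops: int,
) -> List[Experience]:
    """Collect one multi-hop episode into an experience cache.

    observe(current, visited) returns {candidate id: state vector} for the
    request nodes the current node can see; cost(current, candidate)
    returns C_{v,h}.
    """
    cache: List[Experience] = []
    visited = {start}
    current = start
    for _ in range(max_hops):
        observed = observe(current, visited)
        if not observed:
            break  # no candidates left: the path ends early
        probs = {node: select_prob(theta, s) for node, s in observed.items()}
        chosen = max(probs, key=probs.get)  # highest P(a=1)
        reward = -cost(current, chosen)     # r_h = -C_{v,h}
        cache.append(Experience(observed[chosen], current, chosen, 1, reward))
        visited.add(chosen)
        current = chosen  # assumed hand-off to the next hop
    return cache


STATES = {"b": [1.0, 1.0, 0.0], "c": [1.0, 0.0, 1.0], "d": [1.0, 1.0, 1.0]}


def observe(current, visited):
    return {n: s for n, s in STATES.items() if n not in visited}


episode = run_episode("a", observe, lambda u, v: 1.0, theta=[0.0, 2.0, -2.0], max_hops=2)
print([(e.responder, e.chosen, e.reward) for e in episode])
```

The recorded rewards can then be turned into discounted returns for the value and policy updates of step (4).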
(4) Update the learning parameters of the value function [formula image] and the task offloading parameter $\theta$ according to the experience cache;
updating the learning parameters [formula image] of the value function specifically comprises iterative updating on the squared error, using the update equation:
[formula image]
where $\xi_v$ is the learning rate and the loss function is
[formula image]
in which [formula image] is the output of the value function.
Updating the task offloading parameter $\theta$ specifically comprises: using the data recorded in the experience cache, training and updating by stochastic gradient descent the performance function of the multi-hop computation-task offloading policy parameter $\theta$, which is:
[formula image]
To speed up training of the policy network, a value network is added to the update of the multi-hop computation-task offloading policy parameter $\theta$, which is updated by the equation:
[formula image]
where $\xi_p$ is the learning rate; $G=\gamma r_1+\gamma^2 r_2+\cdots$ is the discounted return computed from the historical instantaneous rewards stored in the experience cache; and $\gamma$ is the discount factor. The estimated value function [formula image] is preferably estimated by a value network with parameters [formula image].
For the multi-hop cooperative computation-task offloading scenario, a multi-hop cooperative offloading model based on an edge DAG blockchain is established: nodes participating in cooperative computation-task offloading are registered in the edge DAG blockchain network and cooperatively complete a multi-hop distributed federated learning task.
To achieve low-delay, trusted multi-hop cooperative computation-task offloading, a multi-hop computation-task offloading delay cost function and a smart contract model are established in the edge DAG blockchain network.
To address the confirmation and selection of transaction nodes along a multi-hop computation-task offloading path, the invention models the problem as a Markov decision process for reverse selection of transaction request nodes on a DAG blockchain, and further proposes a reinforcement-learning-based cooperative-node selection algorithm for multi-hop computation-task offloading.
By combining blockchain technology with a reinforcement learning algorithm, the invention designs a trusted cooperative-node selection method for multi-hop computation-task offloading. First, an edge DAG blockchain network is established from the DAG (directed acyclic graph) of multi-hop cooperative computation-task offloading. Then, the selection of trusted cooperative nodes in multi-hop computation-task offloading is formalized as a Markov decision process. Taking into account the dynamics of the edge nodes' computation-task offloading transaction connections and the selfishness of nodes, the invention combines blockchain technology with a reinforcement-learning-based transaction-node selection algorithm for multi-hop computation-task offloading in order to select trusted cooperative offloading nodes, thereby improving the trusted quality of service of multi-hop computation-task offloading.
The following are examples:
the trusted offloading cooperative-node selection system in the sensing edge-cloud blockchain network comprises sensing-cloud edge nodes and blocks created in those edge nodes. The edge nodes and the blockchain form an edge DAG blockchain $G_b=(V_b,E_b)$, where $V_b$ is the set of edge blockchain nodes participating in computation-task offloading transactions, each acting as a transaction request node or a transaction response node during offloading, and $E_b$ is the set of transaction connections; the connection established at hop $h$ is denoted $\tau$, and its two parties may transact only in accordance with a preset smart contract. The system further comprises a policy network and a value network;
the block of each edge node immutably stores the model produced by completing the training task, together with the training time and the model size;
the transaction request node initiates a transaction request $\phi_j=(D_j,Y_j,\gamma_j)$ to other sensing-cloud edge nodes and updates its trust level when the request is confirmed. Here $D_j$ is the model size in bits, $Y_j$ is the amount of resources spent to complete the requested training task, and $\gamma_j$ is the number of bitcoins per unit of edge-blockchain-node resource consumed in training the model.
On receiving a transaction request, the transaction response node reversely judges the credibility of the transaction according to the smart contract. If the offloaded transaction fails to satisfy the conditions of the smart contract, its credibility is judged to be low and the request is rejected; otherwise the request is confirmed, and the confirmation is sent to the transaction request node together with the required number of bitcoins.
The smart contract is $SC=\{l(t)\mid t\in[t_{\min},t_{\max}]\}$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the confidence interval; the larger $l(t)$, the more closely the computation-task offloading transaction complies with the smart contract. $t_{\min}$ and $t_{\max}$ are the lower and upper limits of the confidence interval of the training time.
The blockchain network executes actions according to the optimal policy: from all transaction request nodes $v=\{v_k\}$, the transaction response node selects as its cooperative node the request node with the highest probability of being mapped to the action value $a_{v,k}=1$, and establishes a transaction connection with it. The transaction response node then becomes the transaction request node of the next hop; the model, training duration and model size produced by each node's training task are recorded, and the trust of each node is updated.
The optimal policy is obtained with a reinforcement learning model and solved with a policy network.
The policy network solves for the optimal policy from the current state of the DAG blockchain network; it is a model-free reinforcement learning structure and uses a DNN. Specifically:
the input to the policy network is the transaction state of all transaction request nodes $v=\{v_k\}$ observed by the current transaction response node $v_{k+1}$, where $k$ is the transaction response node subscript, whose maximum value equals the maximum number of hops. The state $s_{v,k}$ of transaction request node $v_k$ is its transaction state at the time it initiates the transaction and comprises a smart-contract state, which indicates either compliance with or violation of the smart contract, and a delay state, which indicates whether the computation-task offloading delay on hop $h$ of the transaction connection is long or short. The delay is regarded as long when the computation-task offloading transmission latency exceeds the preset computation-task offloading transmission-delay threshold, and short otherwise;
the output of the policy network is the probability $P(a_t=a_{v,k}\mid s_t=s_{v,k},\theta_t=\theta)$ of mapping the state $s_{v,k}$ of each transaction request node $v_k$ to the action $a_{v,k}$, where $\theta$ is the offloading policy parameter of the policy network; from $\theta$ the optimal policy $\pi^*(a_{v,k}\mid s_{v,k})=P(a_t=a_{v,k}\mid s_t=s_{v,k},\theta_t=\theta)$ is established;
The reward function used to train the policy network is $r_h(s_{v,k+1},a_{v,k},s_{v,k})=-C_{v,h}$, where $C_{v,h}$ is the cost function of the sensing-cloud edge node. The multi-hop computation-task offloading policy parameter $\theta$ is trained by stochastic gradient descent on the following performance function:
[formula image]
To speed up training of the policy network, a value network is added to the update of the multi-hop computation-task offloading policy parameter $\theta$, which is updated by the equation:
[formula image]
where $\xi_p$ is the learning rate; $G=\gamma r_1+\gamma^2 r_2+\cdots$ is the discounted return, with $r_1,r_2,\ldots$ the historical instantaneous rewards read from the experience cache; and $\gamma$ is the discount factor. The estimated value function [formula image] is preferably estimated by a value network with parameters [formula image].
The value network is a DNN that takes the transaction state of the transaction response node as input and outputs a value estimate [formula image]; its network parameters [formula image] are updated by:
[formula image]
where $\xi_v$ is the learning rate. The value network is updated iteratively on the squared error, with the loss function
[formula image]
Because offloading computation tasks to remote cloud nodes incurs high delay, the computation tasks of sensing devices are offloaded to edge nodes to reduce offloading cost, and the invention considers edge nodes processing a computation task $\Gamma_n=\{w_n\}$ cooperatively over multiple hops in a distributed manner, where $w_n$ denotes the model to be trained. When $\Gamma_n$ is offloaded to multiple edge nodes, each next-hop edge node that is to receive model $w_n$ first verifies the proof of work of the model training; after confirmation, the model is offloaded to that edge node for further training. For a computation task $\Gamma_n$, $N$ edge nodes form a multi-hop computation-task offloading path and participate in training in turn. The multi-hop computation-task offloading process can therefore be represented as a directed task graph $G_a=(V_a,E_a)$, where $V_a$ denotes the edge nodes, which are associated with the trusted context of task offloading such as execution time, queue time and model training results, and $E_a$ denotes the offloading connections between edge nodes, which depend on the computation-task offloading transfer rate and transfer time between them. To record each edge node's training result and prevent malicious nodes from modifying trained model data, the invention defines a DAG blockchain on top of this task graph: blocks are created in the edge nodes, the edge nodes and the blockchain are integrated into an edge DAG blockchain network whose nodes are called edge blockchain nodes, and the whole is modeled as the directed graph $G_b=(V_b,E_b)$ of the integrated DAG blockchain, as shown in Fig. 1. Here $V_b$ is the set of edge blockchain nodes participating in computation-task offloading transactions, each of which has two roles: computation-task offloading transaction request node and transaction response node. $E_b$ denotes the transaction connections between participants; two parties can establish a transaction connection only by following a certain smart contract.
After a computation-task offloading request node trains a model, the model is stored in a block so that it cannot be altered, and the training duration and model size are automatically recorded in the block. The node then initiates a computation-task offloading transaction to a transaction response node. When the responder receives the offloading transaction request, it first reversely confirms, according to the smart contract, whether the transaction is trustworthy. If the transaction response node finds that the offloaded transaction cannot satisfy the conditions of the smart contract, confirmation fails and other transaction request nodes are tried for confirmation. Otherwise, the transaction is confirmed, the transaction response node sends a certain amount of bitcoin to the transaction request node as a model reward, and the trust level of the transaction request node is updated.
The method for selecting trusted offloading cooperative nodes in the sensing edge-cloud blockchain network provided by this embodiment comprises the following steps:
(1) Obtain the training task $\Gamma_n=\{w_n\}$ issued by the training-task issuing node, together with the maximum number of training hops $\varepsilon$ it sets, the delay-time tolerance parameter $\lambda_d$ for the task offloading transaction process, and the credibility tolerance parameter $\lambda_s$ of the task smart contract;
(2) Take at least $\varepsilon+1$ edge nodes as candidate computation-task offloading transaction nodes, obtain their cost functions $C_{v,h}$, and register them in the DAG blockchain;
for a transaction $\phi_j=(D_j,Y_j,\gamma_j)$, $D_j$ is the model size in bits, $Y_j$ is the resources needed to complete the training task, and $\gamma_j$ is the number of bitcoins per unit of edge-blockchain-node resource consumed by model training. The cost function $C_{v,h}$ of a sensing-cloud edge node is the computation-task offloading transaction cost on hop $h$ of the transaction connection after the node, acting as transaction response node $v_{k+1}$, selects transaction request node $v_k$; it comprises delay and credibility tolerance and is calculated as follows:
[formula image]
where $\lambda_d$ is the delay-time tolerance parameter in multi-hop computation-task offloading transactions and $\lambda_s$ is the credibility tolerance parameter of the smart contract;
[formula image] is the computation-task offloading transmission latency, computed as [formula image], where $x_{k+1,k}$ indicates whether the model trained by transaction request node $v_k$ is confirmed, accepted and offloaded to transaction response node $v_{k+1}$ for processing: $x_{k+1,k}=1$ means that the computation task of the transaction request node is offloaded to the transaction response node for processing, and $x_{k+1,k}=0$ otherwise;
[formula image] is the offloading transmission rate available to the task on transaction connection $\tau$, where $B$ is the bandwidth, $p_k$ is the transmission power, $\sigma^2$ is the noise power, and $g_k$ is the channel gain, which reflects the transmission loss from the transaction request node to the response node;
[formula image] is the execution time of the task offloaded at hop $h$, where $L_j$ is the total computational load and $f_c$ is the service rate of each CPU core, a configurable variable;
[formula image] is the task-queue waiting time of all nodes in the hop-$h$ transaction connection, where [formula image] is the amount of resources required by all nodes in the hop-$h$ transaction connection to process the tasks in their queues, $f_c$ is the service rate of each CPU core (a configurable variable), and [formula image] is the average arrival rate of task offloading, with $M$ the number of offloads in the hop-$h$ transaction connection. $x_j=1$ indicates a successful offload and $x_j=0$ otherwise. $I_{\{*\}}$ is an indicator function: $I_{\{*\}}=1$ if the condition holds and $I_{\{*\}}=0$ otherwise. The number of tasks $z_h$ already present at the current edge DAG blockchain transaction node follows a Poisson distribution with parameter $\delta_h$, i.e. $z_h\sim\mathrm{Poisson}(\delta_h)$.
Φ_h = 1 - l(t), where l(t) is the probability that the expected model training time t of the transaction response node falls within the credible interval; the larger l(t), the more closely the computing-task offload transaction on the transaction connection complies with the smart contract.
Smart contracts and delay cost estimation for multi-hop computing-task offloading in an edge DAG blockchain network:
In the edge DAG blockchain network graph G_b, a computing-task offload transaction request node initiates a transaction request φ_j = (D_j, Y_j, Υ_j) to an adjacent edge-area blockchain node, where D_j is the model size in bits, Y_j is the resources required to complete the training task, and Υ_j is the number of bitcoins paid per unit of edge blockchain node resources consumed in training the model. After the transaction response node, acting as the buyer of the model, confirms the training result, it sends bitcoins to the transaction request node to compensate for the resources consumed. When the training task issuing node sets the maximum number of training hops to ε, at least ε + 1 nodes of the edge DAG blockchain network must participate in the multi-hop computing-task offloading. However, destructive behaviors of intelligent attackers (such as delaying computation or modifying models) make transaction connections, computing behaviors, and training results in the edge blockchain network untrustworthy. This increases the number of failed transactions between offload request and response nodes and reduces transaction trust. As node trust levels decay, the trusted lifetime of the edge DAG blockchain network shrinks. To extend this trusted lifetime, the smart contract is triggered when the transaction response node confirms a computing-task offload transaction request node. Only transactions that satisfy the smart contract are then treated as trusted, and the transaction response node selects, in reverse, the transaction request node as a cooperative node on the computing-task offload path.
The invention uses the model training time as the proof of work of the edge DAG blockchain nodes. The smart contract for computing-task offload transactions is therefore defined as SC = {l(t) | t ∈ [t_min, t_max]}, where l(t) is the probability that the expected model training time t of the transaction response node falls within the credible interval; the larger l(t), the more closely the offload transaction complies with the smart contract. t_min and t_max are parameters that can be set according to the training goal. When the transaction response node receives a transaction request, it checks the model training time recorded in the block against the credible-interval requirement of the smart contract; if the requirement is not met, the computing-task offload transaction cannot proceed. Within a one-hop computing-task offload transaction connection, the set of transaction request nodes that a transaction response node may choose to confirm is defined as v = {v_k}. The invention considers three delay components in the offload transaction: the offload transmission delay, the execution time of the computing task on the edge DAG blockchain node, and the task queue waiting time.
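The contract check described above can be sketched as follows. This is a minimal illustration under a stated assumption, not the patent's implementation. The patent does not say how l(t) is estimated, so the sketch treats l(t), hypothetically, as the empirical share of a node's recorded training times that fall inside [t_min, t_max]. All function names are illustrative.

```python
from typing import Sequence


def contract_compliance(training_times: Sequence[float], t_min: float, t_max: float) -> float:
    """Hypothetical estimate of l(t): share of recorded training times in [t_min, t_max]."""
    if not training_times:
        return 0.0  # no history recorded in the block: treat as non-compliant
    inside = sum(1 for t in training_times if t_min <= t <= t_max)
    return inside / len(training_times)


def satisfies_contract(t: float, t_min: float, t_max: float) -> bool:
    """SC check for one training time: the offload proceeds only if t is in the interval."""
    return t_min <= t <= t_max


def contract_penalty(training_times: Sequence[float], t_min: float, t_max: float) -> float:
    """Phi_h = 1 - l(t), the trust term used in the per-hop transaction cost."""
    return 1.0 - contract_compliance(training_times, t_min, t_max)


history = [4.8, 5.2, 7.9, 5.0]              # hypothetical training times, in seconds
print(satisfies_contract(7.9, 4.0, 6.0))    # False: this offload would be refused
print(contract_penalty(history, 4.0, 6.0))  # 0.25
```

Three of the four hypothetical training times lie in [4, 6], so l(t) = 0.75 and Φ_h = 0.25.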
The computing-task offload transmission delay is calculated as follows:
In an edge DAG blockchain network, multiple transaction request nodes send confirmation requests to the response nodes, and changes in the state of the confirmation-request channel delay the offload transmission. To compute the time a transaction request node needs to offload a computing task to a response node, the available offload transmission rate of a computing task on transaction connection τ is defined as:
Figure BDA0002779200750000142
where B is the bandwidth, p_k the transmission power, σ² the noise power, and g_k the channel gain, which reflects the transmission loss from the transaction request node to the response node. The computing-task offload transmission delay from the transaction request node to the response node is therefore:
Figure BDA0002779200750000143
where x_{k+1,k} indicates whether the model trained by transaction request node v_k is confirmed, accepted, and offloaded to transaction response node v_{k+1} for processing: x_{k+1,k} = 1 means the computing task of the transaction request node is offloaded to the transaction response node, and x_{k+1,k} = 0 otherwise. The transmission and reception time can therefore be calculated as
Figure BDA0002779200750000144
The transmission time over the entire computing-task offload path is then
Figure BDA0002779200750000145
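Below is a minimal sketch of the transmission-delay calculation. It rests on three assumptions inferred from the variable definitions, since the equations themselves are images and may differ. First, the rate takes the usual Shannon-capacity form B·log2(1 + p_k·g_k/σ²). Second, the per-hop delay is x_{k+1,k}·D_j divided by that rate. Third, the path delay is the sum over hops. The function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def offload_rate(bandwidth_hz: float, tx_power_w: float,
                 channel_gain: float, noise_power_w: float) -> float:
    """Assumed Shannon form of the available offload rate on connection tau (bit/s)."""
    snr = tx_power_w * channel_gain / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)


def hop_transmission_delay(model_bits: float, rate_bps: float, offloaded: int) -> float:
    """Per-hop delay; `offloaded` plays the role of x_{k+1,k} (1 or 0)."""
    return offloaded * model_bits / rate_bps


def path_transmission_delay(hops: Iterable[Tuple[float, float, int]]) -> float:
    """Assumed: total path delay is the sum of per-hop delays.

    Each hop is a (model_bits D_j, rate_bps, x_{k+1,k}) tuple.
    """
    return sum(hop_transmission_delay(d, r, x) for d, r, x in hops)


rate = offload_rate(1e6, 0.75, 1.0, 0.25)         # SNR = 3, so 2e6 bit/s
print(path_transmission_delay([(8e6, rate, 1),    # offloaded hop: 4.0 s
                               (8e6, rate, 0)]))  # hop not offloaded: 0 s
```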
The task execution time on an edge DAG blockchain node is calculated as follows:
In multi-hop computing-task offloading, each edge DAG blockchain node must complete its proof of work through a model training task. This disclosure assumes that each edge DAG blockchain node has χ CPU cores. The task execution time is
Figure BDA0002779200750000151
where L_j is the total computational load and f_c is the service rate of each CPU core, a configurable variable.
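The execution time can be illustrated with a one-line formula. The sketch assumes, as the mention of χ cores suggests, that the load L_j is split evenly over χ cores, each serving f_c units of work per second. This gives T = L_j / (χ·f_c), which is an assumed form and not a transcription of the image above.

```python
def execution_time(total_load: float, cores: int, core_rate: float) -> float:
    """Assumed form T_exec = L_j / (chi * f_c).

    total_load: L_j, e.g. in CPU cycles
    cores:      chi, number of CPU cores on the node
    core_rate:  f_c, cycles per second served by each core
    """
    if cores <= 0 or core_rate <= 0:
        raise ValueError("cores and core_rate must be positive")
    return total_load / (cores * core_rate)


print(execution_time(8e9, 4, 2e9))  # 1.0 second
```

Because f_c is described as configurable, raising it (or χ) shortens execution time proportionally under this assumed form.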
The task queue waiting time on an edge DAG blockchain node is calculated as follows:
Because the nodes participating in a transaction receive tasks offloaded by multiple transaction request nodes, the computing-task offload delay of the edge DAG blockchain network also depends on the number of tasks in the current node's receive queue. The number of tasks z_h already present at the current edge DAG blockchain transaction node follows a Poisson distribution with service parameter δ_h, i.e.
Figure BDA0002779200750000152
From this, the average arrival rate of offloaded computing tasks is:
Figure BDA0002779200750000153
where M is the number of offloads in the h-th hop transaction connection, x_j = 1 indicates a successful offload and x_j = 0 otherwise, and I_{*} is an indicator function equal to 1 when its condition holds and 0 otherwise. Because the task currently being processed needs some training time to finish, each training task arriving in the queue must wait for the tasks already in progress to complete. The task queue waiting time of all nodes in the h-th hop transaction connection is therefore
Figure BDA0002779200750000154
where
Figure BDA0002779200750000155
is the amount of resources required by all nodes in the h-th hop transaction connection to process the tasks in their queues. The total delay of the computing-task offload transaction is therefore:
Figure BDA0002779200750000156
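The queue-related quantities can be illustrated with a small sketch. The arrival-rate and waiting-time equations appear only as images, so they are not reproduced here. The sketch shows only two ingredients the text names:

- the Poisson-distributed backlog z_h with parameter δ_h;
- the indicator x_j that marks successful offloads among the M offloads of a hop.

The helper names are hypothetical.

```python
import math
from typing import Sequence


def backlog_pmf(n: int, delta_h: float) -> float:
    """P(z_h = n): probability that n tasks are already queued (Poisson, parameter delta_h)."""
    return math.exp(-delta_h) * delta_h ** n / math.factorial(n)


def successful_offloads(x: Sequence[int]) -> int:
    """Count offloads with x_j = 1 among the M offloads of the h-th hop."""
    return sum(1 for xj in x if xj == 1)


# The Poisson mean equals delta_h; check numerically by truncating the series.
delta = 2.0
mean = sum(n * backlog_pmf(n, delta) for n in range(60))
print(round(mean, 6))                     # 2.0
print(successful_offloads([1, 0, 1, 1]))  # 3
```

The expected backlog δ_h is what makes newly arriving training tasks wait. A larger δ_h means more queued work ahead of each arriving task.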
The edge DAG blockchain G_b = (V_b, E_b) is a directed task graph. V_b is the set of edge blockchain nodes that participate in computing-task offload transactions, acting as transaction request nodes and transaction response nodes during offloading. E_b is the set of transaction connections between participants, i.e., pairs of nodes that transact according to the preset smart contract.
(3) According to the transaction state s_{v,k} of each edge node in the DAG blockchain obtained in step (1), use reinforcement learning to plan the optimization strategy for the computing-task offload path
Figure BDA0002779200750000157
According to an optimization strategy
Figure BDA0002779200750000158
formulate the action set of the offload path
Figure BDA0002779200750000159
and establish, between the transaction request node and the transaction response node of each hop, a transaction connection τ that complies with the smart contract, thereby forming the task offload path;
wherein the optimization strategy
Figure BDA00027792007500001510
Pr is the mapping probability from state s_{v,k} to action a_{v,k},
Figure BDA00027792007500001511
is the optimal confirmation-selection action set of transaction response node v_{k+1} toward transaction request node v_k in a computing-task offload transaction; the action taken is
Figure BDA00027792007500001512
Figure BDA00027792007500001513
means that the transaction request node is not selected as a collaborator,
Figure BDA00027792007500001514
means that the transaction request node is selected as a collaborator. The state-action pair {a_{v,k} | s_{v,k}} denotes the confirmation-selection action a_{v,k} of transaction response node v_{k+1} given the state s_{v,k} of transaction request node v_k.
The goal of reinforcement learning is to minimize the computing-task offload transaction cost while meeting the delay-sensitivity requirement of the offload and complying with the smart contract, written as:
MTOR: min Co
Figure BDA0002779200750000161
a_v = {a_{v,1}, a_{v,2}, ..., a_{v,ε}}
SC = {l(t) | t ∈ [t_min, t_max]}
where Co is the computing-task offload transaction cost, a_v = {a_{v,1}, a_{v,2}, ..., a_{v,ε}} is the action set, and SC = {l(t) | t ∈ [t_min, t_max]} is the smart contract.
The cumulative reward function used in reinforcement learning is:
Figure BDA0002779200750000162
where r_h is the instantaneous reward function of each hop, γ is the discount factor, and r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}.
The strategy optimization is treated as a Markov process and solved with a greedy algorithm. For each hop, the action strategy π*(s_{v,k}) that maximizes the instantaneous reward is obtained, and the action strategy of the h-th hop used to build the optimization strategy is written as:
Figure BDA0002779200750000163
where P_T is the transition probability, γ is the discount factor, and V(s_{v,k+1}, π*) is the state value function under the optimal strategy π*, defined as:
Figure BDA0002779200750000164
The invention designs a DAG-blockchain-based method for selecting trusted cooperative nodes in multi-hop computing-task offloading, which builds a trusted transaction path by selecting trusted offload nodes. In a DAG-blockchain-based edge network, a transaction response node's decision on whether to accept a task offload from a transaction request node depends on confirming the trustworthiness of the preceding transaction request node. The backward selection of transaction request nodes in multi-hop offloading can therefore be modeled as a Markov decision process, defined as the tuple Θ_M = (S, A_{v,k}, Pr, C_{v,h}), where
1) S: S = {s_{v,k} ∈ S | S = s_{v,1}, s_{v,2}, ..., s_{v,n}} is the state space of transactions between edge DAG blockchain nodes, and s_{v,k} is the transaction state of transaction request node v_k when it initiates a transaction.
Figure BDA0002779200750000165
where
Figure BDA0002779200750000166
denotes the smart contract state,
Figure BDA0002779200750000167
denotes compliance with the smart contract,
Figure BDA0002779200750000168
denotes violation of the smart contract,
Figure BDA0002779200750000169
indicates whether the computing-task offload delay on the h-th hop transaction connection
Figure BDA00027792007500001610
is short or long: when
Figure BDA00027792007500001611
holds, the delay is long; otherwise, it is short;
Figure BDA00027792007500001612
is the computing-task offload transmission latency,
Figure BDA00027792007500001613
is the preset computing-task offload transmission delay threshold. In a computing-task offload transaction, transaction response node v_{k+1} confirms transaction request node v_k; once the confirmation succeeds, v_{k+1} becomes the request node for the next hop of the computing-task offload transaction.
2) A_{v,k}:
Figure BDA00027792007500001614
denotes the possible action space, where a_{v,k} is the confirmation-selection action taken by transaction response node v_{k+1} toward transaction request node v_k in a computing-task offload transaction,
Figure BDA00027792007500001615
Figure BDA00027792007500001616
means that the transaction request node is not selected as a collaborator, and
Figure BDA00027792007500001617
means that the transaction request node is selected as a collaborator. The state-action pair {a_{v,k} | s_{v,k}} denotes the confirmation-selection action a_{v,k} of transaction response node v_{k+1} given the state s_{v,k} of transaction request node v_k.
3) Pr: the mapping probability from state s_{v,k} to action a_{v,k}. In an untrusted computing-task offload environment, the goal of the response node in a computing-task offload transaction is to obtain the optimization strategy π*, i.e., the mapping probability from s_{v,k} to a_{v,k}. Following π*, transaction response node v_{k+1} can find the optimal confirmation-selection action
Figure BDA0002779200750000171
The optimization strategy for the set of transaction connections on the computing-task offload path is
Figure BDA0002779200750000172
and the optimal confirmation-selection action set of the transaction response nodes is:
Figure BDA0002779200750000173
4) C_{v,h}: the computing-task offload transaction cost on the h-th hop transaction connection between transaction request node v_k and transaction response node v_{k+1}. Taking both delay and confidence tolerance into account, the offload transaction cost is calculated as:
Figure BDA0002779200750000174
where λ_d is the tolerance parameter for delay in multi-hop computing-task offload transactions, λ_s is the confidence tolerance parameter of the smart contract, and Φ_h = 1 - l(t). For an offloaded computing task, once the maximum number of training hops is set, the confirmation-selection action set of the transaction nodes on the trusted computing-task offload path is
Figure BDA0002779200750000175
The multi-hop path formed by the elements of this confirmation-selection action set must meet the delay-sensitivity requirement and the smart contract conditions of the computing-task offload while minimizing the offload transaction cost.
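The state and cost elements of Θ_M can be sketched as follows. The sketch makes three assumptions:

- The two state components are encoded as booleans.
- The delay counts as "long" when the transmission latency strictly exceeds the preset threshold.
- C_{v,h} is the linear combination λ_d·(total delay) + λ_s·Φ_h of the terms defined earlier.

The actual cost equation is an image and may differ. `TxState`, `observe_state`, `hop_cost` and `hop_reward` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxState:
    """Hypothetical encoding of s_{v,k} for a transaction request node."""
    complies: bool    # smart contract state: True = compliant, False = violated
    long_delay: bool  # True when offload transmission latency exceeds the threshold


def observe_state(train_time, t_min, t_max, tx_latency, latency_threshold):
    # Assumption: "long" means strictly above the preset threshold.
    return TxState(complies=t_min <= train_time <= t_max,
                   long_delay=tx_latency > latency_threshold)


def hop_cost(t_tx, t_exec, t_wait, phi_h, lambda_d, lambda_s):
    """Assumed linear form of C_{v,h}: weighted delay plus weighted contract penalty."""
    return lambda_d * (t_tx + t_exec + t_wait) + lambda_s * phi_h


def hop_reward(cost):
    """r_h = -C_{v,h}, as stated in the text."""
    return -cost


s = observe_state(5.0, 4.0, 6.0, tx_latency=0.2, latency_threshold=0.5)
print(s)                                                    # TxState(complies=True, long_delay=False)
print(hop_reward(hop_cost(1.0, 2.0, 1.0, 0.25, 0.5, 2.0)))  # -2.5
```

Raising λ_s relative to λ_d makes contract violations dominate the cost, which is how the two tolerance parameters trade delay against trust in this assumed form.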
In reinforcement learning, the search for the optimization strategy can be modeled as a Markov decision process. The invention extends Θ_M to Θ_RL = (S, A_{v,k}, P_T, r_h, γ), where S and A_{v,k} are the state space and action space of Θ_M, P_T is the transition probability, r_h is the instantaneous reward function, and γ is the discount factor. Reinforcement learning is used to select cooperative transaction request nodes, yielding the confirmation-selection action set of the transaction nodes on the trusted computing-task offload path
Figure BDA0002779200750000177
When confirming and selecting the transaction nodes of the multi-hop computing-task offload path, transaction response node v_{k+1} first observes the current transaction state s_{v,k} of every transaction request node v_k. It selects a transaction connection τ on which to perform the confirmation-selection action a_{v,k} and then earns a reward r_h. Using a greedy search strategy, the transaction response node randomly selects an action a_{v,k} ~ π(s_{v,k}) to confirm the transaction connection τ of the first-arriving transaction request node. The edge DAG blockchain network updates the states of its transaction nodes according to the transition probability P_T(s_{v,k+1} | s_{v,k}, a_{v,k}). Transaction response node v_{k+1} then receives the instantaneous reward r_h of transaction connection τ, which evaluates how effective its confirmation-selection action was. If the delay is short and the smart contract condition is met, the transaction response node first sends bitcoins to transaction request node v_k. v_{k+1} then receives the model sent by v_k and trains it on local data. After some time, v_{k+1} initiates a transaction request to v_{k+2}, which observes the transaction state s_{v,k+1} of v_{k+1} and performs the confirmation-selection action a_{v,k+1}. Cooperative transaction request nodes are selected in this way until the maximum hop count is reached, at which point node selection ends. The goal of the reinforcement learning participants is to maximize the reward of each transaction. Within one distributed federated learning task transaction, reinforcement learning can therefore discover a complete trusted offload path for the multi-hop computing task.
From equation (5), the instantaneous reward r_h obtained by transaction response node v_{k+1} on the h-th hop transaction connection is:
r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}    (7)
Accordingly, in a distributed federated learning task transaction, the cumulative reward of policy π resulting from the transaction response nodes' confirmation selections can be expressed as:
Figure BDA0002779200750000176
where γ is the per-hop discount factor, which weights the importance of future transaction request node selections relative to the current one. Once the computing-task offload reaches the preset maximum number of hops, the offload transaction stops. During multi-hop offloading, the reinforcement learning participants record in the block the optimal computing-task offload transaction path
Figure BDA0002779200750000181
and the reward of each confirmation selection made by the transaction response nodes. From the cumulative reward, the value function of policy π starting from state s_{v,1} can be defined as:
Figure BDA0002779200750000182
The online multi-hop computing-task offloading method selects the optimal strategy π* that maximizes the value function of every state, i.e.,
Figure BDA0002779200750000183
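The cumulative reward and the value function can be illustrated with a short sketch. It assumes the discounted-return form G = γr_1 + γ²r_2 + ... that the text states for the policy update. Whether equation (8) uses exactly this indexing is an assumption. The value of the start state s_{v,1} is estimated here with a plain Monte Carlo average over episodes, one standard estimator and not necessarily the patent's.

```python
from statistics import mean
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """G = gamma*r_1 + gamma**2*r_2 + ... over the hops of one offload transaction."""
    return sum(gamma ** h * r for h, r in enumerate(rewards, start=1))


def mc_state_value(episodes: Sequence[Sequence[float]], gamma: float) -> float:
    """Monte Carlo estimate of V(s_{v,1}): mean return of episodes starting in s_{v,1}."""
    return mean(discounted_return(ep, gamma) for ep in episodes)


# Two hypothetical 2-hop episodes with rewards r_h = -C_{v,h}.
print(discounted_return([-1.0, -1.0], 0.5))              # -0.75
print(mc_state_value([[-1.0, -1.0], [-2.0, 0.0]], 0.5))  # -0.875
```

A γ close to 1 makes later hops count almost as much as the first, which matches the text's reading of γ as the weight of future request-node selections.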
Solving equation (10) for π*(s) requires the transition probabilities and the reward function, which are very difficult to model accurately. In addition, changes in the transaction connection channel and in the smart contract state are affected by the resource allocation and confidence tolerance of the edge DAG blockchain nodes. When the multi-hop offload path is long, the transaction state space of the edge DAG blockchain nodes becomes large and complex. The online computing-task offload decision problem proposed by the invention is therefore solved with a model-free reinforcement learning algorithm, in which the policy is parameterized by a vector θ. At time t, when the state of transaction request node v_k is s_{v,k}, the probability that the transaction response node takes action a_{v,k} is:
π(a_{v,k} | s_{v,k}) = P(a_t = a_{v,k} | s_t = s_{v,k}, θ_t = θ)    (11)
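Equation (11) states only that the policy is parameterized by θ, and the patent realizes it with a DNN. As a much smaller stand-in, the sketch below uses a linear-softmax policy over the two actions a_{v,k} ∈ {0, 1}. The feature vector built from s_{v,k} is hypothetical, and this is not the patent's network.

```python
import math
from typing import List, Sequence


def policy_probs(theta: Sequence[Sequence[float]], features: Sequence[float]) -> List[float]:
    """pi(a | s; theta) for a in {0, 1}: softmax of one linear score per action.

    theta[a] is the weight row for action a; features encodes s_{v,k}.
    """
    scores = [sum(w * f for w, f in zip(row, features)) for row in theta]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical features: [bias, complies with contract, long delay]
x = [1.0, 1.0, 0.0]
theta = [[0.0, 0.0, 0.0],   # action 0: do not select as collaborator
         [0.0, 2.0, -2.0]]  # action 1: select as collaborator
print(policy_probs(theta, x))  # action 1 is favoured for a compliant, short-delay node
```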
To learn the multi-hop computing-task offload policy parameter, the performance function of the policy parameter θ is defined as:
Figure BDA0002779200750000184
To maximize the reward of the edge DAG blockchain nodes, the transaction response node updates the parameter θ of L(θ) by stochastic gradient descent, with the update equation:
Figure BDA0002779200750000185
where ξ_p is the learning rate. From the policy gradient theorem:
Figure BDA0002779200750000186
where q_π(s_{v,k}, a_{v,k}) is the state-action value function of policy π and G = γr_1 + γ²r_2 + ... is the discounted return. The parameter θ is updated using equation (14):
Figure BDA0002779200750000187
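For the linear-softmax stand-in policy (used only for illustration), ∇_θ ln π(a|s, θ) has a closed form. An update θ ← θ + ξ_p·G·∇_θ ln π(a|s, θ) can therefore be written without an autodiff library. This is the standard REINFORCE rule, and that it matches equation (14) is an assumption, since the equation is an image.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def policy_probs(theta: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Linear-softmax policy over actions {0, 1}."""
    scores = [sum(w * f for w, f in zip(row, x)) for row in theta]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_pi(theta, x, action) -> Matrix:
    """Gradient of ln pi(action | x) w.r.t. theta for a linear-softmax policy:
    row b is (1[b == action] - pi(b | x)) * x."""
    probs = policy_probs(theta, x)
    return [[((1.0 if b == action else 0.0) - probs[b]) * f for f in x]
            for b in range(len(theta))]


def reinforce_step(theta, x, action, G, lr_p) -> Matrix:
    """theta <- theta + xi_p * G * grad ln pi(a | s, theta)."""
    grad = grad_log_pi(theta, x, action)
    return [[w + lr_p * G * g for w, g in zip(row, grow)]
            for row, grow in zip(theta, grad)]


theta = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 1.0]
# A positive return makes the chosen action (1) more likely; a negative one less likely.
print(policy_probs(reinforce_step(theta, x, 1, +1.0, 0.1), x)[1] > 0.5)  # True
print(policy_probs(reinforce_step(theta, x, 1, -1.0, 0.1), x)[1] < 0.5)  # True
```

Every reward here is a negative cost, so G is never positive and plain REINFORCE pushes down every chosen action. This is one motivation for the critic baseline that the text introduces in the next paragraph.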
To further improve learning performance, the invention uses an actor-critic method that learns approximations of both the policy and the value function, using the learned value function as the critic to update the policy. Let the estimated value function at state s_{v,k} be
Figure BDA0002779200750000188
where
Figure BDA0002779200750000189
is a learned parameter. The update equation for the policy parameter θ then becomes:
Figure BDA0002779200750000191
The estimated value function
Figure BDA0002779200750000192
has the parameter
Figure BDA0002779200750000193
which is also updated as follows:
Figure BDA0002779200750000194
where ξ_v is the learning rate; the parameter is updated iteratively using the squared error, with the loss function
Figure BDA0002779200750000195
where
Figure BDA0002779200750000196
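One actor-critic step consistent with the description can be sketched with a linear critic v̂(s, w) = w·x standing in for the DNN value network. The update equations are images, so the following standard forms are assumptions:

- δ = G - v̂(s, w);
- actor: θ ← θ + ξ_p·δ·∇ ln π;
- critic: w ← w + ξ_v·δ·∇_w v̂, a gradient step on the squared error (G - v̂)²/2.

```python
import math
from typing import List, Sequence, Tuple


def policy_probs(theta, x) -> List[float]:
    """Linear-softmax actor over actions {0, 1}."""
    scores = [sum(w * f for w, f in zip(row, x)) for row in theta]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def value(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear critic: v_hat(s, w) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def actor_critic_step(theta, w, x, action, G, lr_p, lr_v) -> Tuple[list, list, float]:
    delta = G - value(w, x)  # return minus the critic's baseline
    probs = policy_probs(theta, x)
    # Actor: theta <- theta + xi_p * delta * grad ln pi(a | s, theta)
    new_theta = [[t + lr_p * delta * ((1.0 if b == action else 0.0) - probs[b]) * f
                  for t, f in zip(row, x)]
                 for b, row in enumerate(theta)]
    # Critic: w <- w + xi_v * delta * grad_w v_hat, a step on (G - v_hat)^2 / 2
    new_w = [wi + lr_v * delta * xi for wi, xi in zip(w, x)]
    return new_theta, new_w, delta ** 2  # squared-error loss before the update


theta, w, x = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 0.0]
for _ in range(50):  # the critic learns the (constant) return of this state
    theta, w, _loss = actor_critic_step(theta, w, x, 1, G=-2.0, lr_p=0.1, lr_v=0.5)
print(round(value(w, x), 6))  # -2.0
```

Once the critic has learned that this state always yields -2, δ is close to zero. The actor is then no longer pushed by the uniformly negative rewards, which is the effect the critic is meant to provide.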
Because neural networks can approximate complex functions, the invention uses DNNs to learn the policy and value functions, establishing a policy network and a value network. In an edge DAG blockchain network environment, the reinforcement-learning-based method for selecting trusted cooperative nodes in multi-hop computing-task offloading therefore consists of two parts, as shown in FIG. 2. One part is an Actor policy network that updates the policy. The other is a Critic value network that evaluates the value function and guides policy updates.
The optimization strategy
Figure BDA0002779200750000197
is solved with the model-free reinforcement learning algorithm in the following specific steps:
(3-1) Initialize the task offload parameter θ to obtain the current policy network, i.e., use the most recently updated task offload parameter θ as the parameter of the current policy network;
(3-2) For each hop of computing-task learning, the current transaction response node v_{k+1} of the task offload observes and collects the transaction states s_{v,k} of the transaction request nodes v_k. It uses the current policy network to compute the action strategy π*(s_{v,k}) between all current transaction request nodes v_k and the transaction response node v_{k+1}, and estimates the instantaneous reward r_h. This determines the action a_{v,k} that selects one of the transaction request nodes v_k as the cooperative node. The node roles and the experience cache are then updated. This repeats until the maximum hop count is reached, and the per-hop action strategies π*(s_{v,k}) form the optimization strategy
Figure BDA0002779200750000198
Computing the optimization strategy π*(s_{v,k}) between the current transaction request nodes v_k and the transaction response node with the current policy network specifically comprises:
when the state of transaction request node v_k is s_{v,k}, the probability that the transaction response node takes action a_{v,k} is:
π(a_{v,k} | s_{v,k}) = P(a_t = a_{v,k} | s_t = s_{v,k}, θ_t = θ)
where P is the probability that action a_{v,k} is taken in state s_{v,k}, and θ is the policy network parameter.
The policy network employs a DNN architecture.
Under the optimization strategy π(s_{v,k}), the response node of the current hop selects the cooperative node in reverse as follows: the transaction request node with the highest probability of a_{v,k} = 1 is selected as the cooperative node, and the transaction request it initiated is confirmed.
The instantaneous reward is estimated as r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}.
Updating the node roles specifically comprises the following: the selected transaction request node updates the computing-task processing time in its block, and the transaction response node becomes the transaction request node and issues the next-hop transaction request.
Updating the experience cache specifically comprises recording in the experience cache the state of the transaction request node, the state of the transaction response node, the action value, and the instantaneous reward r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}.
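The reverse selection rule and the experience cache described above can be sketched together. The patent specifies only what each cache record contains. The cache capacity, the field names, the bounded `collections.deque` and the dictionary of candidate probabilities are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Dict, List, Optional


def select_cooperative_node(p_select: Dict[str, float]) -> Optional[str]:
    """Pick the request node with the highest pi(a_{v,k} = 1 | s_{v,k}); None if no candidates."""
    if not p_select:
        return None
    return max(p_select, key=p_select.get)


@dataclass(frozen=True)
class Transition:
    request_state: Any   # s_{v,k} of the transaction request node
    response_state: Any  # s_{v,k+1} of the transaction response node
    action: int          # a_{v,k}: 1 = selected as collaborator, 0 = not selected
    reward: float        # r_h = -C_{v,h}


class ExperienceCache:
    """Bounded FIFO buffer of per-hop transitions (oldest entries dropped first)."""

    def __init__(self, capacity: int = 10_000) -> None:
        self._buf: Deque[Transition] = deque(maxlen=capacity)

    def record(self, t: Transition) -> None:
        self._buf.append(t)

    def rewards(self) -> List[float]:
        """Historical instantaneous rewards, oldest first, e.g. for computing G."""
        return [t.reward for t in self._buf]

    def __len__(self) -> int:
        return len(self._buf)


chosen = select_cooperative_node({"v1": 0.2, "v2": 0.7, "v3": 0.5})
cache = ExperienceCache(capacity=2)
for r in (-1.0, -2.5, -0.5):
    cache.record(Transition("s_req", "s_resp", 1, r))
print(chosen, len(cache), cache.rewards())  # v2 2 [-2.5, -0.5]
```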
The procedure can be expressed as the following algorithm:
Algorithm 1: Transaction node confirmation-selection mechanism for multi-hop computing-task offloading. Input: edge DAG blockchain nodes, cost function, computing-task offload transaction request nodes, maximum hop count ε, and reinforcement learning rates {ξ_p, ξ_v}.
Figure BDA0002779200750000199
Figure BDA0002779200750000201
Output: the set of transaction nodes for multi-hop computing-task offloading.
(4) Based on the experience cache, update the learning parameter of the value function
Figure BDA0002779200750000202
and the task offload parameter θ;
Updating the learning parameter of the value function
Figure BDA0002779200750000203
specifically comprises iterative updating with the squared error, using the update equation:
Figure BDA0002779200750000204
where ξ_v is the learning rate and the loss function is
Figure BDA0002779200750000205
Figure BDA0002779200750000206
is the output of the value function.
Updating the task offload parameter θ specifically comprises the following: using the data recorded in the experience cache, the performance function of the multi-hop computing-task offload policy parameter θ is trained and updated by stochastic gradient descent, where the performance function is:
Figure BDA0002779200750000207
To speed up training of the policy network, a value network is added to help update the multi-hop computing-task offload policy parameter θ, which is updated by:
Figure BDA0002779200750000208
where ξ_p is the learning rate, G = γr_1 + γ²r_2 + ... is the discounted return computed from the historical instantaneous rewards stored in the experience cache, and γ is the discount factor; the value of the estimated value function
Figure BDA0002779200750000209
is preferably estimated by a value network with parameter
Figure BDA00027792007500002010
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (11)

1. A system for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network, characterized by comprising sensing cloud edge nodes and blocks created in the edge nodes, wherein the edge nodes and the blockchain form an edge DAG blockchain network G_b = (V_b, E_b), in which V_b is the set of edge blockchain nodes participating in computing-task offload transactions, acting as transaction request nodes and transaction response nodes during offloading; E_b is the set of transaction connections, the transaction connection established for the h-th hop being τ, i.e., the two parties transact according to the preset smart contract;
the block of each edge node immutably stores the model for completing the training task, the training time, and the model size;
the blockchain network is configured to execute actions according to an optimization strategy, such that the transaction response node selects, from all transaction request nodes v = {v_k}, the transaction request node whose action value
Figure FDA0003573088640000011
has the highest mapping probability as the cooperative node and establishes a transaction connection with it; the transaction response node then serves as the transaction request node of the next hop; the model, training duration, and model size of each node's completed training task are recorded; and the trust of each node is updated;
the optimization strategy is obtained by a reinforcement learning model, specifically solved by a policy network, which computes the optimization strategy from the current state of the DAG blockchain network;
the input of the policy network is the transaction states of all transaction request nodes v = {v_k} observed by the current transaction response node v_{k+1}; the state s_{v,k} of transaction request node v_k is its transaction state when it initiates the transaction, where
Figure FDA0003573088640000012
Figure FDA0003573088640000013
denotes the smart contract state,
Figure FDA0003573088640000014
denotes compliance with the smart contract,
Figure FDA0003573088640000015
denotes violation of the smart contract,
Figure FDA0003573088640000016
indicates whether, on the h-th hop transaction connection, the computing-task offload delay
Figure FDA0003573088640000017
is short or long: when
Figure FDA0003573088640000018
holds, the delay is long; otherwise, it is short;
Figure FDA0003573088640000019
is the computing-task offload transmission latency,
Figure FDA00035730886400000110
is the preset computing-task offload transmission delay threshold;
the output of the policy network is the mapping probability P(a_t = a_{v,k} | s_t = s_{v,k}, θ_t = θ) from the state s_{v,k} of each transaction request node v_k to action a_{v,k}, where θ is the offload policy parameter of the policy network, and the optimization strategy π*(a_{v,k} | s_{v,k}) = P(a_t = a_{v,k} | s_t = s_{v,k}, θ_t = θ) is established from θ.
2. The system for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network of claim 1, wherein the transaction request node is configured to initiate transaction requests φ_j = (D_j, Y_j, Υ_j) to other sensing cloud edge nodes and to update its trust level upon receiving confirmation of a transaction request. In the transaction request φ_j = (D_j, Y_j, Υ_j), D_j is the model size in bits, Y_j is the resources spent to complete the requested training task, and Υ_j is the number of bitcoins per unit resource value of the edge blockchain nodes consumed in training the model;
the transaction response node is configured, upon receiving a transaction request, to judge the reliability of the transaction in reverse according to the smart contract. When the offload transaction of the request cannot satisfy the conditions of the smart contract, the reliability is judged to be low and the transaction request is rejected. Otherwise, the transaction request is confirmed, and the confirmation together with the number of bitcoins required by the smart contract is sent to the transaction request node;
the smart contract is SC = {l(t) | t ∈ [t_min, t_max]}, where l(t) is the probability that the model training time t expected by the transaction response node falls within the credible interval, a larger l(t) indicating higher compliance of the computing-task offload transaction with the smart contract; t_min and t_max are the lower and upper limits of the credible interval of the training time.
3. The system for selecting trusted offload cooperative nodes in a sensing edge cloud blockchain network of claim 1, wherein the policy network uses a model-free reinforcement learning architecture.
4. The system for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network according to claim 1, wherein the reward function used to train the policy network is $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$, where $C_{v,h}$ is the cost function of the sensing cloud edge node.
5. The system for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network according to claim 4, wherein stochastic gradient descent is used to train and update the performance function of the multi-hop computation-task offloading strategy parameter $\theta$. The performance function is:
[formula image FDA0003573088640000031]
to speed up training of the policy network, a value network is added to update the multi-hop computation-task offloading strategy parameter $\theta$, which is updated by the equation:
[formula image FDA0003573088640000032]
where:
- $\xi_p$ is the learning rate;
- $G = \gamma r_1 + \gamma^2 r_2 + \dots$ is the discounted return cost, with the historical instant rewards $r_1, r_2, \dots$ read from the experience cache;
- $\gamma$ is the discount factor;
- the estimated value function [formula image FDA0003573088640000033] is estimated by a value network with parameter [formula image FDA0003573088640000034].
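The return $G$ and the parameter update can be sketched as follows. The update form $\theta \leftarrow \theta + \xi_p (G - \hat V(s))\nabla_\theta \log \pi(a \mid s; \theta)$ is the standard REINFORCE-with-baseline rule. It is an assumption here, because the patent's equation image is not reproduced. The rule is written as ascent on the reward $r_h = -C_{v,h}$, which is equivalent to descent on cost. The logistic policy and all function names are hypothetical.

```python
import math
from typing import List, Sequence

def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """G = gamma*r_1 + gamma^2*r_2 + ..., matching the indexing in the text."""
    return sum(gamma ** (i + 1) * r for i, r in enumerate(rewards))

def policy_gradient_step(theta: List[float], state: Sequence[float], action: int,
                         G: float, baseline: float, xi_p: float) -> List[float]:
    """Assumed update: theta + xi_p * (G - V_hat(s)) * grad_theta log pi(a | s; theta).

    For pi(a=1 | s) = sigmoid(theta . s), grad log pi(a | s) = (a - sigmoid(theta . s)) * s.
    """
    p1 = _sigmoid(sum(t * x for t, x in zip(theta, state)))
    advantage = G - baseline  # the baseline plays the role of the value estimate
    return [t + xi_p * advantage * (action - p1) * x for t, x in zip(theta, state)]

# Rewards r_h = -C_{v,h} read from a (hypothetical) experience cache.
G = discounted_return([-2.0, -1.0], gamma=0.9)  # -1.8 - 0.81 = -2.61
theta = policy_gradient_step([0.0, 0.0], [1.0, 2.0], action=1,
                             G=G, baseline=-3.0, xi_p=0.1)
```

Here the return is better than the value baseline, so the advantage is positive and the update raises the probability of selecting ($a=1$) in that state.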
6. The system according to claim 5, wherein the input of the value network is the transaction state of the transaction response node, its output is the value [formula image FDA0003573088640000035], and its network parameter is [formula image FDA0003573088640000036]. The update equation is:
[formula image FDA0003573088640000037]
where $\xi_v$ is the learning rate;
the value network is updated iteratively using the squared error, with the loss function:
[formula image FDA0003573088640000038]
[formula image FDA0003573088640000039]
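A minimal sketch of the squared-error value update follows. It assumes a linear value function $\hat V(s; w) = w \cdot s$ as a stand-in for the value network. The patent does not give the network form, and `w` is a hypothetical name for its parameter.

```python
from typing import List, Sequence

def value_estimate(w: Sequence[float], state: Sequence[float]) -> float:
    """Linear stand-in for the value network: V_hat(s; w) = w . s (assumption)."""
    return sum(wi * x for wi, x in zip(w, state))

def squared_error_loss(w: Sequence[float], state: Sequence[float], G: float) -> float:
    """Loss (G - V_hat(s; w))^2 used to fit the value estimate to the return."""
    return (G - value_estimate(w, state)) ** 2

def value_update(w: List[float], state: Sequence[float], G: float, xi_v: float) -> List[float]:
    """One gradient step on the squared error; the constant factor 2 is folded into xi_v."""
    err = G - value_estimate(w, state)
    return [wi + xi_v * err * x for wi, x in zip(w, state)]

# Fit V_hat for one fixed state to a return of -3.0.
w = [0.0, 0.0]
s = [1.0, 0.5]
for _ in range(200):
    w = value_update(w, s, G=-3.0, xi_v=0.1)
print(round(value_estimate(w, s), 4))  # -3.0
```

Each step shrinks the error by a factor $1 - \xi_v \lVert s \rVert^2$ (0.875 here), so the estimate converges to the target return.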
7. A method for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network, characterized by comprising the following steps:
(1) obtaining the training task $\Gamma_n = \{w_n\}$ issued by the training-task issuing node, together with the maximum number of training hops $\varepsilon$ it sets, the delay tolerance parameter $\lambda_d$ of the task offloading transaction, and the credibility tolerance parameter $\lambda_s$ of the task smart contract;
(2) taking at least $\varepsilon + 1$ edge nodes as candidate computation-task offloading transaction nodes, obtaining their cost functions $C_{v,h}$, and registering them in the DAG blockchain;
(3) according to the transaction state $s_{v,k}$ of each edge node in the DAG blockchain obtained in step (1), planning the optimization strategy [formula image FDA0003573088640000041] for the computation-task offloading path by reinforcement learning. According to the optimization strategy [formula image FDA0003573088640000042], the action set of the offloading path is formulated:
[formula image FDA0003573088640000043]
[formula image FDA0003573088640000044]
A transaction connection $\tau$ conforming to the smart contract is then established between the transaction request node and the transaction response node at each hop, thereby forming the task offloading path;
where:
- in the optimization strategy [formula image FDA0003573088640000045], Pr denotes the probability of mapping state $s_{v,k}$ to action $a_{v,k}$;
- [formula image FDA0003573088640000046] is the optimal confirmation-selection action set of transaction response node $v_{k+1}$ for transaction request node $v_k$ in the computation-task offloading transaction;
- for the action taken, [formula image FDA0003573088640000047] [formula image FDA0003573088640000048] means that the transaction request node is not selected as a collaborator, and [formula image FDA0003573088640000049] means that it is selected as a collaborator;
- the state-action pair $\{a_{v,k} \mid s_{v,k}\}$ denotes the confirmation-selection action $a_{v,k}$ of transaction response node $v_{k+1}$, conditioned on the state $s_{v,k}$ of transaction request node $v_k$.
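The hop-by-hop path construction of step (3) can be sketched as a hypothetical illustration. Each hop's candidates carry a precomputed smart-contract check. The response node sets $a_{v,k}=1$ for the admissible candidate with the highest selection probability, for up to $\varepsilon$ hops. `Candidate`, `build_offload_path`, and the early-stop rule are assumptions, not part of the claims.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Candidate:
    node_id: str
    state: Sequence[float]  # s_{v,k}; a hypothetical feature vector
    contract_ok: bool       # whether the smart-contract check passes

def build_offload_path(hops: List[List[Candidate]],
                       select_prob: Callable[[Sequence[float]], float],
                       max_hops: int) -> List[str]:
    """Pick one collaborator per hop (a_{v,k} = 1 for the most probable candidate).

    Hypothetical helper: candidates failing the smart contract are skipped,
    and the path stops early if a hop has no admissible candidate.
    """
    path: List[str] = []
    for candidates in hops[:max_hops]:
        admissible = [c for c in candidates if c.contract_ok]
        if not admissible:
            break
        best = max(admissible, key=lambda c: select_prob(c.state))
        path.append(best.node_id)
    return path

hops = [
    [Candidate("v1", [0.2], True), Candidate("v2", [0.9], True)],
    [Candidate("v3", [0.8], False), Candidate("v4", [0.1], True)],
    [Candidate("v5", [0.5], True)],
]
print(build_offload_path(hops, select_prob=lambda s: s[0], max_hops=2))  # ['v2', 'v4']
```

In practice `select_prob` would be the trained policy's $\pi(a=1 \mid s)$. A plain lambda is used here only to keep the example deterministic.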
8. The method for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network according to claim 7, wherein, for a transaction $\phi_j = (D_j, Y_j, \Upsilon_j)$:
- $D_j$ is the model size, in bits;
- $Y_j$ is the resources needed to complete the training task;
- $\Upsilon_j$ is the number of bitcoins per unit resource of the edge blockchain nodes consumed in training the model.

The cost function $C_{v,h}$ of a sensing cloud edge node is the computation-task offloading transaction cost on the $h$-th hop transaction connection when the node, acting as transaction response node $v_{k+1}$, selects transaction request node $v_k$. It accounts for both delay and credibility tolerance, and is calculated as follows:
[formula image FDA0003573088640000051]
where $\lambda_d$ is the delay tolerance parameter of the multi-hop computation-task offloading transaction, and $\lambda_s$ is the credibility tolerance parameter of the smart contract;
[formula image FDA0003573088640000052] is the computation-task offloading transmission latency;
[formula image FDA0003573088640000053], where $x_{k+1,k}$ indicates whether the model trained by transaction request node $v_k$ is confirmed, accepted, and offloaded to transaction response node $v_{k+1}$ for processing. $x_{k+1,k} = 1$ means the computation task of the transaction request node is offloaded to the transaction response node for processing; otherwise $x_{k+1,k} = 0$. [formula image FDA0003573088640000054] is the offloading transfer rate available to the computation task on transaction connection $\tau$, where:
- $B$ is the bandwidth;
- $p_k$ is the transmission power;
- $\sigma^2$ is the noise power;
- $g_k$ is the channel gain, which reflects the transmission loss from the transaction request node to the response node;
[formula image FDA0003573088640000055] is the execution time of the offloaded task at the $h$-th hop, where $L_j$ is the total computational load and $f_c$ is the service rate of each CPU core, a configurable variable;
[formula image FDA0003573088640000056] is the task queue waiting time of all nodes in the $h$-th hop transaction connection, where [formula image FDA0003573088640000059] is the amount of resources required by all nodes in the $h$-th hop transaction connection to process the tasks in their queues, and $f_c$ is the service rate of each CPU core, a configurable variable;
[formula image FDA0003573088640000057] is the average arrival rate of offloaded tasks, where:
- $M$ is the number of offloads in the $h$-th hop transaction connection;
- $x_j = 1$ indicates a successful offload, and $x_j = 0$ otherwise;
- $I_{\{\cdot\}}$ is the indicator function, with $I_{\{\cdot\}} = 1$ if the condition holds and $I_{\{\cdot\}} = 0$ otherwise.

The number of tasks $z_h$ already present at the current edge DAG blockchain transaction node follows a Poisson distribution with service parameter $\delta_h$, i.e.
[formula image FDA0003573088640000058]
$\Phi_h = 1 - l(t)$, where $l(t)$ is the probability that the model training time $t$ expected by the transaction response node falls within the confidence interval. The larger $l(t)$ is, the more closely the computation-task offloading transaction on the transaction connection complies with the smart contract.
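The cost formula and its component formulas are images that are not reproduced here. The sketch below therefore illustrates only one plausible reading, under these stated assumptions:
- the Shannon rate $R = B\log_2(1 + p_k g_k/\sigma^2)$ for the transfer rate;
- $x_{k+1,k} D_j / R$ for the transmission latency;
- $L_j / f_c$ for the execution time;
- queued load divided by $f_c$ for the waiting time;
- $C_{v,h} = \lambda_d(\text{sum of delays}) + \lambda_s \Phi_h$ for the combination.

Only the Poisson probability mass function for $z_h$ is standard. All function names are hypothetical.

```python
import math

def offload_rate(B: float, p_k: float, g_k: float, noise: float) -> float:
    """Assumed Shannon form R = B * log2(1 + p_k * g_k / sigma^2), in bit/s."""
    return B * math.log2(1.0 + p_k * g_k / noise)

def transmission_latency(x: int, D_bits: float, rate: float) -> float:
    """Assumed x_{k+1,k} * D_j / R: zero when the task is not offloaded (x = 0)."""
    return x * D_bits / rate

def execution_time(L_j: float, f_c: float) -> float:
    """Assumed L_j / f_c for the h-th hop."""
    return L_j / f_c

def queue_wait_time(queued_load: float, f_c: float) -> float:
    """Assumed: resources needed by queued tasks divided by the core service rate."""
    return queued_load / f_c

def poisson_pmf(n: int, delta: float) -> float:
    """P(z_h = n) for a Poisson-distributed existing task count with parameter delta."""
    return delta ** n * math.exp(-delta) / math.factorial(n)

def hop_cost(lambda_d: float, lambda_s: float, t_tr: float, t_exe: float,
             t_wait: float, l_t: float) -> float:
    """Assumed C_{v,h} = lambda_d * (t_tr + t_exe + t_wait) + lambda_s * Phi_h."""
    phi_h = 1.0 - l_t  # Phi_h = 1 - l(t), as defined in the claim
    return lambda_d * (t_tr + t_exe + t_wait) + lambda_s * phi_h

R = offload_rate(B=1e6, p_k=0.5, g_k=2.0, noise=1.0)  # SNR 1 -> R = 1e6 bit/s
t_tr = transmission_latency(1, 2e6, R)                # 2 s
t_exe = execution_time(3e9, 1e9)                      # 3 s
t_wait = queue_wait_time(1e9, 1e9)                    # 1 s
cost = hop_cost(0.5, 2.0, t_tr, t_exe, t_wait, l_t=0.8)
```

In this reading, raising $\lambda_s$ makes a weakly compliant node (small $l(t)$) more expensive to select, while raising $\lambda_d$ penalizes slow links and busy queues.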
9. The method for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network according to claim 7, wherein the objective of the reinforcement learning in step (3) is to minimize the computation-task offloading transaction cost while meeting the delay-sensitivity requirement of computation-task offloading and complying with the smart contract. It is recorded as:
MTOR: $\min C_o$
[formula image FDA0003573088640000061]
$a_v = \{a_{v,1}, a_{v,2}, \dots, a_{v,\varepsilon}\}$
$SC = \{ l(t) \mid t \in [t_{min}, t_{max}] \}$
where:
- $C_o$ is the computation-task offloading transaction cost;
- $a_v = \{a_{v,1}, a_{v,2}, \dots, a_{v,\varepsilon}\}$ is the action set;
- $SC = \{ l(t) \mid t \in [t_{min}, t_{max}] \}$ is the smart contract;
the cumulative reward function used in the reinforcement learning is:
[formula image FDA0003573088640000062]
where $r_h$ is the instant reward function of each hop, with $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$, and $\gamma$ is the discount factor;
the strategy optimization is treated as a Markov process, and a greedy algorithm is used to obtain the action strategy $\pi^*(s_{v,k})$ that maximizes the instant reward function of each hop. The action strategy of the $h$-th hop, from which the optimization strategy is obtained, is written as:
[formula image FDA0003573088640000063]
where $P_T$ is the transition probability, $\gamma$ is the discount factor, and $V(s_{v,k+1}, \pi^*)$ is the state value function under the optimal strategy $\pi^*$, defined as:
[formula image FDA0003573088640000064]
updating the experience cache specifically comprises recording, in the experience cache, the transaction request node state, the transaction response node state, the action value, and the instant reward $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$.
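The sketch below illustrates two pieces of claim 9:
- the cumulative reward with $r_h = -C_{v,h}$;
- the greedy one-step Bellman choice $\pi^*(s) = \arg\max_a [r(s,a) + \gamma \sum_{s'} P_T(s' \mid s,a) V(s')]$.

The discount indexing ($\gamma^{h-1}$) and the tiny tabular model are assumptions for illustration, since the patent's formulas are images. The state names and numbers are hypothetical.

```python
from collections.abc import Hashable
from typing import Dict, Sequence

def cumulative_reward(costs: Sequence[float], gamma: float) -> float:
    """Sum_h gamma^(h-1) * r_h with r_h = -C_{v,h} (the indexing is an assumption)."""
    return sum(gamma ** h * (-c) for h, c in enumerate(costs))

def greedy_action(state: Hashable, actions: Sequence[int],
                  reward: Dict[tuple, float],
                  P_T: Dict[tuple, Dict[Hashable, float]],
                  V: Dict[Hashable, float], gamma: float) -> int:
    """pi*(s) = argmax_a [ r(s, a) + gamma * sum_s' P_T(s' | s, a) * V(s') ]."""
    def q(a: int) -> float:
        future = sum(p * V[s2] for s2, p in P_T[(state, a)].items())
        return reward[(state, a)] + gamma * future
    return max(actions, key=q)

# Action 0 = do not select the request node, 1 = select it as collaborator.
reward = {("s0", 0): -5.0, ("s0", 1): -2.0}
P_T = {("s0", 0): {"s1": 1.0}, ("s0", 1): {"s1": 0.5, "s2": 0.5}}
V = {"s1": -1.0, "s2": -3.0}
best = greedy_action("s0", [0, 1], reward, P_T, V, gamma=0.9)
```

Here $Q(s_0, 0) = -5.9$ and $Q(s_0, 1) = -3.8$, so the greedy choice selects the collaborator. With $\gamma = 1$, maximizing the cumulative reward is the same as minimizing the total hop cost, which links the reward to the MTOR objective.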
10. The method for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network according to claim 8, wherein step (3) solves the optimization strategy [formula image FDA0003573088640000065] with a model-free reinforcement learning algorithm, specifically comprising the following steps:
(3-1) initializing the task offloading parameter $\theta$ to obtain the current policy network, i.e., taking the most recently updated task offloading parameter $\theta$ as the parameter of the current policy network;
(3-2) for each hop of computation-task learning:
- the current transaction response node $v_{k+1}$ of the task offloading observes and collects the transaction state $s_{v,k}$ of the transaction request nodes $v_k$;
- the current policy network is used to compute the action strategy $\pi^*(s_{v,k})$ between all current transaction request nodes $v_k$ and transaction response node $v_{k+1}$;
- the instant reward $r_h$ is estimated, which determines the action $a_{v,k}$ that selects one of the transaction request nodes $v_k$ as the cooperative node;
- that node is updated to be the transaction response node, and the experience cache is updated.

These steps repeat until the maximum number of hops is reached. The per-hop action strategies $\pi^*(s_{v,k})$ together form the optimization strategy [formula image FDA0003573088640000071].
Computing the optimization strategy $\pi^*(s_{v,k})$ of the current transaction request node $v_k$ and all transaction response nodes with the current policy network specifically comprises:
when the state of transaction request node $v_k$ is $s_{v,k}$, the probability that the transaction response node takes action $a_{v,k}$ is:
$\pi(a_{v,k} \mid s_{v,k}) = P(a_t = a_{v,k} \mid s_t = s_{v,k}, \theta_t = \theta)$
where $P$ is the probability of taking action $a_{v,k}$ in state $s_{v,k}$, and $\theta$ is the policy network parameter;
with the optimization strategy $\pi(s_{v,k})$, the current-hop response node selects the cooperative node in reverse as follows: the transaction request node with the highest probability of $a_{v,k} = 1$ is selected as the cooperative node, and the transaction request it initiated is confirmed;
the instant reward is estimated as $r_h(s_{v,k+1}, a_{v,k}, s_{v,k}) = -C_{v,h}$;
updating the node specifically comprises: the selected transaction request node updates the processing time of the computation task in its node block, and the transaction response node then acts as the transaction request node to make the next-hop transaction request.
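The experience cache that step (3-2) updates at each hop could be sketched as a bounded FIFO buffer of transitions. The recorded fields follow claim 9: request-node state, response-node state, action, and $r_h = -C_{v,h}$. The capacity, the eviction policy, and the class names are assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence

@dataclass(frozen=True)
class Transition:
    request_state: Sequence[float]   # s_{v,k}
    response_state: Sequence[float]  # s_{v,k+1}
    action: int                      # a_{v,k} in {0, 1}
    reward: float                    # r_h = -C_{v,h}

class ExperienceCache:
    """Bounded FIFO buffer of per-hop transitions (capacity is an assumption)."""

    def __init__(self, capacity: int) -> None:
        self._buf: Deque[Transition] = deque(maxlen=capacity)

    def record(self, s_req: Sequence[float], s_resp: Sequence[float],
               action: int, cost: float) -> None:
        # The instant reward is the negative hop cost, as in the text.
        self._buf.append(Transition(tuple(s_req), tuple(s_resp), action, -cost))

    def rewards(self) -> List[float]:
        """Historical instant rewards r_1, r_2, ... in recording order."""
        return [t.reward for t in self._buf]

    def __len__(self) -> int:
        return len(self._buf)

cache = ExperienceCache(capacity=2)
cache.record([0.1], [0.2], 1, 1.0)
cache.record([0.2], [0.3], 1, 2.0)
cache.record([0.3], [0.4], 0, 3.0)  # evicts the oldest transition
```

`rewards()` supplies the $r_1, r_2, \dots$ sequence from which the discounted return $G$ is computed when $\theta$ and the value-network parameter are updated.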
11. The method for selecting trusted offloading cooperative nodes in a sensing edge cloud blockchain network according to claim 8, further comprising the step of: (4) updating, according to the experience cache, the learning parameter [formula image FDA0003573088640000081] of the value function and the task offloading parameter $\theta$;
updating the learning parameter [formula image FDA0003573088640000082] of the value function specifically comprises iterative updating with the squared error, using the update equation:
[formula image FDA0003573088640000083]
where $\xi_v$ is the learning rate, and the loss function is
[formula image FDA0003573088640000084]
where [formula image FDA0003573088640000085] is the output of the value function;
updating the task offloading parameter $\theta$ specifically comprises: according to the data recorded in the experience cache, training and updating the performance function of the multi-hop computation-task offloading strategy parameter $\theta$ by stochastic gradient descent, the performance function being:
[formula image FDA0003573088640000086]
to speed up training of the policy network, a value network is added to update the multi-hop computation-task offloading strategy parameter $\theta$, which is updated by the equation:
[formula image FDA0003573088640000087]
where:
- $\xi_p$ is the learning rate;
- $G = \gamma r_1 + \gamma^2 r_2 + \dots$ is the discounted return cost, computed from the historical instant rewards stored in the experience cache;
- $\gamma$ is the discount factor;
- the estimated value function [formula image FDA0003573088640000088] is preferably estimated by a value network with parameter [formula image FDA0003573088640000089].

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011276468.5A CN112202928B (en) 2020-11-16 2020-11-16 Credible unloading cooperative node selection system and method for sensing edge cloud block chain network

Publications (2)

Publication Number Publication Date
CN112202928A CN112202928A (en) 2021-01-08
CN112202928B true CN112202928B (en) 2022-05-17

Family

ID=74033564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011276468.5A Active CN112202928B (en) 2020-11-16 2020-11-16 Credible unloading cooperative node selection system and method for sensing edge cloud block chain network

Country Status (1)

Country Link
CN (1) CN112202928B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112887272B (en) * 2021-01-12 2022-06-28 University of Shaoxing Device and method for controlling ore excavation attack surface in sensing edge cloud task unloading
CN112804107B (en) * 2021-01-28 2023-04-28 Nanjing University of Posts and Telecommunications Hierarchical federal learning method for self-adaptive control of energy consumption of Internet of things equipment
CN112783662A (en) * 2021-02-18 2021-05-11 University of Shaoxing CPU resource trusted sharing system in sensing edge cloud task unloading of integrated block chain
CN113052331A (en) * 2021-02-19 2021-06-29 Beihang University Block chain-based Internet of things personalized federal learning method
CN113222118B (en) * 2021-05-19 2022-09-09 Beijing Baidu Netcom Science and Technology Co., Ltd. Neural network training method, apparatus, electronic device, medium, and program product
CN113344255B (en) * 2021-05-21 2024-03-19 Beijing University of Technology Vehicle-mounted network application data transmission and charging optimization method based on mobile edge calculation and block chain
CN113419849A (en) * 2021-06-04 2021-09-21 State Grid Hebei Electric Power Co., Ltd. Information and Communication Branch Edge computing node selection method and terminal equipment
CN113676954B (en) * 2021-07-12 2023-07-18 Sun Yat-sen University Large-scale user task unloading method, device, computer equipment and storage medium
CN113537518B (en) * 2021-07-19 2022-09-30 Harbin Institute of Technology Model training method and device based on federal learning, equipment and storage medium
CN113570039B (en) * 2021-07-22 2024-02-06 Tongji University Block chain system based on reinforcement learning optimization consensus
CN113645702B (en) * 2021-07-30 2022-06-03 Tongji University Internet of things system supporting block chain and optimized by strategy gradient technology
CN113590328B (en) * 2021-08-02 2023-06-27 Chongqing University Edge computing service interaction method and system based on block chain
CN114172558B (en) * 2021-11-24 2024-01-19 Shanghai University Task unloading method based on edge calculation and unmanned aerial vehicle cluster cooperation in vehicle network
CN113887748B (en) * 2021-12-07 2022-03-01 Zhejiang Normal University Online federal learning task allocation method and device, and federal learning method and system
CN114301911B (en) * 2021-12-17 2023-08-04 Hangzhou Harmonycloud Technology Co., Ltd. Task management method and system based on edge-to-edge coordination
CN115022894B (en) * 2022-06-08 2023-12-19 Xi'an Jiaotong University Task unloading and computing resource allocation method and system for low-orbit satellite network
CN115756873B (en) * 2022-12-15 2023-10-13 Beijing Jiaotong University Mobile edge computing and unloading method and platform based on federation reinforcement learning
CN116978509B (en) * 2023-09-22 2023-12-19 Shandong Baikangyun Network Technology Co., Ltd. Electronic prescription circulation method
CN117610644B (en) * 2024-01-19 2024-04-16 Nanjing University of Posts and Telecommunications Federal learning optimization method based on block chain

Citations (5)

Publication number Priority date Publication date Assignee Title
WO2020044353A1 (en) * 2018-08-30 2020-03-05 Telefonaktiebolaget Lm Ericsson (Publ) System and method for collaborative task offloading automation in smart containers
CN111124531A (en) * 2019-11-25 2020-05-08 Harbin Institute of Technology Dynamic unloading method for calculation tasks based on energy consumption and delay balance in vehicle fog calculation
CN111274035A (en) * 2020-01-20 2020-06-12 Changsha Yuanben Information Technology Co., Ltd. Resource scheduling method and device in edge computing environment and computer equipment
CN111447512A (en) * 2020-03-09 2020-07-24 Chongqing University of Posts and Telecommunications Energy-saving method for edge cloud unloading
CN111835827A (en) * 2020-06-11 2020-10-27 Beijing University of Posts and Telecommunications Internet of things edge computing task unloading method and system

Non-Patent Citations (4)

Title
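A Markov Detection Tree-Based Centralized Scheme to Automatically Identify Malicious Webpages on Cloud Platforms; Jianhua Liu, Shigen Shen, Mengda Xu, Xin Wang, Minglu Li; IEEE Access; 2018-12-27; Vol. 6; full text *
Delay-tolerant data transmission based on Internet of Vehicles and mobile edge computing; Li Meng et al.; Journal of Beijing University of Technology; 2018-01-22 (No. 04); full text *
Social-attribute-aware task scheduling strategy for edge computing; Wang Ruyan et al.; Journal of Electronics & Information Technology; 2020-01-15 (No. 01); full text *
Modeling of trusted cooperative service strategy for edge computing; Yue Guangxue et al.; Journal of Computer Research and Development; 2020-05-15 (No. 05); full text *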

Also Published As

Publication number Publication date
CN112202928A (en) 2021-01-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant