CN116032934A

CN116032934A - Rail transit network resource allocation method based on blockchain and edge calculation in ad hoc network scene

Info

Publication number: CN116032934A
Application number: CN202310010374.0A
Authority: CN
Inventors: 李萌; 田琳琳; 司鹏搏; 杨睿哲; 孙艳华; 孙恩昌; 张延华
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2023-01-04
Filing date: 2023-01-04
Publication date: 2023-04-28
Anticipated expiration: 2043-01-04
Also published as: CN116032934B

Abstract

The invention discloses a rail transit network resource allocation method based on blockchain and edge computing in an ad hoc network scenario. A multi-hop transmission model, a blockchain model and an MEC server computation model are constructed, and the delay and economic cost of multi-hop task transmission between trains, the delay of the blockchain system, and the delay and economic cost incurred when the MEC server processes the task are calculated. A deep neural network is then trained on the system state to guide and adjust the selection of the offloading routing path, the offloading decision and the block size, thereby completing the optimal resource allocation in this scenario. Simulation experiments show that the invention reduces both the total system delay and the total economic cost of the system compared with the baseline schemes.

Description

Rail transit network resource allocation method based on blockchain and edge calculation in ad hoc network scene

Technical Field

The invention relates to a method for offloading relay path planning and resource allocation decisions in an intelligent rail transit system assisted by blockchain consensus and mobile edge computing in a multi-hop ad hoc network scenario. The method uses a deep reinforcement learning algorithm to optimize the offloading path planning and resource allocation strategy for computing tasks, effectively reducing system delay and economic cost while ensuring the secure storage and effective transmission of rail transit data. It belongs to the field of resource allocation and system decision-making in rail transit.

Background

Currently, with the continuous development of intelligent rail systems, there is a strong need to build high-performance computing capabilities that enable dynamic aggregation, deep mining, and efficient utilization of various application data. Because the computing power of trains is limited, cloud computing is widely used to solve this problem. However, because trains move rapidly, a cloud computing architecture cannot meet the real-time information-processing requirements of intelligent railway systems. Mobile edge computing (MEC) is an emerging technique that addresses this problem well. MEC performs computational tasks on edge servers close to the device, meeting strict latency requirements while providing users with a higher quality of service.

However, stations equipped with MEC servers typically have limited coverage, so it is impractical to consider only the case where the train is within range of an MEC server. A multi-hop ad hoc network is a distributed communication network without a fixed topology. Train nodes can autonomously create a wireless network among themselves to exchange information and data, and each train acts not only as a transceiver but also as a router. The invention therefore combines a multi-hop ad hoc network with MEC technology: tasks are offloaded to the MEC server via multi-hop transmission between trains, which enlarges the effective server coverage while meeting low-latency requirements.

While the combination of multi-hop ad hoc networks and MEC technology in intelligent rail systems may offer great advantages, a large amount of rail transit information is involved, so the security and reliability of multi-hop data transmission must be effectively guaranteed. To address data security issues, emerging blockchain technology is introduced into the network architecture proposed by the invention. The distributed, tamper-resistant and secure features of blockchains make them naturally applicable to distributed MEC-enabled intelligent rail systems with multi-hop connections.

However, the joint application of multi-hop ad hoc networks and blockchains in MEC-enabled intelligent rail systems still faces significant challenges. For example, when trains move at high speed and the network topology changes rapidly, routing paths and offloading decisions must be selected appropriately. In addition, the delays and economic costs incurred by data transfer, offloading and consensus in MEC-based intelligent rail systems must be balanced. These problems should therefore be carefully considered when designing the system.

Meanwhile, given the highly dynamic and high-dimensional nature of the environment state in urban rail transit data transmission systems, deep reinforcement learning (DRL) has in recent years become a popular and efficient class of optimization methods. In DRL, the agent applies actions to the environment according to a certain policy; the environment returns an immediate reward to the agent and transitions to the next state. This interaction continues until the environment reaches a terminal state, and the agent continuously adjusts its policy during the process to obtain the maximum long-term reward. Since DRL was proposed, it has been widely studied for its versatility and effectiveness and has been increasingly applied in various fields. However, research on its use in intelligent rail transit systems remains scarce, and how to use DRL to optimize the delay and economic cost of the system still requires careful consideration.

In summary, the invention provides an improved optimization framework for an MEC-based intelligent rail system. To allow the MEC servers to be used over a larger range while meeting low-latency requirements, a multi-hop ad hoc network is applied in the proposed network model. In addition, to effectively ensure the security and reliability of multi-hop data transmission, the delegated Byzantine fault tolerance (dBFT) consensus mechanism from blockchain technology is introduced. By jointly considering routing path selection, offloading decisions and block size selection, the system delay and economic cost of the offloading communication and computation processes are jointly optimized.

Disclosure of Invention

The main purpose of the invention is to achieve optimal resource allocation in a scenario with multiple trains and multiple stations equipped with MEC servers. The scenario is modeled with the delay and economic cost of the offloading transmission, consensus and computation of computing tasks as the optimization objectives, and a DRL algorithm is applied to learn the model iteratively, yielding an optimal path planning and resource allocation strategy that saves time and has low economic cost. The method solves the problem of how to determine the optimal offloading path and resource allocation strategy when multiple trains and multiple stations with MEC servers are present, and effectively reduces system delay and economic cost by executing that strategy.

The scenario model to which the invention applies, with multiple trains and multiple stations equipped with MEC servers, is shown in Fig. 1.

The flow chart of the operating principle of the system in the technical scheme of the invention is shown in Fig. 2.

The relationship between total system delay and task data volume is shown in Fig. 3.

The relationship between total system economic cost and task data volume is shown in Fig. 4.

The relationship between system weighted overhead and block interval is shown in Fig. 5.

The relationship between total system delay and block interval is shown in Fig. 6.

As shown in Fig. 1, the invention provides a path planning and resource allocation decision method for an intelligent rail transit system assisted by mobile edge computing, based on a multi-hop ad hoc network and blockchain consensus. In an intelligent rail transit communication scenario, $V$ trains are running, stations are located on both sides of the track, and $R$ MEC servers managed by different operators are deployed at the stations. When a train generates an offloading request while running, it can offload the computing task to an MEC server by making maximum use of the inter-train multi-hop ad hoc network, even if it has not yet entered the communication coverage of the MEC server. In addition, $K$ of the $V$ trains also participate in the blockchain consensus process as blockchain nodes. A multi-hop routing path model, a communication model, a computing task model and a blockchain model are set up according to the actual conditions of the system environment, and the weighting parameters of delay and economic cost are determined. The state space, action space and reward function of the DRL are then constructed, parameters such as the sample space size and the number of samples in the training network are set, and iterative learning is carried out with the scenario model to train the parameters of a deep neural network that estimates the state-action value. Finally, the optimal offloading path and resource allocation strategy is executed under the guidance of the policy network, effectively reducing the delay and economic cost of processing real-time computing tasks. The method is implemented through the following steps:

When an offloading request is generated during travel, the requesting train can offload the computing task to the MEC server by making maximum use of the multi-hop ad hoc network, even if it has not yet entered the communication coverage of the MEC server. When the last relay train receives the offloaded task, a consensus process is started; once all consensus nodes reach agreement, the security of the information is ensured.

Step (1): The train requesting offloading offloads the computing task to the last-hop relay train through the multi-hop ad hoc network. The specific steps are as follows:

The routing path for offloading computing task $I_{MV}(t)$ passes through the relay trains $v_1, v_2, \dots, v_N$ (the requesting train itself is excluded from this set). The transmission delay of task $I_{MV}(t)$ across the $N$-hop train path is denoted $T_{tran,V2V}(t)$ and depends on the following quantities: $Fc_N(t)$ is the expected number of transmissions needed for a data packet to be successfully delivered from the source train to the destination along the $N$-hop routing path, $B_{MV}(t)$ is the size of the input data required by task $I_{MV}(t)$, $r_{s1}(t)$ is the transmission rate between the source train (i.e., the train requesting offloading) and the first relay train, and $r_{(n-1)n}(t)$ is the transmission rate between relay trains $v_{n-1}$ and $v_n$.

Each train is assumed to have a different relaying cost. For the relay trains in the routing path of task $I_{MV}(t)$, the sequence of relay costs (per unit of data) is $\{Pr_1(t), Pr_2(t), \dots, Pr_n(t), \dots, Pr_N(t)\}$. The total train relay cost is therefore calculated as:

$$Pr_{sum}(t) = \sum_{n=1}^{N} Pr_n(t)\, B_{MV}(t)$$
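A minimal Python sketch of step (1) follows. The exact delay expression is not reproduced in the text, so the form used here (expected number of transmissions multiplied by the sum of per-hop transfer times) is an assumption, as are the function names and the example numbers.

```python
from typing import Sequence


def v2v_delay(fc_n: float, b_mv: float, hop_rates: Sequence[float]) -> float:
    """Assumed form of T_tran,V2V(t): Fc_N(t) * sum over hops of B_MV(t) / rate.

    hop_rates[0] plays the role of r_s1(t); later entries play r_(n-1)n(t).
    """
    if any(rate <= 0 for rate in hop_rates):
        raise ValueError("every hop rate must be positive")
    return fc_n * sum(b_mv / rate for rate in hop_rates)


def relay_cost(b_mv: float, unit_costs: Sequence[float]) -> float:
    """Pr_sum(t): each relay train charges Pr_n(t) per unit of relayed data."""
    return b_mv * sum(unit_costs)


# Hypothetical numbers: 600 units of data relayed over three hops.
delay = v2v_delay(fc_n=1.2, b_mv=600.0, hop_rates=[3000.0, 2000.0, 6000.0])
cost = relay_cost(b_mv=600.0, unit_costs=[0.05, 0.02, 0.03])
print(f"V2V delay: {delay:.3f}, relay cost: {cost:.2f}")
```

Under this assumed form, a slow hop dominates the delay, which is why the choice of relay path matters even when the hop count is fixed.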

Step (2): After receiving the relayed task, the last-hop relay train sends the data to the blockchain system for transaction verification and consensus, to ensure that the data is authentic and has not been tampered with. The specific steps are as follows:

The consensus nodes use the delegated Byzantine fault tolerance (dBFT) consensus mechanism to verify and reach consensus on blocks and transactions. The number of computation cycles required to verify a signature is $\alpha$, and the number required to generate and verify a message authentication code is $\beta$. The delay of the consensus process is denoted $T_{bc}(t)$ and depends on the following quantities: $S_{bc}(t)$ is the total transaction size, $L$ is the average transaction size, $f_{sp}(t)$ is the computing power of the primary node, $f_k(t)$ is the computing power of consensus node $k$, $T_i(t)$ is the block generation interval, and $T_b(t)$ is the broadcast delay between nodes.
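The short Python sketch below illustrates two quantities related to step (2). The number of transactions in a block follows from the definitions above (total transaction size divided by average transaction size). The fault-tolerance bound is the usual Byzantine fault tolerance limit, included as background rather than taken from the text; the function names and numbers are illustrative.

```python
def transactions_per_block(s_bc: float, avg_tx_size: float) -> float:
    """Number of transactions batched in one block: S_bc(t) / L."""
    if avg_tx_size <= 0:
        raise ValueError("average transaction size must be positive")
    return s_bc / avg_tx_size


def max_faulty_nodes(num_consensus_nodes: int) -> int:
    """Largest f such that n >= 3f + 1 (standard BFT bound, background only)."""
    return (num_consensus_nodes - 1) // 3


# Hypothetical values: a 4096-unit block of 256-unit transactions, 7 consensus trains.
tx_count = transactions_per_block(s_bc=4096.0, avg_tx_size=256.0)
tolerated = max_faulty_nodes(7)
print(tx_count, tolerated)
```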

Step (3): The last-hop relay train offloads the computing task through the LTE cellular network to one of the MEC servers managed by different operators, and the server processes the task locally. The specific steps are as follows:

There are $R$ MEC servers in the communication scenario, managed by different operators. The transmission delay incurred when the last-hop relay train offloads the task to MEC server $r$ through the LTE cellular network is expressed as:

$$T_{tran,V2I}(t) = \frac{B_{MV}(t)}{R_r(t)}$$

where $R_r(t)$ is the uplink transmission rate of data from the last-hop relay train to the MEC server over the LTE cellular network.

The available computing resources and computing costs of the servers differ in a real-time environment. The computing power and the computing cost (per unit of task complexity) of MEC server $r$ are $F_r(t)$ and $Pc_r(t)$, respectively. The computation delay and cost incurred at this stage are expressed as:

$$T_{cal}(t) = \frac{C_{MV}(t)}{F_r(t)}$$

and

$$Pc_{sum}(t) = Pc_r(t)\, C_{MV}(t)$$

where $C_{MV}(t)$ is the number of CPU cycles required to complete computing task $I_{MV}(t)$.

From this, the total system delay and total economic cost, covering both the offloading transmission process and the computation process, can be calculated as:

$$T_{sum}(t) = T_{tran}(t) + T_{cal}(t) = T_{tran,V2V}(t) + T_{bc}(t) + T_{tran,V2I}(t) + T_{cal}(t)$$

and

$$P_{sum}(t) = Pr_{sum}(t) + Pc_{sum}(t)$$
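The totals can be sketched in Python as below. The summation follows the two expressions above; the per-stage formulas inside `mec_stage` (data size over uplink rate, CPU cycles over computing power, unit cost times cycles) are assumed from the symbol definitions, and all names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UpstreamCosts:
    t_tran_v2v: float  # T_tran,V2V(t): multi-hop relay delay
    t_bc: float        # T_bc(t): blockchain consensus delay
    pr_sum: float      # Pr_sum(t): total relay cost


def mec_stage(b_mv, c_mv, uplink_rate, f_r, pc_r):
    """Return (T_tran,V2I, T_cal, Pc_sum) for one MEC server (assumed forms)."""
    t_v2i = b_mv / uplink_rate  # upload time over the cellular uplink
    t_cal = c_mv / f_r          # required CPU cycles / cycles per second
    pc_sum = pc_r * c_mv        # cost per unit of complexity * complexity
    return t_v2i, t_cal, pc_sum


def system_totals(upstream, b_mv, c_mv, uplink_rate, f_r, pc_r):
    """T_sum = T_V2V + T_bc + T_V2I + T_cal and P_sum = Pr_sum + Pc_sum."""
    t_v2i, t_cal, pc_sum = mec_stage(b_mv, c_mv, uplink_rate, f_r, pc_r)
    t_sum = upstream.t_tran_v2v + upstream.t_bc + t_v2i + t_cal
    p_sum = upstream.pr_sum + pc_sum
    return t_sum, p_sum


upstream = UpstreamCosts(t_tran_v2v=0.1, t_bc=0.05, pr_sum=60.0)
t_sum, p_sum = system_totals(
    upstream, b_mv=600.0, c_mv=2.0e9, uplink_rate=6000.0, f_r=2.0e10, pc_r=1.0e-7
)
print(f"T_sum = {t_sum:.3f}, P_sum = {p_sum:.1f}")
```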

Step (4): Based on steps (1)-(3), the state space, action space and reward function of the DRL are set according to the scenario and the optimization objective. The specific steps are as follows:

Step (4.1): The state space is set according to the link quality between all trains $P_L(t) = \{P_{ij}(t)\}$, $i, j = 1, 2, \dots, V$, the relay costs of all trains $Pr(t) = \{Pr_1(t), Pr_2(t), \dots, Pr_V(t)\}$, the computing resources of the MEC servers $F(t) = \{F_1(t), F_2(t), \dots, F_R(t)\}$, and the computing costs of the servers $Pc(t) = \{Pc_1(t), Pc_2(t), \dots, Pc_R(t)\}$:

$$S(t) = \{P_L(t), Pr(t), F(t), Pc(t)\}$$

Step (4.2): The action space is set according to the selection of the offloading routing path, the offloading decision and the block size. It consists of the relay trains in the multi-hop routing path (arranged in routing order), the offloading decision action $a_0(t) \in \{0, 1, \dots, R-1\}$, and the block size adjustment action $S_{bc}(t) \in \{1, 2, \dots, S_{max}\}$.
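One way to picture the discrete action space of step (4.2) is to enumerate every combination of ordered relay path, target server and block size. The Python sketch below does this for a toy instance; the `max_hops` limit, the function name and the tuple encoding are assumptions made for illustration and are not specified in the text.

```python
from itertools import permutations, product


def build_action_space(relay_ids, num_servers, s_max, max_hops):
    """Enumerate (ordered relay path, server index, block size) tuples."""
    # Ordered paths of 1..max_hops distinct relay trains (routing order matters).
    paths = [
        path
        for hops in range(1, max_hops + 1)
        for path in permutations(relay_ids, hops)
    ]
    servers = range(num_servers)     # offloading decision in {0, ..., R-1}
    block_sizes = range(1, s_max + 1)  # block size in {1, ..., S_max}
    return list(product(paths, servers, block_sizes))


# Toy instance: 3 candidate relay trains, 2 MEC servers, S_max = 4, at most 2 hops.
actions = build_action_space(relay_ids=[1, 2, 3], num_servers=2, s_max=4, max_hops=2)
print(len(actions), actions[0])
```

The number of actions grows quickly with the number of candidate relays, which is one reason a learned value function is used instead of exhaustive search.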

Step (4.3): The reward function $r(t)$ is set according to the optimization objective, subject to the following constraints:

C1: $T_{sum}(t) \le \tau$,

C2: $T_{bc}(t) \le \varepsilon \times T_i(t)$,

C3: $S_{bc}(t) \le S_{max}$,

where $w_1$ and $w_2$ are the weighting coefficients of delay and economic cost, respectively, and $\theta$ is a penalty value. C1 is the time limit for completing the offloading task, where $\tau$ is the maximum allowable delay; C2 is the block completion time limit; and C3 limits the total transaction size in one consensus process.
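The reward expression itself is not reproduced in the text. The Python sketch below therefore assumes one plausible form: the negative weighted sum of delay and cost when constraints C1-C3 hold, and a fixed penalty otherwise. Only the constraint checks are taken from the description; the reward form, the sign convention and all parameter values are assumptions.

```python
def reward(t_sum, p_sum, t_bc, s_bc, *, w1, w2, tau, eps, t_i, s_max, theta):
    """Assumed reward: -(w1*T_sum + w2*P_sum) if C1-C3 hold, else -theta."""
    c1 = t_sum <= tau        # C1: task finishes within the allowed delay
    c2 = t_bc <= eps * t_i   # C2: consensus finishes within the block time limit
    c3 = s_bc <= s_max       # C3: block size within the allowed maximum
    if not (c1 and c2 and c3):
        return -theta        # constraint violation is penalized
    return -(w1 * t_sum + w2 * p_sum)


# Hypothetical parameter values for illustration only.
params = dict(w1=0.5, w2=0.001, tau=0.5, eps=0.5, t_i=0.2, s_max=8, theta=10.0)
r_ok = reward(t_sum=0.35, p_sum=260.0, t_bc=0.05, s_bc=4, **params)
r_late = reward(t_sum=0.60, p_sum=260.0, t_bc=0.05, s_bc=4, **params)
print(r_ok, r_late)
```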

Step (5): A dueling deep Q-network (dueling DQN) algorithm is used to solve the joint optimization problem. The sample space size, the number of samples and the number of network layers are set according to the state space, action space and reward function constructed in step (4), and the deep neural network is trained. The advantage function $A(s_t, a_t; \omega, \xi)$ and the state value $V(s_t; \omega, \theta)$ are combined to obtain the state-action value (Q value) output by the dueling Q-network, $Q(s_t, a_t; \omega, \theta, \xi)$, where $\omega$ denotes the parameters of the network convolution layers, $\theta$ denotes the parameters of the fully connected layers specific to the state value function, and $\xi$ denotes the parameters of the fully connected layers specific to the action advantage function.
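The aggregation formula is not reproduced in the text. The Python sketch below assumes the mean-subtracted combination commonly used in the dueling-network literature, $Q = V + (A - \text{mean}(A))$; whether the invention uses exactly this variant is an assumption.

```python
from typing import List


def dueling_q(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) with mean-subtracted advantages."""
    mean_advantage = sum(advantages) / len(advantages)
    # Subtracting the mean keeps V and A identifiable: the centered
    # advantages sum to zero, so V carries the overall level of Q.
    return [state_value + (adv - mean_advantage) for adv in advantages]


q_values = dueling_q(state_value=2.0, advantages=[1.0, 0.0, -1.0])
print(q_values)  # [3.0, 2.0, 1.0]
```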

The target Q value is calculated in the target network:

$$Q_{target}(s) = r(t) + \gamma \max_{a' \in A} Q(s', a'; \omega', \theta', \xi')$$

where $r(t)$ is the immediate reward, $\gamma$ is the reward discount factor, and $\max_{a' \in A} Q(s', a'; \omega', \theta', \xi')$ is the maximum Q value over the selectable actions in the next state. The loss function used to train the parameters of the deep neural network can be expressed as:

$$L(\omega) = E\left[\left(Q_{target}(s) - Q(s, a; \omega, \theta, \xi)\right)^2\right]$$

By reducing the loss function, the parameters of the deep neural network are gradually adjusted until the network describes the Q value with sufficient accuracy.
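The target and loss computations can be sketched in plain Python as follows. The `done` flag for terminal states and the batch-mean approximation of the expectation are assumptions added for completeness; names and numbers are illustrative.

```python
from typing import Sequence


def td_target(reward: float, gamma: float, next_q_values: Sequence[float],
              done: bool = False) -> float:
    """Q_target = r + gamma * max_a' Q_target_net(s', a'); just r if terminal."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_loss(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Batch mean of (Q_target - Q(s, a))^2, approximating the expectation."""
    errors = [(y - q) ** 2 for y, q in zip(targets, predictions)]
    return sum(errors) / len(errors)


# Hypothetical transition: reward 1.0, discount 0.9, two actions in the next state.
target = td_target(reward=1.0, gamma=0.9, next_q_values=[0.5, 2.0])
loss = squared_td_loss(targets=[target, 1.0], predictions=[2.0, 1.0])
print(target, loss)
```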

Step (6): The deep neural network trained in step (5) generates the Q values of all selectable actions in each state. The action with the largest Q value is taken as the optimal action in that state, and the optimal action is executed in each state until the execution instruction ends.
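Step (6) amounts to greedy action selection over the learned Q values. In the Python sketch below, a lookup table stands in for the trained network; the state labels and Q values are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the action with the largest Q value (first one on ties)."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def run_policy(states: Sequence[Hashable],
               q_function: Callable[[Hashable], Sequence[float]]) -> List[int]:
    """Pick the greedy action in every visited state."""
    return [greedy_action(q_function(state)) for state in states]


# A toy table standing in for the trained dueling network.
toy_q = {"far": [0.1, 0.7, 0.3], "near": [0.9, 0.2, 0.4]}
chosen = run_policy(["far", "near"], toy_q.__getitem__)
print(chosen)  # [1, 0]
```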

The advantage of the invention is that, in a communication scenario with multiple trains and multiple MEC servers, the total system delay and economic expenditure during the working period are effectively reduced by taking into account the link quality between trains, the relay costs of the trains, and the computing resources and computing costs of each server. The effect of the proposed path planning and resource allocation decision method on the weighted cost of system delay and economic cost in this scenario is examined through simulation experiments.

Drawings

Fig. 1 is a schematic diagram of the communication scenario model, which includes multiple running trains, stations where MEC servers are deployed to assist computation, and blockchain consensus nodes.

Fig. 2 is a flow chart of the design of the path planning and resource allocation decision method for the intelligent rail transit system assisted by mobile edge computing, based on a multi-hop ad hoc network and blockchain consensus.

Fig. 3 is a graph of total system delay versus task data volume, where squares represent the framework without block size selection, circles represent the framework without offloading decisions, pentagons represent the framework without path selection, and diamonds represent the method of the invention.

Fig. 4 is a graph of total system economic cost versus task data volume, where circles represent the framework without offloading decisions, pentagons represent the framework without path selection, diamonds represent the method of the invention, and squares represent the framework without block size selection.

Fig. 5 is a graph of system weighted overhead versus block interval, where circles represent the framework without offloading decisions, squares represent the framework without block size selection, pentagons represent the framework without path selection, and diamonds represent the method of the invention.

Fig. 6 is a graph of total system delay versus block interval, where squares represent the framework without block size selection, circles represent the framework without offloading decisions, pentagons represent the framework without path selection, and diamonds represent the method of the invention.

Detailed Description

The technical scheme of the path planning and resource allocation decision method for the intelligent rail transit system assisted by mobile edge computing, based on a multi-hop ad hoc network and blockchain consensus, is further explained below with reference to the accompanying drawings and examples.

The flow chart of the method of the invention is shown in fig. 2, and comprises the following steps:

Step one: initialize the system by setting the number of trains, the number of consensus nodes in the blockchain system, the number of MEC servers used for auxiliary computation, the train relay costs, the auxiliary computing resources and costs of the MEC servers, and so on;

Step two: according to the actual conditions, calculate the delay $T_{tran,V2V}(t)$ and economic cost $Pr_{sum}(t)$ of multi-hop task transmission between trains, the delay $T_{bc}(t)$ of the blockchain system, the transmission delay $T_{tran,V2I}(t)$ incurred by offloading the computing task to the MEC server, and the delay $T_{cal}(t)$ and economic cost $Pc_{sum}(t)$ incurred when the MEC server processes the task;

Step three: set the state space $s(t)$, action space $a(t)$ and reward function $r(t)$ of the DRL algorithm according to the optimization objective;

Step four: set the number of layers of the deep neural network, the sample space size and the sampling batch size;

Step five: train the deep neural network and iterate the Q values in the policy network;

Step six: select the optimal action according to the Q values of the actions in each state, obtaining the maximum benefit.
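The sample space and sampling batch mentioned in step four correspond to what DRL implementations usually call an experience replay buffer. The Python sketch below shows one common way to build it; the class name, the fixed random seed and the transition layout are assumptions, not details from the description.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience store; the oldest transitions are dropped first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._buffer = deque(maxlen=capacity)  # capacity = sample space size
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state) -> None:
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        # Uniform sampling without replacement from the stored transitions.
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(state=step, action=0, reward=-1.0, next_state=step + 1)
batch = buffer.sample(batch_size=2)
print(len(buffer), batch)
```

In step five, each training iteration would draw such a batch, compute the targets and the loss, and update the network parameters.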

Fig. 3 shows total system delay versus task data volume. As can be seen from Fig. 3, the method of the invention reduces the total time needed to process tasks in the scenario across different task data volumes. When the task data size is 600 KB, the total system delay of the method of the invention is only 0.38 s, whereas the delays of the other methods range from 0.395 s to 0.475 s. The total delay under all schemes increases with the task data size, but the total system delay optimized by the method of the invention is always lower than that of the other methods.

Fig. 4 shows total system economic cost versus task data volume. As can be seen from Fig. 4, the total economic cost of all schemes increases as the task data volume increases, and the total system economic cost of the method of the invention is always lower than that of the other methods. When the task data size is 600 KB, the total system economic cost of the method of the invention is only 260, while that of the method without offloading decisions is as high as 375.

Figs. 5 and 6 show system weighted overhead versus block interval and total system delay versus block interval, respectively. As can be seen from Figs. 5 and 6, for the same block interval, the total system delay and the system weighted overhead of the method of the invention are always lower than those of the other methods. For example, when the block generation interval is 0.2 s, the method of the invention reduces the system weighted overhead to 95, while the weighted overhead of each of the other methods is above 100.

Claims

1. A rail transit network resource allocation method based on blockchain and edge computing in an ad hoc network scenario, characterized in that: in an intelligent rail transit communication scenario, $V$ trains are running, stations are located on both sides of the track, and $R$ MEC servers managed by different operators are deployed at the stations; when a train generates an offloading request while running, it can offload the computing task to an MEC server by making maximum use of the inter-train multi-hop ad hoc network, even if it has not yet entered the communication coverage of the MEC server; $K$ of the $V$ trains also participate in the blockchain consensus process as blockchain nodes; a multi-hop routing path model, a communication model, a computing task model and a blockchain model are set up according to the actual environment conditions, and the weighting parameters of delay and economic cost are determined; the state space, action space and reward function of the DRL are then constructed, the sample space size and the number of samples in the training network are set, iterative learning is carried out with the scenario model, and the parameters of a deep neural network are trained to estimate the state-action value; finally, the optimal offloading path and resource allocation strategy is executed under the guidance of the policy network, thereby effectively reducing the delay and economic cost of processing real-time computing tasks.

2. The rail transit network resource allocation method based on blockchain and edge computing in an ad hoc network scenario according to claim 1, characterized in that:

when an offloading request is generated during travel, the requesting train can offload the computing task to the MEC server by making maximum use of the multi-hop ad hoc network, even if it has not yet entered the communication coverage of the MEC server; when the last relay train receives the offloaded task, a consensus process is started, and once all consensus nodes reach agreement, the security of the information is ensured; the rail transit network resource allocation method is implemented through the following steps:

step (1), the train requesting offloading offloads the computing task to the last-hop relay train through the multi-hop ad hoc network; the specific steps are as follows:

the routing path for offloading computing task $I_{MV}(t)$ passes through the relay trains $v_1, v_2, \dots, v_N$; the transmission delay of task $I_{MV}(t)$ across the $N$-hop train path is denoted $T_{tran,V2V}(t)$ and depends on the following quantities: $Fc_N(t)$ is the expected number of transmissions needed for a data packet to be successfully delivered from the source train to the destination along the $N$-hop routing path, $B_{MV}(t)$ is the size of the input data required by task $I_{MV}(t)$, $r_{s1}(t)$ is the transmission rate between the source train, i.e., the train requesting offloading, and the first relay train, and $r_{(n-1)n}(t)$ is the transmission rate between relay trains $v_{n-1}$ and $v_n$;

each train is assumed to have a different relaying cost; for the relay trains in the routing path of task $I_{MV}(t)$, the sequence of relay costs is $\{Pr_1(t), Pr_2(t), \dots, Pr_n(t), \dots, Pr_N(t)\}$; the total train relay cost is therefore calculated as:

$$Pr_{sum}(t) = \sum_{n=1}^{N} Pr_n(t)\, B_{MV}(t)$$

step (2), after receiving the relayed task, the last-hop relay train sends the data to the blockchain system for transaction verification and consensus, to ensure that the data is authentic and has not been tampered with; the specific steps are as follows:

the consensus nodes use the delegated Byzantine fault tolerance consensus mechanism to verify and reach consensus on blocks and transactions; the number of computation cycles required to verify a signature is $\alpha$, and the number required to generate and verify a message authentication code is $\beta$; the delay of the consensus process is denoted $T_{bc}(t)$ and depends on the following quantities: $S_{bc}(t)$ is the total transaction size, $L$ is the average transaction size, $f_{sp}(t)$ is the computing power of the primary node, $f_k(t)$ is the computing power of consensus node $k$, $T_i(t)$ is the block generation interval, and $T_b(t)$ is the broadcast delay between nodes;

step (3), the last-hop relay train offloads the computing task through the LTE cellular network to one of the MEC servers managed by different operators, and the server processes the task locally; the specific steps are as follows:

there are $R$ MEC servers in the communication scenario, managed by different operators; the transmission delay incurred when the last-hop relay train offloads the task to MEC server $r$ through the LTE cellular network is expressed as:

$$T_{tran,V2I}(t) = \frac{B_{MV}(t)}{R_r(t)}$$

where $R_r(t)$ is the uplink transmission rate of data from the last-hop relay train to the MEC server over the LTE cellular network;

the available computing resources and computing costs of the servers differ in a real-time environment; the computing power and the computing cost of MEC server $r$ are $F_r(t)$ and $Pc_r(t)$, respectively; the resulting computation delay and cost are expressed as:

$$T_{cal}(t) = \frac{C_{MV}(t)}{F_r(t)}$$

and

$$Pc_{sum}(t) = Pc_r(t)\, C_{MV}(t)$$

where $C_{MV}(t)$ is the number of CPU cycles required to complete computing task $I_{MV}(t)$;

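the total system delay and the total economic cost are calculated as follows:

$$T_{sum}(t) = T_{tran}(t) + T_{cal}(t) = T_{tran,V2V}(t) + T_{bc}(t) + T_{tran,V2I}(t) + T_{cal}(t)$$

and

$$P_{sum}(t) = Pr_{sum}(t) + Pc_{sum}(t)$$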

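step (4), based on steps (1)-(3), the state space, action space and reward function of the DRL are set according to the scenario and the optimization objective; the specific steps are as follows:

step (4.1), the state space is set according to the link quality between all trains $P_L(t)$, the relay costs of all trains $Pr(t)$, the computing resources of the MEC servers $F(t)$ and the computing costs of the servers $Pc(t)$:

$$S(t) = \{P_L(t), Pr(t), F(t), Pc(t)\}$$

step (4.2), the action space is set according to the selection of the offloading routing path, the offloading decision and the block size; it consists of the set of relay trains in the multi-hop routing path, the offloading decision action $a_0(t) \in \{0, 1, \dots, R-1\}$, and the block size adjustment action $S_{bc}(t) \in \{1, 2, \dots, S_{max}\}$;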

step (4.3), the reward function $r(t)$ is set according to the optimization objective, subject to the following constraints:

C1: $T_{sum}(t) \le \tau$,

C2: $T_{bc}(t) \le \varepsilon \times T_i(t)$,

C3: $S_{bc}(t) \le S_{max}$,

where $w_1$ and $w_2$ are the weighting coefficients of delay and economic cost, respectively, and $\theta$ is a penalty value; C1 is the time limit for completing the offloading task, where $\tau$ is the maximum allowable delay; C2 is the block completion time limit, and C3 limits the total transaction size in the consensus process;

step (5), a dueling deep Q-network algorithm is used to solve the joint optimization problem; the sample space size, the number of samples and the number of network layers are set according to the state space, action space and reward function constructed in step (4), and the deep neural network is trained; the advantage function $A(s_t, a_t; \omega, \xi)$ and the state value $V(s_t; \omega, \theta)$ are combined to obtain the state-action value output by the dueling Q-network, i.e., the Q value $Q(s_t, a_t; \omega, \theta, \xi)$, where $\omega$ denotes the parameters of the network convolution layers, $\theta$ denotes the parameters of the fully connected layers specific to the state value function, and $\xi$ denotes the parameters of the fully connected layers specific to the action advantage function;

the target Q value is calculated in the target network:

$$Q_{target}(s) = r(t) + \gamma \max_{a' \in A} Q(s', a'; \omega', \theta', \xi')$$

where $r(t)$ is the immediate reward, $\gamma$ is the reward discount factor, and $\max_{a' \in A} Q(s', a'; \omega', \theta', \xi')$ is the maximum Q value over the selectable actions in the next state; the loss function used to train the parameters of the deep neural network is expressed as:

$$L(\omega) = E\left[\left(Q_{target}(s) - Q(s, a; \omega, \theta, \xi)\right)^2\right]$$