CN116980424A

CN116980424A - Vehicle digital twin body edge deployment method used in Internet of vehicles scene

Info

Publication number: CN116980424A
Application number: CN202311113123.1A
Authority: CN
Inventors: 唐伦; 成章超; 戴军; 陈前斌
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2023-08-30
Filing date: 2023-08-30
Publication date: 2023-10-31

Abstract

The invention relates to a vehicle digital twin edge deployment method for Internet of Vehicles scenarios and belongs to the technical field of mobile communication. The method comprises the following steps: constructing a digital-twin-driven intelligent transportation Internet of Vehicles network; having each base station determine the edge-side deployment position of every vehicle digital twin within its service range, with each vehicle digital twin initially deployed on the edge server nearest to the currently accessed base station; having the base station forward the state data uploaded by the vehicle in real time to the edge server hosting the twin for digital twin synchronization, the processed data then being provided to a cloud application layer; calculating the cloud average information age and the twin migration cost under the current deployment; and establishing an objective function that minimizes the cloud average information age and the migration cost, and solving for the optimal vehicle digital twin deployment scheme with a multi-agent reinforcement learning algorithm. The invention reduces the cloud average information age and the twin migration cost and improves the deployment efficiency of vehicle digital twins.

Description

Vehicle digital twin body edge deployment method used in Internet of vehicles scene

Technical Field

The invention belongs to the technical field of mobile communication and relates to a vehicle digital twin edge deployment method for Internet of Vehicles scenarios.

Background

Current intelligent transportation systems lag in their ability to perceive the real traffic environment and to analyze and process big data, and cannot support the emerging traffic applications of the B5G/6G era, which place higher demands on the accuracy, comprehensiveness, and timeliness of traffic information.

Digital twin technology reproduces objects in physical space in a twin space with full spatio-temporal consistency, and enables study and control of the physical objects by observing, analyzing, simulating, and operating their digital twins. Using digital twin technology to build a digital model of the whole road traffic scene provides more advanced technical support for intelligent and digitalized transportation and improves the efficiency and safety of road traffic.

Mobile edge computing reduces system latency and network transmission pressure by performing computation at the network edge, and deploying digital twins at the edge ensures low-latency interaction with their corresponding physical entities. However, the high mobility of vehicles, the low-latency requirement of digital twin synchronization, and the limited, heterogeneous resources of edge servers all pose significant challenges to the dynamic deployment of vehicle digital twins.

Disclosure of Invention

In view of the above, the invention aims to provide a vehicle digital twin edge deployment method for Internet of Vehicles scenarios that achieves optimal edge deployment of vehicle digital twins and minimizes the cloud average information age and the average twin migration cost.

In order to achieve the above purpose, the present invention provides the following technical solutions:

A vehicle digital twin edge deployment method for Internet of Vehicles scenarios specifically comprises the following steps:

S1: constructing a digital-twin-driven intelligent transportation Internet of Vehicles network comprising a physical terminal layer, an edge twin layer, and a cloud application layer;

S2: each base station determines the edge-side deployment position of every vehicle digital twin within its service range; each vehicle digital twin is initially deployed on the edge server nearest to the currently accessed base station;

S3: the base station forwards the state data uploaded by the vehicle in real time to the edge server hosting the twin for digital twin synchronization, and the processed data is provided to the cloud application layer;

S4: calculating the cloud average information age and the twin migration cost under the current deployment;

S5: establishing an objective function that minimizes the cloud average information age and the migration cost;

S6: converting the objective function into a multi-agent partially observable Markov decision process, and solving for the optimal vehicle digital twin deployment scheme with an Actor-Critic-based multi-agent deep reinforcement learning method.

Further, in step S1, the physical terminal layer consists of mobile vehicle nodes with limited computing and storage resources, and the set of vehicle nodes is denoted $V = \{1, 2, \dots, i, \dots, I\}$; the vehicle nodes sense surrounding environment information and vehicle state data in real time through on-board sensors and upload the sensed data to the edge twin layer in real time over wireless links, providing data support for building the digital twins;

the edge twin layer consists of base stations that provide access service and edge servers that provide computing and storage services, and each edge server is associated with one of the base stations; the base station set is denoted $BS = \{1, 2, \dots, j, \dots, J\}$ and the edge server set is denoted $ES = \{1, 2, \dots, k, \dots, K\}$;

the cloud application layer comprises a digital-twin-based intelligent traffic management platform, which acquires and stores full-element, full-life-cycle data such as vehicle state data and road environment data from the real traffic scene and builds a digital model of the whole road traffic scene to support cloud applications; it also includes a number of digital-twin-based traffic application servers, which carry out efficient verification and low-cost trial-and-error of innovative technologies on the digital twin platform.

Further, in step S3, the digital twin synchronization process follows a timeout-retransmission synchronization mechanism, specifically: the upload interval of the vehicle state data is determined dynamically by the upload delay of the previous data packet and the processing delay of the edge server; in one synchronization, the edge server returns an ACK message after processing the state data uploaded by the vehicle node, and the vehicle uploads the next state data packet after receiving the ACK; a digital twin synchronization delay threshold is set, and if the vehicle has not received the ACK by the time the synchronization interval exceeds the threshold, it retransmits on timeout, ensuring the consistency and timeliness of twin synchronization.
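The timeout-retransmission mechanism described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the function name `synchronize_once`, the per-attempt delay list, and the instantaneous ACK are hypothetical simplifications and are not taken from the patent.

```python
from typing import Optional, Sequence, Tuple


def synchronize_once(
    attempt_delays: Sequence[float],
    timeout: float,
) -> Tuple[Optional[int], float]:
    """Return (acknowledged attempt number, total elapsed time).

    attempt_delays[n] is the upload-plus-processing delay that the n-th
    transmission would experience (hypothetical input for this sketch).
    The ACK itself is treated as instantaneous.
    """
    elapsed = 0.0
    for attempt, delay in enumerate(attempt_delays, start=1):
        if delay <= timeout:
            # ACK arrives before the threshold: synchronization succeeds.
            return attempt, elapsed + delay
        # No ACK within the threshold: wait it out, then retransmit.
        elapsed += timeout
    # Every attempt timed out.
    return None, elapsed


# First transmission exceeds the 50 ms threshold, the retransmission succeeds.
attempt, elapsed = synchronize_once([0.08, 0.02], timeout=0.05)
```

In this sketch the elapsed time of a successful synchronization is the sum of the timeouts spent on failed attempts plus the delay of the acknowledged attempt, which is also the earliest moment at which the vehicle may upload its next state packet.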

Further, in step S4, the cloud average information age is the average of the information ages of all vehicle data in the current environment, and is calculated as follows: define $Z = [z_{i,k}]_{I \times K}$ as the DT deployment matrix, where DT denotes a vehicle digital twin, and $z_{i,k} = 1$ when the digital twin of vehicle i is deployed on edge server k, otherwise $z_{i,k} = 0$; assuming that at time $t_0$ the cloud has just received the DT information about vehicle i uploaded by edge server k, then from $t_0$ until the next data update, the information age $\Delta_i(t)$ of the cloud with respect to $DT_i$ can be written as:

wherein the synchronization delay between vehicle i and its twin is the delay of uploading the state data of vehicle i to edge server k, and $T_{k,Cloud}$ is the transmission delay of edge server k forwarding the state data of vehicle i to the cloud;

the average information age during the synchronization of vehicle i is defined as:

the average information age of all vehicles acquired by the cloud can be expressed as:

Further, in step S4, the twin migration cost includes a setup cost and a transmission cost, and is calculated as follows: when the deployment location of the digital twin $DT_i$ of vehicle i migrates from edge server $k_1$ to $k_2$, the migration cost is expressed as:

$$c^{ou} + c^{mig} \cdot DT_i^{Disk} \cdot dis(k_1, k_2)$$

wherein $c^{ou}$ is the setup cost and $c^{mig}$ is the unit migration cost, i.e., the cost of transmitting a unit amount of data over a unit distance; $DT_i^{Disk}$ is the storage resource required to maintain the digital twin of vehicle i, and $dis(k_1, k_2)$ is the distance between edge server $k_1$ and edge server $k_2$;

define $Z' = [z'_i]_{I \times 1}$ as the DT migration matrix; for vehicle i, the binary variable $z'_i$ indicates whether the digital twin of vehicle i has migrated at a given moment; when the deployment position in the current time slot is the same as in the previous time slot, no migration is triggered and $z'_i = 0$, otherwise $z'_i = 1$; the average twin migration cost $Cost^{mig}$ of the whole system can be expressed as:

Further, in step S5, the objective function established to minimize the cloud average information age and the migration cost is expressed as:

s.t.C1:

C2:

C3:

C4:

C5:

C6: $z_{i,k} \in \{0, 1\}$

C7: $\beta_1, \beta_2 \in (0, 1),\ \beta_1 + \beta_2 = 1$

wherein $Z = [z_{i,k}]_{I \times K}$ is the vehicle twin deployment matrix; the objective combines the average information age of all vehicles acquired by the cloud with the average twin migration cost $Cost^{mig}$; $DT_i^{CPU}$ and $DT_i^{Disk}$ are the CPU computing resources and disk storage resources required to maintain $DT_i$, and each edge server k is equipped with a total amount of CPU computing resources and of disk storage resources; among the constraints, C1 ensures that each vehicle twin is deployed on exactly one edge server at any time; C2-C3 ensure that deploying vehicle digital twins does not exhaust the computing or storage resources of any edge server; C4-C5 ensure that the information age requirement at the edge and the information age requirement at the cloud are both met during DT synchronization, expressed in terms of the time of one twin synchronization and the maximum cloud information age $\Delta_m$ during synchronization; C6 states that the deployment variable $z_{i,k}$ is binary, with $z_{i,k} = 1$ when the digital twin of vehicle i is deployed on edge server k and $z_{i,k} = 0$ otherwise; in C7, $\beta_1$ and $\beta_2$ are the weight factors of the cloud average information age and the twin migration cost, respectively.

Further, in step S6, converting the objective function into a multi-agent partially observable Markov decision process specifically includes:

(1) Global state space: real-time state information of all vehicles, deployment positions of all vehicle twins, sub-channel occupancy of all base stations, vehicle-related information, resource usage of all edge servers, and the cloud's information age for every vehicle;

(2) Local state space of agent j: real-time state information of the vehicles currently associated with the base station, the deployment positions of their twins, information age indicators, and the positions and remaining resources of all edge servers;

(3) Action space: the digital twin deployment strategy Z for all vehicles within the coverage area of each base station;

(4) Reward: the inverse of the weighted sum of the cloud average information age and the average twin migration cost of all vehicles associated with the base station;

the reward $r_t^j$ of agent j in time slot t is defined as:

wherein the reward consists of two parts: the first is the average AoI indicator of all vehicles associated with base station j, and the second is the migration cost of all vehicles associated with base station j; the total reward of agent j is obtained by weighting the two parts with the factors $\beta_1$ and $\beta_2$, respectively, and summing them.
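The two-part reward described above can be sketched as follows. This is a minimal illustration that assumes "inverse" means the negative of the weighted sum (so that maximizing the reward minimizes the cost) and that both parts are averaged over the vehicles associated with base station j; the function name `agent_reward` is hypothetical, and the default weights 0.6 and 0.4 are the embodiment values given later in the description.

```python
from typing import Sequence


def agent_reward(
    aoi_per_vehicle: Sequence[float],
    migration_cost_per_vehicle: Sequence[float],
    beta1: float = 0.6,
    beta2: float = 0.4,
) -> float:
    """Reward of one base-station agent for one time slot (assumed form)."""
    n = len(aoi_per_vehicle)
    if n == 0:
        # No associated vehicles: nothing to penalize in this sketch.
        return 0.0
    mean_aoi = sum(aoi_per_vehicle) / n
    mean_migration = sum(migration_cost_per_vehicle) / n
    # Assumption: the reward is the negative weighted sum of the two parts.
    return -(beta1 * mean_aoi + beta2 * mean_migration)


reward = agent_reward([1.0, 3.0], [0.5, 1.5])
```

With this sign convention, a deployment that lowers either the information age or the migration cost yields a larger reward.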

Further, in step S6, solving for the optimal vehicle digital twin deployment scheme with the Actor-Critic-based multi-agent deep reinforcement learning method specifically includes: adopting an Actor-Critic-based deep reinforcement learning framework with centralized training and distributed execution, in which the policy of each agent is represented by a deep neural network, namely the Actor network, denoted $\pi^j(a^j \mid s^j; \theta^j)$, wherein $\pi^j$ is the policy of agent j, $\theta^j$ is the parameter of the deep neural network, $a^j$ is the action of agent j, and $s^j$ is the observable state of agent j; a virtual central node is set up to host a DNN with parameter $\phi$, namely the Critic network;

the loss functions are calculated and the network parameters are updated with Adam; training continues until all Actor networks converge or the maximum number of training rounds is reached, finally yielding the optimal vehicle digital twin deployment strategy.

Further, in step S6, the algorithm training process specifically includes the following steps:

S61: initializing the parameters of each Actor network and the Critic network, and initializing the experience pool;

S62: executing the data sampling process in a loop: agent j acquires its observation state, executes an action according to its policy, obtains the reward $r_t^j$, acquires the state of the next time slot, and places the sample into the experience pool;

S63: calculating the estimate of the advantage function and calculating the target value $V_j(s_t)$;

S64: for each agent j, taking a sample from the experience pool, calculating the Actor network loss function and updating the Actor network parameters, and calculating the Critic network loss function and updating the Critic network parameters. The loop is executed until the reward function converges or the maximum number of training rounds is reached.

Further, in step S6, calculating the loss functions and updating the network parameters with Adam includes the following: the Actor network updates its parameters through a policy loss function, and the Actor network loss function $L_{Actor}(\theta^j)$ of agent j is calculated as:

wherein the two policy terms are the old policy and the current new policy of agent j; $a_t$ and $s_t$ are the executed action and the observed state at time t, and their agent-indexed counterparts are the executed action and the observed state of agent j at time t; clip() is a truncation function that keeps the gap between the updated and old parameters from becoming too large; $\epsilon$ is a hyperparameter that sets the limiting range of clip(); the advantage function estimate is obtained by generalized advantage estimation, specifically defined as:

wherein $\delta_t = r_t + \gamma V_\phi(s_{t+1}) - V_\phi(s_t)$ is the temporal-difference error at time t, $r_t$ is the reward at time t, $\gamma$ is the discount factor that weighs the importance of future rewards, $V_\phi(s_t)$ is the state value estimated by the Critic network, T is a sufficiently large time, and $\lambda \in [0, 1]$ is a hyperparameter that balances bias and variance;

the Actor network parameter $\theta^j$ of agent j is updated iteratively by stochastic gradient ascent:

$$\theta^j \leftarrow \theta^j + \alpha_{Actor} \nabla_{\theta^j} L_{Actor}(\theta^j)$$

wherein $\alpha_{Actor}$ is the learning rate of the Actor network and $\nabla_{\theta^j} L_{Actor}(\theta^j)$ is the gradient of the Actor loss function with respect to the network parameter $\theta^j$;

the Critic network updates its parameters through a global state value loss function; the Critic network loss function $L_{Critic}(\phi)$ is calculated as:

wherein $V_\phi(s_t)$ is the output of the state value function estimated by the Critic network and $V_j(s_t)$ is the target value;

the Critic network parameter $\phi$ is updated iteratively by stochastic gradient descent:

$$\phi \leftarrow \phi - \alpha_{Critic} \nabla_\phi L_{Critic}(\phi)$$

wherein $\alpha_{Critic}$ is the learning rate of the Critic network and $\nabla_\phi L_{Critic}(\phi)$ is the gradient of the Critic loss function with respect to the network parameter $\phi$.

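Further, in step S6, the advantage function is defined as $A(s_t, a_t) = Q(s_t, a_t) - V(s_t)$ and evaluates how good an action is relative to the average level in a given state; the global action value function is defined as $Q(s_t, a_t) = \mathbb{E}[U_t \mid s_t, a_t]$, wherein $U_t$ is the cumulative discounted return; the global state value function is $V(s_t) = \mathbb{E}_{a_t}[Q(s_t, a_t)]$.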

The beneficial effects of the invention are as follows: vehicle digital twins are deployed rationally under the limited computing and storage resources of the edge servers, and the effect of a twin's deployment position on the delay of the digital twin synchronization process is fully considered; the twin migration cost is reduced while the cloud average information age is minimized, improving the deployment efficiency of vehicle digital twins at the edge.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.

Drawings

To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in detail below with reference to the accompanying drawings, in which:

FIG. 1 is a diagram of a digital twin-driven intelligent traffic vehicle network architecture constructed in accordance with the present invention;

FIG. 2 is a training flow diagram of the vehicle digital twin body edge deployment method for use in a vehicle networking scenario of the present invention;

FIG. 3 is a diagram of a vehicle digital twin synchronization process and cloud information age description;

FIG. 4 is a diagram of a multi-agent deep reinforcement learning framework based on Actor-Critic of the present invention.

Detailed Description

Other advantages and effects of the invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the invention with reference to specific examples. The invention may also be practiced or applied through other specific embodiments, and the details in this description may be modified or varied without departing from the spirit of the invention. It should be noted that the illustrations provided with the following embodiments only illustrate the basic idea of the invention schematically, and the following embodiments and the features in them may be combined with each other where no conflict arises.

Referring to FIGS. 1 to 4, the invention constructs a digital-twin-driven intelligent transportation Internet of Vehicles network, whose architecture is shown in FIG. 1, and provides a vehicle digital twin edge deployment method for Internet of Vehicles scenarios, whose flow is shown in FIG. 2. The method specifically includes the following steps:

S1: a digital-twin-driven intelligent transportation Internet of Vehicles architecture is constructed, comprising three layers: a physical terminal layer, an edge twin layer, and a cloud application layer.

The physical terminal layer contains numerous mobile vehicle nodes with limited computing and storage resources, denoted $V = \{1, 2, \dots, I\}$. The edge twin layer consists of multiple base stations and edge servers, denoted $BS = \{1, 2, \dots, J\}$ and $ES = \{1, 2, \dots, K\}$ respectively, and the edge servers provide the computing and storage services needed to maintain the vehicle digital twins. Each edge server is associated with one of the base stations. The vehicle nodes upload state data to the edge twin layer in real time over wireless links, providing data support for building the digital twins. The cloud application layer contains a core cloud digital twin platform and hosts a large number of intelligent transportation applications, such as path planning, automated driving, traffic flow prediction, and high-precision positioning; the cloud digital twin platform provides real data support for these applications and enables efficient verification and low-cost trial-and-error of innovative technologies on the twin platform.

S2: when the vehicle digital twin deployment strategy is formulated, the base stations in the edge twin layer serve as the decision-making entities. Each base station is responsible for determining the edge-side deployment position of every vehicle digital twin within its coverage. Each vehicle digital twin is initially deployed on the edge server nearest to the currently accessed base station.

In step S2, each base station provides access service for vehicles with a coverage range of 500-600 meters and is responsible for determining the edge-side deployment position of every vehicle digital twin within its coverage. It is assumed that whenever a vehicle finishes its service, the digital twin data maintained on the edge server is automatically uploaded to the cloud for storage, and the computing and storage resources of the edge server are then released. When the vehicle accesses the network again, its digital twin is initially deployed on the edge server closest to the currently accessed base station, and the edge server retrieves part of the related data directly from the cloud to initialize the digital twin.
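The initial deployment rule of step S2 amounts to a nearest-neighbor lookup. A minimal sketch, assuming planar coordinates in meters and Euclidean distance (the patent does not state how distance is measured); `nearest_edge_server` and the coordinate dictionary are hypothetical names.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def nearest_edge_server(base_station: Point, edge_servers: Dict[int, Point]) -> int:
    """Return the id of the edge server closest to the accessed base station."""
    return min(
        edge_servers,
        key=lambda k: math.dist(base_station, edge_servers[k]),
    )


# The twin of a newly connected vehicle is placed on the closest server.
servers = {1: (300.0, 400.0), 2: (100.0, 0.0), 3: (0.0, 250.0)}
initial_server = nearest_edge_server((0.0, 0.0), servers)
```

Subsequent re-deployment decisions are made by the learned policy rather than by this rule; the rule only fixes the starting point.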

S3: the vehicle digital twin on the edge server must stay state-synchronized with the real vehicle in the physical terminal layer: the base station forwards the state data uploaded by the vehicle in real time to the edge server hosting the twin, and the edge server processes the twin data and forwards it to the cloud application layer to support the large number of digital-twin-based traffic applications in the cloud.

In step S3, the digital twin synchronization specifically includes the following steps:

S31: the vehicle and the base station communicate over a wireless link. Let the transmit power of vehicle i be $p_i$, the channel bandwidth be B, the wireless channel gain be h, the power spectral density of the channel's Gaussian white noise be $N_0$, the distance between vehicle i and base station j be $dis(i, j)$, and the path loss factor be $\gamma$; the achievable uplink transmission rate between vehicle i and base station j is:

when vehicle i transmits data of size $D_i$ to base station j at a given moment, the transmission delay is:

In this embodiment, the transmit power of a vehicle ranges over [500, 600] mW, the wireless channel bandwidth is 50 MHz, the wireless channel gain is -30 dB/m, and the noise power spectral density is -127 dBm/Hz.
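The uplink model of S31 can be made concrete with a short numerical sketch. It assumes a Shannon-type rate $B \log_2\!\left(1 + \frac{p_i\, h\, dis(i,j)^{-\gamma}}{N_0 B}\right)$ and a delay equal to data size divided by rate; this form is an assumption built from the quantities the text lists, not the patent's own formula. The function names, the path-loss exponent of 3.0, the 200 m distance, and the 8 Mbit payload are hypothetical example values.

```python
import math


def uplink_rate_bps(p_mw, bandwidth_hz, gain_db, noise_dbm_hz, dist_m, path_loss_exp):
    """Achievable uplink rate in bit/s under the assumed Shannon-type model."""
    p_w = p_mw * 1e-3                            # mW -> W
    h = 10 ** (gain_db / 10)                     # dB -> linear gain
    n0_w_hz = 10 ** (noise_dbm_hz / 10) * 1e-3   # dBm/Hz -> W/Hz
    snr = p_w * h * dist_m ** (-path_loss_exp) / (n0_w_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1 + snr)


def upload_delay_s(data_bits, rate_bps):
    """Wireless upload delay: amount of data divided by the achievable rate."""
    return data_bits / rate_bps


# Embodiment values from the text; distance, exponent and payload are assumed.
rate = uplink_rate_bps(500, 50e6, -30, -127, dist_m=200, path_loss_exp=3.0)
delay = upload_delay_s(8e6, rate)
```

Under this model the rate falls as the vehicle moves away from the base station, which is one reason the synchronization delay depends on where the vehicle currently is.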

S32: if the digital twin of vehicle i is maintained on edge server k, the wired transmission delay for base station j to upload the data to edge server k is:

$$\psi \cdot D_i \cdot dis(j, k)$$

wherein $\psi$ is the unit wired transmission delay, i.e., the delay required to transmit a unit amount of data over a unit distance, and $dis(j, k)$ is the distance between base station j and edge server k. In this embodiment, the unit wired transmission delay is $1 \times 10^{-12}$.

S33: after receiving the state data uploaded by vehicle node i, edge server k performs computation such as compression, aggregation, optimization, and caching on the raw data, with processing time:

$$\frac{f(D_i)}{DT_i^{CPU}}$$

wherein the processing time is that of edge server k processing the data uploaded by vehicle i, $f(D_i)$ is the amount of computation required to process data of size $D_i$, and $DT_i^{CPU}$ is the computing resource required to maintain the digital twin of vehicle i. In this embodiment, the computing resource $DT_i^{CPU}$ and the storage resource $DT_i^{Disk}$ required by one digital twin range over [1, 2] GHz and [2, 3] GB, respectively.

S34: after processing the data, edge server k returns an ACK message to vehicle node i; because the ACK is small, its transmission delay is negligible. Therefore, when the $DT_i$ of vehicle i is deployed on edge server k, the time of one synchronization is:

S35: while processing the data, edge server k also forwards the state data of vehicle i to the cloud, with transmission delay:

$$T_{k,Cloud} = \psi \cdot D_i \cdot dis(k, Cloud)$$
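Steps S32 to S35 can be tied together in a small sketch. It assumes that one synchronization takes the sum of the wireless upload delay, the wired base-station-to-server delay, and the processing time (the ACK being negligible, as stated), and that processing time is workload divided by the twin's CPU allocation. The function names and all numeric inputs other than $\psi = 1 \times 10^{-12}$ are hypothetical.

```python
def wired_delay(psi, data_size, distance):
    """Wired delay: unit delay x amount of data x distance (as in T_k,Cloud)."""
    return psi * data_size * distance


def processing_time(workload, cpu_rate):
    """Assumed form: computation amount f(D_i) divided by the CPU allocation."""
    return workload / cpu_rate


def single_sync_time(wireless_upload, wired_upload, processing):
    """Assumed composition of one synchronization; ACK delay is neglected."""
    return wireless_upload + wired_upload + processing


psi = 1e-12                                          # embodiment value
to_server = wired_delay(psi, 8e6, 1_000.0)           # base station -> edge server
to_cloud = wired_delay(psi, 8e6, 50_000.0)           # edge server -> cloud
sync = single_sync_time(0.010, to_server, processing_time(4e7, 2e9))
```

Because both wired terms grow with distance, keeping a twin on a server far from the vehicle's current base station lengthens every synchronization, which is the delay effect the deployment strategy tries to control.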

S4: in each digital twin synchronization, the base station records the synchronization delay of the vehicles in its coverage area, the cloud's information age for the vehicle state data, and the twin migration cost under the current deployment.
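The information-age bookkeeping of step S4 can be illustrated with the common sawtooth model. This is an assumption, not the patent's exact formula: when an update reaches the cloud at $t_0$ its age equals the end-to-end delivery delay (synchronization delay plus $T_{k,Cloud}$), and the age then grows linearly until the next update arrives. All function names are hypothetical.

```python
def age_at(t, t0, delivery_delay):
    """Age of the cloud's copy at time t >= t0 under the sawtooth assumption."""
    return delivery_delay + (t - t0)


def mean_age(delivery_delay, update_interval):
    """Time average of age_at over one update interval (linear growth)."""
    return delivery_delay + update_interval / 2.0


def cloud_average_age(per_vehicle):
    """Average of the per-vehicle mean ages.

    per_vehicle is a list of (delivery_delay, update_interval) pairs.
    """
    return sum(mean_age(d, tau) for d, tau in per_vehicle) / len(per_vehicle)


avg = cloud_average_age([(0.02, 0.10), (0.04, 0.20)])
```

Under this model both a longer delivery delay and a longer interval between updates raise the average age, so a deployment that shortens synchronization lowers the cloud's information age on both counts.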

In step S4, the average information age of all vehicles at the cloud is calculated as follows:

S41: assuming that at time $t_0$ the cloud has just received the DT information about vehicle i uploaded by edge server k, the information age of the cloud with respect to $DT_i$ from $t_0$ until the next data update can be written as:

S42: the average information age during vehicle synchronization is:

S43: the average information age of all vehicles acquired by the cloud can be expressed as:

In step S4, the twin migration cost is calculated as follows:

S44: suppose the deployment of the digital twin $DT_i$ of vehicle i changes at some moment, migrating from edge server $k_1$ to edge server $k_2$. The target edge server $k_2$ must first allocate new storage space to maintain and initialize the twin; the cost of this process is called the setup cost and is denoted $c^{ou}$. In this embodiment, the setup cost is $1 \times 10^{-5}$.

S45: the original edge server $k_1$ transmits all twin data to be migrated to the new edge server $k_2$, which incurs a transmission cost, denoted $c^{mig} \cdot DT_i^{Disk} \cdot dis(k_1, k_2)$, wherein $c^{mig}$ is the unit migration cost, i.e., the cost of transmitting a unit amount of data over a unit distance, and $DT_i^{Disk}$ is the total data size of the twin $DT_i$. In this embodiment, the unit migration cost is $1 \times 10^{-10}$.

S46: when the deployment location of the digital twin $DT_i$ of vehicle i migrates from edge server $k_1$ to $k_2$, the total migration cost is expressed as:

$$c^{ou} + c^{mig} \cdot DT_i^{Disk} \cdot dis(k_1, k_2)$$

S47: define $Z' = [z'_i]_{I \times 1}$ as the DT migration matrix. For vehicle i, the binary variable $z'_i$ indicates whether the digital twin of vehicle i has migrated at a given moment. When the deployment position in the current time slot is the same as in the previous time slot, no migration is triggered and $z'_i = 0$; otherwise $z'_i = 1$.
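Steps S44 to S47 can be sketched as follows. The per-twin cost is the setup cost plus the transmission cost, as described; averaging the triggered costs over all vehicles is an assumption of this sketch, since the system-level formula is not reproduced in the text. The function names, the distance table, and the data sizes are hypothetical; the defaults for `c_ou` and `c_mig` are the embodiment values.

```python
def migration_indicator(prev_server, curr_server):
    """z'_i: 1 if the twin of vehicle i changed server since the last slot."""
    return [0 if p == c else 1 for p, c in zip(prev_server, curr_server)]


def migration_cost(disk_size, distance, c_ou=1e-5, c_mig=1e-10):
    """Setup cost plus transmission cost for moving one twin."""
    return c_ou + c_mig * disk_size * distance


def average_migration_cost(prev_server, curr_server, disk_sizes, dist):
    """Assumed system-level average: triggered costs divided by vehicle count."""
    z = migration_indicator(prev_server, curr_server)
    total = sum(
        z_i * migration_cost(size, dist[(k1, k2)])
        for z_i, size, k1, k2 in zip(z, disk_sizes, prev_server, curr_server)
    )
    return total / len(prev_server)


# Two vehicles; only the second twin moves (server 1 -> server 2, 1000 apart).
dist = {(0, 0): 0.0, (1, 2): 1000.0}
avg_cost = average_migration_cost([0, 1], [0, 2], [2.0, 3.0], dist)
```

A twin that stays on its server contributes nothing, so the average rises only with the number and size of the migrations actually triggered.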

S48: the migration cost of the whole system is:

S5: a vehicle digital twin deployment optimization model is established to minimize the cloud average information age and the migration cost.

When solving the vehicle digital twin deployment problem, the cloud's need for real-time state information about the vehicle nodes in the real traffic environment is considered, and the freshness of the real vehicle state data received by the cloud is measured by the information age. Meanwhile, because vehicle nodes are highly mobile, the deployment positions of their digital twins are adjusted continually as the vehicles move, triggering migrations of digital twins between edge servers and incurring the corresponding migration cost. Therefore, the migration cost must be balanced while the cloud information age is optimized, to avoid triggering a large number of digital twin migrations. The optimization model of step S5 is expressed as:

s.t.C1:

C2:

C3:

C4:

C5:

C6: $z_{i,k} \in \{0, 1\}$

C7: $\beta_1, \beta_2 \in (0, 1),\ \beta_1 + \beta_2 = 1$

wherein $Z = [z_{i,k}]_{I \times K}$ is the vehicle twin deployment matrix, and the objective combines the cloud average information age with the average twin migration cost $Cost^{mig}$. Among the constraints, C1 ensures that each vehicle twin is deployed on exactly one edge server at any time. C2-C3 ensure that deploying vehicle digital twins does not exhaust the computing or storage resources of any edge server; in this embodiment, the computing and storage resources of each edge server range over [80, 90] GHz and [2, 3] TB, respectively. C4-C5 ensure that the information age requirements of both the edge and the cloud are met during DT synchronization. C6 states that the deployment variable $z_{i,k}$ is binary, with $z_{i,k} = 1$ when the digital twin of vehicle i is deployed on edge server k and $z_{i,k} = 0$ otherwise. In C7, $\beta_1$ and $\beta_2$ are the weight factors of the cloud average information age and the twin migration cost; in this embodiment they are 0.6 and 0.4, respectively.
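The constraints described above can be checked mechanically for a candidate deployment. The sketch below covers C1, C2, C3, C6, and C7 only; C4 and C5 are left out because their thresholds are not reproduced in the text. It assumes that "not exhausted" means total demand does not exceed capacity, and the function and parameter names are hypothetical.

```python
def is_feasible(Z, cpu_need, disk_need, cpu_cap, disk_cap):
    """Check C1, C2, C3 and C6 for a deployment matrix Z (I rows, K columns)."""
    for row in Z:
        if any(z not in (0, 1) for z in row):   # C6: binary variables
            return False
        if sum(row) != 1:                       # C1: exactly one server per twin
            return False
    for k in range(len(cpu_cap)):
        cpu_used = sum(Z[i][k] * cpu_need[i] for i in range(len(Z)))
        disk_used = sum(Z[i][k] * disk_need[i] for i in range(len(Z)))
        if cpu_used > cpu_cap[k] or disk_used > disk_cap[k]:   # C2, C3
            return False
    return True


def objective(mean_age, mean_migration_cost, beta1=0.6, beta2=0.4):
    """Weighted objective; raises if the weights violate C7."""
    if not (0 < beta1 < 1 and 0 < beta2 < 1 and abs(beta1 + beta2 - 1) < 1e-9):
        raise ValueError("weights must lie in (0, 1) and sum to 1")
    return beta1 * mean_age + beta2 * mean_migration_cost


# Two twins on two servers: CPU in GHz, disk in GB (2 TB taken as 2000 GB).
Z = [[1, 0], [0, 1]]
ok = is_feasible(Z, [1.5, 2.0], [2.0, 3.0], [80.0, 85.0], [2000.0, 2000.0])
```

Such a check is useful for masking invalid actions or for validating the deployment a trained policy proposes; it does not replace the learning-based search over deployments.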

S6: the optimization model is converted into a multi-agent partially observable Markov decision process and solved with an Actor-Critic-based multi-agent deep reinforcement learning method to obtain the optimal vehicle digital twin deployment scheme.

In step S6, the optimization model is first converted into a multi-agent partially observable Markov decision process, which mainly comprises four parts, defined as follows:

(1) Global state space: real-time state information of all vehicles, deployment positions of all vehicle twins, sub-channel occupancy of all base stations, vehicle-related information, resource usage of all edge servers, and the cloud's information age for every vehicle.

(2) Local state space of agent j: real-time state information of the vehicles associated with the base station, the deployment positions of their twins, information age indicators, and the positions and remaining resources of all edge servers.

(3) Action space: the digital twin deployment strategy Z for all vehicles within the coverage area of each base station.

(4) Reward: the inverse of the weighted sum of the cloud average information age and the average twin migration cost of all vehicles associated with the base station. The reward $r_t^j$ of agent j in time slot t is defined as:

After conversion to a multi-agent partially observable Markov decision process, a centralized training-distributed execution deep reinforcement learning framework based on Actor-Critic is employed as shown in FIG. 4.

In step S6, the policy π^j of each agent j is represented by a deep neural network, namely the Actor network, with network parameters θ^j. A virtual central node is set up to deploy a DNN with parameters φ, namely the Critic network.

The algorithm training process in step S6 specifically includes the following steps:

S63: The advantage function estimate is obtained through generalized advantage estimation (GAE). It is calculated as follows:

where δ_t = r_t + γV_φ(s_{t+1}) - V_φ(s_t). In this embodiment, the discount factor γ is 0.9 and the hyperparameter λ is 0.95. The Critic network target value V_j(s_t) is also calculated in this step.
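The estimation in step S63 can be sketched with the standard generalized advantage estimation recursion, in which the estimate at time t equals δ_t plus γλ times the estimate at time t+1. This is the textbook form and is assumed, not confirmed, to match the formula of this embodiment. Using "advantage plus current value" as the Critic target is likewise an assumption, and the function name `gae_advantages` is hypothetical.

```python
from typing import List, Sequence, Tuple


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   gamma: float = 0.9, lam: float = 0.95
                   ) -> Tuple[List[float], List[float]]:
    """Return (advantages, critic_targets) for one trajectory.

    values must contain len(rewards) + 1 entries: V(s_0), ..., V(s_T),
    where the last entry bootstraps the final step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    # Assumed target: advantage estimate plus the current value estimate
    targets = [a + v for a, v in zip(advantages, values[:-1])]
    return advantages, targets
```

With γ = 0.9 and λ = 0.95, each step back in time discounts later TD errors by 0.855, so the estimate leans mostly on the next few rewards.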

S64: For each agent j, samples are drawn from the experience pool, the Actor network loss function is calculated, and the Actor network parameters are updated. The Actor network loss function of agent j is calculated as follows:

where the two policy terms represent the old policy and the current new policy of agent j, respectively. clip() is a clipping function that ensures the gap between the updated parameters and the old parameters does not become too large. ε is a hyperparameter that sets the clipping range of the clip() function. In this embodiment, ε is 0.2. The Actor network parameters θ^j of agent j are updated iteratively by stochastic gradient ascent:

where α_Actor is the learning rate of the Actor network, and the gradient term is the partial derivative of the Actor loss function with respect to the network parameters θ^j.
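The description of the old and new policies, the clip() function, and ε matches the clipped surrogate objective used in proximal policy optimization (PPO). The sketch below assumes that standard form; the exact loss of this embodiment is not reproduced here, and the function name `clipped_surrogate` and the use of log-probabilities as inputs are illustrative choices.

```python
import math
from typing import Sequence


def clipped_surrogate(new_logp: Sequence[float], old_logp: Sequence[float],
                      advantages: Sequence[float], epsilon: float = 0.2) -> float:
    """Mean PPO-style clipped objective over a batch (to be maximized).

    new_logp / old_logp are log-probabilities of the sampled actions under
    the current and the old policy of one agent.
    """
    if not advantages:
        raise ValueError("empty batch")
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio new / old
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Take the more pessimistic of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With ε = 0.2, a sample whose probability ratio has grown to 1.5 and whose advantage is positive contributes only 1.2 times its advantage, which removes the incentive to move the policy further in a single update.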

S65: The Critic network loss function is calculated and the Critic network parameters are updated. The Critic network updates its parameters through a global state-value loss function. The Critic network loss function is calculated as:

where V_φ(s_t) denotes the state-value estimate output by the Critic network, and V_j(s_t) denotes the target value.

In the Critic parameter update, α_Critic is the learning rate of the Critic network, and the gradient term is the partial derivative of the Critic loss function with respect to the network parameters φ.

The training process is repeated until the reward function converges or the maximum number of training rounds is reached.
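A minimal sketch of step S65 follows, assuming the Critic loss is the mean squared error between V_φ(s_t) and the target value, and showing a plain gradient descent step with learning rate α_Critic. Both the loss form and the helper names `critic_loss` and `sgd_step` are assumptions made for illustration.

```python
from typing import List, Sequence


def critic_loss(value_estimates: Sequence[float],
                targets: Sequence[float]) -> float:
    """Mean squared error between Critic outputs and target values (assumed form)."""
    if len(value_estimates) != len(targets) or not targets:
        raise ValueError("need equally sized, non-empty batches")
    squared = [(v - y) ** 2 for v, y in zip(value_estimates, targets)]
    return sum(squared) / len(squared)


def sgd_step(params: Sequence[float], grads: Sequence[float],
             learning_rate: float) -> List[float]:
    """One gradient descent step: phi <- phi - alpha * dL/dphi."""
    return [p - learning_rate * g for p, g in zip(params, grads)]
```

In practice the gradients would come from automatic differentiation of the Critic network; the scalar lists here only make the update rule explicit.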

Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims

1. A vehicle digital twin body edge deployment method for an Internet of Vehicles scenario, characterized by comprising the following steps:

2. The vehicle digital twin body edge deployment method according to claim 1, wherein in step S1, the physical terminal layer consists of mobile vehicle nodes with limited computing and storage resources, the vehicle node set being denoted V = {1, 2, …, i, …, I}; the vehicle nodes sense surrounding environment information and vehicle state data in real time through on-board sensors and upload the sensed data to the edge twin layer in real time using wireless communication technology, thereby providing data support for the construction of the digital twin;

the cloud application layer comprises a digital twinning-based intelligent traffic management platform and a plurality of digital twinning-based traffic application servers.

3. The vehicle digital twin body edge deployment method according to claim 2, wherein in step S3, the digital twin synchronization process follows a timeout retransmission synchronization mechanism, specifically: the upload interval of the vehicle state data is determined dynamically from the upload delay of the previous data packet and the processing delay of the edge server; in one synchronization process, the edge server returns an ACK confirmation message after processing the state data uploaded by the vehicle node, and the vehicle uploads the next state data packet after receiving the confirmation message; a digital twin synchronization delay threshold is set, and if the vehicle has still not received the ACK confirmation message when the synchronization interval exceeds the threshold, a timeout retransmission is performed to ensure the consistency and real-time performance of twin synchronization.

4. The vehicle digital twin body edge deployment method according to claim 2, wherein in step S4, the cloud average information age refers to the average of the data information ages of all vehicles in the current environment, and is calculated as follows: define Z = [z_{i,k}]_{I×K} as the DT deployment matrix, where DT denotes the vehicle twin, and z_{i,k} = 1 when the digital twin of vehicle i is deployed on edge server k, otherwise z_{i,k} = 0; assume that at time t₀ the cloud has just received the DT information about vehicle i uploaded by edge server k; then, from time t₀ until the next data update, the information age Δ_i(t) of the cloud with respect to DT_i is written as:

where the first delay term denotes the synchronization delay between vehicle i and its vehicle twin, i.e. the transmission delay of the state data of vehicle i to edge server k, and T_{k,Cloud} denotes the transmission delay for edge server k to forward the state data of vehicle i to the cloud;

the average information age of vehicle i during synchronization is defined as:

the average information age of all vehicles acquired by the cloud is expressed as:

5. The vehicle digital twin body edge deployment method according to claim 4, wherein in step S4, the twin migration cost includes a tunneling cost and a transmission cost, and is calculated as follows: when the deployment location of the digital twin DT_i of vehicle i migrates from edge server k₁ to edge server k₂, the migration cost is expressed as:

define Z' = [z'_i]_{I×1} as the DT migration matrix; for vehicle i, the binary variable z'_i indicates whether the digital twin of vehicle i has migrated at a given moment; when the deployment position in the current time slot is the same as in the previous time slot, no migration is triggered and z'_i = 0, otherwise z'_i = 1; the average twin migration cost Cost^{mig} of the whole system is expressed as:

6. The vehicle digital twin body edge deployment method according to claim 5, wherein in step S5, the objective function established to minimize the cloud average information age and the migration cost is expressed as:

C6: z_{i,k} ∈ {0, 1}

C7: β₁, β₂ ∈ (0, 1), β₁ + β₂ = 1

where Z = [z_{i,k}]_{I×K} denotes the vehicle twin deployment matrix; the first term of the objective denotes the average information age of all vehicles acquired by the cloud; Cost^{mig} denotes the average twin migration cost; DT_i^{CPU} and DT_i^{Disk} denote the CPU computing resources and the disk storage resources required to maintain DT_i, respectively, and the corresponding capacity terms denote the total CPU computing resources and disk storage resources with which edge server k is equipped; in the constraints, C1 ensures that each vehicle twin is deployed on exactly one edge server at any given time; C2-C3 ensure that the computing and storage resources of any edge server are not exhausted when vehicle digital twins are deployed; C4-C5 ensure that the edge-side information age requirement and the cloud information age requirement are both met during DT synchronization, the former being expressed through the time of one twin synchronization and the latter through Δ_m, the maximum cloud information age in synchronization; C6 states that the deployment variable z_{i,k} of a vehicle twin is binary: z_{i,k} = 1 when the digital twin of vehicle i is deployed on edge server k, otherwise z_{i,k} = 0; in C7, β₁ and β₂ are weight factors, namely the weights of the cloud average information age and of the twin migration cost.

7. The vehicle digital twin body edge deployment method of claim 6, wherein in step S6, the objective function is converted into a multi-agent partially observable Markov decision process, comprising:

the reward r_t^j of agent j in time slot t is defined as follows:

where the reward consists of two parts: the first part is the average AoI indicator of all vehicles associated with base station j; the second part is the migration cost of all vehicles associated with base station j; the total reward of agent j is obtained by weighting the two parts with the coefficients β₁ and β₂, respectively, and summing them.

8. The vehicle digital twin body edge deployment method according to claim 7, wherein in step S6, an Actor-Critic-based multi-agent deep reinforcement learning method is adopted to solve for the optimal vehicle digital twin deployment scheme, specifically: an Actor-Critic-based centralized training-distributed execution deep reinforcement learning framework is adopted, and the policy of each agent is represented by a deep neural network, namely the Actor network, in which π^j is the policy of agent j, θ^j is the network parameter of the deep neural network, a^j is the action space of agent j, and s^j is the observable state of agent j; a virtual central node is set up to deploy a DNN with parameters φ, namely the Critic network;

9. The vehicle digital twin body edge deployment method of claim 8, wherein in step S6, the loss functions are calculated and the network parameters are updated in combination with Adam, comprising the following steps: the Actor network updates its network parameters through a policy loss function, and the Actor network loss function L_Actor(θ^j) of agent j is calculated as:

where the two policy terms represent the old policy and the current new policy of agent j, respectively; a_t and s_t are the executed action and the observed state at time t, respectively, and the corresponding agent-indexed terms are the executed action and the observed state of agent j at time t; clip() is a clipping function that ensures the gap between the updated parameters and the old parameters does not become too large; ε is a hyperparameter that sets the clipping range of the clip() function; the advantage function estimate is obtained by generalized advantage estimation and is specifically defined as:

10. The vehicle digital twin body edge deployment method of claim 9, wherein in step S6, the advantage function is defined to evaluate how good taking an action is relative to the average level in a given state; the global action-value function is defined in terms of U_t, where U_t is the cumulative discounted return; and the global state-value function is defined as: