CN116208619A - Intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method, system, equipment and medium - Google Patents
- Publication number: CN116208619A
- Application number: CN202310276875.3A
- Authority
- CN
- China
- Prior art keywords
- mec
- ris
- network
- target
- vehicle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1074—Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/40—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
An intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method, system, equipment and medium. The method comprises the following steps: constructing an RIS-assisted MEC vehicle network communication scene; constructing an RIS-assisted secure communication scene; constructing an optimization objective function for the RIS-assisted MEC vehicle network scene; constructing a deep reinforcement learning algorithm model; constructing a deep reinforcement learning training model, setting the states, actions and rewards of the training model, and carrying out model training on the optimization target; and obtaining an RIS-assisted MEC vehicle network decision model that yields the Internet of vehicles safety calculation unloading scheme. The system, the equipment and the medium are used for realizing the method. By jointly designing the RIS phase shift matrix and allocating MEC computing resources in real time, the invention minimizes the maximum MEC service time, solves the problems of task offloading delay and security in a dynamic Internet of vehicles scene, satisfies the security of the communication links, improves the overall service quality of the MEC, and ensures the service quality and security performance of the Internet of vehicles.
Description
Technical Field
The invention belongs to the technical field of wireless communication, and particularly relates to an intelligent reflection surface-assisted internet of vehicles safety calculation unloading method, system, equipment and medium.
Background
With the continuous innovation of 5G mobile communication technology, the emerging Internet of vehicles (V2X) technology is maturing, where V represents a vehicle and X represents any object that interacts with the vehicle: other vehicles, people, transportation infrastructure, or networks. The widespread use of the Internet of vehicles has driven a large number of data-intensive and delay-sensitive services, all of which require substantial computing resources. However, conventional cloud computing increases computation delay because of the long distance between the target user and the server, making it unsuitable for the emerging V2X technology. To address this shortcoming of cloud computing, mobile edge computing (MEC) has emerged as a significant new computing paradigm. MEC combines well with the Internet of vehicles: abundant computing resources at the network edge free resource-constrained vehicle users from heavy computing tasks. With an MEC server deployed in the Internet of vehicles, multiple vehicles can simultaneously offload tasks to the MEC server, obtain high-speed computing service, reduce task processing delay, and improve user experience. However, due to severe channel fading in crowded urban environments, the task offloading rate may be low, which extends the offloading delay. In addition, owing to the broadcast nature of wireless signals, the wireless link is vulnerable to security threats such as eavesdropping. Therefore, it is important to improve the service quality and data security of the MEC vehicular network from the perspective of secure communication.
The intelligent reflecting surface (RIS) is currently considered a promising technology to improve wireless transmission quality and coverage. By designing the elements of the intelligent reflecting surface, signal reflection can be engineered to enhance the power of the desired signal while mitigating multi-user interference. Previous studies have shown that physical layer security (PLS) can be an effective alternative or complement for securing complex wireless networks by exploiting the randomness inherent in wireless channels. However, many PLS techniques degrade severely when an eavesdropper is closer to the base station (BS) than a legitimate user, or when the legitimate user and the eavesdropper have correlated channels. In response to these serious challenges, RIS combined with PLS holds promise for designing a robust secure transmission mechanism, because the RIS can flexibly reconfigure the channel environment in real time; hence, combining RIS with MEC to realize secure services has been proposed. However, such joint RIS-MEC schemes have high complexity, and a low-complexity optimal solution cannot be derived by purely mathematical methods. Deep reinforcement learning, as a powerful state-estimation and function-approximation tool, can adapt to various dynamic networks and solve complex optimization problems. On this basis, it is proposed to optimize the RIS and the MEC resource allocation with a deep reinforcement learning algorithm to achieve optimal security services.
In the literature [Y. Liu, W. Wang, H.-H. Chen, F. Lyu, L. Wang, W. Meng, and X. Shen, "Physical Layer Security Assisted Computation Offloading in Intelligently Connected Vehicle Networks," IEEE Transactions on Wireless Communications, vol. 20, no. 6, pp. 3555-3570, 2021], the authors propose a secure computation offloading scheme in a vehicle network, focusing on optimizing the secure MEC service delay of a target vehicle, wherein artificial noise is added to combat potential eavesdroppers and enable secure communication of the vehicle network. However, the solution optimizes the secure mobile edge computing service delay of the target vehicle in a static Internet of vehicles scene, and cannot be applied to a dynamic Internet of vehicles scene with heavy computing tasks.
In the literature [Y. Ju, Y. Chen, Z. Cao, H. Wang, L. Liu, Q. Pei, and N. Kumar, "Learning Based and Physical-layer Assisted Secure Computation Offloading in Vehicular Spectrum Sharing Networks," in IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2022], the authors propose a scheme for implementing secure MEC services based on deep reinforcement learning in a dynamic Internet of vehicles scenario; however, the scheme realizes the secure service through physical layer security techniques alone, which has limitations and does not explore the potential benefits of the intelligent reflecting surface.
In summary, the following drawbacks exist in the prior art:
(1) The prior art optimizes the secure mobile edge computing service delay of a target vehicle in a static Internet of vehicles scene, and is not suitable for a dynamic mobile edge computing vehicle network with heavy computing tasks.
(2) In the prior art, the base station equipped with a mobile edge computing server allocates MEC computing resources to the target vehicles only after all target vehicles have completed task offloading, which greatly aggravates the service delay of the Internet of vehicles.
(3) In a dynamic internet of vehicles scenario, the potential benefits of intelligent reflective surfaces are not considered in the prior art when conducting research on the problem of mobile edge computing security service delay.
How to select a proper deep reinforcement learning algorithm to cope with a high-dimensional state space under a channel that changes in real time, and how to optimize the RIS and the MEC through deep reinforcement learning, are the key problems to be solved by RIS-assisted MEC security service technology.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention aims to provide an intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method, system, equipment and medium, which optimize the MEC service based on a communication scheme using the deep deterministic policy gradient (DDPG) algorithm. By jointly designing the RIS phase shift matrix and allocating MEC computing resources in real time, the maximum MEC service time is minimized so as to realize the optimal secure MEC service; the problems of task offloading delay and security in a dynamic Internet of vehicles scene are solved, and the overall service quality of the MEC is improved on the premise of satisfying communication link security, so that the service quality and the security performance of the Internet of vehicles are ensured.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
an intelligent reflection surface-assisted internet of vehicles safety calculation unloading method comprises the following steps:
step 1: constructing a RIS auxiliary MEC vehicle network communication scene, and simultaneously adding an eavesdropper model;
step 2: constructing a RIS-assisted secure communication scene;
step 3: modeling an optimization target of the RIS auxiliary MEC vehicle network scene constructed in the step 1, and constructing an objective function when the model is solved;
step 4: constructing a deep reinforcement learning algorithm model according to the optimization target provided in the step 3;
step 5: constructing a deep reinforcement learning training model according to the deep reinforcement learning algorithm model provided in the step 4, setting states, actions and rewards of the training model by combining the communication scenes and the objective functions in the step 1, the step 2 and the step 3, and carrying out model training on an optimization target of the RIS auxiliary MEC vehicle network communication scene;
step 6: and (5) obtaining a RIS auxiliary MEC vehicle network decision model according to the training model in the step (5), and obtaining an optimal solution of the optimization problem, namely obtaining the vehicle networking safety calculation unloading scheme.
The specific method of the step 1 is as follows:
The BS establishes multiple communication links with vehicle users in different orthogonal sub-bands simultaneously, and a resource-constrained target vehicle can offload its computing tasks to the BS equipped with the MEC server so as to obtain MEC computing resources; the target vehicles obtaining computing services are expressed as:
U = {User_1, User_2, …, User_M}
wherein User_m represents the m-th target vehicle user;
an unserved vehicle is regarded as a potential eavesdropper, and the set of potential eavesdroppers can be represented as:
ε = {Eve_1, Eve_2, …, Eve_E}
wherein Eve_e represents the e-th potential eavesdropper.
The specific method of the step 2 is as follows:
Step 2.1: the reflection coefficient of the n-th RIS element is expressed as θ_n = e^{jφ_n}, wherein φ_n ∈ [0, 2π); the RIS reflection coefficient matrix is defined as:
Θ = diag([θ_1, θ_2, …, θ_N])
Since there is no in-band interference, the receive beamforming is designed by the maximum ratio combining (MRC) technique and can be expressed as:
f_m = g_m / ||g_m||
wherein f_m is the beamforming vector of the m-th V2I link and g_m = h_{m,b} + H_{i,b} Θ h_{m,i} is the combined channel of the m-th V2I link at the BS;
Step 2.2: modeling the communication channel;
In the MEC vehicle network, the channels include: the m-th V2I link h_{m,b} ∈ C^{K×1}; the link between the m-th target vehicle and the RIS h_{m,i} ∈ C^{N×1}; the link from the m-th target vehicle to the e-th potential eavesdropper h_{m,e}; the link between the RIS and the e-th potential eavesdropper h_{i,e} ∈ C^{N×1}; and the RIS-to-BS link H_{i,b} ∈ C^{K×N}. The RIS-to-BS channel obeys the Rician distribution, expressed as:
H_{i,b} = sqrt(ρ d_{i,b}^{-α_{i,b}}) [ sqrt(κ_{i,b}/(1+κ_{i,b})) H_{i,b}^{LoS} + sqrt(1/(1+κ_{i,b})) H_{i,b}^{NLoS} ]
wherein κ_{i,b} is the Rician factor, ρ is the path loss at the reference distance d_0 = 1 m, d_{i,b} is the distance between the RIS and the BS, α_{i,b} is the path-loss exponent of the RIS-to-BS link, and the non-LoS component H_{i,b}^{NLoS} follows a complex Gaussian distribution with zero mean and unit variance; h_{m,e}, h_{m,b}, h_{m,i} and h_{i,e} likewise follow the Rician distribution, and owing to the congested urban environment and the blocking effect between vehicles, κ_{m,b} and κ_{m,e} are both zero.
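As an illustration of the channel model above, the following sketch (not part of the patent; the function name and parameter values are illustrative assumptions) draws a Rician-distributed channel vector with distance-based path loss. Setting the Rician factor κ to zero yields the pure-NLoS Rayleigh case assumed for the vehicle-to-BS and vehicle-to-eavesdropper links.

```python
import math
import random

def rician_channel(n, kappa, rho=1.0, d=50.0, alpha=2.2, rng=None):
    """Sample an n-element Rician channel vector with path loss.

    kappa: Rician K-factor; kappa = 0 gives a pure-NLoS (Rayleigh) channel,
    matching the blocked vehicle-to-BS / vehicle-to-eavesdropper links.
    rho: path loss at the reference distance d0 = 1 m (illustrative value).
    d, alpha: link distance and path-loss exponent (illustrative values).
    """
    rng = rng or random.Random(0)
    gain = math.sqrt(rho * d ** (-alpha))            # large-scale fading
    w_los = math.sqrt(kappa / (1 + kappa))           # LoS weight
    w_nlos = math.sqrt(1 / (1 + kappa))              # NLoS weight
    los = [complex(1.0, 0.0)] * n                    # deterministic LoS part
    nlos = [complex(rng.gauss(0, math.sqrt(0.5)),    # CN(0, 1) entries
                    rng.gauss(0, math.sqrt(0.5))) for _ in range(n)]
    return [gain * (w_los * a + w_nlos * b) for a, b in zip(los, nlos)]

h_ris_bs = rician_channel(8, kappa=3.0)   # RIS-to-BS link: LoS present
h_v2i = rician_channel(8, kappa=0.0)      # vehicle-to-BS link: Rayleigh
```

With a very large κ the NLoS part vanishes and all entries share the same deterministic magnitude, which is a quick sanity check on the weighting.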
step 2.3: modeling a signal receiving process;
the mth V2I link received signal at the BS can be expressed as:
wherein ,Pm Is the transmission power of the mth target vehicle s m Representing unit energy signal samples associated with a computational task, noise vector n m Can be expressed as:
n m =[n 1 ,...n K ] T
the uplink signal-to-interference-and-noise ratio SINR of the mth V2I link at BS is given by:
similarly, the eavesdropping signal of the mth V2I link at the ith eavesdropping vehicle is expressed as:
the SINR of the mth V2I link at the e-th eavesdropping vehicle can be expressed as:
thus, the capacity of the mth V2I link and the eavesdropping capacity of the e-th eavesdropping vehicle to the mth V2I link can be expressed as:
C m =log(1+η m )
C e,m =log(1+η e,m )
in the MEC vehicle network, once the user completes the unloading process, the BS flexibly allocates the computing resources of the MEC server according to the task size, and each CPU cycle of the MEC server can process a certain number of data bits, assuming that the total computing power is ζbit/s.
The specific method of the step 3 is as follows:
Step 3.1: modeling the security process;
Any unserved vehicle may eavesdrop on any V2I link. To protect the task data from being intercepted, the redundancy rate for protecting the confidential information can be expressed as:
max{0, R_b − R_S}
wherein R_b is the codeword rate and R_S is the target secrecy rate of the confidential information;
if the capacity C_e of an eavesdropper is greater than R_b − R_S, a secrecy outage occurs; using the capacity C_b to approximate R_b, the secure transmission rate of the m-th V2I link can thus be expressed as:
R_{S,m} = [C_m − max_{e∈ε} C_{e,m}]^+
wherein [x]^+ = max{0, x};
the secure MEC service time (offloading plus computation time) of the m-th V2I link can be expressed as:
t_m = S_m / R_{S,m} + S_m / ζ_m
wherein S_m is the task size of the m-th target vehicle and ζ_m is its allocated computing resource.
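The secrecy-rate expression above reduces to a few lines of code. The sketch below is illustrative (function and variable names are not from the patent), with capacities taken as log2(1 + SINR); it computes the worst-case secure rate over all eavesdroppers.

```python
import math

def secrecy_rate(eta_m, eta_eves):
    """Worst-case secure transmission rate of one V2I link:
    R_S,m = [C_m - max_e C_e,m]^+ with C = log2(1 + SINR)."""
    c_m = math.log2(1 + eta_m)
    c_eve_max = max(math.log2(1 + eta) for eta in eta_eves)
    return max(0.0, c_m - c_eve_max)

# Legitimate SINR 15 vs eavesdropper SINRs 3 and 1:
# C_m = log2(16) = 4, worst eavesdropper capacity = log2(4) = 2
print(secrecy_rate(15, [3, 1]))   # 2.0 bit/s/Hz
```

When the strongest eavesdropper's capacity exceeds the legitimate capacity, the `[·]^+` clamp makes the secure rate zero, i.e. no confidential data can be delivered on that link.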
step 3.2: modeling an optimization target;
the optimization objective is to design RIS reflection coefficient matrix theta and MEC resource allocation for different calculation tasksTo minimize the service time, the former would affect the transmission time, the latter would determine the computation time, taking into account that the entire MEC service period is determined by the maximum service time of all V2I links, translating the above objective into the following min-max problem:
wherein constraint C1 represents the sum of the computing resources allocated to different target vehicles as a fixed value, and constraint C2 represents the modulus constraint of the RIS reflection coefficient as a unit modulus.
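The min-max objective can be made concrete with a small helper, under the step 3.1 assumption that the offloading time is the task size divided by the secure rate and the computation time is the task size divided by the allocated MEC resource. Names and units here are illustrative, not from the patent.

```python
def service_time(task_bits, secure_rate, cpu_share):
    """Secure MEC service time of one V2I link:
    offloading time (task / secure rate) + computation time (task / resource)."""
    return task_bits / secure_rate + task_bits / cpu_share

def mec_service_period(tasks, rates, shares):
    """Whole MEC service period = maximum service time over all links.
    The patent's min-max problem tunes the RIS phases (which raise `rates`)
    and the resource split `shares` (summing to the total zeta) to shrink this."""
    return max(service_time(s, r, z) for s, r, z in zip(tasks, rates, shares))

# Two links: the slow link (rate 10) gets the larger CPU share (50 of 60),
# which balances the two per-link service times and lowers the maximum.
print(mec_service_period([100.0, 100.0], [10.0, 100.0], [50.0, 10.0]))  # 12.0
```

Shifting the CPU split the other way (10 for the slow link, 50 for the fast one) would push the maximum up to 20, which is why the allocation is part of the optimization.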
The specific method of the step 4 is as follows:
DDPG is a model-free, off-policy algorithm with an Actor-Critic architecture, wherein the Actor network predicts actions and the Critic network evaluates the future benefit of taking an action in the current state. The Actor and the Critic each consist of two deep neural networks (DNNs), a training network and a target network; the training and target network parameters of the Actor are θ_a and θ_a′ respectively, and those of the Critic are θ_c and θ_c′;
at time slot t, the Actor training network takes S_t as input and outputs the action a_t; the Critic training network takes S_t and a_t as input and outputs the state-action function value, which can be expressed as:
Q^π(S_t, a_t | θ_c) = E_π[R_t | S_t, a_t]
wherein E[·] denotes the expectation and π denotes the policy of the Actor training network. When enough tuples (S_t, a_t, r_t, S_{t+1}) have accumulated in the experience replay pool D, the model optimizer randomly extracts a minibatch of size N_d from the pool to update the Actor and Critic training networks; the target state-action value of the k-th tuple, y_k, can be expressed as:
y_k = r_k + γ Q′^{π′}(S_{k+1}, π′(S_{k+1} | θ_a′) | θ_c′)
wherein π′ represents the policy of the Actor target network;
the Critic training network is updated by minimizing the mean square error (MSE) loss:
L(θ_c) = (1/N_d) Σ_{k=1}^{N_d} (y_k − Q^π(S_k, a_k | θ_c))^2
the Actor training network is updated with the deterministic policy gradient:
∇_{θ_a} J ≈ (1/N_d) Σ_{k=1}^{N_d} ∇_a Q^π(S_k, a | θ_c)|_{a=π(S_k|θ_a)} ∇_{θ_a} π(S_k | θ_a)
the Actor and Critic target networks are softly updated as:
θ_c′ = τ_c θ_c + (1 − τ_c) θ_c′
θ_a′ = τ_a θ_a + (1 − τ_a) θ_a′
wherein τ_c and τ_a are soft-update coefficients satisfying τ_c, τ_a ∈ [0, 1].
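The DDPG update rules above can be sketched with plain Python lists standing in for network weights; this is an illustrative toy of the TD target, the MSE loss and the soft updates, not the patent's implementation.

```python
def td_target(r_k, gamma, q_next):
    """y_k = r_k + gamma * Q'(S_{k+1}, pi'(S_{k+1} | theta_a') | theta_c')."""
    return r_k + gamma * q_next

def critic_mse(ys, qs):
    """MSE loss minimized by the Critic training network over a minibatch."""
    return sum((y - q) ** 2 for y, q in zip(ys, qs)) / len(ys)

def soft_update(theta_target, theta_train, tau):
    """Polyak update theta' <- tau * theta + (1 - tau) * theta',
    applied to both the Actor and the Critic target networks."""
    return [tau * w + (1 - tau) * wt for w, wt in zip(theta_train, theta_target)]

print(td_target(1.0, 0.9, 2.0))                    # 2.8
print(soft_update([0.0, 1.0], [1.0, 1.0], 0.1))    # [0.1, 1.0]
```

With a small τ the target networks trail the training networks slowly, which is what stabilizes the bootstrapped TD target y_k.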
The specific method in the step 5 is as follows:
Step 5.1: setting the state space;
The state of the m-th V2I link at time slot t comprises its secrecy rate, its residual offloading task amount, its residual computing task amount, the amount of MEC resources already occupied, and the global channel state information; to sum up, these quantities are collected into the per-link state S_m^t, and at time slot t the overall environment state of the M V2I links can be expressed as:
S_t = {S_1^t, S_2^t, …, S_M^t}
step 5.2: setting an action space;
based on the current state S t The BS will design the RIS phase shift matrix and MEC resource allocation, and at each time slot t, the action space can be expressed as:
a t ={Θ t ,ζ t }
step 5.3: setting a reward function;
at time slot t, corresponding to current action a t Can be expressed as:
wherein ,representing the secure MEC service time of the mth V2I link at time slot t, t m,1 Is the current time spent, t m,2 The estimated remaining time based on the current motion, which includes the remaining transmission time and the remaining calculation time, is three cases:
(1) All target vehicles are in the task unloading process, the residual transmission time of each target vehicle is based on the current action, and the residual calculation time of each target vehicle adopts a future meterThe policy for average allocation of computing resources to all target vehicles is calculated, i.e. ζ min ;
(2) Some target vehicles are in the task unloading process, other target vehicles are in the task calculating process, for the target vehicles in the task unloading process, the residual transmission time in each user unloading process is calculated based on the current action, and the calculation resources are calculated as Wherein ζ is the calculated time remaining for policy estimation min The method is the minimum calculation resource of the target vehicle in the task calculation process, and for the target vehicle in the task calculation process, the residual calculation time is only estimated based on the current action;
(3) Estimating the residual calculation time of all target vehicles based on the current actions in the task calculation process of all target vehicles;
to increase the secure transmission rate, the penalty factor is expressed as:
if the current action can meet the security rate requirement of the mth linkThen v m =0, otherwise ν m =ν * ,ν * Is a parameter which can be set manually and is a negative number;
based on the setting of the reward function, the DDPG algorithm will continually learn action strategies that are directed towards reducing the maximum safe MEC service time within given constraints, and the total cumulative rewards can be expressed as:
where γ is the discount factor.
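The reward design of step 5.3 can be sketched as follows, under the assumption (illustrative, since the exact expression is not reproduced in this text) that the per-slot reward is the negative of the maximum estimated secure service time plus the penalty terms ν_m.

```python
def step_reward(spent, remaining, penalties):
    """Per-slot reward: negative of max over links of (t_m,1 + t_m,2),
    plus the security penalties nu_m (0 when the secrecy-rate requirement
    holds, a negative nu* otherwise). All names are illustrative."""
    return -max(a + b for a, b in zip(spent, remaining)) + sum(penalties)

def discounted_return(rewards, gamma):
    """Total cumulative reward sum_t gamma^t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Two links, the second violating its secrecy-rate requirement (nu* = -5):
print(step_reward([1.0, 2.0], [3.0, 1.0], [0.0, -5.0]))   # -9.0
```

Because the reward is the negative service time, maximizing the discounted return drives the agent toward actions that shrink the maximum secure MEC service time while avoiding secrecy-rate violations.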
The specific method of the step 6 is as follows:
Step 6.1: initialization;
The parameters θ_a and θ_c of the Actor and Critic training networks are randomly initialized, the parameter θ_a′ of the Actor target network is initialized to θ_a, the parameter θ_c′ of the Critic target network is initialized to θ_c, and the experience replay pool D is cleared;
step 6.2: training;
randomly initializing the positions of a target vehicle and a eavesdropping vehicle, and initializing the task quantity of the target vehicle for requesting service;
At each time slot t, the BS interacts with the dynamic environment to obtain the state S_t; based on the current state, the BS obtains the action a_t from the Actor network and accordingly sets the reflection coefficient matrix and the MEC resource allocation for the target vehicles;
the BS obtains the state S_{t+1} of the next time slot from the changing environment and calculates the reward r_t returned by the environment for the action a_t taken;
the state, action and reward in the above process are stored as a tuple (S_t, a_t, r_t, S_{t+1}) in the experience replay pool D, while the state-action value Q^π(S_t, a_t | θ_c) is obtained from the Critic network;
when there are enough tuples in the experience replay pool, a minibatch of size N_d is sampled from it to update the parameters of the Critic and Actor networks; once the task amounts of all target vehicles have been computed, one round of model training is finished, and the above process is repeated until the model training converges;
step 6.3: decision stage
The converged decision model obtained from training is used in a random dynamic vehicle network scene to decide the optimal RIS reflection coefficient matrix and MEC resource allocation in each time slot, minimizing the maximum MEC service time over the whole process and finally obtaining the optimal solution of the optimization target.
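The experience replay pool used in steps 6.1 and 6.2 can be sketched as a bounded deque with uniform minibatch sampling; the class and method names are illustrative, not from the patent.

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool D of (S_t, a_t, r_t, S_{t+1}) tuples."""

    def __init__(self, capacity=10000, seed=0):
        self.data = deque(maxlen=capacity)   # oldest tuples are evicted first
        self.rng = random.Random(seed)

    def store(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, n_d):
        """Uniformly draw a minibatch of up to N_d tuples for the updates."""
        return self.rng.sample(list(self.data), min(n_d, len(self.data)))

pool = ReplayPool(capacity=100)
for t in range(5):
    pool.store(f"S{t}", f"a{t}", -float(t), f"S{t+1}")
batch = pool.sample(3)   # minibatch for one Critic/Actor update
```

The `maxlen` bound keeps memory fixed in long training runs, and uniform sampling breaks the temporal correlation between consecutive slots, which is the usual rationale for experience replay in off-policy algorithms such as DDPG.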
The invention also provides a system for realizing the intelligent reflection surface-assisted internet of vehicles safety calculation unloading method, which comprises the following steps:
RIS-assisted MEC vehicle network communication module: used for realizing the construction of the RIS-assisted MEC vehicle network communication scene, in which the base station establishes communication links with the dynamic vehicles;
RIS-assisted secure communication module: the system is used for realizing the construction of a RIS-assisted safety communication scene, and in the module, the RIS technology provides a guarantee for the safety of dynamic vehicle communication;
secure computing service optimization objective module: the method comprises the steps of constructing an optimization target for realizing a RIS auxiliary MEC vehicle network scene;
the deep reinforcement learning algorithm selection module: the method is used for realizing the construction of a deep reinforcement learning algorithm model based on an optimization target;
the deep reinforcement learning model training module: the method is used for constructing a deep reinforcement learning training model, and in the model, model training is carried out on an optimization target of a RIS auxiliary MEC vehicle network scene;
a deep reinforcement learning decision model module: the method is used for realizing an RIS auxiliary MEC vehicle network decision model, and the optimal RIS coefficient matrix and MEC resource allocation in a dynamic vehicle networking scene are obtained in the module.
The invention also provides an intelligent reflection surface-assisted internet-of-vehicles safety calculation unloading device, which comprises:
a memory for storing a computer program;
and the processor is used for realizing the intelligent reflection surface-assisted internet of vehicles safety calculation unloading method when executing the computer program.
The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method.
Compared with the prior art, the invention has the beneficial effects that:
1. Under dynamic scenes, the allocation of the intelligent reflecting surface's reflection coefficient matrix and of the mobile edge computing resources is optimized by a deep reinforcement learning algorithm; the scheme provided by the invention can determine multiple continuous optimal actions over a high-dimensional continuous state space, reducing the vehicle network service delay while providing a guarantee for communication security.
2. The invention regards the base station as an agent that makes decisions according to the continuously changing state of its surroundings, has high adaptability to highly dynamic Internet of vehicles scenes, and can allocate computing resources to a target vehicle as soon as that vehicle finishes task offloading, so that idle MEC resources are effectively utilized.
3. The security problem in current Internet of vehicles scenes is typically addressed with physical layer security techniques alone, which have limitations. The invention combines the intelligent reflecting surface technology with physical layer security to realize the secure service, solving the problem that physical layer security alone cannot resist an eavesdropping user that is closer to the base station than the target user, or whose channel is correlated with the target user's.
4. The RIS auxiliary MEC vehicle network safety communication scene provided by the step 1 and the step 2 can be associated with an actual dynamic vehicle networking safety communication scene, provides a solution for the safety service problem in the actual scene, and has the advantage of higher applicability.
5. The deep reinforcement learning algorithm provided by the step 4 can solve the problem of complex high-dimensional continuous state space, can output continuous action values according to the continuous state space, and has the advantages of adapting to dynamic scenes and solving the problem of non-convexity.
In summary, compared with the prior art, the method has the advantages of realizing safety service by utilizing the deep reinforcement learning algorithm to solve the problem of jointly optimizing the intelligent reflecting surface and the mobile edge calculation in the dynamic scene and reducing service delay.
Drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of an intelligent reflection surface assisted moving edge computing scenario provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of a deep reinforcement learning training model according to an embodiment of the present invention.
Fig. 4 is a diagram of simulation results comparing the DDPG algorithm with other algorithms in terms of average MEC service time, MEC successful service probability and average MEC service secrecy outage probability under different eavesdropping levels, provided by the embodiment of the present invention.
Fig. 5 is a diagram of simulation results comparing the DDPG algorithm with other algorithms in terms of average MEC service time and MEC successful service probability under different task ranges of the target vehicle, provided by the embodiment of the invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
The invention provides an intelligent reflection surface-assisted Internet of vehicles security calculation offloading method, system, equipment and medium. First, the RIS-assisted MEC vehicle network scene is modeled: the base station establishes multiple communication links with vehicle users in different sub-bands simultaneously to realize high-data-rate transmission service. In the MEC scene, a resource-constrained target vehicle offloads its computation tasks to a base station (BS) equipped with an MEC server through a vehicle-to-infrastructure (V2I) link; the BS flexibly allocates MEC resources to the different task requests and then feeds the results back to the target users. RIS-assisted secure communication is modeled, with the communication channels obeying the Rician distribution; all vehicles are equipped with single omnidirectional antennas, and the BS is equipped with a uniform linear array of K antennas. The intelligent reflecting surface is characterized by a diagonal matrix with N reflecting elements. Since there is no in-band interference, the BS designs receive beamforming in a maximum ratio combining (MRC) manner for each V2I link. Secondly, in order to realize secure service in the MEC scene, the optimization problem of minimizing the maximum MEC service time by jointly designing the RIS reflection coefficient matrix and the MEC resource allocation is proposed. This optimization problem is non-convex and is also a highly dynamic long-term decision process, so a deep reinforcement learning algorithm is adopted to solve it: the states, actions and rewards of the algorithm are designed, parameters such as the position information and task volume of the dynamic vehicles serve as the basis of the agent's decisions, and finally the optimal RIS reflection coefficient matrix and MEC resource allocation are obtained through training, realizing secure and low-delay MEC service.
Fig. 1 shows the flow chart of the deep reinforcement learning-based intelligent reflective surface-assisted internet of vehicles security computing offloading scheme.
An intelligent reflection surface-assisted internet of vehicles safety calculation unloading method comprises the following steps:
step 1: constructing a RIS auxiliary MEC vehicle network communication scene, serving vehicles sending calculation service requests, and simultaneously adding an eavesdropper model for subsequent modeling and analysis; further, the specific method of the step 1 is as follows:
As shown in fig. 2, in the intelligent reflection-surface-assisted mobile edge computing scenario, the BS establishes multiple communication links with vehicle users in different orthogonal subbands simultaneously. A resource-constrained vehicle can offload its computation tasks to the BS equipped with an MEC server; the BS flexibly allocates MEC resources to the different task requests and then feeds the results back to the vehicle users. In the present invention, it is assumed that the feedback delay is negligible relative to the time required to complete the computation task. Because the resources at the BS are limited, it can only serve vehicles that send computation service requests, and the set of target vehicles that obtain computation service is expressed as:
U = {User_1, User_2, …, User_M}
wherein User_m represents the mth target vehicle user.
An un-serviced vehicle is considered a potential eavesdropper, and the set of potential eavesdroppers can be represented as:
ε = {Eve_1, Eve_2, …, Eve_E}
wherein Eve_e represents the e-th potential eavesdropper.
Step 2: constructing a RIS-assisted secure communication scene, and laying a foundation for a communication channel used subsequently in the invention;
further, the specific method in the step 2 is as follows:
step 2.1: let the reflection coefficient of the nth element of RIS be expressed as: wherein φn E [0,2 pi), the RIS reflection coefficient matrix is defined as:
Θ=diag([θ 1 ,θ 2 ,...,θ N ])
since there is no in-band interference, receive beamforming is designed by the max-ratio combining technique, which can be expressed as:
wherein ,fM Representing the beamforming vector for the mth V2I link.
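As an illustration of step 2.1, the sketch below builds a unit-modulus RIS reflection matrix and the corresponding MRC receive beamformer. The channel values, the sizes N and K, and all variable names are illustrative assumptions, not taken from the invention:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 16, 4  # RIS elements and BS antennas (illustrative sizes)

# RIS reflection coefficient matrix: Theta = diag(e^{j*phi_1}, ..., e^{j*phi_N})
phi = rng.uniform(0.0, 2 * np.pi, N)
Theta = np.diag(np.exp(1j * phi))

# Placeholder channels standing in for the Rician channels of step 2.2
h_mb = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)          # V2I direct link
h_mi = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)          # vehicle -> RIS
H_ib = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)  # RIS -> BS

# Combined uplink channel and MRC receive beamformer f_m = h / ||h||
h_eff = h_mb + H_ib @ Theta @ h_mi
f_m = h_eff / np.linalg.norm(h_eff)

# Unit-modulus RIS coefficients and unit-norm beamformer
print(np.allclose(np.abs(np.diag(Theta)), 1.0), np.isclose(np.linalg.norm(f_m), 1.0))
```

Note that the unit-modulus property of Θ is exactly constraint C2 of the optimization problem in step 3.2.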
Step 2.2: modeling a communication channel;
in an MEC vehicle network, the channels include: mth V2I linkLink between mth target vehicle and RIS +.>The link from the mth target vehicle to the e potential eavesdropper->Link between RIS to the e potential eavesdropper +.>RIS to BS link->The RIS to BS channel obeys the Rician distribution, expressed as:
wherein ,κi,b Is a Rician factor, ρ is a reference distance d 0 Path loss at =1m, d i,b Is the distance between RIS and BS, α i,b Is the path loss index of the RIS to BS link. non-LOS componentFollows a complex gaussian distribution with zero mean and unit variance for each element of (a). Same h m,e ,h m,b ,h m,i ,h i,e Following the Rician distribution. Kappa due to congestion effects between a crowded urban environment and a vehicle m,b and κm,e All are zero.
Step 2.3: modeling a signal receiving process;
The received signal of the mth V2I link at the BS can be expressed as:
y_m = sqrt(P_m) (h_{m,b} + H_{i,b} Θ h_{m,i}) s_m + n_m
wherein P_m is the transmission power of the mth target vehicle, s_m represents the unit-energy signal samples associated with the computation task, and the noise vector n_m can be expressed as:
n_m = [n_1, …, n_K]^T
The uplink signal-to-interference-and-noise ratio (SINR) of the mth V2I link at the BS is given by:
η_m = P_m |f_m^H (h_{m,b} + H_{i,b} Θ h_{m,i})|^2 / σ^2
Similarly, the eavesdropping signal of the mth V2I link at the e-th eavesdropping vehicle is expressed as:
y_{e,m} = sqrt(P_m) (h_{m,e} + h_{i,e}^H Θ h_{m,i}) s_m + n_e
The SINR of the mth V2I link at the e-th eavesdropping vehicle can be expressed as:
η_{e,m} = P_m |h_{m,e} + h_{i,e}^H Θ h_{m,i}|^2 / σ^2
thus, the capacity of the mth V2I link and the eavesdropping capacity of the e-th eavesdropping vehicle to the mth V2I link can be expressed as:
C m =log(1+η m )
C e,m =log(1+η e,m )
In the MEC vehicle network, once a user completes the offloading process, the BS flexibly allocates the computing resources of the MEC server according to the task size. Each CPU cycle of the MEC server can process a certain number of data bits; the total computing power is assumed to be ζ bit/s. In order to provide stable service, the BS aims to minimize the time of the entire MEC service while ensuring task offloading security for all users.
Step 3: modeling an optimization target of the RIS auxiliary MEC vehicle network scene constructed in the step 1, constructing an objective function when the model is solved, and laying a foundation for the model solution by using deep reinforcement learning subsequently;
Further, the specific method of the step 3 is as follows:
step 3.1: modeling a safety process;
the present invention contemplates a worst case security threat where any un-serviced vehicle may eavesdrop on any V2I link. In order to protect the task data from eavesdropping, the transmitting end encodes the data and then needs to determine two code rates, namely a code rate R, before transmission b And target security rate R of confidential information S . Redundancy for protecting confidential information can therefore be expressed as:
max{0,R b -R S }
wherein ,Rb For the code word rate, R S Target privacy rate for confidential information.
If the capacity C of an eavesdropper e Greater than R b -R S A privacy interrupt is sent. In the present invention, we use the capacity C b Approximate R b . The secure transmission rate of the mth V2I link can thus be expressed as:
R S,m =[0,(C m -maxC e,m )] + ,e∈ε
wherein ,[x]+ =max{0,x}。
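The secrecy-rate expression above can be checked numerically. The sketch below (illustrative function names; natural-logarithm capacities as in C_m = log(1 + η_m)) computes the secure transmission rate against the worst-case eavesdropper:

```python
import numpy as np

def capacity(snr):
    # C = log(1 + eta), in nats/s/Hz as in the text
    return np.log(1.0 + snr)

def secrecy_rate(eta_m, eta_eaves):
    # R_{S,m} = [C_m - max_e C_{e,m}]^+  (worst-case eavesdropper in the set)
    c_m = capacity(eta_m)
    c_worst = max(capacity(e) for e in eta_eaves)
    return max(0.0, c_m - c_worst)

# Example: legitimate SINR 15, eavesdropper SINRs 3 and 7
r = secrecy_rate(15.0, [3.0, 7.0])
print(round(r, 4))  # log(16) - log(8) = log(2)
```

When the strongest eavesdropper's capacity exceeds the legitimate capacity, the [·]^+ operator clamps the rate to zero, which is exactly the secrecy-outage condition described above.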
The MEC service time (offloading plus computation time) of the mth V2I link can be expressed as:
t_m = S_m / R_{S,m} + S_m / ζ_m
wherein S_m is the task size and ζ_m is the allocated computing resource.
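Under the stated model, the MEC service time is the sum of the secure offloading time and the computation time. A minimal sketch (hypothetical function name; rates in bit/s as assumed units):

```python
def mec_service_time(S_m, R_sm, zeta_m):
    """Offloading time plus computation time of the mth V2I link.

    S_m: task size (bits); R_sm: secure transmission rate (bit/s);
    zeta_m: allocated MEC computing resource (bit/s).
    """
    if R_sm <= 0:
        return float("inf")  # a secrecy outage blocks secure offloading entirely
    return S_m / R_sm + S_m / zeta_m

# Example: 1 Mbit task, 2 Mbit/s secure rate, 4 Mbit/s computing share
t = mec_service_time(1e6, 2e6, 4e6)
print(t)  # 0.5 s offloading + 0.25 s computation = 0.75
```

This also shows why the joint design matters: the RIS matrix Θ enters through R_{S,m} (transmission time) while the allocation ζ_m enters through the computation term.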
Step 3.2, optimizing target modeling;
The optimization objective of the invention is to design the RIS reflection coefficient matrix Θ and the MEC resource allocation ζ = {ζ_1, ζ_2, …, ζ_M} for the different computation tasks so as to minimize the service time. The former affects the transmission time, while the latter determines the computation time. Considering that the whole MEC service period is determined by the maximum service time over all V2I links, we translate the above objective into the following min-max problem:
min_{Θ,ζ} max_m t_m
s.t. C1: ζ_1 + ζ_2 + … + ζ_M = ζ
     C2: |θ_n| = 1, n = 1, …, N
wherein constraint C1 states that the sum of the computing resources allocated to the different target vehicles is the fixed value ζ, and constraint C2 is the unit-modulus constraint on the RIS reflection coefficients.
Step 4: constructing a deep reinforcement learning algorithm model according to the optimization target provided in the step 3, laying a theoretical foundation for the actual problem to be solved, and reducing the solving difficulty of the optimization problem;
further, the specific method in the step 4 is as follows:
the joint design of the RIS reflection coefficient matrix and MEC resource allocation for the entire MEC service can be modeled as a Markov Decision Process (MDP). The process consists of a number of time periods and their specific actions, each of which affects future benefits. The optimization problem of the present invention is non-convex and a long-term decision problem with high dynamics, which is difficult to represent by the mathematical expression displayed, so the present invention employs a depth-reinforced learning (DRL) algorithm of depth deterministic strategy gradient (DDPG). The algorithm can train out proper parameters according to continuous state space, so that a desired RIS coefficient matrix and MEC resource allocation are designed and obtained, and the service time is minimized.
As shown in FIG. 3, DDPG is a model-free, off-policy algorithm with an Actor-Critic architecture. The Actor network is used to predict an action, and the Critic network is used to evaluate the future benefit of taking that action in the current state. Both the Actor network and the Critic network consist of two deep neural networks (DNN): a training network and a target network. The training and target network parameters of the Actor network are θ_a and θ_{a′} respectively, and the training and target network parameters of the Critic network are θ_c and θ_{c′} respectively. FIG. 3 shows the DDPG deep reinforcement learning training model architecture.
At time slot t, the Actor training network takes S_t as input and outputs the action a_t; the Critic training network takes S_t and a_t as input and outputs the state-action function value Q_π(S_t, a_t|θ_c), which can be expressed as:
Q_π(S_t, a_t|θ_c) = E_π[R_t|S_t, a_t, π]
wherein E[·] represents the expectation and π represents the policy of the Actor training network. When enough tuples (S_t, a_t, r_t, S_{t+1}) have accumulated in the experience replay pool D, the model optimizer randomly samples a mini-batch of size N_d from the replay pool to update the training networks of the Actor and Critic. The target state-action function value Q′ of the kth sampled tuple, y_k, can be expressed as:
y_k = r_k + γ Q′_{π′}(S_{k+1}, π′(S_{k+1}|θ_{a′})|θ_{c′})
where π′ represents the policy of the Actor target network.
The Critic training network is updated using the mean square error (MSE) loss, which can be expressed as:
L(θ_c) = (1/N_d) Σ_k (y_k − Q_π(S_k, a_k|θ_c))^2
The Actor training network is updated using the deterministic policy gradient, which can be expressed as:
∇_{θ_a} J ≈ (1/N_d) Σ_k ∇_a Q_π(S_k, a|θ_c)|_{a=π(S_k|θ_a)} ∇_{θ_a} π(S_k|θ_a)
The updating of the Actor and Critic target networks is as follows:
θ c′ =τ c θ c +(1-τ c )θ c′
θ a′ =τ a θ a +(1-τ a )θ a′
wherein τ_c and τ_a are soft update coefficients satisfying τ_c, τ_a ∈ [0,1];
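The soft-update rule for the target networks can be sketched as follows, using plain dictionaries in place of DNN parameters (an illustrative toy, not the invention's implementation):

```python
import numpy as np

def soft_update(target, train, tau):
    # theta' <- tau * theta + (1 - tau) * theta', applied parameter-wise
    for key in target:
        target[key] = tau * train[key] + (1.0 - tau) * target[key]

# Toy parameter dictionaries standing in for Critic weights
theta_c = {"w": np.ones(3)}
theta_c_tgt = {"w": np.zeros(3)}
soft_update(theta_c_tgt, theta_c, tau=0.01)
print(theta_c_tgt["w"])  # a small tau makes the target slowly track the training network
```

A small τ keeps the target networks nearly stationary between updates, which is what stabilizes the TD target y_k above.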
Step 5: constructing a deep reinforcement learning training model according to the deep reinforcement learning algorithm model provided in the step 4, setting the state, action and rewards of the training model by combining the communication scenes and the objective functions in the step 1, the step 2 and the step 3, carrying out model training on an optimization target of the RIS auxiliary MEC vehicle network communication scene, and laying a foundation for obtaining a decision model subsequently;
Further, the specific method in the step 5 is as follows:
step 5.1: setting a state space;
The state S_t^m of the mth V2I link at time slot t comprises the secrecy rate R_{S,m}^t, the remaining offloading task volume, the remaining computation task volume, the occupied MEC resource amount, and the global channel state information. To sum up, the state of the mth V2I link is expressed as:
S_t^m = {R_{S,m}^t, S_{off,m}^t, S_{comp,m}^t, ζ_m^t, H_t}
At time slot t, the total environment state of the M V2I links can be expressed as:
S_t = {S_t^1, S_t^2, …, S_t^M}
step 5.2: setting an action space;
Based on the current state S_t, the BS designs the RIS phase shift matrix and the MEC resource allocation; at each time slot t, the action space can be expressed as:
a t ={Θ t ,ζ t }
Step 5.3: bonus function settings
At time slot t, the reward corresponding to the current action a_t can be expressed as:
r_t = −max_m T_m^t + Σ_m ν_m
wherein T_m^t = t_{m,1} + t_{m,2} represents the secure MEC service time of the mth V2I link at time slot t, t_{m,1} is the time already spent, and t_{m,2} is the remaining time estimated based on the current action, which contains the remaining transmission time and the remaining computation time. There are three cases for estimating the remaining time:
(1) All target vehicles are in the task offloading process. The remaining transmission time of each target vehicle is calculated based on the current action, and the remaining computation time of each target vehicle is estimated by the strategy of evenly distributing the computing resources among all target vehicles, i.e. ζ/M each.
(2) Some target vehicles are in the task offloading process, and the other target vehicles are in the task computation process. For a target vehicle in the task offloading process, the remaining transmission time of its offloading process is calculated based on the current action, and its remaining computation time is estimated with the allocation ζ_min, where ζ_min is the minimum computing resource among the target vehicles in the task computation process. For a target vehicle in the task computation process, only the remaining computation time needs to be estimated based on the current action.
(3) All target vehicles are in the task computation process. The remaining computation time of all the target vehicles is estimated based on the current action.
To increase the secure transmission rate, a penalty factor ν_m is introduced: if the current action can meet the secrecy rate requirement of the mth link, then ν_m = 0; otherwise ν_m = ν*, where ν* is a manually set negative parameter.
Based on this setting of the reward function, the DDPG algorithm continually learns action strategies oriented toward reducing the maximum secure MEC service time within the given constraints. The total cumulative reward can be expressed as:
R_t = Σ_{k=0}^{∞} γ^k r_{t+k}
wherein γ is the discount factor;
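Assuming the reward takes the form r_t = −max_m T_m^t + Σ_m ν_m described above (a hedged reconstruction of the text; function and variable names are illustrative), it can be sketched as:

```python
def reward(service_times, secrecy_ok, nu_star=-10.0):
    """r_t = -max_m T_m^t + sum_m nu_m.

    service_times: estimated secure MEC service times T_m^t per V2I link;
    secrecy_ok: whether each link meets its secrecy-rate requirement;
    nu_star: the manually set negative penalty nu* for a violated link.
    """
    penalty = sum(0.0 if ok else nu_star for ok in secrecy_ok)
    return -max(service_times) + penalty

# Three links; the third violates its secrecy-rate requirement
r = reward([0.75, 1.2, 0.9], [True, True, False])
print(r)  # equals -1.2 - 10.0
```

Maximizing this reward simultaneously pushes down the maximum service time (the min-max objective of step 3.2) and drives the agent toward actions that satisfy every link's secrecy requirement.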
step 6: obtaining a RIS auxiliary MEC vehicle network decision model according to the training model in the step 5, and obtaining an optimal solution of the optimization problem, namely obtaining a vehicle networking safety calculation unloading scheme;
Further, the specific method in the step 6 is as follows:
step 6.1: initializing;
Randomly initialize the parameters θ_a and θ_c of the Actor and Critic training networks; initialize the parameter θ_{a′} of the Actor target network to θ_a and the parameter θ_{c′} of the Critic target network to θ_c. Clear the experience replay pool D;
step 6.2: training;
randomly initializing the positions of a target vehicle and a eavesdropping vehicle, and initializing the task quantity of the target vehicle for requesting service;
At each time slot t, the BS interacts with the dynamic environment to obtain the state S_t. Based on the current state, the BS obtains the action a_t from the Actor network, setting the reflection coefficient matrix and the MEC resource allocation for the target vehicles;
The BS obtains the state S_{t+1} of the next time slot t+1 from the changing environment and calculates the reward r_t obtained from the environment for the action a_t taken;
The state, action and reward in the above process are stored as a tuple (S_t, a_t, r_t, S_{t+1}) in the experience replay pool D, while the state-action function value Q_π(S_t, a_t|θ_c) is obtained from the Critic network;
When there are enough tuples in the experience replay pool, N_d samples are drawn from it to update the parameters of the Critic and Actor networks. After the task volumes of all target vehicles have been computed, one round of model training is completed. The above process is repeated until model training converges;
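The interaction-storage-update loop of step 6.2 can be sketched as the following skeleton, where the environment, the Actor, and the network updates are stubs and every name is illustrative rather than taken from the invention:

```python
import random
from collections import deque

D = deque(maxlen=10000)   # experience replay pool
N_d = 4                   # mini-batch size

def actor(state):
    # Stub: a real Actor network would output the action {Theta_t, zeta_t}
    return ("Theta", "zeta")

def env_step(state, action):
    # Stub: the dynamic vehicle-network environment returns S_{t+1} and r_t
    return state + 1, -1.0

S_t = 0
for t in range(16):
    a_t = actor(S_t)
    S_next, r_t = env_step(S_t, a_t)
    D.append((S_t, a_t, r_t, S_next))     # store the tuple (S_t, a_t, r_t, S_{t+1})
    if len(D) >= N_d:
        batch = random.sample(D, N_d)     # draw N_d tuples to update the networks
        # ... compute y_k, update the Critic (MSE loss) and Actor (policy gradient),
        # ... then soft-update the target networks
    S_t = S_next

print(len(D))
```

The episode ends once all target vehicles' task volumes are processed; repeating such episodes until convergence yields the decision model used in step 6.3.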
Step 6.3: a decision stage;
The decision model obtained at training convergence is used in the random dynamic vehicle network scene: in each time slot it decides the optimal RIS reflection coefficient matrix and MEC resource allocation, minimizing the maximum MEC service time over the whole process, and finally the optimal solution of the optimization target is obtained.
As shown in fig. 4, the DDPG algorithm is compared with other algorithms in terms of average MEC service time, MEC successful service probability and average MEC service secrecy outage probability at different eavesdropping levels. It can be seen that the DDPG-based method significantly reduces the average maximum MEC service time, improves the MEC success probability, realizes secure MEC service, and reduces service delay.
As shown in fig. 5, the DDPG algorithm compares and analyzes the average MEC service time and the probability of success in MEC service with other algorithms when the target vehicle is in different task ranges. As can be seen from comparison of simulation analysis graphs, the DDPG-based deep reinforcement learning algorithm can well solve the problem of high dimensionality non-convexity, can successfully learn an effective strategy in a complex and dynamic communication scene, and obtains the optimal RIS reflection coefficient and MEC resource allocation.
The invention also provides a system for realizing the intelligent reflection surface-assisted internet of vehicles safety calculation unloading method, which comprises the following steps:
RIS-assisted MEC vehicle network communication module: used to realize the construction of the RIS-assisted MEC vehicle network communication scene in step 1, comprising the base station and the dynamic vehicles;
RIS-assisted secure communication module: used to realize the construction of the RIS-assisted secure communication scene in step 2; in this module, the RIS technology guarantees the security of dynamic vehicle communication;
secure computing service optimization objective module: used to realize the construction of the optimization target of the RIS-assisted MEC vehicle network scene in step 3;
deep reinforcement learning algorithm selection module: used to realize the construction of the deep reinforcement learning algorithm model based on the optimization target in step 4;
deep reinforcement learning model training module: used to realize the construction of the deep reinforcement learning training model in step 5; in this model, model training is carried out on the optimization target of the RIS-assisted MEC vehicle network scene;
deep reinforcement learning decision model module: used to realize the RIS-assisted MEC vehicle network decision model in step 6; in this module, the optimal RIS coefficient matrix and MEC resource allocation in the dynamic vehicle networking scene are obtained.
The invention also provides an intelligent reflection surface-assisted internet-of-vehicles safety calculation unloading device, which comprises:
a memory for storing a computer program;
and the processor is used for realizing the intelligent reflection surface-assisted internet of vehicles safety calculation unloading method when executing the computer program.
The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the intelligent reflection surface-assisted internet of vehicles security calculation offloading method.
Claims (10)
1. An intelligent reflection surface-assisted internet of vehicles safety calculation unloading method is characterized in that: the method comprises the following steps:
step 1: constructing a RIS auxiliary MEC vehicle network communication scene, and simultaneously adding an eavesdropper model;
step 2: constructing a RIS-assisted secure communication scene;
step 3: modeling an optimization target of the RIS auxiliary MEC vehicle network scene constructed in the step 1, and constructing an objective function when the model is solved;
step 4: constructing a deep reinforcement learning algorithm model according to the optimization target provided in the step 3;
step 5: constructing a deep reinforcement learning training model according to the deep reinforcement learning algorithm model provided in the step 4, setting states, actions and rewards of the training model by combining the communication scenes and the objective functions in the step 1, the step 2 and the step 3, and carrying out model training on an optimization target of the RIS auxiliary MEC vehicle network communication scene;
Step 6: and (5) obtaining a RIS auxiliary MEC vehicle network decision model according to the training model in the step (5), and obtaining an optimal solution of the optimization problem, namely obtaining the vehicle networking safety calculation unloading scheme.
2. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method of the step 1 is as follows:
the BS establishes multiple communication links with vehicle users in different orthogonal sub-bands simultaneously, and a resource-constrained target vehicle can offload its computing tasks to the BS equipped with an MEC server so as to obtain MEC computing resources; the set of target vehicles obtaining computing services is expressed as:
U = {User_1, User_2, …, User_M}
wherein User_m represents the mth target vehicle user;
an un-serviced vehicle is considered a potential eavesdropper and can be represented as:
ε={Eve 1 ,Eve 2 ,…,Eve E }
wherein Eve_e represents the e-th potential eavesdropper.
3. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method of the step 2 is as follows:
step 2.1: let the reflection coefficient of the nth element of RIS be expressed as: wherein ,φn E [0,2 pi), the RIS reflection coefficient matrix is defined as:
Θ=diag([θ 1 ,θ 2 ,...,θ N ])
by the absence of in-band interference, receive beamforming is designed by the maximum ratio combining technique, which can be expressed as:
wherein ,fM A beamforming vector representing an mth V2I link;
step 2.2: modeling a communication channel;
in the MEC vehicle network, the channels include: the mth V2I link h_{m,b}, the link between the mth target vehicle and the RIS h_{m,i}, the link from the mth target vehicle to the e-th potential eavesdropper h_{m,e}, the link between the RIS and the e-th potential eavesdropper h_{i,e}, and the RIS-to-BS link H_{i,b}; the RIS-to-BS channel obeys the Rician distribution, expressed as:
H_{i,b} = sqrt(ρ d_{i,b}^{-α_{i,b}}) ( sqrt(κ_{i,b}/(1+κ_{i,b})) H_{i,b}^{LoS} + sqrt(1/(1+κ_{i,b})) H_{i,b}^{NLoS} )
wherein κ_{i,b} is the Rician factor, ρ is the path loss at the reference distance d_0 = 1 m, d_{i,b} is the distance between the RIS and the BS, and α_{i,b} is the path loss exponent of the RIS-to-BS link; each element of the non-LoS component H_{i,b}^{NLoS} follows a complex Gaussian distribution with zero mean and unit variance; likewise, h_{m,e}, h_{m,b}, h_{m,i} and h_{i,e} follow the Rician distribution; due to blocking effects in the crowded urban environment and between vehicles, κ_{m,b} and κ_{m,e} are both zero;
step 2.3: modeling a signal receiving process;
the received signal of the mth V2I link at the BS can be expressed as:
y_m = sqrt(P_m) (h_{m,b} + H_{i,b} Θ h_{m,i}) s_m + n_m
wherein P_m is the transmission power of the mth target vehicle, s_m represents the unit-energy signal samples associated with the computation task, and the noise vector n_m can be expressed as:
n_m = [n_1, …, n_K]^T
the uplink signal-to-interference-and-noise ratio SINR of the mth V2I link at the BS is given by:
η_m = P_m |f_m^H (h_{m,b} + H_{i,b} Θ h_{m,i})|^2 / σ^2
similarly, the eavesdropping signal of the mth V2I link at the e-th eavesdropping vehicle is expressed as:
y_{e,m} = sqrt(P_m) (h_{m,e} + h_{i,e}^H Θ h_{m,i}) s_m + n_e
the SINR of the mth V2I link at the e-th eavesdropping vehicle can be expressed as:
η_{e,m} = P_m |h_{m,e} + h_{i,e}^H Θ h_{m,i}|^2 / σ^2
thus, the capacity of the mth V2I link and the eavesdropping capacity of the e-th eavesdropping vehicle to the mth V2I link can be expressed as:
C m =log(1+η m )
C e,m =log(1+η e,m )
in the MEC vehicle network, once a user completes the offloading process, the BS flexibly allocates the computing resources of the MEC server according to the task size; each CPU cycle of the MEC server can process a certain number of data bits, and the total computing power is assumed to be ζ bit/s.
4. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method of the step 3 is as follows:
step 3.1: modeling a safety process;
any un-serviced vehicle may eavesdrop on any V2I link; to protect the task data from eavesdropping, the redundancy used to protect the confidential information can be expressed as:
max{0, R_b − R_S}
wherein R_b is the codeword rate and R_S is the target secrecy rate of the confidential information;
if the capacity C_e of an eavesdropper is greater than R_b − R_S, a secrecy outage occurs; using the capacity C_b to approximate R_b, the secure transmission rate of the mth V2I link can thus be expressed as:
R_{S,m} = [C_m − max_{e∈ε} C_{e,m}]^+
wherein [x]^+ = max{0, x};
the MEC service time (offloading plus computation time) of the mth V2I link can be expressed as:
t_m = S_m / R_{S,m} + S_m / ζ_m
wherein S_m is the task size and ζ_m is the allocated computing resource;
step 3.2: modeling an optimization target;
the optimization objective is to design the RIS reflection coefficient matrix Θ and the MEC resource allocation ζ = {ζ_1, ζ_2, …, ζ_M} for the different computation tasks so as to minimize the service time; the former affects the transmission time, and the latter determines the computation time; considering that the whole MEC service period is determined by the maximum service time over all V2I links, the above objective is translated into the following min-max problem:
min_{Θ,ζ} max_m t_m
s.t. C1: ζ_1 + ζ_2 + … + ζ_M = ζ
     C2: |θ_n| = 1, n = 1, …, N
wherein constraint C1 states that the sum of the computing resources allocated to the different target vehicles is the fixed value ζ, and constraint C2 is the unit-modulus constraint on the RIS reflection coefficients.
5. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method of the step 4 is as follows:
DDPG is a model-free, off-policy algorithm with an Actor-Critic architecture; the Actor network is used to predict the action, the Critic network is used to evaluate the future benefit of taking the action in the current state, and both the Actor network and the Critic network consist of two deep neural network DNN networks: a training network and a target network; the training and target network parameters of the Actor network are θ_a and θ_{a′} respectively, and the training and target network parameters of the Critic network are θ_c and θ_{c′} respectively;
at time slot t, the Actor training network takes S_t as input and outputs the action a_t; the Critic training network takes S_t and a_t as input and outputs the state-action function value Q_π(S_t, a_t|θ_c), which can be expressed as:
Q_π(S_t, a_t|θ_c) = E_π[R_t|S_t, a_t, π]
wherein E[·] represents the expectation and π represents the policy of the Actor training network; when enough tuples (S_t, a_t, r_t, S_{t+1}) have accumulated in the experience replay pool D, the model optimizer randomly samples a mini-batch of size N_d from the replay pool to update the training networks of the Actor and Critic; the target state-action function value Q′ of the kth sampled tuple, y_k, can be expressed as:
y_k = r_k + γ Q′_{π′}(S_{k+1}, π′(S_{k+1}|θ_{a′})|θ_{c′})
wherein π′ represents the policy of the Actor target network;
the Critic training network is updated using the mean square error MSE loss, which can be expressed as:
L(θ_c) = (1/N_d) Σ_k (y_k − Q_π(S_k, a_k|θ_c))^2
the Actor training network is updated using the deterministic policy gradient, which can be expressed as:
∇_{θ_a} J ≈ (1/N_d) Σ_k ∇_a Q_π(S_k, a|θ_c)|_{a=π(S_k|θ_a)} ∇_{θ_a} π(S_k|θ_a)
the updating of the Actor and Critic target networks is as follows:
θ c′ =τ c θ c +(1-τ c )θ c′
θ a′ =τ a θ a +(1-τ a )θ a′
wherein τ_c and τ_a are soft update coefficients satisfying τ_c, τ_a ∈ [0,1].
6. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method in the step 5 is as follows:
Step 5.1: setting a state space;
the state S_t^m of the mth V2I link at time slot t comprises the secrecy rate R_{S,m}^t, the remaining offloading task volume, the remaining computation task volume, the occupied MEC resource amount, and the global channel state information; to sum up, the state of the mth V2I link is expressed as:
S_t^m = {R_{S,m}^t, S_{off,m}^t, S_{comp,m}^t, ζ_m^t, H_t}
at time slot t, the total environment state of the M V2I links can be expressed as:
S_t = {S_t^1, S_t^2, …, S_t^M}
step 5.2: setting an action space;
based on the current state S_t, the BS designs the RIS phase shift matrix and the MEC resource allocation; at each time slot t, the action space can be expressed as:
a t ={Θ t ,ζ t }
step 5.3: setting a reward function;
at time slot t, the reward corresponding to the current action a_t can be expressed as:
r_t = −max_m T_m^t + Σ_m ν_m
wherein T_m^t = t_{m,1} + t_{m,2} represents the secure MEC service time of the mth V2I link at time slot t, t_{m,1} is the time already spent, and t_{m,2} is the remaining time estimated based on the current action, containing the remaining transmission time and the remaining computation time; there are three cases for estimating the remaining time:
(1) all target vehicles are in the task offloading process; the remaining transmission time of each target vehicle is calculated based on the current action, and the remaining computation time of each target vehicle is estimated by the strategy of evenly distributing the computing resources among all target vehicles, i.e. ζ/M each;
(2) some target vehicles are in the task offloading process and the other target vehicles are in the task computation process; for a target vehicle in the task offloading process, the remaining transmission time of its offloading process is calculated based on the current action, and its remaining computation time is estimated with the allocation ζ_min, where ζ_min is the minimum computing resource among the target vehicles in the task computation process; for a target vehicle in the task computation process, only the remaining computation time needs to be estimated based on the current action;
(3) Estimating the residual calculation time of all target vehicles based on the current actions in the task calculation process of all target vehicles;
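The per-vehicle estimate of the remaining time t_m,2 behind the three cases above can be sketched as follows; the function arguments (bits, CPU cycles, rates) and the ζ_min convention are assumptions for illustration:

```python
def remaining_time(offload_bits, compute_cycles, rate, zeta, zeta_min, f_mec):
    """Estimate remaining transmission + computation time for one vehicle.

    offload_bits   -- bits still to be transmitted (0 if offloading finished)
    compute_cycles -- CPU cycles still to be executed at the MEC server
    rate           -- secure transmission rate under the current action (bit/s)
    zeta           -- MEC resource fraction granted by the current action
    zeta_min       -- conservative minimum fraction for still-offloading vehicles
    f_mec          -- total MEC computing capability (cycles/s)
    """
    if offload_bits > 0:
        # Still offloading: transmission time from the current action, plus a
        # computation time estimated with the conservative share zeta_min.
        return offload_bits / rate + compute_cycles / (zeta_min * f_mec)
    # Already computing: only the remaining computation time matters.
    return compute_cycles / (zeta * f_mec)
```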
To promote a high secure transmission rate, a penalty factor ν_m is introduced: if the current action can meet the secrecy-rate requirement of the mth link, then ν_m = 0; otherwise ν_m = ν*, where ν* is a manually set negative parameter.
Based on this reward design, the DDPG algorithm continually learns an action policy that reduces the maximum secure MEC service time under the given constraints, and the total cumulative reward can be expressed as:
where γ is the discount factor.
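The reward shaping and discounted return above can be sketched as follows; the sign convention (negative maximum service time plus the penalty ν*) is an illustrative assumption consistent with minimizing the maximum secure MEC service time:

```python
def step_reward(service_times, secrecy_rates, rate_req, nu_star=-1.0):
    """Per-slot reward: push down the worst service time, penalize links
    whose secrecy rate misses the requirement (nu_star is negative)."""
    penalty = sum(nu_star for r in secrecy_rates if r < rate_req)
    return -max(service_times) + penalty

def discounted_return(rewards, gamma=0.99):
    """Total cumulative reward: sum over t of gamma**t * r_t."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```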
7. The intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading method of claim 1, wherein the specific method of step 6 is as follows:
Step 6.1: initializing;
Randomly initialize the parameters θ_a and θ_c of the Actor and Critic training networks, initialize the Actor target-network parameter θ_a′ to θ_a and the Critic target-network parameter θ_c′ to θ_c, and clear the experience replay pool D;
step 6.2: training;
Randomly initialize the positions of the target vehicles and the eavesdropping vehicle, and initialize the task volume for which each target vehicle requests service;
At each time slot t, the BS interacts with the dynamic environment to obtain the state S_t; based on the current state, the BS obtains the action a_t from the Actor network of the mth V2I link and sets the reflection-coefficient matrix and the MEC resource allocation for the target vehicles;
The BS then obtains the state S_t+1 of the next time slot from the changed environment and computes the reward r_t returned by the environment for the action a_t;
The state, action, and reward of the above process are stored as a tuple (S_t, a_t, r_t, S_t+1) in the experience replay pool D, while the state-action value function Q_π(S_t, a_t | θ_c) is obtained from the Critic network;
Once the experience replay pool holds enough tuples, a mini-batch of N_d samples is drawn from it to update the parameters of the Critic and Actor networks; when the task volumes of all target vehicles have been computed, one round of model training is finished, and the above process is repeated until training converges;
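The replay-pool bookkeeping and the soft target-network update that DDPG typically uses in such a training loop can be sketched as follows; the names (ReplayPool, tau) are illustrative, and the Actor/Critic network internals are abstracted away:

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool D storing (S_t, a_t, r_t, S_t+1) tuples."""

    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)  # oldest tuples are evicted first

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, n_d):
        """Draw a mini-batch of N_d tuples without replacement."""
        return random.sample(self.buf, n_d)

    def __len__(self):
        return len(self.buf)

def soft_update(target_params, train_params, tau=0.005):
    """DDPG-style target update: theta' <- tau*theta + (1 - tau)*theta'."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, train_params)]
```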
Step 6.3: a decision stage;
The converged decision model is then used in random dynamic vehicular network scenarios: in each time slot it decides the optimal RIS reflection-coefficient matrix and MEC resource allocation, minimizing the maximum secure MEC service time over the whole process and finally obtaining the optimal solution of the optimization objective.
8. A system for implementing the intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading method of any one of claims 1 to 7, characterized by comprising:
RIS-assisted MEC vehicular network communication module: used for constructing the RIS-assisted MEC vehicular network communication scenario in which the base station communicates with dynamic vehicles;
RIS-assisted secure communication module: used for constructing the RIS-assisted secure communication scenario; in this module, the RIS technology safeguards the security of dynamic vehicular communication;
Secure computing service optimization objective module: used for constructing the optimization objective of the RIS-assisted MEC vehicular network scenario;
Deep reinforcement learning algorithm selection module: used for constructing the deep reinforcement learning algorithm model based on the optimization objective;
Deep reinforcement learning model training module: used for constructing the deep reinforcement learning training model; in this model, training is carried out toward the optimization objective of the RIS-assisted MEC vehicular network scenario;
Deep reinforcement learning decision model module: used for realizing the RIS-assisted MEC vehicular network decision model; in this module, the optimal RIS coefficient matrix and MEC resource allocation in the dynamic vehicular networking scenario are obtained.
9. An intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading device, characterized by comprising:
a memory for storing a computer program; and
a processor which, when executing said computer program, implements the intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading method of any one of claims 1-8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310276875.3A CN116208619A (en) | 2023-03-21 | 2023-03-21 | Intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method, system, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116208619A true CN116208619A (en) | 2023-06-02 |
Family
ID=86519214
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310276875.3A Pending CN116208619A (en) | 2023-03-21 | 2023-03-21 | Intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method, system, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116208619A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116963183A (en) * | 2023-07-31 | 2023-10-27 | 中国矿业大学 | Mine internet of things safe unloading method assisted by intelligent reflecting surface |
CN116963183B (en) * | 2023-07-31 | 2024-03-08 | 中国矿业大学 | Mine internet of things safe unloading method assisted by intelligent reflecting surface |
CN117156494A (en) * | 2023-10-31 | 2023-12-01 | 南京邮电大学 | Three-terminal fusion task scheduling model and method for RIS auxiliary wireless communication |
CN117156494B (en) * | 2023-10-31 | 2024-01-19 | 南京邮电大学 | Three-terminal fusion task scheduling model and method for RIS auxiliary wireless communication |
CN118042493A (en) * | 2024-04-11 | 2024-05-14 | 华东交通大学 | Internet of vehicles perception communication calculation joint optimization method based on reflecting element |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||