CN116208619A - Intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method, system, equipment and medium - Google Patents
- Publication number: CN116208619A
- Application number: CN202310276875.3A
- Authority
- CN
- China
- Prior art keywords
- mec
- ris
- network
- target
- vehicle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1074—Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/40—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
An intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method, system, equipment and medium. The method comprises the following steps: constructing an RIS-assisted MEC vehicle network communication scene; constructing an RIS-assisted secure communication scene; constructing an optimization objective function for the RIS-assisted MEC vehicle network scene; constructing a deep reinforcement learning algorithm model; constructing a deep reinforcement learning training model, setting the states, actions and rewards of the training model, and carrying out model training on the optimization target; and obtaining an RIS-assisted MEC vehicle network decision model that yields the Internet of vehicles safety calculation unloading scheme. The system, the equipment and the medium are used for realizing the method. By jointly designing the RIS phase shift matrix and allocating MEC computing resources in real time, the invention minimizes the maximum MEC service time, solves the problems of task offloading delay and security in a dynamic Internet of vehicles scene, satisfies the security of the communication links, improves the overall service quality of the MEC, and ensures the service quality and security performance of the Internet of vehicles.
Description
Technical Field
The invention belongs to the technical field of wireless communication, and particularly relates to an intelligent reflection surface-assisted internet of vehicles safety calculation unloading method, system, equipment and medium.
Background
With the continuous innovation of 5G mobile communication technology, the emerging Internet of vehicles (V2X) technology is maturing, where V represents a vehicle and X represents any object that interacts with the vehicle: other vehicles, people, transportation infrastructure, or networks. The widespread use of the Internet of vehicles has driven a large number of data-intensive and delay-sensitive services, all of which require substantial computing resources. However, conventional cloud computing increases computation delay because of the long distance between the target user and the server, making it unsuitable for the emerging V2X technology. To address this shortcoming of cloud computing, mobile edge computing (MEC) has emerged as a significant new computing paradigm. MEC combines well with the Internet of vehicles: abundant computing resources at the network edge free resource-constrained vehicle users from heavy computing tasks. With an MEC server deployed in the Internet of vehicles, multiple vehicles can simultaneously offload tasks to the MEC server, obtain high-speed computing service, reduce task processing delay, and improve user experience. However, due to severe channel fading in crowded urban environments, the task offloading rate may be low, which extends the offloading delay. In addition, owing to the broadcast nature of wireless signals, the wireless link is vulnerable to security threats such as eavesdropping. Therefore, it is important to improve the service quality and data security of the MEC vehicular network from the perspective of secure communication.
The intelligent reflecting surface (RIS) is currently considered a promising technology to improve wireless transmission quality and coverage. By designing the elements of the intelligent reflecting surface, signal reflection can be engineered to enhance the power of the desired signal while mitigating multi-user interference. Previous studies have shown that physical layer security (PLS) can be an effective alternative or complement for securing complex wireless networks by exploiting the randomness inherent in wireless channels. However, many PLS techniques degrade severely when an eavesdropper is closer to the base station (BS) than a legitimate user, or when the legitimate user and the eavesdropper have correlated channels. In response to these serious challenges, RIS combined with PLS holds promise for designing a robust secure transmission mechanism, because the RIS can flexibly reconfigure the channel environment in real time; hence, combining RIS with MEC to realize secure services has been proposed. However, such joint RIS-MEC schemes have high complexity, and a low-complexity optimal solution cannot be derived by purely mathematical methods. Deep reinforcement learning, as a powerful state-estimation and function-approximation tool, can adapt to various dynamic networks and solve complex optimization problems. On this basis, it is proposed to optimize the RIS and the MEC resource allocation with a deep reinforcement learning algorithm to achieve optimal security services.
In the literature [Y. Liu, W. Wang, H.-H. Chen, F. Lyu, L. Wang, W. Meng, and X. Shen, "Physical Layer Security Assisted Computation Offloading in Intelligently Connected Vehicle Networks," IEEE Transactions on Wireless Communications, vol. 20, no. 6, pp. 3555-3570, 2021], the authors propose a secure computation offloading scheme in a vehicle network, focusing on optimizing the secure MEC service delay of a target vehicle, wherein artificial noise is added to combat potential eavesdroppers and enable secure communication of the vehicle network. However, the solution optimizes the secure mobile edge computing service delay of the target vehicle in a static Internet of vehicles scene, and cannot be applied to a dynamic Internet of vehicles scene with heavy computing tasks.
In the literature [Y. Ju, Y. Chen, Z. Cao, H. Wang, L. Liu, Q. Pei, and N. Kumar, "Learning Based and Physical-layer Assisted Secure Computation Offloading in Vehicular Spectrum Sharing Networks," in IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2022], the authors propose a scheme for implementing secure MEC services based on deep reinforcement learning in a dynamic Internet of vehicles scenario; however, the scheme realizes the secure service through physical layer security techniques alone, which has limitations and does not explore the potential benefits of the intelligent reflecting surface.
In summary, the following drawbacks exist in the prior art:
(1) The prior art optimizes the secure mobile edge computing service delay of a target vehicle in a static Internet of vehicles scene, and is not suitable for a dynamic mobile edge computing vehicle network with heavy computing tasks.
(2) In the prior art, the base station equipped with a mobile edge computing server allocates MEC computing resources to the target vehicles only after all target vehicles have completed task offloading, which greatly aggravates the service delay of the Internet of vehicles.
(3) In a dynamic internet of vehicles scenario, the potential benefits of intelligent reflective surfaces are not considered in the prior art when conducting research on the problem of mobile edge computing security service delay.
How to select a proper deep reinforcement learning algorithm to cope with a high-dimensional state space under a channel that changes in real time, and how to optimize the RIS and the MEC through deep reinforcement learning, are the key problems to be solved by RIS-assisted MEC security service technology.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention aims to provide an intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method, system, equipment and medium, which optimize the MEC service based on a communication scheme using the deep deterministic policy gradient (DDPG) algorithm. By jointly designing the RIS phase shift matrix and allocating MEC computing resources in real time, the maximum MEC service time is minimized so as to realize the optimal secure MEC service; the problems of task offloading delay and security in a dynamic Internet of vehicles scene are solved, and the overall service quality of the MEC is improved on the premise of satisfying communication link security, so that the service quality and the security performance of the Internet of vehicles are ensured.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
an intelligent reflection surface-assisted internet of vehicles safety calculation unloading method comprises the following steps:
step 1: constructing a RIS auxiliary MEC vehicle network communication scene, and simultaneously adding an eavesdropper model;
step 2: constructing a RIS-assisted secure communication scene;
step 3: modeling an optimization target of the RIS auxiliary MEC vehicle network scene constructed in the step 1, and constructing an objective function when the model is solved;
step 4: constructing a deep reinforcement learning algorithm model according to the optimization target provided in the step 3;
step 5: constructing a deep reinforcement learning training model according to the deep reinforcement learning algorithm model provided in the step 4, setting states, actions and rewards of the training model by combining the communication scenes and the objective functions in the step 1, the step 2 and the step 3, and carrying out model training on an optimization target of the RIS auxiliary MEC vehicle network communication scene;
step 6: and (5) obtaining a RIS auxiliary MEC vehicle network decision model according to the training model in the step (5), and obtaining an optimal solution of the optimization problem, namely obtaining the vehicle networking safety calculation unloading scheme.
The specific method of the step 1 is as follows:
The BS establishes multiple communication links with vehicle users in different orthogonal sub-bands simultaneously, and a resource-constrained target vehicle can offload its computing tasks to the BS equipped with the MEC server so as to obtain MEC computing resources; the target vehicles obtaining computing services are expressed as:
U = {User_1, User_2, …, User_M}
wherein User_m represents the m-th target vehicle user;
an unserved vehicle is regarded as a potential eavesdropper, and the set of potential eavesdroppers can be represented as:
ε = {Eve_1, Eve_2, …, Eve_E}
wherein Eve_e represents the e-th potential eavesdropper.
The specific method of the step 2 is as follows:
Step 2.1: the reflection coefficient of the n-th RIS element is expressed as θ_n = e^{jφ_n}, wherein φ_n ∈ [0, 2π); the RIS reflection coefficient matrix is defined as:
Θ = diag([θ_1, θ_2, …, θ_N])
Since there is no in-band interference, the receive beamforming is designed by the maximum ratio combining (MRC) technique and can be expressed as:
f_m = g_m / ||g_m||
wherein f_m is the beamforming vector of the m-th V2I link and g_m = h_{m,b} + H_{i,b} Θ h_{m,i} is the combined channel of the m-th V2I link at the BS;
Step 2.2: modeling the communication channel;
In the MEC vehicle network, the channels include: the m-th V2I link h_{m,b} ∈ C^{K×1}; the link between the m-th target vehicle and the RIS h_{m,i} ∈ C^{N×1}; the link from the m-th target vehicle to the e-th potential eavesdropper h_{m,e}; the link between the RIS and the e-th potential eavesdropper h_{i,e} ∈ C^{N×1}; and the RIS-to-BS link H_{i,b} ∈ C^{K×N}. The RIS-to-BS channel obeys the Rician distribution, expressed as:
H_{i,b} = sqrt(ρ d_{i,b}^{-α_{i,b}}) [ sqrt(κ_{i,b}/(1+κ_{i,b})) H_{i,b}^{LoS} + sqrt(1/(1+κ_{i,b})) H_{i,b}^{NLoS} ]
wherein κ_{i,b} is the Rician factor, ρ is the path loss at the reference distance d_0 = 1 m, d_{i,b} is the distance between the RIS and the BS, α_{i,b} is the path-loss exponent of the RIS-to-BS link, and the non-LoS component H_{i,b}^{NLoS} follows a complex Gaussian distribution with zero mean and unit variance; h_{m,e}, h_{m,b}, h_{m,i} and h_{i,e} likewise follow the Rician distribution, and owing to the congested urban environment and the blocking effect between vehicles, κ_{m,b} and κ_{m,e} are both zero.
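As an illustration of the channel model above, the following sketch (not part of the patent; the function name and parameter values are illustrative assumptions) draws a Rician-distributed channel vector with distance-based path loss. Setting the Rician factor κ to zero yields the pure-NLoS Rayleigh case assumed for the vehicle-to-BS and vehicle-to-eavesdropper links.

```python
import math
import random

def rician_channel(n, kappa, rho=1.0, d=50.0, alpha=2.2, rng=None):
    """Sample an n-element Rician channel vector with path loss.

    kappa: Rician K-factor; kappa = 0 gives a pure-NLoS (Rayleigh) channel,
    matching the blocked vehicle-to-BS / vehicle-to-eavesdropper links.
    rho: path loss at the reference distance d0 = 1 m (illustrative value).
    d, alpha: link distance and path-loss exponent (illustrative values).
    """
    rng = rng or random.Random(0)
    gain = math.sqrt(rho * d ** (-alpha))            # large-scale fading
    w_los = math.sqrt(kappa / (1 + kappa))           # LoS weight
    w_nlos = math.sqrt(1 / (1 + kappa))              # NLoS weight
    los = [complex(1.0, 0.0)] * n                    # deterministic LoS part
    nlos = [complex(rng.gauss(0, math.sqrt(0.5)),    # CN(0, 1) entries
                    rng.gauss(0, math.sqrt(0.5))) for _ in range(n)]
    return [gain * (w_los * a + w_nlos * b) for a, b in zip(los, nlos)]

h_ris_bs = rician_channel(8, kappa=3.0)   # RIS-to-BS link: LoS present
h_v2i = rician_channel(8, kappa=0.0)      # vehicle-to-BS link: Rayleigh
```

With a very large κ the NLoS part vanishes and all entries share the same deterministic magnitude, which is a quick sanity check on the weighting.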
step 2.3: modeling a signal receiving process;
the mth V2I link received signal at the BS can be expressed as:
wherein ,Pm Is the transmission power of the mth target vehicle s m Representing unit energy signal samples associated with a computational task, noise vector n m Can be expressed as:
n m =[n 1 ,...n K ] T
the uplink signal-to-interference-and-noise ratio SINR of the mth V2I link at BS is given by:
similarly, the eavesdropping signal of the mth V2I link at the ith eavesdropping vehicle is expressed as:
the SINR of the mth V2I link at the e-th eavesdropping vehicle can be expressed as:
thus, the capacity of the mth V2I link and the eavesdropping capacity of the e-th eavesdropping vehicle to the mth V2I link can be expressed as:
C m =log(1+η m )
C e,m =log(1+η e,m )
in the MEC vehicle network, once the user completes the unloading process, the BS flexibly allocates the computing resources of the MEC server according to the task size, and each CPU cycle of the MEC server can process a certain number of data bits, assuming that the total computing power is ζbit/s.
The specific method of the step 3 is as follows:
Step 3.1: modeling the security process;
Any unserved vehicle may eavesdrop on any V2I link. To protect the task data from being intercepted, the redundancy rate for protecting the confidential information can be expressed as:
max{0, R_b − R_S}
wherein R_b is the codeword rate and R_S is the target secrecy rate of the confidential information;
if the capacity C_e of an eavesdropper is greater than R_b − R_S, a secrecy outage occurs; using the capacity C_b to approximate R_b, the secure transmission rate of the m-th V2I link can thus be expressed as:
R_{S,m} = [C_m − max_{e∈ε} C_{e,m}]^+
wherein [x]^+ = max{0, x};
the secure MEC service time (offloading plus computation time) of the m-th V2I link can be expressed as:
t_m = S_m / R_{S,m} + S_m / ζ_m
wherein S_m is the task size of the m-th target vehicle and ζ_m is its allocated computing resource.
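The secrecy-rate expression above reduces to a few lines of code. The sketch below is illustrative (function and variable names are not from the patent), with capacities taken as log2(1 + SINR); it computes the worst-case secure rate over all eavesdroppers.

```python
import math

def secrecy_rate(eta_m, eta_eves):
    """Worst-case secure transmission rate of one V2I link:
    R_S,m = [C_m - max_e C_e,m]^+ with C = log2(1 + SINR)."""
    c_m = math.log2(1 + eta_m)
    c_eve_max = max(math.log2(1 + eta) for eta in eta_eves)
    return max(0.0, c_m - c_eve_max)

# Legitimate SINR 15 vs eavesdropper SINRs 3 and 1:
# C_m = log2(16) = 4, worst eavesdropper capacity = log2(4) = 2
print(secrecy_rate(15, [3, 1]))   # 2.0 bit/s/Hz
```

When the strongest eavesdropper's capacity exceeds the legitimate capacity, the `[·]^+` clamp makes the secure rate zero, i.e. no confidential data can be delivered on that link.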
step 3.2: modeling an optimization target;
the optimization objective is to design RIS reflection coefficient matrix theta and MEC resource allocation for different calculation tasksTo minimize the service time, the former would affect the transmission time, the latter would determine the computation time, taking into account that the entire MEC service period is determined by the maximum service time of all V2I links, translating the above objective into the following min-max problem:
wherein constraint C1 represents the sum of the computing resources allocated to different target vehicles as a fixed value, and constraint C2 represents the modulus constraint of the RIS reflection coefficient as a unit modulus.
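The min-max objective can be made concrete with a small helper, under the step 3.1 assumption that the offloading time is the task size divided by the secure rate and the computation time is the task size divided by the allocated MEC resource. Names and units here are illustrative, not from the patent.

```python
def service_time(task_bits, secure_rate, cpu_share):
    """Secure MEC service time of one V2I link:
    offloading time (task / secure rate) + computation time (task / resource)."""
    return task_bits / secure_rate + task_bits / cpu_share

def mec_service_period(tasks, rates, shares):
    """Whole MEC service period = maximum service time over all links.
    The patent's min-max problem tunes the RIS phases (which raise `rates`)
    and the resource split `shares` (summing to the total zeta) to shrink this."""
    return max(service_time(s, r, z) for s, r, z in zip(tasks, rates, shares))

# Two links: the slow link (rate 10) gets the larger CPU share (50 of 60),
# which balances the two per-link service times and lowers the maximum.
print(mec_service_period([100.0, 100.0], [10.0, 100.0], [50.0, 10.0]))  # 12.0
```

Shifting the CPU split the other way (10 for the slow link, 50 for the fast one) would push the maximum up to 20, which is why the allocation is part of the optimization.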
The specific method of the step 4 is as follows:
DDPG is a model-free, off-policy algorithm with an Actor-Critic architecture, wherein the Actor network predicts actions and the Critic network evaluates the future benefit of taking an action in the current state. The Actor and the Critic each consist of two deep neural networks (DNNs), a training network and a target network; the training and target network parameters of the Actor are θ_a and θ_a′ respectively, and those of the Critic are θ_c and θ_c′;
at time slot t, the Actor training network takes S_t as input and outputs the action a_t; the Critic training network takes S_t and a_t as input and outputs the state-action function value, which can be expressed as:
Q^π(S_t, a_t | θ_c) = E_π[R_t | S_t, a_t]
wherein E[·] denotes the expectation and π denotes the policy of the Actor training network. When enough tuples (S_t, a_t, r_t, S_{t+1}) have accumulated in the experience replay pool D, the model optimizer randomly extracts a minibatch of size N_d from the pool to update the Actor and Critic training networks; the target state-action value of the k-th tuple, y_k, can be expressed as:
y_k = r_k + γ Q′^{π′}(S_{k+1}, π′(S_{k+1} | θ_a′) | θ_c′)
wherein π′ represents the policy of the Actor target network;
the Critic training network is updated by minimizing the mean square error (MSE) loss:
L(θ_c) = (1/N_d) Σ_{k=1}^{N_d} (y_k − Q^π(S_k, a_k | θ_c))^2
the Actor training network is updated with the deterministic policy gradient:
∇_{θ_a} J ≈ (1/N_d) Σ_{k=1}^{N_d} ∇_a Q^π(S_k, a | θ_c)|_{a=π(S_k|θ_a)} ∇_{θ_a} π(S_k | θ_a)
the Actor and Critic target networks are softly updated as:
θ_c′ = τ_c θ_c + (1 − τ_c) θ_c′
θ_a′ = τ_a θ_a + (1 − τ_a) θ_a′
wherein τ_c and τ_a are soft-update coefficients satisfying τ_c, τ_a ∈ [0, 1].
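The DDPG update rules above can be sketched with plain Python lists standing in for network weights; this is an illustrative toy of the TD target, the MSE loss and the soft updates, not the patent's implementation.

```python
def td_target(r_k, gamma, q_next):
    """y_k = r_k + gamma * Q'(S_{k+1}, pi'(S_{k+1} | theta_a') | theta_c')."""
    return r_k + gamma * q_next

def critic_mse(ys, qs):
    """MSE loss minimized by the Critic training network over a minibatch."""
    return sum((y - q) ** 2 for y, q in zip(ys, qs)) / len(ys)

def soft_update(theta_target, theta_train, tau):
    """Polyak update theta' <- tau * theta + (1 - tau) * theta',
    applied to both the Actor and the Critic target networks."""
    return [tau * w + (1 - tau) * wt for w, wt in zip(theta_train, theta_target)]

print(td_target(1.0, 0.9, 2.0))                    # 2.8
print(soft_update([0.0, 1.0], [1.0, 1.0], 0.1))    # [0.1, 1.0]
```

With a small τ the target networks trail the training networks slowly, which is what stabilizes the bootstrapped TD target y_k.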
The specific method in the step 5 is as follows:
Step 5.1: setting the state space;
The state of the m-th V2I link at time slot t comprises its secrecy rate, its residual offloading task amount, its residual computing task amount, the amount of MEC resources already occupied, and the global channel state information; to sum up, these quantities are collected into the per-link state S_m^t, and at time slot t the overall environment state of the M V2I links can be expressed as:
S_t = {S_1^t, S_2^t, …, S_M^t}
step 5.2: setting an action space;
based on the current state S t The BS will design the RIS phase shift matrix and MEC resource allocation, and at each time slot t, the action space can be expressed as:
a t ={Θ t ,ζ t }
step 5.3: setting a reward function;
at time slot t, corresponding to current action a t Can be expressed as:
wherein ,representing the secure MEC service time of the mth V2I link at time slot t, t m,1 Is the current time spent, t m,2 The estimated remaining time based on the current motion, which includes the remaining transmission time and the remaining calculation time, is three cases:
(1) All target vehicles are in the task unloading process, the residual transmission time of each target vehicle is based on the current action, and the residual calculation time of each target vehicle adopts a future meterThe policy for average allocation of computing resources to all target vehicles is calculated, i.e. ζ min ;
(2) Some target vehicles are in the task unloading process, other target vehicles are in the task calculating process, for the target vehicles in the task unloading process, the residual transmission time in each user unloading process is calculated based on the current action, and the calculation resources are calculated as Wherein ζ is the calculated time remaining for policy estimation min The method is the minimum calculation resource of the target vehicle in the task calculation process, and for the target vehicle in the task calculation process, the residual calculation time is only estimated based on the current action;
(3) Estimating the residual calculation time of all target vehicles based on the current actions in the task calculation process of all target vehicles;
to increase the secure transmission rate, the penalty factor is expressed as:
if the current action can meet the security rate requirement of the mth linkThen v m =0, otherwise ν m =ν * ,ν * Is a parameter which can be set manually and is a negative number;
based on the setting of the reward function, the DDPG algorithm will continually learn action strategies that are directed towards reducing the maximum safe MEC service time within given constraints, and the total cumulative rewards can be expressed as:
where γ is the discount factor.
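The reward design of step 5.3 can be sketched as follows, under the assumption (illustrative, since the exact expression is not reproduced in this text) that the per-slot reward is the negative of the maximum estimated secure service time plus the penalty terms ν_m.

```python
def step_reward(spent, remaining, penalties):
    """Per-slot reward: negative of max over links of (t_m,1 + t_m,2),
    plus the security penalties nu_m (0 when the secrecy-rate requirement
    holds, a negative nu* otherwise). All names are illustrative."""
    return -max(a + b for a, b in zip(spent, remaining)) + sum(penalties)

def discounted_return(rewards, gamma):
    """Total cumulative reward sum_t gamma^t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Two links, the second violating its secrecy-rate requirement (nu* = -5):
print(step_reward([1.0, 2.0], [3.0, 1.0], [0.0, -5.0]))   # -9.0
```

Because the reward is the negative service time, maximizing the discounted return drives the agent toward actions that shrink the maximum secure MEC service time while avoiding secrecy-rate violations.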
The specific method of the step 6 is as follows:
Step 6.1: initialization;
The parameters θ_a and θ_c of the Actor and Critic training networks are randomly initialized, the parameter θ_a′ of the Actor target network is initialized to θ_a, the parameter θ_c′ of the Critic target network is initialized to θ_c, and the experience replay pool D is cleared;
step 6.2: training;
randomly initializing the positions of a target vehicle and a eavesdropping vehicle, and initializing the task quantity of the target vehicle for requesting service;
At each time slot t, the BS interacts with the dynamic environment to obtain the state S_t; based on the current state, the BS obtains the action a_t from the Actor network and accordingly sets the reflection coefficient matrix and the MEC resource allocation for the target vehicles;
the BS obtains the state S_{t+1} of the next time slot from the changing environment and calculates the reward r_t returned by the environment for the action a_t taken;
the state, action and reward in the above process are stored as a tuple (S_t, a_t, r_t, S_{t+1}) in the experience replay pool D, while the state-action value Q^π(S_t, a_t | θ_c) is obtained from the Critic network;
when there are enough tuples in the experience replay pool, a minibatch of size N_d is sampled from it to update the parameters of the Critic and Actor networks; once the task amounts of all target vehicles have been computed, one round of model training is finished, and the above process is repeated until the model training converges;
step 6.3: decision stage
The converged decision model obtained from training is used in a random dynamic vehicle network scene to decide the optimal RIS reflection coefficient matrix and MEC resource allocation in each time slot, minimizing the maximum MEC service time over the whole process and finally obtaining the optimal solution of the optimization target.
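The experience replay pool used in steps 6.1 and 6.2 can be sketched as a bounded deque with uniform minibatch sampling; the class and method names are illustrative, not from the patent.

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool D of (S_t, a_t, r_t, S_{t+1}) tuples."""

    def __init__(self, capacity=10000, seed=0):
        self.data = deque(maxlen=capacity)   # oldest tuples are evicted first
        self.rng = random.Random(seed)

    def store(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, n_d):
        """Uniformly draw a minibatch of up to N_d tuples for the updates."""
        return self.rng.sample(list(self.data), min(n_d, len(self.data)))

pool = ReplayPool(capacity=100)
for t in range(5):
    pool.store(f"S{t}", f"a{t}", -float(t), f"S{t+1}")
batch = pool.sample(3)   # minibatch for one Critic/Actor update
```

The `maxlen` bound keeps memory fixed in long training runs, and uniform sampling breaks the temporal correlation between consecutive slots, which is the usual rationale for experience replay in off-policy algorithms such as DDPG.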
The invention also provides a system for realizing the intelligent reflection surface-assisted internet of vehicles safety calculation unloading method, which comprises the following steps:
RIS-assisted MEC vehicle network communication module: used for realizing the construction of the RIS-assisted MEC vehicle network communication scene, in which the base station establishes communication links with the dynamic vehicles;
RIS-assisted secure communication module: the system is used for realizing the construction of a RIS-assisted safety communication scene, and in the module, the RIS technology provides a guarantee for the safety of dynamic vehicle communication;
secure computing service optimization objective module: the method comprises the steps of constructing an optimization target for realizing a RIS auxiliary MEC vehicle network scene;
the deep reinforcement learning algorithm selection module: the method is used for realizing the construction of a deep reinforcement learning algorithm model based on an optimization target;
the deep reinforcement learning model training module: the method is used for constructing a deep reinforcement learning training model, and in the model, model training is carried out on an optimization target of a RIS auxiliary MEC vehicle network scene;
a deep reinforcement learning decision model module: the method is used for realizing an RIS auxiliary MEC vehicle network decision model, and the optimal RIS coefficient matrix and MEC resource allocation in a dynamic vehicle networking scene are obtained in the module.
The invention also provides an intelligent reflection surface-assisted internet-of-vehicles safety calculation unloading device, which comprises:
a memory for storing a computer program;
and the processor is used for realizing the intelligent reflection surface-assisted internet of vehicles safety calculation unloading method when executing the computer program.
The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method.
Compared with the prior art, the invention has the beneficial effects that:
1. Under dynamic scenes, the allocation of the intelligent reflecting surface's reflection coefficient matrix and of the mobile edge computing resources is optimized by a deep reinforcement learning algorithm; the scheme provided by the invention can determine multiple continuous optimal actions over a high-dimensional continuous state space, reducing the vehicle network service delay while providing a guarantee for communication security.
2. The invention regards the base station as an agent that makes decisions according to the continuously changing state of its surroundings, has high adaptability to highly dynamic Internet of vehicles scenes, and can allocate computing resources to a target vehicle as soon as that vehicle finishes task offloading, so that idle MEC resources are effectively utilized.
3. The security problem in current Internet of vehicles scenes is typically addressed with physical layer security techniques alone, which have limitations. The invention combines the intelligent reflecting surface technology with physical layer security to realize the secure service, solving the problem that physical layer security alone cannot resist an eavesdropping user that is closer to the base station than the target user, or whose channel is correlated with the target user's.
4. The RIS auxiliary MEC vehicle network safety communication scene provided by the step 1 and the step 2 can be associated with an actual dynamic vehicle networking safety communication scene, provides a solution for the safety service problem in the actual scene, and has the advantage of higher applicability.
5. The deep reinforcement learning algorithm provided by the step 4 can solve the problem of complex high-dimensional continuous state space, can output continuous action values according to the continuous state space, and has the advantages of adapting to dynamic scenes and solving the problem of non-convexity.
In summary, compared with the prior art, the method has the advantages of realizing safety service by utilizing the deep reinforcement learning algorithm to solve the problem of jointly optimizing the intelligent reflecting surface and the mobile edge calculation in the dynamic scene and reducing service delay.
Drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of an intelligent reflection surface assisted moving edge computing scenario provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of a deep reinforcement learning training model according to an embodiment of the present invention.
Fig. 4 is a diagram of simulation results comparing the DDPG algorithm with other algorithms in terms of average MEC service time, MEC successful service probability and average MEC service secrecy outage probability under different eavesdropping levels, provided by the embodiment of the present invention.
Fig. 5 is a diagram of simulation results comparing the DDPG algorithm with other algorithms in terms of average MEC service time and MEC successful service probability under different task ranges of the target vehicle, provided by the embodiment of the invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
The invention provides an intelligent reflection surface-assisted Internet of vehicles security calculation offloading method, system, equipment and medium. First, the RIS-assisted MEC vehicle network scene is modeled: the base station establishes multiple communication links with vehicle users in different sub-bands simultaneously to realize high-data-rate transmission service. In the MEC scene, a resource-constrained target vehicle offloads its computation tasks to a base station (BS) equipped with an MEC server through a vehicle-to-infrastructure (V2I) link; the BS flexibly allocates MEC resources to the different task requests and then feeds the results back to the target users. RIS-assisted secure communication is modeled, with the communication channels obeying the Rician distribution; all vehicles are equipped with single omnidirectional antennas, and the BS is equipped with a uniform linear array of K antennas. The intelligent reflecting surface is characterized by a diagonal matrix with N reflecting elements. Since there is no in-band interference, the BS designs receive beamforming in a maximum ratio combining (MRC) manner for each V2I link. Secondly, in order to realize secure service in the MEC scene, the optimization problem of minimizing the maximum MEC service time by jointly designing the RIS reflection coefficient matrix and the MEC resource allocation is proposed. This optimization problem is non-convex and is also a highly dynamic long-term decision process, so a deep reinforcement learning algorithm is adopted to solve it: the states, actions and rewards of the algorithm are designed, parameters such as the position information and task volume of the dynamic vehicles serve as the basis of the agent's decisions, and finally the optimal RIS reflection coefficient matrix and MEC resource allocation are obtained through training, realizing secure and low-delay MEC service.
Fig. 1 shows the flow chart of the deep reinforcement learning-based intelligent reflective surface-assisted internet of vehicles security computing offloading scheme.
An intelligent reflection surface-assisted internet of vehicles safety calculation unloading method comprises the following steps:
step 1: constructing a RIS auxiliary MEC vehicle network communication scene, serving vehicles sending calculation service requests, and simultaneously adding an eavesdropper model for subsequent modeling and analysis; further, the specific method of the step 1 is as follows:
As shown in fig. 2, in the intelligent reflection-surface-assisted mobile edge computing scenario, the BS establishes multiple communication links with vehicle users in different orthogonal subbands simultaneously. A resource-constrained vehicle can offload its computation tasks to the BS equipped with an MEC server; the BS flexibly allocates MEC resources to the different task requests and then feeds the results back to the vehicle users. In the present invention, it is assumed that the feedback delay is negligible relative to the time required to complete the computation task. Because the resources at the BS are limited, it can only serve vehicles that send computation service requests, and the set of target vehicles that obtain computation service is expressed as:
U = {User_1, User_2, …, User_M}
wherein User_m represents the mth target vehicle user.
An un-serviced vehicle is considered a potential eavesdropper, and the set of potential eavesdroppers can be represented as:
ε = {Eve_1, Eve_2, …, Eve_E}
wherein Eve_e represents the e-th potential eavesdropper.
Step 2: constructing a RIS-assisted secure communication scene, and laying a foundation for a communication channel used subsequently in the invention;
further, the specific method in the step 2 is as follows:
step 2.1: let the reflection coefficient of the nth element of RIS be expressed as: wherein φn E [0,2 pi), the RIS reflection coefficient matrix is defined as:
Θ=diag([θ 1 ,θ 2 ,...,θ N ])
since there is no in-band interference, receive beamforming is designed by the max-ratio combining technique, which can be expressed as:
wherein ,fM Representing the beamforming vector for the mth V2I link.
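As an illustration of step 2.1, the sketch below builds a unit-modulus RIS reflection matrix and the corresponding MRC receive beamformer. The channel values, the sizes N and K, and all variable names are illustrative assumptions, not taken from the invention:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 16, 4  # RIS elements and BS antennas (illustrative sizes)

# RIS reflection coefficient matrix: Theta = diag(e^{j*phi_1}, ..., e^{j*phi_N})
phi = rng.uniform(0.0, 2 * np.pi, N)
Theta = np.diag(np.exp(1j * phi))

# Placeholder channels standing in for the Rician channels of step 2.2
h_mb = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)          # V2I direct link
h_mi = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)          # vehicle -> RIS
H_ib = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)  # RIS -> BS

# Combined uplink channel and MRC receive beamformer f_m = h / ||h||
h_eff = h_mb + H_ib @ Theta @ h_mi
f_m = h_eff / np.linalg.norm(h_eff)

# Unit-modulus RIS coefficients and unit-norm beamformer
print(np.allclose(np.abs(np.diag(Theta)), 1.0), np.isclose(np.linalg.norm(f_m), 1.0))
```

Note that the unit-modulus property of Θ is exactly constraint C2 of the optimization problem in step 3.2.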
Step 2.2: modeling a communication channel;
in an MEC vehicle network, the channels include: mth V2I linkLink between mth target vehicle and RIS +.>The link from the mth target vehicle to the e potential eavesdropper->Link between RIS to the e potential eavesdropper +.>RIS to BS link->The RIS to BS channel obeys the Rician distribution, expressed as:
wherein ,κi,b Is a Rician factor, ρ is a reference distance d 0 Path loss at =1m, d i,b Is the distance between RIS and BS, α i,b Is the path loss index of the RIS to BS link. non-LOS componentFollows a complex gaussian distribution with zero mean and unit variance for each element of (a). Same h m,e ,h m,b ,h m,i ,h i,e Following the Rician distribution. Kappa due to congestion effects between a crowded urban environment and a vehicle m,b and κm,e All are zero.
Step 2.3: modeling a signal receiving process;
The received signal of the mth V2I link at the BS can be expressed as:
y_m = sqrt(P_m) (h_{m,b} + H_{i,b} Θ h_{m,i}) s_m + n_m
wherein P_m is the transmission power of the mth target vehicle, s_m represents the unit-energy signal samples associated with the computation task, and the noise vector n_m can be expressed as:
n_m = [n_1, …, n_K]^T
The uplink signal-to-interference-and-noise ratio (SINR) of the mth V2I link at the BS is given by:
η_m = P_m |f_m^H (h_{m,b} + H_{i,b} Θ h_{m,i})|^2 / σ^2
Similarly, the eavesdropping signal of the mth V2I link at the e-th eavesdropping vehicle is expressed as:
y_{e,m} = sqrt(P_m) (h_{m,e} + h_{i,e}^H Θ h_{m,i}) s_m + n_e
The SINR of the mth V2I link at the e-th eavesdropping vehicle can be expressed as:
η_{e,m} = P_m |h_{m,e} + h_{i,e}^H Θ h_{m,i}|^2 / σ^2
thus, the capacity of the mth V2I link and the eavesdropping capacity of the e-th eavesdropping vehicle to the mth V2I link can be expressed as:
C m =log(1+η m )
C e,m =log(1+η e,m )
In the MEC vehicle network, once a user completes the offloading process, the BS flexibly allocates the computing resources of the MEC server according to the task size. Each CPU cycle of the MEC server can process a certain number of data bits; the total computing power is assumed to be ζ bit/s. In order to provide stable service, the BS aims to minimize the time of the entire MEC service while ensuring task offloading security for all users.
Step 3: modeling an optimization target of the RIS auxiliary MEC vehicle network scene constructed in the step 1, constructing an objective function when the model is solved, and laying a foundation for the model solution by using deep reinforcement learning subsequently;
Further, the specific method of the step 3 is as follows:
step 3.1: modeling a safety process;
the present invention contemplates a worst case security threat where any un-serviced vehicle may eavesdrop on any V2I link. In order to protect the task data from eavesdropping, the transmitting end encodes the data and then needs to determine two code rates, namely a code rate R, before transmission b And target security rate R of confidential information S . Redundancy for protecting confidential information can therefore be expressed as:
max{0,R b -R S }
wherein ,Rb For the code word rate, R S Target privacy rate for confidential information.
If the capacity C of an eavesdropper e Greater than R b -R S A privacy interrupt is sent. In the present invention, we use the capacity C b Approximate R b . The secure transmission rate of the mth V2I link can thus be expressed as:
R S,m =[0,(C m -maxC e,m )] + ,e∈ε
wherein ,[x]+ =max{0,x}。
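The secrecy-rate expression above can be checked numerically. The sketch below (illustrative function names; natural-logarithm capacities as in C_m = log(1 + η_m)) computes the secure transmission rate against the worst-case eavesdropper:

```python
import numpy as np

def capacity(snr):
    # C = log(1 + eta), in nats/s/Hz as in the text
    return np.log(1.0 + snr)

def secrecy_rate(eta_m, eta_eaves):
    # R_{S,m} = [C_m - max_e C_{e,m}]^+  (worst-case eavesdropper in the set)
    c_m = capacity(eta_m)
    c_worst = max(capacity(e) for e in eta_eaves)
    return max(0.0, c_m - c_worst)

# Example: legitimate SINR 15, eavesdropper SINRs 3 and 7
r = secrecy_rate(15.0, [3.0, 7.0])
print(round(r, 4))  # log(16) - log(8) = log(2)
```

When the strongest eavesdropper's capacity exceeds the legitimate capacity, the [·]^+ operator clamps the rate to zero, which is exactly the secrecy-outage condition described above.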
The MEC service time (offloading plus computation time) of the mth V2I link can be expressed as:
t_m = S_m / R_{S,m} + S_m / ζ_m
wherein S_m is the task size and ζ_m is the allocated computing resource.
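Under the stated model, the MEC service time is the sum of the secure offloading time and the computation time. A minimal sketch (hypothetical function name; rates in bit/s as assumed units):

```python
def mec_service_time(S_m, R_sm, zeta_m):
    """Offloading time plus computation time of the mth V2I link.

    S_m: task size (bits); R_sm: secure transmission rate (bit/s);
    zeta_m: allocated MEC computing resource (bit/s).
    """
    if R_sm <= 0:
        return float("inf")  # a secrecy outage blocks secure offloading entirely
    return S_m / R_sm + S_m / zeta_m

# Example: 1 Mbit task, 2 Mbit/s secure rate, 4 Mbit/s computing share
t = mec_service_time(1e6, 2e6, 4e6)
print(t)  # 0.5 s offloading + 0.25 s computation = 0.75
```

This also shows why the joint design matters: the RIS matrix Θ enters through R_{S,m} (transmission time) while the allocation ζ_m enters through the computation term.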
Step 3.2, optimizing target modeling;
The optimization objective of the invention is to design the RIS reflection coefficient matrix Θ and the MEC resource allocation ζ = {ζ_1, ζ_2, …, ζ_M} for the different computation tasks so as to minimize the service time. The former affects the transmission time, while the latter determines the computation time. Considering that the whole MEC service period is determined by the maximum service time over all V2I links, we translate the above objective into the following min-max problem:
min_{Θ,ζ} max_m t_m
s.t. C1: ζ_1 + ζ_2 + … + ζ_M = ζ
     C2: |θ_n| = 1, n = 1, …, N
wherein constraint C1 states that the sum of the computing resources allocated to the different target vehicles is the fixed value ζ, and constraint C2 is the unit-modulus constraint on the RIS reflection coefficients.
Step 4: constructing a deep reinforcement learning algorithm model according to the optimization target provided in the step 3, laying a theoretical foundation for the actual problem to be solved, and reducing the solving difficulty of the optimization problem;
further, the specific method in the step 4 is as follows:
the joint design of the RIS reflection coefficient matrix and MEC resource allocation for the entire MEC service can be modeled as a Markov Decision Process (MDP). The process consists of a number of time periods and their specific actions, each of which affects future benefits. The optimization problem of the present invention is non-convex and a long-term decision problem with high dynamics, which is difficult to represent by the mathematical expression displayed, so the present invention employs a depth-reinforced learning (DRL) algorithm of depth deterministic strategy gradient (DDPG). The algorithm can train out proper parameters according to continuous state space, so that a desired RIS coefficient matrix and MEC resource allocation are designed and obtained, and the service time is minimized.
As shown in FIG. 3, DDPG is a model-free, off-policy algorithm with an Actor-Critic architecture. The Actor network is used to predict an action, and the Critic network is used to evaluate the future benefit of taking that action in the current state. Both the Actor network and the Critic network consist of two deep neural networks (DNN): a training network and a target network. The training and target network parameters of the Actor network are θ_a and θ_{a′} respectively, and the training and target network parameters of the Critic network are θ_c and θ_{c′} respectively. FIG. 3 shows the DDPG deep reinforcement learning training model architecture.
At time slot t, the Actor training network takes S_t as input and outputs the action a_t; the Critic training network takes S_t and a_t as input and outputs the state-action function value Q_π(S_t, a_t|θ_c), which can be expressed as:
Q_π(S_t, a_t|θ_c) = E_π[R_t|S_t, a_t, π]
wherein E[·] represents the expectation and π represents the policy of the Actor training network. When enough tuples (S_t, a_t, r_t, S_{t+1}) have accumulated in the experience replay pool D, the model optimizer randomly samples a mini-batch of size N_d from the replay pool to update the training networks of the Actor and Critic. The target state-action function value Q′ of the kth sampled tuple, y_k, can be expressed as:
y_k = r_k + γ Q′_{π′}(S_{k+1}, π′(S_{k+1}|θ_{a′})|θ_{c′})
where π′ represents the policy of the Actor target network.
The Critic training network is updated using the mean square error (MSE) loss, which can be expressed as:
L(θ_c) = (1/N_d) Σ_k (y_k − Q_π(S_k, a_k|θ_c))^2
The Actor training network is updated using the deterministic policy gradient, which can be expressed as:
∇_{θ_a} J ≈ (1/N_d) Σ_k ∇_a Q_π(S_k, a|θ_c)|_{a=π(S_k|θ_a)} ∇_{θ_a} π(S_k|θ_a)
The updating of the Actor and Critic target networks is as follows:
θ c′ =τ c θ c +(1-τ c )θ c′
θ a′ =τ a θ a +(1-τ a )θ a′
wherein τ_c and τ_a are soft update coefficients satisfying τ_c, τ_a ∈ [0,1];
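The soft-update rule for the target networks can be sketched as follows, using plain dictionaries in place of DNN parameters (an illustrative toy, not the invention's implementation):

```python
import numpy as np

def soft_update(target, train, tau):
    # theta' <- tau * theta + (1 - tau) * theta', applied parameter-wise
    for key in target:
        target[key] = tau * train[key] + (1.0 - tau) * target[key]

# Toy parameter dictionaries standing in for Critic weights
theta_c = {"w": np.ones(3)}
theta_c_tgt = {"w": np.zeros(3)}
soft_update(theta_c_tgt, theta_c, tau=0.01)
print(theta_c_tgt["w"])  # a small tau makes the target slowly track the training network
```

A small τ keeps the target networks nearly stationary between updates, which is what stabilizes the TD target y_k above.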
Step 5: constructing a deep reinforcement learning training model according to the deep reinforcement learning algorithm model provided in the step 4, setting the state, action and rewards of the training model by combining the communication scenes and the objective functions in the step 1, the step 2 and the step 3, carrying out model training on an optimization target of the RIS auxiliary MEC vehicle network communication scene, and laying a foundation for obtaining a decision model subsequently;
Further, the specific method in the step 5 is as follows:
step 5.1: setting a state space;
The state S_t^m of the mth V2I link at time slot t comprises the secrecy rate R_{S,m}^t, the remaining offloading task volume, the remaining computation task volume, the occupied MEC resource amount, and the global channel state information. To sum up, the state of the mth V2I link is expressed as:
S_t^m = {R_{S,m}^t, S_{off,m}^t, S_{comp,m}^t, ζ_m^t, H_t}
At time slot t, the total environment state of the M V2I links can be expressed as:
S_t = {S_t^1, S_t^2, …, S_t^M}
step 5.2: setting an action space;
Based on the current state S_t, the BS designs the RIS phase shift matrix and the MEC resource allocation; at each time slot t, the action space can be expressed as:
a t ={Θ t ,ζ t }
Step 5.3: bonus function settings
At time slot t, the reward corresponding to the current action a_t can be expressed as:
r_t = −max_m T_m^t + Σ_m ν_m
wherein T_m^t = t_{m,1} + t_{m,2} represents the secure MEC service time of the mth V2I link at time slot t, t_{m,1} is the time already spent, and t_{m,2} is the remaining time estimated based on the current action, which contains the remaining transmission time and the remaining computation time. There are three cases for estimating the remaining time:
(1) All target vehicles are in the task offloading process. The remaining transmission time of each target vehicle is calculated based on the current action, and the remaining computation time of each target vehicle is estimated by the strategy of evenly distributing the computing resources among all target vehicles, i.e. ζ/M each.
(2) Some target vehicles are in the task offloading process, and the other target vehicles are in the task computation process. For a target vehicle in the task offloading process, the remaining transmission time of its offloading process is calculated based on the current action, and its remaining computation time is estimated with the allocation ζ_min, where ζ_min is the minimum computing resource among the target vehicles in the task computation process. For a target vehicle in the task computation process, only the remaining computation time needs to be estimated based on the current action.
(3) All target vehicles are in the task computation process. The remaining computation time of all the target vehicles is estimated based on the current action.
To increase the secure transmission rate, a penalty factor ν_m is introduced: if the current action can meet the secrecy rate requirement of the mth link, then ν_m = 0; otherwise ν_m = ν*, where ν* is a manually set negative parameter.
Based on this setting of the reward function, the DDPG algorithm continually learns action strategies oriented toward reducing the maximum secure MEC service time within the given constraints. The total cumulative reward can be expressed as:
R_t = Σ_{k=0}^{∞} γ^k r_{t+k}
wherein γ is the discount factor;
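Assuming the reward takes the form r_t = −max_m T_m^t + Σ_m ν_m described above (a hedged reconstruction of the text; function and variable names are illustrative), it can be sketched as:

```python
def reward(service_times, secrecy_ok, nu_star=-10.0):
    """r_t = -max_m T_m^t + sum_m nu_m.

    service_times: estimated secure MEC service times T_m^t per V2I link;
    secrecy_ok: whether each link meets its secrecy-rate requirement;
    nu_star: the manually set negative penalty nu* for a violated link.
    """
    penalty = sum(0.0 if ok else nu_star for ok in secrecy_ok)
    return -max(service_times) + penalty

# Three links; the third violates its secrecy-rate requirement
r = reward([0.75, 1.2, 0.9], [True, True, False])
print(r)  # equals -1.2 - 10.0
```

Maximizing this reward simultaneously pushes down the maximum service time (the min-max objective of step 3.2) and drives the agent toward actions that satisfy every link's secrecy requirement.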
step 6: obtaining a RIS auxiliary MEC vehicle network decision model according to the training model in the step 5, and obtaining an optimal solution of the optimization problem, namely obtaining a vehicle networking safety calculation unloading scheme;
Further, the specific method in the step 6 is as follows:
step 6.1: initializing;
Randomly initialize the parameters θ_a and θ_c of the Actor and Critic training networks; initialize the parameter θ_{a′} of the Actor target network to θ_a and the parameter θ_{c′} of the Critic target network to θ_c. Clear the experience replay pool D;
step 6.2: training;
randomly initializing the positions of a target vehicle and a eavesdropping vehicle, and initializing the task quantity of the target vehicle for requesting service;
At each time slot t, the BS interacts with the dynamic environment to obtain the state S_t. Based on the current state, the BS obtains the action a_t from the Actor network, setting the reflection coefficient matrix and the MEC resource allocation for the target vehicles;
The BS obtains the state S_{t+1} of the next time slot t+1 from the changing environment and calculates the reward r_t obtained from the environment for the action a_t taken;
The state, action and reward in the above process are stored as a tuple (S_t, a_t, r_t, S_{t+1}) in the experience replay pool D, while the state-action function value Q_π(S_t, a_t|θ_c) is obtained from the Critic network;
When there are enough tuples in the experience replay pool, N_d samples are drawn from it to update the parameters of the Critic and Actor networks. After the task volumes of all target vehicles have been computed, one round of model training is completed. The above process is repeated until model training converges;
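The interaction-storage-update loop of step 6.2 can be sketched as the following skeleton, where the environment, the Actor, and the network updates are stubs and every name is illustrative rather than taken from the invention:

```python
import random
from collections import deque

D = deque(maxlen=10000)   # experience replay pool
N_d = 4                   # mini-batch size

def actor(state):
    # Stub: a real Actor network would output the action {Theta_t, zeta_t}
    return ("Theta", "zeta")

def env_step(state, action):
    # Stub: the dynamic vehicle-network environment returns S_{t+1} and r_t
    return state + 1, -1.0

S_t = 0
for t in range(16):
    a_t = actor(S_t)
    S_next, r_t = env_step(S_t, a_t)
    D.append((S_t, a_t, r_t, S_next))     # store the tuple (S_t, a_t, r_t, S_{t+1})
    if len(D) >= N_d:
        batch = random.sample(D, N_d)     # draw N_d tuples to update the networks
        # ... compute y_k, update the Critic (MSE loss) and Actor (policy gradient),
        # ... then soft-update the target networks
    S_t = S_next

print(len(D))
```

The episode ends once all target vehicles' task volumes are processed; repeating such episodes until convergence yields the decision model used in step 6.3.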
Step 6.3: a decision stage;
The decision model obtained at training convergence is used in the random dynamic vehicle network scene: in each time slot it decides the optimal RIS reflection coefficient matrix and MEC resource allocation, minimizing the maximum MEC service time over the whole process, and finally the optimal solution of the optimization target is obtained.
As shown in fig. 4, the DDPG algorithm is compared with other algorithms in terms of average MEC service time, MEC successful service probability and average MEC service secrecy outage probability at different eavesdropping levels. It can be seen that the DDPG-based method significantly reduces the average maximum MEC service time, improves the MEC success probability, realizes secure MEC service, and reduces service delay.
As shown in fig. 5, the DDPG algorithm compares and analyzes the average MEC service time and the probability of success in MEC service with other algorithms when the target vehicle is in different task ranges. As can be seen from comparison of simulation analysis graphs, the DDPG-based deep reinforcement learning algorithm can well solve the problem of high dimensionality non-convexity, can successfully learn an effective strategy in a complex and dynamic communication scene, and obtains the optimal RIS reflection coefficient and MEC resource allocation.
The invention also provides a system for realizing the intelligent reflection surface-assisted internet of vehicles safety calculation unloading method, which comprises the following steps:
RIS-assisted MEC vehicle network communication module: used to realize the construction of the RIS-assisted MEC vehicle network communication scene in step 1, comprising the base station and the dynamic vehicles;
RIS-assisted secure communication module: used to realize the construction of the RIS-assisted secure communication scene in step 2; in this module, the RIS technology guarantees the security of dynamic vehicle communication;
secure computing service optimization objective module: used to realize the construction of the optimization target of the RIS-assisted MEC vehicle network scene in step 3;
deep reinforcement learning algorithm selection module: used to realize the construction of the deep reinforcement learning algorithm model based on the optimization target in step 4;
deep reinforcement learning model training module: used to realize the construction of the deep reinforcement learning training model in step 5; in this model, model training is carried out on the optimization target of the RIS-assisted MEC vehicle network scene;
deep reinforcement learning decision model module: used to realize the RIS-assisted MEC vehicle network decision model in step 6; in this module, the optimal RIS coefficient matrix and MEC resource allocation in the dynamic vehicle networking scene are obtained.
The invention also provides an intelligent reflection surface-assisted internet-of-vehicles safety calculation unloading device, which comprises:
a memory for storing a computer program;
and the processor is used for realizing the intelligent reflection surface-assisted internet of vehicles safety calculation unloading method when executing the computer program.
The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the intelligent reflection surface-assisted internet of vehicles security calculation offloading method.
Claims (10)
1. An intelligent reflection surface-assisted internet of vehicles safety calculation unloading method is characterized in that: the method comprises the following steps:
step 1: constructing a RIS auxiliary MEC vehicle network communication scene, and simultaneously adding an eavesdropper model;
step 2: constructing a RIS-assisted secure communication scene;
step 3: modeling an optimization target of the RIS auxiliary MEC vehicle network scene constructed in the step 1, and constructing an objective function when the model is solved;
step 4: constructing a deep reinforcement learning algorithm model according to the optimization target provided in the step 3;
step 5: constructing a deep reinforcement learning training model according to the deep reinforcement learning algorithm model provided in the step 4, setting states, actions and rewards of the training model by combining the communication scenes and the objective functions in the step 1, the step 2 and the step 3, and carrying out model training on an optimization target of the RIS auxiliary MEC vehicle network communication scene;
Step 6: and (5) obtaining a RIS auxiliary MEC vehicle network decision model according to the training model in the step (5), and obtaining an optimal solution of the optimization problem, namely obtaining the vehicle networking safety calculation unloading scheme.
2. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method of the step 1 is as follows:
the BS establishes multiple communication links with vehicle users in different orthogonal sub-bands simultaneously, and a resource-constrained target vehicle can offload its computing tasks to the BS equipped with an MEC server so as to obtain MEC computing resources; the set of target vehicles obtaining computing services is expressed as:
U = {User_1, User_2, …, User_M}
wherein User_m represents the mth target vehicle user;
an un-serviced vehicle is considered a potential eavesdropper and can be represented as:
ε={Eve 1 ,Eve 2 ,…,Eve E }
wherein Eve_e represents the e-th potential eavesdropper.
3. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method of the step 2 is as follows:
step 2.1: let the reflection coefficient of the nth element of RIS be expressed as: wherein ,φn E [0,2 pi), the RIS reflection coefficient matrix is defined as:
Θ=diag([θ 1 ,θ 2 ,...,θ N ])
by the absence of in-band interference, receive beamforming is designed by the maximum ratio combining technique, which can be expressed as:
wherein ,fM A beamforming vector representing an mth V2I link;
step 2.2: modeling a communication channel;
in the MEC vehicle network, the channels include: the mth V2I link h_{m,b}, the link between the mth target vehicle and the RIS h_{m,i}, the link from the mth target vehicle to the e-th potential eavesdropper h_{m,e}, the link between the RIS and the e-th potential eavesdropper h_{i,e}, and the RIS-to-BS link H_{i,b}; the RIS-to-BS channel obeys the Rician distribution, expressed as:
H_{i,b} = sqrt(ρ d_{i,b}^{-α_{i,b}}) ( sqrt(κ_{i,b}/(1+κ_{i,b})) H_{i,b}^{LoS} + sqrt(1/(1+κ_{i,b})) H_{i,b}^{NLoS} )
wherein κ_{i,b} is the Rician factor, ρ is the path loss at the reference distance d_0 = 1 m, d_{i,b} is the distance between the RIS and the BS, and α_{i,b} is the path loss exponent of the RIS-to-BS link; each element of the non-LoS component H_{i,b}^{NLoS} follows a complex Gaussian distribution with zero mean and unit variance; likewise, h_{m,e}, h_{m,b}, h_{m,i} and h_{i,e} follow the Rician distribution; due to blocking effects in the crowded urban environment and between vehicles, κ_{m,b} and κ_{m,e} are both zero;
step 2.3: modeling a signal receiving process;
the received signal of the mth V2I link at the BS can be expressed as:
y_m = sqrt(P_m) (h_{m,b} + H_{i,b} Θ h_{m,i}) s_m + n_m
wherein P_m is the transmission power of the mth target vehicle, s_m represents the unit-energy signal samples associated with the computation task, and the noise vector n_m can be expressed as:
n_m = [n_1, …, n_K]^T
the uplink signal-to-interference-and-noise ratio SINR of the mth V2I link at the BS is given by:
η_m = P_m |f_m^H (h_{m,b} + H_{i,b} Θ h_{m,i})|^2 / σ^2
similarly, the eavesdropping signal of the mth V2I link at the e-th eavesdropping vehicle is expressed as:
y_{e,m} = sqrt(P_m) (h_{m,e} + h_{i,e}^H Θ h_{m,i}) s_m + n_e
the SINR of the mth V2I link at the e-th eavesdropping vehicle can be expressed as:
η_{e,m} = P_m |h_{m,e} + h_{i,e}^H Θ h_{m,i}|^2 / σ^2
thus, the capacity of the mth V2I link and the eavesdropping capacity of the e-th eavesdropping vehicle to the mth V2I link can be expressed as:
C m =log(1+η m )
C e,m =log(1+η e,m )
in the MEC vehicle network, once a user completes the offloading process, the BS flexibly allocates the computing resources of the MEC server according to the task size; each CPU cycle of the MEC server can process a certain number of data bits, and the total computing power is assumed to be ζ bit/s.
4. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method of the step 3 is as follows:
step 3.1: modeling a safety process;
any un-serviced vehicle may eavesdrop on any V2I link; to protect the task data from eavesdropping, the redundancy used to protect the confidential information can be expressed as:
max{0, R_b − R_S}
wherein R_b is the codeword rate and R_S is the target secrecy rate of the confidential information;
if the capacity C_e of an eavesdropper is greater than R_b − R_S, a secrecy outage occurs; using the capacity C_b to approximate R_b, the secure transmission rate of the mth V2I link can thus be expressed as:
R_{S,m} = [C_m − max_{e∈ε} C_{e,m}]^+
wherein [x]^+ = max{0, x};
the MEC service time (offloading plus computation time) of the mth V2I link can be expressed as:
t_m = S_m / R_{S,m} + S_m / ζ_m
wherein S_m is the task size and ζ_m is the allocated computing resource;
step 3.2: modeling an optimization target;
the optimization objective is to design the RIS reflection coefficient matrix Θ and the MEC resource allocation ζ = {ζ_1, ζ_2, …, ζ_M} for the different computation tasks so as to minimize the service time; the former affects the transmission time, and the latter determines the computation time; considering that the whole MEC service period is determined by the maximum service time over all V2I links, the above objective is translated into the following min-max problem:
min_{Θ,ζ} max_m t_m
s.t. C1: ζ_1 + ζ_2 + … + ζ_M = ζ
     C2: |θ_n| = 1, n = 1, …, N
wherein constraint C1 states that the sum of the computing resources allocated to the different target vehicles is the fixed value ζ, and constraint C2 is the unit-modulus constraint on the RIS reflection coefficients.
5. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method of the step 4 is as follows:
DDPG is a model-free, off-policy algorithm with an Actor-Critic architecture; the Actor network is used to predict the action, the Critic network is used to evaluate the future benefit of taking the action in the current state, and both the Actor network and the Critic network consist of two deep neural network DNN networks: a training network and a target network; the training and target network parameters of the Actor network are θ_a and θ_{a′} respectively, and the training and target network parameters of the Critic network are θ_c and θ_{c′} respectively;
at time slot t, the Actor training network takes S_t as input and outputs the action a_t; the Critic training network takes S_t and a_t as input and outputs the state-action function value Q_π(S_t, a_t|θ_c), which can be expressed as:
Q_π(S_t, a_t|θ_c) = E_π[R_t|S_t, a_t, π]
wherein E[·] represents the expectation and π represents the policy of the Actor training network; when enough tuples (S_t, a_t, r_t, S_{t+1}) have accumulated in the experience replay pool D, the model optimizer randomly samples a mini-batch of size N_d from the replay pool to update the training networks of the Actor and Critic; the target state-action function value Q′ of the kth sampled tuple, y_k, can be expressed as:
y_k = r_k + γ Q′_{π′}(S_{k+1}, π′(S_{k+1}|θ_{a′})|θ_{c′})
wherein π′ represents the policy of the Actor target network;
the Critic training network is updated using the mean square error MSE loss, which can be expressed as:
L(θ_c) = (1/N_d) Σ_k (y_k − Q_π(S_k, a_k|θ_c))^2
the Actor training network is updated using the deterministic policy gradient, which can be expressed as:
∇_{θ_a} J ≈ (1/N_d) Σ_k ∇_a Q_π(S_k, a|θ_c)|_{a=π(S_k|θ_a)} ∇_{θ_a} π(S_k|θ_a)
the updating of the Actor and Critic target networks is as follows:
θ c′ =τ c θ c +(1-τ c )θ c′
θ a′ =τ a θ a +(1-τ a )θ a′
wherein τ_c and τ_a are soft update coefficients satisfying τ_c, τ_a ∈ [0,1].
6. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method in the step 5 is as follows:
Step 5.1: setting a state space;
the state S_t^m of the mth V2I link at time slot t comprises the secrecy rate R_{S,m}^t, the remaining offloading task volume, the remaining computation task volume, the occupied MEC resource amount, and the global channel state information; to sum up, the state of the mth V2I link is expressed as:
S_t^m = {R_{S,m}^t, S_{off,m}^t, S_{comp,m}^t, ζ_m^t, H_t}
at time slot t, the total environment state of the M V2I links can be expressed as:
S_t = {S_t^1, S_t^2, …, S_t^M}
step 5.2: setting an action space;
based on the current state S_t, the BS designs the RIS phase shift matrix and the MEC resource allocation; at each time slot t, the action space can be expressed as:
a t ={Θ t ,ζ t }
step 5.3: setting a reward function;
at time slot t, the reward corresponding to the current action a_t can be expressed as:
r_t = −max_m T_m^t + Σ_m ν_m
wherein T_m^t = t_{m,1} + t_{m,2} represents the secure MEC service time of the mth V2I link at time slot t, t_{m,1} is the time already spent, and t_{m,2} is the remaining time estimated based on the current action, containing the remaining transmission time and the remaining computation time; there are three cases for estimating the remaining time:
(1) all target vehicles are in the task offloading process; the remaining transmission time of each target vehicle is calculated based on the current action, and the remaining computation time of each target vehicle is estimated by the strategy of evenly distributing the computing resources among all target vehicles, i.e. ζ/M each;
(2) some target vehicles are in the task offloading process and the other target vehicles are in the task computation process; for a target vehicle in the task offloading process, the remaining transmission time of its offloading process is calculated based on the current action, and its remaining computation time is estimated with the allocation ζ_min, where ζ_min is the minimum computing resource among the target vehicles in the task computation process; for a target vehicle in the task computation process, only the remaining computation time needs to be estimated based on the current action;
(3) Estimating the residual calculation time of all target vehicles based on the current actions in the task calculation process of all target vehicles;
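The per-vehicle estimate of the remaining time t_m,2 behind the three cases above can be sketched as follows; the function arguments (bits, CPU cycles, rates) and the ζ_min convention are assumptions for illustration:

```python
def remaining_time(offload_bits, compute_cycles, rate, zeta, zeta_min, f_mec):
    """Estimate remaining transmission + computation time for one vehicle.

    offload_bits   -- bits still to be transmitted (0 if offloading finished)
    compute_cycles -- CPU cycles still to be executed at the MEC server
    rate           -- secure transmission rate under the current action (bit/s)
    zeta           -- MEC resource fraction granted by the current action
    zeta_min       -- conservative minimum fraction for still-offloading vehicles
    f_mec          -- total MEC computing capability (cycles/s)
    """
    if offload_bits > 0:
        # Still offloading: transmission time from the current action, plus a
        # computation time estimated with the conservative share zeta_min.
        return offload_bits / rate + compute_cycles / (zeta_min * f_mec)
    # Already computing: only the remaining computation time matters.
    return compute_cycles / (zeta * f_mec)
```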
To promote a high secure transmission rate, a penalty factor ν_m is introduced: if the current action can meet the secrecy-rate requirement of the mth link, then ν_m = 0; otherwise ν_m = ν*, where ν* is a manually set negative parameter.
Based on this reward design, the DDPG algorithm continually learns an action policy that reduces the maximum secure MEC service time under the given constraints, and the total cumulative reward can be expressed as:
where γ is the discount factor.
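The reward shaping and discounted return above can be sketched as follows; the sign convention (negative maximum service time plus the penalty ν*) is an illustrative assumption consistent with minimizing the maximum secure MEC service time:

```python
def step_reward(service_times, secrecy_rates, rate_req, nu_star=-1.0):
    """Per-slot reward: push down the worst service time, penalize links
    whose secrecy rate misses the requirement (nu_star is negative)."""
    penalty = sum(nu_star for r in secrecy_rates if r < rate_req)
    return -max(service_times) + penalty

def discounted_return(rewards, gamma=0.99):
    """Total cumulative reward: sum over t of gamma**t * r_t."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```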
7. The intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading method of claim 1, wherein the specific method of step 6 is as follows:
Step 6.1: initializing;
Randomly initialize the parameters θ_a and θ_c of the Actor and Critic training networks, initialize the Actor target-network parameter θ_a′ to θ_a and the Critic target-network parameter θ_c′ to θ_c, and clear the experience replay pool D;
step 6.2: training;
Randomly initialize the positions of the target vehicles and the eavesdropping vehicle, and initialize the task volume for which each target vehicle requests service;
At each time slot t, the BS interacts with the dynamic environment to obtain the state S_t; based on the current state, the BS obtains the action a_t from the Actor network of the mth V2I link and sets the reflection-coefficient matrix and the MEC resource allocation for the target vehicles;
The BS then obtains the state S_t+1 of the next time slot from the changed environment and computes the reward r_t returned by the environment for the action a_t;
The state, action, and reward of the above process are stored as a tuple (S_t, a_t, r_t, S_t+1) in the experience replay pool D, while the state-action value function Q_π(S_t, a_t | θ_c) is obtained from the Critic network;
Once the experience replay pool holds enough tuples, a mini-batch of N_d samples is drawn from it to update the parameters of the Critic and Actor networks; when the task volumes of all target vehicles have been computed, one round of model training is finished, and the above process is repeated until training converges;
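The replay-pool bookkeeping and the soft target-network update that DDPG typically uses in such a training loop can be sketched as follows; the names (ReplayPool, tau) are illustrative, and the Actor/Critic network internals are abstracted away:

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool D storing (S_t, a_t, r_t, S_t+1) tuples."""

    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)  # oldest tuples are evicted first

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, n_d):
        """Draw a mini-batch of N_d tuples without replacement."""
        return random.sample(self.buf, n_d)

    def __len__(self):
        return len(self.buf)

def soft_update(target_params, train_params, tau=0.005):
    """DDPG-style target update: theta' <- tau*theta + (1 - tau)*theta'."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, train_params)]
```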
Step 6.3: a decision stage;
The converged decision model is then used in random dynamic vehicular network scenarios: in each time slot it decides the optimal RIS reflection-coefficient matrix and MEC resource allocation, minimizing the maximum secure MEC service time over the whole process and finally obtaining the optimal solution of the optimization objective.
8. A system for implementing the intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading method of any one of claims 1 to 7, characterized by comprising:
RIS-assisted MEC vehicular network communication module: used for constructing the RIS-assisted MEC vehicular network communication scenario in which the base station communicates with dynamic vehicles;
RIS-assisted secure communication module: used for constructing the RIS-assisted secure communication scenario; in this module, the RIS technology safeguards the security of dynamic vehicular communication;
Secure computing service optimization objective module: used for constructing the optimization objective of the RIS-assisted MEC vehicular network scenario;
Deep reinforcement learning algorithm selection module: used for constructing the deep reinforcement learning algorithm model based on the optimization objective;
Deep reinforcement learning model training module: used for constructing the deep reinforcement learning training model; in this model, training is carried out toward the optimization objective of the RIS-assisted MEC vehicular network scenario;
Deep reinforcement learning decision model module: used for realizing the RIS-assisted MEC vehicular network decision model; in this module, the optimal RIS coefficient matrix and MEC resource allocation in the dynamic vehicular networking scenario are obtained.
9. An intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading device, characterized by comprising:
a memory for storing a computer program; and
a processor which, when executing said computer program, implements the intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading method of any one of claims 1-8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310276875.3A CN116208619A (en) | 2023-03-21 | 2023-03-21 | Intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method, system, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116208619A true CN116208619A (en) | 2023-06-02 |
Family
ID=86519214
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310276875.3A Pending CN116208619A (en) | 2023-03-21 | 2023-03-21 | Intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method, system, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116208619A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116963183A (en) * | 2023-07-31 | 2023-10-27 | 中国矿业大学 | Mine internet of things safe unloading method assisted by intelligent reflecting surface |
CN116963183B (en) * | 2023-07-31 | 2024-03-08 | 中国矿业大学 | Mine internet of things safe unloading method assisted by intelligent reflecting surface |
CN117156494A (en) * | 2023-10-31 | 2023-12-01 | 南京邮电大学 | Three-terminal fusion task scheduling model and method for RIS auxiliary wireless communication |
CN117156494B (en) * | 2023-10-31 | 2024-01-19 | 南京邮电大学 | Three-terminal fusion task scheduling model and method for RIS auxiliary wireless communication |
CN118042493A (en) * | 2024-04-11 | 2024-05-14 | 华东交通大学 | Internet of vehicles perception communication calculation joint optimization method based on reflecting element |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||