CN116208619A - Intelligent reflecting surface-assisted Internet-of-Vehicles secure computation offloading method, system, device, and medium

Intelligent reflecting surface-assisted Internet-of-Vehicles secure computation offloading method, system, device, and medium

Info

Publication number
CN116208619A
Authority
CN
China
Prior art keywords
mec
ris
network
target
vehicle
Prior art date
Legal status
Pending
Application number
CN202310276875.3A
Other languages
Chinese (zh)
Inventor
俱莹
白皓文
王浩宇
裴庆祺
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202310276875.3A priority Critical patent/CN116208619A/en
Publication of CN116208619A publication Critical patent/CN116208619A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/104: Peer-to-peer [P2P] networks
    • H04L 67/1074: Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
    • H04L 41/145: Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L 41/16: Arrangements for maintenance, administration or management of data switching networks using machine learning or artificial intelligence
    • H04L 67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04W 4/40: Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • Y02T 10/40: Engine management systems


Abstract

An intelligent reflecting surface-assisted Internet-of-Vehicles secure computation offloading method, system, device, and medium. The method comprises the following steps: constructing a RIS-assisted MEC vehicle network communication scene; constructing a RIS-assisted secure communication scene; constructing an optimization objective function for the RIS-assisted MEC vehicle network scene; constructing a deep reinforcement learning algorithm model; constructing a deep reinforcement learning training model, setting the states, actions, and rewards of the training model, and training the model on the optimization objective; and obtaining, from the RIS-assisted MEC vehicle network decision model, the Internet-of-Vehicles secure computation offloading scheme. The system, device, and medium implement the intelligent reflecting surface-assisted Internet-of-Vehicles secure computation offloading method. By jointly designing the RIS phase-shift matrix and allocating MEC computing resources in real time, the invention minimizes the maximum MEC service time, solves the task offloading delay and security problems in dynamic Internet-of-Vehicles scenes, satisfies the security of the communication links, improves the overall quality of service of the MEC, and thereby guarantees both the quality of service and the security performance of the Internet of Vehicles.

Description

Intelligent reflecting surface-assisted Internet-of-Vehicles secure computation offloading method, system, device, and medium
Technical Field
The invention belongs to the technical field of wireless communication, and particularly relates to an intelligent reflecting surface-assisted Internet-of-Vehicles secure computation offloading method, system, device, and medium.
Background
With the continuous innovation of 5G mobile communication technology, the emerging Internet of Vehicles (V2X) technology is maturing, where V represents the vehicle and X represents any object that interacts with it: other vehicles, people, transportation infrastructure, and networks. The widespread use of the Internet of Vehicles has driven a large number of data-hungry and delay-sensitive services, all of which require substantial computing resources. However, conventional cloud computing increases computation delay because of the long distance between the target user and the server, making it unsuitable for emerging V2X applications. To address the shortcomings of cloud computing, Mobile Edge Computing (MEC) has emerged as a significant new computing paradigm. MEC combines well with the Internet of Vehicles: resource-limited vehicle users are freed from heavy computing tasks by the abundant computing resources at the network edge. With an MEC server deployed in the Internet of Vehicles, multiple vehicles can simultaneously offload tasks to the MEC server, obtain high-speed computing service, reduce task processing delay, and improve user experience. However, due to severe channel fading in crowded urban environments, the task offloading rate may be low, which prolongs the offloading delay. In addition, the broadcast nature of wireless signals leaves the wireless link vulnerable to security threats such as eavesdropping. It is therefore important to improve both the quality of service and the data security of the MEC vehicular network from the perspective of secure communication.
Intelligent reflecting surfaces (RIS) are currently considered a promising technology for improving wireless transmission quality and coverage. By configuring the elements of the intelligent reflecting surface, signal reflection can be designed to enhance the power of the desired signal while mitigating multi-user interference. Previous studies have shown that physical layer security (PLS) can be an effective alternative or complementary means of securing complex wireless networks by exploiting the randomness inherent in wireless channels. However, many PLS techniques degrade severely when an eavesdropper is closer to the base station (BS) than the legitimate user, or when the legitimate user and the eavesdropper have correlated channels. In response to these serious challenges, combining RIS with PLS holds promise for designing robust secure transmission mechanisms, because the RIS can flexibly reconfigure the channel environment in real time; accordingly, combining RIS with MEC to realize secure services has been proposed. However, such joint RIS-MEC schemes have high complexity, and a low-complexity optimal solution cannot be derived by purely mathematical methods. Deep reinforcement learning, as a powerful state-estimation and function-approximation tool, can adapt to various dynamic networks and solve complex optimization problems. On this basis, it is proposed to optimize the RIS and the MEC resource allocation with deep reinforcement learning algorithms to achieve optimal secure service.
In the literature [Y. Liu, W. Wang, H.-H. Chen, F. Lyu, L. Wang, W. Meng, and X. Shen, "Physical Layer Security Assisted Computation Offloading in Intelligently Connected Vehicle Networks," IEEE Transactions on Wireless Communications, vol. 20, no. 6, pp. 3555-3570, 2021], the authors propose a secure computation offloading scheme for vehicular networks, focusing on optimizing the secure MEC service delay of target vehicles, in which artificial noise is added to combat potential eavesdroppers and enable secure vehicular communication. However, that solution optimizes the secure mobile edge computing service delay of target vehicles in a static Internet-of-Vehicles scene, and cannot be applied to dynamic Internet-of-Vehicles scenes with heavy computing tasks.
In the literature [Y. Ju, Y. Chen, Z. Cao, H. Wang, L. Liu, Q. Pei, and N. Kumar, "Learning Based and Physical-layer Assisted Secure Computation Offloading in Vehicular Spectrum Sharing Networks," in IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2022], the authors propose a deep-reinforcement-learning-based secure MEC service scheme for dynamic Internet-of-Vehicles scenes; however, that scheme realizes secure service through physical layer security technology alone, which has limitations and does not explore the potential benefits of an intelligent reflecting surface.
In summary, the following drawbacks exist in the prior art:
(1) The prior art is used for optimizing the safe mobile edge computing service delay problem of a target vehicle in a static internet of vehicles scene, and is not suitable for a dynamic mobile edge computing vehicle network with heavy computing tasks.
(2) In the prior art, the base station equipped with a mobile edge computing server allocates MEC computing resources to the target vehicles only after all target vehicles have finished offloading their tasks to the base station, which greatly aggravates the service delay of the Internet of Vehicles.
(3) In a dynamic internet of vehicles scenario, the potential benefits of intelligent reflective surfaces are not considered in the prior art when conducting research on the problem of mobile edge computing security service delay.
How to select a suitable deep reinforcement learning algorithm to cope with the high-dimensional state space under a channel that changes in real time, and how to optimize the RIS and the MEC through deep reinforcement learning, are the key problems to be solved by RIS-assisted MEC secure service technology.
Disclosure of Invention
In order to overcome the above defects in the prior art, the invention aims to provide an intelligent reflecting surface-assisted Internet-of-Vehicles secure computation offloading method, system, device, and medium, which optimize the MEC service with a communication scheme based on the deep deterministic policy gradient (DDPG) algorithm. By jointly designing the RIS phase-shift matrix and allocating MEC computing resources in real time, the maximum MEC service time is minimized so as to realize optimal secure MEC service, solving the task offloading delay and security problems in dynamic Internet-of-Vehicles scenes and improving the overall quality of service of the MEC on the premise of satisfying communication link security, so that both the quality of service and the security performance of the Internet of Vehicles are guaranteed.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
an intelligent reflection surface-assisted internet of vehicles safety calculation unloading method comprises the following steps:
step 1: constructing a RIS auxiliary MEC vehicle network communication scene, and simultaneously adding an eavesdropper model;
step 2: constructing a RIS-assisted secure communication scene;
step 3: modeling an optimization target of the RIS auxiliary MEC vehicle network scene constructed in the step 1, and constructing an objective function when the model is solved;
step 4: constructing a deep reinforcement learning algorithm model according to the optimization target provided in the step 3;
step 5: constructing a deep reinforcement learning training model according to the deep reinforcement learning algorithm model provided in the step 4, setting states, actions and rewards of the training model by combining the communication scenes and the objective functions in the step 1, the step 2 and the step 3, and carrying out model training on an optimization target of the RIS auxiliary MEC vehicle network communication scene;
step 6: and (5) obtaining a RIS auxiliary MEC vehicle network decision model according to the training model in the step (5), and obtaining an optimal solution of the optimization problem, namely obtaining the vehicle networking safety calculation unloading scheme.
The specific method of the step 1 is as follows:
the BS establishes multiple communication links with vehicle users in different orthogonal sub-bands simultaneously, and the resource-constrained target vehicle can offload its computing tasks to the BS equipped with the MEC server, so as to obtain MEC computing resources, where the target vehicle obtaining computing services is expressed as:
Figure BDA0004136578610000041
/>
Wherein, user M Representing an mth target vehicle user;
an un-serviced vehicle is considered a potential eavesdropper and can be represented as:
E={Eve 1 ,Eve 2 ,…,Eve E }
wherein ,EveE Representing the E-th potential eavesdropper.
The specific method of the step 2 is as follows:
step 2.1: let the reflection coefficient of the nth element of RIS be expressed as:
Figure BDA0004136578610000051
wherein ,φn E [0,2 pi), the RIS reflection coefficient matrix is defined as:
Θ=diag([θ 12 ,...,θ N ])
by the absence of in-band interference, receive beamforming is designed by the maximum ratio combining technique, which can be expressed as:
Figure BDA0004136578610000052
wherein ,fM A beamforming vector representing an mth V2I link;
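To make Step 2.1 concrete, the following sketch builds a unit-modulus reflection matrix Θ = diag(e^{jφ_1}, ..., e^{jφ_N}) and an MRC receive beamformer f = h / ||h||; the dimensions (N = 16 elements, K = 4 antennas) and the random channel are illustrative placeholders, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 16, 4  # illustrative: N RIS elements, K BS antennas

# Reflection coefficients theta_n = e^{j*phi_n}, phi_n in [0, 2*pi)
phi = rng.uniform(0.0, 2 * np.pi, N)
Theta = np.diag(np.exp(1j * phi))

# Unit-modulus constraint: |theta_n| = 1 for every element
assert np.allclose(np.abs(np.diag(Theta)), 1.0)

# MRC receive beamforming for one V2I link: f_m = h / ||h||, where h is
# the effective (direct plus RIS-reflected) channel at the BS.
h = rng.standard_normal(K) + 1j * rng.standard_normal(K)  # placeholder channel
f_m = h / np.linalg.norm(h)
```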
step 2.2: modeling a communication channel;
in an MEC vehicle network, the channels include: mth V2I link
Figure BDA0004136578610000053
Link between mth target vehicle and RIS +.>
Figure BDA0004136578610000054
The link from the mth target vehicle to the e potential eavesdropper->
Figure BDA0004136578610000055
Link between RIS to the e potential eavesdropper +.>
Figure BDA0004136578610000056
RIS to BS link->
Figure BDA0004136578610000057
The RIS to BS channel obeys the Rician distribution, expressed as:
Figure BDA0004136578610000058
wherein ,κi,b Is a Rician factor, ρ is a reference distance d 0 Path loss at =1m, d i,b Is between RIS and BSDistance alpha of (a) i,b For the path LOSs index of the RIS to BS link, non-LOS component
Figure BDA0004136578610000059
Follows a complex gaussian distribution with zero mean and unit variance, the same h m,e ,h m,b ,h m,i ,h i,e Following Rician distribution, κ due to congestion urban environments and blocking effects between vehicles m,b and κm,e All are zero;
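As a minimal illustrative sketch of the Rician channel model above (not the patent's implementation): the LoS component is taken as a deterministic all-ones matrix, and the distance, path-loss exponent, and dimensions are placeholder values. Setting the Rician factor κ to zero recovers the pure-NLoS (Rayleigh) fading assumed for the vehicle-to-BS and vehicle-to-eavesdropper links:

```python
import numpy as np

rng = np.random.default_rng(1)

def rician_channel(shape, kappa, rho=1.0, d=50.0, alpha=2.5, rng=rng):
    """Sample a Rician-faded channel with path loss rho * d^{-alpha}.

    kappa is the Rician factor; kappa = 0 degenerates to pure NLoS
    (Rayleigh) fading, as assumed for the vehicle-to-BS and
    vehicle-to-eavesdropper links in a congested urban environment.
    """
    los = np.ones(shape, dtype=complex)  # deterministic LoS component (illustrative choice)
    nlos = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    fading = np.sqrt(kappa / (1 + kappa)) * los + np.sqrt(1 / (1 + kappa)) * nlos
    return np.sqrt(rho * d ** (-alpha)) * fading

H_ib = rician_channel((4, 16), kappa=3.0)  # RIS -> BS link, LoS present
h_mb = rician_channel((4,), kappa=0.0)     # vehicle -> BS link, kappa_{m,b} = 0
```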
step 2.3: modeling a signal receiving process;
the mth V2I link received signal at the BS can be expressed as:
Figure BDA00041365786100000510
wherein ,Pm Is the transmission power of the mth target vehicle s m Representing unit energy signal samples associated with a computational task, noise vector n m Can be expressed as:
n m =[n 1 ,...n K ] T
wherein ,
Figure BDA0004136578610000061
the uplink signal-to-interference-and-noise ratio SINR of the mth V2I link at BS is given by:
Figure BDA0004136578610000062
similarly, the eavesdropping signal of the mth V2I link at the ith eavesdropping vehicle is expressed as:
Figure BDA0004136578610000063
/>
wherein ,
Figure BDA0004136578610000064
the SINR of the mth V2I link at the e-th eavesdropping vehicle can be expressed as:
Figure BDA0004136578610000065
thus, the capacity of the mth V2I link and the eavesdropping capacity of the e-th eavesdropping vehicle to the mth V2I link can be expressed as:
C m =log(1+η m )
C e,m =log(1+η e,m )
in the MEC vehicle network, once the user completes the unloading process, the BS flexibly allocates the computing resources of the MEC server according to the task size, and each CPU cycle of the MEC server can process a certain number of data bits, assuming that the total computing power is ζbit/s.
The specific method of the step 3 is as follows:
step 3.1: modeling a safety process;
any non-serviced vehicle may tap any V2I link, and to protect the mission data from being tapped, the redundancy for protecting confidential information may be expressed as:
max{0,R b -R S }
wherein ,Rb For the code word rate, R S Target security rate for confidential information;
if the capacity C of an eavesdropper e Greater than R b -R S Will send a security interrupt, using capacity C b Approximate R b The secure transmission rate of the mth V2I link can thus be expressed as:
R S,m =[0,(C m -maxC e,m )] + ,e∈ε
wherein ,[x]+ =max{0,x};
The MEC service time (offload and computation time) of the mth V2I link can be expressed as:
Figure BDA0004136578610000071
wherein ,Sm The task size, ζ m Is an allocated computing resource;
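A small sketch of the secrecy-rate and service-time formulas above; treating a secrecy outage (R_{S,m} = 0) as infinite service time is an assumption made here for illustration:

```python
def secure_rate(C_m, C_e_list):
    """R_{S,m} = [C_m - max_e C_{e,m}]^+ : secrecy rate of the m-th V2I link."""
    return max(0.0, C_m - max(C_e_list))

def mec_service_time(S_m, R_sm, zeta_m):
    """Offloading plus computation time: S_m / R_{S,m} + S_m / zeta_m.

    S_m is the task size in bits, R_sm the secure transmission rate in
    bit/s, and zeta_m the allocated computing resource in bit/s (matching
    the total-capacity assumption of zeta bit/s).
    """
    if R_sm <= 0:
        return float("inf")  # secrecy outage: the task cannot be offloaded securely
    return S_m / R_sm + S_m / zeta_m

R_s = secure_rate(5.0, [1.5, 2.0])          # [5 - 2]^+ = 3.0
T = mec_service_time(6.0e6, 3.0e6, 2.0e6)   # 2 s offloading + 3 s computing = 5 s
```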
step 3.2: modeling an optimization target;
the optimization objective is to design RIS reflection coefficient matrix theta and MEC resource allocation for different calculation tasks
Figure BDA0004136578610000072
To minimize the service time, the former would affect the transmission time, the latter would determine the computation time, taking into account that the entire MEC service period is determined by the maximum service time of all V2I links, translating the above objective into the following min-max problem:
Figure BDA0004136578610000073
Figure BDA0004136578610000074
Figure BDA0004136578610000075
wherein constraint C1 represents the sum of the computing resources allocated to different target vehicles as a fixed value, and constraint C2 represents the modulus constraint of the RIS reflection coefficient as a unit modulus.
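For intuition about constraint C1: for a single snapshot with fixed secure rates, the inner resource-allocation subproblem has a simple structure, since at the optimum every link finishes at the same time T, and T can be found by bisection on the budget constraint. This illustrative solver (the function name and all numbers are assumptions) is not the patent's DRL method, which additionally optimizes Θ over time:

```python
import numpy as np

def allocate_min_max(S, R, zeta_total, iters=60):
    """Split the MEC budget zeta_total so that the maximum per-link service
    time S_m / R_m + S_m / zeta_m is minimized, for fixed secure rates R_m.

    At the optimum all links finish at a common time T, with
    zeta_m = S_m / (T - S_m / R_m); T is found by bisection on the budget
    constraint sum_m zeta_m = zeta_total (constraint C1).
    """
    S, R = np.asarray(S, float), np.asarray(R, float)
    t_off = S / R                              # per-link offloading (transmission) times
    lo = t_off.max() + 1e-12                   # T must exceed the slowest offload
    hi = t_off.max() + S.sum() / zeta_total * 10 + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        need = np.sum(S / (mid - t_off))       # resources needed to finish all by `mid`
        lo, hi = (lo, mid) if need <= zeta_total else (mid, hi)
    T = hi
    zeta = S / (T - t_off)
    return zeta, T

zeta, T = allocate_min_max(S=[4e6, 8e6], R=[2e6, 4e6], zeta_total=6e6)
```

With these numbers both links need 2 s to offload, and equalizing completion times gives T = 4 s with the budget split 2:4 Mbit/s in proportion to the task sizes.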
The specific method of the step 4 is as follows:
DDPG is a model-free, off-policy Actor-Critic algorithm, in which the Actor network predicts actions and the Critic network evaluates the future return of taking an action in the current state. Both the Actor and the Critic consist of two deep neural networks (DNNs): a training network and a target network. The training and target network parameters of the Actor are θ_a and θ_{a'}, respectively; the training and target network parameters of the Critic are θ_c and θ_{c'}, respectively.
At time slot t, the Actor training network takes S_t as input and outputs the action a_t; the Critic training network takes S_t and a_t as input and outputs the state-action function value Q^π(S_t, a_t | θ_c), which can be expressed as:
Q^π(S_t, a_t | θ_c) = E_π[R_t | S_t, a_t, π]
where E[·] denotes the expectation and π denotes the policy of the Actor training network. When enough tuples (S_t, a_t, r_t, S_{t+1}) have accumulated in the experience replay pool D, the model optimizer randomly draws a minibatch of size N_d from the pool to update the Actor and Critic training networks. The target state-action value y_k of the k-th tuple can be expressed as:
y_k = r_k + γ Q'^{π'}(S_{k+1}, π'(S_{k+1} | θ_{a'}) | θ_{c'})
where π' denotes the policy of the Actor target network.
The Critic training network is updated with the mean-square-error (MSE) loss:
L(θ_c) = (1/N_d) Σ_{k=1}^{N_d} (y_k - Q^π(S_k, a_k | θ_c))²
The Actor training network is updated with the deterministic policy gradient:
∇_{θ_a} J ≈ (1/N_d) Σ_{k=1}^{N_d} ∇_a Q^π(S_k, a | θ_c)|_{a=π(S_k|θ_a)} ∇_{θ_a} π(S_k | θ_a)
The Actor and Critic target networks are updated softly as follows:
θ_{c'} = τ_c θ_c + (1 - τ_c) θ_{c'}
θ_{a'} = τ_a θ_a + (1 - τ_a) θ_{a'}
where τ_c and τ_a are soft-update coefficients satisfying τ_c, τ_a ∈ [0, 1].
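The DDPG update rules above can be sketched with plain arrays; these helpers mirror the TD target y_k, the MSE critic loss, and the soft target-network update, with the neural networks themselves abstracted away:

```python
import numpy as np

def td_target(r_k, q_next, gamma=0.99, done=False):
    """y_k = r_k + gamma * Q'(S_{k+1}, pi'(S_{k+1})) for one sampled tuple."""
    return r_k + (0.0 if done else gamma * q_next)

def mse_loss(y, q):
    """Critic loss: mean squared TD error over the minibatch."""
    y, q = np.asarray(y, float), np.asarray(q, float)
    return np.mean((y - q) ** 2)

def soft_update(theta_target, theta_train, tau):
    """theta' <- tau * theta + (1 - tau) * theta', applied per parameter array."""
    return tau * theta_train + (1 - tau) * theta_target

y = td_target(1.0, q_next=2.0, gamma=0.9)           # 1 + 0.9 * 2 = 2.8
loss = mse_loss([2.8, 1.0], [2.8, 0.0])             # mean of (0, 1) = 0.5
theta_t = soft_update(np.zeros(3), np.ones(3), tau=0.01)
```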
The specific method in the step 5 is as follows:
step 5.1: setting a state space;
state of mth V2I link at time slot t
Figure BDA0004136578610000091
Comprising a privacy rate->
Figure BDA0004136578610000092
Residual off-load task volume- >
Figure BDA0004136578610000093
Residual calculation task quantity->
Figure BDA0004136578610000094
Occupied MEC resource amount->
Figure BDA0004136578610000095
Global channel state information->
Figure BDA0004136578610000096
It can be expressed as:
Figure BDA0004136578610000097
to sum up, the state of the mth V2I link is expressed as:
Figure BDA0004136578610000098
at time slot t, the total environment of the M V2I links can be expressed as:
Figure BDA0004136578610000099
step 5.2: setting an action space;
based on the current state S t The BS will design the RIS phase shift matrix and MEC resource allocation, and at each time slot t, the action space can be expressed as:
a t ={Θ tt }
wherein ,
Figure BDA00041365786100000910
is a computing resource allocation;
step 5.3: setting a reward function;
at time slot t, corresponding to current action a t Can be expressed as:
Figure BDA00041365786100000911
wherein ,
Figure BDA00041365786100000912
representing the secure MEC service time of the mth V2I link at time slot t, t m,1 Is the current time spent, t m,2 The estimated remaining time based on the current motion, which includes the remaining transmission time and the remaining calculation time, is three cases:
(1) All target vehicles are in the task unloading process, the residual transmission time of each target vehicle is based on the current action, and the residual calculation time of each target vehicle adopts a future meterThe policy for average allocation of computing resources to all target vehicles is calculated, i.e. ζ min
(2) Some target vehicles are in the task unloading process, other target vehicles are in the task calculating process, for the target vehicles in the task unloading process, the residual transmission time in each user unloading process is calculated based on the current action, and the calculation resources are calculated as
Figure BDA0004136578610000101
Wherein ζ is the calculated time remaining for policy estimation min The method is the minimum calculation resource of the target vehicle in the task calculation process, and for the target vehicle in the task calculation process, the residual calculation time is only estimated based on the current action;
(3) Estimating the residual calculation time of all target vehicles based on the current actions in the task calculation process of all target vehicles;
to increase the secure transmission rate, the penalty factor is expressed as:
Figure BDA0004136578610000102
if the current action can meet the security rate requirement of the mth link
Figure BDA0004136578610000103
Then v m =0, otherwise ν m =ν * ,ν * Is a parameter which can be set manually and is a negative number;
based on the setting of the reward function, the DDPG algorithm will continually learn action strategies that are directed towards reducing the maximum safe MEC service time within given constraints, and the total cumulative rewards can be expressed as:
Figure BDA0004136578610000104
where γ is the discount factor.
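A hedged sketch of the reward design above: the per-slot reward is the negative of the largest per-link secure service time plus secrecy-rate penalties, and the cumulative reward is the discounted sum. Summing the penalties over all links and the sample values are assumptions made for illustration:

```python
import numpy as np

def reward(service_times, secure_rates, rate_req, nu_star=-1.0):
    """Per-slot reward: -(max secure MEC service time) plus a penalty
    nu_star (< 0, manually set) for every link whose secrecy rate misses
    its requirement. An illustrative reading of the patent's reward.
    """
    penalty = sum(nu_star if r < req else 0.0
                  for r, req in zip(secure_rates, rate_req))
    return -max(service_times) + penalty

def discounted_return(rewards, gamma=0.95):
    """Total cumulative reward: sum_k gamma^k * r_{t+k}."""
    weights = np.cumprod([1.0] + [gamma] * (len(rewards) - 1))
    return float(sum(w * r for w, r in zip(weights, rewards)))

r = reward([3.0, 5.0], secure_rates=[2.0, 0.5], rate_req=[1.0, 1.0])  # -5 + (-1) = -6
G = discounted_return([1.0, 1.0], gamma=0.5)                          # 1 + 0.5 = 1.5
```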
The specific method of the step 6 is as follows:
step 6.1: initializing;
randomly initializing parameters theta of an Actor and Critic training network a 、θ c Parameter theta of the Actor target network a′ Initialized to θ a Parameter theta of Critic target network c′ Initialized to θ c Clearing the experience playback pool D;
step 6.2: training;
randomly initializing the positions of a target vehicle and a eavesdropping vehicle, and initializing the task quantity of the target vehicle for requesting service;
At each time slot t, the BS interacts with the dynamic environment to obtain a state S t Based on the current state, the BS obtains action a from the Actor network of the Mth V2I link t Setting a reflection coefficient matrix and MEC resource allocation for the target vehicle;
BS obtains the state S of the next time slot t+1 from the changing environment t+1 And calculates the action a being made t Rewards r obtained from the environment t
The state, action, and prize in the above process are stored as tuples (S t ,a t ,r t ,S t+1 ) And stores the tuple in the experience playback pool D while acquiring the state-action function Qpi from the Critic network (S t ,a t ∣θ c );
When there are enough tuples in the experience playback pool, N is taken from them d Updating parameters of Critic and Actor networks by using samples with the size, and after the task amounts of all target vehicles are calculated, finishing one model training, and continuously repeating the above processes until the model training converges;
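The training loop of Step 6.2 can be skeletonized as follows; the environment transition, Actor, and network updates are stand-ins (the real S_t, a_t, r_t come from the BS-vehicle dynamics described above), so only the replay-pool mechanics are concrete:

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool D storing (S_t, a_t, r_t, S_{t+1}) tuples."""

    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, n_d):
        """Draw an N_d-sized minibatch uniformly at random."""
        return random.sample(self.buf, n_d)

    def __len__(self):
        return len(self.buf)

# Skeleton of one training episode with placeholder transitions.
pool = ReplayPool()
for t in range(100):
    s, a, r, s_next = t, 0.0, -1.0, t + 1   # stand-ins for real environment tuples
    pool.store(s, a, r, s_next)
    if len(pool) >= 32:
        batch = pool.sample(32)             # minibatch for the Critic/Actor updates
```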
step 6.3: decision stage
And using the training convergence decision model in a random dynamic vehicle network scene, deciding an optimal RIS reflection coefficient matrix and MEC resource allocation in each time slot, minimizing the maximum MEC service time in the whole process, and finally obtaining the optimal solution of the optimization target.
The invention also provides a system for realizing the intelligent reflecting surface-assisted Internet-of-Vehicles secure computation offloading method, comprising:
RIS-assisted MEC vehicle network communication module: used to construct the RIS-assisted MEC vehicle network communication scene, comprising the base station and the dynamic vehicles;
RIS-assisted secure communication module: the system is used for realizing the construction of a RIS-assisted safety communication scene, and in the module, the RIS technology provides a guarantee for the safety of dynamic vehicle communication;
secure computing service optimization objective module: the method comprises the steps of constructing an optimization target for realizing a RIS auxiliary MEC vehicle network scene;
the deep reinforcement learning algorithm selection module: the method is used for realizing the construction of a deep reinforcement learning algorithm model based on an optimization target;
the deep reinforcement learning model training module: the method is used for constructing a deep reinforcement learning training model, and in the model, model training is carried out on an optimization target of a RIS auxiliary MEC vehicle network scene;
a deep reinforcement learning decision model module: the method is used for realizing an RIS auxiliary MEC vehicle network decision model, and the optimal RIS coefficient matrix and MEC resource allocation in a dynamic vehicle networking scene are obtained in the module.
The invention also provides an intelligent reflecting surface-assisted Internet-of-Vehicles secure computation offloading device, comprising:
a memory for storing a computer program;
and a processor which, when executing the computer program, implements the intelligent reflecting surface-assisted Internet-of-Vehicles secure computation offloading method.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the intelligent reflecting surface-assisted Internet-of-Vehicles secure computation offloading method.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses a deep reinforcement learning algorithm to optimize the intelligent reflecting surface's reflection coefficient matrix and the allocation of mobile edge computing resources in a dynamic scene; the proposed scheme can determine multiple continuous optimal actions in a high-dimensional continuous state space, reducing vehicle network service delay while safeguarding communication security.
2. The invention regards the base station as an intelligent agent that can make decisions according to the continuously changing surrounding state, giving it high adaptability to highly dynamic Internet-of-Vehicles scenes; computing resources can be allocated to a target vehicle as soon as it finishes task offloading, so that idle MEC resources are effectively utilized.
3. Current security solutions for Internet-of-Vehicles scenes rely on physical layer security technology alone, which has limitations. The scheme proposed by the invention combines the intelligent reflecting surface with physical layer security, solving the problem that physical layer security alone cannot resist an eavesdropper that is closer to the base station than the target user or whose channel is correlated with that of the target user.
4. The RIS auxiliary MEC vehicle network safety communication scene provided by the step 1 and the step 2 can be associated with an actual dynamic vehicle networking safety communication scene, provides a solution for the safety service problem in the actual scene, and has the advantage of higher applicability.
5. The deep reinforcement learning algorithm provided by the step 4 can solve the problem of complex high-dimensional continuous state space, can output continuous action values according to the continuous state space, and has the advantages of adapting to dynamic scenes and solving the problem of non-convexity.
In summary, compared with the prior art, the invention uses a deep reinforcement learning algorithm to solve the joint optimization of the intelligent reflecting surface and mobile edge computing in dynamic scenes, realizing secure service while reducing service delay.
Drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of an intelligent reflecting surface-assisted mobile edge computing scenario provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of a deep reinforcement learning training model according to an embodiment of the present invention.
Fig. 4 is a diagram of simulation results comparing the DDPG algorithm with other algorithms in terms of average MEC service time, MEC service success probability and average MEC service secrecy outage probability under different eavesdropping levels, provided by an embodiment of the present invention.
Fig. 5 is a diagram of simulation results comparing the DDPG algorithm with other algorithms in terms of average MEC service time and MEC service success probability under different task-size ranges of the target vehicle, provided by an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
The invention provides an intelligent reflection surface-assisted Internet of vehicles security computation offloading method, system, equipment and medium. First, the RIS-assisted MEC vehicle network scene is modeled: the base station establishes multiple communication links with vehicle users in different sub-bands simultaneously to provide high-data-rate transmission services. In the MEC scene, a resource-constrained target vehicle offloads its computing tasks to a base station (BS) equipped with an MEC server through a vehicle-to-infrastructure (V2I) link; the BS flexibly allocates MEC resources to the different task requests and then feeds the results back to the target users. RIS-assisted secure communication is then modeled: the communication channels obey the Rician distribution, all vehicles are equipped with a single omnidirectional antenna, and the BS is equipped with a uniform linear array of K antennas. The intelligent reflecting surface is described by a diagonal matrix with N reflecting elements. Since there is no in-band interference, the BS designs receive beamforming for each V2I link by maximum ratio combining (MRC). Secondly, to realize secure service in the MEC scene, an optimization problem is formulated that minimizes the maximum MEC service time by jointly designing the RIS reflection coefficient matrix and the MEC resource allocation. This problem is non-convex and is also a highly dynamic long-term decision process, so a deep reinforcement learning algorithm is adopted to solve it: the states, actions and rewards of the algorithm are designed so that parameters such as the position information and task amounts of the dynamic vehicles serve as the basis for the agent's decisions. Finally, the optimal RIS reflection coefficient matrix and MEC resource allocation are obtained through training, realizing secure, low-delay MEC service.
Fig. 1 shows the flow chart of the deep reinforcement learning-based intelligent reflecting surface-assisted Internet of vehicles security computation offloading scheme.
An intelligent reflection surface-assisted internet of vehicles safety calculation unloading method comprises the following steps:
step 1: constructing a RIS auxiliary MEC vehicle network communication scene, serving vehicles sending calculation service requests, and simultaneously adding an eavesdropper model for subsequent modeling and analysis; further, the specific method of the step 1 is as follows:
as shown in fig. 2, for the intelligent reflection-surface-assisted mobile edge computing scenario, the BS establishes multiple communication links with vehicle users in different orthogonal subbands simultaneously, a resource-constrained vehicle can offload its computing tasks to a BS equipped with an MEC server, the BS flexibly allocates MEC resources for different task requests, and then feeds back the results to the vehicle users. In the present invention, it is assumed that the time of the feedback delay is negligible with respect to the time required to satisfy the calculation task. Because of the limited resources at the BS, it is only possible to provide services to the vehicle that sent the computation service request, and the target vehicle that obtains the computation service is expressed as:
U = {User_1, User_2, ..., User_M}

wherein User_m represents the m-th target vehicle user.

An un-serviced vehicle is considered a potential eavesdropper, and the set of potential eavesdroppers can be represented as:

ε = {Eve_1, Eve_2, ..., Eve_E}

wherein Eve_e represents the e-th potential eavesdropper.
Step 2: constructing a RIS-assisted secure communication scene, and laying a foundation for a communication channel used subsequently in the invention;
further, the specific method in the step 2 is as follows:
step 2.1: let the reflection coefficient of the nth element of RIS be expressed as:
θ_n = e^{jφ_n}

wherein φ_n ∈ [0, 2π). The RIS reflection coefficient matrix is defined as:

Θ = diag([θ_1, θ_2, ..., θ_N])

Since there is no in-band interference, receive beamforming is designed by the maximum ratio combining technique and can be expressed as:

f_m = (h_{m,b} + H_{i,b} Θ h_{m,i}) / ||h_{m,b} + H_{i,b} Θ h_{m,i}||

wherein f_m represents the receive beamforming vector of the m-th V2I link, i.e. the normalized effective uplink channel.
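For illustration only (not part of the claimed method), the unit-modulus reflection matrix Θ and the MRC receive beamformer above can be sketched numerically as follows; the dimensions K = 4, N = 16 and all channel realizations are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 4, 16  # hypothetical numbers of BS antennas and RIS elements

# Unit-modulus RIS reflection coefficients theta_n = e^{j*phi_n}
phi = rng.uniform(0.0, 2.0 * np.pi, N)
Theta = np.diag(np.exp(1j * phi))

# Hypothetical channel realizations: direct link (K,), RIS-BS (K,N), vehicle-RIS (N,)
h_mb = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2.0)
H_ib = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2.0)
h_mi = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2.0)

# MRC receive beamformer: the normalized effective uplink channel
h_eff = h_mb + H_ib @ Theta @ h_mi
f_m = h_eff / np.linalg.norm(h_eff)
```

By construction every diagonal entry of Θ has unit modulus and f_m has unit norm, matching constraint C2 introduced later.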
Step 2.2: modeling a communication channel;
in an MEC vehicle network, the channels include: mth V2I link
h_{m,b} ∈ C^{K×1}, the link between the m-th target vehicle and the RIS h_{m,i} ∈ C^{N×1}, the link from the m-th target vehicle to the e-th potential eavesdropper h_{m,e} ∈ C, the link between the RIS and the e-th potential eavesdropper h_{i,e} ∈ C^{N×1}, and the RIS-to-BS link H_{i,b} ∈ C^{K×N}.

The RIS-to-BS channel obeys the Rician distribution and is expressed as:

H_{i,b} = sqrt(ρ d_{i,b}^{-α_{i,b}}) ( sqrt(κ_{i,b}/(1+κ_{i,b})) H̄_{i,b} + sqrt(1/(1+κ_{i,b})) H̃_{i,b} )

wherein κ_{i,b} is the Rician factor, ρ is the path loss at the reference distance d_0 = 1 m, d_{i,b} is the distance between the RIS and the BS, and α_{i,b} is the path-loss exponent of the RIS-to-BS link. Each element of the non-LoS component H̃_{i,b} follows a complex Gaussian distribution with zero mean and unit variance. Likewise, h_{m,e}, h_{m,b}, h_{m,i} and h_{i,e} follow the Rician distribution. Owing to the blockage effects of the crowded urban environment and of the vehicles themselves, κ_{m,b} and κ_{m,e} are both zero, i.e. these links reduce to Rayleigh fading.
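For illustration only, the Rician channel model of step 2.2 can be sampled as follows; the dimensions, Rician factor, distance and path-loss exponent are hypothetical, and the LoS component is modelled as an all-ones matrix purely for simplicity:

```python
import numpy as np

def rician_channel(K, N, kappa, rho, d, alpha, rng):
    """Sample a K x N channel following the Rician model of step 2.2.

    kappa: Rician factor; rho: path loss at the reference distance d0 = 1 m;
    d: link distance; alpha: path-loss exponent. The LoS component is a
    placeholder (all ones); kappa = 0 degenerates to Rayleigh fading.
    """
    pl = rho * d ** (-alpha)                      # large-scale path loss
    los = np.ones((K, N), dtype=complex)          # placeholder LoS component
    nlos = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2.0)
    return np.sqrt(pl) * (np.sqrt(kappa / (1.0 + kappa)) * los
                          + np.sqrt(1.0 / (1.0 + kappa)) * nlos)

rng = np.random.default_rng(1)
H = rician_channel(K=4, N=16, kappa=3.0, rho=1.0, d=50.0, alpha=2.2, rng=rng)
# kappa = 0 gives pure NLoS fading, as for the vehicle-BS/eavesdropper links
h_rayleigh = rician_channel(K=4, N=16, kappa=0.0, rho=1.0, d=50.0, alpha=2.2, rng=rng)
```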
Step 2.3: modeling a signal receiving process;
the mth V2I link received signal at the BS can be expressed as:
y_m = sqrt(P_m) (h_{m,b} + H_{i,b} Θ h_{m,i}) s_m + n_m

wherein P_m is the transmission power of the m-th target vehicle, s_m represents the unit-energy signal sample associated with the computing task, and the noise vector n_m can be expressed as:

n_m = [n_1, ..., n_K]^T

wherein n_k ~ CN(0, σ²), k = 1, ..., K.

The uplink signal-to-interference-and-noise ratio (SINR) of the m-th V2I link at the BS is given by:

η_m = P_m |f_m^H (h_{m,b} + H_{i,b} Θ h_{m,i})|² / σ²

Similarly, the eavesdropped signal of the m-th V2I link at the e-th eavesdropping vehicle is expressed as:

y_{e,m} = sqrt(P_m) (h_{m,e} + h_{i,e}^H Θ h_{m,i}) s_m + n_e

wherein n_e ~ CN(0, σ²).

The SINR of the m-th V2I link at the e-th eavesdropping vehicle can be expressed as:

η_{e,m} = P_m |h_{m,e} + h_{i,e}^H Θ h_{m,i}|² / σ²

Thus, the capacity of the m-th V2I link and the eavesdropping capacity of the e-th eavesdropping vehicle on the m-th V2I link can be expressed as:

C_m = log(1 + η_m)
C_{e,m} = log(1 + η_{e,m})
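For illustration only, the SINR and capacity expressions above can be evaluated numerically as follows; all dimensions, powers and channel draws are hypothetical, and a base-2 logarithm is assumed for the capacities:

```python
import numpy as np

rng = np.random.default_rng(2)
K, N = 4, 16                 # hypothetical array sizes
P_m, sigma2 = 0.1, 1e-3      # hypothetical transmit power and noise power

Theta = np.diag(np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, N)))
h_mb = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2.0)
H_ib = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2.0)
h_mi = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2.0)
h_me = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2.0)
h_ie = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2.0)

# With the unit-norm MRC combiner, |f^H h_eff|^2 = ||h_eff||^2
h_eff = h_mb + H_ib @ Theta @ h_mi
eta_m = P_m * np.linalg.norm(h_eff) ** 2 / sigma2                        # legitimate SINR
eta_em = P_m * np.abs(h_me + h_ie.conj() @ Theta @ h_mi) ** 2 / sigma2   # eavesdropper SINR

C_m = np.log2(1.0 + eta_m)    # capacity of the m-th V2I link
C_em = np.log2(1.0 + eta_em)  # eavesdropping capacity
```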
in the MEC vehicle network, once the user completes the offloading process, the BS flexibly allocates the computational resources of the MEC server according to the size of the task. Each CPU cycle of the MEC server can process a certain number of data bits, assuming a total computing power of ζbit/s. In order to provide stable service, the BS aims to minimize the time of the entire MEC service while ensuring task offloading security for all users.
Step 3: modeling an optimization target of the RIS auxiliary MEC vehicle network scene constructed in the step 1, constructing an objective function when the model is solved, and laying a foundation for the model solution by using deep reinforcement learning subsequently;
Further, the specific method of the step 3 is as follows:
step 3.1: modeling a safety process;
the present invention contemplates a worst case security threat where any un-serviced vehicle may eavesdrop on any V2I link. In order to protect the task data from eavesdropping, the transmitting end encodes the data and then needs to determine two code rates, namely a code rate R, before transmission b And target security rate R of confidential information S . Redundancy for protecting confidential information can therefore be expressed as:
max{0,R b -R S }
wherein ,Rb For the code word rate, R S Target privacy rate for confidential information.
If the capacity C of an eavesdropper e Greater than R b -R S A privacy interrupt is sent. In the present invention, we use the capacity C b Approximate R b . The secure transmission rate of the mth V2I link can thus be expressed as:
R S,m =[0,(C m -maxC e,m )] + ,e∈ε
wherein ,[x]+ =max{0,x}。
The MEC service time (offload and computation time) of the mth V2I link can be expressed as:
Figure BDA0004136578610000181
wherein ,Sm The task size, ζ m Is an allocated computing resource.
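As a purely numerical illustration of the two formulas above (all rates, sizes and resources are hypothetical, and the service time is assumed to take the form T_m = S_m/R_{S,m} + S_m/ζ_m with both denominators in bit/s):

```python
def secure_rate(C_m, C_eves):
    """Secure transmission rate [C_m - max_e C_{e,m}]^+ of the m-th V2I link."""
    return max(0.0, C_m - max(C_eves))

def mec_service_time(S_m, R_sm, zeta_m):
    """Offloading time plus computation time of the m-th V2I link
    (assumed form: S_m/R_{S,m} + S_m/zeta_m, both rates in bit/s)."""
    return S_m / R_sm + S_m / zeta_m

R_sm = secure_rate(C_m=5.0, C_eves=[1.2, 2.0, 0.7])   # strongest eavesdropper dominates
t = mec_service_time(S_m=6e6, R_sm=2e6, zeta_m=3e6)   # 3 s offload + 2 s compute
```

Note that the [·]^+ operator clamps the rate at zero whenever the best eavesdropper capacity exceeds the legitimate capacity.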
Step 3.2, optimizing target modeling;
the optimization objective of the invention is to design RIS reflection coefficient matrix Θ and MEC resource allocation for different calculation tasks
Figure BDA0004136578610000182
To minimize service time. The former will affect the transmission time, while the latter will determine the calculation time. Considering that the whole MEC service period is determined by the maximum service time of all V2I links, we translate the above objective into the following min-max problem:
Figure BDA0004136578610000183
Figure BDA0004136578610000184
Figure BDA0004136578610000185
Wherein constraint C1 represents the sum of the computing resources allocated to different target vehicles as a fixed value, and constraint C2 represents the modulus constraint of the RIS reflection coefficient as a unit modulus.
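For intuition on the computation-time half of this min-max problem (an illustrative special case, not the claimed joint solution): with the offloading rates held fixed, minimizing max_m S_m/ζ_m subject to Σ ζ_m = ζ is achieved by allocating resources in proportion to task size, which equalizes all compute times. The task sizes and total capacity below are hypothetical:

```python
import numpy as np

def proportional_alloc(S, zeta_total):
    """Min-max-optimal split of the MEC compute capacity for fixed offload rates.

    For compute times S_m / zeta_m with sum(zeta_m) = zeta_total, allocating
    zeta_m proportional to the task size S_m equalizes all compute times and
    therefore minimizes the maximum one.
    """
    S = np.asarray(S, dtype=float)
    return zeta_total * S / S.sum()

S = [2e6, 5e6, 3e6]                           # hypothetical task sizes (bits)
zeta = proportional_alloc(S, zeta_total=1e8)  # total capacity in bit/s
times = np.asarray(S) / zeta                  # all compute times become equal
```

Any deviation from the proportional split lowers some ζ_m below its proportional share and thereby increases that vehicle's compute time above the common value, so the maximum can only grow.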
Step 4: constructing a deep reinforcement learning algorithm model according to the optimization target provided in the step 3, laying a theoretical foundation for the actual problem to be solved, and reducing the solving difficulty of the optimization problem;
further, the specific method in the step 4 is as follows:
the joint design of the RIS reflection coefficient matrix and MEC resource allocation for the entire MEC service can be modeled as a Markov Decision Process (MDP). The process consists of a number of time periods and their specific actions, each of which affects future benefits. The optimization problem of the present invention is non-convex and a long-term decision problem with high dynamics, which is difficult to represent by the mathematical expression displayed, so the present invention employs a depth-reinforced learning (DRL) algorithm of depth deterministic strategy gradient (DDPG). The algorithm can train out proper parameters according to continuous state space, so that a desired RIS coefficient matrix and MEC resource allocation are designed and obtained, and the service time is minimized.
As shown in FIG. 3, DDPG is an algorithm of an Actor-Critic architecture without model-free, heterogeneous strategy. The Actor network is used to predict an action and the Critic network is used to evaluate future benefits of taking the action in the current state. Both the Actor network and the Critic network consist of two Deep Neural Network (DNN) networks: training a network and a target network. Training and target network parameters of the Actor network are respectively theta a and θa′ The training and target network parameters of the Critic network are respectively theta c and θc′ . DDPG deep reinforcement learning training model architecture.
At time slot t, the Actor training network takes S_t as input and outputs the action a_t; the Critic training network takes S_t and a_t as input and outputs the state-action value Q^π(S_t, a_t | θ_c), which can be expressed as:

Q^π(S_t, a_t | θ_c) = E_π[R_t | S_t, a_t, π]

wherein E[·] represents the expectation and π represents the policy of the Actor training network. When enough tuples (S_t, a_t, r_t, S_{t+1}) have accumulated in the experience replay pool D, the model optimizer randomly samples a mini-batch of size N_d from the pool to update the training networks of the Actor and the Critic. The target state-action value y_k of the k-th tuple can be expressed as:

y_k = r_k + γ Q'_{π'}(S_{k+1}, π'(S_{k+1} | θ_{a'}) | θ_{c'})

wherein π' represents the policy of the Actor target network.

The Critic training network is updated using the mean square error (MSE) loss, which can be represented by the following equations:

L(θ_c) = (1/N_d) Σ_{k=1}^{N_d} (y_k - Q^π(S_k, a_k | θ_c))²
θ_c ← θ_c - λ_c ∇_{θ_c} L(θ_c)

The Actor training network is updated using the deterministic policy gradient, which can be expressed as:

∇_{θ_a} J ≈ (1/N_d) Σ_{k=1}^{N_d} ∇_a Q^π(S_k, a | θ_c)|_{a = π(S_k | θ_a)} ∇_{θ_a} π(S_k | θ_a)
θ_a ← θ_a + λ_a ∇_{θ_a} J

wherein λ_c and λ_a are the learning rates of the Critic and Actor training networks.

The Actor and Critic target networks are updated as follows:

θ_{c'} = τ_c θ_c + (1 - τ_c) θ_{c'}
θ_{a'} = τ_a θ_a + (1 - τ_a) θ_{a'}

wherein τ_c and τ_a are soft update coefficients satisfying τ_c, τ_a ∈ [0, 1];
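The soft (Polyak) target-network update above can be sketched directly; the toy weight shapes and τ value below are arbitrary:

```python
import numpy as np

def soft_update(train_params, target_params, tau):
    """Polyak soft update theta' <- tau*theta + (1-tau)*theta', as used by DDPG
    for both the Actor (tau_a) and Critic (tau_c) target networks."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(train_params, target_params)]

theta_c = [np.ones((3, 3)), np.ones(3)]          # toy Critic training weights
theta_c_tgt = [np.zeros((3, 3)), np.zeros(3)]    # toy Critic target weights
theta_c_tgt = soft_update(theta_c, theta_c_tgt, tau=0.005)
# each target weight moves from 0.0 to 0.005, i.e. slowly toward the training weight
```

A small τ keeps the target networks slowly moving copies of the training networks, which stabilizes the bootstrapped target y_k.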
Step 5: constructing a deep reinforcement learning training model according to the deep reinforcement learning algorithm model provided in the step 4, setting the state, action and rewards of the training model by combining the communication scenes and the objective functions in the step 1, the step 2 and the step 3, carrying out model training on an optimization target of the RIS auxiliary MEC vehicle network communication scene, and laying a foundation for obtaining a decision model subsequently;
Further, the specific method in the step 5 is as follows:
step 5.1: setting a state space;
state of mth V2I link at time slot t
S_t^m comprises the secrecy rate R_{S,m}^{t-1}, the remaining offloading task amount, the remaining computing task amount, the occupied MEC resource amount, and the global channel state information H_t, where H_t collects the channels of the current slot and can be expressed as:

H_t = {h_{m,b}, h_{m,i}, h_{m,e}, h_{i,e}, H_{i,b}}

To sum up, the state of the m-th V2I link is expressed as:

S_t^m = {R_{S,m}^{t-1}, S_{m,t}^{off}, S_{m,t}^{comp}, ζ_m^{t-1}, H_t}

wherein S_{m,t}^{off} and S_{m,t}^{comp} denote the remaining offloading and computing task amounts, and ζ_m^{t-1} the occupied MEC resource amount. At time slot t, the total environment state of the M V2I links can be expressed as:

S_t = {S_t^1, S_t^2, ..., S_t^M}
step 5.2: setting an action space;
based on the current state S t The BS will design the RIS phase shift matrix and MEC resource allocation, and at each time slot t, the action space can be expressed as:
a_t = {Θ_t, ζ_t}

wherein ζ_t = [ζ_1^t, ..., ζ_M^t] is the computing resource allocation;
Step 5.3: bonus function settings
At time slot t, corresponding to current action a t Can be expressed as:
r_t = -max_m T_{S,m}^t + Σ_{m=1}^{M} ν_m

wherein T_{S,m}^t = t_{m,1} + t_{m,2} represents the secure MEC service time of the m-th V2I link at time slot t, t_{m,1} is the time already spent, and t_{m,2} is the remaining time estimated on the basis of the current action, which contains the remaining transmission time and the remaining computation time. There are three cases when estimating the remaining time:

(1) All target vehicles are in the task offloading process. The remaining transmission time of each target vehicle is calculated based on the current action, and the remaining computation time of each target vehicle is calculated under the strategy of evenly distributing the computing resources over all target vehicles, i.e. with ζ_min.

(2) Some target vehicles are in the task offloading process while others are in the task computing process. For a target vehicle in the task offloading process, the remaining transmission time of its offloading process is calculated based on the current action, and its computing resource is estimated under the same even-distribution strategy, wherein ζ_min is the minimum computing resource of the target vehicles in the task computing process. For a target vehicle in the task computing process, only the remaining computation time needs to be estimated based on the current action.

(3) All target vehicles are in the task computing process. The remaining computation time of all target vehicles is estimated based on the current action.

To encourage a higher secure transmission rate, the penalty factor is set as follows: if the current action can meet the secrecy rate requirement R_{S,m}^t ≥ R_S of the m-th link, then ν_m = 0; otherwise ν_m = ν*, wherein ν* is a manually settable negative parameter.

Based on this setting of the reward function, the DDPG algorithm continually learns, within the given constraints, action strategies directed towards reducing the maximum secure MEC service time. The total accumulated reward can be expressed as:

R_t = Σ_{k=0}^{∞} γ^k r_{t+k}

wherein γ is the discount factor;
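The discounted accumulated reward R_t can be computed for a finite episode as follows; the episode rewards below are hypothetical (negative, as the per-slot reward is dominated by the negated maximum service time):

```python
def discounted_return(rewards, gamma):
    """Total accumulated reward R_t = sum_k gamma^k * r_{t+k} for a finite episode,
    evaluated backwards for numerical simplicity."""
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R

# Toy 3-slot episode with discount factor gamma = 0.9
G = discounted_return([-1.0, -0.8, -0.5], gamma=0.9)
# equals -1.0 + 0.9*(-0.8) + 0.9**2 * (-0.5)
```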
step 6: obtaining a RIS auxiliary MEC vehicle network decision model according to the training model in the step 5, and obtaining an optimal solution of the optimization problem, namely obtaining a vehicle networking safety calculation unloading scheme;
Further, the specific method in the step 6 is as follows:
step 6.1: initializing;
Randomly initialize the parameters θ_a and θ_c of the Actor and Critic training networks; initialize the parameter θ_{a'} of the Actor target network to θ_a and the parameter θ_{c'} of the Critic target network to θ_c; clear the experience replay pool D;
step 6.2: training;
Randomly initialize the positions of the target vehicles and the eavesdropping vehicles, and initialize the task amounts for which the target vehicles request service;
At each time slot t, the BS interacts with the dynamic environment to obtain the state S_t; based on the current state, the BS obtains the action a_t from the Actor networks of the M V2I links and sets the reflection coefficient matrix and the MEC resource allocation for the target vehicles;
The BS then obtains the state S_{t+1} of the next time slot from the changing environment and calculates the reward r_t obtained from the environment for the action a_t taken;
The state, action and reward of the above process are stored as the tuple (S_t, a_t, r_t, S_{t+1}) in the experience replay pool D, while the state-action value Q^π(S_t, a_t | θ_c) is obtained from the Critic network;
When there are enough tuples in the experience replay pool, N_d samples are drawn from it to update the parameters of the Critic and Actor networks. After the task amounts of all target vehicles have been computed, one episode of model training is completed. The above process is repeated until the model training converges;
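The experience replay pool D used in step 6.2 can be sketched as a bounded FIFO buffer with uniform mini-batch sampling; this is a generic illustration, not the invention's implementation, and the capacity, batch size and toy transitions are arbitrary:

```python
import random
from collections import deque

class ReplayPool:
    """Minimal experience replay pool storing tuples (S_t, a_t, r_t, S_{t+1})
    and drawing uniform mini-batches of size N_d."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest tuples are evicted first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, n_d):
        return random.sample(self.buffer, n_d)

    def __len__(self):
        return len(self.buffer)

pool = ReplayPool()
for t in range(100):                       # toy transitions
    pool.store(s=t, a=t % 5, r=-float(t), s_next=t + 1)
batch = pool.sample(n_d=32)                # mini-batch for one Actor/Critic update
```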
Step 6.3: a decision stage;
The decision model obtained at training convergence is used in a random dynamic vehicle network scene: in each time slot, the optimal RIS reflection coefficient matrix and MEC resource allocation are decided, the maximum MEC service time over the whole process is minimized, and the optimal solution of the optimization target is finally obtained.
As shown in fig. 4, the simulation results compare the DDPG algorithm with other algorithms in terms of average MEC service time, MEC service success probability and average MEC service secrecy outage probability under different eavesdropping levels. It can be seen that the DDPG-based method significantly reduces the average maximum MEC service time, improves the MEC success probability, realizes secure MEC service, and reduces the service delay.
As shown in fig. 5, the simulation results compare the DDPG algorithm with other algorithms in terms of average MEC service time and MEC service success probability for different task-size ranges of the target vehicles. The comparison shows that the DDPG-based deep reinforcement learning algorithm can solve the high-dimensional non-convex problem well, successfully learn an effective strategy in a complex and dynamic communication scene, and obtain the optimal RIS reflection coefficients and MEC resource allocation.
The invention also provides a system for realizing the intelligent reflection surface-assisted internet of vehicles safety calculation unloading method, which comprises the following steps:
RIS auxiliary MEC vehicle network communication module: used for realizing the construction of the RIS-assisted MEC vehicle network communication scene in step 1, in which the base station establishes communication links with the dynamic vehicles;
RIS-assisted secure communication module: the method comprises the steps of (1) constructing a RIS-assisted safety communication scene in the step (2), wherein the RIS technology provides guarantee for the safety of dynamic vehicle communication in the module;
secure computing service optimization objective module: the method comprises the steps of constructing an optimization target for realizing the RIS auxiliary MEC vehicle network scene in the step 3;
the deep reinforcement learning algorithm selection module: the method is used for realizing the construction of a deep reinforcement learning algorithm model based on the optimization target in the step 4;
the deep reinforcement learning model training module: the method is used for realizing the construction of the deep reinforcement learning training model in the step 5, and in the model, model training is carried out on the optimization target of the RIS auxiliary MEC vehicle network scene;
a deep reinforcement learning decision model module: the method is used for realizing the RIS auxiliary MEC vehicle network decision model in the step 6, and the optimal RIS coefficient matrix and MEC resource allocation in the dynamic vehicle networking scene are obtained in the module.
The invention also provides an intelligent reflection surface-assisted internet-of-vehicles safety calculation unloading device, which comprises:
a memory for storing a computer program;
and the processor is used for realizing the intelligent reflection surface-assisted internet of vehicles safety calculation unloading method when executing the computer program.
The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the intelligent reflection surface-assisted internet of vehicles security computation offloading method.

Claims (10)

1. An intelligent reflection surface-assisted internet of vehicles safety calculation unloading method is characterized in that: the method comprises the following steps:
step 1: constructing a RIS auxiliary MEC vehicle network communication scene, and simultaneously adding an eavesdropper model;
step 2: constructing a RIS-assisted secure communication scene;
step 3: modeling an optimization target of the RIS auxiliary MEC vehicle network scene constructed in the step 1, and constructing an objective function when the model is solved;
step 4: constructing a deep reinforcement learning algorithm model according to the optimization target provided in the step 3;
step 5: constructing a deep reinforcement learning training model according to the deep reinforcement learning algorithm model provided in the step 4, setting states, actions and rewards of the training model by combining the communication scenes and the objective functions in the step 1, the step 2 and the step 3, and carrying out model training on an optimization target of the RIS auxiliary MEC vehicle network communication scene;
Step 6: and (5) obtaining a RIS auxiliary MEC vehicle network decision model according to the training model in the step (5), and obtaining an optimal solution of the optimization problem, namely obtaining the vehicle networking safety calculation unloading scheme.
2. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method of the step 1 is as follows:
the BS establishes multiple communication links with vehicle users in different orthogonal sub-bands simultaneously, and the resource-constrained target vehicle can offload its computing tasks to the BS equipped with the MEC server, so as to obtain MEC computing resources, where the target vehicle obtaining computing services is expressed as:
U = {User_1, User_2, ..., User_M}

wherein User_m represents the m-th target vehicle user;

an un-serviced vehicle is considered a potential eavesdropper, and the set of potential eavesdroppers can be represented as:

ε = {Eve_1, Eve_2, ..., Eve_E}

wherein Eve_e represents the e-th potential eavesdropper.
3. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method of the step 2 is as follows:
step 2.1: let the reflection coefficient of the nth element of RIS be expressed as:
θ_n = e^{jφ_n}

wherein φ_n ∈ [0, 2π); the RIS reflection coefficient matrix is defined as:

Θ = diag([θ_1, θ_2, ..., θ_N])

since there is no in-band interference, receive beamforming is designed by the maximum ratio combining technique and can be expressed as:

f_m = (h_{m,b} + H_{i,b} Θ h_{m,i}) / ||h_{m,b} + H_{i,b} Θ h_{m,i}||

wherein f_m represents the receive beamforming vector of the m-th V2I link;
step 2.2: modeling a communication channel;
in an MEC vehicle network, the channels include: mth V2I link
h_{m,b} ∈ C^{K×1}, the link between the m-th target vehicle and the RIS h_{m,i} ∈ C^{N×1}, the link from the m-th target vehicle to the e-th potential eavesdropper h_{m,e} ∈ C, the link between the RIS and the e-th potential eavesdropper h_{i,e} ∈ C^{N×1}, and the RIS-to-BS link H_{i,b} ∈ C^{K×N};

the RIS-to-BS channel obeys the Rician distribution and is expressed as:

H_{i,b} = sqrt(ρ d_{i,b}^{-α_{i,b}}) ( sqrt(κ_{i,b}/(1+κ_{i,b})) H̄_{i,b} + sqrt(1/(1+κ_{i,b})) H̃_{i,b} )

wherein κ_{i,b} is the Rician factor, ρ is the path loss at the reference distance d_0 = 1 m, d_{i,b} is the distance between the RIS and the BS, and α_{i,b} is the path-loss exponent of the RIS-to-BS link; each element of the non-LoS component H̃_{i,b} follows a complex Gaussian distribution with zero mean and unit variance; likewise, h_{m,e}, h_{m,b}, h_{m,i} and h_{i,e} follow the Rician distribution; owing to the blockage effects of the crowded urban environment and of the vehicles themselves, κ_{m,b} and κ_{m,e} are both zero;
step 2.3: modeling a signal receiving process;
the mth V2I link received signal at the BS can be expressed as:
y_m = sqrt(P_m) (h_{m,b} + H_{i,b} Θ h_{m,i}) s_m + n_m

wherein P_m is the transmission power of the m-th target vehicle, s_m represents the unit-energy signal sample associated with the computing task, and the noise vector n_m can be expressed as:

n_m = [n_1, ..., n_K]^T

wherein n_k ~ CN(0, σ²), k = 1, ..., K;

the uplink signal-to-interference-and-noise ratio SINR of the m-th V2I link at the BS is given by:

η_m = P_m |f_m^H (h_{m,b} + H_{i,b} Θ h_{m,i})|² / σ²

similarly, the eavesdropped signal of the m-th V2I link at the e-th eavesdropping vehicle is expressed as:

y_{e,m} = sqrt(P_m) (h_{m,e} + h_{i,e}^H Θ h_{m,i}) s_m + n_e

wherein n_e ~ CN(0, σ²);

the SINR of the m-th V2I link at the e-th eavesdropping vehicle can be expressed as:

η_{e,m} = P_m |h_{m,e} + h_{i,e}^H Θ h_{m,i}|² / σ²

thus, the capacity of the m-th V2I link and the eavesdropping capacity of the e-th eavesdropping vehicle on the m-th V2I link can be expressed as:

C_m = log(1 + η_m)
C_{e,m} = log(1 + η_{e,m})

in the MEC vehicle network, once a user completes the offloading process, the BS flexibly allocates the computing resources of the MEC server according to the task size; each CPU cycle of the MEC server can process a certain number of data bits, assuming a total computing power of ζ bit/s.
4. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method of the step 3 is as follows:
step 3.1: modeling a safety process;
any non-serviced vehicle may tap any V2I link, and to protect the mission data from being tapped, the redundancy for protecting confidential information may be expressed as:
max{0, R_b - R_S}

wherein R_b is the codeword rate and R_S is the target secrecy rate of the confidential information;

if the capacity C_e of an eavesdropper is greater than R_b - R_S, a secrecy outage occurs; the capacity C_m of the legitimate link is used to approximate R_b, so the secure transmission rate of the m-th V2I link can be expressed as:

R_{S,m} = [C_m - max_{e∈ε} C_{e,m}]^+

wherein [x]^+ = max{0, x};

the MEC service time (offloading plus computation time) of the m-th V2I link can be expressed as:

T_m = S_m / R_{S,m} + S_m / ζ_m

wherein S_m is the task size and ζ_m is the allocated computing resource;
step 3.2: modeling an optimization target;
the optimization objective is to design RIS reflection coefficient matrix theta and MEC resource allocation for different calculation tasks
ζ = [ζ_1, ..., ζ_M] so as to minimize the service time; the former affects the transmission time, while the latter determines the computation time; considering that the whole MEC service period is determined by the maximum service time over all V2I links, the above objective is translated into the following min-max problem:

min_{Θ, ζ} max_m T_m
s.t. C1: Σ_{m=1}^{M} ζ_m = ζ
C2: |θ_n| = 1, n = 1, ..., N

wherein constraint C1 states that the sum of the computing resources allocated to the different target vehicles is a fixed value ζ, and constraint C2 is the unit-modulus constraint on the RIS reflection coefficients.
5. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method of the step 4 is as follows:
DDPG is a model-free, off-policy algorithm with an Actor-Critic architecture; the Actor network is used to predict an action, and the Critic network is used to evaluate the future benefit of taking that action in the current state; both the Actor network and the Critic network consist of two deep neural networks (DNNs): a training network and a target network; the training and target network parameters of the Actor network are θ_a and θ_{a'}, respectively, and the training and target network parameters of the Critic network are θ_c and θ_{c'}, respectively;
at time slot t, the Actor training network takes S_t as input and outputs the action a_t; the Critic training network takes S_t and a_t as input and outputs the state-action value Q^π(S_t, a_t | θ_c), which can be expressed as:

Q^π(S_t, a_t | θ_c) = E_π[R_t | S_t, a_t, π]

wherein E[·] represents the expectation and π represents the policy of the Actor training network; when enough tuples (S_t, a_t, r_t, S_{t+1}) have accumulated in the experience replay pool D, the model optimizer randomly samples a mini-batch of size N_d from the pool to update the training networks of the Actor and the Critic; the target state-action value y_k of the k-th tuple can be expressed as:

y_k = r_k + γ Q'_{π'}(S_{k+1}, π'(S_{k+1} | θ_{a'}) | θ_{c'})

wherein π' represents the policy of the Actor target network;

the Critic training network is updated using the mean square error MSE loss, which can be expressed by the following equations:

L(θ_c) = (1/N_d) Σ_{k=1}^{N_d} (y_k - Q^π(S_k, a_k | θ_c))²
θ_c ← θ_c - λ_c ∇_{θ_c} L(θ_c)

the Actor training network is updated using the deterministic policy gradient, which can be expressed as:

∇_{θ_a} J ≈ (1/N_d) Σ_{k=1}^{N_d} ∇_a Q^π(S_k, a | θ_c)|_{a = π(S_k | θ_a)} ∇_{θ_a} π(S_k | θ_a)
θ_a ← θ_a + λ_a ∇_{θ_a} J

wherein λ_c and λ_a are the learning rates of the Critic and Actor training networks;

the updating of the Actor and Critic target networks is as follows:

θ_{c'} = τ_c θ_c + (1 - τ_c) θ_{c'}
θ_{a'} = τ_a θ_a + (1 - τ_a) θ_{a'}

wherein τ_c and τ_a are soft update coefficients satisfying τ_c, τ_a ∈ [0, 1].
6. The intelligent reflective surface assisted internet of vehicles security computing offload method of claim 1, wherein: the specific method in the step 5 is as follows:
Step 5.1: setting a state space;
The state of the m-th V2I link at time slot t comprises the secrecy rate R_m^{sec}(t), the remaining offloading task amount D_m^{off}(t), the remaining computation task amount D_m^{cmp}(t), the occupied MEC resource amount f_m(t), and the global channel state information G_t;

in summary, the state of the m-th V2I link is expressed as:

S_t^m = {R_m^{sec}(t), D_m^{off}(t), D_m^{cmp}(t), f_m(t), G_t}

at time slot t, the overall environment of the M V2I links can be expressed as:

S_t = {S_t^1, S_t^2, …, S_t^M}
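The per-link state can be assembled into a flat vector for the DNN input. The sketch below is illustrative only; the component names and dimensions are assumptions, since the claim gives the components in words rather than fixed shapes:

```python
import numpy as np

def link_state(sec_rate, remain_off, remain_cmp, mec_used, csi):
    """S_t^m = {secrecy rate, remaining offload amount, remaining compute
    amount, occupied MEC resources, global CSI}, flattened for the DNN."""
    return np.concatenate([[sec_rate, remain_off, remain_cmp, mec_used],
                           np.asarray(csi, dtype=float).ravel()])

def env_state(per_link_states):
    """S_t = {S_t^1, ..., S_t^M} stacked into one observation vector."""
    return np.concatenate(per_link_states)

csi = np.ones(6)                    # assumed flattened channel gains, length 6
s_m = link_state(1.2, 3.0e6, 5.0e8, 0.4, csi)
S_t = env_state([s_m, s_m])         # M = 2 links
```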
step 5.2: setting an action space;
Based on the current state S_t, the BS designs the RIS phase-shift matrix and the MEC resource allocation; at each time slot t, the action can be expressed as:

a_t = {Θ_t, f_t}

where Θ_t is the RIS phase-shift matrix and f_t = {f_1(t), …, f_M(t)} is the computing resource allocation;
step 5.3: setting a reward function;
At time slot t, the reward corresponding to the current action a_t can be expressed as:

r_t = −max_m T_m^{sec}(t) + Σ_{m=1}^{M} ν_m

where T_m^{sec}(t) = t_{m,1} + t_{m,2} represents the secure MEC service time of the m-th V2I link at time slot t, t_{m,1} is the time already spent, and t_{m,2} is the remaining time estimated from the current action, comprising the remaining transmission time and the remaining computation time; the estimation distinguishes three cases:
(1) All target vehicles are in the task offloading process: the remaining transmission time of each target vehicle is computed based on the current action, and the remaining computation time of each target vehicle is estimated by the strategy of evenly distributing the computing resources among all target vehicles, i.e. each vehicle obtaining ζ_min;
(2) Some target vehicles are in the task offloading process and the others are in the task computation process: for a target vehicle in the task offloading process, the remaining transmission time is computed based on the current action, and the remaining computation time is estimated assuming it obtains an equal share of the computing resources left after reserving ζ_min for each vehicle in the computation process, where ζ_min is the minimum computing resource of a target vehicle in the task computation process; for a target vehicle in the task computation process, the remaining computation time is estimated directly based on the current action;
(3) All target vehicles are in the task computation process: the remaining computation time of all target vehicles is estimated based on the current action;
To encourage a higher secure transmission rate, a penalty factor ν_m is introduced: if the current action satisfies the secrecy rate requirement of the m-th link, i.e. R_m^{sec}(t) ≥ R_min, then ν_m = 0; otherwise ν_m = ν*, where ν* is a manually set negative parameter;
Based on this reward setting, the DDPG algorithm continually learns action strategies directed toward reducing the maximum secure MEC service time within the given constraints; the total cumulative reward can be expressed as:

R_t = Σ_{i=t}^{∞} γ^{i−t} r_i
where γ is the discount factor.
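A direct reading of the reward described above, negative maximum secure MEC service time plus secrecy-rate penalties, can be sketched as follows; the exact functional form in the claim is an equation image, so this combination is an assumption:

```python
def reward(service_times, secrecy_rates, r_min, nu_star=-1.0):
    """r_t = -max_m T_m^sec(t) + sum_m nu_m, with nu_m = 0 when the m-th
    link meets the secrecy-rate requirement and nu_m = nu* (< 0) otherwise."""
    penalty = sum(0.0 if r >= r_min else nu_star for r in secrecy_rates)
    return -max(service_times) + penalty

r_ok = reward([2.0, 3.5], [1.5, 1.2], r_min=1.0)   # all links satisfy R_min
r_bad = reward([2.0, 3.5], [0.5, 1.2], r_min=1.0)  # one link penalized
```

Because the reward is the negative of the worst-case service time, maximizing the cumulative reward is equivalent to minimizing the maximum secure MEC service time, and the ν* penalty steers the policy away from actions that violate the secrecy-rate constraint.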
7. The intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading method of claim 1, wherein the specific method of step 6 is as follows:
Step 6.1: initializing;
Randomly initialize the parameters θ_a and θ_c of the Actor and Critic training networks, initialize the parameter θ_{a′} of the Actor target network to θ_a and the parameter θ_{c′} of the Critic target network to θ_c, and clear the experience replay pool D;
step 6.2: training;
Randomly initialize the positions of the target vehicles and the eavesdropping vehicle, and initialize the task amounts of the services requested by the target vehicles;
At each time slot t, the BS interacts with the dynamic environment to obtain the state S_t; based on the current state, the BS obtains the action a_t from the Actor network for the M V2I links, setting the reflection coefficient matrix and the MEC resource allocation for the target vehicles;
The BS obtains the state S_{t+1} of the next time slot t+1 from the changing environment and computes the reward r_t obtained from the environment for the action a_t taken;
The state, action, and reward in the above process are stored as a tuple (S_t, a_t, r_t, S_{t+1}) in the experience replay pool D, while the state-action function value Q^π(S_t, a_t∣θ_c) is obtained from the Critic network;
When there are enough tuples in the experience replay pool, N_d samples are drawn from it to update the parameters of the Critic and Actor networks; after the task amounts of all target vehicles have been fully computed, one round of model training is completed, and the above process is repeated until the model training converges;
Step 6.3: a decision stage;
The converged decision model is used in random dynamic vehicular network scenarios to decide the optimal RIS reflection coefficient matrix and MEC resource allocation in each time slot, minimizing the maximum MEC service time over the whole process and finally obtaining the optimal solution of the optimization objective.
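The training procedure of steps 6.1 and 6.2 relies on an experience replay pool holding (S_t, a_t, r_t, S_{t+1}) tuples; a minimal sketch, in which the capacity and sampling API are illustrative assumptions:

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool D of (S_t, a_t, r_t, S_{t+1}) tuples."""

    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)    # oldest tuples evicted when full

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, n_d):
        return random.sample(self.buf, n_d)  # uniform mini-batch of size N_d

    def __len__(self):
        return len(self.buf)

D = ReplayPool()
for t in range(64):
    D.push(t, t % 3, -float(t), t + 1)       # dummy transitions
batch = D.sample(16)
```

Uniform sampling from the pool breaks the temporal correlation between consecutive time slots, which is what makes the mini-batch updates of the Actor and Critic training networks stable.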
8. A system for implementing the intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading method as defined in any one of claims 1 to 7, characterized by comprising:
an RIS-assisted MEC vehicular network communication module, used for constructing the RIS-assisted MEC vehicular network communication scenario, which comprises a base station and dynamic vehicles;
an RIS-assisted secure communication module, used for constructing the RIS-assisted secure communication scenario; in this module, the RIS technique safeguards the security of dynamic vehicle communication;
a secure computing service optimization objective module, used for constructing the optimization objective of the RIS-assisted MEC vehicular network scenario;
a deep reinforcement learning algorithm selection module, used for constructing the deep reinforcement learning algorithm model based on the optimization objective;
a deep reinforcement learning model training module, used for constructing the deep reinforcement learning training model; in this model, model training is performed on the optimization objective of the RIS-assisted MEC vehicular network scenario;
a deep reinforcement learning decision model module, used for implementing the RIS-assisted MEC vehicular network decision model; in this module, the optimal RIS coefficient matrix and MEC resource allocation in the dynamic Internet of Vehicles scenario are obtained.
9. An intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading device, characterized by comprising:
a memory for storing a computer program;
a processor for implementing the intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading method as claimed in any one of claims 1-8 when executing said computer program.
10. A computer-readable storage medium, characterized in that: the computer-readable storage medium stores a computer program which, when executed by a processor, implements the intelligent reflecting surface-assisted Internet of Vehicles secure computation offloading method.
CN202310276875.3A 2023-03-21 2023-03-21 Intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method, system, equipment and medium Pending CN116208619A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310276875.3A CN116208619A (en) 2023-03-21 2023-03-21 Intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method, system, equipment and medium

Publications (1)

Publication Number Publication Date
CN116208619A true CN116208619A (en) 2023-06-02

Family

ID=86519214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310276875.3A Pending CN116208619A (en) 2023-03-21 2023-03-21 Intelligent reflection surface-assisted Internet of vehicles safety calculation unloading method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN116208619A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116963183A (en) * 2023-07-31 2023-10-27 中国矿业大学 Mine internet of things safe unloading method assisted by intelligent reflecting surface
CN116963183B (en) * 2023-07-31 2024-03-08 中国矿业大学 Mine internet of things safe unloading method assisted by intelligent reflecting surface
CN117156494A (en) * 2023-10-31 2023-12-01 南京邮电大学 Three-terminal fusion task scheduling model and method for RIS auxiliary wireless communication
CN117156494B (en) * 2023-10-31 2024-01-19 南京邮电大学 Three-terminal fusion task scheduling model and method for RIS auxiliary wireless communication
CN118042493A (en) * 2024-04-11 2024-05-14 华东交通大学 Internet of vehicles perception communication calculation joint optimization method based on reflecting element


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination