CN114827947A

CN114827947A - Internet of Vehicles secure computation offloading and resource allocation method, computer device and terminal

Info

Publication number: CN114827947A
Application number: CN202210253563.6A
Authority: CN
Inventors: 俱莹; 曹植伟; 陈宇超; 王浩宇; 刘雷; 裴庆祺; 王励成
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2022-03-15
Filing date: 2022-03-15
Publication date: 2022-07-29

Abstract

The invention belongs to the technical field of Internet of Vehicles edge computing and discloses an Internet of Vehicles secure computation offloading and resource allocation method, a computer device and a terminal. The optimization problem is first modeled as a multi-agent sequential decision problem and solved with a reinforcement learning method. Because the DQN (deep Q-network) method suffers from overestimation, in which Q values are overestimated and performance is reduced, the DDQN (double deep Q-network) method is adopted to train the multi-agent model. The dynamic arrival process of vehicles is modeled using queuing theory so that the scene is closer to a real one. The method enables users to select a reasonable strategy that minimizes the maximum delay among all vehicles.

Description

Internet of Vehicles secure computation offloading and resource allocation method, computer device and terminal

Technical Field

The invention belongs to the technical field of Internet of Vehicles edge computing, and particularly relates to an Internet of Vehicles secure computation offloading and resource allocation method, a computer device and a terminal.

Background

With the continuous progress of technology and growing demand, the application of big data in the Internet of Vehicles causes vehicles to generate more and more delay-sensitive tasks in support of new services such as traffic flow prediction. There are two ways to handle these tasks. One is to enhance the computing power of the on-board chip so that it can process them locally. The other is to use mobile edge computing. Mobile edge computing uses the radio access network to provide the required services and cloud computing functions close to the user, creating a communication service environment with high performance, low delay and high bandwidth. It can effectively compensate for the insufficient computing capability of vehicles. However, because the wireless channel is open, the computation offloading process carries a risk of information leakage. Physical layer security techniques exploit the characteristics of the wireless channel to protect user privacy, for example through signal processing, channel coding and multi-antenna modulation. In addition, as big data is applied in Internet of Vehicles scenarios, the conflict between massive data transmission and limited spectrum resources is increasingly prominent. Emerging spectrum sharing techniques can markedly improve spectrum utilization and save spectrum resources while still meeting users' normal communication requirements.

Existing research has not yet studied the combination of physical layer security and spectrum sharing in vehicular edge computing networks. On the one hand, the high-speed movement of vehicles makes the network topology change rapidly, so conventional schemes cannot make decisions quickly enough. On the other hand, once security is taken into account, schemes for vehicular edge computing networks have difficulty meeting ultra-low delay requirements. Because high vehicle speeds and a larger number of eavesdroppers make the Internet of Vehicles scenario complex, traditional mathematical optimization methods adapt poorly to it, and the resulting complex dynamic optimization problem needs to be solved with the decision-making and learning capability of deep reinforcement learning (DRL). On this basis, a DRL-based secure offloading and resource allocation transmission scheme (SoRA) is designed for multi-user Internet of Vehicles communication scenarios. The scheme adapts quickly to complex and dynamic communication environments and reduces service delay as far as possible while ensuring the communication security of individual users.

In a practical multi-user Internet of Vehicles service scenario, multiple users may compete for the same segment of high-quality spectrum or for the same edge server computing resources, which leads to a competitive game. The problems to be solved in developing Internet of Vehicles computation offloading technology are therefore: how to combine frequency band selection and edge server selection so as to reduce the overall service delay while taking vehicle power into account; and how to adapt to the rapid changes of dynamic Internet of Vehicles scenarios, handle multiple eavesdroppers, and meet service requirements in such scenarios. Physical layer security research based on mathematical derivation has the disadvantage that it applies only to static scenes and cannot adapt to the high-speed dynamic scenes of the Internet of Vehicles. Existing research on Internet of Vehicles physical layer security still considers only one static eavesdropper, whereas several eavesdroppers are often present in practice. Solving these problems would allow communication in vehicular edge computing to be secured.

Through the above analysis, the problems and defects of the prior art are as follows:

(1) Traditional optimization methods are difficult to adapt to complex, dynamic Internet of Vehicles computation offloading scenarios and cannot meet the requirements of highly reliable, high-speed data transmission services.

(2) Because vehicle nodes behave selfishly, some vehicles tend to minimize their own service delay, which pushes the service delay of the remaining vehicles beyond the tolerable range and degrades the overall performance of the system.

(3) The prior art considers only a single static eavesdropper, whereas an actual scene typically contains multiple dynamic eavesdroppers, so the security of the computation offloading process is difficult to guarantee.

Disclosure of Invention

In view of the problems in the prior art, the invention provides an Internet of Vehicles secure computation offloading and resource allocation method, a computer device and a terminal.

The invention is realized as follows. An Internet of Vehicles secure computation offloading and resource allocation method is provided which can effectively overcome the limitation to static scenes and achieve real-time decision making for secure computation offloading in the Internet of Vehicles. First, the dynamic arrival process of vehicles is modeled using queuing theory, and multiple dynamic eavesdroppers are present in the scene. Second, the optimization problem is modeled as a multi-agent sequential decision problem and solved by multi-agent training with the DDQN reinforcement learning method, so that a user can select a reasonable strategy that minimizes the maximum service delay among all vehicles while offloading securely. The invention promotes cooperation among Internet of Vehicles nodes, meets the communication requirements of ultra-low delay, high security and high reliability, and adapts to dynamic Internet of Vehicles scenarios. Further, the Internet of Vehicles secure computation offloading and resource allocation method comprises the following steps:

First, an Internet of Vehicles communication scene with a single base station is constructed, where the base station is connected to an edge server that provides the computation offloading service. This sets up the dynamic Internet of Vehicles scene used by the method and facilitates the subsequent modeling and analysis.

Second, the communication process is modeled for the transmission processes of the different links. This lays the foundation for the communication channels used later in the method.

Third, the optimization objective is modeled using the Wyner wiretap coding scheme. This lays the foundation for calculating the eavesdropping rate of an eavesdropper and for training the model.

Fourth, the base station obtains the state information at the current moment by interacting with the surrounding environment. The state information comprises the information of the target vehicle (vehicle speed, position coordinates and current state), the frequency band allocation information and the resource allocation information on the edge server. This information serves as the state input of deep reinforcement learning, which uses the DDQN algorithm. This step determines the state space for the subsequent training of the agents.

Fifth, the vehicle selects a corresponding action based on the current state information. The action in the current state consists of power selection, frequency band selection and selection of an edge server computing resource block. This step determines the action space of the agent.

Sixth, a reward mechanism and the structure of the neural network are designed according to the model and strategy constructed in the second step. The reward mechanism is designed so that the user vehicles in the system cooperate better to minimize the maximum delay.

Seventh, the DDQN neural network designed in the sixth step extracts the input features of the current state and fits the Q function to obtain the Q values of the different actions in each input state. The action in the current state is selected according to an ε-greedy strategy, and the neural network parameters are trained and updated using the reward mechanism of the sixth step.

Eighth, using the trained DDQN network, the state information of the current environment is taken as the state input, the sequence of Q values of the corresponding actions in the current state is output, and the action with the maximum Q value is taken as the strategy for selecting the power, frequency band and edge server computing resources of the target vehicle in the current state. This provides guarantees for the convergence and the convergence time of the model.

Further, the process of the first step is as follows: the arrival process of vehicles is modeled using queuing theory. The inter-arrival time t of vehicles obeys a negative exponential distribution, with probability density function:

f(t) = λ e^{-λt}, t ≥ 0,

where λ is the vehicle arrival rate and t is the time interval between vehicle arrivals.
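The arrival model of the first step can be illustrated with a short Python sketch. It draws exponentially distributed inter-arrival times and accumulates them into arrival instants. The function name `sample_arrival_times`, the time horizon and the seed are assumptions made for illustration; only the exponential distribution and the rate λ come from the text.

```python
import random


def sample_arrival_times(rate, horizon, seed=None):
    """Return vehicle arrival instants in [0, horizon).

    Inter-arrival times are drawn from the negative exponential
    distribution with density f(t) = rate * exp(-rate * t).
    """
    rng = random.Random(seed)
    arrivals = []
    t = rng.expovariate(rate)
    while t < horizon:
        arrivals.append(t)
        t += rng.expovariate(rate)
    return arrivals


# Hypothetical run: rate 0.5 vehicles per unit time over a long horizon.
arrivals = sample_arrival_times(rate=0.5, horizon=10_000.0, seed=0)
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
mean_gap = sum(gaps) / len(gaps)  # expected to be close to 1 / rate
```

With a long horizon, the mean gap should approach 1/λ, which gives a quick sanity check on the simulated traffic.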

Further, the process of the second step is as follows:

2.1 During communication, the channel gain g_k between the transmitting end and the receiving end consists of a large-scale fading component α_k and a small-scale fading component h_k:

g_k = α_k h_k;

2.2 The large-scale fading α_k consists of path loss and shadow fading. The path loss of V2V links is divided into LOS and NLOS cases. In the LOS case, the path loss PL_los(d) is determined by the following quantities: f_c is the carrier frequency, d is the distance, d_BP is the effective distance, and h_0 and h_1 are the heights of the vehicles. The path loss in the NLOS case is:

PL_Nlos(d_1, d_2) = PL_los(d_1) + 20 - 12.5 n_j + 10 n_j log_10(d_2) + 3 log_10(f_c/5)

where n_j = max(2.8 - 0.0024 d_1, 1.84), and d_1 and d_2 represent the length and width of each road grid in a Manhattan grid layout;

The shadow fading of V2V links is updated using D, the updated distance matrix, and D_corr, which is 10 on a general urban road; N_S(n) is an M × M matrix of normally distributed entries with mean 0 and variance 1;

The path loss of V2I links is PL_V2I = a + b log_10(R), where R represents the distance between the vehicle and the base station, and a and b are scene-dependent path loss parameters. The shadow fading of V2I links is updated using D_i, the updated distance matrix of the i-th vehicle user, and D_corr, which is 50; R is an M × M matrix whose diagonal elements are k and whose remaining elements are k/2, and N_i(n) is an M × 1 matrix for the i-th vehicle user whose entries are normally distributed with mean 0 and variance 1;

2.3 The rate of the offloading link from the k-th vehicle user to the base station over the m-th sub-channel is determined by W, the bandwidth of the channel, and by the signal-to-interference-plus-noise ratio of the link. The ratio is defined by the following quantities: the transmit power from the k-th vehicle user to the base station; g_{k,B}[m], the channel gain from the k-th vehicle to the base station on the m-th frequency band; σ², the noise power; and the interference experienced while the k-th user vehicle offloads. The interference is in turn defined by the transmit power of the m-th vehicle performing V2V communication; g_{m,B}[m], the interference channel gain that the V2V communication of the m-th vehicle causes to V2I communication; and the indicator ρ_{k′}[m], where ρ_{k′}[m] = 1 means that the band is used and ρ_{k′}[m] = 0 means that it is not used;

2.4 The rate at which the n-th eavesdropper eavesdrops on the k-th vehicle user on the m-th sub-band is defined by the following quantities: the transmit power of the k-th vehicle user; g_{k,n}[m], the channel gain from the k-th vehicle to the eavesdropper on the m-th frequency band; σ², the noise power; and the interference experienced during eavesdropping. The interference is in turn defined by the transmit power of the m-th vehicle performing V2V communication; g_{m,n}[m], the channel gain from the m-th vehicle performing V2V communication to the eavesdropper; and the indicator ρ_{k′}[m], where ρ_{k′}[m] = 1 means that the band is used and ρ_{k′}[m] = 0 means that it is not used.
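The link-rate model of steps 2.3 and 2.4 can be sketched in Python as follows. The sketch assumes a Shannon-type rate W·log2(1 + SINR), with the SINR formed from transmit power, channel gain, noise and band-gated interference as described above; the exact expressions are not reproduced in the text, so this form, the function names `sinr` and `link_rate`, and all numeric values are assumptions for illustration.

```python
import math


def sinr(p_tx, gain, noise, interferers):
    """Signal-to-interference-plus-noise ratio of one link on one sub-band.

    interferers is an iterable of (rho, power, gain) tuples, where rho is 1
    if the interfering vehicle uses this sub-band and 0 otherwise.
    """
    interference = sum(rho * p * g for rho, p, g in interferers)
    return p_tx * gain / (noise + interference)


def link_rate(bandwidth, sinr_value):
    """Assumed Shannon-type rate W * log2(1 + SINR)."""
    return bandwidth * math.log2(1.0 + sinr_value)


# Hypothetical linear-scale values for one vehicle on one sub-band.
W = 1e6        # bandwidth in Hz
NOISE = 1e-9   # noise power
at_base_station = [(1, 0.1, 1e-9), (0, 0.1, 5e-9)]
at_eavesdropper = [(1, 0.1, 2e-9), (0, 0.1, 1e-9)]

r_bs = link_rate(W, sinr(0.2, 1e-7, NOISE, at_base_station))
r_eve = link_rate(W, sinr(0.2, 1e-8, NOISE, at_eavesdropper))
```

The same two functions serve both the offloading link and the eavesdropping link; only the channel gains and the interference seen at the receiver differ.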

Further, the process of the third step is as follows:

3.1 The secure offloading rate is expressed in terms of the offloading rate to the base station and the eavesdropping rates, where v_e represents the set of all eavesdroppers.

3.2 The transmission time from the k-th vehicle user to the base station is determined by B_k, the size of the computing task, and the secure offloading rate. The time for which the task is computed on the edge computing server is determined by the following quantities: B_k represents the size of the computing task; z_k[j] = 1 means that the j-th resource block is allocated to the k-th vehicle user, and z_k[j] = 0 means that it is not; N_{c,j} indicates the total number of edge server processing cores; u_E represents the processing rate of each core. The total delay is the sum of the transmission time and the computation time.

3.3 The objective is to minimize the maximum service delay among all vehicles, subject to constraints C_1 to C_5, wherein N_u represents the total number of served vehicles, N_b represents the number of edge server resource blocks, N_c represents the total number of processing cores, and hence the processing capability, of the MEC server, and N_p represents the number of selectable vehicle power levels. A binary power indicator equals 1 when the k-th vehicle user selects the i-th power as its transmit power and 0 otherwise.

C_1 ensures that the total number of processing cores does not exceed the number of cores of the edge server. The three constraints C_2, C_3 and C_4 ensure that each vehicle user can select only one frequency band, one transmit power and one computing resource block. C_5 specifies that the decision variables of the optimization objective are binary variables.
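The delay model of the third step can be sketched in Python as follows. The sketch assumes the usual wiretap-channel form in which the secure rate is the offloading rate minus the rate of the strongest eavesdropper, floored at zero, and a computation time equal to the task size divided by the aggregate core rate. Both forms, the function names and the numbers are assumptions, since the corresponding expressions are not reproduced in the text.

```python
def secure_rate(rate_to_bs, eavesdropper_rates):
    """Assumed secrecy rate against the strongest eavesdropper, floored at 0."""
    worst = max(eavesdropper_rates, default=0.0)
    return max(rate_to_bs - worst, 0.0)


def service_delay(task_bits, rate_s, cores, core_rate):
    """Transmission time plus edge computation time.

    Returns infinity when no positive secure rate is available.
    """
    if rate_s <= 0.0:
        return float("inf")
    return task_bits / rate_s + task_bits / (cores * core_rate)


def worst_case_delay(delays):
    """Quantity the min-max objective tries to reduce."""
    return max(delays)


# Hypothetical example with two vehicles.
d1 = service_delay(1e6, secure_rate(4e6, [1e6, 1.5e6]), cores=2, core_rate=1e6)
d2 = service_delay(1e6, secure_rate(3e6, [1e6]), cores=4, core_rate=1e6)
objective = worst_case_delay([d1, d2])
```

Returning an infinite delay when the secure rate is zero makes an insecure choice the worst possible outcome under a min-max objective.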

Further, the process of the fifth step is as follows:

5.1 The action space can be represented by a three-dimensional coordinate, where the x axis represents the frequency band selection, the y axis represents the vehicle transmit power selection, and the z axis represents the selection of a computing resource block on the edge server. If there are N_a choices of frequency band, N_p choices of vehicle power and N_b choices of edge server resource block, the number of possible actions for any vehicle requiring service is N_a × N_b × N_p;

5.2 An ε-greedy strategy is adopted to balance exploitation and exploration during training. At time t, the base station selects the action with the maximum Q value with probability 1 - ε and selects a random action from the action space A with probability ε.
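Because a Q network outputs one value per action, the three-dimensional action of step 5.1 is usually flattened into a single index. The following Python sketch shows one possible mapping between the (band, power, resource block) triple and a flat index; the function names and the sizes N_A = 4, N_P = 3, N_B = 2 are hypothetical.

```python
def encode_action(band, power, block, n_p, n_b):
    """Map (band, power level, resource block) indices to one flat index."""
    return (band * n_p + power) * n_b + block


def decode_action(index, n_p, n_b):
    """Inverse of encode_action."""
    band_power, block = divmod(index, n_b)
    band, power = divmod(band_power, n_p)
    return band, power, block


# Hypothetical sizes of the three selection dimensions.
N_A, N_P, N_B = 4, 3, 2
num_actions = N_A * N_P * N_B  # size of the Q network output layer
```

Any bijection would work; this row-major layout simply keeps the decoding to two `divmod` calls.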

Further, the process of the sixth step is as follows:

6.1 The rewards are divided into N_w grades according to the service delay;

6.2 When the computation offloading rate is too low, the delay is large and the reward is 0.
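A graded reward of the kind described in the sixth step could look like the following Python sketch. The text specifies neither the delay thresholds nor the reward values, so the thresholds, the rewards and the function name `graded_reward` are purely hypothetical; the only properties taken from the text are that the reward is graded by delay and is 0 when the delay is large.

```python
import bisect


def graded_reward(delay, thresholds, rewards):
    """Return the reward grade for a service delay.

    thresholds must be ascending, and rewards must hold one more entry than
    thresholds; the last entry is the reward for the largest delays.
    """
    if len(rewards) != len(thresholds) + 1:
        raise ValueError("rewards must have len(thresholds) + 1 entries")
    return rewards[bisect.bisect_left(thresholds, delay)]


# Hypothetical grading with N_w = 4 grades (delays in seconds).
THRESHOLDS = [0.5, 1.0, 2.0]
REWARDS = [3.0, 2.0, 1.0, 0.0]
```

For example, `graded_reward(0.7, THRESHOLDS, REWARDS)` falls in the second grade, while any delay above the last threshold, including an infinite one, yields 0.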

Further, the neural network training process of the seventh step is as follows:

7.1 Initialize the environment information and the Q network parameters, and generate vehicle operation data;

7.2 In each training round, update and acquire the current vehicle positions and the environment state, and reset the frequency band and power selection and the edge server resource allocation strategy;

7.3 Select an action for the target vehicle according to the current state information and the ε-greedy algorithm, namely a combination of frequency band selection, vehicle power and edge server resource allocation, and update the environment information;

7.4 Obtain the action combinations of all target vehicles, and from them the capacity-related reward value r_{c,i} and the returned reward value r_t;

7.5 Store the state, action, reward and next state at time t as a sample in the experience pool;

7.6 When the number of samples in the experience pool is large enough, start training the model: randomly draw mini-batches of samples (s_t, a_t, r_t, s_{t+1}) from the experience pool, train the network parameters and update the target network weights.
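The experience pool used in steps 7.5 and 7.6 can be sketched as a fixed-capacity buffer with uniform random sampling. The class name `ExperiencePool`, its methods and the capacity are assumptions made for illustration; the text only states that (s_t, a_t, r_t, s_{t+1}) samples are stored and drawn at random in mini-batches.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity store of (state, action, reward, next_state) samples."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)  # oldest samples drop out first
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def ready(self, batch_size):
        """True once enough samples exist to draw a mini-batch."""
        return len(self._buffer) >= batch_size

    def sample(self, batch_size):
        """Draw a mini-batch uniformly at random without replacement."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


# Hypothetical usage with integer placeholders for states.
pool = ExperiencePool(capacity=5, seed=0)
for step in range(8):
    pool.store(step, 0, 1.0, step + 1)
batch = pool.sample(3) if pool.ready(3) else []
```

Sampling at random rather than in order breaks the correlation between consecutive transitions, which is the usual reason for using such a pool.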

Another object of the present invention is to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the Internet of Vehicles secure computation offloading and resource allocation method.

Another object of the present invention is to provide an information data processing terminal configured to execute the steps of the Internet of Vehicles secure computation offloading and resource allocation method.

In combination with the above technical solutions and the technical problems to be solved, the advantages and positive effects of the claimed technical solution are analyzed from the following aspects:

First, in view of the technical problems in the prior art and the difficulty of solving them, and in close combination with the claimed technical solution and the results and data obtained during research and development, the following describes how the technical solution of the invention solves these problems and the inventive technical effects obtained thereby. The specific description is as follows:

The invention can overcome the uncertainty caused by multiple eavesdroppers and by vehicle movement, and reduces service delay while using physical layer security to ensure communication security. The optimization problem is first modeled as a multi-agent sequential decision problem and solved with a reinforcement learning method. Because the DQN (deep Q-network) method suffers from overestimation, in which Q values are overestimated and performance is reduced, the DDQN (double deep Q-network) method is adopted to train the multi-agent model. The dynamic arrival process of vehicles is modeled using queuing theory so that the scene is closer to a real one. The method enables users to select a reasonable strategy that minimizes the maximum delay among all vehicles.

Second, considering the technical solution as a whole or from the product perspective, its technical effects and advantages are as follows: the invention studies the multi-user, multi-eavesdropper service problem that arises when vehicles need computation offloading and proposes a DRL-based SoRA strategy, which helps a vehicle quickly determine an optimal strategy for the current environment so as to minimize service delay. The model takes into account the high-speed movement of vehicles, competition during frequency band selection and edge server resource block selection, and interference in multi-user scenarios. Simulation results show that the proposed method reduces the overall delay of the vehicles' computation offloading service and improves communication security.

Third, as supplementary evidence of the inventiveness of the claims, the following important aspect is also presented:

The technical solution of the invention fills a technical gap in the industry at home and abroad:

The invention provides a secure computation offloading and resource allocation method that can effectively overcome the limitation to static scenes and achieve dynamic real-time decision making in the Internet of Vehicles. The invention also addresses the increase in overall network delay caused by the selfish behavior of nodes in the prior art and effectively encourages cooperation among vehicles, thereby minimizing the maximum service delay in the network system. At the same time, it ensures the security of computation offloading in the presence of multiple dynamic eavesdroppers and meets the Internet of Vehicles communication requirements of ultra-low delay, high reliability and high security. The method adapts to dynamic and complex Internet of Vehicles communication and edge computing scenarios, fills a gap in the Internet of Vehicles industry at home and abroad, and promotes the deployment of edge computing services.

Drawings

Fig. 1 is a flowchart of the Internet of Vehicles secure computation offloading and resource allocation method provided by an embodiment of the present invention.

Fig. 2 is a flowchart of an implementation of the Internet of Vehicles secure computation offloading and resource allocation method provided by an embodiment of the present invention.

Fig. 3 is a schematic diagram of a millimeter wave multi-user communication scene of the internet of vehicles according to the embodiment of the invention.

Fig. 4 is a schematic diagram of a DDQN network according to an embodiment of the present invention.

Fig. 5 is a comparison diagram of system performance and vehicle performance under different traffic patterns according to different schemes provided by the embodiment of the invention.

Fig. 6 is a schematic diagram of average connection probabilities under different capacity threshold limits according to different solutions provided in the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

I. Explanatory embodiment. To enable those skilled in the art to fully understand how the invention is implemented, this section provides an explanatory embodiment that expands on the claims.

As shown in fig. 1, the method for secure computation offloading and resource allocation in the internet of vehicles provided by the present invention includes the following steps:

S101: construct an Internet of Vehicles communication scene with a single base station, where the base station is connected to an edge server that provides the computation offloading service;

S102: model the communication process for the transmission processes of the different links;

S103: improve the security of the Internet of Vehicles edge computing network using the Wyner wiretap coding scheme, and model the optimization objective;

S104: the base station obtains the state information at the current moment by interacting with the surrounding environment; the state information comprises the information of the target vehicle (vehicle speed, position coordinates and current state), the frequency band allocation information and the resource allocation information on the edge server, and serves as the state input of deep reinforcement learning, which uses the DDQN algorithm;

S105: based on the current state information, the vehicle selects a corresponding action; the action in the current state consists of power selection, frequency band selection and selection of an edge server computing resource block;

S106: design a reward mechanism and the structure of the neural network according to the model and strategy constructed in S102;

S107: use the DDQN neural network designed in S106 to extract the input features of the current state and fit the Q function to obtain the Q values of the different actions in each input state, select the action in the current state according to an ε-greedy strategy, and train and update the neural network parameters using the reward mechanism of S106;

S108: using the trained DDQN network, take the state information of the current environment as the state input, output the sequence of Q values of the corresponding actions in the current state, and take the action with the maximum Q value as the strategy for power selection, frequency band selection and edge server computing resource selection of the target vehicle in the current state.

The process of step S101 is as follows: the arrival process of vehicles is modeled using queuing theory. The inter-arrival time t of vehicles obeys a negative exponential distribution, with probability density function:

f(t) = λ e^{-λt}, t ≥ 0,

where λ is the vehicle arrival rate.

The process at step S102 is as follows:

S2.1 During communication, the channel gain g_k between the transmitting end and the receiving end consists of a large-scale fading component α_k and a small-scale fading component h_k:

g_k = α_k h_k;

S2.2 The large-scale fading α_k consists of path loss and shadow fading. The path loss of V2V links is divided into LOS and NLOS cases. In the LOS case, the path loss PL_los(d) is determined by the following quantities: f_c is the carrier frequency, d is the distance, d_BP is the effective distance, and h_0 and h_1 are the heights of the vehicles. The path loss in the NLOS case is:

PL_Nlos(d_1, d_2) = PL_los(d_1) + 20 - 12.5 n_j + 10 n_j log_10(d_2) + 3 log_10(f_c/5)

where n_j = max(2.8 - 0.0024 d_1, 1.84), and d_1 and d_2 represent the length and width of each road grid in the Manhattan grid layout.

The shadow fading of V2V links is updated using D, the updated distance matrix, and D_corr, which is 10 on a general urban road; N_S(n) is an M × M matrix of normally distributed entries with mean 0 and variance 1.

The path loss of V2I links is PL_V2I = a + b log_10(R), where R represents the distance between the vehicle and the base station, and a and b are scene-dependent path loss parameters. The shadow fading of V2I links is updated using D_i, the updated distance matrix of the i-th vehicle user, and D_corr, which is 50; R is an M × M matrix whose diagonal elements are k and whose remaining elements are k/2, and N_i(n) is the M × 1 matrix generated for the i-th vehicle user, whose entries are normally distributed with mean 0 and variance 1.

S2.3 The rate of the offloading link from the k-th vehicle user to the base station over the m-th sub-channel is determined by W, the bandwidth of the channel, and by the signal-to-interference-plus-noise ratio of the link, which includes the interference experienced while the k-th user vehicle offloads. The interference is defined by the transmit power of the m-th vehicle performing V2V communication; g_{m,B}[m], the interference channel gain that the V2V communication of the m-th vehicle causes to V2I communication; and the indicator ρ_{k′}[m], where ρ_{k′}[m] = 1 means that the band is used and ρ_{k′}[m] = 0 means that it is not used;

S2.4 The rate at which the n-th eavesdropper eavesdrops on the k-th vehicle user on the m-th sub-band is expressed in the same way, using the channel gain from the k-th vehicle to the eavesdropper and the interference experienced during the eavesdropping process.
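The large-scale fading model of S2.2 can be sketched in Python as follows. The NLOS and V2I path loss functions follow the expressions given in the text. The shadow fading update is not reproduced in the text; the sketch assumes a first-order autoregressive update driven by the moved distance and the decorrelation distance D_corr, which is a common choice for such models and should be read as an assumption. The function names, the `std_db` scaling parameter and the assumption that f_c is in GHz are likewise illustrative.

```python
import math
import random


def nlos_path_loss(pl_los_d1, d1, d2, fc_ghz):
    """NLOS path loss built on the LOS loss at distance d1 (formula from S2.2)."""
    n_j = max(2.8 - 0.0024 * d1, 1.84)
    return (pl_los_d1 + 20.0 - 12.5 * n_j
            + 10.0 * n_j * math.log10(d2)
            + 3.0 * math.log10(fc_ghz / 5.0))


def v2i_path_loss(a, b, distance):
    """V2I path loss a + b * log10(R) with scene-dependent a and b."""
    return a + b * math.log10(distance)


def update_shadowing(previous_db, moved_distance, d_corr, std_db, rng=random):
    """ASSUMED autoregressive shadow fading update (not given in the text)."""
    rho = math.exp(-moved_distance / d_corr)
    noise = rng.gauss(0.0, std_db)
    return rho * previous_db + math.sqrt(1.0 - rho * rho) * noise


# Hypothetical values: 100 m and 10 m grid sides, 5 GHz carrier.
example_nlos = nlos_path_loss(pl_los_d1=100.0, d1=100.0, d2=10.0, fc_ghz=5.0)
```

Under the assumed update, a vehicle that has not moved keeps its previous shadowing value, and the value decorrelates as the moved distance grows relative to D_corr.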

The process at step S103 is as follows:

S3.1 The secure offloading rate is expressed in terms of the offloading rate to the base station and the eavesdropping rates, where v_e represents the set of all eavesdroppers.

S3.2 The transmission time from the k-th vehicle user to the base station is determined by B_k, the size of the computing task, and the secure offloading rate. The time for which the task is computed on the edge computing server is determined by the following quantities: B_k represents the size of the computing task; z_k[j] = 1 means that the j-th resource block is allocated to the k-th vehicle user, and z_k[j] = 0 means that it is not; N_{c,j} indicates the total number of edge server processing cores; u_E represents the processing rate of each core. The total delay is the sum of the transmission time and the computation time.

S3.3 The objective is to minimize the maximum service delay among all vehicles, subject to constraints C_1 to C_5.

C_1 ensures that the total number of processing cores does not exceed the number of cores of the edge server. The three constraints C_2, C_3 and C_4 ensure that each vehicle user can select only one frequency band, one transmit power and one computing resource block. C_5 specifies that the decision variables of the optimization objective are binary variables.
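The constraints described in S3.3 can be checked with a few lines of Python. The sketch encodes each selection as a binary indicator vector, which matches the description of C_2 to C_5, and treats C_1 as a budget on the total number of allocated cores. The function names and the example vectors are hypothetical, and the exact constraint expressions are not reproduced in the text.

```python
def is_one_hot(vector):
    """True when all entries are binary (C5) and exactly one equals 1 (C2-C4)."""
    return all(v in (0, 1) for v in vector) and sum(vector) == 1


def selection_is_feasible(band_choice, power_choice, block_choice):
    """Check that one band, one power level and one resource block are chosen."""
    return all(is_one_hot(v) for v in (band_choice, power_choice, block_choice))


def cores_within_budget(cores_per_vehicle, total_cores):
    """C1: allocated processing cores must not exceed the edge server's cores."""
    return sum(cores_per_vehicle) <= total_cores


# Hypothetical selection for one vehicle: band 2 of 4, power 0 of 3, block 1 of 2.
feasible = selection_is_feasible([0, 0, 1, 0], [1, 0, 0], [0, 1])
```

A check of this kind is mainly useful when validating an environment implementation, since a flat action index decoded into one band, one power level and one block satisfies C_2 to C_5 by construction.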

The process at step S105 is as follows:

S5.1 The action space can be represented by a three-dimensional coordinate, where the x axis represents the frequency band selection, the y axis represents the vehicle transmit power selection, and the z axis represents the selection of a computing resource block on the edge server. If there are N_a choices of frequency band, N_p choices of vehicle power and N_b choices of edge server resource block, the number of possible actions for any vehicle requiring service is N_a × N_b × N_p.

S5.2 An ε-greedy strategy is adopted to balance exploitation and exploration during training. At time t, the base station selects the action with the maximum Q value with probability 1 - ε and selects a random action from the action space A with probability ε.
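The ε-greedy rule of S5.2 can be written in a few lines of Python. The function name `epsilon_greedy` and the Q values in the example are hypothetical; the rule itself, a random action with probability ε and the highest-valued action otherwise, follows the text.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


# Hypothetical Q values for four actions.
q = [0.1, 0.7, 0.3, 0.2]
greedy_choice = epsilon_greedy(q, epsilon=0.0)
random_choice = epsilon_greedy(q, epsilon=1.0, rng=random.Random(0))
```

In practice ε is often decreased over the course of training so that early rounds explore and later rounds mostly exploit, although the text does not state a schedule.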

The procedure at step S106 is as follows:

S6.1 The rewards are divided into N_w grades according to the service delay.

S6.2 When the computation offloading rate is too low, the delay is large and the reward is 0.

The neural network training process of step S107 is as follows:

S7.1 Initialize the environment information and the Q network parameters, and generate vehicle operation data.

S7.2 In each training round, update and acquire the current vehicle positions and the environment state, and reset the frequency band and power selection and the edge server resource allocation strategy.

S7.3 Select an action for the target vehicle according to the current state information and the ε-greedy algorithm, namely a combination of frequency band selection, vehicle power and edge server resource allocation, and update the environment information.

S7.4 Obtain the action combinations of all target vehicles, and from them the capacity-related reward value r_{c,i} and the returned reward value r_t.

S7.5 Store the state, action, reward and next state at time t as a sample in the experience pool.

S7.6 When the number of samples in the experience pool is sufficient, start training the model: randomly draw mini-batches of samples (s_t, a_t, r_t, s_{t+1}) from the experience pool, train the network parameters and update the target network weights.
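The training target used in S7.6 can be illustrated with a small Python sketch. It assumes that DDQN here refers to double deep Q-learning, in which the online network chooses the next action and the target network evaluates it; this reading is suggested by the stated aim of reducing Q value overestimation but is an assumption. The function name, the discount factor and the Q values are hypothetical, and terminal states are not handled.

```python
def double_dqn_targets(rewards, next_q_online, next_q_target, gamma):
    """Compute y = r + gamma * Q_target(s', argmax_a Q_online(s', a)) per sample."""
    targets = []
    for r, q_on, q_tg in zip(rewards, next_q_online, next_q_target):
        # The online network selects the action ...
        best = max(range(len(q_on)), key=q_on.__getitem__)
        # ... and the target network supplies its value.
        targets.append(r + gamma * q_tg[best])
    return targets


# Hypothetical mini-batch of two samples with two actions each.
batch_rewards = [1.0, 0.0]
online_next = [[0.2, 0.9], [0.5, 0.1]]
target_next = [[3.0, 1.0], [2.0, 4.0]]
y = double_dqn_targets(batch_rewards, online_next, target_next, gamma=0.5)
```

In the first sample, plain deep Q-learning would take the maximum target value 3.0, whereas the double variant uses 1.0, the target network's value for the action the online network prefers. Decoupling selection from evaluation in this way is what reduces overestimation.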

II. Application embodiment. To demonstrate the inventiveness and technical value of the technical solution of the invention, this section gives an application example of the claimed technical solution on specific products or related technologies.

The method is applied and verified in a dynamic Internet of vehicles computation offloading scenario. The application example considers the communication system of a two-way crossroad, in which the inter-arrival time of vehicles on each road follows a negative exponential distribution, the vehicle arrival rate is 0.5, and the vehicle speed is 72 km/h. The scenario therefore requires the base station to quickly select edge server resource blocks, vehicle transmit power and frequency bands based on limited state information. The model training and verification analysis proposed by the invention are carried out on this application example. Figs. 5 and 6 are performance analysis diagrams of this embodiment; the multi-dimensional performance analysis verifies the effectiveness and robustness of the proposed secure computation offloading and resource allocation method and shows that it can significantly improve the overall performance of the system. The method is significant for promoting the development of Internet of vehicles and edge computing technology.

It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware part can be realized by a dedicated logic chip; the software may be stored in memory and executed by suitable instructions. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
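The vehicle arrival process of this application example (negative exponential inter-arrival times, arrival rate 0.5, speed 72 km/h) can be simulated with a short sketch. The time unit of the arrival rate is not stated above and is assumed here to be seconds; the simulation horizon of 100 s, the fixed seed and the function name are likewise illustrative assumptions.

```python
import random

ARRIVAL_RATE = 0.5            # vehicles per second (time unit assumed)
SPEED_MPS = 72 * 1000 / 3600  # 72 km/h expressed in metres per second
HORIZON = 100.0               # hypothetical simulation length in seconds

def arrival_times(rate, horizon, rng):
    """Arrival instants on [0, horizon) with exponential inter-arrival gaps."""
    times, t = [], rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

rng = random.Random(0)  # fixed seed so the sketch is repeatable
arrivals = arrival_times(ARRIVAL_RATE, HORIZON, rng)
# Distance each vehicle has driven by the end of the horizon at constant speed.
positions = [SPEED_MPS * (HORIZON - t) for t in arrivals]
```

Exponential gaps between arrivals are equivalent to a Poisson arrival stream, which is the standard queuing theory model for this kind of traffic.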

III. Evidence of the relevant effects of the embodiment. The embodiment of the invention has achieved positive effects during research, development and use, and has clear advantages over the prior art, as described below with reference to the data and diagrams obtained during testing.

Fig. 5 shows the total delay at different locations according to the present invention. Ten location points are generated at random, and the maximum delay among all vehicles is taken as the total processing delay under each scheme. The figure shows that the delay performance of the SoRA scheme is far better than that of the local computation scheme and of the scheme without frequency band sharing. Because frequency band sharing causes interference not only to the target but also to the eavesdropper, the eavesdropping rate of the eavesdropper is reduced, and the delay of the SoRA scheme ends up shorter than that of the scheme without sharing. For all random location points, the SoRA scheme is very close to the optimal scheme; whereas the optimal scheme spends a large amount of time traversing all possibilities, the DRL-based SoRA strategy adapts quickly to the characteristics of the Internet of vehicles environment, which illustrates the efficiency of the scheme.

Fig. 6 shows the average connection probability of the different schemes under different capacity thresholds. Setting different capacity thresholds reveals the performance of each scheme. The figure shows that, as the capacity threshold increases, the connection probability of the random strategy first drops sharply and then declines slowly, the connection probability of the optimal scheme remains unchanged, and the SoRA strategy performs better than the random strategy without sharing and differs little from the optimal strategy.

It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a disk, CD- or DVD-ROM, on programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.

The above description is only a specific embodiment of the present invention and is not to be construed as limiting its scope of protection, which covers all modifications, equivalents and improvements that fall within the spirit and scope of the invention as defined by the appended claims.

Claims

1. An Internet of vehicles secure computation offloading and resource allocation method, characterized in that the method first models the dynamic process of the vehicles using queuing theory, with a plurality of dynamic eavesdroppers present in the scene; second, the optimization problem is modeled as a multi-agent sequential decision problem and solved by multi-agent training with the DDQN reinforcement learning method, so that a user can select a reasonable strategy that minimizes the maximum service delay among all vehicles while offloading securely.

2. The vehicle networking security computing offloading and resource allocation method of claim 1, wherein the vehicle networking security computing offloading and resource allocation method comprises the steps of:

first, constructing an Internet of vehicles communication scenario with a single base station, the base station being connected to an edge server to provide a computation offloading service;

second, modeling the communication process for the transmission processes of the different links;

third, modeling the optimization objective using the Wyner wiretap coding scheme;

fourth, the base station obtaining the state information of the current moment from its own actions and the surrounding environment information, wherein the information of the target vehicle (including the vehicle speed, the position coordinates and the current state), the frequency band allocation information and the resource allocation information on the edge server are used as the state input of deep reinforcement learning, and the DDQN algorithm is used for the deep reinforcement learning;

fifth, the vehicle selecting a corresponding action based on the current state information, the action in the current state being power selection, frequency band selection and edge server computing resource block selection;

sixth, designing a reward mechanism and the structure of the neural network according to the model and the strategy constructed in the second step;

seventh, extracting the input features of the current state using the DDQN neural network in the fifth step, fitting a Q function to obtain the Q values of the different actions in each input state, selecting the action in the current state according to the ε-greedy strategy, and training and updating the parameters of the neural network in combination with the reward mechanism in the fifth step;

eighth, using the trained DDQN network, taking the state information of the current environment as the state input, outputting the sequence of Q values of the corresponding actions in the current state, and taking the action with the maximum Q value as the strategy for selecting the power, frequency band and edge server computing resources of the target vehicle in the current state.

3. The vehicle networking security computation offload and resource allocation method according to claim 2, wherein the process of the first step is as follows: the arrival process of the vehicles is modeled using queuing theory, the inter-arrival time t of the vehicles follows a negative exponential distribution, and the probability density function is as follows:

4. The vehicle networking security computation offload and resource allocation method of claim 2, wherein the second step is performed as follows:

2.1 The channel gain g_k between the transmitting and receiving ends during communication is composed of the large-scale fading component α_k and the small-scale fading component h_k:

g_k = α_k h_k;

shadow fading of V2V:

where D is the updated distance matrix, D_corr = 10 on a general urban road, and N_S(n) is an M × M matrix whose entries are normally distributed with mean 0 and variance 1;

wherein

represents the transmit power from the k-th vehicle user to the base station, g_{k,B}[m] represents the channel gain from the k-th vehicle to the base station on the m-th frequency band, σ² represents the noise power,

is the interference experienced during offloading by the k-th vehicle user,

represents the transmit power of the m-th vehicle performing V2V communication, g_{m,n}[m] represents the channel gain of the m-th vehicle performing V2V communication, ρ_{k′}[m] = 1 means that this band is used, and ρ_{k′}[m] = 0 means that this band is not used.

2.3 The rate of the offloading link from the k-th vehicle user to the base station over the m-th subchannel is

where W is the bandwidth of the channel and

is the signal-to-noise ratio, expressed as:

wherein

represents the transmit power from the k-th vehicle user to the base station, g_{k,B}[m] represents the channel gain from the k-th vehicle to the base station on the m-th frequency band, σ² represents the noise power,

is the interference experienced during offloading by the k-th vehicle user,

represents the transmit power of the m-th vehicle performing V2V communication, g_{m,B}[m] represents the gain of the interference channel from the m-th vehicle performing V2V communication to the V2I communication, ρ_{k′}[m] = 1 means that this band is used, and ρ_{k′}[m] = 0 means that this band is not used;

2.4 The rate at which the n-th eavesdropper eavesdrops on the k-th vehicle user on the m-th subband

is expressed as:

wherein

represents the transmit power of the k-th vehicle user, g_{k,n}[m] represents the channel gain from the k-th vehicle to the eavesdropper on the m-th band, σ² represents the noise power,

is the interference suffered during eavesdropping,

represents the transmit power of the m-th vehicle performing V2V communication, g_{m,n}[m] represents the channel gain from the m-th vehicle performing V2V communication to the eavesdropper, ρ_{k′}[m] = 1 means that this band is used, and ρ_{k′}[m] = 0 means that this band is not used.
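The link model of items 2.3 and 2.4 can be illustrated with a short sketch. It assumes the common Shannon form W·log2(1 + SINR) for the rate and a sum of ρ·P·g terms over the V2V transmitters for the interference; these forms and all function names are assumptions made for illustration and are not quoted from the claim.

```python
import math

def interference(sharing, powers, gains):
    """Sum of rho * P * g over the V2V transmitters that may reuse the band."""
    return sum(r * p * g for r, p, g in zip(sharing, powers, gains))

def sinr(power, gain, noise, interference_power):
    """Received signal power over noise plus co-channel interference."""
    return power * gain / (noise + interference_power)

def link_rate(bandwidth, sinr_value):
    """Assumed Shannon rate W * log2(1 + SINR)."""
    return bandwidth * math.log2(1.0 + sinr_value)
```

The same three functions cover both the offloading link (gains towards the base station) and the eavesdropping link (gains towards the eavesdropper); only the channel gains passed in differ.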

5. The vehicle networking security computation offload and resource allocation method according to claim 2, wherein the third step is as follows:

3.1 The secure offloading rate is expressed as

where v_e represents all eavesdroppers.

3.2 The transmission time from the k-th vehicle user to the base station is

where B_k represents the size of the computing task,

where B_k represents the size of the computing task, z_k[j] = 1 means that the j-th resource block is allocated to the k-th vehicle user, z_k[j] = 0 means that the j-th resource block is not allocated to the k-th vehicle user, N_{c,j} indicates the total number of edge server processing cores, and u_E represents the processing rate of each core; the total delay

3.3 To minimize the maximum service delay among all vehicles, the objective function is:

Subject to:

C₁:

C₂:

C₃:

C₄:

C₅:

N_c represents the total number of processing cores and the processing power of the MEC server, and N_u represents the total number of served vehicles. C₁ ensures that the total number of allocated processing cores does not exceed the number of cores of the edge server; the three constraints C₂, C₃ and C₄ ensure that each vehicle user can select only one frequency band, one transmit power and one computing resource block; C₅ specifies that the decision variables of the optimization objective are binary variables.
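The quantities in items 3.1 to 3.3 can be combined in a minimal sketch. It assumes that the secure rate is the offloading rate minus the rate of the strongest eavesdropper, floored at zero, and that the service delay is the transmission time plus the edge processing time B_k / (cores × u_E); both forms and the function names are illustrative assumptions and are not quoted from the claim.

```python
def secrecy_rate(offload_rate, eavesdrop_rates):
    """Offloading rate minus the strongest eavesdropper's rate, floored at 0."""
    return max(0.0, offload_rate - max(eavesdrop_rates, default=0.0))

def service_delay(task_size, secure_rate, cores, core_rate):
    """Transmission time plus edge processing time; infinite if nothing can be sent securely."""
    if secure_rate <= 0.0 or cores <= 0:
        return float("inf")
    return task_size / secure_rate + task_size / (cores * core_rate)

def max_delay(delays):
    """The quantity the objective minimizes: the worst delay over all served vehicles."""
    return max(delays)
```

Under these assumptions, a candidate allocation is scored by computing `service_delay` for every served vehicle and passing the results to `max_delay`.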

6. The vehicle networking security computation offload and resource allocation method according to claim 2, wherein the process of the fifth step is as follows:

5.1 the action space can be represented by a three-dimensional coordinate, wherein the x-axis represents frequency band selection, the y-axis represents vehicle transmit power selection, and the z-axis represents selection of computing resource blocks on the edge server; suppose there are N_a choices of frequency band, N_b choices of vehicle transmit power, and N_p choices of edge server resource block; the number of possible actions for any vehicle needing service is then N_a × N_b × N_p;

5.2 an ε-greedy strategy is adopted to balance exploitation and exploration during training; at time t, the base station selects the action with the largest Q value with probability 1-ε, and selects an action at random from the action space A with probability ε.

7. The vehicle networking security computation offload and resource allocation method according to claim 2, wherein the process of the sixth step is as follows:

6.1 dividing the reward into N_w levels according to the service delay;

8. The vehicle networking security computation offload and resource allocation method according to claim 2, wherein the neural network training process of the seventh step is as follows:

9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the vehicle networking security computing offloading and resource allocation method according to any one of claims 1 to 7.

10. An information data processing terminal, characterized in that the information data processing terminal is used for executing the steps of the vehicle networking security computing unloading and resource allocation method according to any one of claims 1 to 7.