CN112995951B - 5G Internet of Vehicles V2V resource allocation method using a deep deterministic policy gradient algorithm - Google Patents

5G Internet of Vehicles V2V resource allocation method using a deep deterministic policy gradient algorithm

Info

Publication number
CN112995951B
CN112995951B (application CN202110273529.0A)
Authority
CN
China
Prior art keywords
link
channel
resource allocation
user
vehicles
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110273529.0A
Other languages
Chinese (zh)
Other versions
CN112995951A (en)
Inventor
王书墨
宋晓勤
柴新越
缪娟娟
王奎宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202110273529.0A priority Critical patent/CN112995951B/en
Publication of CN112995951A publication Critical patent/CN112995951A/en
Application granted granted Critical
Publication of CN112995951B publication Critical patent/CN112995951B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/30 Services specially adapted for particular environments, situations or purposes
    • H04W 4/40 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • H04W 4/46 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for vehicle-to-vehicle communication [V2V]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 Supervisory, monitoring or testing arrangements
    • H04W 24/02 Arrangements for optimising operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 Supervisory, monitoring or testing arrangements
    • H04W 24/06 Testing, supervising or monitoring using simulated traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00 Network traffic management; Network resource management
    • H04W 28/02 Traffic management, e.g. flow control or congestion control
    • H04W 28/0215 Traffic management, e.g. flow control or congestion control based on user or device properties, e.g. MTC-capable devices
    • H04W 28/0221 Traffic management, e.g. flow control or congestion control based on user or device properties, e.g. MTC-capable devices power availability or consumption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00 Network traffic management; Network resource management
    • H04W 28/02 Traffic management, e.g. flow control or congestion control
    • H04W 28/0231 Traffic management, e.g. flow control or congestion control based on communication conditions
    • H04W 28/0236 Traffic management, e.g. flow control or congestion control based on communication conditions radio quality, e.g. interference, losses or delay
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/30 Services specially adapted for particular environments, situations or purposes
    • H04W 4/40 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • H04W 4/44 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Abstract

The invention provides a vehicle-to-vehicle (V2V) communication resource allocation method based on the deep deterministic policy gradient (DDPG) algorithm. V2V communication accesses a 5G network through network slicing, and a deep reinforcement learning optimization strategy is used to obtain the optimal joint optimization policy for V2V user channel allocation and transmission power. By selecting appropriate transmit powers and channels, V2V users reduce mutual interference among V2V links, and the total system throughput of the V2V links is maximized while the link delay constraint is satisfied. The DDPG algorithm effectively solves the joint optimization problem of V2V user channel allocation and power selection and performs stably when optimizing over a continuous action space.

Description

5G Internet of Vehicles V2V resource allocation method using a deep deterministic policy gradient algorithm
Technical Field
The invention relates to Internet of Vehicles technology, in particular to a resource allocation method for the Internet of Vehicles, and more particularly to a Vehicle-to-Vehicle (V2V) communication resource allocation method for the 5G Internet of Vehicles that adopts a Deep Deterministic Policy Gradient (DDPG) algorithm.
Background
Vehicle-to-Everything (V2X) communication is a typical application of the Internet of Things (IoT) in the field of Intelligent Transportation Systems (ITS), and refers to a ubiquitous intelligent vehicle network formed from the in-vehicle network, the Internet, and the mobile vehicle-mounted network. The Internet of Vehicles shares and exchanges data according to agreed communication protocols and data interaction standards. By sensing and cooperating with pedestrians, roadside infrastructure, vehicles, networks, and the cloud in real time, intelligent traffic management and services are realized, for example improving road safety, enhancing road condition awareness, and reducing traffic congestion.
Reasonable Internet of Vehicles resource allocation is crucial for mitigating interference, improving network efficiency, and ultimately optimizing wireless communication performance. Conventional resource allocation schemes mostly allocate resources using slowly varying large-scale fading channel information. The literature has proposed a heuristic, location-dependent uplink resource allocation scheme that features spatial resource reuse without requiring complete channel state information, thus reducing signaling overhead. Other research has developed a framework comprising vehicle grouping, multiplexed channel selection, and power control that can reduce the overall interference of V2V users to the cellular network while maximizing the sum rate or the minimum achievable rate of V2V users. However, as communication traffic grows and the required communication rates increase substantially, the fast variation of the wireless channel caused by high mobility introduces great uncertainty into resource allocation, and traditional resource allocation methods can no longer meet the high-reliability and low-delay requirements of the Internet of Vehicles.
Deep learning provides multi-layered computational models that can learn efficient data representations with multiple levels of abstraction from unstructured sources, providing a powerful data-driven approach to problems traditionally considered difficult. Compared with traditional resource allocation algorithms, resource allocation schemes based on deep reinforcement learning can better meet the high-reliability and low-delay requirements of the Internet of Vehicles. The literature has proposed a novel distributed vehicle-to-vehicle communication resource allocation mechanism based on deep reinforcement learning that can be applied to unicast and broadcast scenarios. In this distributed resource allocation mechanism, the agent, i.e., a V2V link or vehicle, can make a decision to find the best sub-band and transmission power level without waiting for global state information. However, existing V2V resource allocation algorithms based on deep reinforcement learning cannot meet the differentiated service requirements of scenarios such as high bandwidth, large capacity, ultra-high reliability, and low delay in a 5G network.
Therefore, the resource allocation method provided by the invention adopts 5G network slicing, which can provide differentiated services for different application scenarios in a 5G network. At the same time, it adopts the DDPG algorithm, which performs stably when optimizing over a continuous action space, to allocate V2V resources, takes maximizing system throughput as the optimization objective of V2V resource allocation, and achieves a good balance between complexity and performance.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the prior art, a V2V user resource allocation method based on the deep reinforcement learning DDPG algorithm is provided, in which V2V communication accesses a 5G network through network slicing. The method can realize V2V user resource allocation that maximizes system throughput at low V2V link delay, under the condition that the V2V links do not interfere with the V2I links.
The technical scheme is as follows: taking the V2V link delay into account, the aim is to maximize the throughput of the communication system through reasonable resource allocation. The 5G network slicing technique is adopted: the V2V links and the V2I links use different slices, so the V2V links do not interfere with the V2I links. With the distributed resource allocation method, the base station is not required to centrally schedule channel state information; each V2V link is treated as an agent that, in each time slot, selects a channel and transmit power based on its instantaneous state information and the information shared by its neighbors. A deep reinforcement learning model is established and optimized with the DDPG algorithm, and the optimal V2V user transmit power and channel allocation strategy is obtained from the optimized model. The invention is realized by the following technical scheme: a method for allocating V2V resources based on 5G network slices using the DDPG algorithm, comprising the following steps:
(1) communication services in the Internet of Vehicles are divided into two types, namely broadband multimedia data transmission between vehicles and roadside infrastructure (V2I) and driving-safety-related data transmission between vehicles (V2V);
(2) dividing V2I and V2V communication traffic into different slices respectively by using a 5G network slicing technology;
(3) the constructed user resource allocation system model is that K pairs of V2V users share a channel with authorized bandwidth B;
(4) by adopting a distributed resource allocation method, under the condition of considering V2V link delay, a deep reinforcement learning model is constructed with the aim of maximizing the throughput of a communication system;
(5) considering the joint optimization problem in a continuous action space, the deep reinforcement learning model is optimized with a deep deterministic policy gradient (DDPG) algorithm comprising three mechanisms: deep learning fitting, soft update, and experience replay;
(6) and obtaining the optimal V2V user transmitting power and channel allocation strategy according to the optimized deep reinforcement learning model.
Further, the step (4) comprises the following specific steps:
(4a) the state space S is specifically defined as the channel information related to resource allocation, including the instantaneous channel information G_t[m] of the V2V link on subchannel m, the interference strength I_{t-1}[m] received on subchannel m in the previous time slot, the number of times N_{t-1}[m] that subchannel m was selected by neighboring V2V links in the previous time slot, the remaining load L_t to be transmitted by the V2V user, and the remaining delay budget U_t, i.e.

s_t = {G_t, I_{t-1}, N_{t-1}, L_t, U_t}

Treating each V2V link as an agent, at each time step the V2V link selects a channel and transmit power based on the current state s_t ∈ S;
(4b) the action space A is defined as the transmit power and the selected channel, denoted as

a_t = {p_t^k, c_t^k[m]}

where p_t^k is the transmit power of the k-th V2V link user and c_t^k[m] describes the use of the m-th channel by the k-th V2V link user;
(4c) defining the reward function R: the goal of V2V resource allocation is to select a spectrum sub-band and transmit power for the V2V link that maximize the system throughput of the V2V links while meeting the delay constraint and causing little interference to other V2V links. The reward function can thus be expressed as:

r_t = λ_d · Σ_k C_v[k] − λ_p · (T_0 − U_t)

where T_0 is the maximum tolerable delay, λ_d and λ_p are the weights of the two parts, and T_0 − U_t is the time already spent on transmission; the penalty increases as the transmission time increases.
(4d) according to the established S, A and R, a deep reinforcement learning model is built on the basis of Q-learning. The evaluation function Q(s_t, a_t) represents the discounted reward obtained by performing action a_t from state s_t, and the Q-value update function is:

Q(s_t, a_t) = r_t + γ · max_{a ∈ A} Q(s_{t+1}, a)

where r_t is the instantaneous reward, γ is the discount factor, s_t is the state information of the V2V link at time t, s_{t+1} is the state after the V2V link performs a_t, and A is the action space formed by the actions a_t.
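For illustration, a small Python sketch of how one agent's state vector s_t and reward r_t of steps (4a) and (4c) could be assembled is given below; the helper names, the weight values λ_d and λ_p, and the example numbers are assumptions made for the sketch, not values fixed by the invention.

import numpy as np

def build_state(G_t, I_prev, N_prev, L_t, U_t):
    """Assemble s_t = {G_t, I_{t-1}, N_{t-1}, L_t, U_t} for one V2V agent.

    G_t, I_prev, N_prev are per-subchannel arrays of length M;
    L_t (remaining load) and U_t (remaining delay budget) are scalars.
    """
    return np.concatenate([G_t, I_prev, N_prev, [L_t, U_t]]).astype(np.float32)

def reward(capacities, U_t, T0=0.1, lambda_d=1.0, lambda_p=1.0):
    """Reward following the form in step (4c): weighted V2V sum throughput minus a delay penalty.

    capacities: C_v[k] of the V2V links; T0, lambda_d, lambda_p are illustrative
    example values, not values fixed by the patent.
    """
    return lambda_d * np.sum(capacities) - lambda_p * (T0 - U_t)

# Example usage with M = 4 subchannels (illustrative numbers only)
M = 4
s_t = build_state(np.random.rand(M), np.random.rand(M), np.zeros(M), 1e3, 0.08)
r_t = reward(np.array([2.1e6, 1.7e6]), U_t=0.08)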
Advantageous effects: the invention provides a V2V resource allocation method based on 5G network slicing that adopts the deep deterministic policy gradient algorithm. V2V communication accesses the 5G network through network slicing, a deep reinforcement learning optimization strategy is used to obtain the optimal joint optimization policy for V2V user channel allocation and transmission power, V2V users reduce mutual interference between V2V links by selecting appropriate transmit powers and allocated channels, and the system throughput of the V2V links is maximized under the link delay constraint. The DDPG algorithm effectively solves the joint optimization problem of V2V user channel allocation and power selection and performs stably when optimizing over a continuous action space.
In conclusion, under the conditions of reasonable resource allocation, low interference between V2V links, and low computational complexity, the V2V resource allocation method based on 5G network slicing that adopts the deep deterministic policy gradient algorithm excels at maximizing the throughput of the V2V system.
Drawings
FIG. 1 is a flowchart of a 5G Internet of Vehicles V2V resource allocation method using a deep deterministic policy gradient algorithm according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a V2V user resource allocation model based on 5G network slicing technology according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a deep reinforcement learning framework based on an Actor-Critic model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a V2V communication deep reinforcement learning model according to an embodiment of the present invention;
Detailed Description
The core idea of the invention is: V2V communication accesses the 5G network through network slicing; a distributed resource allocation method is adopted, each V2V link is treated as an agent, a deep reinforcement learning model is established, and the model is optimized with the DDPG algorithm. The optimal V2V user transmit power and channel allocation strategy is then obtained from the optimized deep reinforcement learning model.
The present invention is described in further detail below.
Step (1), the communication services in the Internet of Vehicles are divided into two types, namely broadband multimedia data transmission between vehicles and roadside infrastructure (V2I) and driving-safety-related data transmission between vehicles (V2V).
Step (2), V2I and V2V traffic are divided into different slices respectively using the 5G network slicing technology.
Step (3), the constructed user resource allocation system model is that K pairs of V2V users share a channel with authorized bandwidth B; the specific steps are as follows:
(3a) a V2V user resource allocation system model is established in which the system contains K pairs of V2V users (VUEs), represented by the set K = {1, 2, ..., K}; the total authorized bandwidth B is divided equally into M sub-channels of bandwidth B_0, and the set of sub-channels is denoted M = {1, 2, ..., M};
(3b) the SINR of the k-th V2V link may be expressed as:

γ_v[k] = p_k · g_k / (σ² + G_d)

where p_k is the transmit power of the k-th V2V link, σ² is the noise power, G_d = Σ_{k'≠k} p_{k'} · g̃_{k'} is the total interference power of all V2V links sharing the same RB, g_k is the channel gain of the k-th V2V link Internet-of-Vehicles user, and g̃_{k'} is the interference gain of the k'-th V2V link to the k-th V2V link. The channel capacity of the k-th V2V link may be expressed as:

C_v[k] = W · log(1 + γ_v[k])   (Expression 3)
(3c) for the k-th V2V link, its channel selection information at time t is denoted by the vector

c_t^k = [c_t^k[1], ..., c_t^k[M]], with c_t^k[m] ∈ {0, 1}

If c_t^k[m] = 1, the m-th channel is used by the k-th V2V link, while c_t^k[i] = 0 for every i ≠ m, i.e.

Σ_{m=1}^{M} c_t^k[m] ≤ 1

where K is the total number of V2V links and M is the total number of channels available in the slice accessed by the V2V links.
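For concreteness, the following Python sketch evaluates the SINR and channel capacity of step (3b) for K V2V links sharing one resource block; the bandwidth, noise power, and gain values are illustrative assumptions, and the base-2 logarithm is assumed for capacity in bit/s.

import numpy as np

def v2v_sinr_and_capacity(p, g, g_tilde, W=180e3, noise_power=1e-13):
    """Per-link SINR and capacity C_v[k] = W * log2(1 + sinr) for K V2V links
    sharing the same resource block.

    p: transmit powers of the K links (W)
    g: direct channel gains g_k
    g_tilde: K x K matrix, g_tilde[kp, k] = interference gain of link k' on link k
    W, noise_power: illustrative bandwidth and noise values (assumptions).
    """
    K = len(p)
    sinr = np.empty(K)
    for k in range(K):
        interference = sum(p[kp] * g_tilde[kp, k] for kp in range(K) if kp != k)
        sinr[k] = p[k] * g[k] / (noise_power + interference)
    capacity = W * np.log2(1.0 + sinr)
    return sinr, capacity

# Example with K = 2 links (illustrative numbers only)
sinr, cap = v2v_sinr_and_capacity(
    p=np.array([0.2, 0.2]),
    g=np.array([1e-6, 8e-7]),
    g_tilde=np.array([[0.0, 1e-8], [2e-8, 0.0]]),
)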
Step (4), a distributed resource allocation method is adopted, and a deep reinforcement learning model is constructed with the goal of maximizing the throughput of the communication system under the condition of considering the delay of the V2V link, and the method comprises the following specific steps:
(4a) the state space S is specifically defined as the observation information related to resource allocation, including the instantaneous channel information G_t[m] of the V2V link on subchannel m, the interference strength I_{t-1}[m] received on subchannel m in the previous time slot, the number of times N_{t-1}[m] that channel m was selected by neighboring V2V links in the previous time slot, the remaining V2V load L_t, and the remaining delay budget U_t, i.e.

s_t = {G_t, I_{t-1}, N_{t-1}, L_t, U_t}
(4b) the action space A is defined as the transmit power and the selected channel, denoted as

a_t = {p_t^k, c_t^k[m]}

where p_t^k is the transmit power of the k-th V2V link user and c_t^k[m] describes the use of the m-th channel by the k-th V2V link user: c_t^k[m] = 1 indicates that the m-th channel is used by the k-th V2V link user, and c_t^k[m] = 0 indicates that the m-th channel is not used by the k-th V2V link user;
(4c) defining the reward function R: the goal of V2V resource allocation is to select a spectrum sub-band and transmit power for the V2V link that maximize the system throughput of the V2V links while meeting the delay constraint and causing little interference to other V2V links. The reward function can thus be expressed as:

r_t = λ_d · Σ_k C_v[k] − λ_p · (T_0 − U_t)

where T_0 is the maximum tolerable delay, λ_d and λ_p are the weights of the two parts, and T_0 − U_t is the time already spent on transmission; the penalty increases as the transmission time increases. To obtain a good return over the long term, both the immediate return and future returns should be considered. Thus, the main goal of reinforcement learning is to find a strategy that maximizes the expected cumulative discounted return

G_t = E[ Σ_{j=0}^{∞} β^j · r_{t+j} ]

where β ∈ [0, 1] is a discount factor;
(4d) according to the established S, A and R, a deep reinforcement learning model is built on the basis of Q-learning: the evaluation function Q(s_t, a_t) represents the discounted reward obtained by performing action a_t from state s_t, and the Q-value update function is

Q(s_t, a_t) = r_t + γ · max_{a ∈ A} Q(s_{t+1}, a)

where r_t is the instantaneous reward, γ is the discount factor, s_t is the state information of the V2V link at time t, s_{t+1} is the state after the V2V link performs a_t, and A is the action space formed by the actions a_t.
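As a small numerical illustration of the discounted return in (4c) and the one-step Q target in (4d), the following Python sketch computes both from made-up reward and Q values; the discount factors are examples, not values fixed by the patent.

import numpy as np

def discounted_return(rewards, beta=0.9):
    """G_t = sum_j beta^j * r_{t+j} for a finite reward trace (beta is illustrative)."""
    return sum(beta ** j * r for j, r in enumerate(rewards))

def q_target(r_t, q_next_per_action, gamma=0.9):
    """One-step Q-learning target: r_t + gamma * max_a Q(s_{t+1}, a)."""
    return r_t + gamma * np.max(q_next_per_action)

print(discounted_return([1.0, 0.5, 0.25]))        # 1.0 + 0.9*0.5 + 0.81*0.25
print(q_target(1.0, np.array([0.2, 0.7, 0.4])))   # 1.0 + 0.9*0.7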
Step (5): to solve the V2V resource allocation problem based on 5G network slices, the action space of the deep reinforcement learning model built with each V2V link as an agent contains two variables, transmit power and channel selection, and the transmit power is allowed to vary continuously within a certain range. To solve this joint optimization problem in a high-dimensional, and in particular continuous, action space, the deep reinforcement learning model is optimized with the DDPG algorithm, which comprises three mechanisms: deep learning fitting, soft update, and experience replay.
Deep learning fitting means that the DDPG algorithm, based on the Actor-Critic framework, fits a deterministic policy a = μ(s|θ) and an action value function Q(s, a|δ) with deep neural networks parameterized by θ and δ respectively, as shown in figure 3 of the specification.
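As an illustration of this fitting, a minimal PyTorch-style sketch of the two networks μ(s|θ) and Q(s, a|δ) is given below; the layer sizes and the way the continuous power output and per-channel scores are produced are assumptions made for the example, not an architecture fixed by the patent.

import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy mu(s|theta): maps a state to a transmit power and channel scores."""
    def __init__(self, state_dim, num_channels, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.power_head = nn.Linear(hidden, 1)               # continuous, normalized transmit power
        self.channel_head = nn.Linear(hidden, num_channels)  # one score per sub-channel

    def forward(self, state):
        h = self.body(state)
        power = torch.sigmoid(self.power_head(h))             # power level in [0, 1]
        channel_scores = self.channel_head(h)                 # argmax later picks the channel
        return torch.cat([power, channel_scores], dim=-1)

class Critic(nn.Module):
    """Action-value function Q(s, a|delta)."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))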
Soft update means the following: the parameters of the action value network are updated frequently by gradient descent and are used to compute the gradient of the policy network, so the learning process of the action value network can easily become unstable; therefore the target networks are updated in a soft manner.
An online network and a target network are created for the policy network and for the action value network respectively: the online policy network μ(s|θ) and target policy network μ′(s|θ′), and the online action value network Q(s, a|δ) and target action value network Q′(s, a|δ′). The online networks are continuously updated by gradient descent during training, and the target networks are updated as

θ′ = τθ + (1 − τ)θ′   (Expression 9)
δ′ = τδ + (1 − τ)δ′   (Expression 10)
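Expressions 9 and 10 amount to a Polyak-style averaging of the online parameters into the target parameters; a minimal PyTorch-style sketch (the value of τ is illustrative, not fixed by the patent) is:

import torch

def soft_update(target_net: torch.nn.Module, online_net: torch.nn.Module, tau: float = 0.005):
    """theta' = tau * theta + (1 - tau) * theta' for every parameter pair."""
    with torch.no_grad():
        for tgt, src in zip(target_net.parameters(), online_net.parameters()):
            tgt.data.mul_(1.0 - tau).add_(tau * src.data)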
The experience replay mechanism addresses the fact that the state-transition samples generated by interaction with the environment are temporally correlated, which easily biases the fitting of the action value function. Therefore, borrowing the experience replay mechanism of the DQN algorithm, collected samples are first placed in a sample pool, and mini-batches are then drawn from the pool at random to train the networks. This removes the correlation and dependency between samples, alleviates the problems of correlated data and non-stationary data distributions, and makes the algorithm easier to converge.
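A compact Python sketch of such a sample pool, assuming transitions are stored as (s_t, a_t, r_t, s_{t+1}) tuples of NumPy arrays, could look like this:

import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Fixed-size pool of (s_t, a_t, r_t, s_{t+1}) transitions with uniform mini-batch sampling."""
    def __init__(self, capacity=100_000):
        self.pool = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size):
        batch = random.sample(self.pool, batch_size)
        s, a, r, s_next = map(np.array, zip(*batch))
        return s, a, r, s_next

    def __len__(self):
        return len(self.pool)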
The procedure for optimizing the deep reinforcement learning model with the DDPG algorithm, comprising the three mechanisms of deep learning fitting, soft update and experience replay, is as follows:
(5a) initializing the training round index p;
(5b) initializing the time step t within round p;
(5c) the online Actor policy network receives the state s_t as input and outputs the action a_t, obtaining an immediate reward r_t and moving to the next state s_{t+1}, thereby producing the training tuple (s_t, a_t, r_t, s_{t+1});
(5d) the training tuple (s_t, a_t, r_t, s_{t+1}) is stored in the experience replay pool;
(5e) m training tuples (s_t, a_t, r_t, s_{t+1}) are randomly sampled from the experience replay pool to form a data set, which is fed to the online Actor policy network, the online Critic evaluation network, the target Actor policy network, and the target Critic evaluation network;
(5f) the estimated Q value (target value) is set as

y_i = r_i + γ · Q′(s_{i+1}, μ′(s_{i+1}|θ′) | δ′)   (Expression 11)

and the loss function of the online Critic evaluation network is defined as

L(δ) = (1/m) · Σ_i ( y_i − Q(s_i, a_i|δ) )²

all parameters δ of the current Critic network are then updated through gradient back-propagation in the neural network;
(5g) the sampled policy gradient of the online Actor policy network is defined as

∇_θ J ≈ (1/m) · Σ_i ∇_a Q(s, a|δ)|_{s=s_i, a=μ(s_i)} · ∇_θ μ(s|θ)|_{s=s_i}

and all parameters θ of the current Actor network are updated through gradient back-propagation in the neural network (a Python sketch of steps (5c)–(5h) follows this list);
(5h) if the number of online training steps reaches the target network update frequency, the target network parameters δ′ and θ′ are updated from the online network parameters δ and θ respectively, according to Expressions 9 and 10;
(5i) judging whether t < K, where K is the total number of time steps in round p; if so, set t = t + 1 and go to step (5c); otherwise, go to step (5j);
(5j) judging whether p < I, where I is the threshold set for the number of training rounds; if so, set p = p + 1 and go to step (5b); otherwise, the optimization is finished and the optimized deep reinforcement learning model is obtained.
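Steps (5c)–(5h) can be condensed into a single update routine. The following PyTorch-style sketch assumes actor/critic networks and mini-batches along the lines of the earlier sketches, with illustrative values of γ and τ; it is a sketch of the update logic, not the patent's fixed implementation.

import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.9, tau=0.005):
    """One DDPG optimization step over a sampled mini-batch (steps (5e)-(5h))."""
    s, a, r, s_next = batch  # torch tensors: states, actions, rewards, next states

    # (5f) target y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}|theta')|delta')
    with torch.no_grad():
        y = r.unsqueeze(-1) + gamma * target_critic(s_next, target_actor(s_next))

    # Critic loss: mean squared error between y_i and Q(s_i, a_i|delta)
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # (5g) actor objective: maximize Q(s, mu(s|theta)|delta), i.e. minimize its negative
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # (5h) soft update of the target networks (Expressions 9 and 10)
    with torch.no_grad():
        for tgt, src in zip(target_actor.parameters(), actor.parameters()):
            tgt.data.mul_(1.0 - tau).add_(tau * src.data)
        for tgt, src in zip(target_critic.parameters(), critic.parameters()):
            tgt.data.mul_(1.0 - tau).add_(tau * src.data)
    return critic_loss.item(), actor_loss.item()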
Step (6), the optimal V2V user transmit power and channel allocation strategy is obtained from the optimized deep reinforcement learning model, comprising the following steps:
(6a) the state information s_k(t) of the system at a given moment is fed into the deep reinforcement learning model trained with the DDPG algorithm;
(6b) the model outputs the optimal action strategy a_k*(t), from which the optimal V2V user transmit power p_k*(t) and allocated channel c_k*(t) are obtained.
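At inference time, querying the trained Actor for the transmit power and channel of step (6) could look like the sketch below; the mapping of the network outputs to a power value and a channel index follows the hypothetical Actor sketch given earlier, and the power cap is an illustrative assumption.

import torch

def select_action(actor, state_vector, max_power_w=0.2):
    """Query the trained policy: returns (transmit power in watts, selected channel index).

    max_power_w is an illustrative power cap, not a value fixed by the patent.
    """
    actor.eval()
    with torch.no_grad():
        out = actor(torch.as_tensor(state_vector, dtype=torch.float32))
    power = float(out[0]) * max_power_w      # continuous transmit power p_k*
    channel = int(torch.argmax(out[1:]))     # allocated channel c_k*
    return power, channel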
Finally, the drawings in the specification are explained in detail.
In fig. 1, the flow of the 5G Internet of Vehicles V2V resource allocation method using the deep deterministic policy gradient algorithm is described: V2V communication accesses the 5G network through network slicing, and the DDPG-optimized deep reinforcement learning model is used to obtain the optimal joint V2V user channel allocation and transmit power strategy.
In fig. 2, a V2V user resource allocation model based on the 5G network slicing technique is depicted, with V2V communications and V2I communications using different slices.
In fig. 3, the deep learning fitting is illustrated: based on the Actor-Critic framework, the DDPG algorithm fits a deterministic policy a = μ(s|θ) and an action value function Q(s, a|δ) using deep neural networks with parameters θ and δ, respectively.
In FIG. 4, the V2V communication deep reinforcement learning model is described. It can be seen that the V2V link acts as an agent that, based on the current state s_t ∈ S, selects a channel and transmit power according to the reward function.
Based on the description of the present invention, it should be apparent to those skilled in the art that the V2V resource allocation method based on the deep reinforcement learning DDPG algorithm and the 5G network slicing technique can improve system throughput while ensuring that the communication delay meets safety requirements.
Details not described in the present application are well within the skill of those in the art.

Claims (1)

1. A 5G Internet of Vehicles V2V resource allocation method adopting a deep deterministic policy gradient algorithm, characterized by comprising the following steps:
(1) communication services in the Internet of Vehicles are divided into two types, namely broadband multimedia data transmission between vehicles and roadside infrastructure (V2I) and driving-safety-related data transmission between vehicles (V2V);
(2) dividing V2I and V2V communication traffic into different slices respectively by using a 5G network slicing technology;
(3) the constructed user resource allocation system model is that K pairs of V2V users share a channel with authorized bandwidth B;
(4) by adopting a distributed resource allocation method and taking the V2V link delay into account, a deep reinforcement learning model is constructed with the goal of maximizing the throughput of the communication system, in which the state space S is the observation information related to resource allocation, the action space A is the transmit power and the selected channel, and the reward R is a weighted sum of the system throughput and the system delay; specifically:
(4a) the state space S is specifically defined as the observation information related to resource allocation, including the instantaneous channel state information G_t[m] of the V2V link on subchannel m, the interference strength I_{t-1}[m] received on subchannel m in the previous time slot, the number of times N_{t-1}[m] that subchannel m was selected by neighboring V2V links in the previous time slot, the remaining load L_t to be transmitted by the V2V user, and the remaining delay budget U_t, i.e.

s_t = {G_t, I_{t-1}, N_{t-1}, L_t, U_t}

treating the V2V link as an agent, at each time step the V2V link selects a channel and transmit power based on the current state s_t ∈ S;
(4b) the action space A is defined as the transmit power and the selected channel, denoted as

a_t = {p_t^k, c_t^k[m]}

where p_t^k is the transmit power of the k-th V2V link user and c_t^k[m] describes the use of the m-th channel by the k-th V2V link user: c_t^k[m] = 1 indicates that the m-th channel is used by the k-th V2V link user, and c_t^k[m] = 0 indicates that the m-th channel is not used by the k-th V2V link user;
(4c) defining the reward function R: the goal of V2V resource allocation is that the V2V link selects a spectrum sub-band and transmit power that maximize the system throughput of the V2V links while satisfying the delay constraint, so the reward function can be expressed as:

r_t = λ_d · Σ_k C_v[k] − λ_p · (T_0 − U_t)

where C_v[k] is the channel capacity of the k-th V2V link, T_0 is the maximum tolerable delay, λ_d and λ_p are the weights of the two parts, and T_0 − U_t is the time already spent on transmission; the penalty increases as the transmission time increases;
(4d) according to the established S, A and R, a deep reinforcement learning model is built on the basis of Q-learning, where the evaluation function Q(s_t, a_t) represents the discounted reward obtained by performing action a_t from state s_t; the Q-value update function is:

Q(s_t, a_t) = r_t + γ · max_{a ∈ A} Q(s_{t+1}, a)

where r_t is the instantaneous reward, γ is the discount factor, s_t is the state information of the V2V link at time t, s_{t+1} is the state after the V2V link performs a_t, and A is the action space formed by the actions a_t;
(5) considering the joint optimization problem in a continuous action space, the deep reinforcement learning model is optimized with a deep deterministic policy gradient (DDPG) algorithm comprising three mechanisms: deep learning fitting, soft update, and experience replay;
(6) and obtaining the optimal V2V user transmitting power and channel allocation strategy according to the optimized deep reinforcement learning model.
CN202110273529.0A 2021-03-12 2021-03-12 5G Internet of Vehicles V2V resource allocation method using a deep deterministic policy gradient algorithm Active CN112995951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110273529.0A CN112995951B (en) 2021-03-12 2021-03-12 5G Internet of Vehicles V2V resource allocation method using a deep deterministic policy gradient algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110273529.0A CN112995951B (en) 2021-03-12 2021-03-12 5G Internet of Vehicles V2V resource allocation method using a deep deterministic policy gradient algorithm

Publications (2)

Publication Number Publication Date
CN112995951A CN112995951A (en) 2021-06-18
CN112995951B true CN112995951B (en) 2022-04-08

Family

ID=76335240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110273529.0A Active CN112995951B (en) 2021-03-12 2021-03-12 5G Internet of Vehicles V2V resource allocation method using a deep deterministic policy gradient algorithm

Country Status (1)

Country Link
CN (1) CN112995951B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113676958B (en) * 2021-07-28 2023-06-02 北京信息科技大学 Vehicle-to-vehicle network slice bandwidth resource allocation method and device
CN113727306B (en) * 2021-08-16 2023-04-07 南京大学 Decoupling C-V2X network slicing method based on deep reinforcement learning
CN113709882B (en) * 2021-08-24 2023-10-17 吉林大学 Internet of vehicles communication resource allocation method based on graph theory and reinforcement learning
CN113766661B (en) * 2021-08-30 2023-12-26 北京邮电大学 Interference control method and system for wireless network environment
CN113965944A (en) * 2021-09-14 2022-01-21 中国船舶重工集团公司第七一六研究所 Method and system for maximizing delay certainty by ensuring system control performance
CN114245345B (en) * 2021-11-25 2024-04-19 西安电子科技大学 Imperfect channel state information-oriented Internet of vehicles power control method and system
CN114245344A (en) * 2021-11-25 2022-03-25 西安电子科技大学 Internet of vehicles uncertain channel state information robust power control method and system
CN114449482A (en) * 2022-03-11 2022-05-06 南京理工大学 Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning
CN114885426B (en) * 2022-05-05 2024-04-16 南京航空航天大学 5G Internet of vehicles resource allocation method based on federal learning and deep Q network
CN114827956A (en) * 2022-05-12 2022-07-29 南京航空航天大学 High-energy-efficiency V2X resource allocation method for user privacy protection
CN114641041B (en) * 2022-05-18 2022-09-13 之江实验室 Internet of vehicles slicing method and device oriented to edge intelligence

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110753319A (en) * 2019-10-12 2020-02-04 山东师范大学 Heterogeneous service-oriented distributed resource allocation method and system in heterogeneous Internet of vehicles
CN111083942A (en) * 2018-08-22 2020-04-28 Lg 电子株式会社 Method and apparatus for performing uplink transmission in wireless communication system
CN111137292A (en) * 2018-11-01 2020-05-12 通用汽车环球科技运作有限责任公司 Spatial and temporal attention based deep reinforcement learning for hierarchical lane change strategies for controlling autonomous vehicles
CN111267831A (en) * 2020-02-28 2020-06-12 南京航空航天大学 Hybrid vehicle intelligent time-domain-variable model prediction energy management method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9775045B2 (en) * 2015-09-11 2017-09-26 Intel IP Corporation Slicing architecture for wireless communication
US10986602B2 (en) * 2018-02-09 2021-04-20 Intel Corporation Technologies to authorize user equipment use of local area data network features and control the size of local area data network information in access and mobility management function
CN110320883A (en) * 2018-03-28 2019-10-11 上海汽车集团股份有限公司 A kind of Vehicular automatic driving control method and device based on nitrification enhancement
CN110972107B (en) * 2018-09-29 2021-12-31 华为技术有限公司 Load balancing method and device
CN112469000A (en) * 2019-09-06 2021-03-09 杨海琴 System and method for vehicle network service on 5G network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111083942A (en) * 2018-08-22 2020-04-28 Lg 电子株式会社 Method and apparatus for performing uplink transmission in wireless communication system
CN111137292A (en) * 2018-11-01 2020-05-12 通用汽车环球科技运作有限责任公司 Spatial and temporal attention based deep reinforcement learning for hierarchical lane change strategies for controlling autonomous vehicles
CN110753319A (en) * 2019-10-12 2020-02-04 山东师范大学 Heterogeneous service-oriented distributed resource allocation method and system in heterogeneous Internet of vehicles
CN111267831A (en) * 2020-02-28 2020-06-12 南京航空航天大学 Hybrid vehicle intelligent time-domain-variable model prediction energy management method

Also Published As

Publication number Publication date
CN112995951A (en) 2021-06-18

Similar Documents

Publication Publication Date Title
CN112995951B (en) 5G Internet of Vehicles V2V resource allocation method using a deep deterministic policy gradient algorithm
Ning et al. Deep reinforcement learning for vehicular edge computing: An intelligent offloading system
Chen et al. Cooperative edge caching with location-based and popular contents for vehicular networks
CN112954651B (en) Low-delay high-reliability V2V resource allocation method based on deep reinforcement learning
CN113543074B (en) Joint computing migration and resource allocation method based on vehicle-road cloud cooperation
Mlika et al. Network slicing with MEC and deep reinforcement learning for the Internet of Vehicles
Qian et al. Leveraging dynamic stackelberg pricing game for multi-mode spectrum sharing in 5G-VANET
CN111970733A (en) Deep reinforcement learning-based cooperative edge caching algorithm in ultra-dense network
CN111132074B (en) Multi-access edge computing unloading and frame time slot resource allocation method in Internet of vehicles environment
CN114885426B (en) 5G Internet of vehicles resource allocation method based on federal learning and deep Q network
Wang et al. Energy-delay minimization of task migration based on game theory in MEC-assisted vehicular networks
Zhang et al. Fuzzy logic-based resource allocation algorithm for V2X communications in 5G cellular networks
CN110267274B (en) Spectrum sharing method for selecting sensing users according to social credibility among users
Lin et al. Popularity-aware online task offloading for heterogeneous vehicular edge computing using contextual clustering of bandits
Vu et al. Multi-agent reinforcement learning for channel assignment and power allocation in platoon-based c-v2x systems
CN115134779A (en) Internet of vehicles resource allocation method based on information age perception
CN116582860A (en) Link resource allocation method based on information age constraint
Ouyang Task offloading algorithm of vehicle edge computing environment based on Dueling-DQN
Zhou et al. Multi-agent few-shot meta reinforcement learning for trajectory design and channel selection in UAV-assisted networks
Gao et al. Reinforcement learning based resource allocation in cache-enabled small cell networks with mobile users
CN117412391A (en) Enhanced dual-depth Q network-based Internet of vehicles wireless resource allocation method
Bhadauria et al. QoS based deep reinforcement learning for V2X resource allocation
Mei et al. Semi-decentralized network slicing for reliable V2V service provisioning: A model-free deep reinforcement learning approach
Chaowei et al. Collaborative caching in vehicular edge network assisted by cell-free massive MIMO
CN115052262A (en) Potential game-based vehicle networking computing unloading and power optimization method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant