CN112995951A - 5G Internet of Vehicles V2V resource allocation method adopting a deep deterministic policy gradient algorithm - Google Patents
5G Internet of Vehicles V2V resource allocation method adopting a deep deterministic policy gradient algorithm
- Publication number
- CN112995951A (application number CN202110273529.0A)
- Authority
- CN
- China
- Prior art keywords
- link
- resource allocation
- channel
- user
- vehicles
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H04W4/46 — Services specially adapted for vehicles; vehicle-to-vehicle communication [V2V]
- H04W4/44 — Services specially adapted for vehicles; communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
- H04W24/02 — Supervisory, monitoring or testing arrangements; arrangements for optimising operational condition
- H04W24/06 — Supervisory, monitoring or testing arrangements; testing, supervising or monitoring using simulated traffic
- H04W28/0221 — Traffic management based on user or device properties; power availability or consumption
- H04W28/0236 — Traffic management based on communication conditions; radio quality, e.g. interference, losses or delay
- Y02D30/70 — Reducing energy consumption in wireless communication networks
- Y02T10/40 — Engine management systems
Abstract
The invention provides a vehicle-to-vehicle (V2V) communication resource allocation method based on the deep deterministic policy gradient (DDPG) algorithm. V2V communication accesses a 5G network through network slicing, and a deep reinforcement learning optimization strategy yields a joint optimization policy for V2V user channel allocation and transmit power. By selecting suitable transmit powers and channels, V2V users reduce mutual interference among V2V links, and the total system throughput of the V2V links is maximized while the link delay constraint is met. The DDPG algorithm effectively solves the joint optimization problem of V2V user channel allocation and power selection and performs stably when optimizing over a continuous action space.
Description
Technical Field
The invention relates to Internet of Vehicles technology, in particular to a resource allocation method for the Internet of Vehicles, and more particularly to a vehicle-to-vehicle (V2V) communication resource allocation method for the 5G Internet of Vehicles adopting the Deep Deterministic Policy Gradient (DDPG) algorithm.
Background
Vehicle-to-Everything (V2X) communication is a typical application of the Internet of Things (IoT) in the field of Intelligent Transportation Systems (ITS): a ubiquitous intelligent vehicle network formed from in-vehicle networks, the Internet, and mobile vehicle-mounted networks. The Internet of Vehicles shares and exchanges data according to agreed communication protocols and data interaction standards. By sensing and coordinating pedestrians, roadside facilities, vehicles, networks, and clouds in real time, it enables intelligent traffic management and services, for example improving road safety, enhancing road condition awareness, and reducing traffic congestion.
Reasonable Internet of Vehicles resource allocation is crucial to mitigating interference, improving network efficiency, and ultimately optimizing wireless communication performance. Conventional resource allocation schemes mostly rely on slowly varying large-scale fading channel information. The literature includes a heuristic location-dependent uplink resource allocation scheme that reuses spatial resources without requiring complete channel state information, thereby reducing signaling overhead. Other work has developed a framework comprising vehicle grouping, multiplexed channel selection, and power control that reduces the overall interference of V2V users to the cellular network while maximizing the sum or minimum achievable rate of V2V users. However, as communication traffic and transmission speed requirements grow rapidly, the fast variation of wireless channels caused by high mobility brings great uncertainty to resource allocation, and traditional resource allocation methods cannot meet the high-reliability and low-delay requirements of the Internet of Vehicles.
Deep learning provides multi-layered computational models that can learn efficient data representations with multiple levels of abstraction from unstructured sources, offering a powerful data-driven approach to many problems traditionally considered difficult. Compared with traditional resource allocation algorithms, resource allocation schemes based on deep reinforcement learning can better meet the high-reliability and low-delay requirements of the Internet of Vehicles. One publication proposes a novel distributed vehicle-to-vehicle communication resource allocation mechanism based on deep reinforcement learning that applies to both unicast and broadcast scenarios. Under this distributed mechanism, the agent, i.e., the V2V link or vehicle, can decide on the best sub-band and transmission power level without waiting for global state information. However, existing deep-reinforcement-learning-based V2V resource allocation algorithms cannot meet the differentiated service requirements of 5G scenarios such as high bandwidth, large capacity, ultra-high reliability, and low delay.
Therefore, the resource allocation method provided by the invention adopts 5G network slicing to provide differentiated services for different application scenarios in a 5G network, and uses the DDPG algorithm, which performs stably when optimizing over a continuous action space, to allocate V2V resources. Taking system throughput maximization as the optimization target of V2V resource allocation, it achieves a good balance between complexity and performance.
Disclosure of Invention
Purpose of the invention: aiming at the problems in the prior art, a V2V user resource allocation method based on the deep reinforcement learning DDPG algorithm is provided, in which V2V communication accesses a 5G network through network slicing. The method achieves throughput-maximizing V2V user resource allocation at low V2V link delay while the V2V links do not interfere with the V2I links.
Technical scheme: the goal is to maximize the throughput of the communication system through reasonable resource allocation while accounting for V2V link delay. Using 5G network slicing, the V2V links and the V2I links occupy different slices, so the V2V links do not interfere with the V2I links. With a distributed resource allocation method, the base station need not centrally schedule channel state information: each V2V link is treated as an agent that, in every time slot, selects a channel and transmit power based on instantaneous state information and information shared by its neighbors. A deep reinforcement learning model is established and optimized with the DDPG algorithm, and the optimal V2V user transmit power and channel allocation policy is obtained from the optimized model. The invention is realized by the following technical scheme: a method for allocating V2V resources based on 5G network slices using the DDPG algorithm comprises the following steps:
(1) communication services in the Internet of Vehicles are divided into two types: broadband multimedia data transmission between vehicles and roadside infrastructure (V2I) and driving-safety-related data transmission between vehicles (V2V);
(2) dividing V2I and V2V communication traffic into different slices respectively by using a 5G network slicing technology;
(3) a user resource allocation system model is constructed in which K pairs of V2V users share a channel of authorized bandwidth B;
(4) by adopting a distributed resource allocation method, under the condition of considering V2V link delay, a deep reinforcement learning model is constructed with the aim of maximizing the throughput of a communication system;
(5) considering the joint optimization problem over a continuous action space, the deep reinforcement learning model is optimized with the deep deterministic policy gradient (DDPG) algorithm, which comprises three mechanisms: deep learning fitting, soft update, and experience replay;
(6) the optimal V2V user transmit power and channel allocation policy is obtained from the optimized deep reinforcement learning model.
Further, the step (4) comprises the following specific steps:
(4a) The state space S is defined as channel information related to resource allocation: the instantaneous channel information $G_t[m]$ of the V2V link on subchannel m, the interference strength $I_{t-1}[m]$ received on subchannel m in the previous time slot, the number of times $N_{t-1}[m]$ that subchannel m was selected by neighboring V2V links in the previous time slot, the remaining load $L_t$ to be transmitted by the V2V user, and the remaining delay $U_t$, i.e.

$$s_t = \{G_t, I_{t-1}, N_{t-1}, L_t, U_t\}$$
(4b) Define the action space A. Treating each V2V link as an agent, at every time step the V2V link selects a channel and a transmit power based on the current state $s_t \in S$, i.e. $a_t = \{P_t^k, C_t^k[m]\}$, where $P_t^k$ is the transmit power of the k-th V2V link user and $C_t^k[m]$ indicates whether the m-th channel is used by the k-th V2V link user;
(4c) Define the reward function R. The goal of V2V resource allocation is to select the spectrum sub-band and transmit power for each V2V link so as to maximize the system throughput of the V2V links while meeting the delay constraint and causing little interference to other V2V links. The reward function can thus be expressed as

$$r_t = \lambda_d \sum_{k} C_v[k] - \lambda_p \left(T_0 - U_t\right)$$

where $T_0$ is the maximum tolerable delay, $\lambda_d$ and $\lambda_p$ are the weights of the two parts, and $T_0 - U_t$ is the time already spent on transmission, so the penalty increases as the transmission time grows.
(4d) According to the established S, A, and R, a deep reinforcement learning model is built on the basis of Q-learning. The evaluation function $Q(s_t, a_t)$ represents the discounted reward obtained by performing action $a_t$ from state $s_t$; the Q-value update function is

$$Q(s_t, a_t) = r_t + \gamma \max_{a \in A} Q(s_{t+1}, a)$$

where $r_t$ is the instant reward, $\gamma$ is the discount factor, $s_t$ is the state information of the V2V link at time t, $s_{t+1}$ is the state after the V2V link performs $a_t$, and A is the action space formed by the actions $a_t$.
Beneficial effects: the invention provides a V2V resource allocation method based on 5G network slices that adopts the deep deterministic policy gradient algorithm. V2V communication accesses a 5G network through network slicing, and a deep reinforcement learning optimization strategy yields the optimal joint optimization policy for V2V user channel allocation and transmit power. By selecting suitable transmit powers and channels, V2V users reduce mutual interference among V2V links, and the system throughput of the V2V links is maximized under the link delay constraint. The DDPG algorithm effectively solves the joint optimization problem of V2V user channel allocation and power selection and performs stably when optimizing over a continuous action space.
In conclusion, while ensuring reasonable resource allocation, low interference between V2V links, and low computational complexity, the proposed V2V resource allocation method based on 5G network slices with the deep deterministic policy gradient algorithm is superior in maximizing V2V system throughput.
Drawings
FIG. 1 is a flowchart of a 5G Internet of vehicles V2V resource allocation method using a deep deterministic policy gradient algorithm according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a V2V user resource allocation model based on 5G network slicing technology according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a deep reinforcement learning framework based on an Actor-Critic model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a V2V communication deep reinforcement learning model according to an embodiment of the present invention.
Detailed Description
The core idea of the invention is as follows: V2V communication accesses the 5G network through network slicing; a distributed resource allocation method is adopted in which each V2V link acts as an agent; a deep reinforcement learning model is established and optimized with the DDPG algorithm; and the optimal V2V user transmit power and channel allocation policy is obtained from the optimized model.
The present invention is described in further detail below.
Step (1): communication services in the Internet of Vehicles are divided into two types: broadband multimedia data transmission between vehicles and roadside infrastructure (V2I) and driving-safety-related data transmission between vehicles (V2V).
Step (2): V2I and V2V communication traffic are placed into different slices using 5G network slicing.
Step (3): a user resource allocation system model is constructed in which K pairs of V2V users share a channel of authorized bandwidth B. The specific steps are as follows:
(3a) Establish the V2V user resource allocation system model: the system comprises K pairs of V2V users (VUEs), represented by the set $\mathcal{K} = \{1, 2, \dots, K\}$; the total authorized bandwidth B is divided equally into M sub-channels of bandwidth $B_0 = B/M$, represented by the set $\mathcal{M} = \{1, 2, \dots, M\}$;
(3b) The SINR of the k-th V2V link may be expressed as

$$\gamma_v[k] = \frac{P_k g_k}{\sigma^2 + G_d} \qquad \text{(Expression 1)}$$

where $G_d = \sum_{k' \neq k} P_{k'}\, \tilde{g}_{k',k}$ (Expression 2) is the total interference power of all V2V links sharing the same RB, $g_k$ is the channel gain of the k-th V2V link Internet-of-Vehicles user, $\tilde{g}_{k',k}$ is the interference gain from the k'-th V2V link to the k-th V2V link, $P_k$ is the transmit power of the k-th link, and $\sigma^2$ is the noise power. The channel capacity of the k-th V2V link may be expressed as

$$C_v[k] = W \cdot \log\left(1 + \gamma_v[k]\right) \qquad \text{(Expression 3)}$$
(3c) For the k-th V2V link, its channel selection at time t is described by the binary indicator $C_t^k[m]$: if $C_t^k[m] = 1$, the m-th channel is used by the k-th V2V link and $C_t^k[i] = 0$ for all $i \neq m$, i.e. $\sum_{i=1}^{M} C_t^k[i] \leq 1$, $k \in \{1, \dots, K\}$, where K is the total number of V2V links and M is the total number of channels available in the slice accessed by the V2V links.
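To make Expressions 1-3 concrete, the following is a minimal Python sketch of the per-link SINR and capacity computation; the function name, the noise power `sigma2`, the array layout, and the base-2 logarithm are illustrative assumptions rather than specifications of the patent:

```python
import numpy as np

def v2v_capacity(P, g, g_inter, C, sigma2, W):
    """Per-link V2V capacity following Expressions 1-3 (names are assumptions).

    P       : (K,)   transmit powers of the K V2V links
    g       : (K,)   direct channel gains g_k
    g_inter : (K, K) interference gains from link k' to link k
    C       : (K, M) binary channel-selection indicators C_t^k[m]
    sigma2  : noise power (assumed; the patent text leaves it implicit)
    W       : per-channel bandwidth
    """
    K = len(P)
    capacity = np.zeros(K)
    for k in range(K):
        if C[k].sum() == 0:            # link k transmits on no channel
            continue
        m = int(np.argmax(C[k]))       # channel selected by link k
        # Expression 2: interference from links k' != k on the same channel m
        G_d = sum(P[kp] * g_inter[kp, k]
                  for kp in range(K) if kp != k and C[kp, m] == 1)
        sinr = P[k] * g[k] / (sigma2 + G_d)       # Expression 1
        capacity[k] = W * np.log2(1.0 + sinr)     # Expression 3 (log base 2 assumed)
    return capacity
```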
Step (4), a distributed resource allocation method is adopted, and a deep reinforcement learning model is constructed with the goal of maximizing the throughput of the communication system under the condition of considering the delay of the V2V link, and the method comprises the following specific steps:
(4a) The state space S is defined as observation information related to resource allocation: the instantaneous channel information $G_t[m]$ of the V2V link on subchannel m, the interference strength $I_{t-1}[m]$ received on subchannel m in the previous time slot, the number of times $N_{t-1}[m]$ that subchannel m was selected by neighboring V2V links in the previous time slot, the remaining V2V load $L_t$, and the remaining delay $U_t$, i.e.

$$s_t = \{G_t, I_{t-1}, N_{t-1}, L_t, U_t\}$$

(4b) Define the action space A. Treating each V2V link as an agent, at every time step the V2V link selects a channel and a transmit power based on the current state $s_t \in S$, i.e. $a_t = \{P_t^k, C_t^k[m]\}$, where $P_t^k$ is the transmit power of the k-th V2V link user and $C_t^k[m]$ indicates whether the m-th channel is used by the k-th V2V link user ($C_t^k[m] = 1$ if used, $C_t^k[m] = 0$ otherwise);
(4c) Define the reward function R. The goal of V2V resource allocation is to select the spectrum sub-band and transmit power for each V2V link so as to maximize the system throughput of the V2V links while meeting the delay constraint and causing little interference to other V2V links. The reward function can thus be expressed as

$$r_t = \lambda_d \sum_{k} C_v[k] - \lambda_p \left(T_0 - U_t\right)$$

where $T_0$ is the maximum tolerable delay, $\lambda_d$ and $\lambda_p$ are the weights of the two parts, and $T_0 - U_t$ is the time already spent on transmission, so the penalty increases as the transmission time grows. To obtain a good long-term return, both the immediate return and future returns should be considered. Thus, the main goal of reinforcement learning is to find a policy that maximizes the expected cumulative discounted return

$$\mathbb{E}\left[\sum_{n=0}^{\infty} \beta^n\, r_{t+n}\right]$$

where $\beta \in [0, 1]$ is a discount factor;
(4d) According to the established S, A, and R, a deep reinforcement learning model is built on the basis of Q-learning. The evaluation function $Q(s_t, a_t)$ represents the discounted reward obtained by performing action $a_t$ from state $s_t$; the Q-value update function is

$$Q(s_t, a_t) = r_t + \gamma \max_{a \in A} Q(s_{t+1}, a)$$

where $r_t$ is the instant reward, $\gamma$ is the discount factor, $s_t$ is the state information of the V2V link at time t, $s_{t+1}$ is the state after the V2V link performs $a_t$, and A is the action space formed by the actions $a_t$.
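For illustration, a minimal sketch of the reward of (4c) under the reconstruction above — a weighted throughput term minus a weighted delay penalty. The default weight values are placeholder assumptions:

```python
def reward(capacity, U_t, T0, lam_d=1.0, lam_p=0.5):
    """r_t = lam_d * sum_k C_v[k] - lam_p * (T0 - U_t).

    capacity : (K,) per-link capacities C_v[k], e.g. from v2v_capacity()
    U_t      : remaining delay budget of the V2V transmission
    T0       : maximum tolerable delay, so T0 - U_t is the time already used
    lam_d, lam_p : the weights lambda_d, lambda_p (values are assumptions)
    """
    return lam_d * float(capacity.sum()) - lam_p * (T0 - U_t)
```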
Step (5): to solve the 5G-network-slice-based V2V resource allocation problem, the action space of the deep reinforcement learning model, in which each V2V link is an agent, contains two variables: transmit power and channel selection. Since the transmit power varies continuously within a certain range, and to handle joint optimization over this high-dimensional, in particular continuous, action space, the deep reinforcement learning model is optimized with the DDPG algorithm, which comprises three mechanisms: deep learning fitting, soft update, and experience replay.
Deep learning fitting means that the DDPG algorithm, based on the Actor-Critic framework, fits a deterministic policy $a = \mu(s|\theta)$ and an action-value function $Q(s, a|\delta)$ with deep neural networks parameterized by $\theta$ and $\delta$, respectively, as shown in FIG. 3 of the specification.
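A minimal PyTorch sketch of the two fitted networks follows; only the interfaces mirror the text (state in, action out for $\mu(s|\theta)$; state-action in, scalar value out for $Q(s,a|\delta)$), while the layer sizes and activations are assumptions:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy mu(s|theta): state -> action vector."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # components scaled to [-1, 1]
        )
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Action-value function Q(s, a|delta): (state, action) -> scalar value."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))
```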
Soft update addresses the following issue: the parameters of the action-value network are updated frequently by gradient steps and are also used to compute the gradient of the policy network, so the learning process of the action-value network is prone to instability. The networks are therefore updated in a soft manner.
An online network and a target network are created for the policy network and for the action-value network, respectively. The online networks are continuously updated by gradient descent during training, and the target networks are updated as

$$\theta' \leftarrow \tau \theta + (1 - \tau)\,\theta' \qquad \text{(Expression 9)}$$

$$\delta' \leftarrow \tau \delta + (1 - \tau)\,\delta' \qquad \text{(Expression 10)}$$
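A minimal sketch of the soft update of Expressions 9-10, applicable to the Actor and Critic networks sketched above; the value of τ is an assumption:

```python
import torch

def soft_update(target_net, online_net, tau=0.005):
    """theta' <- tau * theta + (1 - tau) * theta'  (Expressions 9-10)."""
    with torch.no_grad():
        for tp, op in zip(target_net.parameters(), online_net.parameters()):
            tp.mul_(1.0 - tau).add_(tau * op)
```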
The experience replay mechanism addresses the fact that state-transition samples generated by interaction with the environment are temporally correlated, which can bias the fitting of the action-value function. Borrowing the experience replay mechanism of the DQN algorithm, collected samples are first placed into a sample pool, and mini-batches are then sampled at random from the pool to train the networks. This removes the correlation and dependency among samples, mitigates the problems of correlated and non-stationary data distributions, and makes the algorithm easier to converge.
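A minimal sketch of such an experience replay pool; the capacity is an assumed value:

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay pool: uniform random sampling breaks temporal correlation."""
    def __init__(self, capacity=100_000):
        self.pool = deque(maxlen=capacity)   # oldest samples are evicted first

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, m):
        batch = random.sample(self.pool, m)  # m transitions drawn at random
        s, a, r, s_next = zip(*batch)
        return s, a, r, s_next

    def __len__(self):
        return len(self.pool)
```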
The procedure for optimizing the deep reinforcement learning model with the DDPG algorithm, comprising the three mechanisms of deep learning fitting, soft update, and experience replay, is as follows:
(5a) Initialize the training round counter p;
(5b) Initialize the time step t within round p;
(5c) The online Actor policy network takes the current state $s_t$ as input and outputs an action $a_t$; the agent obtains an instant reward $r_t$ and moves to the next state $s_{t+1}$, yielding the training tuple $(s_t, a_t, r_t, s_{t+1})$;
(5d) Store the training tuple $(s_t, a_t, r_t, s_{t+1})$ in the experience replay pool;
(5e) Randomly sample m training tuples $(s_t, a_t, r_t, s_{t+1})$ from the experience replay pool to form a data set, and feed it to the online Actor policy network, the online Critic evaluation network, the target Actor policy network, and the target Critic evaluation network;
(5f) Set the Q-value estimate to

$$y_i = r_i + \gamma\, Q'\!\left(s_{i+1}, \mu'(s_{i+1}|\theta')\,\middle|\,\delta'\right) \qquad \text{(Expression 11)}$$

and define the loss function of the online Critic evaluation network as

$$L(\delta) = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - Q(s_i, a_i|\delta) \right)^2 \qquad \text{(Expression 12)}$$

then update all parameters $\delta$ of the online Critic network by gradient backpropagation through the neural network;
(5g) Define the sampled policy gradient of the online Actor policy network as

$$\nabla_\theta J \approx \frac{1}{m} \sum_{i=1}^{m} \nabla_a Q(s, a|\delta)\big|_{s=s_i,\, a=\mu(s_i)}\; \nabla_\theta\, \mu(s|\theta)\big|_{s=s_i} \qquad \text{(Expression 13)}$$

and update all parameters $\theta$ of the online Actor network by gradient backpropagation through the neural network;
(5h) If the number of online training steps reaches the target network update frequency, update the target network parameters $\delta'$ and $\theta'$ from the online network parameters $\delta$ and $\theta$ according to Expressions 9-10;
(5i) Check whether t < K, where K is the total number of time steps in round p; if so, set t = t + 1 and go to step (5c); otherwise go to step (5j);
(5j) Check whether p < I, where I is the threshold on the number of training rounds; if so, set p = p + 1 and go to step (5b); otherwise the optimization ends and the optimized deep reinforcement learning model is obtained.
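Steps (5e)-(5h) combine into a single parameter update. Below is a minimal sketch under the assumptions of the earlier snippets (the Actor/Critic classes, soft_update(), and ReplayBuffer, with transitions stored as torch tensors); the optimizer choice and hyperparameter values are assumptions:

```python
import torch
import torch.nn.functional as F

def ddpg_update(buf, actor, critic, actor_t, critic_t,
                actor_opt, critic_opt, m=64, gamma=0.99, tau=0.005):
    # Step (5e): sample a mini-batch from the experience replay pool
    s, a, r, s_next = buf.sample(m)
    s, a, s_next = torch.stack(s), torch.stack(a), torch.stack(s_next)
    r = torch.tensor(r, dtype=torch.float32).unsqueeze(1)

    # Step (5f): target y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))  (Expression 11)
    with torch.no_grad():
        y = r + gamma * critic_t(s_next, actor_t(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)            # Expression 12
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Step (5g): deterministic policy gradient = ascend Q(s, mu(s))  (Expression 13)
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Step (5h): soft-update the target networks  (Expressions 9-10)
    soft_update(actor_t, actor, tau)
    soft_update(critic_t, critic, tau)
```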
Step (6): obtain the optimal V2V user transmit power and channel allocation policy from the optimized deep reinforcement learning model, as follows:
(6a) Using the deep reinforcement learning model trained with the DDPG algorithm, input the state information $s_k(t)$ of the system at a given moment;
(6b) Output the optimal action policy $a_k^*(t)$, obtaining the optimal V2V user transmit power $P_k^*$ and channel allocation $C_k^*[m]$.
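A minimal inference sketch for step (6), assuming the trained actor from the snippets above and an action layout of one power component followed by M channel scores (an illustrative assumption, not the patent's specification):

```python
import torch

def allocate(actor, s_t, P_max, M):
    """Map the trained policy's output to a transmit power and a channel index."""
    actor.eval()
    with torch.no_grad():
        out = actor(s_t)                          # components in [-1, 1] (tanh head)
    power = (out[0].item() + 1.0) / 2.0 * P_max   # rescale first component to [0, P_max]
    channel = int(torch.argmax(out[1:1 + M]))     # best of the M channel scores
    return power, channel
```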
Finally, the drawings in the specification are explained in detail.
In FIG. 1, the flow of the 5G Internet of Vehicles V2V resource allocation method using the deep deterministic policy gradient algorithm is described: V2V communication accesses a 5G network through network slicing, and the DDPG-optimized deep reinforcement learning model yields the optimal joint optimization policy for V2V user channel allocation and transmit power.
In fig. 2, a V2V user resource allocation model based on the 5G network slicing technique is depicted, with V2V communications and V2I communications using different slices.
In FIG. 3, deep learning fitting is described: based on the Actor-Critic framework, the DDPG algorithm fits a deterministic policy $a = \mu(s|\theta)$ and an action-value function $Q(s, a|\delta)$ using deep neural networks with parameters $\theta$ and $\delta$, respectively.
In FIG. 4, the V2V communication deep reinforcement learning model is depicted. It can be seen that the V2V link, acting as an agent, selects a channel and transmit power based on the current state $s_t \in S$ and the reward function.
Based on the description of the present invention, it should be apparent to those skilled in the art that the V2V resource allocation method of the present invention, based on the deep reinforcement learning DDPG algorithm and using 5G network slicing, can improve system throughput while ensuring that the communication delay meets safety requirements.
Details not described in the present application are well within the skill of those in the art.
Claims (2)
1. A 5G Internet of Vehicles V2V resource allocation method adopting a deep deterministic policy gradient algorithm, characterized by comprising the following steps:
(1) communication services in the Internet of Vehicles are divided into two types: broadband multimedia data transmission between vehicles and roadside infrastructure (V2I) and driving-safety-related data transmission between vehicles (V2V);
(2) dividing V2I and V2V communication traffic into different slices respectively by using a 5G network slicing technology;
(3) a user resource allocation system model is constructed in which K pairs of V2V users share a channel of authorized bandwidth B;
(4) by adopting a distributed resource allocation method, under the condition of considering V2V link delay, a deep reinforcement learning model is constructed with the aim of maximizing the throughput of a communication system;
(5) considering the joint optimization problem over a continuous action space, the deep reinforcement learning model is optimized with the deep deterministic policy gradient (DDPG) algorithm, which comprises three mechanisms: deep learning fitting, soft update, and experience replay;
(6) the optimal V2V user transmit power and channel allocation policy is obtained from the optimized deep reinforcement learning model.
2. The 5G Internet of Vehicles V2V resource allocation method adopting the deep deterministic policy gradient algorithm according to claim 1, wherein step (4) comprises the following specific steps:
(4a) the state space S is defined as observation information related to resource allocation: the instantaneous channel state information $G_t[m]$ of the V2V link on subchannel m, the interference strength $I_{t-1}[m]$ received on subchannel m in the previous time slot, the number of times $N_{t-1}[m]$ that subchannel m was selected by neighboring V2V links in the previous time slot, the remaining load $L_t$ to be transmitted by the V2V user, and the remaining delay $U_t$, i.e.

$$s_t = \{G_t, I_{t-1}, N_{t-1}, L_t, U_t\}$$
(4b) the action space A is defined by treating each V2V link as an agent: at every time step, the V2V link selects a channel and a transmit power based on the current state $s_t \in S$, i.e. $a_t = \{P_t^k, C_t^k[m]\}$, where $P_t^k$ is the transmit power of the k-th V2V link user and $C_t^k[m]$ indicates whether the m-th channel is used by the k-th V2V link user ($C_t^k[m] = 1$ if used, $C_t^k[m] = 0$ otherwise);
(4c) the reward function R is defined by the goal of V2V resource allocation: each V2V link selects the spectrum sub-band and transmit power so as to maximize the system throughput of the V2V links while satisfying the delay constraint; the reward function can thus be expressed as

$$r_t = \lambda_d \sum_{k} C_v[k] - \lambda_p \left(T_0 - U_t\right)$$

where $T_0$ is the maximum tolerable delay, $\lambda_d$ and $\lambda_p$ are the weights of the two parts, and $T_0 - U_t$ is the time already spent on transmission, so the penalty increases as the transmission time grows;
(4d) according to the established S, A, and R, a deep reinforcement learning model is built on the basis of Q-learning; the evaluation function $Q(s_t, a_t)$ represents the discounted reward obtained by performing action $a_t$ from state $s_t$, and the Q-value update function is

$$Q(s_t, a_t) = r_t + \gamma \max_{a \in A} Q(s_{t+1}, a)$$

where $r_t$ is the instant reward, $\gamma$ is the discount factor, $s_t$ is the state information of the V2V link at time t, $s_{t+1}$ is the state after the V2V link performs $a_t$, and A is the action space formed by the actions $a_t$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110273529.0A CN112995951B (en) | 2021-03-12 | 2021-03-12 | 5G Internet of vehicles V2V resource allocation method adopting depth certainty strategy gradient algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110273529.0A CN112995951B (en) | 2021-03-12 | 2021-03-12 | 5G Internet of vehicles V2V resource allocation method adopting depth certainty strategy gradient algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112995951A true CN112995951A (en) | 2021-06-18 |
CN112995951B CN112995951B (en) | 2022-04-08 |
Family
ID=76335240
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110273529.0A Active CN112995951B (en) | 2021-03-12 | 2021-03-12 | 5G Internet of vehicles V2V resource allocation method adopting depth certainty strategy gradient algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112995951B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113676958A (en) * | 2021-07-28 | 2021-11-19 | 北京信息科技大学 | Vehicle-to-vehicle network slice bandwidth resource allocation method and device |
CN113709882A (en) * | 2021-08-24 | 2021-11-26 | 吉林大学 | Vehicle networking communication resource allocation method based on graph theory and reinforcement learning |
CN113727306A (en) * | 2021-08-16 | 2021-11-30 | 南京大学 | Decoupling C-V2X network slicing method based on deep reinforcement learning |
CN113766661A (en) * | 2021-08-30 | 2021-12-07 | 北京邮电大学 | Interference control method and system for wireless network environment |
CN113965944A (en) * | 2021-09-14 | 2022-01-21 | 中国船舶重工集团公司第七一六研究所 | Method and system for maximizing delay certainty by ensuring system control performance |
CN114245345A (en) * | 2021-11-25 | 2022-03-25 | 西安电子科技大学 | Internet of vehicles power control method and system for imperfect channel state information |
CN114245344A (en) * | 2021-11-25 | 2022-03-25 | 西安电子科技大学 | Internet of vehicles uncertain channel state information robust power control method and system |
CN114449482A (en) * | 2022-03-11 | 2022-05-06 | 南京理工大学 | Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning |
CN114641041A (en) * | 2022-05-18 | 2022-06-17 | 之江实验室 | Edge-intelligent-oriented Internet of vehicles slicing method and device |
CN114786201A (en) * | 2022-04-28 | 2022-07-22 | 合肥工业大学 | Dynamic cooperative optimization method for communication delay and channel efficiency of wireless network |
CN114827956A (en) * | 2022-05-12 | 2022-07-29 | 南京航空航天大学 | High-energy-efficiency V2X resource allocation method for user privacy protection |
CN114885426A (en) * | 2022-05-05 | 2022-08-09 | 南京航空航天大学 | 5G Internet of vehicles resource allocation method based on federal learning and deep Q network |
CN115515101A (en) * | 2022-09-23 | 2022-12-23 | 西北工业大学 | Decoupling Q learning intelligent codebook selection method for SCMA-V2X system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170079059A1 (en) * | 2015-09-11 | 2017-03-16 | Intel IP Corporation | Slicing architecture for wireless communication |
US20190174449A1 (en) * | 2018-02-09 | 2019-06-06 | Intel Corporation | Technologies to authorize user equipment use of local area data network features and control the size of local area data network information in access and mobility management function |
CN110320883A (en) * | 2018-03-28 | 2019-10-11 | 上海汽车集团股份有限公司 | A kind of Vehicular automatic driving control method and device based on nitrification enhancement |
CN110753319A (en) * | 2019-10-12 | 2020-02-04 | 山东师范大学 | Heterogeneous service-oriented distributed resource allocation method and system in heterogeneous Internet of vehicles |
CN110972107A (en) * | 2018-09-29 | 2020-04-07 | 华为技术有限公司 | Load balancing method and device |
CN111083942A (en) * | 2018-08-22 | 2020-04-28 | Lg 电子株式会社 | Method and apparatus for performing uplink transmission in wireless communication system |
CN111137292A (en) * | 2018-11-01 | 2020-05-12 | 通用汽车环球科技运作有限责任公司 | Spatial and temporal attention based deep reinforcement learning for hierarchical lane change strategies for controlling autonomous vehicles |
CN111267831A (en) * | 2020-02-28 | 2020-06-12 | 南京航空航天大学 | Hybrid vehicle intelligent time-domain-variable model prediction energy management method |
CN112469000A (en) * | 2019-09-06 | 2021-03-09 | 杨海琴 | System and method for vehicle network service on 5G network |
- 2021-03-12: CN application CN202110273529.0A filed; granted as patent CN112995951B (active)
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170079059A1 (en) * | 2015-09-11 | 2017-03-16 | Intel IP Corporation | Slicing architecture for wireless communication |
US20190174449A1 (en) * | 2018-02-09 | 2019-06-06 | Intel Corporation | Technologies to authorize user equipment use of local area data network features and control the size of local area data network information in access and mobility management function |
CN110320883A (en) * | 2018-03-28 | 2019-10-11 | 上海汽车集团股份有限公司 | A kind of Vehicular automatic driving control method and device based on nitrification enhancement |
CN111083942A (en) * | 2018-08-22 | 2020-04-28 | Lg 电子株式会社 | Method and apparatus for performing uplink transmission in wireless communication system |
CN110972107A (en) * | 2018-09-29 | 2020-04-07 | 华为技术有限公司 | Load balancing method and device |
CN111137292A (en) * | 2018-11-01 | 2020-05-12 | 通用汽车环球科技运作有限责任公司 | Spatial and temporal attention based deep reinforcement learning for hierarchical lane change strategies for controlling autonomous vehicles |
CN112469000A (en) * | 2019-09-06 | 2021-03-09 | 杨海琴 | System and method for vehicle network service on 5G network |
CN110753319A (en) * | 2019-10-12 | 2020-02-04 | 山东师范大学 | Heterogeneous service-oriented distributed resource allocation method and system in heterogeneous Internet of vehicles |
CN111267831A (en) * | 2020-02-28 | 2020-06-12 | 南京航空航天大学 | Hybrid vehicle intelligent time-domain-variable model prediction energy management method |
Non-Patent Citations (2)
Title |
---|
KAI YU: "A Reinforcement Learning Aided Decoupled RAN Slicing Framework for Cellular V2X", GLOBECOM 2020 - 2020 IEEE Global Communications Conference *
GUO Caili: "Research on spectrum sensing and sharing technology for cognitive Internet of Vehicles driven by dynamic spatio-temporal data", Chinese Journal on Internet of Things *
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113676958B (en) * | 2021-07-28 | 2023-06-02 | 北京信息科技大学 | Vehicle-to-vehicle network slice bandwidth resource allocation method and device |
CN113676958A (en) * | 2021-07-28 | 2021-11-19 | 北京信息科技大学 | Vehicle-to-vehicle network slice bandwidth resource allocation method and device |
CN113727306A (en) * | 2021-08-16 | 2021-11-30 | 南京大学 | Decoupling C-V2X network slicing method based on deep reinforcement learning |
CN113709882A (en) * | 2021-08-24 | 2021-11-26 | 吉林大学 | Vehicle networking communication resource allocation method based on graph theory and reinforcement learning |
CN113709882B (en) * | 2021-08-24 | 2023-10-17 | 吉林大学 | Internet of vehicles communication resource allocation method based on graph theory and reinforcement learning |
CN113766661A (en) * | 2021-08-30 | 2021-12-07 | 北京邮电大学 | Interference control method and system for wireless network environment |
CN113766661B (en) * | 2021-08-30 | 2023-12-26 | 北京邮电大学 | Interference control method and system for wireless network environment |
CN113965944A (en) * | 2021-09-14 | 2022-01-21 | 中国船舶重工集团公司第七一六研究所 | Method and system for maximizing delay certainty by ensuring system control performance |
CN114245345A (en) * | 2021-11-25 | 2022-03-25 | 西安电子科技大学 | Internet of vehicles power control method and system for imperfect channel state information |
CN114245344A (en) * | 2021-11-25 | 2022-03-25 | 西安电子科技大学 | Internet of vehicles uncertain channel state information robust power control method and system |
CN114245345B (en) * | 2021-11-25 | 2024-04-19 | 西安电子科技大学 | Imperfect channel state information-oriented Internet of vehicles power control method and system |
CN114449482A (en) * | 2022-03-11 | 2022-05-06 | 南京理工大学 | Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning |
CN114449482B (en) * | 2022-03-11 | 2024-05-14 | 南京理工大学 | Heterogeneous Internet of vehicles user association method based on multi-agent deep reinforcement learning |
CN114786201A (en) * | 2022-04-28 | 2022-07-22 | 合肥工业大学 | Dynamic cooperative optimization method for communication delay and channel efficiency of wireless network |
CN114786201B (en) * | 2022-04-28 | 2024-09-03 | 合肥工业大学 | Dynamic cooperative optimization method for communication delay and channel efficiency of wireless network |
CN114885426A (en) * | 2022-05-05 | 2022-08-09 | 南京航空航天大学 | 5G Internet of vehicles resource allocation method based on federal learning and deep Q network |
CN114885426B (en) * | 2022-05-05 | 2024-04-16 | 南京航空航天大学 | 5G Internet of vehicles resource allocation method based on federal learning and deep Q network |
CN114827956A (en) * | 2022-05-12 | 2022-07-29 | 南京航空航天大学 | High-energy-efficiency V2X resource allocation method for user privacy protection |
CN114827956B (en) * | 2022-05-12 | 2024-05-10 | 南京航空航天大学 | High-energy-efficiency V2X resource allocation method for user privacy protection |
CN114641041B (en) * | 2022-05-18 | 2022-09-13 | 之江实验室 | Internet of vehicles slicing method and device oriented to edge intelligence |
CN114641041A (en) * | 2022-05-18 | 2022-06-17 | 之江实验室 | Edge-intelligent-oriented Internet of vehicles slicing method and device |
CN115515101A (en) * | 2022-09-23 | 2022-12-23 | 西北工业大学 | Decoupling Q learning intelligent codebook selection method for SCMA-V2X system |
Also Published As
Publication number | Publication date |
---|---|
CN112995951B (en) | 2022-04-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112995951B (en) | 5G Internet of vehicles V2V resource allocation method adopting depth certainty strategy gradient algorithm | |
Chen et al. | Cooperative edge caching with location-based and popular contents for vehicular networks | |
CN113543074B (en) | Joint computing migration and resource allocation method based on vehicle-road cloud cooperation | |
CN112954651B (en) | Low-delay high-reliability V2V resource allocation method based on deep reinforcement learning | |
CN109862610A (en) | A kind of D2D subscriber resource distribution method based on deeply study DDPG algorithm | |
CN114885426B (en) | 5G Internet of vehicles resource allocation method based on federal learning and deep Q network | |
Qian et al. | Leveraging dynamic stackelberg pricing game for multi-mode spectrum sharing in 5G-VANET | |
CN111970733A (en) | Deep reinforcement learning-based cooperative edge caching algorithm in ultra-dense network | |
Zhang et al. | Fuzzy logic-based resource allocation algorithm for V2X communications in 5G cellular networks | |
CN111132074B (en) | Multi-access edge computing unloading and frame time slot resource allocation method in Internet of vehicles environment | |
Lin et al. | Popularity-aware online task offloading for heterogeneous vehicular edge computing using contextual clustering of bandits | |
CN115052262A (en) | Potential game-based vehicle networking computing unloading and power optimization method | |
CN110267274A (en) | A kind of frequency spectrum sharing method according to credit worthiness selection sensing user social between user | |
CN117412391A (en) | Enhanced dual-depth Q network-based Internet of vehicles wireless resource allocation method | |
CN115134779A (en) | Internet of vehicles resource allocation method based on information age perception | |
Ouyang | Task offloading algorithm of vehicle edge computing environment based on Dueling-DQN | |
Zhou et al. | Multi-agent few-shot meta reinforcement learning for trajectory design and channel selection in UAV-assisted networks | |
Bhadauria et al. | QoS based deep reinforcement learning for V2X resource allocation | |
Wu et al. | AoI minimization for UAV-to-device underlay communication by multi-agent deep reinforcement learning | |
Mei et al. | Semi-decentralized network slicing for reliable V2V service provisioning: A model-free deep reinforcement learning approach | |
Khan et al. | Sum throughput maximization scheme for NOMA-Enabled D2D groups using deep reinforcement learning in 5G and beyond networks | |
Ren et al. | Joint spectrum allocation and power control in vehicular communications based on dueling double DQN | |
Yang et al. | Task-driven semantic-aware green cooperative transmission strategy for vehicular networks | |
Yadav et al. | Joint mode selection and resource allocation for cellular V2X communication using distributed deep reinforcement learning under 5G and beyond networks | |
Masaracchia et al. | User mobility into NOMA assisted communication: analysis and a reinforcement learning with neural network based approach |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||