CN112995951A - 5G Internet of Vehicles V2V resource allocation method using a deep deterministic policy gradient algorithm - Google Patents
Info
- Publication number
- CN112995951A (Application CN202110273529.0A)
- Authority
- CN
- China
- Prior art keywords
- link
- resource allocation
- channel
- user
- vehicles
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/40—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
- H04W4/46—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for vehicle-to-vehicle communication [V2V]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/02—Arrangements for optimising operational condition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/06—Testing, supervising or monitoring using simulated traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/0215—Traffic management, e.g. flow control or congestion control based on user or device properties, e.g. MTC-capable devices
- H04W28/0221—Traffic management, e.g. flow control or congestion control based on user or device properties, e.g. MTC-capable devices power availability or consumption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/0231—Traffic management, e.g. flow control or congestion control based on communication conditions
- H04W28/0236—Traffic management, e.g. flow control or congestion control based on communication conditions radio quality, e.g. interference, losses or delay
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/40—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
- H04W4/44—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The present invention proposes a vehicle-to-vehicle (V2V) communication resource allocation method based on the deep deterministic policy gradient (DDPG) algorithm. V2V communication accesses the 5G network through network slicing, and a deep reinforcement learning optimization strategy is used to obtain the optimal joint channel-allocation and transmit-power policy for V2V users. By selecting appropriate transmit powers and channels, V2V users reduce the mutual interference between V2V links and maximize the total system throughput of the V2V links while satisfying the link delay constraint. Using the DDPG algorithm, the invention effectively solves the joint optimization problem of V2V user channel allocation and power selection, and performs stably when optimizing over a continuous action space.
Description
Technical Field
The present invention relates to Internet of Vehicles technology, in particular to a resource allocation method for the Internet of Vehicles, and more particularly to a vehicle-to-vehicle (V2V) communication resource allocation method for the 5G Internet of Vehicles that uses the deep deterministic policy gradient (DDPG) algorithm.
Background Art
Vehicle-to-everything (V2X) is a typical application of the Internet of Things (IoT) in the field of Intelligent Transportation Systems (ITS). It refers to the ubiquitous intelligent vehicle network formed on the basis of the Intranet, the Internet, and mobile in-vehicle networks. The Internet of Vehicles shares and exchanges data according to agreed communication protocols and data exchange standards. Through real-time perception of, and collaboration among, pedestrians, roadside facilities, vehicles, networks, and the cloud, it enables intelligent traffic management and services, for example improving road safety, enhancing road-condition awareness, and reducing traffic congestion.
Reasonable resource allocation in the Internet of Vehicles is crucial for mitigating interference, improving network efficiency, and ultimately optimizing wireless communication performance. Most traditional resource allocation schemes rely on slowly varying, large-scale fading channel information. One study proposed a heuristic location-dependent uplink resource allocation scheme characterized by spatial resource reuse that does not require complete channel state information, thereby reducing signaling overhead. Another study developed a framework including vehicle grouping, reuse channel selection, and power control, which reduces the total interference caused by V2V users to the cellular network while maximizing the sum rate or the minimum achievable rate of the V2V users. However, as traffic volume and data-rate requirements grow, high mobility causes rapid wireless channel variations that introduce great uncertainty into resource allocation, and traditional resource allocation methods can no longer satisfy the high-reliability and low-latency requirements of the Internet of Vehicles.
Deep learning provides multi-layer computational models that can learn efficient data representations with multiple levels of abstraction from unstructured sources, offering a powerful data-driven approach to many problems traditionally considered difficult. Resource allocation schemes based on deep reinforcement learning can satisfy the high-reliability and low-latency requirements of the Internet of Vehicles better than traditional resource allocation algorithms. One study proposed a novel distributed vehicle-to-vehicle communication resource allocation mechanism based on deep reinforcement learning that can be applied to both unicast and broadcast scenarios. Under a distributed resource allocation mechanism, an agent, i.e. a V2V link or vehicle, does not need to wait for global state information before making a decision to find the optimal sub-band and transmit power level. However, existing V2V resource allocation algorithms based on deep reinforcement learning cannot meet the differentiated service requirements of 5G scenarios such as high bandwidth, large capacity, and ultra-reliable low-latency communication.
Therefore, the resource allocation method proposed by the present invention adopts 5G network slicing, which can provide differentiated services for different application scenarios in a 5G network, and uses the DDPG algorithm, which performs stably when optimizing over a continuous action space, for V2V resource allocation. With system throughput maximization as the optimization objective of V2V resource allocation, the method achieves a good balance between complexity and performance.
Summary of the Invention
Purpose of the invention: In view of the above problems in the prior art, a V2V user resource allocation method based on the deep reinforcement learning DDPG algorithm is proposed, in which V2V communication accesses the 5G network through network slicing. The method achieves a V2V user resource allocation that maximizes system throughput with low V2V link delay while the V2V links cause no interference to the V2I links.
Technical solution: Taking the V2V link delay into account, the goal of maximizing the throughput of the communication system is achieved through reasonable resource allocation. 5G network slicing is adopted: the V2V links and the V2I links use different slices, so the V2V links do not interfere with the V2I links. A distributed resource allocation method is used, which does not require the base station to centrally schedule channel state information; each V2V link is treated as an agent, and in each time slot it selects a channel and a transmit power based on instantaneous state information and information shared by its neighbors. A deep reinforcement learning model is established and optimized with the DDPG algorithm, and the optimal V2V user transmit power and channel allocation strategy is then obtained from the optimized model. The above invention is realized through the following technical solution: a V2V resource allocation method based on 5G network slicing using the DDPG algorithm, comprising the following steps:
(1) The communication services in the Internet of Vehicles are divided into two types, namely broadband multimedia data transmission between vehicles and roadside infrastructure (V2I) and vehicle-to-vehicle (V2V) data transmission related to driving safety;
(2) Using 5G network slicing technology, the V2I and V2V communication services are assigned to different slices;
(3) The constructed user resource allocation system model consists of K pairs of V2V users sharing channels within a licensed bandwidth B;
(4) A distributed resource allocation method is adopted, and, taking the V2V link delay into account, a deep reinforcement learning model is constructed with the goal of maximizing the throughput of the communication system;
(5) Considering the joint optimization problem in a continuous action space, the deep reinforcement learning model is optimized with the deep deterministic policy gradient (DDPG) algorithm, which incorporates three mechanisms: deep-learning function fitting, soft updates, and experience replay;
(6) The optimal V2V user transmit power and channel allocation strategy is obtained from the optimized deep reinforcement learning model.
Further, the step (4) includes the following specific steps:
(4a) Specifically, the state space S is defined as the channel information related to resource allocation, including the instantaneous channel information G_t[m] of the corresponding V2V link on sub-channel m, the interference strength I_{t-1}[m] received on sub-channel m in the previous time slot, the number of times N_{t-1}[m] that sub-channel m was selected by neighboring V2V links in the previous time slot, the remaining load L_t to be transmitted by the V2V user, and the remaining delay budget U_t, i.e.
s_t = {G_t, I_{t-1}, N_{t-1}, L_t, U_t}
The V2V link is regarded as an agent; at each step the V2V link selects a channel and a transmit power based on the current state s_t ∈ S;
(4b) The action space A is defined as the transmit power and the selected channel, expressed as
a_t = {P_k, α_k[m]}
where P_k is the transmit power of the k-th V2V link user, and α_k[m] indicates whether the m-th channel is used by the k-th V2V link user;
(4c) The reward function R is defined. The goal of V2V resource allocation is for each V2V link to select a spectrum sub-band and a transmit power that maximize the system throughput of the V2V links while satisfying the delay constraint and causing little interference to the other V2V links. The reward function therefore combines a throughput term weighted by λ_d with a delay penalty weighted by λ_p, where T_0 is the maximum tolerable delay and T_0 − U_t is the time already used for transmission, so that the penalty grows as the transmission time increases.
(4d) Based on the established S, A and R, a deep reinforcement learning model is built on the basis of Q-learning. The evaluation function Q(s_t, a_t) represents the discounted reward obtained after executing action a_t from state s_t, and the Q value is updated according to the Q-learning rule
Q(s_t, a_t) ← r_t + γ·max_{a∈A} Q(s_{t+1}, a)
where r_t is the immediate reward, γ is the discount factor, s_t is the state information of the V2V link at time t, s_{t+1} is the state of the V2V link after executing a_t, and A is the action space formed by the actions a_t.
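As a concrete illustration of (4c) and (4d), the following sketch computes a per-step reward consisting of a throughput term weighted by λ_d minus a delay penalty weighted by λ_p, together with a Q-learning-style target. The exact reward expression is only described verbally in the patent, so the specific formula, the function names, and the default parameter values below are illustrative assumptions.

```python
import numpy as np

def v2v_reward(capacities, remaining_delay, t0=0.1, lam_d=1.0, lam_p=1.0):
    """Assumed reward for (4c): a throughput term minus a delay penalty.

    capacities      -- achievable rates C_v[k] of the V2V links (bit/s)
    remaining_delay -- U_t, remaining delay budget of the transmitting link (s)
    t0              -- T_0, the maximum tolerable delay (s)
    """
    throughput_term = lam_d * np.sum(capacities)
    delay_penalty = lam_p * (t0 - remaining_delay)  # grows as more time is used
    return throughput_term - delay_penalty

def q_target(reward, next_q_values, gamma=0.9):
    """Q-learning style target of (4d), shown here for a tabular Q for illustration only."""
    return reward + gamma * np.max(next_q_values)
```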
Beneficial effects: The present invention proposes a V2V resource allocation method based on 5G network slicing that uses the deep deterministic policy gradient algorithm. V2V communication accesses the 5G network through network slicing, and a deep reinforcement learning optimization strategy is used to obtain the optimal joint channel-allocation and transmit-power policy for V2V users. By selecting appropriate transmit powers and allocated channels, V2V users reduce the mutual interference between V2V links and maximize the system throughput of the V2V links under the link delay constraint. Using the DDPG algorithm, the invention effectively solves the joint optimization problem of V2V user channel allocation and power selection and performs stably when optimizing over a continuous action space.
In summary, while ensuring reasonable resource allocation, low interference between V2V links, and low computational complexity, the V2V resource allocation method based on 5G network slicing using the deep deterministic policy gradient algorithm proposed by the present invention is superior in maximizing the throughput of the V2V system.
Brief Description of the Drawings
Fig. 1 is a flowchart of a 5G Internet of Vehicles V2V resource allocation method using the deep deterministic policy gradient algorithm provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the V2V user resource allocation model based on 5G network slicing provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the deep reinforcement learning framework based on the Actor-Critic model provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the V2V communication deep reinforcement learning model provided by an embodiment of the present invention.
Detailed Description of the Embodiments
The core idea of the present invention is as follows: V2V communication accesses the 5G network through network slicing; a distributed resource allocation method is adopted in which each V2V link is treated as an agent; a deep reinforcement learning model is established and optimized with the DDPG algorithm; and the optimal V2V user transmit power and channel allocation strategy is obtained from the optimized deep reinforcement learning model.
The present invention is described in further detail below.
Step (1): The communication services in the Internet of Vehicles are divided into two types, namely broadband multimedia data transmission between vehicles and roadside infrastructure (V2I) and vehicle-to-vehicle (V2V) data transmission related to driving safety.
Step (2): Using 5G network slicing technology, V2I and V2V services are assigned to different slices.
Step (3): The constructed user resource allocation system model consists of K pairs of V2V users sharing channels within a licensed bandwidth B, and includes the following specific steps:
(3a) A V2V user resource allocation system model is established. The system includes K pairs of V2V users (VUEs), denoted by the set κ = {1, 2, ..., K}. The total licensed bandwidth B is divided equally into M sub-channels of bandwidth B_0 each, the sub-channels being denoted by the set {1, 2, ..., M};
(3b) The SINR of the k-th V2V link can be expressed as
γ_v[k] = P_k·g_k / (σ² + G_d)
where P_k is the transmit power of the k-th V2V link, σ² is the noise power, G_d is the total interference power of all V2V links sharing the same RB, g_k is the channel gain of the k-th V2V link user, and g̃_{k′,k} is the interference gain of the k′-th V2V link to the k-th V2V link, so that G_d = Σ_{k′≠k} P_{k′}·g̃_{k′,k}. The channel capacity of the k-th V2V link can be expressed as:
C_v[k] = W·log(1 + γ_v[k]);    Expression 3
(3c) For the k-th V2V link, its channel selection at time t is described by the indicator α_k[m]. If α_k[m] = 1, the m-th channel is used by the k-th V2V link, while α_k[i] = 0 for i ≠ m, i.e. Σ_{m=1}^{M} α_k[m] = 1 for each k, where K is the total number of V2V links and M is the total number of channels available in the slice accessed by the V2V links.
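To illustrate how the SINR and capacity expressions of steps (3b)-(3c) could be evaluated, the sketch below computes C_v[k] for all links; the array layout, the noise-power variable, and the example values are assumptions made for illustration only.

```python
import numpy as np

def v2v_capacities(p, g, g_int, alpha, bandwidth_hz, noise_power):
    """Per-link capacity C_v[k] = W*log2(1 + SINR) for K V2V links on M channels.

    p      -- (K,) transmit power of each V2V link
    g      -- (K,) channel gain g_k of each link's own channel
    g_int  -- (K, K) interference gain from link k' (row) to link k (column)
    alpha  -- (K, M) 0/1 channel indicators, one channel per link
    """
    same_channel = alpha @ alpha.T          # 1 where two links share a channel
    np.fill_diagonal(same_channel, 0.0)     # a link does not interfere with itself
    # G_d[k]: total interference received by link k from co-channel links
    interference = (same_channel * g_int).T @ p
    sinr = p * g / (noise_power + interference)
    return bandwidth_hz * np.log2(1.0 + sinr)

# Example with 3 V2V links and 2 sub-channels (illustrative values)
rng = np.random.default_rng(0)
alpha = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
rates = v2v_capacities(p=np.array([0.1, 0.2, 0.1]),
                       g=rng.uniform(0.5, 1.0, 3),
                       g_int=rng.uniform(0.01, 0.1, (3, 3)),
                       alpha=alpha, bandwidth_hz=1e6, noise_power=1e-9)
```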
Step (4): A distributed resource allocation method is adopted, and, taking the V2V link delay into account, a deep reinforcement learning model is constructed with the goal of maximizing the throughput of the communication system, including the following specific steps:
(4a) Specifically, the state space S is defined as the observations related to resource allocation, including the instantaneous channel information G_t[m] of the corresponding V2V link on the sub-channel, the interference strength I_{t-1}[m] received on the sub-channel in the previous time slot, the number of times N_{t-1}[m] that channel m was selected by neighboring V2V links in the previous time slot, the remaining V2V load L_t, and the remaining delay budget U_t, i.e.
s_t = {G_t, I_{t-1}, N_{t-1}, L_t, U_t}
(4b) The action space A is defined as the transmit power and the selected channel, expressed as
a_t = {P_k, α_k[m]}
where P_k is the transmit power of the k-th V2V link user, and α_k[m] indicates whether the m-th channel is used by the k-th V2V link user: α_k[m] = 1 means the m-th channel is used by the k-th V2V link user, and α_k[m] = 0 means the m-th channel is not used by the k-th V2V link user;
(4c) The reward function R is defined. The goal of V2V resource allocation is for each V2V link to select a spectrum sub-band and a transmit power that maximize the system throughput of the V2V links while satisfying the delay constraint and causing little interference to the other V2V links. The reward function therefore combines a throughput term weighted by λ_d with a delay penalty weighted by λ_p, where T_0 is the maximum tolerable delay and T_0 − U_t is the time already used for transmission, so that the penalty grows as the transmission time increases. To obtain good long-term returns, both the immediate reward and future rewards should be considered. The main goal of reinforcement learning is therefore to find a policy that maximizes the expected cumulative discounted return
E[ Σ_{n≥0} β^n·r_{t+n} ]
where β ∈ [0, 1] is the discount factor;
(4d) Based on the established S, A and R, a deep reinforcement learning model is built on the basis of Q-learning: the evaluation function Q(s_t, a_t) represents the discounted reward obtained after executing action a_t from state s_t, and the Q value is updated according to the Q-learning rule
Q(s_t, a_t) ← r_t + γ·max_{a∈A} Q(s_{t+1}, a)
where r_t is the immediate reward, γ is the discount factor, s_t is the state information of the V2V link at time t, s_{t+1} is the state of the V2V link after executing a_t, and A is the action space formed by the actions a_t.
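A minimal sketch of how one agent's observation of (4a) and action of (4b) could be packed into flat vectors for a DDPG agent follows; the ordering, normalization, and helper names (build_state, decode_action) are illustrative assumptions rather than details specified by the patent.

```python
import numpy as np

def build_state(g_t, i_prev, n_prev, remaining_load, remaining_delay,
                max_load, max_delay):
    """s_t = {G_t, I_{t-1}, N_{t-1}, L_t, U_t} as one flat, roughly normalized vector."""
    return np.concatenate([
        np.asarray(g_t, dtype=np.float32),       # per-channel gains G_t[m]
        np.asarray(i_prev, dtype=np.float32),    # per-channel interference I_{t-1}[m]
        np.asarray(n_prev, dtype=np.float32),    # neighbour selections N_{t-1}[m]
        [remaining_load / max_load,               # L_t
         remaining_delay / max_delay],            # U_t
    ])

def decode_action(raw_action, p_max, num_channels):
    """Map a continuous policy output (assumed in [-1, 1] per dimension)
    to (transmit power, channel index)."""
    power = (np.clip(raw_action[0], -1.0, 1.0) + 1.0) / 2.0 * p_max
    channel = int((np.clip(raw_action[1], -1.0, 1.0) + 1.0) / 2.0 * (num_channels - 1e-9))
    return power, channel
```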
Step (5): To solve the V2V resource allocation problem based on 5G network slicing, the action space of the deep reinforcement learning model, in which each V2V link is an agent, includes two variables, transmit power and channel selection, with the transmit power varying continuously within a given range. To handle this high-dimensional action space, and in particular the joint optimization problem in a continuous action space, the deep reinforcement learning model is optimized with the DDPG algorithm, which incorporates three mechanisms: deep-learning function fitting, soft updates, and experience replay.
Deep-learning function fitting means that the DDPG algorithm, based on the Actor-Critic framework, uses deep neural networks with parameters θ and δ to fit the deterministic policy a = μ(s|θ) and the action-value function Q(s, a|δ), respectively, as shown in Fig. 3 of the accompanying drawings.
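As an illustration of this Actor-Critic fitting, the sketch below defines a small actor μ(s|θ) and critic Q(s, a|δ) in PyTorch; the layer sizes and activations are assumptions, since the patent does not specify network architectures.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy a = mu(s | theta)."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # actions scaled to [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Action-value function Q(s, a | delta)."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```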
Soft update: the parameters of the action-value network undergo frequent gradient updates while at the same time being used to compute the gradient of the policy network, so the learning process of the action-value network is prone to instability; a soft update scheme is therefore adopted to update the target networks.
Two neural networks, an online network and a target network, are created for the policy network and for the action-value network, respectively. During training, the online networks are continuously updated by gradient descent, and the target networks are updated as follows:
θ′ = τθ + (1 − τ)θ′    Expression 9
δ′ = τδ + (1 − τ)δ′    Expression 10
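A minimal sketch of the soft update of Expressions 9 and 10 in PyTorch; the value of τ is an assumed typical choice.

```python
import torch

@torch.no_grad()
def soft_update(online_net, target_net, tau=0.001):
    """theta' <- tau*theta + (1 - tau)*theta' (the same rule applies to the critic's delta')."""
    for p_online, p_target in zip(online_net.parameters(), target_net.parameters()):
        p_target.mul_(1.0 - tau).add_(tau * p_online)
```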
Experience replay: the state-transition samples generated by interacting with the environment are temporally correlated, which easily biases the fitting of the action-value function. Therefore, following the experience replay mechanism of the DQN algorithm, the collected samples are first placed in a replay buffer, and mini-batches of samples are then drawn at random from the buffer to train the networks. This removes the correlation and dependence between samples, addresses the problems of data correlation and non-stationary distribution, and makes the algorithm easier to converge.
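A minimal sketch of such an experience replay buffer; the capacity and batch size are assumed values.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (s_t, a_t, r_t, s_{t+1}) transitions and samples uncorrelated mini-batches."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=64):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states = zip(*batch)
        return states, actions, rewards, next_states

    def __len__(self):
        return len(self.buffer)
```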
Optimizing the deep reinforcement learning model with the DDPG algorithm, which incorporates the three mechanisms of deep-learning fitting, soft updates, and experience replay, includes the following steps (a sketch of the complete loop is given after step (5j)):
(5a) Initialize the number of training episodes P;
(5b) Initialize the time step t within episode P;
(5c) The online Actor policy network outputs an action a_t according to the input state s_t, obtains the immediate reward r_t, and moves to the next state s_{t+1}, thereby obtaining the training tuple (s_t, a_t, r_t, s_{t+1});
(5d) Store the training tuple (s_t, a_t, r_t, s_{t+1}) in the experience replay buffer;
(5e) Randomly sample m training tuples (s_t, a_t, r_t, s_{t+1}) from the experience replay buffer to form a mini-batch, and feed it to the online Actor policy network, the online Critic evaluation network, the target Actor policy network, and the target Critic evaluation network;
(5f) Set the Q-value target as
y_i = r_i + γ·Q′(s_{i+1}, μ′(s_{i+1}|θ′)|δ′)    Expression 11
Define the loss function of the online Critic evaluation network as
L(δ) = (1/m)·Σ_{i=1}^{m} (y_i − Q(s_i, a_i|δ))²
and update all parameters δ of the current Critic network by back-propagating its gradient through the neural network;
(5g) Define the sampled policy gradient of the online Actor policy network as
∇_θ J ≈ (1/m)·Σ_{i=1}^{m} ∇_a Q(s_i, a|δ)|_{a=μ(s_i|θ)}·∇_θ μ(s_i|θ)
and update all parameters θ of the current Actor network by back-propagating this gradient through the neural network;
(5h) If the number of online training steps reaches the target-network update frequency, update the target network parameters δ′ and θ′ from the online network parameters δ and θ, respectively;
(5i) Determine whether t < K holds, where K is the total number of time steps in episode p; if so, set t = t + 1 and go to step (5c); otherwise, go to step (5j);
(5j) Determine whether p < I holds, where I is the preset threshold for the number of training episodes; if so, set p = p + 1 and go to step (5b); otherwise, the optimization ends and the optimized deep reinforcement learning model is obtained.
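The sketch below assembles steps (5a)-(5j) into a single training loop, reusing the Actor, Critic, ReplayBuffer, and soft_update sketches above; the environment interface (env.reset, env.step), optimizer settings, and exploration noise are illustrative assumptions, not details specified by the patent.

```python
import numpy as np
import torch
import torch.nn.functional as F

def train_ddpg(env, actor, critic, actor_t, critic_t, buffer,
               episodes=500, steps_per_episode=100, gamma=0.9,
               batch_size=64, tau=0.001, noise_std=0.1):
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

    for episode in range(episodes):                                # (5a), (5j)
        state = env.reset()
        for t in range(steps_per_episode):                         # (5b), (5i)
            # (5c) act with exploration noise, observe reward and next state
            with torch.no_grad():
                action = actor(torch.as_tensor(state, dtype=torch.float32)).numpy()
            action = action + noise_std * np.random.randn(*action.shape)
            next_state, reward = env.step(action)
            buffer.push(state, action, reward, next_state)         # (5d)
            state = next_state

            if len(buffer) < batch_size:
                continue
            s, a, r, s2 = buffer.sample(batch_size)                # (5e)
            s = torch.as_tensor(np.stack(s), dtype=torch.float32)
            a = torch.as_tensor(np.stack(a), dtype=torch.float32)
            r = torch.as_tensor(r, dtype=torch.float32).unsqueeze(1)
            s2 = torch.as_tensor(np.stack(s2), dtype=torch.float32)

            # (5f) critic target y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}|theta')|delta')
            with torch.no_grad():
                y = r + gamma * critic_t(s2, actor_t(s2))
            critic_loss = F.mse_loss(critic(s, a), y)              # updates delta
            critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

            # (5g) actor update via the sampled deterministic policy gradient (updates theta)
            actor_loss = -critic(s, actor(s)).mean()
            actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

            # (5h) soft-update the target networks (Expressions 9-10)
            soft_update(actor, actor_t, tau)
            soft_update(critic, critic_t, tau)
```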
Step (6): The optimal V2V user transmit power and channel allocation strategy is obtained from the optimized deep reinforcement learning model, including the following steps:
(6a) Using the deep reinforcement learning model trained with the DDPG algorithm, input the state information s_k(t) of the system at a given time;
(6b) Output the optimal action policy to obtain the optimal V2V user transmit power and allocated channel.
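A minimal sketch of this inference step, reusing the trained actor and the decode_action helper sketched earlier; the argument names and shapes are assumptions.

```python
import torch

@torch.no_grad()
def allocate_resources(actor, state_vec, p_max, num_channels):
    """Step (6): map the current state s_k(t) to (transmit power, channel) with the trained policy."""
    raw = actor(torch.as_tensor(state_vec, dtype=torch.float32)).numpy()
    return decode_action(raw, p_max=p_max, num_channels=num_channels)
```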
Finally, the accompanying drawings of the specification are described in detail.
Fig. 1 depicts the flow of the 5G Internet of Vehicles V2V resource allocation method using the deep deterministic policy gradient algorithm: V2V communication accesses the 5G network through network slicing, and DDPG is used to optimize the deep reinforcement learning model to obtain the optimal joint channel-allocation and transmit-power policy for V2V users.
Fig. 2 depicts the V2V user resource allocation model based on 5G network slicing, in which V2V communication and V2I communication use different slices.
Fig. 3 illustrates the deep-learning fitting: the DDPG algorithm, based on the Actor-Critic framework, uses deep neural networks with parameters θ and δ to fit the deterministic policy a = μ(s|θ) and the action-value function Q(s, a|δ), respectively.
Fig. 4 depicts the V2V communication deep reinforcement learning model. It can be seen that the V2V link, acting as an agent, selects a channel and a transmit power according to the reward function based on the current state s_t ∈ S.
From the above description of the present invention, those skilled in the art will readily appreciate that the V2V resource allocation method of the present invention, based on 5G network slicing and the deep reinforcement learning DDPG algorithm, can improve system throughput while ensuring that the communication delay meets safety requirements.
Contents not described in detail in this application belong to the prior art known to those skilled in the art.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110273529.0A CN112995951B (en) | 2021-03-12 | 2021-03-12 | A 5G Internet of Vehicles V2V Resource Allocation Method Using Deep Deterministic Policy Gradient Algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110273529.0A CN112995951B (en) | 2021-03-12 | 2021-03-12 | A 5G Internet of Vehicles V2V Resource Allocation Method Using Deep Deterministic Policy Gradient Algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112995951A true CN112995951A (en) | 2021-06-18 |
CN112995951B CN112995951B (en) | 2022-04-08 |
Family
ID=76335240
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110273529.0A Active CN112995951B (en) | 2021-03-12 | 2021-03-12 | A 5G Internet of Vehicles V2V Resource Allocation Method Using Deep Deterministic Policy Gradient Algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112995951B (en) |
- 2021-03-12: Application CN202110273529.0A filed in China; granted as CN112995951B (Active)
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170079059A1 (en) * | 2015-09-11 | 2017-03-16 | Intel IP Corporation | Slicing architecture for wireless communication |
US20190174449A1 (en) * | 2018-02-09 | 2019-06-06 | Intel Corporation | Technologies to authorize user equipment use of local area data network features and control the size of local area data network information in access and mobility management function |
CN110320883A (en) * | 2018-03-28 | 2019-10-11 | 上海汽车集团股份有限公司 | A kind of Vehicular automatic driving control method and device based on nitrification enhancement |
CN111083942A (en) * | 2018-08-22 | 2020-04-28 | Lg 电子株式会社 | Method and apparatus for performing uplink transmission in wireless communication system |
CN110972107A (en) * | 2018-09-29 | 2020-04-07 | 华为技术有限公司 | Load balancing method and device |
CN111137292A (en) * | 2018-11-01 | 2020-05-12 | 通用汽车环球科技运作有限责任公司 | Spatial and temporal attention based deep reinforcement learning for hierarchical lane change strategies for controlling autonomous vehicles |
CN112469000A (en) * | 2019-09-06 | 2021-03-09 | 杨海琴 | System and method for vehicle network service on 5G network |
CN110753319A (en) * | 2019-10-12 | 2020-02-04 | 山东师范大学 | Heterogeneous service-oriented distributed resource allocation method and system in heterogeneous Internet of vehicles |
CN111267831A (en) * | 2020-02-28 | 2020-06-12 | 南京航空航天大学 | An intelligent variable time domain model prediction energy management method for hybrid electric vehicles |
Non-Patent Citations (2)
Title |
---|
KAI YU: "A Reinforcement Learning Aided Decoupled RAN Slicing Framework for Cellular V2X", 《GLOBECOM 2020 - 2020 IEEE GLOBAL COMMUNICATIONS CONFERENCE》 * |
GUO CAILI: "Research on spectrum sensing and sharing technology of cognitive Internet of Vehicles driven by dynamic spatio-temporal data", 《CHINESE JOURNAL ON INTERNET OF THINGS》 *
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113676958B (en) * | 2021-07-28 | 2023-06-02 | 北京信息科技大学 | Vehicle-to-vehicle network slice bandwidth resource allocation method and device |
CN113676958A (en) * | 2021-07-28 | 2021-11-19 | 北京信息科技大学 | Vehicle-to-vehicle network slice bandwidth resource allocation method and device |
CN113727306A (en) * | 2021-08-16 | 2021-11-30 | 南京大学 | Decoupling C-V2X network slicing method based on deep reinforcement learning |
CN113709882A (en) * | 2021-08-24 | 2021-11-26 | 吉林大学 | Vehicle networking communication resource allocation method based on graph theory and reinforcement learning |
CN113709882B (en) * | 2021-08-24 | 2023-10-17 | 吉林大学 | An Internet of Vehicles communication resource allocation method based on graph theory and reinforcement learning |
CN113766661A (en) * | 2021-08-30 | 2021-12-07 | 北京邮电大学 | Interference control method and system for wireless network environment |
CN113766661B (en) * | 2021-08-30 | 2023-12-26 | 北京邮电大学 | Interference control method and system for wireless network environment |
CN113965944A (en) * | 2021-09-14 | 2022-01-21 | 中国船舶重工集团公司第七一六研究所 | Method and system for maximizing delay certainty by ensuring system control performance |
CN113965944B (en) * | 2021-09-14 | 2024-11-01 | 中国船舶集团有限公司第七一六研究所 | Method and system for maximizing delay certainty for ensuring control performance of system |
CN114245344A (en) * | 2021-11-25 | 2022-03-25 | 西安电子科技大学 | Internet of vehicles uncertain channel state information robust power control method and system |
CN114245345A (en) * | 2021-11-25 | 2022-03-25 | 西安电子科技大学 | Internet of vehicles power control method and system for imperfect channel state information |
CN114245345B (en) * | 2021-11-25 | 2024-04-19 | 西安电子科技大学 | Imperfect channel state information-oriented Internet of vehicles power control method and system |
CN114401552A (en) * | 2022-01-17 | 2022-04-26 | 重庆邮电大学 | Airspace resource allocation method based on determinant point process learning |
CN114449482A (en) * | 2022-03-11 | 2022-05-06 | 南京理工大学 | Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning |
CN114449482B (en) * | 2022-03-11 | 2024-05-14 | 南京理工大学 | Heterogeneous Internet of vehicles user association method based on multi-agent deep reinforcement learning |
CN114786201A (en) * | 2022-04-28 | 2022-07-22 | 合肥工业大学 | A Dynamic Cooperative Optimization Method for Communication Delay and Channel Efficiency in Wireless Networks |
CN114786201B (en) * | 2022-04-28 | 2024-09-03 | 合肥工业大学 | A dynamic collaborative optimization method for communication delay and channel efficiency in wireless networks |
CN114885426A (en) * | 2022-05-05 | 2022-08-09 | 南京航空航天大学 | 5G Internet of vehicles resource allocation method based on federal learning and deep Q network |
CN114885426B (en) * | 2022-05-05 | 2024-04-16 | 南京航空航天大学 | A 5G vehicle network resource allocation method based on federated learning and deep Q network |
CN115086992A (en) * | 2022-05-07 | 2022-09-20 | 北京科技大学 | Distributed semantic communication system and bandwidth resource allocation method and device |
CN114827956A (en) * | 2022-05-12 | 2022-07-29 | 南京航空航天大学 | High-energy-efficiency V2X resource allocation method for user privacy protection |
CN114827956B (en) * | 2022-05-12 | 2024-05-10 | 南京航空航天大学 | An energy-efficient V2X resource allocation method for user privacy protection |
CN114641041B (en) * | 2022-05-18 | 2022-09-13 | 之江实验室 | Internet of vehicles slicing method and device oriented to edge intelligence |
CN114641041A (en) * | 2022-05-18 | 2022-06-17 | 之江实验室 | A method and device for edge intelligence-oriented Internet of Vehicles slicing |
CN115696258A (en) * | 2022-09-01 | 2023-02-03 | 华南师范大学 | Resource allocation method, storage medium and equipment for Internet of Vehicles based on reinforcement learning |
CN115696258B (en) * | 2022-09-01 | 2025-02-11 | 华南师范大学 | Vehicle network resource allocation method, storage medium and device based on reinforcement learning |
CN115515101A (en) * | 2022-09-23 | 2022-12-23 | 西北工业大学 | Decoupling Q learning intelligent codebook selection method for SCMA-V2X system |
CN115515101B (en) * | 2022-09-23 | 2024-11-26 | 西北工业大学 | A decoupled Q-learning intelligent codebook selection method for SCMA-V2X system |
CN118890658A (en) * | 2024-06-14 | 2024-11-01 | 金陵科技学院 | A resource optimization method for Internet of Vehicles based on composite priority experience replay sampling |
Also Published As
Publication number | Publication date |
---|---|
CN112995951B (en) | 2022-04-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112995951B (en) | A 5G Internet of Vehicles V2V Resource Allocation Method Using Deep Deterministic Policy Gradient Algorithm | |
Zhang et al. | Beyond D2D: Full dimension UAV-to-everything communications in 6G | |
CN112954651B (en) | A low-latency and high-reliability V2V resource allocation method based on deep reinforcement learning | |
Guo et al. | Federated reinforcement learning-based resource allocation for D2D-aided digital twin edge networks in 6G industrial IoT | |
Wang et al. | Joint resource allocation and power control for D2D communication with deep reinforcement learning in MCC | |
CN113543074B (en) | Joint computing migration and resource allocation method based on vehicle-road cloud cooperation | |
Wang et al. | Energy-delay minimization of task migration based on game theory in MEC-assisted vehicular networks | |
CN114885426B (en) | A 5G vehicle network resource allocation method based on federated learning and deep Q network | |
Zhang et al. | Fuzzy logic-based resource allocation algorithm for V2X communications in 5G cellular networks | |
Zhang et al. | Delay-optimized resource allocation in fog-based vehicular networks | |
Vu et al. | Multi-agent reinforcement learning for channel assignment and power allocation in platoon-based C-V2X systems | |
Bi et al. | Deep reinforcement learning based power allocation for D2D network | |
Qiu et al. | Maintaining links in the highly dynamic FANET using deep reinforcement learning | |
Mekki et al. | Vehicular cloud networking: evolutionary game with reinforcement learning-based access approach | |
Rasheed | Dynamic mode selection and resource allocation approach for 5G-vehicle-to-everything (V2X) communication using asynchronous federated deep reinforcement learning method | |
Ouyang | Task offloading algorithm of vehicle edge computing environment based on Dueling-DQN | |
CN117412391A (en) | A wireless resource allocation method for Internet of Vehicles based on enhanced dual-depth Q network | |
Wang et al. | Machine learning enables radio resource allocation in the downlink of ultra-low latency vehicular networks | |
Gui et al. | Spectrum-energy-efficient mode selection and resource allocation for heterogeneous V2X networks: A federated multi-agent deep reinforcement learning approach | |
Khan et al. | Sum throughput maximization scheme for NOMA-enabled D2D groups using deep reinforcement learning in 5G and beyond networks | |
Zhao et al. | Multi-agent deep reinforcement learning based resource management in heterogeneous V2X networks | |
CN115866787A (en) | Network resource allocation method integrating terminal direct transmission communication and multi-access edge calculation | |
Ji et al. | Optimization of resource allocation for V2X security communication based on multi-agent reinforcement learning | |
Waqas et al. | A novel duplex deep reinforcement learning based RRM framework for next-generation V2X communication networks | |
Ren et al. | Joint spectrum allocation and power control in vehicular communications based on dueling double DQN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |