CN112995951A - 5G Internet of vehicles V2V resource allocation method adopting depth certainty strategy gradient algorithm - Google Patents

Info

Publication number
CN112995951A
CN112995951A
Authority
CN
China
Legal status
Granted
Application number
CN202110273529.0A
Other languages
Chinese (zh)
Other versions
CN112995951B (en)
Inventor
王书墨
宋晓勤
柴新越
缪娟娟
王奎宇
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202110273529.0A
Publication of CN112995951A
Application granted
Publication of CN112995951B
Legal status: Active
Anticipated expiration


Classifications

    • H04W4/46 - Services specially adapted for vehicle-to-vehicle communication [V2V]
    • H04W4/44 - Services for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
    • H04W24/02 - Arrangements for optimising operational condition
    • H04W24/06 - Testing, supervising or monitoring using simulated traffic
    • H04W28/0221 - Traffic management based on user or device properties, e.g. power availability or consumption
    • H04W28/0236 - Traffic management based on communication conditions, e.g. radio quality, interference, losses or delay
    • Y02D30/70 - Reducing energy consumption in wireless communication networks
    • Y02T10/40 - Engine management systems


Abstract

The present invention proposes a vehicle-to-vehicle (V2V) communication resource allocation method based on the deep deterministic policy gradient (DDPG) algorithm. V2V communication accesses the 5G network through network slicing, and deep reinforcement learning is used to obtain the optimal joint optimization strategy for V2V user channel allocation and transmit power. By selecting suitable transmit power and channels, V2V users reduce the mutual interference between V2V links and maximize the total system throughput of the V2V links while satisfying the link-delay constraint. Using the DDPG algorithm, the invention effectively solves the joint optimization problem of V2V user channel allocation and power selection, and performs stably when optimizing over a continuous action space.

Figure 202110273529

Description

A 5G Internet of Vehicles V2V Resource Allocation Method Using a Deep Deterministic Policy Gradient Algorithm

Technical Field

The present invention relates to Internet of Vehicles (IoV) technology, in particular to a resource allocation method for the Internet of Vehicles, and more particularly to a vehicle-to-vehicle (V2V) communication resource allocation method for the 5G Internet of Vehicles using the Deep Deterministic Policy Gradient (DDPG) algorithm.

Background

Vehicle-to-everything (V2X) is a typical application of the Internet of Things (IoT) in the field of Intelligent Transportation Systems (ITS); it refers to the ubiquitous smart vehicle network formed on the basis of the Intranet, the Internet, and mobile in-vehicle networks. The Internet of Vehicles shares and exchanges data according to agreed communication protocols and data-exchange standards. Through real-time perception of, and collaboration among, pedestrians, roadside facilities, vehicles, networks, and the cloud, it enables intelligent traffic management and services such as improved road safety, enhanced road-condition awareness, and reduced traffic congestion.

Reasonable allocation of IoV resources is crucial for mitigating interference, improving network efficiency, and ultimately optimizing wireless communication performance. Most traditional resource allocation schemes allocate using slowly varying large-scale fading channel information. One study proposed a heuristic location-dependent uplink resource allocation scheme characterized by spatial resource reuse without requiring complete channel state information, thereby reducing signaling overhead. Another developed a framework comprising vehicle grouping, reuse-channel selection, and power control, which reduces the total interference of V2V users on the cellular network while maximizing the sum rate or the minimum achievable rate of V2V users. However, with ever-increasing traffic volume and sharply rising data-rate requirements, high mobility causes wireless channels to change rapidly, bringing great uncertainty to resource allocation; traditional resource allocation methods cannot meet the demands of the Internet of Vehicles for high reliability and low latency.

Deep learning provides multi-layer computational models that can learn efficient data representations with multiple levels of abstraction from unstructured sources, offering a powerful data-driven approach to many problems traditionally considered difficult. Resource allocation schemes based on deep reinforcement learning can satisfy the high-reliability and low-latency requirements of the Internet of Vehicles better than traditional resource allocation algorithms. One study proposed a novel distributed vehicle-to-vehicle communication resource allocation mechanism based on deep reinforcement learning that can be applied to both unicast and broadcast scenarios. Under such a distributed resource allocation mechanism, the agent, i.e. the V2V link or vehicle, does not need to wait for global state information before deciding on the optimal sub-band and transmit power level. However, existing V2V resource allocation algorithms based on deep reinforcement learning cannot meet the differentiated service requirements of 5G-network scenarios such as high bandwidth, large capacity, and ultra-reliable low-latency communication.

The resource allocation method proposed by the present invention therefore adopts 5G network slicing technology, which can provide differentiated services for different application scenarios under the 5G network, and uses the DDPG algorithm, which performs stably when optimizing over a continuous action space, for V2V resource allocation. Taking system-throughput maximization as the optimization objective of V2V resource allocation, it strikes a good balance between complexity and performance.

Summary of the Invention

Purpose of the invention: In view of the above problems in the prior art, a V2V user resource allocation method based on the deep reinforcement learning DDPG algorithm is proposed, in which V2V communication accesses the 5G network through network slicing. The method achieves a V2V user resource allocation that maximizes system throughput with low V2V link delay, while the V2V links cause no interference to the V2I links.

Technical solution: Taking the V2V link delay into account, the throughput of the communication system is maximized through reasonable resource allocation. 5G network slicing technology is adopted: V2V links and V2I links use different slices, so the V2V links cause no interference to the V2I links. A distributed resource allocation method is used, which does not require the base station to centrally schedule channel state information; each V2V link is treated as an agent and selects its channel and transmit power in each time slot based on instantaneous state information and information shared by its neighbours. A deep reinforcement learning model is established and optimized with the DDPG algorithm, and from the optimized model the optimal V2V user transmit power and channel allocation strategy are obtained. The above is realized through the following technical solution: a V2V resource allocation method based on 5G network slicing using the DDPG algorithm, comprising the following steps:

(1) Divide the communication services in the Internet of Vehicles into two types: broadband multimedia data transmission between vehicles and roadside infrastructure (V2I), and driving-safety data transmission between vehicles (V2V);

(2) Using 5G network slicing technology, assign the V2I and V2V communication services to different slices;

(3) Construct the user resource allocation system model, in which K pairs of V2V users share a licensed channel of bandwidth B;

(4) Using a distributed resource allocation method and taking the V2V link delay into account, construct a deep reinforcement learning model with the objective of maximizing communication-system throughput;

(5) Considering the joint optimization problem in a continuous action space, optimize the deep reinforcement learning model with the deep deterministic policy gradient (DDPG) algorithm, which comprises three mechanisms: deep-learning fitting, soft update, and experience replay;

(6) From the optimized deep reinforcement learning model, obtain the optimal V2V user transmit power and channel allocation strategy.

Further, step (4) includes the following specific steps:

(4a) Specifically, define the state space S as the channel information related to resource allocation, including the instantaneous channel information G_t[m] of the V2V link on sub-channel m, the interference strength I_{t-1}[m] received on sub-channel m in the previous time slot, the number of times N_{t-1}[m] that sub-channel m was selected by neighbouring V2V links in the previous time slot, the remaining payload L_t of the V2V user transmission, and the remaining delay U_t, i.e.

s_t = {G_t, I_{t-1}, N_{t-1}, L_t, U_t}

Each V2V link is treated as an agent, and at each step the V2V link selects a channel and transmit power based on the current state s_t ∈ S;

(4b) Define the action space A as the transmit power and the selected channel, expressed as

a_t = {P_t^k, c_t^k[m]}

where P_t^k is the transmit power of the k-th V2V link user, and c_t^k[m] indicates whether the m-th channel is used by the k-th V2V link user;

(4c) Define the reward function R. The goal of V2V resource allocation is for each V2V link to select a spectrum sub-band and transmit power that maximize the system throughput of the V2V links while satisfying the delay constraint and causing little interference to the other V2V links. The reward function can therefore be expressed as:

r_t = λ_d·Σ_k C_v[k] − λ_p·(T_0 − U_t)

where T_0 is the maximum tolerable delay, λ_d and λ_p are the weights of the two parts, and T_0 − U_t is the time already spent on transmission; as the transmission time increases, the penalty also increases.

(4d) Based on the established S, A, and R, a deep reinforcement learning model is built on the basis of Q-learning. The evaluation function Q(s_t, a_t) represents the discounted reward obtained by executing action a_t from state s_t; the Q-value update function is:

Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ·max_{a∈A} Q(s_{t+1}, a) − Q(s_t, a_t)]

where r_t is the immediate reward function, α is the learning rate, γ is the discount factor, s_t is the state information of the V2V link at time t, s_{t+1} is the state of the V2V link after executing a_t, and A is the action space formed by the actions a_t.

Beneficial effects: The present invention proposes a V2V resource allocation method based on 5G network slicing that uses the deep deterministic policy gradient algorithm. V2V communication accesses the 5G network through network slicing, and deep reinforcement learning is used to obtain the optimal joint optimization strategy for V2V user channel allocation and transmit power. By selecting suitable transmit power and allocated channels, V2V users reduce the mutual interference between V2V links and maximize the system throughput of the V2V links under the link-delay constraint. Using the DDPG algorithm, the invention effectively solves the joint optimization problem of V2V user channel allocation and power selection, and performs stably when optimizing over a continuous action space.

To sum up, while ensuring reasonable resource allocation, low interference between V2V links, and low computational complexity, the V2V resource allocation method based on 5G network slicing using the deep deterministic policy gradient algorithm proposed by the present invention is superior in maximizing V2V system throughput.

Description of the Drawings

Figure 1 is a flowchart of the 5G Internet of Vehicles V2V resource allocation method using the deep deterministic policy gradient algorithm provided by an embodiment of the present invention;

Figure 2 is a schematic diagram of the V2V user resource allocation model based on 5G network slicing technology provided by an embodiment of the present invention;

Figure 3 is a schematic diagram of the deep reinforcement learning framework based on the Actor-Critic model provided by an embodiment of the present invention;

Figure 4 is a schematic diagram of the V2V communication deep reinforcement learning model provided by an embodiment of the present invention.

Detailed Description

The core idea of the present invention is as follows: V2V communication accesses the 5G network through network slicing; a distributed resource allocation method treats each V2V link as an agent; a deep reinforcement learning model is established and optimized with the DDPG algorithm; and from the optimized model, the optimal V2V user transmit power and channel allocation strategy are obtained.

The present invention is described in further detail below.

Step (1): The communication services in the Internet of Vehicles are divided into two types: broadband multimedia data transmission between vehicles and roadside infrastructure (V2I), and driving-safety-related data transmission between vehicles (V2V).

Step (2): Using 5G network slicing technology, V2I and V2V services are assigned to different slices.

Step (3): The constructed user resource allocation system model consists of K pairs of V2V users sharing a licensed channel of bandwidth B, and includes the following specific steps:

(3a) Establish the V2V user resource allocation system model. The system includes K pairs of V2V users (VUEs), denoted by the set κ = {1, 2, ..., K}. The total licensed bandwidth B is divided equally into M sub-channels of bandwidth B_0, denoted by the set M = {1, 2, ..., M};

(3b) The SINR of the k-th V2V link can be expressed as:

γ_v[k] = P_k·g_k / (σ² + G_d)  Expression 1

where

G_d = Σ_{k′≠k} P_{k′}·ĝ_{k′,k}  Expression 2

G_d is the total interference power of all V2V links sharing the same RB, P_k is the transmit power of the k-th V2V link, g_k is the channel gain of the IoV user on the k-th V2V link, ĝ_{k′,k} is the interference gain of the k′-th V2V link to the k-th V2V link, and σ² is the noise power. The channel capacity of the k-th V2V link can be expressed as:

C_v[k] = W·log(1 + γ_v[k])  Expression 3

(3c) For the k-th V2V link, the channel selected at time t is described by the indicator c_t^k[m] ∈ {0, 1}. If c_t^k[m] = 1, the m-th channel is used by the k-th V2V link, and c_t^k[i] = 0 for all i ≠ m, i.e. Σ_{m=1}^{M} c_t^k[m] = 1, where K is the total number of V2V links and M is the total number of channels available in the slice accessed by the V2V links.

Step (4): Using a distributed resource allocation method and taking the V2V link delay into account, a deep reinforcement learning model is constructed with the objective of maximizing communication-system throughput, including the following specific steps:

(4a) Specifically, define the state space S as the observation information related to resource allocation, including the instantaneous channel information G_t[m] of the V2V link on each sub-channel, the interference strength I_{t-1}[m] received on each sub-channel in the previous time slot, the number of times N_{t-1}[m] that channel m was selected by neighbouring V2V links in the previous time slot, the remaining V2V payload L_t, and the remaining delay U_t, i.e.

s_t = {G_t, I_{t-1}, N_{t-1}, L_t, U_t}  Expression 4
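The observation s_t defined above can be flattened into a single feature vector for a neural-network agent; the encoding below is a minimal sketch, since the exact layout is not fixed by the text.

```python
import numpy as np

def build_state(G_t, I_prev, N_prev, load, time_left):
    """Flatten s_t = {G_t, I_{t-1}, N_{t-1}, L_t, U_t} for one V2V
    agent into one feature vector (illustrative encoding)."""
    return np.concatenate([
        np.asarray(G_t, dtype=float),    # instantaneous channel info per sub-channel
        np.asarray(I_prev, dtype=float), # interference measured in the previous slot
        np.asarray(N_prev, dtype=float), # neighbour selections in the previous slot
        [float(load)],                   # remaining payload L_t
        [float(time_left)],              # remaining delay budget U_t
    ])
```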

(4b) Define the action space A as the transmit power and the selected channel, expressed as

a_t = {P_t^k, c_t^k[m]}  Expression 5

where P_t^k is the transmit power of the k-th V2V link user, and c_t^k[m] indicates whether the m-th channel is used by the k-th V2V link user: c_t^k[m] = 1 means the m-th channel is used by the k-th V2V link user, and c_t^k[m] = 0 means the m-th channel is not used by the k-th V2V link user;

(4c) Define the reward function R. The goal of V2V resource allocation is for each V2V link to select a spectrum sub-band and transmit power that maximize the system throughput of the V2V links while satisfying the delay constraint and causing little interference to the other V2V links. The reward function can therefore be expressed as:

r_t = λ_d·Σ_k C_v[k] − λ_p·(T_0 − U_t)  Expression 6

where T_0 is the maximum tolerable delay, λ_d and λ_p are the weights of the two parts, and T_0 − U_t is the time already spent on transmission; as the transmission time increases, the penalty also increases. To obtain a good long-term return, both the immediate return and future returns should be considered. The main goal of reinforcement learning is therefore to find a policy that maximizes the expected cumulative discounted return

E[Σ_{n=0}^{∞} β^n·r_{t+n}]  Expression 7

where β ∈ [0, 1] is the discount factor;
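A minimal numerical sketch of the reward of step (4c) and the cumulative discounted return of Expression 7; the weighted throughput-minus-penalty form of the reward is an assumption consistent with the definitions in the text.

```python
def reward(capacities, T0, U_t, lam_d, lam_p):
    """Illustrative reward: weighted sum throughput minus a penalty
    that grows with the elapsed transmission time T0 - U_t."""
    return lam_d * sum(capacities) - lam_p * (T0 - U_t)

def discounted_return(rewards, beta):
    """Cumulative discounted return sum_n beta^n * r_{t+n},
    with discount factor beta in [0, 1]."""
    g = 0.0
    for r in reversed(rewards):
        g = r + beta * g
    return g
```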

(4d) Based on the established S, A, and R, a deep reinforcement learning model is built on the basis of Q-learning: the evaluation function Q(s_t, a_t) represents the discounted reward obtained by executing action a_t from state s_t, and the Q-value update function is

Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ·max_{a∈A} Q(s_{t+1}, a) − Q(s_t, a_t)]  Expression 8

where r_t is the immediate reward function, α is the learning rate, γ is the discount factor, s_t is the state information of the V2V link at time t, s_{t+1} is the state of the V2V link after executing a_t, and A is the action space formed by the actions a_t.
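The Q-value update of step (4d) can be sketched for a discrete, tabular case as follows; DDPG later replaces the max over actions with a learned actor, and the learning rate alpha is a standard ingredient assumed here.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update of Expression 8.
    Q is a dict keyed by (state, action); unseen pairs default to 0."""
    old = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
```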

Step (5): To solve the V2V resource allocation problem based on 5G network slicing, the action space of the deep reinforcement learning model built with the V2V link as the agent contains two variables, transmit power and channel selection, where the transmit power varies continuously within a given range. To handle this high-dimensional action space, and in particular the joint optimization problem in a continuous action space, the deep reinforcement learning model is optimized with the DDPG algorithm, which comprises three mechanisms: deep-learning fitting, soft update, and experience replay.

Deep-learning fitting means that the DDPG algorithm is based on the Actor-Critic framework and uses deep neural networks with parameters θ and δ to fit the deterministic policy a = μ(s|θ) and the action-value function Q(s, a|δ), as shown in Figure 3 of the accompanying drawings.

Soft update: the parameters of the action-value network undergo frequent gradient updates while also being used to compute the gradient of the policy network, so the learning process of the action-value network is liable to become unstable; a soft update scheme is therefore used to update the networks.

Two neural networks, an online network and a target network, are created for the policy network and for the action-value network respectively: the online policy network μ(s|θ) and target policy network μ′(s|θ′), and the online action-value network Q(s, a|δ) and target action-value network Q′(s, a|δ′).

During training, the online networks are continuously updated by gradient descent, and the target networks are updated as follows:

θ′ = τθ + (1 − τ)θ′  Expression 9

δ′ = τδ + (1 − τ)δ′  Expression 10
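Expressions 9-10 correspond to the following soft (Polyak) update, shown here on plain lists of parameter values for illustration.

```python
def soft_update(target, online, tau):
    """theta' <- tau * theta + (1 - tau) * theta', applied
    element-wise to parameter lists (illustrative representation)."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]
```

With a small tau, the target parameters drift slowly toward the online parameters, which is what stabilizes learning.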

The experience replay mechanism addresses the fact that the state-transition samples generated by interacting with the environment are temporally correlated, which easily biases the fitting of the action-value function. Therefore, borrowing the experience replay mechanism of the DQN algorithm, the collected samples are first placed into a sample pool, and mini-batches are then drawn at random from the pool to train the network. This removes the correlation and dependence between samples, resolves the problems of inter-sample correlation and non-stationary distribution, and makes the algorithm easier to converge.
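The experience replay pool described above can be sketched as a bounded buffer with uniform random sampling; the class and method names are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay pool: uniform sampling breaks the temporal
    correlation of (s, a, r, s') transitions."""
    def __init__(self, capacity):
        self.pool = deque(maxlen=capacity)  # oldest samples evicted first

    def store(self, transition):
        self.pool.append(transition)

    def sample(self, batch_size):
        return random.sample(self.pool, batch_size)
```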

Optimizing the deep reinforcement learning model with the DDPG algorithm, which comprises the three mechanisms of deep-learning fitting, soft update, and experience replay, includes the following steps:

(5a) Initialize the number of training episodes P;

(5b) Initialize the time step t within the episode;

(5c) The online Actor policy network outputs action a_t according to the input state s_t, obtains the immediate reward r_t, and moves to the next state s_{t+1}, thereby producing the training sample (s_t, a_t, r_t, s_{t+1});

(5d) Store the training sample (s_t, a_t, r_t, s_{t+1}) in the experience replay pool;

(5e) Randomly sample m training samples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool to form a mini-batch, which is provided to the online Actor policy network, the online Critic evaluation network, the target Actor policy network, and the target Critic evaluation network;

(5f), set the Q target as

y_i = r_i + γ·Q′(s_{i+1}, μ′(s_{i+1}|θ′)|δ′)    (Expression 11)

and define the loss function of the online Critic evaluation network as

L(δ) = (1/m)·Σ_{i=1}^{m} (y_i − Q(s_i, a_i|δ))²

All parameters δ of the online Critic network are then updated through gradient back-propagation of the neural network;

(5g), define the sampled policy gradient of the online Actor policy network as

∇_θ J ≈ (1/m)·Σ_{i=1}^{m} ∇_a Q(s, a|δ)|_{s=s_i, a=μ(s_i)} · ∇_θ μ(s|θ)|_{s=s_i}

All parameters θ of the online Actor network are then updated through gradient back-propagation of the neural network;

(5h), if the number of online training steps reaches the target-network update frequency, update the target network parameters θ′ and δ′ from the online network parameters θ and δ, respectively;

(5i), check whether t < K, where K is the total number of time steps in episode p; if so, set t = t + 1 and return to step (5c); otherwise, proceed to step (5j);

(5j), check whether p < I, where I is the preset threshold on the number of training episodes; if so, set p = p + 1 and return to step (5b); otherwise, the optimization ends and the optimized deep reinforcement learning model is obtained.
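The target computation of Expression 11 and the soft update of the target networks can be illustrated numerically. Here gamma, tau, and the toy values are illustrative assumptions, and the real networks Q′ and μ′ are replaced by precomputed numbers:

```python
import numpy as np

def td_target(r, q_next, gamma=0.99):
    """Expression 11: y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}|theta')|delta'),
    with Q'(...) supplied here as a precomputed array q_next."""
    return r + gamma * q_next

def soft_update(target, online, tau=0.001):
    """Soft update of target parameters: target <- tau*online + (1-tau)*target.
    tau is an illustrative value; the patent does not specify it."""
    return tau * online + (1.0 - tau) * target

r = np.array([1.0, 0.0])            # immediate rewards of two sampled transitions
q_next = np.array([10.0, 20.0])     # target-Critic values at the next states
y = td_target(r, q_next)            # Q targets for the Critic loss
theta_target = soft_update(np.zeros(3), np.ones(3))  # slowly tracks the online net
```

The small tau means the target networks change slowly, which is what stabilizes the bootstrapped targets y_i during training.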

Step (6), obtaining the optimal V2V user transmit power and channel allocation strategy from the optimized deep reinforcement learning model, comprises the following steps:

(6a), input the system state information s_k(t) at a given time into the deep reinforcement learning model trained by the DDPG algorithm;

(6b), output the optimal action strategy a_k*(t), from which the optimal V2V user transmit power p_k* and the assigned channel c_k* are obtained.
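How a raw actor output might be decoded into a transmit power and a channel index can be sketched as follows; the output layout (first entry a power level, remaining entries channel scores), the power cap p_max, and the number of channels are assumptions for illustration, since the patent does not fix them:

```python
import numpy as np

def decode_action(actor_output, p_max=23.0, num_channels=4):
    """Map a raw actor output vector to (transmit power, channel index).
    Layout assumption: actor_output[0] is a power level in [0, 1],
    actor_output[1:1+num_channels] are per-channel scores."""
    power = np.clip(actor_output[0], 0.0, 1.0) * p_max       # scaled power
    channel = int(np.argmax(actor_output[1:1 + num_channels]))  # best-scored channel
    return power, channel

p, c = decode_action(np.array([0.5, 0.1, 0.9, 0.2, 0.3]))
# p = 11.5, c = 1
```

Because DDPG produces continuous actions, some such discretization of the channel component is needed before the action can be applied to the sub-channel selection.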

Finally, the drawings in the specification are described in detail.

Figure 1 depicts the flow of the 5G Internet of Vehicles V2V resource allocation method using the deep deterministic policy gradient algorithm: V2V communication accesses the 5G network through network slicing, and the DDPG algorithm optimizes the deep reinforcement learning model to obtain the jointly optimal V2V user channel allocation and transmit power strategy.

Figure 2 depicts the V2V user resource allocation model based on 5G network slicing, in which V2V and V2I communications use different slices.

Figure 3 illustrates the deep-network function fitting: the DDPG algorithm is based on the Actor-Critic framework and uses deep neural networks with parameters θ and δ to fit the deterministic policy a = μ(s|θ) and the action-value function Q(s, a|δ), respectively.

Figure 4 depicts the V2V communication deep reinforcement learning model. The V2V link, acting as the agent, selects a channel and transmit power according to the reward function based on the current state s_t ∈ S.

From the above description, it will be apparent to those skilled in the art that the V2V resource allocation method of the present invention, which applies 5G network slicing and the deep reinforcement learning DDPG algorithm, improves system throughput while ensuring that the communication delay meets safety requirements.

Matters not described in detail in this application belong to the prior art known to those skilled in the art.

Claims (2)

1. A 5G Internet of Vehicles V2V resource allocation method adopting a deep deterministic policy gradient algorithm, characterized by comprising the following steps:
(1) dividing communication services in the Internet of Vehicles into two types: broadband multimedia data transmission between vehicles and roadside infrastructure (V2I), and driving-safety data transmission between vehicles (V2V);
(2) assigning the V2I and V2V communication traffic to different slices using 5G network slicing technology;
(3) constructing a user resource allocation system model in which K V2V users share a licensed channel of bandwidth B;
(4) adopting a distributed resource allocation method and, with V2V link delay taken into account, constructing a deep reinforcement learning model aimed at maximizing the throughput of the communication system;
(5) considering the joint optimization problem over a continuous action space, optimizing the deep reinforcement learning model with a deep deterministic policy gradient (DDPG) algorithm comprising three mechanisms: deep-network function fitting, soft updates, and experience replay;
(6) obtaining the optimal V2V user transmit power and channel allocation strategy according to the optimized deep reinforcement learning model.
2. The 5G Internet of Vehicles V2V resource allocation method adopting the deep deterministic policy gradient algorithm according to claim 1, wherein step (4) comprises the following specific steps:
(4a) the state space S is defined as the observation information relevant to resource allocation, comprising the instantaneous channel state information G_t[m] of the V2V link on subchannel m, the interference power I_{t−1}[m] received on subchannel m in the previous time slot, the number of times N_{t−1}[m] that subchannel m was selected by neighboring V2V links in the previous time slot, the remaining load L_t to be transmitted by the V2V user, and the remaining delay U_t, i.e.
s_t = {G_t, I_{t−1}, N_{t−1}, L_t, U_t}
The V2V link is regarded as an agent; at each step the V2V link selects a channel and a transmit power based on the current state s_t ∈ S;
(4b) the action space A is defined as the transmit power and the selected channel, denoted as
A = {p_k, c_k^m}
where p_k is the transmit power of the k-th V2V link user, and c_k^m indicates the use of the m-th channel by the k-th V2V link user: c_k^m = 1 indicates that the m-th channel is used by the k-th V2V link user, and c_k^m = 0 indicates that it is not;
(4c) the reward function R is defined as follows: the goal of V2V resource allocation is for each V2V link to select a spectral sub-band and a transmit power that maximize the system throughput of the V2V links while satisfying the delay constraint, so the reward function can be expressed as
r_t = λ_d·C_t − λ_p·(T_0 − U_t)
where C_t denotes the instantaneous throughput of the V2V links, T_0 is the maximum tolerable delay, λ_d and λ_p are the weights of the two parts, and T_0 − U_t is the time already spent on transmission, so the penalty grows as the transmission time increases;
(4d) a deep reinforcement learning model is established on the basis of Q-learning from the constructed S, A and R; the evaluation function Q(s_t, a_t) represents the discounted reward obtained by performing action a_t from state s_t, and the Q-value update function is
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ·max_{a∈A} Q(s_{t+1}, a) − Q(s_t, a_t)]
where r_t is the instant reward function, α is the learning rate, γ is the discount factor, s_t is the state information of the V2V link at time t, s_{t+1} is the state of the V2V link after performing a_t, and A is the action space formed by the actions a_t.
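The Q-value update of step (4d) can be sketched as a tabular update; the learning rate alpha and the toy state/action values below are illustrative assumptions:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a dict keyed by (state, action); missing entries default to 0."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q

Q = {}
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
# Q[(0, 1)] = 0 + 0.1 * (1.0 + 0.9 * 0 - 0) = 0.1
```

In the patent's method this tabular form is replaced by the Critic network Q(s, a|δ), since the joint power/channel action space is continuous, but the bootstrapped target structure is the same.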
CN202110273529.0A 2021-03-12 2021-03-12 A 5G Internet of Vehicles V2V Resource Allocation Method Using Deep Deterministic Policy Gradient Algorithm Active CN112995951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110273529.0A CN112995951B (en) 2021-03-12 2021-03-12 A 5G Internet of Vehicles V2V Resource Allocation Method Using Deep Deterministic Policy Gradient Algorithm


Publications (2)

Publication Number Publication Date
CN112995951A true CN112995951A (en) 2021-06-18
CN112995951B CN112995951B (en) 2022-04-08

Family

ID=76335240


Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113676958A (en) * 2021-07-28 2021-11-19 北京信息科技大学 Vehicle-to-vehicle network slice bandwidth resource allocation method and device
CN113709882A (en) * 2021-08-24 2021-11-26 吉林大学 Vehicle networking communication resource allocation method based on graph theory and reinforcement learning
CN113727306A (en) * 2021-08-16 2021-11-30 南京大学 Decoupling C-V2X network slicing method based on deep reinforcement learning
CN113766661A (en) * 2021-08-30 2021-12-07 北京邮电大学 Interference control method and system for wireless network environment
CN113965944A (en) * 2021-09-14 2022-01-21 中国船舶重工集团公司第七一六研究所 Method and system for maximizing delay certainty by ensuring system control performance
CN114245344A (en) * 2021-11-25 2022-03-25 西安电子科技大学 Internet of vehicles uncertain channel state information robust power control method and system
CN114245345A (en) * 2021-11-25 2022-03-25 西安电子科技大学 Internet of vehicles power control method and system for imperfect channel state information
CN114401552A (en) * 2022-01-17 2022-04-26 重庆邮电大学 Airspace resource allocation method based on determinant point process learning
CN114449482A (en) * 2022-03-11 2022-05-06 南京理工大学 Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning
CN114641041A (en) * 2022-05-18 2022-06-17 之江实验室 A method and device for edge intelligence-oriented Internet of Vehicles slicing
CN114786201A (en) * 2022-04-28 2022-07-22 合肥工业大学 A Dynamic Cooperative Optimization Method for Communication Delay and Channel Efficiency in Wireless Networks
CN114827956A (en) * 2022-05-12 2022-07-29 南京航空航天大学 High-energy-efficiency V2X resource allocation method for user privacy protection
CN114885426A (en) * 2022-05-05 2022-08-09 南京航空航天大学 5G Internet of vehicles resource allocation method based on federal learning and deep Q network
CN115086992A (en) * 2022-05-07 2022-09-20 北京科技大学 Distributed semantic communication system and bandwidth resource allocation method and device
CN115515101A (en) * 2022-09-23 2022-12-23 西北工业大学 Decoupling Q learning intelligent codebook selection method for SCMA-V2X system
CN115696258A (en) * 2022-09-01 2023-02-03 华南师范大学 Resource allocation method, storage medium and equipment for Internet of Vehicles based on reinforcement learning
CN118890658A (en) * 2024-06-14 2024-11-01 金陵科技学院 A resource optimization method for Internet of Vehicles based on composite priority experience replay sampling

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170079059A1 (en) * 2015-09-11 2017-03-16 Intel IP Corporation Slicing architecture for wireless communication
US20190174449A1 (en) * 2018-02-09 2019-06-06 Intel Corporation Technologies to authorize user equipment use of local area data network features and control the size of local area data network information in access and mobility management function
CN110320883A (en) * 2018-03-28 2019-10-11 上海汽车集团股份有限公司 A kind of Vehicular automatic driving control method and device based on nitrification enhancement
CN110753319A (en) * 2019-10-12 2020-02-04 山东师范大学 Heterogeneous service-oriented distributed resource allocation method and system in heterogeneous Internet of vehicles
CN110972107A (en) * 2018-09-29 2020-04-07 华为技术有限公司 Load balancing method and device
CN111083942A (en) * 2018-08-22 2020-04-28 Lg 电子株式会社 Method and apparatus for performing uplink transmission in wireless communication system
CN111137292A (en) * 2018-11-01 2020-05-12 通用汽车环球科技运作有限责任公司 Spatial and temporal attention based deep reinforcement learning for hierarchical lane change strategies for controlling autonomous vehicles
CN111267831A (en) * 2020-02-28 2020-06-12 南京航空航天大学 An intelligent variable time domain model prediction energy management method for hybrid electric vehicles
CN112469000A (en) * 2019-09-06 2021-03-09 杨海琴 System and method for vehicle network service on 5G network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAI YU: "A Reinforcement Learning Aided Decoupled RAN Slicing Framework for Cellular V2X", GLOBECOM 2020 - 2020 IEEE Global Communications Conference *
GUO Caili: "Research on Spectrum Sensing and Sharing Technology for Cognitive Internet of Vehicles Driven by Dynamic Spatio-Temporal Data", Chinese Journal on Internet of Things *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant