WO2021227508A1 - Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning - Google Patents

Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning

Info

Publication number
WO2021227508A1
WO2021227508A1 (PCT/CN2020/139322)
Authority
WO
WIPO (PCT)
Prior art keywords
industrial
neural network
terminal
priority
network model
Prior art date
Application number
PCT/CN2020/139322
Other languages
English (en)
French (fr)
Inventor
于海斌
刘晓宇
许驰
曾鹏
金曦
夏长清
Original Assignee
中国科学院沈阳自动化研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院沈阳自动化研究所 (Shenyang Institute of Automation, Chinese Academy of Sciences)
Priority to US17/296,509 (published as US20220217792A1)
Publication of WO2021227508A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W74/00Wireless channel access
    • H04W74/08Non-scheduled access, e.g. ALOHA
    • H04W74/0866Non-scheduled access, e.g. ALOHA using a dedicated channel for access
    • H04W74/0875Non-scheduled access, e.g. ALOHA using a dedicated channel for access with assigned priorities based access
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/04Wireless resource allocation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00Arrangements affording multiple use of the transmission path
    • H04L5/003Arrangements for allocating sub-channels of the transmission path
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/02Arrangements for optimising operational condition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W74/00Wireless channel access
    • H04W74/002Transmission of channel access control information
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00Arrangements affording multiple use of the transmission path
    • H04L5/003Arrangements for allocating sub-channels of the transmission path
    • H04L5/0037Inter-user or inter-terminal allocation

Definitions

  • the present invention provides a dynamic multi-priority multi-channel access method for industrial 5G networks based on deep reinforcement learning, aimed at large-scale distributed industrial 5G terminal concurrent communication and ultra-reliable and low-latency communication (URLLC) in industrial 5G networks.
  • Multi-channel access allows large-scale concurrent access of industrial 5G terminals, which can effectively improve the efficiency of spectrum utilization.
  • traditional multi-channel access algorithms are generally based on known system models.
  • the number and data of industrial 5G terminals are time-varying, and it is difficult to obtain an accurate system model.
  • High reliability and low latency of data transmission are the most important service quality requirements in industrial communication.
  • the data generated by industrial 5G terminals have time-varying requirements for the real-time and reliability of transmission.
  • the priority of a terminal in a traditional industrial production process is generally constant, making it difficult to guarantee the real-time and reliable transmission of massive time-varying data.
  • Deep reinforcement learning can use deep learning to estimate the system model, combined with reinforcement learning to solve dynamic multi-priority multi-channel access, effectively solving the problem of system model modeling difficulties and state space explosion.
  • the purpose of the present invention is to address the concurrent communication of large-scale distributed industrial 5G terminals and the high-reliability, low-latency communication requirements in industrial 5G networks, considering the differing real-time and reliability requirements of massive industrial data transmission.
  • it addresses the difficulty of modeling with traditional methods and the explosion of the algorithm state space, and provides a dynamic multi-priority multi-access method for industrial 5G networks based on deep reinforcement learning, achieving dynamic multi-priority multi-access for industrial 5G terminals under specific packet loss rate and end-to-end delay constraints.
  • Deep reinforcement learning can use deep learning to estimate the system model, combined with reinforcement learning to solve dynamic multi-priority multi-channel access, and effectively solve the problem of system model modeling difficulty and state space explosion.
  • the present invention adopts the following technical scheme: an industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning, in which channel allocation is realized by training a neural network model, including the following steps:
  • the industrial 5G network includes: 1 industrial 5G base station, 1 edge computing server, and N industrial 5G terminals;
  • the edge computing server is connected to an industrial 5G base station and is used to train a deep reinforcement learning neural network model
  • the industrial 5G base station downloads the trained neural network model from the edge computing server, and is used to schedule the dynamic multi-priority multi-channel access of the industrial 5G terminal;
  • the industrial 5G terminal is connected to an industrial 5G base station through an industrial 5G network, and is used to generate industrial data with different transmission requirements.
  • the establishment of an industrial 5G network model includes: determining the coverage of the industrial 5G network and the number of industrial 5G terminals in the range N, the number of industrial 5G terminal priorities P, and the number of channels C.
  • the q-eval deep neural network is used to obtain the evaluation function Q(s_n, a_n) of the action vector a_n for the current state vector s_n of industrial 5G terminal n; the q-next deep neural network is used to select the evaluation function of the maximum action vector a′_n for the next state vector s′_n of industrial 5G terminal n.
  • the initialization parameters of the q-next deep neural network are the same as those of the q-eval deep neural network.
  • the parameters w and b of the q-eval deep neural network are updated after each iteration of the neural network model.
  • the parameters w and b of the q-next deep neural network are updated once after every I training iterations of the neural network model.
  • the training data includes:
  • the state vector s_n(t) = [c_n(t), ack_n(t), p_n(t), cf(t)] of industrial 5G terminal n in time slot t (t ∈ T), where c_n(t) denotes the channel c (c ∈ C) selected by industrial 5G terminal n at the beginning of time slot t, ack_n(t) denotes whether industrial 5G terminal n sent its data successfully by the end of time slot t, p_n(t) denotes the priority p (p ∈ P) of industrial 5G terminal n in time slot t, and cf(t) denotes the occupancy of all channels c in time slot t;
  • the training neural network model includes the following steps:
  • collect the historical time-slot state information of all industrial 5G terminals in the industrial 5G network and obtain the multi-priority channel allocation result through the neural network model; when the network performance of the allocation result meets the requirements, i.e., the packet loss rate, the system-wide packet loss rate, and the end-to-end delay are all below the corresponding network performance indicators, the neural network model is used as the final trained neural network model for final multi-priority channel allocation;
  • the network performance indicators include:
  • the packet loss rate, in which x_{n,c}(t) indicates whether channel c is allocated to industrial 5G terminal n in time slot t and is related to the priority p of industrial 5G terminal n; λ_{n,c}(t) denotes the number of data packets that industrial 5G terminal n is ready to transmit on channel c at the beginning of time slot t, and μ_{n,c}(t) denotes the number of data packets successfully transmitted by industrial 5G terminal n on channel c by the end of time slot t;
  • the end-to-end delay, defined as the sum of four components: the propagation delay of industrial 5G terminal n, i.e., the delay experienced by the electromagnetic wave from the transmitting end of one industrial 5G terminal to the receiving end of another industrial 5G terminal; the transmission delay of industrial 5G terminal n, i.e., the delay from the first bit of a data packet being sent to the last bit being sent; the queuing delay of industrial 5G terminal n, i.e., the delay experienced by a data packet from arriving at the industrial 5G terminal to leaving it; and the hardware delay d_hw, i.e., the delay caused by the hardware performance of the industrial 5G terminal.
  • the collection of the state information of all industrial 5G terminals in the current industrial 5G network as the input of the neural network model, and the multi-priority channel allocation through the neural network model includes the following steps:
  • the industrial base station schedules the industrial 5G terminal to access the channel.
  • Industrial 5G dynamic multi-priority multi-access system based on deep reinforcement learning including:
  • Edge computing server used to build and train a dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning
  • Industrial 5G terminals used to generate industrial data with different transmission requirements, and collect terminal status information, action information, and reward information;
  • the industrial 5G base station is used to download the trained neural network model, and use the state information of the industrial 5G terminal as the input of the neural network model, and perform multi-priority channel allocation through the neural network model.
  • the present invention maps the time-varying real-time and reliability requirements of industrial 5G terminal data transmission to dynamic priorities of the industrial 5G terminals, and uses a dynamic multi-priority multi-channel access algorithm based on deep reinforcement learning to solve the problems of difficult modeling with traditional methods and algorithm state-space explosion caused by communication among a large number of distributed industrial 5G terminals in the industrial 5G network and by the massive real-time and reliability requirements of different data, effectively guaranteeing the reliable transmission of highly real-time data and the channel access allocation among industrial 5G terminals of different priorities.
  • the present invention has strong versatility and practicability, can adaptively handle changes in industrial 5G terminals and channels, can effectively guarantee the dynamic multi-priority multiple access of industrial 5G terminals, achieves stable transmission under specific packet loss rate and end-to-end delay constraints, and improves system safety and stability.
  • Figure 1 is a flow chart of the method of the present invention
  • Figure 2 is a system model diagram
  • Figure 3 is a diagram of the deep reinforcement learning architecture.
  • the present invention relates to industrial 5G network technology and includes the following steps: establish an industrial 5G network model and determine the number of industrial 5G terminals, their priorities, and the number of channels; establish a dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning and initialize the model parameters; collect the state, action, and reward information of all industrial 5G terminals in the industrial 5G network over multiple time slots as training data; train the neural network model with the collected data until the packet loss rate and end-to-end delay meet the requirements of industrial communication; collect the state information of all industrial 5G terminals in the industrial 5G network in the current time slot and use it as the neural network model input for multi-priority channel allocation.
  • the industrial 5G terminals perform multiple access according to the channel allocation results.
  • the present invention provides a dynamic multi-priority multi-channel access algorithm based on deep reinforcement learning. This method fully considers the problems of difficult modeling with traditional methods and algorithm state-space explosion caused by the differing real-time and reliability requirements of massive industrial data transmission, and can efficiently allocate multiple channels to industrial 5G terminals of different priorities in real time, ensuring large-scale concurrent access.
  • the present invention mainly includes the following implementation process, as shown in Figure 1, including the following steps:
  • Step 1 Establish an industrial 5G network model, determine the number of industrial 5G terminals, priority and channel number;
  • Step 2 Establish a dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning, and initialize the model parameters;
  • Step 3 Collect the status, action, and reward information of all industrial 5G terminals in T time slots in the industrial 5G network as training data;
  • Step 4 Use the collected training data to train the neural network model until the packet loss rate and end-to-end delay meet the requirements of industrial communication;
  • Step 5 Collect the status information of all industrial 5G terminals in the current time slot industrial 5G network, and use it as the neural network model input to perform multi-priority channel allocation.
  • Industrial 5G terminals perform multiple access based on the channel allocation results.
  • the industrial 5G network includes: 1 industrial 5G base station, 1 edge computing server, and N industrial 5G terminals; the edge computing server is connected to the industrial 5G base station and is used to train the deep reinforcement learning neural network model; the industrial 5G base station downloads the newly trained neural network model from the edge computing server and uses it to schedule multi-channel access with dynamic multi-user priorities; the industrial 5G terminals are connected to the industrial 5G base station through the industrial 5G network and are used to generate industrial data with different transmission requirements;
  • the industrial 5G network model mainly includes two types of situations: the number of industrial equipment N is less than the number of channels C, and the number of industrial equipment N is greater than or equal to the number of channels C.
  • the q-eval deep neural network is used to obtain the evaluation function Q(s_n, a_n) of the action vector a_n for the current state vector s_n of industrial 5G terminal n (n ∈ N); the q-next deep neural network selects the evaluation function of the maximum action vector a′_n for the next state vector s′_n of industrial 5G terminal n.
  • the reinforcement-learning update Q(s_n, a_n) ← Q(s_n, a_n) + α[r_n + γ·max_{a′_n} Q̂(s′_n, a′_n) − Q(s_n, a_n)] is used to update the q-eval deep neural network parameters, where α denotes the learning rate, γ denotes the discount factor, and r_n denotes the reward obtained by industrial 5G terminal n for executing action vector a_n in the current state s_n.
  • the initialization parameters of the q-next deep neural network are the same as those of the q-eval deep neural network.
  • the parameters w and b of the q-eval deep neural network are updated after each iteration of the neural network model.
  • the parameters w and b of the q-next deep neural network are updated once after every I training iterations of the neural network model.
  • Collecting the status, action, and reward information of all industrial 5G terminals in T time slots in the industrial 5G network as training data includes:
  • when an industrial 5G terminal n is allocated channel c (c ∈ C) to transmit data in time slot t, the (c+1)-th element of its action vector has the largest evaluation function; when it is scheduled not to send data, the 0th element has the largest evaluation function;
  • the reward vector r_n(t) = [r_n(t)] of industrial 5G terminal n in time slot t, where r_n(t) denotes the reward obtained by industrial 5G terminal n at the end of time slot t; the reward value is related to whether the data transmission succeeded and to the priority of the industrial 5G terminal. If industrial 5G terminal n fails to send data in time slot t, the reward it obtains is negative regardless of its priority; if industrial 5G terminal n sends data successfully in time slot t, the higher its priority, the higher the (positive) reward it obtains.
  • the neural network training process includes the following steps:
  • the packet loss rate and end-to-end delay performance indicators include:
  • (1) x_{n,c}(t) indicates whether channel c is allocated to industrial 5G terminal n in time slot t: a value of 0 means channel c is not allocated to industrial 5G terminal n in time slot t, and a value of 1 means it is; high-priority industrial 5G terminals access channels to transmit data with higher probability and low-priority industrial 5G terminals with lower probability, i.e., the higher the priority of industrial 5G terminal n, the higher the probability that a channel is allocated to it;
  • the end-to-end delay is defined as the sum of the propagation delay of industrial 5G terminal n (the delay experienced by the electromagnetic wave from the transmitting end to the receiving end), the transmission delay (the delay from the first bit of a data packet being sent to the last bit being sent), the queuing delay (the delay experienced by a data packet from arriving at the industrial 5G terminal to leaving it; the higher the priority p of industrial 5G terminal n, the smaller the queuing delay), and the hardware delay d_hw (the delay caused by the hardware performance of the industrial 5G terminal).
  • the industrial base station centrally dispatches the industrial 5G terminal to access the channel.


Abstract

The present invention relates to industrial 5G network technology, and specifically to an industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning, comprising the following steps: establishing an industrial 5G network model; establishing a dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning; collecting the state, action, and reward information of all industrial 5G terminals in the industrial 5G network over multiple time slots as training data; training the neural network model with the collected data until the packet loss rate and end-to-end delay meet industrial communication requirements; and collecting the state information of all industrial 5G terminals in the industrial 5G network in the current time slot as the input of the neural network model to perform multi-priority channel allocation, with the industrial 5G terminals performing multiple access according to the channel allocation results. The present invention can efficiently allocate multiple channels to industrial 5G terminals of different priorities in an industrial 5G network in real time, ensuring large-scale concurrent access.

Description

Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning
Technical Field
The present invention provides a dynamic multi-priority multi-channel access method for industrial 5G networks based on deep reinforcement learning. It addresses the concurrent communication of large-scale distributed industrial 5G terminals and the ultra-reliable and low-latency communication (URLLC) requirements in industrial 5G networks, considering the difficulty of modeling with traditional methods and the explosion of the algorithm state space caused by the differing real-time and reliability requirements of massive industrial data transmission. It particularly relates to the packet loss rate and end-to-end delay constraints of industrial 5G terminals and belongs to the technical field of industrial 5G networks.
Background Art
With the development of Industry 4.0, a large number of distributed industrial 5G terminals are interconnected, generating massive amounts of data with different real-time and reliability transmission requirements. To realize flexible and customizable intelligent manufacturing processes, distributed industrial 5G terminals use industrial wireless networks for data communication. Real-time performance and reliability are the most important quality-of-service requirements of data communication, and the industrial 5G network, with its guaranteed ultra-reliable low-latency and massive machine-type communication performance, has become the enabling communication technology for industrial wireless networks.
Multi-channel access allows large-scale concurrent access of industrial 5G terminals and can effectively improve spectrum utilization efficiency. However, traditional multi-channel access algorithms are generally based on a known system model; in industrial scenarios with massive machine-type communication, the number of industrial 5G terminals and their data are time-varying, and an accurate system model is difficult to obtain. Ultra-reliable low-latency data transmission is the most important quality-of-service requirement in industrial communication, and the data generated by industrial 5G terminals have time-varying real-time and reliability requirements; however, the priority of a terminal in a traditional industrial production process is generally constant, making it difficult to guarantee the real-time and reliable transmission of massive time-varying data.
For large-scale dynamic multi-priority multi-channel access of industrial 5G terminals, not only is it difficult to obtain an accurate system model, but the state space of the algorithm also explodes. Deep reinforcement learning can use deep learning to estimate the system model and combine it with reinforcement learning to solve dynamic multi-priority multi-channel access, effectively solving the problems of difficult system modeling and state-space explosion.
Summary of the Invention
The purpose of the present invention is to address the concurrent communication of large-scale distributed industrial 5G terminals and the ultra-reliable low-latency communication requirements in industrial 5G networks, considering the problems of difficult modeling with traditional methods and algorithm state-space explosion caused by the differing real-time and reliability requirements of massive industrial data transmission, and to provide a dynamic multi-priority multi-access method for industrial 5G networks based on deep reinforcement learning, achieving dynamic multi-priority multi-access of industrial 5G terminals under specific packet loss rate and end-to-end delay constraints.
For large-scale dynamic multi-priority multi-channel access of industrial 5G terminals, not only is it difficult to obtain an accurate system model, but the state space of the algorithm also explodes. Deep reinforcement learning can use deep learning to estimate the system model and combine it with reinforcement learning to solve dynamic multi-priority multi-channel access, effectively solving the problems of difficult system modeling and state-space explosion.
The present invention adopts the following technical scheme: an industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning, in which channel allocation for the industrial 5G network is realized by training a neural network model, comprising the following steps:
1) Establish a dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning;
2) Collect the state, action, and reward information of all industrial 5G terminals in the industrial 5G network over T time slots as training data, and train the neural network model;
3) Collect the state information of all industrial 5G terminals in the industrial 5G network in the current time slot as the input of the neural network model, perform multi-priority channel allocation through the neural network model, and have the industrial 5G terminals perform multiple access according to the channel allocation results.
The industrial 5G network comprises: 1 industrial 5G base station, 1 edge computing server, and N industrial 5G terminals.
The edge computing server is connected to the industrial 5G base station and is used to train the deep reinforcement learning neural network model.
The industrial 5G base station downloads the trained neural network model from the edge computing server and is used to schedule the dynamic multi-priority multi-channel access of the industrial 5G terminals.
The industrial 5G terminals are connected to the industrial 5G base station through the industrial 5G network and are used to generate industrial data with different transmission requirements.
For the industrial 5G network, establishing the industrial 5G network model comprises: determining the coverage of the industrial 5G network and, within it, the number N of industrial 5G terminals, the number P of industrial 5G terminal priorities, and the number C of channels.
Establishing the dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning is as follows:
Construct two structurally identical neural network models, a q-eval deep neural network and a q-next deep neural network, with neural network parameters params = [x_in, x_rnn, x_fc, x_out, w, b], where x_in denotes the number of input-layer neurons and equals the length of the state vector s_n of industrial 5G terminal n (n ∈ N), N denotes the number of industrial 5G terminals, x_rnn denotes the number of recurrent-layer neurons, x_fc denotes the number of fully-connected-layer neurons, x_out denotes the number of output-layer neurons and equals the length of the action vector a_n of industrial 5G terminal n, w denotes the weights, and b denotes the biases.
The q-eval deep neural network is used to obtain the evaluation function Q(s_n, a_n) of the action vector a_n for the current state vector s_n of industrial 5G terminal n; the q-next deep neural network is used to select the evaluation function of the maximum action vector a′_n for the next state vector s′_n of industrial 5G terminal n:

    max_{a′_n} Q̂(s′_n, a′_n)

The reinforcement-learning update

    Q(s_n, a_n) ← Q(s_n, a_n) + α[r_n + γ·max_{a′_n} Q̂(s′_n, a′_n) − Q(s_n, a_n)]

is used to update the q-eval deep neural network parameters w and b, where α denotes the learning rate, γ denotes the discount factor, and r_n denotes the reward obtained by industrial 5G terminal n for executing action vector a_n in the current state s_n.
The initialization parameters of the q-next deep neural network are the same as those of the q-eval deep neural network; the parameters w and b of the q-eval deep neural network are updated after each training iteration of the neural network model, while the parameters w and b of the q-next deep neural network are updated once after every I training iterations of the neural network model.
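The two-network update above follows the standard pattern of an evaluation network updated every step and a target network refreshed periodically. A minimal sketch of that mechanism, using a tabular value array in place of the patent's deep networks (the class name, state discretization, and all hyperparameter values are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

class TabularDQN:
    """Minimal q-eval / q-next pair over a discretized state space.

    q_eval is updated after every step; q_next starts from the same
    initialization and is refreshed by copying q_eval once every
    `copy_every` updates, mirroring the I-iteration rule above.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, copy_every=50):
        self.q_eval = np.zeros((n_states, n_actions))
        self.q_next = np.zeros((n_states, n_actions))  # same init as q_eval
        self.alpha, self.gamma, self.copy_every = alpha, gamma, copy_every
        self.updates = 0

    def update(self, s, a, r, s_next):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Qhat(s',a') - Q(s,a))
        target = r + self.gamma * self.q_next[s_next].max()
        self.q_eval[s, a] += self.alpha * (target - self.q_eval[s, a])
        self.updates += 1
        if self.updates % self.copy_every == 0:
            self.q_next = self.q_eval.copy()  # periodic parameter copy
```

Keeping q_next frozen between copies stabilizes the bootstrapped target, which is the reason the patent maintains two structurally identical networks rather than one.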
The training data comprise:
The state vector s_n(t) = [c_n(t), ack_n(t), p_n(t), cf(t)] of industrial 5G terminal n in time slot t (t ∈ T), where c_n(t) denotes the channel c (c ∈ C) selected by industrial 5G terminal n at the beginning of time slot t, ack_n(t) denotes whether industrial 5G terminal n sent its data successfully by the end of time slot t, p_n(t) denotes the priority p (p ∈ P) of industrial 5G terminal n in time slot t, and cf(t) denotes the occupancy of all channels c in time slot t;
The action vector a_n(t) = [ch_n(t)] of industrial 5G terminal n in time slot t, where ch_n(t) denotes the channel c (c ∈ C) allocated to industrial 5G terminal n in time slot t;
The reward vector r_n(t) = [r_n(t)] of industrial 5G terminal n in time slot t, where r_n(t) denotes the reward obtained by industrial 5G terminal n at the end of time slot t.
Training the neural network model comprises the following steps:
(1) Input the state vector s_n(t) = [c_n(t), ack_n(t), p_n(t), cf(t)] of industrial 5G terminal n in time slot t into the q-eval deep neural network;
(2) Select an action vector through the q-eval deep neural network: following the ε-greedy algorithm, with probability ε select an action vector (i.e., a channel) at random, or with probability 1−ε select the action vector (i.e., channel) with the largest evaluation function, namely max_a Q(s_n(t), a);
(3) Obtain the reward r_n(t) and the observation o_n(t) according to the action vector a_n(t);
(4) From the state vector s_n(t) and action vector a_n(t) of industrial 5G terminal n in time slot t, obtain the state vector s′_n(t+1) of industrial 5G terminal n in the next time slot t+1, and store <s_n(t), a_n(t), r_n(t), s′_n(t+1)> in the experience pool, with each time slot's <s_n(t), a_n(t), r_n(t), s′_n(t+1)> serving as one experience;
(5) Input s′_n(t+1) into the q-next deep neural network to obtain max_{a′_n} Q̂(s′_n(t+1), a′_n) and the target estimate Q_target = r_n(t) + γ·max_{a′_n} Q̂(s′_n(t+1), a′_n);
(6) Randomly draw M experiences from the experience pool by experience replay, compute Q_target for each experience, and update the q-eval deep neural network parameters w and b according to the mean-squared-error loss function

    L(θ(t)) = (1/M) Σ_{m=1}^{M} [Q_target,m − Q(s_m, a_m; θ(t))]²

and gradient descent

    θ(t+1) = θ(t) − η·∇_θ L(θ(t))

where η denotes the neural network learning rate and θ(t) denotes the neural network hyperparameters in time slot t;
(7) After every I iterations of the q-eval deep neural network, copy the q-eval deep neural network parameters w and b to the q-next deep neural network;
(8) Repeat (1)–(7) until the mean-squared-error loss function converges; the q-eval deep neural network obtained at that point serves as the trained neural network model.
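Steps (2) and (6) rest on two generic mechanisms, ε-greedy exploration and uniform experience replay. A compact sketch of both (buffer capacity and the function names are illustrative assumptions):

```python
import random
from collections import deque

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action (channel) index;
    otherwise pick the index with the largest evaluation function."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

class ReplayBuffer:
    """Experience pool of <s, a, r, s'> tuples with uniform sampling.

    A bounded deque drops the oldest experiences once capacity is
    reached, so training always draws from recent interaction.
    """

    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, m, rng=random):
        # Draw M distinct experiences uniformly at random (step 6).
        return rng.sample(self.pool, m)
```

Sampling minibatches from the pool breaks the temporal correlation between consecutive slots, which is what makes the gradient step in (6) behave like supervised regression on Q_target.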
After the trained neural network model is obtained, it is optimized as follows:
Collect the historical time-slot state information of all industrial 5G terminals in the industrial 5G network and obtain the multi-priority channel allocation results through the neural network model; when the network performance of the allocation results meets the requirements, i.e., the packet loss rate, the system-wide packet loss rate, and the end-to-end delay are all below the corresponding network performance indicators, the neural network model serves as the final trained neural network model for the final multi-priority channel allocation;
Otherwise, repeat steps 1)–2) until the neural network model meets the requirements.
The network performance indicators comprise:
The packet loss rate of industrial 5G terminal n,

    ρ_n(t) = 1 − Σ_{c∈C} x_{n,c}(t)·μ_{n,c}(t) / Σ_{c∈C} λ_{n,c}(t)

where x_{n,c}(t) indicates whether channel c is allocated to industrial 5G terminal n in time slot t and is related to the priority p of industrial 5G terminal n; λ_{n,c}(t) denotes the number of data packets that industrial 5G terminal n is ready to transmit on channel c at the beginning of time slot t, and μ_{n,c}(t) denotes the number of data packets successfully transmitted by industrial 5G terminal n on channel c by the end of time slot t;
The system-wide packet loss rate,

    ρ(t) = 1 − Σ_{n∈N} Σ_{c∈C} x_{n,c}(t)·μ_{n,c}(t) / Σ_{n∈N} Σ_{c∈C} λ_{n,c}(t)

where the numerator sum denotes the number of data packets successfully transmitted by all N industrial 5G terminals in time slot t and the denominator sum denotes the number of data packets awaiting transmission by all N industrial 5G terminals in time slot t;
The end-to-end delay, defined as

    d_n(t) = d_n^pp + d_n^tr + d_n^qu + d_hw

where d_n^pp is defined as the propagation delay of industrial 5G terminal n, i.e., the delay experienced by the electromagnetic wave from the transmitting end of one industrial 5G terminal to the receiving end of another industrial 5G terminal; d_n^tr is defined as the transmission delay of industrial 5G terminal n, i.e., the delay from the first bit of a data packet being sent to the last bit being sent; d_n^qu is defined as the queuing delay of industrial 5G terminal n, i.e., the delay experienced by a data packet from arriving at the industrial 5G terminal to leaving it; and d_hw is defined as the hardware delay, i.e., the delay caused by the hardware performance of the industrial 5G terminal.
Collecting the state information of all industrial 5G terminals in the current industrial 5G network as the input of the neural network model and performing multi-priority channel allocation through the neural network model comprises the following steps:
Collect the state vectors [s_1(t), s_2(t), …, s_N(t)] of all N industrial 5G terminals in the industrial 5G network in the current time slot t as the input of the trained neural network model, and obtain the output action vectors [a_1(t), a_2(t), …, a_N(t)];
According to the obtained output action vectors, the industrial base station schedules the industrial 5G terminals to access the channels.
An industrial 5G dynamic multi-priority multi-access system based on deep reinforcement learning comprises:
an edge computing server, used to establish and train the dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning;
industrial 5G terminals, used to generate industrial data with different transmission requirements and to collect terminal state information, action information, and reward information;
an industrial 5G base station, used to download the trained neural network model, take the state information of the industrial 5G terminals as the input of the neural network model, and perform multi-priority channel allocation through the neural network model.
The present invention has the following beneficial effects and advantages:
1. Addressing the ultra-reliable low-latency communication requirements of industrial 5G, the present invention maps the time-varying real-time and reliability requirements of industrial 5G terminal data transmission to dynamic priorities of the industrial 5G terminals, and uses a dynamic multi-priority multi-channel access algorithm based on deep reinforcement learning to solve the problems of difficult modeling with traditional methods and algorithm state-space explosion caused by communication among a large number of distributed industrial 5G terminals in the industrial 5G network and by massive data with differing real-time and reliability requirements, effectively guaranteeing the reliable transmission of highly real-time data and the channel access allocation among industrial 5G terminals of different priorities.
2. The present invention has strong versatility and practicability, can adaptively handle changes in industrial 5G terminals and channels, can effectively guarantee the dynamic multi-priority multiple access of industrial 5G terminals, achieves stable transmission under specific packet loss rate and end-to-end delay constraints, and improves system safety and stability.
Brief Description of the Drawings
Figure 1 is a flow chart of the method of the present invention;
Figure 2 is a system model diagram;
Figure 3 is a diagram of the deep reinforcement learning architecture.
Detailed Description of the Embodiments
The present invention is described in detail below with reference to the accompanying drawings.
The present invention relates to industrial 5G network technology and comprises the following steps: establish an industrial 5G network model and determine the number of industrial 5G terminals, their priorities, and the number of channels; establish a dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning and initialize the model parameters; collect the state, action, and reward information of all industrial 5G terminals in the industrial 5G network over multiple time slots as training data; train the neural network model with the collected data until the packet loss rate and end-to-end delay meet industrial communication requirements; collect the state information of all industrial 5G terminals in the industrial 5G network in the current time slot as the input of the neural network model to perform multi-priority channel allocation, with the industrial 5G terminals performing multiple access according to the channel allocation results. Addressing the concurrent communication of large-scale distributed industrial 5G terminals and the ultra-reliable low-latency communication requirements in industrial 5G networks, the present invention provides a dynamic multi-priority multi-channel access algorithm based on deep reinforcement learning. The method fully considers the problems of difficult modeling with traditional methods and algorithm state-space explosion caused by the differing real-time and reliability requirements of massive industrial data transmission, and can efficiently allocate multiple channels to industrial 5G terminals of different priorities in real time, ensuring large-scale concurrent access.
The present invention mainly comprises the following implementation process, as shown in Figure 1:
Step 1: Establish an industrial 5G network model and determine the number of industrial 5G terminals, their priorities, and the number of channels;
Step 2: Establish a dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning and initialize the model parameters;
Step 3: Collect the state, action, and reward information of all industrial 5G terminals in the industrial 5G network over T time slots as training data;
Step 4: Train the neural network model with the collected training data until the packet loss rate and end-to-end delay meet industrial communication requirements;
Step 5: Collect the state information of all industrial 5G terminals in the industrial 5G network in the current time slot as the input of the neural network model to perform multi-priority channel allocation; the industrial 5G terminals perform multiple access according to the channel allocation results.
This embodiment is implemented according to the flow shown in Figure 1; the specific steps are as follows:
1. Establish the industrial 5G network model, as shown in Figure 2, and determine the number of industrial 5G terminals, their priorities, and the number of channels:
(1) The industrial 5G network comprises 1 industrial 5G base station, 1 edge computing server, and N industrial 5G terminals, where the edge computing server is connected to the industrial 5G base station and used to train the deep reinforcement learning neural network model; the industrial 5G base station downloads the newly trained neural network model from the edge computing server and uses it to schedule multi-channel access with dynamic multi-user priorities; the industrial 5G terminals are connected to the industrial 5G base station through the industrial 5G network and used to generate industrial data with different transmission requirements;
(2) Determine the coverage of the industrial 5G network and, within it, the number N of industrial 5G terminals, the number P of industrial 5G terminal priorities, and the number C of channels. The priority p is related to the real-time and reliability requirements of the transmitted data: the higher these requirements, the higher the priority of the industrial 5G terminal. The industrial 5G network model mainly covers two cases: the number of industrial devices N is smaller than the number of channels C, and the number of industrial devices N is greater than or equal to the number of channels C.
2. Establish the dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning and initialize the model parameters, as shown in Figure 3, comprising the following steps:
(1) Establish the dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning, comprising an input layer, a recurrent neural network (RNN) layer, a fully connected layer, and an output layer;
(2) Initialize the deep neural network parameters params = [x_in, x_rnn, x_fc, x_out, w, b], where x_in denotes the number of input-layer neurons and equals the length of the state vector s_n of industrial 5G terminal n (n ∈ N), N denotes the number of industrial 5G terminals, x_rnn denotes the number of recurrent-layer neurons, x_fc denotes the number of fully-connected-layer neurons, x_out denotes the number of output-layer neurons and equals the length of the action vector a_n of industrial 5G terminal n, w denotes the weights, and b denotes the biases;
(3) Construct two structurally identical deep neural networks, q-eval and q-next, where the q-eval deep neural network is used to obtain the evaluation function Q(s_n, a_n) of the action vector a_n for the current state vector s_n of industrial 5G terminal n (n ∈ N), and the q-next deep neural network selects the evaluation function of the maximum action vector a′_n for the next state vector s′_n of industrial 5G terminal n, max_{a′_n} Q̂(s′_n, a′_n). The reinforcement-learning update

    Q(s_n, a_n) ← Q(s_n, a_n) + α[r_n + γ·max_{a′_n} Q̂(s′_n, a′_n) − Q(s_n, a_n)]

is used to update the q-eval deep neural network parameters, where α denotes the learning rate, γ denotes the discount factor, and r_n denotes the reward obtained by industrial 5G terminal n for executing action vector a_n in the current state s_n. The initialization parameters of the q-next deep neural network are the same as those of the q-eval deep neural network; the parameters w and b of the q-eval deep neural network are updated after each training iteration of the neural network model, while the parameters w and b of the q-next deep neural network are updated once after every I training iterations of the neural network model.
3. Collecting the state, action, and reward information of all industrial 5G terminals in the industrial 5G network over T time slots as training data comprises:
(1) The state vector s_n(t) = [c_n(t), ack_n(t), p_n(t), cf(t)] of industrial 5G terminal n (n ∈ N) in time slot t (t ∈ T), where c_n(t) denotes the channel selected by industrial 5G terminal n in time slot t, a vector V_c of size C+1: when industrial 5G terminal n selects channel c, the (c+1)-th value of V_c is 1 and the remaining values are 0; when industrial 5G terminal n chooses not to transmit, the 0th value of V_c is 1 and the remaining values are 0. ack_n(t) denotes whether the data of industrial 5G terminal n was sent successfully by the end of time slot t: ack_n(t)=0 means industrial 5G terminal n failed to send data in time slot t, and ack_n(t)=1 means it sent data successfully; ack_n(t) is obtained from the observation o_n(t). p_n(t) denotes the priority of industrial 5G terminal n in time slot t, determined by the real-time and reliability requirements of the data to be sent by industrial 5G terminal n in time slot t: the higher those requirements, the smaller the value of p_n(t) and the higher the priority. cf(t) denotes the occupancy of all channels c in time slot t, a vector V_cf of size C+1: whenever an industrial 5G terminal selects channel c for transmission, the (c+1)-th value of V_cf is incremented by 1, and whenever an industrial 5G terminal chooses not to transmit, the 0th value of V_cf is incremented by 1; the higher the value for channel c, the more industrial 5G terminals selected channel c;
(2) The action vector a_n(t) = [ch_n(t)] of industrial 5G terminal n in time slot t, where ch_n(t) is a vector of size C+1: when industrial 5G terminal n is allocated channel c (c ∈ C) to transmit data in time slot t, the (c+1)-th element of ch_n(t) has the largest evaluation function; when industrial 5G terminal n is scheduled not to send data in time slot t, the 0th element of ch_n(t) has the largest evaluation function;
(3) The reward vector r_n(t) = [r_n(t)] of industrial 5G terminal n in time slot t, where r_n(t) denotes the reward obtained by industrial 5G terminal n at the end of time slot t; the reward value is related to whether the data transmission succeeded and to the priority of the industrial 5G terminal. If industrial 5G terminal n fails to send data in time slot t, the reward it obtains is negative regardless of its priority; if industrial 5G terminal n sends data successfully in time slot t, the higher its priority, the higher the (positive) reward it obtains.
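The size-(C+1) encodings and the reward rule above can be sketched as follows. Index 0 stands for "do not transmit" and index c+1 for channel c; the function names and the linear reward scale are illustrative assumptions (the patent only states the sign and monotonicity of the reward):

```python
def encode_channel_choice(chosen, num_channels):
    """Size-(C+1) one-hot vector V_c: index 0 = no transmission,
    index c+1 = channel c was selected."""
    v = [0] * (num_channels + 1)
    v[0 if chosen is None else chosen + 1] = 1
    return v

def channel_occupancy(choices, num_channels):
    """Size-(C+1) count vector V_cf over all terminals' choices
    (index 0 counts terminals that chose not to send)."""
    v = [0] * (num_channels + 1)
    for chosen in choices:
        v[0 if chosen is None else chosen + 1] += 1
    return v

def reward(success, priority, num_priorities):
    """Negative on failure regardless of priority; on success a
    positive value that grows as priority rises (smaller p)."""
    if not success:
        return -1.0
    return float(num_priorities - priority + 1)
```

With C channels this yields the C+1-element vectors fed into the input layer, and a reward signal that pushes the policy toward serving high-priority terminals first.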
4. Train the neural network model with the collected data until the packet loss rate and end-to-end delay meet industrial control communication requirements, where the neural network training process comprises the following steps:
(1) Input the state vector s_n(t) = [c_n(t), ack_n(t), p_n(t), cf(t)] of industrial 5G terminal n in time slot t into the q-eval deep neural network;
(2) Select an action vector following the ε-greedy algorithm: set a probability ε; with probability ε select an action vector (i.e., a channel) at random, or with probability 1−ε select the action vector (i.e., channel) with the largest evaluation function, namely max_a Q(s_n(t), a);
(3) Compute the obtained reward r_n(t) and observation o_n(t) according to the action vector a_n(t);
(4) From the state vector s_n(t) and action vector a_n(t) of industrial 5G terminal n in time slot t, obtain the state vector s′_n(t+1) of industrial 5G terminal n in the next time slot t+1, and store <s_n(t), a_n(t), r_n(t), s′_n(t+1)> in the experience pool, with each time slot's <s_n(t), a_n(t), r_n(t), s′_n(t+1)> serving as one experience;
(5) Input s′_n(t+1) into the q-next deep neural network to obtain max_{a′_n} Q̂(s′_n(t+1), a′_n) and the target estimate Q_target = r_n(t) + γ·max_{a′_n} Q̂(s′_n(t+1), a′_n);
(6) Randomly draw M experiences from the experience pool by experience replay, compute Q_target for each experience, and update the q-eval deep neural network parameters w and b according to the mean-squared-error loss function

    L(θ(t)) = (1/M) Σ_{m=1}^{M} [Q_target,m − Q(s_m, a_m; θ(t))]²

and gradient descent

    θ(t+1) = θ(t) − η·∇_θ L(θ(t))

where η denotes the neural network learning rate and θ(t) denotes the neural network hyperparameters in time slot t;
(7) After every I iterations of the q-eval deep neural network, copy the q-eval deep neural network parameters w and b to the q-next deep neural network;
(8) Repeat (1)–(7) until the mean-squared-error loss function converges.
5. Use the collected data to train the neural network model until the packet loss rate and end-to-end delay satisfy the industrial control communication requirements, where the packet loss rate and end-to-end delay performance indicators include:
(1) λ_{n,c}(t) indicates whether channel c is allocated to industrial 5G terminal n at time slot t: λ_{n,c}(t)=0 means channel c is not allocated to terminal n at slot t, and λ_{n,c}(t)=1 means channel c is allocated to terminal n at slot t. A high-priority industrial 5G terminal has a higher probability of accessing a channel to transmit data, and a low-priority terminal a lower probability; that is, the higher the priority of terminal n, the higher the probability that λ_{n,c}(t)=1;
(2) Assume the channel capacity is sufficient to satisfy the transmission demand of the largest data packet of an industrial 5G terminal. When the number of industrial 5G terminals N is less than or equal to the number of channels C, all terminals can access a channel to transmit data, and the packet loss rate of terminal n is ρ_n(t)=0; when N is greater than C, the packet loss rate of terminal n is ρ_n(t)=(u_{n,c}(t)-v_{n,c}(t))/u_{n,c}(t), and the higher the priority p of terminal n, the higher the probability that ρ_n(t)=0. Here u_{n,c}(t) denotes the number of data packets that terminal n prepares to transmit on channel c at the beginning of slot t, and v_{n,c}(t) denotes the number of data packets that terminal n successfully transmits on channel c by the end of slot t;
(3) Assume the channel capacity is sufficient to satisfy the transmission demand of the largest terminal data packet. When N is less than or equal to C, all terminals can access a channel to transmit data and the system-wide packet loss rate ρ(t)=0; when N is greater than C, the system-wide packet loss rate is ρ(t)=(Σ_{n=1}^{N}u_n(t)-Σ_{n=1}^{N}v_n(t))/Σ_{n=1}^{N}u_n(t), where Σ_{n=1}^{N}v_n(t) denotes the number of data packets successfully transmitted by all N industrial 5G terminals in slot t, and Σ_{n=1}^{N}u_n(t) denotes the number of data packets waiting for transmission by all N industrial 5G terminals in slot t;
(4) The end-to-end delay is defined as d_n=d_n^{prop}+d_n^{trans}+d_n^{que}+d_hw, where d_n^{prop} is the propagation delay of terminal n, i.e., the delay experienced by the electromagnetic wave from the transmitting end to the receiving end; d_n^{trans} is the transmission delay of terminal n, i.e., the delay from when the first bit of a data packet is sent until its last bit is sent; d_n^{que} is the queueing delay of terminal n, i.e., the delay from a packet's arrival at the terminal until it leaves the terminal (the higher the priority p of terminal n, the smaller the queueing delay); and d_hw is the hardware delay, i.e., the delay caused by the hardware performance of the terminal.
(5) Check whether ρ_n(t), ρ(t), and d_n satisfy the performance requirements of the specific system model; if so, model training is complete; otherwise, continue training the model until the performance requirements are met.
6. Collect the state information of all industrial 5G terminals in the industrial 5G network at the current time slot as input to the neural network model to perform multi-priority channel allocation; the industrial 5G terminals then perform multi-access according to the channel allocation result, as follows:
(1) Collect the state vectors s(t)=[s_1(t),s_2(t),...,s_N(t)] of all N industrial 5G terminals in the industrial 5G network at the current time slot t as input to the trained neural network model, obtaining the output action vectors a(t)=[a_1(t),a_2(t),...,a_N(t)];
(2) According to the resulting output action vectors, the industrial base station centrally schedules the industrial 5G terminals to access the channels.
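The inference step above amounts to one greedy forward pass per terminal. The sketch below uses a linear stand-in for the trained q-eval network; all names and weights are illustrative assumptions, and conflict resolution between terminals that pick the same channel is left to the base station's scheduler.

```python
import numpy as np

def assign_channels(states, w, b):
    """For each terminal's state vector s, pick the channel whose
    Q-value (here the linear stand-in s @ w + b) is largest."""
    return [int(np.argmax(s @ w + b)) for s in states]

# Toy weights for 2-dimensional states and 2 channels.
w = np.eye(2)
b = np.zeros(2)
states = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(assign_channels(states, w, b))  # [0, 1]
```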

Claims (10)

  1. An industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning, characterized in that, for an industrial 5G network, channel allocation is realized by training a neural network model, comprising the following steps:
    1) establishing a dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning;
    2) collecting the state, action, and reward information of all industrial 5G terminals in the industrial 5G network over T time slots as training data and training the neural network model;
    3) collecting the state information of all industrial 5G terminals in the industrial 5G network at the current time slot as input to the neural network model, performing multi-priority channel allocation through the neural network model, and the industrial 5G terminals performing multi-access according to the channel allocation result.
  2. The industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning according to claim 1, characterized in that the industrial 5G network comprises: one industrial 5G base station, one edge computing server, and N industrial 5G terminals;
    the edge computing server is connected to the industrial 5G base station and is used to train the deep reinforcement learning neural network model;
    the industrial 5G base station downloads the trained neural network model from the edge computing server and is used to schedule the dynamic multi-priority multi-channel access of the industrial 5G terminals;
    the industrial 5G terminals are connected to the industrial 5G base station through the industrial 5G network and are used to generate industrial data with different transmission requirements.
  3. The industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning according to claim 1, characterized in that, for the industrial 5G network, an industrial 5G network model is established, comprising: determining the coverage of the industrial 5G network and, within that coverage, the number of industrial 5G terminals N, the number of industrial 5G terminal priority levels P, and the number of channels C.
  4. The industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning according to claim 1, characterized in that said establishing the dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning is as follows:
    constructing two neural network models with identical structures, a q-eval deep neural network and a q-next deep neural network, with neural network parameters params=[x_in, x_rnn, x_fc, x_out, w, b], where x_in denotes the number of input-layer neurons and equals the length of the state vector s_n of industrial 5G terminal n (n∈N), N denotes the number of industrial 5G terminals, x_rnn denotes the number of recurrent-layer neurons, x_fc denotes the number of fully-connected-layer neurons, x_out denotes the number of output-layer neurons and equals the length of the action vector a_n of terminal n, w denotes the weights, and b denotes the biases;
    wherein the q-eval deep neural network is used to obtain the value function Q(s_n,a_n) of the action vector a_n for the current state vector s_n of terminal n, and the q-next neural network model is used to select the maximum value function max_{a'_n}Q'(s'_n,a'_n) over the actions a'_n for the next state vector s'_n of terminal n;
    the q-eval deep neural network parameters w, b are updated using the reinforcement learning update Q(s_n,a_n) ← Q(s_n,a_n) + α[r_n + γ·max_{a'_n}Q'(s'_n,a'_n) - Q(s_n,a_n)], where α denotes the learning rate, γ denotes the discount factor, and r_n denotes the reward obtained by terminal n for executing the action vector a_n in the current state s_n;
    the q-next deep neural network has the same initialization parameters as the q-eval deep neural network; the q-eval network parameters w, b are updated after each training iteration of the neural network model, while the q-next network parameters w, b are updated once every I training iterations.
  5. The industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning according to claim 1, characterized in that the training data comprise:
    the state vector s_n(t)=[c_n(t),ack_n(t),p_n(t),cf(t)] of industrial 5G terminal n at time slot t (t∈T), where c_n(t) denotes the channel c (c∈C) selected by terminal n at the beginning of slot t, ack_n(t) denotes whether the data of terminal n were successfully transmitted by the end of slot t, p_n(t) denotes the priority p (p∈P) of terminal n at slot t, and cf(t) denotes the occupancy of all channels c at slot t;
    the action vector a_n(t)=[a_{n,c}(t)] of terminal n at slot t, where a_{n,c}(t) denotes the channel c (c∈C) allocated to terminal n at slot t;
    the reward vector r_n(t)=[r_n(t)] of terminal n at slot t, where r_n(t) denotes the reward obtained by terminal n at the end of slot t.
  6. The industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning according to claim 1, characterized in that said training the neural network model comprises the following steps:
    (1) inputting the state vector s_n(t)=[c_n(t),ack_n(t),p_n(t),cf(t)] of industrial 5G terminal n at time slot t into the q-eval deep neural network;
    (2) selecting an action vector through the q-eval deep neural network according to the ε-greedy algorithm: with probability ε, randomly selecting an action vector, i.e., a channel, or, with probability 1-ε, selecting the action vector (channel) that maximizes the value function, i.e., a_n(t)=argmax_a Q(s_n(t),a);
    (3) obtaining the reward r_n(t) and the observation o_n(t) according to the action vector a_n(t);
    (4) from the state vector s_n(t) and the action vector a_n(t) of terminal n at slot t, obtaining the terminal's state vector s'_n(t+1) at the next slot t+1, and storing <s_n(t),a_n(t),r_n(t),s'_n(t+1)> in the experience pool, the tuple <s_n(t),a_n(t),r_n(t),s'_n(t+1)> of each slot constituting one experience;
    (5) inputting s'_n(t+1) into the q-next deep neural network to obtain max_{a'}Q'(s'_n(t+1),a') and thus the target estimate;
    (6) randomly sampling M experiences from the experience pool by experience replay and computing Q_target=r_n(t)+γ·max_{a'}Q'(s'_n(t+1),a') for each experience, then updating the q-eval deep neural network parameters w, b according to the mean-squared-error loss function L(θ(t))=E[(Q_target-Q(s_n(t),a_n(t);θ(t)))^2] and gradient descent θ(t+1)=θ(t)-η·∇L(θ(t)), where η denotes the neural network learning rate and θ(t) denotes the neural network hyperparameters at slot t;
    (7) after every I training iterations of the q-eval deep neural network, copying the q-eval network parameters w, b to the q-next deep neural network;
    (8) repeating (1)-(7) until the mean-squared-error loss function converges; the q-eval deep neural network obtained at this point serves as the trained neural network model.
  7. The industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning according to claim 1, characterized in that, after the trained neural network model is obtained, the neural network model is optimized:
    collecting the state information of historical time slots of all industrial 5G terminals in the industrial 5G network and obtaining multi-priority channel allocation results through the neural network model; when the network performance of the allocation results meets the requirements, i.e., the packet loss rate, the system-wide packet loss rate, and the end-to-end delay are all smaller than the corresponding network performance indicators, the neural network model serves as the final trained neural network model for performing the final multi-priority channel allocation;
    otherwise, steps 1)-2) of claim 1 are repeated until the neural network model meets the requirements.
  8. The industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning according to claim 7, characterized in that the network performance indicators comprise:
    the packet loss rate ρ_n(t)=(u_{n,c}(t)-v_{n,c}(t))/u_{n,c}(t), where λ_{n,c}(t) indicates whether channel c is allocated to industrial 5G terminal n at time slot t, and λ_{n,c}(t) is related to the priority p of terminal n; u_{n,c}(t) denotes the number of data packets that terminal n prepares to transmit on channel c at the beginning of slot t, and v_{n,c}(t) denotes the number of data packets that terminal n successfully transmits on channel c by the end of slot t;
    the system-wide packet loss rate ρ(t)=(Σ_{n=1}^{N}u_n(t)-Σ_{n=1}^{N}v_n(t))/Σ_{n=1}^{N}u_n(t), where Σ_{n=1}^{N}v_n(t) denotes the number of data packets successfully transmitted by all N industrial 5G terminals in slot t, and Σ_{n=1}^{N}u_n(t) denotes the number of data packets waiting for transmission by all N industrial 5G terminals in slot t;
    the end-to-end delay, defined as d_n=d_n^{prop}+d_n^{trans}+d_n^{que}+d_hw, where d_n^{prop} is defined as the propagation delay of terminal n, i.e., the delay experienced by the electromagnetic wave from the transmitting end of one industrial 5G terminal to the receiving end of another; d_n^{trans} is defined as the transmission delay of terminal n, i.e., the delay from when the first bit of a data packet is sent until its last bit is sent; d_n^{que} is defined as the queueing delay of terminal n, i.e., the delay from a packet's arrival at the terminal until it leaves the terminal; and d_hw is defined as the hardware delay, i.e., the delay caused by the hardware performance of the terminal.
  9. The industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning according to claim 1, characterized in that said collecting the state information of all industrial 5G terminals in the current industrial 5G network as input to the neural network model and performing multi-priority channel allocation through the neural network model comprises the following steps:
    collecting the state vectors s(t)=[s_1(t),s_2(t),...,s_N(t)] of all N industrial 5G terminals in the industrial 5G network at the current time slot t as input to the trained neural network model, obtaining the output action vectors a(t)=[a_1(t),a_2(t),...,a_N(t)];
    according to the resulting output action vectors, the industrial base station schedules the industrial 5G terminals to access the channels.
  10. An industrial 5G dynamic multi-priority multi-access system based on deep reinforcement learning, characterized by comprising:
    an edge computing server, used to establish and train a dynamic multi-priority multi-channel access neural network model based on deep reinforcement learning;
    industrial 5G terminals, used to generate industrial data with different transmission requirements and to collect terminal state, action, and reward information;
    an industrial 5G base station, used to download the trained neural network model, take the state information of the industrial 5G terminals as input to the neural network model, and perform multi-priority channel allocation through the neural network model.
PCT/CN2020/139322 2020-05-09 2020-12-25 基于深度强化学习的工业5g动态多优先级多接入方法 WO2021227508A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/296,509 US20220217792A1 (en) 2020-05-09 2020-12-25 Industrial 5g dynamic multi-priority multi-access method based on deep reinforcement learning

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010385640.4A CN111628855B (zh) 2020-05-09 2020-05-09 基于深度强化学习的工业5g动态多优先级多接入方法
CN202010385640.4 2020-05-09

Publications (1)

Publication Number Publication Date
WO2021227508A1 true WO2021227508A1 (zh) 2021-11-18

Family

ID=72272702

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/139322 WO2021227508A1 (zh) 2020-05-09 2020-12-25 基于深度强化学习的工业5g动态多优先级多接入方法

Country Status (3)

Country Link
US (1) US20220217792A1 (zh)
CN (1) CN111628855B (zh)
WO (1) WO2021227508A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116341685A (zh) * 2023-05-31 2023-06-27 合肥工业大学智能制造技术研究院 基于联合注意力的分布式计算卸载模型训练方法和系统

Families Citing this family (9)

Publication number Priority date Publication date Assignee Title
US20200257985A1 (en) * 2019-02-08 2020-08-13 DeepSig Inc. Adversarially generated communications
CN111628855B (zh) * 2020-05-09 2021-06-15 Shenyang Institute of Automation, Chinese Academy of Sciences Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning
CN112188503B (zh) * 2020-09-30 2021-06-22 南京爱而赢科技有限公司 Deep-reinforcement-learning-based dynamic multi-channel access method for cellular networks
US20220007382A1 (en) * 2020-10-07 2022-01-06 Intel Corporation Model-assisted deep reinforcement learning based scheduling in wireless networks
CN113543156B (zh) * 2021-06-24 2022-05-06 Shenyang Institute of Automation, Chinese Academy of Sciences Industrial wireless network resource allocation method based on multi-agent deep reinforcement learning
CN113613339B (zh) * 2021-07-10 2023-10-17 Northwest A&F University Channel access method for multi-priority wireless terminals based on deep reinforcement learning
CN114599117B (zh) * 2022-03-07 2023-01-10 Innovation Academy for Microsatellites, Chinese Academy of Sciences Dynamic configuration method for backoff resources in random access of low-Earth-orbit satellite networks
CN115315020A (zh) * 2022-08-08 2022-11-08 Chongqing University of Posts and Telecommunications Intelligent CSMA/CA backoff method for the IEEE 802.15.4 protocol based on differentiated services
CN116233895B (zh) * 2023-05-04 2023-07-18 Hefei University of Technology Reinforcement-learning-based 5G distribution network node communication optimization method, device, and medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN110557769A (zh) * 2019-09-12 2019-12-10 Nanjing University of Posts and Telecommunications C-RAN computation offloading and resource allocation method based on deep reinforcement learning
CN110691422A (zh) * 2019-10-06 2020-01-14 Hubei University of Technology Multi-channel intelligent access method based on deep reinforcement learning
WO2020032594A1 (ko) * 2018-08-07 2020-02-13 LG Electronics Inc. Method for operating a node in a wireless communication system and device using the method
CN110856268A (zh) * 2019-10-30 2020-02-28 Xi'an Jiaotong University Dynamic multi-channel access method for wireless networks
CN111628855A (zh) * 2020-05-09 2020-09-04 Shenyang Institute of Automation, Chinese Academy of Sciences Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
WO2014164856A1 (en) * 2013-03-11 2014-10-09 Entropic Communications, Inc. Synchronized multi-channel access system
CN110035478A (zh) * 2019-04-18 2019-07-19 Beijing University of Posts and Telecommunications Dynamic multi-channel access method for high-speed mobility scenarios
KR102201858B1 (ko) * 2019-08-26 2021-01-12 LG Electronics Inc. Artificial-intelligence-based video editing method and intelligent device

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN116341685A (zh) * 2023-05-31 2023-06-27 合肥工业大学智能制造技术研究院 基于联合注意力的分布式计算卸载模型训练方法和系统
CN116341685B (zh) * 2023-05-31 2023-07-21 合肥工业大学智能制造技术研究院 基于联合注意力的分布式计算卸载模型训练方法和系统

Also Published As

Publication number Publication date
CN111628855B (zh) 2021-06-15
US20220217792A1 (en) 2022-07-07
CN111628855A (zh) 2020-09-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20935543

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20935543

Country of ref document: EP

Kind code of ref document: A1