CN113055229B - Wireless network self-selection protocol method based on DDQN


Info

Publication number
CN113055229B
CN113055229B (application number CN202110249773.3A)
Authority
CN
China
Prior art keywords
network
state
action
reward
value
Prior art date
Legal status
Active
Application number
CN202110249773.3A
Other languages
Chinese (zh)
Other versions
CN113055229A (en)
Inventor
严海蓉
王重阳
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202110249773.3A
Publication of CN113055229A
Application granted
Publication of CN113055229B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
        • H04: ELECTRIC COMMUNICATION TECHNIQUE
            • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
                    • H04L 41/14: Network analysis or design
                        • H04L 41/145: Network analysis or design involving simulating, designing, planning or modelling of a network
                • H04L 43/00: Arrangements for monitoring or testing data switching networks
                    • H04L 43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
            • H04W: WIRELESS COMMUNICATION NETWORKS
                • H04W 24/00: Supervisory, monitoring or testing arrangements
                    • H04W 24/02: Arrangements for optimising operational condition
                • H04W 36/00: Hand-off or reselection arrangements
                    • H04W 36/0005: Control or signalling for completing the hand-off
                        • H04W 36/0055: Transmission or use of information for re-establishing the radio link
                        • H04W 36/0083: Determination of parameters used for hand-off, e.g. generation or modification of neighbour cell lists
                            • H04W 36/00837: Determination of triggering parameters for hand-off
                                • H04W 36/008375: Determination of triggering parameters for hand-off based on historical data
                    • H04W 36/24: Reselection being triggered by specific parameters
                        • H04W 36/30: Reselection being triggered by specific parameters by measured or perceived connection quality data
    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00: Computing arrangements based on biological models
                    • G06N 3/02: Neural networks
                        • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Environmental & Geological Engineering (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a DDQN-based wireless network protocol self-selection method addressing the complexity of the current wireless network environment and the coexistence of many protocols. The method comprises the following steps: 1) acquiring the current network environment quality parameters and determining the node service type in real time through an environment agent module; 2) denoising and normalizing the data from step 1), determining the node service type through the analytic hierarchy process, and performing feature extraction; 3) feeding the data from step 2) into a DDQN decision network for real-time training and applying the execution result so that the network state tends toward stability. The method extracts features from the data directly, without preprocessing, uses the collected historical data as training data, and exploits the strengths of deep learning to effectively improve the learning speed and decision performance of the reinforcement learning algorithm.

Description

A DDQN-based wireless network protocol self-selection method

Technical field

The invention relates to a method for the self-selection of network protocols in heterogeneous wireless networks, addressing the complexity of the current wireless network environment and the convergence of many protocols.

Background

With the continuous development of network technology, the network technologies in widespread use today overlap to a large extent. In the current environment, WLAN and cellular networks are the most common heterogeneous network combination and play an important role in modern communications; operators have also deployed their own WLAN hotspots in user-dense areas such as shopping malls, schools and office buildings to offload the pressure on cellular networks.

The next-generation heterogeneous network integrates multiple protocols in a complex environment and must provide reliable network service to users at any time and in any place. Before this can be realized, the network environment needs to mature: wireless coverage, network self-configuration and automatic management of network equipment must all be solved. In the existing network environment it is still difficult for a single network protocol to provide all of the above, but algorithms can be used to schedule the resources of the current heterogeneous networks in an integrated way, and efficient switching among heterogeneous network resources will gradually become a research hotspot. As wireless communication develops further, new requirements will be placed on the scalability and flexibility of heterogeneous networks.

Reinforcement learning is a tool that can make decisions meeting the requirements of the environment under uncertainty, and it can adapt to dynamic changes in the network, allowing a heterogeneous wireless network to become a scheme that automatically adapts to changing user scenarios and optimizes the network environment. Reinforcement learning is a branch of machine learning in which an agent continuously adjusts within an environment so as to maximize a specific quantity (the reward). In wireless networks, node mobility and mutual interference between nodes make the environment complex. Compared with traditional machine learning algorithms, deep reinforcement learning has greater potential and higher accuracy: features are extracted directly from the data without preprocessing, the collected historical data are used as training data, and the strengths of deep learning are exploited to effectively improve the learning speed and decision performance of the reinforcement learning algorithm.

Summary of the invention

In view of the characteristics of existing networks described above, the invention provides a wireless network protocol self-selection method based on DDQN (Deep Reinforcement Learning with Double Q-learning). It comprises a scheme for processing network quality data, a deep-learning-based feature extraction scheme, and a DDQN-based network protocol selection scheme. The object of the invention is achieved through the following technical solution.

A DDQN-based wireless network protocol self-selection method, comprising the following steps:

1) obtaining the current network environment quality parameters and determining the node service type in real time through the environment agent module;

2) on the basis of 1), denoising and normalizing the data, determining the node service type through the analytic hierarchy process, and performing feature extraction;

3) on the basis of 2), feeding the data into the DDQN decision network for real-time training and applying the execution result so that the network state tends toward stability.

1. A DDQN-based wireless network protocol self-selection method, characterized by comprising the following steps:

Step 1: obtain the current network environment quality parameters and the node service type in real time through the environment agent module, and determine the state, action and reward value.

State space definition: the state space S of a terminal at time t is defined such that s_mn ∈ S, where s_mn denotes the state of terminal m when it has accessed the nth network and is exchanging information in that network; the state space is:

S = s_1, s_2, …, s_mn    (1)

State definition: the average throughput T, delay D, signal strength P and node distance W are used to describe the network state, so the network quality Φ is expressed as:

Φ = T × D × P × W    (2)

Action space definition: an action space must be defined for the agent to choose from; it is defined as:

A = a_1, a_2, …, a_n    (3)

where a_n indicates that a node uses the nth network protocol.

The access-service network parameters consist of QoS parameters; a judgment matrix (formula (4), a pairwise comparison matrix with elements m_ij) is established for the network QoS, and the parameter weights are solved for.

Each element m_ij of the judgment matrix expresses the relative importance of the QoS parameters, as defined in Table 1; the judgment matrix must satisfy m_ij > 0, m_ji = 1/m_ij and m_ii = 1.

The values 2, 4, 6 and 8, which do not appear in the table, represent intermediate judgments between adjacent levels. Because the service types are divided into 4 classes when the reward value is defined, and the three attributes throughput, delay and signal strength are considered, the judgment matrix is defined as a 3×3 matrix, i.e. M_i ∈ R^(3×3), where i = 1, 2, 3, 4 denotes the four service types class 1, class 2, class 3 and class 4; a judgment matrix is then established for each of the four services according to the QoS parameter requirements of the different services.

According to RFC 2474, the current standard for classifying network service types, the attribute values of the service class are determined through the DSCP. The DSCP encodes a value in the 6 used bits and 2 unused bits of the type-of-service (TOS) byte in the IP header of every packet to determine the IP precedence. The IP precedence field can be used for traffic classification; a larger value means a higher priority, the value ranges from 0 to 63 and can match 64 levels, and the levels are grouped by size, every seven levels forming one class, so the relationship between service attributes and parameters can be determined from the DSCP field of the transmitted IP packets.

For these four classes of service, i takes the values 1, 2, 3 and 4 in turn. The eigenvector corresponding to the largest eigenvalue is normalized; each value of the normalized eigenvector is the weight of the corresponding network QoS parameter. In the above four cases, different service types place different requirements on the network parameters, and these differences later influence the division of the reward-value weights. The whole network is treated as a single entity, and the final goal is to optimize the overall network quality by selecting the protocol each node uses; the reward value is therefore a function strongly correlated with the network.

V_t = v_1, v_2, …, v_n    (5)

V_t denotes the state information of the network at time t and is a subset of the network state space Φ. Therefore, for a specific service B and network state V_t, the reward function R is expressed as follows and is solved in the next step:

R = f_B(V_t)    (6)

The access of a node affects the network parameters. After an action is executed, the network state must be measured and the corresponding reward fed back. An action that increases network throughput, reduces delay and strengthens the signal is an effective action; conversely, an action that reduces throughput, increases delay and weakens the signal is an invalid action. Therefore the average throughput α_avg, the average delay β_avg and the signal strength γ are considered when calculating the reward.

Step 2: normalize the data obtained in step 1, determine the node service type, and determine the reward function.

Min-max normalization is used to eliminate the influence of the differing units of the data:

x′ = (x − x_min) / (x_max − x_min)    (7)

Normalizing with the above equation yields the normalized average network throughput f_t(α)_avg, average delay f_t(β)_avg and signal strength f_t(γ).

Combining the above formulas gives the reward function:

R = ω_1·f_t(α)_avg + ω_2·f_t(β)_avg + ω_3·f_t(γ)    (8)

where ω_1, ω_2 and ω_3 are the weights of the average network throughput, delay and signal strength given by the normalized eigenvector of the judgment matrix.

第三步:在2)的基础上,把数据输入到DDQN决策网络中实时训练,应用执行结果,使网络状态趋于稳定;Step 3: Based on 2), input the data into the DDQN decision-making network for real-time training, and apply the execution results to stabilize the network state;

首先初始化状态S、动作空间A,初始化Q矩阵为零矩阵,用随机的参数θ初始化Q-MainNet网络和Q-target网络,θ为网络参数,初始化时Q-MainNetθ随机设定,Q-targetθ-=0,t表示当前时间状态,智能体模块读取当前网络状态信息St,将其输入到Q-MainNet网络,在St状态下不同动作的Q值通过Q-MainNet网络输出;根据ε-greedy策略,Q-MainNet网络以概率ε随机选择一个动作at∈A,或者以概率1-ε选择动作 终端在异构无线网络中执行相应动作,经过获取网络数据数据处理,处理成算法需要使用的格式,交给控制层进行处理;从而获得吞吐量α,时延β,信号强度γ;然后将他们分别归一化;根据业务的类型,通过层次分析法得到ft(α)avg、ft(β)avg、ft(γ)的权重,之后加权求和得到奖励值R;Q-MainNet获取系统状态和奖励值,通过公式(9)First initialize the state S and action space A, initialize the Q matrix to a zero matrix, initialize the Q-MainNet network and Q-target network with random parameters θ, θ is the network parameter, Q-MainNet θ is randomly set during initialization, Q-targetθ - =0, t represents the current time state. The agent module reads the current network state information S t and inputs it into the Q-MainNet network. The Q values of different actions in the S t state are output through the Q-MainNet network; according to ε- Greedy strategy, the Q-MainNet network randomly selects an action a t ∈ A with probability ε, or selects an action with probability 1-ε The terminal performs corresponding actions in the heterogeneous wireless network. After obtaining the network data, the data is processed into the format required by the algorithm and handed over to the control layer for processing; thereby obtaining the throughput α, delay β, and signal strength γ; and then they are Normalize separately; according to the type of business, obtain the weights of f t (α) avg , f t (β) avg , and f t (γ) through the analytic hierarchy process, and then weight the sum to obtain the reward value R; Q-MainNet obtains System status and reward value, through formula (9)

进行奖励值计算,其中Rt+1是对应在St+1状态下计算得到的奖励,γ为衰减系数,智能体在当前的状态下的奖励值其实就是未来所有可能的奖励值转换成此时此刻的奖励值;动作执行完毕,系统进入下一个状态St+1Calculate the reward value, where R t+1 is the reward calculated corresponding to the state S t+1 , and γ is the attenuation coefficient. The reward value of the agent in the current state is actually the conversion of all possible reward values in the future into this The reward value at this moment; after the action is executed, the system enters the next state S t+1 ;

Q-MainNet网络将记忆组(st,at,rt,st+1)即当前状态st,动作空间at当前奖励值rt,以及t+1网络状态存储到经验池中,在每一步,Q-target网络随机从经验池中采样,与Q-MainNet网络的输出一起计算损失值相对于参数θ在两个网络Q的差值,即(TargetQ-Q(St+1,a;θt))2上执行梯度下降算法;每一轮迭代后,将Q-MainNet网络的参数复制给Q-target网络;不断循环进行训练。The Q-MainNet network stores the memory group (s t , a t , r t , s t+1 ), that is, the current state s t , the current reward value r t of the action space a t, and the t+1 network state into the experience pool. At each step, the Q-target network randomly samples from the experience pool, and together with the output of the Q-MainNet network, calculates the difference in the loss value relative to the parameter θ in the two networks Q, that is, (TargetQ-Q(S t+1 , a; θ t )) 2 to execute the gradient descent algorithm; after each round of iterations, copy the parameters of the Q-MainNet network to the Q-target network; continue to cycle for training.

Brief description of the drawings

Figure 1 is the overall flow chart of the DDQN-based wireless network protocol self-selection method;

Figure 2 is the operation diagram of the DDQN algorithm.

Detailed description of the embodiments

The specific steps of the DDQN-based wireless network protocol self-selection method implemented according to the invention are described below with reference to Figure 1.

Step 1: obtain the current network environment quality parameters and the node service type in real time through the environment agent module, and determine the state, action and reward value.

To use a reinforcement learning algorithm, the states, actions and reward values must be defined; the network quality parameters are used as the state input.

State space definition: the state space S of a terminal at time t is defined such that s_mn ∈ S, where s_mn denotes the state of terminal m when it has accessed the nth network and is exchanging information in that network. The state space is:

S = s_1, s_2, …, s_mn    (1)

State definition: in heterogeneous networks, network indicators are usually described by throughput, delay, packet-loss rate, network load and so on for the service state, and by signal strength, node distance, node power consumption, cost and signal-to-noise ratio for the user characteristics. Here the average throughput T, delay D, signal strength P and node distance W are used to describe the network state, so the network quality Φ can be expressed as:

Φ = T × D × P × W    (2)

Action space definition: an action space must be defined for the agent to choose from; it is defined as:

A = a_1, a_2, …, a_n    (3)

where a_n indicates that a node uses the nth network protocol.
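For illustration, the sketch below shows one way the state and action space defined above could be represented in code; the protocol names, numeric values and the class name NetworkState are assumptions made for this example, not part of the patent.

    # Illustrative representation of the state (formula (2)) and action space (formula (3)).
    # Protocol names and numeric values are assumptions.
    from dataclasses import dataclass

    PROTOCOLS = ["wlan", "lte", "nr", "zigbee"]   # action space A: a_n means "use protocol n"

    @dataclass
    class NetworkState:
        throughput: float   # average throughput T
        delay: float        # delay D
        signal: float       # signal strength P
        distance: float     # node distance W

        def quality(self) -> float:
            # Network quality Phi = T x D x P x W, formula (2)
            return self.throughput * self.delay * self.signal * self.distance

    s_mn = NetworkState(throughput=20.0, delay=0.05, signal=0.7, distance=12.0)  # one state s_mn
    a_t = PROTOCOLS.index("wlan")                                                # one action a_t in A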

Reward value definition: each node has the characteristics of its own specific service when it is created and therefore has its own service type; even in the same network environment, the corresponding reward values differ. Based on practical requirements, the node service types are divided into the following classes:

1. High real-time requirements: the delay must be as low as possible and the transmission rate must be high, since excessive delay affects the service; a certain throughput is also required to guarantee data reliability.

2. Extremely high throughput requirements: compared with class 1 the real-time requirement is weak, but a large data volume is needed.

3. High delay requirements: the network must cope with bursts of traffic, minimizing delay to improve the user experience.

4. Only sufficient throughput needs to be guaranteed.

The access-service network parameters consist of QoS parameters; a judgment matrix (formula (4), a pairwise comparison matrix with elements m_ij) is established for the network QoS, and the parameter weights are solved for.

Each element m_ij of the judgment matrix expresses the relative importance of the QoS parameters, as defined in Table 1; the judgment matrix must satisfy m_ij > 0, m_ji = 1/m_ij and m_ii = 1.

Table 1 Relationship between attributes and parameters:
m_ij = 1: parameters i and j are equally important;
m_ij = 3: parameter i is slightly more important than j;
m_ij = 5: parameter i is more important than j;
m_ij = 7: parameter i is much more important than j;
m_ij = 9: parameter i is extremely more important than j.

The values 2, 4, 6 and 8, which do not appear in Table 1, represent intermediate judgments between adjacent levels. Because the service types are divided into 4 classes when the reward value is defined, and the three attributes throughput, delay and signal strength are considered, the judgment matrix is defined as a 3×3 matrix, i.e. M_i ∈ R^(3×3), where i = 1, 2, 3, 4 denotes the four service types class 1, class 2, class 3 and class 4; a judgment matrix is then established for each of the four services according to the QoS parameter requirements of the different services.

According to RFC 2474, the current standard for classifying network service types, the attribute values of the service class are determined through the DSCP (Differentiated Services Code Point). The DSCP encodes a value in the 6 used bits and 2 unused bits of the type-of-service (TOS) byte in the IP header of every packet to determine the IP precedence. The IP precedence field can be used for traffic classification; a larger value means a higher priority, the value ranges from 0 to 63 and can match 64 levels, and the levels are grouped by size, every seven levels forming one class, so the relationship between service attributes and parameters can be determined from the DSCP field of the transmitted IP packets.
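As a rough sketch of the DSCP-based classification just described, the code below extracts the 6-bit DSCP from the TOS byte of an IPv4 header and maps it to one of the four service classes; the particular grouping of DSCP levels into the four classes is an assumption for illustration, since the text only states that the 64 levels are grouped by size.

    # Illustrative DSCP extraction and service-class mapping (the grouping is assumed).
    def dscp_from_tos(tos_byte: int) -> int:
        """DSCP occupies the upper 6 bits of the IPv4 TOS / traffic-class byte (RFC 2474)."""
        return (tos_byte >> 2) & 0x3F              # value in 0..63

    def service_class(dscp: int) -> int:
        """Map a DSCP level (0..63) to one of the four service classes of the patent.
        The thresholds below are hypothetical."""
        if dscp >= 40:
            return 1    # class 1: delay-critical real-time traffic
        if dscp >= 24:
            return 3    # class 3: latency-sensitive bursty traffic
        if dscp >= 8:
            return 2    # class 2: throughput-heavy traffic
        return 4        # class 4: best-effort, throughput only

    print(service_class(dscp_from_tos(0xB8)))      # TOS 0xB8 -> DSCP 46 -> class 1 here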

For these four classes of service, i takes the values 1, 2, 3 and 4 in turn. The eigenvector corresponding to the largest eigenvalue is normalized; each value of the normalized eigenvector is the weight of the corresponding network QoS parameter. In the above four cases, different service types place different requirements on the network parameters, and these differences later influence the division of the reward-value weights. The whole network is treated as a single entity, and the final goal is to optimize the overall network quality by selecting the protocol each node uses; the reward value is therefore a function strongly correlated with the network.
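The following numpy sketch shows how the QoS parameter weights ω_1, ω_2, ω_3 could be obtained from one 3×3 judgment matrix by normalizing the eigenvector of its largest eigenvalue, as described above; the matrix entries, which model a delay-critical (class 1) service, are assumed values for illustration only.

    # Analytic hierarchy process: weights from the principal eigenvector of a judgment matrix.
    # Rows/columns: throughput, delay, signal strength.  Entries are illustrative assumptions.
    import numpy as np

    M1 = np.array([[1.0, 1/5, 3.0],     # throughput vs (throughput, delay, signal)
                   [5.0, 1.0, 7.0],     # delay is judged clearly more important here
                   [1/3, 1/7, 1.0]])    # reciprocal entries: m_ji = 1 / m_ij

    eigvals, eigvecs = np.linalg.eig(M1)
    k = int(np.argmax(eigvals.real))            # index of the largest eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                             # normalized eigenvector -> (omega_1, omega_2, omega_3)

    # Consistency check customary in AHP (not required by the text): CR < 0.1 is acceptable.
    CI = (eigvals.real[k] - 3) / (3 - 1)
    CR = CI / 0.58                              # 0.58 is Saaty's random index for n = 3
    print("weights:", w.round(3), "consistency ratio:", round(CR, 3))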

V_t = v_1, v_2, …, v_n    (5)

V_t denotes the state information of the network at time t and is a subset of the network state space Φ. Therefore, for a specific service B and network state V_t, the reward function R is expressed as follows and will be solved in the next step:

R = f_B(V_t)    (6)

The access of a node affects the network parameters. After an action is executed, the network state must be measured and the corresponding reward fed back. An action that increases network throughput, reduces delay and strengthens the signal is an effective action; conversely, an action that reduces throughput, increases delay and weakens the signal is an invalid action. Therefore the average throughput α_avg, the average delay β_avg and the signal strength γ are considered when calculating the reward.

Step 2: normalize the data obtained in step 1, determine the node service type, and determine the reward function.

The units and magnitudes of different network parameters usually differ considerably, so the data must be normalized: all values are transformed linearly and mapped into [0, 1].

Min-max normalization is used to eliminate the influence of the differing units of the data:

x′ = (x − x_min) / (x_max − x_min)    (7)

where x′ is the normalized value of x. Normalizing with the above equation yields the normalized average network throughput f_t(α)_avg, average delay f_t(β)_avg and signal strength f_t(γ).

Combining the above formulas gives the reward function:

R = ω_1·f_t(α)_avg + ω_2·f_t(β)_avg + ω_3·f_t(γ)    (8)

where ω_1, ω_2 and ω_3 are the weights of the average network throughput, delay and signal strength given by the normalized eigenvector of the judgment matrix.
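A minimal sketch of the normalization and the weighted reward of formula (8) follows; the reference minima and maxima are assumptions (in practice they would come from the observed history of each metric), and inverting the delay term so that lower delay scores higher is an interpretation, since formula (8) leaves the direction of each term implicit.

    # Min-max normalization (formula (7)) and the weighted reward R (formula (8)).
    # Reference ranges and the delay inversion are illustrative assumptions.
    def min_max(x: float, x_min: float, x_max: float) -> float:
        return (x - x_min) / (x_max - x_min)            # maps x into [0, 1]

    def reward(alpha_avg: float, beta_avg: float, gamma: float, weights) -> float:
        f_alpha = min_max(alpha_avg, 0.0, 100.0)        # throughput in Mbit/s, assumed range
        f_beta = 1.0 - min_max(beta_avg, 0.0, 200.0)    # delay in ms, assumed range, inverted
        f_gamma = min_max(gamma, -100.0, -30.0)         # signal strength in dBm, assumed range
        w1, w2, w3 = weights
        return w1 * f_alpha + w2 * f_beta + w3 * f_gamma

    # Example: class-1 weights from the AHP step, heavily favouring delay.
    R = reward(alpha_avg=35.0, beta_avg=40.0, gamma=-65.0, weights=(0.25, 0.60, 0.15))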

Step 3: on the basis of step 2, feed the data into the DDQN decision network for real-time training and apply the execution result so that the network state tends toward stability.

One of the biggest shortcomings of DQN is that, although the argmax() operation lets the Q value approach the target quickly, it is likely to cause overestimation, meaning that the learned model has a large bias. To solve this problem, the error can be eliminated by separating the calculation of the target Q value from the selection of the target action. The network information is discrete, and DDQN handles data in discrete states well.

As shown in Figure 2, DQN is implemented with two neural networks, Q-MainNet and Q-target. DDQN likewise uses two networks for its computation; only the way the target Q value is calculated differs.

First initialize the state S and the action space A, initialize the Q matrix as a zero matrix, and initialize the Q-MainNet and Q-target networks with random parameters θ, where θ denotes the network parameters; at initialization the Q-MainNet parameters θ are set randomly and the Q-target parameters θ⁻ = 0. t denotes the current time step. The agent module reads the current network state information S_t and feeds it into the Q-MainNet network, which outputs the Q values of the different actions in state S_t. Following the ε-greedy policy, the Q-MainNet network selects a random action a_t ∈ A with probability ε, or with probability 1−ε selects the action with the largest Q value. The terminal executes the corresponding action in the heterogeneous wireless network; the collected network data are processed into the format required by the algorithm and handed to the control layer, yielding the throughput α, delay β and signal strength γ, which are then normalized separately. According to the service type, the weights of f_t(α)_avg, f_t(β)_avg and f_t(γ) are obtained through the analytic hierarchy process, and the weighted sum gives the reward value R. Q-MainNet obtains the system state and the reward value and computes the reward through formula (9), where R_t+1 is the reward calculated in state S_t+1 and γ is the decay coefficient; the reward value of the agent in the current state is in fact all possible future reward values converted into a reward value at the present moment. When the action has been executed, the system enters the next state S_t+1.
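Formula (9) itself is not reproduced in this text. In standard DDQN (Double Q-learning), the target that matches the surrounding description, with Q-MainNet selecting the action and Q-target evaluating it, would be:

TargetQ = R_t+1 + γ · Q-target(S_t+1, argmax_a Q-MainNet(S_t+1, a; θ); θ⁻)

Plain DQN would instead use γ·max_a Q-target(S_t+1, a; θ⁻), which couples action selection and evaluation in the same network and causes the overestimation that the two-network split is meant to avoid.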

The Q-MainNet network stores the memory tuple (s_t, a_t, r_t, s_t+1), i.e. the current state s_t, the action a_t, the current reward value r_t and the network state at t+1, in the experience pool. At each step, the Q-target network samples randomly from the experience pool and, together with the output of the Q-MainNet network, computes the loss, i.e. the squared difference between the Q values of the two networks with respect to the parameters θ, (TargetQ − Q(S_t+1, a; θ_t))², on which gradient descent is performed. Every G steps, the parameters of the Q-MainNet network are copied to the Q-target network. Training continues in this loop.
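To make the training loop above concrete, here is a compact PyTorch sketch of the described procedure (two networks, ε-greedy selection, experience replay, parameter copy every G steps). The layer sizes, hyperparameter values and the env interface are assumptions, and the loss is taken at (s_t, a_t) as in standard DDQN.

    # Compact DDQN sketch of the loop described above; names and values are illustrative.
    import random
    from collections import deque
    import torch
    import torch.nn as nn

    N_STATE, N_ACTION = 4, 4          # (throughput, delay, signal, distance) -> protocol choice
    GAMMA, EPS, G_STEPS = 0.9, 0.1, 50

    def make_net() -> nn.Module:
        return nn.Sequential(nn.Linear(N_STATE, 64), nn.ReLU(), nn.Linear(64, N_ACTION))

    q_main, q_target = make_net(), make_net()
    q_target.load_state_dict(q_main.state_dict())
    optimizer = torch.optim.Adam(q_main.parameters(), lr=1e-3)
    replay = deque(maxlen=10_000)     # experience pool of (s_t, a_t, r_t, s_t+1) tuples

    def select_action(state) -> int:
        # epsilon-greedy: random action with probability EPS, else argmax_a Q-MainNet(s, a)
        if random.random() < EPS:
            return random.randrange(N_ACTION)
        with torch.no_grad():
            return int(q_main(torch.tensor(state, dtype=torch.float32)).argmax())

    def train_step(batch_size: int = 32) -> None:
        if len(replay) < batch_size:
            return
        batch = random.sample(replay, batch_size)
        s, a, r, s_next = (torch.tensor(x, dtype=torch.float32) for x in zip(*batch))
        q_sa = q_main(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # Double Q-learning: Q-MainNet chooses the action, Q-target evaluates it.
            a_star = q_main(s_next).argmax(dim=1, keepdim=True)
            target_q = r + GAMMA * q_target(s_next).gather(1, a_star).squeeze(1)
        loss = nn.functional.mse_loss(q_sa, target_q)   # (TargetQ - Q)^2, gradient descent on theta
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Interaction loop; env.observe() / env.step() stand for the environment agent module.
    # for step in range(1, 100_001):
    #     state = env.observe()                       # normalized (T, D, P, W)
    #     action = select_action(state)
    #     reward_val, next_state = env.step(action)   # apply protocol, measure network, compute R
    #     replay.append((state, action, reward_val, next_state))
    #     train_step()
    #     if step % G_STEPS == 0:                     # every G steps copy Q-MainNet -> Q-target
    #         q_target.load_state_dict(q_main.state_dict())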

Claims (1)

1. A DDQN-based wireless network protocol self-selection method, characterized by comprising the following steps:

Step 1: after obtaining the network environment quality parameters and the continuously changing service type of each node in real time through the environment agent module, determine the state, action and reward value;

state space definition: the state space S of a terminal at time t is defined such that s_mn ∈ S, where s_mn denotes the state of terminal m when it has accessed the nth network and is exchanging information in that network; the state space is:

S = s_m1, s_m2, …, s_mn    (1)

state definition: the average throughput T, delay D, signal strength P and node distance W are used to describe the network state, so the network quality Φ is expressed as:

Φ = T × D × P × W    (2)

action space definition: an action space must be defined for the agent to choose from; it is defined as:

A = a_1, a_2, …, a_n    (3)

where a_n indicates that a node uses the nth network protocol;

the access-service network parameters consist of QoS parameters; a judgment matrix M is established for the network QoS and the parameter weights are solved for; each element m_ij of the judgment matrix expresses the relative importance of the QoS parameters, defined as follows, and the judgment matrix must satisfy m_ij > 0 and m_ji = 1/m_ij:

when i and j are equally important, m_ij is 1;
when i is slightly more important than j, m_ij is 3;
when i is more important than j, m_ij is 5;
when i is much more important than j, m_ij is 7;
when i is extremely more important than j, m_ij is 9;

the values 2, 4, 6 and 8, which do not appear above, represent intermediate judgments between adjacent levels; because the service types are divided into 4 classes when the reward value is defined, and the three attributes throughput, delay and signal strength are considered, the judgment matrix is defined as a 3×3 matrix, i.e. M_b ∈ R^(3×3), where b = 1, 2, 3, 4 denotes the four service types class 1, class 2, class 3 and class 4; 4 judgment matrices are then established, one for each of the four services, according to the QoS parameter requirements of the different services;

according to RFC 2474, the current standard for classifying network service types, the attribute values of the service class are determined through the DSCP; the DSCP encodes a value in the 6 used bits and 2 unused bits of the type-of-service (TOS) byte in the IP header of every packet to determine the IP precedence; the IP precedence field can be used for traffic classification, a larger value indicating a higher priority; the value ranges from 0 to 63 and can match 64 levels, and the levels are grouped by size, every seven levels forming one class, so that the relationship between service attributes and parameters can be determined from the DSCP field of the transmitted IP packets;

for these four classes of service, b takes the values 1, 2, 3 and 4 in turn; the eigenvector corresponding to the largest eigenvalue is normalized, and each value of the normalized eigenvector is the weight of the corresponding network QoS parameter; in the above four cases, different service types place different requirements on the network parameters, and these differences later influence the division of the reward-value weights; the whole network is treated as a single entity, the final goal being to optimize the overall network quality by selecting the protocol each node uses, and the reward value is a function strongly correlated with the network;

V_t = v_1, v_2, …, v_n    (5)

V_t denotes the state information of the network at time t and is a subset of the network state space Φ; therefore, for a specific service B and network state V_t, the reward function R is expressed as follows and is solved in the next step:

R = f_B(V_t)    (6)

the access of a node affects the network parameters; after an action is executed, the network state must be measured and the corresponding reward fed back; an action that increases network throughput, reduces delay and strengthens the signal is an effective action; conversely, an action that reduces throughput, increases delay and weakens the signal is an invalid action; therefore the average throughput α_avg, the average delay β_avg and the signal strength γ are considered when calculating the reward;

Step 2: on the basis of Step 1, normalize the data, determine the node service type and determine the reward function;

min-max normalization is used to eliminate the influence of the differing units of the data:

x′ = (x − x_min) / (x_max − x_min)    (7)

where x′ is the normalized value of x after the transformation, and α, β and γ are evaluated in turn; normalizing with the above equation yields the normalized average network throughput f_t(α)_avg, average delay f_t(β)_avg and signal strength f_t(γ) at time t;

combining the above formulas gives the reward function:

R = ω_1·f_t(α)_avg + ω_2·f_t(β)_avg + ω_3·f_t(γ)    (8)

where ω_1, ω_2 and ω_3 are the weights of the average network throughput, delay and signal strength given by the normalized eigenvector of the judgment matrix;

Step 3: on the basis of Step 2, feed the data into the DDQN decision network for real-time training and apply the execution result so that the network state tends toward stability;

first initialize the state space S and the action space A, initialize the Q matrix as a zero matrix, and initialize the Q-MainNet and Q-target networks with random parameters θ, where θ denotes the network parameters; at initialization the Q-MainNet parameters θ are set randomly and the Q-target parameters θ⁻ = 0; t denotes the current time step; the agent module reads the current network state information S_t and feeds it into the Q-MainNet network, which outputs the Q values of the different actions in state S_t; following the ε-greedy policy, the Q-MainNet network selects a random action a_t ∈ A with probability ε, or with probability 1−ε selects the action with the largest Q value; the terminal executes the corresponding action in the heterogeneous wireless network; the collected network data are processed into the format required by the algorithm and handed to the control layer, yielding the throughput α, delay β and signal strength γ, which are then normalized separately; according to the service type, the weights of f_t(α)_avg, f_t(β)_avg and f_t(γ) are obtained through the analytic hierarchy process, and the weighted sum gives the reward value R; Q-MainNet obtains the system state and the reward value and computes the reward through formula (9), where R_t+1 is the reward calculated in state S_t+1 and γ is the decay coefficient; the reward value of the agent in the current state is in fact all possible future reward values converted into a reward value at the present moment; when the action has been executed, the system enters the next state S_t+1;

the Q-MainNet network stores the memory tuple (s_t, a_t, r_t, s_t+1), i.e. the current state s_t, the action a_t, the current reward value r_t and the network state at t+1, in the experience pool; at each step, the Q-target network samples randomly from the experience pool and, together with the output of the Q-MainNet network, computes the loss, i.e. the squared difference between the Q values of the two networks with respect to the parameters θ, (TargetQ − Q(S_t+1, a, θ_t))², on which gradient descent is performed; after each round of iteration, the parameters of the Q-MainNet network are copied to the Q-target network; training continues in this loop.
CN202110249773.3A 2021-03-05 2021-03-05 Wireless network self-selection protocol method based on DDQN Active CN113055229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110249773.3A CN113055229B (en) 2021-03-05 2021-03-05 Wireless network self-selection protocol method based on DDQN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110249773.3A CN113055229B (en) 2021-03-05 2021-03-05 Wireless network self-selection protocol method based on DDQN

Publications (2)

Publication Number Publication Date
CN113055229A CN113055229A (en) 2021-06-29
CN113055229B true CN113055229B (en) 2023-10-27

Family

ID=76510598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110249773.3A Active CN113055229B (en) 2021-03-05 2021-03-05 Wireless network self-selection protocol method based on DDQN

Country Status (1)

Country Link
CN (1) CN113055229B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114036859A (en) * 2021-12-02 2022-02-11 湖北工业大学 Information interaction design test bench
CN118158078B (en) * 2024-03-05 2024-12-13 中国人民解放军61660部队 Decryption flow service chain arrangement method based on block chain and deep reinforcement learning
CN118368259B (en) * 2024-06-18 2024-08-30 井芯微电子技术(天津)有限公司 Network resource allocation method, device, electronic equipment and storage medium
CN118397519B (en) * 2024-06-27 2024-08-23 湖南协成电子技术有限公司 Campus student safety monitoring system and method based on artificial intelligence


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103327556A (en) * 2013-07-04 2013-09-25 中国人民解放军理工大学通信工程学院 Dynamic network selection method for optimizing quality of experience (QoE) of user in heterogeneous wireless network
CN105208624A (en) * 2015-08-27 2015-12-30 重庆邮电大学 Service-based multi-access network selection system and method in heterogeneous wireless network
CN107889195A (en) * 2017-11-16 2018-04-06 电子科技大学 A kind of self study heterogeneous wireless network access selection method of differentiated service
WO2021013368A1 (en) * 2019-07-25 2021-01-28 Telefonaktiebolaget Lm Ericsson (Publ) Machine learning based adaption of qoe control policy
CN110809306A (en) * 2019-11-04 2020-02-18 电子科技大学 A terminal access selection method based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A new network access selection algorithm for users' multi-service QoS requirements; 张媛媛 (Zhang Yuanyuan) et al.; Computer Science (计算机科学); 2015-03-31; vol. 42, no. 3; full text *
Access network selection algorithm based on a Markov model; 马礼 (Ma Li) et al.; Computer Engineering (计算机工程); 2019-05-31; vol. 45, no. 5; full text *

Also Published As

Publication number Publication date
CN113055229A (en) 2021-06-29

Similar Documents

Publication Publication Date Title
CN113055229B (en) Wireless network self-selection protocol method based on DDQN
EP3720169B1 (en) Data analysis device, system, and method
Sun et al. Autonomous resource slicing for virtualized vehicular networks with D2D communications based on deep reinforcement learning
CN111866954B (en) User selection and resource allocation method based on federal learning
CN109286959B (en) Vertical switching method of heterogeneous wireless network based on analytic hierarchy process
CN114138373B (en) Edge computing task unloading method based on reinforcement learning
Su et al. QRED: A Q-learning-based active queue management scheme
CN103096415B (en) Route optimizing device and method catering to perceive wireless mesh network
CN111867139A (en) Implementation method and system of deep neural network adaptive backoff strategy based on Q-learning
Zhu et al. Adaptive multi-access algorithm for multi-service edge users in 5G ultra-dense heterogeneous networks
CN109451534A (en) A kind of dynamic control method and device for QoS flow in the management of 5G system session
CN107105453B (en) Cut-in method is selected based on the heterogeneous network of analytic hierarchy process (AHP) and evolutionary game theory
CN110519849B (en) Communication and computing resource joint allocation method for mobile edge computing
Chen et al. Contention resolution in Wi-Fi 6-enabled Internet of Things based on deep learning
CN114465945B (en) SDN-based identification analysis network construction method
CN108901058A (en) A method for optimal selection of Internet of Things node access channel
Wu et al. Link congestion prediction using machine learning for software-defined-network data plane
Razmara et al. A hybrid neural network approach for congestion control in TCP/IP networks
CN114928611B (en) IEEE802.11p protocol-based energy-saving calculation unloading optimization method for Internet of vehicles
CN113676357B (en) Decision-making method for edge data processing in power Internet of things and its application
CN111446985A (en) A predictive anti-jamming method and gateway device for industrial wireless networks
CN119052861A (en) Multi-service QoS cross-band bandwidth allocation method and system based on deep reinforcement learning
CN105392176B (en) A kind of calculation method of actuator node executive capability
CN112084034A (en) MCT scheduling method based on edge platform layer adjustment coefficient
CN108418756B (en) A software-defined backhaul network access selection method based on similarity measure

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant