CN113055229B - Wireless network self-selection protocol method based on DDQN


Info

Publication number
CN113055229B
Authority
CN
China
Prior art keywords
network
state
value
service
mainnet
Prior art date
Legal status
Active
Application number
CN202110249773.3A
Other languages
Chinese (zh)
Other versions
CN113055229A (en)
Inventor
严海蓉
王重阳
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202110249773.3A priority Critical patent/CN113055229B/en
Publication of CN113055229A publication Critical patent/CN113055229A/en
Application granted granted Critical
Publication of CN113055229B publication Critical patent/CN113055229B/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W36/00 Hand-off or reselection arrangements
    • H04W36/0005 Control or signalling for completing the hand-off
    • H04W36/0055 Transmission or use of information for re-establishing the radio link
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W36/00 Hand-off or reselection arrangements
    • H04W36/0005 Control or signalling for completing the hand-off
    • H04W36/0083 Determination of parameters used for hand-off, e.g. generation or modification of neighbour cell lists
    • H04W36/00837 Determination of triggering parameters for hand-off
    • H04W36/008375 Determination of triggering parameters for hand-off based on historical data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W36/00 Hand-off or reselection arrangements
    • H04W36/24 Reselection being triggered by specific parameters
    • H04W36/30 Reselection being triggered by specific parameters by measured or perceived connection quality data


Abstract

The invention relates to a DDQN-based wireless network self-selection protocol method aimed at current wireless network environments, which are complex and integrate multiple protocols. The method comprises the following steps: 1) an environment agent module acquires the current network environment quality parameters and determines the node service types in real time; 2) on the basis of step 1), the data are denoised and normalized, the node service type is determined through the analytic hierarchy process, and features are extracted; 3) on the basis of step 2), the data are input into a DDQN decision network for real-time training, and the execution result is applied so that the network state tends toward stability. The method performs feature extraction directly on the data without preprocessing, uses the accumulated historical data as training data, and exploits the strength of deep learning, thereby effectively improving the learning speed and decision performance of the reinforcement learning algorithm.

Description

Wireless network self-selection protocol method based on DDQN
Technical Field
The invention relates to a network protocol self-selection method for heterogeneous wireless networks, aimed at current wireless network environments that are complex and integrate multiple protocols.
Background
With the continuous development of network technology, the networking technologies in wide use today overlap extensively. In the current network environment, WLAN and cellular networks are the most common heterogeneous combination and play an important role in modern information communication; to relieve the load on the cellular network, operators deploy their own WLAN hotspots in user-dense areas such as shopping malls, schools and office buildings.
The next-generation heterogeneous network integrates multiple protocols in a complex environment and must provide reliable network service to users anytime and anywhere. Before this can be achieved, however, the network environment needs to mature: wireless network coverage, network self-configuration, automatic management of network devices and similar functions remain to be solved. In the existing network environment it is difficult to realize such a configuration with a single network protocol, but comprehensive scheduling of the resources of the current heterogeneous network can be realized by suitable algorithms, and efficient switching that exploits heterogeneous network resources will gradually become a research hotspot. With the further development of wireless communication, certain requirements will also be placed on the scalability and flexibility of heterogeneous networks.
Reinforcement learning is a tool that can make decisions meeting application requirements in a non-deterministic environment and can adapt in a targeted manner to the dynamic changes of the network, so that a heterogeneous wireless network can automatically adapt to changes in the user's scenario and the network environment is optimized. Reinforcement learning is a branch of machine learning in which an Agent continually adjusts its behaviour in an Environment so as to maximize a particular index (the reward). In a wireless network, node mobility and mutual interference between nodes make the network environment complex; compared with traditional machine learning algorithms, deep reinforcement learning has greater potential and higher accuracy: features are extracted directly from the data without separate preprocessing, the accumulated historical data serve as training data, and the strength of deep learning effectively improves the learning speed and decision performance of the reinforcement learning algorithm.
Disclosure of Invention
Aiming at the characteristics of the existing network, the invention provides a wireless network self-selection protocol method based on DDQN (Double Deep Q-Network, i.e. deep reinforcement learning with double Q-learning). It comprises: a processing scheme for network quality data; a feature-extraction scheme based on deep learning; and a network protocol selection scheme based on DDQN. The aim of the invention is achieved by the following technical solution.
A method of wireless network self-selection protocol based on DDQN, the method comprising the steps of:
1) Acquiring current network environment quality parameters and determining node service types in real time through an environment agent module;
2) Noise reduction and normalization are carried out on the data on the basis of the step 1), the node service type is determined through an analytic hierarchy process, and feature extraction is carried out;
3) Based on the step 2), data is input into a DDQN decision network for real-time training, and an execution result is applied to enable the network state to tend to be stable.
1. A method of wireless network self-selection protocol based on DDQN, comprising the steps of:
the first step: the environment agent module acquires the current network environment quality parameters and the node service type in real time, and the state, action and reward values are determined;
state space definition: the state space S of a terminal at time t is defined such that s_{mn} ∈ S represents the state in which terminal m has accessed the n-th network and is exchanging information in that network; the state space is as follows:
S = {s_1, s_2, …, s_{mn}}    (1)
state definition: using the average throughput T, the delay D, the signal strength P, the node distance W to describe the network state, the network quality Φ is expressed as:
Φ=T×D×P×W (2)
defining an action space: an action space is required to be set for the intelligent agent to select, and the definition of the action space is as follows:
A = {a_1, a_2, …, a_n}    (3)
where a_n indicates that a given node uses the n-th network protocol;
the access service network parameters consist of QoS parameters, a decision matrix is established for the network QoS, and the parameter weights are solved:
in the decision matrix, each element m_{ij} represents the importance of QoS parameter i relative to parameter j, with the importance scale defined in Table 1, and the decision matrix should satisfy m_{ij} > 0, m_{ji} = 1/m_{ij} and m_{ii} = 1;
the even values 2, 4, 6 and 8, not shown in the table, represent intermediate grades between adjacent judgments; since the service types are divided into 4 classes when defining the reward value, and the three attributes throughput, delay and signal strength are considered, each decision matrix is defined as a 3×3 matrix, i.e. M_i ∈ R^{3×3}, where i = 1, 2, 3, 4 denote the class-1, class-2, class-3 and class-4 service types respectively; decision matrices are then established for the four services according to the QoS parameter requirements of the different services;
according to RFC 2474, the current standard for dividing network service types, the attribute values within a service class are determined through the DSCP; the DSCP occupies 6 bits of the Type of Service (TOS) byte of each packet's IP header, with the remaining 2 bits unused, and its coded value determines the IP priority; the IP priority field can be used for traffic classification, the larger the value the higher the priority; the values range from 0 to 63, so 64 grades can be matched, the grades are grouped into classes by grade size, and the relation between the service attributes and the parameters can be determined from the DSCP field carried in the IP packets;
for the four types of services, i takes the values 1, 2, 3 and 4 in turn; the eigenvector corresponding to the maximum eigenvalue is normalized, i.e. scaled so that its components sum to 1, and each value in the normalized eigenvector is the weight of the corresponding network QoS parameter; in the above four cases the network parameter requirements of the different service types differ, and these differences later affect the division of the reward value weights; considering the entire network as a whole, the final goal is to optimize the overall network quality through the protocols selected by the nodes, the reward value being a function strongly correlated with the network;
V_t = {v_1, v_2, …, v_n}    (5)
where V_t represents the state information of the network at time t and is a subset of the network state space Φ; therefore, for a particular service B, the reward function R over the network space state V_t is expressed as follows and is solved in the next step:
R = f_B(V_t)    (6)
the access of the nodes influences the variation of the network parameters, and after an action is executed the network state needs to be measured and the corresponding reward fed back; when the executed action leads to an increase in network throughput, a decrease in delay and an increase in signal strength, it is an effective action; conversely, when the executed action causes network throughput to decrease, delay to increase and signal strength to decrease, the action is ineffective; the average throughput α_avg, the average delay β_avg and the signal strength γ are therefore taken into account when calculating the reward;
and a second step of: carrying out normalization processing on the data on the basis of the step 1), determining the node service type and determining the rewarding function;
using min–max normalization to eliminate the influence of the differing units of the data:
x' = (x − x_min)/(x_max − x_min)    (7)
normalizing with this equation gives the normalized network average throughput f_t(α)_avg, average delay f_t(β)_avg and signal strength f_t(γ);
combining the above gives the reward function:
R = ω_1 f_t(α)_avg + ω_2 f_t(β)_avg + ω_3 f_t(γ)    (8)
where ω_1, ω_2 and ω_3 are the weights of the network average throughput, delay and signal strength, given by the eigenvector obtained from the normalized decision matrix;
and a third step of: inputting data into a DDQN decision network for real-time training on the basis of the step 2), and applying an execution result to enable the network state to tend to be stable;
firstly, the state S and the action space A are initialized, the Q matrix is initialized to a zero matrix, and the Q-MainNet and Q-target networks are initialized with a random parameter θ, where θ denotes the network parameters: the Q-MainNet θ is set randomly at initialization and the Q-target parameter is set to θ⁻ = 0; t denotes the current time state, and the agent module reads the current network state information S_t and inputs it into the Q-MainNet network, which outputs the Q values of the different actions in state S_t; according to the ε-greedy strategy, the Q-MainNet network either selects a random action a_t ∈ A with probability ε, or selects the action a_t = argmax_a Q(S_t, a; θ) with probability 1 − ε; the terminal executes the corresponding action in the heterogeneous wireless network, and through network data acquisition and data processing the result is converted into the format required by the algorithm and handed to the control layer for processing, thereby obtaining the throughput α, the delay β and the signal strength γ; these are then normalized separately; according to the service type, f_t(α)_avg, f_t(β)_avg and f_t(γ) are obtained by the analytic hierarchy process, and their weighted sum gives the reward value R; Q-MainNet obtains the system state and reward value and performs the target-value calculation by equation (9)
TargetQ = R_{t+1} + γ Q(S_{t+1}, argmax_a Q(S_{t+1}, a; θ); θ⁻)    (9)
where R_{t+1} is the reward calculated in state S_{t+1} and γ is the attenuation coefficient; the reward of the agent in the current state is in fact all possible future rewards converted back to the present moment; after the action is completed, the system enters the next state S_{t+1};
the Q-MainNet network stores the memory group (s_t, a_t, r_t, s_{t+1}), i.e. the current state s_t, the action a_t, the current reward value r_t and the network state at time t+1, in the experience pool, from which the Q-target network randomly samples at each step; together with the output of the Q-MainNet network, the loss between the two networks with respect to the parameter θ, i.e. (TargetQ − Q(S_{t+1}, a; θ_t))², is calculated and a gradient descent algorithm is performed on it; after each iteration, the parameters of the Q-MainNet network are copied to the Q-target network; and training is carried out continuously and circularly.
Drawings
Fig. 1 is a general flow chart of a method of DDQN-based wireless network self-selection protocol;
FIG. 2 is a diagram of DDQN algorithm operation;
Detailed Description
The following specific steps of a method for implementing a DDQN-based wireless network self-selection protocol according to the present invention will be described with reference to fig. 1:
the first step: acquiring current network environment quality parameters and node service type determining states, actions and rewarding values in real time through an environment agent module;
to use reinforcement learning algorithms, it is necessary to define state, action and prize values, and network quality parameters are entered as state values.
State space definition: the state space S of a terminal at time t is defined such that s_{mn} ∈ S represents the state in which terminal m has accessed the n-th network and is exchanging information in that network. The state space is as follows:
S = {s_1, s_2, …, s_{mn}}    (1)
State definition: in heterogeneous networks the network traffic state is typically described using throughput, delay, packet loss rate, network load and the like, while user characteristics are described using network signal strength, node distance, node power consumption, cost and signal-to-noise ratio. Here the network state is described using the average throughput T, the delay D, the signal strength P and the node distance W; the network quality Φ can then be expressed as:
Φ = T × D × P × W    (2)
defining an action space: an action space is required to be set for the intelligent agent to select, and the definition of the action space is as follows:
A = {a_1, a_2, …, a_n}    (3)
where a_n indicates that a given node uses the n-th network protocol.
Reward value definition: each node carries the characteristics of its specific service when it is created and has its own service type; even in the same network environment, the corresponding nodes receive different rewards. According to actual requirements, the node service types are divided into the following categories:
1. High real-time requirement: the delay should be as low as possible and the transmission rate must be high, since excessive delay affects the realization of the service; a certain throughput is also required to ensure data reliability.
2. Extremely high throughput requirement: compared with service 1 the real-time requirement is not strong, but larger data traffic is required.
3. Higher delay requirement: network traffic under emergency conditions must be handled, the delay reduced as much as possible, and the user experience improved.
4. Only sufficient throughput needs to be ensured.
The access service network parameters consist of QoS parameters, a decision matrix is established for the network QoS, and the parameter weights are solved:
the decision matrix is shown in a formula, wherein each element represents the importance degree of the QoS parameter, and is specifically defined in the table, and the decision matrix meets m ij >0;m ji =1/m ij ;m ij =1。
TABLE 1 relationship of attributes to parameters
2, 4, 6, 8, which are not shown in Table 1, are used to represent adjacency judgmentIs a median value of (c). Since the service types are classified into 4 types in the process of defining the reward value, and three attributes of throughput, time delay and signal strength are considered, the decision matrix should be defined as a matrix of 3*3, namely M i ∈R 3×3 Wherein i=1, 2, 3, 4 respectively represent four service types of class 1, class 2, class 3, and class 4, and then respectively establish decision matrices for the four services according to the requirements of different service QoS parameters.
According to RFC 2474, the current standard for dividing network service types, the attribute values within a service class are determined through the DSCP (Differentiated Services Code Point). The DSCP occupies 6 bits of the Type of Service (TOS) byte of each packet's IP header, with the remaining 2 bits unused, and its coded value determines the IP priority. The IP priority field can be used for traffic classification: the larger the value, the higher the priority. The values range from 0 to 63, so 64 grades can be matched; the grades are grouped into classes by grade size, and the relation between the service attributes and the parameters can be determined from the DSCP field carried in the IP packets.
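As an illustration of how the DSCP field can drive this grouping, the short Python sketch below reads the 6-bit DSCP from a raw IPv4 header and maps it onto the four service classes; the equal-width partition of the 64 code points and the direction of the mapping are assumptions made purely for illustration, since the class boundaries are not fixed here.

```python
def dscp_from_ipv4_header(header: bytes) -> int:
    """Extract the 6-bit DSCP value from the TOS/DS byte (second byte) of an IPv4 header."""
    tos = header[1]              # Type of Service / Differentiated Services byte
    return tos >> 2              # the upper 6 bits are the DSCP; the lower 2 bits are not used here


def service_class(dscp: int) -> int:
    """Map a DSCP value (0-63) onto the four service classes (equal-width bins assumed)."""
    return min(dscp // 16, 3) + 1    # 0-15 -> class 1, 16-31 -> 2, 32-47 -> 3, 48-63 -> 4


# Example: DSCP 46 (Expedited Forwarding) falls into class 3 under this assumed partition.
print(service_class(46))
```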
For these four types of traffic, i takes the values 1, 2, 3 and 4 in turn. The eigenvector corresponding to the maximum eigenvalue is normalized, i.e. scaled so that its components sum to 1, and each value in the normalized eigenvector is the weight of the corresponding network QoS parameter. In the above four cases the network parameter requirements of the different service types differ, and these differences later affect the division of the reward value weights. Considering the entire network as a whole, the final goal is to optimize the overall network quality through the protocols selected by the nodes, the reward value being a function strongly correlated with the network.
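The weight calculation itself can be sketched in a few lines of Python; the 3×3 pairwise-comparison matrix below is a hypothetical example rather than one of the four matrices established by the invention, and numpy is used only for the eigendecomposition.

```python
import numpy as np


def ahp_weights(decision_matrix: np.ndarray) -> np.ndarray:
    """Return QoS weights from a positive reciprocal pairwise-comparison matrix."""
    eigenvalues, eigenvectors = np.linalg.eig(decision_matrix)
    principal = eigenvectors[:, np.argmax(eigenvalues.real)].real   # eigenvector of the largest eigenvalue
    return principal / principal.sum()                              # normalize so the components sum to 1


# Hypothetical class-1 matrix over (throughput, delay, signal strength):
# delay judged most important, then throughput, then signal strength.
M1 = np.array([[1.0, 1 / 3, 3.0],
               [3.0, 1.0, 5.0],
               [1 / 3, 1 / 5, 1.0]])
w1, w2, w3 = ahp_weights(M1)    # the weights omega_1, omega_2, omega_3 used in the reward of equation (8)
print(w1, w2, w3)
```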
V_t = {v_1, v_2, …, v_n}    (5)
where V_t represents the state information of the network at time t and is a subset of the network state space Φ; therefore, for a particular service B, the reward function R over the network space state V_t is expressed as follows and is solved in the next step:
R = f_B(V_t)    (6)
Access by the nodes affects the variation of the network parameters, and after an action is performed the network state needs to be measured and the corresponding reward fed back. When the performed action leads to an increase in network throughput, a decrease in delay and an increase in signal strength, it is an effective action; conversely, when the performed action causes network throughput to decrease, delay to increase and signal strength to decrease, the action is ineffective. The average throughput α_avg, the average delay β_avg and the signal strength γ are therefore taken into account when calculating the reward.
And a second step of: carrying out normalization processing on the data on the basis of the step 1), determining the node service type and determining the rewarding function;
the units and the numerical values of different network parameters are usually greatly different, normalization processing is needed, linear transformation is carried out on all numerical values, and the numerical values are mapped between [0,1 ].
Min–max normalization is used to eliminate the influence of the differing units of the data:
x' = (x − x_min)/(x_max − x_min)    (7)
where x' is the normalized value of x. Normalizing with this equation gives the normalized network average throughput f_t(α)_avg, average delay f_t(β)_avg and signal strength f_t(γ).
Combining the above gives the reward function:
R = ω_1 f_t(α)_avg + ω_2 f_t(β)_avg + ω_3 f_t(γ)    (8)
where ω_1, ω_2 and ω_3 are the weights of the network average throughput, delay and signal strength, given by the eigenvector obtained from the normalized decision matrix.
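For concreteness, a minimal sketch of equations (7) and (8) follows; the measurement ranges, sample values and weights are illustrative assumptions, and in practice the weights would come from the decision matrix of the relevant service class. Equation (8) is applied literally, i.e. the normalized delay enters with a positive weight as written.

```python
def min_max(x: float, x_min: float, x_max: float) -> float:
    """Equation (7): map x linearly onto [0, 1]; the caller supplies the observed range."""
    return (x - x_min) / (x_max - x_min) if x_max > x_min else 0.0


def reward(throughput: float, delay: float, signal: float, ranges: dict, weights: tuple) -> float:
    """Equation (8): weighted sum of the normalized average throughput, delay and signal strength."""
    w1, w2, w3 = weights                                  # AHP weights for the current service class
    f_alpha = min_max(throughput, *ranges["throughput"])  # f_t(alpha)_avg
    f_beta = min_max(delay, *ranges["delay"])             # f_t(beta)_avg
    f_gamma = min_max(signal, *ranges["signal"])          # f_t(gamma)
    return w1 * f_alpha + w2 * f_beta + w3 * f_gamma


# Example with assumed measurement ranges (Mbit/s, ms, dBm) and assumed weights.
ranges = {"throughput": (0.0, 100.0), "delay": (0.0, 200.0), "signal": (-100.0, -30.0)}
print(reward(42.0, 35.0, -60.0, ranges, (0.4, 0.4, 0.2)))
```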
And a third step of: based on the step 2), data is input into a DDQN decision network for real-time training, and an execution result is applied to enable the network state to tend to be stable.
One of the biggest disadvantages of DQN is that, although the argmax() operation lets the Q value approach the target quickly, it is likely to cause overestimation, i.e. a large deviation in the resulting algorithm model. To solve this problem, the error can be reduced by separating the selection of the maximizing action from the calculation of the target Q value. The network information is in a discrete state, and DDQN handles discrete-state data well.
Referring to FIG. 2, DQN is implemented with two neural networks, Q-MainNet and Q-target. DDQN likewise uses two networks, but calculates the target Q value in a different manner.
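For reference (general background on the two algorithms rather than language of the claims), the targets differ only in how the maximizing action is chosen: DQN selects and evaluates the action with the same target network, while DDQN selects it with Q-MainNet and evaluates it with Q-target:
DQN:  TargetQ = R_{t+1} + γ max_a Q(S_{t+1}, a; θ⁻)
DDQN: TargetQ = R_{t+1} + γ Q(S_{t+1}, argmax_a Q(S_{t+1}, a; θ); θ⁻)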
Firstly, the state S and the action space A are initialized, the Q matrix is initialized to a zero matrix, and the Q-MainNet and Q-target networks are initialized with a random parameter θ, where θ denotes the network parameters: the Q-MainNet θ is set randomly at initialization and the Q-target parameter is set to θ⁻ = 0. With t denoting the current time state, the agent module reads the current network state information S_t and inputs it into the Q-MainNet network, which outputs the Q values of the different actions in state S_t. According to the ε-greedy strategy, the Q-MainNet network either selects a random action a_t ∈ A with probability ε, or selects the action a_t = argmax_a Q(S_t, a; θ) with probability 1 − ε. The terminal executes the corresponding action in the heterogeneous wireless network; through network data acquisition and data processing the result is converted into the format required by the algorithm and handed to the control layer for processing, which yields the throughput α, the delay β and the signal strength γ. These are then normalized separately. According to the service type, f_t(α)_avg, f_t(β)_avg and f_t(γ) are obtained by the analytic hierarchy process, and their weighted sum gives the reward value R. Q-MainNet obtains the system state and reward value and calculates the target value by equation (9):
TargetQ = R_{t+1} + γ Q(S_{t+1}, argmax_a Q(S_{t+1}, a; θ); θ⁻)    (9)
where R_{t+1} is the reward calculated in state S_{t+1} and γ is the attenuation coefficient; the reward of the agent in the current state is in fact all possible future rewards converted back to the present moment. After the action is completed, the system enters the next state S_{t+1}.
The Q-MainNet network stores the memory group (s_t, a_t, r_t, s_{t+1}), i.e. the current state s_t, the action a_t, the current reward value r_t and the network state at time t+1, in the experience pool, from which the Q-target network randomly samples at each step; together with the output of the Q-MainNet network, the loss between the two networks with respect to the parameter θ, i.e. (TargetQ − Q(S_{t+1}, a; θ_t))², is calculated and a gradient descent step is performed on it. Every G steps, the parameters of the Q-MainNet network are copied to the Q-target network, and training continues in this loop.
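To make the training loop concrete, the following PyTorch sketch reimplements the third step under illustrative assumptions (a 4-dimensional state built from T, D, P and W, three candidate protocols as actions, and arbitrary hyperparameters); it is not the patent's implementation, but it follows the same structure: ε-greedy action selection by Q-MainNet, an experience pool, the double-Q target of equation (9), gradient descent on the squared error, and a periodic copy to Q-target every G steps.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 3            # state built from (T, D, P, W); one action per candidate protocol
GAMMA, EPSILON, BATCH, G = 0.9, 0.1, 32, 100


def make_net() -> nn.Module:
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))


q_main, q_target = make_net(), make_net()
q_target.load_state_dict(q_main.state_dict())
optimizer = torch.optim.Adam(q_main.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)          # experience pool of (s_t, a_t, r_t, s_{t+1}) tuples of tensors
# e.g. replay.append((state, torch.tensor(action), torch.tensor(r, dtype=torch.float32), next_state))


def select_action(state: torch.Tensor) -> int:
    """Epsilon-greedy action selection by Q-MainNet."""
    if random.random() < EPSILON:                        # explore with probability epsilon
        return random.randrange(N_ACTIONS)
    with torch.no_grad():                                # otherwise act greedily w.r.t. Q-MainNet
        return int(q_main(state).argmax().item())


def train_step(step: int) -> None:
    """One gradient step on the double-DQN loss over a random mini-batch from the experience pool."""
    if len(replay) < BATCH:
        return
    s, a, r, s2 = map(torch.stack, zip(*random.sample(replay, BATCH)))
    a = a.long().unsqueeze(1)
    with torch.no_grad():
        best = q_main(s2).argmax(dim=1, keepdim=True)                   # action selected by Q-MainNet
        target = r + GAMMA * q_target(s2).gather(1, best).squeeze(1)    # ... evaluated by Q-target (eq. 9)
    q_sa = q_main(s).gather(1, a).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, target)          # squared TD error, minimized by gradient descent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % G == 0:                                    # periodic copy of Q-MainNet parameters to Q-target
        q_target.load_state_dict(q_main.state_dict())
```

Here the greedy action inside the target is chosen by Q-MainNet while its value is read from Q-target, which is exactly the decoupling that suppresses the overestimation discussed above.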

Claims (1)

1. A method of wireless network self-selection protocol based on DDQN, comprising the steps of:
the first step: after the environment agent module acquires, in real time, the continuously changing network environment quality parameters and the service types of the nodes, the state, the action and the reward value are determined;
state space definition: the state space S of a terminal at time t is defined such that s_{mn} ∈ S represents the state in which terminal m has accessed the n-th network and is exchanging information in that network; the state space is as follows:
S = {s_{m1}, s_{m2}, …, s_{mn}}    (1)
state definition: using the average throughput T, the delay D, the signal strength P, the node distance W to describe the network state, the network quality Φ is expressed as:
Φ = T × D × P × W    (2)
defining an action space: an action space is required to be set for the intelligent agent to select, and the definition of the action space is as follows:
A = {a_1, a_2, …, a_n}    (3)
where a_n indicates that a given node uses the n-th network protocol;
the access service network parameters consist of QoS parameters, a judgment matrix M is established for the network QoS, and the parameter weights are solved:
in the decision matrix, each element m_{ij} represents the importance of QoS parameter i relative to parameter j, specifically defined as follows, and the decision matrix should satisfy m_{ij} > 0 and m_{ji} = 1/m_{ij}:
when i and j are equally important, m_{ij} is 1;
when i is slightly more important than j, m_{ij} is 3;
when i is obviously more important than j, m_{ij} is 5;
when i is strongly more important than j, m_{ij} is 7;
when i is extremely more important than j, m_{ij} is 9;
the values 2, 4, 6 and 8, which do not appear above, are used to represent intermediate grades between adjacent judgments; since the service types are divided into 4 classes when defining the reward value, and the three attributes throughput, delay and signal strength are considered, each decision matrix is defined as a 3×3 matrix, i.e. M_b ∈ R^{3×3}, where b = 1, 2, 3, 4 denote the class-1, class-2, class-3 and class-4 service types respectively; 4 decision matrices are then respectively established for the four services according to the QoS parameter requirements of the different services;
determining the attribute values within a service class through the DSCP according to RFC 2474, the current standard for dividing network service types; the DSCP occupies 6 bits of the Type of Service (TOS) byte of each packet's IP header, with the remaining 2 bits unused, and its coded value determines the IP priority; the IP priority field can be used for traffic classification, the larger the value the higher the priority; the values range from 0 to 63, so 64 grades can be matched, the grades are grouped into classes by grade size, and the relation between the service attributes and the parameters can be determined from the DSCP field carried in the IP packets;
for the four types of services, b takes the values 1, 2, 3 and 4 in turn; the eigenvector corresponding to the maximum eigenvalue is normalized, i.e. scaled so that its components sum to 1, and each value in the normalized eigenvector is the weight of the corresponding network QoS parameter; in the above four cases the network parameter requirements of the different service types differ, and these differences later affect the division of the reward value weights; considering the entire network as a whole, the final goal is to optimize the overall network quality through the protocols selected by the nodes, the reward value being a function strongly correlated with the network;
V_t = {v_1, v_2, …, v_n}    (5)
where V_t represents the state information of the network at time t and is a subset of the network state space Φ; therefore, for a particular service B, the reward function R over the network space state V_t is expressed as follows and is solved in the next step:
R = f_B(V_t)    (6)
the access of the nodes influences the variation of the network parameters, and after an action is executed the network state needs to be measured and the corresponding reward fed back; when the executed action leads to an increase in network throughput, a decrease in delay and an increase in signal strength, it is an effective action; conversely, when the executed action causes network throughput to decrease, delay to increase and signal strength to decrease, the action is ineffective; the average throughput α_avg, the average delay β_avg and the signal strength γ are therefore taken into account when calculating the reward;
and a second step of: carrying out normalization processing on the data on the basis of the first step, determining the node service type and determining a reward function;
using min–max normalization to eliminate the influence of the differing units of the data:
x' = (x − x_min)/(x_max − x_min)    (7)
where x' is the normalized value of x after conversion, and α, β and γ are normalized in turn;
normalizing with this equation gives the normalized network average throughput f_t(α)_avg, average delay f_t(β)_avg and signal strength f_t(γ) at time t;
combining the above gives the reward function:
R = ω_1 f_t(α)_avg + ω_2 f_t(β)_avg + ω_3 f_t(γ)    (8)
where ω_1, ω_2 and ω_3 are the weights of the network average throughput, delay and signal strength, given by the eigenvector obtained from the normalized decision matrix;
and a third step of: on the basis of the second step, inputting data into a DDQN decision network for real-time training, and applying an execution result to enable the network state to tend to be stable;
firstly, a state space S and an action space A are initialized, the Q matrix is initialized to a zero matrix, and the Q-MainNet and Q-target networks are initialized with a random parameter θ, where θ denotes the network parameters: the Q-MainNet θ is set randomly at initialization and the Q-target parameter is set to θ⁻ = 0; t denotes the current time state, and the agent module reads the current network state information S_t and inputs it into the Q-MainNet network, which outputs the Q values of the different actions in state S_t; according to the ε-greedy strategy, the Q-MainNet network either selects a random action a_t ∈ A with probability ε, or selects the action a_t = argmax_a Q(S_t, a; θ) with probability 1 − ε; the terminal executes the corresponding action in the heterogeneous wireless network, and through network data acquisition and data processing the result is converted into the format required by the algorithm and handed to the control layer for processing, thereby obtaining the throughput α, the delay β and the signal strength γ; these are then normalized separately; according to the service type, f_t(α)_avg, f_t(β)_avg and f_t(γ) are obtained by the analytic hierarchy process, and their weighted sum gives the reward value R; Q-MainNet obtains the system state and reward value and performs the target-value calculation by equation (9)
TargetQ = R_{t+1} + γ Q(S_{t+1}, argmax_a Q(S_{t+1}, a; θ); θ⁻)    (9)
where R_{t+1} is the reward calculated in state S_{t+1} and γ is the attenuation coefficient; the reward of the agent in the current state is in fact all possible future rewards converted back to the present moment; after the action is completed, the system enters the next state S_{t+1};
the Q-MainNet network stores the memory group (s_t, a_t, r_t, s_{t+1}), i.e. the current state s_t, the action a_t, the current reward value r_t and the network state at time t+1, in the experience pool, from which the Q-target network randomly samples at each step; together with the output of the Q-MainNet network, the loss between the two networks with respect to the parameter θ, i.e. (TargetQ − Q(S_{t+1}, a, θ_t))², is calculated and a gradient descent algorithm is performed on it; after each iteration, the parameters of the Q-MainNet network are copied to the Q-target network; and training is carried out continuously and circularly.
CN202110249773.3A 2021-03-05 2021-03-05 Wireless network self-selection protocol method based on DDQN Active CN113055229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110249773.3A CN113055229B (en) 2021-03-05 2021-03-05 Wireless network self-selection protocol method based on DDQN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110249773.3A CN113055229B (en) 2021-03-05 2021-03-05 Wireless network self-selection protocol method based on DDQN

Publications (2)

Publication Number Publication Date
CN113055229A CN113055229A (en) 2021-06-29
CN113055229B (en) 2023-10-27

Family

ID=76510598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110249773.3A Active CN113055229B (en) 2021-03-05 2021-03-05 Wireless network self-selection protocol method based on DDQN

Country Status (1)

Country Link
CN (1) CN113055229B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118368259B (en) * 2024-06-18 2024-08-30 井芯微电子技术(天津)有限公司 Network resource allocation method, device, electronic equipment and storage medium
CN118397519B (en) * 2024-06-27 2024-08-23 湖南协成电子技术有限公司 Campus student safety monitoring system and method based on artificial intelligence

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103327556A (en) * 2013-07-04 2013-09-25 中国人民解放军理工大学通信工程学院 Dynamic network selection method for optimizing quality of experience (QoE) of user in heterogeneous wireless network
CN105208624A (en) * 2015-08-27 2015-12-30 重庆邮电大学 Service-based multi-access network selection system and method in heterogeneous wireless network
CN107889195A (en) * 2017-11-16 2018-04-06 电子科技大学 A kind of self study heterogeneous wireless network access selection method of differentiated service
CN110809306A (en) * 2019-11-04 2020-02-18 电子科技大学 Terminal access selection method based on deep reinforcement learning
WO2021013368A1 (en) * 2019-07-25 2021-01-28 Telefonaktiebolaget Lm Ericsson (Publ) Machine learning based adaption of qoe control policy

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103327556A (en) * 2013-07-04 2013-09-25 中国人民解放军理工大学通信工程学院 Dynamic network selection method for optimizing quality of experience (QoE) of user in heterogeneous wireless network
CN105208624A (en) * 2015-08-27 2015-12-30 重庆邮电大学 Service-based multi-access network selection system and method in heterogeneous wireless network
CN107889195A (en) * 2017-11-16 2018-04-06 电子科技大学 A kind of self study heterogeneous wireless network access selection method of differentiated service
WO2021013368A1 (en) * 2019-07-25 2021-01-28 Telefonaktiebolaget Lm Ericsson (Publ) Machine learning based adaption of qoe control policy
CN110809306A (en) * 2019-11-04 2020-02-18 电子科技大学 Terminal access selection method based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A new network access selection algorithm oriented to users' multi-service QoS requirements; Zhang Yuanyuan et al.; Computer Science; 2015-03-31; vol. 42, no. 3; full text *
Access network selection algorithm based on Markov model; Ma Li et al.; Computer Engineering; 2019-05-31; vol. 45, no. 5; full text *

Also Published As

Publication number Publication date
CN113055229A (en) 2021-06-29

Similar Documents

Publication Publication Date Title
CN109947545B (en) Task unloading and migration decision method based on user mobility
CN108770029B (en) Wireless sensor network clustering routing protocol method based on clustering and fuzzy system
CN111629380B (en) Dynamic resource allocation method for high concurrency multi-service industrial 5G network
CN113055229B (en) Wireless network self-selection protocol method based on DDQN
CN111510879B (en) Heterogeneous Internet of vehicles network selection method and system based on multi-constraint utility function
WO2019184836A1 (en) Data analysis device, and multi-model co-decision system and method
CN114142907B (en) Channel screening optimization method and system for communication terminal equipment
CN107708197B (en) high-energy-efficiency heterogeneous network user access and power control method
CN113596785B (en) D2D-NOMA communication system resource allocation method based on deep Q network
Sekaran et al. 5G integrated spectrum selection and spectrum access using AI-based frame work for IoT based sensor networks
CN110233755B (en) Computing resource and frequency spectrum resource allocation method for fog computing in Internet of things
CN110519849B (en) Communication and computing resource joint allocation method for mobile edge computing
CN113038612B (en) Cognitive radio power control method based on deep learning
CN116916386A (en) Large model auxiliary edge task unloading method considering user competition and load
CN113473580A (en) Deep learning-based user association joint power distribution strategy in heterogeneous network
CN113676357B (en) Decision method for edge data processing in power internet of things and application thereof
Wu et al. Link congestion prediction using machine learning for software-defined-network data plane
Kaur et al. Intelligent spectrum management based on reinforcement learning schemes in cooperative cognitive radio networks
CN110139282A (en) A kind of energy acquisition D2D communication resource allocation method neural network based
CN113590211A (en) Calculation unloading method based on PSO-DE algorithm
CN115811788B (en) D2D network distributed resource allocation method combining deep reinforcement learning and unsupervised learning
WO2013102294A1 (en) Method of distributed cooperative spectrum sensing based on unsupervised clustering in cognitive self-organizing network
CN108848519B (en) Heterogeneous network user access method based on cross entropy learning
Huang et al. A Hierarchical Deep Learning Approach for Optimizing CCA Threshold and Transmit Power in Wi-Fi Networks
CN114615705B (en) Single-user resource allocation strategy method based on 5G network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant