CN113055229B - Wireless network self-selection protocol method based on DDQN - Google Patents
Info
- Publication number
- CN113055229B (application CN202110249773.3A)
- Authority
- CN
- China
- Prior art keywords
- network
- state
- value
- service
- mainnet
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 20
- 238000010606 normalization Methods 0.000 claims abstract description 15
- 238000012549 training Methods 0.000 claims abstract description 10
- 230000008569 process Effects 0.000 claims abstract description 6
- 230000009471 action Effects 0.000 claims description 43
- 239000011159 matrix material Substances 0.000 claims description 24
- 230000000875 corresponding effect Effects 0.000 claims description 22
- 239000003795 chemical substances by application Substances 0.000 claims description 15
- 238000012545 processing Methods 0.000 claims description 10
- 238000004364 calculation method Methods 0.000 claims description 4
- 230000000694 effects Effects 0.000 claims description 4
- 238000004458 analytical method Methods 0.000 claims description 3
- 230000003993 interaction Effects 0.000 claims description 3
- 230000002194 synthesizing effect Effects 0.000 claims description 2
- 230000002787 reinforcement Effects 0.000 abstract description 7
- 238000000605 extraction Methods 0.000 abstract description 5
- 238000013135 deep learning Methods 0.000 abstract description 3
- 230000008901 benefit Effects 0.000 abstract description 2
- 230000010354 integration Effects 0.000 abstract description 2
- 230000009467 reduction Effects 0.000 abstract description 2
- 238000007781 pre-processing Methods 0.000 abstract 1
- 230000006870 function Effects 0.000 description 9
- 238000011161 development Methods 0.000 description 3
- 230000001413 cellular effect Effects 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000010187 selection method Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/02—Arrangements for optimising operational condition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W36/00—Hand-off or reselection arrangements
- H04W36/0005—Control or signalling for completing the hand-off
- H04W36/0055—Transmission or use of information for re-establishing the radio link
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W36/00—Hand-off or reselection arrangements
- H04W36/0005—Control or signalling for completing the hand-off
- H04W36/0083—Determination of parameters used for hand-off, e.g. generation or modification of neighbour cell lists
- H04W36/00837—Determination of triggering parameters for hand-off
- H04W36/008375—Determination of triggering parameters for hand-off based on historical data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W36/00—Hand-off or reselection arrangements
- H04W36/24—Reselection being triggered by specific parameters
- H04W36/30—Reselection being triggered by specific parameters by measured or perceived connection quality data
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Environmental & Geological Engineering (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention relates to a DDQN-based wireless network self-selection protocol method, aimed at the current situation in which the wireless network environment is complex and multiple protocols coexist. The method comprises the following steps: 1) acquiring the current network environment quality parameters and determining the node service type in real time through an environment agent module; 2) on the basis of step 1), performing noise reduction and normalization on the data, determining the node service type through the analytic hierarchy process, and extracting features; 3) on the basis of step 2), inputting the data into a DDQN decision network for real-time training and applying the execution result so that the network state tends to become stable. The method extracts features directly from the data without manual preprocessing, uses the acquired historical data as training data, and exploits the strengths of deep learning, thereby effectively improving the learning speed and decision performance of the reinforcement learning algorithm.
Description
Technical Field
The invention relates to a network protocol self-selection method for heterogeneous wireless networks, aimed at the current situation in which the wireless network environment is complex and multiple protocols coexist.
Background
With the continuous development of network technology, the networks widely deployed around the world today overlap to a great extent. WLAN and cellular networks are the most common heterogeneous network combination in the current environment and play an important role in modern information communication; operators deploy their own WLAN hotspots in user-dense areas such as shopping malls, schools and office buildings to offload the pressure on the cellular network.
The next-generation heterogeneous network is a network that integrates multiple protocols in a complex environment and must provide reliable network service to users anytime and anywhere. Before this can be achieved, the network environment needs to mature: wireless network coverage, network self-configuration, automatic management of network devices and similar functions all remain to be solved. In the existing network environment it is difficult to realize such configuration with a single network protocol, but the resources of the current heterogeneous network can be scheduled comprehensively by suitable algorithms, and efficient switching that exploits heterogeneous network resources is gradually becoming a research hotspot. With the further development of wireless communication, certain requirements will also be placed on the scalability and flexibility of heterogeneous networks.
Reinforcement learning, as a tool that can make decisions meeting the requirements of the environment even when that environment is non-deterministic, can be adjusted in a targeted manner to the dynamic changes of the network, so that the heterogeneous wireless network can adapt automatically to changes in the user's scenario and the network environment is optimized. Reinforcement learning is a branch of machine learning in which an agent continually adjusts its behaviour in an environment so as to eventually maximize a specific index (the reward). In wireless networks, node mobility and mutual interference between nodes make the network environment complex; compared with traditional machine learning algorithms, deep reinforcement learning has greater potential and higher accuracy: it extracts features directly from the data without preprocessing, uses the acquired historical data as training data, and exploits the strengths of deep learning, effectively improving the learning speed and decision performance of the reinforcement learning algorithm.
Disclosure of Invention
Aiming at the characteristics of the existing network, the invention provides a wireless network self-selection protocol method based on DDQN (Double Deep Q-Network, as in "Deep Reinforcement Learning with Double Q-learning"). It comprises: a processing scheme for network quality data; a feature extraction scheme based on deep learning; and a network protocol selection scheme based on DDQN. The aim of the invention is achieved by the following technical scheme.
A method of wireless network self-selection protocol based on DDQN, the method comprising the steps of:
1) Acquiring current network environment quality parameters and determining node service types in real time through an environment agent module;
2) Noise reduction and normalization are carried out on the data on the basis of the step 1), the node service type is determined through an analytic hierarchy process, and feature extraction is carried out;
3) Based on the step 2), data is input into a DDQN decision network for real-time training, and an execution result is applied to enable the network state to tend to be stable.
1. A method of wireless network self-selection protocol based on DDQN, comprising the steps of:
the first step: the environment agent module acquires the current network environment quality parameters and node service type in real time, and the state, action and reward value are determined;
state space definition: the state space S of a terminal at time t is defined such that s_mn ∈ S represents the state in which terminal m has accessed the nth network and exchanges information in that network; the state space is:
S = {s_1, s_2, ..., s_mn}   (1)
state definition: using the average throughput T, the delay D, the signal strength P, the node distance W to describe the network state, the network quality Φ is expressed as:
Φ=T×D×P×W (2)
defining the action space: an action space must be set for the agent to select from; it is defined as:
A = {a_1, a_2, ..., a_n}   (3)
where a_n indicates that a given node uses the nth network protocol;
the access service network parameters consist of QoS parameters, a decision matrix is established for the network QoS, and the parameter weights are solved:
the decision matrix is as above, where each element m_ij represents the relative importance of one QoS parameter over another, as defined in the following table, and the decision matrix should satisfy m_ij > 0, m_ji = 1/m_ij and m_ii = 1;
The values 2, 4, 6 and 8, which are not shown in the table, represent intermediate values between adjacent judgments. Since the service types are divided into 4 classes when defining the reward value, and the three attributes throughput, delay and signal strength are considered, the decision matrix is defined as a 3×3 matrix, i.e. M_i ∈ R^(3×3), where i = 1, 2, 3, 4 denotes the four service types of class 1, class 2, class 3 and class 4; decision matrices are then established for the four services respectively, according to the QoS parameter requirements of the different services;
according to RFC 2474, the current standard for classifying network service types, the attribute values within a service class are determined through DSCP. DSCP uses the 6 used bits and 2 unused bits of the type-of-service (TOS) byte in each packet's IP header to determine the IP priority from the code value. The IP priority field can be used for traffic classification: the larger the value, the higher the priority. The values range from 0 to 63, so 64 levels can be matched, and the levels are grouped into classes by magnitude; the relationship between service attributes and parameters can therefore be determined from the DSCP field carried in the IP packet;
for the four types of service, i takes the values 1, 2, 3 and 4 in turn. The eigenvector corresponding to the maximum eigenvalue of each decision matrix is normalized; each value in the normalized eigenvector is the weight of the corresponding network QoS parameter. In the four cases above, the network parameter requirements of the different service types differ, and these differences later influence the division of the reward value weights. Considering the entire network as a whole, the final goal is to optimize the overall network quality through the protocols selected by the nodes; the reward value is therefore a function strongly correlated with the network;
V_t = {v_1, v_2, ..., v_n}   (5)
where V_t represents the state information of the network at time t and is a subset of the network state space Φ; therefore, for a particular service B, the reward function R over the network state V_t is expressed as follows and is solved in the next step:
R = f_B(V_t)   (6)
node access influences the variation of the network parameters; after an action is executed, the network state must be measured and the corresponding reward fed back. When the executed action increases the network throughput, decreases the delay and strengthens the signal, the action is effective; conversely, when the executed action decreases the network throughput, increases the delay and weakens the signal, the action is ineffective. The reward calculation therefore takes into account the average throughput α_avg, the average delay β_avg and the signal strength γ;
the second step: on the basis of step 1), the data are normalized, the node service type is determined and the reward function is determined;
min-max normalization is used to eliminate the effect of the differing units of the data:
x' = (x - x_min) / (x_max - x_min)   (7)
normalizing with this equation yields the normalized network average throughput f_t(α)_avg, average delay f_t(β)_avg and signal strength f_t(γ);
Combining the above yields the reward function:
R = ω_1 f_t(α)_avg + ω_2 f_t(β)_avg + ω_3 f_t(γ)   (8)
where ω_1, ω_2 and ω_3 are the weights of the network average throughput, delay and signal strength, taken from the eigenvector obtained by normalizing the decision matrix;
the third step: on the basis of step 2), the data are input into the DDQN decision network for real-time training, and the execution result is applied so that the network state tends to become stable;
first, the state S and action space A are initialized, the Q matrix is initialized to the zero matrix, and the Q-MainNet and Q-target networks are initialized with a random parameter θ, where θ is the network parameter: the Q-MainNet θ is set randomly at initialization and the Q-target θ⁻ = 0. Let t denote the current time step. The agent module reads the current network state information S_t and inputs it into the Q-MainNet network, which outputs the Q values of the different actions in state S_t. Following an ε-greedy strategy, the Q-MainNet network either randomly selects an action a_t ∈ A with probability ε, or selects the action with the largest Q value with probability 1 - ε. The terminal executes the corresponding action in the heterogeneous wireless network; through network data acquisition and data processing the result is converted into the format required by the algorithm and handed to the control layer for processing, which yields the throughput α, delay β and signal strength γ. These are then normalized separately; according to the service type, f_t(α)_avg, f_t(β)_avg and f_t(γ) are obtained through the analytic hierarchy process and combined by weighted summation into the reward value R. Using the obtained system state and reward value, Q-MainNet computes the target value by equation (9):
TargetQ = R_{t+1} + γ · Q(S_{t+1}, argmax_a Q(S_{t+1}, a; θ); θ⁻)   (9)
where R_{t+1} is the reward calculated in state S_{t+1} and γ is the attenuation (discount) coefficient: the return of the agent in the current state is in fact the sum of all possible future rewards discounted back to the present moment. After the action is completed, the system enters the next state S_{t+1};
The Q-MainNet network stores the memory tuple (s_t, a_t, r_t, s_{t+1}), i.e. the current state s_t, the action a_t, the current reward value r_t and the network state at time t+1, in the experience pool. At each step the Q-target network samples randomly from this pool, and together with the output of the Q-MainNet network the loss of the two Q networks with respect to the parameter θ is calculated, i.e. (TargetQ - Q(S_{t+1}, a; θ_t))², on which a gradient descent algorithm is performed. After each iteration, the parameters of the Q-MainNet network are copied to the Q-target network, and training continues in this loop.
Drawings
Fig. 1 is a general flow chart of a method of DDQN-based wireless network self-selection protocol;
FIG. 2 is a diagram of DDQN algorithm operation;
Detailed Description
The following specific steps of a method for implementing a DDQN-based wireless network self-selection protocol according to the present invention will be described with reference to fig. 1:
the first step: the environment agent module acquires the current network environment quality parameters and node service type in real time, and the state, action and reward value are determined;
To use a reinforcement learning algorithm, the state, action and reward value must be defined; the network quality parameters are used as the state values.
State space definition: the state space S of a terminal at time t is defined such that s_mn ∈ S represents the state in which terminal m has accessed the nth network and exchanges information in that network. The state space is:
S = {s_1, s_2, ..., s_mn}   (1)
state definition: in heterogeneous networks, the network traffic state is typically described using throughput, delay, packet loss rate, network load and so on, while user characteristics are described using network signal strength, node distance, node power consumption, cost and signal-to-noise ratio. Here the network state is described using the average throughput T, delay D, signal strength P and node distance W, so the network quality Φ can be expressed as:
Φ=T×D×P×W (2)
defining the action space: an action space must be set for the agent to select from; it is defined as:
A = {a_1, a_2, ..., a_n}   (3)
where a_n indicates that a given node uses the nth network protocol.
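For illustration only, the following minimal Python sketch shows one way the state and action space above could be represented; the names NetworkState, N_PROTOCOLS and actions are hypothetical and not part of the patent.

```python
from dataclasses import dataclass

@dataclass
class NetworkState:
    """One state s_mn: the QoS metrics observed by terminal m on network n."""
    throughput: float   # average throughput T
    delay: float        # delay D
    signal: float       # signal strength P
    distance: float     # node distance W

# Action a_k means "the node uses the k-th network protocol"; with N candidate
# protocols the action space A is simply the set of protocol indices.
N_PROTOCOLS = 3                      # assumed number of available protocols
actions = list(range(N_PROTOCOLS))   # A = {a_1, ..., a_n}
```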
Reward value definition: each node is created with the characteristics of its specific service and has its own service type; even in the same network environment, different nodes therefore receive different rewards. The node service types are divided into the following categories according to actual requirements:
1. High real-time requirement: the delay should be as low as possible and the transmission rate must be high; excessive delay affects the realization of the service. A certain throughput is also required to ensure data reliability.
2. Extremely high throughput requirement: compared with service 1 the real-time requirement is not strong, but larger data traffic is required.
3. Higher delay requirement: this service must handle network traffic in emergencies, so the delay should be reduced as much as possible to improve the user experience.
4. Only sufficient throughput needs to be ensured.
The access service network parameters consist of QoS parameters, a decision matrix is established for the network QoS, and the parameter weights are solved:
the decision matrix is shown in the formula above, where each element m_ij represents the relative importance of one QoS parameter over another, as specifically defined in Table 1; the decision matrix satisfies m_ij > 0, m_ji = 1/m_ij and m_ii = 1.
TABLE 1 Relationship of attributes to parameters
Judgment (importance of i relative to j) | m_ij
---|---
i and j equally important | 1
i slightly more important than j | 3
i clearly more important than j | 5
i strongly more important than j | 7
i extremely more important than j | 9
The values 2, 4, 6 and 8, which are not shown in Table 1, represent intermediate values between adjacent judgments. Since the service types are divided into 4 classes when defining the reward value, and the three attributes throughput, delay and signal strength are considered, the decision matrix is defined as a 3×3 matrix, i.e. M_i ∈ R^(3×3), where i = 1, 2, 3, 4 denotes the four service types of class 1, class 2, class 3 and class 4; decision matrices are then established for the four services respectively, according to the QoS parameter requirements of the different services.
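As a hedged illustration of how the per-service weights could be computed from such a 3×3 decision matrix, the sketch below uses NumPy to take the eigenvector of the largest eigenvalue and normalize it, as described further below; the matrix entries are assumed example values, not values from the patent.

```python
import numpy as np

# Example 3x3 decision matrix for one service type, ordered
# (throughput, delay, signal strength). The entries are illustrative only
# and follow the reciprocal rules m_ij > 0, m_ji = 1/m_ij, m_ii = 1.
M = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 3.0],
    [1/5, 1/3, 1.0],
])

# Eigenvector of the largest eigenvalue, normalized so its components sum to 1;
# these components become the weights of the corresponding QoS parameters.
eigvals, eigvecs = np.linalg.eig(M)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = principal / principal.sum()
print(weights)   # weights omega_1, omega_2, omega_3 for this service type
```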
According to RFC 2474, the current standard for classifying network service types, the attribute values within a service class are determined through DSCP (Differentiated Services Code Point). DSCP uses the 6 used bits and 2 unused bits of the type-of-service (TOS) byte in each packet's IP header to determine the IP priority from the code value. The IP priority field can be used for traffic classification: the larger the value, the higher the priority. The values range from 0 to 63, so 64 levels can be matched, and the levels are grouped into classes by magnitude; the relationship between service attributes and parameters can therefore be determined from the DSCP field carried in the IP packet.
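A minimal sketch of the DSCP handling described above, assuming the DS/TOS byte of the IP header is already available; the even split of the 64 code points into the four service classes is an assumption made here for illustration, not a rule stated in the patent.

```python
def dscp_from_tos(tos_byte: int) -> int:
    """DSCP occupies the upper 6 bits of the DS/TOS byte, giving values 0..63."""
    return (tos_byte >> 2) & 0x3F

def service_class(dscp: int) -> int:
    """Assumed mapping: the 64 priority levels are split evenly into the
    four service types 1..4, with a larger DSCP meaning a higher priority."""
    return dscp // 16 + 1

tos = 0xB8                                   # example DS byte (DSCP 46)
print(service_class(dscp_from_tos(tos)))     # -> 3
```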
For these four types of service, i takes the values 1, 2, 3 and 4 in turn. The eigenvector corresponding to the maximum eigenvalue of each decision matrix is normalized; each value in the normalized eigenvector is the weight of the corresponding network QoS parameter. In the four cases above, the network parameter requirements of the different service types differ, and these differences later influence the division of the reward value weights. Considering the entire network as a whole, the final goal is to optimize the overall network quality through the protocols selected by the nodes, with the reward value being a function strongly correlated with the network.
V_t = {v_1, v_2, ..., v_n}   (5)
where V_t represents the state information of the network at time t and is a subset of the network state space Φ; therefore, for a particular service B, the reward function R over the network state V_t is expressed as follows and is solved in the next step:
R = f_B(V_t)   (6)
Node access affects the variation of the network parameters; after an action is performed, the network state must be measured and the corresponding reward fed back. When the executed action increases the network throughput, decreases the delay and strengthens the signal, the action is effective; conversely, when the executed action decreases the network throughput, increases the delay and weakens the signal, the action is ineffective. The reward calculation therefore takes into account the average throughput α_avg, the average delay β_avg and the signal strength γ.
The second step: on the basis of the first step, the data are normalized, the node service type is determined and the reward function is determined;
the units and magnitudes of different network parameters usually differ greatly, so normalization is needed: all values are linearly transformed and mapped into [0, 1].
Min-max normalization is used to eliminate the effect of the differing units of the data:
x' = (x - x_min) / (x_max - x_min)   (7)
Normalizing with this equation yields the normalized network average throughput f_t(α)_avg, average delay f_t(β)_avg and signal strength f_t(γ).
Combining the above yields the reward function:
R = ω_1 f_t(α)_avg + ω_2 f_t(β)_avg + ω_3 f_t(γ)   (8)
where ω_1, ω_2 and ω_3 are the weights of the network average throughput, delay and signal strength, taken from the eigenvector obtained by normalizing the decision matrix.
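A small sketch of equations (7) and (8) under stated assumptions: the min-max bounds are assumed to be tracked per metric (the bounds dictionary and example numbers below are illustrative, not from the patent), and the same normalization is applied to all three metrics exactly as written above.

```python
def min_max(x: float, x_min: float, x_max: float) -> float:
    """Equation (7): map x linearly into [0, 1]; guard a degenerate range."""
    if x_max == x_min:
        return 0.0
    return (x - x_min) / (x_max - x_min)

def reward(alpha_avg, beta_avg, gamma_sig, bounds, w1, w2, w3):
    """Equation (8): R = w1*f(alpha_avg) + w2*f(beta_avg) + w3*f(gamma).
    `bounds` holds an assumed (min, max) pair for each metric."""
    f_alpha = min_max(alpha_avg, *bounds["throughput"])
    f_beta  = min_max(beta_avg,  *bounds["delay"])
    f_gamma = min_max(gamma_sig, *bounds["signal"])
    return w1 * f_alpha + w2 * f_beta + w3 * f_gamma

bounds = {"throughput": (0, 100), "delay": (1, 200), "signal": (-90, -30)}
print(reward(40.0, 25.0, -60.0, bounds, 0.5, 0.3, 0.2))   # example AHP weights
```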
The third step: on the basis of the second step, the data are input into the DDQN decision network for real-time training, and the execution result is applied so that the network state tends to become stable.
One of the biggest drawbacks of DQN is that, although the argmax() operation lets the Q value approach the target quickly, it is likely to cause overestimation, i.e. a large deviation in the resulting algorithm model. To solve this problem, the error can be reduced by separating the selection of the target action from the calculation of its target Q value. The network information here is in a discrete state, and DDQN handles discrete-state data well.
As shown in Fig. 2, DQN is implemented with two neural networks, Q-MainNet and Q-target. DDQN likewise uses two networks, but calculates the target Q value in a different manner.
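The difference between the two target computations can be shown compactly. The PyTorch-style sketch below is an illustration under assumed stand-in networks (simple linear layers named q_main and q_target) and random data; it is not the patent's implementation.

```python
import torch
import torch.nn as nn

q_main   = nn.Linear(4, 3)    # stand-in for Q-MainNet
q_target = nn.Linear(4, 3)    # stand-in for Q-target
s_next   = torch.randn(8, 4)  # batch of next states
r, gamma = torch.randn(8), 0.9

# DQN target: the max is both selected and evaluated by the target network,
# which tends to overestimate.
dqn_target = r + gamma * q_target(s_next).max(dim=1).values

# DDQN target: Q-MainNet selects the action, Q-target evaluates it.
a_star = q_main(s_next).argmax(dim=1, keepdim=True)
ddqn_target = r + gamma * q_target(s_next).gather(1, a_star).squeeze(1)
```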
First, the state S and action space A are initialized, the Q matrix is initialized to the zero matrix, and the Q-MainNet and Q-target networks are initialized with a random parameter θ, where θ is the network parameter: the Q-MainNet θ is set randomly at initialization and the Q-target θ⁻ = 0. Let t denote the current time step. The agent module reads the current network state information S_t and inputs it into the Q-MainNet network, which outputs the Q values of the different actions in state S_t. Following an ε-greedy strategy, the Q-MainNet network either randomly selects an action a_t ∈ A with probability ε, or selects the action with the largest Q value with probability 1 - ε. The terminal executes the corresponding action in the heterogeneous wireless network; through network data acquisition and data processing the result is converted into the format required by the algorithm and handed to the control layer for processing. This yields the throughput α, delay β and signal strength γ, which are then normalized separately. According to the service type, f_t(α)_avg, f_t(β)_avg and f_t(γ) are obtained through the analytic hierarchy process and combined by weighted summation into the reward value R. Using the obtained system state and reward value, Q-MainNet computes the target value by equation (9):
TargetQ = R_{t+1} + γ · Q(S_{t+1}, argmax_a Q(S_{t+1}, a; θ); θ⁻)   (9)
where R_{t+1} is the reward calculated in state S_{t+1} and γ is the attenuation (discount) coefficient: the return of the agent in the current state is in fact the sum of all possible future rewards discounted back to the present moment. After the action is completed, the system enters the next state S_{t+1}.
The Q-MainNet network stores the memory tuple (s_t, a_t, r_t, s_{t+1}), i.e. the current state s_t, the action a_t, the current reward value r_t and the network state at time t+1, in the experience pool. At each step the Q-target network samples randomly from this pool, and together with the output of the Q-MainNet network the loss of the two Q networks with respect to the parameter θ is calculated, i.e. (TargetQ - Q(S_{t+1}, a; θ_t))², on which a gradient descent algorithm is performed. Every G steps, the parameters of the Q-MainNet network are copied to the Q-target network, and training continues in this loop.
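Putting the third step together, the following condensed training-loop sketch is given under stated assumptions: the environment interaction is replaced by a stub (step_env), and the network sizes, hyperparameters (γ, ε, batch size, copy interval G) and the use of PyTorch with Adam are illustrative choices, not values or tools specified by the patent.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 3                            # (T, D, P, W) and number of protocols
GAMMA, EPSILON, BATCH, COPY_EVERY = 0.9, 0.1, 32, 50   # illustrative hyperparameters

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

q_main, q_target = make_net(), make_net()
q_target.load_state_dict(q_main.state_dict())          # initialise Q-target from Q-MainNet
optimizer = torch.optim.Adam(q_main.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                          # experience pool

def step_env(state, action):
    """Stub environment: in the real system this would apply the protocol switch,
    measure throughput/delay/signal and return the weighted reward R of eq. (8)."""
    next_state = torch.rand(STATE_DIM)
    return next_state, float(torch.rand(()))

state = torch.rand(STATE_DIM)
for t in range(1, 1001):
    # epsilon-greedy action selection with Q-MainNet
    if random.random() < EPSILON:
        action = random.randrange(N_ACTIONS)
    else:
        with torch.no_grad():
            action = int(q_main(state).argmax())
    next_state, r_t = step_env(state, action)
    replay.append((state, action, r_t, next_state))    # memory tuple (s, a, r, s')
    state = next_state

    if len(replay) >= BATCH:
        batch = random.sample(list(replay), BATCH)
        s  = torch.stack([b[0] for b in batch])
        a  = torch.tensor([b[1] for b in batch]).unsqueeze(1)
        r  = torch.tensor([b[2] for b in batch])
        s2 = torch.stack([b[3] for b in batch])

        # DDQN target: action chosen by Q-MainNet, value taken from Q-target
        with torch.no_grad():
            a_star = q_main(s2).argmax(dim=1, keepdim=True)
            target = r + GAMMA * q_target(s2).gather(1, a_star).squeeze(1)
        q_sa = q_main(s).gather(1, a).squeeze(1)
        loss = nn.functional.mse_loss(q_sa, target)    # squared TD error

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if t % COPY_EVERY == 0:                            # periodic parameter copy (every G steps)
        q_target.load_state_dict(q_main.state_dict())
```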
Claims (1)
1. A method of wireless network self-selection protocol based on DDQN, comprising the steps of:
the first step: after the environment agent module acquires, in real time, the continuously changing network environment quality parameters and the service types of the nodes, the state, action and reward value are determined;
state space definition: the state space S of a terminal at time t is defined such that s_mn ∈ S represents the state in which terminal m has accessed the nth network and exchanges information in that network; the state space is:
S = {s_m1, s_m2, ..., s_mn}   (1)
state definition: using the average throughput T, the delay D, the signal strength P, the node distance W to describe the network state, the network quality Φ is expressed as:
Φ = T × D × P × W   (2)
defining the action space: an action space must be set for the agent to select from; it is defined as:
A = {a_1, a_2, ..., a_n}   (3)
where a_n indicates that a given node uses the nth network protocol;
the access service network parameters consist of QoS parameters, a judgment matrix M is established for the network QoS, and the parameter weights are solved:
the decision matrix is shown above, where each element m_ij represents the relative importance of one QoS parameter over another, specifically defined as follows, and the decision matrix should satisfy m_ij > 0 and m_ji = 1/m_ij;
when i and j are equally important, m_ij is 1;
when, comparing i and j, i is slightly more important, m_ij is 3;
when, comparing i and j, i is clearly more important, m_ij is 5;
when, comparing i and j, i is strongly more important, m_ij is 7;
when, comparing i and j, i is extremely more important, m_ij is 9;
The values 2, 4, 6 and 8, which do not appear above, represent intermediate values between adjacent judgments. Since the service types are divided into 4 classes when defining the reward value, and the three attributes throughput, delay and signal strength are considered, the decision matrix is defined as a 3×3 matrix, i.e. M_b ∈ R^(3×3), where b = 1, 2, 3, 4 denotes the four service types of class 1, class 2, class 3 and class 4; 4 decision matrices are then established for the four services respectively, according to the QoS parameter requirements of the different services;
according to RFC 2474, the current standard for classifying network service types, the attribute values within a service class are determined through DSCP. DSCP uses the 6 used bits and 2 unused bits of the type-of-service (TOS) byte in each packet's IP header to determine the IP priority from the code value. The IP priority field can be used for traffic classification: the larger the value, the higher the priority. The values range from 0 to 63, so 64 levels can be matched, and the levels are grouped into classes by magnitude; the relationship between service attributes and parameters can therefore be determined from the DSCP field carried in the IP packet;
for the four types of service, b takes the values 1, 2, 3 and 4 in turn. The eigenvector corresponding to the maximum eigenvalue of each decision matrix is normalized; each value in the normalized eigenvector is the weight of the corresponding network QoS parameter. In the four cases above, the network parameter requirements of the different service types differ, and these differences later influence the division of the reward value weights. Considering the entire network as a whole, the final goal is to optimize the overall network quality through the protocols selected by the nodes, with the reward value being a function strongly correlated with the network;
V_t = {v_1, v_2, ..., v_n}   (5)
where V_t represents the state information of the network at time t and is a subset of the network state space Φ; therefore, for a particular service B, the reward function R over the network state V_t is expressed as follows and is solved in the next step:
R = f_B(V_t)   (6)
node access influences the variation of the network parameters; after an action is executed, the network state must be measured and the corresponding reward fed back. When the executed action increases the network throughput, decreases the delay and strengthens the signal, the action is effective; conversely, when the executed action decreases the network throughput, increases the delay and weakens the signal, the action is ineffective. The reward calculation therefore takes into account the average throughput α_avg, the average delay β_avg and the signal strength γ;
the second step: on the basis of the first step, the data are normalized, the node service type is determined and the reward function is determined;
min-max normalization is used to eliminate the effect of the differing units of the data:
x' = (x - x_min) / (x_max - x_min)   (7)
where x' is the normalized value of x after conversion, and α, β, γ are substituted in turn;
normalizing with this equation yields the normalized network average throughput f_t(α)_avg, average delay f_t(β)_avg and signal strength f_t(γ) at time t;
Combining the above yields the reward function:
R = ω_1 f_t(α)_avg + ω_2 f_t(β)_avg + ω_3 f_t(γ)   (8)
where ω_1, ω_2 and ω_3 are the weights of the network average throughput, delay and signal strength, taken from the eigenvector obtained by normalizing the decision matrix;
the third step: on the basis of the second step, the data are input into the DDQN decision network for real-time training, and the execution result is applied so that the network state tends to become stable;
first, the state space S and action space A are initialized, the Q matrix is initialized to the zero matrix, and the Q-MainNet and Q-target networks are initialized with a random parameter θ, where θ is the network parameter: the Q-MainNet θ is set randomly at initialization and the Q-target θ⁻ = 0. Let t denote the current time state. The agent module reads the current network state information S_t and inputs it into the Q-MainNet network, which outputs the Q values of the different actions in state S_t. Following an ε-greedy strategy, the Q-MainNet network either randomly selects an action a_t ∈ A with probability ε, or selects the action with the largest Q value with probability 1 - ε. The terminal executes the corresponding action in the heterogeneous wireless network; through network data acquisition and data processing the result is converted into the format required by the algorithm and handed to the control layer for processing, which yields the throughput α, delay β and signal strength γ. These are then normalized separately; according to the service type, f_t(α)_avg, f_t(β)_avg and f_t(γ) are obtained through the analytic hierarchy process and combined by weighted summation into the reward value R. Using the obtained system state and reward value, Q-MainNet performs the target value calculation by equation (9),
where R_{t+1} is the reward calculated in state S_{t+1} and γ is the attenuation (discount) coefficient: the return of the agent in the current state is in fact the sum of all possible future rewards discounted back to the present moment; after the action is completed, the system enters the next state S_{t+1};
The Q-MainNet network stores the memory tuple (s_t, a_t, r_t, s_{t+1}), i.e. the current state s_t, the action a_t, the current reward value r_t and the network state at time t+1, in the experience pool. At each step the Q-target network samples randomly from this pool, and together with the output of the Q-MainNet network the loss of the two Q networks with respect to the parameter θ is calculated, i.e. (TargetQ - Q(S_{t+1}, a, θ_t))², on which a gradient descent algorithm is performed. After each iteration, the parameters of the Q-MainNet network are copied to the Q-target network, and training continues in this loop.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110249773.3A CN113055229B (en) | 2021-03-05 | 2021-03-05 | Wireless network self-selection protocol method based on DDQN |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110249773.3A CN113055229B (en) | 2021-03-05 | 2021-03-05 | Wireless network self-selection protocol method based on DDQN |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113055229A CN113055229A (en) | 2021-06-29 |
CN113055229B true CN113055229B (en) | 2023-10-27 |
Family
ID=76510598
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110249773.3A Active CN113055229B (en) | 2021-03-05 | 2021-03-05 | Wireless network self-selection protocol method based on DDQN |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113055229B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118368259B (en) * | 2024-06-18 | 2024-08-30 | 井芯微电子技术(天津)有限公司 | Network resource allocation method, device, electronic equipment and storage medium |
CN118397519B (en) * | 2024-06-27 | 2024-08-23 | 湖南协成电子技术有限公司 | Campus student safety monitoring system and method based on artificial intelligence |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103327556A (en) * | 2013-07-04 | 2013-09-25 | 中国人民解放军理工大学通信工程学院 | Dynamic network selection method for optimizing quality of experience (QoE) of user in heterogeneous wireless network |
CN105208624A (en) * | 2015-08-27 | 2015-12-30 | 重庆邮电大学 | Service-based multi-access network selection system and method in heterogeneous wireless network |
CN107889195A (en) * | 2017-11-16 | 2018-04-06 | 电子科技大学 | A kind of self study heterogeneous wireless network access selection method of differentiated service |
CN110809306A (en) * | 2019-11-04 | 2020-02-18 | 电子科技大学 | Terminal access selection method based on deep reinforcement learning |
WO2021013368A1 (en) * | 2019-07-25 | 2021-01-28 | Telefonaktiebolaget Lm Ericsson (Publ) | Machine learning based adaption of qoe control policy |
-
2021
- 2021-03-05 CN CN202110249773.3A patent/CN113055229B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103327556A (en) * | 2013-07-04 | 2013-09-25 | 中国人民解放军理工大学通信工程学院 | Dynamic network selection method for optimizing quality of experience (QoE) of user in heterogeneous wireless network |
CN105208624A (en) * | 2015-08-27 | 2015-12-30 | 重庆邮电大学 | Service-based multi-access network selection system and method in heterogeneous wireless network |
CN107889195A (en) * | 2017-11-16 | 2018-04-06 | 电子科技大学 | A kind of self study heterogeneous wireless network access selection method of differentiated service |
WO2021013368A1 (en) * | 2019-07-25 | 2021-01-28 | Telefonaktiebolaget Lm Ericsson (Publ) | Machine learning based adaption of qoe control policy |
CN110809306A (en) * | 2019-11-04 | 2020-02-18 | 电子科技大学 | Terminal access selection method based on deep reinforcement learning |
Non-Patent Citations (2)
Title |
---|
A new network access selection algorithm oriented to users' multi-service QoS requirements; Zhang Yuanyuan et al.; Computer Science; 2015-03-31; Vol. 42, No. 3; full text *
Access network selection algorithm based on Markov model; Ma Li et al.; Computer Engineering; 2019-05-31; Vol. 45, No. 5; full text *
Also Published As
Publication number | Publication date |
---|---|
CN113055229A (en) | 2021-06-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109947545B (en) | Task unloading and migration decision method based on user mobility | |
CN108770029B (en) | Wireless sensor network clustering routing protocol method based on clustering and fuzzy system | |
CN111629380B (en) | Dynamic resource allocation method for high concurrency multi-service industrial 5G network | |
CN113055229B (en) | Wireless network self-selection protocol method based on DDQN | |
CN111510879B (en) | Heterogeneous Internet of vehicles network selection method and system based on multi-constraint utility function | |
WO2019184836A1 (en) | Data analysis device, and multi-model co-decision system and method | |
CN114142907B (en) | Channel screening optimization method and system for communication terminal equipment | |
CN107708197B (en) | high-energy-efficiency heterogeneous network user access and power control method | |
CN113596785B (en) | D2D-NOMA communication system resource allocation method based on deep Q network | |
Sekaran et al. | 5G integrated spectrum selection and spectrum access using AI-based frame work for IoT based sensor networks | |
CN110233755B (en) | Computing resource and frequency spectrum resource allocation method for fog computing in Internet of things | |
CN110519849B (en) | Communication and computing resource joint allocation method for mobile edge computing | |
CN113038612B (en) | Cognitive radio power control method based on deep learning | |
CN116916386A (en) | Large model auxiliary edge task unloading method considering user competition and load | |
CN113473580A (en) | Deep learning-based user association joint power distribution strategy in heterogeneous network | |
CN113676357B (en) | Decision method for edge data processing in power internet of things and application thereof | |
Wu et al. | Link congestion prediction using machine learning for software-defined-network data plane | |
Kaur et al. | Intelligent spectrum management based on reinforcement learning schemes in cooperative cognitive radio networks | |
CN110139282A (en) | A kind of energy acquisition D2D communication resource allocation method neural network based | |
CN113590211A (en) | Calculation unloading method based on PSO-DE algorithm | |
CN115811788B (en) | D2D network distributed resource allocation method combining deep reinforcement learning and unsupervised learning | |
WO2013102294A1 (en) | Method of distributed cooperative spectrum sensing based on unsupervised clustering in cognitive self-organizing network | |
CN108848519B (en) | Heterogeneous network user access method based on cross entropy learning | |
Huang et al. | A Hierarchical Deep Learning Approach for Optimizing CCA Threshold and Transmit Power in Wi-Fi Networks | |
CN114615705B (en) | Single-user resource allocation strategy method based on 5G network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |