CN112367683A - Network selection method based on improved deep Q learning - Google Patents

Network selection method based on improved deep Q learning Download PDF

Info

Publication number
CN112367683A
Authority
CN
China
Prior art keywords
network
learning
training
deep
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011286673.XA
Other languages
Chinese (zh)
Other versions
CN112367683B (en
Inventor
马彬
陈海波
张超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202011286673.XA priority Critical patent/CN112367683B/en
Publication of CN112367683A publication Critical patent/CN112367683A/en
Application granted granted Critical
Publication of CN112367683B publication Critical patent/CN112367683B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W36/00Hand-off or reselection arrangements
    • H04W36/14Reselecting a network or an air interface
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/02Traffic management, e.g. flow control or congestion control
    • H04W28/06Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention claims a network selection method based on improved deep Q learning. In an ultra-dense heterogeneous wireless network with a dormancy mechanism, a network selection algorithm based on improved deep Q learning is proposed to address the degradation of handover performance caused by the increased dynamics of the network. Firstly, a deep Q learning network selection model is constructed according to a dynamic analysis of the network; secondly, the training samples and weights of the offline training module in the deep Q learning network selection model are transferred to the online decision module through transfer learning; finally, the transferred training samples and weights are used to accelerate the training of the neural network and obtain the optimal network selection strategy. Experimental results show that the proposed method significantly alleviates the handover performance degradation of the highly dynamic network caused by the dormancy mechanism, while reducing the time complexity of the traditional deep Q learning algorithm in the online network selection process.

Description

Network selection method based on improved deep Q learning
Technical Field
The invention relates to network selection in ultra-dense heterogeneous wireless networks and belongs to the field of mobile communication, and in particular to a method for network selection using an improved deep Q learning algorithm.
Background
With the development of wireless mobile communication, ultra-dense heterogeneous wireless networks formed by multiple access technologies, such as 5G heterogeneous cellular networks and wireless local area networks, can provide a terminal with multiple access modes and support its seamless movement. Ultra-dense networking, however, brings higher energy consumption; introducing a dormancy mechanism can reduce the energy consumption to a certain extent, but it also further increases the dynamics of the network, which degrades the service quality of the terminal and the throughput performance of the network. How to guarantee the throughput obtained by the terminal in a highly dynamic ultra-dense heterogeneous wireless network and improve the overall handover performance of the network system has therefore become an important subject of current research. In network selection, because artificial intelligence algorithms have strong learning ability and can adapt to the environment, many researchers have in recent years applied them to network selection methods.
The document [Bin MA, Shanru LI, Xiaonzhong XIE. An Adaptive Vertical Handover Network in Heterogeneous Networks [J]. Journal of Electronics and Information Technology, 2019, 41(5):1210-1216] trains classification parameters for different service types based on neural networks and thereby performs network selection. The document [MA B, ZHANG W J, and XIE X Z. Individualization Service Oriented Fuzzy Vertical Handover Algorithm [J]. Journal of Electronics & Information Technology, 2017, 39(6):1284-1290] adopts a fuzzy logic algorithm, designs different membership functions according to the requirements of the terminal application on QoS parameters, and then selects the network reasonably according to the current service type of the terminal. The algorithm is efficient and can select the network quickly, but a corresponding fuzzy inference rule base needs to be established in advance, and when the number of input parameters increases, the size of the fuzzy inference rule base grows rapidly, so the inference time complexity becomes excessive. A fuzzy neural network algorithm is proposed in the document [Nurjahan, Rahman S, Sharma T, et al. PSO-NF based vertical handoff determination for ubiquitous heterogeneous wireless network (UHWN) [C]//2016 International Workshop on Computational Intelligence (IWCI). IEEE, 2016]: the output value of the fuzzy logic is obtained through neural network training, and a network is selected for access according to this output value. The algorithm combines the accuracy of the fuzzy logic algorithm with the adaptive capability of the neural network algorithm, thereby improving robustness. A network selection scheme based on Quality of Experience (QoE) awareness is provided in the document [Jianmei C, Yao W, Yufeng L, et al. QoE-aware Vertical Handoff Scheme over Heterogeneous Access Networks [J]. IEEE Access, 2018:1-1]: QoS network parameters are mapped into QoE parameters, a return function is then constructed from the QoE parameters, and finally a Q learning algorithm is adopted for network selection. The algorithm can reinforce the existing benefits through continuous learning, so that a high-benefit network is selected; however, if the network environment is too complex, the learning effect of the network control module declines and the optimal network cannot be selected. In addition, the above methods all address network selection in conventional heterogeneous wireless networks and do not consider such a highly dynamic network environment; as a result, after the terminal is handed over to the target network by an existing network selection algorithm, the throughput it obtains may drop sharply because the target network suddenly goes dormant, a continuous and stable throughput cannot be provided for the terminal, and the handover performance of the system is seriously degraded. Therefore, network selection methods designed for the traditional heterogeneous wireless network environment cannot effectively guarantee the service quality of the terminal after it accesses the network.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art. A network selection method based on improved deep Q learning is provided. The technical scheme of the invention is as follows:
a method for network selection based on improved deep Q learning, comprising the steps of:
101. initializing a deep Q learning network selection model by periodically sampling the values of the ultra-dense heterogeneous wireless network parameters, wherein the network parameter values comprise the sampled received signal strength, throughput and dormancy probability, and constructing the action space, state space and return function of deep Q learning from the network parameter values; the deep Q learning network selection model is composed of an offline training module and an online decision module, the offline training module is used to generate the training samples and weights of the neural network, the online decision module is used to obtain the optimal network selection strategy, and both modules are constructed with a deep Q network;
102. according to the deep Q learning network selection model obtained in step 101, the offline training module and the online decision module interact cooperatively through transfer learning; the neural network training process of the online decision module is accelerated according to the transfer learning algorithm, the training samples of the offline training module are migrated to the online decision module, and the training errors generated by the two modules after migration are corrected through the migrated training samples and weights of the offline training module until the errors approach 0, at which point the whole transfer learning process ends; the optimal strategy is then obtained through the deep Q learning network selection model, thereby completing network selection.
Further, the step 101 of initializing a deep Q learning network selection model and constructing the action space, state space and return function of deep Q learning from the network parameter values specifically includes the steps:
401. The candidate networks that a terminal can access in the ultra-dense heterogeneous wireless network environment, namely the base stations and access points, are represented by the set N = {n_1, n_2, ..., n_i}, where n_i denotes the i-th candidate network; the action of the terminal accessing candidate network n_i at time t is denoted by a_t(n_i), and the action space can then be defined as A_t = {a_t | a_t ∈ {a_t(n_1), a_t(n_2), ..., a_t(n_i)}};
The state space is defined as S_t = (rss_t, c_t, p_t), where rss_t denotes the set of received signal strengths of the candidate networks at time t, c_t denotes the set of throughputs of the candidate networks at time t, and p_t denotes the set of sleep probabilities of the candidate networks at time t;
to maximize the throughput obtained by the terminal, the reward function is defined by considering the throughput and the sleep probability of the network as:
R_t = C_t(n_i) · (1 − P_t(n_i))    (1)
where C_t(n_i) denotes the throughput obtained when the terminal accesses candidate network n_i at time t, and P_t(n_i) denotes the sleep probability of candidate network n_i at time t;
402. The Q function represents the expected cumulative reward obtained by performing action a in state S and then following the subsequent actions, and is defined as:
Q(S, a) = E[ Σ_{t=0}^{+∞} γ^t · R_t | S_0 = S, a_0 = a ]    (2)
where t denotes the time step during operation, γ^t ∈ [0, 1] is the discount factor used to adjust the importance attached to future returns: a value of 0 means that only the short-term return is considered, otherwise the long-term return matters more, and γ^t gradually decreases as t increases; E(·) is the expectation function;
the deep Q learning algorithm utilizes a neural network to construct Q (S, a; theta), wherein theta is a weight value, so that Q (S, a; theta) is approximately equal to max (Q (S, a)) to carry out approximate solution, meanwhile, a target Q value of a target network is utilized to prevent an estimated Q value generated by an estimation network from being out of control, and errors between the two are adjusted through a loss function to relieve the problem of iteration instability in the training process.
Further, the generation and migration of the training samples and weights in step 102 include the following steps:
the training sample of the neural network is composed of the current state, the action, the return value and the future state at different time in the historical information database, namely (S)t,at,Rt,St+1) And in the deep Q network, in order to train the neural network, an experience playback pool is set for storing training samples at multiple moments, the correlation degree between the training samples is reduced by randomly extracting partial samples, the training samples of the offline training module are migrated into the online decision module, and the migrated offline training samples and online learning samples are utilized to construct the experience playback pool of the online decision module, which is expressed as:
D_sum = D_on + ξ · D_off    (3)
where D_sum is the total number of samples stored in the experience replay pool, D_on is the total number of online learning samples (with an initial value of 0), D_off is the total number of offline training samples, and ξ ∈ [0, 1] is the sample migration rate, which gradually decreases as the number of training iterations increases;
After the experience replay pool of the online decision module is constructed, the neural-network weight θ_off obtained by offline training is migrated to the online decision module as the initial weight for neural network training, namely θ_on = θ_off;
Further, after the neural-network weight θ_off obtained by offline training is migrated to the online decision module, the neural network starts iterative training; in the process in which the offline training module and the online decision module cooperate through transfer learning, the training error generated between them is defined as the strategy loss, a strategy simulation mechanism is adopted, and the estimated Q value Q_off(S_t, a_t; θ_off) of the offline training module is used to convert the estimation network of the offline training module into an offline strategy network π_off(S_t, a_t; θ_off);
Similarly, the estimated Q value Q_on(S_t, a_t; θ_on) of the online decision module is used to convert the estimation network of the online decision module into an online strategy network π_on(S_t, a_t; θ_on); the strategy loss between the offline training module and the online decision module is measured by cross entropy.
further, the under-line policy network pioff(St,at;θoff) Expressed as:
π_off(S_t, a_t; θ_off) = exp(Q_off(S_t, a_t; θ_off) / T) / Σ_{a∈A_off} exp(Q_off(S_t, a; θ_off) / T)    (4)
where T denotes the temperature parameter of the Boltzmann distribution: the larger its value, the less the selection of action a_t is affected by the Q value, i.e. all actions are selected with nearly the same probability; A_off is the action space of deep Q learning during offline training;
The online strategy network π_on(S_t, a_t; θ_on) is expressed as:
π_on(S_t, a_t; θ_on) = exp(Q_on(S_t, a_t; θ_on) / T) / Σ_{a∈A_t} exp(Q_on(S_t, a; θ_on) / T)    (5)
the strategy loss between the offline training and the online decision module is measured by cross entropy, and then the strategy simulation loss function is expressed as:
L_policy(θ_on) = − Σ_{a_t∈A_t} π_off(S_t, a_t; θ_off) · log π_on(S_t, a_t; θ_on)    (6)
In the presence of the strategy loss, the gradient update of the estimated Q value Q_on(S_t, a_t; θ_on) of the online decision module is expressed as:
θ_on ← θ_on + α · [Q_π(S_t, a_t; θ_on) − Q_on(S_t, a_t; θ_on)] · ∇θ_on Q_on(S_t, a_t; θ_on)    (7)
where α is the learning rate and Q_π(S_t, a_t; θ_on) denotes the unbiased estimate of the estimated Q value under strategy π;
when Q_π(S_t, a_t; θ_on) ≈ Q_on(S_t, a_t; θ_on), i.e. the strategy loss between the offline training module and the online decision module approaches 0, the transfer learning process ends.
Further, while the terminal is moving, a network selection decision moment occurs when it is about to enter or leave the coverage of a base station, at which point the terminal needs to perform network selection; in order to obtain the network selection decision moment the terminal will face, a prediction is made from the received signal strength of the network and the moving speed of the terminal.
Further, the step of predicting from the received signal strength of the network and the moving speed of the terminal specifically includes: assume that the mobility model of the terminal within the coverage of the base station moves from point A to point C, that point B denotes the position of the terminal after it has moved Δl from point A, and that, according to the current motion trend of the terminal, the network selection decision moment t_C is predicted to occur at point C; the relationship between ΔOAM and ΔOBM is then expressed as:
r² − (Δl + l_BM)² = l_OB² − l_BM²    (8)
where r represents the radius of the network coverage, Δl represents the distance the terminal has moved, and l_BM denotes the current distance of the terminal from the midpoint M of the chord AC, so that
l_BM = (r² − l_OB² − Δl²) / (2 · Δl)
By detecting the received-signal-strength value at point B, the distance l_OB from the base station to point B can be obtained, and the average moving speed of the terminal within the coverage of the base station can be expressed as
V = Δl / Δt, where Δt is the time taken by the terminal to move from point A to point B.
The network selection decision moment t_C is then expressed as:
t_C = t_B + (Δl + 2 · l_BM) / V    (9), where t_B is the moment at which the terminal is located at point B
Suppose that at network selection decision moment t, the candidate network with the maximum Q value is n_m; then the optimal network selection action of the terminal at decision moment t is a_t(n_m). By analogy, the set of optimal network selection actions formed by the terminal at the different network selection decision moments is defined as the optimal strategy π*; the optimal strategy π* means that, in the ultra-dense heterogeneous wireless network environment with the dormancy mechanism introduced, the terminal and the candidate networks achieve the best match at each network selection decision moment.
Further, the deep Q network specifically includes:
First, an estimation network is constructed using a fully-connected neural network. The estimation network Q(S, a_i; θ) is defined as follows:
Q(S, a_i; θ) = f_DNN(S, a_i; θ),  a_i ∈ A    (10)
where f_DNN(·) represents the nonlinear mapping function of the fully-connected neural network, θ represents the weight, and Q(S, a_i; θ) represents the Q value of selecting action a_i when the state space S is input, given the weight θ.
In the process of updating the estimation network Q(S, a_i; θ) by gradient descent, in order to prevent the values produced by the estimation network Q(S, a_i; θ) from running out of control, a target network Q̂(S, a_i; θ⁻) is defined to make the training more stable; the structure of the target network Q̂(S, a_i; θ⁻) is kept consistent with that of the estimation network Q(S, a_i; θ), and the weight θ of the estimation network Q(S, a_i; θ) is assigned to the target-network weight θ⁻, thereby updating Q̂(S, a_i; θ⁻). The difference between the two is gradually reduced by setting a loss function in the updating process; before the loss function is constructed, an experience replay pool D needs to be constructed, which is defined as follows:
D = {(S_1, a_1, R_1, S_2), …, (S_i, a_i, R_i, S_{i+1}), …, (S_m, a_m, R_m, S_{m+1})}    (11)
where m is the maximum capacity of the experience replay pool and (S_i, a_i, R_i, S_{i+1}) denotes the data at the i-th time instant.
The loss function L(θ) is defined from the return value R and the experience replay pool D:
L(θ) = E[ (R + γ · max_{a′} Q̂(S′, a′; θ⁻) − Q(S, a; θ))² ]    (12)
where γ is the discount factor for the long-term return and E[·] is the expectation function.
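Purely as an illustration of how the estimation network, target network, experience replay pool and loss function of equations (10)-(12) fit together, the following sketch assumes a PyTorch implementation; the layer sizes, learning rate, discount factor and pool capacity are illustrative assumptions rather than values specified by the invention.

import random
from collections import deque

import torch
import torch.nn as nn

class EstimationNetwork(nn.Module):
    # f_DNN(S, a_i; theta): maps a state vector to one Q value per candidate network.
    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions))

    def forward(self, state):
        return self.net(state)          # Q(S, a_i; theta) for every a_i in A

state_dim, num_actions = 9, 3           # e.g. (rss, c, p) of three candidate networks
q_net = EstimationNetwork(state_dim, num_actions)        # estimation network, weight theta
target_net = EstimationNetwork(state_dim, num_actions)   # target network, weight theta^-
target_net.load_state_dict(q_net.state_dict())           # keep the two networks consistent

replay_pool = deque(maxlen=10000)       # experience replay pool D of equation (11)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.9                             # discount factor

def train_step(batch_size=32):
    # One gradient step on the loss L(theta) of equation (12).
    if len(replay_pool) < batch_size:
        return
    batch = random.sample(list(replay_pool), batch_size)  # random draws lower sample correlation
    S, a, R, S_next = zip(*batch)
    S, S_next = torch.stack(S), torch.stack(S_next)
    a = torch.tensor(a)
    R = torch.tensor(R, dtype=torch.float32)
    q_sa = q_net(S).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(S, a; theta)
    with torch.no_grad():                                 # target Q value is not back-propagated
        target = R + gamma * target_net(S_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)           # L(theta), equation (12)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    # Periodically resync the target network: target_net.load_state_dict(q_net.state_dict())

Transitions would be appended as replay_pool.append((S_t, a_t, R_t, S_next)), with the states stored as float tensors.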
The invention has the following advantages and beneficial effects:
1. the method comprises the steps of carrying out dynamic analysis on a super-dense heterogeneous wireless network environment formed by heterogeneous wireless local area networks and super-dense cellular networks introducing dormancy mechanisms, initializing a deep Q learning network selection model according to the step 101, and obtaining a return function considering network dormancy conditions according to the step 401, so that the possibility of selecting a high-dynamic network by a terminal is greatly reduced, and the problem of reduced system switching performance is effectively solved.
2. The deep Q learning algorithm is improved by adopting transfer learning, a network selection algorithm based on improved deep Q learning is provided, and the training process of the neural network in the online decision module is accelerated by transferring the training samples and the weights in the step 102, so that the time complexity of the traditional deep Q learning algorithm in the online network selection process is reduced.
Drawings
FIG. 1 is a diagram of a simulation scenario for a very dense heterogeneous wireless network according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart of an improved deep Q learning method;
FIG. 3 is a comparison of time complexity for different methods;
FIG. 4 is a comparison of throughput for different methods;
fig. 5 is a comparison of access blocking rates for different methods;
fig. 6 is a comparison of packet loss ratios of different methods;
FIG. 7 is a comparison of call drop rates for different methods;
fig. 8 is a ping-pong effect comparison of different methods.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
the method comprehensively considers the situation that the network dynamics is enhanced and the time-varying property of a network topological structure is improved in the ultra-dense heterogeneous wireless network with the sleep mechanism, can remarkably improve the problem of the reduction of the switching performance of the high-dynamics network caused by the sleep mechanism, and simultaneously reduces the time complexity of the traditional deep Q learning algorithm in the online network selection process.
The network selection method provided by the invention comprises the following steps:
step one, initializing a deep Q learning network selection model by periodically sampling values of network parameters, and setting a set N ═ N for candidate networks (base stations and access points) which can be accessed by a terminal in a super-dense heterogeneous wireless network environment1,n2,...,niRepresents; wherein n isiIndicating the ith candidate network, the terminal accesses the candidate network n at the time tiIs denoted by at(ni) Then, the motion space of the present invention can be defined as At={at,at∈{at(n1),at(n2),...,at(ni)}}。
The present invention defines the state space as S_t = (rss_t, c_t, p_t), where rss_t denotes the set of received signal strengths of the candidate networks at time t, c_t denotes the set of throughputs of the candidate networks at time t, and p_t denotes the set of sleep probabilities of the candidate networks at time t.
In order to maximize the throughput obtained by the terminal, the invention defines the return function as follows by considering the throughput and the dormancy probability of the network:
R_t = C_t(n_i) · (1 − P_t(n_i))    (1)
where C_t(n_i) denotes the throughput obtained when the terminal accesses candidate network n_i at time t, and P_t(n_i) denotes the sleep probability of candidate network n_i at time t.
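As a purely illustrative sketch of how the state space S_t = (rss_t, c_t, p_t), the action space A_t and the return of equation (1) can be assembled from the sampled parameters, the following Python fragment uses invented sample values and assumes the reward form C_t(n_i) · (1 − P_t(n_i)) reconstructed above.

candidate_networks = ["n1", "n2", "n3"]        # e.g. 5G macro cell, 5G micro cell, WLAN

rss_t = [-75.0, -62.0, -55.0]    # received signal strength of each candidate network (dBm)
c_t   = [40.0, 25.0, 60.0]       # throughput of each candidate network (Mbps)
p_t   = [0.05, 0.30, 0.60]       # sleep probability of each candidate network

state_t = rss_t + c_t + p_t                          # S_t = (rss_t, c_t, p_t) as one flat vector
action_space = list(range(len(candidate_networks)))  # a_t(n_i): access candidate network n_i

def reward(i):
    # R_t for accessing network n_i, assuming R_t = C_t(n_i) * (1 - P_t(n_i)).
    return c_t[i] * (1.0 - p_t[i])

best = max(action_space, key=reward)
print(candidate_networks[best], reward(best))        # network with the highest immediate return

With these invented numbers the network with the highest raw throughput (n3) is not chosen, because its high sleep probability drags its return below that of n1; this is exactly the behaviour the sleep-aware return is meant to produce.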
The Q function represents the expected cumulative reward obtained by performing action a in state S and then following the subsequent actions, and is defined as:
Q(S, a) = E[ Σ_{t=0}^{+∞} γ^t · R_t | S_0 = S, a_0 = a ]    (2)
where t denotes the time step during operation, γ^t ∈ [0, 1] is the discount factor used to adjust the importance attached to future returns: a value of 0 means that only the short-term return is considered, otherwise the long-term return matters more, and γ^t gradually decreases as t increases; E(·) is the expectation function.
After the Q function is iterated for many times, when all Q values do not change greatly any more, the Q function is converged, and the deep Q learning process is finished. However, Q (S, a) can converge to an optimum Q value only when t → + ∞, and thus it is difficult to realize in an actual network selection process. Thus, the deep Q learning algorithm utilizes a neural network to construct Q (S, a; θ), where θ is a weight such that Q (S, a; θ) is approximately equal to max (Q (S, a)) for the approximate solution. Meanwhile, the target Q value of the target network is utilized to prevent the situation that the estimated Q value generated by the estimation network is out of control, and the error between the estimated Q value and the target Q value is adjusted through a loss function, so that the problem of unstable iteration in the training process is solved.
Step two: the deep Q learning network selection model is divided into an offline training module and an online decision module, both of which are constructed with a deep Q network. The neural network training process of the online decision module is accelerated according to the transfer learning algorithm: the training samples and weights of the offline training module are migrated to the online decision module, and the training errors generated between the two modules after migration are corrected through the migrated offline training samples and weights until the errors approach 0, at which point the whole transfer learning process ends. The training samples and weights are generated and migrated as follows:
the training sample of the neural network is composed of the current state, the action, the return value and the future state at different time in the historical information database, namely (S)t,at,Rt,St+1) Wherein t ∈ (0, + ∞). In the deep Q network, in order to train the neural network efficiently, an experience playback pool is set and used for storing training samples at multiple moments, and the correlation degree between the training samples is reduced by randomly extracting part of samples, so that the problem of unstable iteration occurring in the training process is solved. Therefore, the invention migrates the training samples of the offline training module to the online decision module, and constructs the experience playback pool of the online decision module by using the migrated offline training samples and the online learning samples, which is expressed as:
D_sum = D_on + ξ · D_off    (3)
where D_sum is the total number of samples stored in the experience replay pool, D_on is the total number of online learning samples (with an initial value of 0), D_off is the total number of offline training samples, and ξ ∈ [0, 1] is the sample migration rate, which gradually decreases as the number of training iterations increases.
After the experience replay pool of the online decision module is constructed, the neural-network weight θ_off obtained by offline training is migrated to the online decision module as the initial weight for neural network training, namely θ_on = θ_off.
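A minimal sketch of the sample and weight migration of equation (3) and of θ_on = θ_off is given below; it assumes the replay pools are plain lists of (S_t, a_t, R_t, S_{t+1}) tuples and that the weights are stored in a dictionary of arrays, and the decay schedule for the migration rate ξ is an assumed example, since the invention only states that ξ decreases as the iterations increase.

import random

def build_online_pool(offline_pool, online_pool, xi):
    # Online experience replay pool: D_sum = D_on + xi * D_off (equation (3)).
    n_migrated = int(xi * len(offline_pool))          # xi * D_off samples taken from offline training
    migrated = random.sample(offline_pool, n_migrated)
    return list(online_pool) + migrated               # D_on online samples plus the migrated ones

def migrate_weights(theta_off):
    # Initial online weights: theta_on = theta_off (copied so online training does not alter them).
    return {name: value.copy() for name, value in theta_off.items()}

def xi_schedule(iteration, xi0=1.0, decay=0.995):
    # Assumed decay: the migration rate shrinks as the number of training iterations grows.
    return xi0 * (decay ** iteration)

On each online training round the pool would be rebuilt with xi_schedule(iteration), so the share of offline samples fades out as the online module accumulates its own experience.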
At this point the neural network starts iterative training. However, because the training samples and weights differ between the offline training module and the online decision module, the neural network of the online decision module may train poorly after the training samples and weights are migrated, so its convergence rate cannot reach the expected level. It is therefore necessary to reduce the training error between the offline training module and the online decision module during the migration of the samples and weights, so as to guarantee the training effect of the neural network in the online decision module. To solve this problem, the invention makes the offline training module and the online decision module cooperate through transfer learning: the training error generated between them is defined as the strategy loss, and, in order to minimize the strategy loss, a strategy simulation mechanism is adopted in which the estimated Q value Q_off(S_t, a_t; θ_off) of the offline training module is used to convert the estimation network of the offline training module into an offline strategy network π_off(S_t, a_t; θ_off), expressed as:
π_off(S_t, a_t; θ_off) = exp(Q_off(S_t, a_t; θ_off) / T) / Σ_{a∈A_off} exp(Q_off(S_t, a; θ_off) / T)    (4)
where T denotes the temperature parameter of the Boltzmann distribution: the larger its value, the less the selection of action a_t is affected by the Q value, i.e. all actions are selected with nearly the same probability; A_off is the action space of deep Q learning during offline training.
Similarly, the estimated Q value Q_on(S_t, a_t; θ_on) of the online decision module is used to convert the estimation network of the online decision module into an online strategy network π_on(S_t, a_t; θ_on), expressed as:
π_on(S_t, a_t; θ_on) = exp(Q_on(S_t, a_t; θ_on) / T) / Σ_{a∈A_t} exp(Q_on(S_t, a; θ_on) / T)    (5)
the strategy loss between the offline training and the online decision module is measured by cross entropy, and then the strategy simulation loss function is expressed as:
L_policy(θ_on) = − Σ_{a_t∈A_t} π_off(S_t, a_t; θ_off) · log π_on(S_t, a_t; θ_on)    (6)
under the condition that the strategy loss exists, the on-line decision module predicts the Q value Qon(St,at;θon) The gradient update of (a) is expressed as:
θ_on ← θ_on + α · [Q_π(S_t, a_t; θ_on) − Q_on(S_t, a_t; θ_on)] · ∇θ_on Q_on(S_t, a_t; θ_on)    (7)
where α is the learning rate and Q_π(S_t, a_t; θ_on) represents the unbiased estimate of the estimated Q value under strategy π.
When Q_π(S_t, a_t; θ_on) ≈ Q_on(S_t, a_t; θ_on), i.e. the strategy loss between the offline training module and the online decision module approaches 0, the transfer learning process ends.
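The strategy simulation step of equations (4)-(6) can be sketched as follows: the Q values of both modules are turned into Boltzmann (temperature-softmax) strategies and compared by cross entropy; the temperature and the Q values below are illustrative.

import numpy as np

def boltzmann_strategy(q_values, T=1.0):
    # pi(S_t, a_t; theta) = exp(Q/T) / sum_a exp(Q/T) -- equations (4) and (5).
    z = np.exp((q_values - np.max(q_values)) / T)    # subtract the max for numerical stability
    return z / z.sum()

def strategy_loss(q_off, q_on, T=1.0):
    # Cross entropy between the offline and online strategies -- equation (6).
    pi_off = boltzmann_strategy(q_off, T)
    pi_on = boltzmann_strategy(q_on, T)
    return -np.sum(pi_off * np.log(pi_on + 1e-12))

q_off = np.array([2.0, 1.0, 0.5])    # estimated Q values of the offline training module
q_on  = np.array([1.5, 1.2, 0.4])    # estimated Q values of the online decision module
print(strategy_loss(q_off, q_on))    # decreases as the online strategy approaches the offline one

The loss is minimized when the two strategies coincide, which mirrors the stopping condition above: once the online module reproduces the offline strategy, the migration has served its purpose and online learning continues on its own samples.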
Step three: while the terminal is moving, a network selection decision moment occurs when it is about to enter or leave the coverage of a base station, at which point the terminal needs to perform network selection. In order to obtain the network selection decision moment the terminal will face, a prediction is made from the received signal strength of the network and the moving speed of the terminal. Assume that the mobility model of the terminal within the coverage of the base station moves from point A to point C, that point B denotes the position of the terminal after it has moved Δl from point A, and that, according to the current motion trend of the terminal, the network selection decision moment t_C is predicted to occur at point C; the relationship between ΔOAM and ΔOBM is then expressed as:
r² − (Δl + l_BM)² = l_OB² − l_BM²    (8)
where r represents the radius of the network coverage, Δl represents the distance the terminal has moved, and l_BM denotes the current distance of the terminal from the midpoint M of the chord AC, so that
l_BM = (r² − l_OB² − Δl²) / (2 · Δl)
by detecting the received signal strength value of the B point, the distance l from the base station to the B point can be obtainedOBThe average moving speed of the terminal in the coverage area of the base station can be expressed as
V = Δl / Δt, where Δt is the time taken by the terminal to move from point A to point B.
The network selection decision moment t_C is then expressed as:
t_C = t_B + (Δl + 2 · l_BM) / V    (9), where t_B is the moment at which the terminal is located at point B
suppose that at the time t of the network selection decision, the network corresponding to the maximum Q value in the candidate networks is nmIf the terminal selects the network action best at the decision time t as at(nm) By analogy, the optimal network selection action set formed by the terminal at different network selection decision moments is defined as an optimal strategy pi*The strategy shows that in an ultra-dense heterogeneous wireless network environment with a dormancy mechanism, the terminal and the candidate network realize the best matching at different network selection decision moments.
Based on the above analysis, the present invention designs the algorithm flow chart shown in fig. 2.
In order to verify the invention, a simulation experiment is carried out on the MATLAB platform with the following simulation scenario: a network formed by the two access technologies 5G and WLAN is used as the ultra-dense heterogeneous network model, and the simulation scenario is built on the MATLAB platform for simulation analysis. Assume that 2 5G macro base stations, 4 5G micro base stations and 3 WLAN access points are distributed in the scene; the radii of the 5G macro base stations are all 800 m, the radii of the 5G micro base stations are all 300 m, and the radii of the WLANs are all 80 m. The coverage of the 5G and WLAN networks within the simulation scenario is shown in fig. 1.
In the simulation process, the users in the scene are assumed to be randomly distributed in the simulation area, and their direction of motion changes randomly at intervals. In order to further highlight the superiority of the invention, the proposed method is compared with a Q-learning-based method (Q-Learning) from the literature [Jianmei C, Yao W, Yufeng L, et al. QoE-aware Vertical Handoff Scheme over Heterogeneous Access Networks [J]. IEEE Access, 2018:1-1], a deep-Q-network-based method (Deep Q-Network, DQN) from the literature [Deep Learning Based Handoff Management for Dense WLANs: A Deep Learning Approach. IEEE Access, 2019:1-1], and a long short-term memory network based method (Long Short-Term Memory, LSTM) from the literature [A two-terminal mapping-assisted Learning approach, 2019:1-1].
Time complexity is an important index of a network selection algorithm. The time overhead of the proposed algorithm and of the other three algorithms is compared in FIG. 3, where the four curves respectively represent the time consumption of the proposed algorithm and of the Q-learning, DQN and LSTM algorithms; the time consumed by all four algorithms increases with the number of iterations. However, the time of the algorithm adopted by the invention not only grows markedly more slowly than that of the DQN and LSTM algorithms, it is also lower than that of the Q-learning algorithm, and as the number of iterations increases the four curves spread out in a horn shape, which shows that the gap between the time consumed by the four algorithms widens with the number of iterations; this proves that the time complexity advantage of the algorithm of the invention is very pronounced. The algorithm of the invention improves the traditional deep Q learning algorithm with transfer learning: migrating the training samples of the offline training module improves the learning efficiency of the online decision module, and migrating the neural-network weights of the offline training module reduces the neural network training time of the online decision module, thereby reducing the time consumption of the whole algorithm. For the Q-learning algorithm, when the state and action spaces grow rapidly, its computing capability keeps decreasing, its time consumption gradually increases, and the time gap with the proposed algorithm gradually widens. The DQN and LSTM algorithms directly use a deep neural network for iterative operation, and when the number of iterations is large, the difference in time consumption between the DQN algorithm and the LSTM algorithm becomes even more obvious.
Fig. 4 shows the variation of the network average throughput obtained by the user terminal under the four algorithms as the simulation times increase. By comparing the four curves in the graph, it can be clearly seen that the average throughput of the network obtained by adopting the algorithm of the invention is far higher than that of the other three algorithms. The invention adopts the deep Q learning algorithm to successfully predict the state change condition of the base station caused by the dormancy mechanism in the future, so that the user terminal can reasonably select the network according to the future dynamic change of the network environment, and the loss of the network throughput caused by the dormancy of the base station in the future is reduced to the maximum extent; meanwhile, the return function of the deep Q learning algorithm is defined according to the throughput of the user accessing the candidate network, so that the actual requirements of the user are met better, and more throughput can be brought to the user in a high-dynamic network environment. For DQN and Q-learning algorithms, throughput is not as high as the algorithm of the present invention because both do not fully consider the status of the base station in the future network environment, nor design a suitable reward function for the user to increase network throughput. In the LSTM algorithm, because the algorithm does not specifically design and consider the network throughput obtained by the user in the process of network selection, the network throughput of the algorithm is the lowest among the current four algorithms.
Fig. 5 compares the access blocking rate performance of the four algorithms as the number of users increases. As can be seen from the figure, when the number of users accessing the base station is less than 40, no blocking occurs under any algorithm. When the number of users reaches 40, the LSTM algorithm begins to experience blocking, and when the number of users reaches 50, the DQN algorithm also experiences blocking; the algorithm of the present invention and the Q-learning algorithm do not experience blocking until the number of users reaches 60. As the number of users increases, the blocking rates of the four algorithms all rise; however, the algorithm of the present invention has the lowest blocking rate for the same number of users. The algorithm of the invention considers the dormancy condition of the base station, uses the dormancy probability to judge the state of the base station at future moments, avoids the waste of network resources caused by sudden dormancy of the base station, and increases the effective utilization rate of each network, so that users select networks more reasonably and the access blocking rate is reduced. The LSTM and DQN algorithms cannot accurately predict the future dynamic changes of a base station, and the delay caused by the algorithms themselves is high, so blocking occurs even when the number of users is small. As for the Q-learning algorithm, although its blocking rate is not high when the number of users is small, it does not consider base station dormancy and cannot react in time to the dynamic changes of the base station, so its blocking rate rises rapidly as the number of users gradually increases.
Fig. 6 is a relationship between the average packet loss rate and the number of users in the network under four algorithms. It can be seen from the graph that the average packet loss rate of the algorithm of the present invention is always stable below 10%, and the average packet loss rates of the other three algorithms are all above 15%. Therefore, the packet loss rate generated by the algorithm is far lower than that of the other three algorithms. When network selection is carried out, the algorithm of the invention makes a reasonable return function from the perspective of a user according to the throughput obtained by a user terminal; meanwhile, the network dynamics caused by the dormancy of the base station is considered and successfully predicted, so that a proper network can be selected for a user, the loss of data in the transmission process is reduced, and the data can be continuously transmitted. For Q-learning and DQN algorithms, since they select networks only according to the service requirements of the user terminal, it is not possible to accurately predict future dynamic changes of the network; therefore, when the dynamics of the network continuously increases, the optimal network cannot be selected for the user in time, so that the packet loss rate is high. The LSTM algorithm fails to consider the service requirements of the user terminal and does not accurately predict the dynamic conditions of the future network, and the packet loss rate is the highest among the current four algorithms.
Fig. 7 is a comparison between the drop call rate and the number of users for the four algorithms. It can be seen from the figure that although the call drop rates of the four algorithms are slowly increased, after the number of users is increased to 40, the call drop rate of the Q-learning algorithm is rapidly increased and gradually positioned at the highest position, while the call drop rate increase of the algorithm of the present invention is the smallest, and the increase of the LSTM algorithm and the DQN algorithm is between the two. In the process that the number of users is increased from 10 to 100, the call drop rate of the algorithm is always at the lowest point compared with the other three algorithms. Compared with other three algorithms, the algorithm of the invention can predict the change situation of the future network under the condition that the network dynamics is continuously increased, and then provides the network with higher quality for the user to select, thereby effectively reducing the probability of the switching failure. For the Q-learning algorithm, since the network state cannot be accurately predicted, the dropped call rate is sharply increased when the number of users is increased. Similarly, for the DQN and LSTM algorithms, a result of higher network selection delay may be caused in the process of training the deep neural network; therefore, as the number of users increases, the call drop rate also increases significantly.
Fig. 8 shows the total number of handovers performed by the user using the four algorithms. As can be seen from the figure, when the number of users is 100, the total number of times of network handover of the users under the LSTM algorithm is about 380 times, about 370 times under the Q-learning algorithm, and about 310 times under the DQN algorithm; by adopting the algorithm provided by the invention, the total switching times are only about 230 times. This phenomenon shows that the total switching times of the algorithm of the invention are far lower than those of the other three algorithms; meanwhile, the algorithm of the invention can greatly reduce unnecessary switching and well relieve the ping-pong effect. This is because the present invention considers the situation that the algorithm handover failure rate is increased due to the dynamic enhancement of the network environment, so that frequent handover occurs. By combining the dormancy probability of the base station into the algorithm of the invention, the network state change condition of the user after network selection is successfully predicted, thereby greatly reducing the times of switching. The other three algorithms do not properly solve the problem that the switching frequency of the network and the ping-pong effect are aggravated due to the high dynamic influence of the network caused by the dormancy mechanism of the base station; therefore, compared with the existing algorithm, the algorithm can effectively reduce the unnecessary network switching.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (8)

1. A network selection method based on improved deep Q learning is characterized by comprising the following steps:
101. initializing a deep Q learning network selection model by periodically sampling the values of the ultra-dense heterogeneous wireless network parameters, wherein the network parameter values comprise the sampled received signal strength, throughput and dormancy probability, and constructing the action space, state space and return function of deep Q learning from the network parameter values; the deep Q learning network selection model is composed of an offline training module and an online decision module, the offline training module is used to generate the training samples and weights of the neural network, the online decision module is used to obtain the optimal network selection strategy, and both modules are constructed with a deep Q network;
102. according to the deep Q learning network selection model obtained in step 101, the offline training module and the online decision module interact cooperatively through transfer learning; the neural network training process of the online decision module is accelerated according to the transfer learning algorithm, the training samples of the offline training module are migrated to the online decision module, and the training errors generated by the two modules after migration are corrected through the migrated training samples and weights of the offline training module until the errors approach 0, at which point the whole transfer learning process ends; the optimal strategy is then obtained through the deep Q learning network selection model, thereby completing network selection.
2. The method according to claim 1, wherein the step 101 initializes a deep Q learning network selection model, and constructs an action space, a state space and a reward function of deep Q learning by using network parameter values, and specifically includes the steps of:
401. The candidate networks that a terminal can access in the ultra-dense heterogeneous wireless network environment, namely the base stations and access points, are represented by the set N = {n_1, n_2, ..., n_i}, where n_i denotes the i-th candidate network; the action of the terminal accessing candidate network n_i at time t is denoted by a_t(n_i), and the action space can then be defined as A_t = {a_t | a_t ∈ {a_t(n_1), a_t(n_2), ..., a_t(n_i)}};
The state space is defined as S_t = (rss_t, c_t, p_t), where rss_t denotes the set of received signal strengths of the candidate networks at time t, c_t denotes the set of throughputs of the candidate networks at time t, and p_t denotes the set of sleep probabilities of the candidate networks at time t;
to maximize the throughput obtained by the terminal, the reward function is defined by considering the throughput and the sleep probability of the network as:
R_t = C_t(n_i) · (1 − P_t(n_i))    (1)
where C_t(n_i) denotes the throughput obtained when the terminal accesses candidate network n_i at time t, and P_t(n_i) denotes the sleep probability of candidate network n_i at time t;
402. The Q function represents the expected cumulative reward obtained by performing action a in state S and then following the subsequent actions, and is defined as:
Q(S, a) = E[ Σ_{t=0}^{+∞} γ^t · R_t | S_0 = S, a_0 = a ]    (2)
where t denotes the time step during operation, γ^t ∈ [0, 1] is the discount factor used to adjust the importance attached to future returns: a value of 0 means that only the short-term return is considered, otherwise the long-term return matters more, and γ^t gradually decreases as t increases; E(·) is the expectation function;
the deep Q learning algorithm utilizes a neural network to construct Q (S, a; theta), wherein theta is a weight value, so that Q (S, a; theta) is approximately equal to max (Q (S, a)) to carry out approximate solution, meanwhile, a target Q value of a target network is utilized to prevent an estimated Q value generated by an estimation network from being out of control, and errors between the two are adjusted through a loss function to relieve the problem of iteration instability in the training process.
3. The method for network selection based on improved deep Q learning of claim 1, wherein the training samples and weights in step 102 are generated and migrated as follows:
the training sample of the neural network is composed of the current state, the action, the return value and the future state at different time in the historical information database, namely (S)t,at,Rt,St+1) And in the deep Q network, in order to train the neural network, an experience playback pool is set for storing training samples at multiple moments, the correlation degree between the training samples is reduced by randomly extracting partial samples, the training samples of the offline training module are migrated into the online decision module, and the migrated offline training samples and online learning samples are utilized to construct the experience playback pool of the online decision module, which is expressed as:
D_sum = D_on + ξ · D_off    (3)
where D_sum is the total number of samples stored in the experience replay pool, D_on is the total number of online learning samples (with an initial value of 0), D_off is the total number of offline training samples, and ξ ∈ [0, 1] is the sample migration rate, which gradually decreases as the number of training iterations increases;
After the experience replay pool of the online decision module is constructed, the neural-network weight θ_off obtained by offline training is migrated to the online decision module as the initial weight for neural network training, namely θ_on = θ_off;
4. The method of claim 3, wherein, after the neural-network weight θ_off obtained by offline training is migrated to the online decision module, the neural network starts iterative training; in the process in which the offline training module and the online decision module cooperate through transfer learning, the training error generated between them is defined as the strategy loss, a strategy simulation mechanism is adopted, and the estimated Q value Q_off(S_t, a_t; θ_off) of the offline training module is used to convert the estimation network of the offline training module into an offline strategy network π_off(S_t, a_t; θ_off);
Similarly, the estimated Q value Q_on(S_t, a_t; θ_on) of the online decision module is used to convert the estimation network of the online decision module into an online strategy network π_on(S_t, a_t; θ_on); the strategy loss between the offline training module and the online decision module is measured by cross entropy.
5. The method as claimed in claim 4, wherein the offline strategy network π_off(S_t, a_t; θ_off) is expressed as:
π_off(S_t, a_t; θ_off) = exp(Q_off(S_t, a_t; θ_off) / T) / Σ_{a∈A_off} exp(Q_off(S_t, a; θ_off) / T)    (4)
where T denotes the temperature parameter of the Boltzmann distribution: the larger its value, the less the selection of action a_t is affected by the Q value, i.e. all actions are selected with nearly the same probability; A_off is the action space of deep Q learning during offline training;
The online strategy network π_on(S_t, a_t; θ_on) is expressed as:
π_on(S_t, a_t; θ_on) = exp(Q_on(S_t, a_t; θ_on) / T) / Σ_{a∈A_t} exp(Q_on(S_t, a; θ_on) / T)    (5)
the strategy loss between the offline training and the online decision module is measured by cross entropy, and then the strategy simulation loss function is expressed as:
L_policy(θ_on) = − Σ_{a_t∈A_t} π_off(S_t, a_t; θ_off) · log π_on(S_t, a_t; θ_on)    (6)
In the presence of the strategy loss, the gradient update of the estimated Q value Q_on(S_t, a_t; θ_on) of the online decision module is expressed as:
θ_on ← θ_on + α · [Q_π(S_t, a_t; θ_on) − Q_on(S_t, a_t; θ_on)] · ∇θ_on Q_on(S_t, a_t; θ_on)    (7)
where α is the learning rate and Q_π(S_t, a_t; θ_on) denotes the unbiased estimate of the estimated Q value under strategy π;
when Q_π(S_t, a_t; θ_on) ≈ Q_on(S_t, a_t; θ_on), i.e. the strategy loss between the offline training module and the online decision module approaches 0, the transfer learning process ends.
6. The method as claimed in claim 4, wherein, while the terminal is moving, a network selection decision moment occurs when it is about to enter or leave the coverage of a base station, at which point the terminal needs to perform network selection; in order to obtain the network selection decision moment the terminal will face, a prediction is made according to the received signal strength of the network and the moving speed of the terminal.
7. The method of claim 6, wherein the step of predicting according to the received signal strength of the network and the moving speed of the terminal specifically comprises: assume that the mobility model of the terminal within the coverage of the base station moves from point A to point C, that point B denotes the position of the terminal after it has moved Δl from point A, and that, according to the current motion trend of the terminal, the network selection decision moment t_C is predicted to occur at point C; the relationship between ΔOAM and ΔOBM is then expressed as:
r² − (Δl + l_BM)² = l_OB² − l_BM²    (8)
where r represents the radius of the network coverage, Δl represents the distance the terminal has moved, and l_BM denotes the current distance of the terminal from the midpoint M of the chord AC, so that
l_BM = (r² − l_OB² − Δl²) / (2 · Δl)
By detecting the received-signal-strength value at point B, the distance l_OB from the base station to point B can be obtained; finally, the average moving speed of the terminal within the coverage of the base station can be represented as V, and the network selection decision moment t_C is expressed as:
t_C = t_B + (Δl + 2 · l_BM) / V    (9), where t_B is the moment at which the terminal is located at point B
Suppose that at network selection decision moment t, the candidate network with the maximum Q value is n_m; then the optimal network selection action of the terminal at decision moment t is a_t(n_m). By analogy, the set of optimal network selection actions formed by the terminal at the different network selection decision moments is defined as the optimal strategy π*; the optimal strategy π* means that, in the ultra-dense heterogeneous wireless network environment with the dormancy mechanism introduced, the terminal and the candidate networks achieve the best match at each network selection decision moment.
8. The method according to claim 7, wherein the deep Q network specifically comprises:
first, an evaluation network is constructed using a fully-connected neural network; the evaluation network Q(S, a_i; θ) is defined as follows:
Q(S, a_i; θ) = f_DNN(S, a_i; θ),  a_i ∈ A    (10)
wherein f_DNN(·) represents the nonlinear mapping function of the fully-connected neural network, θ represents the weights, and Q(S, a_i; θ) represents the Q value of selecting action a_i when the state space S is input under the weights θ.
In the process of updating the evaluation network Q(S_{i+1}, a_i; θ) by gradient descent, in order to prevent the values generated by the evaluation network Q(S, a_i; θ) from running away, a target network Q̂(S, a_i; θ⁻) is defined to make the training more stable; the structure of the target network Q̂(S, a_i; θ⁻) is consistent with that of the evaluation network Q(S, a_i; θ), and at the same time the weights θ of the evaluation network Q(S, a_i; θ) are assigned to the weights θ⁻ of the target network, thereby updating Q̂(S, a_i; θ⁻). The difference between the two is gradually reduced by setting a loss function in the updating process; before the loss function is constructed, an experience replay pool D needs to be constructed, which is defined as follows:
D = {(S_1, a_1, R_1, S_2), …, (S_i, a_i, R_i, S_{i+1}), …, (S_m, a_m, R_m, S_{m+1})}    (11)
wherein m is the maximum capacity of the experience replay pool, and (S_i, a_i, R_i, S_{i+1}) represents the data at the i-th time instant.
The loss function L(θ) is defined by the reward value R and the experience replay pool D:
L(θ) = E_{(S_i, a_i, R_i, S_{i+1}) ∼ D} [ (R_i + γ · max_{a′} Q̂(S_{i+1}, a′; θ⁻) − Q(S_i, a_i; θ))² ]
wherein γ is the discount factor for the long-term return value, and E[·] is the expectation function.
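For claim 8, the following Python (PyTorch) sketch is a minimal, non-authoritative rendering of the deep Q network structure described above: a fully-connected evaluation network, a target network with an identical structure whose weights θ⁻ are copied from θ, an experience replay pool D, and the loss L(θ). The layer sizes, learning rate, pool capacity, and the randomly generated transitions are assumptions made only for the example.

import random
from collections import deque

import torch
import torch.nn as nn

class EvaluationNetwork(nn.Module):
    # Fully-connected network f_DNN mapping a state to the Q values of all actions.
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

state_dim, n_actions = 8, 4            # assumed sizes of the network state and action space
gamma = 0.9                            # discount factor for the long-term return value
q_eval = EvaluationNetwork(state_dim, n_actions)
q_target = EvaluationNetwork(state_dim, n_actions)
q_target.load_state_dict(q_eval.state_dict())    # target network starts consistent with q_eval
optimizer = torch.optim.Adam(q_eval.parameters(), lr=1e-3)
replay_pool = deque(maxlen=10000)      # experience replay pool D with maximum capacity m

def train_step(batch_size=32):
    # One gradient-descent update of the evaluation network on a sampled minibatch.
    if len(replay_pool) < batch_size:
        return
    batch = random.sample(list(replay_pool), batch_size)
    s, a, r, s_next = zip(*batch)
    s = torch.stack(s)
    a = torch.tensor(a)
    r = torch.tensor(r, dtype=torch.float32)
    s_next = torch.stack(s_next)
    q_sa = q_eval(s).gather(1, a.unsqueeze(1)).squeeze(1)          # Q(S_i, a_i; theta)
    with torch.no_grad():
        target = r + gamma * q_target(s_next).max(dim=1).values    # R_i + gamma * max_a' Q_hat
    loss = nn.functional.mse_loss(q_sa, target)                    # L(theta)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    # Copy the evaluation weights theta into theta^- of the target network.
    q_target.load_state_dict(q_eval.state_dict())

# Fill the pool with random transitions (stand-ins for real (S_i, a_i, R_i, S_{i+1}) data).
for _ in range(64):
    replay_pool.append((torch.randn(state_dim), random.randrange(n_actions),
                        random.random(), torch.randn(state_dim)))
train_step()
sync_target()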
CN202011286673.XA 2020-11-17 2020-11-17 Network selection method based on improved deep Q learning Active CN112367683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011286673.XA CN112367683B (en) 2020-11-17 2020-11-17 Network selection method based on improved deep Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011286673.XA CN112367683B (en) 2020-11-17 2020-11-17 Network selection method based on improved deep Q learning

Publications (2)

Publication Number Publication Date
CN112367683A true CN112367683A (en) 2021-02-12
CN112367683B CN112367683B (en) 2022-07-01

Family

ID=74515167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011286673.XA Active CN112367683B (en) 2020-11-17 2020-11-17 Network selection method based on improved deep Q learning

Country Status (1)

Country Link
CN (1) CN112367683B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102647773A (en) * 2012-05-02 2012-08-22 哈尔滨工业大学 Method for controlling, optimizing and selecting of heterogeneous network access based on Q-learning
CN103327556A (en) * 2013-07-04 2013-09-25 中国人民解放军理工大学通信工程学院 Dynamic network selection method for optimizing quality of experience (QoE) of user in heterogeneous wireless network
US20180376390A1 (en) * 2017-06-22 2018-12-27 At&T Intellectual Property I, L.P. Mobility management for wireless communication networks
CN108848561A (en) * 2018-04-11 2018-11-20 湖北工业大学 A kind of isomery cellular network combined optimization method based on deeply study
WO2019231289A1 (en) * 2018-06-01 2019-12-05 Samsung Electronics Co., Ltd. Method and apparatus for machine learning based wide beam optimization in cellular network
CN109068350A (en) * 2018-08-15 2018-12-21 西安电子科技大学 A kind of autonomous network selection system and method for the terminal of Wireless Heterogeneous Networks
CN110809306A (en) * 2019-11-04 2020-02-18 电子科技大学 Terminal access selection method based on deep reinforcement learning
CN111083767A (en) * 2019-12-23 2020-04-28 哈尔滨工业大学 Heterogeneous network selection method based on deep reinforcement learning
CN111586809A (en) * 2020-04-08 2020-08-25 西安邮电大学 Heterogeneous wireless network access selection method and system based on SDN

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
JIANI SUN et al.: "ES-DQN-Based Vertical Handoff Algorithm for Heterogeneous Wireless Networks", IEEE WIRELESS COMMUNICATIONS LETTERS, 28 April 2020 (2020-04-28) *
YIDING YU et al.: "Deep-Reinforcement Learning Multiple Access for Heterogeneous Wireless Networks", IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 12 March 2019 (2019-03-12) *
FENG CHENWEI et al.: "A Network Access Control Algorithm Based on Q-Learning", Computer Engineering, 14 December 2015 (2015-12-14) *
TAN JUNJIE et al.: "Deep Reinforcement Learning Methods for Intelligent Communication", Journal of University of Electronic Science and Technology of China, no. 02, 30 March 2020 (2020-03-30) *
CHEN QIANBIN et al.: "Adaptive Radio Resource Allocation Algorithm for Heterogeneous Cloud Radio Access Networks Based on Deep Reinforcement Learning", Journal of Electronics & Information Technology, no. 06, 15 June 2020 (2020-06-15) *
MA BIN et al.: "A Fuzzy Vertical Handover Algorithm for Terminal-Oriented Personalized Services", Journal of Electronics & Information Technology, no. 06 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966968B (en) * 2021-03-26 2022-08-30 平安科技(深圳)有限公司 List distribution method based on artificial intelligence and related equipment
CN112966968A (en) * 2021-03-26 2021-06-15 平安科技(深圳)有限公司 List distribution method based on artificial intelligence and related equipment
CN113382412A (en) * 2021-05-12 2021-09-10 重庆邮电大学 Network selection method considering terminal security in super-dense heterogeneous network
CN113382412B (en) * 2021-05-12 2022-12-27 重庆邮电大学 Network selection method considering terminal security in super-dense heterogeneous network
CN113242584B (en) * 2021-06-22 2022-03-22 重庆邮电大学 Network selection method based on neural network in ultra-dense heterogeneous wireless network
CN113242584A (en) * 2021-06-22 2021-08-10 重庆邮电大学 Network selection method based on neural network in ultra-dense heterogeneous wireless network
CN113472484B (en) * 2021-06-29 2022-08-05 哈尔滨工业大学 Internet of things equipment user feature code identification method based on cross entropy iterative learning
CN113472484A (en) * 2021-06-29 2021-10-01 哈尔滨工业大学 Internet of things terminal equipment user feature code identification method based on cross entropy iterative learning
CN113613301B (en) * 2021-08-04 2022-05-13 北京航空航天大学 Air-ground integrated network intelligent switching method based on DQN
CN113613301A (en) * 2021-08-04 2021-11-05 北京航空航天大学 Air-space-ground integrated network intelligent switching method based on DQN
CN114021987A (en) * 2021-11-08 2022-02-08 深圳供电局有限公司 Microgrid energy scheduling strategy determination method, device, equipment and storage medium
CN114125962A (en) * 2021-11-10 2022-03-01 国网江苏省电力有限公司电力科学研究院 Self-adaptive network switching method, system and storage medium
CN114125962B (en) * 2021-11-10 2024-06-11 国网江苏省电力有限公司电力科学研究院 Self-adaptive network switching method, system and storage medium
CN117749625A (en) * 2023-12-27 2024-03-22 融鼎岳(北京)科技有限公司 Network performance optimization system and method based on deep Q network

Also Published As

Publication number Publication date
CN112367683B (en) 2022-07-01

Similar Documents

Publication Publication Date Title
CN112367683B (en) Network selection method based on improved deep Q learning
Fadlullah et al. HCP: Heterogeneous computing platform for federated learning based collaborative content caching towards 6G networks
Wei et al. Joint optimization of caching, computing, and radio resources for fog-enabled IoT using natural actor–critic deep reinforcement learning
CN113543176B (en) Unloading decision method of mobile edge computing system based on intelligent reflecting surface assistance
CN111405569A (en) Calculation unloading and resource allocation method and device based on deep reinforcement learning
Han et al. Artificial intelligence-based handoff management for dense WLANs: A deep reinforcement learning approach
CN114390057B (en) Multi-interface self-adaptive data unloading method based on reinforcement learning under MEC environment
Huang et al. An overview of intelligent wireless communications using deep reinforcement learning
CN112672402B (en) Access selection method based on network recommendation in ultra-dense heterogeneous wireless network
CN113242584B (en) Network selection method based on neural network in ultra-dense heterogeneous wireless network
CN115065678A (en) Multi-intelligent-device task unloading decision method based on deep reinforcement learning
Chua et al. Resource allocation for mobile metaverse with the Internet of Vehicles over 6G wireless communications: A deep reinforcement learning approach
Lei et al. Learning-based resource allocation: Efficient content delivery enabled by convolutional neural network
Kaur et al. An efficient handover mechanism for 5G networks using hybridization of LSTM and SVM
Abubakar et al. A lightweight cell switching and traffic offloading scheme for energy optimization in ultra-dense heterogeneous networks
Jo et al. Deep reinforcement learning‐based joint optimization of computation offloading and resource allocation in F‐RAN
CN114615730A (en) Content coverage oriented power distribution method for backhaul limited dense wireless network
Zhao et al. Reinforced-lstm trajectory prediction-driven dynamic service migration: A case study
Kiran 5G heterogeneous network (HetNets): a self-optimization technique for vertical handover management
Iqbal et al. Convolutional neural network-based deep Q-network (CNN-DQN) resource management in cloud radio access network
Cicioğlu et al. Handover management in software‐defined 5G small cell networks via long short‐term memory
Ye et al. Performance analysis of mobility prediction based proactive wireless caching
Zhao et al. C-LSTM: CNN and LSTM Based Offloading Prediction Model in Mobile Edge Computing (MEC)
CN112492645B (en) Collaborative vertical switching method based on heterogeneous edge cloud in UHWNs
Nithya et al. Artificial Intelligence on Mobile Multimedia Networks for Call Admission Control Systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant