CN112188600B - Method for optimizing heterogeneous network resources by reinforcement learning - Google Patents

Method for optimizing heterogeneous network resources by reinforcement learning

Info

Publication number
CN112188600B
CN112188600B CN202011002522.7A CN202011002522A
Authority
CN
China
Prior art keywords
learning
sub
cre
reinforcement learning
heterogeneous network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011002522.7A
Other languages
Chinese (zh)
Other versions
CN112188600A (en)
Inventor
李君
李磊
仲星
朱明浩
李正权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ictehi Technology Development Co ltd
Binjiang College of Nanjing University of Information Engineering
Original Assignee
Ictehi Technology Development Co ltd
Binjiang College of Nanjing University of Information Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ictehi Technology Development Co ltd, Binjiang College of Nanjing University of Information Engineering filed Critical Ictehi Technology Development Co ltd
Priority to CN202011002522.7A priority Critical patent/CN112188600B/en
Publication of CN112188600A publication Critical patent/CN112188600A/en
Application granted granted Critical
Publication of CN112188600B publication Critical patent/CN112188600B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/02Power saving arrangements
    • H04W52/0203Power saving arrangements in the radio access network or backbone network of wireless communication networks
    • H04W52/0206Power saving arrangements in the radio access network or backbone network of wireless communication networks in access points, e.g. base stations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/02Arrangements for optimising operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/06Testing, supervising or monitoring using simulated traffic
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Feedback Control In General (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a method for optimizing heterogeneous network resources by reinforcement learning, belonging to the technical field of communication. The method integrates reinforcement learning with convex optimization theory, divides the action space according to the correlation of actions, namely the ABS, CRE and small base station dormancy strategies, and, aiming at the problem that using the system energy efficiency directly as the reward during reinforcement learning modeling yields values of excessively large magnitude, redesigns the reward by taking the negative first and then the reciprocal as the new reward value. The invention reduces the action space of reinforcement learning, guarantees convergence of the system by means of convex optimization theory, and accelerates the convergence of reinforcement learning. Simulation experiments show that the method converges with lower complexity, and its convergence speed is improved by 60 percent compared with traditional tabular Q-Learning while almost reaching the theoretical value of the system energy efficiency.

Description

Method for optimizing heterogeneous network resources by reinforcement learning
Technical Field
The invention belongs to the technical field of communication, and particularly relates to a method for optimizing heterogeneous network resources by reinforcement learning.
Background
As the number of wireless devices accessing the network grows, higher demands are placed on the communication capacity of the network system. One effective way to solve this problem is to build a heterogeneous network, where introducing eICIC (enhanced inter-cell interference coordination) can effectively overcome the interference problem and improve the signal-to-interference-plus-noise ratio between mobile devices and base stations. At the same time, more stringent requirements are placed on the performance and energy efficiency of heterogeneous networks. As the complexity of heterogeneous networks continues to increase, energy efficiency optimization faces growing challenges and has become one of the hot spots of communication network research, especially for heterogeneous networks equipped with 5G base stations. The key is how to configure heterogeneous network resources effectively so as to maximize the energy efficiency of the network system.
Research on heterogeneous network resource allocation at the lower layers mainly focuses on jointly considering almost blank subframes (Almost Blank Subframe, ABS), cell range expansion (Cell Range Expansion, CRE) and small base station dormancy strategies to allocate resources for system energy efficiency. Many scholars ultimately formulate a non-convex NP-hard problem and convert it into a convex one through relaxation and the Karush-Kuhn-Tucker (KKT) conditions. The most effective method is to jointly consider ABS, CRE and the base station dormancy strategy and split the problem into three sub-problems, namely ABS, CRE and small base station dormancy, each of which is convex; according to convex optimization theory, the solution of the original non-convex NP-hard problem is obtained by cyclically iterating the solutions of the three sub-problems. The disadvantage of this scheme is that traditional mathematical methods still require a large amount of computation to actually solve the sub-problems, and the computation process is quite complex, which limits the practical application of this scheme.
In recent years, machine learning techniques have been increasingly applied in many fields such as big data analysis, precise advertisement delivery and image classification. At present, many researchers have introduced machine learning into communication systems for resource optimization, mainly based on deep learning and reinforcement learning.
Deep learning, built on deep neural networks, has the advantage of strong fitting ability. A deep learning method can approximate the relation between heterogeneous network resources and system performance well, thereby maximizing heterogeneous network performance. Its disadvantage is that neural networks are prone to overfitting and slow learning. The advantage of reinforcement learning is that, like deep learning, it can adopt either model-free or model-based schemes to solve practical problems, making the solution of specific problems more efficient and timely.
Some researchers map the relations among base stations and between base stations and users in a heterogeneous network to the graph theory domain, and then combine reinforcement Learning with graph theory to decompose the initial Q-Learning problem into several Q-Learning sub-problems, so as to solve network resource allocation and optimize system performance.
Disclosure of Invention
The invention aims to: the invention aims to provide a method for optimizing heterogeneous network resources by reinforcement Learning, addressing the defect that directly applying reinforcement Learning to heterogeneous network resource allocation leads to an excessively large action space; on the premise of almost reaching the theoretical value of the system energy efficiency, the convergence rate is improved by 60% compared with traditional tabular Q-Learning.
The technical scheme is as follows: in order to achieve the above purpose, the invention adopts the following technical scheme:
a method for optimizing heterogeneous network resources using reinforcement learning, comprising the steps of:
step 1, establishing a Markov decision process according to a heterogeneous network energy efficiency target which needs to be optimized;
step 2, designing a traditional Q-Learning according to a Markov decision process;
step 3, aiming at the problem that the magnitude of the reward function value in Q-Learning is excessively large, redesigning the reward value by taking the negative first and then the reciprocal, compressing it to (-1, 0);
step 4, dividing the traditional Q-Learning action space into three sub-Q-Learning action spaces according to the correlation of actions, namely the ABS, CRE and small base station dormancy strategies;
step 5, cyclically iterating the stable solutions obtained by the three sub-Q-Learnings; to accelerate convergence, the stable solution of each loop iteration is not necessarily the optimal solution of the three sub-Q-Learnings;
step 6, substituting the solution obtained by each sub-problem into the conditions for solving the following two sub-problems, so that through mutual cyclic iteration the solutions of the three sub-problems reach a stable state simultaneously; the stable solutions of the three sub-problems are combined and the optimal solution of the original problem, A_ABSo, A_CREo and A_picoo, is output.
Further, in step 1, a Markov decision process (S, A, P, R) is established; specifically, S is defined as the state space, i.e. the set of user locations within the heterogeneous network cell; A is defined as the action space, i.e. the set of actions the agent can select in state S; P is defined as the state transition probability, i.e. P(s_{t+1}=s'|s_t=s, a_t=a); and R is defined as the reward function.
Further, in step 3, the reward function value is redesigned by taking the negative first and then the reciprocal, compressing it to (-1, 0), i.e. R = 1/(-E) = -1/E, where E is the system energy efficiency function; this also ensures consistency between the reward function and the system energy efficiency.
Further, in step 4, the traditional Q-Learning action space is divided into three sub-Q-Learning action spaces, i.e. A is decomposed into A_ABS, A_CRE and A_pico, which are in turn the action space sets for optimizing the ABS, CRE and small base station dormancy strategies: A_ABS is defined as the ABS configuration set, A_CRE as the CRE configuration set, and A_pico as the small base station dormancy strategy set; the ABS, CRE and small base station dormancy strategy solutions are then solved respectively.
Further, in step 5, the loop iteration satisfies
R_ABS ~ P(R|S, A_ABS) ≤ R_ABSo ~ P(R|S, A_ABSo),
R_CRE ~ P(R|S, A_CRE) ≤ R_CREo ~ P(R|S, A_CREo),
R_pico ~ P(R|S, A_pico) ≤ R_picoo ~ P(R|S, A_picoo),
where A_ABSo, A_CREo and A_picoo are the optimal actions of the three sub-Q-Learnings.
The principle of the invention: the method decomposes an initial problem into several sub-problems according to the correlation of the configured resources, and obtains the solution of the initial problem by cyclically iterating the solutions of the sub-problems. The sub-problems are solved with Q-Learning instead of traditional mathematical methods. The initial problem is mapped to the reinforcement Learning domain, the action space is divided according to the correlation of actions, the original Q-Learning is decomposed into several sub-Q-Learnings according to this division rule, and the optimal strategy of the initial Q-Learning is obtained by cyclically iterating the optimal strategies of the sub-Q-Learnings. The system energy efficiency is redesigned as the reward function by taking its negative first and then the reciprocal, so that the reinforcement learning reward value is compressed to (-1, 0) while the new reward function remains consistent with the system energy efficiency.
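The following is a minimal sketch, not code from the patent, of the reshaped reward (negative first, then reciprocal) fed into a standard tabular Q-Learning update. The state and action counts and the way the energy efficiency value is obtained are assumptions made purely for illustration; the learning rate and discount factor of 0.1 follow the simulation settings described later.

```python
import numpy as np

def reshaped_reward(energy_efficiency: float) -> float:
    """Take the negative of the system energy efficiency E, then the reciprocal.

    For E > 1 the value 1/(-E) = -1/E lies in (-1, 0), and a larger E still
    gives a larger (less negative) reward, so the reshaped reward remains
    consistent with the system energy efficiency."""
    return 1.0 / (-energy_efficiency)

n_states, n_actions = 4, 16          # assumed sizes of S and of one sub action space
alpha, gamma = 0.1, 0.1              # learning rate and discount factor (0.1 as in the simulations)
Q = np.zeros((n_states, n_actions))  # tabular Q function indexed by (state, action)

def q_update(s: int, a: int, s_next: int, energy_efficiency: float) -> None:
    """One tabular Q-Learning update using the reshaped reward."""
    r = reshaped_reward(energy_efficiency)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```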
The beneficial effects are that: compared with the prior art, the method for optimizing heterogeneous network resources by reinforcement learning integrates reinforcement learning with convex optimization theory, divides the action space according to the correlation of actions, namely the ABS, CRE and small base station dormancy strategies, and, aiming at the problem that using the system energy efficiency directly as the reward during reinforcement learning modeling yields values of excessively large magnitude, redesigns the reward by taking the negative first and then the reciprocal as the new reward value. The invention reduces the action space of reinforcement learning, guarantees convergence of the system by means of convex optimization theory, and accelerates the convergence of reinforcement learning. Simulation experiments show that the method converges with lower complexity, and its convergence speed is improved by 60 percent compared with traditional tabular Q-Learning while almost reaching the theoretical value of the system energy efficiency.
Drawings
FIG. 1 is a flow chart of a method construction process of the present invention;
FIG. 2 is a schematic diagram of iterative operation of a sub-Q-Learning loop of the present invention;
FIG. 3 is a chart showing the convergence rate of the conventional Q-Learning method under the same parameter setting;
FIG. 4 is a chart showing the convergence speed of the method of the present invention under the same parameter setting;
FIG. 5 is a diagram of the system energy efficiency achieved by the method of the present invention.
Detailed Description
The invention is further described below in conjunction with specific embodiments.
As shown in fig. 1-5, a method for optimizing heterogeneous network resources by reinforcement learning includes the following steps:
Step 1: establishing a Markov decision process (Markov Decision Process, MDP) (S, A, P, R) according to the heterogeneous network energy efficiency target to be optimized, wherein S is defined as the state space, i.e. the set of user locations within the heterogeneous network cell; A is defined as the action space, i.e. the set of actions the agent can select in state S; P is defined as the state transition probability, i.e. P(s_{t+1}=s'|s_t=s, a_t=a); and R is defined as the reward function.
Step 2: designing a traditional Q-Learning according to a Markov decision process;
Step 3: aiming at the problem that the magnitude of the reward function value in Q-Learning is excessively large, redesigning the reward value by taking the negative first and then the reciprocal, compressing it to (-1, 0), i.e. R = 1/(-E) = -1/E, where E is the system energy efficiency function; this also ensures consistency between the reward function and the system energy efficiency;
Step 4: according to the correlation of actions, namely the ABS, CRE and small base station dormancy strategies, dividing the traditional Q-Learning action space into three sub-Q-Learning action spaces, i.e. decomposing A into A_ABS, A_CRE and A_pico, which are in turn the action space sets for optimizing the ABS, CRE and small base station dormancy strategies: A_ABS is defined as the ABS configuration set, A_CRE as the CRE configuration set, and A_pico as the small base station dormancy strategy set. The ABS, CRE and small base station dormancy strategy solutions are then solved respectively.
Step 5: the loop iteration process is to perform loop iteration on stable solutions obtained by three sub Q-Learning. To increase the convergence rate, the stable solution for each loop iteration is not necessarily the optimal solution for three sub-Q-Learning, i.e
R ABS ~P(R|S,A ABS )≤R ABSo ~P(R|S,A ABSo ),
R CRE ~P(R|S,A CRE )≤R CREo ~P(R|S,A CREo ),
R CRE ~P(R|S,A CRE )≤R Picoo ~P(R|S,A Picoo ) Wherein A is ABSo ,A CREo And
A picoo is the optimal action of three sub Q-Learning;
step 6: the solution obtained by each sub-problem is brought into the condition of solving the two following sub-problems, the solutions of the three sub-problems reach a stable state at the same time through mutually circulating iteration, the stable solutions of the three sub-problems are combined, and the optimal solution A of the original problem is output ABSo ,A CREo And A picoo
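As an illustration of steps 4 to 6, the following is a minimal sketch, not code from the patent, of the cyclic iteration over the three sub-Q-Learnings. The routine solve_sub_q_learning is a hypothetical placeholder for running one sub-Q-Learning (ABS, CRE or small base station dormancy) with the other two sub-solutions held fixed.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Action = Any  # an action from one sub action space (A_ABS, A_CRE or A_pico)

def cyclic_iteration(
    solve_sub_q_learning: Callable[[str, Dict[str, Optional[Action]]], Action],
    max_rounds: int = 100,
) -> Tuple[Action, Action, Action]:
    """Cyclically iterate the three sub-Q-Learnings until their solutions are
    simultaneously stable, then return the combined solution
    (A_ABSo, A_CREo, A_picoo). Per step 5, the solution returned by each
    sub-Q-Learning in a round need not be that sub-problem's optimum."""
    a_abs = a_cre = a_pico = None
    for _ in range(max_rounds):
        prev = (a_abs, a_cre, a_pico)
        # Step 6: each sub-solution becomes a condition for the two that follow it.
        a_abs = solve_sub_q_learning("ABS", {"CRE": a_cre, "pico": a_pico})
        a_cre = solve_sub_q_learning("CRE", {"ABS": a_abs, "pico": a_pico})
        a_pico = solve_sub_q_learning("pico", {"ABS": a_abs, "CRE": a_cre})
        if (a_abs, a_cre, a_pico) == prev:  # all three stable at the same time
            break
    return a_abs, a_cre, a_pico
```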
FIG. 1 is a flow chart of the construction process of the method of the present invention. For a complex problem, traditional tabular Q-Learning has a very high-dimensional action space, so applying Q-Learning directly is impractical. As shown in FIG. 1, an MDP is established according to the energy efficiency to be optimized, and a traditional tabular Q-Learning is built. Aiming at the problem of an excessively large action space, the action space is divided according to the relations among the actions to be optimized for the system energy efficiency, and the original tabular Q-Learning is decomposed into three sub-Q-Learnings from which the actions to be optimized are obtained. When the solutions of the three sub-Q-Learnings all remain stable during the loop iteration, they are combined and output, yielding the solution of the original Q-Learning.
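To see why the division helps, here is a rough size comparison under assumed configuration counts; the numbers are illustrative only and do not come from the patent. A joint tabular Q-Learning must cover every (ABS, CRE, dormancy) combination, whereas the three sub-Q-Learnings together only cover the sum of the individual option counts.

```python
# Illustrative only: assumed numbers of ABS ratios, CRE bias levels and
# small base station on/off patterns; none of these values come from the patent.
n_abs_configs = 8          # assumed ABS subframe-ratio choices
n_cre_configs = 8          # assumed CRE bias choices
n_sleep_patterns = 2 ** 4  # assumed on/off patterns for 4 small base stations

joint_actions = n_abs_configs * n_cre_configs * n_sleep_patterns  # 1024 joint actions
split_actions = n_abs_configs + n_cre_configs + n_sleep_patterns  # 32 actions across the three sub spaces

print(f"joint action space: {joint_actions}, after division: {split_actions}")
```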
FIG. 2 shows the flow of the three sub-Q-Learning loop iterations: the current sub-Q-Learning solution is updated from the solution of the previous loop iteration and then serves as a condition for the two sub-Q-Learnings solved next; through the loop iteration the solutions of the three sub-problems reach a stable state simultaneously, the stable solutions of the three sub-problems are combined, and the optimal solution of the original problem is generated and output.
Based on the flows of FIG. 1 and FIG. 2, the simulation sets the number of users to 50, 100, 150 and 200 respectively, with users entering the cell at random. The wireless channel is modeled with deterministic path loss attenuation and random shadow fading, and the system bandwidth is set to 10 MHz. FIGS. 3 and 4 show the relationship between the number of iterations and the accuracy for the traditional tabular Q-Learning method and for the proposed method with improved reinforcement learning actions (TQL), where the learning rate, discount factor and greedy rate are each set to 0.1. FIG. 3 shows that the Q-Learning method converges after about 80×10000 = 800000 iterations under the different load conditions, while in FIG. 4 the proposed TQL method converges after about 800×400 = 320000 iterations; in FIGS. 3 and 4, Accuracy denotes the accuracy, Learning rate the learning rate, Discount factor the discount factor, Greedy rate the greedy rate, and Iter steps the number of iteration steps. As can be seen from FIGS. 3 and 4, the convergence rate of the proposed TQL method is improved by about 60% relative to the traditional tabular Q-Learning method ((800000 - 320000)/800000 = 60%).
Fig. 5 compares the proposed TQL method with the traditional Q-Learning method and the ADPs ES IC method for heterogeneous network energy efficiency optimization, where Energy Efficiency denotes the energy efficiency and UEs the number of users. Fig. 5(a) shows that the system energy efficiency achieved by the proposed method is already very close to its theoretical value and far exceeds the performance of the ADPs ES IC scheme proposed by related scholars. Fig. 5(b) shows the gap between the optimized heterogeneous network energy efficiency and its theoretical optimum; the gap arises mainly because, in a few individual states, the method finds a near-optimal rather than the optimal solution, and Fig. 5(b) verifies that the resulting loss of system energy efficiency is small.
The foregoing is merely a preferred embodiment of the present invention; it will be apparent to those skilled in the art that modifications and variations can be made without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as falling within the scope of the invention.

Claims (5)

1. A method for optimizing heterogeneous network resources by using reinforcement learning, characterized in that the method comprises the following steps:
step 1, establishing a Markov decision process according to a heterogeneous network energy efficiency target which needs to be optimized;
step 2, designing a traditional Q-Learning according to a Markov decision process;
step 3, aiming at the problem that the magnitude of the reward function value in Q-Learning is excessively large, redesigning the reward value by taking the negative first and then the reciprocal, compressing it to (-1, 0);
step 4, dividing the traditional Q-Learning action space into three sub-Q-Learning action spaces according to the correlation of actions, namely the ABS, CRE and small base station dormancy strategies;
step 5, cyclically iterating the stable solutions obtained by the three sub-Q-Learnings; to accelerate convergence, the stable solution of each loop iteration is not necessarily the optimal solution of the three sub-Q-Learnings;
step 6, substituting the solution obtained by each sub-problem into the conditions for solving the following two sub-problems, so that through mutual cyclic iteration the solutions of the three sub-problems reach a stable state simultaneously; the stable solutions of the three sub-problems are combined and the optimal solution of the original problem, A_ABSo, A_CREo and A_picoo, is output.
2. The method for optimizing heterogeneous network resources using reinforcement learning of claim 1, wherein: in step 1, a Markov decision process (S, A, P, R) is established; specifically, S is defined as the state space, i.e. the set of user locations within the heterogeneous network cell; A is defined as the action space, i.e. the set of actions the agent can select in state S; P is defined as the state transition probability, i.e. P(s_{t+1}=s'|s_t=s, a_t=a); and R is defined as the reward function.
3. The method for optimizing heterogeneous network resources using reinforcement learning of claim 2, wherein: in step 3, the reward function value is redesigned by taking the negative first and then the reciprocal, compressing it to (-1, 0), i.e. R = 1/(-E) = -1/E, where E is the system energy efficiency function; this also ensures consistency between the reward function and the system energy efficiency.
4. A method for optimizing heterogeneous network resources using reinforcement learning as recited in claim 3, wherein: in step 4, the traditional Q-Learning action space is divided into three sub-Q-Learning action spaces, i.e. A is decomposed into A_ABS, A_CRE and A_pico, which are in turn the action space sets for optimizing the ABS, CRE and small base station dormancy strategies: A_ABS is defined as the ABS configuration set, A_CRE as the CRE configuration set, and A_pico as the small base station dormancy strategy set; the ABS, CRE and small base station dormancy strategy solutions are then solved respectively.
5. The method for optimizing heterogeneous network resources using reinforcement learning of claim 4, wherein: in step 5, the loop iteration satisfies
R_ABS ~ P(R|S, A_ABS) ≤ R_ABSo ~ P(R|S, A_ABSo),
R_CRE ~ P(R|S, A_CRE) ≤ R_CREo ~ P(R|S, A_CREo),
R_pico ~ P(R|S, A_pico) ≤ R_picoo ~ P(R|S, A_picoo),
where A_ABSo, A_CREo and A_picoo are the optimal actions of the three sub-Q-Learnings.
CN202011002522.7A 2020-09-22 2020-09-22 Method for optimizing heterogeneous network resources by reinforcement learning Active CN112188600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011002522.7A CN112188600B (en) 2020-09-22 2020-09-22 Method for optimizing heterogeneous network resources by reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011002522.7A CN112188600B (en) 2020-09-22 2020-09-22 Method for optimizing heterogeneous network resources by reinforcement learning

Publications (2)

Publication Number Publication Date
CN112188600A CN112188600A (en) 2021-01-05
CN112188600B true CN112188600B (en) 2023-05-30

Family

ID=73955731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011002522.7A Active CN112188600B (en) 2020-09-22 2020-09-22 Method for optimizing heterogeneous network resources by reinforcement learning

Country Status (1)

Country Link
CN (1) CN112188600B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113709882B (en) * 2021-08-24 2023-10-17 吉林大学 Internet of vehicles communication resource allocation method based on graph theory and reinforcement learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9622133B1 (en) * 2015-10-23 2017-04-11 The Florida International University Board Of Trustees Interference and mobility management in UAV-assisted wireless networks
CN108521673A (en) * 2018-04-09 2018-09-11 湖北工业大学 Resource allocation and power control combined optimization method based on intensified learning in a kind of heterogeneous network
CN109726866A (en) * 2018-12-27 2019-05-07 浙江农林大学 Unmanned boat paths planning method based on Q learning neural network
CN110691422A (en) * 2019-10-06 2020-01-14 湖北工业大学 Multi-channel intelligent access method based on deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10091785B2 (en) * 2014-06-11 2018-10-02 The Board Of Trustees Of The University Of Alabama System and method for managing wireless frequency usage

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9622133B1 (en) * 2015-10-23 2017-04-11 The Florida International University Board Of Trustees Interference and mobility management in UAV-assisted wireless networks
CN108521673A (en) * 2018-04-09 2018-09-11 湖北工业大学 Resource allocation and power control combined optimization method based on intensified learning in a kind of heterogeneous network
CN109726866A (en) * 2018-12-27 2019-05-07 浙江农林大学 Unmanned boat paths planning method based on Q learning neural network
CN110691422A (en) * 2019-10-06 2020-01-14 湖北工业大学 Multi-channel intelligent access method based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"面向智能通信的深度强化学习方法";谭俊杰;《电子科技大学学报》;全文 *

Also Published As

Publication number Publication date
CN112188600A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN112492691B (en) Downlink NOMA power distribution method of depth deterministic strategy gradient
CN112188600B (en) Method for optimizing heterogeneous network resources by reinforcement learning
Li et al. Energy efficiency maximization oriented resource allocation in 5G ultra-dense network: Centralized and distributed algorithms
Xu et al. Dynamic client association for energy-aware hierarchical federated learning
Li et al. Deep neural network based computational resource allocation for mobile edge computing
CN104640185A (en) Cell dormancy energy-saving method based on base station cooperation
Hu et al. Multi-agent DRL-based resource allocation in downlink multi-cell OFDMA system
Zhao et al. Price-based power allocation in two-tier spectrum sharing heterogeneous cellular networks
CN114615730A (en) Content coverage oriented power distribution method for backhaul limited dense wireless network
US11961409B1 (en) Air-ground joint trajectory planning and offloading scheduling method and system for distributed multiple objectives
Wang et al. Joint heterogeneous tasks offloading and resource allocation in mobile edge computing systems
CN111065121B (en) Intensive network energy consumption and energy efficiency combined optimization method considering cell difference
Huang et al. Drop Maslow's Hammer or not: machine learning for resource management in D2D communications
CN116233984A (en) Energy-saving control method and device of base station, electronic equipment and storage medium
CN116132997A (en) Method for optimizing energy efficiency in hybrid power supply heterogeneous network based on A2C algorithm
Mohammad et al. Optimal task allocation for mobile edge learning with global training time constraints
CN115915454A (en) SWIPT-assisted downlink resource allocation method and device
CN101729105A (en) Power control structure and method thereof based on game theory model in network
CN107995034A (en) A kind of dense cellular network energy and business collaboration method
Guo et al. Deep reinforcement learning based traffic offloading scheme for vehicular networks
CN115720341A (en) Method, medium and device for 5G channel shutoff
Besser et al. Deep learning based resource allocation: How much training data is needed?
CN115250156A (en) Wireless network multichannel frequency spectrum access method based on federal learning
CN103607759A (en) Zoom dormancy method and apparatus for micro base station in cellular network
CN104507111B (en) Collaborative communication method and device based on cluster in cellular network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210707

Address after: No.333 Xishan Avenue, Xishan District, Wuxi City, Jiangsu Province

Applicant after: Binjiang College of Nanjing University of Information Engineering

Applicant after: ICTEHI TECHNOLOGY DEVELOPMENT Co.,Ltd.

Address before: No.333 Xishan Avenue, Xishan District, Wuxi City, Jiangsu Province

Applicant before: Binjiang College of Nanjing University of Information Engineering

GR01 Patent grant