CN112188600A - Method for optimizing heterogeneous network resources by using reinforcement learning - Google Patents

Method for optimizing heterogeneous network resources by using reinforcement learning Download PDF

Info

Publication number
CN112188600A
Authority
CN
China
Prior art keywords
learning
cre
reward function
abs
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011002522.7A
Other languages
Chinese (zh)
Other versions
CN112188600B (en
Inventor
李君
李磊
仲星
朱明浩
李正权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ictehi Technology Development Co ltd
Binjiang College of Nanjing University of Information Engineering
Original Assignee
Binjiang College of Nanjing University of Information Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Binjiang College of Nanjing University of Information Engineering filed Critical Binjiang College of Nanjing University of Information Engineering
Priority to CN202011002522.7A priority Critical patent/CN112188600B/en
Publication of CN112188600A publication Critical patent/CN112188600A/en
Application granted granted Critical
Publication of CN112188600B publication Critical patent/CN112188600B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/02Power saving arrangements
    • H04W52/0203Power saving arrangements in the radio access network or backbone network of wireless communication networks
    • H04W52/0206Power saving arrangements in the radio access network or backbone network of wireless communication networks in access points, e.g. base stations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/02Arrangements for optimising operational condition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/06Testing, supervising or monitoring using simulated traffic
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a method for optimizing heterogeneous network resources by using reinforcement learning, belonging to the technical field of communication. The method integrates reinforcement learning with convex optimization theory: according to the relevance of the actions, namely the ABS, CRE and small-base-station dormancy strategies, the action space is divided; and, aiming at the problem that the system energy efficiency is too large in order of magnitude to serve directly as the reward value during reinforcement-learning modeling, the reward function value is redesigned by first taking the negative and then taking the reciprocal as the new reward value. The invention reduces the action space of reinforcement learning, while the convex optimization theory guarantees system convergence and accelerates the convergence of reinforcement learning. Simulation experiments show that the method converges with low complexity, and its convergence speed is improved by 60% compared with conventional tabular Q-Learning while almost reaching the theoretical value of the system energy efficiency.

Description

Method for optimizing heterogeneous network resources by using reinforcement learning
Technical Field
The invention belongs to the technical field of communication, and particularly relates to a method for optimizing heterogeneous network resources by using reinforcement learning.
Background
As the number of wireless access devices increases, higher demands are placed on the communication capacity of network systems. One effective way to address this is to build heterogeneous networks, in which eICIC (enhanced inter-cell interference coordination) is introduced to overcome the interference problem effectively and to improve the signal-to-interference-plus-noise ratio between mobile devices and base stations. At the same time, more stringent requirements are placed on the performance and energy efficiency of heterogeneous networks. As the complexity of heterogeneous networks continues to increase, optimization of energy efficiency faces more and more challenges and is one of the hotspots of communication-network research, especially for heterogeneous networks equipped with 5G base stations. The key question is how to configure heterogeneous network resources effectively so that the energy efficiency of the network system is maximized.
Existing work mainly focuses on jointly considering features such as Almost Blank Subframes (ABS), Cell Range Expansion (CRE) and small-cell dormancy policies to configure system energy efficiency. Many scholars end up formulating a non-convex NP-hard problem, which is then converted into a convex problem by relaxation and the Karush-Kuhn-Tucker (KKT) conditions. The most effective approach considers ABS, CRE and the base-station dormancy strategy jointly and splits the problem into three sub-problems that treat ABS, CRE and small-cell dormancy independently; each sub-problem is convex, and according to convex optimization theory the solution of the original non-convex NP-hard problem is obtained by cyclically iterating the solutions of the three sub-problems. The disadvantage of this scheme is that the traditional mathematical methods still require a large amount of computation when actually solving the sub-problems, and the computation process is quite complex, which limits the practical application of the scheme.
In recent years, machine learning techniques have been applied increasingly in many fields, such as big-data analysis, precise advertisement placement and image classification. At present, many scholars introduce machine learning into communication systems for resource-optimization research, mainly focusing on deep learning and reinforcement learning.
Deep learning, built on deep neural networks, has the advantage of good fitting performance: a deep learning method can closely approximate the relationship between heterogeneous network resources and system performance, thereby maximizing heterogeneous-network performance. The disadvantage is that neural networks are prone to overfitting and to problems with learning speed. Reinforcement learning has the advantage that, like deep learning, it can adopt a model-free scheme, and it can also adopt a model-based scheme to solve practical problems, which makes the solution of specific problems more efficient and timely.
Some scholars have mapped the relationships between base stations, and between base stations and users, in a heterogeneous network onto graph theory, and then, combining reinforcement learning with graph theory, decomposed the initial Q-Learning problem into several Q-Learning sub-problems to solve the network resource allocation and optimize system performance.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the defect of an excessively large action space when reinforcement learning is applied directly to heterogeneous network resource allocation, the invention provides a method for optimizing heterogeneous network resources by using reinforcement learning, whose convergence speed is improved by 60% compared with conventional tabular Q-Learning while almost reaching the theoretical value of the system energy efficiency.
The technical scheme is as follows: in order to achieve the purpose, the invention adopts the following technical scheme:
a method for optimizing heterogeneous network resources by using reinforcement learning comprises the following steps:
step 1, establishing a Markov decision process according to an energy efficiency target of a heterogeneous network to be optimized;
step 2, designing traditional Q-Learning according to Markov decision process;
step 3, aiming at the problem that the reward function value in Q-Learning is too large in order of magnitude, redesigning the reward function value by first taking the negative and then taking the reciprocal, compressing the reward function value to (-1, 0);
step 4, according to the relevance of the actions, namely the ABS, CRE and small-base-station dormancy strategies, dividing the conventional Q-Learning action space into three sub-Q-Learning action spaces;
step 5, carrying out loop iteration on the stable solutions obtained by the three sub-Q-Learnings; to accelerate convergence, the stable solution of each loop iteration is not necessarily the optimal solution of the three sub-Q-Learnings;
step 6, substituting the solution of each sub-problem into the conditions for solving the two subsequent sub-problems, so that the solutions of the three sub-problems reach a stable state simultaneously through mutual loop iteration; combining the stable solutions of the three sub-problems and outputting the optimal solution A_ABSo, A_CREo and A_Picoo of the original problem.
Further, in step 1, a Markov decision process (S, A, P, R) is established; specifically, S is defined as the state space, i.e. the set of user positions in the heterogeneous network cell; A is defined as the action space, i.e. the set of actions the agent can select in state S; P is defined as the state-transition probability, i.e. P(s_{t+1} = s' | s_t = s, a_t = a); and R is defined as the reward function.
Further, in step 3, the reward function value is redesigned by first taking the negative and then taking the reciprocal, compressing the reward value to (-1, 0), namely
R = 1/(-E) = -1/E,
where E is the system energy-efficiency function; this mapping keeps the reward function consistent with the system energy efficiency.
Further, in step 4, the conventional Q-Learning action space is divided into three sub-Q-Learning action spaces, i.e. A is decomposed into A_ABS, A_CRE and A_Pico, which are in turn the action spaces for optimizing the ABS, CRE and small-base-station dormancy strategies; A_ABS is defined as the set of ABS configurations, A_CRE as the set of CRE configurations, and A_Pico as the set of small-base-station dormancy strategies. The dormancy-strategy solutions of ABS, CRE and the small base stations are then solved respectively by the three sub-Q-Learnings.
Further, in step 5, the loop iteration satisfies
R_ABS ~ P(R|S, A_ABS) ≤ R_ABSo ~ P(R|S, A_ABSo),
R_CRE ~ P(R|S, A_CRE) ≤ R_CREo ~ P(R|S, A_CREo),
R_Pico ~ P(R|S, A_Pico) ≤ R_Picoo ~ P(R|S, A_Picoo),
where A_ABSo, A_CREo and A_Picoo are the optimal actions of the three sub-Q-Learnings.
The principle of the invention is as follows: the method decomposes the initial problem into several sub-problems according to the relevance of the configured resources, and obtains the solution of the initial problem by loop-iterating the sub-problem solutions. The sub-problems are solved with Q-Learning instead of traditional mathematical methods. The initial problem is mapped into the reinforcement-learning domain, the action space is divided according to the relevance of the actions, the original Q-Learning is decomposed into several sub-Q-Learnings according to this division criterion, and the optimal strategies of the sub-Q-Learnings are loop-iterated to obtain the optimal strategy of the initial Q-Learning. The system energy efficiency, used as the reward function, is redesigned: it is first negated and then its reciprocal is taken, which compresses the reinforcement-learning reward value to (-1, 0) while keeping the new reward function consistent with the system energy-efficiency value.
Beneficial effects: compared with the prior art, and aiming at the defect of an excessively large action space when reinforcement learning is applied directly to heterogeneous network resource configuration, the method for optimizing heterogeneous network resources by using reinforcement learning integrates reinforcement learning with convex optimization theory, divides the action space according to the relevance of the actions, namely the ABS, CRE and small-base-station dormancy strategies, and, aiming at the problem that the system energy efficiency is too large in order of magnitude to serve as the reward value during reinforcement-learning modeling, redesigns the reward function value by first taking the negative and then taking the reciprocal as the new reward value. The invention reduces the action space of reinforcement learning, while the convex optimization theory guarantees system convergence and accelerates the convergence of reinforcement learning; simulation experiments show that the method converges with low complexity, and its convergence speed is improved by 60% compared with conventional tabular Q-Learning while almost reaching the theoretical value of the system energy efficiency.
Drawings
FIG. 1 is a flow chart of the method construction process of the present invention;
FIG. 2 is a schematic diagram of the iterative operation of the sub-Q-Learning loop of the present invention;
FIG. 3 is a graph of convergence rate of the conventional Q-Learning method under the same parameter setting;
FIG. 4 is a graph of convergence rate for the method of the present invention under the same parameter settings;
FIG. 5 is a system energy efficiency diagram of the method of the present invention.
Detailed Description
The present invention will be further described with reference to the following embodiments.
As shown in fig. 1-5, a method for optimizing heterogeneous network resources by reinforcement learning includes the following steps:
step 1: establishing a Markov Decision Process (MDP) (S, A, P, R) according to the heterogeneous network energy efficiency target needing to be optimized, wherein S is defined as a state space, namely a set of user positions in a heterogeneous network cell; define A as the action space, i.e. the set of actions the agent chooses in the case of state S, and define P as the transition state, i.e. the state of transition
P(st+1=s'|st=s,atA); r is defined as the reward function.
Step 2: designing a traditional Q-Learning according to a Markov decision process;
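To make step 2 concrete, the following is a minimal sketch of conventional tabular Q-Learning, not the claimed method itself. It assumes a hypothetical environment object `env` that exposes a joint action list `env.actions`, a `reset()` method returning a state (e.g. a snapshot of user positions) and a `step()` method returning the next state, the reward and a termination flag; these names, and the default parameter values of 0.1 (matching the simulation settings described later), are illustrative assumptions.

```python
import random
from collections import defaultdict


def q_learning(env, episodes, alpha=0.1, gamma=0.1, epsilon=0.1):
    """Conventional tabular Q-Learning over the full joint action space A."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated value

    def choose_action(state):
        # epsilon-greedy selection over the (large) joint action space
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        state = env.reset()            # e.g. current user positions in the cell
        done = False
        while not done:
            action = choose_action(state)
            next_state, reward, done = env.step(action)   # reward: reshaped energy efficiency
            best_next = max(Q[(next_state, a)] for a in env.actions)
            # standard temporal-difference update
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

With the joint ABS/CRE/sleep action space, `env.actions` becomes very large, which is exactly the drawback that the action-space division in step 4 is meant to remove.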
and step 3: for number of reward function values in Q-LearningRedesigning the value of the reward function when the level is too large, taking the negative number and then taking the reciprocal, and compressing the value of the reward function to (-1,0), namely
Figure BDA0002694823760000041
The energy efficiency function of the E system simultaneously ensures the consistency of the reward function and the system energy efficiency;
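As a minimal sketch of this reshaped reward, assuming the system energy efficiency E is a positive scalar greater than 1 so that -1/E falls in (-1, 0):

```python
def reshaped_reward(energy_efficiency: float) -> float:
    """Map the system energy efficiency E (assumed > 1) to a reward in (-1, 0).

    Taking the negative first and then the reciprocal gives R = 1/(-E) = -1/E,
    which is monotonically increasing in E, so maximizing the reward remains
    equivalent to maximizing the system energy efficiency.
    """
    return -1.0 / energy_efficiency
```

For example, an energy efficiency of 100 maps to a reward of -0.01 and an energy efficiency of 1000 maps to -0.001, so the larger energy efficiency still receives the larger reward while the reward magnitude stays bounded.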
and 4, step 4: according to the dependence of the action, namely ABS, CRE and small base station sleep strategy, the traditional Q-Learning action space is divided into three sub-Q-Learning action spaces, namely A is divided into
Figure BDA0002694823760000042
Figure BDA0002694823760000043
And
Figure BDA0002694823760000044
in turn, optimizing the action space set of ABS, CRE and small cell dormancy strategy, defining
Figure BDA0002694823760000045
For configuration set of ABS, define
Figure BDA0002694823760000046
For configuration set of CRE, define
Figure BDA0002694823760000047
And setting a dormancy strategy set for the small base station. Respectively solving dormancy strategy solutions of ABS, CRE and small base station
Figure BDA0002694823760000051
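The following sketch illustrates this action-space division: instead of one Q-table over the joint space A_ABS × A_CRE × A_Pico, three sub-Q-Learning agents each maintain a table over a single sub-space (ABS configurations, CRE biases, or small-cell sleep patterns). The class and method names are illustrative assumptions, not part of the claimed method.

```python
import random
from collections import defaultdict


class SubQLearning:
    """One sub-Q-Learning agent over a single sub action space."""

    def __init__(self, actions, alpha=0.1, gamma=0.1, epsilon=0.1):
        self.actions = list(actions)          # e.g. the candidate ABS ratios
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = defaultdict(float)           # Q[(state, sub_action)] -> value

    def select(self, state):
        # epsilon-greedy exploration within this sub action space only
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return self.best(state)

    def update(self, state, action, reward, next_state):
        best_next = max(self.Q[(next_state, a)] for a in self.actions)
        self.Q[(state, action)] += self.alpha * (
            reward + self.gamma * best_next - self.Q[(state, action)]
        )

    def best(self, state):
        # greedy (stable) sub-action for the current state
        return max(self.actions, key=lambda a: self.Q[(state, a)])
```

Three such agents, one each for A_ABS, A_CRE and A_Pico, replace the single table over the joint space, so the total table size grows with the sum rather than the product of the sub-space sizes.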
And 5: the loop iteration process is that loop iteration is carried out on stable solutions obtained by three sub Q-Learning. To speed up convergence, the stable solution for each iteration of the loop is not necessarily the optimal solution for the three sub-Q-Learning, i.e., the optimal solution for the loop is obtained
RABS~P(R|S,AABS)≤RABSo~P(R|S,AABSo),
RCRE~P(R|S,ACRE)≤RCREo~P(R|S,ACREo),
RCRE~P(R|S,ACRE)≤RPicoo~P(R|S,APicoo) Wherein A isABSo,ACREoAnd
Apicoois the optimal action of the three sub-Q-Learning;
step 6: the solution solved by each sub-problem is brought into the condition for solving the two following sub-problems, the solutions of the three sub-problems simultaneously reach a stable state through mutual loop iteration, the stable solutions of the three sub-problems are combined, and the optimal solution A of the original problem is outputABSo,ACREoAnd Apicoo
FIG. 1 is a flow chart of the construction process of the method of the present invention. A complex problem has a high-dimensional action space under conventional tabular Q-Learning, so applying Q-Learning directly is impractical. As shown in FIG. 1, an MDP is established according to the energy efficiency to be optimized, and conventional tabular Q-Learning is set up. To deal with the excessively large action space, the relevance of the actions is analyzed according to the system energy-efficiency requirement and the action space is divided. The original tabular Q-Learning is decomposed into three sub-Q-Learnings, each of which finds its own action to be optimized. When the solutions of the three sub-Q-Learnings become stable during the loop iteration, they are combined and output, giving a solution of the original Q-Learning.
FIG. 2 is a flow chart of the loop iteration of the three sub-Q-Learnings: the current solution of each sub-Q-Learning is updated from the solution of the previous loop iteration, and that solution is then used as a condition for the two sub-Q-Learnings solved next; through the loop iterations the solutions of the three sub-problems reach a stable state simultaneously, and the stable solutions of the three sub-problems are combined to produce the optimal solution of the original problem for output.
Based on the flow charts of FIG. 1 and FIG. 2, in the simulation experiments the number of users was set to 50, 100, 150 and 200 respectively, with users entering the cell at random. The wireless channel is modelled with deterministic path-loss attenuation and a random shadow-fading model, and the system bandwidth is set to 10 MHz. FIG. 3 and FIG. 4 show the relationship between the number of iterations and the accuracy for the conventional tabular Q-Learning method and for the proposed method with improved reinforcement-learning actions (TQL), where the learning rate, discount factor and greedy rate are all set to 0.1. FIG. 3 shows that the Q-Learning method converges after about 80 × 10000 = 800000 iterations under the different load conditions, while FIG. 4 shows that the proposed TQL method converges after about 800 × 400 = 320000 iterations. In FIGS. 3-4, Accuracy denotes the accuracy, Learning rate the learning rate, Discount factor the discount factor, Greedy rate the greedy rate, and Iteration steps the number of iteration steps. As can be seen from FIG. 3 and FIG. 4, the convergence speed of the proposed TQL method is improved by about 60% compared with the conventional Q-Learning method.
FIG. 5 compares the optimization of heterogeneous-network energy efficiency by the proposed TQL method with the conventional Q-Learning method and the ADP ES IC method, where Energy Efficiency denotes the energy efficiency and UEs denotes the number of users. FIG. 5(a) shows that the system energy efficiency achieved by the proposed method is already very close to the theoretical value and, at the same time, far exceeds the performance of the ADP ES IC method proposed by other scholars. FIG. 5(b) shows the gap between the proposed energy-efficiency optimization and the theoretical optimum; the gap arises mainly because, in a few individual states, the proposed method finds a near-optimal rather than the optimal solution, and FIG. 5(b) verifies that the resulting loss of system energy efficiency is small.
The above description is only a preferred embodiment of the present invention; it will be apparent to those skilled in the art that various modifications and variations can be made without departing from the technical principles of the present invention, and such modifications and variations shall also be regarded as falling within the scope of protection of the present invention.

Claims (5)

1. A method for optimizing heterogeneous network resources by using reinforcement learning is characterized in that: the method comprises the following steps:
step 1, establishing a Markov decision process according to an energy efficiency target of a heterogeneous network to be optimized;
step 2, designing traditional Q-Learning according to Markov decision process;
step 3, aiming at the excessively large order of magnitude of the reward function value in Q-Learning, redesigning the reward function value by first taking the negative and then taking the reciprocal, and compressing the reward function value to (-1, 0);
step 4, according to the relevance of the actions, namely the ABS, CRE and small-base-station dormancy strategies, dividing the conventional Q-Learning action space into three sub-Q-Learning action spaces;
step 5, carrying out loop iteration on the stable solutions obtained by the three sub-Q-Learnings; to accelerate convergence, the stable solution of each loop iteration is not necessarily the optimal solution of the three sub-Q-Learnings;
step 6, substituting the solution of each sub-problem into the conditions for solving the two subsequent sub-problems, so that the solutions of the three sub-problems reach a stable state simultaneously through mutual loop iteration; combining the stable solutions of the three sub-problems and outputting the optimal solution A_ABSo, A_CREo and A_Picoo of the original problem.
2. The method for optimizing heterogeneous network resources by using reinforcement learning according to claim 1, characterized in that: in step 1, a Markov decision process (S, A, P, R) is established; specifically, S is defined as the state space, i.e. the set of user positions in the heterogeneous network cell; A is defined as the action space, i.e. the set of actions the agent can select in state S; P is defined as the state-transition probability, i.e.
P(s_{t+1} = s' | s_t = s, a_t = a); and R is defined as the reward function.
3. The method for optimizing heterogeneous network resources by using reinforcement learning according to claim 2, characterized in that: in step 3, the reward function value is redesigned by first taking the negative and then taking the reciprocal, compressing the reward value to (-1, 0), namely
R = 1/(-E) = -1/E,
where E is the system energy-efficiency function; this mapping keeps the reward function consistent with the system energy efficiency.
4. The method for optimizing heterogeneous network resources by using reinforcement learning according to claim 3, characterized in that: in step 4, the conventional Q-Learning action space is divided into three sub-Q-Learning action spaces, i.e. A is decomposed into A_ABS, A_CRE and A_Pico, which are in turn the action spaces for optimizing the ABS, CRE and small-base-station dormancy strategies; A_ABS is defined as the set of ABS configurations, A_CRE as the set of CRE configurations, and A_Pico as the set of small-base-station dormancy strategies; the dormancy-strategy solutions of ABS, CRE and the small base stations are solved respectively by the three sub-Q-Learnings.
5. The method for optimizing heterogeneous network resources by using reinforcement learning according to claim 4, characterized in that: in step 5, the loop iteration satisfies
R_ABS ~ P(R|S, A_ABS) ≤ R_ABSo ~ P(R|S, A_ABSo),
R_CRE ~ P(R|S, A_CRE) ≤ R_CREo ~ P(R|S, A_CREo),
R_Pico ~ P(R|S, A_Pico) ≤ R_Picoo ~ P(R|S, A_Picoo),
where A_ABSo, A_CREo and A_Picoo are the optimal actions of the three sub-Q-Learnings.
CN202011002522.7A 2020-09-22 2020-09-22 Method for optimizing heterogeneous network resources by reinforcement learning Active CN112188600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011002522.7A CN112188600B (en) 2020-09-22 2020-09-22 Method for optimizing heterogeneous network resources by reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011002522.7A CN112188600B (en) 2020-09-22 2020-09-22 Method for optimizing heterogeneous network resources by reinforcement learning

Publications (2)

Publication Number Publication Date
CN112188600A true CN112188600A (en) 2021-01-05
CN112188600B CN112188600B (en) 2023-05-30

Family

ID=73955731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011002522.7A Active CN112188600B (en) 2020-09-22 2020-09-22 Method for optimizing heterogeneous network resources by reinforcement learning

Country Status (1)

Country Link
CN (1) CN112188600B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113709882A (en) * 2021-08-24 2021-11-26 吉林大学 Vehicle networking communication resource allocation method based on graph theory and reinforcement learning
CN118540717A (en) * 2024-07-26 2024-08-23 华东交通大学 Base station dormancy and power distribution method based on deep reinforcement learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150365871A1 (en) * 2014-06-11 2015-12-17 Board Of Trustees Of The University Of Alabama System and method for managing wireless frequency usage
US9622133B1 (en) * 2015-10-23 2017-04-11 The Florida International University Board Of Trustees Interference and mobility management in UAV-assisted wireless networks
CN108521673A (en) * 2018-04-09 2018-09-11 湖北工业大学 Resource allocation and power control combined optimization method based on intensified learning in a kind of heterogeneous network
CN109726866A (en) * 2018-12-27 2019-05-07 浙江农林大学 Unmanned boat paths planning method based on Q learning neural network
CN110691422A (en) * 2019-10-06 2020-01-14 湖北工业大学 Multi-channel intelligent access method based on deep reinforcement learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150365871A1 (en) * 2014-06-11 2015-12-17 Board Of Trustees Of The University Of Alabama System and method for managing wireless frequency usage
US9622133B1 (en) * 2015-10-23 2017-04-11 The Florida International University Board Of Trustees Interference and mobility management in UAV-assisted wireless networks
CN108521673A (en) * 2018-04-09 2018-09-11 湖北工业大学 Resource allocation and power control combined optimization method based on intensified learning in a kind of heterogeneous network
CN109726866A (en) * 2018-12-27 2019-05-07 浙江农林大学 Unmanned boat paths planning method based on Q learning neural network
CN110691422A (en) * 2019-10-06 2020-01-14 湖北工业大学 Multi-channel intelligent access method based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
谭俊杰 (Tan Junjie): "Deep Reinforcement Learning Methods for Intelligent Communication", Journal of University of Electronic Science and Technology of China *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113709882A (en) * 2021-08-24 2021-11-26 吉林大学 Vehicle networking communication resource allocation method based on graph theory and reinforcement learning
CN113709882B (en) * 2021-08-24 2023-10-17 吉林大学 Internet of vehicles communication resource allocation method based on graph theory and reinforcement learning
CN118540717A (en) * 2024-07-26 2024-08-23 华东交通大学 Base station dormancy and power distribution method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN112188600B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
Chen et al. A joint learning and communications framework for federated learning over wireless networks
Chen et al. Dynamic task offloading for internet of things in mobile edge computing via deep reinforcement learning
CN112118287B (en) Network resource optimization scheduling decision method based on alternative direction multiplier algorithm and mobile edge calculation
CN111132230B (en) Bandwidth allocation and data compression joint optimization method for data acquisition
CN110928654A (en) Distributed online task unloading scheduling method in edge computing system
CN109286664A (en) A kind of computation migration terminal energy consumption optimization method based on Lagrange
CN109246761A (en) Consider the discharging method based on alternating direction multipliers method of delay and energy consumption
Meng et al. Deep reinforcement learning based task offloading algorithm for mobile-edge computing systems
CN112188600A (en) Method for optimizing heterogeneous network resources by using reinforcement learning
CN110719641A (en) User unloading and resource allocation joint optimization method in edge computing
EP4383075A1 (en) Data processing method and apparatus
Chen et al. Joint data collection and resource allocation for distributed machine learning at the edge
Pang et al. Joint wireless source management and task offloading in ultra-dense network
Zhou et al. Multi-server federated edge learning for low power consumption wireless resource allocation based on user QoE
Tan et al. Resource allocation of fog radio access network based on deep reinforcement learning
CN106358300A (en) Distributed resource distribution method in microcellular network
CN114615730A (en) Content coverage oriented power distribution method for backhaul limited dense wireless network
Xu et al. Accelerating split federated learning over wireless communication networks
Li et al. Jointly optimizing helpers selection and resource allocation in D2D mobile edge computing
Yao et al. Data-driven resource allocation with traffic load prediction
Wang et al. Joint heterogeneous tasks offloading and resource allocation in mobile edge computing systems
Wen et al. Quality-and availability-based device scheduling and resource allocation for federated edge learning
Tan et al. Minimizing terminal energy consumption of task offloading via resource allocation in mobile edge computing
Sun [Retracted] Certificateless Batch Authentication Scheme and Intrusion Detection Model Based on the Mobile Edge Computing Technology NDN‐IoT Environment
Wei et al. Mobile Edge Computing Task Offloading Based on ADPSO Algorithm in Multi-user Environment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20210707

Address after: No.333 Xishan Avenue, Xishan District, Wuxi City, Jiangsu Province

Applicant after: Binjiang College of Nanjing University of Information Engineering

Applicant after: ICTEHI TECHNOLOGY DEVELOPMENT Co.,Ltd.

Address before: No.333 Xishan Avenue, Xishan District, Wuxi City, Jiangsu Province

Applicant before: Binjiang College of Nanjing University of Information Engineering

GR01 Patent grant
GR01 Patent grant