CN112867117B - Energy-saving method based on Q learning in NB-IoT - Google Patents

Energy-saving method based on Q learning in NB-IoT

Info

Publication number
CN112867117B
CN112867117B
Authority
CN
China
Prior art keywords
action
base station
devices
energy consumption
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110074159.8A
Other languages
Chinese (zh)
Other versions
CN112867117A (en)
Inventor
裴二荣
王振民
朱冰冰
张茹
杨光财
荆玉琪
周礼能
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile IoT Co Ltd
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN202110074159.8A priority Critical patent/CN112867117B/en
Publication of CN112867117A publication Critical patent/CN112867117A/en
Application granted granted Critical
Publication of CN112867117B publication Critical patent/CN112867117B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W52/00Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/02Power saving arrangements
    • H04W52/0203Power saving arrangements in the radio access network or backbone network of wireless communication networks
    • H04W52/0206Power saving arrangements in the radio access network or backbone network of wireless communication networks in access points, e.g. base stations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/06Testing, supervising or monitoring using simulated traffic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W74/00Wireless channel access
    • H04W74/08Non-scheduled access, e.g. ALOHA
    • H04W74/0833Random access procedures, e.g. with 4-step access
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to an energy-saving method based on Q learning in NB-IoT, belonging to the technical field of communication. In the method, the base station dynamically controls the number of devices initiating random access in each transmission time interval according to parameters such as network load, repetition number and transmission data resources, reducing the number of devices that collide during the random access procedure, thereby reducing the total energy consumption of random access and achieving energy saving. In the method, the base station acts as the agent; the agent's action is defined as the ratio of the number of devices allowed to initiate random access to the total number of active devices, and the agent's state mainly comprises a series of observed information, such as the number of successfully communicating devices and the energy consumption. The invention can reduce the energy consumption of the system and prolong the service life of the devices while guaranteeing the device throughput.

Description

Energy-saving method based on Q learning in NB-IoT
Technical Field
The invention belongs to the technical field of communication, and relates to an energy-saving method based on Q learning in NB-IoT.
Background
Since the beginning of the industrial revolution, human society has developed enormously. In recent years, the Low Power Wide Area Network (LPWAN) has attracted more and more attention owing to the rapid development of 5th-Generation (5G) mobile communication, the Internet of Things (IoT) and mobile computing. Narrowband IoT (NB-IoT), based on cellular communication technology, is a promising LPWAN technology. NB-IoT requires a minimum system bandwidth of only 180 kHz for the downlink and the uplink, respectively, and is a new 3rd Generation Partnership Project (3GPP) radio access technology characterized by low power consumption, narrow bandwidth, strong coverage, low cost and massive connectivity, so NB-IoT can be widely applied in scenarios with poor channel transmission conditions (e.g., underground parking lots) or with delay-tolerant devices (e.g., water meters).
NB-IoT data communication mainly occurs on the uplink, and uplink channel resources mainly comprise the Narrowband Physical Random Access Channel (NPRACH) and the Narrowband Physical Uplink Shared Channel (NPUSCH). The NPRACH is mainly used to start the random access procedure, and the NPUSCH is mainly responsible for data transmission from a device to the base station. NB-IoT mainly targets delay-tolerant periodic devices (such as water meters and electricity meters), so how to complete device communication with lower energy consumption while guaranteeing the devices' throughput requirements is the problem considered here.
At present, existing energy-saving mechanisms lack a dynamic learning process; they mainly optimize the back-off time of each device and the extended discontinuous reception (eDRX) mechanism, so as to reach a compromise between device energy consumption and delay. In practice, however, there are scenarios in which multiple devices transmit data simultaneously; by scheduling the devices reasonably, the number of devices accessing in each Transmission Time Interval (TTI) can be controlled and the total communication energy consumption optimized while throughput is guaranteed. Therefore, a scheduling mechanism based on the Q-learning algorithm is designed, so that the base station allows an appropriate number of devices to perform random access in each TTI according to parameters such as network load, the number of uplink resources, the data size and the repetition number; on the premise of guaranteeing throughput, the energy consumption of the system is reduced and the service life of the devices is prolonged.
Disclosure of Invention
In view of the above, the present invention is directed to a Q-learning-based energy-saving method in NB-IoT. Through a Q-learning algorithm, the base station flexibly adjusts the number of devices initiating access requests in each TTI according to factors such as network load, delay, the number of uplink resources, the preamble size and the repetition number, saving system energy consumption and prolonging the service life of the devices on the premise of guaranteeing throughput. The method is simple and efficient, and also has a certain degree of portability.
In order to achieve the purpose, the invention provides the following technical scheme:
an energy-saving method based on Q learning in NB-IoT comprises the following steps:
s1: defining a state set and an action set of a base station;
s2: at time t = 0, initializing the state and action Q values of the base station to 0;
s3: calculating the state value of the initial base station state s_t;
s4: selecting an action a_t(i) according to an ε-greedy strategy;
s5: performing the action a_t(i); the system then obtains the environment reward value r_t according to the reward function formula and enters the next state s_{t+1};
s6: updating the action Q-value function of the base station according to the update formula;
s7: t ← t + 1, go to step S2.
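For clarity, steps S1-S7 can be summarized in the following minimal Python sketch. It is illustrative only: the simulator object env with its reset()/step() interface, the state discretization and the exploration probability ε are assumptions and not part of the patented method; the action set, the ε-greedy selection and the Q-value update follow the steps above.

    import random
    from collections import defaultdict

    ACTIONS = [0.2, 0.4, 0.6, 0.8, 1.0]   # ratio of devices allowed to initiate random access
    ALPHA, GAMMA = 0.01, 0.8              # learning rate and discount factor quoted in the text
    EPSILON = 0.1                         # assumed exploration probability

    def q_learning(env, num_ttis):
        # env is a hypothetical NB-IoT simulator exposing reset() and step(action)
        q = defaultdict(lambda: [0.0] * len(ACTIONS))       # S2: Q values initialized to 0
        state = env.reset()                                 # S3: initial state s_t
        for t in range(num_ttis):
            if random.random() < EPSILON:                   # S4: epsilon-greedy selection
                action = random.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
            next_state, reward = env.step(ACTIONS[action])  # S5: act, observe r_t and s_{t+1}
            best_next = max(q[next_state])                  # S6: Q-value update
            q[state][action] += ALPHA * (reward + GAMMA * best_next - q[state][action])
            state = next_state                              # S7: move to the next TTI
        return q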
Further, in step S1, the state set of the base station is represented as a series of previously observed information, S_t = {U_{t-1}, U_{t-2}, U_{t-3}, …, U_1}, where each observation is
U_t = (E_{ra,t}, E_{wait,t}, E_{dt,t}, N_{wait,t}, N_{comm,t}, N_{fail,t}),
in which E_{ra,t} represents the random access energy consumption, E_{wait,t} the device waiting energy consumption, E_{dt,t} the data transmission energy consumption, N_{wait,t} the number of waiting devices, N_{comm,t} the number of communicating devices, and N_{fail,t} the number of access-failure devices.
Regarding the action set, the proportion of the number of devices allowed to initiate random access in each TTI to the total number of active devices in the current TTI is taken as the base station action; according to the Markov process with a finite action set, the base station action in any t-th TTI is defined as a_t ∈ {0.2, 0.4, 0.6, 0.8, 1.0}.
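A possible in-memory representation of one observation U_t and of the action set is sketched below; the field names follow the verbal definitions above and are illustrative assumptions, not symbols taken from the patent drawings.

    from dataclasses import dataclass

    ACTIONS = (0.2, 0.4, 0.6, 0.8, 1.0)   # a_t: fraction of active devices allowed to access

    @dataclass(frozen=True)
    class Observation:          # one U_t entry in the state history S_t
        e_ra: float             # random access energy consumption
        e_wait: float           # device waiting energy consumption
        e_dt: float             # data transmission energy consumption
        n_wait: int             # number of waiting devices
        n_comm: int             # number of communicating devices
        n_fail: int             # number of access-failure devices

    def make_state(history):
        # S_t is the tuple of previously observed U's: (U_{t-1}, ..., U_1)
        return tuple(history)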
Further, in step S2, the state and Q values of the base station are set to zero matrices. The solution objective of the base station's Markov decision process is to find an optimal policy π* such that the value V(s) of every state s is maximized at the same time. The state value function is expressed as follows:
V^π(s_t) = r(s_t, a_t) + γ Σ_{s_{t+1}} p(s_{t+1} | s_t, a_t) V^π(s_{t+1}),
where r(s_t, a_t) denotes the reward value obtained by the base station from the environment, and p(s_{t+1} | s_t, a_t) denotes the probability that the base station transitions to state s_{t+1} after selecting action a_t in state s_t.
Further, in step S4, the goal of the base station is to obtain a higher reward value, so in each state the action with the higher Q value tends to be selected. However, in the initial stage of learning there is relatively little state-action experience, and the Q values cannot yet accurately represent the correct optimal values. Always taking the action with the highest Q value causes the base station to follow the same path and prevents it from exploring other, better values, so it easily falls into a local optimum. Therefore, an ε-greedy strategy is introduced, whose main principle is:
π(s_t) = argmax_{a∈A} Q(s_t, a) with probability 1 − ε, and a random action from A with probability ε.
the agent randomly selects an action with a probability of epsilon and selects an action that maximizes the Q value with a probability of 1-epsilon.
Further, in step S5, the base station performs the selected action and obtains a reward value from the environment. The reward function r_t is defined in terms of N_{serve,t}, the number of served devices, N, the total number of transmitting devices, T, the number of TTIs, and E_t, the total system energy consumption in the t-th TTI [formula given as an image in the original]. The number of served devices depends on n_t, the number of devices allowed to access in the current TTI, r, the repetition number, μ, the transmission data resource, Q, the total uplink resource, and m_i, the number of preambles [formula given as an image in the original].
E_t = E_{sy,t} + E_{ra,t} + E_{wait,t} + E_{dt,t},
where E_{sy,t} denotes the synchronization energy consumption, E_{ra,t} the random access energy consumption, E_{wait,t} the device waiting energy consumption, and E_{dt,t} the data transmission energy consumption.
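The per-TTI energy bookkeeping follows directly from the decomposition above; the reward used in the sketch below (served devices per unit of energy) is an assumed stand-in, since the patent's exact reward expression is given only as an image.

    def total_energy(e_sy, e_ra, e_wait, e_dt):
        # E_t = E_sy,t + E_ra,t + E_wait,t + E_dt,t (decomposition from the text)
        return e_sy + e_ra + e_wait + e_dt

    def reward(n_serve, e_total):
        # Illustrative reward: served devices per unit energy; assumed form, not the patented formula
        return n_serve / e_total if e_total > 0 else 0.0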
Further, in step S6, after obtaining the reward value from the environment, the base station needs to update the Q matrix. The update formula is:
Q(s_t, a_t) ← (1 − α) Q(s_t, a_t) + α [ r(s_t, a_t) + γ max_a Q(s_{t+1}, a) ],
where α denotes the learning rate with 0 < α < 1, and γ denotes the discount factor with 0 ≤ γ < 1. The learning rate and the discount factor jointly regulate the updating of the Q matrix and thus influence the learning performance of the Q-learning algorithm; here α is set to 0.01 and γ to 0.8.
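Step S6 corresponds to the standard tabular Q-learning update; a minimal sketch with the quoted α = 0.01 and γ = 0.8 follows (the Q-table layout is an assumption).

    ALPHA, GAMMA = 0.01, 0.8

    def update_q(q, state, action, reward, next_state):
        # Q(s_t,a_t) <- (1-ALPHA)*Q(s_t,a_t) + ALPHA*(r(s_t,a_t) + GAMMA*max_a Q(s_{t+1},a))
        best_next = max(q[next_state])
        q[state][action] = (1 - ALPHA) * q[state][action] + ALPHA * (reward + GAMMA * best_next)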
The invention has the beneficial effects that: through the Q learning algorithm, the energy consumption of the system can be reduced under the condition of ensuring the throughput of the equipment.
Drawings
FIG. 1 is a diagram of a Q learning and environment interaction process model;
FIG. 2 is a diagram of the steps of an energy-saving algorithm based on Q learning;
fig. 3 is a flow diagram of NB-IoT uplink communication.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The invention provides an energy-saving method based on Q learning in NB-IoT, aimed at the energy consumption problem of NB-IoT systems. Compared with traditional optimization algorithms, the invention can dynamically optimize the transmitting devices based on the Q-learning algorithm, and the base station can flexibly adjust the number of accessing devices according to the real-time condition of the network. The process is shown in figure one: first, in a given state, the base station executes an action selected with an ε-greedy strategy according to the current environment; it then observes the environment to obtain the reward value, updates the Q-value function according to the formula, and determines the action for the next state; these steps are repeated until convergence.
The specific algorithm steps are shown in figure two. In the iteration of the Q-learning algorithm, the state set is defined as S; at decision time t, s_t ∈ S denotes the state of the base station at time t. Likewise, the finite set of actions that the base station may perform is defined as A, and a_t ∈ A denotes the action of the base station at time t. The reward function r(s_t, a_t) denotes the reward value obtained from the environment after the base station performs action a_t in state s_t; the base station then transitions from state s_t to s_{t+1}, and at the next decision time t + 1 the Q_t function is updated. These steps are repeated until the iteration ends.
The flow of NB-IoT uplink transmission is shown in fig. three. First, the base station sends a Narrowband Primary Synchronization Signal (NPSS) and a Narrowband Secondary Synchronization Signal (NSSS) to the device so that the device is synchronized in time and frequency with the cell; this is the synchronization process. The device then sends an access request over the NPRACH; the base station receives the device's request and responds over the Narrowband Physical Downlink Control Channel (NPDCCH), and the device then establishes a connection with the base station. After the connection is established, the device sends a scheduling request and performs data transmission over the NPUSCH.
The Q-learning algorithm is in fact a variant of the Markov Decision Process (MDP). In the energy-saving algorithm in NB-IoT, based on the working principle of the Q-learning algorithm, we represent the state set as follows:
S_t = {U_{t-1}, U_{t-2}, U_{t-3}, …, U_1}, where
U_t = (E_{ra,t}, E_{wait,t}, E_{dt,t}, N_{wait,t}, N_{comm,t}, N_{fail,t}),
in which E_{ra,t} represents the random access energy consumption, E_{wait,t} the device waiting energy consumption, E_{dt,t} the data transmission energy consumption, N_{wait,t} the number of waiting devices, N_{comm,t} the number of communicating devices, and N_{fail,t} the number of access-failure devices.
The ratio of the number of devices allowed to initiate random access in each TTI to the total number of active devices in the current TTI is taken as the base station action, and the action set of the base station is A = {a(1), a(2), …, a(k)}. According to the Markov process with a finite action set, the base station action in any t-th TTI is defined as a_t ∈ {0.2, 0.4, 0.6, 0.8, 1.0}.
The base station faces the task of deciding an optimal policy to maximize the reward obtained. The base station makes the best decision on the next state/action based on the current state and environment. The discounted cumulative reward value function for state s_t may be expressed as:
V^π(s_t) = r(s_t, a_t) + γ Σ_{s_{t+1}} p(s_{t+1} | s_t, a_t) V^π(s_{t+1}),
where r(s_t, a_t) denotes the immediate reward obtained when the base station selects action a_t in state s_t; γ denotes the discount factor with 0 ≤ γ < 1, and a discount factor tending to 0 means the base station mainly considers immediate rewards; p(s_{t+1} | s_t, a_t) denotes the probability of transitioning from state s_t to s_{t+1} when the base station selects action a_t. The objective of the MDP solution is to find an optimal policy π* such that the value V(s) of every state s is maximized at the same time. According to the Bellman principle, when the total discounted expected reward of the base station is maximal, there exists at least one optimal policy π* such that:
V*(s_t) = max_{a_t} [ r(s_t, a_t) + γ Σ_{s_{t+1}} p(s_{t+1} | s_t, a_t) V*(s_{t+1}) ],
where V*(s_t) denotes the maximum discounted cumulative reward that the base station obtains starting from state s_t and following the optimal policy π*. A given policy π is a function mapping the state space to the action space, i.e., π: s_t → a_t. The optimal policy can thus be expressed in the form:
π*(s_t) = arg max_{a_t} [ r(s_t, a_t) + γ Σ_{s_{t+1}} p(s_{t+1} | s_t, a_t) V*(s_{t+1}) ].
the base station aims to obtain a higher reward value and therefore, in each state, the action with the higher Q value will be selected. However, in the initial stage of learning, the experience on the state-action is relatively small, and the Q value cannot accurately represent the correct optimum value. The action of the highest Q value results in the base station always following the same path and not being able to explore other better values, thus easily falling into local optima. Therefore, to overcome this drawback, the base station must randomly select actions, and therefore, an epsilon greedy strategy is introduced, thereby reducing the possibility of the base station action selection strategy falling into a locally optimal solution.
Figure BDA0002906931050000061
The agent randomly selects an action with a probability of epsilon and selects an action that maximizes the Q value with a probability of 1-epsilon.
Further, in step S5, the base station performs the selected action and obtains a reward value from the environment. The reward function r_t is defined in terms of N_{serve,t}, the number of served devices, N, the total number of transmitting devices, T, the number of TTIs, and E_t, the total system energy consumption in the t-th TTI [formula given as an image in the original]. The number of served devices depends on n_t, the number of devices allowed to access in the current TTI, r, the repetition number, μ, the transmission data resource, Q, the total uplink resource, and m_i, the number of preambles [formula given as an image in the original].
E_t = E_{sy,t} + E_{ra,t} + E_{wait,t} + E_{dt,t},
where E_{sy,t} denotes the synchronization energy consumption, E_{ra,t} the random access energy consumption, E_{wait,t} the device waiting energy consumption, and E_{dt,t} the data transmission energy consumption.
In the Q-learning algorithm, based on policy π, the base station recursively calculates the Q-value function in each TTI as follows:
Q^π(s_t, a_t) = r(s_t, a_t) + γ Σ_{s_{t+1}} p(s_{t+1} | s_t, a_t) V^π(s_{t+1}).
Clearly, the Q value indicates the expected discounted reward obtained when the base station performs action a_t in state s_t and thereafter follows policy π. Our goal is therefore to evaluate the Q values under the optimal policy π*. From the above equation, the relationship between the state value function and the action value function can be derived as follows:
V*(s_t) = max_{a_t} Q*(s_t, a_t).
however, based on the non-deterministic environment, the above Q-value function is only true under the optimal strategy, i.e., the value of the Q-value function is changed (or not converged) by Q-learning under the non-optimal strategy. Therefore, the formula for the modified Q function is as follows:
Figure BDA0002906931050000071
where α represents the learning rate and 0 < α < 1, the greater the learning rate, indicating less effectiveness in retaining the previous training. If each state-action pair can be repeated multiple times, the learning rate will drop according to the appropriate scheme, and the Q-learning algorithm can converge to the optimal strategy for any finite MDP. Y represents the discount rate and 0 < y < 1, y represents the degree of importance to future rewards. Higher y values may capture the long-term effective reward, while lower y values make the smart more concerned about the instant reward. The updating of the Q matrix is adjusted by the cooperative action of the learning rate and the discount factor, so that the learning performance of the Q learning algorithm is influenced, wherein the alpha value is 0.01, and the gamma value is 0.8.
Finally, it is noted that the above-mentioned preferred embodiments illustrate rather than limit the invention, and that, although the invention has been described in detail with reference to the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the scope of the invention as defined by the appended claims.

Claims (1)

  1. A method for Q-learning based energy saving in NB-IoT, the method comprising the steps of:
    s1: defining a state set and an action set of a base station, the state set being defined as a series of previously observed information, i.e. S_t = {U_{t-1}, U_{t-2}, U_{t-3}, …, U_1}, where
    U_t = (E_{ra,t}, E_{wait,t}, E_{dt,t}, N_{wait,t}, N_{comm,t}, N_{fail,t}),
    E_{ra,t} representing the random access energy consumption, E_{wait,t} the device waiting energy consumption, E_{dt,t} the data transmission energy consumption, N_{wait,t} the number of waiting devices, N_{comm,t} the number of communicating devices, and N_{fail,t} the number of access-failure devices; the action set being defined as the proportion of the number of devices allowed to initiate random access in each TTI to the total active devices in the current TTI;
    s2: setting the state and behavior Q value of the base station as a zero matrix at the moment when t is 0;
    s3: selecting an action a_t(i) according to an ε-greedy method: in the initial stage of learning there is little state-action experience and the Q values cannot accurately represent the correct optimal values; always taking the action with the highest Q value causes the base station to follow the same path without exploring other, better values, so it easily falls into a local optimum; therefore an ε-greedy strategy is introduced, in which the agent randomly selects an action with probability ε and selects the action that maximizes the Q value with probability 1 − ε, namely
    π(s_t) = argmax_{a∈A} Q(s_t, a) with probability 1 − ε, and a random action from A with probability ε;
    S4: performing an action at(i) Then, the system obtains the environment reward value R according to the formulatThen enters the next state st+1: the reward value function is defined as:
    Figure FDA0003536062420000019
    wherein
    Figure FDA00035360624200000110
    Representing the number of serving devices, N representing the total number of transmission devices, T representing the number of TTIs, EtIndicating the total energy consumption of the system in the t-th TTI,
    Figure FDA00035360624200000111
    ntrepresenting the number of access devices allowed in the current TTI, r representing the number of repetitions, m representing the transmission data resource, Q representing the total uplink resource, miIndicates the number of preambles, Et=Esy,t+Era,t+Ewait,t+Edt,t,Esy,tIndicating synchronous energy consumption, Era,tIndicating random access power consumption, Ewait,tIndicating waiting energy consumption of the apparatus, Edt,tRepresenting data transmission energy consumption;
    s5: updating the action Q-value function of the base station according to the Q-matrix update formula:
    Q(s_t, a_t) ← (1 − α) Q(s_t, a_t) + α [ r(s_t, a_t) + γ max_a Q(s_{t+1}, a) ],
    where r(s_t, a_t) is the reward value obtained by the agent for performing action a_t in state s_t, α represents the learning rate with 0 < α < 1, and γ represents the discount factor with 0 ≤ γ < 1; the learning rate and the discount factor jointly adjust the updating of the Q matrix and thus influence the learning performance of the Q algorithm; α is 0.01 and γ is 0.8;
    s6: t ← t +1, go to step S2.
CN202110074159.8A 2021-01-20 2021-01-20 Energy-saving method based on Q learning in NB-IoT Active CN112867117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110074159.8A CN112867117B (en) 2021-01-20 2021-01-20 Energy-saving method based on Q learning in NB-IoT

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110074159.8A CN112867117B (en) 2021-01-20 2021-01-20 Energy-saving method based on Q learning in NB-IoT

Publications (2)

Publication Number Publication Date
CN112867117A CN112867117A (en) 2021-05-28
CN112867117B true CN112867117B (en) 2022-04-12

Family

ID=76007591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110074159.8A Active CN112867117B (en) 2021-01-20 2021-01-20 Energy-saving method based on Q learning in NB-IoT

Country Status (1)

Country Link
CN (1) CN112867117B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114567920B (en) * 2022-02-23 2023-05-23 重庆邮电大学 Mixed discontinuous receiving method for policy optimization MTC (machine type communication) equipment
CN114727423A (en) * 2022-04-02 2022-07-08 北京邮电大学 Personalized access method in GF-NOMA system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110809274A (en) * 2019-10-28 2020-02-18 南京邮电大学 Narrowband Internet of things-oriented unmanned aerial vehicle base station enhanced network optimization method
CN110856234A (en) * 2019-11-20 2020-02-28 廊坊新奥燃气设备有限公司 Energy-saving method and system for NB-IoT meter based on PSM access mode
CN111970703A (en) * 2020-06-24 2020-11-20 重庆邮电大学 Method for optimizing uplink communication resources in NB-IoT (NB-IoT)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017149480A1 (en) * 2016-03-01 2017-09-08 Telefonaktiebolaget Lm Ericsson (Publ) Energy efficient operation of radio network nodes and wireless communication devices in nb-iot

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110809274A (en) * 2019-10-28 2020-02-18 南京邮电大学 Narrowband Internet of things-oriented unmanned aerial vehicle base station enhanced network optimization method
CN110856234A (en) * 2019-11-20 2020-02-28 廊坊新奥燃气设备有限公司 Energy-saving method and system for NB-IoT meter based on PSM access mode
CN111970703A (en) * 2020-06-24 2020-11-20 重庆邮电大学 Method for optimizing uplink communication resources in NB-IoT (NB-IoT)

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Introduction of NB-IoT";Huawei;《3GPP TSG-RAN WG2 NB-IOT Ad-hoc#2 R2-163218》;20160429;全文 *
Energy-efficient joint power control and resource allocation for cluster-based NB-IoT cellular networks;Zhu shuqiong, Wu Wenquan, Feng Lei, et al.;《Transactions on Emerging Telecommunications Technologies》;20171227;全文 *
异形磁电复合材料增强磁电效应的理论和实验研究;张茹;《中国博士学位论文电子期刊网》;20190115;全文 *
认知无线电网络中的资源优化分配的研究;裴二荣;《中国博士学位论文电子期刊网》;20121215;全文 *

Also Published As

Publication number Publication date
CN112867117A (en) 2021-05-28

Similar Documents

Publication Publication Date Title
CN112867117B (en) Energy-saving method based on Q learning in NB-IoT
Zhao et al. A reinforcement learning method for joint mode selection and power adaptation in the V2V communication network in 5G
US12035380B2 (en) Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning
CN109462839B (en) DRX mechanism communication method based on self-adaptive adjustment strategy
CN113490184B (en) Random access resource optimization method and device for intelligent factory
Chen et al. Heterogeneous machine-type communications in cellular networks: Random access optimization by deep reinforcement learning
CN107820309B (en) Wake-up strategy and time slot optimization algorithm for low-power-consumption communication equipment
CN109890085B (en) Method for determining random access back-off parameters of priority-classified machine type communication
CN107094281B (en) Access method and system for M2M equipment to access base station
CN110602798A (en) Distributed determination method for optimal parameters of LTE network machine communication random access
Zhao et al. Deep reinforcement learning aided intelligent access control in energy harvesting based WLAN
Jiang et al. Q-learning based task offloading and resource allocation scheme for internet of vehicles
Wei et al. Power allocation in HetNets with hybrid energy supply using actor-critic reinforcement learning
CN115766089B (en) Anti-interference optimal transmission method for energy acquisition cognitive Internet of things network
CN105142208B (en) It is embedded in the power and slot allocation method of high energy efficiency in the cellular network of M2M
Miao et al. A DDQN-based Energy-Efficient Resource Allocation Scheme for Low-Latency V2V communication
Wang et al. Deep reinforcement learning based joint partial computation offloading and resource allocation in mobility-aware MEC system
Mazandarani et al. Self-sustaining multiple access with continual deep reinforcement learning for dynamic metaverse applications
Li et al. A Lightweight Transmission Parameter Selection Scheme Using Reinforcement Learning for LoRaWAN
Wu et al. Computation rate maximization in multi-user cooperation-assisted wireless-powered mobile edge computing with OFDMA
Song et al. Deep Reinforcement Learning Enabled Energy-Efficient Resource Allocation in Energy Harvesting Aided V2X Communication
Gu et al. Deep reinforcement learning-guided task reverse offloading in vehicular edge computing
Zhao et al. Deep Reinforcement Learning for the Joint AoI and Throughput Optimization of the Random Access System
Li Deep reinforcement learning based resource allocation for LoRaWAN
Xu et al. Energy efficiency and delay determinacy tradeoff in energy harvesting-powered zero-touch deterministic industrial M2M communications

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230324

Address after: 401336 Yuen Road, Nanan District, Chongqing City, No. 8

Patentee after: CHINA MOBILE IOT Co.,Ltd.

Address before: 400065 No. 2, Chongwen Road, Nan'an District, Chongqing

Patentee before: CHONGQING University OF POSTS AND TELECOMMUNICATIONS