CN112867117B - Energy-saving method based on Q learning in NB-IoT - Google Patents
Energy-saving method based on Q learning in NB-IoT
- Publication number
- CN112867117B (application CN202110074159.8A)
- Authority
- CN
- China
- Prior art keywords
- action
- base station
- devices
- energy consumption
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 27
- 230000009471 action Effects 0.000 claims abstract description 35
- 238000005265 energy consumption Methods 0.000 claims abstract description 31
- 230000005540 biological transmission Effects 0.000 claims abstract description 20
- 238000004891 communication Methods 0.000 claims abstract description 9
- 230000006870 function Effects 0.000 claims description 18
- 230000006399 behavior Effects 0.000 claims description 15
- 239000011159 matrix material Substances 0.000 claims description 7
- 230000001360 synchronised effect Effects 0.000 claims description 4
- 230000007786 learning performance Effects 0.000 claims description 3
- 238000005457 optimization Methods 0.000 claims description 2
- 230000002195 synergetic effect Effects 0.000 claims 1
- 230000008569 process Effects 0.000 abstract description 11
- 230000000977 initiatory effect Effects 0.000 abstract description 2
- 238000004134 energy conservation Methods 0.000 abstract 1
- 230000007246 mechanism Effects 0.000 description 4
- 238000010586 diagram Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 239000003795 chemical substances by application Substances 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000010267 cellular communication Effects 0.000 description 1
- 230000002079 cooperative effect Effects 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 230000000737 periodic effect Effects 0.000 description 1
- 230000002035 prolonged effect Effects 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
Images
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/02—Power saving arrangements
- H04W52/0203—Power saving arrangements in the radio access network or backbone network of wireless communication networks
- H04W52/0206—Power saving arrangements in the radio access network or backbone network of wireless communication networks in access points, e.g. base stations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/06—Testing, supervising or monitoring using simulated traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W74/00—Wireless channel access
- H04W74/08—Non-scheduled access, e.g. ALOHA
- H04W74/0833—Random access procedures, e.g. with 4-step access
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention relates to an energy-saving method based on Q learning in NB-IoT, belonging to the technical field of communication. In the method, the base station dynamically controls the number of devices that initiate random access in each transmission time interval according to parameters such as network load, repetition count, and transmission data resources, and reduces the number of devices that collide during random access, thereby lowering the total energy consumption of random access and achieving energy saving. The base station acts as the agent; its action is defined as the ratio of the number of devices allowed to initiate random access to the total number of active devices, and its state consists of a series of observed information, such as the number of devices that communicated successfully and the energy consumption. The invention reduces the energy consumption of the system and prolongs the service life of the devices while guaranteeing device throughput.
Description
Technical Field
The invention belongs to the technical field of communication, and relates to an energy-saving method based on Q learning in NB-IoT.
Background
Since the beginning of the industrial revolution, human society has developed enormously. In recent years, the Low Power Wide Area Network (LPWAN) has attracted increasing attention with the rapid development of 5th-Generation (5G) networks, the Internet of Things (IoT), and mobile computing. Narrowband IoT (NB-IoT), based on cellular communication technology, is a promising LPWAN technology. NB-IoT requires a minimum system bandwidth of 180 kHz for the downlink and the uplink, respectively, and is a new 3rd Generation Partnership Project (3GPP) radio access technology characterized by low power consumption, narrow bandwidth, strong coverage, low cost, and massive connectivity; NB-IoT can therefore be widely applied in scenarios with poor channel conditions (e.g., underground parking lots) or with delay-tolerant devices (e.g., water meters).
NB-IoT data communication mainly occurs on the uplink, and the uplink channel resources mainly comprise the Narrowband Physical Random Access Channel (NPRACH) and the Narrowband Physical Uplink Shared Channel (NPUSCH). The NPRACH is mainly used to start the random access procedure, and the NPUSCH mainly carries data transmission from the device to the base station. NB-IoT mainly targets delay-tolerant periodic devices (such as water meters and electricity meters), and how to complete device communication with lower energy consumption while guaranteeing the device throughput requirement is the problem considered here.
At present, existing energy-saving mechanisms lack a dynamic learning process; they mainly optimize the back-off time and the extended discontinuous reception (eDRX) mechanism of each device, thereby trading off device energy consumption against delay. In practice, however, multiple devices may transmit data simultaneously, and by scheduling the devices reasonably the number of devices accessing in each Transmission Time Interval (TTI) can be controlled, optimizing total communication energy consumption on the premise of guaranteeing throughput. Therefore, a scheduling mechanism based on the Q learning algorithm is designed so that the base station allows an appropriate number of devices to perform random access in each TTI according to parameters such as network load, the number of uplink resources, the data size, and the repetition count; on the premise of guaranteeing throughput, this reduces the energy consumption of the system and prolongs the service life of the devices.
Disclosure of Invention
In view of the above, the present invention is directed to a Q-learning-based energy saving method in NB-IoT. Through a Q learning algorithm, the base station flexibly adjusts the number of devices that initiate an access request in each TTI according to factors such as network load, delay, the number of uplink resources, the preamble size, and the repetition count, saving system energy and prolonging the service life of the devices on the premise of guaranteeing throughput. The method is simple and efficient, and also has a certain degree of portability.
In order to achieve the purpose, the invention provides the following technical scheme:
An energy-saving method based on Q learning in NB-IoT comprises the following steps:
S1: defining a state set and an action set of the base station;
S2: at time t = 0, initializing the state of the base station and the Q values of its actions to 0;
S3: calculating the state value of the initial state s_t of the base station;
S4: selecting an action a_t(i) according to the ε-greedy strategy;
S5: after performing action a_t(i), obtaining the environment reward value r_t from the reward function and entering the next state s_{t+1};
S6: updating the action Q-value function of the base station according to the update formula;
S7: t ← t + 1, go to step S2 (a minimal sketch of this loop is given below).
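For concreteness, a minimal sketch of this loop in Python follows. It is an illustration under assumptions rather than the patented implementation: the environment object env, its reset()/step() interface, the state encoding, and the exploration probability are hypothetical; only the action set and the values α = 0.01 and γ = 0.8 are taken from the description.

```python
import random
from collections import defaultdict

ACTIONS = [0.2, 0.4, 0.6, 0.8, 1.0]      # ratio of active devices admitted per TTI
ALPHA, GAMMA, EPSILON = 0.01, 0.8, 0.1   # alpha and gamma as stated; epsilon assumed

# S2: Q values of every (state, action) pair start at 0
Q = defaultdict(lambda: [0.0] * len(ACTIONS))

def choose_action(state):
    """S4: epsilon-greedy selection over the action indices."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    q = Q[state]
    return q.index(max(q))

def run(env, num_tti):
    state = env.reset()                              # S3: initial state (assumed hashable)
    for _ in range(num_tti):                         # S7: t <- t + 1
        a = choose_action(state)
        next_state, reward = env.step(ACTIONS[a])    # S5: act, observe reward r_t
        best_next = max(Q[next_state])
        # S6: Q-value update with learning rate ALPHA and discount GAMMA
        Q[state][a] += ALPHA * (reward + GAMMA * best_next - Q[state][a])
        state = next_state
```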
Further, in step S1, the state set of the base station is represented as a series of previously observed information, S_t = {U_{t-1}, U_{t-2}, U_{t-3}, …, U_1}, where each observation U_t comprises the random access energy consumption, the device waiting energy consumption, the data transmission energy consumption, the number of waiting devices, the number of successfully communicating devices, and the number of devices whose access failed in the corresponding TTI.
Regarding the action set, the ratio of the number of devices allowed to initiate random access in each TTI to the total number of active devices in the current TTI is taken as the base station action; following a Markov process with a finite action set, the base station action in any t-th TTI is a_t ∈ {0.2, 0.4, 0.6, 0.8, 1.0}.
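As a small illustration of the action semantics, the sketch below maps a chosen ratio to a number of admitted devices; the rounding rule is an assumption, since the description only fixes the set of ratios.

```python
# Hypothetical mapping from the chosen ratio a_t to the number of devices
# admitted for random access in the current TTI (rounding rule assumed).
def admitted_devices(action_ratio: float, num_active: int) -> int:
    return round(action_ratio * num_active)

# Example: with 50 active devices and a_t = 0.4, 20 devices may start random access.
assert admitted_devices(0.4, 50) == 20
```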
Further, in step S2, the state of the base station and the Q values are set to a zero matrix. The solution objective of the base station's Markov decision process is to find an optimal policy π* such that the value V(s) of every state s is simultaneously maximized. The state value function can be written as

V^π(s_t) = r(s_t, a_t) + γ Σ_{s_{t+1}∈S} p(s_{t+1} | s_t, a_t) V^π(s_{t+1}),

where r(s_t, a_t) denotes the reward the base station obtains from the environment and p(s_{t+1} | s_t, a_t) denotes the probability that the base station transitions to state s_{t+1} after selecting action a_t in state s_t.
Further, in step S4, the goal of the base station is to obtain a higher reward, so in each state the action with the highest Q value would be selected. However, in the initial stage of learning the state-action experience is limited and the Q values cannot yet accurately represent the correct optimal values; always taking the action with the highest Q value makes the base station follow the same path, prevents it from exploring other, possibly better actions, and easily traps it in a local optimum. Therefore, an ε-greedy strategy is introduced, whose principle is as follows:
the agent randomly selects an action with a probability of epsilon and selects an action that maximizes the Q value with a probability of 1-epsilon.
Further, in step S5, the base station performs the selected action and obtains a reward value from the environment. The reward function is defined in terms of the number of served devices, the total number of transmission devices N, the number of TTIs T, and E_t, the total system energy consumption in the t-th TTI. In addition, n_t denotes the number of devices allowed to access in the current TTI, r denotes the repetition count, μ denotes the transmission data resource, Q denotes the total uplink resource, and m_i denotes the number of preambles. The total energy consumption decomposes as

E_t = E_sy,t + E_ra,t + E_wait,t + E_dt,t,

where E_sy,t denotes the synchronization energy consumption, E_ra,t the random access energy consumption, E_wait,t the device waiting energy consumption, and E_dt,t the data transmission energy consumption.
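The exact reward expression is fixed by the patent's formula; the sketch below only illustrates, under assumptions, how the energy terms and the served-device count could be combined into a per-TTI reward that favors throughput and penalizes total energy. The weighting and normalization are hypothetical.

```python
# Illustrative (hypothetical) per-TTI reward: reward served devices, penalize
# the total energy E_t = E_sy,t + E_ra,t + E_wait,t + E_dt,t.
def tti_energy(e_sync: float, e_ra: float, e_wait: float, e_dt: float) -> float:
    return e_sync + e_ra + e_wait + e_dt          # E_t as decomposed above

def tti_reward(num_served: int, total_devices: int, energy: float,
               energy_weight: float = 0.01) -> float:
    throughput_term = num_served / total_devices  # fraction of devices served
    return throughput_term - energy_weight * energy

# Example: 20 of 50 devices served at a total energy of 5.0 units.
print(tti_reward(20, 50, tti_energy(1.0, 2.0, 1.5, 0.5)))   # 0.4 - 0.05 = 0.35
```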
Further, in step S6, after acquiring the reward from the environment, the base station updates the Q matrix according to

Q(s_t, a_t) ← (1 − α) Q(s_t, a_t) + α [ r(s_t, a_t) + γ max_a Q(s_{t+1}, a) ],

where α denotes the learning rate with 0 < α < 1 and γ denotes the discount factor with 0 ≤ γ < 1. The learning rate and the discount factor jointly regulate how the Q matrix is updated and thereby determine the learning performance of the Q algorithm; here α = 0.01 and γ = 0.8.
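A single numeric update step with the stated α = 0.01 and γ = 0.8 may clarify the formula; the state labels, the stored Q values, and the reward are made up for illustration.

```python
# One Q-learning update with alpha = 0.01 and gamma = 0.8 (values from the text).
# The states "s0"/"s1", the stored Q values and the reward 0.4 are illustrative.
alpha, gamma = 0.01, 0.8
Q = {("s0", 0.4): 0.10, ("s1", 0.2): 0.30, ("s1", 0.6): 0.50}

s, a, r, s_next = "s0", 0.4, 0.4, "s1"
best_next = max(q for (state, _), q in Q.items() if state == s_next)    # 0.50
Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
print(Q[(s, a)])   # 0.10 + 0.01 * (0.4 + 0.8 * 0.5 - 0.10) = 0.107
```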
The beneficial effect of the invention is that, through the Q learning algorithm, the energy consumption of the system can be reduced while the device throughput is guaranteed.
Drawings
FIG. 1 is a diagram of a Q learning and environment interaction process model;
FIG. 2 is a diagram of the steps of an energy-saving algorithm based on Q learning;
FIG. 3 is a flow diagram of NB-IoT uplink communication.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The invention provides an energy-saving method based on Q learning in NB-IoT, addressing the energy consumption problem of NB-IoT systems. Compared with traditional optimization algorithms, the invention dynamically optimizes the transmitting devices through the Q learning algorithm, and the base station flexibly adjusts the number of accessing devices according to the real-time state of the network. The process is shown in FIG. 1: first, in a given state, the base station executes an action based on the ε-greedy strategy according to the current environment; it then observes the environment to obtain the reward value, updates the Q function value according to the update formula, and determines the action for the next state; these steps are repeated until convergence.
The specific algorithm steps are shown in FIG. 2. In the iteration of the Q learning algorithm, the state set is defined as S; at decision time t, s_t ∈ S denotes the state of the base station. Likewise, the finite set of actions the base station may perform is defined as A, and a_t ∈ A denotes the action of the base station at time t. The reward function r(s_t, a_t) denotes the reward value the base station obtains from the environment after performing action a_t in state s_t; the base station then transitions from state s_t to s_{t+1}, and at the next decision time t + 1 the Q function is updated. These steps are repeated until the iteration terminates.
The NB-IoT uplink transmission procedure is shown in FIG. 3. First, the base station sends a Narrowband Primary Synchronization Signal (NPSS) and a Narrowband Secondary Synchronization Signal (NSSS) to the device so that the device becomes synchronized in time and frequency with the cell; this is the synchronization phase. The device then sends access request information through the NPRACH; the base station receives the device's request and responds through the Narrowband Physical Downlink Control Channel (NPDCCH), after which the base station establishes a connection with the device. Once the connection is established, scheduling is signalled and data transmission is carried out on the NPUSCH.
The Q learning algorithm builds on the Markov Decision Process (MDP) framework. In the NB-IoT energy-saving algorithm, based on the working principle of Q learning, the state set is represented as S_t = {U_{t-1}, U_{t-2}, U_{t-3}, …, U_1}, where each observation U_t comprises the random access energy consumption, the device waiting energy consumption, the data transmission energy consumption, the number of waiting devices, the number of successfully communicating devices, and the number of devices whose access failed.
The ratio of the number of devices allowed to initiate random access in each TTI to the total number of active devices in the current TTI is taken as the base station action, and the action set of the base station is A = {a(1), a(2), …, a(k)}. Following a Markov process with a finite action set, the base station action in any t-th TTI is a_t ∈ {0.2, 0.4, 0.6, 0.8, 1.0}.
The base station faces the task of deciding an optimal policy that maximizes the reward obtained; it makes the best decision on the next state/action based on the current state and the environment. The discounted cumulative reward value function of state s_t can be expressed as

V^π(s_t) = r(s_t, a_t) + γ Σ_{s_{t+1}∈S} p(s_{t+1} | s_t, a_t) V^π(s_{t+1}),

where r(s_t, a_t) denotes the immediate reward the base station earns by selecting action a_t in state s_t, γ is the discount factor with 0 ≤ γ < 1 (a discount factor tending to 0 means the base station mainly considers immediate rewards), and p(s_{t+1} | s_t, a_t) denotes the probability of transitioning from state s_t to s_{t+1} when the base station selects action a_t. The objective of solving the MDP is to find an optimal policy π* such that the value V(s) of every state s is simultaneously maximized. According to Bellman's principle of optimality, when the total expected discounted reward of the base station is maximal there exists at least one optimal policy π* such that

V*(s_t) = max_π V^π(s_t),

where V*(s_t) denotes the maximum discounted cumulative reward obtained by starting from state s_t and following the optimal policy π*. A policy π is a function that maps the state space to the action space, i.e., π: s_t → a_t. The optimal policy can therefore be expressed as

π*(s_t) = arg max_{a_t} [ r(s_t, a_t) + γ Σ_{s_{t+1}∈S} p(s_{t+1} | s_t, a_t) V*(s_{t+1}) ].
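Once the Q values have converged, the greedy policy can be read directly from the Q table; a brief sketch, assuming one list of Q values per state, follows.

```python
# Greedy policy extraction from a learned Q table (sketch; the table layout
# with one list of Q values per state is an assumption).
ACTIONS = [0.2, 0.4, 0.6, 0.8, 1.0]

def greedy_action(q_table: dict, state) -> float:
    """Return the action ratio with the highest Q value in the given state."""
    q_values = q_table[state]
    return ACTIONS[q_values.index(max(q_values))]

q_table = {"s0": [0.1, 0.3, 0.9, 0.2, 0.0]}
print(greedy_action(q_table, "s0"))   # 0.6
```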
The base station aims to obtain a higher reward, so in each state the action with the highest Q value would be selected. However, in the initial stage of learning the state-action experience is limited and the Q values cannot yet accurately represent the correct optimal values; always taking the action with the highest Q value makes the base station follow the same path, prevents it from exploring other, possibly better actions, and easily traps it in a local optimum. To overcome this drawback the base station must select actions at random from time to time; therefore, an ε-greedy strategy is introduced, which reduces the possibility that the base station's action selection policy falls into a locally optimal solution.
The agent randomly selects an action with a probability of epsilon and selects an action that maximizes the Q value with a probability of 1-epsilon.
Further, in step S5, the base station performs the selected action and obtains a reward value from the environment. The reward function is defined in terms of the number of served devices, the total number of transmission devices N, the number of TTIs T, and E_t, the total system energy consumption in the t-th TTI. Here n_t denotes the number of devices allowed to access in the current TTI, r denotes the repetition count, μ denotes the transmission data resource, Q denotes the total uplink resource, and m_i denotes the number of preambles. The total energy consumption decomposes as

E_t = E_sy,t + E_ra,t + E_wait,t + E_dt,t,

where E_sy,t denotes the synchronization energy consumption, E_ra,t the random access energy consumption, E_wait,t the device waiting energy consumption, and E_dt,t the data transmission energy consumption.
In the Q learning algorithm, under a policy π, the base station recursively computes the Q-value function at each TTI as

Q^π(s_t, a_t) = r(s_t, a_t) + γ Σ_{s_{t+1}∈S} p(s_{t+1} | s_t, a_t) V^π(s_{t+1}).

The Q value therefore indicates the expected discounted reward earned when the base station performs action a_t in state s_t and thereafter follows policy π. Our goal is to evaluate the Q values under the optimal policy π*. From the above equation, the relationship between the state value function and the action value function follows as

V*(s_t) = max_{a_t} Q*(s_t, a_t).

However, in a non-deterministic environment the above Q-value function holds only under the optimal policy; that is, under a non-optimal policy the Q-value function obtained by Q learning keeps changing (or does not converge). Therefore, the corrected update of the Q function is

Q(s_t, a_t) ← (1 − α) Q(s_t, a_t) + α [ r(s_t, a_t) + γ max_a Q(s_{t+1}, a) ],

where α denotes the learning rate with 0 < α < 1; a larger learning rate means less of the previous training is retained. If each state-action pair can be visited repeatedly and the learning rate decays according to an appropriate schedule, the Q-learning algorithm converges to the optimal policy for any finite MDP. γ denotes the discount factor with 0 ≤ γ < 1 and expresses how much weight is placed on future rewards: a higher γ captures long-term rewards, while a lower γ makes the agent care more about the immediate reward. The learning rate and the discount factor jointly regulate the update of the Q matrix and thus the learning performance of the Q learning algorithm; here α = 0.01 and γ = 0.8.
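The convergence condition mentioned above requires the learning rate to decay as state-action pairs are revisited. A possible visit-count-based schedule is sketched below; the 1/(1 + n) form is an assumption, since the description itself fixes α at 0.01.

```python
# Hypothetical visit-count-based learning-rate schedule of the kind needed for
# the convergence guarantee (the description itself uses a fixed alpha = 0.01).
from collections import defaultdict

visit_count = defaultdict(int)

def decayed_alpha(state, action) -> float:
    visit_count[(state, action)] += 1
    return 1.0 / (1.0 + visit_count[(state, action)])

print([round(decayed_alpha("s0", 0.4), 3) for _ in range(4)])   # [0.5, 0.333, 0.25, 0.2]
```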
Finally, it is noted that the above-mentioned preferred embodiments illustrate rather than limit the invention, and that, although the invention has been described in detail with reference to the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the scope of the invention as defined by the appended claims.
Claims (1)
- A method for Q-learning-based energy saving in NB-IoT, the method comprising the following steps:
S1: defining a state set and an action set of a base station, the state set being defined as a series of previously observed information, i.e., S_t = {U_{t-1}, U_{t-2}, U_{t-3}, …, U_1}, where each observation comprises the random access energy consumption, the device waiting energy consumption, the data transmission energy consumption, the number of waiting devices, the number of successfully communicating devices, and the number of devices whose access failed; the action set being defined as the ratio of the number of devices allowed to initiate random access in each TTI to the total number of active devices in the current TTI;
S2: at time t = 0, setting the state of the base station and the action Q values to a zero matrix;
S3: selecting an action a_t(i) according to the ε-greedy method: in the initial stage of learning the state-action experience is limited and the Q values cannot accurately represent the correct optimal values; always taking the action with the highest Q value makes the base station follow the same path, prevents it from exploring other, better values, and easily traps it in a local optimum; therefore an ε-greedy strategy is introduced, in which the agent randomly selects an action with probability ε and selects the action that maximizes the Q value with probability 1 − ε, i.e., a_t = arg max_a Q(s_t, a);
S4: after performing action a_t(i), the system obtains the environment reward value R_t according to the reward function and then enters the next state s_{t+1}; the reward function is defined in terms of the number of served devices, the total number of transmission devices N, the number of TTIs T, and E_t, the total system energy consumption in the t-th TTI, where n_t denotes the number of devices allowed to access in the current TTI, r the repetition count, μ the transmission data resource, Q the total uplink resource, and m_i the number of preambles; E_t = E_sy,t + E_ra,t + E_wait,t + E_dt,t, where E_sy,t denotes the synchronization energy consumption, E_ra,t the random access energy consumption, E_wait,t the device waiting energy consumption, and E_dt,t the data transmission energy consumption;
S5: updating the action Q-value function of the base station according to the Q matrix update formula Q(s_t, a_t) ← (1 − α) Q(s_t, a_t) + α [ r(s_t, a_t) + γ max_a Q(s_{t+1}, a) ], where r(s_t, a_t) is the reward obtained by the agent for performing action a_t in state s_t, α denotes the learning rate with 0 < α < 1, and γ denotes the discount factor with 0 ≤ γ < 1; the learning rate and the discount factor jointly regulate the update of the Q matrix and thus the learning performance of the Q algorithm, with α = 0.01 and γ = 0.8;
S6: t ← t + 1, go to step S2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110074159.8A CN112867117B (en) | 2021-01-20 | 2021-01-20 | Energy-saving method based on Q learning in NB-IoT |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110074159.8A CN112867117B (en) | 2021-01-20 | 2021-01-20 | Energy-saving method based on Q learning in NB-IoT |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112867117A CN112867117A (en) | 2021-05-28 |
CN112867117B true CN112867117B (en) | 2022-04-12 |
Family
ID=76007591
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110074159.8A Active CN112867117B (en) | 2021-01-20 | 2021-01-20 | Energy-saving method based on Q learning in NB-IoT |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112867117B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114567920B (en) * | 2022-02-23 | 2023-05-23 | 重庆邮电大学 | Mixed discontinuous receiving method for policy optimization MTC (machine type communication) equipment |
CN114727423A (en) * | 2022-04-02 | 2022-07-08 | 北京邮电大学 | Personalized access method in GF-NOMA system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110809274A (en) * | 2019-10-28 | 2020-02-18 | 南京邮电大学 | Narrowband Internet of things-oriented unmanned aerial vehicle base station enhanced network optimization method |
CN110856234A (en) * | 2019-11-20 | 2020-02-28 | 廊坊新奥燃气设备有限公司 | Energy-saving method and system for NB-IoT meter based on PSM access mode |
CN111970703A (en) * | 2020-06-24 | 2020-11-20 | 重庆邮电大学 | Method for optimizing uplink communication resources in NB-IoT (NB-IoT) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017149480A1 (en) * | 2016-03-01 | 2017-09-08 | Telefonaktiebolaget Lm Ericsson (Publ) | Energy efficient operation of radio network nodes and wireless communication devices in nb-iot |
- 2021-01-20: CN application CN202110074159.8A filed; granted as patent CN112867117B (status: Active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110809274A (en) * | 2019-10-28 | 2020-02-18 | 南京邮电大学 | Narrowband Internet of things-oriented unmanned aerial vehicle base station enhanced network optimization method |
CN110856234A (en) * | 2019-11-20 | 2020-02-28 | 廊坊新奥燃气设备有限公司 | Energy-saving method and system for NB-IoT meter based on PSM access mode |
CN111970703A (en) * | 2020-06-24 | 2020-11-20 | 重庆邮电大学 | Method for optimizing uplink communication resources in NB-IoT (NB-IoT) |
Non-Patent Citations (4)
Title |
---|
"Introduction of NB-IoT";Huawei;《3GPP TSG-RAN WG2 NB-IOT Ad-hoc#2 R2-163218》;20160429;全文 * |
Energy-efficient joint power control and resource allocation for cluster-based NB-IoT cellular networks;Zhu shuqiong, Wu Wenquan, Feng Lei, et al.;《Transactions on Emerging Telecommunications Technologies》;20171227;全文 * |
异形磁电复合材料增强磁电效应的理论和实验研究;张茹;《中国博士学位论文电子期刊网》;20190115;全文 * |
认知无线电网络中的资源优化分配的研究;裴二荣;《中国博士学位论文电子期刊网》;20121215;全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN112867117A (en) | 2021-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112867117B (en) | Energy-saving method based on Q learning in NB-IoT | |
Zhao et al. | A reinforcement learning method for joint mode selection and power adaptation in the V2V communication network in 5G | |
US12035380B2 (en) | Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning | |
CN109462839B (en) | DRX mechanism communication method based on self-adaptive adjustment strategy | |
CN113490184B (en) | Random access resource optimization method and device for intelligent factory | |
Chen et al. | Heterogeneous machine-type communications in cellular networks: Random access optimization by deep reinforcement learning | |
CN107820309B (en) | Wake-up strategy and time slot optimization algorithm for low-power-consumption communication equipment | |
CN109890085B (en) | Method for determining random access back-off parameters of priority-classified machine type communication | |
CN107094281B (en) | Access method and system for M2M equipment to access base station | |
CN110602798A (en) | Distributed determination method for optimal parameters of LTE network machine communication random access | |
Zhao et al. | Deep reinforcement learning aided intelligent access control in energy harvesting based WLAN | |
Jiang et al. | Q-learning based task offloading and resource allocation scheme for internet of vehicles | |
Wei et al. | Power allocation in HetNets with hybrid energy supply using actor-critic reinforcement learning | |
CN115766089B (en) | Anti-interference optimal transmission method for energy acquisition cognitive Internet of things network | |
CN105142208B (en) | It is embedded in the power and slot allocation method of high energy efficiency in the cellular network of M2M | |
Miao et al. | A DDQN-based Energy-Efficient Resource Allocation Scheme for Low-Latency V2V communication | |
Wang et al. | Deep reinforcement learning based joint partial computation offloading and resource allocation in mobility-aware MEC system | |
Mazandarani et al. | Self-sustaining multiple access with continual deep reinforcement learning for dynamic metaverse applications | |
Li et al. | A Lightweight Transmission Parameter Selection Scheme Using Reinforcement Learning for LoRaWAN | |
Wu et al. | Computation rate maximization in multi-user cooperation-assisted wireless-powered mobile edge computing with OFDMA | |
Song et al. | Deep Reinforcement Learning Enabled Energy-Efficient Resource Allocation in Energy Harvesting Aided V2X Communication | |
Gu et al. | Deep reinforcement learning-guided task reverse offloading in vehicular edge computing | |
Zhao et al. | Deep Reinforcement Learning for the Joint AoI and Throughput Optimization of the Random Access System | |
Li | Deep reinforcement learning based resource allocation for LoRaWAN | |
Xu et al. | Energy efficiency and delay determinacy tradeoff in energy harvesting-powered zero-touch deterministic industrial M2M communications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 2023-03-24
Address after: No. 8 Yuen Road, Nan'an District, Chongqing, 401336
Patentee after: CHINA MOBILE IOT Co., Ltd.
Address before: No. 2 Chongwen Road, Nan'an District, Chongqing, 400065
Patentee before: CHONGQING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS