CN114980353A - Ordered competition large-scale access learning method for machine type communication system - Google Patents

Ordered competition large-scale access learning method for machine type communication system

Info

Publication number
CN114980353A
Authority
CN
China
Prior art keywords
preamble
devices
state
equipment
random access
Prior art date
Legal status
Pending
Application number
CN202210472683.5A
Other languages
Chinese (zh)
Inventor
孙君 (Sun Jun)
郭兴康 (Guo Xingkang)
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202210472683.5A
Publication of CN114980353A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/70: Services for machine-to-machine communication [M2M] or machine type communication [MTC]
    • H04W74/00: Wireless channel access
    • H04W74/002: Transmission of channel access control information
    • H04W74/006: Transmission of channel access control information in the downlink, i.e. towards the terminal
    • H04W74/02: Hybrid access
    • H04W74/08: Non-scheduled access, e.g. ALOHA
    • H04W74/0833: Random access procedures, e.g. with 4-step access
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses an ordered competition large-scale access learning method for a machine type communication system, which comprises the following steps: newly accessed devices use a multi-agent reinforcement learning algorithm to cooperatively select channels that meet their own requirements; a device sends a preamble on the physical random access channel, and after receiving the request the base station sends a response containing a specific number assigned to the device under the selected preamble, the number being determined by the device's own priority; the device with the smallest number sends data on the physical uplink shared channel, the remaining numbered devices are in a waiting state, and each waiting device's number decreases by one every unit time. The invention weakens the randomness of the contention process: newly accessed devices cooperatively select suitable preambles through the multi-agent reinforcement learning algorithm and are ordered according to priority, while the minimum delay requirement of devices that have not yet been successfully accessed is guaranteed.

Description

Ordered competition large-scale access learning method for machine type communication system
Technical Field
The invention belongs to the technical field of wireless communication, and in particular relates to an ordered competition large-scale access learning method for large-scale machine type communication (M2M, Machine-to-Machine) in the Internet of Things.
Background
With the rapid development of communication technology, communication services have gradually evolved from traditional human-to-human communication to machine-to-machine communication, known as the Internet of Things (IoT). It is expected that by 2023 the number of devices performing M2M (Machine-to-Machine) communication worldwide will exceed 35 billion, which poses serious challenges to existing cellular networks; fifth-generation mobile communication (5G) technology has therefore become the focus of machine-type-communication development, with mMTC (massive Machine Type Communication) being one of the three 5G application scenarios. To meet the massive access requirements of machine type devices, further control optimization is needed on top of the current congestion control mechanisms. Meanwhile, considering the different Quality of Service (QoS) requirements of Machine Type Communication Devices (MTCDs) of different traffic types, the access requirements of MTCDs of different traffic types also differ. Therefore, the problem of partitioning preamble resources when MTCDs of multiple traffic types access the cellular network should also be solved. In a scenario where MTCDs of multiple traffic types perform random access simultaneously, the access delay, the number of collisions and the access fairness of the various traffic types must be considered while improving system throughput. However, most conventional access schemes are based on random contention and usually adopt a dynamic ACB (Access Class Barring) factor together with back-off to mitigate collisions; although this can effectively alleviate the collision problem, collisions still occur. Therefore, a new ordered competition access scheme is needed to solve the collision problem.
Disclosure of Invention
The technical problem to be solved is as follows: aiming at the collision problem of conventional contention-based random access, the invention provides a new ordered competition access scheme. Unlike conventional schemes, the contention is not blind but targeted, which weakens the randomness of the contention process. Each device has its own priority and minimum delay requirement; devices that would otherwise collide enter a queuing state, newly accessed devices cooperatively select suitable preambles through a multi-agent reinforcement learning algorithm and are ordered according to priority, and at the same time the minimum delay requirement of devices that have not yet been successfully accessed must be guaranteed.
The technical scheme is as follows:
An ordered competition large-scale access learning method for a machine type communication system comprises the following steps:
S1, before random access starts, the base station allocates the radio resources of the physical random access channel and the physical uplink shared channel and broadcasts them to all mobile devices; each physical random access channel corresponds to a unique preamble;
S2, when random access starts, the base station broadcasts the number of devices currently waiting under each preamble, and newly accessed devices use a multi-agent reinforcement learning algorithm to cooperatively select channels that meet their own requirements; specifically, a device sends a preamble on the physical random access channel, and after receiving the request the base station sends a response containing a specific number assigned to the device under the selected preamble, the number being determined by the device's own priority; the device with the smallest number sends data on the physical uplink shared channel, the remaining numbered devices are in a waiting state, and each waiting device's number decreases by one every unit time;
the multi-agent reinforcement learning algorithm is combined with the number of devices waiting for the preamble sequences, the channel requirements of the devices, the delay tolerance and the priority of the devices at the same time, so that each device is in the maximum delay tolerance of the device and each preamble sequence preamble i The lower lengths are evenly distributed.
Further, in step S2, the number of each newly accessed device is determined according to the following device priority function:
$$\mathrm{priority\_func}(t) = P_{\mathrm{MTCD}} + k\,(t - t_0)$$
wherein $t \ge t_0$; the constant $P_{\mathrm{MTCD}}$ indicates the priority level of the device itself; the symbol $k$ represents the growth rate of the priority, which differs for different devices; the symbol $t$ denotes the current time and $t_0$ the time at which the device entered the queue.
Further, in step S3, it is assumed that there are m preambles, and the device queue corresponding to the i-th preamble is $preamble_i$, i = 1, 2, 3, ..., m; the total number of devices over all preamble queues at time t is represented as
$$N_t = n_{t,\mathrm{access}} + \sum_{i=1}^{m} n_{t,i}$$
where $n_{t,i}$ is the number of devices waiting in preamble queue $preamble_i$ and $n_{t,\mathrm{access}}$ is the number of newly accessed devices; the maximum tolerated delay of the j-th device in preamble queue $preamble_i$ is expressed as $T^{\max}_{i,j}$.
The objective function of the multi-agent reinforcement learning algorithm is expressed as:
$$\min_{\{x_{t,i}\}} \; f_t = \frac{1}{m}\sum_{i=1}^{m}\Bigl(n_{t,i}+x_{t,i}-\bar{L}_t\Bigr)^2, \qquad \bar{L}_t=\frac{1}{m}\sum_{i=1}^{m}\bigl(n_{t,i}+x_{t,i}\bigr)$$
$$\text{s.t.}\quad \sum_{i=1}^{m} x_{t,i}=n_{t,\mathrm{access}}, \qquad n_{t,i}+x_{t,i}\le T^{\max}_{i,\mathrm{last}}, \quad i=1,\dots,m$$
wherein $x_{t,i}$ indicates the number of newly accessed devices that select the i-th preamble at time t, $preamble_{i,\mathrm{last}}$ indicates the last device under the i-th preamble sequence after the new devices' decisions, and $T^{\max}_{i,\mathrm{last}}$ represents the maximum tolerated delay of that last device; $f_t$ is the variance of the queue lengths under the m preamble queues at time t, i.e. the objective function is to minimize this variance.
Further, in step S3, the process by which the newly accessed devices use the multi-agent reinforcement learning algorithm to cooperatively select channels satisfying their own requirements includes the following steps:
S31, constructing a state set S: the state set S is used to represent the state of the whole access environment and is composed of t + 1 states, $S = \{s_0, s_1, \dots, s_t\}$; each state includes the device information under each preamble sequence, $s_t = \{preamble_{1,t}, preamble_{2,t}, \dots, preamble_{m,t}\}$;
S32, constructing an action set A: the action set A is used to represent the action $a_t$ taken by each agent according to the current state $s_t$ and its own decision policy $\pi$; $a_t$ is a one-dimensional array of length m with $a_{i,t} \in \{0, 1\}$, where a value of 1 means the device selects the i-th preamble and a value of 0 means it does not;
S33, constructing a reward R: after the agents take actions, the state of the current environment changes, and an environmental return with corresponding reward $r_{j,t}$ is generated, expressed as:
$$r_{j,t} = \begin{cases} 1, & n_{t,i}+x_{t,i} \le T^{\max}_{i,\mathrm{last}} \text{ for the preamble } i \text{ selected by device } j,\\ -1, & \text{otherwise}; \end{cases}$$
S34, adopting deep reinforcement learning to construct a neural network whose input is the action $a_t$ and the state $s_t$ and whose output is the Q value of the action, $Q_k(s_t, a_t)$, and using a target neural network to calculate the Q value of the next state $s_{t+1}$, $Q_k(s_{t+1}, a')$, which is updated as:
$$Q_{k+1}(s_t, a_t) = Q_k(s_t, a_t) + \alpha_k \Bigl[\, r_{t+1} + \gamma \max_{a' \in A} Q_k(s_{t+1}, a') - Q_k(s_t, a_t) \Bigr]$$
wherein $\alpha_k$ and $\gamma$ are the learning rate and discount factor respectively, $s_{t+1}$ and $r_{t+1}$ denote the next state and the reward obtained after taking the action in state $s_t$, $a'$ denotes an action executable in state $s_{t+1}$, $A$ is the set of executable actions, and $\max_{a' \in A} Q_k(s_{t+1}, a')$ represents the maximum Q value over the action set $A$ in state $s_{t+1}$; an ε-greedy strategy is adopted in the search for this maximum; the loss function E is expressed as:
$$E = \mathbb{E}\Bigl[\bigl(Q_{k+1}(s_t, a_t) - Q_k(s_t, a_t)\bigr)^2\Bigr]$$
S35, updating the weight θ of the neural network by a gradient descent method.
Further, in step S35, the process of updating the weight θ of the neural network by using the gradient descent method includes the following steps:
S351, randomly initializing the weights θ of the neural network and the action $a_j$ of each agent j, and setting every preamble queue $preamble_i$ to length 0;
S352, calculating the priority of the newly accessed devices, setting the convergence threshold of the loss function E, and initializing $\alpha_k$, $\gamma$, $\varepsilon$;
S353, each agent makes a decision using an ε-greedy strategy according to the current state information;
S354, updating the state of the environment $s_{t+1}$ and the reward $r_{t+1}$;
S355, storing $s_t$, $a_t$, $s_{t+1}$, $r_{t+1}$ for experience replay;
S356, repeating steps S353 to S355 to accumulate experience; randomly extracting a certain number of samples from the accumulated experience, calculating the loss function E from these samples, and updating the weight θ;
S357, repeating steps S353 to S356 until the loss function E reaches the convergence condition or the maximum number of iterations T.
Advantageous effects:
(1) Different from the traditional random contention access mode, the ordered competition large-scale access learning method for the machine type communication system solves the collision problem by adopting ordered contention access and allows more mobile devices (MTCDs) to access at the same scale.
(2) In the ordered competition large-scale access learning method for the machine type communication system, when a mobile device (MTCD) makes a decision, a suitable preamble is cooperatively selected by means of a multi-agent reinforcement learning algorithm; the learning algorithm adapts better to environmental changes and improves the convergence rate.
Drawings
Fig. 1 is a diagram of an ordered contention based access model according to an embodiment of the present invention.
FIG. 2 is a model diagram of multi-agent reinforcement learning based on an embodiment of the present invention.
Fig. 3 is a model diagram of each preamble sequence according to an embodiment of the present invention.
FIG. 4 is a diagram of a neural network architecture for a multi-agent embodiment of the present invention.
Detailed Description
The following examples are presented to enable one of ordinary skill in the art to more fully understand the present invention and are not intended to limit the invention in any way.
This embodiment provides an ordered competition large-scale access learning method for a machine type communication system, which comprises the following steps:
S1, before random access starts, the base station allocates the radio resources of the physical random access channel and the physical uplink shared channel and broadcasts them to all mobile devices; each physical random access channel corresponds to a unique preamble.
S2, when random access starts, the base station broadcasts the number of devices currently waiting under each preamble, and newly accessed devices use a multi-agent reinforcement learning algorithm to cooperatively select channels that meet their own requirements. Specifically, a device sends a preamble on the physical random access channel; after receiving the request, the base station sends a response that contains a specific number assigned to the device under the selected preamble, the number being determined by the device's own priority. The device with the smallest number sends data on the physical uplink shared channel, the remaining numbered devices are in a waiting state, and each waiting device's number decreases by one every unit time.
The multi-agent reinforcement learning algorithm jointly considers the number of devices waiting under each preamble sequence, the channel requirements of the devices, their delay tolerances and their priorities, so that every device stays within its maximum delay tolerance and the queue lengths under the preamble sequences $preamble_i$ are evenly distributed.
In the scenario of a single Base Station (BS), there are a number of mobile devices (MTCDs). Under conventional schemes, users can be subdivided into newly accessed devices and devices that have collided and backed off; under the present invention such devices do not randomly contend for a preamble again but enter a queuing state. Before Random Access (RA) starts, the base station allocates the radio resources of the Physical Random Access Channel (PRACH) and the Physical Uplink Shared Channel (PUSCH) and broadcasts them to all MTCDs.
Referring to fig. 1, when RA starts, the base station broadcasts the number of devices waiting under each preamble, so that a newly accessed device can decide on a suitable channel according to its own requirements. Different from the traditional random access mode, the device sends a preamble on the PRACH; after receiving the request, the base station sends a response that contains a specific number assigned to the device under the selected preamble, the number being determined by the device's own priority. The device with the smallest number transmits data on the PUSCH; the remaining numbered devices are all in a waiting state, and each waiting device's number decreases by one every unit time. The priority function of a device is as follows:
$$\mathrm{priority\_func}(t) = P_{\mathrm{MTCD}} + k\,(t - t_0)$$
wherein $t \ge t_0$. The priority function is composed of two parts: the constant $P_{\mathrm{MTCD}}$ indicates the priority level of the device itself, while the symbol $k$ represents the growth rate of the priority, which differs for different devices; the symbol $t$ denotes the current time and $t_0$ the time at which the device entered the queue. The priority level of a waiting device in the queue therefore increases, to a different extent for different devices, with every unit of time.
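As a minimal illustrative sketch only (the function names priority_func and assign_numbers and the numerical values are placeholders, not taken from the disclosure), the priority function and the resulting numbering of devices under one preamble can be written as:

```python
# Illustrative sketch of priority_func(t) = P_MTCD + k*(t - t0) and of numbering devices by it;
# all names and parameter values here are placeholders, not part of the disclosure.

def priority_func(t, p_mtcd, k, t0):
    """Priority of a device at time t >= t0: its own priority level plus linear growth while waiting."""
    assert t >= t0
    return p_mtcd + k * (t - t0)

def assign_numbers(devices, t):
    """Number the devices waiting under one preamble by current priority:
    higher priority -> smaller number -> served earlier."""
    ranked = sorted(devices,
                    key=lambda d: priority_func(t, d["P_MTCD"], d["k"], d["t0"]),
                    reverse=True)
    return {d["id"]: number for number, d in enumerate(ranked, start=1)}

# Two devices with different base priorities, growth rates and queue-entry times.
devices = [
    {"id": "A", "P_MTCD": 2.0, "k": 0.5, "t0": 0},
    {"id": "B", "P_MTCD": 3.0, "k": 0.1, "t0": 4},
]
print(assign_numbers(devices, t=10))   # {'A': 1, 'B': 2}: A's priority (7.0) has grown past B's (3.6)
```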
Compared with the traditional random access mode, devices send their data on the PUSCH separately in the time dimension, so that the steps of collision and back-off are avoided. The MTCD arrival model is assumed to follow a Beta distribution, as follows:
$$p(t) = \frac{t^{\alpha-1}\,(T-t)^{\beta-1}}{T^{\alpha+\beta-1}\,\mathrm{B}(\alpha,\beta)}, \qquad 0 \le t \le T$$
wherein α = 3 and β = 4, T is the arrival period over which the MTCDs are activated, and B(α, β) is the Beta function.
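For illustration (not part of the disclosure), arrival instants following the assumed Beta(α = 3, β = 4) model over an arrival period can be generated as follows; the function and variable names are placeholders:

```python
# Illustrative sketch: drawing MTCD activation instants from the assumed Beta(alpha=3, beta=4)
# arrival model over an arrival period of length T_period (names here are placeholders).

import numpy as np

rng = np.random.default_rng(0)

def mtcd_arrival_times(n_devices, T_period, alpha=3.0, beta=4.0):
    """Activation instants of n_devices MTCDs, Beta(alpha, beta)-distributed over [0, T_period]."""
    return T_period * rng.beta(alpha, beta, size=n_devices)

arrivals = mtcd_arrival_times(n_devices=1000, T_period=10.0)
per_slot, _ = np.histogram(arrivals, bins=10, range=(0.0, 10.0))
print(per_slot)   # bursty per-unit-time arrival counts, peaking around 0.4 * T_period
```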
Referring to fig. 2, for device decision-making a multi-agent reinforcement learning algorithm is adopted in which the devices cooperate with one another; each device can be regarded as an agent, and the decision of each agent affects the remaining agents because the queue length under each preamble changes as new devices join. A device's decision takes into account the number of devices waiting under each preamble sequence, its own channel requirements, its delay tolerance and the device priority.
Referring to fig. 3, assume there are m preambles, denoted $preamble_i$ with i = 1, 2, 3, ..., m. With $n_{t,i}$ waiting devices under each preamble queue, plus the number $n_{t,\mathrm{access}}$ of newly accessed devices, the total number of devices can be expressed as
$$N_t = n_{t,\mathrm{access}} + \sum_{i=1}^{m} n_{t,i}.$$
The maximum tolerated delay of each device can be expressed as $T^{\max}_{i,j}$ for the j-th device in queue $preamble_i$.
While devices are waiting, it must be ensured that the waiting time of each device does not exceed its threshold; if the number of devices exceeds a certain scale, it can no longer be guaranteed that every device stays within its own maximum delay tolerance. Assuming that the devices at the head of each queue successfully transmit data, so that a device at position j in a queue waits j unit times, the MTCD access success rate at time t may be represented as
$$P_{\mathrm{succ},t} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_{t,i}} \mathbb{1}\bigl\{\, j \le T^{\max}_{i,j} \,\bigr\}}{\sum_{i=1}^{m} n_{t,i}},$$
i.e. the fraction of waiting devices whose queueing delay does not exceed their own maximum tolerated delay.
when a newly accessed device uses a multi-agent to make a cooperation decision, each preamble sequence preamble needs to be as uniform as possible i Length of lower, i.e. in summary, object boxThe number may be expressed as:
Figure BDA0003623553390000064
Figure BDA0003623553390000065
wherein x is t,i Indicating the number of devices selecting the ith preamble among the newly accessed devices at time t, p remableii, last denotes the last device under the ith preamble sequence after the new device decides.
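As an illustrative numerical sketch only (the helper names queue_variance and feasible and the example numbers are placeholders, and the constraint encoding follows the reading of the published formulas given above), the variance objective and the tail-device delay constraint can be evaluated as:

```python
# Illustrative sketch (placeholder names; constraint encoding follows the reconstruction above):
# evaluate the queue-length-variance objective and the tail-device delay constraint for a
# candidate allocation x of the newly accessed devices to the m preambles.

import numpy as np

def queue_variance(n_wait, x_new):
    """Variance of the post-decision queue lengths n_{t,i} + x_{t,i} over the m preambles."""
    lengths = np.asarray(n_wait) + np.asarray(x_new)
    return float(np.var(lengths))

def feasible(n_wait, x_new, t_max_last, n_access):
    """All new devices are placed and no tail device exceeds its maximum tolerated delay."""
    lengths = np.asarray(n_wait) + np.asarray(x_new)
    return int(np.sum(x_new)) == n_access and bool(np.all(lengths <= np.asarray(t_max_last)))

n_wait = [4, 1, 2]          # devices already waiting under each of m = 3 preambles
t_max_last = [8, 6, 7]      # assumed max tolerated delay of the tail device per preamble
x_even = [0, 3, 2]          # allocation of 5 new devices that evens out the queues
x_skew = [5, 0, 0]          # allocation that piles every new device onto preamble 1

print(queue_variance(n_wait, x_even), feasible(n_wait, x_even, t_max_last, 5))   # 0.0 True
print(queue_variance(n_wait, x_skew), feasible(n_wait, x_skew, t_max_last, 5))   # ~12.67 False
```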
State set S: the state set is used to represent the entire access environment and is composed of t + 1 states, i.e. $S = \{s_0, s_1, \dots, s_t\}$; each state includes the device information under each preamble sequence, $s_t = \{preamble_{1,t}, preamble_{2,t}, \dots, preamble_{m,t}\}$.
Action set A: according to the current state, each agent takes an action $a_t$ following its decision policy $\pi$; $a_t$ is a one-dimensional array of length m with $a_{i,t} \in \{0, 1\}$, where a value of 1 indicates that the device selects the i-th preamble and a value of 0 indicates that it does not.
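A minimal encoding sketch (the helper names and the use of queue lengths as the per-preamble device information are illustrative assumptions): the state vector and the length-m one-hot action can be represented as:

```python
# Illustrative encoding sketch (placeholder names): the state as a vector of per-preamble
# device information (here the queue lengths) and the action a_t as a length-m one-hot array.

import numpy as np

def make_state(queue_lengths):
    """s_t: one entry per preamble sequence; here, the number of devices waiting under it."""
    return np.asarray(queue_lengths, dtype=np.float32)

def make_action(chosen_preamble, m):
    """a_t: one-dimensional array of length m with a_{i,t} = 1 only for the selected preamble."""
    a = np.zeros(m, dtype=np.int64)
    a[chosen_preamble] = 1
    return a

s_t = make_state([4, 1, 2])                 # three preambles with 4, 1 and 2 waiting devices
a_t = make_action(chosen_preamble=1, m=3)   # the agent picks the second preamble
print(s_t, a_t)                             # [4. 1. 2.] [0 1 0]
```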
Reward R: after the agents take actions, the state of the environment changes and an environmental benefit, i.e. a return, is generated. To simplify the calculation, it is required that the device at the tail of each queue can still satisfy its own delay tolerance while waiting in the queue, so the reward $r_{j,t}$ can be expressed as
$$r_{j,t} = \begin{cases} 1, & n_{t,i}+x_{t,i} \le T^{\max}_{i,\mathrm{last}} \text{ for the preamble } i \text{ selected by device } j,\\ -1, & \text{otherwise,} \end{cases}$$
wherein $1 \le j \le n_{t,\mathrm{access}}$. Because the state set and the action set are large, deep reinforcement learning is adopted to construct a neural network whose input is the action $a_t$ and the state $s_t$ and whose output is the Q value of the action, $Q_k(s_t, a_t)$; the target neural network is used to calculate the Q value of the next state $s_{t+1}$, i.e. $Q_k(s_{t+1}, a')$, and the Q value is then updated by the following expression:
$$Q_{k+1}(s_t, a_t) = Q_k(s_t, a_t) + \alpha_k \Bigl[\, r_{t+1} + \gamma \max_{a' \in A} Q_k(s_{t+1}, a') - Q_k(s_t, a_t) \Bigr],$$
wherein $\alpha_k$ and $\gamma$ are the learning rate and discount factor respectively, $s_{t+1}$ and $r_{t+1}$ denote the next state and the reward obtained after taking the action in state $s_t$, $a'$ denotes an action executable in state $s_{t+1}$, and $A$ is the set of executable actions; $\max_{a' \in A} Q_k(s_{t+1}, a')$ represents the maximum Q value over the action set $A$ in state $s_{t+1}$, and an ε-greedy strategy is adopted in the search for this maximum.
Loss function E: to minimize the difference between $Q_{k+1}(s_t, a_t)$ and $Q_k(s_t, a_t)$, i.e. to drive $Q_{k+1}(s_t, a_t) - Q_k(s_t, a_t)$ towards 0, the loss function E can be expressed as
$$E = \mathbb{E}\Bigl[\bigl(Q_{k+1}(s_t, a_t) - Q_k(s_t, a_t)\bigr)^2\Bigr].$$
The weight θ of the neural network is updated using a gradient descent method, as sketched below.
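A compact sketch, assuming a PyTorch realization (the layer sizes, the SGD learning rate and the concatenation of state and action at the network input are illustrative choices; only the update rule and the squared loss above come from the disclosure):

```python
# Illustrative DQN sketch in PyTorch; layer sizes, optimizer settings and the state||action
# input concatenation are assumptions, not specified by the disclosure.

import torch
import torch.nn as nn

m = 3                                    # number of preambles (illustrative)
gamma, lr = 0.9, 1e-3                    # discount factor and learning rate (illustrative values)

class QNet(nn.Module):
    def __init__(self, m):
        super().__init__()
        # Input: state (m queue lengths) concatenated with the one-hot action (length m).
        self.net = nn.Sequential(nn.Linear(2 * m, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)     # Q(s_t, a_t)

q_net, target_net = QNet(m), QNet(m)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.SGD(q_net.parameters(), lr=lr)             # gradient-descent update of theta

def td_target(r_next, s_next):
    """r_{t+1} + gamma * max_{a' in A} Q(s_{t+1}, a'), computed with the target network."""
    with torch.no_grad():
        actions = torch.eye(m)                                      # all m one-hot actions
        q_next = target_net(s_next.expand(m, -1), actions)
        return r_next + gamma * q_next.max()

def train_step(s_t, a_t, r_next, s_next):
    """One gradient-descent step on E = (target - Q(s_t, a_t))^2."""
    loss = (td_target(r_next, s_next) - q_net(s_t, a_t)) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

s_t, a_t = torch.tensor([4.0, 1.0, 2.0]), torch.eye(m)[1]
print(train_step(s_t, a_t, r_next=torch.tensor(1.0), s_next=torch.tensor([4.0, 2.0, 2.0])))
```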
Referring to fig. 4, according to the above mentioned technical solution, the specific implementation steps are as follows:
Step 1: randomly initialize the weights θ of the neural network and the action $a_j$ of each agent j, and set every preamble queue $preamble_i$ to length 0.
Step 2: calculate the priority of the newly accessed devices, set the convergence threshold of the loss function E, and initialize $\alpha_k$, $\gamma$, $\varepsilon$.
Step 3: each agent makes a decision based on the current state information using an ε-greedy strategy.
Step 4: update the state of the environment $s_{t+1}$ and the reward $r_{t+1}$.
Step 5: store the parameter values $s_t$, $a_t$, $s_{t+1}$, $r_{t+1}$ for experience replay.
Step 6: randomly extract a certain number of samples from the accumulated experience, calculate the loss function E from these samples, update the weight θ, and repeat from Step 3 until the loss function E reaches the convergence condition or the program reaches the maximum number of iterations T. A compact sketch of this training loop is given below.
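A compact sketch of Steps 1 to 6 (ε, the buffer and batch sizes, the convergence threshold and the toy environment below are illustrative assumptions; QNet, q_net, target_net and train_step refer to the PyTorch sketch given earlier):

```python
# Illustrative training-loop sketch for Steps 1 to 6; epsilon, buffer/batch sizes and the toy
# environment are assumptions, and q_net, target_net, train_step come from the previous sketch.

import random
from collections import deque
import torch

m = 3
epsilon, batch_size, max_iters = 0.1, 32, 1000
replay = deque(maxlen=50_000)                         # experience replay memory (Step 5)

class ToyEnv:
    """Stand-in environment: the state is the vector of queue lengths; picking the shortest
    queue is rewarded (+1), anything else penalized (-1). Only for exercising the loop."""
    def __init__(self, m):
        self.lengths = torch.zeros(m)
    def step(self, preamble_idx):
        r = 1.0 if self.lengths[preamble_idx] == self.lengths.min() else -1.0
        self.lengths[preamble_idx] += 1.0
        self.lengths = torch.clamp(self.lengths - 0.3, min=0.0)    # queues drain over time
        return self.lengths.clone(), torch.tensor(r)

def epsilon_greedy(s_t):
    """Step 3: explore with probability epsilon, otherwise pick the action with the largest Q."""
    if random.random() < epsilon:
        return random.randrange(m)
    with torch.no_grad():
        return int(q_net(s_t.expand(m, -1), torch.eye(m)).argmax())

env, s_t = ToyEnv(m), torch.zeros(m)
for k in range(max_iters):                            # outer loop (Steps 3 to 6)
    a_idx = epsilon_greedy(s_t)                       # Step 3
    s_next, r_next = env.step(a_idx)                  # Step 4
    replay.append((s_t, torch.eye(m)[a_idx], r_next, s_next))      # Step 5
    if len(replay) >= batch_size:                     # Step 6: sample a mini-batch, update theta
        batch = random.sample(list(replay), batch_size)
        avg_loss = sum(train_step(s, a, r, sn) for s, a, r, sn in batch) / batch_size
        if avg_loss < 1e-4:                           # convergence threshold on E (set in Step 2)
            break
    if k % 100 == 0:
        target_net.load_state_dict(q_net.state_dict())              # periodic target-network sync
    s_t = s_next
```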
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements should also be regarded as falling within the protection scope of the present invention. All components not specified in this embodiment can be implemented by the prior art.

Claims (5)

1. An ordered competition large-scale access learning method for a machine type communication system, characterized by comprising the following steps:
S1, before random access starts, the base station allocates the radio resources of the physical random access channel and the physical uplink shared channel and broadcasts them to all mobile devices; each physical random access channel corresponds to a unique preamble;
S2, when random access starts, the base station broadcasts the number of devices currently waiting under each preamble, and newly accessed devices use a multi-agent reinforcement learning algorithm to cooperatively select channels that meet their own requirements; specifically, a device sends a preamble on the physical random access channel, and after receiving the request the base station sends a response containing a specific number assigned to the device under the selected preamble, the number being determined by the device's own priority; the device with the smallest number sends data on the physical uplink shared channel, the remaining numbered devices are in a waiting state, and each waiting device's number decreases by one every unit time;
the multi-agent reinforcement learning algorithm jointly considers the number of devices waiting under each preamble sequence, the channel requirements of the devices, their delay tolerances and their priorities, so that every device stays within its maximum delay tolerance and the queue lengths under the preamble sequences $preamble_i$ are evenly distributed.
2. The ordered competition large-scale access learning method for the machine type communication system as claimed in claim 1, wherein in step S2 the number of each newly accessed device is determined according to the following device priority function:
$$\mathrm{priority\_func}(t) = P_{\mathrm{MTCD}} + k\,(t - t_0)$$
wherein $t \ge t_0$; the constant $P_{\mathrm{MTCD}}$ indicates the priority level of the device itself; the symbol $k$ represents the growth rate of the priority, which differs for different devices; the symbol $t$ denotes the current time and $t_0$ the time at which the device entered the queue.
3. The method according to claim 1, wherein in step S3 it is assumed that there are m preambles and that the device queue corresponding to the i-th preamble is $preamble_i$, i = 1, 2, 3, ..., m; the total number of devices over all preamble queues at time t is represented as
$$N_t = n_{t,\mathrm{access}} + \sum_{i=1}^{m} n_{t,i},$$
where $n_{t,i}$ is the number of devices waiting in preamble queue $preamble_i$ and $n_{t,\mathrm{access}}$ is the number of newly accessed devices; the maximum tolerated delay of the j-th device in preamble queue $preamble_i$ is expressed as $T^{\max}_{i,j}$; the objective function of the multi-agent reinforcement learning algorithm is expressed as:
$$\min_{\{x_{t,i}\}} \; f_t = \frac{1}{m}\sum_{i=1}^{m}\Bigl(n_{t,i}+x_{t,i}-\bar{L}_t\Bigr)^2, \qquad \bar{L}_t=\frac{1}{m}\sum_{i=1}^{m}\bigl(n_{t,i}+x_{t,i}\bigr),$$
$$\text{s.t.}\quad \sum_{i=1}^{m} x_{t,i}=n_{t,\mathrm{access}}, \qquad n_{t,i}+x_{t,i}\le T^{\max}_{i,\mathrm{last}}, \quad i=1,\dots,m,$$
wherein $x_{t,i}$ indicates the number of newly accessed devices that select the i-th preamble at time t, $preamble_{i,\mathrm{last}}$ indicates the last device under the i-th preamble sequence after the new devices' decisions, $T^{\max}_{i,\mathrm{last}}$ represents the maximum tolerated delay of that last device, and $f_t$ represents the variance of the queue lengths under the m preamble queues at time t.
4. The ordered competition large-scale access learning method for the machine type communication system as claimed in claim 1, wherein in step S3 the process by which the newly accessed devices use the multi-agent reinforcement learning algorithm to cooperatively select channels satisfying their own requirements comprises the following steps:
S31, constructing a state set S: the state set S is used to represent the state of the entire access environment and is composed of t + 1 states, $S = \{s_0, s_1, \dots, s_t\}$; each state includes the device information under each preamble sequence, $s_t = \{preamble_{1,t}, preamble_{2,t}, \dots, preamble_{m,t}\}$;
S32, constructing an action set A: the action set A is used to represent the action $a_t$ taken by each agent according to the current state $s_t$ and its own decision policy $\pi$; $a_t$ is a one-dimensional array of length m with $a_{i,t} \in \{0, 1\}$, where a value of 1 means the device selects the i-th preamble and a value of 0 means it does not;
S33, constructing a reward R: after the agents take actions, the state of the current environment changes, and an environmental return with corresponding reward $r_{j,t}$ is generated, expressed as:
$$r_{j,t} = \begin{cases} 1, & n_{t,i}+x_{t,i} \le T^{\max}_{i,\mathrm{last}} \text{ for the preamble } i \text{ selected by device } j,\\ -1, & \text{otherwise}; \end{cases}$$
S34, adopting deep reinforcement learning to construct a neural network whose input is the action $a_t$ and the state $s_t$ and whose output is the Q value of the action, $Q_k(s_t, a_t)$, and using a target neural network to calculate the Q value of the next state $s_{t+1}$, $Q_k(s_{t+1}, a')$, which is updated as:
$$Q_{k+1}(s_t, a_t) = Q_k(s_t, a_t) + \alpha_k \Bigl[\, r_{t+1} + \gamma \max_{a' \in A} Q_k(s_{t+1}, a') - Q_k(s_t, a_t) \Bigr]$$
wherein $\alpha_k$ and $\gamma$ are the learning rate and discount factor respectively, $s_{t+1}$ and $r_{t+1}$ denote the next state and the reward obtained after taking the action in state $s_t$, $a'$ denotes an action executable in state $s_{t+1}$, $A$ is the set of executable actions, and $\max_{a' \in A} Q_k(s_{t+1}, a')$ represents the maximum Q value over the action set $A$ in state $s_{t+1}$; an ε-greedy strategy is adopted in the search for this maximum; the loss function E is expressed as:
$$E = \mathbb{E}\Bigl[\bigl(Q_{k+1}(s_t, a_t) - Q_k(s_t, a_t)\bigr)^2\Bigr];$$
S35, updating the weight θ of the neural network by a gradient descent method.
5. The ordered competition large-scale access learning method for the machine type communication system according to claim 4, wherein in step S35 the process of updating the weight θ of the neural network by the gradient descent method comprises the following steps:
S351, randomly initializing the weights θ of the neural network and the action $a_j$ of each agent j, and setting every preamble queue $preamble_i$ to length 0;
S352, calculating the priority of the newly accessed devices, setting the convergence threshold of the loss function E, and initializing $\alpha_k$, $\gamma$, $\varepsilon$;
S353, each agent makes a decision using an ε-greedy strategy according to the current state information;
S354, updating the state of the environment $s_{t+1}$ and the reward $r_{t+1}$;
S355, storing $s_t$, $a_t$, $s_{t+1}$, $r_{t+1}$ for experience replay;
S356, repeating steps S353 to S355 to accumulate experience; randomly extracting a certain number of samples from the accumulated experience, calculating the loss function E from these samples, and updating the weight θ;
S357, repeating steps S353 to S356 until the loss function E reaches the convergence condition or the maximum number of iterations T.
CN202210472683.5A 2022-04-29 2022-04-29 Ordered competition large-scale access learning method for machine type communication system Pending CN114980353A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210472683.5A CN114980353A (en) 2022-04-29 2022-04-29 Ordered competition large-scale access learning method for machine type communication system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210472683.5A CN114980353A (en) 2022-04-29 2022-04-29 Ordered competition large-scale access learning method for machine type communication system

Publications (1)

Publication Number Publication Date
CN114980353A true CN114980353A (en) 2022-08-30

Family

ID=82980489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210472683.5A Pending CN114980353A (en) 2022-04-29 2022-04-29 Ordered competition large-scale access learning method for machine type communication system

Country Status (1)

Country Link
CN (1) CN114980353A (en)

Similar Documents

Publication Publication Date Title
CN113613339B (en) Channel access method of multi-priority wireless terminal based on deep reinforcement learning
Sharma et al. Collaborative distributed Q-learning for RACH congestion minimization in cellular IoT networks
CN111867139B (en) Deep neural network self-adaptive back-off strategy implementation method and system based on Q learning
CN113490184B (en) Random access resource optimization method and device for intelligent factory
Chen et al. Heterogeneous machine-type communications in cellular networks: Random access optimization by deep reinforcement learning
CN109803246B (en) Random access and data transmission method based on grouping in large-scale MTC network
CN111245541B (en) Channel multiple access method based on reinforcement learning
CN108834175B (en) Queue-driven equipment access and resource allocation joint control method in mMTC network
CN113810883A (en) Internet of things large-scale random access control method
Chou et al. Contention-based airtime usage control in multirate IEEE 802.11 wireless LANs
CN111601398B (en) Ad hoc network medium access control method based on reinforcement learning
Shoaei et al. Reconfigurable and traffic-aware MAC design for virtualized wireless networks via reinforcement learning
CN115278908A (en) Wireless resource allocation optimization method and device
CN114599117A (en) Dynamic configuration method for backspacing resources in random access of low earth orbit satellite network
CN114980353A (en) Ordered competition large-scale access learning method for machine type communication system
CN115066036A (en) Multi-base-station queuing type lead code allocation method based on multi-agent cooperation
CN113098665B (en) Processing method and device for allocating PUCCH resources for CSI
Lee et al. Multi-agent reinforcement learning for a random access game
CN113056010A (en) Reserved time slot distribution method based on LoRa network
Nwogu et al. A combined static/dynamic partitioned resource usage approach for random access in 5G cellular networks
Kim et al. Dynamic Transmission and Delay Optimization Random Access for Reduced Power Consumption
CN111935842A (en) Multi-user clustering scheduling method for non-orthogonal multiple access
Eftekhari et al. Energy and spectrum efficient retransmission scheme with RAW optimization for IEEE 802.11 ah networks
CN116915377B (en) Unauthorized access pilot frequency distribution method based on hybrid automatic request mechanism
CN113473419B (en) Method for accessing machine type communication device into cellular data network based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination