CN116709567A - Joint learning access method based on channel characteristics - Google Patents

Joint learning access method based on channel characteristics

Info

Publication number
CN116709567A
CN116709567A (application CN202310736635.7A)
Authority
CN
China
Prior art keywords
channel
state
access method
access
preamble
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310736635.7A
Other languages
Chinese (zh)
Inventor
孙君
李杭州
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202310736635.7A
Publication of CN116709567A
Pending legal-status Critical Current


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 74/00: Wireless channel access, e.g. scheduled or random access
    • H04W 74/08: Non-scheduled or contention based access, e.g. random access, ALOHA, CSMA [Carrier Sense Multiple Access]
    • H04W 74/0833: Non-scheduled or contention based access using a random access procedure
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 74/00: Wireless channel access, e.g. scheduled or random access
    • H04W 74/002: Transmission of channel access control information
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in wireless communication networks

Abstract

The invention provides a joint learning access method based on channel characteristics, which mainly comprises the following steps: initializing the deep Q network of each agent, including a training network and a target network; calculating the channel parameter of each arriving device according to the beta distribution, and grouping the devices according to the size of the channel parameter; calculating the total number of retransmitting and grouped devices, and jointly allocating the physical random access channel (PRACH) and the physical uplink shared channel (PUSCH); having each agent select an action according to the current state information using a greedy strategy, and select a preamble from the available preamble pool; updating the state and reward of the environment, and storing the experience in an experience replay buffer; and randomly sampling a number of experiences, calculating the loss function from the samples, and updating the weights until the loss function reaches a convergence condition or the maximum number of iterations is reached. Compared with the prior art, the invention adapts better to environmental changes and improves the access success rate.

Description

Joint learning access method based on channel characteristics
Technical Field
The invention relates to a joint learning access method based on channel characteristics, and belongs to the technical field of wireless communication.
Background
Communication technology is continuously evolving, communication services are developing extremely rapidly, and communication scenarios are becoming increasingly diverse. The massive machine-type communication (mMTC) usage scenario, also known as the massive Internet of Things (mIoT), features a large number of low-complexity and energy-limited machine-type communication devices (MTCDs) periodically transmitting very short data packets with relaxed latency requirements, and is expected to play an important role in future 6G wireless networks. In order to meet the massive access requirements of MTCDs, further control optimization is required on the basis of current congestion control mechanisms. Meanwhile, for scenarios in which MTCDs of multiple service types perform random access (RA) simultaneously, the access delay, number of collisions, and access fairness of the various service types must be considered while improving system throughput. In the enhanced mMTC scenario, since the delay requirement is relaxed, a contention-based random access procedure may be adopted; if the conventional LTE random access procedure is used, a large number of MTCDs may start the network access procedure at the same time, and the large number of RA attempts increases the preamble collision probability, thereby reducing the number of successful accesses. This problem can be mitigated by increasing the resources allocated to the physical random access channel (PRACH); however, since uplink resources are limited, the more resources are made available to the PRACH, the fewer remain for the physical uplink shared channel (PUSCH), and many MTCDs that successfully complete the RA attempt cannot find enough transmission resources in the PUSCH.
However, the time delay problem caused by access resource limitation and access conflict still exists.
Most current schemes only consider the resource allocation of the random access channel, and rarely consider that physical uplink shared channel (PUSCH) resources can be borrowed for joint allocation. Furthermore, most schemes lack consideration of channel characteristics and ignore the uneven preamble distribution caused by a large number of devices with different transmission conditions.
In view of the foregoing, it is necessary to propose a joint learning access method based on channel characteristics to solve the above-mentioned problems.
Disclosure of Invention
The invention aims to provide a joint learning access method based on channel characteristics, which can solve the problems of unbalanced finite resource allocation and access efficiency.
In order to achieve the above object, the present invention provides a joint learning access method based on channel characteristics, which mainly includes the following steps:
step 1, initializing the deep Q network of each agent, comprising a training network Q(s, a; θ) with parameter θ and a target network Q(s, a; θ⁻) with parameter θ⁻;
step 2, calculating the channel parameter W of each arriving device according to the beta distribution, and grouping the devices according to the size of the channel parameter W;
step 3, calculating the total number of retransmitting devices and grouped devices, and jointly allocating the physical random access channel PRACH and the physical uplink shared channel PUSCH;
step 4, having each agent select an action according to the current state information using a greedy strategy, and select a preamble from the available preamble pool;
step 5, updating the environment state s_{t+1} and reward r_{t+1}, and storing the experience (s_t, a_t, s_{t+1}, r_{t+1}) in the experience replay buffer;
step 6, randomly sampling a number of experiences from the accumulated experience, calculating the loss function L_i(θ) from the samples, and updating the weights θ; repeating from step 3 until the loss function reaches a convergence condition or the maximum number of iterations T is reached.
As a further improvement of the present invention, in step 1, the loss function in each agent may be expressed as:
L_i(θ) = E[(y^DQN − Q_k(s_t, a_t; θ))²],

where y^DQN = r_{t+1} + γ·max_{a'} Q(s_{t+1}, a'; θ⁻) represents the target value, θ⁻ represents the weight of the target network, E(·) represents the expectation, r represents the reward, t represents the time slot, and γ represents the discount factor.
As a further improvement of the present invention, in step 2, the root mean square delay spread and the level crossing rate are used to evaluate the statistical characteristics of the channel quality, wherein the root mean square delay spread can be expressed as:

σ_τ = √( E[τ²] − (τ̄)² ),

where τ̄ = E[τ] represents the average excess delay and E[·] denotes the power-weighted average over the power delay profile.
As a further improvement of the invention, the level crossing rate refers to the frequency at which the envelope crosses a specified level R in the positive (or negative) direction; in the Rice fading process it can be expressed (for isotropic scattering) as:

N_R = √(2π(K + 1))·f_m·ρ·exp(−K − (K + 1)ρ²)·I₀(2ρ√(K(K + 1))),

where the fading amplitude and phase are independent, 2b₀ is the scattered power, s² is the specular power, K = s²/(2b₀) is the Rice factor, ρ = R/√(s² + 2b₀) is the level normalized to the RMS envelope, b₂ = 2b₀(πf_m)² applies to an isotropic scattering environment, and f_m is the maximum Doppler shift.
As a further improvement of the present invention, the root mean square delay spread and the level crossing rate may be normalized, and the channel parameter calculated as:

W = w₁·LCR_nor + w₂·σ_nor,

where w₁ and w₂ are scale factors with w₁ + w₂ = 1, used to balance the importance of the root mean square delay spread and the level crossing rate in the channel parameter.
As a further improvement of the invention, the arrival model of the devices obeys a beta distribution:

p(t) = t^(α−1)·(T − t)^(β−1) / (T^(α+β−1)·B(α, β)), 0 ≤ t ≤ T,

where α = 3, β = 4, and B(α, β) is the beta function.
As a further improvement of the present invention, all devices are divided into M groups, so that the number of devices in each time slot is:

N_t(W) = N_t·λ·e^(−Wt),

where N_t represents the number of devices arriving in the current slot and λ = α/(α + β) is the expectation of the beta distribution.
As a further improvement of the present invention, in step 3, from the number L of contention-based preambles and the number of devices, the average number of devices successfully accessing a preamble in the physical random access channel PRACH can be obtained:
Suppose the base station allocates the physical uplink shared channel PUSCH resources required to transmit θ_max bits; the corresponding number of devices S_pu that can transmit successfully in the PUSCH is:

where log₂(I) is the number of information bits carried by each constellation symbol. It can be seen from the above formula that as the PRACH share increases, the number S_pu of devices that can transmit successfully in the corresponding physical uplink shared channel PUSCH decreases. The optimal values of T_pr and T_pu are found by searching over the given values:

where T_pr ∈ {1, 2, …, T_ra − 1}; an optimal allocation scheme can thus be found.
As a further improvement of the present invention, in step 4, a parameter-sharing deep Q network algorithm is adopted to design a deep neural network framework based on joint channel access, combined with Q learning to generate a strategy π, where the input is the observed state space S and the output is all executable actions in the action space A; each state-action pair has a corresponding Q value Q(s_t, a_t); at each step the action achieving the maximum Q value in the current state is selected, and the Q function is updated according to the following rule:

Q(s_t, a_t) ← Q(s_t, a_t) + α·[r_{t+1} + γ·max_{a'∈A} Q(s_{t+1}, a') − Q(s_t, a_t)],

where γ is the discount factor, s_{t+1} and r_{t+1} represent the next state and the reward obtained after taking an action in state s_t, a' denotes an action in state s_{t+1}, A is the set of executable actions, and max_{a'∈A} Q(s_{t+1}, a') is the maximum Q value over the action set A in state s_{t+1}; the agent uses an ε-greedy strategy to search for the maximum.
As a further improvement of the invention, in step 5, the preamble decision problem is converted into an RL problem, defining a state set, an action space, and a reward function:

State set S: the state comprises three parts. The first part is the access state of the device in the previous time slot, which takes one of three values, 1, 0, and −1: 1 indicates that the transmitted preamble was not detected as collided by the base station, 0 indicates that the selected preamble was detected as collided by the base station, and −1 indicates that the device chose to actively back off in the previous time slot and did not participate in preamble contention. The second part is the preamble selected by the device in the last RA procedure, denoted p_i. The third part is the number of other devices that selected the same preamble in the previous RA procedure.

Action space: according to the current state, each agent takes an action a_t according to its decision strategy π; the action space A may be defined as two parts: attempting access and selecting backoff.

Reward function: when a device selects access and connects successfully, it obtains a positive reward; if a collision occurs, it obtains a negative reward; if the device selects backoff, the reward is calculated based on the number of devices attempting access at the current time and the number of remaining retransmissions,

where the symbol α_k is the learning rate and β_k is an adjustable parameter.
The beneficial effects of the invention are as follows: the invention can better adapt to environmental changes and can improve the success rate of access.
Drawings
Fig. 1 is a MTCD indoor scene random access diagram of the joint learning access method based on channel characteristics of the present invention.
Fig. 2 is a PRACH and PUSCH resource allocation diagram of the joint learning access method based on channel characteristics of the present invention.
Fig. 3 is an access model diagram of the joint learning access method based on channel characteristics of the present invention.
Fig. 4 is a multi-state and dual-state markov decision process for a joint learning access method based on channel characteristics of the present invention.
Fig. 5 is a diagram of a shared network structure of multiple agents based on a channel characteristic joint learning access method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
In order to avoid obscuring the present invention with unnecessary detail, only the structures and/or processing steps closely related to the solution of the present invention are shown in the drawings, and other details of little relevance are omitted.
In addition, it should be further noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1 to 5, the present invention discloses a joint learning access method based on channel characteristics. To address the collision problem of conventional contention-based random access, the method normalizes the level crossing rate and the root mean square delay spread, combines them into a single channel parameter according to different weights, and divides the devices into a good group (GG), a medium group (MG), and a bad group (BG) according to the size of the channel parameter. The total number of accessing devices in the current time slot is calculated, and the PRACH and PUSCH are optimally allocated accordingly. Dynamic resource allocation over the two channels is performed by joint learning, forming a dual-channel joint learning access scheme based on channel statistical characteristics; the close relationship among the channel characteristics, the number of accessing devices, and the access success rate is identified, and the access strategy is optimized by multi-agent joint learning, solving the problems of unbalanced allocation of limited resources and access efficiency.
The joint learning access method based on the channel characteristics comprises the following steps:
step 1, initializing the deep Q network of each agent, comprising a training network Q(s, a; θ) with parameter θ and a target network Q(s, a; θ⁻) with parameter θ⁻;
step 2, calculating the channel parameter W of each arriving device according to the beta distribution, and grouping the devices according to the size of the channel parameter W;
step 3, calculating the total number of retransmitting devices and grouped devices, and jointly allocating the physical random access channel PRACH and the physical uplink shared channel PUSCH;
step 4, having each agent select an action according to the current state information using a greedy strategy, and select a preamble from the available preamble pool;
step 5, updating the environment state s_{t+1} and reward r_{t+1}, and storing the experience (s_t, a_t, s_{t+1}, r_{t+1}) in the experience replay buffer;
step 6, randomly sampling a number of experiences from the accumulated experience, calculating the loss function L_i(θ) from the samples, and updating the weights θ; repeating from step 3 until the loss function reaches a convergence condition or the maximum number of iterations T is reached.
Steps 1 to 5 will be described in detail below.
Step 1: in DQL, two neural networks are constructed, namely a training network Q(s, a; θ) with parameter θ and a target network Q(s, a; θ⁻) with parameter θ⁻. The parameter θ is updated by stochastic gradient descent to minimize the loss function over small random batches of samples. For this purpose, the loss function is defined as the mean square error between the target value and the estimated value. The loss function in each agent can be expressed as:

L_i(θ) = E[(y^DQN − Q_k(s_t, a_t; θ))²],

where y^DQN = r_{t+1} + γ·max_{a'} Q(s_{t+1}, a'; θ⁻) represents the target value and θ⁻ represents the weight of the target network.
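As a concrete illustration of this loss, the following sketch computes the target value y^DQN and the mean squared error over a sampled batch. The tabular Q representation, state/action sizes, and batch contents are illustrative assumptions, not the patent's actual network.

```python
# Illustrative sketch of the DQN loss in step 1 (not the patent's exact network):
# Q-values are kept in small tables; a real implementation would use neural networks.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9

q_train  = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # Q(s, a; theta)
q_target = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # Q(s, a; theta^-)

def dqn_loss(batch):
    """Mean squared error between y_DQN and Q(s_t, a_t; theta)."""
    total = 0.0
    for s, a, s_next, r in batch:
        y_dqn = r + GAMMA * max(q_target[s_next])         # target value y_DQN
        total += (y_dqn - q_train[s][a]) ** 2
    return total / len(batch)

# One experience tuple (s_t, a_t, s_{t+1}, r_{t+1}) as stored in the replay buffer:
batch = [(0, 1, 2, 1.0)]
print(dqn_loss(batch))   # with zero-initialised tables: (1.0 - 0.0)^2 = 1.0
```

In training, only θ (here `q_train`) is updated each step, while θ⁻ (`q_target`) is periodically copied from θ to stabilize the target.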
In step 2, in a single-base-station scenario there are several mobile devices, each with different channel parameters. Under a conventional scheme, users are subdivided into newly accessing devices and devices backing off after a collision; in the present invention, a device backing off after a collision does not need to contend randomly for a preamble again, but decides whether to access according to the reward.
The small-scale fading characteristics of a wireless channel mainly comprise multipath effects and time-varying characteristics, and the invention analyzes the two respectively. The characteristics of multipath fading channels are often described by the power delay profile (PDP); the root mean square delay spread is a parameter describing channel fading and can be expressed as:

σ_τ = √( E[τ²] − (τ̄)² ),

where τ̄ = E[τ] represents the average excess delay and E[·] denotes the power-weighted average over the PDP.
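The delay-spread computation above can be sketched directly from a discrete PDP; the two-path example values are illustrative:

```python
import math

def rms_delay_spread(delays_ns, powers):
    """sigma_tau = sqrt(E[tau^2] - E[tau]^2), with power-weighted averages over the PDP."""
    p_sum = sum(powers)
    mean_tau  = sum(p * t for p, t in zip(powers, delays_ns)) / p_sum       # average excess delay
    mean_tau2 = sum(p * t * t for p, t in zip(powers, delays_ns)) / p_sum   # second moment
    return math.sqrt(mean_tau2 - mean_tau ** 2)

# Two equal-power paths at 0 ns and 100 ns -> sigma_tau = 50 ns
print(rms_delay_spread([0.0, 100.0], [1.0, 1.0]))  # 50.0
```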
The level crossing rate, which is the frequency at which the envelope crosses a specified level R in the positive (or negative) direction, is a critical second-order statistic that defines the quality of the received signal; in the Rice fading process it can be expressed (for isotropic scattering) as:

N_R = √(2π(K + 1))·f_m·ρ·exp(−K − (K + 1)ρ²)·I₀(2ρ√(K(K + 1))),

where the fading amplitude and phase are independent, 2b₀ is the scattered power, s² is the specular power, K = s²/(2b₀) is the Rice factor, ρ = R/√(s² + 2b₀) is the level normalized to the RMS envelope, b₂ = 2b₀(πf_m)² applies to an isotropic scattering environment, and f_m is the maximum Doppler shift.
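A numerical sketch of this closed form follows; the series expansion of the modified Bessel function I₀ and the sample parameter values are illustrative. Setting the specular power s² to zero recovers the Rayleigh-fading level crossing rate √(2π)·f_m·ρ·exp(−ρ²).

```python
import math

def bessel_i0(x, terms=30):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    return sum((x / 2.0) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))

def rice_lcr(R, s2, two_b0, f_m):
    """Level crossing rate of a Rician envelope at level R (isotropic scattering assumed).

    s2     : specular power s^2
    two_b0 : scattered power 2*b0
    f_m    : maximum Doppler shift (Hz)
    """
    K   = s2 / two_b0                      # Rice factor K = s^2 / (2*b0)
    rho = R / math.sqrt(s2 + two_b0)       # level normalised to the RMS envelope
    return (math.sqrt(2.0 * math.pi * (K + 1.0)) * f_m * rho
            * math.exp(-K - (K + 1.0) * rho ** 2)
            * bessel_i0(2.0 * rho * math.sqrt(K * (K + 1.0))))

# K = 0 (no specular path) reduces to the Rayleigh LCR
print(rice_lcr(R=1.0, s2=0.0, two_b0=1.0, f_m=100.0))
```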
The root mean square delay spread and the level crossing rate are statistical characteristics used to evaluate channel quality. To describe channel quality more fully, they can be normalized, and the normalized values combined in different proportions into a composite channel parameter. Specifically, the channel parameter may be calculated using the following formula:

W = w₁·LCR_nor + w₂·σ_nor,

where w₁ and w₂ are scale factors with w₁ + w₂ = 1, used to balance the importance of the root mean square delay spread and the level crossing rate in the channel parameter.
The arrival model of the MTCDs is assumed to obey a beta distribution:

p(t) = t^(α−1)·(T − t)^(β−1) / (T^(α+β−1)·B(α, β)), 0 ≤ t ≤ T,

where α = 3, β = 4, and B(α, β) is the beta function.
In order to further study the influence of the channel parameter on the access success rate, all devices with access requirements at time t are divided into M groups; the parameter reflecting the channel condition of group i is W_i, where W_i = 0.1, 0.2, …, 1, representing the channel fading characteristic of the i-th group of devices. Since poor channel fading yields few accessing devices, the number of accessing devices decays exponentially with channel fading, so the number of devices in each time slot is:

N_t(W) = N_t·λ·e^(−Wt),

where N_t represents the number of devices arriving in the current slot and λ = α/(α + β) is the expectation of the beta distribution.
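The per-group arrival count can be sketched as follows; the total device count, activation period, and slot duration are illustrative assumptions layered on the beta(3, 4) profile.

```python
import math

ALPHA, BETA = 3, 4
T_TOTAL = 10.0            # activation period (s); illustrative
N_TOTAL = 1000            # total number of MTCDs; illustrative

def beta_pdf(t):
    """Beta(3, 4) arrival profile over [0, T_TOTAL] (3GPP-style traffic model)."""
    B = math.gamma(ALPHA) * math.gamma(BETA) / math.gamma(ALPHA + BETA)
    return t ** (ALPHA - 1) * (T_TOTAL - t) ** (BETA - 1) / (T_TOTAL ** (ALPHA + BETA - 1) * B)

def group_arrivals(t, W, dt=0.005):
    """N_t(W) = N_t * lambda * exp(-W*t): arrivals in slot t for channel parameter W."""
    n_t = N_TOTAL * beta_pdf(t) * dt            # devices arriving in the current slot
    lam = ALPHA / (ALPHA + BETA)                # expectation of the beta distribution
    return n_t * lam * math.exp(-W * t)

# A worse channel (larger W) yields fewer accessing devices at the same time t
print(group_arrivals(t=4.0, W=0.5) < group_arrivals(t=4.0, W=0.1))  # True
```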
In step 3, during the random access procedure each device performs a contention-based procedure to request resources for data transmission. A typical RA period of 5 ms is adopted, in which the uplink radio resources are divided into a PRACH subset and a PUSCH subset. The PRACH bandwidth is 1.08 MHz and its duration depends on the RA preamble format; for example, with generic preamble format 0, the PRACH duration is 1 ms. The PRACH access resource consists of 64 orthogonal preamble sequences, each an L₀ sequence mapped to 839 subcarriers of 1.25 kHz. The PUSCH consists of 72 subcarriers of 15 kHz for transmitting user data. PRACH and PUSCH occupy the entire bandwidth in the frequency domain; in the time domain the PRACH lasts T_pr ∈ {1, 2, …, T_ra − 1} time slots, where T_ra = 5.

The total number of available preambles of the PRACH over the whole random access period is L = L₀·T_pr.

Unlike the conventional random access procedure, this description adopts a two-step random access procedure, which can be described as follows: before random access, the base station periodically broadcasts a system information block comprising several key parameters for synchronization, preamble information, and pre-configured resources; a device randomly selects a preamble from the contention-based pool. Then, if the preamble experiences no collision, the base station transmits to the MTC device the SCMA codebook information related to the PUSCH resources required to transmit θ_max bits.
Based on the number L of contention-based preambles and the number of devices, the average number of devices successfully accessing a preamble in the PRACH can be obtained:
Suppose the base station allocates the PUSCH resources required to transmit θ_max bits; the corresponding number of devices S_pu that can transmit successfully in the PUSCH is:

where log₂(I) is the number of information bits carried by each constellation symbol. It can be seen from the above formula that as the PRACH share increases, the number S_pu of devices that can transmit successfully in the corresponding PUSCH decreases. The optimal values of T_pr and T_pu are found by searching over the given values:

where T_pr ∈ {1, 2, …, T_ra − 1}; an optimal allocation scheme can thus be found.
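The trade-off search over T_pr can be sketched as below. Since the patent's exact success formulas appear only as images here, this sketch substitutes two labeled assumptions: a slotted-ALOHA-style PRACH success model S_pr = N·exp(−N/L), and a PUSCH capacity proportional to its remaining slots T_pu = T_ra − T_pr. The device count and per-slot capacity are also illustrative.

```python
import math

L0, T_RA = 64, 5          # preambles per PRACH slot; RA period in slots
N_DEV = 300               # estimated contending devices; illustrative
PUSCH_CAP_PER_SLOT = 90.0 # devices servable per PUSCH slot; illustrative

def successes(t_pr):
    """Devices completing both RA and data transmission for a given PRACH/PUSCH split."""
    L = L0 * t_pr                                   # total preambles L = L0 * T_pr
    s_pr = N_DEV * math.exp(-N_DEV / L)             # ASSUMED ALOHA-style PRACH successes
    s_pu = PUSCH_CAP_PER_SLOT * (T_RA - t_pr)       # ASSUMED PUSCH capacity over T_pu slots
    return min(s_pr, s_pu)                          # bottleneck of the two stages

# Search T_pr in {1, ..., T_ra - 1} for the best trade-off
best = max(range(1, T_RA), key=successes)
print(best, round(successes(best), 1))   # 4 90.0
```

More PRACH slots raise RA successes but starve the PUSCH, so the optimum balances the two stages, mirroring the joint allocation of step 3.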
In step 4, a parameter-sharing deep Q network algorithm is adopted to design a deep neural network framework based on joint channel access, combined with Q learning to generate a strategy π. The input is the observed state space S and the output is all executable actions in the action space A; each state-action pair has a corresponding Q value Q(s_t, a_t); at each step the action achieving the maximum Q value in the current state is selected, and the Q function is updated according to the following rule:

Q(s_t, a_t) ← Q(s_t, a_t) + α·[r_{t+1} + γ·max_{a'∈A} Q(s_{t+1}, a') − Q(s_t, a_t)],

where γ is the discount factor, s_{t+1} and r_{t+1} represent the next state and the reward obtained after taking an action in state s_t, a' denotes an action in state s_{t+1}, A is the set of executable actions, and max_{a'∈A} Q(s_{t+1}, a') is the maximum Q value over the action set A in state s_{t+1}; the agent uses an ε-greedy strategy to search for the maximum.
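The ε-greedy selection and the Q update can be sketched in tabular form; the state/action sizes and hyperparameter values are illustrative, and a full implementation would replace the table with the shared deep network.

```python
import random

GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1
N_STATES, N_ACTIONS = 3, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def select_action(s):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit the max-Q action."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[s][a])

def update(s, a, r, s_next):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + GAMMA * max(Q[s_next])
    Q[s][a] += ALPHA * (td_target - Q[s][a])

update(s=0, a=1, r=1.0, s_next=2)
print(Q[0][1])   # 0.1 * (1.0 + 0.9*0 - 0) = 0.1
```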
Step 5 considers the conventional access scheme employing access class barring (ACB), which redistributes the access requests of MTC devices over time, thereby reducing the number of access requests per RA cycle and hence the number of failed RA attempts. Optimal access control based on the conventional ACB scheme aims to dynamically derive an appropriate ACB factor to maximize successful traffic; however, the collision problem remains severe, so reinforcement learning is adopted to solve it.
Under a competitive game, the multiple agents can reach local optima but cannot maximize overall network performance. The multi-agent problem is therefore converted into a cooperative game, i.e., all agents use the same reward function. The preamble decision problem is converted into an RL problem, defining a state set, an action space, and a reward function.
State set S: the state is divided into three parts. The first part is the access state of the MTCD in the previous time slot, which takes one of three values, 1, 0, and −1: a value of 1 indicates that the transmitted preamble was not detected as collided by the base station, 0 indicates that the selected preamble was detected as collided by the base station, and −1 indicates that the device chose to actively back off in the previous time slot and did not participate in preamble contention. The second part is the preamble selected by the MTCD during the last RA procedure, denoted p_i. The third part is the number of other MTCDs that selected the same preamble in the previous RA procedure.
Action space: according to the current state, each agent takes an action a_t according to its decision strategy π; the action space A may be defined as two parts: attempting access and selecting backoff.
Reward function: when a device selects access and connects successfully, it obtains a positive reward. If a collision occurs, it obtains a negative reward. If the device selects backoff, the reward is calculated based on the number of devices attempting access at the current time and the number of remaining retransmissions,
where the symbol α_k is the learning rate and β_k is an adjustable parameter.
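The shape of such a reward can be sketched as below. The patent's exact formula (with α_k and β_k) is given only as an image in the original, so the backoff term here is an illustrative stand-in that depends, as described, on the current load and the remaining retransmissions.

```python
def reward(outcome, n_attempting=0, retx_left=0, alpha_k=1.0, beta_k=0.1):
    """Illustrative reward shaping (NOT the patent's exact formula):
    positive on success, negative on collision, load-dependent on backoff."""
    if outcome == "success":
        return alpha_k                          # positive reward for successful access
    if outcome == "collision":
        return -alpha_k                         # negative reward for a collision
    # backoff: reward grows with remaining retransmissions, shrinks with contention
    return beta_k * retx_left / (1 + n_attempting)

print(reward("success"), reward("collision"), reward("backoff", n_attempting=9, retx_left=5))
```

Backing off is thus made more attractive when many devices are contending, which is exactly the congestion-spreading behaviour the cooperative game rewards.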
In summary, unlike the conventional random contention access mode, the present invention first considers two channel statistical characteristics, the level crossing rate and the delay spread, normalizes them separately, and combines them into a single channel parameter according to different weights; it then classifies the MTCDs into three groups, good, medium, and bad, according to different thresholds of the channel parameter, so that access fairness under different channel conditions can be better balanced.
In the invention, when an MTCD accesses, PRACH and PUSCH resources are first jointly allocated, and then a reinforcement learning method decides whether to select an access preamble in the current time slot; this learning method adapts better to environmental changes and improves the access success rate.
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. The joint learning access method based on the channel characteristics is characterized by mainly comprising the following steps of:
step 1, initializing the deep Q network of each agent, comprising a training network Q(s, a; θ) with parameter θ and a target network Q(s, a; θ⁻) with parameter θ⁻;
step 2, calculating the channel parameter W of each arriving device according to the beta distribution, and grouping the devices according to the size of the channel parameter W;
step 3, calculating the total number of retransmitting devices and grouped devices, and jointly allocating the physical random access channel PRACH and the physical uplink shared channel PUSCH;
step 4, having each agent select an action according to the current state information using a greedy strategy, and select a preamble from the available preamble pool;
step 5, updating the environment state s_{t+1} and reward r_{t+1}, and storing the experience (s_t, a_t, s_{t+1}, r_{t+1}) in the experience replay buffer;
step 6, randomly sampling a number of experiences from the accumulated experience, calculating the loss function L_i(θ) from the samples, and updating the weights θ; repeating from step 3 until the loss function reaches a convergence condition or the maximum number of iterations T is reached.
2. The channel characteristic based joint learning access method of claim 1, wherein: in step 1, the loss function in each agent can be expressed as:
L_i(θ) = E[(y^DQN − Q_k(s_t, a_t; θ))²],

where y^DQN = r_{t+1} + γ·max_{a'} Q(s_{t+1}, a'; θ⁻) represents the target value, θ⁻ represents the weight of the target network, E(·) represents the expectation, r represents the reward, t represents the time slot, and γ represents the discount factor.
3. The channel characteristic based joint learning access method of claim 1, wherein: in step 2, the root mean square delay spread and the level crossing rate are used to evaluate the statistical characteristics of the channel quality, wherein the root mean square delay spread can be expressed as:

σ_τ = √( E[τ²] − (τ̄)² ),

where τ̄ = E[τ] represents the average excess delay and E[·] denotes the power-weighted average over the power delay profile.
4. The channel characteristic based joint learning access method of claim 3, wherein: the level crossing rate refers to the frequency at which the envelope crosses a specified level R in the positive (or negative) direction, and in the Rice fading process it can be expressed (for isotropic scattering) as:

N_R = √(2π(K + 1))·f_m·ρ·exp(−K − (K + 1)ρ²)·I₀(2ρ√(K(K + 1))),

where the fading amplitude and phase are independent, 2b₀ is the scattered power, s² is the specular power, K = s²/(2b₀) is the Rice factor, ρ = R/√(s² + 2b₀), b₂ = 2b₀(πf_m)² applies to an isotropic scattering environment, and f_m is the maximum Doppler shift.
5. The channel characteristic based joint learning access method of claim 4, wherein: the root mean square delay spread and the level crossing rate can be normalized, and the channel parameter calculated as:

W = w₁·LCR_nor + w₂·σ_nor,

where w₁ and w₂ are scale factors with w₁ + w₂ = 1, used to balance the importance of the root mean square delay spread and the level crossing rate in the channel parameter.
6. The channel characteristic based joint learning access method of claim 5, wherein: the arrival model of the devices obeys a beta distribution,
p(t) = t^{α−1}(T − t)^{β−1} / (T^{α+β−1}·Beta(α, β)), 0 ≤ t ≤ T,
wherein α = 3 and β = 4.
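The beta(3, 4) arrival model can be simulated as below; the normalised access period and the slot count are illustrative assumptions:

```python
import random

ALPHA, BETA = 3, 4  # shape parameters from the claim

def arrivals_per_slot(n_devices, n_slots, seed=0):
    """Draw each device's activation instant from Beta(3, 4) over a
    normalised access period [0, 1) and histogram the counts per slot."""
    rng = random.Random(seed)
    counts = [0] * n_slots
    for _ in range(n_devices):
        t = rng.betavariate(ALPHA, BETA)           # activation time in [0, 1)
        counts[min(int(t * n_slots), n_slots - 1)] += 1
    return counts
```

Because the Beta(3, 4) density vanishes at both ends of the period, arrivals bunch around the mode near t = 0.4, which is the bursty-access regime the scheme targets.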
7. The channel characteristic based joint learning access method of claim 6, wherein: all devices are divided into M groups, so that the number of devices contending in each slot follows accordingly,
wherein N_t(W) = N_t·λ·e^{−W·t} represents the number of devices arriving in the current slot, and λ = α/(α + β) is the expectation of the beta distribution.
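The per-slot arrival expression of claim 7, N_t(W) = N_t·λ·e^{−W·t} with λ = α/(α+β), can be evaluated directly; the numeric inputs below are illustrative:

```python
import math

ALPHA, BETA = 3, 4
LAM = ALPHA / (ALPHA + BETA)   # expectation of the Beta(3, 4) distribution

def devices_in_slot(n_total, w, t):
    """N_t(W) = N_t * lambda * exp(-W * t): arriving devices in slot t,
    damped by the channel parameter W from claim 5."""
    return n_total * LAM * math.exp(-w * t)
```

With W = 0 the count reduces to N_t·λ; for W > 0 the count decays across slots, so poorer channels admit fewer contenders per slot.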
8. The channel characteristic based joint learning access method of claim 1, wherein: in step 3, from the number of contention-based preambles L and the number of devices, the average number of devices that successfully access via a preamble in the physical random access channel (PRACH) can be calculated.
suppose that a base station allocates a transmission θ max Physical Uplink Shared Channel (PUSCH) resource required by bits, and corresponding equipment number S capable of solving successful transmission in Physical Uplink Shared Channel (PUSCH) pu
wherein log_2(I) is the number of information bits carried by each constellation symbol; the above shows that as the PRACH resources increase, the number S_pu of devices that can transmit successfully on the corresponding PUSCH decreases, so for a given total the optimal values of T_pr and T_pu are found,
wherein T_pr ∈ {1, 2, …, T_ra − 1}; by traversing these values an optimal allocation scheme can be found.
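The traversal over T_pr in claim 8 can be sketched as a simple enumeration. The throughput expressions below are stand-ins: the slotted-contention approximation N·e^{−N/L} for collision-free preambles and a linear PUSCH capacity are common modelling assumptions, not the patent's formulas, and all numeric inputs are illustrative:

```python
import math

def successful_preambles(n_devices, n_preambles):
    # Slotted-contention approximation: expected collision-free preambles
    return n_devices * math.exp(-n_devices / n_preambles)

def best_split(t_ra, n_devices, preambles_per_prach, pusch_capacity):
    """Enumerate T_pr in {1, ..., T_ra - 1}; the bottleneck of PRACH
    successes and PUSCH capacity bounds the achievable accesses."""
    best_t_pr, best_score = 0, -1.0
    for t_pr in range(1, t_ra):
        t_pu = t_ra - t_pr
        s_pr = successful_preambles(n_devices, t_pr * preambles_per_prach)
        s_pu = pusch_capacity * t_pu          # devices servable on PUSCH
        score = min(s_pr, s_pu)
        if score > best_score:
            best_t_pr, best_score = t_pr, score
    return best_t_pr
```

The optimum balances the two channels: giving PRACH too few slots starves it of successes, while giving it too many leaves too little PUSCH to carry the data.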
9. The channel characteristic based joint learning access method of claim 1, wherein: in step 4, a parameter-sharing deep Q-network algorithm is adopted, a deep-neural-network framework for joint channel access is designed, and Q-learning is combined to generate a policy π, where the input is the observed state space S and the output is the set of executable actions in the action space A; each state-action pair has a corresponding Q value Q(s_t, a_t); at each step the action achieving the maximum Q value in the current state is selected, and the Q function is updated according to the following rule:
Q(s_t, a_t) ← Q(s_t, a_t) + α·[r_{t+1} + γ·max_{a'∈A} Q(s_{t+1}, a') − Q(s_t, a_t)],
wherein s_{t+1} and r_{t+1} represent the next state and the reward obtained after taking action a_t in state s_t, a' represents an action in state s_{t+1}, A is the set of executable actions, max_{a'∈A} Q(s_{t+1}, a') represents the maximum Q value over the action set A in state s_{t+1}, and the agent adopts an ε-greedy policy for exploration.
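The tabular core of claim 9's update rule and ε-greedy exploration can be sketched as follows; the hyperparameter values and the two action labels (taken from claim 10's action space) are illustrative assumptions:

```python
import random

GAMMA, ALPHA_LR, EPSILON = 0.9, 0.1, 0.1
ACTIONS = ["attempt_access", "back_off"]   # claim 10's two-part action space

def select_action(q_row, rng):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q_row[a])

def q_update(q, s, a, r_next, s_next):
    """Q(s,a) += alpha * (r_{t+1} + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r_next + GAMMA * max(q[s_next])
    q[s][a] += ALPHA_LR * (target - q[s][a])
```

Each agent keeps `q` as a mapping from observed states to per-action values; parameter sharing in the claim means all agents train one common set of weights.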
10. The channel characteristic based joint learning access method of claim 1, wherein: in step 5, the preamble decision problem is converted into a reinforcement learning (RL) problem, and a state set, an action space and a reward function are defined as follows:
state set S: the state comprises three parts; the first part is the access state of the device in the previous time slot, which takes the three values 1, 0 and −1: a value of 1 indicates that the transmitted preamble was not detected as colliding by the base station, 0 indicates that the selected preamble was detected as colliding by the base station, and −1 indicates that the device chose to actively back off in the previous time slot and did not participate in preamble contention; the second part is the preamble selected by the device in the previous RA procedure, denoted p_i; the third part is the number of other devices that selected the same preamble as this device in the previous RA procedure;
action space A: according to the current state, each agent takes an action a_t according to its own decision policy π; the action space A can be defined as two parts: attempting access and choosing to back off;
reward function: when a device attempts access and succeeds, a positive reward is obtained; if a collision occurs, a negative reward is obtained; if the device chooses to back off, the reward is calculated from the number of devices attempting access at the current time and the number of remaining retransmissions,
wherein α_k is the learning rate and β_k is an adjustable parameter.
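A sketch of claim 10's reward shape. The claim does not reproduce the exact back-off expression, so the dependence on contention and remaining retransmissions below, and the role given to β_k, are illustrative stand-ins:

```python
def reward(outcome, n_attempting=0, retx_left=0, beta_k=0.1):
    """+1 on successful access, -1 on collision; the back-off reward is an
    assumed shape that grows with contention (backing off is better when
    many devices attempt) and shrinks as the retransmission budget runs out."""
    if outcome == "success":
        return 1.0
    if outcome == "collision":
        return -1.0
    # outcome == "backoff"
    return beta_k * n_attempting / (retx_left + 1)
```

This shaping makes voluntary back-off attractive exactly when contention is heavy, which is the behaviour the joint-learning access scheme rewards.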
CN202310736635.7A 2023-06-20 2023-06-20 Joint learning access method based on channel characteristics Pending CN116709567A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310736635.7A CN116709567A (en) 2023-06-20 2023-06-20 Joint learning access method based on channel characteristics


Publications (1)

Publication Number Publication Date
CN116709567A true CN116709567A (en) 2023-09-05

Family

ID=87828971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310736635.7A Pending CN116709567A (en) 2023-06-20 2023-06-20 Joint learning access method based on channel characteristics

Country Status (1)

Country Link
CN (1) CN116709567A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117176213A (en) * 2023-11-03 2023-12-05 中国人民解放军国防科技大学 SCMA codebook selection and power distribution method based on deep prediction Q network


Similar Documents

Publication Publication Date Title
CN108882301B (en) Non-orthogonal random access method based on optimal power backoff in large-scale M2M network
CN112737837B (en) Method for allocating bandwidth resources of unmanned aerial vehicle cluster under high dynamic network topology
CN108633102A (en) The sending, receiving method and equipment of upstream data
CN106792451B (en) D2D communication resource optimization method based on multi-population genetic algorithm
CN104350798A (en) Random access channel enhancements for LTE devices
CN105873214B (en) A kind of resource allocation methods of the D2D communication system based on genetic algorithm
Khorov et al. Two-slot based model of the IEEE 802.11 ah restricted access window with enabled transmissions crossing slot boundaries
CN110233755B (en) Computing resource and frequency spectrum resource allocation method for fog computing in Internet of things
CN116709567A (en) Joint learning access method based on channel characteristics
CN111182511B (en) AGA-based NOMA resource allocation method in mMTC scene
CN116744311B (en) User group spectrum access method based on PER-DDQN
CN108282902A (en) Accidental access method, base station and user equipment
CN112153744A (en) Physical layer security resource allocation method in ICV network
Tran et al. Novel reinforcement learning based power control and subchannel selection mechanism for grant-free NOMA URLLC-enabled systems
CN110113720B (en) ACB mechanism-based group paging congestion control method
CN103916972B (en) A kind of method and apparatus of startup RTS/CTS mechanism
CN110505681B (en) Non-orthogonal multiple access scene user pairing method based on genetic method
CN108235440B (en) Spectrum resource allocation method and system based on interference threshold in Femtocell network
CN108601083B (en) Resource management method based on non-cooperative game in D2D communication
Tsiropoulou et al. Service differentiation and resource allocation in SC-FDMA wireless networks through user-centric Distributed non-cooperative Multilateral Bargaining
CN109963272A (en) A kind of accidental access method towards in differentiation MTC network
CN109152060A (en) Transmitter channel distribution model and method in a kind of shortwave downlink communication
Liew et al. Performance evaluation of backoff misbehaviour in IEEE 802.11 ah using evolutionary game theory
CN115066036A (en) Multi-base-station queuing type lead code allocation method based on multi-agent cooperation
CN114375058A (en) Task queue aware edge computing real-time channel allocation and task unloading method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination