CN116709567A - Joint learning access method based on channel characteristics - Google Patents

Joint learning access method based on channel characteristics

Info

Publication number
CN116709567A
CN116709567A (application CN202310736635.7A)
Authority
CN
China
Prior art keywords
channel
state
access method
access
preamble
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310736635.7A
Other languages
Chinese (zh)
Inventor
孙君
李杭州
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202310736635.7A
Publication of CN116709567A
Pending legal-status Critical Current


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 74/00: Wireless channel access, e.g. scheduled or random access
    • H04W 74/08: Non-scheduled or contention based access, e.g. random access, ALOHA, CSMA [Carrier Sense Multiple Access]
    • H04W 74/0833: Non-scheduled or contention based access using a random access procedure
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 74/00: Wireless channel access, e.g. scheduled or random access
    • H04W 74/002: Transmission of channel access control information
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in wireless communication networks

Abstract

The invention provides a joint learning access method based on channel characteristics, which mainly comprises the following steps: initializing the deep Q network of each agent, including a training network and a target network; calculating the channel parameter of each arriving device according to the beta distribution, and grouping the devices according to the size of the channel parameter; calculating the total number of retransmitting and grouped devices, and jointly allocating the physical random access channel (PRACH) and the physical uplink shared channel (PUSCH); having each agent select an action according to the current state information using a greedy strategy, and select a preamble from the available preamble pool; updating the state and reward of the environment, and storing the experience in an experience replay buffer; and randomly sampling a number of experiences, calculating the loss function from the samples, and updating the weights until the loss function reaches a convergence condition or the maximum number of iterations is reached. Compared with the prior art, the invention adapts better to environmental changes and improves the access success rate.

Description

Joint learning access method based on channel characteristics
Technical Field
The invention relates to a joint learning access method based on channel characteristics, and belongs to the technical field of wireless communication.
Background
Communication technology is continuously evolving, communication services are developing extremely rapidly, and communication scenarios are becoming increasingly diverse. The massive machine-type communication (mMTC) usage scenario, also known as the massive Internet of Things (mIoT), features a large number of low-complexity and energy-limited machine-type communication devices (MTCDs) periodically transmitting very short data packets with relaxed latency requirements, and is expected to play an important role in future 6G wireless networks. In order to meet the massive access requirements of MTCDs, further control optimization is required on the basis of current congestion control mechanisms. Meanwhile, for scenarios in which MTCDs of multiple service types perform random access (RA) simultaneously, the access delay, number of collisions, and access fairness of the various service types must be considered while improving system throughput. In the enhanced mMTC scenario, since the delay requirement is relaxed, a contention-based random access procedure may be adopted; if the conventional LTE random access procedure is used, a large number of MTCDs may start the network access procedure at the same time, and the large number of RA attempts increases the preamble collision probability, thereby reducing the number of successful accesses. This problem can be mitigated by increasing the resources allocated to the physical random access channel (PRACH); however, since uplink resources are limited, the more resources are made available to the PRACH, the fewer remain for the physical uplink shared channel (PUSCH), and many MTCDs that successfully complete the RA attempt cannot find enough transmission resources in the PUSCH.
However, the time delay problem caused by access resource limitation and access conflict still exists.
Most current schemes only consider the resource allocation of the random access channel, and rarely consider that physical uplink shared channel (PUSCH) resources can be borrowed for joint allocation. Furthermore, most schemes lack consideration of channel characteristics and ignore the uneven preamble distribution caused by a large number of devices with different transmission conditions.
In view of the foregoing, it is necessary to propose a joint learning access method based on channel characteristics to solve the above-mentioned problems.
Disclosure of Invention
The invention aims to provide a joint learning access method based on channel characteristics, which can solve the problems of unbalanced finite resource allocation and access efficiency.
In order to achieve the above object, the present invention provides a joint learning access method based on channel characteristics, which mainly includes the following steps:
step 1, initializing the deep Q network of each agent, comprising a training network Q(s, a; θ) with parameter θ and a target network Q(s, a; θ⁻) with parameter θ⁻;
step 2, calculating the channel parameter W of each arriving device according to the beta distribution, and grouping the devices according to the size of the channel parameter W;
step 3, calculating the total number of retransmitting devices and grouped devices, and jointly allocating the physical random access channel PRACH and the physical uplink shared channel PUSCH;
step 4, having each agent select an action according to the current state information using a greedy strategy, and select a preamble from the available preamble pool;
step 5, updating the environment state s_{t+1} and reward r_{t+1}, and storing the experience (s_t, a_t, s_{t+1}, r_{t+1}) in the experience replay buffer;
step 6, randomly sampling a number of experiences from the accumulated experience, calculating the loss function L_i(θ) from the samples, and updating the weights θ; repeating from step 3 until the loss function reaches a convergence condition or the maximum number of iterations T is reached.
As a further improvement of the present invention, in step 1, the loss function in each agent may be expressed as:
L_i(θ) = E[(y^DQN − Q_k(s_t, a_t; θ))²],

where y^DQN = r_{t+1} + γ·max_{a'} Q(s_{t+1}, a'; θ⁻) represents the target value, θ⁻ represents the weight of the target network, E(·) represents the expectation, r represents the reward, t represents the time slot, and γ represents the discount factor.
As a further improvement of the present invention, in step 2, the root mean square delay spread and the level crossing rate are used to evaluate the statistical characteristics of the channel quality, wherein the root mean square delay spread can be expressed as:

σ_τ = √( E[τ²] − (τ̄)² ),

where τ̄ = E[τ] represents the average excess delay and E[·] denotes the power-weighted average over the power delay profile.
As a further improvement of the invention, the level crossing rate refers to the frequency at which the envelope crosses a specified level R in the positive (or negative) direction; in the Rice fading process it can be expressed (for isotropic scattering) as:

N_R = √(2π(K + 1))·f_m·ρ·exp(−K − (K + 1)ρ²)·I₀(2ρ√(K(K + 1))),

where the fading amplitude and phase are independent, 2b₀ is the scattered power, s² is the specular power, K = s²/(2b₀) is the Rice factor, ρ = R/√(s² + 2b₀) is the level normalized to the RMS envelope, b₂ = 2b₀(πf_m)² applies to an isotropic scattering environment, and f_m is the maximum Doppler shift.
As a further improvement of the present invention, the root mean square delay spread and the level crossing rate may be normalized, and the channel parameter calculated as:

W = w₁·LCR_nor + w₂·σ_nor,

where w₁ and w₂ are scale factors with w₁ + w₂ = 1, used to balance the importance of the root mean square delay spread and the level crossing rate in the channel parameter.
As a further improvement of the invention, the arrival model of the devices obeys a beta distribution:

p(t) = t^(α−1)·(T − t)^(β−1) / (T^(α+β−1)·B(α, β)), 0 ≤ t ≤ T,

where α = 3, β = 4, and B(α, β) is the beta function.
As a further improvement of the present invention, all devices are divided into M groups, so that the number of devices in each time slot is:

N_t(W) = N_t·λ·e^(−Wt),

where N_t represents the number of devices arriving in the current slot and λ = α/(α + β) is the expectation of the beta distribution.
As a further improvement of the present invention, in step 3, from the number L of contention-based preambles and the number of devices, the average number of devices successfully accessing a preamble in the physical random access channel PRACH can be obtained:
Suppose the base station allocates the physical uplink shared channel PUSCH resources required to transmit θ_max bits; the corresponding number of devices S_pu that can transmit successfully in the PUSCH is:

where log₂(I) is the number of information bits carried by each constellation symbol. It can be seen from the above formula that as the PRACH share increases, the number S_pu of devices that can transmit successfully in the corresponding physical uplink shared channel PUSCH decreases. The optimal values of T_pr and T_pu are found by searching over the given values:

where T_pr ∈ {1, 2, …, T_ra − 1}; an optimal allocation scheme can thus be found.
As a further improvement of the present invention, in step 4, a parameter-sharing deep Q network algorithm is adopted to design a deep neural network framework based on joint channel access, combined with Q learning to generate a strategy π, where the input is the observed state space S and the output is all executable actions in the action space A; each state-action pair has a corresponding Q value Q(s_t, a_t); at each step the action achieving the maximum Q value in the current state is selected, and the Q function is updated according to the following rule:

Q(s_t, a_t) ← Q(s_t, a_t) + α·[r_{t+1} + γ·max_{a'∈A} Q(s_{t+1}, a') − Q(s_t, a_t)],

where γ is the discount factor, s_{t+1} and r_{t+1} represent the next state and the reward obtained after taking an action in state s_t, a' denotes an action in state s_{t+1}, A is the set of executable actions, and max_{a'∈A} Q(s_{t+1}, a') is the maximum Q value over the action set A in state s_{t+1}; the agent uses an ε-greedy strategy to search for the maximum.
As a further improvement of the invention, in step 5, the preamble decision problem is converted into an RL problem, defining a state set, an action space, and a reward function:

State set S: the state comprises three parts. The first part is the access state of the device in the previous time slot, which takes one of three values, 1, 0, and −1: 1 indicates that the transmitted preamble was not detected as collided by the base station, 0 indicates that the selected preamble was detected as collided by the base station, and −1 indicates that the device chose to actively back off in the previous time slot and did not participate in preamble contention. The second part is the preamble selected by the device in the last RA procedure, denoted p_i. The third part is the number of other devices that selected the same preamble in the previous RA procedure.

Action space: according to the current state, each agent takes an action a_t according to its decision strategy π; the action space A may be defined as two parts: attempting access and selecting backoff.

Reward function: when a device selects access and connects successfully, it obtains a positive reward; if a collision occurs, it obtains a negative reward; if the device selects backoff, the reward is calculated based on the number of devices attempting access at the current time and the number of remaining retransmissions,

where the symbol α_k is the learning rate and β_k is an adjustable parameter.
The beneficial effects of the invention are as follows: the invention can better adapt to environmental changes and can improve the success rate of access.
Drawings
Fig. 1 is a MTCD indoor scene random access diagram of the joint learning access method based on channel characteristics of the present invention.
Fig. 2 is a PRACH and PUSCH resource allocation diagram of the joint learning access method based on channel characteristics of the present invention.
Fig. 3 is an access model diagram of the joint learning access method based on channel characteristics of the present invention.
Fig. 4 is a multi-state and dual-state markov decision process for a joint learning access method based on channel characteristics of the present invention.
Fig. 5 is a diagram of a shared network structure of multiple agents based on a channel characteristic joint learning access method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
In order to avoid obscuring the present invention with unnecessary detail, only the structures and/or processing steps closely related to the solution of the present invention are shown in the drawings, and other details of little relevance are omitted.
In addition, it should be further noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1 to 5, the present invention discloses a joint learning access method based on channel characteristics. To address the collision problem of conventional contention-based random access, the method normalizes the level crossing rate and the root mean square delay spread, combines them into a single channel parameter according to different weights, and divides the devices into a good group (GG), a medium group (MG), and a bad group (BG) according to the size of the channel parameter. The total number of accessing devices in the current time slot is calculated, and the PRACH and PUSCH are optimally allocated accordingly. Dynamic resource allocation over the two channels is performed by joint learning, forming a dual-channel joint learning access scheme based on channel statistical characteristics; the close relationship among the channel characteristics, the number of accessing devices, and the access success rate is identified, and the access strategy is optimized by multi-agent joint learning, solving the problems of unbalanced allocation of limited resources and access efficiency.
The joint learning access method based on the channel characteristics comprises the following steps:
step 1, initializing the deep Q network of each agent, comprising a training network Q(s, a; θ) with parameter θ and a target network Q(s, a; θ⁻) with parameter θ⁻;
step 2, calculating the channel parameter W of each arriving device according to the beta distribution, and grouping the devices according to the size of the channel parameter W;
step 3, calculating the total number of retransmitting devices and grouped devices, and jointly allocating the physical random access channel PRACH and the physical uplink shared channel PUSCH;
step 4, having each agent select an action according to the current state information using a greedy strategy, and select a preamble from the available preamble pool;
step 5, updating the environment state s_{t+1} and reward r_{t+1}, and storing the experience (s_t, a_t, s_{t+1}, r_{t+1}) in the experience replay buffer;
step 6, randomly sampling a number of experiences from the accumulated experience, calculating the loss function L_i(θ) from the samples, and updating the weights θ; repeating from step 3 until the loss function reaches a convergence condition or the maximum number of iterations T is reached.
Steps 1 to 5 will be described in detail below.
Step 1: in DQL, two neural networks are constructed, namely a training network Q(s, a; θ) with parameter θ and a target network Q(s, a; θ⁻) with parameter θ⁻. The parameter θ is updated by stochastic gradient descent to minimize the loss function over small random batches of samples. For this purpose, the loss function is defined as the mean square error between the target value and the estimated value. The loss function in each agent can be expressed as:

L_i(θ) = E[(y^DQN − Q_k(s_t, a_t; θ))²],

where y^DQN = r_{t+1} + γ·max_{a'} Q(s_{t+1}, a'; θ⁻) represents the target value and θ⁻ represents the weight of the target network.
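As a concrete illustration of this loss, the following sketch computes the target value y^DQN and the mean squared error over a sampled batch. The tabular Q representation, state/action sizes, and batch contents are illustrative assumptions, not the patent's actual network.

```python
# Illustrative sketch of the DQN loss in step 1 (not the patent's exact network):
# Q-values are kept in small tables; a real implementation would use neural networks.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9

q_train  = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # Q(s, a; theta)
q_target = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # Q(s, a; theta^-)

def dqn_loss(batch):
    """Mean squared error between y_DQN and Q(s_t, a_t; theta)."""
    total = 0.0
    for s, a, s_next, r in batch:
        y_dqn = r + GAMMA * max(q_target[s_next])         # target value y_DQN
        total += (y_dqn - q_train[s][a]) ** 2
    return total / len(batch)

# One experience tuple (s_t, a_t, s_{t+1}, r_{t+1}) as stored in the replay buffer:
batch = [(0, 1, 2, 1.0)]
print(dqn_loss(batch))   # with zero-initialised tables: (1.0 - 0.0)^2 = 1.0
```

In training, only θ (here `q_train`) is updated each step, while θ⁻ (`q_target`) is periodically copied from θ to stabilize the target.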
In step 2, in a single-base-station scenario there are several mobile devices, each with different channel parameters. Under a conventional scheme, users are subdivided into newly accessing devices and devices backing off after a collision; in the present invention, a device backing off after a collision does not need to contend randomly for a preamble again, but decides whether to access according to the reward.
The small-scale fading characteristics of a wireless channel mainly comprise multipath effects and time-varying characteristics, and the invention analyzes the two respectively. The characteristics of multipath fading channels are often described by the power delay profile (PDP); the root mean square delay spread is a parameter describing channel fading and can be expressed as:

σ_τ = √( E[τ²] − (τ̄)² ),

where τ̄ = E[τ] represents the average excess delay and E[·] denotes the power-weighted average over the PDP.
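The delay-spread computation above can be sketched directly from a discrete PDP; the two-path example values are illustrative:

```python
import math

def rms_delay_spread(delays_ns, powers):
    """sigma_tau = sqrt(E[tau^2] - E[tau]^2), with power-weighted averages over the PDP."""
    p_sum = sum(powers)
    mean_tau  = sum(p * t for p, t in zip(powers, delays_ns)) / p_sum       # average excess delay
    mean_tau2 = sum(p * t * t for p, t in zip(powers, delays_ns)) / p_sum   # second moment
    return math.sqrt(mean_tau2 - mean_tau ** 2)

# Two equal-power paths at 0 ns and 100 ns -> sigma_tau = 50 ns
print(rms_delay_spread([0.0, 100.0], [1.0, 1.0]))  # 50.0
```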
The level crossing rate, which is the frequency at which the envelope crosses a specified level R in the positive (or negative) direction, is a critical second-order statistic that defines the quality of the received signal; in the Rice fading process it can be expressed (for isotropic scattering) as:

N_R = √(2π(K + 1))·f_m·ρ·exp(−K − (K + 1)ρ²)·I₀(2ρ√(K(K + 1))),

where the fading amplitude and phase are independent, 2b₀ is the scattered power, s² is the specular power, K = s²/(2b₀) is the Rice factor, ρ = R/√(s² + 2b₀) is the level normalized to the RMS envelope, b₂ = 2b₀(πf_m)² applies to an isotropic scattering environment, and f_m is the maximum Doppler shift.
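A numerical sketch of this closed form follows; the series expansion of the modified Bessel function I₀ and the sample parameter values are illustrative. Setting the specular power s² to zero recovers the Rayleigh-fading level crossing rate √(2π)·f_m·ρ·exp(−ρ²).

```python
import math

def bessel_i0(x, terms=30):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    return sum((x / 2.0) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))

def rice_lcr(R, s2, two_b0, f_m):
    """Level crossing rate of a Rician envelope at level R (isotropic scattering assumed).

    s2     : specular power s^2
    two_b0 : scattered power 2*b0
    f_m    : maximum Doppler shift (Hz)
    """
    K   = s2 / two_b0                      # Rice factor K = s^2 / (2*b0)
    rho = R / math.sqrt(s2 + two_b0)       # level normalised to the RMS envelope
    return (math.sqrt(2.0 * math.pi * (K + 1.0)) * f_m * rho
            * math.exp(-K - (K + 1.0) * rho ** 2)
            * bessel_i0(2.0 * rho * math.sqrt(K * (K + 1.0))))

# K = 0 (no specular path) reduces to the Rayleigh LCR
print(rice_lcr(R=1.0, s2=0.0, two_b0=1.0, f_m=100.0))
```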
The root mean square delay spread and the level crossing rate are statistical characteristics used to evaluate channel quality. To describe channel quality more fully, they can be normalized, and the normalized values combined in different proportions into a composite channel parameter. Specifically, the channel parameter may be calculated using the following formula:

W = w₁·LCR_nor + w₂·σ_nor,

where w₁ and w₂ are scale factors with w₁ + w₂ = 1, used to balance the importance of the root mean square delay spread and the level crossing rate in the channel parameter.
The arrival model of the MTCDs is assumed to obey a beta distribution:

p(t) = t^(α−1)·(T − t)^(β−1) / (T^(α+β−1)·B(α, β)), 0 ≤ t ≤ T,

where α = 3, β = 4, and B(α, β) is the beta function.
In order to further study the influence of the channel parameter on the access success rate, all devices with access requirements at time t are divided into M groups; the parameter reflecting the channel condition of group i is W_i, where W_i = 0.1, 0.2, …, 1, representing the channel fading characteristic of the i-th group of devices. Since poor channel fading yields few accessing devices, the number of accessing devices decays exponentially with channel fading, so the number of devices in each time slot is:

N_t(W) = N_t·λ·e^(−Wt),

where N_t represents the number of devices arriving in the current slot and λ = α/(α + β) is the expectation of the beta distribution.
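The per-group arrival count can be sketched as follows; the total device count, activation period, and slot duration are illustrative assumptions layered on the beta(3, 4) profile.

```python
import math

ALPHA, BETA = 3, 4
T_TOTAL = 10.0            # activation period (s); illustrative
N_TOTAL = 1000            # total number of MTCDs; illustrative

def beta_pdf(t):
    """Beta(3, 4) arrival profile over [0, T_TOTAL] (3GPP-style traffic model)."""
    B = math.gamma(ALPHA) * math.gamma(BETA) / math.gamma(ALPHA + BETA)
    return t ** (ALPHA - 1) * (T_TOTAL - t) ** (BETA - 1) / (T_TOTAL ** (ALPHA + BETA - 1) * B)

def group_arrivals(t, W, dt=0.005):
    """N_t(W) = N_t * lambda * exp(-W*t): arrivals in slot t for channel parameter W."""
    n_t = N_TOTAL * beta_pdf(t) * dt            # devices arriving in the current slot
    lam = ALPHA / (ALPHA + BETA)                # expectation of the beta distribution
    return n_t * lam * math.exp(-W * t)

# A worse channel (larger W) yields fewer accessing devices at the same time t
print(group_arrivals(t=4.0, W=0.5) < group_arrivals(t=4.0, W=0.1))  # True
```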
In step 3, during the random access procedure each device performs a contention-based procedure to request resources for data transmission. A typical RA period of 5 ms is adopted, in which the uplink radio resources are divided into a PRACH subset and a PUSCH subset. The PRACH bandwidth is 1.08 MHz and its duration depends on the RA preamble format; for example, with generic preamble format 0, the PRACH duration is 1 ms. The PRACH access resource consists of 64 orthogonal preamble sequences, each an L₀ sequence mapped to 839 subcarriers of 1.25 kHz. The PUSCH consists of 72 subcarriers of 15 kHz for transmitting user data. PRACH and PUSCH occupy the entire bandwidth in the frequency domain; in the time domain the PRACH lasts T_pr ∈ {1, 2, …, T_ra − 1} time slots, where T_ra = 5.

The total number of available preambles of the PRACH over the whole random access period is L = L₀·T_pr.

Unlike the conventional random access procedure, this description adopts a two-step random access procedure, which can be described as follows: before random access, the base station periodically broadcasts a system information block comprising several key parameters for synchronization, preamble information, and pre-configured resources; a device randomly selects a preamble from the contention-based pool. Then, if the preamble experiences no collision, the base station transmits to the MTC device the SCMA codebook information related to the PUSCH resources required to transmit θ_max bits.
Based on the number L of contention-based preambles and the number of devices, the average number of devices successfully accessing a preamble in the PRACH can be obtained:
Suppose the base station allocates the PUSCH resources required to transmit θ_max bits; the corresponding number of devices S_pu that can transmit successfully in the PUSCH is:

where log₂(I) is the number of information bits carried by each constellation symbol. It can be seen from the above formula that as the PRACH share increases, the number S_pu of devices that can transmit successfully in the corresponding PUSCH decreases. The optimal values of T_pr and T_pu are found by searching over the given values:

where T_pr ∈ {1, 2, …, T_ra − 1}; an optimal allocation scheme can thus be found.
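The trade-off search over T_pr can be sketched as below. Since the patent's exact success formulas appear only as images here, this sketch substitutes two labeled assumptions: a slotted-ALOHA-style PRACH success model S_pr = N·exp(−N/L), and a PUSCH capacity proportional to its remaining slots T_pu = T_ra − T_pr. The device count and per-slot capacity are also illustrative.

```python
import math

L0, T_RA = 64, 5          # preambles per PRACH slot; RA period in slots
N_DEV = 300               # estimated contending devices; illustrative
PUSCH_CAP_PER_SLOT = 90.0 # devices servable per PUSCH slot; illustrative

def successes(t_pr):
    """Devices completing both RA and data transmission for a given PRACH/PUSCH split."""
    L = L0 * t_pr                                   # total preambles L = L0 * T_pr
    s_pr = N_DEV * math.exp(-N_DEV / L)             # ASSUMED ALOHA-style PRACH successes
    s_pu = PUSCH_CAP_PER_SLOT * (T_RA - t_pr)       # ASSUMED PUSCH capacity over T_pu slots
    return min(s_pr, s_pu)                          # bottleneck of the two stages

# Search T_pr in {1, ..., T_ra - 1} for the best trade-off
best = max(range(1, T_RA), key=successes)
print(best, round(successes(best), 1))   # 4 90.0
```

More PRACH slots raise RA successes but starve the PUSCH, so the optimum balances the two stages, mirroring the joint allocation of step 3.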
In step 4, a parameter-sharing deep Q network algorithm is adopted to design a deep neural network framework based on joint channel access, combined with Q learning to generate a strategy π. The input is the observed state space S and the output is all executable actions in the action space A; each state-action pair has a corresponding Q value Q(s_t, a_t); at each step the action achieving the maximum Q value in the current state is selected, and the Q function is updated according to the following rule:

Q(s_t, a_t) ← Q(s_t, a_t) + α·[r_{t+1} + γ·max_{a'∈A} Q(s_{t+1}, a') − Q(s_t, a_t)],

where γ is the discount factor, s_{t+1} and r_{t+1} represent the next state and the reward obtained after taking an action in state s_t, a' denotes an action in state s_{t+1}, A is the set of executable actions, and max_{a'∈A} Q(s_{t+1}, a') is the maximum Q value over the action set A in state s_{t+1}; the agent uses an ε-greedy strategy to search for the maximum.
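The ε-greedy selection and the Q update can be sketched in tabular form; the state/action sizes and hyperparameter values are illustrative, and a full implementation would replace the table with the shared deep network.

```python
import random

GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1
N_STATES, N_ACTIONS = 3, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def select_action(s):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit the max-Q action."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[s][a])

def update(s, a, r, s_next):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + GAMMA * max(Q[s_next])
    Q[s][a] += ALPHA * (td_target - Q[s][a])

update(s=0, a=1, r=1.0, s_next=2)
print(Q[0][1])   # 0.1 * (1.0 + 0.9*0 - 0) = 0.1
```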
Step 5 considers the conventional access scheme employing access class barring (ACB), which redistributes the access requests of MTC devices over time, thereby reducing the number of access requests per RA cycle and hence the number of failed RA attempts. Optimal access control based on the conventional ACB scheme aims to dynamically derive an appropriate ACB factor to maximize successful traffic; however, the collision problem remains severe, so reinforcement learning is adopted to solve it.
Under a competitive game, the multiple agents can reach local optima but cannot maximize overall network performance. The multi-agent problem is therefore converted into a cooperative game, i.e., all agents use the same reward function. The preamble decision problem is converted into an RL problem, defining a state set, an action space, and a reward function.
State set S: the state is divided into three parts. The first part is the access state of the MTCD in the previous time slot, which takes one of three values, 1, 0, and −1: a value of 1 indicates that the transmitted preamble was not detected as collided by the base station, 0 indicates that the selected preamble was detected as collided by the base station, and −1 indicates that the device chose to actively back off in the previous time slot and did not participate in preamble contention. The second part is the preamble selected by the MTCD during the last RA procedure, denoted p_i. The third part is the number of other MTCDs that selected the same preamble in the previous RA procedure.
Action space: according to the current state, each agent takes an action a_t according to its decision strategy π; the action space A may be defined as two parts: attempting access and selecting backoff.
Reward function: when a device selects access and connects successfully, it obtains a positive reward. If a collision occurs, it obtains a negative reward. If the device selects backoff, the reward is calculated based on the number of devices attempting access at the current time and the number of remaining retransmissions,
where the symbol α_k is the learning rate and β_k is an adjustable parameter.
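The shape of such a reward can be sketched as below. The patent's exact formula (with α_k and β_k) is given only as an image in the original, so the backoff term here is an illustrative stand-in that depends, as described, on the current load and the remaining retransmissions.

```python
def reward(outcome, n_attempting=0, retx_left=0, alpha_k=1.0, beta_k=0.1):
    """Illustrative reward shaping (NOT the patent's exact formula):
    positive on success, negative on collision, load-dependent on backoff."""
    if outcome == "success":
        return alpha_k                          # positive reward for successful access
    if outcome == "collision":
        return -alpha_k                         # negative reward for a collision
    # backoff: reward grows with remaining retransmissions, shrinks with contention
    return beta_k * retx_left / (1 + n_attempting)

print(reward("success"), reward("collision"), reward("backoff", n_attempting=9, retx_left=5))
```

Backing off is thus made more attractive when many devices are contending, which is exactly the congestion-spreading behaviour the cooperative game rewards.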
In summary, unlike the conventional random contention access mode, the present invention first considers two channel statistical characteristics, the level crossing rate and the delay spread, normalizes them separately, and combines them into a single channel parameter according to different weights; it then classifies the MTCDs into three groups, good, medium, and bad, according to different thresholds of the channel parameter, so that access fairness under different channel conditions can be better balanced.
In the invention, when an MTCD accesses, PRACH and PUSCH resources are first jointly allocated, and then a reinforcement learning method decides whether to select an access preamble in the current time slot; this learning method adapts better to environmental changes and improves the access success rate.
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. The joint learning access method based on the channel characteristics is characterized by mainly comprising the following steps of:
step 1, initializing the deep Q network of each agent, comprising a training network Q(s, a; θ) with parameter θ and a target network Q(s, a; θ⁻) with parameter θ⁻;
step 2, calculating the channel parameter W of each arriving device according to the beta distribution, and grouping the devices according to the size of the channel parameter W;
step 3, calculating the total number of retransmitting devices and grouped devices, and jointly allocating the physical random access channel PRACH and the physical uplink shared channel PUSCH;
step 4, having each agent select an action according to the current state information using a greedy strategy, and select a preamble from the available preamble pool;
step 5, updating the environment state s_{t+1} and reward r_{t+1}, and storing the experience (s_t, a_t, s_{t+1}, r_{t+1}) in the experience replay buffer;
step 6, randomly sampling a number of experiences from the accumulated experience, calculating the loss function L_i(θ) from the samples, and updating the weights θ; repeating from step 3 until the loss function reaches a convergence condition or the maximum number of iterations T is reached.
2. The channel characteristic based joint learning access method of claim 1, wherein: in step 1, the loss function in each agent can be expressed as:
L_i(θ) = E[(y^DQN − Q_k(s_t, a_t; θ))²],

where y^DQN = r_{t+1} + γ·max_{a'} Q(s_{t+1}, a'; θ⁻) represents the target value, θ⁻ represents the weight of the target network, E(·) represents the expectation, r represents the reward, t represents the time slot, and γ represents the discount factor.
3. The channel characteristic based joint learning access method of claim 1, wherein: in step 2, the root mean square delay spread and the level crossing rate are used to evaluate the statistical characteristics of the channel quality, wherein the root mean square delay spread can be expressed as:

σ_τ = √( E[τ²] − (τ̄)² ),

where τ̄ = E[τ] represents the average excess delay and E[·] denotes the power-weighted average over the power delay profile.
4. The channel characteristic based joint learning access method of claim 3, wherein: the level crossing rate refers to the frequency at which the envelope crosses a specified level R in the positive (or negative) direction, and in the Rice fading process it can be expressed (for isotropic scattering) as:

N_R = √(2π(K + 1))·f_m·ρ·exp(−K − (K + 1)ρ²)·I₀(2ρ√(K(K + 1))),

where the fading amplitude and phase are independent, 2b₀ is the scattered power, s² is the specular power, K = s²/(2b₀) is the Rice factor, ρ = R/√(s² + 2b₀), b₂ = 2b₀(πf_m)² applies to an isotropic scattering environment, and f_m is the maximum Doppler shift.
5. The channel characteristic based joint learning access method of claim 4, wherein: the root mean square delay spread and the level crossing rate can be normalized, and the channel parameter calculated as:

W = w₁·LCR_nor + w₂·σ_nor,

where w₁ and w₂ are scale factors with w₁ + w₂ = 1, used to balance the importance of the root mean square delay spread and the level crossing rate in the channel parameter.
6. The channel characteristic based joint learning access method of claim 5, wherein: the arrival model of the devices obeys a beta distribution,
p(t) = t^{α−1}(T − t)^{β−1} / (T^{α+β−1}·Beta(α, β)), 0 ≤ t ≤ T,
wherein α = 3 and β = 4.
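The beta(3, 4) arrival model can be simulated as below; the normalised access period and the slot count are illustrative assumptions:

```python
import random

ALPHA, BETA = 3, 4  # shape parameters from the claim

def arrivals_per_slot(n_devices, n_slots, seed=0):
    """Draw each device's activation instant from Beta(3, 4) over a
    normalised access period [0, 1) and histogram the counts per slot."""
    rng = random.Random(seed)
    counts = [0] * n_slots
    for _ in range(n_devices):
        t = rng.betavariate(ALPHA, BETA)           # activation time in [0, 1)
        counts[min(int(t * n_slots), n_slots - 1)] += 1
    return counts
```

Because the Beta(3, 4) density vanishes at both ends of the period, arrivals bunch around the mode near t = 0.4, which is the bursty-access regime the scheme targets.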
7. The channel characteristic based joint learning access method of claim 6, wherein: all devices are divided into M groups, so that the number of devices contending in each slot follows accordingly,
wherein N_t(W) = N_t·λ·e^{−W·t} represents the number of devices arriving in the current slot, and λ = α/(α + β) is the expectation of the beta distribution.
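The per-slot arrival expression of claim 7, N_t(W) = N_t·λ·e^{−W·t} with λ = α/(α+β), can be evaluated directly; the numeric inputs below are illustrative:

```python
import math

ALPHA, BETA = 3, 4
LAM = ALPHA / (ALPHA + BETA)   # expectation of the Beta(3, 4) distribution

def devices_in_slot(n_total, w, t):
    """N_t(W) = N_t * lambda * exp(-W * t): arriving devices in slot t,
    damped by the channel parameter W from claim 5."""
    return n_total * LAM * math.exp(-w * t)
```

With W = 0 the count reduces to N_t·λ; for W > 0 the count decays across slots, so poorer channels admit fewer contenders per slot.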
8. The channel characteristic based joint learning access method of claim 1, wherein: in step 3, from the number of contention-based preambles L and the number of devices, the average number of devices that successfully access via a preamble in the physical random access channel (PRACH) can be calculated.
suppose that a base station allocates a transmission θ max Physical Uplink Shared Channel (PUSCH) resource required by bits, and corresponding equipment number S capable of solving successful transmission in Physical Uplink Shared Channel (PUSCH) pu
wherein log_2(I) is the number of information bits carried by each constellation symbol; the above shows that as the PRACH resources increase, the number S_pu of devices that can transmit successfully on the corresponding PUSCH decreases, so for a given total the optimal values of T_pr and T_pu are found,
wherein T_pr ∈ {1, 2, …, T_ra − 1}; by traversing these values an optimal allocation scheme can be found.
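The traversal over T_pr in claim 8 can be sketched as a simple enumeration. The throughput expressions below are stand-ins: the slotted-contention approximation N·e^{−N/L} for collision-free preambles and a linear PUSCH capacity are common modelling assumptions, not the patent's formulas, and all numeric inputs are illustrative:

```python
import math

def successful_preambles(n_devices, n_preambles):
    # Slotted-contention approximation: expected collision-free preambles
    return n_devices * math.exp(-n_devices / n_preambles)

def best_split(t_ra, n_devices, preambles_per_prach, pusch_capacity):
    """Enumerate T_pr in {1, ..., T_ra - 1}; the bottleneck of PRACH
    successes and PUSCH capacity bounds the achievable accesses."""
    best_t_pr, best_score = 0, -1.0
    for t_pr in range(1, t_ra):
        t_pu = t_ra - t_pr
        s_pr = successful_preambles(n_devices, t_pr * preambles_per_prach)
        s_pu = pusch_capacity * t_pu          # devices servable on PUSCH
        score = min(s_pr, s_pu)
        if score > best_score:
            best_t_pr, best_score = t_pr, score
    return best_t_pr
```

The optimum balances the two channels: giving PRACH too few slots starves it of successes, while giving it too many leaves too little PUSCH to carry the data.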
9. The channel characteristic based joint learning access method of claim 1, wherein: in step 4, a parameter-sharing deep Q-network algorithm is adopted, a deep-neural-network framework for joint channel access is designed, and Q-learning is combined to generate a policy π, where the input is the observed state space S and the output is the set of executable actions in the action space A; each state-action pair has a corresponding Q value Q(s_t, a_t); at each step the action achieving the maximum Q value in the current state is selected, and the Q function is updated according to the following rule:
Q(s_t, a_t) ← Q(s_t, a_t) + α·[r_{t+1} + γ·max_{a'∈A} Q(s_{t+1}, a') − Q(s_t, a_t)],
wherein s_{t+1} and r_{t+1} represent the next state and the reward obtained after taking action a_t in state s_t, a' represents an action in state s_{t+1}, A is the set of executable actions, max_{a'∈A} Q(s_{t+1}, a') represents the maximum Q value over the action set A in state s_{t+1}, and the agent adopts an ε-greedy policy for exploration.
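The tabular core of claim 9's update rule and ε-greedy exploration can be sketched as follows; the hyperparameter values and the two action labels (taken from claim 10's action space) are illustrative assumptions:

```python
import random

GAMMA, ALPHA_LR, EPSILON = 0.9, 0.1, 0.1
ACTIONS = ["attempt_access", "back_off"]   # claim 10's two-part action space

def select_action(q_row, rng):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q_row[a])

def q_update(q, s, a, r_next, s_next):
    """Q(s,a) += alpha * (r_{t+1} + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r_next + GAMMA * max(q[s_next])
    q[s][a] += ALPHA_LR * (target - q[s][a])
```

Each agent keeps `q` as a mapping from observed states to per-action values; parameter sharing in the claim means all agents train one common set of weights.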
10. The channel characteristic based joint learning access method of claim 1, wherein: in step 5, the preamble decision problem is converted into a reinforcement learning (RL) problem, and a state set, an action space and a reward function are defined as follows:
state set S: the state comprises three parts; the first part is the access state of the device in the previous time slot, which takes the three values 1, 0 and −1: a value of 1 indicates that the transmitted preamble was not detected as colliding by the base station, 0 indicates that the selected preamble was detected as colliding by the base station, and −1 indicates that the device chose to actively back off in the previous time slot and did not participate in preamble contention; the second part is the preamble selected by the device in the previous RA procedure, denoted p_i; the third part is the number of other devices that selected the same preamble as this device in the previous RA procedure;
action space A: according to the current state, each agent takes an action a_t according to its own decision policy π; the action space A can be defined as two parts: attempting access and choosing to back off;
reward function: when a device attempts access and succeeds, a positive reward is obtained; if a collision occurs, a negative reward is obtained; if the device chooses to back off, the reward is calculated from the number of devices attempting access at the current time and the number of remaining retransmissions,
wherein α_k is the learning rate and β_k is an adjustable parameter.
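A sketch of claim 10's reward shape. The claim does not reproduce the exact back-off expression, so the dependence on contention and remaining retransmissions below, and the role given to β_k, are illustrative stand-ins:

```python
def reward(outcome, n_attempting=0, retx_left=0, beta_k=0.1):
    """+1 on successful access, -1 on collision; the back-off reward is an
    assumed shape that grows with contention (backing off is better when
    many devices attempt) and shrinks as the retransmission budget runs out."""
    if outcome == "success":
        return 1.0
    if outcome == "collision":
        return -1.0
    # outcome == "backoff"
    return beta_k * n_attempting / (retx_left + 1)
```

This shaping makes voluntary back-off attractive exactly when contention is heavy, which is the behaviour the joint-learning access scheme rewards.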
CN202310736635.7A 2023-06-20 2023-06-20 Joint learning access method based on channel characteristics Pending CN116709567A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310736635.7A CN116709567A (en) 2023-06-20 2023-06-20 Joint learning access method based on channel characteristics


Publications (1)

Publication Number Publication Date
CN116709567A true CN116709567A (en) 2023-09-05

Family

ID=87828971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310736635.7A Pending CN116709567A (en) 2023-06-20 2023-06-20 Joint learning access method based on channel characteristics

Country Status (1)

Country Link
CN (1) CN116709567A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117176213A (en) * 2023-11-03 2023-12-05 中国人民解放军国防科技大学 SCMA codebook selection and power distribution method based on deep prediction Q network


Similar Documents

Publication Publication Date Title
CN108882301B (en) Non-orthogonal random access method based on optimal power backoff in large-scale M2M network
CN112737837B (en) Method for allocating bandwidth resources of unmanned aerial vehicle cluster under high dynamic network topology
CN108633102A (en) The sending, receiving method and equipment of upstream data
CN106792451B (en) D2D communication resource optimization method based on multi-population genetic algorithm
CN104350798A (en) Random access channel enhancements for LTE devices
CN105873214B (en) A kind of resource allocation methods of the D2D communication system based on genetic algorithm
Khorov et al. Two-slot based model of the IEEE 802.11 ah restricted access window with enabled transmissions crossing slot boundaries
CN110233755B (en) Computing resource and frequency spectrum resource allocation method for fog computing in Internet of things
CN116709567A (en) Joint learning access method based on channel characteristics
CN111182511B (en) AGA-based NOMA resource allocation method in mMTC scene
CN116744311B (en) User group spectrum access method based on PER-DDQN
CN108282902A (en) Accidental access method, base station and user equipment
CN112153744A (en) Physical layer security resource allocation method in ICV network
Tran et al. Novel reinforcement learning based power control and subchannel selection mechanism for grant-free NOMA URLLC-enabled systems
CN110113720B (en) ACB mechanism-based group paging congestion control method
CN103916972B (en) A kind of method and apparatus of startup RTS/CTS mechanism
CN110505681B (en) Non-orthogonal multiple access scene user pairing method based on genetic method
CN108235440B (en) Spectrum resource allocation method and system based on interference threshold in Femtocell network
CN108601083B (en) Resource management method based on non-cooperative game in D2D communication
Tsiropoulou et al. Service differentiation and resource allocation in SC-FDMA wireless networks through user-centric Distributed non-cooperative Multilateral Bargaining
CN109963272A (en) A kind of accidental access method towards in differentiation MTC network
CN109152060A (en) Transmitter channel distribution model and method in a kind of shortwave downlink communication
Liew et al. Performance evaluation of backoff misbehaviour in IEEE 802.11 ah using evolutionary game theory
CN115066036A (en) Multi-base-station queuing type lead code allocation method based on multi-agent cooperation
CN114375058A (en) Task queue aware edge computing real-time channel allocation and task unloading method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination