CN116744311A

CN116744311A - User group spectrum access method based on PER-DDQN

Info

Publication number: CN116744311A
Application number: CN202310592111.5A
Authority: CN
Inventors: 魏祥麟; 魏楠; 范建华; 胡永扬; 赵框; 王彦刚
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2023-05-24
Filing date: 2023-05-24
Publication date: 2023-09-12
Anticipated expiration: 2043-05-24
Also published as: CN116744311B

Abstract

The application discloses a user group spectrum access method based on PER-DDQN, comprising the following steps: in a distributed dynamic spectrum access scenario, each user group acts as an agent with its own DRL model and independently learns an allocation strategy, and each DRL model uses a DQN algorithm that combines a prioritized experience replay (PER) mechanism with a double-network structure; a transmitting party and a receiving party are selected from the users in the group, and the selected channel is allocated to the transmitting party for transmitting service information, so as to realize information sharing within the group; and spectrum access is performed according to the service processing status of the group members, realizing the transmission of service information within the group and completing the utilization of spectrum resources. The method offers a higher convergence rate and better performance.

Description

User group spectrum access method based on PER-DDQN

Technical Field

The application relates to the technical field of wireless communication networks, in particular to a user group spectrum access method based on PER-DDQN.

Background

Cognitive radio is regarded as a powerful tool for addressing spectrum resource shortages and improving spectrum utilization. However, the wireless communication channel is open and vulnerable to malicious attacks, which can severely reduce the spectrum utilization of a cognitive wireless network. The anti-interference communication capability of cognitive wireless networks is therefore receiving increasing attention. Current cognitive radio research mainly addresses the Dynamic Spectrum Access (DSA) problem for single users. In practical applications, as the number of users and the variety of communication services grow, sharing the spectrum at the level of individual user nodes easily leads to access confusion and serious inter-user interference.

Disclosure of Invention

The technical problem to be solved by the application is how to provide a user group spectrum access method based on PER-DDQN with higher convergence rate and better performance.

In order to solve the technical problems, the application adopts the following technical scheme: a PER-DDQN based user group spectrum access method, the method comprising the steps of:

in a distributed dynamic spectrum access scenario, each user group acts as an agent; the state space, action space and reward setting of the agent are defined, and an authorized channel in the cognitive wireless network is selected; each user group, as an agent, has its own DRL model and independently learns an allocation strategy, and each DRL model uses a DQN algorithm that combines a prioritized experience replay mechanism with a double-network structure;

respectively selecting a transmitting party and a receiving party from users in the group, and distributing the selected channels to the transmitting party for transmitting service information so as to realize information sharing in the group;

and performing spectrum access according to the processing condition of the member service in the group, realizing the transmission of the service information in the group, and completing the utilization of spectrum resources.

A further technical solution is that: the state space comprises the channel selected in each time slot and the amount of information held by the users in the group;

the action of each user group is to select the transmitting parties and the receiving parties according to the amount of information held by the users in the group;

the reward for an action reflects whether the spectrum access is reasonable and whether the amount of information held by the users in the group has increased, so that each user group is encouraged to make a correct allocation strategy.

The further technical scheme is that the method for constructing the user group model comprises the following steps:

dividing users under the wireless network into N user groups according to the types of communication services to be processed, wherein one user group processes a service, each service contains different data quantity, L authorized wireless channels are shared among the user groups, when each time slot starts, all the authorized channels are in an idle available state, one user group serves as an intelligent agent, one channel is randomly selected for access, and a transmitting party in the user group transmits data on the channel;

multiple user groups can be selectively accessed to the same channel, and spectrum resource sharing is realized through distributed spectrum access; the user group comprises M users, and the users in the group process business through information sharing; the intelligent agent selects part of users in the group to access the frequency spectrum, and the part of users are used as transmitting parties to broadcast and transmit service information to other users in the range; when all users in a group completely possess the required service information, indicating that the group service is completed, stopping channel access and ending transmission in the group; the purpose of the spectrum access of the user group is to fully utilize the user resources in the group and shorten the service processing time of the whole user group.

A further technical solution is that when a user is selected as a transmitting party, there are three cases for the users of the same group that act as receiving parties:

1) The receiver is within the transmission range of only one transmitter in the same group; it receives the signal of that transmitter, and all other signals are treated as interference;

2) The receiver is within the overlapping transmission ranges of multiple transmitters in the same group; the signal-to-interference-plus-noise ratio (SINR) of each received signal is calculated, the transmitter with the maximum SINR is selected as the corresponding transmitter, and the remaining signals are treated as interference;

3) The receiver is not within the transmission range of any transmitter in the same group; it cannot receive a signal, and its information increment is 0. Likewise, when a user selected as a receiving party already holds the full amount of service information, the user no longer receives information, and its information increment is 0.

A further technical solution is that the SINR received by the m-th user acting as a receiving party is shown in formula (1):

$\mathrm{SINR}_m = \dfrac{p\,|h_{mj}|^2}{\sum_{k \neq m,\, k \neq j} p\,|h_{mk}|^2 + B N_0}$ (1)

where $p$ is the transmission power of a transmitting user; $|h_{mj}|^2$ is the channel gain between the m-th user, acting as the receiving party, and its corresponding j-th user, acting as the transmitting party; $|h_{mk}|^2$ is the channel gain from the k-th user, acting as an interferer, to the m-th user, acting as the receiving party, with $k \neq m$ and $k \neq j$; $B$ is the channel bandwidth; and $N_0$ is the noise spectral density at the receiving user. The transmission rate of the m-th user is given by formula (2):

$V_m = \log_2(1 + \mathrm{SINR}_m)$ (2)
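As a minimal sketch of how the SINR and rate definitions could be evaluated, the following assumes the usual SINR form (desired power over interference power plus $B N_0$) and a common transmit power $p$ for all transmitters. The function names and the `pick_transmitter` helper, which applies the maximum-SINR rule for a receiver covered by several same-group transmitters, are illustrative and not taken from the application.

```python
import math


def sinr(p, gain_signal, gains_interference, bandwidth, noise_density):
    """SINR at one receiver: p*|h_mj|^2 / (sum_k p*|h_mk|^2 + B*N0)."""
    interference = sum(p * g for g in gains_interference)
    noise = bandwidth * noise_density
    return p * gain_signal / (interference + noise)


def rate(sinr_value):
    """Transmission rate of formula (2): V_m = log2(1 + SINR_m)."""
    return math.log2(1.0 + sinr_value)


def pick_transmitter(p, candidate_gains, other_gains, bandwidth, noise_density):
    """Choose the same-group transmitter with the highest SINR.

    candidate_gains: {transmitter id: channel gain} for same-group transmitters
                     whose range covers this receiver (hypothetical structure).
    other_gains:     gains from all other transmitters on the same channel.
    Returns (None, -1.0) when no same-group transmitter is in range.
    """
    best_id, best_sinr = None, -1.0
    for j, g in candidate_gains.items():
        # Every transmitter other than j counts as interference.
        interferers = [gk for k, gk in candidate_gains.items() if k != j]
        interferers += list(other_gains)
        value = sinr(p, g, interferers, bandwidth, noise_density)
        if value > best_sinr:
            best_id, best_sinr = j, value
    return best_id, best_sinr
```

An empty `candidate_gains` corresponds to the case where the receiver is out of range of every same-group transmitter, so its information increment would be 0.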

The size of the service data to be processed by the i-th user group is set as $C_i$, and the information increment of the m-th user in the i-th user group in each time slot can be calculated from formula (3):

where $w_m$ denotes the amount of information already held by the m-th user of the i-th group.

If $w_m = C_i$, the user has finished receiving, no longer acts as a receiver, and can transmit the file to other users as a sender;

if $w_m < C_i$, the user has not fully received the service file and continues to act as a receiver. $\mu$ denotes the SINR threshold that a user must reach as a receiver: the SINR received by every receiving user must be greater than or equal to the set threshold, otherwise the transmission fails and the information increment is 0. When every user in the group satisfies $w_m = C_i$, the service processing of the user group is complete and spectrum resources are no longer occupied. The optimization objective of the user group spectrum access problem is therefore as shown in formula (4):

the construction method of the state space comprises the following steps:

The service processing duration is divided into a number of time slots, and every channel is available at the start of each time slot. The state of the i-th user group consists of two parts: the channel number selected by the group in the current time slot, and the service processing state of the users in the group at the start of the current time slot. For the channel number, the i-th user group selects one of the L channels for access; a value of 0 indicates that the group has finished processing its service, remains in a waiting state and no longer accesses a channel, while a nonzero value identifies the selected channel. The service processing state consists of one indicator $g_m \in \{0, 1\}$ per user: when $w_m = C_i$, $g_m = 1$, indicating that the m-th user of the i-th group has completed the service and its amount of information no longer increases; when $w_m < C_i$, $g_m = 0$, indicating that the m-th user of the i-th group has not completed the service and continues, as a receiving party, to receive the information transmitted by the corresponding transmitting user.
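The state described here can be held in a flat vector of length M+1. The sketch below is one possible encoding, assuming the channel number comes first and the per-user completion flags follow; the ordering and the function names are illustrative assumptions.

```python
def build_state(channel, owned, service_size):
    """Encode the state of one user group as a list of length M + 1.

    channel:      0 if the group has finished its service, else 1..L.
    owned:        information amounts w_m held by the M users.
    service_size: service data size C_i of this group.
    """
    flags = [1 if w >= service_size else 0 for w in owned]  # g_m indicators
    return [channel] + flags


def group_done(owned, service_size):
    """True when every user in the group holds the full service data."""
    return all(w >= service_size for w in owned)
```

For example, a group of three users on channel 2 in which the first and third users are finished is encoded as `[2, 1, 0, 1]`.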

The construction method of the action space comprises the following steps:

After each user group selects a channel, it decides which users act as transmitting parties and which as receiving parties according to the service processing states of the users in the group. The action of the i-th user group is expressed as $a_i = [x_1, x_2, \ldots, x_M]$, where $x_m \in \{0, 1\}$: $x_m = 1$ means that the m-th user is selected as a transmitting party, and $x_m = 0$ means that the m-th user is selected as a receiving party. When users in the group are selected as transmitters, the corresponding receivers feed back the received signal-to-noise ratio to the transmitters, and the interference at each receiver is calculated based on the positions of the other transmitters on the same channel.
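Because each of the M users is either a transmitter or a receiver, there are $2^M$ possible actions, and a DQN output index can be mapped to the binary vector $a_i$. The following sketch assumes that bit k of the index corresponds to user k+1; this bit ordering and the helper names are illustrative choices, not something the application specifies.

```python
def decode_action(index, m):
    """Map an action index in [0, 2**m) to x = [x_1, ..., x_M]."""
    if not 0 <= index < 2 ** m:
        raise ValueError("action index out of range")
    return [(index >> k) & 1 for k in range(m)]


def encode_action(x):
    """Inverse of decode_action: binary vector back to an action index."""
    return sum(bit << k for k, bit in enumerate(x))


def split_roles(x):
    """Return (transmitter indices, receiver indices), zero-based."""
    transmitters = [m for m, bit in enumerate(x) if bit == 1]
    receivers = [m for m, bit in enumerate(x) if bit == 0]
    return transmitters, receivers
```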

The further technical scheme is that the method for setting rewards comprises the following steps:

The principle of spectrum access is to use the resources of users in the group that have already completed the service so as to minimize the average duration of overall service processing. The reward $r_i$ of the i-th agent consists of 5 parts:

1) Invalid allocation penalty $R_1$: when the m-th user in the i-th group is selected as a transmitting party ($x_m = 1$) but does not hold the complete service data, the agent receives a small penalty $R_1$, which is negative;

2) User group information increment reward $R_2$: the reward equals the total data increment $D_i$ of the user group, i.e. $R_2 = D_i$;

3) Completion reward $R_3$ for individual users in the group: since the information increment of the user group gradually decreases as time slots elapse, the reward $R_2$ also gradually decreases. The users in the group are therefore given an additional completion incentive: each time a user in the group obtains all the information of the service, the agent receives an extra positive reward $R_3$; and because a user that is selected as a receiver again gains no further information, the extra reward gradually reduces the negative influence of the shrinking reward $R_2$;

4) Completion reward $R_4$ for the service of the whole group: for the i-th user group (agent), if every user in the group holds the complete service information, the current agent is given a larger positive reward $R_4$, the service processing of the group ends, and the group no longer participates in channel access;

5) Service processing duration penalty $R_5$: to urge the agent to process the service faster, the agent receives a penalty $R_5$ in every time slot; $R_5$ is negative, and the penalty gradually increases with the service processing duration.
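The five reward components above can be sketched as a single function. This is only an illustration under stated assumptions: the components are simply added, the duration penalty grows linearly with the slot index, and all numeric constants in `RewardConfig` are hypothetical placeholders rather than the values used in the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RewardConfig:
    # All values below are hypothetical placeholders.
    invalid_penalty: float = -1.0     # per transmitter lacking full data
    user_done_bonus: float = 5.0      # per user completing in this slot
    group_done_bonus: float = 50.0    # when the whole group completes
    time_penalty_step: float = -0.1   # multiplied by the slot index


def reward(x, owned_before, owned_after, service_size, slot,
           cfg: Optional[RewardConfig] = None):
    """Sum of the five reward parts for one group in one time slot."""
    cfg = cfg or RewardConfig()
    # 1) Penalty for choosing a transmitter without the complete data.
    r1 = sum(cfg.invalid_penalty
             for bit, w in zip(x, owned_before)
             if bit == 1 and w < service_size)
    # 2) Total information gained by the group in this slot.
    r2 = sum(a - b for a, b in zip(owned_after, owned_before))
    # 3) Bonus for each user that completes the service in this slot.
    r3 = cfg.user_done_bonus * sum(
        1 for b, a in zip(owned_before, owned_after) if b < service_size <= a)
    # 4) Bonus when every user holds the complete service data.
    r4 = cfg.group_done_bonus if all(a >= service_size for a in owned_after) else 0.0
    # 5) Duration penalty, assumed here to grow linearly with the slot index.
    r5 = cfg.time_penalty_step * slot
    return r1 + r2 + r3 + r4 + r5
```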

In summary, the reward setting is expressed as:

the further technical scheme is that the method comprises the following steps:

the multi-user group DSA strategy is distributed, user information in the groups cannot be shared among the user groups, and each user group serving as an intelligent agent is provided with a corresponding DRL model to independently carry out spectrum access decision;

first, the original network and the target network of each agent are initialized, where the network parameters of the original network are $\theta$ and those of the target network are $\theta'$. At the start of each time slot, the state $s_i$ of the i-th user group is input into the original network; with probability $\varepsilon$ the agent explores the environment by selecting a random action, and with probability $1 - \varepsilon$ it selects the action $a_i$ with the maximum action value. After taking action $a_i$, the i-th user group obtains the reward $r_i$ and the state $s'_i$ of the next time slot from the environment. If the i-th user group has completed its service, the channel selection in $s'_i$ is set to 0, indicating that the user group no longer accesses the spectrum. The next-slot state $s'_i$ is then assigned as the current slot state;
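The ε-greedy selection described in this step can be sketched as follows. The linear decay schedule in `epsilon_at` is an assumption made for illustration; the function names are likewise hypothetical.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Index of the largest estimated action value.
    return max(range(len(q_values)), key=q_values.__getitem__)


def epsilon_at(step, total_steps, start=0.3, end=0.0):
    """Linearly anneal the exploration rate from start to end (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac
```

With `epsilon = 0` the agent always exploits, which matches the end of an annealing schedule that decays the exploration rate to zero.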

then, the experience $(s_i, a_i, r_i, s'_i)$ is stored in the experience pool, from which experiences are continually drawn to train the network. A prioritized experience replay mechanism is adopted, replacing uniform sampling with non-uniform sampling. A larger TD error means a larger gap between the current Q value and the target Q value, so the current Q value should be updated more, and the experience is regarded as more effective. The prioritized experience replay mechanism therefore measures the value of an experience by its TD error and prioritizes the experience samples in the experience pool accordingly; as shown in formula (5), the priority is proportional to the TD error:
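A minimal sketch of proportional prioritized sampling is given below. It follows the commonly used scheme in which an experience is drawn with probability proportional to its priority raised to a power `alpha`; the `alpha` and `eps` parameters, the rule of giving new experiences the current maximum priority, and the class name are assumptions for illustration. Importance-sampling weights are omitted for brevity.

```python
import random


class PrioritizedReplay:
    """Fixed-capacity experience pool with priority-proportional sampling."""

    def __init__(self, capacity=500, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (assumed)
        self.eps = eps          # keeps every priority strictly positive (assumed)
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the pool is full

    def add(self, experience):
        # New experiences get the current maximum priority so they are replayed soon.
        priority = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(experience)
            self.priorities.append(priority)
        else:
            self.data[self.pos] = experience
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        # Non-uniform sampling: probability proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        indices = rng.choices(range(len(self.data)), weights=weights, k=batch_size)
        return indices, [self.data[i] for i in indices]

    def update(self, indices, td_errors):
        # Priority is proportional to the magnitude of the TD error.
        for i, td in zip(indices, td_errors):
            self.priorities[i] = abs(td) + self.eps
```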

then, the action value and the network loss are calculated. The double-network structure is used to calculate the action value, i.e. different action value functions are used to select and to evaluate actions:

the action is determined using the original network and its value is determined using the target network; the target value is calculated as shown in formula (6):

the network loss value is calculated as shown in formula (7):

finally, the original network parameters $\theta$ are copied to the target network parameters $\theta'$ at regular intervals to complete the update of the target network.
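The double-network update can be illustrated with a short sketch. It assumes the standard double DQN target (the original network selects the next action, the target network evaluates it), a mean squared TD error as the loss, and a fixed copy period; the function names and the `done` flag are illustrative assumptions.

```python
import copy


def ddqn_target(reward, q_online_next, q_target_next, gamma=0.9, done=False):
    """Target value: the original network picks the action, the target network scores it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


def td_error(target, q_online, action):
    """Difference between the target value and the current Q estimate."""
    return target - q_online[action]


def loss(td_errors):
    """Mean squared TD error over a batch (assumed loss form)."""
    return sum(e * e for e in td_errors) / len(td_errors)


def sync_target(step, online_params, target_params, period=100):
    """Return a fresh copy of the original parameters every `period` steps."""
    if step % period == 0:
        return copy.deepcopy(online_params)
    return target_params
```

With next-state estimates `[0.2, 0.8]` from the original network and `[5.0, 3.0]` from the target network, the sketch evaluates action 1 with the target network, whereas taking the maximum over the target network alone would have used the value 5.0; decoupling selection from evaluation in this way is what limits overestimation.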

The beneficial effects of the above technical scheme are as follows: the method uses a DQN algorithm that combines a prioritized experience replay mechanism with a double-network structure, thereby shortening the average processing time of the overall communication service. The user group spectrum access scenario is modeled as a Markov decision process in which each user group is regarded as an agent; using the channel selection result and the status of the users in the group, a suitable spectrum access decision is made through training of the DRL model, achieving efficient information sharing and a shorter service processing time. The performance of the proposed algorithm is verified by experiments in scenarios of different scales. The results show that, compared with the original DQN algorithm, the PER-DDQN algorithm performs better and learns a better allocation strategy.

Drawings

The application will be described in further detail with reference to the drawings and the detailed description.

FIG. 1 is a main flow chart of a method according to an embodiment of the present application;

FIG. 2 is a functional block diagram of a user group model in a method according to an embodiment of the present application;

FIG. 3 is a diagram of the relationship between a sender and other users in a method according to an embodiment of the present application;

FIG. 4 is a framework diagram of an algorithm model in a method according to an embodiment of the application;

FIG. 5a is a graph of average rewards achieved by an overall user group in an embodiment of the application;

FIG. 5b is a graph of average processing time duration for all traffic in an embodiment of the application;

FIG. 6a is a graph of average rewards achieved by an overall user group in an embodiment of the application;

FIG. 6b is a graph of average processing time duration for all traffic in an embodiment of the application;

FIG. 7a is a graph of average rewards achieved by an overall user group in an embodiment of the application;

FIG. 7b is a graph of average processing time duration for all traffic in an embodiment of the application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.

As shown in fig. 1, the embodiment of the application discloses a user group spectrum access method based on PER-DDQN, which comprises the following steps:

in a distributed dynamic spectrum access scenario, each user group acts as an agent; first, the state space, action space and reward setting of the agent are defined, and an authorized channel in the cognitive wireless network is selected; each user group, as an agent, has its own DRL (deep reinforcement learning) model and independently learns an allocation strategy, and each DRL model uses a DQN algorithm that combines a prioritized experience replay mechanism with a double-network structure;

then respectively selecting a transmitting party and a receiving party from users in the group, and distributing the selected channels to the transmitting party for transmitting service information so as to realize information sharing in the group;

and finally, performing spectrum access according to the service processing condition of the members in the group, realizing the transmission of service information in the group, and completing the utilization of spectrum resources.

The method proposes accessing the spectrum in the form of user groups: the users in the network are divided into several user groups according to the type of communication service, and spectrum access is performed according to the service processing status of the group members, realizing service information transmission within the group and completing the utilization of spectrum resources. Compared with the DSA problem of a single secondary user node, the user group spectrum access problem faces the difficulties of an incompletely observed spectrum environment and complex intra-group user conditions. In each time slot, a user group must therefore consider not only the service completion status of its own users but also the mutual interference between its users and other user groups sharing the same spectrum resource.

To address spectrum access in a complex communication service spectrum environment, the method accesses the spectrum in the form of user groups and provides a distributed user group spectrum access method based on DRL (deep reinforcement learning). In each time slot, each user group randomly selects a channel to access, and there is no communication among the user groups. Each user group, as an agent, has its own DRL model and independently learns an allocation strategy. Each DRL model uses a DQN algorithm that combines a prioritized experience replay mechanism with a double-network structure, and the overall service processing time is shortened through careful design of the reward function; compared with the original DQN algorithm, the method achieves better performance and faster convergence.

The constructed user group model is shown in fig. 2, the users under the wireless network are divided into N user groups according to the types of communication services to be processed, one user group processes a service, each service contains different data quantity, L authorized wireless channels are shared among the user groups, all authorized channels are in idle available states at the beginning of each time slot, one user group serves as an agent, a channel is randomly selected for access, and the transmitters in the group transmit data on the channel. Multiple user groups can select to access the same channel, and spectrum resource sharing is realized through distributed spectrum access. A user group contains M users, and the users in the group process business through information sharing. The intelligent agent selects part of users in the group to access the frequency spectrum, and the part of users serve as transmitters to broadcast and transmit service information to other users in the range of the users. When all users in a group have the required service information completely, the group service is completed, channel access is stopped and transmission in the group is finished. The purpose of the spectrum access of the user group is to fully utilize the user resources in the group and shorten the service processing time of the whole user group.

As shown in fig. 3, when a user is selected as a transmitting party, there are three cases in which the same group of users within its transmission range acts as receiving parties:

The signal-to-interference-plus-noise ratio (SINR) received by the m-th user acting as a receiving party is shown in formula (1):

$\mathrm{SINR}_m = \dfrac{p\,|h_{mj}|^2}{\sum_{k \neq m,\, k \neq j} p\,|h_{mk}|^2 + B N_0}$ (1)

where $p$ is the transmission power of a transmitting user; $|h_{mj}|^2$ is the channel gain between the m-th user, acting as the receiving party, and its corresponding j-th user, acting as the transmitting party; $|h_{mk}|^2$ is the channel gain from the k-th user, acting as an interferer, to the m-th user, acting as the receiving party, with $k \neq m$ and $k \neq j$; $B$ is the channel bandwidth; and $N_0$ is the noise spectral density at the receiving user. The transmission rate of the m-th user is given by formula (2):

$V_m = \log_2(1 + \mathrm{SINR}_m)$ (2)

If $w_m < C_i$, where $w_m$ is the amount of information already held by the m-th user and $C_i$ is the service data size of the i-th user group, the user has not fully received the service file and continues to act as a receiver. $\mu$ denotes the SINR threshold that a user must reach as a receiver: the SINR received by every receiving user must be greater than or equal to the set threshold, otherwise the transmission fails and the information increment is 0. When every user in the group satisfies $w_m = C_i$, the service processing of the user group is complete and spectrum resources are no longer occupied. The optimization objective of the user group spectrum access problem is therefore as shown in formula (4):

Design of the user group spectrum access method based on PER-DDQN:

To solve the user group spectrum access problem in the DSA scenario with DRL, the state space, action space and reward setting of the agent must be defined in advance. In the distributed dynamic spectrum access scenario, each user group acts as an agent: it first selects an authorized channel in the cognitive wireless network, then selects transmitters and receivers from the users in the group, and allocates the selected channel to the transmitters for transmitting service information, so as to realize information sharing within the group. The state space comprises the channel selected in each time slot and the amount of information held by the users in the group. The action of each user group is to select transmitters and receivers according to the amount of information held by the users in the group. The reward for an action reflects whether the spectrum access is reasonable and whether the amount of information held by the users in the group has increased, so that each user group is encouraged to make a correct allocation strategy. The ultimate objective of the user group spectrum access problem is to increase channel utilization while, through a reasonably designed reward function, maximizing the information increment of the users in the group in each time slot, thereby shortening the service processing time.

State space:

Action space:

After each user group selects a channel, it decides which users act as transmitting parties and which as receiving parties according to the service processing states of the users in the group. The action of the i-th user group is expressed as $a_i = [x_1, x_2, \ldots, x_M]$, where $x_m \in \{0, 1\}$: $x_m = 1$ means that the m-th user is selected as a transmitting party, and $x_m = 0$ means that the m-th user is selected as a receiving party. When users in the group are selected as transmitters, the corresponding receivers feed back the received signal-to-noise ratio to the transmitters, and the interference at each receiver is calculated based on the positions of the other transmitters on the same channel.

Reward setting:

With each user group acting as an agent, the principle of spectrum access is to use the resources of users in the group that have already completed the service so as to minimize the average duration of overall service processing. The reward $r_i$ of the i-th agent consists of 5 parts:

3) Completion reward $R_3$ for individual users in the group: since the information increment of the user group gradually decreases as time slots elapse, the information increment reward $R_2$ also gradually decreases. The users in the group are therefore given an additional completion incentive: each time a user in the group obtains all the information of the service, the agent receives an extra positive reward $R_3$; and because a user that is selected as a receiver again gains no further information, the extra reward gradually reduces the negative influence of the shrinking reward $R_2$;

In summary, the reward setting is expressed as:

the method comprises the following steps:

first, the original network (network parameters $\theta$) and the target network (network parameters $\theta'$) of each agent are initialized. At the start of each time slot, the state $s_i$ of the i-th user group is input into the original network; with probability $\varepsilon$ the agent explores the environment by selecting a random action, and with probability $1 - \varepsilon$ it selects the action $a_i$ with the maximum action value. After taking action $a_i$, the i-th user group obtains the reward $r_i$ and the state $s'_i$ of the next time slot from the environment. If the i-th user group has completed its service, the channel selection in $s'_i$ is set to 0, indicating that the user group no longer accesses the spectrum. The next-slot state $s'_i$ is then assigned as the current slot state;

the calculation of the network loss value is shown in formula (7):

The algorithm flow is as follows:

experimental setup and analysis:

To verify the effectiveness of the method, a series of comparative simulation experiments was designed. The effectiveness of the proposed algorithm is first verified in a small-scale scenario; the system scale is then expanded by increasing the number of users and the amount of service data of each group, to verify the adaptability of the proposed algorithm to large-scale communication scenarios. The proposed algorithm is compared with the original DQN algorithm in two respects, the average reward obtained by the user groups during training and the average completion time of the overall service; the proposed algorithm enables the user groups to learn a better allocation strategy and converges faster. Finally, two parameters that affect the algorithm are analyzed, providing a reliable reference for the parameter settings of the experiments in this section.

Initial parameter setting:

The experimental scenario is a 200 m × 200 m planar space; each user group occupies a 50 m × 50 m area, and the positions of the users in the group are randomly distributed. The transmission range of each user is set within 30 m², and the transmission power is $p_s$ = 20 mW. At the start of each time slot, all channels are available.

The DRL model used in the experiments is a two-layer fully connected network; each layer contains 128 neurons and uses the Tanh activation function. Each user group, as an agent, has its own DRL network. The network input is the channel number accessed by the user group in each time slot together with the service completion status of the users in the group, with dimension M+1; the output is the user group's allocation result for the users in the group, with dimension $2^M$. The network hyperparameters are set as follows: learning rate lr = 0.01 and reward discount coefficient γ = 0.9; so that the agent fully explores the environment, the random exploration rate ε gradually decreases from 0.3 to 0 as the number of training steps increases; the update period of the target network parameters θ' is r_iter = 100; and the experience pool capacity is E = 500. The values of the reward terms in the reward setting are shown in Table 1.
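The network shape described here can be illustrated with a dependency-free forward pass. The sketch interprets "two-layer" as two hidden layers of 128 Tanh units followed by a linear output layer of size $2^M$, which is an assumption; the weight initialization and all function names are likewise illustrative, and training is not shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation=None):
    """Fully connected layer: activation(W x + b)."""
    weights, bias = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out


def build_q_network(m, hidden=128, seed=0):
    """Layers for input size M + 1, two hidden layers, output size 2**M."""
    rng = random.Random(seed)
    sizes = [m + 1, hidden, hidden, 2 ** m]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def q_values(network, state):
    """Forward pass: Tanh on hidden layers, linear output of action values."""
    h = state
    for layer in network[:-1]:
        h = dense(h, layer, math.tanh)
    return dense(h, network[-1])
```

For M = 5 the input has 6 entries and the output has 32 action values, one per transmitter/receiver assignment.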

Table 1. Values of the rewards during the training phase

Analysis of results:

the training indexes of average rewards obtained from the whole user group and average processing time length of all the services under different scale scenes are compared with experimental results of the PER-DDQN method and the original DQN algorithm. The effectiveness of the PER-DDQN method at different sizes of user group sizes is demonstrated. When all the communication services in the spectrum environment are processed, the training is finished, and the experimental result is averaged every 20 rounds. Wherein the average rewarding curve reflects the convergence condition of the algorithm, and the average processing time length curve reflects the performance effect of the algorithm. The scene of fig. 5 a-5 b scales with 3 channels, 4 user groups, with 5 users in each user group (l=3, n=4, m=5). The scene of fig. 5 a-5 b scales with 6 channels, 4 user groups, with 10 users in each user group (l=6, n=4, m=10). The traffic information for each group is the same for both scales. As can be seen from fig. 5 a-5 b, when the number of channels and the number of users are small, the PER-DDQN algorithm is more stable than the DQN algorithm training, although the average rewards and service average processing duration results obtained by training using the PER-DDQN algorithm and the DQN algorithm are about the same. It can be seen from fig. 5 a-5 b that the advantages of the PER-DDQN algorithm are gradually revealed as the number of channels and the number of users increases. FIG. 5a shows that the PER-DDQN algorithm enables a higher prize to be obtained by a user group, indicating that the PER-DDQN algorithm can accumulate more efficient experience during training; fig. 5b shows that the PER-DDQN algorithm can achieve a shorter average processing time, which is shortened by about 20 slots on average compared to the DQN algorithm, proving that the PER-DDQN algorithm can enable the user group to learn a better allocation strategy.

The effectiveness and applicability of the PER-DDQN algorithm in larger-scale scenarios are further verified by expanding the system scale and by increasing the service data volume of each group. Because more training rounds are needed, the curves in these figures are training results averaged every 50 rounds. First, the system scale is enlarged by increasing the number of channels and the number of user groups; figs. 6a-6b show the experimental results for a system with 8 channels and 6 user groups, each having 10 users. Then, the information amount of each service is tripled at the original system scale; the experimental results are shown in figs. 7a-7b. Both experiments show that the PER-DDQN algorithm is suitable for the multi-user-group spectrum access problem in large-scale scenarios and that, compared with the DQN algorithm, it obtains more efficient training results and learns a better allocation strategy.

In conclusion, the method adopts a prioritized experience replay mechanism combined with a DQN algorithm of double-network structure, which shortens the average processing duration of the overall communication service. The user group spectrum access scenario is modeled as a Markov decision model in which each user group is regarded as an agent. Using the channel selection result and the status of the users in the group, each agent makes an appropriate spectrum access decision through the training and learning of its DRL model, thereby achieving efficient information sharing and shortening the service processing duration. The performance of the algorithm provided by the application is verified by experiments designed for scenarios of different scales. The test results show that, compared with the original DQN algorithm, the PER-DDQN algorithm performs better and learns a better allocation strategy.
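The combination of prioritized experience replay and a double-network DQN summarized above can be illustrated with a minimal Python sketch. It follows the commonly used textbook formulations (proportional prioritization and the Double DQN target), which may differ in detail from the loss and sampling rules of the application. All function names and the hyperparameters `alpha`, `beta`, `eps` and `gamma` are assumptions introduced only for illustration.

```python
import random

def double_dqn_target(reward, next_q_online, next_q_target, gamma, done):
    """Double DQN target: the original (online) network chooses the next
    action, and the target network evaluates that action."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]

def priorities_from_td_errors(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: p_k = (|delta_k| + eps) ** alpha.
    alpha and eps are assumed hyperparameters, not values from the source."""
    return [(abs(d) + eps) ** alpha for d in td_errors]

def sample_indices(priorities, batch_size, rng):
    """Draw transition indices with probability proportional to priority."""
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)

def importance_weights(priorities, indices, beta=0.4):
    """Importance-sampling weights for the sampled transitions, normalized
    by the largest weight in the batch (beta is an assumed hyperparameter)."""
    total = sum(priorities)
    n = len(priorities)
    raw = [(n * priorities[i] / total) ** (-beta) for i in indices]
    largest = max(raw)
    return [w / largest for w in raw]

if __name__ == "__main__":
    rng = random.Random(0)
    prios = priorities_from_td_errors([0.1, 2.0, 0.5, 1.2])
    batch = sample_indices(prios, batch_size=3, rng=rng)
    print(batch, importance_weights(prios, batch))
    print(double_dqn_target(1.0, [0.2, 0.9], [0.5, 0.3], gamma=0.9, done=False))
```

Transitions with a larger temporal-difference error are replayed more often, and the importance weights compensate for the resulting sampling bias when the loss is computed.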

Claims

1. A user group spectrum access method based on PER-DDQN is characterized by comprising the following steps:

2. The PER-DDQN based user group spectrum access method of claim 1, wherein:

the state space consists of the channel selected in each time slot and the service processing state of the users in the group;

the action of each user group is to select a transmitter and receivers according to the service processing state of the users in the group, the user acting as the transmitter accessing the channel and broadcasting service data to the receiver users within its coverage area;

3. The PER-DDQN based user group spectrum access method of claim 1 or 2, wherein the user group model is constructed as follows:

4. The PER-DDQN based user group spectrum access method of claim 3, wherein:

when a user is selected as the transmitter, there are 3 cases for the users of the same group within its transmission range that act as receivers:

5. The PER-DDQN based user group spectrum access method of claim 4, wherein:

$$V_m = \log_2\left(1 + \mathrm{SINR}_m\right) \tag{2}$$
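As a small numerical illustration of formula (2), the following Python sketch evaluates the rate term for a given SINR. It assumes that the SINR is supplied as a linear power ratio rather than in decibels, which the text does not state. The helper names `transmission_rate` and `db_to_linear` are hypothetical.

```python
import math

def db_to_linear(sinr_db):
    """Convert an SINR value in dB to a linear power ratio."""
    return 10.0 ** (sinr_db / 10.0)

def transmission_rate(sinr_linear):
    """Formula (2): V_m = log2(1 + SINR_m).
    Assumes SINR_m is a linear ratio (an assumption, not stated in the source)."""
    if sinr_linear < 0:
        raise ValueError("SINR must be non-negative")
    return math.log2(1.0 + sinr_linear)

if __name__ == "__main__":
    for sinr_db in (0.0, 10.0, 20.0):
        print(sinr_db, transmission_rate(db_to_linear(sinr_db)))
```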

Let the size of the service data to be processed by the i-th user group be $C_i$. The information increment per time slot of the m-th user in the i-th user group can be calculated from formula (3):

If a user has not yet fully received the service file, the user continues to act as a receiver. $\mu$ denotes the SINR threshold that a user must reach as a receiver: the SINR received by every receiver user must be greater than or equal to this threshold; otherwise the transmission fails and the information increment is 0. When the service processing of a user group is complete, the group no longer occupies spectrum resources. Therefore, the optimization objective of the user group spectrum access problem is as shown in formula (4):

6. The PER-DDQN based user group spectrum access method of claim 5, wherein the state space is constructed as follows:

The service processing duration is divided into a number of time slots, and every channel is available at the start of each time slot. The state of the i-th user group consists of two parts. The first part is the channel number selected by the i-th user group in the current time slot, the group selecting one of the $L$ channels for access: a value of 0 indicates that the i-th user group has finished processing its service, remains in a waiting state and no longer accesses a channel; a nonzero value indicates the number of the channel that the i-th user group has selected. The second part is the service processing state of the users in the i-th user group at the start of the current time slot. When $w_m = C_i$, $g_m = 1$, indicating that the m-th user of the i-th group has completed the service and that the user's information amount no longer increases; when $w_m < C_i$, $g_m = 0$, indicating that the m-th user of the i-th group has not completed the service and continues, as a receiver, to receive the information transmitted by the corresponding transmitter user.
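The two-part state described in this claim can be sketched in Python as follows. This is only an illustrative encoding under stated assumptions: the state is represented as a flat list, channel number 0 encodes the waiting state of a group that has finished its service, and `>=` is used instead of strict equality as a defensive choice. The function names are hypothetical.

```python
def service_flags(received, required):
    """g_m = 1 when the m-th user's information amount w_m has reached C_i,
    otherwise g_m = 0 (>= is a defensive assumption; the text uses w_m = C_i)."""
    return [1 if w >= required else 0 for w in received]

def build_state(selected_channel, received, required, num_channels):
    """Assemble the state of one user group: [channel number, g_1, ..., g_M].
    Channel numbers run from 1 to num_channels; 0 means the group is waiting."""
    if not 0 <= selected_channel <= num_channels:
        raise ValueError("channel number out of range")
    flags = service_flags(received, required)
    if all(flags):
        # A group whose users have all completed the service no longer
        # accesses the spectrum, so its channel entry is set to 0.
        selected_channel = 0
    return [selected_channel] + flags

if __name__ == "__main__":
    print(build_state(2, received=[5, 3, 5], required=5, num_channels=3))
    print(build_state(2, received=[5, 5, 5], required=5, num_channels=3))
```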

7. The PER-DDQN based user group spectrum access method of claim 6, wherein the construction method of the action space comprises the steps of:

8. The PER-DDQN based user group spectrum access method of claim 7, wherein the reward is set as follows:

1) Invalid allocation penalty: when the m-th user in the i-th group is selected as the transmitter, $x_m = 1$; if that user does not hold the complete service data, the agent receives a small penalty, which is negative;

5) Service processing duration penalty: to urge the agent to process the service faster, the agent receives a penalty in every time slot; the penalty is negative and gradually increases as the service processing duration grows.

In summary, the reward setting is expressed as:

9. The PER-DDQN based user group spectrum access method of claim 8, wherein the method comprises the steps of:

first, the original network and the target network of each agent are initialized, the network parameters of the original network being $\theta$ and the network parameters of the target network being $\theta'$; at the start of each time slot, the state $s_i$ of the i-th user group is input into the original network; with probability $\epsilon$ the agent randomly explores the environment and selects an action, and with probability $1-\epsilon$ it selects the action $a_i$ with the maximum action value; after taking action $a_i$, the i-th user group obtains the reward $r_i$ and the state $s'_i$ of the next time slot from the environment; if the i-th user group has completed its service, the channel selection in $s'_i$ is set to 0, indicating that the user group no longer accesses the spectrum; the next-slot state $s'_i$ is then assigned as the current time slot state;
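The action selection rule in this step (random exploration with probability ε, greedy selection with probability 1-ε) can be sketched as follows. The list `q_values` stands in for the output of the original network for state s_i; the network itself is not modeled here, and the function name and argument layout are assumptions for illustration only.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Return an action index: a uniformly random action with probability
    epsilon, otherwise the action with the maximum action value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.7, 0.3]  # placeholder action values for one state
    print([epsilon_greedy(q, epsilon=0.2, rng=rng) for _ in range(10)])
```

A common design choice, not stated in the source, is to start with a large ε and decay it over training so that the agent explores early and exploits its learned action values later.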

the calculation of the network loss value is shown in formula (7):