CN116744311A - User group spectrum access method based on PER-DDQN - Google Patents


Info

Publication number
CN116744311A
Authority
CN
China
Prior art keywords
user
group
user group
users
service
Prior art date
Legal status
Granted
Application number
CN202310592111.5A
Other languages
Chinese (zh)
Other versions
CN116744311B (en
Inventor
魏祥麟
魏楠
范建华
胡永扬
赵框
王彦刚
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202310592111.5A
Publication of CN116744311A
Application granted
Publication of CN116744311B
Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W16/00: Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/14: Spectrum sharing arrangements between different networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W24/00: Supervisory, monitoring or testing arrangements
    • H04W24/02: Arrangements for optimising operational condition
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/06: Selective distribution of broadcast services, e.g. multimedia broadcast multicast service [MBMS]; Services to user groups; One-way selective calling services
    • H04W4/08: User group management
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W72/00: Local resource management
    • H04W72/02: Selection of wireless resources by user or terminal
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W72/00: Local resource management
    • H04W72/04: Wireless resource allocation
    • H04W72/044: Wireless resource allocation based on the type of the allocated resource
    • H04W72/0453: Resources in frequency domain, e.g. a carrier in FDMA
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W72/00: Local resource management
    • H04W72/50: Allocation or scheduling criteria for wireless resources
    • H04W72/535: Allocation or scheduling criteria for wireless resources based on resource usage policies

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application discloses a user group spectrum access method based on PER-DDQN, comprising the following steps: in a distributed dynamic spectrum access scenario, each user group acts as an agent with its own DRL model and learns its allocation strategy independently, and each DRL model combines a prioritized experience replay mechanism with a DQN algorithm of double-network structure; a transmitting party and a receiving party are selected from the users in the group, and the selected channel is allocated to the transmitting party for transmitting service information, so as to realize information sharing within the group; and spectrum access is performed according to the service processing status of the group members, so that service information is transmitted within the group and the spectrum resources are utilized. The method offers advantages such as faster convergence and better performance.

Description

User group spectrum access method based on PER-DDQN
Technical Field
The application relates to the technical field of wireless communication networks, in particular to a user group spectrum access method based on PER-DDQN.
Background
Cognitive radio is regarded as a powerful tool for addressing the shortage of spectrum resources and improving spectrum utilization. However, wireless communication channels are open and vulnerable to malicious attacks, which can severely reduce the spectrum utilization of a cognitive wireless network. The anti-interference communication capability of cognitive wireless networks is therefore receiving increasing attention. Current cognitive radio research mainly studies the dynamic spectrum access (DSA) problem for a single user. In practical applications, as the number of users and the variety of communication services grow, sharing the spectrum as individual user nodes easily leads to chaotic access and serious inter-user interference.
Disclosure of Invention
The technical problem to be solved by the application is to provide a user group spectrum access method based on PER-DDQN with faster convergence and better performance.
To solve the above technical problem, the application adopts the following technical scheme: a PER-DDQN based user group spectrum access method, comprising the following steps:
in a distributed dynamic spectrum access scenario, each user group acts as an agent; the state space, action space and reward setting of the agent are defined, and an authorized channel in the cognitive wireless network is selected; each user group, as an agent, has its own DRL model and learns its allocation strategy independently, and each DRL model combines a prioritized experience replay mechanism with a DQN algorithm of double-network structure;
a transmitting party and a receiving party are selected from the users in the group, and the selected channel is allocated to the transmitting party for transmitting service information, so as to realize information sharing within the group;
and spectrum access is performed according to the service processing status of the group members, so that service information is transmitted within the group and the spectrum resources are utilized.
A further technical scheme is that: the state space is the channel selected in each time slot and the amount of information held by the users in the group;
the action of each user group is to select the transmitting parties and receiving parties according to the amount of information held by the users in the group;
the reward for an action reflects whether the spectrum access is reasonable and whether the amount of information held by users in the group has increased, thereby encouraging each user group to adopt a correct allocation strategy.
A further technical scheme is that the user group model is constructed as follows:
the users in the wireless network are divided into N user groups according to the type of communication service to be processed; each user group processes one service, and the services contain different amounts of data; L authorized wireless channels are shared among the user groups, and all authorized channels are idle and available at the start of each time slot; each user group, acting as an agent, randomly selects one channel for access, and the transmitting parties in the group transmit data on that channel;
multiple user groups may select the same channel, and spectrum resource sharing is realized through distributed spectrum access; a user group contains M users, and the users in the group process the service through information sharing; the agent selects some of the users in the group to access the spectrum, and these users act as transmitting parties that broadcast service information to the other users within their range; when all users in a group fully hold the required service information, the group's service is complete, channel access stops and transmission within the group ends; the purpose of user group spectrum access is to make full use of the user resources in the group and shorten the service processing time of the whole user group.
A further technical scheme is that, when a user is selected as a transmitting party, there are three cases for a same-group user acting as a receiving party:
1) The receiver is within the transmission range of only one transmitter in the same group; it receives the signal of that transmitter, and all other signals are treated as interference;
2) The receiver is within the overlapping transmission ranges of multiple transmitters in the same group; the signal-to-interference-plus-noise ratio (SINR) of each of these transmitters at the receiver is calculated, the transmitter with the maximum SINR is taken as the corresponding transmitter, and the remaining signals are treated as interference;
3) The receiver is not within the transmission range of any transmitter in the same group and cannot receive a signal, so the user's information increment is 0; likewise, when a user selected as a receiving party already fully holds the service information, it no longer receives information and its information increment is 0.
A further technical scheme is that the signal-to-interference-plus-noise ratio (SINR) received by the mth user as a receiving party is given by formula (1):
$\mathrm{SINR}_m = \dfrac{p\,|h_{mj}|^2}{\sum_{k \neq m,\, k \neq j} p\,|h_{mk}|^2 + B N_0}$ (1)
where $p$ denotes the transmission power of a transmitting user; $|h_{mj}|^2$ denotes the channel gain between the mth user as receiving party and the corresponding jth user as transmitting party; $|h_{mk}|^2$ denotes the channel gain from the kth user as interfering party to the mth user as receiving party, with $k \neq m$ and $k \neq j$; $B$ denotes the channel bandwidth; $N_0$ denotes the noise spectral density at the user; the transmission rate of the mth user is given by formula (2):
$V_m = \log_2(1 + \mathrm{SINR}_m)$ (2)
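As a minimal sketch of formulas (1) and (2), the following Python fragment computes the SINR and the rate for one receiver. It assumes that formula (1) takes the usual form of received signal power divided by co-channel interference plus the noise power B·N0, with every transmitter using the same power p. The function names and all numeric values are hypothetical and serve only as illustration.

```python
import math


def sinr(p, gain_signal, gains_interf, bandwidth, n0):
    """SINR at a receiver: desired power over interference plus noise power."""
    interference = sum(p * g for g in gains_interf)
    return p * gain_signal / (interference + bandwidth * n0)


def rate(sinr_value):
    """Transmission rate V_m = log2(1 + SINR_m), as in formula (2)."""
    return math.log2(1.0 + sinr_value)


# Hypothetical numbers chosen only for illustration.
s = sinr(p=1.0, gain_signal=6.0, gains_interf=[0.5, 0.5], bandwidth=1.0, n0=1.0)
v = rate(s)
```

With these illustrative inputs the interference plus noise is 2, so the SINR is 3 and the rate is 2.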
The amount of service data to be processed by the ith user group is denoted $C_i$, and the information increment of the mth user in the ith user group in each time slot is calculated by formula (3):
where $w_m$ denotes the amount of information already held by the mth user of the ith group;
if $w_m = C_i$, the user has finished receiving, no longer acts as a receiver, and can act as a transmitter to send the file to other users;
if $w_m < C_i$, the user has not fully received the service file and continues to act as a receiver; μ denotes the SINR threshold that a user must reach as a receiver; the SINR received by every receiving user must be greater than or equal to this threshold, otherwise the transmission fails and the information increment is 0; when every user in the group satisfies $w_m = C_i$, the service processing of the user group is complete and the group no longer occupies spectrum resources; the optimization objective of the user group spectrum access problem is therefore as shown in formula (4):
The state space is constructed as follows:
the service processing duration is divided into a number of time slots, and every channel is available at the start of each time slot; the state $s_i$ of the ith user group consists of two parts. The first part is the number of the channel selected by the ith user group in the current time slot, the group selecting one of the L channels for access: a value of 0 indicates that the ith user group has finished processing its service, remains in a waiting state and no longer accesses a channel, while a nonzero value indicates the channel that the group has selected. The second part is the service processing state of the users in the ith user group at the start of the current time slot, $[g_1, g_2, \ldots, g_M]$ with $g_m \in \{0, 1\}$: when $w_m = C_i$, $g_m = 1$, indicating that the mth user of the ith group has completed the service and its amount of information no longer increases; when $w_m < C_i$, $g_m = 0$, indicating that the mth user of the ith group has not completed the service and continues, as a receiving party, to receive the information sent by the corresponding transmitting user.
The action space is constructed as follows:
after each user group selects a channel, it decides which users act as transmitting parties and which as receiving parties according to the service processing states of the users in the group; the action of the ith user group is expressed as $a_i = [x_1, x_2, \ldots, x_M]$, where $x_m \in \{0, 1\}$: $x_m = 1$ means that the mth user is selected as a transmitting party, and $x_m = 0$ means that the mth user is selected as a receiving party; when users in the group are selected as transmitters, the corresponding receiving users feed back the received signal-to-noise ratio to the transmitters, and the interference at each receiver is calculated from the positions of the other transmitters on the same channel.
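A DQN outputs one value per discrete action, so the binary vector a_i has to be mapped to an output index in some way. The text does not state how this is done; the sketch below assumes a plain bit encoding, in which a group of M users has 2^M actions and bit m of the index is x_m. Both function names are hypothetical.

```python
def index_to_action(index, num_users):
    """Decode a Q-network output index into a_i = [x_1, ..., x_M] (assumed bit encoding)."""
    return [(index >> m) & 1 for m in range(num_users)]


def action_to_index(action):
    """Encode a_i = [x_1, ..., x_M] back into the corresponding output index."""
    return sum(x << m for m, x in enumerate(action))


# Index 5 is binary 0101, so users 1 and 3 transmit and users 2 and 4 receive.
example = index_to_action(5, 4)
```

Under this assumption the output layer grows exponentially with M, which is workable only for small groups.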
A further technical scheme is that the reward is set as follows:
the principle of spectrum access is to use the resources of users in the group that have already completed the service so as to minimize the average duration of overall service processing; the reward $r_i$ of the ith agent consists of 5 parts:
1) Invalid allocation penalty: when the mth user in the ith group is selected as a transmitting party ($x_m = 1$) but does not hold the complete service data, the agent receives a small penalty, which is negative;
2) User group information increment reward: the reward is consistent with the total data amount $D_i$ of the user group;
3) Completion reward for individual users in the group: because the information increment of the user group gradually decreases as the time slots advance, the information increment reward also gradually decreases; an additional completion reward is therefore given, that is, each time a user in the group comes to hold all the information of the service, the agent obtains an additional positive reward; a user that has completed the service and is selected again as a receiver gains no further information, which lowers the information increment reward, and the completion reward gradually reduces this negative influence;
4) Whole-group service completion reward: for the ith user group (agent), once all users in the group hold the complete service information, the current agent is given a larger positive reward $R_4$, the service processing of the group ends, and the group no longer participates in channel access;
5) Service processing duration penalty: to urge the agent to process the service faster, the agent receives a penalty in every time slot; the penalty is negative, and its magnitude gradually increases as the service processing duration grows.
In summary, the reward setting is expressed as:
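The sketch below is only an illustration of how the five parts listed above might be combined; it is not the application's formula. It assumes a plain sum, one invalid-allocation penalty per offending transmitter, and a duration penalty that grows linearly with the slot index. The function name and every constant (r1, r3, r4, r5_step) are hypothetical.

```python
def group_reward(action, complete_before, info_gain, newly_completed,
                 group_done, slot, r1=-0.1, r3=1.0, r4=10.0, r5_step=-0.01):
    """Sum the five reward parts for one agent in one time slot (assumed form)."""
    # 1) invalid allocation: a user chosen to transmit without the complete data
    invalid = sum(1 for x, done in zip(action, complete_before)
                  if x == 1 and not done)
    reward = r1 * invalid
    # 2) information increment gained by the whole group in this slot
    reward += info_gain
    # 3) bonus for every user that completed the service in this slot
    reward += r3 * newly_completed
    # 4) larger bonus once the whole group has completed the service
    if group_done:
        reward += r4
    # 5) duration penalty whose magnitude grows with the elapsed slots
    reward += r5_step * slot
    return reward


# Hypothetical slot: user 2 transmits without complete data, one user finishes.
r = group_reward([1, 1, 0], [True, False, False], info_gain=2.0,
                 newly_completed=1, group_done=False, slot=10)
```

In practice the relative sizes of the constants would need tuning so that the completion bonuses dominate the small per-slot penalties.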
A further technical scheme is that the method comprises the following steps:
the multi-user-group DSA strategy is distributed: user information within a group cannot be shared between user groups, and each user group, as an agent, has its own DRL model and makes spectrum access decisions independently;
first, the original network and the target network of each agent are initialized, with network parameters θ for the original network and θ′ for the target network; at the start of each time slot, the state $s_i$ of the ith user group is input into the original network; with probability ε the agent explores the environment by selecting a random action, and with probability 1−ε it selects the action $a_i$ with the maximum action value; after taking action $a_i$, the ith user group obtains the reward $r_i$ and the state $s'_i$ of the next time slot from the environment; if the ith user group has completed its service, the channel selection in $s'_i$ is set to 0, indicating that the user group no longer accesses the spectrum; the next-slot state $s'_i$ is then assigned as the current-slot state.
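The ε-greedy selection described above can be sketched as follows. The function name and the Q-values are hypothetical; q_values stands for the output of the original network for the current state.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # exploit: index of the largest action value
    return max(range(len(q_values)), key=q_values.__getitem__)


greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

A common practice, not stated in the text, is to decay ε over training so that exploration gives way to exploitation.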
Then, the experience $(s_i, a_i, r_i, s'_i)$ is stored in the experience pool, and experiences are continually drawn from the pool to train the network; a prioritized experience replay mechanism is adopted, replacing uniform sampling with non-uniform sampling; a larger TD error means a larger gap between the current Q value and the target Q value, so the current Q value should be updated more, and such an experience is regarded as valuable; the prioritized experience replay mechanism therefore measures the value of an experience by its TD error and prioritizes the experience samples in the pool; as shown in formula (5), the priority is proportional to the TD error:
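One common form of proportional prioritization, given here as an assumption since the exact formula (5) may differ, sets the priority of an experience to (|TD error| + eps)^alpha and samples each experience with probability proportional to that priority. The class below is a minimal list-based sketch; the class name, alpha and eps are hypothetical, and the importance-sampling correction often paired with this scheme is omitted for brevity.

```python
import random


class PrioritizedReplay:
    """Fixed-size experience pool with sampling proportional to priority."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities, self.pos = [], [], 0

    def _priority(self, td_error):
        # eps keeps experiences with zero TD error sampleable
        return (abs(td_error) + self.eps) ** self.alpha

    def add(self, experience, td_error):
        p = self._priority(td_error)
        if len(self.data) < self.capacity:
            self.data.append(experience)
            self.priorities.append(p)
        else:  # pool is full: overwrite the oldest entry
            self.data[self.pos] = experience
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        idx = rng.choices(range(len(self.data)),
                          weights=self.priorities, k=batch_size)
        return idx, [self.data[i] for i in idx]

    def update(self, indices, td_errors):
        """Refresh priorities after the sampled experiences were re-evaluated."""
        for i, e in zip(indices, td_errors):
            self.priorities[i] = self._priority(e)
```

Production implementations usually replace the linear scan inside the sampler with a sum tree so that sampling and updating cost O(log n).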
Then, the action value and the network loss are calculated; the double-network structure is used to calculate the action value, that is, different action value functions are used to select and to evaluate actions:
the action is determined using the original network and its value is determined using the target network; the target value is calculated as shown in formula (6):
the network loss value is calculated as shown in formula (7):
Finally, the original network parameters θ are copied to the target network parameters θ′ at regular intervals to update the target network.
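The selection and evaluation split described above matches the standard Double DQN target, in which the original network picks the next action and the target network scores it. The sketch below shows that standard form as an assumption about formulas (6) and (7): the discount factor gamma, the squared-error loss and the function names are not taken from the text.

```python
def ddqn_target(reward, q_next_original, q_next_target, gamma=0.9, done=False):
    """Double DQN target: action chosen by the original net, valued by the target net."""
    if done:
        return reward  # no bootstrapping after the group has finished
    best = max(range(len(q_next_original)), key=q_next_original.__getitem__)
    return reward + gamma * q_next_target[best]


def td_loss(target, q_current):
    """Squared TD error for one experience (assumed loss form)."""
    return (target - q_current) ** 2


# Hypothetical Q-values for the next state from the two networks.
y = ddqn_target(1.0, q_next_original=[0.2, 0.8], q_next_target=[0.5, 0.3], gamma=0.5)
loss = td_loss(y, q_current=1.0)
```

Here the original network prefers action 1, so the target uses the target network's value 0.3 for that action rather than its own maximum 0.5, which is how the double structure limits overestimation.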
The beneficial effects of the above technical scheme are as follows: the method combines a prioritized experience replay mechanism with a DQN algorithm of double-network structure, thereby shortening the average processing time of the overall communication service. The user group spectrum access scenario is modeled as a Markov decision model in which each user group is regarded as an agent; using the channel selection result and the status of users in the group, the agent makes suitable spectrum access decisions through training of its DRL model, thereby achieving efficient information sharing and shortening the service processing time. The performance of the proposed algorithm is verified by experiments designed for scenarios of different scales. The test results show that, compared with the original DQN algorithm, the PER-DDQN algorithm performs better and learns a better allocation strategy.
Drawings
The application will be described in further detail with reference to the drawings and the detailed description.
FIG. 1 is a main flow chart of a method according to an embodiment of the present application;
FIG. 2 is a functional block diagram of a user group model in a method according to an embodiment of the present application;
FIG. 3 is a diagram of the relationship between a sender and other users in a method according to an embodiment of the present application;
FIG. 4 is a framework diagram of an algorithm model in a method according to an embodiment of the application;
FIG. 5a is a graph of average rewards achieved by an overall user group in an embodiment of the application;
FIG. 5b is a graph of average processing time duration for all traffic in an embodiment of the application;
FIG. 6a is a graph of average rewards achieved by an overall user group in an embodiment of the application;
FIG. 6b is a graph of average processing time duration for all traffic in an embodiment of the application;
FIG. 7a is a graph of average rewards achieved by an overall user group in an embodiment of the application;
FIG. 7b is a graph of average processing time duration for all traffic in an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.
As shown in FIG. 1, the embodiment of the application discloses a user group spectrum access method based on PER-DDQN, comprising the following steps:
in a distributed dynamic spectrum access scenario, each user group acts as an agent; first, the state space, action space and reward setting of the agent are defined, and an authorized channel in the cognitive wireless network is selected; each user group, as an agent, has its own DRL (deep reinforcement learning) model and learns its allocation strategy independently, and each DRL model combines a prioritized experience replay mechanism with a DQN algorithm of double-network structure;
then, a transmitting party and a receiving party are selected from the users in the group, and the selected channel is allocated to the transmitting party for transmitting service information, so as to realize information sharing within the group;
finally, spectrum access is performed according to the service processing status of the group members, so that service information is transmitted within the group and the spectrum resources are utilized.
The method accesses the spectrum in the form of user groups: the users in the network are divided into several user groups according to the type of communication service, and spectrum access is performed according to the service processing status of the group members, so that service information is transmitted within the group and the spectrum resources are utilized. Compared with the DSA problem of a single secondary user node, the user group spectrum access problem faces two difficulties: a spectrum environment that is only partially observable, and complex user conditions within each group. In each time slot, a user group must therefore consider not only the service completion status of its own users but also the mutual interference with other user groups sharing the same spectrum resource.
To address spectrum access in a complex communication service environment, the method accesses the spectrum in the form of user groups and provides a distributed user group spectrum access method based on DRL (deep reinforcement learning). In each time slot, every user group randomly selects a channel for access, and there is no communication between user groups. Each user group, as an agent, has its own DRL model and learns its allocation strategy independently. Each DRL model combines a prioritized experience replay mechanism with a DQN algorithm of double-network structure and shortens the overall service processing time through a carefully designed reward function; compared with the original DQN algorithm, it performs better and converges faster.
The constructed user group model is shown in FIG. 2. The users in the wireless network are divided into N user groups according to the type of communication service to be processed; each user group processes one service, and the services contain different amounts of data. L authorized wireless channels are shared among the user groups, and all authorized channels are idle and available at the start of each time slot. Each user group, acting as an agent, randomly selects one channel for access, and the transmitters in the group transmit data on that channel. Multiple user groups may select the same channel, and spectrum resource sharing is realized through distributed spectrum access. A user group contains M users, and the users in the group process the service through information sharing. The agent selects some of the users in the group to access the spectrum, and these users act as transmitters that broadcast service information to the other users within their range. When all users in a group fully hold the required service information, the group's service is complete, channel access stops and transmission within the group ends. The purpose of user group spectrum access is to make full use of the user resources in the group and shorten the service processing time of the whole user group.
As shown in FIG. 3, when a user is selected as a transmitting party, there are three cases for a same-group user acting as a receiving party:
1) The receiver is within the transmission range of only one transmitter in the same group; it receives the signal of that transmitter, and all other signals are treated as interference;
2) The receiver is within the overlapping transmission ranges of multiple transmitters in the same group; the signal-to-interference-plus-noise ratio (SINR) of each of these transmitters at the receiver is calculated, the transmitter with the maximum SINR is taken as the corresponding transmitter, and the remaining signals are treated as interference;
3) The receiver is not within the transmission range of any transmitter in the same group and cannot receive a signal, so the user's information increment is 0; likewise, when a user selected as a receiving party already fully holds the service information, it no longer receives information and its information increment is 0.
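The three cases above can be sketched as a small association routine for one receiver. This is an illustrative assumption rather than the application's implementation: the function name, the dictionary layout and the numeric gains are hypothetical, and noise stands for a positive noise power such as B·N0.

```python
def serving_transmitter(gains, in_range, p, noise):
    """Pick the same-group transmitter with the highest SINR at one receiver.

    gains    -- channel gain |h|^2 from every co-channel transmitter to the receiver
    in_range -- same-group transmitters whose transmission range covers the receiver
    noise    -- noise power at the receiver (must be positive)
    """
    if not in_range:
        return None, 0.0  # case 3: no same-group transmitter in range
    total = sum(p * g for g in gains.values())
    best, best_sinr = None, -1.0
    for j in in_range:  # case 1 (one candidate) or case 2 (several candidates)
        signal = p * gains[j]
        value = signal / (total - signal + noise)  # all other signals interfere
        if value > best_sinr:
            best, best_sinr = j, value
    return best, best_sinr


# Hypothetical gains: "a" and "b" are same-group transmitters, "x" is another group.
tx, tx_sinr = serving_transmitter({"a": 4.0, "b": 1.0, "x": 1.0}, ["a", "b"],
                                  p=1.0, noise=1.0)
```

A caller would still compare the returned SINR with the threshold μ before crediting any information increment to the receiver.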
The signal-to-interference-plus-noise ratio (SINR) received by the mth user as a receiving party is given by formula (1):
$\mathrm{SINR}_m = \dfrac{p\,|h_{mj}|^2}{\sum_{k \neq m,\, k \neq j} p\,|h_{mk}|^2 + B N_0}$ (1)
where $p$ denotes the transmission power of a transmitting user; $|h_{mj}|^2$ denotes the channel gain between the mth user as receiving party and the corresponding jth user as transmitting party; $|h_{mk}|^2$ denotes the channel gain from the kth user as interfering party to the mth user as receiving party, with $k \neq m$ and $k \neq j$; $B$ denotes the channel bandwidth; $N_0$ denotes the noise spectral density at the user; the transmission rate of the mth user is given by formula (2):
$V_m = \log_2(1 + \mathrm{SINR}_m)$ (2)
The amount of service data to be processed by the ith user group is denoted $C_i$, and the information increment of the mth user in the ith user group in each time slot is calculated by formula (3):
where $w_m$ denotes the amount of information already held by the mth user of the ith group;
if $w_m = C_i$, the user has finished receiving, no longer acts as a receiver, and can act as a transmitter to send the file to other users;
if $w_m < C_i$, the user has not fully received the service file and continues to act as a receiver; μ denotes the SINR threshold that a user must reach as a receiver; the SINR received by every receiving user must be greater than or equal to this threshold, otherwise the transmission fails and the information increment is 0; when every user in the group satisfies $w_m = C_i$, the service processing of the user group is complete and the group no longer occupies spectrum resources; the optimization objective of the user group spectrum access problem is therefore as shown in formula (4):
The PER-DDQN based user group spectrum access method is designed as follows:
To solve the user group spectrum access problem in the DSA scenario with DRL, the state space, action space and reward setting of the agent must be defined in advance. In the distributed dynamic spectrum access scenario, each user group acts as an agent: it first selects an authorized channel in the cognitive wireless network, then selects transmitters and receivers from the users in the group, and allocates the selected channel to the transmitters for transmitting service information, thereby realizing information sharing within the group. The state space is the channel selected in each time slot and the amount of information held by the users in the group. The action of each user group is to select the transmitters and receivers according to the amount of information held by the users in the group. The reward for an action reflects whether the spectrum access is reasonable and whether the amount of information held by users in the group has increased, thereby encouraging each user group to adopt a correct allocation strategy. The ultimate goal of the user group spectrum access problem is to increase channel utilization while, through a suitably designed reward function, maximizing the information increment of the users in the group in each time slot and thus shortening the service processing time.
State space:
The service processing duration is divided into a number of time slots, and every channel is available at the start of each time slot. The state $s_i$ of the ith user group consists of two parts. The first part is the number of the channel selected by the ith user group in the current time slot, the group selecting one of the L channels for access: a value of 0 indicates that the ith user group has finished processing its service, remains in a waiting state and no longer accesses a channel, while a nonzero value indicates the channel that the group has selected. The second part is the service processing state of the users in the ith user group at the start of the current time slot, $[g_1, g_2, \ldots, g_M]$ with $g_m \in \{0, 1\}$: when $w_m = C_i$, $g_m = 1$, indicating that the mth user of the ith group has completed the service and its amount of information no longer increases; when $w_m < C_i$, $g_m = 0$, indicating that the mth user of the ith group has not completed the service and continues, as a receiving party, to receive the information sent by the corresponding transmitting user.
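The two-part state can be assembled as in the following sketch. It assumes the state is a flat list with the channel number first and the completion flags g_1 to g_M after it; the function name and the argument layout are hypothetical.

```python
def build_state(channel, info_owned, service_size):
    """Return [channel, g_1, ..., g_M] for one user group (assumed flat layout).

    channel      -- selected channel number in 1..L
    info_owned   -- amount of information w_m held by each user in the group
    service_size -- amount of service data C_i the group must process
    """
    flags = [1 if w >= service_size else 0 for w in info_owned]
    if all(flags):
        channel = 0  # a finished group waits and no longer accesses a channel
    return [channel] + flags


# Hypothetical group of three users on channel 2; the second user is incomplete.
state = build_state(2, [5.0, 3.0, 5.0], service_size=5.0)
```

The resulting vector has length M + 1 and can be fed directly to the original network as its input.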
Action space:
After each user group selects a channel, it decides which users act as transmitters and which as receivers according to the service processing states of the users in the group. The action of the ith user group is expressed as $a_i = [x_1, x_2, \ldots, x_M]$, where $x_m \in \{0,1\}$: $x_m = 1$ means that the mth user is selected as a transmitter, and $x_m = 0$ means that the mth user is selected as a receiver. When users in the group are selected as transmitters, the corresponding receiver users feed back the received signal-to-noise ratio to the transmitters, and the interference at each receiver is calculated from the positions of the other transmitters on the same channel.
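Because each $x_m$ is binary, a group of M users has $2^M$ possible actions, so a network with one output per action needs a mapping between an output index and the vector $a_i$. The sketch below assumes a simple bit encoding (user m corresponds to bit m-1); the source does not state which ordering is used, and the function names are hypothetical.

```python
from typing import List


def decode_action(index: int, num_users: int) -> List[int]:
    """Map an action index in [0, 2**M) to [x_1, ..., x_M]."""
    if not 0 <= index < 2 ** num_users:
        raise ValueError("action index out of range")
    # bit m of the index is x_{m+1}: 1 = transmitter, 0 = receiver
    return [(index >> m) & 1 for m in range(num_users)]


def encode_action(x: List[int]) -> int:
    """Inverse of decode_action."""
    return sum(bit << m for m, bit in enumerate(x))


print(decode_action(5, 4))          # [1, 0, 1, 0]
print(encode_action([1, 0, 1, 0]))  # 5
```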
Reward setting:
With the user group as the agent, the principle of spectrum access is to use the resources of the users in the group that have completed the service so as to minimize the average duration of the overall service processing. The reward $r_i$ of the ith agent consists of 5 parts:
1) Invalid allocation penalty. When the mth user in the ith group is selected as a transmitter ($x_m = 1$) but does not hold the complete service data, the agent receives a small penalty, which is a negative value;
2) User group information increment reward. This reward is consistent with the total data amount $D_i$ of the user group;
3) Completion reward for an individual user's service processing within the group. Because the information increment of the user group gradually decreases as the time slots advance, the information increment reward gradually decreases as well. An additional reward is therefore given when a user in the group completes the service, i.e. the agent receives an additional positive reward. If that user is selected as a receiver again, its amount of information no longer increases, which lowers the information increment reward; the additional reward gradually reduces this negative effect;
4) Whole-group service processing completion reward. For the ith user group (agent), if all users in the group have completed the service, the current agent is given a larger positive reward $R_4$, the service processing of the group ends, and the group no longer participates in channel access;
5) Service processing duration penalty. To urge the agent to process the service faster, the agent receives a penalty in every time slot; the penalty is a negative value and grows as the service processing duration increases.
In summary, the reward setting is expressed as:
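The combined expression did not survive in this text. As a rough sketch only, the five parts can be read as terms that are added up each time slot. Everything below is an assumption for illustration: the additive form, the linear growth of the duration penalty, the field names and all numeric values are hypothetical and are not the values of Table 1.

```python
from dataclasses import dataclass


@dataclass
class RewardConfig:
    # All values are hypothetical placeholders, not the patent's settings.
    invalid_tx_penalty: float = -1.0  # part 1: small negative penalty
    user_done_bonus: float = 2.0      # part 3: bonus per newly finished user
    group_done_bonus: float = 20.0    # part 4: larger positive reward (R_4)
    slot_penalty_step: float = -0.1   # part 5: penalty growth per slot


def slot_reward(cfg: RewardConfig, invalid_transmitters: int, info_gain: float,
                newly_done: int, group_done: bool, slot: int) -> float:
    """Sum the five reward parts for one agent in one time slot."""
    r = cfg.invalid_tx_penalty * invalid_transmitters  # part 1
    r += info_gain                                     # part 2
    r += cfg.user_done_bonus * newly_done              # part 3
    if group_done:
        r += cfg.group_done_bonus                      # part 4
    r += cfg.slot_penalty_step * slot                  # part 5
    return r


cfg = RewardConfig()
print(slot_reward(cfg, invalid_transmitters=1, info_gain=3.0,
                  newly_done=1, group_done=False, slot=5))
```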
the method comprises the following steps:
The multi-user-group DSA strategy is distributed: user information within a group cannot be shared between user groups, and each user group, as an agent, has its own DRL model and makes spectrum access decisions independently;
First, the original network (network parameters θ) and the target network (network parameters θ') of each agent are initialized. At the start of each time slot, the state $s_i$ of the ith user group is input to the original network; with probability ε the agent explores the environment by choosing a random action, and with probability 1-ε it selects the action $a_i$ with the largest action value. After taking action $a_i$, the ith user group obtains the reward $r_i$ and the next-slot state $s'_i$ from the environment. If the ith user group has completed its service, the channel selection in $s'_i$ is set to 0, indicating that the user group no longer accesses the spectrum. The next-slot state $s'_i$ is then assigned as the current-slot state;
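The ε-greedy selection described above can be sketched as follows. The function name `select_action` is hypothetical, and the Q values are passed in as a plain list standing in for the output of the original network.

```python
import random
from typing import Sequence


def select_action(q_values: Sequence[float], epsilon: float, rng=random) -> int:
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        # explore: pick any action uniformly at random
        return rng.randrange(len(q_values))
    # exploit: pick the action with the largest action value
    return max(range(len(q_values)), key=q_values.__getitem__)


print(select_action([0.1, 0.9, 0.3], epsilon=0.0))  # 1
```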
Then, the experience $(s_i, a_i, r_i, s'_i)$ is stored in the experience pool, and experiences are continually drawn from the pool to train the network. A prioritized experience replay mechanism is adopted, which replaces uniform sampling with non-uniform sampling. A larger TD error means a larger gap between the current Q value and the target Q value, so the current Q value should be updated more, and the experience is regarded as effective. The prioritized experience replay mechanism therefore uses the TD error to measure the value of an experience and ranks the experience samples in the pool by priority; as shown in equation (5), the priority is proportional to the TD error:
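Equation (5) itself is not reproduced in this text. A minimal sketch of proportional prioritization, assuming the priority of an experience is $|\delta| + \epsilon_0$ and the sampling probability is that priority divided by the sum of all priorities, is given below. The small constant `eps`, the class name `PrioritizedPool` and the list-based storage are assumptions made for illustration; practical implementations often use a sum tree instead.

```python
import random


class PrioritizedPool:
    """Fixed-capacity experience pool with TD-error-proportional sampling."""

    def __init__(self, capacity: int, eps: float = 1e-3):
        self.capacity = capacity
        self.eps = eps          # keeps zero-error experiences sampleable
        self.items = []
        self.priorities = []

    def add(self, experience, td_error: float) -> None:
        if len(self.items) >= self.capacity:
            # drop the oldest experience when the pool is full
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(experience)
        self.priorities.append(abs(td_error) + self.eps)

    def probabilities(self):
        total = sum(self.priorities)
        return [p / total for p in self.priorities]

    def sample(self, batch_size: int, rng=random):
        # non-uniform sampling: weight each experience by its priority
        return rng.choices(self.items, weights=self.priorities, k=batch_size)


pool = PrioritizedPool(capacity=2)
pool.add("a", 1.0)
pool.add("b", -3.0)
pool.add("c", 1.0)   # evicts "a"
print(pool.items)
```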
Then, the action value and the network loss are calculated. A double-network structure is used to calculate the action value, i.e. different action-value functions are used to select and to evaluate the action:
The original network is used to determine the action and the target network is used to determine the action value; the target value is calculated as shown in formula (6):
The network loss value is calculated as shown in formula (7):
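Formulas (6) and (7) are not reproduced in this text. The sketch below follows the description: the original network picks the next action, and the target network supplies its value. It uses the standard Double DQN target $r + \gamma\, Q_{\theta'}(s', \arg\max_a Q_{\theta}(s', a))$ and a squared TD error as the loss. Both forms, the `done` handling and the function names are assumptions, since the patent's exact expressions may differ.

```python
from typing import Sequence


def ddqn_target(reward: float, gamma: float,
                q_online_next: Sequence[float],
                q_target_next: Sequence[float],
                done: bool) -> float:
    """Double DQN target for one experience."""
    if done:
        # assumed: no bootstrap once the group has finished its service
        return reward
    # select the action with the original (online) network ...
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and evaluate it with the target network
    return reward + gamma * q_target_next[best]


def td_loss(q_current: float, target: float) -> float:
    """Squared TD error for one experience."""
    return (target - q_current) ** 2


y = ddqn_target(1.0, 0.9, q_online_next=[0.2, 0.8],
                q_target_next=[5.0, 3.0], done=False)
print(y, td_loss(3.0, y))
```

In this example a plain DQN target would take the maximum of the target network outputs (5.0), whereas the double-network target uses the value of the action chosen by the original network (3.0), which is the mechanism that limits overestimation.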
Finally, the original network parameters θ are copied to the target network parameters θ' at regular intervals to complete the update of the target network parameters.
The algorithm flow is as follows:
experimental setup and analysis:
To verify the effectiveness of the method, a series of simulated comparative experiments was designed. First, the effectiveness of the proposed algorithm is verified in a smaller-scale scenario; the system scale is then expanded by increasing the number of users and the amount of service data of each group, to verify the adaptability of the proposed algorithm to large-scale communication scenarios. The proposed algorithm is compared with the original DQN algorithm in two respects, the average reward obtained by the user groups during training and the average completion time of all services; the comparison shows that the proposed algorithm enables the user groups to learn a better allocation strategy and converges faster. Finally, two parameters that affect the algorithm are analyzed, providing a reference for the parameter settings of the experiments in this section.
Initial parameter setting:
The experimental scenario is set in a 200 m × 200 m planar space; each user group occupies a 50 m × 50 m area, and the positions of the users in a group are randomly distributed. The transmission range of each user is set within 30 m², and the transmission power is $p_s = 20$ mW. At the beginning of each time slot, all channels are available.
The DRL model used in the experiments is a two-layer fully connected network; each layer contains 128 neurons and uses the Tanh activation function. Each user group, as an agent, has its own DRL network. The network input is the channel number accessed by the user group in each time slot together with the service completion status of the users in the group, with dimension M+1; the output is the allocation result of the user group for its users, with dimension $2^M$. The network hyperparameters are set as follows: learning rate lr = 0.01; reward discount factor γ = 0.9; so that the agent fully explores the environment states, the random exploration rate ε gradually decreases from 0.3 to 0 as the training steps increase; the target network parameters θ' are updated with a step of 100 iterations; experience pool capacity E = 500. The values of the terms in the reward setting are shown in Table 1.
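The settings above can be collected in one place, as in the following sketch. The field names are hypothetical. Two points are assumptions rather than statements from the source: the "two-layer" network is read as two hidden layers of 128 units followed by an output layer, and ε is assumed to decay linearly, since the text only says that it decreases gradually.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hyperparams:
    num_users: int = 5                 # M, users per group
    hidden_units: int = 128            # neurons per fully connected layer
    learning_rate: float = 0.01        # lr
    gamma: float = 0.9                 # reward discount factor
    epsilon_start: float = 0.3         # initial exploration rate
    epsilon_end: float = 0.0           # final exploration rate
    target_update_interval: int = 100  # steps between copies of theta to theta'
    pool_capacity: int = 500           # experience pool size E

    def layer_sizes(self) -> List[int]:
        """Input M+1, two hidden layers (assumed), output 2**M."""
        return [self.num_users + 1, self.hidden_units,
                self.hidden_units, 2 ** self.num_users]


def epsilon_at(step: int, total_steps: int, hp: Hyperparams) -> float:
    """Linearly interpolate epsilon from start to end (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return hp.epsilon_start + frac * (hp.epsilon_end - hp.epsilon_start)


hp = Hyperparams()
print(hp.layer_sizes())           # [6, 128, 128, 32]
print(epsilon_at(500, 1000, hp))  # 0.15
```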
Table 1. Values of the rewards during the training phase
Analysis of results:
The PER-DDQN method is compared with the original DQN algorithm on two training metrics, the average reward obtained by all user groups and the average processing duration of all services, in scenarios of different scales, which demonstrates the effectiveness of the PER-DDQN method for user groups of different sizes. A training round ends when all communication services in the spectrum environment have been processed, and the experimental results are averaged every 20 rounds. The average reward curve reflects the convergence of the algorithm, and the average processing duration curve reflects its performance. Two scales are compared in Figs. 5a-5b: a smaller scenario with 3 channels and 4 user groups of 5 users each (L=3, N=4, M=5), and a larger scenario with 6 channels and 4 user groups of 10 users each (L=6, N=4, M=10). The service information of each group is the same at both scales. When the number of channels and the number of users are small, the average reward and the average service processing duration obtained with the PER-DDQN algorithm and the DQN algorithm are about the same, but training with PER-DDQN is more stable. As the number of channels and the number of users increase, the advantages of the PER-DDQN algorithm gradually appear. Fig. 5a shows that the PER-DDQN algorithm enables the user groups to obtain a higher reward, indicating that it accumulates more effective experience during training; Fig. 5b shows that the PER-DDQN algorithm achieves a shorter average processing duration, about 20 time slots shorter on average than the DQN algorithm, which shows that the PER-DDQN algorithm enables the user groups to learn a better allocation strategy.
The effectiveness and applicability of the PER-DDQN algorithm in larger-scale scenarios are further demonstrated by expanding the system scale and by increasing the service data volume of each group. Because there are more training rounds, the curves in the figures are training results averaged every 50 rounds. First, the system scale is enlarged by increasing the number of channels and the number of user groups; Figs. 6a-6b show the experimental results for a system with 8 channels and 6 user groups of 10 users each. Then, the amount of information of each service is tripled at the original system scale; the experimental results are shown in Figs. 7a-7b. Both sets of results show that the PER-DDQN algorithm is suitable for the multi-user-group spectrum access problem in large-scale scenarios and that, compared with the DQN algorithm, it trains more efficiently and learns a better allocation strategy.
In summary, the method uses a DQN algorithm that combines a prioritized experience replay mechanism with a double-network structure, which shortens the average processing time of all communication services. The user group spectrum access scenario is modeled as a Markov decision process, each user group is treated as an agent, and, using the channel selection result and the status of the users in the group, appropriate spectrum access decisions are made through the training and learning of the DRL model, achieving efficient information sharing and a shorter service processing time. The performance of the proposed algorithm is verified by experiments in scenarios of different scales. The test results show that, compared with the original DQN algorithm, the PER-DDQN algorithm performs better and learns a better allocation strategy.

Claims (9)

1. A user group spectrum access method based on PER-DDQN, characterized by comprising the following steps:
in a distributed dynamic spectrum access scenario, each user group acts as an agent; the state space, action space and reward setting of the agent are defined, and a licensed channel in the cognitive radio network is selected; each user group, as an agent, has its own DRL model and learns its allocation strategy independently, and each DRL model uses a DQN algorithm that combines a prioritized experience replay mechanism with a double-network structure;
transmitters and receivers are selected from the users in the group, and the selected channel is assigned to the transmitters for sending service information, so as to share information within the group;
spectrum access is performed according to the service processing status of the group members, so that the service information is transmitted within the group and the spectrum resources are utilized.
2. The PER-DDQN based user group spectrum access method of claim 1, wherein:
the state space consists of the channel selected in each time slot and the service processing state of the users in the group;
the action of each user group is to select transmitters and receivers according to the service processing state of the users in the group; the users acting as transmitters access the channel and broadcast service data to the receiver users within their coverage;
the reward of an action reflects whether the spectrum access is reasonable and whether the amount of information held by the users in the group has increased, which encourages each user group to adopt a correct allocation strategy.
3. The PER-DDQN based user group spectrum access method according to claim 1 or 2, wherein the user group model is constructed as follows:
users in the wireless network are divided into N user groups according to the type of communication service to be processed; each user group processes one service, and the services contain different amounts of data; L licensed wireless channels are shared among the user groups, and all licensed channels are idle and available at the start of each time slot; each user group, as an agent, randomly selects one channel for access, and the transmitters in the user group send data on that channel;
several user groups may select the same channel, and spectrum resources are shared through distributed spectrum access; a user group contains M users, and the users in a group process the service by sharing information; the agent selects some of the users in the group to access the spectrum, and these users, as transmitters, broadcast service information to the other users within range; when all users in a group hold the complete required service information, the group's service is complete, channel access stops and transmission within the group ends; the purpose of user group spectrum access is to make full use of the user resources in the group and shorten the service processing time of all user groups.
4. The PER-DDQN based user group spectrum access method of claim 3, wherein:
when users are selected as transmitters, there are 3 cases for the users of the same group that act as receivers:
1) The receiver is within the transmission range of only one transmitter of the same group; it then receives the signal of that transmitter, and the remaining signals are treated as interference;
2) The receiver is in the overlapping coverage of several transmitters of the same group; the signal-to-interference-plus-noise ratios of all signals received by the receiver are calculated, the transmitter with the largest signal-to-interference-plus-noise ratio is selected as the corresponding transmitter, and the remaining signals are treated as interference;
3) The receiver is not within the transmission range of any transmitter of the same group and cannot receive a signal, so the corresponding user information increment is 0. In addition, when a user selected as a receiver already holds the full amount of service information, the user no longer receives information, and the corresponding user information increment is 0.
5. The PER-DDQN based user group spectrum access method of claim 4, wherein:
the signal-to-interference-plus-noise ratio (SINR) received by the mth user as a receiver is shown in formula (1):
where $p$ denotes the transmission power of a transmitting user; $|h_{mj}|^2$ denotes the channel gain between the mth user, as receiver, and the corresponding jth user, as transmitter; $|h_{mk}|^2$ denotes the channel gain from the kth user, as interferer, to the mth user, as receiver, with $k \neq m$ and $k \neq j$; $B$ denotes the channel bandwidth; $N_0$ denotes the noise spectral density received by the user. The transmission rate of the mth user is given by equation (2):
$V_m = \log_2(1 + \mathrm{SINR}_m)$ (2)
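Formula (1) is not reproduced in this text. A form consistent with the symbol definitions above would be $\mathrm{SINR}_m = p\,|h_{mj}|^2 / (\sum_{k} p\,|h_{mk}|^2 + B N_0)$. The sketch below computes that form together with equation (2). It assumes that every transmitter uses the same power $p$ and that the noise power equals $B N_0$; the function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def sinr(p: float, gain_signal: float, gains_interference: Sequence[float],
         bandwidth: float, noise_density: float) -> float:
    """SINR at a receiver, assuming equal transmit power p for all transmitters."""
    interference = sum(p * g for g in gains_interference)
    noise = bandwidth * noise_density  # assumed noise power B * N_0
    return (p * gain_signal) / (interference + noise)


def rate(sinr_value: float) -> float:
    """Equation (2): V_m = log2(1 + SINR_m)."""
    return math.log2(1.0 + sinr_value)


s = sinr(p=2.0, gain_signal=1.0, gains_interference=[0.5],
         bandwidth=1.0, noise_density=1.0)
print(s, rate(s))  # 1.0 1.0
```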
Let the size of the service data to be processed by the ith user group be $C_i$. The information increment of the mth user in the ith user group in each time slot is calculated by formula (3):
where $w_m$ denotes the amount of information already held by the mth user of the ith group.
If $w_m = C_i$, the user has finished receiving, no longer acts as a receiver, and can act as a transmitter to send the file to other users;
if $w_m < C_i$, the user has not completely received the service file and continues to act as a receiver. μ denotes the SINR threshold that a user must reach as a receiver: the SINR received by every receiver user must be greater than or equal to the set threshold, otherwise the transmission fails and the information increment is 0. When all users in the group hold the complete service information, the service processing of the user group is complete and the group no longer occupies spectrum resources. The optimization objective of the user group spectrum access problem is therefore as shown in formula (4):
6. The PER-DDQN based user group spectrum access method of claim 5, wherein the state space is constructed as follows:
the service processing duration is divided into a number of time slots, and every channel is available at the start of each time slot. The state of the ith user group has two parts. The first part is the channel number selected by the ith user group in the current time slot, where the group selects one of the L channels for access: a value of 0 indicates that the ith user group has finished processing its service, remains in a waiting state and no longer accesses any channel, and a nonzero value is the number of the channel the group has selected. The second part is the service processing state of the users in the ith user group at the start of the current time slot, with one indicator $g_m \in \{0,1\}$ per user. When $w_m = C_i$, $g_m = 1$, indicating that the mth user of the ith group has completed the service and that user's amount of information no longer increases; when $w_m < C_i$, $g_m = 0$, indicating that the mth user of the ith group has not completed the service and continues, as a receiver, to receive the information sent by the corresponding transmitter user.
7. The PER-DDQN based user group spectrum access method of claim 6, wherein the action space is constructed as follows:
after each user group selects a channel, it decides which users act as transmitters and which as receivers according to the service processing states of the users in the group. The action of the ith user group is expressed as $a_i = [x_1, x_2, \ldots, x_M]$, where $x_m \in \{0,1\}$: $x_m = 1$ means that the mth user is selected as a transmitter, and $x_m = 0$ means that the mth user is selected as a receiver. When users in the group are selected as transmitters, the corresponding receiver users feed back the received signal-to-noise ratio to the transmitters, and the interference at each receiver is calculated from the positions of the other transmitters on the same channel.
8. The PER-DDQN based user group spectrum access method of claim 7, wherein the reward is set as follows:
the principle of spectrum access is to use the resources of the users in the group that have completed the service so as to minimize the average duration of the overall service processing. The reward $r_i$ of the ith agent consists of 5 parts:
1) Invalid allocation penalty. When the mth user in the ith group is selected as a transmitter ($x_m = 1$) but does not hold the complete service data, the agent receives a small penalty, which is a negative value;
2) User group information increment reward. This reward is consistent with the total data amount $D_i$ of the user group;
3) Completion reward for an individual user's service processing within the group. Because the information increment of the user group gradually decreases as the time slots advance, the information increment reward gradually decreases as well. An additional reward is therefore given when a user in the group completes the service, i.e. the agent receives an additional positive reward. If that user is selected as a receiver again, its amount of information no longer increases, which lowers the information increment reward; the additional reward gradually reduces this negative effect;
4) Whole-group service processing completion reward. For the ith user group (agent), if all users in the group have completed the service, the current agent is given a larger positive reward $R_4$, the service processing of the group ends, and the group no longer participates in channel access;
5) Service processing duration penalty. To urge the agent to process the service faster, the agent receives a penalty in every time slot; the penalty is a negative value and grows as the service processing duration increases.
In summary, the reward setting is expressed as:
9. The PER-DDQN based user group spectrum access method of claim 8, wherein the method comprises the steps of:
the multi-user-group DSA strategy is distributed: user information within a group cannot be shared between user groups, and each user group, as an agent, has its own DRL model and makes spectrum access decisions independently;
first, the original network and the target network of each agent are initialized, the network parameters of the original network being θ and those of the target network being θ'. At the start of each time slot, the state $s_i$ of the ith user group is input to the original network; with probability ε the agent explores the environment by choosing a random action, and with probability 1-ε it selects the action $a_i$ with the largest action value. After taking action $a_i$, the ith user group obtains the reward $r_i$ and the next-slot state $s'_i$ from the environment. If the ith user group has completed its service, the channel selection in $s'_i$ is set to 0, indicating that the user group no longer accesses the spectrum. The next-slot state $s'_i$ is then assigned as the current-slot state;
then, the experience $(s_i, a_i, r_i, s'_i)$ is stored in the experience pool, and experiences are continually drawn from the pool to train the network. A prioritized experience replay mechanism is adopted, which replaces uniform sampling with non-uniform sampling. A larger TD error means a larger gap between the current Q value and the target Q value, so the current Q value should be updated more, and the experience is regarded as effective. The prioritized experience replay mechanism therefore uses the TD error to measure the value of an experience and ranks the experience samples in the pool by priority; as shown in equation (5), the priority is proportional to the TD error:
then, the action value and the network loss are calculated. A double-network structure is used to calculate the action value, i.e. different action-value functions are used to select and to evaluate the action:
the original network is used to determine the action and the target network is used to determine the action value; the target value is calculated as shown in formula (6):
the network loss value is calculated as shown in formula (7):
finally, the original network parameters θ are copied to the target network parameters θ' at regular intervals to complete the update of the target network parameters.
CN202310592111.5A 2023-05-24 2023-05-24 User group spectrum access method based on PER-DDQN Active CN116744311B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310592111.5A CN116744311B (en) 2023-05-24 2023-05-24 User group spectrum access method based on PER-DDQN

Publications (2)

Publication Number Publication Date
CN116744311A (en) 2023-09-12
CN116744311B (en) 2024-03-22

Family

ID=87900264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310592111.5A Active CN116744311B (en) 2023-05-24 2023-05-24 User group spectrum access method based on PER-DDQN

Country Status (1)

Country Link
CN (1) CN116744311B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117675054A (en) * 2024-02-02 2024-03-08 中国电子科技集团公司第十研究所 Multi-domain combined anti-interference intelligent decision method and system
CN117675054B (en) * 2024-02-02 2024-04-23 中国电子科技集团公司第十研究所 Multi-domain combined anti-interference intelligent decision method and system
CN117750525A (en) * 2024-02-19 2024-03-22 中国电子科技集团公司第十研究所 Frequency domain anti-interference method and system based on reinforcement learning
CN117750525B (en) * 2024-02-19 2024-05-31 中国电子科技集团公司第十研究所 Frequency domain anti-interference method and system based on reinforcement learning

Also Published As

Publication number Publication date
CN116744311B (en) 2024-03-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant