CN117545094A

CN117545094A - Dynamic spectrum resource allocation method for hierarchical heterogeneous cognitive wireless sensor network

Info

Publication number: CN117545094A
Application number: CN202410026070.8A
Authority: CN
Inventors: 孙璐; 韩振远; 万良田; 林云
Original assignee: Dalian Maritime University
Current assignee: Dalian Maritime University
Priority date: 2024-01-09
Filing date: 2024-01-09
Publication date: 2024-02-09
Anticipated expiration: 2044-01-09
Also published as: CN117545094B

Abstract

The invention provides a dynamic spectrum resource allocation method for a hierarchical heterogeneous cognitive wireless sensor network, relating to the technical field of cognitive radio dynamic spectrum access. The method comprises the following steps: S1, establishing a basic framework of the hierarchical heterogeneous cognitive wireless sensor network in a real-time task scene; S2, mathematically modeling spectrum resource allocation in the hierarchical heterogeneous sensor network according to the basic framework to obtain a mathematical model; S3, performing dynamic spectrum allocation on the sensor network with a Dueling MADQN algorithm according to the mathematical model obtained in S2. The invention jointly optimizes the total system throughput and the collision rate between primary and secondary users of the cognitive wireless network, and improves the spectrum utilization of the primary user channels in the cognitive wireless network.

Description

Dynamic spectrum resource allocation method for hierarchical heterogeneous cognitive wireless sensor network

Technical Field

The invention relates to the technical field of cognitive radio dynamic spectrum access, in particular to a dynamic spectrum resource allocation method for a hierarchical heterogeneous cognitive wireless sensor network.

Background

With the rapid development of the Internet of Things and related information technologies in recent years, technologies such as artificial intelligence, big data and 5G have penetrated many aspects of daily life and brought great convenience. Wireless sensor networks (Wireless Sensor Network, WSN) have also developed rapidly thanks to their ability to collect diverse data, rapid deployment, good concealment and low cost. At the same time, with the rapid development and wide application of wireless communication, the number of wireless communication modes keeps growing, and traditional static spectrum allocation can no longer meet the increasing demand for spectrum; academia has therefore proposed and promoted research on and application of cognitive radio sensor networks (Cognitive Radio Sensor Network, CRSN). Under traditional static allocation, a large amount of idle spectrum exists in the licensed bands, and the utilization of some licensed bands is even below 1%: even when a licensed user is not using its band, unlicensed users are not allowed to access it. These unused portions form spectrum holes, and such unreasonable allocation seriously wastes spectrum resources. Applying cognitive radio technology to wireless sensor networks lets the nodes autonomously search for and access spectrum holes, which is of great significance for improving both the spectrum utilization of the licensed bands and the communication quality of the sensor network.

In existing cognitive wireless sensor networks, where the sensor network selects spectrum holes and accesses the spectrum, energy consumption has always been a weakness. Moreover, when sensor nodes are deployed at large scale, the shortage of available spectrum holes causes mutual interference among cognitive users, which seriously degrades the communication quality of each wireless device.

For cognitive wireless sensor networks, the collision rate between primary and cognitive (secondary) users and the average access success rate of the cognitive users have always been important indexes for evaluating the effectiveness of an algorithm. Research on CRSN systems has grown year by year, but most of it considers a single sensor node or basic single-agent reinforcement learning algorithms. A single sensor node does not reflect the characteristics of a real sensor network, which limits the significance of such studies; and applying ordinary single-agent reinforcement learning to a multi-node sensor network leads to slow convergence and high collision rates between primary and secondary users because of the large action space, so it also fails to meet practical requirements.

Disclosure of Invention

In view of the above, the invention aims to provide a dynamic spectrum resource allocation method for a hierarchical heterogeneous cognitive wireless sensor network that solves the problems of slow convergence and high collision rates between primary and secondary users in existing allocation methods.

The invention adopts the following technical means:

A dynamic spectrum resource allocation method for a hierarchical heterogeneous cognitive wireless sensor network comprises the following steps:

S1, establishing a basic framework of the hierarchical heterogeneous cognitive wireless sensor network in a real-time task scene;

S2, mathematically modeling spectrum resource allocation in the hierarchical heterogeneous sensor network according to the basic framework to obtain a mathematical model;

S3, performing dynamic spectrum allocation on the sensor network with a Dueling MADQN algorithm according to the mathematical model obtained in S2.

Further, S1 specifically includes the following steps:

S11, deploying ordinary sensor nodes in each area to detect information and upload the detected information to the microcomputer center of that area, wherein the hierarchical heterogeneous sensor network contains sensors of different types;

S12, the microcomputer center receives and fuses the detection results of the sensors, obtains the state information of the primary user channels, and searches for spectrum holes for spectrum access, thereby improving spectrum utilization and completing information transmission; this yields the basic framework of the hierarchical heterogeneous cognitive wireless sensor network in a real-time task scene.

Further, S2 specifically includes the following steps:

S21, acquiring the position information of the secondary user base station, and calculating the path loss of the desired signal with the WINNER II channel model from the base station position and the signal link distance, using the following formula:

$$PL_m = A\log_{10}(d_m) + PL_0 + C\log_{10}\left(\frac{f_c}{5}\right)$$

wherein: $f_c$ represents the carrier frequency of the wireless channel (in GHz); $PL_0$ represents the path loss at the reference distance; $A$ is the path loss index; $C$ represents the frequency dependence of the path loss; $d_m$ is the distance of the $m$-th communication link.
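The path loss formula above can be sketched in Python as follows. This is a minimal sketch assuming the usual WINNER II conventions (distance in metres, carrier frequency in GHz); the function names and the example parameter values are illustrative assumptions, not values given in this document.

```python
import math


def winner2_path_loss_db(d_m: float, fc_ghz: float, a: float, pl0: float, c: float) -> float:
    """WINNER II-style path loss in dB: a*log10(d) + pl0 + c*log10(fc/5).

    d_m:    link distance in metres (assumed unit)
    fc_ghz: carrier frequency in GHz (assumed unit)
    a:      path loss index (slope)
    pl0:    path loss at the reference distance (intercept)
    c:      frequency dependence of the path loss
    """
    if d_m <= 0 or fc_ghz <= 0:
        raise ValueError("distance and carrier frequency must be positive")
    return a * math.log10(d_m) + pl0 + c * math.log10(fc_ghz / 5.0)


def path_loss_to_gain(pl_db: float) -> float:
    """Convert a path loss in dB into a linear large-scale power gain."""
    return 10.0 ** (-pl_db / 10.0)


if __name__ == "__main__":
    # Illustrative (assumed) parameters: a 100 m link at 5 GHz.
    pl = winner2_path_loss_db(100.0, 5.0, a=22.7, pl0=41.0, c=20.0)
    print(f"path loss = {pl:.1f} dB, linear gain = {path_loss_to_gain(pl):.3e}")
```

The linear gain returned by `path_loss_to_gain` is one natural choice for the large-scale term that feeds the small-scale fading model of S22.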

S22, deriving the channel gain between the secondary user and the base station with a Rician fading model (Rayleigh scattering plus a line-of-sight component characterized by the K factor), calculated as:

$$h_m = \sqrt{\beta_m}\left(\sqrt{\frac{K}{K+1}}\,e^{j2\pi u} + \sqrt{\frac{1}{K+1}}\,\mathcal{CN}(0,1)\right)$$

wherein: $\beta_m$ is determined by the path loss; $K$ is the K factor, the ratio of the received signal power of the LOS path to that of the scattered paths; $u$ is uniformly distributed between 0 and 1; $\mathcal{CN}(\cdot)$ represents a circularly symmetric complex Gaussian random variable; $j$ is the imaginary unit.
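A minimal sketch of drawing one channel coefficient from the model above, using only the Python standard library. It assumes the LOS phase is $2\pi u$ with $u \sim U(0,1)$ and that $\beta_m$ is the linear gain $10^{-PL_m/10}$; the function names are hypothetical.

```python
import cmath
import math
import random


def rician_channel(beta: float, k_factor: float, rng: random.Random) -> complex:
    """Draw one complex channel coefficient h with mean power E|h|^2 = beta.

    beta:     large-scale gain, e.g. 10**(-PL/10) from the path loss model (assumed)
    k_factor: LOS-to-scattered power ratio K (K = 0 reduces to Rayleigh fading)
    """
    # Line-of-sight term: unit amplitude with phase 2*pi*u, u ~ U(0, 1) (assumed mapping).
    los = math.sqrt(k_factor / (k_factor + 1.0)) * cmath.exp(1j * 2.0 * math.pi * rng.random())
    # Scattered term: circularly symmetric CN(0, 1) sample (unit total variance).
    cn = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)
    scatter = math.sqrt(1.0 / (k_factor + 1.0)) * cn
    return math.sqrt(beta) * (los + scatter)


def channel_gain(h: complex) -> float:
    """Power gain |h|^2, the quantity used as g in the SINR computation."""
    return abs(h) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [channel_gain(rician_channel(1e-8, 3.0, rng)) for _ in range(10000)]
    print(f"mean gain ~ {sum(samples) / len(samples):.3e}")
```

Splitting the power as $K/(K+1)$ and $1/(K+1)$ keeps the average gain equal to $\beta_m$ for any K, so the path loss alone sets the mean link quality.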

Further, S3 specifically includes the following steps:

S31, the primary user base station calculates the signal-to-interference-plus-noise ratio (SINR) of the primary and secondary users to determine whether a secondary user degrades the communication quality of the primary user during transmission, and thereby evaluates the value of the secondary user's access behavior;

S32, based on the SINR values, dynamic spectrum allocation is performed with the Dueling MADQN, wherein each microcomputer center acts as one agent that takes actions on the environment based on its own state, and each agent treats the other agents as part of the environment while learning.
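Because each agent treats the others as part of the environment, the outcome of one agent's action depends on the joint action of all agents. The hypothetical helper below sketches one time slot of that shared environment. It assumes action 0 means "do not access" and action j (1..N) means "access channel j", and it classifies each SU's outcome into the four cases used by the reward in S322; giving a PU collision precedence over an SU collision is also an assumption.

```python
from collections import Counter
from typing import List

IDLE, SUCCESS, PU_COLLISION, SU_COLLISION = "idle", "success", "pu_collision", "su_collision"


def classify_outcomes(actions: List[int], pu_busy: List[bool]) -> List[str]:
    """Classify each secondary user's outcome for one slot.

    actions: actions[i] in {0, ..., N}; 0 = no access, j >= 1 = access channel j (assumed encoding)
    pu_busy: pu_busy[j - 1] is True when the primary user occupies channel j
    """
    # Number of secondary users that picked each channel in this slot.
    load = Counter(a for a in actions if a != 0)
    outcomes = []
    for a in actions:
        if a == 0:
            outcomes.append(IDLE)
        elif pu_busy[a - 1]:
            outcomes.append(PU_COLLISION)  # the SU collides with the primary user
        elif load[a] >= 2:
            outcomes.append(SU_COLLISION)  # two or more SUs share the idle channel
        else:
            outcomes.append(SUCCESS)
    return outcomes


if __name__ == "__main__":
    # 3 agents, 3 channels; channel 2 is occupied by its primary user.
    print(classify_outcomes([1, 2, 1], [False, True, False]))
```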

Further, S31 specifically includes the following steps:

S311, the base stations calculate the SINR of the primary and secondary users; the SINR of secondary user $SU_i$ on the $j$-th channel is:

$$\mathrm{SINR}_{SU_i} = \frac{P_{SU_i}^{j}\,g_{SU_i}}{P_{PU_j}^{j}\,g_{PU_j} + P_{SU_k}^{j}\,g_{SU_k} + B N_0}$$

wherein: $P_{SU_i}^{j}$, $P_{PU_j}^{j}$, $P_{SU_k}^{j}$ respectively represent the transmit powers of $SU_i$, $PU_j$, $SU_k$ on the $j$-th wireless channel; $g_{SU_i}$, $g_{PU_j}$, $g_{SU_k}$ respectively represent the link channel gains between transmitters $SU_i$, $PU_j$, $SU_k$ and the receiver; $B$ and $N_0$ respectively represent the channel bandwidth and the noise spectral density;
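The SINR expression can be sketched as a small function. Generalizing the interference term to a list of co-channel (power, gain) pairs, and modeling an inactive primary user as a zero-power interferer, are assumptions made for illustration; the names are hypothetical.

```python
from typing import Iterable, Tuple


def sinr(p_signal: float, g_signal: float,
         interferers: Iterable[Tuple[float, float]],
         bandwidth_hz: float, n0: float) -> float:
    """Linear SINR of SU_i on channel j.

    p_signal, g_signal: transmit power of SU_i and gain of its link to the receiver
    interferers:        (power, gain) pairs of co-channel transmitters, e.g. PU_j and SU_k
    bandwidth_hz, n0:   channel bandwidth B and noise spectral density N0
    """
    interference = sum(p * g for p, g in interferers)
    noise = bandwidth_hz * n0  # thermal noise power B * N0
    return p_signal * g_signal / (interference + noise)


if __name__ == "__main__":
    # Illustrative numbers only: an idle PU contributes zero power.
    print(sinr(0.1, 1e-8, [(0.0, 1e-8), (0.1, 1e-10)], 1e6, 4e-21))
```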

S312, given the SINR of the secondary user received at the base station, the channel capacity is:

$$R_{SU_i} = B\log_2\left(1 + \frac{\mathrm{SINR}_{SU_i}}{K}\right)$$

wherein: $B$ represents the channel bandwidth; $K$ represents the SINR gap between the channel capacity and the practical coding scheme;
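A sketch of the gap-adjusted capacity formula. The gap is passed in linear units (a gap of 0 dB, i.e. 1.0, recovers the Shannon capacity); the function names are hypothetical.

```python
import math


def rate_with_gap(bandwidth_hz: float, sinr_linear: float, gap_linear: float = 1.0) -> float:
    """Achievable rate B*log2(1 + SINR/K) in bit/s, where K is the SINR gap (linear, K >= 1)."""
    if gap_linear < 1.0:
        raise ValueError("the SINR gap K should be >= 1 (0 dB)")
    return bandwidth_hz * math.log2(1.0 + sinr_linear / gap_linear)


def db_to_linear(x_db: float) -> float:
    """Convert a dB quantity (such as the SINR gap) into linear units."""
    return 10.0 ** (x_db / 10.0)


if __name__ == "__main__":
    # 1 MHz channel, SINR = 3 (linear): Shannon capacity vs. a 3 dB coding gap.
    print(rate_with_gap(1e6, 3.0), rate_with_gap(1e6, 3.0, db_to_linear(3.0)))
```

A larger gap models a less efficient practical coding scheme, so the same SINR yields a lower rate.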

S313, from the SINR value at the secondary user base station, when two secondary users access the same idle channel simultaneously, their mutual influence is judged from their communication rates.
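The effect of two secondary users sharing one idle channel can be illustrated by computing each SU's rate with the other SU counted as interference. This sketch assumes both SUs report to the same receiving base station and reuses the gap-adjusted rate of S312; the helper names are hypothetical.

```python
import math


def su_rate(bandwidth_hz, p, g, interference_w, n0, gap=1.0):
    """B*log2(1 + SINR/K) for one SU, given the total co-channel interference power."""
    sinr = p * g / (interference_w + bandwidth_hz * n0)
    return bandwidth_hz * math.log2(1.0 + sinr / gap)


def collision_rates(bandwidth_hz, p1, g1, p2, g2, n0, gap=1.0):
    """Rates of SU1 and SU2 when both access the same idle channel.

    Each SU's signal at the shared receiver (assumed) is interference for the other.
    """
    r1 = su_rate(bandwidth_hz, p1, g1, p2 * g2, n0, gap)
    r2 = su_rate(bandwidth_hz, p2, g2, p1 * g1, n0, gap)
    return r1, r2


if __name__ == "__main__":
    alone = su_rate(1e6, 0.1, 1e-9, 0.0, 1e-15)
    shared = collision_rates(1e6, 0.1, 1e-9, 0.1, 1e-9, 1e-15)
    print(f"alone: {alone:.0f} bit/s, shared: {shared[0]:.0f} / {shared[1]:.0f} bit/s")
```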

Further, in S32, because the microcomputer center senses the channel state information of the primary users with a certain probability of error, the memory pool of the deep Q network uses historical experience information to predict the channel state to a certain extent.

Further, the steps of the Dueling MADQN algorithm in S32 are as follows:
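The experiments later in this document vary a channel transition probability and a spectrum sensing error rate. A minimal sketch of such an environment, assuming each primary user channel is a two-state (idle/busy) Markov chain that flips state with the transition probability in every slot, and that each sensing report is wrong with the sensing error probability. Both modeling choices and all names are assumptions for illustration.

```python
import random
from typing import List


def step_channels(busy: List[bool], p_transition: float, rng: random.Random) -> List[bool]:
    """Advance each PU channel one slot; each channel flips state with probability p_transition."""
    return [(not b) if rng.random() < p_transition else b for b in busy]


def sense(busy: List[bool], p_error: float, rng: random.Random) -> List[bool]:
    """Sensed occupancy seen by the microcomputer center: each report is wrong with probability p_error."""
    return [(not b) if rng.random() < p_error else b for b in busy]


if __name__ == "__main__":
    rng = random.Random(42)
    busy = [False] * 6  # six primary user channels, all idle at start
    for t in range(3):
        busy = step_channels(busy, 0.3, rng)
        print(t, busy, sense(busy, 0.1, rng))
```

The agent only ever sees the output of `sense`; storing these noisy observations in the replay memory is what lets the network learn to predict the underlying channel state from history.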

S321, each microcomputer center acts as an agent. It detects the channel state of each primary user through spectrum sensing and finds spectrum holes; the sensed channel state is fed as the agent's state into its neural network, whose output is the Q value of each action. The action space of each agent contains N+1 actions, where N is the number of channels. The agent selects an action with the ɛ-greedy strategy: with probability ɛ it selects a random action, and with probability 1-ɛ it selects the action with the largest Q value output by the neural network. The ɛ-greedy strategy balances the agent's exploration and exploitation;
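The ɛ-greedy selection over the N+1 Q values can be sketched as follows. Treating index 0 as "no access" is an assumption, and a fixed ɛ is used for simplicity (a decaying schedule is a common alternative not specified here).

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick an action index from the N+1 Q values (index 0 assumed to mean 'no access')."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniformly random action
    # Exploit: action with the largest Q value (the first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.0, 0.4, 1.2, 0.7]  # N = 3 channels plus the 'no access' action
    print([epsilon_greedy(q, 0.1, rng) for _ in range(10)])
```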

S322, after selecting an action with the ɛ-greedy strategy, the agent acts on the environment, and the environment returns a reward according to the agent's action. In this model, when an SU accesses a channel used by neither a PU nor another SU, it receives a positive reward determined by its transmission rate; when an SU accesses a channel occupied by a PU, the primary and secondary users collide and the reward is -C; when two or more SUs access the same channel, the SUs collide, which degrades the SINR and lowers the transmission rate, so the reward is correspondingly reduced; when no channel is accessed, the reward is 0;

S323, each group of variables is stored in the experience pool; once the pool holds enough data, N samples are drawn from it, the target value of each sample is computed with the target network, and the current network is updated by minimizing the target loss. This strengthens the current network's ability to evaluate actions, makes its decisions more accurate and improves the access success rate of the SUs.
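The exact reward expressions are not reproduced in this text, so the sketch below is only one plausible reading. It assumes a successful access is rewarded with its spectral efficiency $\log_2(1 + \mathrm{SINR}/K)$, an SU-SU collision with the same expression evaluated at the degraded SINR, a PU collision with the constant penalty -C, and no access with 0. The function name and the default values are hypothetical.

```python
import math


def reward(outcome: str, sinr_linear: float = 0.0, penalty_c: float = 1.0, gap: float = 1.0) -> float:
    """Per-slot reward of one secondary user (assumed forms, see the lead-in).

    outcome:     "success", "su_collision", "pu_collision" or "idle"
    sinr_linear: SINR of the SU; for "su_collision" pass the SINR degraded by the other SUs
    penalty_c:   collision penalty C for interfering with a primary user
    gap:         SINR gap K (linear), as in S312
    """
    if outcome in ("success", "su_collision"):
        return math.log2(1.0 + sinr_linear / gap)  # rate-based reward (assumed form)
    if outcome == "pu_collision":
        return -penalty_c
    if outcome == "idle":
        return 0.0
    raise ValueError(f"unknown outcome: {outcome!r}")


if __name__ == "__main__":
    print(reward("success", 3.0), reward("su_collision", 1.0), reward("pu_collision", penalty_c=5.0))
```

Because an SU collision lowers the SINR, its reward is automatically smaller than that of a clean access, which matches the ordering described in S322.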
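The experience pool and target computation of S323 can be sketched with the standard DQN target $y = r + \gamma \max_{a'} Q_{\text{target}}(s', a')$ and a mean squared error loss. The document does not state the loss form, the discount factor or the pool capacity, so those are assumptions; the Q networks are replaced by plain callables, and there is no terminal flag because the spectrum access task is assumed to be continuing.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

State = Tuple[int, ...]
Transition = Tuple[State, int, float, State]  # (state, action, reward, next_state)
QFunction = Callable[[State], Sequence[float]]


class ReplayPool:
    """Fixed-capacity experience pool; the oldest transitions are discarded first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def ready(self, batch_size: int) -> bool:
        return len(self.buffer) >= batch_size

    def sample(self, batch_size: int) -> List[Transition]:
        return self.rng.sample(list(self.buffer), batch_size)


def td_targets(batch: List[Transition], q_target: QFunction, gamma: float) -> List[float]:
    """y = r + gamma * max_a' Q_target(s', a'), computed with the (frozen) target network."""
    return [r + gamma * max(q_target(s_next)) for _, _, r, s_next in batch]


def mse_loss(batch: List[Transition], q_current: QFunction, targets: List[float]) -> float:
    """Mean squared error between Q_current(s, a) and the targets; minimized to update the current network."""
    errors = [(q_current(s)[a] - y) ** 2 for (s, a, _, _), y in zip(batch, targets)]
    return sum(errors) / len(errors)
```

In a full implementation the loss would be minimized by gradient descent on the current network's weights, and the target network would be refreshed from the current network every few updates.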

Further, in S321, the neural network is:

$$Q(s,a) = V(s) + A(s,a) - \frac{1}{|\mathcal{A}|}\sum_{a'\in\mathcal{A}} A(s,a')$$

wherein: V is the average of the Q values in this state, A is the advantage function, whose average value is 0, and $\mathcal{A}$ is the agent's set of N+1 actions.
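The aggregation step can be checked numerically. In a real network V and A are two output heads on a shared trunk; here they are plain numbers, and the mean-subtracted form shown is the commonly used dueling aggregation, assumed to match the formula above.

```python
from typing import List, Sequence


def dueling_q(v: float, advantages: Sequence[float]) -> List[float]:
    """Combine the value stream V(s) and advantage stream A(s, .) into Q(s, .).

    Subtracting the mean advantage forces the advantages to average 0,
    so V(s) equals the mean of the resulting Q values.
    """
    mean_a = sum(advantages) / len(advantages)
    return [v + a - mean_a for a in advantages]


if __name__ == "__main__":
    q = dueling_q(1.5, [1.0, 2.0, 3.0])
    print(q, sum(q) / len(q))
```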

The invention also provides a storage medium comprising a stored program, wherein, when the program runs, the dynamic spectrum resource allocation method for the hierarchical heterogeneous cognitive wireless sensor network is executed.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, by running the computer program, executes the dynamic spectrum resource allocation method for the hierarchical heterogeneous cognitive wireless sensor network.

Compared with the prior art, the invention has the following advantages:

1. The invention proposes a multi-agent dueling deep Q network algorithm, called the Dueling MADQN algorithm, which applies Dueling DQN to MADQN. While retaining MADQN features such as agent interaction, neural networks and the experience pool, the algorithm decomposes the Q value output by the neural network into the sum of a state value function V and an advantage function A. This improves the convergence speed and effectiveness of the algorithm when the state-action space is large, reduces the collision probability between primary and secondary users, and improves the spectrum utilization of the licensed bands. Compared with traditional approaches, multi-agent reinforcement learning lets each cognitive user learn as an agent that treats the other agents as part of the environment, and introducing Dueling DQN into multi-agent reinforcement learning effectively avoids problems such as slow convergence and low average access success rate caused by a large action space; the method is stable and widely applicable.

2. Second, the hierarchical heterogeneous sensor network is applied to the cognitive wireless sensor network, with a microcomputer center deployed in each area to perform information fusion and spectrum access. This effectively alleviates the energy consumption of the sensor nodes and the communication quality problems caused by the large scale of the network and the limited computing power of the sensors.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to the drawings without inventive effort to a person skilled in the art.

FIG. 1 is a flow chart of the method of the present invention.

Fig. 2 is a basic framework diagram of the present invention.

Fig. 3 is a graph comparing the average access success rates of the algorithms when PU = 6 and SU = 4.

Fig. 4 is a graph comparing the average rewards of the algorithms when PU = 6 and SU = 4.

Fig. 5 is a graph comparing the average access success rates of the algorithms when PU = 24 and SU = 2.

Fig. 6 is a graph comparing the average rewards of the algorithms when PU = 24 and SU = 2.

Fig. 7 is a graph comparing the average access success rates of the algorithms when PU = 24 and SU = 6.

Fig. 8 is a graph comparing the average rewards of the algorithms when PU = 24 and SU = 6.

Fig. 9 is a graph comparing the average access success rates of the algorithms when PU = 24 and SU = 16.

Fig. 10 is a graph comparing the average rewards of the algorithms when PU = 24 and SU = 16.

Detailed Description

In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

As shown in Figs. 1 and 2, the invention provides a dynamic spectrum resource allocation method for a hierarchical heterogeneous cognitive wireless sensor network, which comprises the following steps:

S1, the hierarchical heterogeneous sensor network contains sensors of different types. Ordinary sensor nodes in each area are responsible for information detection and upload their detection results to the microcomputer center of that area; after receiving the detection results, the microcomputer center fuses the information and searches for spectrum holes for spectrum access, yielding the basic framework of the hierarchical heterogeneous cognitive wireless sensor network in a real-time task scene;

S11, ordinary sensors detect changes in the environmental information and upload the detection information to the microcomputer center of their area;

S12, the microcomputer center acquires the state information of the primary user channels and searches for spectrum holes for spectrum access, thereby improving spectrum utilization and completing information transmission;

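S2, spectrum resource allocation in the hierarchical heterogeneous sensor network is mathematically modeled according to the basic framework to obtain a mathematical model;

S21, the path loss of the desired signal is calculated with the WINNER II channel model from the position of the secondary user base station and the signal link distance:

$$PL_m = A\log_{10}(d_m) + PL_0 + C\log_{10}\left(\frac{f_c}{5}\right)$$

wherein: $f_c$ represents the carrier frequency of the wireless channel (in GHz); $PL_0$ represents the path loss at the reference distance; $A$ is the path loss index; $C$ represents the frequency dependence of the path loss; $d_m$ is the distance of the $m$-th communication link.

S22, the channel gain between the secondary user and the base station is derived with a Rician fading model:

$$h_m = \sqrt{\beta_m}\left(\sqrt{\frac{K}{K+1}}\,e^{j2\pi u} + \sqrt{\frac{1}{K+1}}\,\mathcal{CN}(0,1)\right)$$

wherein: $\beta_m$ is determined by the path loss; $K$ is the K factor, the ratio of the received signal power of the LOS path to that of the scattered paths; $u$ is uniformly distributed between 0 and 1; $\mathcal{CN}(\cdot)$ represents a circularly symmetric complex Gaussian random variable.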

S3, according to the mathematical model obtained in S2, dynamic spectrum allocation is performed with the Dueling MADQN, and the microcomputer center uploads the information collected from the environment to the base station without affecting the normal communication of the primary users;

S31, the primary user base station calculates the SINR of the primary and secondary users to determine whether a secondary user degrades the communication quality of the primary user during transmission, and thereby evaluates the value of the secondary user's access behavior;

S311, the SINR of secondary user $SU_i$ on the $j$-th channel is calculated as:

$$\mathrm{SINR}_{SU_i} = \frac{P_{SU_i}^{j}\,g_{SU_i}}{P_{PU_j}^{j}\,g_{PU_j} + P_{SU_k}^{j}\,g_{SU_k} + B N_0}$$

wherein: $P_{SU_i}^{j}$, $P_{PU_j}^{j}$, $P_{SU_k}^{j}$ respectively represent the transmit powers of $SU_i$, $PU_j$, $SU_k$ on the $j$-th wireless channel; $g_{SU_i}$, $g_{PU_j}$, $g_{SU_k}$ respectively represent the link channel gains between transmitters $SU_i$, $PU_j$, $SU_k$ and the receiver; $B$ and $N_0$ respectively represent the channel bandwidth and the noise spectral density.

S312, given the signal-to-interference-plus-noise ratio (SINR) of the secondary user received at the base station, the channel capacity is:

$$R_{SU_i} = B\log_2\left(1 + \frac{\mathrm{SINR}_{SU_i}}{K}\right)$$

wherein: $B$ represents the channel bandwidth; $K$ represents the SINR gap between the channel capacity and the practical coding scheme.

S313, from the SINR value at the secondary user base station, when two secondary users simultaneously access the same idle channel, their mutual influence is judged from their communication rates;

S32, dynamic spectrum allocation is performed with the Dueling MADQN, wherein each microcomputer center acts as one agent that takes actions on the environment based on its own state, and each agent treats the other agents as part of the environment while learning.

In the neural network of a single-agent Dueling DQN, the Q network is modeled as:

$$Q(s,a) = V(s) + A(s,a) - \frac{1}{|\mathcal{A}|}\sum_{a'\in\mathcal{A}} A(s,a')$$

wherein: the V value can be regarded as the average of the Q values in this state, and A is called the advantage function, whose average value is 0; $\mathcal{A}$ is the agent's action set.

In the ordinary DQN, when the Q value of an action needs to be updated, the Q network is updated directly to raise the Q value of that action alone, and the agent does not consider the differences between actions. In the Dueling DQN, the A values must sum to 0, so the network preferentially updates the V value; since V is the average of the Q values, adjusting it shifts all Q values at once, so more values are updated with fewer updates. As a result, Dueling DQN learns the state value function more efficiently, and as the action space grows it obtains larger returns and is more stable than the conventional DQN.
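The claim that one update on V moves every Q value can be illustrated with a toy comparison. This sketch treats the Q values (plain DQN) and the V and A outputs (dueling) as free parameters and takes one squared-error gradient step on a single action; it is a deliberately simplified illustration, not the neural networks of this method, where shared weights also couple the actions.

```python
from typing import List, Tuple


def plain_dqn_step(q: List[float], action: int, target: float, lr: float) -> List[float]:
    """Free-parameter Q table: a squared-error step only moves Q[action]."""
    new_q = list(q)
    new_q[action] -= lr * 2.0 * (q[action] - target)
    return new_q


def dueling_step(v: float, adv: List[float], action: int, target: float,
                 lr: float) -> Tuple[float, List[float]]:
    """Same loss, but Q = V + A - mean(A) with V and A as free parameters."""
    n = len(adv)
    mean_a = sum(adv) / n
    err = 2.0 * (v + adv[action] - mean_a - target)  # dLoss/dQ[action]
    new_v = v - lr * err  # dQ[action]/dV = 1
    # dQ[action]/dA[b] = (1 if b == action else 0) - 1/n
    new_adv = [a - lr * err * ((1.0 if b == action else 0.0) - 1.0 / n)
               for b, a in enumerate(adv)]
    return new_v, new_adv


def q_from(v: float, adv: List[float]) -> List[float]:
    mean_a = sum(adv) / len(adv)
    return [v + a - mean_a for a in adv]


def count_changed(before: List[float], after: List[float], tol: float = 1e-12) -> int:
    return sum(abs(x - y) > tol for x, y in zip(before, after))


if __name__ == "__main__":
    v, adv = 0.0, [0.0, 0.0, 0.0, 0.0]  # 4 actions, all Q = 0
    q0 = q_from(v, adv)
    q_plain = plain_dqn_step(q0, action=2, target=1.0, lr=0.1)
    v1, adv1 = dueling_step(v, adv, action=2, target=1.0, lr=0.1)
    print("plain DQN changed", count_changed(q0, q_plain), "of", len(q0))
    print("dueling changed", count_changed(q0, q_from(v1, adv1)), "of", len(q0))
```

With the plain table only the updated action moves, while the dueling step shifts all four Q values through V and still moves the updated action the most.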

S321, each microcomputer center acts as an agent. It detects the channel state of each primary user through spectrum sensing and finds spectrum holes; the sensed channel state is fed as the agent's state into its neural network, whose output is the Q value of each action. The action space of each agent contains N+1 actions, where N is the number of channels. The agent selects an action with the ɛ-greedy strategy: with probability ɛ it selects a random action, and with probability 1-ɛ it selects the action with the largest Q value output by the neural network. The ɛ-greedy strategy balances the agent's exploration and exploitation.

S322, after selecting an action with the ɛ-greedy strategy, the agent acts on the environment, and the environment returns a reward according to the agent's action. In this model, when an SU accesses a channel used by neither a PU nor another SU, it receives a positive reward determined by its transmission rate; when an SU accesses a channel occupied by a PU, the primary and secondary users collide and the reward is -C; when two or more SUs access the same channel, the SUs collide, which degrades the SINR and lowers the transmission rate, so the reward is correspondingly reduced; when no channel is accessed, the reward is 0.

S323, each group of variables is stored in the experience pool; once the pool holds enough data, N samples are drawn from it, the target value of each sample is computed with the target network, and the current network is updated by minimizing the target loss. This strengthens the current network's ability to evaluate actions, makes its decisions more accurate and improves the access success rate of the SUs.

In the MADQN algorithm, each agent learns while treating the other agents as part of the environment, which also reflects the interactions between agents.

The neural network of each agent is modified from that of MADQN. In MADQN, the output of each agent's neural network is the action value function Q, which reflects the value of each of the agent's actions in the current state without considering the differences between actions. Dueling MADQN models the action value function Q output by the neural network as the sum of the state value function V and the advantage function A; modeling the two separately makes the agent pay more attention to the differences between actions and lets it distinguish whether a change in Q is caused by its state or by the action it selected. The advantage of Dueling MADQN becomes more pronounced as the action space grows, making it better suited to large-scale models: when many primary users surround a miniature sensor, the sensor faces a large action space that slows convergence and lowers the access success rate, whereas Dueling DQN distinguishes well between actions and effectively avoids problems such as slow convergence and high collision rates.

Owing to the limitations of spectrum sensing technology, the microcomputer center senses the channel state information of the primary users with a certain probability of error; the memory pool of the deep Q network uses historical experience information to predict the channel state to a certain extent.

In the dynamic spectrum resource allocation method for the hierarchical heterogeneous cognitive wireless sensor network, the neural network of Dueling DQN is introduced into MADQN and a microcomputer center is deployed in each area to collect sensor information, which raises the average access success rate of the system and reduces its average collision rate.

In this embodiment, experiments are performed in a real task scene with different numbers of primary and secondary users. The comparison algorithms are MADQN, MA Q-learning and the Myopic algorithm.

As shown in Fig. 3, the average access success rates of the four algorithms are compared with 6 primary users, 4 secondary users, spectrum sensing error rates of 0.1 and 0.2, and a channel transition probability of 0.3.

As shown in Fig. 4, the average rewards of the four algorithms are compared with 6 primary users, 4 secondary users, spectrum sensing error rates of 0.1 and 0.2, and a channel transition probability of 0.3.

As shown in Fig. 5, the average access success rates of the four algorithms are compared with 24 primary users, 2 secondary users, spectrum sensing error rates of 0.1 and 0.2, and a channel transition probability of 0.3.

As shown in Fig. 6, the average rewards of the four algorithms are compared with 24 primary users, 2 secondary users, spectrum sensing error rates of 0.1 and 0.2, and a channel transition probability of 0.3.

As shown in Fig. 7, the average access success rates of the four algorithms are compared with 24 primary users, 6 secondary users, spectrum sensing error rates of 0.1 and 0.2, and a channel transition probability of 0.3.

As shown in Fig. 8, the average rewards of the four algorithms are compared with 24 primary users, 6 secondary users, spectrum sensing error rates of 0.1 and 0.2, and a channel transition probability of 0.3.

As shown in Fig. 9, the average access success rates of the four algorithms are compared with 24 primary users, 16 secondary users, a spectrum sensing error rate of 0 and a channel transition probability of 0.3.

As shown in Fig. 10, the average rewards of the four algorithms are compared with 24 primary users, 16 secondary users, a spectrum sensing error rate of 0 and a channel transition probability of 0.3.

As can be seen from Figs. 5 and 6, when the action space is large, Dueling MADQN can distinguish between different actions, so the algorithm converges faster and its average access success rate increases greatly.

As can be seen from Figs. 9 and 10, with 16 secondary users and a channel transition probability of 0.3 the channels are saturated; even in this case, with the spectrum sensing error probability set to 0, Dueling MADQN still maintains better performance.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. A dynamic spectrum resource allocation method for a hierarchical heterogeneous cognitive wireless sensor network, characterized by comprising the following steps:

S12, the microcomputer center receives and fuses the detection results of all sensors, obtains the state information of the primary user channels, and searches for spectrum holes for spectrum access, thereby improving spectrum utilization and completing information transmission, so that the basic framework of the hierarchical heterogeneous cognitive wireless sensor network in a real-time task scene is obtained;

wherein: $f_c$ represents the carrier frequency of the wireless channel; $PL_0$ represents the path loss at the reference distance; $A$ is the path loss index; $C$ represents the frequency dependence of the path loss; $d$ is the distance between the primary and secondary users; $m$ is the communication link sequence number;

wherein: $\beta_m$ is determined by the path loss; $u$ is uniformly distributed between 0 and 1; $\mathcal{CN}(\cdot)$ represents a circularly symmetric complex Gaussian random variable; $K$ is the K factor, the ratio of the received signal power of the LOS path to that of the scattered paths; $j$ is the imaginary unit;

2. The dynamic spectrum resource allocation method for a hierarchical heterogeneous cognitive wireless sensor network according to claim 1, wherein S3 specifically comprises the following steps:

3. The dynamic spectrum resource allocation method for a hierarchical heterogeneous cognitive wireless sensor network according to claim 2, wherein S31 specifically comprises the following steps:

wherein: $P_{SU_i}^{j}$, $P_{PU_j}^{j}$, $P_{SU_k}^{j}$ respectively represent the transmit powers of $SU_i$, $PU_j$, $SU_k$ on the $j$-th wireless channel; $g_{SU_i}$, $g_{PU_j}$, $g_{SU_k}$ respectively represent the link channel gains between transmitters $SU_i$, $PU_j$, $SU_k$ and the receiver; $B$ and $N_0$ respectively represent the channel bandwidth and the noise spectral density; $SU_i$ represents the $i$-th secondary user; $PU_j$ represents the $j$-th primary user; $SU_k$ represents the $k$-th secondary user;

S313, from the SINR value at the secondary user base station, when two secondary users simultaneously access the same idle channel, their mutual influence is judged from their communication rates.

4. The dynamic spectrum resource allocation method for a hierarchical heterogeneous cognitive wireless sensor network according to claim 2, wherein in S32, since the microcomputer center senses the channel state information of the primary users with a certain probability of error, the channel state is predicted to a certain extent from historical experience information in the memory pool of the deep Q network.

5. The dynamic spectrum resource allocation method for a hierarchical heterogeneous cognitive wireless sensor network according to claim 2, wherein the steps of the Dueling MADQN algorithm in S32 are as follows:

S322, after selecting an action with the ɛ-greedy strategy, the agent acts on the environment, and the environment returns a reward according to the agent's action; in this model, when a secondary user SU accesses a channel used by neither a primary user PU nor another secondary user SU, it receives a positive reward determined by its transmission rate; when a secondary user SU accesses a channel occupied by a primary user PU, the primary and secondary users collide and the reward is -C; when two or more secondary users SU access the same channel, the secondary users SU collide, which degrades the SINR and lowers the transmission rate, so the reward is correspondingly reduced; when no channel is accessed, the reward is 0;

S323, each group of variables is stored in an experience pool; when the pool holds enough data, N samples are drawn from it, the target value of each sample is computed with the target network, and the current network is updated by minimizing the target loss, which strengthens the current network's ability to evaluate actions, makes its decisions more accurate and improves the access success rate of the secondary users SU.

6. The dynamic spectrum resource allocation method for hierarchical heterogeneous cognitive wireless sensor networks according to claim 5, wherein in S321, the neural network is:

wherein: V is the average of the Q values in the state; A is the advantage function, whose average value is 0; s represents the state of the agent; a represents the action of the agent.

7. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program, when run, performs the dynamic spectrum resource allocation method for a hierarchical heterogeneous cognitive wireless sensing network according to any one of claims 1 to 6.

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, by running the computer program, executes the dynamic spectrum resource allocation method for a hierarchical heterogeneous cognitive wireless sensing network according to any one of claims 1 to 6.