CN114095940A - Slice resource allocation method and equipment for hybrid access cognitive wireless network

Info

Publication number: CN114095940A
Application number: CN202111365101.5A
Authority: CN
Inventors: 张勇; 郭达; 袁思雨; 郄文博; 程振杰
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2021-11-17
Filing date: 2021-11-17
Publication date: 2022-02-25

Abstract

The invention provides a slice resource allocation method and device for a hybrid access cognitive wireless network. The method comprises: constructing an agent for each cognitive user, wherein the state of the agent corresponds to information that needs to be exchanged and information that does not need to be exchanged with other cognitive users, and the action of the agent corresponds to a transmission power and a channel; inputting the state of the agent into a graph-convolution-based neural network model for the hybrid access cognitive wireless network scenario, so as to calculate attention heads for the neighbor cognitive users and, after all the attention heads are concatenated, output action-related information of the agent through a convolution layer; determining, based on the predicted transmission power and the predicted occupied channel of the cognitive user, whether the action meets the communication delay requirement, the transmission rate requirement and the interference temperature threshold requirement, and calculating the value of a reward function; if the requirements are not met, penalizing the action through the reward function; and allocating network slice resources to the cognitive users in the hybrid access cognitive wireless network using the trained neural network model and the trained agents. The scheme improves spectrum utilization efficiency.

Description

Slice resource allocation method and equipment for hybrid access cognitive wireless network

Technical Field

The invention relates to the technical field of wireless communication, in particular to a slice resource allocation method and equipment for a hybrid access cognitive wireless network.

Background

With the development of information technology, industrial models are changing and spectrum resources are becoming ever more precious. On the one hand, the continuous growth of emerging wireless communication services makes spectrum resources increasingly scarce; on the other hand, existing legacy systems use spectrum inefficiently and are difficult to upgrade. The use of wireless devices such as vehicles, mobile phones, tablets and various wireless sensors has increased rapidly over the past decade, which has driven the development of fifth-generation wireless communication (5G). A 5G wireless network is expected to deliver 10 times the current data rate and, with stronger connectivity and one hundred percent coverage, to provide better quality of service and user experience. The related LTE-G230 (wireless private network technology) standard has been released, and the complex spectrum characteristics of the 230 MHz band bring new challenges to sharing among multiple industries and services. Although the state has licensed frequency bands to departments such as the national power grid and water conservancy for their respective services, part of these bands belong to individual departments in individual regions and are used only as needed, so most of the spectrum sits idle and spectrum utilization is low. Optimizing spectrum usage rules to improve spectrum utilization efficiency is therefore an urgent problem.

Network slicing is one of the key technologies of 5G. It uniformly allocates a large pool of dedicated underlying physical network resources, virtualizes network functions, provides efficient services to users, and meets their diverse requirements in a targeted manner. As technology advances, many new service demands with high requirements emerge; integrating existing physical network resources and allocating them on demand to different communication services reduces construction cost and allows limited resources to be distributed reasonably among multiple service slices. Three typical application scenarios are defined for 5G: enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), and massive Internet of Things communication.

In addition, Cognitive Radio (CR) is a technology for optimizing spectrum usage rules. In CR, unauthorized users may be allowed to access the licensed spectrum for communication as long as the normal communication of authorized users is not affected. Wireless network structures are becoming increasingly complex, and the interference caused by dense network coverage cannot be overlooked. In a cognitive wireless network, the complex network environment, the infinite state space and the high-dimensional optimization parameters pose a challenge to traditional optimization methods.

Reinforcement learning, an important branch of machine learning, plays a major role in decision making and optimization for complex problems, such as defeating top human players in board games, allocating and scheduling network resources, and making intelligent recommendations according to user interests. With the development of artificial intelligence, many problems can be solved in a data-driven, algorithm-driven way. Compared with traditional algorithms that rely on manually selected features, deep learning has greater potential in wireless networks: it extracts features from data automatically without excessive manual intervention and is trained directly end to end, which reduces model complexity. Reinforcement learning interacts continuously with the environment and accumulates experience to adjust its policy by trial and error. When deep learning and reinforcement learning are combined, the historical experience accumulated during learning serves as training data for the neural network, so that the strengths of deep learning are exploited, the model is trained better, and decisions are optimized. A graph convolutional neural network extracts information from complex topological graphs more effectively than an ordinary convolutional neural network. Combining a graph convolutional neural network with a reinforcement learning algorithm can therefore effectively improve an agent's ability to extract information in a complex topology and yield a better resource allocation policy.

However, practical problems are highly complex and involve a large amount of information, and global information may not be available all at once, so single-agent reinforcement learning algorithms are no longer applicable.

Disclosure of Invention

In view of this, the invention provides a slice resource allocation method and device for a hybrid access cognitive wireless network, so as to implement a reinforcement learning method suitable for a slice resource allocation scenario of the hybrid access cognitive wireless network, thereby improving spectrum utilization efficiency.

In order to achieve the purpose, the invention is realized by adopting the following scheme:

according to an aspect of the embodiments of the present invention, a slice resource allocation method for a hybrid access cognitive radio network is provided, including:

constructing an agent for each cognitive user in the hybrid access cognitive wireless network, wherein the state of the agent corresponds to information which needs to be interacted and information which does not need to be interacted between the current cognitive user and other cognitive users in the hybrid access cognitive wireless network, and the action of the agent corresponds to the transmission power of the current cognitive user and a channel occupied by the current cognitive user;

determining the state of an agent, inputting the determined state of the agent into a neural network model based on graph convolution under a hybrid access cognitive wireless network scene, calculating an attention head for each neighbor cognitive user in the state of the agent, and outputting action related information of the agent through a convolution layer in the neural network model after all the attention heads are connected;

obtaining a predicted transmission power and a predicted occupied channel of the cognitive user of the agent according to the action-related information of the agent, and determining whether the current action of the agent meets the communication delay requirement when the cognitive user is an ultra-reliable low-latency communication (URLLC) slice user, whether it meets the transmission rate requirement when the cognitive user is an enhanced mobile broadband (eMBB) slice user, and, according to the value of an interference temperature threshold constraint function, whether it meets the interference temperature threshold requirement;

if the communication delay requirement, the transmission rate requirement and the interference temperature threshold requirement are not met, calculating a value of a reward function, and rewarding or penalizing the current action of the agent with that value so as to train the neural network model and the agent; the reward function is determined based on the interference temperature threshold constraint function and a cognitive user energy efficiency function, and the interference temperature threshold constraint function is determined based on the ratio of interference power to channel bandwidth and is a function of transmission power and channel;

and distributing network slice resources for cognitive users in the hybrid access cognitive wireless network by using the trained neural network model and the intelligent agent.

In some embodiments, the information that the current cognitive user needs to exchange with other cognitive users in the hybrid access cognitive wireless network includes: the binary association coefficient between the cognitive user and the cognitive base station, the binary association coefficient between the cognitive user and the channel, the transmission power of the cognitive user, and a binary coefficient indicating whether the cognitive user meets the communication requirement;

the information that the current cognitive user does not need to exchange with other cognitive users in the hybrid access cognitive wireless network includes: the signal-to-interference-plus-noise ratio (SINR) of the cognitive user and the binary association coefficient between the primary user and the channel.

In some embodiments, determining a state of the agent, and inputting the determined state of the agent into a graph convolution-based neural network model in a hybrid access cognitive wireless network scenario to calculate an attention head for each neighbor cognitive user in the state of the agent and output action-related information of the agent via a convolution layer in the neural network model after connecting all the attention heads, includes:

determining the state of the agent, inputting the determined state of the agent into a neural network model based on graph convolution under a hybrid access cognitive wireless network scene, calculating an attention head for each neighbor cognitive user in the state of the agent, and outputting action related information of the agent after connecting all the attention heads and sequentially passing through a nonlinear activation function and a convolution layer in the neural network model.

In some embodiments, the attention head computed for each neighbor cognitive user in the state of the agent is represented as:

$$\alpha_{ij}^{m} = \frac{\exp\left(\tau \cdot W^{m} h_i \cdot (W^{m} h_j)^{T}\right)}{\sum_{k \in X_{+i}} \exp\left(\tau \cdot W^{m} h_i \cdot (W^{m} h_k)^{T}\right)}$$

wherein $\alpha_{ij}^{m}$ represents attention head m of neighbor cognitive user j of agent i, $W^{m}$ represents the weight matrix of attention head m, $h_i$ represents the output of the neuron corresponding to the cognitive user of agent i, $h_j$ represents the output of the neuron corresponding to neighbor cognitive user j, $X_{+i}$ represents the set formed by the cognitive user of agent i and its neighbor cognitive users, $k \in X_{+i}$ represents cognitive user k in the set $X_{+i}$, $h_k$ represents the output of the neuron corresponding to neighbor cognitive user k, $\tau$ represents the scaling factor, $(W^{m} h_k)^{T}$ represents the transpose of $W^{m} h_k$, and $(W^{m} h_j)^{T}$ represents the transpose of $W^{m} h_j$;

the output of the convolutional layer in the neural network model is represented as:

$$h_i' = \sigma\left(\mathrm{concatenate}_{m \in M}\left[\sum_{j \in X_{+i}} \alpha_{ij}^{m} W^{m} h_j\right]\right)$$

wherein $h_i'$ represents the output of the convolutional layer, $\sigma$ represents the nonlinear activation function, concatenate[·] represents the concatenation operation, $X_{+i}$ represents the set formed by the cognitive user of agent i and its neighbor cognitive users, $j \in X_{+i}$ represents cognitive user j in the set $X_{+i}$, $h_j$ represents the output of the neuron corresponding to a cognitive user in the set $X_{+i}$, $W^{m}$ represents the weight matrix of attention head m, and $m \in M$ represents attention head m in the set M of attention heads of all neighbor cognitive users.

In some embodiments, rewarding or penalizing the current action of the agent with the value of the reward function to train the neural network model and the agent includes:

calculating the value of a loss function using the value of the reward function, and propagating the value of the loss function back to the neural network model so as to reward or penalize the current action of the agent and train the neural network model and the agent; the loss function comprises a KL divergence regularization term on the attention weight distribution, wherein the KL divergence regularization term is used to measure the difference between the attention weight distribution corresponding to the concatenation result of all current attention heads and a target attention weight distribution.

In some embodiments, the loss function is expressed as:

where $L(\theta)$ represents the value of the loss function, $\theta$ represents the parameters of the neural network, BS represents the mini-batch number (the number of batches of data extracted from the data pool), $r_b$ represents the value of the reward function at the b-th mini-batch, $\gamma$ represents the signal to interference plus noise ratio, $Q(s_b', a'; \theta)$ represents the Q value for the next action $a'$, the next state $s_b'$ and neural network parameters $\theta$, $Q(s, a; \theta)$ represents the Q value for action a, state s and neural network parameters $\theta$, $\lambda$ represents the coefficient of the regularization loss, and M represents the number of attention heads,

the regularization term represents the value of the KL divergence between the weight distribution of the current state and the weight distribution of the next state, each of which is the attention weight distribution of the agent in attention head m of convolutional layer k.

In some embodiments, the reward function is expressed as:

wherein $r_i$ represents the reward value of the current action of agent i,

represents the interference temperature threshold constraint function,

represents the transmission power when the cognitive user of agent i is connected to cognitive base station a and selects channel k,

represents the association coefficient between the cognitive user of agent i and cognitive base station a, $\eta_i$ represents the energy efficiency of the cognitive user of agent i, $k \in C$ represents channel k in the channel set C, S represents the Sigmoid function, $IT^{max}$ represents the interference temperature threshold, $n \in CU$ represents cognitive user n in the cognitive user set CU, $a \in CBS$ represents cognitive base station a in the cognitive base station set CBS,

represents the association coefficient between the cognitive user of agent i and cognitive base station a,

represents the association coefficient between the cognitive user of agent i and channel k,

represents the gain between the cognitive user of agent i and channel k, $k_{cons}$ is the Boltzmann constant, B represents the channel bandwidth, and $R_i$ represents the transmission rate;

In some embodiments, the signal-to-interference-plus-noise ratio (SINR) is expressed as:

wherein $\gamma_n$ represents the SINR of cognitive user n, $a \in CBS$ represents cognitive base station a in the cognitive base station set CBS, $k \in C$ represents channel k in the channel set C,

represents the association coefficient between cognitive base station a and cognitive user n,

represents the association coefficient between channel k and cognitive user n,

represents the channel gain between cognitive user n and cognitive base station a,

represents the transmission power when cognitive user n is connected to cognitive base station a and selects channel k, $n' \in SU$ represents cognitive user n' in the cognitive user set SU,

represents the association coefficient between cognitive base station a and cognitive user n',

represents the association coefficient between channel k and cognitive user n',

represents the transmission power when cognitive user n' is connected to cognitive base station a and selects channel k, $m \in PU$ represents primary user m in the primary user set PU,

represents the association coefficient between channel k and primary user m, $g_n$ represents the channel gain between cognitive user n and the primary base station,

represents the transmission power at which primary user m selects channel k, and $\sigma^2$ represents white Gaussian noise;

the interference temperature threshold constraint function is expressed as:

wherein $n \in CU$ represents cognitive user n in the cognitive user set CU, $a \in CBS$ represents cognitive base station a in the cognitive base station set CBS,

represents the transmission power when cognitive user n is connected to cognitive base station a and selects channel k;

the transmission rate requirement is expressed as:

wherein $R_n$ represents the transmission rate of cognitive user n, $R_{min}$ represents the minimum transmission rate threshold, and $n \in CU^{eMBB}$ represents cognitive user n in the set $CU^{eMBB}$ of cognitive users that are enhanced mobile broadband slice users;

the communication delay requirement is expressed as:

where D represents the parameter of the Poisson distribution followed by the arrival rate, ξ is a set value, $D_{max}$ is the maximum transmission delay, and $n \in CU^{URLLC}$ represents cognitive user n in the set $CU^{URLLC}$ of cognitive users that are ultra-reliable low-latency communication slice users.

According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to any of the embodiments when the processor executes the program.

According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method of any of the above embodiments.

According to the hybrid access cognitive wireless network slice resource allocation method, the electronic device and the computer-readable storage medium, each cognitive user is treated as an agent, a mapping between reinforcement learning and the hybrid spectrum access cognitive wireless network slicing scenario is established, and a multi-head attention mechanism and a dedicated loss design are applied, thereby realizing a multi-agent reinforcement learning method suited to that scenario. By distinguishing state information that needs to be exchanged from state information that does not, and by designing a reward function based on interference and energy efficiency, the method obtains a higher reward, performs better in stability and convergence, and improves spectrum utilization efficiency.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort. In the drawings:

fig. 1 is a flowchart illustrating a slice resource allocation method for a hybrid access cognitive radio network according to an embodiment of the present invention;

fig. 2 is a schematic structural diagram of a hybrid access mode cognitive wireless network slice resource allocation model according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating the GQN algorithm neural network structure in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.

Existing single-agent reinforcement learning algorithms are not suited to resource allocation in complex networks. In multi-agent reinforcement learning, each agent observes only a local state and has no global information, which brings the learning and training model closer to reality, as in multiplayer online games or cooperative robot production. The agents may be related in various ways: purely cooperative, purely competitive, or both cooperative and competitive.

The cognitive wireless network energy efficiency is further improved by controlling the channel association and the power distribution of the secondary users on the premise of ensuring the service requirements of the primary users and the secondary users. Meanwhile, a secondary user mixed spectrum access mechanism is introduced into a cognitive radio scene, and the secondary user can select an overlay or underlay access mode according to the state of an access channel. The invention provides a slice resource allocation method for a hybrid access cognitive wireless network, which is used for realizing multi-agent reinforcement learning suitable for a slice resource allocation scene of a cognitive network in a hybrid access mode, solving a complex optimization problem by combining a graph convolution neural network and a traditional DQN algorithm and improving the spectrum use efficiency.

Fig. 1 is a flowchart illustrating a method for allocating slice resources of a hybrid access cognitive radio network according to an embodiment of the present invention, and referring to fig. 1, the method for allocating slice resources of a network according to the embodiment may include the following steps:

step S110: constructing an intelligent agent for each cognitive user in a hybrid access cognitive wireless network, wherein the state of the intelligent agent corresponds to information which needs to be interacted and information which does not need to be interacted between the current cognitive user and other cognitive users in the hybrid access cognitive wireless network, and the action of the intelligent agent corresponds to the transmission power of the current cognitive user and a channel occupied by the current cognitive user;

step S120: determining the state of an agent, inputting the determined state of the agent into a neural network model based on graph convolution under a hybrid access cognitive wireless network scene, calculating an attention head for each neighbor cognitive user in the state of the agent, and outputting action related information of the agent through a convolution layer in the neural network model after all the attention heads are connected;

step S130: obtaining a predicted transmission power and a predicted occupied channel of the cognitive user of the agent according to the action-related information of the agent, and determining, based on the predicted transmission power and the predicted occupied channel of the cognitive user, whether the current action of the agent meets the communication delay requirement when the cognitive user is an ultra-reliable low-latency communication (URLLC) slice user, whether it meets the transmission rate requirement when the cognitive user is an enhanced mobile broadband (eMBB) slice user, and, according to the value of an interference temperature threshold constraint function, whether it meets the interference temperature threshold requirement;

step S140: if the communication delay requirement, the transmission rate requirement and the interference temperature threshold requirement are not met, calculating a value of a reward function, and rewarding or penalizing the current action of the agent with that value so as to train the neural network model and the agent; the reward function is determined based on the interference temperature threshold constraint function and a cognitive user energy efficiency function, and the interference temperature threshold constraint function is determined based on the ratio of interference power to channel bandwidth and is a function of transmission power and channel;

step S150: and distributing network slice resources for cognitive users in the hybrid access cognitive wireless network by using the trained neural network model and the intelligent agent.
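The feasibility check of step S130 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the interference temperature is taken as interference power divided by the product of the Boltzmann constant and the channel bandwidth (a common definition that is consistent with, but not spelled out by, the description above), and the names `Requirements`, `interference_temperature` and `action_is_feasible` are hypothetical. How the rate, delay and interference power are derived from the predicted power and channel is not shown.

```python
from dataclasses import dataclass

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


@dataclass
class Requirements:
    """Hypothetical container for the three thresholds checked in step S130."""
    it_max_kelvin: float   # interference temperature threshold
    rate_min_bps: float    # minimum rate for eMBB slice users
    delay_max_s: float     # maximum delay for URLLC slice users


def interference_temperature(interference_power_w: float, bandwidth_hz: float) -> float:
    # Assumed definition: interference power / (Boltzmann constant * bandwidth).
    return interference_power_w / (BOLTZMANN * bandwidth_hz)


def action_is_feasible(slice_type: str, rate_bps: float, delay_s: float,
                       interference_power_w: float, bandwidth_hz: float,
                       req: Requirements) -> bool:
    """Return True only if the slice-specific requirement and the
    interference temperature requirement are both met."""
    it_ok = interference_temperature(interference_power_w, bandwidth_hz) <= req.it_max_kelvin
    if slice_type == "eMBB":
        qos_ok = rate_bps >= req.rate_min_bps
    elif slice_type == "URLLC":
        qos_ok = delay_s <= req.delay_max_s
    else:
        raise ValueError(f"unknown slice type: {slice_type}")
    return it_ok and qos_ok
```

An action that fails this check would be penalized through the reward function in step S140; the exact penalty depends on the reward formula of the embodiment and is not reproduced here.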

In step S110, the information that the current cognitive user needs to exchange with other cognitive users in the hybrid access cognitive wireless network may include: the binary association coefficient between the cognitive user and the cognitive base station, the binary association coefficient between the cognitive user and the channel, the transmission power of the cognitive user, and a binary coefficient indicating whether the cognitive user meets the communication requirement. The information that the current cognitive user does not need to exchange with other cognitive users in the hybrid access cognitive wireless network may include: the signal-to-interference-plus-noise ratio (SINR) of the cognitive user and the binary association coefficient between the primary user and the channel. A binary association coefficient may take the value 0 or 1, one value indicating association and the other indicating no association. The transmission power of a cognitive user may differ or be the same depending on the channel it selects and the cognitive base station it connects to.

For example, the state of agent i at time t

is expressed as

wherein

represents the state information that the agent needs to exchange, and

can be specifically expressed as

whose symbols represent, in order, the transmission power of the cognitive user, the binary association coefficient between the cognitive user and the cognitive base station, the binary association coefficient between the cognitive user and the channel, and the binary coefficient indicating whether the cognitive user meets the communication requirement; among these parameters, the power, the association coefficients and whether the service requirement is satisfied are the state information that needs to be exchanged.

represents the state information that the agent does not need to exchange, and

can be specifically expressed as

wherein $\{\gamma_m\}_{1 \times m}$ represents the SINR matrix of the cognitive users, of size 1 × m,

represents the binary association coefficient matrix of the primary users and the channels, of size k × m. The SINR and the primary users' channel occupancy may be state information that does not need to be exchanged.
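To make the split between exchanged and non-exchanged state information concrete, the following sketch groups the listed quantities into a single structure. The class name `AgentState`, its field names and the flattening order are assumptions made for illustration; the description above fixes only which quantities belong to each part.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentState:
    """Hypothetical per-agent state, split as described for step S110."""
    # Part exchanged with neighbor cognitive users
    tx_power: float                  # transmission power of the cognitive user
    bs_association: List[int]        # 0/1 per cognitive base station
    channel_association: List[int]   # 0/1 per channel
    qos_satisfied: int               # 1 if the communication requirement is met
    # Part kept local (not exchanged)
    sinr: List[float]                        # SINR values of cognitive users
    pu_channel_occupancy: List[List[int]]    # 0/1 matrix: primary users vs. channels

    def shared_part(self) -> List[float]:
        return [self.tx_power,
                *map(float, self.bs_association),
                *map(float, self.channel_association),
                float(self.qos_satisfied)]

    def local_part(self) -> List[float]:
        flat = [float(x) for row in self.pu_channel_occupancy for x in row]
        return [*self.sinr, *flat]

    def feature_vector(self) -> List[float]:
        # Input features for the neural network: exchanged part, then local part.
        return self.shared_part() + self.local_part()
```

Only `shared_part()` would be sent to neighbor agents in such a design, while `feature_vector()` is what the agent's own network would consume.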

In step S120, the discretized transmission power may be used for calculation in the neural network model. This way, the discretized power prediction result can be output. For example, if the power is divided into a plurality of levels, the predicted output transmission power may be represented by the corresponding power level. The corresponding value of the transmission power can be obtained according to the power class.
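A minimal sketch of such a discretization is given below. Uniformly spaced levels up to a maximum power and a joint (power level, channel) action index are assumptions for illustration; the embodiment states only that power is divided into levels and that a level maps back to a power value.

```python
from typing import List, Tuple


def power_levels(p_max_w: float, num_levels: int) -> List[float]:
    """Uniformly spaced power values; level 0 is the lowest, non-zero level."""
    return [p_max_w * (i + 1) / num_levels for i in range(num_levels)]


def level_to_power(level: int, p_max_w: float, num_levels: int) -> float:
    """Map a predicted power level back to a transmission power in watts."""
    if not 0 <= level < num_levels:
        raise ValueError("power level out of range")
    return power_levels(p_max_w, num_levels)[level]


def decode_action(action_index: int, num_levels: int, num_channels: int) -> Tuple[int, int]:
    """Split a joint discrete action index into (power level, channel)."""
    if not 0 <= action_index < num_levels * num_channels:
        raise ValueError("action index out of range")
    return divmod(action_index, num_channels)
```

With 4 levels and 3 channels, for instance, there are 12 discrete actions, and index 5 decodes to power level 1 on channel 2.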

Step S120 may specifically include step S121: determining the state of the agent, inputting the determined state of the agent into the graph-convolution-based neural network model for the hybrid access cognitive wireless network scenario, calculating an attention head for each neighbor cognitive user in the state of the agent, and, after all the attention heads are concatenated, outputting the action-related information of the agent after passing sequentially through a nonlinear activation function and a convolution layer in the neural network model.

In this embodiment, the nonlinear activation function may be, for example, an MLP layer or a ReLU layer. The neural network model may include an input layer (for inputting the feature values, i.e., the state of the agent), a multilayer perceptron layer (MLP layer) or ReLU nonlinear activation function (ReLU layer), convolutional layers, an output layer, and the like. The output action-related information of the agent may include a power level, a channel, and so on. The corresponding power is obtained from the power level, and the various parameters of the hybrid access cognitive wireless network scenario can then be calculated from the power and the channel so that reasonable constraints can be applied.

In step S120, for each agent, only the cognitive users that can interact with its cognitive user (the neighbor cognitive users) affect its resource allocation, so attention heads are calculated only for the neighbors of the agent's cognitive user. In a specific implementation, the attention head calculated for each neighbor cognitive user in the state of the agent may be represented as:

$$\alpha_{ij}^{m} = \frac{\exp\left(\tau \cdot W^{m} h_i \cdot (W^{m} h_j)^{T}\right)}{\sum_{k \in X_{+i}} \exp\left(\tau \cdot W^{m} h_i \cdot (W^{m} h_k)^{T}\right)}$$

wherein $\alpha_{ij}^{m}$ represents attention head m of neighbor cognitive user j of agent i, $W^{m}$ represents the weight matrix of attention head m, $h_i$ represents the output of the neuron corresponding to the cognitive user of agent i, $h_j$ represents the output of the neuron corresponding to neighbor cognitive user j, $X_{+i}$ represents the set formed by the cognitive user of agent i and its neighbor cognitive users, $k \in X_{+i}$ represents cognitive user k in the set $X_{+i}$, $h_k$ represents the output of the neuron corresponding to neighbor cognitive user k, $\tau$ represents the scaling factor, $(W^{m} h_k)^{T}$ represents the transpose of $W^{m} h_k$, and $(W^{m} h_j)^{T}$ represents the transpose of $W^{m} h_j$;

the output of the convolutional layers in the neural network model can be expressed as:

$$h_i' = \sigma\left(\mathrm{concatenate}_{m \in M}\left[\sum_{j \in X_{+i}} \alpha_{ij}^{m} W^{m} h_j\right]\right)$$

wherein $h_i'$ represents the output of the convolutional layer, $\sigma$ represents the nonlinear activation function, concatenate[·] represents the concatenation operation, $X_{+i}$ represents the set formed by the cognitive user of agent i and its neighbors, $W^{m}$ represents the weight matrix, and $m \in M$ represents attention head m among the attention heads of all neighbor cognitive users.

In these embodiments, a multi-headed attention mechanism may be implemented by the graph convolution layer.
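As an illustration of the attention computation described above, the following is a minimal sketch in plain Python. It assumes a single shared weight matrix per head, uses ReLU as the nonlinear activation, and works on a toy set of three agents; the function names (`matvec`, `attention_weights`, `graph_conv_output`) and all numeric values are hypothetical and are not taken from the embodiment.

```python
import math

def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def attention_weights(W, h, i, group, tau=1.0):
    """Softmax attention of agent i over `group` (agent i plus its neighbors) for one head."""
    q = matvec(W, h[i])
    scores = [tau * sum(a * b for a, b in zip(q, matvec(W, h[j]))) for j in group]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def graph_conv_output(heads, h, i, group, tau=1.0):
    """Concatenate the per-head weighted sums, then apply ReLU as the nonlinearity."""
    out = []
    for W in heads:
        alpha = attention_weights(W, h, i, group, tau)
        projected = [matvec(W, h[j]) for j in group]
        dim = len(projected[0])
        out.extend(sum(a * p[d] for a, p in zip(alpha, projected)) for d in range(dim))
    return [max(0.0, v) for v in out]

# Toy example: 3 agents with 2-dimensional features and 2 attention heads.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
heads = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, -0.5]]]
group = [0, 1, 2]  # agent 0 together with its neighbors 1 and 2
alpha = attention_weights(heads[0], h, 0, group)
h_out = graph_conv_output(heads, h, 0, group)
```

The attention weights of each head sum to one over the agent and its neighbors, and the output length equals the number of heads times the per-head dimension.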

In step S140, rewarding or penalizing the current action of the agent with the value of the reward function so as to train the neural network model and the agent may specifically include: calculating the value of a loss function from the value of the reward function, and back-propagating the value of the loss function to the neural network model so as to reward or penalize the current action of the agent and train the neural network model and the agent. The loss function may include a KL divergence regularization term on the attention weight distribution, which measures the difference between the attention weight distribution corresponding to the connection result of all current attention heads (the distribution of the weights used when the attention heads are connected) and a target attention weight distribution. The weight distribution of the attention heads can thus be optimized through the KL divergence regularization term in the loss function. The loss function may also contain conventional terms for optimizing the neural network; that is, the loss function of this embodiment can be obtained by adding the regularization term to a conventional loss function. In addition, when the communication delay requirement, the transmission rate requirement and the interference temperature threshold requirement are all met, the reward function can be calculated and the neural network updated. Adding this temporal regularization helps stabilize the cooperative and competitive relationships among the agents.

In a specific implementation, the loss function may be expressed as:

$$L(\theta) = \frac{1}{BS} \sum_{b=1}^{BS} \left[ \left( r_b + \gamma \max_{a'} Q(s_b', a'; \theta) - Q(s_b, a_b; \theta) \right)^2 + \lambda \frac{1}{M} \sum_{m=1}^{M} D_{KL}\left( G_m^k(s_b; \theta) \,\|\, G_m^k(s_b'; \theta) \right) \right]$$

where $L(\theta)$ denotes the value of the loss function; $\theta$ denotes the parameters of the neural network; $BS$ denotes the mini-batch size; $r_b$ denotes the value of the reward function for the $b$-th sample of the mini-batch; $\gamma$ denotes the discount factor of the temporal-difference target (not to be confused with the signal-to-interference-plus-noise ratio $\gamma_n$); $Q(s_b', a'; \theta)$ denotes the Q value for the next state $s_b'$, the next action $a'$ and the neural network parameters $\theta$; $Q(s, a; \theta)$ denotes the Q value for action $a$, state $s$ and the neural network parameters $\theta$; $\lambda$ denotes the coefficient of the regularization loss; $M$ denotes the number of attention heads; $D_{KL}(\cdot \,\|\, \cdot)$ denotes the KL divergence between the weight distribution of the current state, $G_m^k(s_b; \theta)$, and the weight distribution of the next state, $G_m^k(s_b'; \theta)$; and $G_m^k$ denotes the attention weight distribution of the agent at attention head $m$ of convolutional layer $k$.

In this embodiment, by adding a time regularization term to the loss function, the cooperative competition relationship between the agents can be stabilized.
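The structure of such a loss, a conventional temporal-difference term plus a KL divergence regularizer averaged over the attention heads, can be sketched as follows. This is only an illustration under stated assumptions: the discount value, the sample layout (the dictionary keys `reward`, `q_sa`, `max_q_next`, `attn`, `attn_next`) and the function names are hypothetical, and the Q values are supplied as plain numbers rather than produced by a network.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D_KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def regularized_loss(batch, discount, lam):
    """Mean over the mini-batch of squared TD error plus lam * mean per-head KL term."""
    total = 0.0
    for sample in batch:
        target = sample["reward"] + discount * sample["max_q_next"]
        td_error = (target - sample["q_sa"]) ** 2
        kl_sum = sum(kl_divergence(p, q)
                     for p, q in zip(sample["attn"], sample["attn_next"]))
        total += td_error + lam * kl_sum / len(sample["attn"])
    return total / len(batch)

# One sample whose attention distribution is unchanged between s and s'.
stable = [{"reward": 1.0, "q_sa": 2.3, "max_q_next": 2.0,
           "attn": [[0.5, 0.5]], "attn_next": [[0.5, 0.5]]}]
# The same sample, but the attention distribution shifts in the next state.
shifted = [{"reward": 1.0, "q_sa": 2.3, "max_q_next": 2.0,
            "attn": [[0.5, 0.5]], "attn_next": [[0.9, 0.1]]}]

loss_stable = regularized_loss(stable, discount=0.9, lam=0.1)
loss_shifted = regularized_loss(shifted, discount=0.9, lam=0.1)
```

With identical attention distributions the regularizer vanishes and only the temporal-difference term remains; a shift in the attention distribution between consecutive states increases the loss, which is what pushes the agents toward stable cooperation.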

In step S130, the cognitive user must communicate on the channel within the interference range acceptable to the primary user, so an interference temperature threshold function is designed to measure the interference caused by the cognitive user to the primary user and to limit its degree. In addition, training the agent is also a process of optimizing its action output so that the output transmission power and channel satisfy certain constraints, including the constraint of the interference temperature threshold function. The actions of the agent may then be rewarded or penalized based on the set constraints. For example, if the interference temperature exceeds a set threshold, the action may be penalized; otherwise it may be rewarded. In addition, cognitive users may seek to maximize energy efficiency, so actions may also be rewarded or penalized with respect to the goal of maximizing energy efficiency.

In a specific implementation, in step S130, the reward function may be represented as:

where $r_i$ denotes the reward value of the current action of agent $i$; $\eta_i$ denotes the energy efficiency of the cognitive user of agent $i$; $R_i$ denotes the transmission rate; $S$ denotes the Sigmoid function; $IT^{max}$ denotes the interference temperature threshold; $k_{cons}$ is the Boltzmann constant; $B$ denotes the channel bandwidth; $k \in C$ denotes channel $k$ in the channel set $C$; $n \in CU$ denotes cognitive user $n$ in the cognitive user set $CU$; and $a \in CBS$ denotes cognitive base station $a$ in the cognitive base station set $CBS$. The expression further involves the interference temperature threshold constraint function, the association coefficient between the cognitive user of agent $i$ and cognitive base station $a$, and the gain of the cognitive user of agent $i$ on channel $k$.

In a specific embodiment, the signal to interference plus noise ratio may be expressed as:

where $\gamma_n$ denotes the signal-to-interference-plus-noise ratio of cognitive user $n$; $a \in CBS$ denotes cognitive base station $a$ in the cognitive base station set $CBS$; $k \in C$ denotes channel $k$ in the channel set $C$; and $\sigma$ denotes Gaussian white noise. The expression further involves the transmission power of primary user $m$ when it selects channel $k$.

The interference temperature threshold constraint function may be expressed as:

where $n \in CU$ denotes cognitive user $n$ in the cognitive user set $CU$, and $a \in CBS$ denotes cognitive base station $a$ in the cognitive base station set $CBS$. The expression further involves the channel gain between cognitive user $n$ and cognitive base station $a$.

The transmission rate requirement may be expressed as:

$$R_n \ge R_{min}, \quad n \in CU^{eMBB}$$

where $R_n$ denotes the transmission rate of cognitive user $n$, $R_{min}$ denotes the minimum transmission rate threshold, and $n \in CU^{eMBB}$ denotes cognitive user $n$ in the set $CU^{eMBB}$ of cognitive users that are enhanced mobile broadband slice users.

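The communication delay requirement may be expressed as:

$$e^{-(R_n - d) D_{max}} \le \xi, \quad n \in CU^{URLLC}$$

where $d$ denotes the arrival rate, $D_{max}$ denotes the maximum transmission delay threshold, and $\xi$ is a very small number.

In the above embodiment, an M/M/1 queuing model from queuing theory is used to calculate the user's delay and thereby add the transmission delay constraint. The communication delay is constrained through the probability formula

$$P(D_n > D_{max}) \le \xi$$

where $D_n$ denotes the transmission delay of cognitive user $n$, which yields the above expression of the communication delay requirement.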

Further, in step S150, after the agent and the neural network are trained, parameters in the neural network, such as the number of cognitive users, may be set according to an actual scene when the agent and the neural network are applied to network slice resource allocation, so as to provide an optimal network resource allocation result by using the agent.

In the embodiment of the invention, the cognitive radio network scenario comprises a primary base station and a plurality of secondary base stations; primary users are connected to the primary base station, and secondary users are connected to the secondary base stations. According to their communication services, primary users and secondary users are divided into enhanced mobile broadband (eMBB) slice users, which have a minimum communication rate requirement, and ultra-reliable low-latency communication (URLLC) slice users, which have a maximum communication delay requirement. While normal communication of the primary users is guaranteed, a secondary user selects the overlay or the underlay access mode according to the state of the accessed channel in order to share the spectrum with the primary users. The spectrum resources are divided into a set number of channels. It may be provided that, when no primary user is present on the channel accessed by a cognitive user, the cognitive user accesses the spectrum in overlay mode; when the spectrum is accessed in underlay mode, the cognitive user must satisfy the interference temperature constraint, so that it communicates on the channel within the interference range acceptable to the primary user. In a hybrid access cognitive wireless network, the parameters related to a cognitive user include: transmission power, channel gain with the cognitive base station, channel gain with the primary base station, association coefficient with the cognitive base station, and association coefficient with the channel. A cognitive user may occupy only one channel.
The parameters related to a primary user include: transmission power, channel gain with the cognitive base station, channel gain with the primary base station, and association coefficient with the channel. A primary user may occupy two channels (chosen at random and held for a certain time). The signal-to-interference-plus-noise ratio of a cognitive user can be calculated from these parameters. The transmission rate of a cognitive user can be calculated from its transmission power according to the Shannon channel formula. For an eMBB slice user, a minimum transmission rate threshold can be set, and for a URLLC slice user, a maximum transmission delay threshold can be set. An M/M/1 queuing model can be used to calculate the transmission delay of a user, and the communication requirement of a URLLC slice user can be expressed as a probability. The interference temperature is defined on the basis of the ratio of the interference power to the channel bandwidth, and a maximum interference temperature threshold is set accordingly for cognitive users in the underlay access mode. The various associations may be represented by binary association constraints. A maximum transmission power limit may be set for cognitive users. The optimization of network slice resource allocation can thus be formulated as an optimization under several constraints, e.g., the service requirement constraints of eMBB slice users and URLLC slice users and the interference temperature constraint. Finally, the agent can be trained with a graph convolution reinforcement learning algorithm, in which the neural network takes a multi-head attention mechanism and temporal relation regularization into account.
During model training and application, the input features can be weighted and summed, the attention heads of the agent are then connected, and the convolutional layer output is obtained through a multilayer perceptron or a nonlinear activation function; in the convolutional layer, the state of the agent is divided into state information that requires interaction and state information that does not require interaction. The output may include the power and the selected channel of the cognitive user; from the power and the channel it can be judged whether the constraints, such as the interference temperature threshold, are satisfied; the action may be penalized if they are not satisfied and rewarded if they are. Other parameters concerning the primary users may be preset.

In addition, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method according to any of the above embodiments.

Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method according to any of the above embodiments.

The above method is described below with reference to a specific example, however, it should be noted that the specific example is only for better describing the present application and is not to be construed as limiting the present application.

The whole process is divided into two parts: (I) establishing a cognitive radio network slice resource allocation model in a hybrid access mode; and (II) mapping the cognitive network slice resource allocation problem onto a reinforcement learning model, and implementing a cognitive network slice resource allocation algorithm based on multi-agent graph convolution reinforcement learning. The two parts are described separately below.

(I) Establishing a cognitive radio network slice resource allocation model in a hybrid access mode

In order to further improve the energy efficiency of the cognitive network, a hybrid spectrum access mechanism is introduced in this embodiment, so that a secondary user may select the overlay or the underlay access mode according to the state of the accessed channel. Fig. 2 is a schematic structural diagram of a hybrid access mode cognitive wireless network slice resource allocation model according to an embodiment of the present invention. As shown in Fig. 2, the cognitive radio network scenario includes a primary base station and a plurality of secondary base stations. Primary users are connected to the primary base station, and secondary users are connected to the secondary base stations. According to their communication services, primary users and secondary users can be divided into eMBB (enhanced mobile broadband) slice users and URLLC (ultra-reliable low-latency communication) slice users. An eMBB slice user has a minimum communication rate requirement, and a URLLC slice user has a maximum communication delay requirement. While normal communication of the primary users is guaranteed, a secondary user selects the overlay or the underlay access mode according to the state of the accessed channel to share the spectrum with the primary users, so that the spectrum resources are used more efficiently.

Referring again to Fig. 2, the cognitive radio network scenario contains one primary base station (PBS) and A (e.g., three) cognitive base stations (CBS). The PBS and the CBSs share the same spectrum resources. M (e.g., three) primary users (PU) are connected to the PBS, and N cognitive users (CU) are connected to the CBSs; for example, each CBS is connected to three cognitive users. The spectrum resources are divided into K channels, indexed k = 1, 2, ..., K. The cognitive users share the spectrum resources with the primary users in a hybrid access mode: when a primary user is present on the channel accessed by a cognitive user, the cognitive user accesses the spectrum in underlay mode; when no primary user is present on that channel, the cognitive user accesses the spectrum in overlay mode. When accessing the spectrum in underlay mode, the cognitive user must communicate on the channel within the interference range acceptable to the primary user. This embodiment introduces the concept of interference temperature to quantify the interference caused by a cognitive user to a primary user; the cognitive user may communicate on the channel only if the interference temperature constraint is satisfied.
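The access rule just described can be written as a very small sketch. The channel numbering and the set of occupied channels below are hypothetical example values, and `access_mode` is an illustrative helper name, not part of the embodiment.

```python
def access_mode(channel, pu_channels):
    """Return 'underlay' if a primary user occupies the channel, otherwise 'overlay'."""
    return "underlay" if channel in pu_channels else "overlay"

# Hypothetical example: 6 channels, primary users currently occupy channels 2 and 5.
pu_channels = {2, 5}
modes = {k: access_mode(k, pu_channels) for k in range(1, 7)}
```

Only the cognitive users on channels 2 and 5 would then be subject to the interference temperature constraint in this example.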

The cognitive base station set is denoted as CBS = {1, 2, ..., A}. The cognitive users associated with the eMBB slice are denoted as CU^eMBB = {1, 2, ..., N₁}, and the cognitive users associated with the URLLC slice are denoted as CU^URLLC = {1, 2, ..., N₂}, with N = N₁ + N₂; the set of cognitive users is denoted as CU = {1, 2, ..., N}. The primary users associated with the eMBB slice are denoted as PU^eMBB = {1, 2, ..., M₁}, and the primary users associated with the URLLC slice are denoted as PU^URLLC = {1, 2, ..., M₂}, with M = M₁ + M₂; the set of primary users is denoted as PU = {1, 2, ..., M}. The channel set is denoted as C = {1, 2, ..., K}, the bandwidth of each channel is B, and the total bandwidth is W = K × B.

For cognitive user $n$, the model defines its transmission power, its channel gain with cognitive base station $a$, and its channel gain $g_n$ with the primary base station. The association coefficient between a cognitive base station and a cognitive user is binary: one value indicates that cognitive user $n$ is associated with cognitive base station $a$, and the other value indicates no association. Likewise, the binary association coefficient between a channel and a cognitive user indicates whether or not cognitive user $n$ is associated with channel $k$.

For primary user $m$, the model defines its transmission power, its channel gain with cognitive base station $a$, and its channel gain $g_m$ with the primary base station. The binary association coefficient between a channel and a primary user indicates whether or not primary user $m$ is associated with channel $k$.

In addition, a primary user can occupy two channels, whereas a cognitive user can occupy only one channel. The communication behavior of a primary user is simplified as randomly occupying two channels for a certain period of time.

The signal-to-interference-plus-noise ratio (SINR) of cognitive user $n$ can be calculated from the definition of the SINR, as shown in equation (1):

where $\gamma_n$ denotes the SINR of cognitive user $n$; $a \in CBS$ denotes cognitive base station $a$ in the cognitive base station set $CBS$; $k \in C$ denotes channel $k$ in the channel set $C$; $n' \in SU$ denotes secondary user $n'$ in the secondary user set $SU$; $m \in PU$ denotes primary user $m$ in the primary user set $PU$; $g_n$ denotes the channel gain between cognitive user $n$ and the primary base station; and $\sigma$ denotes Gaussian white noise. Equation (1) further involves the association coefficient between the cognitive base station and the cognitive user, the association coefficient between the channel and the cognitive user, the channel gain between the cognitive user and the cognitive base station, the transmission power of the cognitive user, the transmission power of cognitive user $n'$ toward cognitive base station $a$ when channel $k$ is selected, the association coefficient between the channel and the primary user, and the transmission power of the primary user.

According to the Shannon channel formula $R = B \cdot \log_2(1 + \gamma)$, the transmission rate $R_n$ of cognitive user $n$ is obtained, and from it the energy efficiency of the cognitive user, as shown in equation (2):

where $\eta_n$ denotes the energy efficiency of cognitive user $n$, and $R_n = B \cdot \log_2(1 + \gamma_n)$ denotes the transmission rate of cognitive user $n$, in which $B$ denotes the channel bandwidth and $\gamma_n$ denotes the SINR of cognitive user $n$.
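A small numerical sketch of the rate and energy efficiency calculation follows. The Shannon formula is taken from the text; defining energy efficiency as transmission rate divided by transmission power (bits per joule) is an assumption made here for illustration, since the exact form of equation (2) is not reproduced above. The bandwidth, SINR and power values are hypothetical.

```python
import math

def transmission_rate(bandwidth_hz, sinr):
    """Shannon rate R = B * log2(1 + gamma), in bit/s."""
    return bandwidth_hz * math.log2(1.0 + sinr)

def energy_efficiency(rate_bps, power_w):
    """Assumed definition: rate divided by transmission power, in bit/J."""
    return rate_bps / power_w

rate = transmission_rate(1e6, 15.0)  # 1 MHz channel, linear SINR of 15
ee = energy_efficiency(rate, 0.2)    # 0.2 W transmission power
```

Under these example values the rate is 4 Mbit/s, because log2(1 + 15) = 4.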

For rate sensitive users (eMBB slice users), a minimum transmission rate threshold R is set_min. For time delay sensitive users (URLLC slice users), setting a maximum transmission time delay threshold value D_max。

To calculate the transmission delay of a user, an M/M/1 queuing model is used. Assuming that arrivals follow a Poisson distribution with parameter $d$, the transmission delay of the user follows an exponential distribution with parameter $R_n - d$. The communication requirement of a URLLC slice user is expressed as a probability,

$$P(D_n > D_{max}) \le \xi$$

where $\xi$ is a very small number. Therefore, the communication requirement of a URLLC slice user is

$$e^{-(R_n - d) D_{max}} \le \xi$$

where $D_n$ denotes the transmission delay of cognitive user $n$ and $P$ denotes probability.
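The delay requirement can be checked numerically. For an exponentially distributed delay with parameter R_n - d, the probability of exceeding D_max is exp(-(R_n - d) * D_max); the sketch below uses that standard property of the exponential distribution. Treating the rate and the arrival rate in the same unit (packets per second) and the example numbers are assumptions for illustration, as are the function names.

```python
import math

def delay_violation_probability(rate, arrival_rate, d_max):
    """P(D > d_max) for an exponential delay with parameter (rate - arrival_rate)."""
    if rate <= arrival_rate:
        return 1.0  # unstable queue: treat the requirement as always violated
    return math.exp(-(rate - arrival_rate) * d_max)

def min_rate_for_delay(arrival_rate, d_max, xi):
    """Smallest rate for which P(D > d_max) <= xi."""
    return arrival_rate - math.log(xi) / d_max

# Hypothetical URLLC user: 100 packets/s arrive, 10 ms delay bound, xi = 1e-3.
needed = min_rate_for_delay(100.0, 0.01, 1e-3)
p_at_needed = delay_violation_probability(needed, 100.0, 0.01)
p_too_slow = delay_violation_probability(500.0, 100.0, 0.01)
```

The second helper simply inverts the exponential tail, which shows that the probabilistic delay requirement is equivalent to a lower bound on the service rate.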

The interference temperature is defined as the ratio of the interference power to the channel bandwidth, and is written as

$$IT = \frac{P_i}{k_{cons} \cdot BW}$$

where $k_{cons}$ is the Boltzmann constant, $P_i$ is the interference power, and $BW$ is the channel bandwidth. A maximum interference temperature threshold $IT_{max}$ is set for cognitive users that adopt the underlay access mode.
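The definition above translates directly into a short check. The interference power, bandwidth and threshold below are hypothetical example values, and the helper names are illustrative.

```python
BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K

def interference_temperature(interference_power_w, bandwidth_hz):
    """Interference temperature in kelvin: P_i / (k_cons * BW)."""
    return interference_power_w / (BOLTZMANN * bandwidth_hz)

def satisfies_it_constraint(interference_power_w, bandwidth_hz, it_max):
    """True when the interference temperature does not exceed the threshold."""
    return interference_temperature(interference_power_w, bandwidth_hz) <= it_max

it_value = interference_temperature(1.380649e-17, 1e6)  # about 1 K
ok = satisfies_it_constraint(1.380649e-17, 1e6, it_max=2.0)
too_hot = satisfies_it_constraint(1.380649e-16, 1e6, it_max=2.0)
```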

The slice resource allocation optimization problem of the hybrid access cognitive network can be expressed as shown in equation (3):

where $\eta_n$ denotes the energy efficiency of cognitive user $n$ and $R_n$ denotes its transmission rate; $R_{min}$ denotes the minimum transmission rate; $n \in CU^{eMBB}$ indicates that cognitive user $n$ is an eMBB slice user (rate-sensitive), and $n \in CU^{URLLC}$ indicates that cognitive user $n$ is a URLLC slice user (delay-sensitive); $d$ denotes the parameter of the Poisson distribution followed by the arrival rate, so that the transmission delay of the cognitive user follows an exponential distribution with parameter $R_n - d$; $D_{max}$ denotes the maximum transmission delay, and $\xi$ denotes a very small number; $B$ denotes the bandwidth of each channel, $k_{cons}$ is the Boltzmann constant, and $IT_{max}$ denotes the maximum interference temperature; $k \in C$ denotes channel $k$ in the channel set $C$; $m \in PU$ denotes primary user $m$ in the primary user set $PU$; and $a \in CBS$ denotes cognitive base station $a$ in the cognitive base station set $CBS$. Equation (3) further involves the transmission power of the cognitive user, the interference power on each channel, the maximum transmission power of cognitive user $n$, the transmission power of cognitive user $n$ toward cognitive base station $a$ on channel $k$, the association coefficient between channel $k$ and primary user $m$, and the association coefficient between cognitive base station $a$ and cognitive user $n$.

Constraints (C1) to (C3) are the binary association coefficient constraints. Constraint (C4) is the maximum transmission power limit for cognitive users. Constraints (C5) and (C6) are the service requirement constraints of eMBB slice users and URLLC slice users, respectively. Constraint (C7) is the interference temperature constraint, i.e., the tolerance of the primary user to the interference generated by the cognitive users.

(II) mapping the cognitive network slice resource allocation problem to reinforcement learning and establishing a model

The graph convolution reinforcement learning algorithm relies on two key technologies: a multi-head attention mechanism and temporal relation regularization. The convolution kernel adopts the multi-head attention mechanism to capture high-order effective information, so that the interactions between agents are learned well and training is more stable. For agent $i$, its set of neighbors is denoted $X_i$. Attention head $m$ can be expressed as shown in equation (4):

$$\alpha_{ij}^{m} = \frac{\exp\left(\tau \cdot W^{m} h_i \cdot (W^{m} h_j)^{T}\right)}{\sum_{k \in X_{+i}} \exp\left(\tau \cdot W^{m} h_i \cdot (W^{m} h_k)^{T}\right)} \tag{4}$$

where $\alpha_{ij}^{m}$ denotes attention head $m$ of agent $i$ for neighbor cognitive user $j$, $W^{m}$ denotes the weight matrix, $h_i$ denotes the output of the neuron, $X_{+i}$ denotes the set formed by cognitive user $i$ and its neighbors, $\tau$ denotes the scaling factor, and $(W^{m} h_j)^{T}$ denotes the transpose of $W^{m} h_j$. Performing a weighted sum over the input feature values, concatenating the $M$ attention heads of agent $i$, and then passing the result through an MLP (multilayer perceptron) layer or a nonlinear ReLU (rectified linear unit) layer gives the output of the final convolutional layer, as shown in equation (5):

$$h_i' = \sigma\left(\mathrm{concatenate}\left[\sum_{j \in X_{+i}} \alpha_{ij}^{m} W^{m} h_j,\ \forall m \in M\right]\right) \tag{5}$$

where $h_i'$ denotes the output of the convolutional layer, $\sigma$ denotes the nonlinear activation function, $\mathrm{concatenate}[\cdot]$ denotes the concatenation operation, $X_{+i}$ denotes the set formed by cognitive user $i$ and its neighbors, $W^{m}$ denotes the weight matrix, and $m \in M$ denotes attention head $m$ among the attention heads of all neighbor cognitive users.

In addition, because information must be exchanged among different agents, the state of each agent is divided into two categories, state information that requires interaction and state information that does not require interaction, in order to reduce the transmission of unnecessary information and increase the proportion of useful information exchanged. The neural network structure of the graph convolution reinforcement learning algorithm GQN is shown in Fig. 3, in which part of the state of Agent 1, Agent 2, ..., Agent N requires interaction and part does not.

The second key technology is temporal relation regularization, which promotes stable cooperation among the agents over a period of time; in practical application scenarios, stable long-term cooperation among agents often yields the greatest benefit. Therefore, the KL divergence is used to measure the difference between the current attention weight distribution and the target attention weight distribution, which strengthens stable cooperation among the agents. The KL divergence is added to the loss function as a regularization term, as shown in equation (6):

$$L(\theta) = \frac{1}{BS} \sum_{b=1}^{BS} \left[ \left( r_b + \gamma \max_{a'} Q(s_b', a'; \theta) - Q(s_b, a_b; \theta) \right)^2 + \lambda \frac{1}{M} \sum_{m=1}^{M} D_{KL}\left( G_m^k(s_b; \theta) \,\|\, G_m^k(s_b'; \theta) \right) \right] \tag{6}$$

where $L(\theta)$ denotes the value of the loss function; $\theta$ denotes the parameters of the neural network; $BS$ denotes the mini-batch size in the reinforcement learning algorithm; $r_b$ denotes the value of the reward function for the $b$-th sample of the mini-batch; $\gamma$ denotes the discount factor of the temporal-difference target; $Q(s, a; \theta)$ denotes the Q value for action $a$, state $s$ and the neural network parameters $\theta$; $s$ denotes the state, $s'$ the next state, and $a$ the action; $\lambda$ denotes the coefficient of the regularization loss; $M$ denotes the number of attention heads; $D_{KL}(\cdot \,\|\, \cdot)$ denotes the KL divergence between the weight distribution of the current state and the weight distribution of the next state; and $G_m^k$ denotes the attention weight distribution of agent $i$ at attention head $m$ of convolutional layer $k$.

The smaller the KL Divergence (Kullback-Leibler Divergence), the more the agents can achieve consistent cooperation for a long time, thereby being greatly helpful for capturing the cooperative relationship characteristics among the agents.

In a hybrid access cognitive network slicing scene, the resource allocation problem of all cognitive users is a complex non-convex optimization problem in theory. To solve this problem, the present embodiment proposes a multi-agent reinforcement learning method (CRNGQN algorithm) for hybrid access cognitive radio network slicing scenario. The multi-agent reinforcement learning algorithm is based on GQN algorithm, and adopts a graph structure to represent the cooperation relationship between the agents. In the CRNGQN algorithm, reinforcement learning basic elements are set as follows.

It is assumed that the states of all agents do not change within time step $t$. First, the state of agent $i$ is designed to consist of two parts: state information that the agent needs to exchange with other agents, and state information that the agent does not need to exchange. The state includes a binary coefficient indicating whether the cognitive user meets its communication requirement, which takes one value when the requirement is met and the other value otherwise. The quantities appearing in the state are the transmission power of agent (cognitive user) $i$, the association coefficient between agent (cognitive user) $i$ and the cognitive base station, the association coefficient between agent (cognitive user) $i$ and the channel, the SINR (signal-to-interference-plus-noise ratio) matrix $\{\Gamma_m\}_{1 \times m}$ of the primary users, and the channel occupancy matrix of the primary users.

Second, the action of agent $i$ is designed as the optimization variables of the scenario: the channel association, which is a discrete variable, and the transmission power, which is a continuous variable. As in the conventional DQN algorithm, the power in the CRNGQN algorithm needs to be discretized, and its value range is as shown in equation (7):

where the bound appearing in equation (7) denotes the maximum power occupied on channel $a$.
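One possible way to discretize the power and to combine it with the channel choice into a single discrete action index is sketched below. The evenly spaced power levels, the number of levels, and the index layout are assumptions made for illustration; they are not the value range given by equation (7).

```python
def power_levels(p_max, num_levels):
    """Evenly spaced power levels p_max/L, 2*p_max/L, ..., p_max (assumed spacing)."""
    return [p_max * (level + 1) / num_levels for level in range(num_levels)]

def decode_action(action_index, num_levels, p_max):
    """Map a flat action index to a (channel, power) pair."""
    channel, level = divmod(action_index, num_levels)
    return channel, power_levels(p_max, num_levels)[level]

levels = power_levels(1.0, 4)          # hypothetical: 4 levels up to 1 W
channel, power = decode_action(5, 4, 1.0)
```

With K channels and L power levels, the agent's discrete action space then has K × L entries, which is the form a DQN-style output layer expects.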

Finally, the reward function is designed. The goal of the system is to maximize the energy efficiency $\eta_i$ while satisfying the interference temperature constraint. Therefore, when designing the reward function, actions that violate the constraints must be penalized and actions that increase the objective value must be rewarded. Combining these factors, the reward function is designed as shown in equation (8):

where $r_i$ denotes the reward value of agent $i$, and $\eta_i$ denotes the energy efficiency of agent (cognitive user) $i$. Equation (8) further involves the interference temperature threshold constraint function, the transmission power of agent (cognitive user) $i$, and the association coefficient between agent (cognitive user) $i$ and the channel.

The expression of the interference temperature threshold constraint function is shown in equation (9):

where $S$ denotes the Sigmoid function, $k \in C$ denotes channel $k$ in the channel set $C$, $IT^{max}$ denotes the interference temperature threshold, $n \in CU$ denotes cognitive user $n$ in the cognitive user set $CU$, and $a \in CBS$ denotes cognitive base station $a$ in the cognitive base station set $CBS$.
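To make the reward-and-penalty idea concrete, the following sketch combines an energy efficiency term with a Sigmoid-based soft indicator of the interference temperature constraint. The specific combination (energy efficiency scaled by the Sigmoid of the margin to the threshold, and a fixed penalty when a service requirement is not met) is a hypothetical choice for illustration and is not the form of equations (8) and (9); `sharpness` and `penalty` are likewise assumed parameters.

```python
import math

def sigmoid(x):
    """Numerically stable Sigmoid function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def it_constraint(it_value, it_max, sharpness=1.0):
    """Soft indicator: near 1 when it_value is well below it_max, near 0 when above."""
    return sigmoid(sharpness * (it_max - it_value))

def reward(energy_eff, it_value, it_max, qos_ok, penalty=1.0):
    """Penalize unmet service requirements; otherwise reward constrained energy efficiency."""
    if not qos_ok:
        return -penalty
    return energy_eff * it_constraint(it_value, it_max)

r_boundary = reward(2.0, it_value=1.0, it_max=1.0, qos_ok=True)  # exactly at the threshold
r_violation = reward(2.0, it_value=0.0, it_max=1.0, qos_ok=False)
```

Using a smooth Sigmoid rather than a hard step keeps the reward informative near the threshold, so actions that only slightly exceed the interference limit are penalized less than gross violations.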

The slice resource allocation technique for the hybrid access cognitive wireless network of this embodiment combines deep reinforcement learning to select the optimal strategy efficiently in a complex cognitive network. The main innovations are as follows: (1) For the 5G network slicing scenario, a multi-slice cognitive radio network model based on a hybrid underlay-overlay spectrum access mode is proposed. The model introduces network slicing into a multi-service cognitive radio network scenario: cognitive users access channels without a primary user in the overlay spectrum access mode and channels with a primary user in the underlay spectrum access mode. (2) For the resource allocation problem in cognitive radio networks, a multi-objective adaptive deep reinforcement learning framework is proposed. The framework is applicable to a variety of objective functions, such as network throughput, spectral efficiency, and energy efficiency. (3) A deep reinforcement learning algorithm combined with a graph convolutional neural network is proposed. The algorithm uses GAT to help the DQN agents extract environment information and introduces a temporal relation regularization mechanism to improve the stability of the relationships among agents. The algorithm classifies the state of each agent according to whether it needs to be communicated, which reduces unnecessary exchange of state information and improves both the efficiency of information interaction and the convergence speed of the algorithm.

This embodiment implements a multi-agent reinforcement learning method for the slice resource allocation scenario of a hybrid access mode cognitive network. On the basis of cognitive radio technology, a hybrid spectrum access mechanism is introduced, which can further improve spectrum utilization. For this scenario, the CRNGQN algorithm (CRN standing for cognitive radio network) based on graph convolution reinforcement learning is proposed. The state, action and reward function are designed, a graph structure is established, and the model is composed of three modules, namely an encoding layer, a graph convolution layer and a DQN (deep Q-network) layer, which strengthens cooperative communication among the agents and improves the convergence speed of the model. The influence of the learning rate, the number of graph convolution layers and the number of neighbors of an agent on the experimental results is explored through simulation, which demonstrates the feasibility of the algorithm. Compared with the DGN and DQN algorithms, which do not classify states according to whether interaction is needed, the proposed algorithm achieves higher reward, faster convergence and better stability. Comparing the hybrid spectrum access mode with the overlay and underlay access modes shows that the energy efficiency of the proposed hybrid access mode is better than that of either single access mode.

In the description herein, reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The sequence of steps involved in the various embodiments is provided to schematically illustrate the practice of the invention, and the sequence of steps is not limited and can be suitably adjusted as desired.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A slice resource allocation method for a hybrid access cognitive radio network is characterized by comprising the following steps:

constructing an intelligent agent for each cognitive user in a hybrid access cognitive wireless network, wherein the state of the intelligent agent corresponds to information which needs to be interacted and information which does not need to be interacted between the current cognitive user and other cognitive users in the hybrid access cognitive wireless network, and the action of the intelligent agent corresponds to the transmission power of the current cognitive user and a channel occupied by the current cognitive user;

obtaining a transmission power prediction result and an occupied channel prediction result for the cognitive user of the intelligent agent according to the action related information of the intelligent agent; judging, based on the transmission power prediction result and the occupied channel prediction result of the cognitive user, whether the current action of the intelligent agent meets the communication delay requirement when the cognitive user is an ultra-reliable low-latency communication slice user, and whether it meets the transmission rate requirement when the cognitive user is an enhanced mobile broadband slice user; and judging, according to the value of an interference temperature threshold constraint function, whether the current action of the intelligent agent meets the interference temperature threshold requirement;

penalizing the reward function if the communication delay requirement, the transmission rate requirement and the interference temperature threshold requirement are not met, and rewarding or punishing the current action of the intelligent agent by using the value of the reward function so as to train the neural network model and the intelligent agent; wherein the reward function is determined based on an interference temperature threshold constraint function and a cognitive user energy efficiency function, and the interference temperature threshold constraint function is determined based on the ratio of interference power to channel bandwidth and is a function of the transmission power and the channel;

2. The hybrid access cognitive radio network slice resource allocation method of claim 1,

the information that the current cognitive user needs to exchange with other cognitive users in the hybrid access cognitive wireless network comprises: a binary correlation coefficient between the cognitive user and a cognitive base station, a binary correlation coefficient between the cognitive user and a channel, the transmission power of the cognitive user, and a binary coefficient indicating whether the cognitive user meets its communication requirements;

3. The slice resource allocation method for a hybrid access cognitive wireless network according to claim 1, wherein determining the state of the agent and inputting the determined state of the agent into the graph-convolution-based neural network model for the hybrid access cognitive wireless network scenario, so as to calculate an attention head for each neighbor cognitive user in the state of the agent and, after all the attention heads are concatenated, output the action related information of the agent via a convolution layer in the neural network model, comprises:

4. The hybrid access cognitive wireless network slice resource allocation method of claim 3, wherein the attention head computed for each neighbor cognitive user in the state of the agent is represented as:

$$\alpha_{ij}^{m} = \frac{\exp\left(\tau \cdot W^{m} h_i \cdot \left(W^{m} h_j\right)^{T}\right)}{\sum_{k \in X_{+i}} \exp\left(\tau \cdot W^{m} h_i \cdot \left(W^{m} h_k\right)^{T}\right)}$$

wherein $\alpha_{ij}^{m}$ represents attention head $m$ of neighbor cognitive user $j$ of agent $i$, $W^{m}$ represents the weight matrix of attention head $m$, $h_i$ represents the output of the neuron corresponding to the cognitive user of agent $i$, $h_j$ represents the output of the neuron corresponding to neighbor cognitive user $j$, $X_{+i}$ represents the set formed by the cognitive user of agent $i$ and its neighbor cognitive users, $k \in X_{+i}$ denotes cognitive user $k$ in the set $X_{+i}$, $h_k$ represents the output of the neuron corresponding to cognitive user $k$, $\tau$ represents the scaling factor, $(W^{m} h_k)^{T}$ represents the transpose of $W^{m} h_k$, and $(W^{m} h_j)^{T}$ represents the transpose of $W^{m} h_j$;

the output of the convolution layer is represented as:

$$h_i' = \sigma\left(\mathrm{concatenate}_{m \in M}\left[\sum_{j \in X_{+i}} \alpha_{ij}^{m} W^{m} h_j\right]\right)$$

wherein $h_i'$ represents the output of the convolution layer, $\sigma$ represents the nonlinear activation function, $\mathrm{concatenate}[\cdot]$ represents the concatenation operation, $X_{+i}$ represents the set formed by the cognitive user of agent $i$ and its neighbor cognitive users, $j \in X_{+i}$ denotes cognitive user $j$ in the set $X_{+i}$, $h_j$ represents the output of the neuron corresponding to cognitive user $j$ in the set $X_{+i}$, $W^{m}$ represents the weight matrix of attention head $m$, and $m \in M$ denotes attention head $m$ in the set $M$ of attention heads of all neighbor cognitive users.
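By way of non-limiting illustration, the attention computation of claim 4 may be sketched in plain Python as a softmax over scaled dot products of projected features, followed by concatenation of the per-head weighted sums. The helper names (`matvec`, `attention_weights`, `graph_conv`), the use of ReLU as the nonlinear activation, and the toy feature vectors and weight matrices are assumptions made only for this example.

```python
import math


def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def attention_weights(W, h, i, neighborhood, tau=1.0):
    """Softmax attention of agent i over `neighborhood` (agent i plus neighbors)."""
    q = matvec(W, h[i])
    scores = [tau * sum(a * b for a, b in zip(q, matvec(W, h[j])))
              for j in neighborhood]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_conv(heads, h, i, neighborhood, tau=1.0):
    """Concatenate the per-head weighted sums and apply ReLU (assumed sigma)."""
    out = []
    for W in heads:
        alpha = attention_weights(W, h, i, neighborhood, tau)
        projected = [matvec(W, h[j]) for j in neighborhood]
        out.extend(sum(a * p[d] for a, p in zip(alpha, projected))
                   for d in range(len(W)))
    return [max(0.0, v) for v in out]


# Toy example: 3 agents with 2-dimensional features and 2 attention heads.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
heads = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, -0.5]]]
alpha = attention_weights(heads[0], h, 0, [0, 1, 2])
h0_new = graph_conv(heads, h, 0, [0, 1, 2])
```

The weights for one head sum to one over the neighborhood, and the output dimension equals the number of heads multiplied by the per-head projection size.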

5. The method for allocating slice resources of a hybrid access cognitive wireless network according to claim 1, wherein the current action of the agent is rewarded and punished by using a value of a reward function to train a neural network model and the agent, comprising:

calculating the value of the loss function by using the value of the reward function, and feeding the value of the loss function back to the neural network model so as to reward or punish the current action of the agent and train the neural network model and the agent; wherein the loss function comprises a KL divergence regularization term on the attention weight distribution, the KL divergence regularization term being used to measure the difference between the attention weight distribution corresponding to the concatenation result of all current attention heads and the target attention weight distribution.

6. The hybrid access cognitive radio network slice resource allocation method of claim 5,

the loss function is expressed as:

wherein $L(\theta)$ represents the value of the loss function, $\theta$ represents the parameters of the neural network, $BS$ represents the mini-batch size, $r_b$ represents the value of the reward function for the $b$-th entry of the mini-batch, $\gamma$ represents the signal to interference plus noise ratio, $Q(s_b', a'; \theta)$ represents the Q value for the next action $a'$, the next state $s_b'$, and neural network parameters $\theta$, $Q(s, a; \theta)$ represents the Q value for action $a$, state $s$, and neural network parameters $\theta$, $\lambda$ represents the coefficient of the regularization loss, $M$ represents the number of attention heads, and the regularization term is the value of the KL divergence between the weight distribution of the current state and the weight distribution of the next state.
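By way of non-limiting illustration, a loss of the kind described in claims 5 and 6 may be sketched as a mean squared temporal-difference error plus a KL divergence regularizer averaged over the attention heads. This sketch is not the claimed expression: the function names, the dictionary keys, the default values of `gamma` and `lam`, and in particular the use of `gamma` as a discount factor in the temporal-difference target are assumptions made only for this example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def regularized_dqn_loss(batch, gamma=0.9, lam=0.03):
    """Mean squared TD error plus lam times the head-averaged KL term.

    Each sample is a dict with hypothetical keys:
      r          reward value r_b
      q_sa       Q(s, a; theta) of the action actually taken
      q_next     list of Q(s', a'; theta) over candidate next actions a'
      attn       one attention weight distribution per head, current state
      attn_next  one attention weight distribution per head, next state
    """
    total = 0.0
    for sample in batch:
        # TD target: reward plus discounted best next Q value (assumed form).
        target = sample["r"] + gamma * max(sample["q_next"])
        td_error = (target - sample["q_sa"]) ** 2
        pairs = list(zip(sample["attn"], sample["attn_next"]))
        kl = sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)
        total += td_error + lam * kl
    return total / len(batch)


# Two hypothetical samples: identical and shifted attention distributions.
stable = {"r": 1.0, "q_sa": 2.8, "q_next": [0.0, 2.0],
          "attn": [[0.5, 0.5]], "attn_next": [[0.5, 0.5]]}
shifted = dict(stable, attn_next=[[0.9, 0.1]])
loss_stable = regularized_dqn_loss([stable])
loss_shifted = regularized_dqn_loss([shifted])
```

The regularizer vanishes when the attention distribution is unchanged between the current state and the next state, and grows as the two distributions diverge, which is how it discourages abrupt changes in the relations between agents.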

7. The hybrid access cognitive radio network slice resource allocation method of claim 1,

the reward function is represented as:

wherein $r_i$ represents the value of the reward function for the current action of agent $i$; the expression further involves the interference temperature threshold constraint function, the transmission power of the cognitive user of agent $i$ when it is connected to cognitive base station $a$ and selects channel $k$, and the correlation coefficient between the cognitive user of agent $i$ and cognitive base station $a$; $\eta_i$ represents the energy efficiency of the cognitive user of agent $i$, $k \in C$ denotes channel $k$ in the channel set $C$, $S$ denotes the Sigmoid function, $IT^{max}$ represents the interference temperature threshold, $n \in CU$ denotes cognitive user $n$ in the cognitive user set $CU$, $a \in CBS$ denotes cognitive base station $a$ in the cognitive base station set $CBS$,
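By way of non-limiting illustration, the reward-and-penalty behaviour described for the reward function may be sketched as follows. This sketch does not reproduce the claimed expression: the hard switch between reward and penalty, the function name `reward`, and the `penalty` value are assumptions made only for this example.

```python
def reward(energy_efficiency, interference_temp, it_max, qos_met, penalty=1.0):
    """Return the energy efficiency when every requirement holds, else a penalty.

    energy_efficiency  energy efficiency of the cognitive user
    interference_temp  interference temperature caused by the chosen action
    it_max             interference temperature threshold
    qos_met            True if the delay or rate requirement of the slice is met
    penalty            magnitude of the negative reward (assumed value)
    """
    if interference_temp > it_max or not qos_met:
        return -penalty
    return energy_efficiency


# Hypothetical values: one compliant action, one that exceeds the threshold.
r_ok = reward(energy_efficiency=3.2, interference_temp=0.4, it_max=1.0, qos_met=True)
r_bad = reward(energy_efficiency=3.2, interference_temp=1.5, it_max=1.0, qos_met=True)
```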

8. The hybrid access cognitive radio network slice resource allocation method of claim 1,

the signal to interference plus noise ratio is expressed as:

wherein $\gamma_n$ represents the SINR of cognitive user $n$, $a \in CBS$ denotes cognitive base station $a$ in the cognitive base station set $CBS$, $k \in C$ denotes channel $k$ in the channel set $C$, the expression further involves the transmission power when primary user $m$ selects channel $k$, and $\sigma^2$ represents the white Gaussian noise;

the interference temperature threshold constraint function is expressed as:

the transmission rate requirement is expressed as:

$$R_n \geq R_{min}, \quad \forall n \in CU^{eMBB}$$

wherein $R_n$ represents the transmission rate of cognitive user $n$, $R_{min}$ represents the minimum transmission rate threshold, and $n \in CU^{eMBB}$ denotes cognitive user $n$ in the set $CU^{eMBB}$ of cognitive users that are enhanced mobile broadband slice users;

the communication delay requirement is expressed as:
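By way of non-limiting illustration, and without reproducing the claimed expressions, the quantities named in claim 8 may be evaluated as sketched below. The Shannon rate formula, the delay model (packet size divided by rate), the function names, the slice labels, and all numeric values are assumptions made only for this example.

```python
import math


def sinr(p_signal, g_signal, interferers, noise_power):
    """Received signal power divided by total interference plus noise.

    `interferers` lists (transmit_power, channel_gain) pairs for the other
    cognitive users and the primary user sharing the same channel.
    """
    interference = sum(p * g for p, g in interferers)
    return p_signal * g_signal / (interference + noise_power)


def rate(bandwidth_hz, sinr_value):
    """Shannon rate in bit/s (assumed rate model)."""
    return bandwidth_hz * math.log2(1.0 + sinr_value)


def interference_temperature(interference_power, bandwidth_hz):
    """Ratio of interference power to channel bandwidth."""
    return interference_power / bandwidth_hz


def meets_requirements(slice_type, r, r_min, packet_bits, d_max, it, it_max):
    """Check the interference threshold, then the slice-specific requirement."""
    if it > it_max or r <= 0.0:
        return False
    if slice_type == "eMBB":      # enhanced mobile broadband: minimum rate
        return r >= r_min
    if slice_type == "URLLC":     # low latency: assumed delay = bits / rate
        return packet_bits / r <= d_max
    raise ValueError("unknown slice type: " + slice_type)


# Hypothetical link: 0.1 W transmit power, one interferer, 1 MHz channel.
g = sinr(0.1, 1e-6, [(0.2, 1e-8)], 1e-9)
r = rate(1e6, g)
it = interference_temperature(0.1 * 1e-8, 1e6)
embb_ok = meets_requirements("eMBB", r, 1e6, 256, 1e-3, it, 1e-12)
urllc_ok = meets_requirements("URLLC", r, 1e6, 256, 1e-3, it, 1e-12)
```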

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 8 are implemented when the processor executes the program.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.