CN114095940A - Slice resource allocation method and equipment for hybrid access cognitive wireless network - Google Patents

Publication number
CN114095940A
Authority
CN
China
Prior art keywords
cognitive
user
agent
representing
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111365101.5A
Other languages
Chinese (zh)
Inventor
张勇
郭达
袁思雨
郄文博
程振杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202111365101.5A
Publication of CN114095940A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/14 Spectrum sharing arrangements between different networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/04 Wireless resource allocation
    • H04W72/044 Wireless resource allocation based on the type of the allocated resource
    • H04W72/0453 Resources in frequency domain, e.g. a carrier in FDMA
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W72/00 Local resource management
    • H04W72/50 Allocation or scheduling criteria for wireless resources
    • H04W72/53 Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a slice resource allocation method and equipment for a hybrid access cognitive wireless network. The method comprises: constructing an agent for each cognitive user, where the state of the agent comprises information that needs to be exchanged with other cognitive users and information that does not, and the action of the agent corresponds to a transmission power and a channel; inputting the state of the agent into a graph convolution neural network model for the hybrid access cognitive wireless network scenario, calculating an attention head for each neighbor cognitive user, concatenating all attention heads, and outputting action-related information of the agent through a convolution layer; judging, based on the predicted transmission power and the predicted occupied channel of the cognitive user, whether the action meets the communication delay requirement, the transmission rate requirement and the interference temperature threshold requirement, and calculating the value of a reward function; if the requirements are not met, penalizing the action through the reward function; and allocating network slice resources to the cognitive users in the hybrid access cognitive wireless network using the trained neural network model and the trained agents. The scheme improves spectrum utilization efficiency.

Description

Slice resource allocation method and equipment for hybrid access cognitive wireless network
Technical Field
The invention relates to the technical field of wireless communication, in particular to a slice resource allocation method and equipment for a hybrid access cognitive wireless network.
Background
As information technology develops and industrial models change, spectrum resources become ever more valuable. On the one hand, the continuous growth of emerging wireless communication services makes spectrum resources increasingly scarce; on the other hand, existing legacy systems use spectrum inefficiently and are difficult to upgrade. The use of wireless devices such as vehicles, mobile phones, tablets and various wireless sensors has increased rapidly over the past decade, which has driven the development of fifth-generation wireless communication (5G). A 5G wireless network is expected to deliver 10 times the current data rate, stronger connectivity and one hundred percent coverage, and thus better quality of service and user experience. The related LTE-G230 (wireless private network technology) standard has been released, and the complex spectrum characteristics of the 230 MHz band bring new challenges to sharing among multiple industries and services. Although the state has licensed frequency bands to departments such as the national power grid and water conservancy for their respective services, part of these bands is held by departments in different regions and used only on demand, so most of the spectrum sits idle and spectrum utilization is low. Optimizing spectrum usage rules to improve spectrum utilization efficiency is therefore an urgent problem.
Network slicing is one of the key technologies of 5G. It manages a large pool of dedicated underlying physical network resources in a unified way, virtualizes network functions, provides efficient services, and meets users' diverse requirements in a targeted manner. As new service demands with high standards emerge, existing physical network resources are consolidated and allocated to different communication services on demand, which reduces construction cost and allows limited resources to be distributed reasonably among multiple service slices. Three typical application scenarios are defined for 5G: enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), and massive Internet of Things communication.
In addition, cognitive radio (CR) is a technology for optimizing spectrum usage rules. In CR, unlicensed users may access the licensed spectrum for communication as long as the normal communication of licensed users is not affected. Wireless network structures are increasingly complex, and the interference caused by dense network coverage cannot be ignored. In a cognitive wireless network, the complex network environment, the infinite state space and the high-dimensional optimization parameters challenge traditional optimization methods.
Reinforcement learning, an important branch of machine learning, plays a major role in decision-making and optimization for complex problems, such as defeating top human players in board games, allocating and scheduling network resources, and making recommendations based on user interests. With the development of artificial intelligence, many problems can be solved by data-driven algorithms. Compared with traditional algorithms that rely on manually selected features, deep learning has greater potential in wireless networks: it extracts features from data automatically without extensive manual intervention and is trained directly end to end, which reduces model complexity. Reinforcement learning interacts continuously with the environment and adjusts its policy by accumulating experience through trial and error. When deep learning and reinforcement learning are combined, the historical experience accumulated during learning serves as training data for the neural network, so that the strengths of deep learning are used to train the model better and optimize decisions. In scenarios with complex topological graphs, a graph convolution neural network extracts information more effectively than an ordinary convolution neural network. Combining a graph convolution neural network with a reinforcement learning algorithm therefore improves the agent's ability to extract information in complex topologies and yields a better resource allocation strategy.
However, practical problems are highly complex and involve a large amount of information, and global information may not be available at any one time, so single-agent reinforcement learning algorithms are no longer applicable.
Disclosure of Invention
In view of this, the invention provides a slice resource allocation method and device for a hybrid access cognitive wireless network, so as to implement a reinforcement learning method suitable for a slice resource allocation scenario of the hybrid access cognitive wireless network, thereby improving spectrum utilization efficiency.
To achieve this purpose, the invention adopts the following scheme:
According to one aspect of the embodiments of the present invention, a slice resource allocation method for a hybrid access cognitive wireless network is provided, including:
constructing an agent for each cognitive user in the hybrid access cognitive wireless network, wherein the state of the agent comprises information that the current cognitive user needs to exchange with other cognitive users in the hybrid access cognitive wireless network and information that it does not need to exchange, and the action of the agent corresponds to the transmission power of the current cognitive user and the channel occupied by the current cognitive user;
determining the state of the agent, inputting the determined state into a graph-convolution-based neural network model for the hybrid access cognitive wireless network scenario, calculating an attention head for each neighbor cognitive user in the state of the agent, concatenating all attention heads, and outputting action-related information of the agent through a convolution layer of the neural network model;
obtaining a transmission power prediction result and an occupied channel prediction result for the cognitive user of the agent from the action-related information of the agent, and judging, based on these prediction results, whether the current action of the agent meets the communication delay requirement when the cognitive user is an ultra-reliable low-latency communication (URLLC) slice user, whether it meets the transmission rate requirement when the cognitive user is an enhanced mobile broadband (eMBB) slice user, and, according to the value of the interference temperature threshold constraint function, whether it meets the interference temperature threshold requirement;
if the communication delay requirement, the transmission rate requirement and the interference temperature threshold requirement are not met, calculating the value of a reward function and using that value to reward or penalize the current action of the agent, so as to train the neural network model and the agent; the reward function is determined based on the interference temperature threshold constraint function and a cognitive user energy efficiency function, and the interference temperature threshold constraint function is determined based on the ratio of interference power to channel bandwidth and is a function of transmission power and channel;
and allocating network slice resources to the cognitive users in the hybrid access cognitive wireless network using the trained neural network model and the trained agents.
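Because each action pairs a transmission power with an occupied channel, one simple way to picture the action space is as a flat list of (power, channel) pairs. The sketch below assumes that transmission power is quantized into a small set of levels, which the text does not state; the names `build_action_space` and `decode_action` are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def build_action_space(power_levels_w: Sequence[float],
                       num_channels: int) -> List[Tuple[float, int]]:
    """Enumerate every (transmission power, channel index) pair.

    Assumption: power is discretized into `power_levels_w`; the source only
    says that an action corresponds to a transmission power and a channel.
    """
    return list(product(power_levels_w, range(num_channels)))


def decode_action(action_index: int,
                  power_levels_w: Sequence[float],
                  num_channels: int) -> Tuple[float, int]:
    """Map a flat action index (e.g. the arg-max of a Q vector) back to a pair."""
    power_idx, channel = divmod(action_index, num_channels)
    return power_levels_w[power_idx], channel


if __name__ == "__main__":
    levels = [0.1, 0.2]
    space = build_action_space(levels, num_channels=3)
    print(len(space), decode_action(4, levels, 3))
```

With a flat index, the network output can be a single Q vector whose length is the number of power levels times the number of channels.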
In some embodiments, the information that the current cognitive user needs to exchange with other cognitive users in the hybrid access cognitive wireless network includes: the binary association coefficient between the cognitive user and the cognitive base station, the binary association coefficient between the cognitive user and the channel, the transmission power of the cognitive user, and a binary coefficient indicating whether the cognitive user meets its communication requirement;
the information that the current cognitive user does not need to exchange with other cognitive users in the hybrid access cognitive wireless network includes: the signal to interference plus noise ratio of the cognitive user and the binary association coefficient between the primary user and the channel.
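The split between exchanged and non-exchanged state can be sketched as a small data structure. This is only an illustration: the class name `AgentState`, its field names, and the flat-list encoding are assumptions, not part of the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentState:
    """Hypothetical container for the state of one cognitive-user agent."""
    # Information exchanged with other cognitive users
    tx_power_w: float               # transmission power of the cognitive user
    bs_association: List[int]       # 0/1 per cognitive base station
    channel_association: List[int]  # 0/1 per channel
    requirement_met: int            # 1 if the communication requirement is met
    # Information kept local (not exchanged)
    sinr: float                     # signal to interference plus noise ratio
    pu_channel_association: List[int]  # 0/1 primary-user occupancy per channel

    def shared_part(self) -> List[float]:
        """Vector that would be sent to neighbor agents."""
        return [self.tx_power_w, *self.bs_association,
                *self.channel_association, self.requirement_met]

    def full_vector(self) -> List[float]:
        """Complete local observation: shared part followed by private part."""
        return self.shared_part() + [self.sinr, *self.pu_channel_association]


if __name__ == "__main__":
    s = AgentState(0.2, [1, 0], [0, 1, 0], 1, 12.5, [1, 0, 0])
    print(s.shared_part(), len(s.full_vector()))
```

Keeping the two parts separate makes it explicit which fields leave the agent and which are only used locally.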
In some embodiments, determining the state of the agent, inputting the determined state into a graph-convolution-based neural network model for the hybrid access cognitive wireless network scenario, calculating an attention head for each neighbor cognitive user in the state of the agent, concatenating all attention heads, and outputting action-related information of the agent through a convolution layer of the neural network model, includes:
determining the state of the agent, inputting the determined state into the graph-convolution-based neural network model for the hybrid access cognitive wireless network scenario, calculating an attention head for each neighbor cognitive user in the state of the agent, concatenating all attention heads, and outputting action-related information of the agent after passing the result sequentially through a nonlinear activation function and a convolution layer of the neural network model.
In some embodiments, the attention head computed for each neighbor cognitive user in the state of the agent is represented as:
$$\alpha_{ij}^{m} = \frac{\exp\left(\tau \cdot W_m h_i \cdot (W_m h_j)^{T}\right)}{\sum_{k \in X_{+i}} \exp\left(\tau \cdot W_m h_i \cdot (W_m h_k)^{T}\right)}$$
where $\alpha_{ij}^{m}$ represents attention head m of neighbor cognitive user j of agent i, $W_m$ represents the weight matrix of attention head m, $h_i$ represents the output of the neuron corresponding to the cognitive user of agent i, $h_j$ represents the output of the neuron corresponding to neighbor cognitive user j, $X_{+i}$ represents the set formed by the cognitive user of agent i and its neighbor cognitive users, $k \in X_{+i}$ denotes a cognitive user k in the set $X_{+i}$, $h_k$ represents the output of the neuron corresponding to cognitive user k, $\tau$ represents the scaling factor, $(W_m h_k)^{T}$ represents the transpose of $W_m h_k$, and $(W_m h_j)^{T}$ represents the transpose of $W_m h_j$;
the output of the convolutional layer in the neural network model is represented as:
$$h_i' = \sigma\left(\mathrm{concatenate}_{m \in M}\left[\sum_{j \in X_{+i}} \alpha_{ij}^{m} W_m h_j\right]\right)$$
where $h_i'$ represents the output of the convolutional layer, $\sigma$ represents the nonlinear activation function, $\mathrm{concatenate}[\cdot]$ denotes the concatenation operation, $X_{+i}$ represents the set formed by the cognitive user of agent i and its neighbor cognitive users, $j \in X_{+i}$ denotes a cognitive user j in the set $X_{+i}$, $h_j$ represents the output of the neuron corresponding to cognitive user j in the set $X_{+i}$, $\alpha_{ij}^{m}$ is attention head m computed for cognitive user j of agent i, $W_m$ represents the weight matrix of attention head m, and $m \in M$ denotes an attention head m in the set M of attention heads of all neighbor cognitive users.
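The attention and aggregation steps defined above can be sketched in plain Python. The sketch makes several assumptions that the text does not confirm: the product of the two projected vectors is read as an inner product, the attention weights are normalized with a softmax over the agent and its neighbors, and ReLU stands in for the unspecified nonlinear activation σ. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, h: Vector) -> Vector:
    """Compute W h for a weight matrix W and a feature vector h."""
    return [sum(w_rc * h_c for w_rc, h_c in zip(row, h)) for row in w]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_weights(w_m: Matrix, h_i: Vector,
                      group: List[Vector], tau: float) -> Vector:
    """Softmax of tau * (W_m h_i) . (W_m h_k) over every k in the group X_{+i}."""
    query = matvec(w_m, h_i)
    scores = [tau * dot(query, matvec(w_m, h_k)) for h_k in group]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_conv_output(heads: List[Matrix], h_i: Vector,
                      neighbours: List[Vector], tau: float = 1.0) -> Vector:
    """Concatenate the attention-weighted sums of all heads, then apply ReLU."""
    group = [h_i] + neighbours  # X_{+i}: the agent itself plus its neighbors
    out: Vector = []
    for w_m in heads:
        alpha = attention_weights(w_m, h_i, group, tau)
        projected = [matvec(w_m, h_j) for h_j in group]
        dim = len(projected[0])
        out.extend(sum(a * p[d] for a, p in zip(alpha, projected))
                   for d in range(dim))
    return [max(0.0, v) for v in out]  # ReLU is an assumed choice for sigma


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(graph_conv_output([identity, identity], [1.0, 0.0], [[0.0, 1.0]]))
```

The output length grows with the number of heads, since each head contributes one aggregated vector to the concatenation.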
In some embodiments, rewarding or penalizing the current action of the agent with the value of the reward function to train the neural network model and the agent includes:
calculating the value of the loss function from the value of the reward function, and propagating the value of the loss function back to the neural network model, so as to reward or penalize the current action of the agent and train the neural network model and the agent; the loss function includes a KL divergence regularization term on the attention weight distribution, which measures the difference between the attention weight distribution corresponding to the concatenation of all current attention heads and the target attention weight distribution.
In some embodiments, the loss function is expressed as:
Figure BDA0003360377960000044
where L(θ) represents the value of the loss function, θ represents the parameters of the neural network, BS represents the mini-batch size (the number of data samples drawn from the data pool per batch), r_b represents the value of the reward function for the b-th sample of the mini-batch, γ represents the signal to interference plus noise ratio, Q(s_b', a'; θ) represents the Q value for the next action a', the next state s_b' and the neural network parameters θ, Q(s, a; θ) represents the Q value for the action a, the state s and the neural network parameters θ, λ represents the coefficient of the regularization loss, M represents the number of attention heads,
Figure BDA0003360377960000051
represents the KL divergence between the weight distribution of the current state
Figure BDA0003360377960000052
and the weight distribution of the next state
Figure BDA0003360377960000053
while
Figure BDA0003360377960000054
and
Figure BDA0003360377960000055
represent the attention weight distributions of the agent in attention head m of convolutional layer k.
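A loss with this shape, a temporal-difference term plus a KL regularizer on the attention weights, can be sketched as follows. The exact formula is not recoverable from the text, so this is an assumed form: a mean squared temporal-difference error, with `gamma` as the coefficient multiplying the next-state Q value, plus λ times the KL divergence from the current-state to the next-state attention distribution, averaged over heads and samples. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float],
                  eps: float = 1e-12) -> float:
    """D_KL(p || q) for two discrete distributions of equal length."""
    return sum(p_i * math.log((p_i + eps) / (q_i + eps))
               for p_i, q_i in zip(p, q))


def td_loss_with_attention_kl(rewards: Sequence[float],
                              q_next: Sequence[float],
                              q_current: Sequence[float],
                              attn_now: List[List[List[float]]],
                              attn_next: List[List[List[float]]],
                              gamma: float, lam: float) -> float:
    """Assumed loss: mean squared TD error + lam * mean KL over attention heads.

    attn_now[b][m] and attn_next[b][m] are the attention weight distributions
    of head m for sample b in the current and next state respectively.
    """
    batch = len(rewards)
    td = sum((r + gamma * qn - qc) ** 2
             for r, qn, qc in zip(rewards, q_next, q_current)) / batch
    kl = 0.0
    for heads_now, heads_next in zip(attn_now, attn_next):
        kl += sum(kl_divergence(p, q)
                  for p, q in zip(heads_now, heads_next)) / len(heads_now)
    return td + lam * kl / batch


if __name__ == "__main__":
    same = [[[0.5, 0.5]]]
    print(td_loss_with_attention_kl([1.0], [2.0], [2.5], same, same, 0.5, 0.1))
```

When the attention distributions of consecutive states coincide, the regularizer vanishes and only the temporal-difference term remains.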
In some embodiments, the reward function is expressed as:
Figure BDA0003360377960000056
Figure BDA0003360377960000057
Figure BDA0003360377960000058
where r_i represents the reward value of the current action of agent i,
Figure BDA0003360377960000059
represents the interference temperature threshold constraint function,
Figure BDA00033603779600000510
represents the transmission power when the cognitive user of agent i is connected to cognitive base station a and selects channel k,
Figure BDA00033603779600000511
represents the association coefficient between the cognitive user of agent i and cognitive base station a, η_i represents the energy efficiency of the cognitive user of agent i, k ∈ C denotes a channel k in the channel set C, S represents the Sigmoid function, IT_max represents the interference temperature threshold, n ∈ CU denotes a cognitive user n in the cognitive user set CU, a ∈ CBS denotes a cognitive base station a in the cognitive base station set CBS,
Figure BDA00033603779600000512
represents the association coefficient between the cognitive user of agent i and cognitive base station a,
Figure BDA00033603779600000513
represents the association coefficient between the cognitive user of agent i and channel k,
Figure BDA00033603779600000514
represents the channel gain between the cognitive user of agent i and channel k, k_cons is the Boltzmann constant, B represents the channel bandwidth, and R_i represents the transmission rate.
In some embodiments, the signal to interference plus noise ratio is expressed as:
Figure BDA00033603779600000515
where γ_n represents the signal to interference plus noise ratio of cognitive user n, a ∈ CBS denotes a cognitive base station a in the cognitive base station set CBS, k ∈ C denotes a channel k in the channel set C,
Figure BDA00033603779600000516
represents the association coefficient between cognitive base station a and cognitive user n,
Figure BDA00033603779600000517
represents the association coefficient between channel k and cognitive user n,
Figure BDA00033603779600000518
represents the channel gain between cognitive user n and cognitive base station a,
Figure BDA00033603779600000519
represents the transmission power when cognitive user n is connected to cognitive base station a and selects channel k, n' ∈ SU denotes a cognitive user n' in the cognitive user set SU,
Figure BDA0003360377960000061
represents the association coefficient between cognitive base station a and cognitive user n',
Figure BDA0003360377960000062
represents the association coefficient between channel k and cognitive user n',
Figure BDA0003360377960000063
represents the transmission power when cognitive user n' is connected to cognitive base station a and selects channel k, m ∈ PU denotes a primary user m in the primary user set PU,
Figure BDA0003360377960000064
represents the association coefficient between channel k and primary user m, g_n represents the channel gain between cognitive user n and the primary base station,
Figure BDA0003360377960000065
represents the transmission power at which primary user m selects channel k, and σ² represents Gaussian white noise.
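The structure implied by these definitions, desired received power divided by co-channel interference plus noise, can be sketched as below. The function is a simplification under stated assumptions: the binary association coefficients are taken as already applied, so the caller passes only the (transmit power, channel gain) pairs of users that actually share the channel. The name `sinr` and its parameters are hypothetical.

```python
from typing import Iterable, Tuple


def sinr(signal_power_w: float,
         signal_gain: float,
         cognitive_interferers: Iterable[Tuple[float, float]],
         primary_interferers: Iterable[Tuple[float, float]],
         noise_power_w: float) -> float:
    """Signal to interference plus noise ratio of one cognitive user.

    Each interferer is a (transmit power in W, channel gain) pair for a user
    occupying the same channel; users on other channels are simply omitted,
    which plays the role of a zero association coefficient.
    """
    interference = sum(p * g for p, g in cognitive_interferers)
    interference += sum(p * g for p, g in primary_interferers)
    return (signal_power_w * signal_gain) / (interference + noise_power_w)


if __name__ == "__main__":
    value = sinr(1.0, 0.5, [(0.2, 0.5)], [(0.4, 0.25)], 0.05)
    print(value)
```

Passing empty interferer lists reduces the expression to a plain signal-to-noise ratio.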
The interference temperature threshold constraint function is expressed as:
Figure BDA0003360377960000066
where n ∈ CU denotes a cognitive user n in the cognitive user set CU, a ∈ CBS denotes a cognitive base station a in the cognitive base station set CBS,
Figure BDA0003360377960000067
represents the association coefficient between cognitive base station a and cognitive user n,
Figure BDA0003360377960000068
represents the association coefficient between channel k and cognitive user n,
Figure BDA0003360377960000069
represents the channel gain between cognitive user n and cognitive base station a,
Figure BDA00033603779600000610
represents the transmission power when cognitive user n is connected to cognitive base station a and selects channel k.
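Interference temperature is conventionally the interference power divided by the product of the Boltzmann constant and the bandwidth, which matches the statement that the constraint function is based on the ratio of interference power to channel bandwidth. The sketch below assumes that form; the patent's exact constraint function is not recoverable from the text, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def interference_temperature_k(transmitters: Iterable[Tuple[float, float]],
                               bandwidth_hz: float) -> float:
    """Interference temperature in kelvin on one channel.

    `transmitters` holds (transmit power in W, channel gain) pairs of the
    cognitive users associated with the channel (association coefficient 1).
    """
    interference_w = sum(p * g for p, g in transmitters)
    return interference_w / (BOLTZMANN_J_PER_K * bandwidth_hz)


def satisfies_interference_threshold(transmitters: Iterable[Tuple[float, float]],
                                     bandwidth_hz: float,
                                     it_max_k: float) -> bool:
    """True when the channel's interference temperature does not exceed IT_max."""
    return interference_temperature_k(transmitters, bandwidth_hz) <= it_max_k


if __name__ == "__main__":
    users = [(1.380649e-17, 1.0)]
    print(interference_temperature_k(users, 1.0e6))
```

A check of this kind would be evaluated per channel before the reward for an action is computed.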
The transmission rate requirement is expressed as:
Figure BDA00033603779600000611
where R_n represents the transmission rate of cognitive user n, R_min represents the minimum transmission rate threshold, and n ∈ CU_eMBB denotes a cognitive user n in the set CU_eMBB of cognitive users that are enhanced mobile broadband (eMBB) slice users.
The communication delay requirement is expressed as:
Figure BDA00033603779600000612
where D represents the parameter of the Poisson distribution followed by the arrival rate, ξ is a set value, D_max is the maximum transmission delay, and n ∈ CU_URLLC denotes a cognitive user n in the set CU_URLLC of cognitive users that are ultra-reliable low-latency communication (URLLC) slice users.
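The rate requirement for eMBB slice users can be illustrated with a short check. The text does not give the rate expression, so the sketch assumes the Shannon capacity B·log2(1 + γ) as the transmission rate; the function names are hypothetical. The delay requirement is not sketched because its formula cannot be recovered from the text.

```python
import math


def shannon_rate_bps(bandwidth_hz: float, sinr: float) -> float:
    """Achievable rate under the assumed Shannon model, in bits per second."""
    return bandwidth_hz * math.log2(1.0 + sinr)


def meets_embb_rate(bandwidth_hz: float, sinr: float, r_min_bps: float) -> bool:
    """True when the assumed rate reaches the minimum rate threshold R_min."""
    return shannon_rate_bps(bandwidth_hz, sinr) >= r_min_bps


if __name__ == "__main__":
    print(shannon_rate_bps(1.0e6, 3.0), meets_embb_rate(1.0e6, 3.0, 1.5e6))
```

An action that fails this check would be treated as violating the transmission rate requirement for that user.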
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to any of the embodiments when the processor executes the program.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method of any of the above embodiments.
According to the hybrid access cognitive wireless network slice resource allocation method, the electronic device and the computer-readable storage medium of the embodiments, each cognitive user is treated as an agent, a mapping between reinforcement learning and the slice scenario of a hybrid spectrum access cognitive wireless network is established, and a multi-head attention mechanism and a dedicated loss design are applied, yielding a multi-agent reinforcement learning method suited to that scenario. By distinguishing information that needs to be exchanged from information that does not, and by designing a reward function based on interference and energy efficiency, the method obtains higher reward, performs better in stability and convergence, and improves spectrum utilization efficiency.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 is a flowchart of a slice resource allocation method for a hybrid access cognitive wireless network according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a hybrid access mode cognitive wireless network slice resource allocation model according to an embodiment of the present invention;
Fig. 3 is a diagram of the neural network structure of the GQN algorithm in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
Existing single-agent reinforcement learning algorithms are not suited to resource allocation in complex networks. In multi-agent reinforcement learning, each agent observes only a local state and has no knowledge of global information, which brings the learning model closer to reality, as in multi-player online games or cooperative robot production. The relationships among agents vary: they may be purely cooperative, purely competitive, or a mix of cooperation and competition.
On the premise that the service requirements of primary users and secondary users are guaranteed, the energy efficiency of the cognitive wireless network is further improved by controlling the channel association and power allocation of the secondary users. A hybrid spectrum access mechanism for secondary users is also introduced into the cognitive radio scenario, so that a secondary user can select the overlay or underlay access mode according to the state of the channel it accesses. The invention provides a slice resource allocation method for a hybrid access cognitive wireless network that implements multi-agent reinforcement learning suited to the slice resource allocation scenario of a cognitive network in hybrid access mode, solves the complex optimization problem by combining a graph convolution neural network with the traditional DQN algorithm, and improves spectrum utilization efficiency.
Fig. 1 is a flowchart illustrating a method for allocating slice resources of a hybrid access cognitive radio network according to an embodiment of the present invention, and referring to fig. 1, the method for allocating slice resources of a network according to the embodiment may include the following steps:
Step S110: constructing an agent for each cognitive user in the hybrid access cognitive wireless network, wherein the state of the agent comprises information that the current cognitive user needs to exchange with other cognitive users in the hybrid access cognitive wireless network and information that it does not need to exchange, and the action of the agent corresponds to the transmission power of the current cognitive user and the channel occupied by the current cognitive user;
Step S120: determining the state of the agent, inputting the determined state into a graph-convolution-based neural network model for the hybrid access cognitive wireless network scenario, calculating an attention head for each neighbor cognitive user in the state of the agent, concatenating all attention heads, and outputting action-related information of the agent through a convolution layer of the neural network model;
Step S130: obtaining a transmission power prediction result and an occupied channel prediction result for the cognitive user of the agent from the action-related information of the agent, and judging, based on these prediction results, whether the current action of the agent meets the communication delay requirement when the cognitive user is an ultra-reliable low-latency communication (URLLC) slice user, whether it meets the transmission rate requirement when the cognitive user is an enhanced mobile broadband (eMBB) slice user, and, according to the value of the interference temperature threshold constraint function, whether it meets the interference temperature threshold requirement;
Step S140: if the communication delay requirement, the transmission rate requirement and the interference temperature threshold requirement are not met, calculating the value of a reward function and using that value to reward or penalize the current action of the agent, so as to train the neural network model and the agent; the reward function is determined based on the interference temperature threshold constraint function and a cognitive user energy efficiency function, and the interference temperature threshold constraint function is determined based on the ratio of interference power to channel bandwidth and is a function of transmission power and channel;
Step S150: allocating network slice resources to the cognitive users in the hybrid access cognitive wireless network using the trained neural network model and the trained agents.
In step S110, the information that the current cognitive user needs to exchange with other cognitive users in the hybrid access cognitive wireless network may include: the binary correlation coefficient between the cognitive user and the cognitive base station, the binary correlation coefficient between the cognitive user and the channel, the transmission power of the cognitive user, and a binary coefficient indicating whether the cognitive user meets its communication requirement. The information that the current cognitive user does not need to exchange with other cognitive users may include: the signal-to-interference-plus-noise ratio (SINR) of the cognitive user and the binary correlation coefficient between the primary user and the channel. Each binary correlation coefficient takes the value 0 or 1, where one value indicates association and the other indicates no association. The transmission power of a cognitive user may differ or be the same depending on the channel it selects and the cognitive base station it connects to.
For example, the state of agent i at time t,
Figure BDA0003360377960000091
is expressed as
Figure BDA0003360377960000092
where
Figure BDA0003360377960000093
denotes the state information that the agent needs to exchange, and
Figure BDA0003360377960000094
can be specifically expressed as
Figure BDA0003360377960000095
whose elements represent, in order, the transmission power of the cognitive user, the binary association coefficient between the cognitive user and the cognitive base station, the binary association coefficient between the cognitive user and the channel, and the binary coefficient indicating whether the cognitive user meets its communication requirement; the power, the association coefficients and the requirement-satisfaction flag are the state information that needs to be exchanged.
Figure BDA0003360377960000096
denotes the state information that the agent does not need to exchange, and
Figure BDA0003360377960000097
can be specifically expressed as
Figure BDA0003360377960000098
where $\{\gamma_m\}_{1 \times m}$ denotes the signal-to-interference-plus-noise ratio matrix of the cognitive users, of size $1 \times m$, and
Figure BDA0003360377960000099
denotes the binary correlation coefficient matrix of the primary users and the channels, of size $k \times m$. The SINR of the cognitive users and the channel occupancy of the primary users are state information that does not need to be exchanged.
In step S120, the discretized transmission power may be used for calculation in the neural network model. This way, the discretized power prediction result can be output. For example, if the power is divided into a plurality of levels, the predicted output transmission power may be represented by the corresponding power level. The corresponding value of the transmission power can be obtained according to the power class.
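The mapping from a predicted power level back to a transmission power can be sketched as follows. This is only an illustration: the function name, the number of levels, the maximum power `p_max_watts` and the uniform spacing between levels are all assumptions, since the text only states that the power is divided into a plurality of levels.

```python
def power_from_level(level, num_levels, p_max_watts):
    """Map a discrete power level (0 .. num_levels - 1) to a transmission power.

    Assumption: levels are spaced uniformly between p_max/num_levels and p_max.
    """
    if not 0 <= level < num_levels:
        raise ValueError("power level out of range")
    return p_max_watts * (level + 1) / num_levels


# Example: five levels up to 0.5 W; the highest level maps to the maximum power.
print(power_from_level(4, num_levels=5, p_max_watts=0.5))
```

A uniform grid keeps the action space small; a logarithmic grid would be an equally valid choice if finer control at low power is needed.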
Step S120 may specifically include step S121: determining the state of the agent, inputting the determined state into the graph-convolution-based neural network model for the hybrid access cognitive wireless network scenario, calculating an attention head for each neighbor cognitive user in the state of the agent, concatenating all the attention heads, and passing the result sequentially through a nonlinear activation function and a convolution layer of the neural network model to output the action-related information of the agent.
In this embodiment, the nonlinear activation function may be provided, for example, by an MLP layer or a ReLU layer. The neural network model may include an input layer (for inputting feature values, i.e., the state of the agent), a multilayer perceptron (MLP) layer or a ReLU nonlinear activation layer, convolutional layers, an output layer, and the like. The output action-related information of the agent may include a power level, a channel, etc. The corresponding power is obtained from the power level, and the various parameters of the hybrid access cognitive wireless network scenario can then be calculated from the power and the channel so that the constraints can be applied.
In step S120, for each agent, only the cognitive users that can interact with the agent's cognitive user (its neighbor cognitive users) affect the resource allocation, so attention heads are calculated only for the neighbors of the agent's cognitive user. In a specific implementation, the attention head calculated for each neighbor cognitive user in the state of the agent may be represented as:
Figure BDA0003360377960000101
where
Figure BDA0003360377960000102
denotes attention head m for neighbor cognitive user j of agent i, $W_m$ denotes the weight matrix of attention head m, $h_i$ denotes the output of the neuron corresponding to the cognitive user of agent i, $h_j$ denotes the output of the neuron corresponding to neighbor cognitive user j, $X_{+i}$ denotes the set formed by the cognitive user of agent i and its neighbor cognitive users, $k \in X_{+i}$ denotes a cognitive user k in the set $X_{+i}$, $h_k$ denotes the output of the neuron corresponding to cognitive user k, $\tau$ denotes the scaling factor, $(W_m h_k)^T$ denotes the transpose of $W_m h_k$, and $(W_m h_j)^T$ denotes the transpose of $W_m h_j$;
the output of convolutional layers in the neural network model can be expressed as:
Figure BDA0003360377960000103
where $h_i'$ denotes the output of the convolutional layer, $\sigma$ denotes the nonlinear activation function, concatenate[·] denotes the concatenation operation, $X_{+i}$ denotes the set of cognitive user i and its neighbors, $W_m$ denotes a weight matrix, and $m \in M$ denotes attention head m among the attention heads of all neighbor cognitive users.
In these embodiments, a multi-headed attention mechanism may be implemented by the graph convolution layer.
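The attention computation described above can be sketched in plain Python. This is an illustrative reading of the symbol definitions, not a reproduction of the formulas in the figures: the softmax over the set $X_{+i}$, the reuse of the same matrix $W_m$ for the weighted values, and the choice of ReLU as the nonlinearity are assumptions, and all function names are hypothetical.

```python
import math


def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def attention_weights(h_i, members, W_m, tau):
    """Softmax attention of agent i over the members of X_{+i} for one head."""
    query = matvec(W_m, h_i)
    keys = [matvec(W_m, h_j) for h_j in members]
    scores = [tau * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)                                # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_conv_output(h_i, members, heads, tau):
    """Concatenate the attention-weighted sums of all heads, then apply ReLU."""
    out = []
    for W_m in heads:
        alpha = attention_weights(h_i, members, W_m, tau)
        values = [matvec(W_m, h_j) for h_j in members]  # assumption: values reuse W_m
        for d in range(len(W_m)):
            out.append(max(0.0, sum(a * v[d] for a, v in zip(alpha, values))))
    return out


# Agent i (first row) plus two neighbours, two features each; two heads.
members = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
heads = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, -0.5], [1.0, 1.0]]]
h_prime = graph_conv_output(members[0], members, heads, tau=0.5)
print(len(h_prime))  # 4 = number of heads x rows per weight matrix
```

Restricting `members` to the agent and its neighbors reflects the statement that only neighbor cognitive users influence the allocation.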
In step S140, rewarding or penalizing the current action of the agent using the value of the reward function, so as to train the neural network model and the agent, may specifically include: calculating the value of the loss function from the value of the reward function, and propagating the value of the loss function back to the neural network model, so as to reward or penalize the current action of the agent and train the neural network model and the agent. The loss function may include a KL divergence regularization term on the attention weight distribution, which measures the difference between the attention weight distribution corresponding to the current concatenation of all attention heads (the distribution of weights used when the attention heads are connected) and a target attention weight distribution. The weight distribution of the attention heads can be optimized through this KL divergence regularization term. The loss function may also contain conventional terms for optimizing the neural network; that is, the loss function of this embodiment can be obtained by adding a regularization term to a conventional loss function, and this temporal regularization stabilizes the cooperative and competitive relationships between the agents. In addition, if the communication delay requirement, the transmission rate requirement and the interference temperature threshold requirement are met, the reward function can likewise be calculated and the neural network updated.
In particular implementation, the loss function may be expressed as:
Figure BDA0003360377960000111
where $L(\theta)$ denotes the value of the loss function, $\theta$ denotes the parameters of the neural network, BS denotes the number of samples in the mini-batch, $r_b$ denotes the value of the reward function for the b-th sample of the mini-batch, $\gamma$ denotes the signal to interference plus noise ratio, $Q(s_b', a'; \theta)$ denotes the Q value for the next state $s_b'$ and the next action $a'$ when the neural network parameters are $\theta$, $Q(s, a; \theta)$ denotes the Q value for action a and state s when the neural network parameters are $\theta$, $\lambda$ denotes the coefficient of the regularization loss, M denotes the number of attention heads,
Figure BDA0003360377960000112
denotes the KL divergence between the weight distribution of the current state
Figure BDA0003360377960000113
and the weight distribution of the next state
Figure BDA0003360377960000114
where
Figure BDA0003360377960000115
and
Figure BDA0003360377960000116
denote the attention weight distributions of the agent at attention head m of convolutional layer k.
In this embodiment, by adding a time regularization term to the loss function, the cooperative competition relationship between the agents can be stabilized.
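A loss of this shape, a temporal-difference term plus a KL regularization term averaged over the attention heads, can be sketched as below. The sketch does not reproduce the formula in the figure: the `discount` argument (a conventional Q-learning discount factor), the direction of the KL term, and the dictionary layout of a batch item are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def regularized_loss(batch, discount, lam):
    """Mean squared TD error plus a KL penalty averaged over attention heads.

    Each batch item is a dict with keys:
      reward, q_sa, max_q_next  -- scalars
      attn_now, attn_next       -- one weight distribution per attention head
    """
    total = 0.0
    for item in batch:
        target = item["reward"] + discount * item["max_q_next"]
        td_error = (target - item["q_sa"]) ** 2
        kls = [kl_divergence(p, q)
               for p, q in zip(item["attn_now"], item["attn_next"])]
        total += td_error + lam * sum(kls) / len(kls)
    return total / len(batch)


sample = {
    "reward": 1.0, "q_sa": 0.5, "max_q_next": 2.0,
    "attn_now": [[0.7, 0.3], [0.5, 0.5]],
    "attn_next": [[0.6, 0.4], [0.5, 0.5]],
}
print(regularized_loss([sample], discount=0.9, lam=0.1))
```

The penalty grows when the attention weights of consecutive states diverge, which is what keeps the learned relations between agents from changing abruptly.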
In step S130, the cognitive user needs to use the channel within the interference range acceptable to the primary user, so the interference temperature threshold function is designed to measure the interference caused by the cognitive user to the primary user and to limit its degree. In addition, training the agent is also a process of optimizing its action output, so that the output transmission power and channel satisfy a number of constraints, including the constraint of the interference temperature threshold function. The actions of the agent may then be rewarded or penalized based on the set constraints. For example, if the interference temperature exceeds a certain threshold, the action may be penalized; otherwise, it may be rewarded. In addition, cognitive users seek to maximize energy efficiency, so actions may also be rewarded or penalized with respect to the goal of maximizing energy efficiency.
In a specific implementation, in step S130, the reward function may be represented as:
Figure BDA0003360377960000117
Figure BDA0003360377960000118
Figure BDA0003360377960000119
where $r_i$ denotes the reward value of the current action of agent i,
Figure BDA00033603779600001110
denotes the interference temperature threshold constraint function,
Figure BDA00033603779600001111
denotes the transmission power when the cognitive user of agent i is connected to cognitive base station a and selects channel k,
Figure BDA00033603779600001112
denotes the correlation coefficient between the cognitive user of agent i and cognitive base station a, $\eta_i$ denotes the energy efficiency of the cognitive user of agent i, $k \in C$ denotes a channel k in the channel set C, S denotes the Sigmoid function, $IT_{max}$ denotes the interference temperature threshold, $n \in CU$ denotes a cognitive user n in the cognitive user set CU, $a \in CBS$ denotes a cognitive base station a in the cognitive base station set CBS,
Figure BDA0003360377960000121
denotes the association coefficient between the cognitive user of agent i and cognitive base station a,
Figure BDA0003360377960000122
denotes the correlation coefficient between the cognitive user of agent i and channel k,
Figure BDA0003360377960000123
denotes the gain between the cognitive user of agent i and channel k, $k_{cons}$ is the Boltzmann constant, B denotes the channel bandwidth, and $R_i$ denotes the transmission rate.
In a specific embodiment, the signal to interference plus noise ratio may be expressed as:
Figure BDA0003360377960000124
where $\gamma_n$ denotes the signal-to-interference-plus-noise ratio of cognitive user n, $a \in CBS$ denotes a cognitive base station a in the cognitive base station set CBS, $k \in C$ denotes a channel k in the channel set C,
Figure BDA0003360377960000125
denotes the correlation coefficient of the cognitive base station a and the cognitive user n,
Figure BDA0003360377960000126
denotes the correlation coefficient of the channel k and the cognitive user n,
Figure BDA0003360377960000127
denotes the channel gain between the cognitive user n and the cognitive base station a,
Figure BDA0003360377960000128
denotes the transmission power when the cognitive user n is connected to the cognitive base station a and selects channel k, $n' \in SU$ denotes a cognitive user n' in the cognitive user set SU,
Figure BDA0003360377960000129
denotes the correlation coefficient of the cognitive base station a and the cognitive user n',
Figure BDA00033603779600001210
denotes the correlation coefficient of the channel k and the cognitive user n',
Figure BDA00033603779600001211
denotes the transmission power when the cognitive user n' is connected to the cognitive base station a and selects channel k, $m \in PU$ denotes a primary user m in the primary user set PU,
Figure BDA00033603779600001212
denotes the correlation coefficient of the channel k and the primary user m, $g_n$ denotes the channel gain between the cognitive user n and the primary base station,
Figure BDA00033603779600001213
denotes the transmission power of the primary user m when selecting channel k, and $\sigma$ denotes Gaussian white noise;
the interference temperature threshold constraint function may be expressed as:
Figure BDA00033603779600001214
where $n \in CU$ denotes a cognitive user n in the cognitive user set CU, $a \in CBS$ denotes a cognitive base station a in the cognitive base station set CBS,
Figure BDA00033603779600001215
representing the correlation coefficient of the cognitive base station a and the cognitive user n,
Figure BDA00033603779600001216
representing the correlation coefficient of the channel k and the cognitive user n,
Figure BDA00033603779600001217
shows the channel gain of the cognitive user n and the cognitive base station a,
Figure BDA00033603779600001218
representing the transmission power when the cognitive user n and the cognitive base station a select a channel k;
the transmission rate requirement may be expressed as:
Figure BDA00033603779600001219
where $R_n$ denotes the transmission rate of cognitive user n, $R_{min}$ denotes the minimum transmission rate threshold, and $n \in CU_{eMBB}$ denotes a cognitive user n in the set $CU_{eMBB}$ of cognitive users that are enhanced mobile broadband slice users;
the communication delay requirement may be expressed as:
Figure BDA0003360377960000131
where d denotes the parameter of the Poisson distribution followed by the arrival rate, $\xi$ is a set number, $D_{max}$ is the maximum transmission delay, and $n \in CU_{URLLC}$ denotes a cognitive user n in the set $CU_{URLLC}$ of cognitive users that are ultra-high reliable low-delay communication slice users.
In the above embodiment, an M/M/1 queuing model (queuing theory is used to calculate the user's delay) is used to add the transmission delay constraint of the user. The probability formula
Figure BDA0003360377960000132
constrains the communication delay, which yields the above expression of the communication delay requirement.
Further, in step S150, after the agent and the neural network are trained, parameters in the neural network, such as the number of cognitive users, may be set according to an actual scene when the agent and the neural network are applied to network slice resource allocation, so as to provide an optimal network resource allocation result by using the agent.
In an embodiment of the invention, the cognitive radio network scenario comprises one primary base station and a plurality of secondary base stations; primary users are connected to the primary base station, and secondary users are connected to the secondary base stations. According to their communication services, primary users and secondary users are divided into enhanced mobile broadband (eMBB) slice users, which have a minimum communication rate requirement, and ultra-high reliable low-delay communication (URLLC) slice users, which have a maximum communication delay requirement. While normal communication of the primary users is guaranteed, a secondary user selects the overlay or underlay access mode according to the state of the access channel in order to share spectrum with the primary users. The spectrum resources are divided into a set number of channels. It may be specified that, when no primary user is present in the channel accessed by a cognitive user, the cognitive user accesses the spectrum in overlay mode; when the underlay mode is used, the cognitive user must satisfy the interference temperature constraint, so that it communicates within the interference range acceptable to the primary user. In a hybrid access cognitive wireless network, the parameters related to a cognitive user include: transmission power, channel gain with the cognitive base station, channel gain with the primary base station, correlation coefficient with the cognitive base station, and correlation coefficient with the channel. A cognitive user may occupy only one channel.
The parameters related to a primary user include: transmission power, channel gain with the cognitive base station, channel gain with the primary base station, and correlation coefficient with the channel. A primary user may occupy two channels (chosen randomly and held continuously for a certain time). The signal-to-interference-plus-noise ratio of a cognitive user can be calculated from these parameters. The transmission rate of a cognitive user can be calculated from its transmission power according to the Shannon channel formula. For an eMBB slice user, a minimum transmission rate threshold can be set, and for a URLLC slice user, a maximum transmission delay threshold can be set. An M/M/1 queuing model can be used to calculate the transmission delay of a user. The communication requirements of URLLC slice users can be expressed using probabilities. An interference temperature is defined, which may be derived from the ratio of interference power to channel bandwidth, and a maximum interference temperature threshold is set accordingly for cognitive users in the underlay access mode. The various associations may be represented by binary association constraints. A maximum transmission power limit may be set for cognitive users. The optimization of network slice resource allocation can thus be treated as optimization under several constraints, e.g., the service requirement constraints of eMBB slice users and URLLC slice users and the interference temperature constraint. Finally, the agent can be trained with a graph convolution reinforcement learning algorithm whose neural network incorporates a multi-head attention mechanism and temporal relation regularization.
During model training and application, the input features can be weighted and summed, the multiple attention heads of the agent are then concatenated, and the convolutional layer output is obtained through a multilayer perceptron or a nonlinear activation function; in the convolutional layer, the state of the agent is divided into state information that needs interaction and state information that does not. The output may include the power and the selected channel of the cognitive user; whether the power and the channel satisfy the constraints, whether the interference temperature threshold constraint is satisfied, and so on, can be judged from the power and the channel; the action may be penalized if the constraints are not satisfied and rewarded if they are. Other parameters concerning the primary users may be preset.
In addition, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method according to any of the above embodiments.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method according to any of the above embodiments.
The above method is described below with reference to a specific example, however, it should be noted that the specific example is only for better describing the present application and is not to be construed as limiting the present application.
The whole process can be divided into two parts: (I) establishing a cognitive radio network slice resource allocation model in the hybrid access mode; and (II) mapping the cognitive network slice resource allocation problem to a reinforcement learning model and implementing a cognitive network slice resource allocation algorithm based on multi-agent graph convolutional reinforcement learning. The two parts are described separately below.
(I) Establishing a cognitive radio network slice resource allocation model in the hybrid access mode
To further improve the energy efficiency of the cognitive network, this embodiment introduces a hybrid spectrum access mechanism in which a secondary user may select the overlay or underlay access mode according to the state of the access channel. Fig. 2 is a schematic structural diagram of the hybrid access mode cognitive wireless network slice resource allocation model according to an embodiment of the present invention. As shown in Fig. 2, the cognitive radio network scenario includes one primary base station and a plurality of secondary base stations. Primary users are connected to the primary base station, and secondary users are connected to the secondary base stations. According to their communication services, primary users and secondary users can be divided into eMBB (enhanced mobile broadband) slice users and URLLC (ultra-high reliable low-delay communication) slice users. eMBB slice users have a minimum communication rate requirement, and URLLC slice users have a maximum communication delay requirement. While normal communication of the primary users is guaranteed, a secondary user selects the overlay or underlay access mode according to the state of the access channel to share the spectrum with the primary users, so that spectrum resources are used more efficiently.
Referring again to Fig. 2, the cognitive radio network scenario contains one primary base station PBS and A (e.g., three) cognitive base stations CBS. The primary base station PBS and the cognitive base stations CBS share the same spectrum resources. M (e.g., three) primary users PU are connected to the primary base station PBS, and N cognitive users CU are connected to the cognitive base stations CBS; for example, each cognitive base station CBS is connected to three cognitive users CU. The spectrum resources are divided into K channels, indexed k = 1, 2, ..., K. The cognitive users CU share the spectrum resources with the primary users PU in a hybrid access mode: when a primary user PU is present in the channel accessed by a cognitive user CU, the cognitive user accesses the spectrum in underlay mode; when no primary user PU is present, it accesses the spectrum in overlay mode. In underlay mode, the cognitive user CU must communicate within the interference range acceptable to the primary user PU. This embodiment introduces the concept of interference temperature to quantify the interference caused by a cognitive user CU to a primary user PU, and the cognitive user CU may use a channel only if the interference temperature constraint is satisfied.
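The access-mode rule in this paragraph can be written as a short sketch. The function name and the list-of-lists encoding of primary-user channel occupancy are illustrative assumptions; the rule itself (underlay when a primary user is present on the channel, overlay otherwise) is taken from the text.

```python
def access_mode(channel, pu_channel_assoc):
    """Pick the access mode of a cognitive user for one channel.

    pu_channel_assoc[m][k] == 1 means primary user m occupies channel k
    (an illustrative encoding of the primary-user/channel coefficient).
    """
    occupied = any(row[channel] == 1 for row in pu_channel_assoc)
    return "underlay" if occupied else "overlay"


# Two primary users, four channels; each primary user holds two channels.
pu_assoc = [[1, 1, 0, 0],
            [0, 1, 0, 1]]
print(access_mode(0, pu_assoc))  # underlay
print(access_mode(2, pu_assoc))  # overlay
```

In underlay mode the interference temperature constraint additionally applies, so a caller would typically follow this check with a power limit check.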
The cognitive base station set is denoted $CBS = \{1, 2, \ldots, A\}$. The cognitive users associated with the eMBB slice are denoted $CU_{eMBB} = \{1, 2, \ldots, N_1\}$ and those associated with the URLLC slice are denoted $CU_{URLLC} = \{1, 2, \ldots, N_2\}$, with $N = N_1 + N_2$; the set of cognitive users is denoted $CU = \{1, 2, \ldots, N\}$. The primary users associated with the eMBB slice are denoted $PU_{eMBB} = \{1, 2, \ldots, M_1\}$ and those associated with the URLLC slice are denoted $PU_{URLLC} = \{1, 2, \ldots, M_2\}$, with $M = M_1 + M_2$; the primary user set is denoted $PU = \{1, 2, \ldots, M\}$. The channel set is denoted $C = \{1, 2, \ldots, K\}$, the bandwidth of each channel is B, and the total bandwidth is $W = K \times B$.
The transmission power of cognitive user n is denoted as
Figure BDA0003360377960000151
its channel gain with cognitive base station a is
Figure BDA0003360377960000152
and its channel gain with the primary base station is $g_n$.
Figure BDA0003360377960000153
is the correlation coefficient between the cognitive base station and the cognitive user:
Figure BDA0003360377960000154
indicates that cognitive user n is associated with cognitive base station a; otherwise,
Figure BDA0003360377960000155
indicates no association.
Figure BDA0003360377960000156
is the correlation coefficient between the channel and the cognitive user:
Figure BDA0003360377960000157
indicates that cognitive user n is associated with channel k; otherwise,
Figure BDA0003360377960000158
indicates no association. The transmission power of primary user m is denoted as
Figure BDA0003360377960000159
its channel gain with cognitive base station a is
Figure BDA00033603779600001510
and its channel gain with the primary base station is $g_m$.
Figure BDA00033603779600001511
is the correlation coefficient between the channel and the primary user:
Figure BDA00033603779600001512
indicates that primary user m is associated with channel k; otherwise,
Figure BDA00033603779600001513
indicates no association. In addition, a primary user can occupy two channels, while a cognitive user can occupy only one channel. The communication behavior of a primary user is simplified as randomly occupying two channels for a certain time.
The signal to interference plus noise ratio SINR of the cognitive user n can be calculated according to the definition of the signal to interference plus noise ratio SINR, as shown in equation (1):
Figure BDA0003360377960000161
where $\gamma_n$ denotes the signal-to-interference-plus-noise ratio, $a \in CBS$ denotes a cognitive base station a in the cognitive base station set CBS, $k \in C$ denotes a channel k in the channel set C,
Figure BDA0003360377960000162
denotes the correlation coefficient of the cognitive base station and the cognitive user,
Figure BDA0003360377960000163
denotes the correlation coefficient of the channel and the cognitive user,
Figure BDA0003360377960000164
denotes the channel gain between the cognitive user and the cognitive base station,
Figure BDA0003360377960000165
denotes the transmission power of the cognitive user, $n' \in SU$ denotes a secondary user n' in the secondary user set SU,
Figure BDA0003360377960000166
denotes the correlation coefficient of the cognitive base station a and the cognitive user n',
Figure BDA0003360377960000167
denotes the correlation coefficient of the channel k and the cognitive user n',
Figure BDA0003360377960000168
denotes the transmission power when the cognitive user n' is connected to the cognitive base station a and selects channel k, $m \in PU$ denotes a primary user m in the primary user set,
Figure BDA0003360377960000169
denotes the correlation coefficient of the channel and the primary user, $g_n$ denotes the channel gain between the cognitive user n and the primary base station,
Figure BDA00033603779600001610
denotes the transmission power of the primary user, and $\sigma$ denotes Gaussian white noise.
According to the Shannon channel formula $R = B \cdot \log_2(1+\gamma)$, the transmission rate $R_n$ of cognitive user n is obtained, and from it the energy efficiency of the cognitive user, as shown in formula (2):
Figure BDA00033603779600001611
where $\eta_n$ denotes the energy efficiency of cognitive user n, and $R_n$ denotes the transmission rate of cognitive user n, $R_n = B \cdot \log_2(1+\gamma)$, where B denotes the channel bandwidth and $\gamma$ denotes the signal-to-interference-plus-noise ratio.
For rate-sensitive users (eMBB slice users), a minimum transmission rate threshold $R_{min}$ is set. For delay-sensitive users (URLLC slice users), a maximum transmission delay threshold $D_{max}$ is set.
To calculate the transmission delay of the users, an M/M/1 queuing model is used. Assuming that the arrival rate follows a Poisson distribution with parameter d, the transmission delay of the user follows an exponential distribution with parameter $R_n - d$. The communication requirement of URLLC slice users is expressed using a probability,
Figure BDA00033603779600001612
where $\xi$ is a very small number. Therefore, the communication requirement of a URLLC slice user is
Figure BDA00033603779600001613
where $D_n$ denotes the transmission delay of cognitive user n, and P denotes probability.
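The delay requirement can be checked numerically with a short sketch. It assumes that the requirement bounds the probability of the delay exceeding $D_{max}$ by $\xi$; for an exponentially distributed delay with parameter $R_n - d$ that probability is $e^{-(R_n - d) D_{max}}$. The function names are hypothetical, and the rates are assumed to be expressed in the same units (for example packets per second).

```python
import math


def delay_violation_probability(service_rate, arrival_rate, d_max):
    """P(D > d_max) for an exponential delay with parameter (service - arrival)."""
    if service_rate <= arrival_rate:
        return 1.0  # unstable queue: treated as always violating the bound
    return math.exp(-(service_rate - arrival_rate) * d_max)


def meets_urllc_requirement(service_rate, arrival_rate, d_max, xi):
    """True when the probability of exceeding d_max is at most xi."""
    return delay_violation_probability(service_rate, arrival_rate, d_max) <= xi


# 10 ms delay bound, violation probability of at most 1e-5.
print(meets_urllc_requirement(2000.0, 500.0, d_max=0.01, xi=1e-5))
print(meets_urllc_requirement(1000.0, 500.0, d_max=0.01, xi=1e-5))
```

Under this reading the requirement is equivalent to a minimum rate, $R_n \ge d - \ln(\xi) / D_{max}$, so the URLLC constraint can be handled like the eMBB rate constraint.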
The interference temperature is defined as the ratio of the interference power to the channel bandwidth, and is recorded as
Figure BDA0003360377960000171
where $k_{cons}$ is the Boltzmann constant, $P_i$ is the interference power, and BW is the channel bandwidth. For cognitive users adopting the underlay access mode, a maximum interference temperature threshold $IT_{max}$ is set.
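The interference temperature check can be sketched as below. The sketch assumes the conventional form $IT = P_i / (k_{cons} \cdot BW)$, which matches the symbols defined in the text but is not copied from the figure; the function names and example values are hypothetical.

```python
BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def interference_temperature(interference_power_watts, bandwidth_hz):
    """Assumed form IT = P_i / (k_cons * BW), in kelvin."""
    return interference_power_watts / (BOLTZMANN * bandwidth_hz)


def within_interference_threshold(interference_power_watts, bandwidth_hz, it_max):
    """True when the interference temperature does not exceed the threshold."""
    return interference_temperature(interference_power_watts, bandwidth_hz) <= it_max


# 1 MHz channel with an illustrative threshold of 10,000 K.
print(within_interference_threshold(1e-13, 1e6, it_max=1e4))
print(within_interference_threshold(1e-12, 1e6, it_max=1e4))
```

Because the bandwidth and the constant are fixed per channel, the threshold is in effect a cap on the aggregate interference power that cognitive users may place on a channel occupied by a primary user.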
The slice resource allocation optimization problem of the hybrid access cognitive network can be expressed as shown in formula (3):
Figure BDA0003360377960000172
Figure BDA0003360377960000173
Figure BDA0003360377960000174
Figure BDA0003360377960000175
Figure BDA0003360377960000176
Figure BDA0003360377960000177
Figure BDA0003360377960000178
Figure BDA0003360377960000179
wherein eta isnRepresenting the energy efficiency, R, of cognitive usersnThe transmission rate of the cognitive user is known,
Figure BDA00033603779600001710
representing the transmission power of the cognitive user; rminDenotes the minimum transmission rate, n ∈ CUeMBBIndicating that the cognitive user n belongs to an eMBB slice user (sensitive to speed), d indicating a parameter of Poisson distribution followed by the arrival speed, and the transmission delay following parameter of the cognitive user is Rn-an exponential distribution of d,
Figure BDA00033603779600001711
indicating that cognitive user n belongs to URLLC slice user (sensitive to time delay), DmaxRepresenting the maximum transmission delay and ξ represents a parameter.
Figure BDA00033603779600001712
Representing the interference power of each channel, B representing the bandwidth of each channel, kconsIs Boltzmann constant, ITmaxThe maximum temperature of the disturbance,
Figure BDA00033603779600001713
representing channel k in channel set C.
Figure BDA00033603779600001714
Represents the maximum transmission power of the cognitive user n,
Figure BDA00033603779600001715
representAnd the transmission power of the cognitive user n corresponding to the cognitive base station a on the channel k.
Figure BDA00033603779600001716
Representing the correlation coefficient of channel k and primary user m,
Figure BDA0003360377960000181
representing a primary user m in the primary user set PU. a epsilon CBS represents the cognitive base station a in the cognitive base station set CBS,
Figure BDA0003360377960000182
representing the correlation coefficient between the cognitive base station a and the cognitive user n
Equations (C1) through (C3) are binary correlation coefficient constraints. The formula (C4) represents the maximum transmission power limit for cognitive users. Equations (C5) and (C6) are service requirement constraints corresponding to eMBB slice users and URLLC slice users. The formula (C7) is an interference temperature constraint, i.e., tolerance of the primary user to interference generated by the cognitive user.
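The constraints described above can be illustrated with a small feasibility check. This is only a sketch under stated assumptions: the interference temperature uses the relation IT = P_i / (k_cons · BW) given in the text, while the form of the delay constraint (the probability that an exponentially distributed delay with parameter R_n - d exceeds D_max must not exceed ξ) is an assumed reading of (C6), since the formula itself is only available as a figure. The function names and the dictionary keys are hypothetical, and the interference is evaluated per user rather than summed per channel.

```python
import math

K_CONS = 1.380649e-23  # Boltzmann constant in J/K


def interference_temperature(interference_power_w, bandwidth_hz):
    """IT = P_i / (k_cons * BW), in kelvin."""
    return interference_power_w / (K_CONS * bandwidth_hz)


def delay_violation_probability(rate, arrival_rate, d_max):
    """Assumed form of (C6): delay ~ Exp(R - d), so P(delay > D_max) = exp(-(R - d) * D_max)."""
    if rate <= arrival_rate:
        return 1.0  # unstable queue: treat the delay bound as always violated
    return math.exp(-(rate - arrival_rate) * d_max)


def is_feasible(user, it_max, r_min, d_max, xi):
    """Check power, slice-service, and interference temperature limits for one user."""
    if user["power_w"] > user["p_max_w"]:                    # (C4) power limit
        return False
    if user["slice"] == "eMBB" and user["rate"] < r_min:     # (C5) minimum rate
        return False
    if user["slice"] == "URLLC":                             # (C6) assumed form
        p_late = delay_violation_probability(user["rate"], user["arrival_rate"], d_max)
        if p_late > xi:
            return False
    it = interference_temperature(user["interference_w"], user["bandwidth_hz"])
    return it <= it_max                                      # (C7) interference temperature
```

In this sketch the rate and the arrival rate must be expressed in the same unit (for example packets per second) for the delay term to be meaningful.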
(II) mapping the cognitive network slice resource allocation problem to reinforcement learning and establishing a model
The graph convolution reinforcement learning algorithm relies on two key techniques: a multi-head attention mechanism and temporal relation regularization. The convolution kernel uses multi-head attention to capture higher-order relational information, so that the interactions between agents are learned well and training is more stable. For agent i, the set of its neighbors is denoted X_i. Attention head m can be expressed as shown in formula (4):
Figure BDA0003360377960000183
where
Figure BDA0003360377960000184
represents attention head m of agent i with respect to neighbor cognitive user j, W_m represents a weight matrix, h_i represents the output of a neuron, X_{+i} represents the set consisting of cognitive user i and its neighbors, τ represents the scaling factor, and (W_m h_i)^T represents the transpose of W_m h_i. The input features are weighted and summed, the outputs of the M attention heads of agent i are concatenated, and the result is passed through an MLP (multilayer perceptron) layer or a nonlinear ReLU (rectified linear unit) activation layer to obtain the final output of the convolutional layer, as shown in formula (5):
Figure BDA0003360377960000185
where h'_i represents the output of the convolutional layer, σ represents the nonlinear activation function, concatenate[·] represents the concatenation operation, X_{+i} represents the set consisting of cognitive user i and its neighbors, W_m represents a weight matrix, and m ∈ M denotes attention head m in the set M of attention heads of all neighbor cognitive users.
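The attention-based convolution described by formulas (4) and (5) can be sketched in plain Python. This is an illustrative sketch, not the implementation of the embodiment: it assumes the attention score for neighbor j is τ · (W_m h_i)^T (W_m h_j) normalized by a softmax over X_{+i}, with a single weight matrix W_m per head as in the definitions above, and it takes ReLU as the activation σ. The helper names (`matvec`, `attention_weights`, `graph_conv`) are hypothetical.

```python
import math


def matvec(w, h):
    """Multiply matrix w (a list of rows) by vector h."""
    return [sum(wij * hj for wij, hj in zip(row, h)) for row in w]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_weights(w_m, features, i, neighborhood, tau):
    """Softmax over j in X_{+i} of tau * (W_m h_i)^T (W_m h_j)."""
    q = matvec(w_m, features[i])
    scores = [tau * dot(q, matvec(w_m, features[j])) for j in neighborhood]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_conv(weights, features, i, neighborhood, tau=1.0):
    """h'_i = ReLU(concatenate over heads m of sum_j alpha_ij^m * W_m h_j)."""
    out = []
    for w_m in weights:  # one weight matrix per attention head
        alpha = attention_weights(w_m, features, i, neighborhood, tau)
        projected = [matvec(w_m, features[j]) for j in neighborhood]
        head = [sum(a * p[d] for a, p in zip(alpha, projected))
                for d in range(len(w_m))]
        out.extend(head)  # concatenation of the head outputs
    return [max(0.0, x) for x in out]
```

Here `neighborhood` plays the role of X_{+i}, so it should contain i itself together with its neighbors.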
In addition, because different agents must exchange information, the state of each agent
Figure BDA0003360377960000186
is divided into two categories (state information that requires interaction
Figure BDA0003360377960000187
and state information that does not require interaction
Figure BDA0003360377960000188
) in order to reduce the transmission of unnecessary information and increase the proportion of useful information exchanged. The neural network structure of the graph convolution reinforcement learning algorithm GQN is shown in Fig. 3, where Agent 1, Agent 2, ..., Agent N denote the agents, whose state information either requires interaction or does not.
The second key technique is temporal relation regularization, which promotes stable cooperation among agents over a period of time. In practical application scenarios, stable long-term cooperation between agents often yields the greatest benefit. Therefore, the KL divergence is used to measure the difference between the current attention weight distribution and the target attention weight distribution, which strengthens stable cooperation among the agents. The KL divergence is added to the loss function as a regularization term, as shown in formula (6):
Figure BDA0003360377960000191
where L(θ) represents the value of the loss function, θ represents the parameters of the neural network, BS represents the mini-batch size in the reinforcement learning algorithm, r_b represents the reward function value of the b-th mini-batch sample, Q(s, a; θ) represents the Q value for action a and state s under neural network parameters θ, s represents the state, s' represents the next state, a represents the action, λ represents the coefficient of the regularization loss, M represents the number of attention heads,
Figure BDA0003360377960000192
represents the KL divergence between the weight distribution of the current state and the weight distribution of the next state, and
Figure BDA0003360377960000193
represents the attention weight distribution of agent i at attention head m of convolutional layer k.
The smaller the KL divergence (Kullback-Leibler divergence), the more consistently the agents cooperate over time, which greatly helps in capturing the characteristics of the cooperative relationships among agents.
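A loss of this shape, a temporal-difference term plus a KL regularization term averaged over the attention heads, can be sketched as follows. The sketch makes several assumptions that are not confirmed by the text: the TD target is the standard DQN target r_b + γ · max Q(s', a') with γ taken as a discount factor, the KL divergence is computed from the current-state distribution to the next-state distribution, and a single convolutional layer is used. The field names in each sample are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same neighbors."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def regularized_loss(batch, gamma, lam):
    """Mean squared TD error plus lam times the head-averaged KL term.

    Each sample holds: reward, q_sa (Q(s, a; theta)), q_next (target Q-values
    for the next state), and attn / attn_next (one distribution per head).
    """
    total = 0.0
    for sample in batch:
        target = sample["reward"] + gamma * max(sample["q_next"])  # assumed DQN target
        td = (target - sample["q_sa"]) ** 2
        heads = list(zip(sample["attn"], sample["attn_next"]))
        kl = sum(kl_divergence(p, q) for p, q in heads) / len(heads)
        total += td + lam * kl
    return total / len(batch)
```

When the attention distributions of consecutive states are identical, the regularization term vanishes and the loss reduces to the ordinary TD error.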
In a hybrid access cognitive network slicing scene, the resource allocation problem of all cognitive users is a complex non-convex optimization problem in theory. To solve this problem, the present embodiment proposes a multi-agent reinforcement learning method (CRNGQN algorithm) for hybrid access cognitive radio network slicing scenario. The multi-agent reinforcement learning algorithm is based on GQN algorithm, and adopts a graph structure to represent the cooperation relationship between the agents. In the CRNGQN algorithm, reinforcement learning basic elements are set as follows.
It is assumed that the states of all agents do not change at time t. First, the state of agent i is designed as
Figure BDA0003360377960000194
The state information that agents need to exchange includes
Figure BDA0003360377960000195
and a binary coefficient indicating whether the cognitive user meets its communication requirement:
Figure BDA0003360377960000196
indicates that the cognitive user meets the communication requirement; otherwise
Figure BDA0003360377960000197
The state information that agents do not need to exchange includes
Figure BDA0003360377960000198
where
Figure BDA0003360377960000199
represents the transmission power of agent (cognitive user) i,
Figure BDA00033603779600001910
represents the correlation coefficient of agent (cognitive user) i and a cognitive base station, and
Figure BDA00033603779600001911
represents the correlation coefficient of agent (cognitive user) i with the channel. {Γ_m}_{1×m} represents the SINR (signal-to-interference-plus-noise ratio) matrix of the primary users, and
Figure BDA00033603779600001912
represents the channel occupancy matrix of the primary users.
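The split of the agent state into a part that is exchanged with neighbors and a part that stays local can be sketched as a small data structure. The field names and the flattening order are hypothetical; only the grouping of the quantities follows the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentState:
    # Interactive part: exchanged with neighbor agents.
    power: float                 # transmission power of the cognitive user
    bs_assoc: List[int]          # binary coefficient per cognitive base station
    channel_assoc: List[int]     # binary coefficient per channel
    requirement_met: int         # 1 if the communication requirement is met, else 0
    # Non-interactive part: kept local to the agent.
    pu_sinr: List[float]                   # SINR of each primary user
    pu_channel_occupancy: List[List[int]]  # primary-user-by-channel occupancy

    def interactive(self) -> List[float]:
        """Feature vector that is shared with neighbors."""
        return [self.power, *self.bs_assoc, *self.channel_assoc, self.requirement_met]

    def private(self) -> List[float]:
        """Feature vector that is never sent to neighbors."""
        flat = [x for row in self.pu_channel_occupancy for x in row]
        return [*self.pu_sinr, *flat]
```

Keeping the two parts separate means that only the output of `interactive()` needs to be passed to neighboring agents, which is the reduction in exchanged information that the classification aims at.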
Second, the action of agent i is designed from the optimization variables of the scenario.
Figure BDA00033603779600001913
represents the channel association and is a discrete variable, whereas the power
Figure BDA0003360377960000201
is a continuous variable. As in the traditional DQN algorithm, the power in the CRNGQN algorithm needs to be discretized, and its value range is shown in formula (7):
Figure BDA0003360377960000202
where
Figure BDA0003360377960000203
represents the maximum power occupied on channel a.
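Discretizing the power so that a DQN-style agent can choose from a finite action set can be sketched as follows. The evenly spaced grid is an assumption made for illustration; the actual value range is given by formula (7), which is only available as a figure. The function names and the flat indexing scheme are hypothetical.

```python
def power_levels(p_max, num_levels):
    """Evenly spaced levels from p_max / num_levels up to p_max (assumed grid)."""
    return [p_max * (level + 1) / num_levels for level in range(num_levels)]


def decode_action(index, num_channels, levels):
    """Map a flat action index to a (channel, power) pair."""
    if not 0 <= index < num_channels * len(levels):
        raise ValueError("action index out of range")
    channel, level = divmod(index, len(levels))
    return channel, levels[level]
```

With this encoding the size of the action space is the number of channels multiplied by the number of power levels, so a finer power grid enlarges the output layer of the Q network.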
Finally, the reward function is designed. The goal of the system is to maximize η_i while satisfying the interference temperature constraint. Therefore, the reward function must penalize actions that violate the constraints and reward actions that increase the objective value. Combining these factors, the reward function is designed as shown in formula (8):
Figure BDA0003360377960000204
where r_i represents the reward value of agent i,
Figure BDA0003360377960000205
represents the interference temperature threshold constraint function,
Figure BDA0003360377960000206
represents the transmission power of agent (cognitive user) i,
Figure BDA0003360377960000207
represents the correlation coefficient of agent (cognitive user) i with the channel, and η_i represents the energy efficiency of agent (cognitive user) i.
Here
Figure BDA0003360377960000208
is the interference temperature threshold constraint function, whose expression is shown in formula (9):
Figure BDA0003360377960000209
where
Figure BDA00033603779600002010
is the Sigmoid function (denoted S), k ∈ C represents channel k in the channel set C, IT_max represents the interference temperature threshold, n ∈ CU represents cognitive user n in the cognitive user set CU, a ∈ CBS represents a cognitive base station in the cognitive base station set CBS,
Figure BDA00033603779600002011
represents the association coefficient between the cognitive user of agent i and cognitive base station a,
Figure BDA00033603779600002012
represents the correlation coefficient of the cognitive user of agent i with channel k,
Figure BDA00033603779600002013
represents the gain of the cognitive user of agent i on channel k, k_cons is the Boltzmann constant, B represents the channel bandwidth, and R_i represents the transmission rate.
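The idea of rewarding energy efficiency while penalizing violations of the interference temperature threshold can be sketched as follows. The exact expressions of formulas (8) and (9) are not reproduced here; the sketch assumes a Sigmoid applied to the margin IT_max - IT as a smooth constraint indicator and a fixed penalty for violations. The `sharpness` and `penalty` parameters, the 0.5 cut-off, and the function names are hypothetical.

```python
import math

K_CONS = 1.380649e-23  # Boltzmann constant in J/K


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def it_constraint(powers, gains, bandwidth_hz, it_max, sharpness=1.0):
    """Smooth indicator in [0, 1]: near 1 when the interference temperature is below it_max."""
    interference = sum(p * g for p, g in zip(powers, gains))
    it = interference / (K_CONS * bandwidth_hz)
    return sigmoid(sharpness * (it_max - it))


def reward(energy_efficiency, constraint_value, penalty=1.0):
    """Return the energy efficiency if the constraint holds, otherwise a penalty."""
    if constraint_value >= 0.5:
        return energy_efficiency
    return -penalty
```

A smooth indicator of this kind gives the agent a graded signal near the threshold, whereas a hard step would only distinguish feasible from infeasible actions.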
The slice resource allocation technique for the hybrid access cognitive wireless network in this embodiment combines deep reinforcement learning to efficiently select an optimal policy in a complex cognitive network. The main innovations are as follows. (1) For the 5G network slicing scenario, a multi-slice cognitive radio network model based on a hybrid underlay-overlay spectrum access mode is proposed. The model introduces network slicing into a multi-service cognitive radio network scenario: cognitive users access channels without a primary user in overlay mode and access channels occupied by a primary user in underlay mode. (2) For the resource allocation problem in cognitive radio networks, a multi-objective adaptive deep reinforcement learning framework is proposed. The framework is applicable to a variety of objective functions, such as network throughput, spectral efficiency, and energy efficiency. (3) A deep reinforcement learning algorithm combined with a graph convolutional neural network is proposed. The algorithm uses GAT to help the DQN agents extract environment information and introduces a temporal relation regularization mechanism to improve the stability of the cooperative relationships among agents. The algorithm classifies the agents' states according to whether they need to be communicated, which reduces unnecessary exchange of state information and improves both the efficiency of information exchange and the convergence speed of the algorithm.
This embodiment implements a multi-agent reinforcement learning method for the slice resource allocation scenario of a hybrid access cognitive network. Building on cognitive radio technology, a hybrid spectrum access mechanism is introduced to further improve spectrum utilization. For this scenario, a CRNGQN (Cognitive Relay network, CRN) algorithm based on graph convolution reinforcement learning is proposed: the state, action, and reward function are designed, a graph structure is established, and the model is organized into three modules, namely an encoding layer, a graph convolution layer, and a DQN (deep Q-network) layer, which strengthens cooperative communication among the agents and speeds up model convergence. Simulation experiments explore the influence of the learning rate, the number of graph convolution layers, and the number of neighbors of each agent on the results, and demonstrate the feasibility of the algorithm. Compared with the DGN and DQN algorithms, which do not classify states according to whether interaction is required, the proposed algorithm achieves higher reward, faster convergence, and better stability. Compared with the pure overlay and pure underlay access modes, the proposed hybrid spectrum access mode achieves better energy efficiency.
In the description herein, reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The sequence of steps involved in the various embodiments is provided to schematically illustrate the practice of the invention, and the sequence of steps is not limited and can be suitably adjusted as desired.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A slice resource allocation method for a hybrid access cognitive radio network is characterized by comprising the following steps:
constructing an intelligent agent for each cognitive user in a hybrid access cognitive wireless network, wherein the state of the intelligent agent corresponds to information which needs to be interacted and information which does not need to be interacted between the current cognitive user and other cognitive users in the hybrid access cognitive wireless network, and the action of the intelligent agent corresponds to the transmission power of the current cognitive user and a channel occupied by the current cognitive user;
determining the state of an agent, inputting the determined state of the agent into a neural network model based on graph convolution under a hybrid access cognitive wireless network scene, calculating an attention head for each neighbor cognitive user in the state of the agent, and outputting action related information of the agent through a convolution layer in the neural network model after all the attention heads are connected;
obtaining a transmission power prediction result and an occupied channel prediction result for the cognitive user of the agent according to the action related information of the agent; judging, based on the transmission power prediction result and the occupied channel prediction result of the cognitive user, whether the current action of the agent meets the communication delay requirement when the cognitive user is an ultra-reliable low-latency communication slice user, or the transmission rate requirement when the cognitive user is an enhanced mobile broadband slice user; and judging whether the current action of the agent meets the interference temperature threshold requirement according to the value of an interference temperature threshold constraint function;
applying a penalty in the reward function if the communication delay requirement, the transmission rate requirement and the interference temperature threshold requirement are not met, and rewarding or penalizing the current action of the agent using the value of the reward function so as to train the neural network model and the agent; the reward function is determined based on an interference temperature threshold constraint function and a cognitive user energy efficiency function, and the interference temperature threshold constraint function is determined based on the ratio of interference power to channel bandwidth and is a function of the transmission power and the channel;
and distributing network slice resources for cognitive users in the hybrid access cognitive wireless network by using the trained neural network model and the intelligent agent.
2. The hybrid access cognitive radio network slice resource allocation method of claim 1,
the information that the current cognitive user needs to exchange with other cognitive users in the hybrid access cognitive wireless network comprises: the binary correlation coefficient of the cognitive user and the cognitive base station, the binary correlation coefficient of the cognitive user and the channel, the transmission power of the cognitive user, and a binary coefficient indicating whether the cognitive user meets the communication requirement;
the information that the current cognitive user does not need to exchange with other cognitive users in the hybrid access cognitive wireless network comprises: the signal-to-interference-plus-noise ratio of the cognitive user and the binary correlation coefficient of the primary user and the channel.
3. The slice resource allocation method for a hybrid access cognitive wireless network according to claim 1, wherein the determining of the state of the agent and the inputting of the determined state of the agent into the neural network model based on graph convolution under the scene of the hybrid access cognitive wireless network are performed to calculate an attention head for each neighbor cognitive user in the state of the agent and output the action related information of the agent via convolution layers in the neural network model after all the attention heads are connected, and the method comprises:
determining the state of the agent, inputting the determined state of the agent into a neural network model based on graph convolution under a hybrid access cognitive wireless network scene, calculating an attention head for each neighbor cognitive user in the state of the agent, and outputting action related information of the agent after connecting all the attention heads and sequentially passing through a nonlinear activation function and a convolution layer in the neural network model.
4. The hybrid access cognitive wireless network slice resource allocation method of claim 3, wherein the attention head computed for each neighbor cognitive user in the state of the agent is represented as:
Figure FDA0003360377950000021
where
Figure FDA0003360377950000022
represents attention head m of agent i with respect to neighbor cognitive user j, W_m represents the weight matrix of attention head m, h_i represents the output of the neuron corresponding to the cognitive user of agent i, h_j represents the output of the neuron corresponding to neighbor cognitive user j, X_{+i} represents the set formed by the cognitive user of agent i and its neighbor cognitive users, k ∈ X_{+i} denotes cognitive user k in the set X_{+i}, h_k represents the output of the neuron corresponding to neighbor cognitive user k, τ represents the scaling factor, (W_m h_k)^T represents the transpose of W_m h_k, and (W_m h_j)^T represents the transpose of W_m h_j;
the output of the convolutional layer in the neural network model is represented as:
Figure FDA0003360377950000023
where h_i' represents the output of the convolutional layer, σ represents the nonlinear activation function, concatenate[·] represents the concatenation operation, X_{+i} represents the set formed by the cognitive user of agent i and its neighbor cognitive users, j ∈ X_{+i} denotes cognitive user j in the set X_{+i}, h_j represents the output of the neuron corresponding to a cognitive user in the set X_{+i}, W_m represents the weight matrix of attention head m, and m ∈ M denotes attention head m in the set M of attention heads of all neighbor cognitive users.
5. The method for allocating slice resources of a hybrid access cognitive wireless network according to claim 1, wherein the current action of the agent is rewarded and punished by using a value of a reward function to train a neural network model and the agent, comprising:
calculating the value of the loss function using the value of the reward function, and propagating the value of the loss function back to the neural network model so as to reward or penalize the current action of the agent and train the neural network model and the agent; the loss function comprises a KL divergence regularization term on the attention weight distribution, wherein the KL divergence regularization term measures the difference between the attention weight distribution corresponding to the concatenation of all current attention heads and the target attention weight distribution.
6. The hybrid access cognitive radio network slice resource allocation method of claim 5,
the loss function is expressed as:
Figure FDA0003360377950000031
where L(θ) represents the value of the loss function, θ represents the parameters of the neural network, BS represents the mini-batch size, r_b denotes the reward function value of the b-th mini-batch sample, γ denotes the signal-to-interference-plus-noise ratio, Q(s_b', a'; θ) represents the Q value for the next action a' and the next state s_b' under neural network parameters θ, Q(s, a; θ) represents the Q value for action a and state s under neural network parameters θ, λ represents the coefficient of the regularization loss, M represents the number of attention heads,
Figure FDA0003360377950000032
represents the value of the KL divergence between the weight distribution of the current state
Figure FDA0003360377950000033
and the weight distribution of the next state
Figure FDA0003360377950000034
, and
Figure FDA0003360377950000035
and
Figure FDA0003360377950000036
represent the attention weight distributions of the agent at attention head m of convolutional layer k.
7. The hybrid access cognitive radio network slice resource allocation method of claim 1,
the reward function is represented as:
Figure FDA0003360377950000037
Figure FDA0003360377950000038
Figure FDA0003360377950000039
where r_i represents the value of the reward function for the current action of agent i,
Figure FDA00033603779500000310
represents the interference temperature threshold constraint function,
Figure FDA00033603779500000311
represents the transmission power when the cognitive user of agent i is connected to cognitive base station a and selects channel k,
Figure FDA00033603779500000312
represents the correlation coefficient of the cognitive user of agent i with cognitive base station a, η_i represents the energy efficiency of the cognitive user of agent i, k ∈ C represents channel k in the channel set C, S represents the Sigmoid function, IT_max represents the interference temperature threshold, n ∈ CU represents cognitive user n in the cognitive user set CU, a ∈ CBS represents a cognitive base station in the cognitive base station set CBS,
Figure FDA0003360377950000041
represents the association coefficient between the cognitive user of agent i and cognitive base station a,
Figure FDA0003360377950000042
represents the correlation coefficient of the cognitive user of agent i with channel k,
Figure FDA0003360377950000043
represents the gain of the cognitive user of agent i on channel k, k_cons is the Boltzmann constant, B represents the channel bandwidth, and R_i represents the transmission rate.
8. The hybrid access cognitive radio network slice resource allocation method of claim 1,
the signal to interference plus noise ratio is expressed as:
Figure FDA0003360377950000044
where γ_n represents the SINR of cognitive user n, a ∈ CBS denotes cognitive base station a in the cognitive base station set CBS, k ∈ C denotes channel k in the channel set C,
Figure FDA0003360377950000045
represents the correlation coefficient of cognitive base station a and cognitive user n,
Figure FDA0003360377950000046
represents the correlation coefficient of channel k and cognitive user n,
Figure FDA0003360377950000047
represents the channel gain between cognitive user n and cognitive base station a,
Figure FDA0003360377950000048
represents the transmission power when cognitive user n is connected to cognitive base station a and selects channel k, n' ∈ SU denotes cognitive user n' in the cognitive user set SU,
Figure FDA0003360377950000049
represents the correlation coefficient of cognitive base station a and cognitive user n',
Figure FDA00033603779500000410
represents the correlation coefficient of channel k and cognitive user n',
Figure FDA00033603779500000411
represents the transmission power when cognitive user n' is connected to cognitive base station a and selects channel k, m ∈ PU denotes primary user m in the primary user set PU,
Figure FDA00033603779500000412
represents the correlation coefficient of channel k and primary user m, g_n represents the channel gain between cognitive user n and the primary base station,
Figure FDA00033603779500000413
represents the transmission power when primary user m selects channel k, and σ^2 represents Gaussian white noise;
the interference temperature threshold constraint function is expressed as:
Figure FDA00033603779500000414
where n ∈ CU denotes cognitive user n in the cognitive user set CU, a ∈ CBS denotes cognitive base station a in the cognitive base station set CBS,
Figure FDA00033603779500000415
represents the correlation coefficient of cognitive base station a and cognitive user n,
Figure FDA00033603779500000416
represents the correlation coefficient of channel k and cognitive user n,
Figure FDA00033603779500000417
represents the channel gain between cognitive user n and cognitive base station a, and
Figure FDA00033603779500000418
represents the transmission power when cognitive user n is connected to cognitive base station a and selects channel k;
the transmission rate requirement is expressed as:
Figure FDA00033603779500000419
where R_n represents the transmission rate of cognitive user n, R_min denotes the minimum transmission rate threshold, and n ∈ CU_eMBB denotes cognitive user n in the set CU_eMBB of cognitive users that are enhanced mobile broadband slice users;
the communication delay requirement is expressed as:
Figure FDA0003360377950000051
where d represents the parameter of the Poisson distribution followed by the arrival rate, ξ is a preset value, D_max is the maximum transmission delay, and n ∈ CU_URLLC denotes cognitive user n in the set CU_URLLC of cognitive users that are ultra-reliable low-latency communication slice users.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 8 are implemented when the processor executes the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202111365101.5A 2021-11-17 2021-11-17 Slice resource allocation method and equipment for hybrid access cognitive wireless network Pending CN114095940A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111365101.5A CN114095940A (en) 2021-11-17 2021-11-17 Slice resource allocation method and equipment for hybrid access cognitive wireless network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111365101.5A CN114095940A (en) 2021-11-17 2021-11-17 Slice resource allocation method and equipment for hybrid access cognitive wireless network

Publications (1)

Publication Number Publication Date
CN114095940A true CN114095940A (en) 2022-02-25

Family

ID=80301814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111365101.5A Pending CN114095940A (en) 2021-11-17 2021-11-17 Slice resource allocation method and equipment for hybrid access cognitive wireless network

Country Status (1)

Country Link
CN (1) CN114095940A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114356535A (en) * 2022-03-16 2022-04-15 北京锦诚世纪咨询服务有限公司 Resource management method and device for wireless sensor network
CN115412401A (en) * 2022-08-26 2022-11-29 京东科技信息技术有限公司 Method and device for training virtual network embedding model and virtual network embedding
CN115412401B (en) * 2022-08-26 2024-04-19 京东科技信息技术有限公司 Method and device for training virtual network embedding model and virtual network embedding

Similar Documents

Publication Publication Date Title
Peng et al. Deep reinforcement learning based resource management for multi-access edge computing in vehicular networks
Zhu et al. Deep reinforcement learning for mobile edge caching: Review, new features, and open issues
Wei et al. Joint optimization of caching, computing, and radio resources for fog-enabled IoT using natural actor–critic deep reinforcement learning
Asheralieva et al. Learning-based mobile edge computing resource management to support public blockchain networks
CN111726811B (en) Slice resource allocation method and system for cognitive wireless network
CN109862610A (en) D2D user resource allocation method based on the deep learning DDPG algorithm
CN114095940A (en) Slice resource allocation method and equipment for hybrid access cognitive wireless network
CN112512070B (en) Multi-base-station cooperative wireless network resource allocation method based on graph attention mechanism reinforcement learning
Wang et al. Joint resource allocation and power control for D2D communication with deep reinforcement learning in MCC
CN113784410B (en) Heterogeneous wireless network vertical switching method based on reinforcement learning TD3 algorithm
CN116541106B (en) Computing task offloading method, computing device and storage medium
CN115686846B (en) Container cluster online deployment method integrating graph neural network and reinforcement learning in edge computing
Han et al. Joint resource allocation in underwater acoustic communication networks: A game-based hierarchical adversarial multiplayer multiarmed bandit algorithm
Zhang et al. Topology aware deep learning for wireless network optimization
CN116600316A (en) Air-ground integrated Internet of Things joint resource allocation method based on deep double Q-network and federated learning
US20230093673A1 (en) Reinforcement learning (rl) and graph neural network (gnn)-based resource management for wireless access networks
Du et al. Virtual relay selection in LTE-V: A deep reinforcement learning approach to heterogeneous data
Hlophe et al. AI meets CRNs: A prospective review on the application of deep architectures in spectrum management
CN113811009A (en) Multi-base-station cooperative wireless network resource allocation method based on space-time feature extraction reinforcement learning
Elhachmi Distributed reinforcement learning for dynamic spectrum allocation in cognitive radio‐based internet of things
CN115811788A (en) D2D network distributed resource allocation method combining deep reinforcement learning and unsupervised learning
Rohoden et al. Evolutionary game theoretical model for stable femtocells’ clusters formation in hetnets
Erman et al. Modeling 5G wireless network service reliability prediction with bayesian network
Malandrino et al. Efficient distributed DNNs in the mobile-edge-cloud continuum
CN114584951A (en) Joint computation offloading and resource allocation method based on multi-agent DDQN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination