CN114116209A - Spectrum map construction and distribution method and system based on deep reinforcement learning

Info

Publication number: CN114116209A
Application number: CN202111341780.2A
Authority: CN
Inventors: 周力; 刘兴光; 谭翔; 魏急波; 赵海涛; 熊俊; 高文颖; 黄圣春; 张姣; 曹阔
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2021-11-12
Filing date: 2021-11-12
Publication date: 2022-03-01

Abstract

The invention discloses a method and a system for constructing and distributing a spectrum map based on deep reinforcement learning. The invention provides a spectrum map construction and distribution method that scales with network size in a mobile edge network scenario. It provides a real-time, high-precision spectrum map service to the cognitive users of the network with minimal spectrum data load and minimal computation overhead, and effectively improves the spectrum efficiency and robustness of the mobile edge network.

Description

Spectrum map construction and distribution method and system based on deep reinforcement learning

Technical Field

The invention relates to the technical field of wireless communication networks, and in particular to a method and a system for constructing and distributing a spectrum map based on deep reinforcement learning.

Background

In a mobile edge network, cognitive users determine their own offloading strategies according to the available spectrum information they perceive and their own computing resources. Because single agents, or multiple uncooperative agents, do not cooperate or exchange their respective actions, frequency conflicts can arise among the cognitive users during construction and distribution of the mobile edge network spectrum map, or the edge server can become overloaded, which increases network energy consumption and degrades user quality of service.

Disclosure of Invention

The invention provides a spectrum map construction and distribution method and system based on deep reinforcement learning, which overcome defects of the prior art such as single agents or multiple uncooperative agents not cooperating or exchanging their respective actions.

In order to achieve the above object, the present invention provides a spectrum map construction and distribution method based on deep reinforcement learning, which includes the following steps:

modeling the spectrum map construction and distribution problem in the mobile edge network as a computation-communication trade-off model, and constructing a reinforcement learning framework with centralized training and distributed execution; the reinforcement learning framework comprises an offline training module and an online execution module;

acquiring available bandwidth information through the cognitive user's own spectrum sensing capability;

according to the available bandwidth information and the computing power of the cognitive user terminal, allocating the bandwidth of the mobile edge network and the computing power of the edge server by using the offline training module, and selecting an offloading strategy; the offloading strategies comprise a full offloading strategy, a partial offloading strategy and a local computing strategy;

through the trained online execution module, performing data distribution, offloading and computation for the cognitive user by using the selected offloading strategy, and constructing the spectrum map step by step;

and monitoring the available bandwidth information perceived by the cognitive user in real time, and, when a change in the available bandwidth information is detected, retraining the online execution module by using the offline training module and determining an offloading strategy for the new environment, so as to adapt to a complex and changeable communication environment.

In order to achieve the above object, the present invention further provides a spectrum map construction and distribution system based on deep reinforcement learning, including:

the model building module is used for modeling the spectrum map construction and distribution problem in the mobile edge network as a computation-communication trade-off model and building a reinforcement learning framework with centralized training and distributed execution; the reinforcement learning framework comprises an offline training module and an online execution module;

the spectrum sensing module is used for acquiring available bandwidth information through the cognitive user's own spectrum sensing capability;

the resource allocation and strategy selection module is used for allocating the bandwidth of the mobile edge network and the computing power of the edge server by using the offline training module, according to the available bandwidth information and the computing power of the cognitive user terminal, and for selecting an offloading strategy; the offloading strategies comprise a full offloading strategy, a partial offloading strategy and a local computing strategy;

the spectrum map building module is used for performing data distribution, offloading and computation for the cognitive user by using the selected offloading strategy through the trained online execution module, and building the spectrum map step by step;

and the real-time monitoring module is used for monitoring the available bandwidth information perceived by the cognitive user in real time, and, when a change in the available bandwidth information is detected, retraining the online execution module by using the offline training module and determining an offloading strategy for the new environment, so as to adapt to a complex and changeable communication environment.

To achieve the above object, the present invention further provides a computer device, which includes a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method when executing the computer program.

To achieve the above object, the present invention further proposes a computer-readable storage medium having a computer program stored thereon, which, when being executed by a processor, implements the steps of the above method.

Compared with the prior art, the invention has the beneficial effects that:

The invention provides a spectrum map construction and distribution method based on deep reinforcement learning. The method models the joint offloading and resource management problem in the mobile edge network as a computation-communication trade-off model and constructs a reinforcement learning framework with centralized training and distributed execution. The framework comprises an offline training module and an online execution module: the online execution module constructs the spectrum map step by step using the learned offloading strategy, and the offline training module dynamically updates the online execution module according to the offloading computation results of the cognitive users. The method scales with network size in a mobile edge network scenario, provides a real-time, high-precision spectrum map service to the cognitive users of the network with minimal spectrum data load and minimal computation overhead, and effectively improves the spectrum efficiency and robustness of the mobile edge network.

Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from the structures shown in these drawings without creative effort.

FIG. 1 is a schematic diagram of a centralized training, distributed execution reinforcement learning framework according to the present invention;

FIG. 2 is a schematic diagram of the offloading strategies in the spectrum map construction and distribution method based on deep reinforcement learning according to the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In addition, the technical solutions of the embodiments of the present invention may be combined with each other, provided that the combination can be implemented by those skilled in the art. When a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and falls outside the protection scope of the present invention.

The invention provides a spectrum map construction and distribution method based on deep reinforcement learning, which comprises the following steps:

101: modeling the spectrum map construction and distribution problem in the mobile edge network as a computation-communication trade-off model, and constructing a reinforcement learning framework with centralized training and distributed execution (as shown in FIG. 1); the reinforcement learning framework comprises an offline training module and an online execution module;

102: acquiring available bandwidth information through the cognitive user's own spectrum sensing capability;

103: according to the available bandwidth information and the computing power of the cognitive user terminal, allocating the bandwidth of the mobile edge network and the computing power of the edge server by using the offline training module, and selecting an offloading strategy; the offloading strategies comprise a full offloading strategy, a partial offloading strategy and a local computing strategy;

104: through the trained online execution module, performing data distribution, offloading and computation for the cognitive user by using the selected offloading strategy, and constructing the spectrum map step by step;

105: monitoring the available bandwidth information perceived by the cognitive user in real time, and, when a change in the available bandwidth information is detected, retraining the online execution module by using the offline training module and determining an offloading strategy for the new environment, so as to adapt to a complex and changeable communication environment.
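The control flow of steps 102 to 105 can be sketched as a loop that re-runs strategy selection only when the sensed bandwidth changes. This is a minimal illustration, not the patented implementation: `OnlineExecutor`, `run_steps`, and `toy_trainer` are hypothetical names, and the bandwidth thresholds in `toy_trainer` are arbitrary placeholders standing in for the trained policy network.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class OnlineExecutor:
    """Stand-in for the online execution module on a cognitive user."""
    strategy: str = "local_compute"


def toy_trainer(bandwidth_mhz: float) -> str:
    """Placeholder for the offline training module (step 103).

    The thresholds are illustrative assumptions, not values from the text.
    """
    if bandwidth_mhz >= 20:
        return "full_offload"
    if bandwidth_mhz >= 5:
        return "partial_offload"
    return "local_compute"


def run_steps(bandwidth_trace: List[float],
              train: Callable[[float], str]) -> List[str]:
    """Return the strategy used at each time step of the trace."""
    executor = OnlineExecutor()
    last_seen: Optional[float] = None
    used: List[str] = []
    for bandwidth in bandwidth_trace:           # step 102: sensed bandwidth
        if bandwidth != last_seen:              # step 105: change detected
            executor.strategy = train(bandwidth)  # step 103: retrain and reselect
            last_seen = bandwidth
        used.append(executor.strategy)          # step 104: execute current strategy
    return used


trace_result = run_steps([20, 20, 3, 3, 10], toy_trainer)
```

In this sketch the trainer is called three times for the five-sample trace, once per distinct run of bandwidth values, which mirrors the retrain-on-change behavior of step 105.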

The invention models the joint offloading and resource management problem in the mobile edge network as a computation-communication trade-off model and constructs a reinforcement learning framework with centralized training and distributed execution. The framework comprises an offline training module and an online execution module: the online execution module constructs the spectrum map step by step using the learned offloading strategy, and the offline training module dynamically updates the online execution module according to the offloading computation results of the cognitive users. The method scales with network size in a mobile edge network scenario, provides a real-time, high-precision spectrum map service to the cognitive users of the network with minimal spectrum data load and minimal computation overhead, and effectively improves the spectrum efficiency and robustness of the mobile edge network.

In one embodiment, for step 101, the offline training module includes a centralized trainer that is built on an edge server (e.g., a small cell, a wireless access point, or a drone-assisted edge computing server).

The reinforcement learning framework uses centralized training and distributed execution.

The online execution module comprises a policy network, and the policy network is loaded on the cognitive user side.

In another embodiment, for step 101, the offline training module collects the available bandwidth information perceived by the cognitive users through a common channel, trains a mutually cooperative policy network for each cognitive user by using the collected available bandwidth information, and sends the trained policy network parameters to the corresponding cognitive user through the common channel to update the parameters of the policy network on that cognitive user's side; the policy network comprises the offloading strategy of the cognitive user, the computing resource allocation of the edge server and the bandwidth allocation of the mobile edge network.
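The collect, train, and distribute cycle over the common channel can be sketched as follows. All class and method names are hypothetical, and the "training" step is a deliberately trivial placeholder (each user's parameters are derived from every user's report) chosen only to show that the parameters are computed centrally and jointly before being pushed back to each user.

```python
from typing import Dict, List


class CognitiveUser:
    """Holds a local copy of the policy parameters (distributed execution)."""

    def __init__(self, user_id: int, sensed_bandwidth: float) -> None:
        self.user_id = user_id
        self.sensed_bandwidth = sensed_bandwidth
        self.policy_params: List[float] = [0.0, 0.0]

    def report(self) -> float:
        # Sent to the edge server over the common channel.
        return self.sensed_bandwidth

    def update_policy(self, params: List[float]) -> None:
        # Received from the edge server over the common channel.
        self.policy_params = list(params)


class CentralizedTrainer:
    """Stand-in for the centralized trainer on the edge server."""

    def collect(self, users: List[CognitiveUser]) -> Dict[int, float]:
        return {u.user_id: u.report() for u in users}

    def train(self, reports: Dict[int, float]) -> Dict[int, List[float]]:
        # Placeholder, NOT a real training step: parameters are
        # [share of total reported bandwidth, total reported bandwidth].
        total = sum(reports.values())
        return {uid: [bw / total if total else 0.0, total]
                for uid, bw in reports.items()}

    def distribute(self, users: List[CognitiveUser],
                   params: Dict[int, List[float]]) -> None:
        for u in users:
            u.update_policy(params[u.user_id])


users = [CognitiveUser(0, 10.0), CognitiveUser(1, 30.0)]
trainer = CentralizedTrainer()
trainer.distribute(users, trainer.train(trainer.collect(users)))
```

After one cycle each user holds parameters that depend on all users' reports, which is the property the text calls "mutually cooperative".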

In one embodiment, training a cooperative policy network for each cognitive user using the collected available bandwidth information includes:

001: taking the sum of the bandwidth, energy and time consumed in constructing and distributing the spectrum map as the objective function of the mobile edge network system;

Here, N is the total number of cognitive users; W_n is the bandwidth consumed by the edge server in distributing the spectrum map to the nth cognitive user; E_n is the total computation and communication energy consumed by the edge server in constructing and distributing the spectrum map to the nth cognitive user; and T_n is the total time consumed by the edge server in constructing and distributing the spectrum map to the nth cognitive user.
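Read literally, step 001 describes an unweighted sum of the three per-user quantities over all N cognitive users. One plausible form, stated as an assumption (the symbol w is borrowed from the reward function in step 002, and any weighting or normalization needed to combine quantities with different units is not specified in the text), is:

```latex
w = \sum_{n=1}^{N} \left( W_n + E_n + T_n \right)
```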

002: establishing a reward function in the reinforcement learning framework according to the objective function:

r_t = (w' - w)/w

where r_t is the reward of all cognitive users after they execute the policy network parameters at time t; w' is the objective function value obtained after all cognitive users execute the policy network parameters at time t; and w is the objective function value obtained after all cognitive users execute the policy network parameters at time t-1.
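A small worked example of the reward computation follows. It assumes the system objective is the plain sum of each user's bandwidth, energy, and time costs, as step 001 describes; the function names and the numeric costs are hypothetical, and the reward is implemented exactly as the formula is written.

```python
from typing import Iterable, Tuple


def system_objective(per_user_costs: Iterable[Tuple[float, float, float]]) -> float:
    """Sum (W_n, E_n, T_n) over all cognitive users (unweighted, by assumption)."""
    return sum(w_n + e_n + t_n for (w_n, e_n, t_n) in per_user_costs)


def reward(w_now: float, w_prev: float) -> float:
    """r_t = (w' - w) / w, with w' at time t and w at time t-1."""
    if w_prev == 0:
        raise ValueError("previous objective value must be non-zero")
    return (w_now - w_prev) / w_prev


# Hypothetical costs for two cognitive users at time t.
w_now = system_objective([(1.0, 2.0, 1.0), (2.0, 1.0, 1.0)])  # 8.0
r_t = reward(w_now, w_prev=10.0)                              # (8 - 10) / 10
```

With the formula as written, a drop in total cost from 10 to 8 yields r_t = -0.2, so the sign convention matters: a learner that maximizes reward would treat lower cost as worse unless the sign is flipped or the reward is minimized.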

003: outputting the reward value of the current policy network by using the reward function, according to the objective function value obtained when the cognitive users execute the offloading strategy;

004: inputting the reward value into a neural network for training, and training a mutually cooperative policy network for each cognitive user.

During offline centralized training, the policy network is obtained by neural network training driven by the reward value output by the reward function; that is, the spectrum bandwidth allocated to each cognitive user, the server computing resources, and the offloading strategy are obtained.

In this embodiment, one edge server serves N cognitive users in the mobile edge network. Each cognitive user senses the available bandwidth through its own spectrum sensing capability. The edge server collects the information of all cognitive users through a common channel and, through neural network training, formulates an offloading strategy for each cognitive user and allocates spectrum bandwidth and server computing resources to it. The cognitive users execute the policy network in a distributed manner, and each computes its objective function value. The objective function values of all cognitive users are summed to give the objective function of the mobile edge network, from which the reward function of the policy network in the reinforcement learning framework is established, and the policy network is updated through neural network training.
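The text does not specify the network architecture or the learning algorithm. As a stand-in only, the sketch below trains a softmax policy over the three offloading strategies with an exact expected policy-gradient update for a single-state problem, minimizing a hypothetical cost table that plays the role of the objective function. None of the names or numbers come from the source.

```python
import math
from typing import List

STRATEGIES = ["full_offload", "partial_offload", "local_compute"]

# Hypothetical objective values (bandwidth + energy + time) per strategy.
HYPOTHETICAL_COSTS = [3.0, 2.0, 4.0]


def softmax(prefs: List[float]) -> List[float]:
    """Numerically stable softmax over strategy preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(costs: List[float], steps: int = 200, lr: float = 0.5) -> List[float]:
    """Gradient descent on expected cost J = sum_i p_i * c_i.

    For a softmax policy, dJ/dtheta_i = p_i * (c_i - J), so each step lowers
    the preference of strategies that cost more than the current average.
    """
    prefs = [0.0] * len(costs)
    for _ in range(steps):
        probs = softmax(prefs)
        expected = sum(p * c for p, c in zip(probs, costs))
        prefs = [th - lr * p * (c - expected)
                 for th, p, c in zip(prefs, probs, costs)]
    return softmax(prefs)


probs = train_policy(HYPOTHETICAL_COSTS)
best = STRATEGIES[probs.index(max(probs))]
```

Using the exact expectation instead of sampled actions keeps the example deterministic; a practical trainer would estimate the gradient from the rewards observed when the cognitive users execute their policies.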

In another embodiment, for step 103, the full offloading strategy, as shown in FIG. 2, includes:

constructing, at the edge server, a low-resolution spectrum map from the raw spectrum data collected by the cognitive users by using a kriging interpolation algorithm;

and constructing a high-resolution spectrum map from the low-resolution spectrum map by using a super-resolution algorithm, compressing the high-resolution spectrum map, and distributing the compressed high-resolution spectrum map to the mobile edge network end users.

In another embodiment, for step 103, the partial offloading strategy, as shown in FIG. 2, includes:

constructing, at the edge server, a low-resolution spectrum map from the raw spectrum data collected by the cognitive users by using a kriging interpolation algorithm;

and distributing the low-resolution spectrum map to the mobile edge network end users, each of which converts the low-resolution spectrum map into a high-resolution spectrum map by using a super-resolution algorithm.

In one embodiment, for step 103, the local computing strategy, as shown in FIG. 2, includes:

distributing, by the edge server, the raw spectrum data collected by the cognitive users to the mobile edge network end users;

and constructing, at the end user, a low-resolution spectrum map from the raw spectrum data by using a kriging interpolation algorithm, and then constructing a high-resolution spectrum map from the low-resolution spectrum map by using a super-resolution algorithm.
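The three strategies differ only in where the two processing stages run and in what the edge server sends to the end user. The sketch below encodes that split as a cut point in a two-stage pipeline; the identifiers (`Plan`, `plan_for`, the stage and payload names) are hypothetical labels for the steps described above, not names from the source.

```python
from typing import NamedTuple, Tuple


class Plan(NamedTuple):
    edge_stages: Tuple[str, ...]   # stages run on the edge server
    payload: str                   # what the edge server distributes
    user_stages: Tuple[str, ...]   # stages run on the end user


# Processing order: raw data -> low-resolution map -> high-resolution map.
PIPELINE = ("kriging_interpolation", "super_resolution")
PAYLOADS = ("raw_spectrum_data", "low_res_map", "compressed_high_res_map")
CUT_POINT = {"local_compute": 0, "partial_offload": 1, "full_offload": 2}


def plan_for(strategy: str) -> Plan:
    """Split the pipeline between edge server and end user for a strategy."""
    cut = CUT_POINT[strategy]
    return Plan(PIPELINE[:cut], PAYLOADS[cut], PIPELINE[cut:])


partial = plan_for("partial_offload")
```

Moving the cut point toward the end of the pipeline shifts computation to the edge server and changes the payload that must cross the network, which is the computation-communication trade-off the method is built around.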

The invention also provides a spectrum map construction and distribution system based on deep reinforcement learning, which comprises the following steps:

The invention further provides a computer device, which includes a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method when executing the computer program.

The invention also proposes a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method described above.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A spectrum map construction and distribution method based on deep reinforcement learning is characterized by comprising the following steps:

2. The deep reinforcement learning-based spectrum map construction and distribution method according to claim 1, wherein the offline training module comprises a centralized trainer, and the centralized trainer is built on an edge server; the online execution module comprises a policy network, and the policy network is loaded on the cognitive user side.

3. The deep reinforcement learning-based spectrum map construction and distribution method according to claim 2, wherein the offline training module collects the available bandwidth information perceived by the cognitive users through a common channel, trains a mutually cooperative policy network for each cognitive user by using the collected available bandwidth information, and sends the trained policy network parameters to the corresponding cognitive user through the common channel to update the parameters of the policy network on that cognitive user's side; the policy network comprises the offloading strategy of the cognitive user, the computing resource allocation of the edge server and the bandwidth allocation of the mobile edge network.

4. The deep reinforcement learning-based spectrum map building and distributing method according to claim 3, wherein training a cooperative strategy network for each cognitive user by using the collected available bandwidth information comprises:

taking the sum of the bandwidth, energy and time consumed in constructing and distributing the spectrum map as the objective function of the mobile edge network system;

establishing a reward function in the reinforcement learning framework according to the objective function;

outputting the reward value of the current policy network by using the reward function, according to the objective function value obtained when the cognitive users execute the offloading strategy;

and inputting the reward value into a neural network for training, and training a mutually cooperative policy network for each cognitive user.

5. The deep reinforcement learning-based spectrum map building and distributing method according to claim 1, wherein the full offloading strategy comprises:

6. The deep reinforcement learning-based spectrum map building and distributing method according to claim 1, wherein the partial offloading policy includes:

7. The deep reinforcement learning-based spectrum map building and distributing method according to claim 1, wherein the local computing strategy comprises:

8. A spectrum map construction and distribution system based on deep reinforcement learning is characterized by comprising the following components:

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements the steps of the method of any of claims 1-8.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.