Disclosure of Invention
The invention aims to overcome the defects of the prior art and to address the practical difficulties of mobile communication network planning, such as complex planning conditions, urgent timelines, uncertain locations and limited equipment, by providing an intelligent planning method for a mobile communication network based on deep reinforcement learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
An intelligent planning method for a mobile communication network based on deep reinforcement learning comprises the following steps:
S1, preprocessing resource elements: abstracting and mapping the erection region, the guarantee nodes and the guaranteed users of the mobile communication network, and establishing a simulation model of the resource elements of the mobile communication network;
S1.1, preprocessing the erection region of the mobile communication network;
S1.2, preprocessing the guarantee nodes of the mobile communication network;
S1.3, preprocessing the guaranteed users of the mobile communication network.
S2, planning rule preprocessing: abstracting and mapping the guarantee relationship and the planning state of the mobile communication network, fusing the resource element simulation model of step S1, and establishing an overall simulation model of the mobile communication network planning;
S2.1, preprocessing the connection relations of the mobile communication network;
S2.2, preprocessing the planning state of the mobile communication network.
S3, generating training samples: establishing a network planning simulation according to the overall simulation model of step S2, running the simulation with a Monte Carlo tree search method based on the upper confidence bound algorithm (UCT), and generating training samples that form a training sample set for deep reinforcement learning;
S3.1, establishing the network planning simulation according to the overall simulation model of step S2, with the positions of the guaranteed users generated randomly during initial training;
S3.2, performing simulated deployment with the search algorithm for the generated guaranteed user positions;
S3.3, repeating the simulated deployment with the search method to obtain samples and an evaluation set that meet the conditions.
S4, model training: based on a deep reinforcement learning algorithm using a recurrent neural network, training the overall simulation model of step S2 with the training samples of step S3, comparing and screening the training results of each round, feeding the obtained planning space strategy and real-time planning satisfaction back to step S3, and optimizing the search results of the Monte Carlo tree search algorithm based on the upper confidence bound algorithm (UCT) to obtain optimized training samples;
S4.1, initializing and using three categories of elements to describe the planning situation;
S4.2, the recurrent neural network adopts a shared fully convolutional network to construct the filters, with its tail divided into two branches, planning strategy and planning satisfaction;
S4.3, feeding the results of step S4.2 back to step S3.2 and refining the search process;
S4.4, defining the local strategy evaluation;
S4.5, combining the output of the recurrent neural network, updating all search processes to search for the deployment action with the maximum value;
S4.6, according to the procedure of step S4.5, executing the search process for each situation with consideration of the time used and effective results, and determining a new site-selection strategy.
S5, model generation: inputting the obtained optimized training samples into the training network of step S4, constructing a joint loss function according to the training targets, guiding the sample search and training with the joint loss function, and generating the mobile communication network planning model;
S5.1, constructing the joint loss function according to the training targets;
S5.2, comparing the model after training with the model before training, and judging the result according to the simulation model rules;
S5.3, training based on steps S4.1 and S4.2 to obtain the mobile communication network planning model.
The intelligent planning method for the mobile communication network based on deep reinforcement learning according to the invention has the following advantages:
1. By adopting the Monte Carlo tree search method based on the upper confidence bound algorithm (UCT) together with a recurrent neural network that is simple in structure yet practical and effective, the hardware computing power requirement and the processing time are greatly reduced, and the network planning problem of a mobile network can be solved quickly;
2. By training the intelligent planning model with a deep reinforcement learning algorithm, the planning model overcomes the defect of being applicable to only a single scenario and can adapt to scenarios with different regions, different guarantee equipment and different guaranteed users.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Referring to the attached fig. 1, a schematic flow chart of an embodiment of the intelligent planning method for the mobile communication network based on deep reinforcement learning of the present invention is shown, which specifically includes the following steps:
S1, preprocessing resource elements: abstracting and mapping the erection region, the guarantee nodes and the guaranteed users of the mobile communication network, and establishing a simulation model of the resource elements of the mobile communication network;
S1.1, preprocessing the erection region of the mobile communication network, abstracting the erection region by analogy to a chessboard. The region size is set to N² km²; the lower-left corner coordinate of the topographic map of the erection region is taken as the zero-point coordinate, a certain divisor of N is taken as the unit length, and the erection region is divided transversely and longitudinally, with each intersection point taken as a positioning point, to obtain a node position matrix. In this patent the erection region is preset as a square region of equal length and width, giving an N×N node position matrix, which can be continuously expanded and subdivided in multiples;
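To make the grid abstraction concrete, the following Python sketch (illustrative only; the patent gives no code, and the helper name, the uniform unit length and the example sizes are assumptions) builds such a node position matrix for a square erection region:

```python
import numpy as np

def build_node_position_matrix(side_km: float, unit_km: float):
    """Discretize a square erection region into a grid of positioning points.

    side_km: side length of the square region (area side_km**2 km^2).
    unit_km: unit length used to divide the region transversely and longitudinally
             (assumed to divide side_km evenly, as in the text).
    Returns an (n, n, 2) array of (x, y) coordinates with the lower-left corner at (0, 0).
    """
    n = int(side_km / unit_km) + 1          # intersections per side
    offsets = np.arange(n) * unit_km        # distances from the zero-point coordinate
    grid_x, grid_y = np.meshgrid(offsets, offsets, indexing="ij")
    return np.stack([grid_x, grid_y], axis=-1)

# Example: a 100 km x 100 km region divided with a 5 km unit length.
positions = build_node_position_matrix(side_km=100.0, unit_km=5.0)
print(positions.shape)  # (21, 21, 2)
```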
S1.2, preprocessing the communication platforms/devices of the mobile communication network, namely the guarantee nodes (such as mobile communication vehicles, mobile radio stations, mobile stations and the like), presetting P types of guarantee nodes, where the communication distance R and the number of links L of each guarantee node are determined by the specific equipment model. In this patent the guarantee nodes are mainly divided into two categories, the primary node P1 and the secondary node P2, modeled in order of guarantee priority B: the priority of the primary node is set to B1 and that of the secondary node to B2. The communication guarantee range of the primary node is a circle centered on the node deployment position with the single-hop microwave communication distance R1 km as its radius, and its number of links is set to L1; the communication guarantee range of the secondary node consists of circles centered on the node deployment position with the single-hop microwave communication distance R1 km and the single-hop shortwave communication distance R2 km as radii, with the number of microwave links set to L2 and the number of shortwave links set to L'2;
S1.3, preprocessing the guaranteed users of the mobile communication network (such as military units at different echelons and individual soldiers), presetting Q types of guaranteed users, where the communication distance R and the number of links L of each guaranteed user node are determined by the specific equipment model. In this patent the guaranteed users are mainly divided into three categories, the primary user Q1, the secondary user Q2 and the subordinate user Q3, modeled in order of guarantee priority A: the priority of the primary user is set to A1, that of the secondary user to A2 and that of the subordinate user to A3. The single-hop microwave communication distance of the primary user is R1 km with U1 links; the single-hop microwave communication distance of the secondary user is R1 km with U2 links; and the single-hop shortwave communication distance of the subordinate user is R2 km with U'3 links.
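As a hedged illustration of how the preprocessed resource elements of S1.2 and S1.3 might be represented in software (the class names, field names and example parameter values below are assumptions, not values from the patent):

```python
from dataclasses import dataclass

@dataclass
class GuaranteeNode:
    """A communication platform (e.g. mobile communication vehicle or radio station)."""
    kind: str                    # "P1" (primary) or "P2" (secondary)
    priority: int                # guarantee priority B1 or B2
    microwave_range_km: float    # single-hop microwave distance R1
    shortwave_range_km: float    # single-hop shortwave distance R2 (0 if unsupported)
    microwave_links: int         # number of microwave links L
    shortwave_links: int         # number of shortwave links L'

@dataclass
class GuaranteedUser:
    """A guaranteed user (e.g. a unit of a given echelon)."""
    kind: str                    # "Q1", "Q2" or "Q3"
    priority: int                # guarantee priority A1, A2 or A3
    comm_range_km: float         # single-hop communication distance R
    links: int                   # number of links U

# Illustrative parameter values only; real values depend on the specific equipment models.
primary_node = GuaranteeNode("P1", priority=1, microwave_range_km=30.0,
                             shortwave_range_km=0.0, microwave_links=8, shortwave_links=0)
secondary_node = GuaranteeNode("P2", priority=2, microwave_range_km=30.0,
                               shortwave_range_km=100.0, microwave_links=4, shortwave_links=2)
primary_user = GuaranteedUser("Q1", priority=1, comm_range_km=30.0, links=2)
```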
In this step the resource elements of the mobile communication network are abstracted and mapped, providing support for the subsequent rule definition and overall modeling of the mobile communication network.
S2, planning rule preprocessing: the planning rules of the mobile communication network are preprocessed. The guarantee relationship and the planning state of the mobile communication network are abstracted and mapped and fused with the resource element simulation model of step S1 to establish an overall simulation model of the mobile communication network planning;
S2.1, preprocessing the connection relations of the mobile communication network, associating guarantee nodes with other guarantee nodes and with guaranteed users;
S2.1.1, the guarantee nodes are associated with the guaranteed users according to the priority association A → B, which determines the guarantee relationship. In this patent A1 corresponds to B1, and A2 and A3 correspond to B2; that is, the primary node P1 guarantees the primary user Q1, and the secondary node P2 guarantees the secondary user Q2 and the subordinate user Q3. Each user needs at least one corresponding guarantee node connected to it;
S2.1.2, the connection relationship between guarantee nodes is determined. In this patent all primary nodes must form a connected graph, and each secondary node P2 must be connected to at least one primary node P1;
S2.1.3, all connections must satisfy the communication types specified in step S1, i.e. only links of the same communication type can be connected;
S2.1.4, all connections must satisfy the numbers of links specified in step S1, i.e. the number of connections of a node cannot exceed its specified number of links L;
S2.1.5, all connections must satisfy the communication distances specified in step S1, i.e. two nodes can be connected only if the distance between them is less than the maximum communication distance R of the communication devices used;
S2.1.6, the minimum requirement on the topology of the whole mobile communication network is that it forms a minimum spanning tree;
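The connection rules S2.1.1–S2.1.6 can be checked mechanically. The following Python sketch is a minimal, assumed representation (node dictionaries with position, communication types, range and link limit are illustrative) of the distance, type, link-count and primary-node connectivity checks:

```python
import math
from collections import defaultdict, deque

def can_connect(a, b):
    """Pairwise rules: a shared communication type (S2.1.3) and both within range (S2.1.5)."""
    if not set(a["types"]) & set(b["types"]):
        return False
    dist = math.dist(a["pos"], b["pos"])
    return dist < min(a["range_km"], b["range_km"])

def validate_plan(nodes, links):
    """nodes: {name: {...}}, links: list of (name_a, name_b). Returns True if the plan is legal."""
    degree = defaultdict(int)
    for a, b in links:
        if not can_connect(nodes[a], nodes[b]):
            return False
        degree[a] += 1
        degree[b] += 1
    # S2.1.4: the number of connections of a node cannot exceed its link limit L.
    if any(degree[n] > nodes[n]["max_links"] for n in nodes):
        return False
    # S2.1.2: all primary nodes must lie in one connected component (breadth-first search).
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    primaries = [n for n, v in nodes.items() if v.get("primary")]
    if primaries:
        seen, queue = {primaries[0]}, deque([primaries[0]])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if not all(p in seen for p in primaries):
            return False
    return True
```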
S2.2, preprocessing the planning state of the mobile communication network: a network situation s is established according to the guarantee nodes, the guaranteed users, the erection region and the network planning rules. The network situation s contains all the information of the mobile communication network, i.e. s = (P, Q, A, B, R, L, …), and its main plane describes the planning position of each node: a planned position is occupied by the corresponding node symbol, and an unplanned position is marked as 0.
S2.2.1, the initial situation of the network situation s is marked as s_0; it mainly describes the planning positions of all guaranteed user nodes, i.e. the positions of the guaranteed personnel in the erection region model are determined directly from their actual task requirements, and the positions of the guaranteed user nodes are represented in the matrix by their corresponding symbols from the guaranteed user set.
S2.2.2, the planning of the subsequent guarantee nodes is regarded as a typical Markov process; that is, the deployment of each guarantee node can be regarded as an action response a_i to the current network situation s_(i-1) (where i ∈ [1, K], and K is the total number of guarantee nodes, in this patent the sum of the primary and secondary nodes), which determines the site of one guarantee node;
S2.2.3, when all guarantee nodes have been sited and the plan meets the requirements, or when all guarantee nodes have been deployed, the situation is marked as the final state, and the network situation at the end of the planning is obtained.
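A minimal sketch of the situation encoding and one Markov deployment step, assuming an object-typed matrix with node symbols as entries (the function names and the symbol strings are illustrative only):

```python
import numpy as np

def initial_situation(n, user_positions):
    """s_0: an n x n situation plane; guaranteed-user cells carry their symbol, others are 0."""
    s = np.zeros((n, n), dtype=object)
    for (row, col), symbol in user_positions.items():
        s[row, col] = symbol          # e.g. "Q1", "Q2", "Q3"
    return s

def apply_deployment(situation, action, symbol):
    """One Markov step: deploy guarantee node `symbol` at grid cell `action` = (row, col)."""
    row, col = action
    assert situation[row, col] == 0, "position already occupied"
    nxt = situation.copy()
    nxt[row, col] = symbol            # e.g. "P1" or "P2"
    return nxt

s0 = initial_situation(5, {(1, 1): "Q1", (3, 4): "Q2"})
s1 = apply_deployment(s0, (2, 2), "P1")
```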
In this step, on the basis of step S1, the planning rules of the mobile communication network are abstracted and mapped and the overall mobile communication network simulation model is established, providing support for the subsequent deep reinforcement learning planning strategy.
S3, generating training samples: a network planning simulation is established according to the overall simulation model of step S2, the simulation is run with a Monte Carlo tree search method based on the upper confidence bound algorithm (UCT), and training samples are generated to form a training sample set for deep reinforcement learning;
S3.1, the network planning simulation is established according to the overall simulation model of step S2, and the positions of the guaranteed users are generated randomly during initial training;
S3.2, for the generated guaranteed user positions, simulated deployment is performed with the Monte Carlo tree search algorithm based on the upper confidence bound algorithm (UCT);
S3.2.1, the simulated deployment is initialized from an initial situation s_0, which is the root node of the search tree; for each action (s, a) of the search tree under a given situation, E(s, a) is initialized as the comprehensive action evaluation of each possible position of the guarantee node under that situation.
S3.2.2, before the neural network is introduced, the initial E(s, a) scores under all situations are equal and are set to r_0. The search proceeds by random traversal until all guarantee nodes have been deployed, i.e. the final state is reached; the result is then judged according to steps S1 and S2, and the action evaluation r of the deployment action a_i taken in each corresponding situation s_(i-1) is computed according to whether the final result meets the conditions: if it does, r = r_0 + r'; if not, r = r_0 - r'. The evaluations are normalized to obtain the evaluation set.
S3.3, the simulated deployment is repeated with the search method to obtain samples and an evaluation set that meet the conditions.
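As a hedged sketch of the sample-generation loop of S3.2–S3.3 before the neural network exists, the snippet below accumulates (situation, action) pairs with the evaluation r = r_0 ± r' of S3.2.2; all callables passed in (legal_actions, apply_action, is_valid_final_state) are placeholders for the simulation model of step S2, and the function and argument names are assumptions:

```python
import random

def generate_samples(initial_situation, legal_actions, apply_action, is_valid_final_state,
                     num_nodes, episodes, r0=0.0, r_delta=1.0):
    """Random-traversal simulated deployment (the pre-network stage of S3.2).

    legal_actions(s) -> list of candidate positions; apply_action(s, a) -> next situation;
    is_valid_final_state(s) -> True if the finished plan satisfies the rules of S1/S2.
    Returns a list of ((situation, action), evaluation r) pairs.
    """
    samples = []
    for _ in range(episodes):
        s = initial_situation
        trajectory = []
        for _ in range(num_nodes):                # deploy every guarantee node
            a = random.choice(legal_actions(s))   # random traversal before the network exists
            trajectory.append((s, a))
            s = apply_action(s, a)
        r = r0 + r_delta if is_valid_final_state(s) else r0 - r_delta
        samples.extend(((si, ai), r) for si, ai in trajectory)
    return samples
```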
S4, model training: based on a recurrent neural network, the overall simulation model of step S2 is trained with the training samples of step S3; the training results of each round are compared and screened, the obtained planning space strategy and real-time planning satisfaction are fed back to step S3, and the search results of the Monte Carlo tree search algorithm based on the upper confidence bound algorithm (UCT) are optimized to obtain optimized training samples;
S4.1, the planning situation is initialized and described with 6 planes in three categories, namely three planes for the guaranteed users Q, two planes for the guarantee nodes P and one plane for the erection region;
S4.2, the recurrent neural network first adopts 4 shared fully convolutional layers, constructing 32, 64, 128 and 256 filters of size 3 × 3 respectively, each followed by a ReLU function; the tail is divided into two branches, planning strategy and planning satisfaction. The strategy branch uses 4 dimension-reduction filters of size 1 × 1 and one fully connected layer with a softmax function to output the selection probability P of each node in the planning space; the satisfaction branch uses 2 dimension-reduction filters of size 1 × 1 and a fully connected layer with a tanh function to output a satisfaction score C in the range [0, 1], namely:
f_θ(s) = (P, C)
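The branch structure of S4.2 could be realized, for example, with the following PyTorch sketch; the padding, the fully connected layer sizes, the input grid size and everything beyond what the text states about the 6-plane input of S4.1 are assumptions, so this is an illustrative approximation rather than the patented implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlanningNet(nn.Module):
    """Shared fully convolutional trunk with a planning-strategy head and a satisfaction head."""

    def __init__(self, board_size: int, in_planes: int = 6):
        super().__init__()
        self.board_size = board_size
        # 4 shared 3x3 convolution layers with 32, 64, 128, 256 filters (ReLU after each).
        self.conv1 = nn.Conv2d(in_planes, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        # Strategy branch: 4 1x1 dimension-reduction filters + fully connected layer + softmax.
        self.policy_conv = nn.Conv2d(256, 4, kernel_size=1)
        self.policy_fc = nn.Linear(4 * board_size * board_size, board_size * board_size)
        # Satisfaction branch: 2 1x1 filters + fully connected layer + tanh-activated score.
        self.value_conv = nn.Conv2d(256, 2, kernel_size=1)
        self.value_fc = nn.Linear(2 * board_size * board_size, 1)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        p = F.relu(self.policy_conv(x)).flatten(1)
        p = F.softmax(self.policy_fc(p), dim=1)   # selection probability P over the planning space
        v = F.relu(self.value_conv(x)).flatten(1)
        c = torch.tanh(self.value_fc(v))          # satisfaction score C (rescaling to [0, 1] would be needed to match the text)
        return p, c

# f_theta(s) = (P, C) for one 6-plane situation on an assumed 21 x 21 grid.
net = PlanningNet(board_size=21)
P, C = net(torch.zeros(1, 6, 21, 21))
```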
S4.3, the planning strategy probability P and the satisfaction score C obtained in S4.2 are returned to S3.2 to refine the expansion process of the UCT tree search, and each action record is updated to (s, a) = (E(s, a), N(s, a), E_v(s, a), P(s, a));
S4.3.1, N(s, a) is the number of visits to the next node (child node) selected from the current situation;
S4.3.2, E_v(s, a) is the average action evaluation of the child node, which is updated in combination with the output of the neural network.
S4.4, the local strategy evaluation E_l(s, a) is defined: E_l(s, a) equals the parallel UCT search breadth constant U_puct (initialized to 3), multiplied by the strategy probability P(s, a) output by the recurrent neural network and by the square root of the parent node visit count N(s, b), divided by 1 plus the visit count N(s, a) of the child node, i.e.:
E_l(s, a) = U_puct · P(s, a) · √(Σ_b N(s, b)) / (1 + N(s, a))
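Stated as code, the local strategy evaluation of S4.4 (as reconstructed above) could look like this minimal sketch; the function and argument names are illustrative:

```python
import math

def local_strategy_evaluation(prior_p, parent_visits, child_visits, u_puct=3.0):
    """E_l(s, a) = U_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a)).

    prior_p: strategy probability P(s, a) from the neural network.
    parent_visits: total visit count of the parent node, sum over b of N(s, b).
    child_visits: visit count N(s, a) of the candidate child node.
    u_puct: parallel search breadth constant, initialized to 3 in the text.
    """
    return u_puct * prior_p * math.sqrt(parent_visits) / (1 + child_visits)

# During selection (step S4.5) the action maximizing E_v(s, a) + E_l(s, a) is taken.
```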
S4.5, combining the output of the recurrent neural network, all UCT search tree processes are updated to search, under a situation s_(i-1), for the deployment action a_i that maximizes E_v(s, a) + E_l(s, a). After a certain number of cycles of alternating search tree and neural network training, the search process of one UCT search tree is as follows:
S4.5.1, for the initial situation s_0 of the current guaranteed users, the deployment action a_1 with the maximum current E_v(s_0, a_1) + E_l(s_0, a_1) is selected and deployed;
S4.5.2, S4.5.1 is repeated until a situation s_i is reached for which no E_v + E_l values have been evaluated and no action can be selected; at this point the current situation s_i is fed into the neural network f_θ(s) for evaluation, yielding f_θ(s_i) = (P_i, C_i);
S4.5.3, the visit count of the current node is updated: N(s_i, a_(i+1)) = N(s_i, a_(i+1)) + 1;
S4.5.4, P_i is used to take the next deployment action a_(i+1), and S4.5.2 and S4.5.3 are repeated until the final state is reached;
S4.5.5, the search result of the whole tree is returned, the visit counts of each traversed node are updated according to S4.5.3, and the satisfaction scores of all child nodes are updated backwards from the leaf node, the satisfaction score being 0 if the conditions are not met and 1 if they are met;
S4.5.6, the average action evaluation of each node is calculated as in S4.3.2.
S4.6, following the whole flow of S4.5, for each situation s_i, and taking both the time used and effective results into consideration, the search tree process is run 800 times; finally a new site-selection strategy M, parameterized by a search constant τ, is determined from the collected actual action set {a_n} of the search tree. Here τ controls the randomness of the site selection: the larger τ is, the stronger the randomness. Because successive site selections are correlated to a certain extent, τ is set to decrease continuously as the site selection proceeds, finally stabilizing at 0.4.
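The text does not give the explicit form of M, only the role of τ; the sketch below therefore assumes the conventional visit-count exponentiation M(a) ∝ N(s, a)^(1/τ), which matches the described behaviour (larger τ gives more randomness), together with a simple annealing of τ down to 0.4. The names and the annealing schedule are assumptions:

```python
import numpy as np

def site_selection_strategy(visit_counts, tau):
    """Turn root visit counts N(s, a) from the repeated UCT searches into a selection distribution M.

    Assumed standard form: M(a) proportional to N(s, a)**(1 / tau); larger tau -> more randomness.
    """
    counts = np.asarray(visit_counts, dtype=float)
    scaled = counts ** (1.0 / tau)
    return scaled / scaled.sum()

def anneal_tau(step, total_steps, tau_start=1.0, tau_end=0.4):
    """Decrease tau continuously as site selection proceeds, stabilizing at 0.4 (per the text)."""
    frac = min(step / max(total_steps - 1, 1), 1.0)
    return tau_start + frac * (tau_end - tau_start)

probs = site_selection_strategy([120, 400, 280], tau=anneal_tau(step=0, total_steps=10))
```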
S5, model generation: the obtained optimized training samples are input into the training network of step S4, a joint loss function is constructed according to the training targets, the sample search and training are guided by the joint loss function, and the mobile communication network planning model is generated;
S5.1, a joint loss function Loss is constructed according to the training targets: minimize the error between the satisfaction C predicted by the neural network and the planning satisfaction C' found by the upper confidence bound search, make the strategy probability P output by the neural network as close as possible to the branch probability π obtained by the UCT tree search, and add a regularization term g||θ|| to prevent overfitting, giving the joint loss function Loss:
Loss = (C' - C)² - π^T·log P + g||θ||
where g||θ|| is the L2 norm term over the neural network variables;
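A hedged PyTorch sketch of this joint loss (the small epsilon inside the logarithm and the value of the regularization coefficient g are assumptions; the text writes g||θ||, implemented here as a squared-L2 weight penalty, as is conventional):

```python
import torch

def joint_loss(pred_c, search_c, pred_p, search_pi, parameters, g=1e-4):
    """Loss = (C' - C)^2 - pi^T log P + g * ||theta||^2 (L2 regularization term).

    pred_c: satisfaction C predicted by the network; search_c: satisfaction C' from the UCT search.
    pred_p: strategy probabilities P from the network; search_pi: branch probabilities pi from the search.
    parameters: iterable of network parameters theta; g: regularization coefficient (assumed value).
    """
    value_term = (search_c - pred_c).pow(2).mean()
    policy_term = -(search_pi * torch.log(pred_p + 1e-8)).sum(dim=1).mean()
    reg_term = g * sum(p.pow(2).sum() for p in parameters)
    return value_term + policy_term + reg_term
```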
S5.2, after every 50 training batches the obtained model is compared with the previous model, and the result is judged according to the simulation model rules: the model whose plan conforms to the guarantee rules wins; if neither conforms, the previous model parameters are kept; if both conform, the judgment is made according to the number of guarantee nodes used, and the model using fewer nodes is kept;
S5.3, continuing the training based on steps S4.1 and S4.2 to obtain the network planning model of the mobile communication network.
Referring to fig. 2, a block diagram of the structure of the present invention is shown, which specifically includes:
resource element preprocessing module 100: abstracting and mapping an erection region, a guarantee node and a guaranteed user of the mobile communication network, and establishing a simulation model of resource elements of the mobile communication network, which specifically comprises the following steps:
erection region preprocessing unit 101: preprocessing the erection region of the mobile communication network;
the safeguard node preprocessing unit 102: preprocessing guarantee nodes of a mobile communication network;
secured user preprocessing unit 103: preprocessing guaranteed users of the mobile communication network;
planning rule preprocessing module 200: abstracting and mapping the guarantee relationship and the planning state of the mobile communication network, fusing a resource element simulation model of the resource element preprocessing module 100, and establishing an overall simulation model of the mobile communication network planning, which specifically comprises the following steps:
the connection relationship preprocessing unit 201: preprocessing the connection relation of the mobile communication network;
the plan state preprocessing unit 202: preprocessing the planning state of the mobile communication network;
the training sample generation module 300: establishing network planning simulation according to the overall simulation model of the planning rule preprocessing module 200, and adopting a search method to run simulation, generating training samples and forming a training sample set for deep reinforcement learning, wherein the method specifically comprises the following steps:
network planning simulation setup unit 301: establishing network planning simulation according to an overall simulation model of the planning rule preprocessing module 200, and randomly generating the position of a guaranteed user during initial training;
the simulation deployment unit 302: performing simulated deployment by using a search algorithm according to the generated guaranteed user position;
the sample and evaluation set generation unit 303: repeatedly simulating deployment by using a search method to obtain a sample and an evaluation set which meet conditions;
model training module 400: based on the recurrent neural network, the training samples of the training sample generation module 300 are used to train the overall simulation model of the planning rule preprocessing module 200; the training results of each round are compared and screened, the obtained planning space strategy and real-time planning satisfaction are fed back to the training sample generation module 300, and the search results of the search algorithm are optimized to obtain optimized training samples, which specifically comprises:
planning situation initialization unit 401: initializing and using three major categories of elements to describe the planning situation;
filter construction unit 402: the recurrent neural network adopts a shared fully convolutional network to construct the filters, with the tail of the network divided into two branches, planning strategy and planning satisfaction;
search process refinement unit 403: feeding back the results of the filter construction unit 402 to the simulation deployment unit 302, and refining the search process;
local policy evaluation definition unit 404: defining local strategy evaluation;
search process update unit 405: combining the output of the recurrent neural network and updating all search processes to search for the deployment action with the maximum value;
new site-selection strategy determination unit 406: according to the flow of the search process update unit 405, executing the search process for each situation with consideration of the time used and effective results, and determining a new site-selection strategy;
the model generation module 500: inputting the obtained optimized training sample into a training network of the model training module 400, constructing a joint loss function according to a training target, searching and training the sample according to joint loss function instructions, and generating a mobile communication network planning model, which specifically comprises the following steps:
joint loss function construction unit 501: constructing a joint loss function according to the training target;
the result evaluation unit 502: comparing the model after training with the model before training, and judging the result according to the simulation model rule;
the model generation unit 503: training based on a planning situation initialization unit 401 and a filter construction unit 402 to obtain a mobile communication network planning model;
the network planning module 600: applying the trained network planning model, inputting the parameters of the erection region, the guarantee nodes and the guaranteed users to obtain the planning parameters of the mobile communication network, which specifically comprises:
network planning element input section 601: inputting parameters of an erection region, a guarantee node and a guaranteed user;
model operation section 602: calling the trained network planning model for operation;
network planning parameter generation unit 603: the model generates network planning parameters.