CN111104732B - Intelligent planning method for mobile communication network based on deep reinforcement learning


Info

Publication number
CN111104732B
CN111104732B (application CN201911219452.8A)
Authority
CN
China
Prior art keywords
planning
mobile communication
communication network
training
model
Prior art date
Legal status
Active
Application number
CN201911219452.8A
Other languages
Chinese (zh)
Other versions
CN111104732A (en)
Inventor
杨若鹏
聂宗哲
殷昌盛
江尚
朱巍
邹小飞
张其增
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201911219452.8A
Publication of CN111104732A
Application granted
Publication of CN111104732B
Legal status: Active

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a method and a device for intelligent planning of a mobile communication network based on deep reinforcement learning, wherein the method comprises the following steps: 1. resource element preprocessing: preprocessing resource elements of the mobile communication network such as the guarantee nodes, the guaranteed users and the erection region; 2. planning rule preprocessing: preprocessing the planning rules of the mobile communication network; 3. training sample generation: performing random Monte Carlo search and calculation on the preprocessing results to generate training samples; 4. model training: training the network planning model with the training samples based on a recurrent neural network; 5. model generation: constructing a joint loss function, repeatedly searching and training samples as the joint loss function indicates, and generating the mobile communication network planning model. The method and device effectively solve the problems that current mobile communication network planning depends heavily on manual work, planning time exceeds task requirements, adaptability to sudden tasks and unfamiliar environments is poor, and resource utilization is low, and they improve the overall efficiency of mobile communication network planning.

Description

Intelligent planning method for mobile communication network based on deep reinforcement learning
Technical Field
The invention relates to the technical field of information, in particular to an intelligent planning method for mobile communication networks.
Background
The mobile communication network generally refers to a mobile communication network used in special fields to guarantee large-scale special tasks. It is a comprehensive mobile network generally composed of multiple sub-networks and multiple kinds of equipment, such as a fixed optical-fiber network, a microwave network, a satellite network, an airborne relay network, and short-wave and ultrashort-wave radio station networks; its minimum unit is a single communication guarantee platform or device, regarded as a guarantee node in the mobile communication network. The number of users guaranteed by a mobile communication network is in the hundreds or more, the demand for rapid erection is strong, and the time available is short, with planning to be completed within 24 hours or less.
Network planning means that network planners or technical support personnel make full use of the existing system equipment, balancing actual requirements against contradictions and obstacles, to plan and organize the network erection of the mobile communication network so as to ensure completion of the current task. This patent mainly concerns the site selection of each network system device of the mobile communication network and the design of the network architecture, supporting the erection and deployment of equipment according to the task personnel and groups, the various devices supplied for the task, the connection relationships, and the geographical environment that the mobile communication network must guarantee.
Because the mobile communication network is generally used to guarantee various sudden tasks at times and places that are difficult to predict, its network planning is characterized by widely varying requirements, complex content, limited equipment conditions and urgent time constraints. At present, mobile communication network planning usually combines a large amount of manual work with fixed-algorithm systems. Manual planning requires professional planners to accumulate extensive experience before they are competent for the work, needs many staff, and suffers from long planning times and frequent data interaction. Fixed-algorithm network planning systems can assist planners to a certain extent, but they cannot be applied flexibly to every concrete scenario in which the mobile communication network may be deployed, and cannot cope with different geographic environments, equipment limitations and other conditions without low-level modification. Because such systems focus mainly on network design, when the network scale grows and the constraint conditions multiply they cannot produce intuitive and accurate planning results; they provide only limited auxiliary support to planners, which affects the task guarantee effect.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to solve the practical problems of mobile communication network planning, such as complex conditions, urgent time, uncertain locations and limited equipment, by providing an intelligent mobile communication network planning method based on deep reinforcement learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
an intelligent planning method for a mobile communication network based on deep reinforcement learning comprises the following steps:
s1, preprocessing resource elements, abstracting and mapping the erection region, the guarantee nodes and the guaranteed users of the mobile communication network, and establishing a simulation model of the resource elements of the mobile communication network;
s1.1, preprocessing an erection region of a mobile communication network;
s1.2, preprocessing a guarantee node of a mobile communication network;
s1.3, preprocessing the guaranteed user of the mobile communication network.
S2, planning rule preprocessing, abstracting and mapping the guarantee relationship and the planning state of the mobile communication network, fusing the resource element simulation model of the step S1, and establishing an integral simulation model of the mobile communication network planning;
s2.1, preprocessing the connection relation of the mobile communication network;
s2.2, preprocessing the planning state of the mobile communication network.
S3, training sample generation: establishing network planning simulation according to the overall simulation model of step S2, and running the simulation with a Monte Carlo tree search method based on the upper confidence bound algorithm (UCT) to generate training samples and form a training sample set for deep reinforcement learning;
s3.1, establishing network planning simulation according to the integral simulation model of the step S2, and during initial training, randomly generating the position of a guaranteed user;
s3.2, correspondingly generating the positions of the guaranteed users, and performing simulated deployment by using a search algorithm;
and S3.3, repeatedly simulating deployment by using a searching method to obtain a sample and an evaluation set meeting the conditions.
S4, model training: based on a deep reinforcement learning algorithm such as a recurrent neural network, training the overall simulation model of step S2 with the training samples of step S3, comparing and screening the training results of each round, feeding the obtained planning space strategy and real-time planning satisfaction back to step S3, and optimizing the search result of the Monte Carlo tree search algorithm based on the upper confidence bound algorithm (UCT) to obtain optimized training samples;
s4.1, initializing and using three types of elements to describe a planning situation;
s4.2, constructing a filter (filter) by adopting a public full convolution network through the recurrent neural network, and dividing the tail part into two branches of a planning strategy and a planning satisfaction degree;
s4.3, feeding back the result of the step S4.2 to the step 3.2, and refining the searching process;
s4.4, defining local strategy evaluation;
s4.5, combining the output of the recurrent neural network, and updating all the searching processes into the deployment actions for searching the maximum value;
and S4.6, according to the process of the step S4.5, combining the time and effective results for each situation, executing a search process and determining a new address selection strategy.
S5, model generation, inputting the obtained optimized training sample into the training network in the step S4, constructing a joint loss function according to a training target, searching and training the sample according to joint loss function instructions, and generating a mobile communication network planning model;
s5.1, constructing a joint loss function according to the training target;
s5.2, comparing the model after training with the model before training, and judging the result according to the simulation model rule;
and S5.3, training based on the steps S4.1 and S4.2 to obtain a mobile communication network planning model.
The invention adopts the intelligent planning method of the mobile communication network based on the deep reinforcement learning, and has the advantages that:
1. the Monte Carlo tree search method based on the upper confidence bound algorithm (UCT) is adopted and combined with a recurrent neural network that is simple in structure yet practical and effective, which greatly reduces the hardware computing-power requirement and processing time, so that the network planning problem of the mobile communication network can be solved quickly;
2. by training the intelligent planning model with a deep reinforcement learning algorithm, the planning model overcomes the defect of a single applicable scenario and can adapt to scenarios with different regions, different guarantee equipment and different guaranteed users.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a detailed flow diagram of an embodiment of the intelligent planning method for a mobile communication network based on deep reinforcement learning according to the present invention;
fig. 2 is a block diagram of the composition structure of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made with reference to the accompanying drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, and is not intended to limit the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Referring to the attached fig. 1, a schematic flow chart of an embodiment of the intelligent planning method for the mobile communication network based on deep reinforcement learning of the present invention is shown, which specifically includes the following steps:
s1, preprocessing resource elements, abstracting and mapping the erection region, the guarantee nodes and the guaranteed users of the mobile communication network, and establishing a simulation model of the resource elements of the mobile communication network;
S1.1, preprocessing the erection region of the mobile communication network, abstracting the region by analogy to a chessboard: the region size is set to N² km²; taking the lower-left corner of the erection region topographic map as the zero coordinate and a certain divisor of N as the unit length, the region is divided transversely and longitudinally, and each intersection point serves as a positioning point, yielding a node position matrix;
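As an illustration of this discretization, the following minimal Python sketch builds the positioning lattice; the function name and the concrete region size and unit length are assumptions, not part of the disclosure.

```python
import numpy as np

def build_position_grid(region_size_km: float, unit_km: float) -> np.ndarray:
    """Discretize an N x N km erection region into candidate positioning points.

    unit_km is assumed to be a divisor of region_size_km; the lower-left
    corner of the topographic map is the zero coordinate, and every
    transverse/longitudinal grid intersection becomes a positioning point.
    """
    ticks = np.arange(0.0, region_size_km + unit_km, unit_km)
    xs, ys = np.meshgrid(ticks, ticks)
    return np.stack([xs, ys], axis=-1)  # node position matrix of (x, y) pairs

grid = build_position_grid(region_size_km=20.0, unit_km=2.0)
print(grid.shape)  # (11, 11, 2): an 11 x 11 lattice of candidate points
```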
S1.2, preprocessing the communication platforms/devices of the mobile communication network, i.e. the guarantee nodes (such as mobile communication vehicles, mobile radio stations, mobile stations and the like): P types of guarantee nodes are preset, and the communication distance R and the link quantity L of each guarantee node are determined by the specific equipment model. In this patent the guarantee nodes are divided into two main categories, primary nodes P1 and secondary nodes P2, modeled in order of the guarantee priority B, with the primary node priority set to B1 and the secondary node priority to B2. The communication guarantee range of a primary node is a circle centered on the node's deployment position with the single-hop microwave communication distance R1 km as radius, and its number of links is set to L1; the communication guarantee range of a secondary node is a circle centered on the node's deployment position with the single-hop microwave communication distance R1 km and the single-hop shortwave communication distance R2 km as radii, with the number of microwave links set to L2 and the number of shortwave links to L'2.
S1.3, preprocessing the guaranteed users of the mobile communication network (such as regiment-, battalion- and company-level units and individual soldiers at different levels): Q types of guaranteed users are preset, and the communication distance R and the link quantity L of each guaranteed user node are determined by the specific equipment model. In this patent the guaranteed users are divided into three main categories, primary users Q1, secondary users Q2 and subordinate users Q3, modeled in order of the guarantee priority A, with the primary user priority set to A1, the secondary user to A2 and the subordinate user to A3. The single-hop microwave communication distance of a primary user is R1 km with U1 links; the single-hop microwave communication distance of a secondary user is R1 km with U2 links; the single-hop shortwave communication distance of a subordinate user is R2 km with U'3 links.
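The preprocessed resource elements of S1.2 and S1.3 can be pictured as simple records; the sketch below is one possible encoding, with all concrete parameter values hypothetical (real R and L figures depend on the equipment models).

```python
from dataclasses import dataclass

@dataclass
class GuaranteeNode:
    """Guarantee node of type P1 (primary) or P2 (secondary)."""
    kind: str                  # 'P1' or 'P2'
    priority: int              # guarantee priority B1 or B2
    microwave_range_km: float  # single-hop microwave distance R1
    microwave_links: int       # L1 (primary) or L2 (secondary)
    shortwave_range_km: float = 0.0  # single-hop shortwave distance R2 (secondary only)
    shortwave_links: int = 0         # L'2 (secondary only)

@dataclass
class GuaranteedUser:
    """Guaranteed user of type Q1, Q2 or Q3."""
    kind: str             # 'Q1', 'Q2' or 'Q3'
    priority: int         # guarantee priority A1, A2 or A3
    comm_range_km: float  # R1 (microwave) or R2 (shortwave)
    links: int            # U1, U2 or U'3

# Hypothetical equipment parameters -- real R/L values depend on device models.
p1 = GuaranteeNode('P1', priority=1, microwave_range_km=30.0, microwave_links=8)
q3 = GuaranteedUser('Q3', priority=3, comm_range_km=50.0, links=1)
```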
The resource elements of the mobile communication network are abstracted and mapped, and support is provided for subsequent completion of rules and integral modeling of the mobile communication network.
S2, planning rule preprocessing: abstracting and mapping the guarantee relationship and the planning state of the mobile communication network, fusing the resource element simulation model of step S1, and establishing an overall simulation model of the mobile communication network planning;
S2.1, preprocessing the connection relations of the mobile communication network, associating guarantee nodes with one another and with the guaranteed users;
S2.1.1, associating the guarantee nodes with the guaranteed users according to the priority association A → B to determine the guarantee relationship. In this patent, A1 corresponds to B1 and A2, A3 correspond to B2, i.e. the primary node P1 guarantees the primary user Q1 and the secondary node P2 guarantees the secondary user Q2 and the subordinate user Q3; each user needs at least one corresponding guarantee node connected to it;
S2.1.2, determining the connection relations between the guarantee nodes: in this patent, all primary nodes need to form a connected graph, and each secondary node P2 must be connected with at least one primary node P1;
S2.1.3, all connections need to satisfy the communication type specified in step S1, i.e. only links of the same communication type can be connected;
S2.1.4, all connections need to satisfy the link quantity specified in step S1, i.e. the number of connections at a node cannot exceed its specified number of links L;
S2.1.5, all connections need to satisfy the communication distance specified in step S1, i.e. the distance between any two nodes must be less than the maximum communication distance R of the communication equipment used for them to be connectable (a sketch of these per-link checks is given after this list);
S2.1.6, the minimum requirement on the topological structure of the whole mobile communication network is that it forms a minimum spanning tree;
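A minimal sketch of the per-link checks S2.1.3–S2.1.5, assuming a plain dictionary representation of nodes (the field names are illustrative, not the patent's data model):

```python
import math

def can_connect(a: dict, b: dict, link_type: str) -> bool:
    """Check one candidate link against rules S2.1.3-S2.1.5.

    a and b are assumed to carry, per communication type, a maximum range
    'range_km', a link budget 'links' and a count of links already 'used',
    plus a planar position 'pos'.
    """
    # S2.1.3: only links of the same communication type can be connected.
    if link_type not in a['range_km'] or link_type not in b['range_km']:
        return False
    # S2.1.4: neither endpoint may exceed its specified link quantity L.
    if a['used'].get(link_type, 0) >= a['links'][link_type]:
        return False
    if b['used'].get(link_type, 0) >= b['links'][link_type]:
        return False
    # S2.1.5: distance must be below the maximum communication distance R
    # of both devices involved.
    return math.dist(a['pos'], b['pos']) < min(a['range_km'][link_type],
                                               b['range_km'][link_type])
```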
S2.2, preprocessing the planning state of the mobile communication network: a network situation s is established from the guarantee nodes, the guaranteed users, the erection region and the network planning rules. The network situation s comprises all information of the mobile communication network, i.e. s = (P, Q, A, B, R, L, …), but the main plane describes the planning position of each node: a planned position is occupied by that node's symbol and an unplanned position is marked 0, so s takes the form of a matrix over the positioning grid whose entries are node symbols or 0.
S2.2.1, the initial situation of the network situation s is marked s0. It mainly describes the planning positions of all guaranteed user nodes: the positions of the guaranteed users in the erection region model are determined directly from their actual task requirements, giving a matrix in which the positions of the guaranteed user nodes are represented by symbols from the guaranteed user set Q and all other entries are 0.
S2.2.2, the planning of the subsequent guarantee nodes is regarded as a typical Markov process: the deployment of each guarantee node is an action a_i responding to the current network situation s_{i-1}, where i ∈ [1, K] and K is the total number of guarantee nodes (in this patent, the sum of the primary and secondary nodes). Each action is the determination of the site of one guarantee node, i.e. writing that node's symbol at its chosen position in the situation matrix.
S2.2.3, when all guarantee nodes have been sited and the planning requirements are met, or when every node has been placed, the situation is marked as terminal, and the terminal network situation s_K is obtained.
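To make this state representation and its Markov transition concrete, the sketch below encodes a small situation matrix; the integer codes standing for node symbols are assumptions:

```python
import numpy as np

# Assumed symbol coding: 0 = unplanned, 1-3 = users Q1-Q3, 4-5 = nodes P1-P2.
s0 = np.zeros((5, 5), dtype=int)   # initial situation on a 5 x 5 grid
s0[0, 1] = 1                       # a primary user Q1 fixed by task requirements
s0[3, 4] = 2                       # a secondary user Q2
s0[4, 0] = 3                       # a subordinate user Q3

def apply_action(s: np.ndarray, pos: tuple, code: int) -> np.ndarray:
    """One Markov step a_i: site a guarantee node at an empty grid point."""
    assert s[pos] == 0, "position already planned"
    nxt = s.copy()
    nxt[pos] = code
    return nxt

s1 = apply_action(s0, (2, 2), 4)   # deploy a primary node P1 at the centre
```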
In the step, on the basis of the step S1, the planning rule of the mobile communication network is abstracted and mapped, and an overall mobile communication network simulation model is established, so as to provide support for the subsequent deep reinforcement learning planning strategy.
S3, training sample generation: establishing network planning simulation according to the overall simulation model of step S2, and running the simulation with a Monte Carlo tree search method based on the upper confidence bound algorithm (UCT), generating training samples and forming a training sample set for deep reinforcement learning;
s3.1, establishing network planning simulation according to the overall simulation model of the step S2, and during initial training, randomly generating the positions of the guaranteed users;
S3.2, with the guaranteed user positions generated, performing simulated deployment using the Monte Carlo tree search algorithm based on the upper confidence bound algorithm (UCT);
S3.2.1, initialized simulated deployment starts from the initial situation s0, whose state is the root node of the search tree; every action (s, a) of the search tree taken from a given situation is initialized, where E(s, a) is the comprehensive action evaluation of each possible siting position of a guarantee node in that situation.
S3.2.2, before a neural network is introduced, the initial scores E(s, a) are equal in all situations and are set to r0. The search proceeds by random traversal until all guarantee nodes have been placed, i.e. until the terminal state is reached; judgment is then made according to steps S1 and S2, and the action evaluation r of each deployment action a_i in its corresponding situation s_{i-1} is computed from whether the terminal result satisfies the conditions: if satisfied, r = r0 + r'; if not, r = r0 - r'. Normalization then yields an evaluation set of tuples of the form (s_{i-1}, a_i, r).
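The random traversal of S3.2.2 can be sketched as follows, where the callables legal_actions, step, is_terminal and satisfies_rules are assumed stand-ins for the simulation model of steps S1 and S2:

```python
import random

def random_rollout(s0, legal_actions, step, is_terminal, satisfies_rules,
                   r0=0.5, r_delta=0.5):
    """One Monte Carlo rollout: deploy guarantee nodes at random until the
    terminal state, then score every (situation, action) pair on the path
    with r = r0 + r' if the terminal layout satisfies the rules, and
    r = r0 - r' otherwise. Returns (s_{i-1}, a_i, r) training tuples."""
    path, s = [], s0
    while not is_terminal(s):
        a = random.choice(legal_actions(s))
        path.append((s, a))
        s = step(s, a)
    r = r0 + r_delta if satisfies_rules(s) else r0 - r_delta
    return [(si, ai, r) for (si, ai) in path]
```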
And S3.3, repeatedly simulating deployment by using a search method to obtain a sample and an evaluation set which meet the conditions.
S4, model training: based on a recurrent neural network, training the overall simulation model of step S2 with the training samples of step S3, comparing and screening the training results of each round, feeding the obtained planning space strategy and per-step real-time planning satisfaction back to step S3, and optimizing the search result of the Monte Carlo tree search algorithm based on the upper confidence bound algorithm (UCT) to obtain optimized training samples;
S4.1, a planning situation is initialized and described using 6 planes in three categories: three planes for the guaranteed users Q, two planes for the guarantee nodes P, and one plane for the erection region;
S4.2, the recurrent neural network first uses 4 shared full convolution layers, constructing 32, 64, 128 and 256 filters of size 3 × 3 with ReLU activations. The tail is divided into a planning strategy branch and a planning satisfaction branch: the strategy branch uses 4 dimension-reducing 1 × 1 filters and a fully connected layer with a softmax function to output the selection probability P of each node position in the planning space; the satisfaction branch uses 2 dimension-reducing 1 × 1 filters and a fully connected layer with a tanh function to output a satisfaction score C in the range [0, 1], i.e.:
f_θ(s) = (P, C)
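A minimal PyTorch sketch of this two-headed architecture; the grid size, the 6 input planes of S4.1 and the padding are assumptions made to keep the example self-contained:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlanningNet(nn.Module):
    """f_theta(s) = (P, C): four shared 3x3 conv layers (32/64/128/256
    filters, ReLU), a policy head (4 1x1 filters -> softmax over grid
    points) and a satisfaction head (2 1x1 filters -> tanh scalar)."""
    def __init__(self, grid: int = 11, in_planes: int = 6):
        super().__init__()
        chans = [in_planes, 32, 64, 128, 256]
        self.trunk = nn.ModuleList(
            nn.Conv2d(chans[i], chans[i + 1], 3, padding=1) for i in range(4))
        self.policy_conv = nn.Conv2d(256, 4, 1)
        self.policy_fc = nn.Linear(4 * grid * grid, grid * grid)
        self.value_conv = nn.Conv2d(256, 2, 1)
        self.value_fc = nn.Linear(2 * grid * grid, 1)

    def forward(self, s):
        x = s
        for conv in self.trunk:
            x = F.relu(conv(x))
        p = F.softmax(self.policy_fc(self.policy_conv(x).flatten(1)), dim=1)
        c = torch.tanh(self.value_fc(self.value_conv(x).flatten(1)))
        return p, c   # P: siting probabilities, C: satisfaction score
```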
S4.3, returning the planning strategy probability P and the satisfaction score C obtained in S4.2 to S3.2, refining the expansion process of the UCT tree search, and updating the record kept for each action to (s, a) = (E(s, a), N(s, a), E_v(s, a), P(s, a));
S4.3.1, N(s, a) is the visit count of the next node (child node) selected from the current situation;
S4.3.2, E_v(s, a) is the average action evaluation, initially the mean of the rollout evaluations r accumulated through (s, a); after being combined with the output of the neural network it is updated to the running mean of the backed-up satisfaction scores,
E_v(s, a) = (1 / N(s, a)) · Σ C.
S4.4, defining the local strategy evaluation E_l(s, a): E_l(s, a) equals the parallel UCT search horizon constant U_puct (initialized to 3), times the strategy probability P(s, a) output by the recurrent neural network, times the square root of the parent node's visit count Σ_b N(s, b), divided by 1 plus the visit count N(s, a) of the child node:
E_l(s, a) = U_puct · P(s, a) · √(Σ_b N(s, b)) / (1 + N(s, a)).
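Transcribed directly, the local strategy evaluation reads:

```python
import math

def local_policy_eval(p_sa: float, n_parent: int, n_sa: int,
                      u_puct: float = 3.0) -> float:
    """E_l(s,a) of S4.4: exploration term built from the network prior
    P(s,a), the parent visit count and the child visit count."""
    return u_puct * p_sa * math.sqrt(n_parent) / (1 + n_sa)
```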
S4.5, after the output of the recurrent neural network is incorporated, the UCT search tree flow is updated throughout to search, in a situation s_{i-1}, for the deployment action a_i that maximizes E_v(s, a) + E_l(s, a). After a certain number of rounds of cyclic training of the search tree and the neural network, one pass of the UCT search tree proceeds as follows:
S4.5.1, for the initial situation s0 of the currently guaranteed users, select and deploy the action a1 with the largest current E_v(s0, a1) + E_l(s0, a1);
S4.5.2, repeat 4.5.1 until some situation s_i has no evaluated E_v + E_l value and no selection can be made; the current situation s_i is then fed into the neural network f_θ(s) for evaluation, giving f_θ(s_i) = (P_i, C_i);
S4.5.3, update the visit count of the current node: N(s_i, a_{i+1}) = N(s_i, a_{i+1}) + 1;
S4.5.4, use P_i to make the next deployment action a_{i+1}, and repeat 4.5.2 and 4.5.3 until the terminal state is reached;
S4.5.5, return the search result of the whole tree, update the visit count of every traversed node according to 4.5.3, and back up the satisfaction scores of all child nodes from the leaf nodes, a leaf that fails the planning rules scoring 0 and one that satisfies them scoring 1;
S4.5.6, calculate the average action evaluation of each node according to S4.3.2:
E_v(s, a) = (1 / N(s, a)) · Σ C, the mean of the satisfaction scores backed up through (s, a).
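The selection and back-up steps of S4.5 might be sketched as below, assuming each tree edge stores a dict with visit count N, cumulative score W (an auxiliary field), mean evaluation Ev and prior P:

```python
import math

def select_action(children: dict, u_puct: float = 3.0):
    """S4.5.1: pick the deployment action maximising E_v(s,a) + E_l(s,a)."""
    n_parent = sum(c['N'] for c in children.values())
    def score(c):
        e_l = u_puct * c['P'] * math.sqrt(n_parent) / (1 + c['N'])
        return c['Ev'] + e_l
    return max(children, key=lambda a: score(children[a]))

def backup(path: list, c_leaf: float) -> None:
    """S4.5.3 and S4.5.5-S4.5.6: increment visit counts along the search
    path and refresh the running-mean evaluation with the leaf score."""
    for edge in path:
        edge['N'] += 1
        edge['W'] += c_leaf
        edge['Ev'] = edge['W'] / edge['N']
```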
S4.6, following the whole flow of S4.5 and weighing time spent against effective results, the tree search is run 800 times for each situation s_i; the new siting strategy M is then determined from the collected set of actual search-tree actions {a_n} as the visit-count distribution
M(a) = N(s, a)^(1/τ) / Σ_b N(s, b)^(1/τ),
where τ is a search constant controlling the randomness of site selection: the larger τ, the stronger the randomness. Because successive siting decisions are correlated, τ is set to decrease continuously as site selection proceeds, finally stabilizing at 0.4.
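Read as the usual visit-count temperature scheme, the siting strategy M can be sketched as follows (the original formula image is not reproduced in the text, so this form is an assumption consistent with the stated role of τ):

```python
import numpy as np

def siting_policy(visit_counts: np.ndarray, tau: float) -> np.ndarray:
    """M(a) proportional to N(s,a)^(1/tau); larger tau, more randomness."""
    scaled = visit_counts ** (1.0 / tau)
    return scaled / scaled.sum()

counts = np.array([120.0, 500.0, 180.0])  # visits after 800 tree searches
print(siting_policy(counts, tau=1.0))     # early in siting: exploratory
print(siting_policy(counts, tau=0.4))     # final value: sharply peaked
```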
S5, model generation, inputting the obtained optimized training sample into the training network in the step S4, constructing a joint loss function according to a training target, searching and training the sample according to joint loss function instructions, and generating a mobile communication network planning model;
S5.1, constructing the joint Loss function according to the training target: to minimize the neural network's error in predicting the satisfaction C against the UCT-search planning satisfaction C', to make the strategy probability P output by the neural network as close as possible to the branch probability π obtained by the UCT tree search, and to add a control term g‖θ‖ to prevent overfitting, the joint loss function is obtained:
Loss = (C' − C)² − π^T log P + g‖θ‖
where g‖θ‖ is the penalty on the L2 norm of the neural network variables θ;
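A PyTorch transcription of this joint loss; the regularisation weight g and the numerical epsilon are assumed hyperparameters:

```python
import torch

def joint_loss(c_search, c_net, pi, p_net, params, g: float = 1e-4):
    """Loss = (C' - C)^2 - pi^T log P + g * ||theta|| over a batch.

    c_search: UCT search satisfaction C'; c_net: network satisfaction C;
    pi: search branch probabilities; p_net: network policy P;
    params: iterable of network parameter tensors theta."""
    value_term = (c_search - c_net).pow(2).mean()
    policy_term = -(pi * torch.log(p_net + 1e-8)).sum(dim=1).mean()
    l2 = torch.sqrt(sum((w ** 2).sum() for w in params))  # L2 norm of theta
    return value_term + policy_term + g * l2
```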
S5.2, after every 50 training batches, the newly obtained model is compared with the previous model and judged according to the simulation model rules: the model whose planning result satisfies the guarantee rules wins; if neither satisfies them, the comparison is a draw and the previous model parameters are kept; if both satisfy them, the judgment is made by the number of guarantee nodes used, and the model using fewer nodes is retained;
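The gating rule of S5.2 amounts to the following comparison (a sketch; each boolean records whether that model's plan satisfied the guarantee rules):

```python
def keep_new_model(old_ok: bool, new_ok: bool,
                   old_nodes: int, new_nodes: int) -> bool:
    """Applied after every 50 training batches (S5.2)."""
    if new_ok and not old_ok:
        return True                    # new model wins under the rules
    if not new_ok and not old_ok:
        return False                   # both fail: keep previous parameters
    if new_ok and old_ok:
        return new_nodes < old_nodes   # both pass: fewer guarantee nodes wins
    return False                       # only the old model satisfies the rules
```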
and S5.3, continuously training based on the steps S4.1 and S4.2 to obtain a network planning model of the mobile communication network.
Referring to fig. 2, a block diagram of the structure of the present invention is shown, which specifically includes:
the resource element preprocessing module 100: abstracting and mapping an erection region, a guarantee node and a guaranteed user of the mobile communication network, and establishing a simulation model of resource elements of the mobile communication network, wherein the simulation model specifically comprises the following steps:
erection region preprocessing unit 101: preprocessing the erection region of the mobile communication network;
the safeguard node preprocessing unit 102: preprocessing guarantee nodes of a mobile communication network;
secured user preprocessing unit 103: preprocessing guaranteed users of the mobile communication network;
planning rule preprocessing module 200: abstracting and mapping the guarantee relationship and the planning state of the mobile communication network, fusing a resource element simulation model of the resource element preprocessing module 100, and establishing an overall simulation model of the mobile communication network planning, which specifically comprises the following steps:
the connection relationship preprocessing unit 201: preprocessing the connection relation of the mobile communication network;
the plan state preprocessing unit 202: preprocessing the planning state of the mobile communication network;
the training sample generation module 300: establishing network planning simulation according to the overall simulation model of the planning rule preprocessing module 200, and adopting a search method to run simulation, generating training samples and forming a training sample set for deep reinforcement learning, wherein the method specifically comprises the following steps:
network planning simulation setup unit 301: establishing network planning simulation according to an overall simulation model of the planning rule preprocessing module 200, and randomly generating the position of a guaranteed user during initial training;
the simulation deployment unit 302: performing simulated deployment by using a search algorithm according to the generated guaranteed user position;
the sample and evaluation set generation unit 303: repeatedly simulating deployment by using a search method to obtain a sample and an evaluation set which meet conditions;
model training module 400: based on the recurrent neural network, the training sample of the training sample generation module 300 is used to train the whole simulation model of the planning rule preprocessing module 200, the training results of each time are compared and screened, the obtained planning space strategy and step real-time planning satisfaction are fed back to the training sample generation module 300, the search result of the search algorithm is optimized, and the optimized training sample is obtained, which specifically comprises:
planning situation initialization unit 401: initializing and using three major element description planning situations;
filter configuration unit 402: the recurrent neural network adopts a public full convolution network to construct a filter (filter), and the tail part of the filter is divided into two branches of a planning strategy and a planning satisfaction degree;
search process refinement unit 403: feeding back the results of the filter construction unit 402 to the simulation deployment unit 302, and refining the search process;
local policy evaluation definition unit 404: defining local strategy evaluation;
search procedure update unit 405: combining the output of the recurrent neural network, and updating all the search processes into the deployment action for searching the maximum value;
new addressing policy determination unit 406: according to the flow of the search process updating unit 405, the search flow is executed for each situation in combination with the time and effective results, and a new address selection strategy is determined;
the model generation module 500: inputting the obtained optimized training sample into a training network of the model training module 400, constructing a joint loss function according to a training target, searching and training the sample according to joint loss function instructions, and generating a mobile communication network planning model, which specifically comprises the following steps:
joint loss function construction unit 501: constructing a joint loss function according to the training target;
result evaluation unit 502: comparing the model after training with the model before training, and judging the result according to the simulation model rule;
model generation unit 503: training based on a planning situation initialization unit 401 and a filter construction unit 402 to obtain a mobile communication network planning model;
the network planning module 600: inputting the parameters of an erection region, a guarantee node and a guaranteed user by applying a trained network planning model to obtain the planning parameters of the mobile communication network, and specifically comprising the following steps:
network planning element input section 601: inputting erection region, guarantee node and guaranteed user parameters;
model operation section 602: calling the trained network planning model for operation;
network planning parameter generation unit 603: the model generates network planning parameters.

Claims (9)

1. An intelligent planning method for a mobile communication network based on deep reinforcement learning is characterized by comprising the following steps:
s1, preprocessing resource elements, abstracting and mapping the construction region, the guarantee nodes and the guaranteed users of the mobile communication network, and establishing a simulation model of the resource elements of the mobile communication network;
s2, preprocessing planning rules, abstracting and mapping the guarantee relationship and the planning state of the mobile communication network, fusing the resource element simulation model of the step S1, and establishing an integral simulation model of the mobile communication network planning;
s3, generating training samples, establishing network planning simulation according to the integral simulation model in the step S2, and operating simulation by adopting a search method to generate the training samples and form a training sample set for deep reinforcement learning;
s4, model training, based on a deep reinforcement learning algorithm, training the overall simulation model of the step S2 by using the training sample of the step S3, comparing and screening the training results of each time, feeding back the obtained planning space strategy and the real-time planning satisfaction degree of the step to the step S3, and optimizing the search result of the search algorithm to obtain an optimized training sample;
and S5, generating a model, inputting the obtained optimized training sample into the training network of the step S4, constructing a joint loss function according to a training target, searching and training the sample according to joint loss function instructions, and generating the mobile communication network planning model.
2. The intelligent planning method for mobile communication network based on deep reinforcement learning of claim 1, wherein the resource element preprocessing comprises the following steps:
s1.1, preprocessing an erection region of a mobile communication network;
s1.2, preprocessing guarantee nodes of the mobile communication network;
s1.3, preprocessing the guaranteed user of the mobile communication network.
3. The intelligent planning method for mobile communication network based on deep reinforcement learning of claim 1, wherein the planning rule preprocessing comprises the following steps:
s2.1, preprocessing the connection relation of the mobile communication network;
s2.2, preprocessing the planning state of the mobile communication network.
4. The intelligent planning method for mobile communication network based on deep reinforcement learning of claim 1, wherein the training sample generation comprises the following steps:
s3.1, establishing network planning simulation according to the integral simulation model of the step S2, and during initial training, randomly generating the position of a guaranteed user;
s3.2, correspondingly generating the positions of the guaranteed users, and performing simulated deployment by using a search algorithm;
and S3.3, repeatedly simulating deployment by using a search method to obtain a sample and an evaluation set which meet the conditions.
5. The intelligent planning method for mobile communication network based on deep reinforcement learning of claim 1, wherein the model training comprises the following steps:
s4.1, initializing and using three types of elements to describe a planning situation;
s4.2, constructing a filter (filter) by adopting a public full convolution network in the recurrent neural network, and dividing the tail part into two branches of a planning strategy and a planning satisfaction degree;
s4.3, feeding back the result of the step S4.2 to the step 3.2, and refining the searching process;
s4.4, defining local strategy evaluation;
s4.5, combining the output of the recurrent neural network, and updating all the searching processes into the deployment actions for searching the maximum value;
and S4.6, according to the process of the step S4.5, combining the time and effective results for each situation, executing a search process and determining a new address selection strategy.
6. The intelligent planning method for mobile communication network based on deep reinforcement learning of claim 1, wherein the model generation comprises the following steps:
s5.1, constructing a joint loss function according to the training target;
s5.2, comparing the model after training with the model before training, and judging the result according to the simulation model rule;
and S5.3, training based on the steps S4.1 and S4.2 to obtain a mobile communication network planning model.
7. The intelligent planning method for mobile communication network based on deep reinforcement learning of claim 1 or 4, wherein the search method is a Monte Carlo tree search method based on the upper confidence bound algorithm (UCT).
8. The intelligent planning method for mobile communication network based on deep reinforcement learning of claim 1, wherein the deep reinforcement learning algorithm is a recurrent neural network.
9. An intelligent planning device for a mobile communication network based on deep reinforcement learning, which is characterized by comprising:
resource element preprocessing module 100: abstracting and mapping an erection region, a guarantee node and a guaranteed user of the mobile communication network, and establishing a simulation model of resource elements of the mobile communication network, which specifically comprises the following steps:
erection region preprocessing unit 101: preprocessing the erection region of the mobile communication network;
the safeguard node preprocessing unit 102: preprocessing guarantee nodes of a mobile communication network;
secured user preprocessing unit 103: preprocessing guaranteed users of the mobile communication network;
planning rule preprocessing module 200: abstracting and mapping the guarantee relationship and the planning state of the mobile communication network, fusing a resource element simulation model of the resource element preprocessing module 100, and establishing an overall simulation model of the mobile communication network planning, which specifically comprises the following steps:
the connection relationship preprocessing unit 201: preprocessing the connection relation of the mobile communication network;
the plan state preprocessing unit 202: preprocessing the planning state of the mobile communication network;
the training sample generation module 300: establishing network planning simulation according to the overall simulation model of the planning rule preprocessing module 200, and adopting a search method to run simulation, generating training samples and forming a training sample set for deep reinforcement learning, wherein the method specifically comprises the following steps:
network planning simulation setup unit 301: establishing network planning simulation according to an overall simulation model of the planning rule preprocessing module 200, and randomly generating the position of a guaranteed user during initial training;
the simulation deployment unit 302: performing simulated deployment by using a search algorithm according to the generated guaranteed user position;
sample and evaluation set generation unit 303: repeatedly simulating deployment by using a search method to obtain a sample and an evaluation set which meet conditions;
model training module 400: based on the recurrent neural network, the training sample of the training sample generation module 300 is used to train the whole simulation model of the planning rule preprocessing module 200, the training results of each time are compared and screened, the obtained planning space strategy and step real-time planning satisfaction are fed back to the training sample generation module 300, the search result of the search algorithm is optimized, and the optimized training sample is obtained, which specifically comprises:
planning situation initialization unit 401: initializing and using three major element description planning situations;
filter configuration unit 402: the recurrent neural network adopts a public full convolution network to construct a filter (filter), and the tail part of the filter is divided into two branches of a planning strategy and a planning satisfaction degree;
search procedure refinement unit 403: feeding back the results of the filter construction unit 402 to the simulation deployment unit 302, and refining the search process;
local policy evaluation definition unit 404: defining local strategy evaluation;
search procedure update unit 405: combining the output of the recurrent neural network, and updating all the search processes into the deployment action for searching the maximum value;
new addressing policy determination unit 406: according to the flow of the search process updating unit 405, the search flow is executed for each situation in combination with the time and effective results, and a new address selection strategy is determined;
the model generation module 500: inputting the obtained optimized training sample into a training network of the model training module 400, constructing a joint loss function according to a training target, searching and training the sample according to joint loss function instructions, and generating a mobile communication network planning model, which specifically comprises the following steps:
joint loss function construction unit 501: constructing a joint loss function according to the training target;
result evaluation unit 502: comparing the model after training with the model before training, and judging the result according to the simulation model rule;
the model generation unit 503: training based on a planning situation initialization unit 401 and a filter construction unit 402 to obtain a mobile communication network planning model;
the network planning module 600: inputting the parameters of an erection region, a guarantee node and a guaranteed user by applying a trained network planning model to obtain the planning parameters of the mobile communication network, and specifically comprising the following steps:
network planning element input section 601: inputting parameters of an erection region, a guarantee node and a guaranteed user;
model operation section 602: calling the trained network planning model for operation;
network planning parameter generation unit 603: the model generates network planning parameters.
CN201911219452.8A 2019-12-03 2019-12-03 Intelligent planning method for mobile communication network based on deep reinforcement learning Active CN111104732B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911219452.8A CN111104732B (en) 2019-12-03 2019-12-03 Intelligent planning method for mobile communication network based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911219452.8A CN111104732B (en) 2019-12-03 2019-12-03 Intelligent planning method for mobile communication network based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN111104732A CN111104732A (en) 2020-05-05
CN111104732B true CN111104732B (en) 2022-09-13

Family

ID=70420933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911219452.8A Active CN111104732B (en) 2019-12-03 2019-12-03 Intelligent planning method for mobile communication network based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111104732B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797292B (en) * 2020-06-02 2023-10-20 成都方未科技有限公司 UCT behavior trace data mining method and system
CN112532442B (en) * 2020-11-25 2023-02-03 中国人民解放军军事科学院评估论证研究中心 Task coordination capability evaluation method for global command control network
CN112348175B (en) * 2020-11-30 2022-10-28 福州大学 Method for performing feature engineering based on reinforcement learning
CN113765691B (en) * 2021-01-14 2023-06-27 北京京东振世信息技术有限公司 Network planning method and device
CN115238599B (en) * 2022-06-20 2024-02-27 中国电信股份有限公司 Energy-saving method and model reinforcement learning training method and device for refrigerating system
CN115174416B (en) * 2022-07-12 2024-04-12 中国电信股份有限公司 Network planning system, method and device and electronic equipment
CN114964269B (en) * 2022-08-01 2022-11-08 成都航空职业技术学院 Unmanned aerial vehicle path planning method
CN116684273B (en) * 2023-06-08 2024-01-30 中国人民解放军国防科技大学 Automatic planning method and system for mobile communication network structure based on particle swarm
CN116668306B (en) * 2023-06-08 2024-02-23 中国人民解放军国防科技大学 Three-view-angle-based network engineering planning method and system for mobile communication network
CN117669993A (en) * 2024-01-30 2024-03-08 南方科技大学 Progressive charging facility planning method, progressive charging facility planning device, terminal and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109560968A (en) * 2018-12-20 2019-04-02 中国电子科技集团公司第三十研究所 A kind of the Internet resources intelligent planning and configuration method of dynamic strategy driving
CN110297490A (en) * 2019-06-17 2019-10-01 西北工业大学 Heterogeneous module robot via Self-reconfiguration planing method based on nitrification enhancement

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109560968A (en) * 2018-12-20 2019-04-02 中国电子科技集团公司第三十研究所 A kind of the Internet resources intelligent planning and configuration method of dynamic strategy driving
CN110297490A (en) * 2019-06-17 2019-10-01 西北工业大学 Heterogeneous module robot via Self-reconfiguration planing method based on nitrification enhancement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An Improved Path Planning Algorithm Based on Dense Convolutional Networks and a Dueling Architecture; Huang Ying et al.; Computer and Digital Engineering; 2019-04-20 (Issue 04); full text *

Also Published As

Publication number Publication date
CN111104732A (en) 2020-05-05

Similar Documents

Publication Publication Date Title
CN111104732B (en) Intelligent planning method for mobile communication network based on deep reinforcement learning
US20210133536A1 (en) Load prediction method and apparatus based on neural network
Melo et al. A novel surrogate model to support building energy labelling system: A new approach to assess cooling energy demand in commercial buildings
Lam et al. Decision support system for contractor pre‐qualification—artificial neural network model
CN103327082B (en) A kind of colony evacuation optimal change method
CN106656308A (en) Method and device for planning tasks in space information network
CN107194504A (en) Forecasting Methodology, the device and system of land use state
Lopez et al. Distributed reinforcement learning in emergency response simulation
Buijs et al. Adaptive planning for flood resilient areas: dealing with complexity in decision-making about multilayered flood risk management
CN106067077A (en) A kind of load forecasting method based on neutral net and device
CN115688600A (en) Oil reservoir well position optimization method, electronic equipment and storage medium
CN108347048B (en) Planning method adapting to transregional and transnational scheduling modes
CN115668265A (en) Risk transfer configurator and simulation engine providing look-ahead and look-behind metrics for risk driven combinations of manipulating and adjusting underwriting objectives and method thereof
CN117612413A (en) GCN-based manned unmanned aerial vehicle fusion operation airspace key node identification method
CN116167254A (en) Multidimensional city simulation deduction method and system based on city big data
CN104703059A (en) Planning method and device of broadband access network
Yang et al. Integrating case‐based reasoning and expert system techniques for solving experience‐oriented problems
CN116523187A (en) Engineering progress monitoring method and system based on BIM
CN112613830B (en) Material reserve center site selection method
CN110096506B (en) Tree cell structure description and storage method for multi-layer requirements
Mantelas et al. A fuzzy cellular automata modeling approach–accessing urban growth dynamics in linguistic terms
CN116070714B (en) Cloud edge cooperative training method and system based on federal learning and neural architecture search
CN113128788B (en) Power emergency material conveying path optimization method, device and storage medium
LU505448B1 (en) Industrial internet resource allocation method based on virtual node construction
Sorokin et al. Mathematical model to describe the inter-structural relationship between different systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant