CN111104732A - Intelligent planning method for mobile communication network based on deep reinforcement learning - Google Patents

Intelligent planning method for mobile communication network based on deep reinforcement learning

Info

Publication number
CN111104732A
CN111104732A (application CN201911219452.8A)
Authority
CN
China
Prior art keywords
planning
mobile communication
communication network
training
model
Prior art date
Legal status
Granted
Application number
CN201911219452.8A
Other languages
Chinese (zh)
Other versions
CN111104732B (en)
Inventor
杨若鹏
聂宗哲
殷昌盛
江尚
朱巍
邹小飞
张其增
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201911219452.8A priority Critical patent/CN111104732B/en
Publication of CN111104732A publication Critical patent/CN111104732A/en
Application granted granted Critical
Publication of CN111104732B publication Critical patent/CN111104732B/en
Legal status: Active

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a method and a device for intelligent planning of a mobile communication network based on deep reinforcement learning. The method comprises the following steps: 1. resource element preprocessing: preprocessing resource elements of the mobile communication network, such as the guarantee nodes, the guaranteed users and the erection region; 2. planning rule preprocessing: preprocessing the planning rules of the mobile communication network; 3. training sample generation: running random Monte Carlo search over the preprocessing results to generate training samples; 4. model training: training a network planning model with the training samples on the basis of a recurrent neural network; 5. model generation: constructing a joint loss function and, guided by it, repeatedly searching and training on samples to generate the mobile communication network planning model. The intelligent planning method and device for the mobile communication network based on deep reinforcement learning effectively address the problems that current mobile communication network planning relies heavily on manual work, that planning time exceeds task requirements, that adaptability to sudden tasks and unfamiliar environments is poor, and that resource utilization is low, thereby improving the overall efficiency of mobile communication network planning.

Description

Intelligent planning method for mobile communication network based on deep reinforcement learning
Technical Field
The invention relates to the field of information technology, and in particular to an intelligent planning method for a mobile communication network.
Background
The mobile communication network generally refers to a special-purpose mobile communication network used to guarantee large-scale special tasks. It is an integrated mobile network typically composed of multiple sub-networks and multiple kinds of equipment, such as a fixed optical-fiber network, a microwave network, a satellite network, an ascending relay network, and short-wave and ultrashort-wave radio station networks; its minimum unit is a single communication guarantee platform or device, regarded as a guarantee node in the mobile communication network. The number of users guaranteed by the mobile communication network is in the hundreds or more, the demands on ad-hoc erection are high, and the available time is short, with planning to be completed within 24 hours or less.
Network planning means that network planners or technical support personnel make full use of the available system equipment to balance actual requirements against practical constraints, planning and organizing the erection of the mobile communication network so as to guarantee completion of the current task. It mainly consists of selecting sites for each network system device of the mobile communication network and designing the network architecture, according to the task personnel, groups and task supplies that the network supports and guarantees, the various devices and their connection relations, and the geographic environment, so as to support the erection and deployment of the equipment.
Because the mobile communication network is generally used to guarantee sudden tasks at times and places that are difficult to predict, its planning requirements vary widely, its content is complex, equipment conditions are limited, and time requirements are urgent. Currently, mobile communication network planning usually combines a large amount of manual work with fixed-algorithm systems. Manual planning requires professional planners who have accumulated extensive experience in the course of their work, needs many staff, and suffers from long planning times and frequent data interaction. Fixed-algorithm planning systems can assist planners to some extent, but cannot be flexibly applied to every concrete scenario in which the mobile communication network may be deployed, and cannot cope with different geographic environments, equipment limitations and other conditions without modifying the underlying system. Moreover, because such systems are designed mainly around the network design itself, once the network scale grows and constraints multiply they can no longer produce intuitive, accurate planning results; they provide only limited auxiliary support to planners, which affects the task guarantee effect.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to address the practical problems of complex planning conditions, urgent timing, uncertain locations and limited equipment in mobile communication networks, by providing an intelligent planning method for the mobile communication network based on deep reinforcement learning.
In order to achieve this purpose, the invention adopts the following technical scheme:
An intelligent planning method for a mobile communication network based on deep reinforcement learning comprises the following steps:
S1, resource element preprocessing: abstracting and mapping the erection region, guarantee nodes and guaranteed users of the mobile communication network, and establishing a simulation model of the resource elements of the mobile communication network;
S1.1, preprocessing the erection region of the mobile communication network;
S1.2, preprocessing the guarantee nodes of the mobile communication network;
S1.3, preprocessing the guaranteed users of the mobile communication network.
S2, planning rule preprocessing, abstracting and mapping the guarantee relationship and the planning state of the mobile communication network, fusing the resource element simulation model of the step S1, and establishing an integral simulation model of the mobile communication network planning;
S2.1, preprocessing the connection relations of the mobile communication network;
S2.2, preprocessing the planning state of the mobile communication network.
S3, training sample generation: establishing network planning simulation according to the overall simulation model of step S2, and running the simulation with a Monte Carlo tree search method based on the upper confidence bound algorithm (UCT) to generate training samples and form a training sample set for deep reinforcement learning;
S3.1, establishing network planning simulation according to the overall simulation model of step S2, with guaranteed user positions generated randomly during initial training;
S3.2, performing simulated deployment with a search algorithm according to the generated guaranteed user positions;
S3.3, repeatedly simulating deployment with the search method to obtain samples and evaluation sets that meet the conditions.
S4, model training: based on a deep reinforcement learning algorithm such as a recurrent neural network, training the overall simulation model of step S2 with the training samples of step S3, comparing and screening the training results of each round, feeding the obtained planning-space strategy and per-step real-time planning satisfaction back to step S3, and optimizing the search results of the Monte Carlo tree search algorithm based on the upper confidence bound algorithm (UCT) to obtain optimized training samples;
S4.1, initializing and describing the planning situation with three categories of elements;
S4.2, constructing filters with a shared full convolutional network in the recurrent neural network, the tail of which is divided into two branches, planning strategy and planning satisfaction;
S4.3, feeding the results of step S4.2 back to step S3.2 and refining the search process;
S4.4, defining the local strategy evaluation;
S4.5, combining the output of the recurrent neural network and updating every search process to select the deployment action of maximum value;
S4.6, following the flow of step S4.5, executing the search flow for each situation with time and effective results taken into account, and determining a new addressing strategy.
S5, model generation: inputting the optimized training samples into the training network of step S4, constructing a joint loss function according to the training target, searching and training on samples under the guidance of the joint loss function, and generating the mobile communication network planning model.
S5.1, constructing a joint loss function according to the training target;
S5.2, comparing the model after training with the model before training, and judging the result according to the simulation model rules;
S5.3, training on the basis of steps S4.1 and S4.2 to obtain the mobile communication network planning model.
Compared with the prior art, the intelligent planning method for the mobile communication network based on deep reinforcement learning has the following advantages:
1. A Monte Carlo tree search method based on the upper confidence bound algorithm (UCT) is adopted and combined with a structurally simple yet practical and effective recurrent neural network, which greatly reduces the hardware computing-power requirement and processing time, so that the network planning problem of the mobile network can be solved quickly;
2. By training the intelligent planning model with a deep reinforcement learning algorithm, the planning model overcomes the limitation of a single applicable scenario and can adapt to scenarios with different regions, different guarantee equipment and different guaranteed users.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a detailed flow diagram of an embodiment of the intelligent planning method for a mobile communication network based on deep reinforcement learning according to the present invention;
FIG. 2 is a block diagram of the composition structure of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, a schematic flow chart of an embodiment of the intelligent planning method for the mobile communication network based on deep reinforcement learning of the present invention is shown; the method specifically includes the following steps:
s1, preprocessing resource elements, abstracting and mapping the erection region, the guarantee nodes and the guaranteed users of the mobile communication network, and establishing a simulation model of the resource elements of the mobile communication network;
S1.1, preprocessing the erection region of the mobile communication network, abstracting the region by analogy with a chessboard. The region size is set to N² km². Taking the lower-left corner of the topographic map of the erection region as the origin and a divisor of N as the unit length, the region is divided transversely and longitudinally, and each intersection point is taken as a positioning point to obtain a node position matrix. In this patent the erection region is preset as a square region of equal length and width, yielding an N×N node position matrix that can be further refined by repeated multiple subdivision;
S1.2, preprocessing the communication platforms/devices of the mobile communication network, i.e. the guarantee nodes (such as mobile communication vehicles, mobile radio stations, mobile stations and the like). P types of guarantee nodes are preset, and the communication distance R and link count L of each guarantee node are determined by the specific equipment model. In this patent the guarantee nodes are mainly divided into two categories, primary nodes P1 and secondary nodes P2, modeled in order of guarantee priority B: the primary node has priority B1 and the secondary node priority B2. The communication guarantee range of a primary node is a circle centered on the node deployment position with the single-hop microwave communication distance R1 km as radius, and its link count is set to L1. The communication guarantee range of a secondary node is a circle centered on the node deployment position with radius the single-hop microwave communication distance R1 km or the single-hop shortwave communication distance R2 km; its microwave link count is set to L2 and its shortwave link count to L'2.
S1.3, preprocessing the guaranteed users of the mobile communication network (such as military units at the regiment, battalion, company and individual-soldier levels). Q types of guaranteed users are preset, and the communication distance R and link count L of each guaranteed user node are determined by the specific equipment model. In this patent the guaranteed users are mainly divided into three categories, primary users Q1, secondary users Q2 and subordinate users Q3, modeled in order of guarantee priority A: the primary user has priority A1, the secondary user A2 and the subordinate user A3. The single-hop microwave communication distance of a primary user is R1 km with link count U1; the single-hop microwave communication distance of a secondary user is R1 km with link count U2; the single-hop shortwave communication distance of a subordinate user is R2 km with link count U'3.
The resource elements of the mobile communication network are thus abstracted and mapped, providing support for the subsequent rule modeling and overall modeling of the mobile communication network.
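For illustration only, the S1 preprocessing can be pictured in code roughly as follows. This is a minimal sketch, assuming Python with NumPy; the class layout, the symbol names and all concrete R/L/priority values are assumptions for illustration, not values fixed by the patent:

    from dataclasses import dataclass
    import numpy as np

    @dataclass(frozen=True)
    class ElementType:
        symbol: str       # matrix symbol, e.g. "P1" or "Q3"
        priority: int     # guarantee priority (B for nodes, A for users)
        ranges_km: tuple  # single-hop distances, e.g. (R1,) or (R1, R2)
        links: tuple      # link budget per communication type

    # Hypothetical parameter values; the patent leaves R and L model-specific.
    PRIMARY_NODE     = ElementType("P1", 1, (30.0,),       (8,))    # microwave only
    SECONDARY_NODE   = ElementType("P2", 2, (30.0, 100.0), (4, 2))  # microwave + shortwave
    PRIMARY_USER     = ElementType("Q1", 1, (30.0,),       (2,))
    SECONDARY_USER   = ElementType("Q2", 2, (30.0,),       (2,))
    SUBORDINATE_USER = ElementType("Q3", 3, (100.0,),      (1,))    # shortwave only

    def position_matrix(n: int) -> np.ndarray:
        """S1.1: divide the square erection region into an n x n grid of
        positioning points; 0 marks a point with nothing planned on it."""
        return np.zeros((n, n), dtype=np.int16)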
S2, planning rule preprocessing: the planning rules of the mobile communication network are preprocessed. The guarantee relationships and planning states of the mobile communication network are abstracted and mapped, fused with the resource element simulation model of step S1, and an overall simulation model of mobile communication network planning is established;
S2.1, preprocessing the connection relations of the mobile communication network, associating guarantee nodes with one another and with guaranteed users;
S2.1.1, guarantee nodes are associated with guaranteed users according to the priority association A → B to determine the guarantee relationship. In this patent, A1 corresponds to B1, and A2 and A3 correspond to B2; that is, the primary node P1 secures the primary user Q1, and the secondary node P2 secures the secondary user Q2 and the subordinate user Q3. Every user needs at least one corresponding guarantee node connected to it;
S2.1.2, the connection relations between guarantee nodes are determined: in this patent, all primary nodes must form a connected graph, and each secondary node P2 must connect to at least one primary node P1;
S2.1.3, all connections must satisfy the communication types specified in step S1, i.e. only links of the same communication type can be connected;
S2.1.4, all connections must satisfy the link counts specified in step S1, i.e. the number of connections at a node cannot exceed its specified link count L;
S2.1.5, all connections must satisfy the communication distances specified in step S1, i.e. two nodes can be connected only if the distance between them is less than the maximum communication distance R of the communication device used;
S2.1.6, the minimum topological requirement on the whole mobile communication network is that it forms a minimum spanning tree; a code sketch of these connection checks follows below.
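As a hedged illustration of the S2.1 rules, a checker might validate the distance, link-budget and connectivity constraints as below. The helper names, the grid-coordinate positions and the cell size are assumptions; only the constraints themselves come from the patent:

    import math
    from collections import deque

    def within_range(a, b, max_km, cell_km=1.0):
        """S2.1.5: two grid positions may connect only if their distance is
        below the maximum communication distance R of the device used."""
        (r1, c1), (r2, c2) = a, b
        return math.hypot(r1 - r2, c1 - c2) * cell_km < max_km

    def links_available(used, budget):
        """S2.1.4: the connection count must not exceed the link budget L."""
        return used < budget

    def primaries_connected(primary_positions, edges):
        """S2.1.2/S2.1.6-style check: all primary nodes must form one
        connected graph (BFS over the undirected edge list)."""
        if not primary_positions:
            return True
        adj = {p: set() for p in primary_positions}
        for u, v in edges:
            if u in adj and v in adj:
                adj[u].add(v)
                adj[v].add(u)
        seen = {primary_positions[0]}
        queue = deque([primary_positions[0]])
        while queue:
            for nxt in adj[queue.popleft()] - seen:
                seen.add(nxt)
                queue.append(nxt)
        return len(seen) == len(primary_positions)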
S2.2, preprocessing the planning state of the mobile communication network: a network situation s is established from the guarantee nodes, the guaranteed users, the erection region and the network planning rules. The situation s contains all information of the mobile communication network, i.e. s = {P, Q, A, B, R, L, ...}, but is mainly a plane describing the planned position of each node, where a planned position is occupied by the node's symbol and an unplanned position is marked 0, of the form:
[Equation image in the original: an N×N situation matrix with node symbols at planned positions and 0 elsewhere]
S2.2.1, the initial network situation is denoted s0. It mainly describes the planned positions of all guaranteed user nodes, i.e. the positions of the guaranteed personnel in the erection-region model, determined directly by their actual task requirements, of the form:
[Equation image in the original: the matrix s0 containing only the guaranteed-user symbols]
where the positions of the guaranteed user nodes are represented in the matrix by symbols from the guaranteed user set Q.
S2.2.2, the planning of the subsequent guarantee nodes is regarded as a typical Markov process: the deployment of each guarantee node is an action response a_i to the current network situation s_{i-1}, where i ∈ [1, K] and K is the total number of guarantee nodes (in this patent, the sum of the primary and secondary nodes). Each action determines the address of one guarantee node P, e.g.:
[Equation image in the original: the successor situation matrix after one guarantee-node symbol is placed]
S2.2.3, when all guarantee nodes have been placed and the planning requirements are met, or when no further placement is possible, the situation is marked as the final state, and the network situation at the final state is:
[Equation image in the original: the terminal network situation matrix]
In this step, on the basis of step S1, the planning rules of the mobile communication network are abstracted and mapped and an overall simulation model of the mobile communication network is established, providing support for the subsequent deep reinforcement learning planning strategy.
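To make the S2.2 state concrete, here is a minimal sketch, assuming Python/NumPy and an assumed integer encoding of the node symbols (the patent fixes only the matrix-of-symbols idea, not any particular representation):

    import numpy as np

    EMPTY = 0
    SYMBOLS = {"Q1": 1, "Q2": 2, "Q3": 3, "P1": 4, "P2": 5}  # assumed encoding

    def initial_situation(n, user_positions):
        """s0 (S2.2.1): only the guaranteed users are placed."""
        s = np.full((n, n), EMPTY, dtype=np.int8)
        for (row, col), symbol in user_positions.items():
            s[row, col] = SYMBOLS[symbol]
        return s

    def apply_action(s, position, node_symbol):
        """One Markov step (S2.2.2): action a_i fixes the address of one
        guarantee node; returns the successor situation s_i."""
        assert s[position] == EMPTY, "positioning point already occupied"
        nxt = s.copy()
        nxt[position] = SYMBOLS[node_symbol]
        return nxt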
S3, training sample generation: network planning simulation is established according to the overall simulation model of step S2, and the simulation is run with a Monte Carlo tree search method based on the upper confidence bound algorithm (UCT) to generate training samples and form a training sample set for deep reinforcement learning;
S3.1, network planning simulation is established according to the overall simulation model of step S2, with guaranteed user positions generated randomly during initial training;
S3.2, with the guaranteed user positions generated accordingly, simulated deployment is performed with the Monte Carlo tree search algorithm based on the upper confidence bound algorithm (UCT);
S3.2.1, simulated deployment is initialized from the initial situation s0, which is the root node of the search tree; every action (s, a) of the search tree under a given situation is then initialized, where E(s, a) is the comprehensive action evaluation of each possible position of the guarantee node under that situation.
S3.2.2, before the neural network is introduced, the initial E(s, a) scores under all situations are equal and set to r0. The search proceeds by random traversal until all guarantee nodes have been placed, i.e. the final state is reached, which is judged according to steps S1 and S2. Depending on whether the final result meets the conditions, the action evaluation r of the deployment action a_i under each corresponding situation s_{i-1} is computed: if the conditions are met, r = r0 + r'; if not, r = r0 − r'. Normalization then yields an evaluation set of the form:
[Equation image in the original: the set of (s_{i-1}, a_i, r) tuples, one per deployment step]
S3.3, repeatedly simulating deployment with the search method to obtain samples and evaluation sets that meet the conditions.
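The pre-network random rollout of S3.2.2 can be sketched as below; legal_actions, apply_action, is_terminal and satisfies_rules stand in for the S1/S2 machinery, and the numeric values of r0 and r' are assumptions:

    import random

    R0, R_PRIME = 0.0, 1.0  # initial score r0 and adjustment r' (values assumed)

    def random_rollout(s0, legal_actions, apply_action, is_terminal, satisfies_rules):
        """Randomly place guarantee nodes until the final state, then score
        every (situation, action) pair on the trajectory (S3.2.2)."""
        trajectory, s = [], s0
        while not is_terminal(s):
            a = random.choice(legal_actions(s))
            trajectory.append((s, a))
            s = apply_action(s, a)
        r = R0 + R_PRIME if satisfies_rules(s) else R0 - R_PRIME
        return [(si, ai, r) for si, ai in trajectory]  # the evaluation set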
S4, model training: based on a recurrent neural network, the overall simulation model of step S2 is trained with the training samples of step S3; the training results of each round are compared and screened, and the obtained planning-space strategy and per-step real-time planning satisfaction are fed back to step S3 to optimize the search results of the Monte Carlo tree search algorithm based on the upper confidence bound algorithm (UCT), yielding optimized training samples;
S4.1, the planning situation is initialized and described with 6 planes in three categories: three planes for the guaranteed users Q, two planes for the guarantee nodes P, and one plane for the erection region;
S4.2, the recurrent neural network first uses a 4-layer shared full convolutional network with ReLU activations to construct 32, 64, 128 and 256 filters of size 3×3 respectively; the tail is divided into two branches, planning strategy and planning satisfaction. The strategy branch uses four 1×1 dimension-reduction filters and a fully connected layer that outputs the selection probability P of each node in the planning space through a softmax function; the satisfaction branch uses two 1×1 dimension-reduction filters and a fully connected layer with a tanh function outputting the satisfaction score C in the range [0, 1], namely:
fθ(s)=(P,C)
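Read literally, S4.2 describes a small two-headed convolutional network. The following PyTorch sketch follows that reading (6 input planes per S4.1; filter counts per S4.2); the padding, the flattened layer sizes and the rescaling of tanh into [0, 1] are assumptions:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PlanningNet(nn.Module):
        """f_theta(s) = (P, C): policy over positioning points + satisfaction."""
        def __init__(self, n: int):
            super().__init__()
            chans = [6, 32, 64, 128, 256]                 # 6 input planes (S4.1)
            self.trunk = nn.ModuleList(
                nn.Conv2d(chans[i], chans[i + 1], 3, padding=1) for i in range(4))
            self.policy_conv = nn.Conv2d(256, 4, 1)       # four 1x1 reduction filters
            self.policy_fc = nn.Linear(4 * n * n, n * n)
            self.value_conv = nn.Conv2d(256, 2, 1)        # two 1x1 reduction filters
            self.value_fc = nn.Linear(2 * n * n, 1)

        def forward(self, s):                             # s: (batch, 6, n, n)
            x = s
            for conv in self.trunk:
                x = F.relu(conv(x))
            p = F.softmax(self.policy_fc(self.policy_conv(x).flatten(1)), dim=-1)
            # The patent states an output range of [0, 1]; tanh itself spans
            # [-1, 1], so a rescaling is assumed here.
            c = (torch.tanh(self.value_fc(self.value_conv(x).flatten(1))) + 1) / 2
            return p, c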
S4.3, the planning strategy probability P and the satisfaction score C obtained in S4.2 are returned to S3.2 to refine the expansion process of the UCT tree search, and the record of each action is updated to (s, a) → (E(s, a), N(s, a), E_v(s, a), P(s, a));
S4.3.1, N(s, a) is the visit count of the child node selected from the current situation;
S4.3.2, E_v(s, a) is the average action evaluation:
E_v(s, a) = (1/N(s, a)) · Σ E(s, a)
which is combined with the output of the neural network for its update.
[The update formula is given as an equation image in the original.]
S4.4, a local strategy evaluation E_l(s, a) is defined: E_l(s, a) equals the parallel UCT search horizon constant U_puct (initialized to 3) multiplied by the recurrent neural network's output strategy probability P(s, a) and by the square root of the parent-node visit count Σ_b N(s, b), divided by one plus the visit count N(s, a) of the child node; concretely:
E_l(s, a) = U_puct · P(s, a) · √(Σ_b N(s, b)) / (1 + N(s, a))
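In code, the S4.4 formula is a one-liner; U_puct is initialized to 3 as stated in the text:

    import math

    U_PUCT = 3.0  # parallel UCT search horizon constant, initialized to 3 (S4.4)

    def local_policy_eval(p_sa: float, n_sa: int, n_parent: int) -> float:
        """E_l(s, a) = U_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))."""
        return U_PUCT * p_sa * math.sqrt(n_parent) / (1 + n_sa)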
S4.5, combining the output of the recurrent neural network, every UCT search-tree process is updated to select, under situation s_{i-1}, the deployment action a_i that maximizes E_v(s, a) + E_l(s, a). After a certain number of rounds of cyclic training of the search tree and the neural network, one UCT search-tree pass proceeds as follows:
S4.5.1, for the initial situation s0 of the currently guaranteed users, select and deploy the action a1 with the largest current E_v(s0, a1) + E_l(s0, a1);
S4.5.2, repeat S4.5.1 until a situation s_i is reached whose actions have no evaluated E_v + E_l values, so that no selection can be made; at that point the current situation s_i is fed into the neural network f_θ(s) for evaluation, obtaining f_θ(s_i) = (P_i, C_i);
S4.5.3, update the visit count of the current node: N(s_i, a_{i+1}) = N(s_i, a_{i+1}) + 1;
S4.5.4, use P_i to perform the next deployment action a_{i+1}, and repeat S4.5.2 and S4.5.3 until the final state is reached;
S4.5.5, return the search result of the whole tree, update the visit counts of all traversed nodes according to S4.5.3, and propagate satisfaction scores back from the leaf nodes to all child nodes, a final state that fails the conditions scoring 0 and one that meets them scoring 1;
S4.5.6, calculate the average action evaluation of each node as in S4.3.2:
E_v(s, a) = (1/N(s, a)) · Σ E(s, a)
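Putting S4.5.1 through S4.5.6 together, one guided search pass might look like the sketch below, reusing local_policy_eval from above. The Node bookkeeping, the net callable returning (priors, score), and the expansion details are assumptions; the selection rule E_v + E_l, the visit-count update and the 0/1 terminal scoring follow the text:

    class Node:
        """Search-tree node carrying the S4.3 statistics E_v, N and prior P."""
        def __init__(self, situation, prior=0.0):
            self.situation, self.prior = situation, prior
            self.visits, self.e_v, self.children = 0, 0.0, {}

        def expand(self, actions, priors, apply_action):
            for a in actions:
                self.children[a] = Node(apply_action(self.situation, a), priors[a])

    def search_once(root, net, legal_actions, apply_action, is_terminal, satisfies_rules):
        """One pass of S4.5: descend by E_v + E_l, expand with f_theta, back up C."""
        node, path = root, [root]
        while node.children:                               # S4.5.1-S4.5.2: select
            _, node = max(node.children.items(),
                          key=lambda kv: kv[1].e_v + local_policy_eval(
                              kv[1].prior, kv[1].visits, node.visits))
            path.append(node)
        if is_terminal(node.situation):                    # S4.5.5: score 0 or 1
            c = 1.0 if satisfies_rules(node.situation) else 0.0
        else:                                              # S4.5.2: expand leaf
            priors, c = net(node.situation)
            c = float(c)
            node.expand(legal_actions(node.situation), priors, apply_action)
        for n in path:                                     # S4.5.3 + S4.5.6
            n.visits += 1
            n.e_v += (c - n.e_v) / n.visits  # running average action evaluation
        return c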
S4.6, following the whole flow of S4.5, for each situation s_i, weighing time cost against effective results, the tree search is executed 800 times, and a new addressing strategy M is finally determined from the visit counts of the searched action set {a_n}:
M(a | s) = N(s, a)^(1/τ) / Σ_b N(s, b)^(1/τ)
where τ is a search constant controlling the randomness of addressing: the larger τ is, the stronger the randomness. Because successive addressing decisions are correlated, τ is set to decrease continuously over the addressing process, finally stabilizing at 0.4.
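Under this reading, the S4.6 addressing strategy M is a visit-count distribution with temperature τ; a sketch follows (the 800-simulation budget and the final τ value of 0.4 come from the text, the example counts are hypothetical):

    import numpy as np

    def addressing_strategy(visit_counts: np.ndarray, tau: float) -> np.ndarray:
        """M(a | s) = N(s, a)^(1/tau) / sum_b N(s, b)^(1/tau)."""
        scaled = visit_counts ** (1.0 / tau)
        return scaled / scaled.sum()

    # Example: after 800 search passes, sample an address; tau decays toward 0.4.
    counts = np.array([500.0, 250.0, 50.0])   # hypothetical visit counts
    probs = addressing_strategy(counts, tau=0.4)
    action = np.random.choice(len(counts), p=probs)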
S5, model generation: the optimized training samples are input into the training network of step S4, a joint loss function is constructed according to the training target, sample search and training proceed under the guidance of the joint loss function, and the mobile communication network planning model is generated;
S5.1, a joint loss function Loss is constructed according to the training target: minimize the error between the satisfaction C predicted by the neural network and the satisfaction C' found by the upper confidence bound (UCT) search, make the strategy probability P output by the neural network as close as possible to the branch probability π obtained by the UCT tree search, and add a control term g||θ|| to prevent overfitting, giving:
Loss = (C' − C)² − π^T log P + g||θ||
where g||θ|| is the L2-norm regularization term on the neural network parameters θ;
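A PyTorch transcription of the S5.1 loss as reconstructed above; the regularization weight g is an assumed hyperparameter, and the squared L2 norm is used, as is conventional:

    import torch

    def joint_loss(c_search, c_net, pi, p_net, params, g=1e-4):
        """Loss = (C' - C)^2 - pi^T log P + g * ||theta||^2 (S5.1 sketch)."""
        value_term = (c_search - c_net).pow(2).mean()
        policy_term = -(pi * torch.log(p_net + 1e-8)).sum(dim=-1).mean()
        l2_term = sum(p.pow(2).sum() for p in params)
        return value_term + policy_term + g * l2_term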
S5.2, after every 50 training batches the resulting model is compared with the previous model and the outcome is judged by the simulation model rules: a model whose plan meets the guarantee rules wins; if neither meets them, the previous model parameters are kept; if both meet them, the models are judged by the number of guarantee nodes used and the one using fewer is kept;
S5.3, training continues on the basis of steps S4.1 and S4.2 to obtain the network planning model of the mobile communication network.
Referring to FIG. 2, a block diagram of the structure of the present invention is shown, which specifically includes:
resource element preprocessing module 100: abstracting and mapping an erection region, a guarantee node and a guaranteed user of the mobile communication network, and establishing a simulation model of resource elements of the mobile communication network, which specifically comprises the following steps:
erection region preprocessing unit 101: preprocessing the erection region of the mobile communication network;
the safeguard node preprocessing unit 102: preprocessing guarantee nodes of a mobile communication network;
secured user preprocessing unit 103: preprocessing guaranteed users of the mobile communication network;
planning rule preprocessing module 200: abstracting and mapping the guarantee relationship and the planning state of the mobile communication network, fusing a resource element simulation model of the resource element preprocessing module 100, and establishing an overall simulation model of the mobile communication network planning, which specifically comprises the following steps:
the connection relationship preprocessing unit 201: preprocessing the connection relation of the mobile communication network;
the plan state preprocessing unit 202: preprocessing the planning state of the mobile communication network;
the training sample generation module 300: establishing network planning simulation according to the overall simulation model of the planning rule preprocessing module 200, and adopting a search method to run simulation, generating training samples and forming a training sample set for deep reinforcement learning, wherein the method specifically comprises the following steps:
network planning simulation setup unit 301: establishing network planning simulation according to an overall simulation model of the planning rule preprocessing module 200, and randomly generating the position of a guaranteed user during initial training;
the simulation deployment unit 302: performing simulated deployment by using a search algorithm according to the generated guaranteed user position;
the sample and evaluation set generation unit 303: repeatedly simulating deployment by using a search method to obtain a sample and an evaluation set which meet conditions;
model training module 400: based on the recurrent neural network, the training sample of the training sample generation module 300 is used to train the whole simulation model of the planning rule preprocessing module 200, the training results of each time are compared and screened, the obtained planning space strategy and step real-time planning satisfaction are fed back to the training sample generation module 300, the search result of the search algorithm is optimized, and the optimized training sample is obtained, which specifically comprises:
planning situation initialization unit 401: initializing and describing the planning situation with three categories of elements;
filter construction unit 402: the recurrent neural network adopts a shared full convolutional network to construct the filters, and the tail of the network is divided into two branches, planning strategy and planning satisfaction;
search process refinement unit 403: feeding back the results of the filter construction unit 402 to the simulation deployment unit 302, and refining the search process;
local policy evaluation definition unit 404: defining local strategy evaluation;
search procedure update unit 405: combining the output of the recurrent neural network, and updating all the search processes into the deployment action for searching the maximum value;
new addressing policy determination unit 406: according to the flow of the search process updating unit 405, the search flow is executed for each situation in combination with the time and effective results, and a new address selection strategy is determined;
the model generation module 500: inputting the obtained optimized training sample into a training network of the model training module 400, constructing a joint loss function according to a training target, searching and training the sample according to joint loss function instructions, and generating a mobile communication network planning model, which specifically comprises the following steps:
joint loss function construction unit 501: constructing a joint loss function according to the training target;
the result evaluation unit 502: comparing the model after training with the model before training, and judging the result according to the simulation model rule;
the model generation unit 503: training based on a planning situation initialization unit 401 and a filter construction unit 402 to obtain a mobile communication network planning model;
the network planning module 600: inputting the parameters of an erection region, a guarantee node and a guaranteed user by applying a trained network planning model to obtain the planning parameters of the mobile communication network, and specifically comprising the following steps:
network planning element input unit 601: inputting the parameters of the erection region, guarantee nodes and guaranteed users;
model operation unit 602: invoking the trained network planning model for computation;
network planning parameter generation unit 603: the model generates network planning parameters.

Claims (9)

1. An intelligent planning method for a mobile communication network based on deep reinforcement learning is characterized by comprising the following steps:
S1, resource element preprocessing: abstracting and mapping the erection region, guarantee nodes and guaranteed users of the mobile communication network, and establishing a simulation model of the resource elements of the mobile communication network;
S2, planning rule preprocessing: abstracting and mapping the guarantee relationships and planning states of the mobile communication network, fusing the resource element simulation model of step S1, and establishing an overall simulation model of mobile communication network planning;
S3, training sample generation: establishing network planning simulation according to the overall simulation model of step S2, and running the simulation with a search method to generate training samples and form a training sample set for deep reinforcement learning;
S4, model training: based on a deep reinforcement learning algorithm, training the overall simulation model of step S2 with the training samples of step S3, comparing and screening the training results of each round, feeding the obtained planning-space strategy and per-step real-time planning satisfaction back to step S3, and optimizing the search results of the search algorithm to obtain optimized training samples;
S5, model generation: inputting the optimized training samples into the training network of step S4, constructing a joint loss function according to the training target, searching and training on samples under the guidance of the joint loss function, and generating the mobile communication network planning model.
2. The intelligent planning method for mobile communication network based on deep reinforcement learning of claim 1, wherein the resource element preprocessing comprises the following steps:
S1.1, preprocessing the erection region of the mobile communication network;
S1.2, preprocessing the guarantee nodes of the mobile communication network;
S1.3, preprocessing the guaranteed users of the mobile communication network.
3. The intelligent planning method for mobile communication network based on deep reinforcement learning of claim 1, wherein the planning rule preprocessing comprises the following steps:
S2.1, preprocessing the connection relations of the mobile communication network;
S2.2, preprocessing the planning state of the mobile communication network.
4. The intelligent planning method for mobile communication network based on deep reinforcement learning of claim 1, wherein the training sample generation comprises the following steps:
S3.1, establishing network planning simulation according to the overall simulation model of step S2, with guaranteed user positions generated randomly during initial training;
S3.2, performing simulated deployment with a search algorithm according to the generated guaranteed user positions;
S3.3, repeatedly simulating deployment with the search method to obtain samples and evaluation sets that meet the conditions.
5. The intelligent planning method for mobile communication network based on deep reinforcement learning of claim 1, wherein the model training comprises the following steps:
S4.1, initializing and describing the planning situation with three categories of elements;
S4.2, constructing filters with a shared full convolutional network in the recurrent neural network, the tail of which is divided into two branches, planning strategy and planning satisfaction;
S4.3, feeding the results of step S4.2 back to step S3.2 and refining the search process;
S4.4, defining the local strategy evaluation;
S4.5, combining the output of the recurrent neural network and updating every search process to select the deployment action of maximum value;
S4.6, following the flow of step S4.5, executing the search flow for each situation with time and effective results taken into account, and determining a new addressing strategy.
6. The intelligent planning method for mobile communication network based on deep reinforcement learning of claim 1, wherein the model generation comprises the following steps:
S5.1, constructing a joint loss function according to the training target;
S5.2, comparing the model after training with the model before training, and judging the result according to the simulation model rules;
S5.3, training on the basis of steps S4.1 and S4.2 to obtain the mobile communication network planning model.
7. The intelligent planning method for a mobile communication network based on deep reinforcement learning of claim 1 or 4, wherein the search method is a Monte Carlo tree search method based on the upper confidence bound algorithm (UCT).
8. The intelligent planning method for mobile communication network based on deep reinforcement learning of claim 1, wherein the deep reinforcement learning algorithm is a recurrent neural network.
9. An intelligent planning device for a mobile communication network based on deep reinforcement learning is characterized by comprising the following components:
resource element preprocessing module 100: abstracting and mapping an erection region, a guarantee node and a guaranteed user of the mobile communication network, and establishing a simulation model of resource elements of the mobile communication network, which specifically comprises the following steps:
erection region preprocessing unit 101: preprocessing the erection region of the mobile communication network;
the safeguard node preprocessing unit 102: preprocessing guarantee nodes of a mobile communication network;
secured user preprocessing unit 103: preprocessing guaranteed users of the mobile communication network;
planning rule preprocessing module 200: abstracting and mapping the guarantee relationship and the planning state of the mobile communication network, fusing a resource element simulation model of the resource element preprocessing module 100, and establishing an overall simulation model of the mobile communication network planning, which specifically comprises the following steps:
the connection relationship preprocessing unit 201: preprocessing the connection relation of the mobile communication network;
the plan state preprocessing unit 202: preprocessing the planning state of the mobile communication network;
the training sample generation module 300: establishing network planning simulation according to the overall simulation model of the planning rule preprocessing module 200, and adopting a search method to run simulation, generating training samples and forming a training sample set for deep reinforcement learning, wherein the method specifically comprises the following steps:
network planning simulation setup unit 301: establishing network planning simulation according to an overall simulation model of the planning rule preprocessing module 200, and randomly generating the position of a guaranteed user during initial training;
the simulation deployment unit 302: performing simulated deployment by using a search algorithm according to the generated guaranteed user position;
the sample and evaluation set generation unit 303: repeatedly simulating deployment by using a search method to obtain a sample and an evaluation set which meet conditions;
model training module 400: based on the recurrent neural network, the training sample of the training sample generation module 300 is used to train the whole simulation model of the planning rule preprocessing module 200, the training results of each time are compared and screened, the obtained planning space strategy and step real-time planning satisfaction are fed back to the training sample generation module 300, the search result of the search algorithm is optimized, and the optimized training sample is obtained, which specifically comprises:
planning situation initialization unit 401: initializing and describing the planning situation with three categories of elements;
filter construction unit 402: the recurrent neural network adopts a shared full convolutional network to construct the filters, and the tail of the network is divided into two branches, planning strategy and planning satisfaction;
search process refinement unit 403: feeding back the results of the filter construction unit 402 to the simulation deployment unit 302, and refining the search process;
local policy evaluation definition unit 404: defining local strategy evaluation;
search procedure update unit 405: combining the output of the recurrent neural network, and updating all the search processes into the deployment action for searching the maximum value;
new addressing policy determination unit 406: according to the flow of the search process updating unit 405, the search flow is executed for each situation in combination with the time and effective results, and a new address selection strategy is determined;
the model generation module 500: inputting the obtained optimized training sample into a training network of the model training module 400, constructing a joint loss function according to a training target, searching and training the sample according to joint loss function instructions, and generating a mobile communication network planning model, which specifically comprises the following steps:
joint loss function construction unit 501: constructing a joint loss function according to the training target;
the result evaluation unit 502: comparing the model after training with the model before training, and judging the result according to the simulation model rule;
the model generation unit 503: training based on a planning situation initialization unit 401 and a filter construction unit 402 to obtain a mobile communication network planning model;
the network planning module 600: inputting the parameters of an erection region, a guarantee node and a guaranteed user by applying a trained network planning model to obtain the planning parameters of the mobile communication network, and specifically comprising the following steps:
network planning element input unit 601: inputting the parameters of the erection region, guarantee nodes and guaranteed users;
model operation unit 602: invoking the trained network planning model for computation;
network planning parameter generation unit 603: the model generates network planning parameters.
CN201911219452.8A 2019-12-03 2019-12-03 Intelligent planning method for mobile communication network based on deep reinforcement learning Active CN111104732B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911219452.8A CN111104732B (en) 2019-12-03 2019-12-03 Intelligent planning method for mobile communication network based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911219452.8A CN111104732B (en) 2019-12-03 2019-12-03 Intelligent planning method for mobile communication network based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN111104732A (en) 2020-05-05
CN111104732B CN111104732B (en) 2022-09-13

Family

ID=70420933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911219452.8A Active CN111104732B (en) 2019-12-03 2019-12-03 Intelligent planning method for mobile communication network based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111104732B (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109560968A (en) * 2018-12-20 2019-04-02 中国电子科技集团公司第三十研究所 A kind of the Internet resources intelligent planning and configuration method of dynamic strategy driving
CN110297490A (en) * 2019-06-17 2019-10-01 西北工业大学 Heterogeneous module robot via Self-reconfiguration planing method based on nitrification enhancement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黄颖等: "一种基于稠密卷积网络和竞争架构的改进路径规划算法", 《计算机与数字工程》 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797292B (en) * 2020-06-02 2023-10-20 成都方未科技有限公司 UCT behavior trace data mining method and system
CN111797292A (en) * 2020-06-02 2020-10-20 成都方未科技有限公司 UCT behavior-based trajectory data mining method and system
CN112532442A (en) * 2020-11-25 2021-03-19 中国人民解放军军事科学院评估论证研究中心 Task coordination capability evaluation method for global command control network
CN112532442B (en) * 2020-11-25 2023-02-03 中国人民解放军军事科学院评估论证研究中心 Task coordination capability evaluation method for global command control network
CN112348175A (en) * 2020-11-30 2021-02-09 福州大学 Method for performing feature engineering based on reinforcement learning
CN113765691A (en) * 2021-01-14 2021-12-07 北京京东振世信息技术有限公司 Network planning method and device
CN115238599A (en) * 2022-06-20 2022-10-25 中国电信股份有限公司 Energy-saving method for refrigerating system and model reinforcement learning training method and device
CN115238599B (en) * 2022-06-20 2024-02-27 中国电信股份有限公司 Energy-saving method and model reinforcement learning training method and device for refrigerating system
CN115174416A (en) * 2022-07-12 2022-10-11 中国电信股份有限公司 Network planning system, method and device and electronic equipment
CN115174416B (en) * 2022-07-12 2024-04-12 中国电信股份有限公司 Network planning system, method and device and electronic equipment
CN114964269A (en) * 2022-08-01 2022-08-30 成都航空职业技术学院 Unmanned aerial vehicle path planning method
CN114964269B (en) * 2022-08-01 2022-11-08 成都航空职业技术学院 Unmanned aerial vehicle path planning method
CN116684273A (en) * 2023-06-08 2023-09-01 中国人民解放军国防科技大学 Automatic planning method and system for mobile communication network structure based on particle swarm
CN116962196A (en) * 2023-06-08 2023-10-27 中国人民解放军国防科技大学 Intelligent planning method and system for mobile communication network based on relation reasoning
CN116684273B (en) * 2023-06-08 2024-01-30 中国人民解放军国防科技大学 Automatic planning method and system for mobile communication network structure based on particle swarm
CN116668306B (en) * 2023-06-08 2024-02-23 中国人民解放军国防科技大学 Three-view-angle-based network engineering planning method and system for mobile communication network
CN116668306A (en) * 2023-06-08 2023-08-29 中国人民解放军国防科技大学 Three-view-angle-based network engineering planning method and system for mobile communication network
CN117669993A (en) * 2024-01-30 2024-03-08 南方科技大学 Progressive charging facility planning method, progressive charging facility planning device, terminal and storage medium

Also Published As

Publication number Publication date
CN111104732B (en) 2022-09-13

Similar Documents

Publication Publication Date Title
CN111104732B (en) Intelligent planning method for mobile communication network based on deep reinforcement learning
US20230334981A1 (en) Traffic flow forecasting method based on multi-mode dynamic residual graph convolution network
US20210133536A1 (en) Load prediction method and apparatus based on neural network
Melo et al. A novel surrogate model to support building energy labelling system: A new approach to assess cooling energy demand in commercial buildings
Lam et al. Decision support system for contractor pre‐qualification—artificial neural network model
CN108563863B (en) Energy consumption calculation and scheduling method for urban rail transit system
CN107194504A (en) Forecasting Methodology, the device and system of land use state
CN110414718A (en) A kind of distribution network reliability index optimization method under deep learning
CN108984830A (en) A kind of building efficiency evaluation method and device based on FUZZY NETWORK analysis
CN106067077A (en) A kind of load forecasting method based on neutral net and device
CN116523187A (en) Engineering progress monitoring method and system based on BIM
Buijs et al. Adaptive planning for flood resilient areas: dealing with complexity in decision-making about multilayered flood risk management
CN114609910A (en) Linear multi-intelligence system with multiplicative noise and consistency control method thereof
CN108347048B (en) Planning method adapting to transregional and transnational scheduling modes
Dong et al. Data-Driven Distributed $ H_\infty $ Current Sharing Consensus Optimal Control of DC Microgrids via Reinforcement Learning
Pan et al. Deep reinforcement learning for multi-objective optimization in BIM-based green building design
CN106161618A (en) A kind of car networking dedicated short range communication system trackside communication unit layout optimization method
CN114897398B (en) Quick formulation method of vehicle rescue scheme under disaster
CN104703059A (en) Planning method and device of broadband access network
CN116167254A (en) Multidimensional city simulation deduction method and system based on city big data
CN115527365A (en) Urban trip activity prediction method and device based on artificial neural network
Yang et al. Integrating case‐based reasoning and expert system techniques for solving experience‐oriented problems
CN114462810A (en) Semi-automatic network planning auxiliary optimization method for mobile communication network and application
CN110096506A (en) A kind of description of tree-like Cellular structure and storage method of multilayer demand
Li et al. Construction Technology Safety Management under the Background of BIM and Information System Modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant