CN111104732A - Intelligent planning method for mobile communication network based on deep reinforcement learning - Google Patents


Info

Publication number
CN111104732A
CN111104732A (application CN201911219452.8A); granted as CN111104732B
Authority
CN
China
Prior art keywords
planning
mobile communication
communication network
network
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911219452.8A
Other languages
Chinese (zh)
Other versions
CN111104732B (en)
Inventor
杨若鹏
聂宗哲
殷昌盛
江尚
朱巍
邹小飞
张其增
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201911219452.8A
Publication of CN111104732A
Application granted
Publication of CN111104732B
Legal status: Active
Anticipated expiration

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses an intelligent planning method and device for a mobile communication network based on deep reinforcement learning. The method comprises the following steps: 1. resource element preprocessing, in which the guarantee nodes, guaranteed users, erection region and other resource elements of the mobile communication network are preprocessed; 2. planning rule preprocessing, in which the planning rules of the mobile communication network are preprocessed; 3. training sample generation, in which random Monte Carlo search simulations are run on the preprocessing results to generate training samples; 4. model training, in which the network planning model is trained on the training samples based on a recurrent neural network; 5. model generation, in which a joint loss function is constructed and, as directed by that loss function, sample search and training are repeated to generate the mobile communication network planning model. The method and device effectively address the problems that current mobile communication network planning relies heavily on manual operation, that planning time exceeds task requirements, that adaptability to unexpected tasks and unfamiliar environments is poor, and that resource utilization is low, thereby improving the overall efficiency of mobile communication network planning.

Description

Intelligent planning method for mobile communication network based on deep reinforcement learning
Technical Field
The invention relates to the technical field of information, in particular to an intelligent planning method for a mobile communication network.
Background
A mobile communication network generally refers to a mobile communication network used in special fields to support large-scale special tasks. It is a composite mobile network typically made up of multiple sub-networks and devices, such as a fixed optical fiber network, a microwave network, a satellite network, an airborne relay network, and short-wave and ultrashort-wave radio station networks. Its minimum unit is a single communication support platform or device, regarded as a guarantee node in the mobile communication network. A mobile communication network typically serves hundreds of guaranteed users or more, must be erected at short notice in arbitrary locations, and must be planned within 24 hours or less.
Network planning means that network planners or technical support personnel make full use of existing system equipment to balance actual requirements against practical constraints, planning and organizing the erection of the mobile communication network so that the current task can be completed. Concretely, it means selecting sites for each network system device of the mobile communication network and designing the network architecture according to the equipment, connection relationships, and geographic environment of the task personnel and groups the network supports, so as to guide the erection and deployment of the devices.
Because a mobile communication network is generally used to support various sudden tasks at times and places that are difficult to predict, its network planning is characterized by widely varying requirements, complex content, limited equipment conditions, and urgent deadlines. Currently, mobile communication network planning usually combines a large amount of manual work with a fixed-algorithm system. Manual planning requires professional planners who have accumulated extensive working experience before they can be qualified for the job, demands many staff, and suffers from long planning times and frequent data interaction. A fixed-algorithm planning system can assist planners to a certain extent, but it cannot be flexibly applied to every concrete scenario in which the mobile communication network may be deployed, and it cannot cope with different geographic environments, equipment limitations, and other conditions without modifying its underlying design. Moreover, because such a system is built around a fixed network design, it cannot produce intuitive and accurate planning results once the network scale grows and constraints multiply; it can provide only limited auxiliary support to planners, which degrades the task guarantee effect.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to solve the practical problems of mobile communication network planning, such as complex planning conditions, urgent deadlines, uncertain locations, and limited equipment, by realizing an intelligent planning method for a mobile communication network based on deep reinforcement learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
an intelligent planning method for a mobile communication network based on deep reinforcement learning comprises the following steps:
s1, preprocessing resource elements, abstracting and mapping the erection region, the guarantee nodes and the guaranteed users of the mobile communication network, and establishing a simulation model of the resource elements of the mobile communication network;
S1.1, preprocessing the erection region of the mobile communication network;
S1.2, preprocessing the guarantee nodes of the mobile communication network;
S1.3, preprocessing the guaranteed users of the mobile communication network.
S2, planning rule preprocessing, abstracting and mapping the guarantee relationship and the planning state of the mobile communication network, fusing the resource element simulation model of the step S1, and establishing an integral simulation model of the mobile communication network planning;
S2.1, preprocessing the connection relationships of the mobile communication network;
S2.2, preprocessing the planning state of the mobile communication network.
S3, training sample generation: a network planning simulation is established according to the overall simulation model of step S2, and the simulation is run with the Monte Carlo tree search method based on the upper confidence interval algorithm (UCT), generating training samples and forming a training sample set for deep reinforcement learning;
S3.1, establishing the network planning simulation according to the overall simulation model of step S2, and randomly generating the guaranteed user positions at the start of initial training;
S3.2, performing simulated deployment for the generated guaranteed user positions using a search algorithm;
S3.3, repeating the simulated deployment with the search method to obtain samples and an evaluation set that meet the conditions.
S4, model training: based on a deep reinforcement learning algorithm such as a recurrent neural network, the overall simulation model of step S2 is trained with the training samples of step S3, each round of training results is compared and screened, and the obtained planning space strategy and per-step real-time planning satisfaction are fed back to step S3 to optimize the search results of the Monte Carlo tree search algorithm based on the upper confidence interval algorithm (UCT), obtaining optimized training samples;
S4.1, initializing the description of the planning situation using three categories of elements;
S4.2, constructing filters in the recurrent neural network with a common full convolution network, the tail being split into a planning strategy branch and a planning satisfaction branch;
S4.3, feeding the results of step S4.2 back to step S3.2 to refine the search process;
S4.4, defining a local strategy evaluation;
S4.5, combining the recurrent neural network output, updating the entire search process to select the deployment action with the maximum value;
S4.6, following the flow of step S4.5 and weighing running time against effective results, executing the search flow for each situation and determining a new site selection strategy.
S5, model generation: the obtained optimized training samples are input into the training network of step S4, a joint loss function is constructed according to the training target, and the samples are searched and trained as directed by the joint loss function, generating the mobile communication network planning model;
S5.1, constructing the joint loss function according to the training target;
S5.2, comparing the model after training with the model before training, and judging the results according to the simulation model rules;
S5.3, training on the basis of steps S4.1 and S4.2 to obtain the mobile communication network planning model.
The intelligent planning method for a mobile communication network based on deep reinforcement learning according to the invention has the following advantages:
1. by adopting the Monte Carlo tree search method based on the upper confidence interval algorithm (UCT), combined with a structurally simple but practical and effective recurrent neural network, the hardware computing power requirement and processing time are greatly reduced, and the network planning problem of the mobile communication network can be solved quickly;
2. by training the intelligent planning model with a deep reinforcement learning algorithm, the planning model overcomes the limitation of a single applicable scenario and can adapt to scenarios with different regions, different guarantee equipment and different guaranteed users.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a detailed flow diagram of an embodiment of the intelligent planning method for a mobile communication network based on deep reinforcement learning according to the present invention;
FIG. 2 is a block diagram of the composition structure of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, a schematic flow chart of an embodiment of the intelligent planning method for a mobile communication network based on deep reinforcement learning of the present invention is shown, specifically comprising the following steps:
s1, preprocessing resource elements, abstracting and mapping the erection region, the guarantee nodes and the guaranteed users of the mobile communication network, and establishing a simulation model of the resource elements of the mobile communication network;
S1.1, preprocessing the erection region of the mobile communication network, abstracting the erection region by analogy to a chessboard. The region size is set to N² km². Taking the lower-left corner of the topographic map of the erection region as the origin and a chosen divisor of N as the unit length, the region is divided transversely and longitudinally, and each intersection point is taken as a positioning point, yielding a node position matrix. In this patent the erection region is preset as a square region of equal length and width, giving an N×N node position matrix, which can be further subdivided by multiples as needed;
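As a minimal illustration (not part of the patent), the node position matrix of S1.1 can be generated as follows; the region size and unit length are hypothetical:

```python
import numpy as np

def build_node_grid(n_km: int, unit_km: float) -> np.ndarray:
    """Lattice of candidate positioning points over a square N x N km
    erection region, origin at the lower-left corner of the map."""
    ticks = np.arange(0.0, n_km + unit_km, unit_km)  # grid lines in km
    xs, ys = np.meshgrid(ticks, ticks)               # intersection points
    return np.stack([xs, ys], axis=-1)               # (M, M, 2) coordinates

grid = build_node_grid(n_km=20, unit_km=1.0)  # hypothetical 20 x 20 km region
print(grid.shape)  # (21, 21, 2): a 21 x 21 matrix of positioning points
```

Halving `unit_km` realizes the multiple subdivision mentioned above: the same region is covered by a finer positioning lattice.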
S1.2, preprocessing the communication platforms/devices of the mobile communication network, i.e. the guarantee nodes (such as mobile communication vehicles, mobile radio stations, mobile stations and the like). P types of guarantee nodes are preset, and the communication distance R and link quantity L of each guarantee node are determined by the specific equipment model. In this patent the guarantee nodes are mainly divided into two categories, primary nodes P1 and secondary nodes P2, modeled in order of guarantee priority B, with the primary node at priority B1 and the secondary node at priority B2. The communication guarantee range of a primary node is a circle centered on the node deployment position with the single-hop microwave communication distance R1 km as radius, and its link quantity is set to L1. The communication guarantee range of a secondary node consists of circles centered on the node deployment position with the single-hop microwave communication distance R1 km and the single-hop short-wave communication distance R2 km as radii; its microwave link quantity is set to L2 and its short-wave link quantity to L'2.
S1.3, preprocessing the guaranteed users of the mobile communication network (such as military unit groups, detachments, companies, individual soldiers and the like at different levels). Q types of guaranteed user nodes are preset, and the communication distance R and link quantity L of each guaranteed user node are determined by the specific equipment model. The guaranteed users in this patent are mainly divided into three categories, primary users Q1, secondary users Q2 and subordinate users Q3, modeled in order of guarantee priority A, with the primary user at priority A1, the secondary user at A2 and the subordinate user at A3. The primary user's single-hop microwave communication distance is R1 km with link quantity U1; the secondary user's single-hop microwave communication distance is R1 km with link quantity U2; the subordinate user's single-hop short-wave communication distance is R2 km with link quantity U'3.
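The element types above can be captured in a small data model. The following sketch is illustrative only; the class and field names are assumptions, not part of the patent:

```python
from dataclasses import dataclass

@dataclass
class GuaranteeNode:
    """A communication support platform/device, e.g. a mobile comms vehicle."""
    kind: str                  # "P1" primary or "P2" secondary
    priority: int              # guarantee priority B
    microwave_range_km: float  # single-hop microwave distance R1
    shortwave_range_km: float  # single-hop short-wave distance R2 (0 if none)
    microwave_links: int       # microwave link quantity L
    shortwave_links: int       # short-wave link quantity L'

@dataclass
class GuaranteedUser:
    """A supported user node, e.g. a unit group or an individual soldier."""
    kind: str                  # "Q1", "Q2" or "Q3"
    priority: int              # guarantee priority A
    comm_range_km: float       # single-hop distance of the user's equipment
    links: int                 # link quantity U
    position: tuple[int, int]  # index into the N x N node position matrix
```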
The resource elements of the mobile communication network are thus abstracted and mapped, providing support for the subsequent rule preprocessing and overall modeling of the mobile communication network.
S2, planning rule preprocessing: the guarantee relationships and planning states of the mobile communication network are abstracted and mapped and fused with the resource element simulation model of step S1 to establish an overall simulation model of mobile communication network planning;
S2.1, preprocessing the connection relationships of the mobile communication network, associating guarantee nodes with other guarantee nodes and with guaranteed users;
S2.1.1, associating guarantee nodes with guaranteed users according to the priority association A → B to determine the guarantee relationship. In this patent, A1 corresponds to B1, and A2 and A3 correspond to B2; that is, primary nodes P1 guarantee primary users Q1, and secondary nodes P2 guarantee secondary users Q2 and subordinate users Q3. Each user needs at least one corresponding guarantee node connected to it;
S2.1.2, determining the connection relationships among the guarantee nodes: in this patent, all primary nodes must form a connected graph, and every secondary node P2 must connect to at least one primary node P1;
S2.1.3, all connections need to satisfy the communication types specified in step S1, i.e. only links of the same communication type can be connected;
S2.1.4, all connections need to satisfy the link quantities specified in step S1, i.e. the number of connections at a node cannot exceed its specified link quantity L;
S2.1.5, all connections need to satisfy the communication distances specified in step S1, i.e. two nodes can be connected only if the distance between them is less than the maximum communication distance R of the communication equipment used;
S2.1.6, at minimum, the topological structure of the whole mobile communication network must form a minimum spanning tree;
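Rules S2.1.3 through S2.1.5 amount to a per-link feasibility check. The sketch below shows one way this might look; the function and the node fields (`ranges`, `links`, `id`, `pos`) are assumptions for illustration:

```python
import math

def link_allowed(a, b, link_type: str, used_links: dict) -> bool:
    """Check a candidate connection between nodes a and b against the
    planning rules: matching communication type (S2.1.3), free link
    capacity (S2.1.4) and maximum communication distance (S2.1.5)."""
    # S2.1.3: both endpoints must support this communication type
    if link_type not in a.ranges or link_type not in b.ranges:
        return False
    # S2.1.4: neither endpoint may exceed its link quantity L
    if used_links[a.id][link_type] >= a.links[link_type]:
        return False
    if used_links[b.id][link_type] >= b.links[link_type]:
        return False
    # S2.1.5: distance must be below the range R of the weaker endpoint
    return math.dist(a.pos, b.pos) < min(a.ranges[link_type],
                                         b.ranges[link_type])
```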
S2.2, preprocessing the planning state of the mobile communication network: a network situation s is established from the guarantee nodes, guaranteed users, erection region and network planning rules. The situation s contains all information of the mobile communication network, i.e. s = (P, Q, A, B, R, L, ...), but its main plane describes the planned position of each node: a planned position is occupied by the node's symbol, and an unplanned position is marked 0, giving an N×N situation matrix (shown as a figure in the original publication).
S2.2.1, the initial situation of the network is denoted s0. It mainly describes the planned positions of all guaranteed user nodes, i.e. the positions of the guaranteed users in the erection region model, which are determined directly from their actual task requirements (shown as a figure in the original publication). The positions of the guaranteed user nodes are represented in the matrix by symbols from the guaranteed user set Q.
S2.2.2, the planning of the subsequent guarantee nodes is regarded as a typical Markov process: the deployment of each guarantee node is an action response a_i to the current network situation s_{i-1}, where i ∈ [1, K] and K is the total number of guarantee nodes (in this patent, the sum of the primary and secondary nodes). The action is the determination of the site of a certain guarantee node, placing its symbol into the situation matrix (shown as a figure in the original publication).
S2.2.3, when all guarantee nodes have been sited and the plan meets the requirements, or when no further deployment is possible, the situation is marked as the final situation, and the network situation at the end of the process is obtained (shown as a figure in the original publication).
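For illustration, the situation matrix and a deployment action can be encoded as follows. This is a minimal sketch under assumed integer symbol codes, not the patent's implementation:

```python
import numpy as np

N = 20  # hypothetical grid size

# integer codes for node symbols: 0 = empty, 1..3 = users Q1..Q3,
# 4 = primary node P1, 5 = secondary node P2
s0 = np.zeros((N, N), dtype=np.int8)
s0[3, 7] = 1    # a primary user Q1 placed by its task requirements
s0[12, 15] = 2  # a secondary user Q2

def apply_action(s: np.ndarray, pos: tuple, symbol: int) -> np.ndarray:
    """Markov transition: deploying one guarantee node at an empty
    positioning point turns situation s_{i-1} into s_i."""
    assert s[pos] == 0, "position already occupied"
    s_next = s.copy()
    s_next[pos] = symbol
    return s_next

s1 = apply_action(s0, (5, 7), 4)  # deploy a primary node P1 near the Q1 user
```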
In this step, on the basis of step S1, the planning rules of the mobile communication network are abstracted and mapped and an overall simulation model of the mobile communication network is established, providing support for the subsequent deep reinforcement learning planning strategy.
S3, training sample generation: a network planning simulation is established according to the overall simulation model of step S2, and the simulation is run with the Monte Carlo tree search method based on the upper confidence interval algorithm (UCT), generating training samples and forming a training sample set for deep reinforcement learning;
S3.1, establishing the network planning simulation according to the overall simulation model of step S2, and randomly generating the guaranteed user positions at the start of initial training;
S3.2, performing simulated deployment for the generated guaranteed user positions using the Monte Carlo tree search algorithm based on the upper confidence interval algorithm (UCT);
S3.2.1, the simulated deployment is initialized from the initial situation s0, which is the root node of the search tree. Every action (s, a) of the search tree under a given situation is then initialized, where E(s, a) is the comprehensive action evaluation of each position the guarantee node may select under that situation.
S3.2.2, before the neural network is introduced, the initial E(s, a) scores under all situations are equal and set to r0. The search proceeds by random traversal until all guarantee nodes are deployed, i.e. the final situation is reached, which is then judged against steps S1 and S2. The action evaluation r of each deployment action a_i under its situation s_{i-1} is computed according to whether the final result meets the conditions: if it does, r = r0 + r'; if not, r = r0 - r'. Normalizing yields an evaluation set of records of the form {(s_{i-1}, a_i, r_i), i = 1, ..., K}.
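A minimal sketch of this random rollout, building on the `apply_action` encoding above; the values of r0 and r' and the helper names are illustrative assumptions:

```python
import random

R0, R_PRIME = 0.5, 0.5  # illustrative initial score and adjustment

def random_rollout(s0, candidate_positions, node_symbols, satisfies_rules):
    """Play one random deployment episode and score every action in it.

    Returns the evaluation set [(s_{i-1}, a_i, r_i), ...]; the
    satisfies_rules callback judges the final situation against the
    rules of steps S1 and S2."""
    s, trace = s0, []
    free = list(candidate_positions)
    for symbol in node_symbols:        # deploy every guarantee node in turn
        pos = random.choice(free)
        free.remove(pos)
        trace.append((s, (pos, symbol)))
        s = apply_action(s, pos, symbol)
    r = R0 + R_PRIME if satisfies_rules(s) else R0 - R_PRIME
    return [(si, ai, r) for (si, ai) in trace]
```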
S3.3, the simulated deployment is repeated with the search method to obtain samples and an evaluation set that meet the conditions.
S4, model training: based on a recurrent neural network, the overall simulation model of step S2 is trained with the training samples of step S3, each round of training results is compared and screened, and the obtained planning space strategy and per-step real-time planning satisfaction are fed back to step S3 to optimize the search results of the Monte Carlo tree search algorithm based on the upper confidence interval algorithm (UCT), obtaining optimized training samples;
S4.1, the planning situation is initialized and described with 6 planes in three categories, namely three planes for the guaranteed users Q, two planes for the guarantee nodes P, and one plane for the erection region;
S4.2, the recurrent neural network first applies a common 4-layer full convolution network, using ReLU activations to construct 32, 64, 128 and 256 filters of size 3×3 respectively. The tail is split into two branches, planning strategy and planning satisfaction: the strategy branch uses 4 dimension-reducing 1×1 filters and a fully connected layer with a softmax function to output the selection probability P of each node position in the planning space; the satisfaction branch uses 2 dimension-reducing 1×1 filters and a fully connected layer with a tanh function to output a satisfaction score C in the range [0, 1], i.e.:
fθ(s) = (P, C)
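A sketch of such a two-headed network is given below in PyTorch. The framework choice and any layer details beyond those stated above are assumptions; the input-plane count follows S4.1:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlanningNet(nn.Module):
    """Shared 4-layer full convolution trunk (32/64/128/256 3x3 filters
    with ReLU), a policy head (4 1x1 filters -> softmax over the N*N
    planning space) and a satisfaction head (2 1x1 filters -> tanh)."""
    def __init__(self, n: int, in_planes: int = 6):
        super().__init__()
        layers, c = [], in_planes
        for w in (32, 64, 128, 256):
            layers += [nn.Conv2d(c, w, 3, padding=1), nn.ReLU()]
            c = w
        self.trunk = nn.Sequential(*layers)
        self.policy_conv = nn.Conv2d(c, 4, 1)        # dimension reduction
        self.policy_fc = nn.Linear(4 * n * n, n * n)
        self.value_conv = nn.Conv2d(c, 2, 1)
        self.value_fc = nn.Linear(2 * n * n, 1)

    def forward(self, s):                            # s: (B, 6, N, N)
        h = self.trunk(s)
        p = F.softmax(self.policy_fc(self.policy_conv(h).flatten(1)), dim=1)
        c = torch.tanh(self.value_fc(self.value_conv(h).flatten(1)))
        return p, c                                  # f_theta(s) = (P, C)
```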
S4.3, the planning strategy probability P and satisfaction score C obtained in S4.2 are returned to S3.2 to refine the expansion process of the UCT tree search, and the record kept for each action is updated to (s, a) = (E(s, a), N(s, a), E_v(s, a), P(s, a));
S4.3.1, N(s, a) is the number of visits to the child node selected from the current situation;
S4.3.2, E_v(s, a) is the average action evaluation,

E_v(s, a) = (1 / N(s, a)) · Σ E(s, a),

which, combined with the satisfaction output C of the neural network, is updated as

E_v(s, a) ← (N(s, a) · E_v(s, a) + C) / (N(s, a) + 1).
S4.4, a local strategy evaluation E_l(s, a) is defined: E_l(s, a) equals the parallel UCT search breadth constant U_puct (initialized to 3), times the strategy probability P(s, a) output by the recurrent neural network, times the square root of the parent node's total visit count Σ_b N(s, b), divided by 1 plus the visit count N(s, a) of the child node; concretely:

E_l(s, a) = U_puct · P(s, a) · √(Σ_b N(s, b)) / (1 + N(s, a))
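This is a PUCT-style exploration bonus. A sketch of the resulting selection rule follows; the node fields are assumptions, with `prior` holding P(s, a) and `value` holding E_v(s, a):

```python
import math

U_PUCT = 3.0  # parallel search breadth constant, initialized to 3

def puct_score(node, action) -> float:
    """E_v(s,a) + E_l(s,a): average evaluation plus the prior-weighted
    exploration bonus of S4.4."""
    child = node.children[action]
    total_visits = sum(c.visits for c in node.children.values())
    e_l = U_PUCT * child.prior * math.sqrt(total_visits) / (1 + child.visits)
    return child.value + e_l

def select_action(node):
    """Pick the deployment action maximizing E_v + E_l (step S4.5.1)."""
    return max(node.children, key=lambda a: puct_score(node, a))
```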
S4.5, combining the output of the recurrent neural network, the whole UCT search tree process is updated to select, under each situation s_{i-1}, the deployment action a_i that maximizes E_v(s, a) + E_l(s, a). After a certain number of rounds of cyclic training of the search tree and the neural network, one UCT tree search proceeds as follows:
S4.5.1, for the current guaranteed users' initial situation s0, select and execute the deployment action a1 with the maximum current E_v(s0, a1) + E_l(s0, a1);
S4.5.2, repeat S4.5.1 until some situation s_i is reached that has no evaluated E_v + E_l values, so that no selection can be made; the current situation s_i is then fed into the neural network fθ(s) for evaluation, obtaining fθ(s_i) = (P_i, C_i);
S4.5.3, update the visit count of the current node: N(s_i, a_{i+1}) = N(s_i, a_{i+1}) + 1;
S4.5.4, use P_i to take the next deployment action a_{i+1}, and repeat S4.5.2 and S4.5.3 until the final situation is reached;
S4.5.5, return the search result of the whole tree, update the visit counts of all nodes passed according to S4.5.3, and propagate the leaf node's satisfaction score back to all child nodes along the path, the score being 0 if the final situation violates the rules and 1 if it satisfies them;
S4.5.6, calculate the average action evaluation of each node as in S4.3.2:

E_v(s, a) = (1 / N(s, a)) · Σ E(s, a)
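Putting S4.5.1 through S4.5.6 together, a single search iteration might look like the following sketch, building on `select_action` above; the tree bookkeeping (`children`, `visits`, `value`, `expand`, `satisfies_rules`) is an assumed simplification:

```python
def run_simulation(root, net, is_final):
    """One UCT iteration: descend by E_v + E_l until an unexpanded
    situation, evaluate it with f_theta, then back up the score."""
    path, node = [root], root
    while node.children and not is_final(node.state):
        node = node.children[select_action(node)]        # S4.5.1 / S4.5.4
        path.append(node)
    if is_final(node.state):
        score = 1.0 if node.satisfies_rules() else 0.0   # S4.5.5 leaf score
    else:
        p, c = net(node.state)                           # S4.5.2
        node.expand(priors=p)                            # children get P(s, a)
        score = float(c)                                 # network estimate C_i
    for n in path:                                       # S4.5.3 / S4.5.6
        n.visits += 1
        n.value += (score - n.value) / n.visits          # running average E_v
```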
S4.6, following the whole flow of S4.5 and weighing running time against effective results, 800 search tree iterations are executed for each situation s_i, and a new site selection strategy M is finally determined from the actual action set {a_n} of the search tree as follows:

M(a | s) = N(s, a)^(1/τ) / Σ_b N(s, b)^(1/τ)

where τ is a search constant controlling the randomness of site selection: the larger τ is, the stronger the randomness. Because successive site selections are correlated, τ is set to decrease continuously as site selection proceeds, finally stabilizing at 0.4.
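A sketch of this visit-count policy with temperature τ (NumPy; the counts are made up for illustration):

```python
import numpy as np

def addressing_policy(visit_counts: np.ndarray, tau: float) -> np.ndarray:
    """Site selection strategy M: probabilities proportional to
    N(s, a)^(1/tau); larger tau means more random site selection."""
    weights = np.power(visit_counts.astype(float), 1.0 / tau)
    return weights / weights.sum()

counts = np.array([120, 400, 30, 250])     # visits after 800 iterations
print(addressing_policy(counts, tau=1.0))  # early: more exploratory
print(addressing_policy(counts, tau=0.4))  # later: sharper distribution
```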
S5, model generation: the obtained optimized training samples are input into the training network of step S4, a joint loss function is constructed according to the training target, and the samples are searched and trained as directed by the joint loss function, generating the mobile communication network planning model;
S5.1, a joint loss function Loss is constructed according to the training target: to minimize the neural network's error between the predicted satisfaction C and the satisfaction C' found by the upper confidence interval algorithm search, to make the strategy probability P output by the neural network as close as possible to the branch probability π obtained by the UCT tree search, and to add a control term g‖θ‖ to prevent overfitting, giving the joint loss function:

Loss = (C' - C)² - π^T · log P + g‖θ‖

where g‖θ‖ is the L2 norm regularization of the neural network variables θ;
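With the PyTorch sketch above, this joint loss could be computed as follows; the regularization coefficient `g` is an assumed hyperparameter:

```python
import torch

def joint_loss(c_pred, c_search, p_pred, pi_search, model, g=1e-4):
    """Loss = (C' - C)^2 - pi^T log P + g * ||theta||^2 (L2 over weights)."""
    value_term = (c_search - c_pred).pow(2).mean()
    policy_term = -(pi_search * torch.log(p_pred + 1e-10)).sum(dim=1).mean()
    l2_term = g * sum(w.pow(2).sum() for w in model.parameters())
    return value_term + policy_term + l2_term
```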
S5.2, after every 50 training batches, the newly obtained model is compared with the previous one and the result is judged by the simulation model rules: the model whose plan satisfies the guarantee rules wins; if neither reaches a valid final situation, the previous model's parameters are kept; if both satisfy the rules, they are judged by the number of guarantee nodes used and the model using fewer nodes is kept;
S5.3, training continues on the basis of steps S4.1 and S4.2 to obtain the network planning model of the mobile communication network.
Referring to FIG. 2, a block diagram of the structure of the present invention is shown, specifically comprising:
resource element preprocessing module 100: abstracting and mapping an erection region, a guarantee node and a guaranteed user of the mobile communication network, and establishing a simulation model of resource elements of the mobile communication network, which specifically comprises the following steps:
erection region preprocessing unit 101: preprocessing the erection region of the mobile communication network;
the safeguard node preprocessing unit 102: preprocessing guarantee nodes of a mobile communication network;
secured user preprocessing unit 103: preprocessing guaranteed users of the mobile communication network;
planning rule preprocessing module 200: abstracting and mapping the guarantee relationship and the planning state of the mobile communication network, fusing a resource element simulation model of the resource element preprocessing module 100, and establishing an overall simulation model of the mobile communication network planning, which specifically comprises the following steps:
the connection relationship preprocessing unit 201: preprocessing the connection relation of the mobile communication network;
the plan state preprocessing unit 202: preprocessing the planning state of the mobile communication network;
the training sample generation module 300: establishing network planning simulation according to the overall simulation model of the planning rule preprocessing module 200, and adopting a search method to run simulation, generating training samples and forming a training sample set for deep reinforcement learning, wherein the method specifically comprises the following steps:
network planning simulation setup unit 301: establishing network planning simulation according to an overall simulation model of the planning rule preprocessing module 200, and randomly generating the position of a guaranteed user during initial training;
the simulation deployment unit 302: performing simulated deployment by using a search algorithm according to the generated guaranteed user position;
the sample and evaluation set generation unit 303: repeatedly simulating deployment by using a search method to obtain a sample and an evaluation set which meet conditions;
model training module 400: based on the recurrent neural network, trains the overall simulation model of the planning rule preprocessing module 200 with the training samples of the training sample generation module 300, compares and screens each round of training results, feeds the obtained planning space strategy and per-step real-time planning satisfaction back to the training sample generation module 300, and optimizes the search results of the search algorithm to obtain optimized training samples, specifically including:
planning situation initialization unit 401: initializes the description of the planning situation with three categories of elements;
filter construction unit 402: the recurrent neural network constructs filters with a common full convolution network, the tail being split into a planning strategy branch and a planning satisfaction branch;
search process refinement unit 403: feeds the results of the filter construction unit 402 back to the simulated deployment unit 302 to refine the search process;
local strategy evaluation definition unit 404: defines the local strategy evaluation;
search process update unit 405: combining the recurrent neural network output, updates the entire search process to select the deployment action with the maximum value;
new site selection strategy determination unit 406: following the flow of the search process update unit 405 and weighing running time against effective results, executes the search flow for each situation and determines a new site selection strategy;
model generation module 500: inputs the obtained optimized training samples into the training network of the model training module 400, constructs a joint loss function according to the training target, and searches and trains the samples as directed by the joint loss function to generate the mobile communication network planning model, specifically including:
joint loss function construction unit 501: constructing a joint loss function according to the training target;
the result evaluation unit 502: comparing the model after training with the model before training, and judging the result according to the simulation model rule;
the model generation unit 503: training based on a planning situation initialization unit 401 and a filter construction unit 402 to obtain a mobile communication network planning model;
network planning module 600: applies the trained network planning model, takes the erection region, guarantee node and guaranteed user parameters as input, and obtains the mobile communication network planning parameters, specifically including:
network planning element input unit 601: inputs the erection region, guarantee node and guaranteed user parameters;
model operation unit 602: calls the trained network planning model to perform the computation;
network planning parameter generation unit 603: the model generates the network planning parameters.

Claims (9)

1. An intelligent planning method for a mobile communication network based on deep reinforcement learning, characterized in that the method comprises the following steps:
S1, resource element preprocessing: abstracting and mapping the erection region, guarantee nodes and guaranteed users of the mobile communication network, and establishing a simulation model of the resource elements of the mobile communication network;
S2, planning rule preprocessing: abstracting and mapping the guarantee relationships and planning states of the mobile communication network, fusing them with the resource element simulation model of step S1, and establishing an overall simulation model of mobile communication network planning;
S3, training sample generation: establishing a network planning simulation according to the overall simulation model of step S2, running the simulation with a search method, generating training samples and forming a training sample set for deep reinforcement learning;
S4, model training: based on a deep reinforcement learning algorithm, training the overall simulation model of step S2 with the training samples of step S3, comparing and screening each round of training results, feeding the obtained planning space strategy and per-step real-time planning satisfaction back to step S3, and optimizing the search results of the search algorithm to obtain optimized training samples;
S5, model generation: inputting the obtained optimized training samples into the training network of step S4, constructing a joint loss function according to the training target, and searching and training the samples as directed by the joint loss function to generate the mobile communication network planning model.

2. The intelligent planning method for a mobile communication network based on deep reinforcement learning according to claim 1, characterized in that the resource element preprocessing comprises the following steps:
S1.1, preprocessing the erection region of the mobile communication network;
S1.2, preprocessing the guarantee nodes of the mobile communication network;
S1.3, preprocessing the guaranteed users of the mobile communication network.

3. The intelligent planning method for a mobile communication network based on deep reinforcement learning according to claim 1, characterized in that the planning rule preprocessing comprises the following steps:
S2.1, preprocessing the connection relationships of the mobile communication network;
S2.2, preprocessing the planning state of the mobile communication network.

4. The intelligent planning method for a mobile communication network based on deep reinforcement learning according to claim 1, characterized in that the training sample generation comprises the following steps:
S3.1, establishing the network planning simulation according to the overall simulation model of step S2, and randomly generating the guaranteed user positions at the start of initial training;
S3.2, performing simulated deployment for the generated guaranteed user positions using the search algorithm;
S3.3, repeating the simulated deployment with the search method to obtain samples and an evaluation set that meet the conditions.

5. The intelligent planning method for a mobile communication network based on deep reinforcement learning according to claim 1, characterized in that the model training comprises the following steps:
S4.1, initializing the description of the planning situation using three categories of elements;
S4.2, constructing filters in the recurrent neural network with a common full convolution network, the tail being split into a planning strategy branch and a planning satisfaction branch;
S4.3, feeding the results of step S4.2 back to step S3.2 to refine the search process;
S4.4, defining a local strategy evaluation;
S4.5, combining the recurrent neural network output, updating the entire search process to select the deployment action with the maximum value;
S4.6, following the flow of step S4.5 and weighing running time against effective results, executing the search flow for each situation and determining a new site selection strategy.

6. The intelligent planning method for a mobile communication network based on deep reinforcement learning according to claim 1, characterized in that the model generation comprises the following steps:
S5.1, constructing a joint loss function according to the training target;
S5.2, comparing the model after training with the model before training, and judging the results according to the simulation model rules;
S5.3, training on the basis of steps S4.1 and S4.2 to obtain the mobile communication network planning model.

7. The intelligent planning method for a mobile communication network based on deep reinforcement learning according to claim 1 or 4, characterized in that the search method is a Monte Carlo tree search method based on the upper confidence interval algorithm (UCT).

8. The intelligent planning method for a mobile communication network based on deep reinforcement learning according to claim 1, characterized in that the deep reinforcement learning algorithm is a recurrent neural network.

9. An intelligent planning device for a mobile communication network based on deep reinforcement learning, characterized in that the device comprises:
a resource element preprocessing module 100, which abstracts and maps the erection region, guarantee nodes and guaranteed users of the mobile communication network and establishes a simulation model of the resource elements of the mobile communication network, specifically including:
an erection region preprocessing unit 101, which preprocesses the erection region of the mobile communication network;
a guarantee node preprocessing unit 102, which preprocesses the guarantee nodes of the mobile communication network;
a guaranteed user preprocessing unit 103, which preprocesses the guaranteed users of the mobile communication network;
a planning rule preprocessing module 200, which abstracts and maps the guarantee relationships and planning states of the mobile communication network, fuses them with the resource element simulation model of the resource element preprocessing module 100, and establishes an overall simulation model of mobile communication network planning, specifically including:
a connection relationship preprocessing unit 201, which preprocesses the connection relationships of the mobile communication network;
a planning state preprocessing unit 202, which preprocesses the planning state of the mobile communication network;
a training sample generation module 300, which establishes a network planning simulation according to the overall simulation model of the planning rule preprocessing module 200, runs the simulation with a search method, generates training samples and forms a training sample set for deep reinforcement learning, specifically including:
a network planning simulation establishing unit 301, which establishes the network planning simulation according to the overall simulation model of the planning rule preprocessing module 200 and randomly generates the guaranteed user positions at the start of initial training;
a simulated deployment unit 302, which performs simulated deployment for the generated guaranteed user positions using the search algorithm;
a sample and evaluation set generation unit 303, which repeats the simulated deployment with the search method to obtain samples and an evaluation set that meet the conditions;
a model training module 400, which, based on the recurrent neural network, trains the overall simulation model of the planning rule preprocessing module 200 with the training samples of the training sample generation module 300, compares and screens each round of training results, feeds the obtained planning space strategy and per-step real-time planning satisfaction back to the training sample generation module 300, and optimizes the search results of the search algorithm to obtain optimized training samples, specifically including:
a planning situation initialization unit 401, which initializes the description of the planning situation using three categories of elements;
a filter construction unit 402, in which the recurrent neural network constructs filters with a common full convolution network, the tail being split into a planning strategy branch and a planning satisfaction branch;
a search process refinement unit 403, which feeds the results of the filter construction unit 402 back to the simulated deployment unit 302 to refine the search process;
a local strategy evaluation definition unit 404, which defines the local strategy evaluation;
a search process update unit 405, which, combining the recurrent neural network output, updates the entire search process to select the deployment action with the maximum value;
a new site selection strategy determination unit 406, which, following the flow of the search process update unit 405 and weighing running time against effective results, executes the search flow for each situation and determines a new site selection strategy;
a model generation module 500, which inputs the obtained optimized training samples into the training network of the model training module 400, constructs a joint loss function according to the training target, and searches and trains the samples as directed by the joint loss function to generate the mobile communication network planning model, specifically including:
a joint loss function construction unit 501, which constructs the joint loss function according to the training target;
a result evaluation unit 502, which compares the model after training with the model before training and judges the results according to the simulation model rules;
a model generation unit 503, which trains on the basis of the planning situation initialization unit 401 and the filter construction unit 402 to obtain the mobile communication network planning model;
a network planning module 600, which applies the trained network planning model, takes the erection region, guarantee node and guaranteed user parameters as input, and obtains the mobile communication network planning parameters, specifically including:
a network planning element input unit 601, which inputs the erection region, guarantee node and guaranteed user parameters;
a model operation unit 602, which calls the trained network planning model to perform the computation;
a network planning parameter generation unit 603, in which the model generates the network planning parameters.
CN201911219452.8A 2019-12-03 2019-12-03 Intelligent planning method for mobile communication network based on deep reinforcement learning Active CN111104732B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911219452.8A CN111104732B (en) 2019-12-03 2019-12-03 Intelligent planning method for mobile communication network based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN111104732A true CN111104732A (en) 2020-05-05
CN111104732B CN111104732B (en) 2022-09-13

Family

ID=70420933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911219452.8A Active CN111104732B (en) 2019-12-03 2019-12-03 Intelligent planning method for mobile communication network based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111104732B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109560968A (en) * 2018-12-20 2019-04-02 中国电子科技集团公司第三十研究所 A kind of the Internet resources intelligent planning and configuration method of dynamic strategy driving
CN110297490A (en) * 2019-06-17 2019-10-01 西北工业大学 Heterogeneous module robot via Self-reconfiguration planing method based on nitrification enhancement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黄颖等 (HUANG Ying et al.), "一种基于稠密卷积网络和竞争架构的改进路径规划算法" [An improved path planning algorithm based on dense convolutional networks and a dueling architecture], 《计算机与数字工程》 [Computer & Digital Engineering] *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797292B (en) * 2020-06-02 2023-10-20 成都方未科技有限公司 UCT behavior trace data mining method and system
CN111797292A (en) * 2020-06-02 2020-10-20 成都方未科技有限公司 UCT behavior-based trajectory data mining method and system
CN112532442B (en) * 2020-11-25 2023-02-03 中国人民解放军军事科学院评估论证研究中心 Task coordination capability evaluation method for global command control network
CN112532442A (en) * 2020-11-25 2021-03-19 中国人民解放军军事科学院评估论证研究中心 Task coordination capability evaluation method for global command control network
CN112348175A (en) * 2020-11-30 2021-02-09 福州大学 Method for performing feature engineering based on reinforcement learning
CN113765691A (en) * 2021-01-14 2021-12-07 北京京东振世信息技术有限公司 Network planning method and device
CN115622897A (en) * 2022-05-12 2023-01-17 重庆金美通信有限责任公司 Network scheme automatic verification method based on simulated training equipment
CN115238599A (en) * 2022-06-20 2022-10-25 中国电信股份有限公司 Energy-saving method for refrigerating system and model reinforcement learning training method and device
CN115238599B (en) * 2022-06-20 2024-02-27 中国电信股份有限公司 Energy-saving method and model reinforcement learning training method and device for refrigerating system
CN115174416A (en) * 2022-07-12 2022-10-11 中国电信股份有限公司 Network planning system, method and device and electronic equipment
CN115174416B (en) * 2022-07-12 2024-04-12 中国电信股份有限公司 Network planning system, method and device and electronic equipment
CN114964269B (en) * 2022-08-01 2022-11-08 成都航空职业技术学院 Unmanned aerial vehicle path planning method
CN114964269A (en) * 2022-08-01 2022-08-30 成都航空职业技术学院 Unmanned aerial vehicle path planning method
CN115801595A (en) * 2022-10-26 2023-03-14 南京信息工程大学 Topology description method oriented to large-scale network virtualization simulation
CN116962196A (en) * 2023-06-08 2023-10-27 中国人民解放军国防科技大学 An intelligent planning method and system for mobile communication network based on relational reasoning
CN116684273B (en) * 2023-06-08 2024-01-30 中国人民解放军国防科技大学 An automatic planning method and system for mobile communication network structure based on particle swarm
CN116668306B (en) * 2023-06-08 2024-02-23 中国人民解放军国防科技大学 Three-view-angle-based network engineering planning method and system for mobile communication network
CN116684273A (en) * 2023-06-08 2023-09-01 中国人民解放军国防科技大学 A method and system for automatic planning of mobile communication network structure based on particle swarm
CN116668306A (en) * 2023-06-08 2023-08-29 中国人民解放军国防科技大学 A method and system for network engineering planning of mobile communication network based on three perspectives
CN117669993A (en) * 2024-01-30 2024-03-08 南方科技大学 Progressive charging facility planning method, progressive charging facility planning device, terminal and storage medium

Also Published As

Publication number Publication date
CN111104732B (en) 2022-09-13

Similar Documents

Publication Publication Date Title
CN111104732A (en) Intelligent planning method for mobile communication network based on deep reinforcement learning
CN112651059B (en) An artificial intelligence-based multi-scheme generation method for urban design of controlled plots
US20230334981A1 (en) Traffic flow forecasting method based on multi-mode dynamic residual graph convolution network
Lam et al. Decision support system for contractor pre‐qualification—artificial neural network model
CN101702655B (en) Layout method and system of network topological diagram
Chen et al. Mean-variance model for the build-operate-transfer scheme under demand uncertainty
Pan et al. Deep reinforcement learning for multi-objective optimization in BIM-based green building design
CN112488392B (en) A method for predicting daily water consumption of smart water affairs based on machine learning
CN111752302B (en) Path planning method and device, electronic equipment and computer readable storage medium
Liu et al. An oriented spanning tree based genetic algorithm for multi-criteria shortest path problems
Li et al. Classical planning model‐based approach to automating construction planning on earthwork projects
CN110414718A (en) A distribution network reliability index optimization method based on deep learning
CN106067077A (en) A kind of load forecasting method based on neutral net and device
CN107977711A (en) A Multi-Agent Genetic Algorithm Oriented to "Three Lines" Cooperative Optimization
CN115186936B (en) Optimal well pattern construction method for oil field based on GNN model
Dong et al. Data-Driven Distributed H∞ Current Sharing Consensus Optimal Control of DC Microgrids via Reinforcement Learning
CN106161618A (en) A kind of car networking dedicated short range communication system trackside communication unit layout optimization method
CN107679305B (en) Road network model creating method and device
Sorokin et al. Mathematical model to describe the inter-structural relationship between different systems
CN116167254A (en) Multidimensional city simulation deduction method and system based on city big data
Preiss Data frame model for the engineering design process
Saha The space allocation problem.
Gao et al. A λ-cut approximate algorithm for goal-based bilevel risk management systems
CN117272787B (en) Residence land growth simulation method and system based on multiple intelligent agents
CN114219258B (en) Method and device for collaborative planning of tasks of multiple unmanned platforms based on domain knowledge

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant