CN112308345A

CN112308345A - Communication network load prediction method, device and server

Info

Publication number: CN112308345A
Application number: CN202011375603.1A
Authority: CN
Inventors: 陈锋; 李张铮; 陈海; 卢春生; 严燕燕; 王哲坤; 吴帆; 刘文山
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2020-11-30
Filing date: 2020-11-30
Publication date: 2021-02-02

Abstract

The embodiments of the invention provide a communication network load prediction method, device and server. The method comprises: acquiring network characteristic data sets of a plurality of cells in a communication network, and aggregating the network characteristic data set of each cell to obtain mean characteristic data of each cell; performing cluster analysis on the mean characteristic data of all cells to obtain a plurality of groups of training sample sets of different types; performing model training with each group of training sample sets to obtain a plurality of cell load prediction models of different types; obtaining the network prediction load of each cell according to the cell load prediction model corresponding to that cell; and determining the network prediction load of the communication network according to the network prediction loads of all cells in the communication network. The method improves the accuracy of the predicted communication network load.

Description

Communication network load prediction method, device and server

Technical Field

The embodiment of the invention relates to the technical field of communication, in particular to a communication network load prediction method, a communication network load prediction device and a server.

Background

The load of the communication network is an important index for measuring the network operation capacity, and when the load of the communication network reaches a certain threshold, a capacity bottleneck occurs, resulting in network delay or packet loss. In order to ensure the network use perception of a user, the network load is predicted in advance, when the situation that the network is about to have high load is predicted, an alarm prompt is sent out in time, the network is optimized, and therefore the problem of high load of the network is relieved.

At present, existing methods for predicting the load of a communication network mainly adopt a time series model, such as an Autoregressive Integrated Moving Average (ARIMA) model or a Long Short-Term Memory (LSTM) network. Historical network load data of a plurality of cells is used as sample data and input into the model, the model is trained to obtain a prediction model capable of predicting network load, and the network load data of the communication network is predicted according to the prediction model.

However, the inventors found that the prior art has the following technical problems: in the process of training a machine learning model by using historical network load data of a plurality of cells, as the historical network load data of the cells with different service types have great difference, namely the correlation between sample data is too low, the training of a prediction model fails, the communication network load data cannot be accurately obtained through the trained prediction model, and the accuracy of the predicted communication network load is influenced.

Disclosure of Invention

The embodiment of the invention provides a communication network load prediction method, a communication network load prediction device and a server, which are used for improving the accuracy of a predicted communication network load result.

In a first aspect, an embodiment of the present invention provides a method for predicting a load of a communication network, including:

acquiring network characteristic data sets of a plurality of cells in a communication network, and aggregating the network characteristic data sets of each cell to obtain mean characteristic data of each cell;

performing cluster analysis on the mean characteristic data of all the cells to obtain a plurality of groups of training sample sets of different types, wherein each training sample set comprises the mean characteristic data of a plurality of cells of the same type;

training the model according to each group of training sample set respectively to obtain a plurality of cell load prediction models of different types;

and obtaining the network prediction load of each cell according to the cell load prediction model corresponding to each cell, and determining the network prediction load of the communication network according to the network prediction loads of all the cells in the communication network.

In one possible design, performing cluster analysis on the mean characteristic data of all cells to obtain a plurality of groups of training sample sets of different types includes:

taking the mean characteristic data of each cell as a node, and determining the similarity between two nodes according to the mean characteristic data corresponding to each of the two nodes;

obtaining a node relation graph according to the similarity between any two nodes, and analyzing the node relation graph according to a label propagation algorithm to obtain a plurality of node subsets;

and taking all nodes belonging to the same node subset as a group of training nodes, and obtaining a plurality of groups of training sample sets of different types according to the mean characteristic data of the cells corresponding to each group of training nodes.

In one possible design, determining the similarity between two nodes according to the mean characteristic data corresponding to each of the two nodes includes:

if the mean characteristic data of cell i is x_i = [x_i1, x_i2, …, x_in] and the mean characteristic data of cell j is x_j = [x_j1, x_j2, …, x_jn], the similarity w_i,j between the two nodes corresponding to cell i and cell j is calculated by formula (1):

wherein n is the characteristic dimension of the cell mean characteristic data, σ is a constant, and k is a positive integer greater than 0 and less than or equal to n.

In one possible design, the obtaining the network characteristic data set of the plurality of cells in the communication network includes:

acquiring network characteristic data of a plurality of cells in a communication network according to a preset time interval;

and grouping the network characteristic data of the plurality of cells according to the cell identification codes to obtain a network characteristic data set of each cell.

In one possible design, after the obtaining the network feature data sets of the plurality of cells in the communication network, the method further includes:

and performing missing value filling and abnormal value elimination processing on the network characteristic data set of each cell, wherein the missing value filling comprises at least one of mean filling, mode filling and quantile filling.
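The preprocessing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `fill_missing` and `drop_outliers` are hypothetical, the median stands in for quantile filling (it is the 0.5 quantile), and the interquartile-range rule for abnormal values is an assumption, since the text does not specify how abnormal values are detected.

```python
from statistics import mean, median, mode, quantiles

def fill_missing(values, strategy="mean"):
    """Replace None entries with the mean, mode, or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill_value = {"mean": mean, "mode": mode, "median": median}[strategy](observed)
    return [fill_value if v is None else v for v in values]

def drop_outliers(values, k=1.5):
    """Assumed rule: keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if low <= v <= high]

filled = fill_missing([1.0, None, 3.0], strategy="mean")
cleaned = drop_outliers([10, 11, 12, 13, 14, 15, 16, 17, 500])
```

Here `filled` has its gap replaced by the mean of the observed values, and `cleaned` no longer contains the extreme value 500.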

In one possible design, the network characteristics include load metrics and characteristic data;

the load index comprises at least one of cell flow, cell Physical Resource Block (PRB) utilization rate or user number;

the characteristic data comprises at least one of cell longitude and latitude, cell frequency point, cell bandwidth, cell coverage type, cell region type, cell busy and idle periods, cell key performance indicators (KPI), cell key quality indicators (KQI) and measurement report (MR) data.

In a second aspect, an embodiment of the present invention provides a communication network load prediction apparatus, including:

the acquisition module is used for acquiring network characteristic data sets of a plurality of cells in a communication network, and aggregating the network characteristic data set of each cell to obtain mean characteristic data of each cell;

the clustering module is used for carrying out clustering analysis on the mean characteristic data of all the cells to obtain a plurality of groups of training sample sets of different types, wherein each training sample set comprises the mean characteristic data of a plurality of cells of the same type;

the training module is used for training the model according to each group of training sample set respectively to obtain a plurality of different types of cell load prediction models;

and the prediction module is used for obtaining the network prediction load of each cell according to the cell load prediction model corresponding to each cell and determining the network prediction load of the communication network according to the network prediction loads of all the cells in the communication network.

In one possible design, the clustering module is specifically configured to:

taking the mean characteristic data of each cell as a node, and determining the similarity between two nodes according to the mean characteristic data corresponding to each of the two nodes;

obtaining a node relation graph according to the similarity between any two nodes;

analyzing the node relation graph according to a label propagation algorithm to obtain a plurality of node subsets;

In a third aspect, an embodiment of the present invention provides a server, including: at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the communication network load prediction method according to the first aspect or any possible design of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the communication network load prediction method according to the first aspect or any possible design of the first aspect.

According to the communication network load prediction method, the communication network load prediction device and the communication network load prediction server, the network characteristic data set of each cell is aggregated to obtain the mean characteristic data of each cell, the mean characteristic data of all the cells are subjected to cluster analysis to obtain a plurality of groups of training sample sets of different types, the training sample sets of different types are respectively subjected to model training to obtain a plurality of different types of cell load prediction models, the network prediction load of each cell is obtained according to the cell load prediction model corresponding to each cell, the network prediction load of the communication network is determined according to the network prediction loads of all the cells in the communication network, and the accuracy of the predicted communication network load result is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a schematic structural diagram of a communication network according to an embodiment of the present invention;

Fig. 2 is a first schematic flowchart of a communication network load prediction method according to an embodiment of the present invention;

Fig. 3 is a second schematic flowchart of a communication network load prediction method according to an embodiment of the present invention;

Fig. 4 is an exemplary diagram of a node subset provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a communication network load prediction apparatus according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of a hardware structure of a server according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic structural diagram of a communication network according to an embodiment of the present invention. As shown in Fig. 1, the communication network includes a plurality of base stations 11, a plurality of user terminals 12, and a server 13. A cell is an area that provides network services to users and is the basic unit of the network. The base station 11 is configured to manage or support one or more cells; generally, one base station corresponds to multiple cells, that is, a cell is a coverage area centered on the base station. The server 13 is configured to determine the network prediction load of the communication network according to the network characteristics sent by all cells within the range of the communication network. When a user terminal 12 is within the network range provided by a certain cell, it establishes a connection with the base station managing that cell and then establishes communication with the core network, thereby implementing the network communication service.

As the size of communication networks has increased, the number of user terminals 12 in the network has increased continuously, and a large number of user terminals 12 use the network for data communication, resulting in a sudden increase in the load of the communication network. When the load of the communication network reaches a certain threshold, a capacity bottleneck occurs, resulting in network delay or packet loss, which seriously affects the user experience. In order to ensure the network use perception of a user, the network load is predicted in advance, when the situation that the network is about to have high load is predicted, an alarm prompt is sent out in time, the network is optimized, and therefore the problem of high load of the network is relieved.

Currently, a model based on a time sequence is generally adopted for cell capacity prediction, such as ARIMA and LSTM, and generally, historical network load data of all cells in a current communication network range is used as sample data, the sample data is input into a machine learning model and trained to obtain a prediction model capable of predicting network load, and the network load data of a communication network is predicted according to the prediction model. However, in the process of training the machine learning model by using the historical network load data of multiple cells, because there is a great difference between the historical network load data of the cells of different service types, that is, the correlation between sample data is too low, the training of the prediction model fails, and a stable and accurate prediction model cannot be obtained by training the machine learning model, so that the communication network load data cannot be accurately obtained by the trained prediction model in the prior art, and the accuracy of the predicted communication network load is affected.

To solve this technical problem, the invention improves the communication network load prediction method for the above scenario. The network characteristic data set of each cell is aggregated to obtain the mean characteristic data of each cell; cluster analysis is performed on the mean characteristic data of all cells, and the mean characteristic data of cells of the same type is used as one group of training sample sets; a model is trained on each group of training sample sets to obtain a plurality of cell load prediction models of different types; finally, the network prediction load of each cell is obtained according to the cell load prediction model corresponding to that cell, and the network prediction load of the communication network is determined according to the network prediction loads of all cells in the communication network. This improves the accuracy of the predicted communication network load.

Fig. 2 is a flowchart illustrating a first method for predicting a communication network load according to an embodiment of the present invention, where an execution subject of the embodiment may be a server in the embodiment shown in fig. 1. As shown in fig. 2, the method includes:

S201: Acquire network characteristic data sets of a plurality of cells in a communication network, and aggregate the network characteristic data set of each cell to obtain mean characteristic data of each cell.

In the embodiment of the present invention, the server receives the network characteristic data uploaded by all base stations in the communication network, where the data uploaded by each base station includes the network characteristic data of all cells managed by that base station. Specifically, each base station sends the collected network characteristic data of its cells to the server at a preset time interval; that is, the server obtains the network characteristic data of the plurality of cells in the communication network at the preset time interval. After receiving the network characteristic data of all cells, the server groups it by cell identification code to obtain a network characteristic data set for each cell.

The network characteristic data of the plurality of cells that the server receives from the base stations is the initial data. Grouping the initial data by cell identification code yields a network characteristic data set for each cell in the communication network, and the network characteristic data set of each cell is then aggregated to obtain the mean characteristic data of that cell. Specifically, all characteristic fields contained in the network characteristic data of a cell are determined, and an aggregation operation is performed over all values of the same characteristic field in the network characteristic data set of the cell; for example, the mean of each characteristic field can be calculated, finally yielding one piece of cell mean characteristic data containing a plurality of mean characteristics. The aggregation operation includes, but is not limited to, taking the mean or the median.
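The grouping and aggregation in S201 can be sketched as follows. The record layout, the field names `cell_id`, `prb_util` and `users`, and the function name `mean_features_by_cell` are hypothetical and chosen only for illustration; the sketch uses the mean as the aggregation operation.

```python
from collections import defaultdict
from statistics import mean

def mean_features_by_cell(records):
    """Group records by cell ID, then average every feature field per cell."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["cell_id"]].append(record)
    result = {}
    for cell_id, rows in grouped.items():
        fields = [name for name in rows[0] if name != "cell_id"]
        result[cell_id] = {f: mean(row[f] for row in rows) for f in fields}
    return result

# Hypothetical samples collected at successive time intervals.
records = [
    {"cell_id": "A", "prb_util": 0.40, "users": 10},
    {"cell_id": "A", "prb_util": 0.60, "users": 30},
    {"cell_id": "B", "prb_util": 0.10, "users": 4},
]
cell_means = mean_features_by_cell(records)
```

Each entry of `cell_means` is one piece of mean characteristic data for one cell; swapping `mean` for `median` gives the other aggregation the text mentions.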

Illustratively, the load index includes at least one of cell traffic, PRB utilization, or number of users; the characteristic data comprises at least one of longitude and latitude of a cell, a cell frequency point, a cell bandwidth, a cell coverage type, a cell region type, a cell busy and idle time, a cell KPI index, a cell KQI index and MR data.

S202: Perform cluster analysis on the mean characteristic data of all the cells to obtain a plurality of groups of training sample sets of different types, wherein each training sample set comprises the mean characteristic data of a plurality of cells of the same type.

In the embodiment of the present invention, a K-means algorithm may be adopted to perform cluster analysis on the mean characteristic data of all the cells. Illustratively, the mean characteristic data of K cells is selected from the mean characteristic data of all cells as the initial cluster centers, the numerical distance between the mean characteristic data of every cell and each of the K cluster centers is calculated, each cell is assigned to its nearest cluster center, and the cluster centers are re-determined according to these assignments. A criterion function is calculated while the cluster centers are being re-determined, and the clustering operation continues until the maximum number of iterations is reached. In this way, a plurality of clusters of mean characteristic data of cells of the same type is obtained. Taking the mean characteristic data of the cells in each cluster as one group of training sample sets gives a plurality of groups of training sample sets of different types, wherein each training sample set comprises the mean characteristic data of cells of the same type.
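A plain K-means pass over the cell mean characteristic data might look like the sketch below. The function name `kmeans`, the seeded random initialization and the early stop when assignments no longer change are assumptions made for illustration; the sample feature vectors are hypothetical.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on lists of floats; returns a cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assignment = [-1] * len(points)
    for _ in range(max_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # stable assignments mean the centers would not move again
        assignment = new_assignment
        # Recompute each center as the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

points = [[0.99, 0.10], [0.97, 0.12], [0.20, 0.90], [0.22, 0.88]]
labels = kmeans(points, k=2)
```

Cells that share a value in `labels` would form one training sample set. The number of clusters `k` has to be chosen in advance, which is the limitation the label propagation embodiment of Fig. 3 is meant to avoid.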

S203: Train the model according to each group of training sample sets respectively to obtain a plurality of different types of cell load prediction models.

In the embodiment of the invention, a plurality of different types of cell load prediction models are obtained by training the models according to each group of training sample sets. Specifically, mean characteristic data sets of multiple cells of the same type included in each training sample set form training samples, and a matched machine learning model is selected for training. Machine learning models include, but are not limited to, linear regression models, support vector machines, random forests, gradient boosting decision trees, deep learning models, and the like. After the model training is finished, a plurality of different types of cell load prediction models can be obtained.

S204: Obtain the network prediction load of each cell according to the cell load prediction model corresponding to each cell, and determine the network prediction load of the communication network according to the network prediction loads of all the cells in the communication network.

In the embodiment of the invention, the historical network load data of each cell is used as the sample data of the cell, and the sample data of each cell is input into the corresponding cell load prediction model, so that the future network load value of the cell can be predicted. Thereby, the network load values of all cells within the communication network can be predicted. By summing the network load values of all cells within the communication network, the network predicted load of the communication network can be determined.
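Steps S203 and S204 can be sketched together as follows. This is a simplified illustration under stated assumptions: a one-variable least-squares line that maps the previous load to the next load stands in for whichever machine learning model is selected, the models are fitted on load histories rather than on full feature sets, and all names (`fit_line`, `train_cluster_models`, `predict_network_load`) and data values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (stand-in for any regressor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx

def train_cluster_models(history, cluster_of):
    """Fit one next-step load model per cluster from its cells' load histories."""
    pairs = {}
    for cell, loads in history.items():
        xs, ys = pairs.setdefault(cluster_of[cell], ([], []))
        xs.extend(loads[:-1])  # previous load
        ys.extend(loads[1:])   # load one step later
    return {c: fit_line(xs, ys) for c, (xs, ys) in pairs.items()}

def predict_network_load(history, cluster_of, models):
    """Predict each cell's next load with its cluster's model, then sum."""
    per_cell = {}
    for cell, loads in history.items():
        a, b = models[cluster_of[cell]]
        per_cell[cell] = a * loads[-1] + b
    return per_cell, sum(per_cell.values())

history = {"A": [1.0, 2.0, 3.0], "B": [2.0, 3.0, 4.0], "C": [10.0, 10.0, 10.0]}
cluster_of = {"A": 0, "B": 0, "C": 1}
models = train_cluster_models(history, cluster_of)
per_cell, network_load = predict_network_load(history, cluster_of, models)
```

Cells A and B share one model because they belong to the same cluster, while cell C gets its own; the network prediction load is the sum of the per-cell predictions.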

As can be seen from the foregoing embodiments, in the communication network load prediction method provided in the embodiments of the present invention, the network characteristic data sets of each cell are aggregated to obtain the mean characteristic data of each cell, the mean characteristic data of all cells are subjected to cluster analysis, the mean characteristic data of the cells of the same type are used as a set of training sample sets, the models are trained according to each set of training sample sets to obtain a plurality of different types of cell load prediction models, finally, the network prediction load of each cell is obtained according to the cell load prediction model corresponding to each cell, and the network prediction load of the communication network is determined according to the network prediction loads of all cells in the communication network. According to the communication network load prediction method provided by the embodiment of the invention, the machine learning model is not trained directly according to the network characteristic data of all the cells in the communication network, and the single load prediction model is established, but the similarity of the network characteristic data among the cells of the same type and the network load difference of the cells of different types are considered, and the corresponding cell load prediction model is trained according to the network characteristic data of the cells of the same type, so that the cells of different types use different load prediction models, and the accuracy of the predicted communication network load result is improved.

Fig. 3 is a schematic flowchart of a second method for predicting a load of a communication network according to an embodiment of the present invention, and this embodiment describes in detail a process of performing cluster analysis on mean characteristic data of all cells in S202 to obtain a plurality of groups of training sample sets of different categories based on the embodiment of fig. 2. As shown in fig. 3, the method includes:

S301: Take the mean characteristic data of each cell as a node, and determine the similarity between two nodes according to the mean characteristic data corresponding to each of the two nodes.

In the embodiment of the invention, the mean characteristic data of each cell is taken as a node, and the similarity between two nodes can be determined according to the mean characteristic data corresponding to each of the two nodes. Specifically, taking the radial basis kernel function as the calculation criterion, for example, if the mean characteristic data of cell i is x_i = [x_i1, x_i2, …, x_in] and the mean characteristic data of cell j is x_j = [x_j1, x_j2, …, x_jn], the similarity w_i,j between the two nodes corresponding to cell i and cell j is calculated as shown in formula (1):

For example, the cell mean characteristic fields include the E-UTRAN Radio Access Bearer (E-RAB) establishment success rate, the video initial buffering delay, and the proportion of Reference Signal Received Power (RSRP) values greater than -110 dBm. Correspondingly, the characteristic dimension n is 3. If the mean characteristic data of cell 1 is [0.99, 0.1, 0.98], the mean characteristic data of cell 2 is [0.97, 0.2, 0.99], and σ is 0.12, then formula (1) gives a similarity of 0.92 between the nodes corresponding to cells 1 and 2, which indicates that cells 1 and 2 are highly similar.
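The worked example can be checked numerically. Formula (1) itself is not reproduced in this text, so the form used below, w_i,j = exp(-Σ_k (x_ik - x_jk)² / σ), is an assumption: it is a radial basis form that matches the stated symbol definitions and reproduces the value 0.92 for σ = 0.12, whereas the more common exp(-d² / (2σ²)) form would give about 0.69 for the same inputs. The function name `rbf_similarity` is hypothetical.

```python
import math

def rbf_similarity(x_i, x_j, sigma):
    """Assumed form of formula (1): exp(-sum_k (x_ik - x_jk)^2 / sigma)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-squared_distance / sigma)

cell_1 = [0.99, 0.1, 0.98]
cell_2 = [0.97, 0.2, 0.99]
w_12 = rbf_similarity(cell_1, cell_2, sigma=0.12)
print(round(w_12, 2))  # prints 0.92
```

The squared distance here is 0.0004 + 0.01 + 0.0001 = 0.0105, so the exponent is -0.0875.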

S302: Obtain a node relation graph according to the similarity between any two nodes, and analyze the node relation graph according to a label propagation algorithm to obtain a plurality of node subsets.

In the embodiment of the invention, the similarity between any two nodes can be obtained through the formula (1), and the node relation graph is obtained according to the similarity between any two nodes. Specifically, in the node relationship graph, the similarity between two nodes is used as the edge weight of two nodes in the node relationship graph, and the node relationship graph is formed according to the edge weight between any two nodes.

In the embodiment of the invention, the node relation graph is analyzed according to the label propagation algorithm, and the nodes are clustered according to the similarity between any two nodes, so that a plurality of node subsets can be obtained. The Label Propagation Algorithm (LPA) is a community discovery algorithm mainly used for social network relationship mining. It uses only the graph network structure as guidance and requires neither optimization of a predefined objective function nor prior information about the communities. Each node is initialized with a unique label, and at each step each node takes the label currently used by most of its neighbors. In this iterative process, densely connected groups of nodes agree on a unique label and form a community. In the end, connections between nodes in the same community are tight, and connections between communities are sparse.

In the embodiment of the present invention, specifically, the step of analyzing the node relationship graph according to the label propagation algorithm to obtain a plurality of node subsets is as follows:

(1) Each node in the node relation graph is assigned a unique label; that is, label 1 corresponds to cell node 1, and label i corresponds to node i.

For example, assume that there are 4 cells in the node relation graph, and that the labels are initialized so that cell 1 has label 1, cell 2 has label 2, cell 3 has label 3 and cell 4 has label 4. The edge weights between pairs of nodes are shown in Table 1 below:

TABLE 1

Node	Node label	Neighbor node	Neighbor node label	Edge weight
Cell 1	1	Cell 2	2	0.8
Cell 1	1	Cell 3	3	0.5
Cell 1	1	Cell 4	4	0.6
Cell 2	2	Cell 3	3	0.2
Cell 2	2	Cell 4	4	0.6
Cell 3	3	Cell 4	4	0.6

(2) Traverse all nodes. For each node, find its neighbors, obtain the labels of those neighbors, and find the label that occurs most often among them. If only one label occurs most often, take that label as the label of the node. If more than one label ties for the most occurrences, sum, for each such label, the similarities between the node and its neighbors carrying that label to obtain the total similarity of that label, and take the label with the highest total similarity as the label of the node. Otherwise, randomly select a label to replace the node label.

For example, in the first traversal of the 4 cell nodes, the label of cell 1 is kept at 1.

Cell 2 has the neighboring nodes cell 1, cell 3 and cell 4. Each neighbor label occurs exactly once, so the label with the highest total similarity is taken as the label of the cell. From Table 1, the highest similarity among the neighbors of cell 2 is 0.8, with cell 1, so the node label of cell 2 is updated to 1, the node label of cell 1. The updated label distribution is shown in Table 2 below:

TABLE 2

Node	Node label	Neighbor node	Neighbor node label	Edge weight
Cell 1	1	Cell 2	1	0.8
Cell 1	1	Cell 3	3	0.5
Cell 1	1	Cell 4	4	0.6
Cell 2	1	Cell 3	3	0.2
Cell 2	1	Cell 4	4	0.6
Cell 3	3	Cell 4	4	0.6

Next, the label of cell 3 is updated. Cell 3 has the neighboring nodes cell 1, cell 2 and cell 4. From the table above, the similarity between cell 3 and cell 4 is the highest, at 0.6, so the node label of cell 3 is changed to 4, the node label of cell 4. The updated label distribution is shown in Table 3 below:

TABLE 3

Node	Node label	Neighbor node	Neighbor node label	Edge weight
Cell 1	1	Cell 2	4	0.8
Cell 1	1	Cell 3	4	0.5
Cell 1	1	Cell 4	4	0.6
Cell 2	4	Cell 3	4	0.2
Cell 2	4	Cell 4	4	0.6
Cell 3	4	Cell 4	4	0.6

As can be seen from Table 3, the node labels of the three cells other than cell 1 have all been updated to 4, so only one label occurs among the neighbors of cell 1, and the node label of cell 1 is updated to 4. The updated label distribution is shown in Table 4 below:

TABLE 4

Node	Node label	Neighbor node	Neighbor node label	Edge weight
Cell 1	4	Cell 2	4	0.8
Cell 1	4	Cell 3	4	0.5
Cell 1	4	Cell 4	4	0.6
Cell 2	4	Cell 3	4	0.2
Cell 2	4	Cell 4	4	0.6
Cell 3	4	Cell 4	4	0.6

(3) If, after the labels are re-marked in the current round, the node labels no longer change or the set maximum number of iterations has been reached, the iteration stops; otherwise, step (2) is repeated. For example, after the 4 cells in step (2) are re-labeled, all labels are 4 and no further change occurs, so the iteration stops.

(4) After the iteration is stopped, each node cell is divided into node subsets. Fig. 4 is an exemplary diagram of a node subset provided in an embodiment of the present invention, and as shown in fig. 4, connections between any two nodes in the same node subset are tight, and connections between nodes in different node subsets are sparse.
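Steps (1) to (4) can be illustrated with a minimal Python sketch. It is not the patented implementation: it ranks candidate labels purely by the summed similarity of the neighbors carrying each label, which is one reading of the rule described here, and the fixed visiting order, the tie-break toward the smaller label, and the names `propagate_labels` and `node_subsets` are assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, max_iter=100):
    """Weighted label propagation on an undirected similarity graph.

    edges: dict mapping (node_a, node_b) -> similarity (edge weight).
    Returns a dict mapping node -> final label.
    """
    # Build a symmetric adjacency map: node -> {neighbor: weight}.
    neighbors = defaultdict(dict)
    for (a, b), w in edges.items():
        neighbors[a][b] = w
        neighbors[b][a] = w

    # Step (1): every node starts with its own unique label.
    labels = {node: node for node in neighbors}

    for _ in range(max_iter):
        changed = False
        # Step (2): visit nodes in a fixed (sorted) order, updating in place.
        for node in sorted(neighbors):
            # Sum the similarity of the neighbors carrying each label.
            score = defaultdict(float)
            for nb, w in neighbors[node].items():
                score[labels[nb]] += w
            # Largest similarity sum wins; ties go to the smaller label
            # (the tie-break is an assumption of this sketch).
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        # Step (3): stop once a full round changes nothing.
        if not changed:
            break
    return labels


def node_subsets(labels):
    """Step (4): group nodes that ended up with the same label."""
    subsets = defaultdict(list)
    for node, lab in sorted(labels.items()):
        subsets[lab].append(node)
    return list(subsets.values())
```

On a toy graph made of two tightly connected triangles joined by one weak edge, the sketch separates the two triangles into two node subsets, which matches the intent of tight connections inside a subset and sparse connections between subsets.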

S303: Take all nodes belonging to the same node subset as a group of training nodes, and obtain a plurality of groups of training sample sets of different types according to the mean characteristic data of the cells corresponding to each group of training nodes.

In the embodiment of the invention, the connection between any two nodes in the same node subset is tight, the edge weight value between the nodes is low, and all the nodes in the same node subset are similar to one another, so all nodes belonging to the same node subset are used as a group of training nodes. The mean characteristic data of the cells corresponding to all nodes in one node subset form a group of training samples of the same type, and in this way a plurality of groups of training sample sets of different types are obtained.
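The grouping in S303 can be sketched as follows. This is only an illustration under assumed data shapes: `labels` is taken to map each cell to its node-subset label and `mean_features` to map each cell to its mean characteristic vector; both names, and the function `build_training_sets`, are hypothetical.

```python
from collections import defaultdict


def build_training_sets(labels, mean_features):
    """Group per-cell mean feature vectors by node-subset label.

    labels: dict cell_id -> subset label (e.g. from label propagation).
    mean_features: dict cell_id -> list of mean feature values.
    Returns dict label -> list of (cell_id, feature_vector) pairs.
    """
    training_sets = defaultdict(list)
    # Iterate in sorted order so the output is deterministic.
    for cell_id in sorted(labels):
        training_sets[labels[cell_id]].append((cell_id, mean_features[cell_id]))
    return dict(training_sets)
```

Each value of the returned dictionary then plays the role of one training sample set of a single type.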

It can be seen from the above embodiments that using the mean characteristic data of each cell as a node and classifying the cells of the communication network with a label propagation-based algorithm improves the accuracy of the clustering result. In the embodiment of the invention, the label propagation algorithm is guided only by the network structure of the node relation graph; it does not need to optimize a predefined objective function and does not require prior information about the communities. This avoids the poor prediction performance that arises in traditional clustering methods such as the K-means clustering algorithm when the specified number of clusters, the initial center points or the objective function do not match the data distribution, and it improves the accuracy of the obtained node subsets. The invention provides an improved community discovery algorithm that takes the similarity between nodes into account: when the neighbors of a node carry more than one label, the label of the node is determined from the similarity sum of the neighbors carrying each label, which improves the accuracy of the cell load prediction models obtained from the different types of training sample sets.

In one possible implementation, after the network characteristic data sets of a plurality of cells in the communication network are acquired, missing value filling and abnormal value elimination are performed on the network characteristic data set of each cell, where the missing value filling includes at least one of mean filling, maximum filling, and quantile filling.

In the embodiment of the present invention, after the network characteristic data sets of a plurality of cells in the communication network are obtained, missing value filling and abnormal value elimination are first performed on the network characteristic data set of each cell. Specifically, the missing value filling includes at least one of mean filling, maximum filling, and quantile filling. In this processing, conflicting data and invalid values are deleted and missing values are filled in, which improves the integrity and accuracy of the original data and corrects erroneous parameters in it.

As can be seen from the above embodiment, performing missing value filling and abnormal value elimination on the network characteristic data set of each cell improves the integrity and accuracy of the original data. The mean characteristic data of each cell obtained from the processed network characteristic data set is therefore more accurate, which improves the accuracy of the communication network load prediction provided by the embodiment of the present invention.
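The preprocessing described above can be sketched with the Python standard library. The text does not say how abnormal values are detected, so the interquartile-range rule used in `drop_outliers` is an assumption, as are the function names, the use of `None` to mark a missing value, and the simple index-based quantile.

```python
import statistics


def fill_missing(values, strategy="mean", q=0.5):
    """Replace None entries using the observed (non-missing) values."""
    observed = sorted(v for v in values if v is not None)
    if not observed:
        return list(values)  # nothing to estimate a fill value from
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "max":
        fill = observed[-1]
    elif strategy == "quantile":
        # Simple index-based quantile of the sorted observed values.
        idx = min(len(observed) - 1, int(q * len(observed)))
        fill = observed[idx]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def drop_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (IQR rule, an assumption)."""
    if len(values) < 4:
        return list(values)  # too few points to estimate quartiles sensibly
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]
```

A typical order of use would be to fill the gaps in one feature column of a cell first and then remove abnormal values, although the text does not fix the order of the two operations.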

Fig. 5 is a schematic structural diagram of a communication network load prediction apparatus according to an embodiment of the present invention. As shown in Fig. 5, the communication network load prediction apparatus includes: an acquisition module 501, a clustering module 502, a training module 503, and a prediction module 504. The acquisition module 501 is configured to acquire network characteristic data sets of a plurality of cells in a communication network, and aggregate the network characteristic data set of each cell to obtain mean characteristic data of each cell; the clustering module 502 is configured to perform cluster analysis on the mean characteristic data of all cells to obtain a plurality of groups of training sample sets of different types, where each training sample set includes the mean characteristic data of a plurality of cells of the same type; the training module 503 is configured to train a model on each group of training sample sets respectively, to obtain a plurality of cell load prediction models of different types; the prediction module 504 is configured to obtain the network predicted load of each cell according to the cell load prediction model corresponding to that cell, and determine the network predicted load of the communication network according to the network predicted loads of all the cells in the communication network.

The device provided in this embodiment may be used to implement the technical solution of the embodiment in fig. 2, and the implementation principle and technical effect are similar, which are not described herein again.

In a possible implementation manner, the clustering module 502 is specifically configured to: take the mean characteristic data of each cell as a node, and determine the similarity between two nodes according to the mean characteristic data corresponding to each of the two nodes; obtain a node relation graph according to the similarity between any two nodes, and analyze the node relation graph according to a label propagation algorithm to obtain a plurality of node subsets; and take all nodes belonging to the same node subset as a group of training nodes, and obtain a plurality of groups of training sample sets of different types according to the mean characteristic data of the cells corresponding to each group of training nodes.

In a possible implementation manner, the clustering module 502 is specifically configured to: if the characteristic of cell i is $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]$ and the characteristic of cell j is $x_j = [x_{j1}, x_{j2}, \ldots, x_{jn}]$, calculate the similarity $w_{i,j}$ between the two nodes corresponding to cell i and cell j, where the formula is:

In a possible implementation manner, the acquisition module 501 is specifically configured to: obtain network characteristic data of a plurality of cells in a communication network at a preset time interval, and group the network characteristic data of the plurality of cells according to cell identification codes to obtain the network characteristic data set of each cell.
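The grouping by cell identification code and the subsequent aggregation into mean characteristic data can be sketched as below. The record layout `(cell_id, feature_vector)` and the function names are assumptions made for illustration; the text does not specify a data format.

```python
from collections import defaultdict


def group_by_cell(records):
    """Group periodic samples by cell identification code.

    records: iterable of (cell_id, feature_vector) samples collected at
    successive time intervals. Returns dict cell_id -> list of vectors.
    """
    data_sets = defaultdict(list)
    for cell_id, features in records:
        data_sets[cell_id].append(features)
    return dict(data_sets)


def mean_features(data_sets):
    """Average each feature over all samples of a cell."""
    result = {}
    for cell_id, rows in data_sets.items():
        n = len(rows)
        # zip(*rows) walks the samples column by column (feature by feature).
        result[cell_id] = [sum(col) / n for col in zip(*rows)]
    return result
```

For instance, two samples `[1.0, 10.0]` and `[3.0, 30.0]` of the same cell aggregate to the mean vector `[2.0, 20.0]`.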

In a possible implementation manner, the apparatus further includes a data preprocessing module, configured to perform missing value filling and outlier culling processing on the network feature data set of each cell, where the missing value filling includes at least one of mean filling, maximum filling, and quantile filling.

Fig. 6 is a schematic diagram of a hardware structure of a server according to an embodiment of the present invention. As shown in fig. 6, the server of the present embodiment includes: a processor 601, a memory 602, and a computer program stored in the memory 602 and operable on the processor 601, the processor 601 implementing the following steps when executing the computer program: acquiring network characteristic data sets of a plurality of cells in a communication network, and aggregating the network characteristic data sets of each cell to obtain mean characteristic data of each cell; performing cluster analysis on the mean characteristic data of all the cells to obtain a plurality of groups of training sample sets of different types, wherein each training sample set comprises the mean characteristic data of a plurality of cells of the same type; training the model according to each group of training sample set respectively to obtain a plurality of cell load prediction models of different types; and obtaining the network prediction load of each cell according to the cell load prediction model corresponding to each cell, and determining the network prediction load of the communication network according to the network prediction loads of all the cells in the communication network.
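The final prediction step, in which each cell is scored by the model of its own type and the per-cell results are combined, can be sketched as follows. The models are represented as plain callables, and combining the per-cell loads by summation is an assumption of this sketch, since the text only says the network load is determined from the loads of all cells; `predict_network_load` and its parameters are hypothetical names.

```python
def predict_network_load(cell_features, cell_labels, models, combine=sum):
    """Predict each cell's load with its cluster's model, then combine.

    cell_features: dict cell_id -> feature vector.
    cell_labels: dict cell_id -> cluster (type) label.
    models: dict cluster label -> callable(feature_vector) -> predicted load.
    combine: how per-cell loads become a network load (sum is an assumption).
    """
    per_cell = {
        cell_id: models[cell_labels[cell_id]](features)
        for cell_id, features in cell_features.items()
    }
    return per_cell, combine(per_cell.values())
```

Passing a different `combine` function, such as `max`, changes how the network-level figure is derived without touching the per-cell dispatch.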

In one possible design, the processor 601, when executing the computer program, further performs the following steps: taking the mean characteristic data of each cell as a node, and determining the similarity between two nodes according to the mean characteristic data corresponding to each of the two nodes; obtaining a node relation graph according to the similarity between any two nodes, and analyzing the node relation graph according to a label propagation algorithm to obtain a plurality of node subsets; and taking all nodes belonging to the same node subset as a group of training nodes, and obtaining a plurality of groups of training sample sets of different types according to the mean characteristic data of the cells corresponding to each group of training nodes.

In one possible design, the processor 601, when executing the computer program, further performs the following steps: if the characteristic of cell i is $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]$ and the characteristic of cell j is $x_j = [x_{j1}, x_{j2}, \ldots, x_{jn}]$, calculating the similarity $w_{i,j}$ between the two nodes corresponding to cell i and cell j, where the formula is:

In one possible design, the processor 601, when executing the computer program, further performs the following steps: obtaining network characteristic data of a plurality of cells in a communication network at a preset time interval, and grouping the network characteristic data of the plurality of cells according to cell identification codes to obtain the network characteristic data set of each cell.

In one possible design, the processor 601, when executing the computer program, further performs the following steps: performing missing value filling and abnormal value elimination on the network characteristic data set of each cell, where the missing value filling includes at least one of mean filling, maximum filling, and quantile filling.

For details, refer to the related description in the method embodiments above.

Optionally, the memory 602 may be separate from or integrated with the processor 601.

When the memory 602 is provided separately, the server further comprises a bus 603 for connecting the memory 602 and the processor 601.

An embodiment of the present invention further provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the communication network load prediction method described above is implemented.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division of the modules is only a logical division, and other divisions are possible in actual implementation; for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to implement the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.

The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods described in the embodiments of the present application.

It should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general-purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules within the processor.

The memory may comprise a high-speed RAM, and may further comprise a non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk or an optical disk, etc.

The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.

The storage medium may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disks or optical disks. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or host device.

Those of ordinary skill in the art will understand that all or a portion of the steps of the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above; the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for predicting a load on a communication network, comprising:

2. The method according to claim 1, wherein performing cluster analysis on the mean characteristic data of all cells to obtain a plurality of groups of training sample sets of different types comprises:

3. The method of claim 2, wherein determining the similarity between two nodes according to the mean characteristic data corresponding to each of the two nodes comprises:

4. The method of claim 1, wherein obtaining the network characteristic data sets for the plurality of cells in the communication network comprises:

5. The method of claim 1, further comprising, after said obtaining the network characteristic data sets for the plurality of cells within the communication network:

6. The method according to any of claims 1 to 5, wherein the network characteristic data comprises load metrics and characteristic data;

7. A communication network load prediction apparatus, comprising:

8. The apparatus of claim 7, wherein the clustering module is specifically configured to:

9. A server, comprising: at least one processor and memory;

the memory stores computer-executable instructions;

the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the communication network load prediction method of any of claims 1 to 6.

10. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the communication network load prediction method of any one of claims 1 to 6.