CN111385128B

CN111385128B - Method and device for predicting burst load, storage medium, and electronic device

Info

Publication number: CN111385128B
Application number: CN201811643589.1A
Authority: CN
Inventors: 孟晟
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2023-04-07
Anticipated expiration: 2038-12-29
Also published as: WO2020135510A1; CN111385128A

Abstract

The invention provides a method and a device for predicting a burst load, a storage medium, and an electronic device. The method comprises: acquiring load index data of a designated area, wherein the load index data represents the load condition of the designated area; and analyzing the load index data by using a burst load model to predict the occurrence of a burst load, wherein the burst load model is trained through machine learning by using multiple groups of data.

Description

Method and device for predicting burst load, storage medium, and electronic device

Technical Field

The present invention relates to the field of communications, and in particular, to a method and an apparatus for predicting a burst load, a storage medium, and an electronic apparatus.

Background

Resources in a service network are usually limited, and there are several situations in which the load of an access node or of the whole network approaches or even reaches its upper limit: in the medium and long term, the load gradually approaches the capacity limit; in the medium and short term, the load is periodically high; and in real time, the load surges suddenly. Load prediction from the short term to the long term, at granularities of an hour or more, is mainly based on time series models. A sudden load surge, by contrast, is usually determined mainly by the random behavior of network users and is related only to a period very close to the moment it occurs; the congestion it causes is a common problem.

Taking a cellular communication network as an example, when the cell load exceeds a certain level, the service performance experienced by users in the cell (e.g. delay and stuttering) degrades; when the load is severe, congestion occurs, causing basic system indicators to deteriorate rapidly so that user services cannot proceed normally. In scenarios with high user density in particular, such as sports games, concerts, and large gatherings, how to alleviate or even avoid real-time congestion is a pain point for both operators and users.

Traditional schemes are mainly implemented in three ways:

(1) The area is planned and provisioned according to the user capacity required at the time of congestion. This seriously reduces spectrum utilization during ordinary periods and greatly increases network cost.
(2) The cell parameters in the area are fixed at the setting that supports the highest number of accesses, guaranteeing only basic user connectivity. This sacrifices user rate and service class.
(3) To balance spectrum utilization and user experience, when a high-user-density (also called high-traffic) scenario occurs, various network indicators are monitored manually, and network parameters are adjusted manually and continuously according to whether the indicators exceed preset thresholds. However, this scheme requires many operation and maintenance personnel on site and greatly increases labor cost.

For the real-time congestion caused by sudden load surges in the related art, traditional technical schemes suffer from high cost and other problems, and no effective solution has yet been proposed.

Disclosure of Invention

Embodiments of the present invention provide a method and a device for predicting a burst load, a storage medium, and an electronic device, so as to at least solve the problems in the related art that sudden load surges cause real-time congestion and that traditional technical schemes are costly.

According to an embodiment of the present invention, there is provided a method for predicting a burst load, including:

acquiring load index data of a designated area, wherein the load index data represents the load condition of the designated area; and analyzing the load index data by using a burst load model to predict the occurrence of the burst load, wherein the burst load model is trained through machine learning by using multiple groups of data.

According to another embodiment of the present invention, there is also provided an apparatus for predicting a burst load, including: an acquisition module, configured to acquire load index data of a designated area, wherein the load index data represents the load condition of the designated area; and a prediction module, configured to analyze the load index data by using a burst load model and predict the occurrence of the burst load, wherein the burst load model is trained through machine learning by using multiple groups of data.

According to another embodiment of the present invention, there is also provided a storage medium storing a computer program, wherein the computer program is configured to, when run, execute the method for predicting a burst load according to any one of the above.

According to another embodiment of the present invention, there is also provided an electronic apparatus including a memory and a processor, wherein the memory stores a computer program and the processor is configured to execute the computer program to perform the method for predicting a burst load according to any one of the above.

According to the invention, load index data of a designated area, which represents the load condition of that area, is collected and analyzed by a burst load model trained through machine learning on multiple groups of data, so that the occurrence of a burst load is predicted.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a flowchart of a method for predicting a burst load according to an embodiment of the present invention;

FIG. 2 is a block diagram of an apparatus for predicting a burst load according to an embodiment of the present invention;

FIG. 3 is another block diagram of an apparatus for predicting a burst load according to an embodiment of the present invention;

FIG. 4 is a block diagram of an electronic apparatus according to an embodiment of the present invention;

FIG. 5 is a block diagram of a system according to an embodiment of the present invention;

FIG. 6 is an overall flowchart according to a preferred embodiment of the present invention;

FIG. 7 is a schematic diagram of a sequence alignment search according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of boundary parameters according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of screening equivalent evaluation indexes for a service according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a decision tree method for automatically labeling real-time congestion according to an embodiment of the present invention.

Detailed Description

The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

Example 1

An embodiment of the present invention provides a method for predicting a burst load. FIG. 1 is a flowchart of the method for predicting a burst load according to an embodiment of the present invention. As shown in FIG. 1, the method includes:

step S102, collecting load index data of a designated area, wherein the load index data represents the load condition of the designated area; and

step S104, analyzing the load index data by using a burst load model and predicting the occurrence of the burst load, wherein the burst load model is trained through machine learning by using multiple groups of data.

Through the steps, load index data of a designated area are collected, wherein the load index data are used for representing the load condition of the designated area; the load index data are analyzed by using the burst load model, and the occurrence situation of the burst load is predicted, wherein the burst load model is trained by using multiple groups of data through machine learning.

It should be noted that the occurrence of the burst load may refer to the time at which the burst load occurs, its duration, and other information related to the burst load.

In this embodiment of the present invention, before the load index data is analyzed by using a burst load model, the method further includes: selecting, from a plurality of burst load models, a burst load model suitable for the designated area.

In an embodiment of the present invention, before selecting a burst load model suitable for the designated area from a plurality of burst load models, the method further includes:

acquiring historical data collected at a base station; preprocessing the historical data to obtain a region-wide cell data set; selecting a congestion data set from the region-wide cell data set according to a specified rule; and labeling the congestion data set to obtain the plurality of burst load models.

In an embodiment of the present invention, the historical data includes at least one of the following: load index data, network key performance indicator data, key quality of service indicator data, and user behavior indicator data.

In an embodiment of the present invention, the region-wide cell data set includes independent variable data and dependent variable data, and is acquired at least in the following manner: acquiring independent variable data and dependent variable data that meet preset conditions, wherein the independent variable data includes load-class data, and the dependent variable data has a specified functional relationship with the independent variable data.

In this embodiment of the present invention, selecting a congestion data set from the region-wide cell data set according to a specified rule includes: analyzing the dependent variable data and the independent variable data by a comparison detection method; using dependent variable data whose number of comparison detections meets a preset number as congestion indication index data; and using, as the congestion data set, n pieces of independent variable index data whose degree of association with the congestion indication index data meets a preset value, wherein n is a positive integer.

More specifically, the congestion data set may be obtained as follows. First, the independent variables (load-class indexes) and the dependent variables (cumulative perception indexes) are determined. The learning data consist of events for which customer complaints exist or which operation and maintenance personnel have explicitly marked as having congestion periods, or of data whose load index mean and standard deviation over the whole event rank in the top 20%. During learning, high load of the independent variables is determined, dependent variable indexes with high correlation are found by the comparison detection method, and a threshold for the dependent variables (a perception index threshold) is then determined. Finally, cells whose load is low overall and whose periods of degraded perception are short are excluded, yielding the congestion data set.

In an embodiment of the present invention, after the n pieces of independent variable index data whose degree of association with the congestion indication index data meets a preset value are used as the congestion data set, the method further includes: labeling the second load index data by using a gradient boosting decision tree (GBDT) method to obtain a labeling result, wherein the labeling result includes a first burst level and a second burst level.

It should be noted that the burst load model includes information on at least one of the following: (1) input data length (also called the observation time window, that is, the burst load model is matched on the basis of the data within this window, and a specific model is obtained after matching); (2) advance amount; (3) burst load width (also called burst load duration); and (4) burst load level. The burst load level includes at least a first burst level and a second burst level.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

This embodiment further provides an apparatus for predicting a burst load. The apparatus is used to implement the foregoing embodiments and preferred implementations, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware or in a combination of software and hardware is also possible and contemplated.

FIG. 2 is a block diagram of an apparatus for predicting a burst load according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes:

an acquisition module 20, configured to acquire load index data of a designated area, wherein the load index data represents the load condition of the designated area; and

a prediction module 22, configured to analyze the load index data by using a burst load model and predict the occurrence of the burst load, wherein the burst load model is trained through machine learning by using multiple groups of data.

As shown in FIG. 3, in this embodiment of the present invention, the apparatus further includes a selecting module 24, configured to select, from a plurality of burst load models, a burst load model suitable for the designated area.

In this embodiment of the present invention, the acquisition module 20 is further configured to: acquire historical data collected at a base station; preprocess the historical data to obtain a region-wide cell data set; select a congestion data set from the region-wide cell data set according to a specified rule; and label the congestion data set to obtain the plurality of burst load models.

In an embodiment of the present invention, the region-wide cell data set includes independent variable data and dependent variable data, and the acquisition module 20 is further configured to acquire independent variable data and dependent variable data that meet preset conditions, wherein the independent variable data includes load-class data, and the dependent variable data has a specified functional relationship with the independent variable data.

In this embodiment of the present invention, the selecting module 24 is further configured to: analyze the dependent variable data and the independent variable data by a comparison detection method; use dependent variable data whose number of comparison detections meets a preset number as congestion indication index data; and use, as the congestion data set, n pieces of independent variable index data whose degree of association with the congestion indication index data meets a preset value.

More specifically, the above congestion data set may be obtained in the same manner as described in Example 1.

In this embodiment of the present invention, after the n pieces of independent variable index data whose degree of association with the congestion indication index data meets a preset value are used as the congestion data set, the selecting module 24 is further configured to label the second load index data by using a gradient boosting decision tree (GBDT) method to obtain a labeling result, wherein the labeling result includes a first burst level and a second burst level.

The above technical solution overcomes the following technical problems: areas with a high incidence of congestion, the features that identify congestion, and the main causes of congestion cannot be identified automatically; congestion prediction depends on predicting real-time traffic values, which are strongly related to random user behavior and are therefore difficult or very costly to predict accurately; and there is no method for labeling data accurately.

Example 3

An embodiment of the present invention further provides an electronic apparatus, as shown in fig. 4, including a memory 40 and a processor 42, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for predicting the burst load described in any one of the above items.

It should be noted that the technical solutions of the foregoing embodiments 1 to 3 may be used in combination or may be used alone, and the embodiments of the present invention do not limit this.

The above technical solutions are described below with reference to preferred embodiments, which are not intended to limit the technical solutions of the embodiments of the present invention.

The technical solutions of preferred embodiments 1 and 2 are mainly applied to the architecture shown in FIG. 5. FIG. 5 takes a Long Term Evolution (LTE) system as an example, in which OMC denotes the network manager, eNB (evolved Node B) denotes a base station, and UE denotes user equipment (a mobile phone, a tablet, etc.). The architecture further includes:

An edge computing center (ECU), which is connected to all eNBs and may exist independently or be hosted on an existing eNB board. It is responsible for real-time computation in the venue, such as online inference, real-time counter collection, and recognition of auxiliary audio and video information.

An external data source unit (ESU), such as cameras, drones, and temperature and humidity sensors. If available, such external data may improve the performance of venue assurance algorithms. The ESU is optional in the present invention and does not affect the core functionality.

Preferred embodiment 1

Fig. 6 is an overall flowchart according to a preferred embodiment of the present invention, as shown in fig. 6, including the following steps:

Step S402, collect and integrate historical data. The granularity of real-time data (from eNB counters) is 5 to 15 seconds; in the preferred embodiment of the present invention, it is 10 seconds.

Data or data sets containing load and capacity metrics and the associated key system performance metrics are collected for generating evaluation criteria. Data sources include network management performance statistics, alarm data, measurement and counter data reported by base stations in real time, and unstructured data (log files, on-site monitoring images and videos, etc.). In the present invention, the load/capacity data, network key performance indicators (KPIs), key quality of service indicators (KQIs), and user behavior indicators (UBIs) finally used for quantitative processing are structured numerical data, so load/capacity-related fields need to be extracted from the unstructured data and converted into a structured form. The sub-steps involved (in no required order) are:

1) Collect historical foreground data on the wireless side.

2) Collect non-communication-system data, such as surveillance video of the stands during a match (used to identify in real time how densely the audience is operating mobile phones).

Step S404, check data health and reduce the data.

Operations such as inspection, exception removal, completion, conversion, merging, splitting, and derivation are performed on the raw historical data fields to generate "Feature" fields suitable for a quantitative mining and evaluation system. This step covers both system-internal data and ESU data, which can be processed in parallel with no ordering between them. When this step is complete, the preprocessed venue-wide cell data set {CellDataOrigin} is output.
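The exception-removal and completion operations of step S404 can be illustrated with a minimal sketch. It assumes each counter arrives as a list of samples at the 10-second granularity, with missing samples as `None`; the function name `clean_counter`, the plausibility range `lo`/`hi`, and the maximum gap length `max_gap` are hypothetical and are not specified by this patent. Out-of-range readings are treated as exceptions and removed, and short interior gaps are completed by linear interpolation.

```python
from typing import List, Optional


def clean_counter(samples: List[Optional[float]], lo: float, hi: float,
                  max_gap: int = 2) -> List[Optional[float]]:
    """Hypothetical health check for one counter series.

    Values outside [lo, hi] are treated as missing (exception removal).
    Interior gaps of at most max_gap samples are filled by linear
    interpolation (completion); longer gaps and gaps at either end stay None.
    """
    vals = [v if v is not None and lo <= v <= hi else None for v in samples]
    out = list(vals)
    n = len(vals)
    i = 0
    while i < n:
        if vals[i] is not None:
            i += 1
            continue
        start = i
        while i < n and vals[i] is None:
            i += 1
        gap = i - start  # i is now the first valid index after the gap, or n
        if start > 0 and i < n and gap <= max_gap:
            left, right = vals[start - 1], vals[i]
            for k in range(gap):
                out[start + k] = left + (right - left) * (k + 1) / (gap + 1)
    return out


print(clean_counter([10, None, 30, -5, 50], lo=0, hi=100))
# [10, 20.0, 30, 40.0, 50]
```

Leaving long gaps unfilled is a deliberate choice in this sketch: interpolating across minutes of missing data could invent a load surge or hide one.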

The embodiment of the present invention mainly uses the following system-internal data: number of user connections, physical resource block (PRB) utilization, traffic bytes, number of handovers between neighboring cell pairs, wireless-side transmission delay, MCS level indication, and the KPIs of greatest concern to core customers.

External data (optional) is used together with internal data to improve the accuracy of determining burst load or congestion. Audio and video are mainly used to extract the moments when the audience is excited and when mobile phones are used at high density.

Step S406, adaptively identify congestion scenarios and determine congestion indication indexes.

The input is the region-wide cell data set {CellDataOrigin} output in step S404. The core idea is as follows: when a large amount of user connection or data service demand appears suddenly, the limited admission and handover capacity of the system equipment causes certain network KPIs to deteriorate after some delay. This forms a causal relationship in which the independent variables are load-class indexes (such as the number of connections, traffic bytes, and hardware load rate) and the dependent variables are certain system or network KPIs. In this step, sequence comparison detection is used to determine the corresponding dependent variables given the known independent variables, and the confidence range of the delay time is determined from the information contained in the historical data.

Congestion manifests, for example, as users being unable to access the network or to transmit/receive data, or transmitting and receiving at a low rate.

Identify congestion features. Load surges that degrade customer experience can be regarded as burst loads. Wireless-side data is used from events for which customer complaints exist or which operation and maintenance personnel have explicitly marked as having congestion periods, or from data whose load index mean and standard deviation over the whole event rank in the top 20%. A comparison detection method is applied as follows.

If the eNBs covering the venue (corresponding to the designated area in the above embodiments) run versions whose admission control and load balancing algorithms differ significantly in their thresholds, the data must be processed separately by category.

Take the load indexes as the independent variable group and normalize and discretize them: records above the 80th percentile are discretized to 1, and the rest to 0.

Take the delay and signal-to-interference-plus-noise ratio (SINR) indexes as the dependent variable group, compute their mean and standard deviation, and discretize records higher than (mean + 3 × standard deviation) to 1 and the rest to 0.

Align the time axes and compare the independent variable group with the dependent variable group. Count the number of times that the dependent variable responds with a delay of 1 to 2 time granularities relative to the independent variable and the response lasts for more than 2 granularities.

Select the two dependent variable indexes with the most comparison detections as the wireless-side congestion indication indexes. Then select the 2 independent variable load indexes whose differences correlate most strongly with the congestion indication indexes as the congestion load indexes.
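The discretization and comparison detection described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the 80th percentile is taken from Python's `statistics.quantiles` (default "exclusive" method), the standard deviation is the population standard deviation, a comparison hit is counted once per rising edge of the discretized load, and "lasts for more than 2 granularities" is interpreted as at least `min_run = 3` consecutive 1s. All function names are hypothetical, and the difference-correlation step that picks the 2 congestion load indexes is not shown.

```python
import statistics
from typing import Dict, List, Sequence


def binarize_load(xs: List[float]) -> List[int]:
    """Independent variable: 1 for records above the 80th percentile, else 0."""
    cut = statistics.quantiles(xs, n=5)[-1]  # last of 4 cut points = 80th percentile
    return [1 if x > cut else 0 for x in xs]


def binarize_kpi(xs: List[float], k: float = 3.0) -> List[int]:
    """Dependent variable: 1 for records above mean + k * standard deviation, else 0."""
    cut = statistics.mean(xs) + k * statistics.pstdev(xs)
    return [1 if x > cut else 0 for x in xs]


def count_lagged_hits(ind: List[int], dep: List[int],
                      lags: Sequence[int] = (1, 2), min_run: int = 3) -> int:
    """Count rising edges of `ind` that are followed, after one of `lags`
    granularities, by at least `min_run` consecutive 1s in `dep`."""
    hits = 0
    for t in range(len(ind)):
        if ind[t] == 1 and (t == 0 or ind[t - 1] == 0):  # load becomes high at t
            for lag in lags:
                window = dep[t + lag:t + lag + min_run]
                if len(window) == min_run and all(window):
                    hits += 1
                    break  # count each load rise at most once
    return hits


def select_congestion_indicators(load: List[float], kpis: Dict[str, List[float]],
                                 top_k: int = 2) -> List[str]:
    """Return the top_k KPI names with the most comparison-detection hits."""
    ind = binarize_load(load)
    counts = {name: count_lagged_hits(ind, binarize_kpi(series))
              for name, series in kpis.items()}
    return sorted(counts, key=counts.get, reverse=True)[:top_k]
```

Note that mean + 3 standard deviations can only be exceeded when the series is long enough relative to the number of outliers, so this test is meaningful on full-event histories rather than on a handful of records.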

Compute statistics of the congestion indication indexes over all congested events, and define the congestion identification threshold TH_CONGESTION as (mean + 2 × standard deviation).

Determine the congestion scenarios across the entire historical data. If the load on the cells serving a venue during an event is low (e.g., there are few spectators), the data needs to be excluded from the training set.

Optionally, an event is judged to be low-load when its Radio Resource Control (RRC) connection count, Physical Resource Block (PRB) utilization, and average byte traffic are all lower than preset thresholds. Different operators may have different requirements in this respect.

Scan all congestion indication indexes against the determined threshold TH_CONGESTION; if the total congestion time of an event is less than 60 seconds, judge the event to be congestion-free and remove it from the training data.

Obtain the historical data of all congested events and save it as CellDataCongestion.
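The threshold and filtering rules above can be sketched as follows, assuming a hypothetical per-event record `EventData` with one congestion-indicator series sampled at 10 seconds and per-event load averages. The record layout, the function names, the low-load limits in `limits`, and the use of a strict ">" when comparing against TH_CONGESTION are assumptions for illustration; the patent leaves the low-load thresholds to each operator.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List

GRANULARITY_S = 10  # record granularity of the preferred embodiment, in seconds


@dataclass
class EventData:
    """Hypothetical per-event record for one venue cell."""
    name: str
    indicator: List[float]  # congestion indication index, one value per record
    avg_rrc_conn: float
    avg_prb_util: float
    avg_bytes: float


def congestion_threshold(congested: List[EventData]) -> float:
    """TH_CONGESTION = mean + 2 * standard deviation over all congested events."""
    values = [v for e in congested for v in e.indicator]
    return statistics.mean(values) + 2 * statistics.pstdev(values)


def is_low_load(e: EventData, limits: Dict[str, float]) -> bool:
    """Low-load event: RRC connections, PRB utilization and bytes all below limits."""
    return (e.avg_rrc_conn < limits["rrc"] and e.avg_prb_util < limits["prb"]
            and e.avg_bytes < limits["bytes"])


def build_congestion_set(events: List[EventData], th: float,
                         limits: Dict[str, float],
                         min_congestion_s: int = 60) -> List[EventData]:
    """Keep events that are not low-load and are congested for >= min_congestion_s."""
    kept = []
    for e in events:
        if is_low_load(e, limits):
            continue
        congested_s = GRANULARITY_S * sum(1 for v in e.indicator if v > th)
        if congested_s >= min_congestion_s:
            kept.append(e)
    return kept
```

The events returned by `build_congestion_set` correspond to what the text saves as CellDataCongestion.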

Step S408, adaptively label the historical data. The data set used is CellDataCongestion. This step is performed separately for each group of venues whose base station software and hardware versions are basically consistent.

Scan and label the congestion indication indexes of all cells in the data set according to TH_CONGESTION.

Record the congestion periods marked in the sub-step "collect historical foreground data on the wireless side", and scan the first-order difference sequence of the congestion load index over the granularities of the search interval. A load index can be judged to be a burst load only when it shows an ascending trend. For the meaning of the search interval, refer to FIG. 8 and the description of the delay time in step S406. The search range is determined from statistics of the historical data; for example, the delay falls within 5 to 35 seconds with 95% probability.

Based on the collected non-communication-system data, determine the time TA at which the congestion load index starts to rise.
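The search-interval scan and the determination of TA can be sketched as follows. This sketch uses only the load index series (the patent also draws on non-communication-system data for TA) and makes several assumptions: the 5 to 35 second delay range at 10-second granularity is mapped to lags of 1 to 3 records, an "ascending trend" means a positive first-order difference, and TA is taken as the first record of the contiguous rise that reaches into the search window. The function name and parameters are hypothetical.

```python
from typing import List, Optional


def find_rise_start(load: List[float], onset: int,
                    min_lag: int = 1, max_lag: int = 3) -> Optional[int]:
    """Return TA, the index where the rise of the load index begins, or None
    when the load index is not rising inside the search window.

    `onset` is the index of the first record of a labeled congestion period.
    """
    # First-order difference; diff[i] = load[i] - load[i - 1], diff[0] = 0.
    diff = [0.0] + [load[i] - load[i - 1] for i in range(1, len(load))]
    lo, hi = max(1, onset - max_lag), onset - min_lag
    rising = [i for i in range(lo, hi + 1) if diff[i] > 0]
    if not rising:
        return None  # no ascending trend: not labeled as a burst load
    ta = rising[0]
    while ta - 1 >= 1 and diff[ta - 1] > 0:  # extend back to the start of the rise
        ta -= 1
    return ta


print(find_rise_start([5, 5, 5, 6, 8, 11, 11, 11, 11], onset=6))  # 3
```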

Segment and organize the congestion periods according to the scenario, the service meaning, and the statistical characteristics of the historical data. For example, in FIG. 7, each box represents one time granularity, and the dark boxes marked "C" represent congestion periods. When two congestion segments are separated by 1 time granularity, that granularity is also marked as congestion, that is, two congestion segments separated by only one time granularity are treated as the same segment; when two congestion segments are separated by 2 or more time granularities, they are treated as independent congestion events. This sub-step makes the subsequent data labeling stage clearer and raises the congestion identification rate.
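The merging rule above can be expressed compactly. The following sketch assumes the congestion labels are available as one 0/1 flag per time granularity; the function name `congestion_segments` is hypothetical. It returns inclusive (start, end) index pairs and bridges exactly one non-congested granularity, as in the FIG. 7 example.

```python
from typing import List, Tuple


def congestion_segments(flags: List[int]) -> List[Tuple[int, int]]:
    """Turn per-granularity congestion flags into inclusive (start, end)
    segments, merging segments separated by a single non-congested granularity."""
    segs: List[Tuple[int, int]] = []
    for i, f in enumerate(flags):
        if not f:
            continue
        # i - end == 1: contiguous; i - end == 2: exactly one-granularity gap
        if segs and i - segs[-1][1] <= 2:
            segs[-1] = (segs[-1][0], i)
        else:
            segs.append((i, i))  # gap of 2 or more granularities: new segment
    return segs


print(congestion_segments([1, 1, 0, 1, 0, 0, 1, 1]))  # [(0, 3), (6, 7)]
```

The bridged granularity (index 2 in the example) falls inside the merged segment and is therefore treated as congestion, matching the text.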

Step S410, adaptively search parameter combinations for the optimal model parameters.

The optimal boundary conditions for model training are determined by adaptive search, and the optimal combination of boundaries is used to approach the upper limit of the congestion prediction recognition rate implied by the data set. The basic principle is that this upper limit is determined by the information contained in the data set: congestion recognition rate = model function (input feature matrix), subject to the boundary conditions.

In this step, a simple approximation method is used to find the approximate upper limit of the congestion identification rate and the corresponding optimal parameter set {boundary condition 1, boundary condition 2, …, boundary condition n}.

As shown in FIG. 8, the burst load prediction model has four boundary conditions: the number of input data records (the time-dimension boundary), the advance amount of the burst load, the burst load width, and the burst load level.

By performing an exhaustive search or an optimization over these 4 boundary parameters, the boundary parameter combination best suited to the current service is found (that is, the combination giving the best burst load or congestion identification, equivalently the highest congestion prediction accuracy).
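The exhaustive variant of this search can be sketched as a plain grid search over the four boundary parameters. The `evaluate` callable is a hypothetical placeholder: in practice it would label the data, train a model under the given boundary combination, and return its congestion identification rate on validation data. The parameter names in `BOUNDARIES` are likewise illustrative, not taken from the patent.

```python
import itertools
from typing import Callable, Dict, Optional, Sequence, Tuple

BOUNDARIES = ("input_len", "advance", "width", "level")


def search_boundaries(grid: Dict[str, Sequence[int]],
                      evaluate: Callable[..., float]
                      ) -> Tuple[Optional[Dict[str, int]], float]:
    """Exhaustively try every combination of the four boundary parameters and
    return the one with the highest identification rate from `evaluate`."""
    best_score = float("-inf")
    best: Optional[Dict[str, int]] = None
    for combo in itertools.product(*(grid[name] for name in BOUNDARIES)):
        params = dict(zip(BOUNDARIES, combo))
        score = evaluate(**params)  # e.g. validation identification rate
        if score > best_score:
            best_score, best = score, params
    return best, best_score
```

The cost grows with the product of the grid sizes, which is why the text also allows an optimization method in place of exhaustive search when the grids are large.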

Step S412, train the real-time congestion recognition model.

The input data is the labeled data obtained in step S408, and the boundary parameters are the optimal parameter combination obtained in step S410. Without steps S406 to S410, this step would face a complex network with many parameters, a huge computational load, and difficulty in finding the optimal or a near-optimal congestion identification rate. Steps S406 to S410 are equivalent to separating a relatively independent hyperparameter subspace from the data space (steps S406 to S408) and optimizing it separately (step S410). This step therefore only faces a relatively simple data space, which makes solving and model training convenient; for example, in preferred embodiment 1, the problem faced by this step has been reduced to a simple model that can be solved by a support vector machine (SVM).

Optionally, the data is split into a training set and a validation set by cross-validation, and model evaluation and optimal model selection are performed. The (preferred) criterion for the burst load prediction model is minimal structural risk; correspondingly, model tuning mainly considers the convergence conditions and the penalty parameters, so as to improve generalization.

Through model evaluation, the burst load prediction model best suited to the current region and service is selected from the candidate models as the application model for real-time computation.
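The text states only that the reduced problem can be solved with an SVM and that cross-validation, convergence conditions and penalty parameters guide model selection. As one possible sketch, assuming a linear kernel, labels +1 (burst load ahead) and -1 (no burst load), and feature vectors already built according to the boundary parameters, the following trains a soft-margin linear SVM by full-batch subgradient descent on the primal hinge-loss objective and picks the penalty parameter C by k-fold cross-validation. The function names, learning rate, and epoch count are hypothetical; a production system would more likely use a dedicated SVM solver.

```python
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights w, bias b)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * c for a, c in zip(u, v))


def objective(w: List[float], b: float, X: List[List[float]], y: List[int],
              C: float) -> float:
    """Soft-margin primal: 0.5 * ||w||^2 + C * sum of hinge losses."""
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return 0.5 * dot(w, w) + C * hinge


def train_linear_svm(X: List[List[float]], y: List[int], C: float = 1.0,
                     lr: float = 0.002, epochs: int = 2000) -> Model:
    """Full-batch subgradient descent on the primal; keeps the best iterate."""
    w, b = [0.0] * len(X[0]), 0.0
    best = (objective(w, b, X, y, C), list(w), b)
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # gradient of the 0.5 * ||w||^2 term
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # margin violated: add hinge subgradient
                gw = [g - C * yi * x for g, x in zip(gw, xi)]
                gb -= C * yi
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
        f = objective(w, b, X, y, C)
        if f < best[0]:
            best = (f, list(w), b)
    return best[1], best[2]


def predict(model: Model, x: Sequence[float]) -> int:
    w, b = model
    return 1 if dot(w, x) + b >= 0 else -1


def select_penalty(X: List[List[float]], y: List[int],
                   candidates: Sequence[float], k: int = 3) -> Tuple[float, float]:
    """k-fold cross-validation over the penalty parameter C."""
    best_C, best_acc = candidates[0], -1.0
    for C in candidates:
        accs = []
        for fold in range(k):
            tr = [i for i in range(len(X)) if i % k != fold]
            va = [i for i in range(len(X)) if i % k == fold]
            model = train_linear_svm([X[i] for i in tr], [y[i] for i in tr], C)
            accs.append(sum(predict(model, X[i]) == y[i] for i in va) / len(va))
        acc = sum(accs) / k
        if acc > best_acc:
            best_C, best_acc = C, acc
    return best_C, best_acc
```

Keeping the best iterate rather than the last one is the usual safeguard for subgradient methods, whose iterates oscillate around the optimum when a fixed step size is used.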

Step S414: publish the real-time congestion identification model.

Model training and online application of the model may run on the same computing node or on different nodes. Training is therefore logically separated from application: the model trained in step S412 is saved in a generic or proprietary format and published/transferred to the online application node (ECU).
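Because training and application are logically separated, the trained model only needs to travel as data. The sketch below assumes, purely for illustration, a linear one-vs-rest classifier stored as JSON (one weight vector and bias per burst level); the `linear-ovr-v1` format tag and both function names are hypothetical, and a real deployment would use whatever generic or proprietary format the ECU supports.

```python
import json
from typing import Dict, List


def export_model(weights: Dict[int, List[float]], bias: Dict[int, float], meta: dict) -> str:
    """Serialize a linear one-vs-rest model on the training node."""
    return json.dumps(
        {
            "format": "linear-ovr-v1",  # hypothetical format tag
            "weights": {str(c): w for c, w in weights.items()},
            "bias": {str(c): b for c, b in bias.items()},
            "meta": meta,  # e.g. the boundary parameters used in training
        },
        sort_keys=True,
    )


def load_and_predict(blob: str, x: List[float]) -> int:
    """On the online node: load the model and return the highest-scoring class."""
    model = json.loads(blob)
    scores = {
        int(c): sum(wi * xi for wi, xi in zip(w, x)) + model["bias"][c]
        for c, w in model["weights"].items()
    }
    return max(scores, key=scores.get)
```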

Step S416: apply the real-time congestion identification model online.

The model obtained in step 7 is put into practical online use and runs in the ECU.

Step S418: monitor and evaluate the online performance of real-time congestion identification.

The false-alarm rate, the missed-detection rate, and similar metrics of the online model with respect to burst load are monitored. At the same time, other on-site indicators are taken into account to judge whether deviations are caused by the model itself, by an abrupt change in user behavior patterns, or by other causes. Burst load prediction usually runs alongside other safeguard measures and serves as a reference for dynamic network parameter configuration.
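The two online error rates can be computed from the logged predictions once the true burst levels are known. This sketch assumes that any level above 0 counts as a burst (so predicting level 1 for an actual level 2 is a hit), and it takes the false-alarm rate as FP/(FP+TN) and the missed-detection rate as FN/(FN+TP); the patent does not fix these definitions, so they are assumptions, as is the function name.

```python
from typing import Sequence, Tuple


def online_error_rates(predicted: Sequence[int], actual: Sequence[int]) -> Tuple[float, float]:
    """Return (false-alarm rate, missed-detection rate); level > 0 means burst."""
    tp = fp = fn = tn = 0
    for p, a in zip(predicted, actual):
        if p > 0 and a > 0:
            tp += 1
        elif p > 0:
            fp += 1  # burst predicted, none occurred
        elif a > 0:
            fn += 1  # burst occurred, not predicted
        else:
            tn += 1
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    missed = fn / (fn + tp) if fn + tp else 0.0
    return false_alarm, missed
```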

Step S420: maintain, shut down, or recompute.

Based on the evaluation result of step S418, the current prediction and pre-optimization strategy is either maintained (if it is effective) or shut down (if it fails). Because burst load prediction is a real-time strategy, the model must be retrained and re-evaluated regularly with the latest data.
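The maintain/shut-down/retrain decision can be expressed as a small rule. The thresholds and the 30-day retraining period below are hypothetical placeholders, not values from the patent; in practice they would come from the evaluation in step S418 and from operator policy.

```python
from datetime import datetime, timedelta


def decide_action(
    false_alarm: float,
    missed: float,
    last_trained: datetime,
    now: datetime,
    max_false_alarm: float = 0.2,                   # hypothetical threshold
    max_missed: float = 0.4,                        # hypothetical threshold
    retrain_every: timedelta = timedelta(days=30),  # hypothetical period
) -> str:
    """Return "shutdown", "retrain" or "maintain" for the current strategy."""
    if false_alarm > max_false_alarm or missed > max_missed:
        return "shutdown"  # strategy judged to have failed
    if now - last_trained >= retrain_every:
        return "retrain"   # still effective, but due for fresh data
    return "maintain"
```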

During events such as sports matches, concerts, and large gatherings, cell load can spike suddenly, lowering user rates or even dropping connections. To guarantee service quality, one solution is to dynamically adjust the parameters of the executing node based on burst load prediction, and to take measures that avoid or mitigate the incoming load.

Preferred embodiment 2

Preferred embodiment 2 can be understood as a more detailed technical solution of preferred embodiment 1.

Step 1: collect historical real-time data for the region.

The network unified operation and maintenance management center issues an acquisition task to the policy centralized control node, which collects the data reported by the eNBs at the specified times and sends them to the network unified operation and maintenance management center in the agreed manner. The data include load indices, cell quality-of-service evaluation indices, and base station hardware resource occupancy, at a granularity of 5 to 20 seconds. The time granularity in this example is 10 seconds.
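Because reports may arrive at any granularity between 5 and 20 seconds, a common step is to bucket them onto the 10-second grid used in this example. The sketch below averages raw (timestamp, value) samples per bucket; averaging is an assumption (a maximum may suit peak-oriented indices such as the maximum RRC connection count better), and `to_granularity` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def to_granularity(records: Iterable[Tuple[float, float]], step_s: int = 10) -> List[Tuple[int, float]]:
    """Average (timestamp_s, value) samples into step_s-second buckets, sorted by time."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in records:
        bucket_start = int(ts // step_s) * step_s
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]
```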

Step 2: check data health and reduce the data. This step is performed automatically according to rules (possibly in conjunction with an expert interface).

This step is performed in the network unified operation and maintenance management center. Because events such as temporary failures of related modules, congestion on data transmission links, communication failures, and decoding errors may occur, the collected cell-level data may have a number of health problems, which must be checked and preprocessed according to the rule base. Features for subsequent computation are then generated from the raw data fields according to the feature generation rules.
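The rule base itself is not disclosed; the sketch below illustrates two plausible rules under that caveat. Values that are missing, NaN, or outside a physically valid range are treated as invalid, and gaps of at most `max_fill` records are forward-filled while longer gaps stay missing. `clean_series` and its parameters are hypothetical.

```python
import math
from typing import List, Optional


def clean_series(
    values: List[Optional[float]], lo: float, hi: float, max_fill: int = 1
) -> List[Optional[float]]:
    """Mark invalid samples as None, then forward-fill gaps of up to max_fill records."""
    valid = [
        v if v is not None and not math.isnan(v) and lo <= v <= hi else None
        for v in values
    ]
    out: List[Optional[float]] = []
    last: Optional[float] = None
    gap = 0
    for v in valid:
        if v is None:
            gap += 1
            # Fill only short gaps, and only if a previous valid value exists.
            out.append(last if last is not None and gap <= max_fill else None)
        else:
            gap = 0
            last = v
            out.append(v)
    return out
```

For a utilization index such as downlink PRB utilization, `lo=0.0` and `hi=1.0` would be natural bounds.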

Step 3: explore the principles for defining burst load.

In this embodiment, this step is performed in the network unified operation and maintenance management center. The business objective is defined as resolving or relieving severe stalling of user data services. Many data fields represent node load, and stalling is also related to multiple network KPIs/KQIs. It is therefore necessary to find the load indices and network KPIs/KQIs most closely related to the business objective.

Drawing on the experience of field maintenance and optimization staff, equivalent evaluation indices that best reflect stalling or slow downloads are generated; the time series of the load indices are aligned, and a correlation algorithm (preferred) is used to check whether sudden increases in each load index are consistent with changes in the equivalent evaluation indices. For example, if the trends of the number of connections, the PRB utilization, and the traffic byte count align in start and stop times with the trend of the evaluation indices, all of them can be classified as independent variables.

After load indices with insignificant business relevance are removed, collinearity analysis is performed on the remaining load indices, for example with hierarchical clustering (preferred) and correlation coefficients. Load indices whose business relevance is only moderate and that are collinear with load indices of strong business relevance are removed. In this embodiment, the load-class indices include the total number of flows, the number of data bytes, the maximum number of RRC connections, the average number of RRC connections, board CPU occupancy, board memory occupancy, PRB occupancy, and so on. After final analysis and comparison, only the maximum number of RRC connections and the downlink PRB utilization are used as independent variables, i.e., only two principal components are needed.
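The correlation-coefficient part of the collinearity analysis can be sketched as a greedy filter: indices are visited in decreasing order of business relevance, and an index is kept only if its absolute Pearson correlation with every already-kept index stays below a threshold. The 0.9 threshold, the toy series, and the function names are assumptions; hierarchical clustering, which the text also mentions, is not shown.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def prune_collinear(
    series: Dict[str, Sequence[float]], ranked: List[str], threshold: float = 0.9
) -> List[str]:
    """Keep indices (most business-relevant first) that are not collinear with kept ones."""
    kept: List[str] = []
    for name in ranked:
        if all(abs(pearson(series[name], series[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

On toy data where the byte count rises in lockstep with downlink PRB utilization, `prune_collinear` drops the byte count and keeps PRB utilization and the RRC connection count, mirroring the kind of outcome described above.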

The consistency of changes in the equivalent evaluation indices is checked during sudden increases or sustained peaks of the main load indices. As shown in Fig. 9, the deterioration index of the equivalent stalling/low-download-rate measure is used as the dependent variable, and its rising peak must lag the rising peak of the load-class independent variable.
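One way to check that the dependent variable's rise lags the load-class independent variable is to find the shift at which the two series correlate best. This is a sketch under assumptions: lags are counted in records (10 seconds each here), only non-negative lags are searched, each series must be at least `max_lag + 2` records long, and `pearson`/`best_lag` are hypothetical helpers. A best lag greater than 0 is consistent with the required delay.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag(indep: Sequence[float], dep: Sequence[float], max_lag: int) -> int:
    """Lag (in records, 0..max_lag) at which dep best matches indep shifted later."""
    best, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        x = indep[: len(indep) - lag]  # load index, earlier samples
        y = dep[lag:]                  # evaluation index, shifted by `lag`
        r = pearson(x, y)
        if r > best_r:
            best, best_r = lag, r
    return best
```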

In this embodiment, for the area requested by the operator, the load indices finally selected through data analysis to define burst load are: downlink PRB utilization as the primary index, and the number of RRC connections as the secondary index (considered only when the number of RRC connections is high). The corresponding cell equivalent evaluation indices are the average normalized delay of downlink PDCP packets and the proportion of downlink QPSK coding.

Step 4: automatically label burst loads in the historical data.

This step is performed in the network unified operation and maintenance management center. Based on step 3, data that can be confirmed as "burst load" causing cell users to experience stalling are labeled and divided into burst level 1 and burst level 2. In this embodiment, the thresholds in Fig. 10 are determined step by step using GBDT (preferred), in combination with the existing thresholds, the existing safeguard policy flow, and manual experience. In Fig. 10, DDR denotes the relative average delay ratio of downlink PDCP, DPR denotes the downlink PRB utilization, and RUR denotes the RRC-connected user ratio. DDR is the dependent variable indicating congestion, and DPR and RUR are the load-class independent variables.
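Once the thresholds are fixed, applying them is a simple rule. The sketch below uses placeholder thresholds (the actual values in Fig. 10 are not given in the text) and one plausible reading of the description: a record can only be labeled as a burst when the load side is high (DPR high, or RUR high as the secondary index), and the DDR level then separates burst level 1 from level 2. It only applies fixed thresholds and does not reproduce the GBDT step that derives them; all names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values for illustration only; NOT the Fig. 10 thresholds.
    ddr_level1: float = 1.5  # relative DL PDCP delay ratio for level 1
    ddr_level2: float = 2.5  # relative DL PDCP delay ratio for level 2
    dpr: float = 0.7         # DL PRB utilization (primary load index)
    rur: float = 0.8         # RRC-connected user ratio (secondary load index)


def label_burst(ddr: float, dpr: float, rur: float, th: Thresholds = Thresholds()) -> int:
    """Return burst level 0, 1 or 2 for one record."""
    load_high = dpr >= th.dpr or rur >= th.rur
    if not load_high:
        return 0  # congestion without high load is not labeled as burst load here
    if ddr >= th.ddr_level2:
        return 2
    if ddr >= th.ddr_level1:
        return 1
    return 0
```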

Some of the data ultimately could not be discriminated, namely those from cells with too few users, cells with an unreasonable CA strategy, and cells whose eNB board is overloaded.

And 5: the optimal model parameters are combined with the adaptive search.

In this embodiment, this step is performed in the network unified operation and maintenance management center. The effectiveness of burst load prediction is determined by the combination of the following time windows:

the number of input data records (time length);

how long before the labeled burst load the prediction is made (advance);

how long the labeled burst load lasts (duration);

the burst load level, which has already been specified in step 4 in accordance with the business specification and is therefore not used as a parameter here.

On historical data, the effect of each parameter combination is evaluated by the classification precision of a multi-class SVM algorithm (preferred); decision trees and neural networks are alternatives. The data input to the classifier have an acquisition granularity of 10 seconds, and the search ranges are: input time span 10 to 100 seconds; advance 10 to 60 seconds; burst load duration 10 to 200 seconds, each searched with the acquisition granularity of the input data as the step size (10 seconds in this embodiment); and relative burst load level 30% to 60%.
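With a 10-second step, the search ranges above define a finite grid of 10 input spans, 6 advances, and 20 durations, i.e. 1200 combinations. The sketch below enumerates that grid exhaustively and keeps the best-scoring combination. `evaluate` stands in for the real step (building samples with the given windows, training the multi-class SVM, and returning its precision on historical data) and is hypothetical, as is the toy scorer in the usage note.

```python
import itertools
from typing import Callable, Optional, Tuple

Combo = Tuple[int, int, int]  # (input span s, advance s, burst duration s)


def search_windows(
    evaluate: Callable[[int, int, int], float], step_s: int = 10
) -> Tuple[Optional[Combo], float, int]:
    """Exhaustively score every window combination; return (best, score, count)."""
    spans = range(10, 100 + 1, step_s)      # input time span, seconds
    advances = range(10, 60 + 1, step_s)    # prediction advance, seconds
    durations = range(10, 200 + 1, step_s)  # burst load duration, seconds
    best: Optional[Combo] = None
    best_score = float("-inf")
    count = 0
    for combo in itertools.product(spans, advances, durations):
        count += 1
        score = evaluate(*combo)
        if score > best_score:
            best, best_score = combo, score
    return best, best_score, count
```

For example, a toy scorer `lambda s, a, d: -((s - 40) ** 2 + (a - 10) ** 2 + (d - 60) ** 2)` makes `search_windows` return `((40, 10, 60), 0, 1200)`. Exhaustive search is affordable at this size; a coarse-to-fine or Bayesian search would only pay off for much larger grids.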

For the current system equipment and the current area, the optimal time window combination obtained is:

{input time span 40 seconds, advance 10 seconds, burst duration about 60 seconds}.

Step 6: train the burst load prediction model offline.

Data from historical match periods at a certain stadium are selected, and the model is trained offline using the parameter combination obtained in step 5 and the structural risk minimization principle. A 3-class Support Vector Classifier (SVC) is chosen (preferred), mainly because of the limited computing capability of the base station boards in the current network: the online model must consume few resources. Using data from the most recent month, the input data are the load and equivalent evaluation index data of the cell and its main neighboring cells (from step 3).

The resulting model has a precision of about 0.8, a recall of about 0.7, and an F1-measure of about 0.75.
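As a consistency check on these figures, the F1-measure is the harmonic mean of precision P and recall R:

```latex
F_1 = \frac{2PR}{P+R} = \frac{2 \times 0.8 \times 0.7}{0.8 + 0.7} = \frac{1.12}{1.5} \approx 0.747
```

which agrees with the reported value of about 0.75.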

Step 7: publish the model.

Before the next match at the stadium begins, the network unified operation and maintenance management center publishes the model trained in step 6 to the policy centralized control node.

Step 8: perform burst load prediction and pre-optimization online.

In this embodiment, both the real-time operation of the online model and the corresponding adjustment policy decision when a burst load is predicted take place in the policy centralized control node, so as to avoid consuming the computing power of the eNB. The real-time adjustment policy decided by the policy centralized control node is immediately sent to the eNB for execution, and the details of burst load false alarms and missed detections are monitored and recorded.
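The runtime loop on the policy centralized control node can be sketched as a fixed-length buffer of the most recent records. Once the buffer holds as many records as the input length, each new record triggers one prediction; a predicted burst triggers the adjustment callback (standing in for sending the policy to the eNB); and every prediction is logged so that false alarms and missed detections can be evaluated later. `OnlinePredictor`, the callbacks, and the toy model in the test are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple


class OnlinePredictor:
    """Hypothetical runtime wrapper around a trained burst load model."""

    def __init__(
        self,
        model: Callable[[List[float]], int],
        n_input: int,
        on_burst: Callable[[int, int], None],
    ) -> None:
        self.model = model
        self.window: Deque[Sequence[float]] = deque(maxlen=n_input)
        self.on_burst = on_burst
        self.log: List[Tuple[int, int]] = []  # (record index, predicted level)
        self.t = 0

    def push(self, record: Sequence[float]) -> int:
        """Add one record; predict once the window is full; return the predicted level."""
        self.window.append(record)  # oldest record drops out automatically
        level = 0
        if len(self.window) == self.window.maxlen:
            flat = [v for row in self.window for v in row]
            level = self.model(flat)
            if level > 0:
                self.on_burst(self.t, level)  # e.g. push an adjustment policy
        self.log.append((self.t, level))
        self.t += 1
        return level
```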

In this embodiment, the preferred countermeasures are centralized load balancing and automatic adjustment of high-traffic parameters.

Step 9: evaluation and follow-up processing.

In this embodiment, the evaluation criteria are regional spectral efficiency, average delay, and user complaint rate (an operator index). The effectiveness of burst load prediction and of the subsequent safeguard measures is judged comprehensively against these criteria, and the hyperparameters of model training are tuned accordingly.

In summary, the technical solution of the invention has the following beneficial effects: the system architecture can be deployed flexibly, consumes few resources, and takes into account the hardware capabilities of both existing systems and next-generation networks; intermediate results obtained while exploring data and business rules can support other services under the same system architecture; and the method can serve as an overall framework for intelligent operation and maintenance, with this scheme as one implementation subset. The scheme of the invention can coexist with other intelligent methods, share the architecture with them, and be optimized jointly.

Example 3

An embodiment of the present invention further provides a storage medium containing a stored program, where the program, when run, performs any one of the methods described above.

Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:

S1: acquire load index data of a designated area, where the load index data represent the load condition of the designated area;

S2: analyze the load index data using a burst load model to predict the occurrence of burst load, where the burst load model is trained by machine learning on multiple groups of data.

Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

Optionally, for specific examples in this embodiment, refer to the examples described in the above embodiments and optional implementations; details are not repeated here.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device. In some cases, the steps shown or described may be performed in an order different from the one given here, or they may be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for predicting a burst load, comprising:

acquiring load index data of a designated area, wherein the load index data is used for representing the load condition of the designated area;

acquiring historical data collected at a base station; preprocessing the historical data to obtain a regional all-cell data set, wherein the regional all-cell data set comprises independent variable data and dependent variable data; selecting a congestion data set from the regional all-cell data set according to a specified rule; labeling the congestion data set by means of a gradient boosting decision tree; and determining a boundary condition combination for model training by adaptive search, to obtain a plurality of burst load models;

selecting, from the plurality of burst load models, a burst load model suitable for the designated area, wherein the burst load model comprises at least one of the following: input data length, burst load advance, burst load width, and burst load level;

and analyzing the load index data by using the burst load model to predict the occurrence of a burst load, wherein the burst load model is trained by machine learning on multiple groups of data.

2. The method of claim 1, wherein the historical data comprise at least one of the following: load index data, network key performance indicator (KPI) data, key quality indicator (KQI) data, and user behavior indication data.

3. The method of claim 2, wherein the regional all-cell data set is obtained at least by:

acquiring independent variable data and dependent variable data satisfying preset conditions, wherein the independent variable data comprise load-class data, and the dependent variable data and the independent variable data have a specified functional relationship.

4. The method of claim 3, wherein selecting a congestion data set from the regional all-cell data set according to a specified rule comprises:

analyzing the dependent variable data and the independent variable data according to a comparison detection method;

taking, as congestion indication index data, dependent variable data whose number of comparison detections satisfies a preset number;

and taking, as the congestion data set, n items of independent variable index data whose degree of correlation with the congestion indication index data satisfies a preset value, wherein n is a positive integer.

5. An apparatus for predicting a burst load, comprising:

an acquisition module, configured to acquire load index data of a designated area, wherein the load index data represent the load condition of the designated area;

the acquisition module is further configured to acquire historical data collected at a base station; preprocess the historical data to obtain a regional all-cell data set, wherein the regional all-cell data set comprises independent variable data and dependent variable data; select a congestion data set from the regional all-cell data set according to a specified rule; label the congestion data set by means of a gradient boosting decision tree; and determine a boundary condition combination for model training by adaptive search, to obtain a plurality of burst load models;

a selection module, configured to select, from the plurality of burst load models, a burst load model suitable for the designated area, wherein the burst load model comprises at least one of the following: input data length, burst load advance, burst load width, and burst load level;

and a prediction module, configured to analyze the load index data by using the burst load model to predict the occurrence of a burst load, wherein the burst load model is trained by machine learning on multiple groups of data.

6. A storage medium, wherein a computer program is stored in the storage medium, and the computer program, when executed by a processor, performs the method of any one of claims 1 to 4.

7. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 4.