CN116887300A

CN116887300A - Method for predicting complaint quantity and service capacity of mobile phone internet surfing user

Info

Publication number: CN116887300A
Application number: CN202310835241.7A
Authority: CN
Inventors: 艾小惠; 王炳亮; 刘康; 张京辉; 冉明昊
Original assignee: Inspur Communication Information System Co Ltd
Current assignee: Inspur Communication Information System Co Ltd
Priority date: 2023-07-10
Filing date: 2023-07-10
Publication date: 2023-10-13

Abstract

The invention relates to the technical field of service wireless resources, and in particular to a method for predicting the complaint quantity and service capacity of mobile phone internet surfing users. The method comprises the following steps: collecting historical index data relating to mobile phone internet surfing users and equipment; preprocessing the data; establishing an ARIMA+LSTM prediction model; predicting future data with the ARIMA+LSTM prediction model; and configuring a threshold value for early warning based on the prediction result. The beneficial effects are as follows: based on historical data, the method uses the ARIMA+LSTM algorithm to establish a prediction model that predicts the future trend of the mobile phone internet surfing performance indexes and of the user complaint quantity index, so that measures can be taken in advance in response to the predicted trends in traffic and complaint quantity. This avoids the poor internet surfing experience caused by excessive traffic and complaint quantity, and can thereby greatly improve users' satisfaction with mobile phone internet surfing.

Description

Method for predicting complaint quantity and service capacity of mobile phone internet surfing user

Technical Field

The invention relates to the technical field of service wireless resources, in particular to a method for predicting complaint quantity and service capacity of a mobile phone internet surfing user.

Background

The traffic of 4G/5G internet service resources, service wireless resources and the like, together with the complaint quantity of mobile phone internet surfing users, are important indexes affecting users' internet surfing experience. When too many users complain at the same time, measures must be taken promptly so that the customer service personnel handling the problems are not overloaded and users do not receive poor customer service.

In the prior art, excessive mobile phone internet traffic also causes network congestion and slow network speeds, so the traffic load capacity of the equipment and the network capacity must be increased to avoid poor internet quality for users.

However, existing data only allow the historical index trend to be observed; they cannot be used to predict accurately whether future traffic and the network capacity of the equipment will exceed their limits.

Disclosure of Invention

The invention aims to provide a method for predicting the complaint quantity and service capacity of mobile phone internet surfing users, so as to solve the problems described in the background above.

In order to achieve the above purpose, the present invention provides the following technical solutions: a method for predicting complaint quantity and service capacity of a mobile phone internet surfing user comprises the following steps:

collecting historical related index data of a mobile phone internet surfing user and equipment;

carrying out data preprocessing;

establishing an ARIMA+LSTM prediction model;

predicting future data through an ARIMA+LSTM prediction model;

and configuring a threshold value for early warning based on the prediction result.

Preferably, the historical index data collected for mobile phone internet surfing users and equipment cover all 4G and 5G-ToC network elements maintained by the unit, including converged network elements, as well as network, service, user and market data related to the 4G/5G wireless network.

Preferably, the method comprises two stages, data preprocessing and data prediction: data preprocessing yields a data set that meets the model's requirements, and the ARIMA+LSTM prediction model decomposes the historical data into trend sequences in order to make predictions.

Preferably, the data preprocessing further comprises filling in missing data items and outlier processing.

Preferably, the outlier processing adopts one of four methods: the median absolute deviation (MAD) method, the 3-sigma standard deviation method, the percentile method and the scatter plot method.

Preferably, the ARIMA+LSTM prediction model is constructed using the ARIMA+LSTM algorithm, and the time series are divided into stationary and non-stationary time series according to the stationarity of the index.

Preferably, the stationary data sequence is predicted using an autoregressive moving average model.

Preferably, for large, non-stationary, multi-dimensional data sequences, complex nonlinear feature interactions are modeled using LSTM, which outputs the final result.

Compared with the prior art, the invention has the beneficial effects that:

Based on historical data, the method provided by the invention uses the ARIMA+LSTM algorithm to establish a prediction model that predicts the future trend of the mobile phone internet surfing performance indexes and of the user complaint quantity index, so that measures can be taken in advance in response to the predicted trends in traffic and complaint quantity. This avoids the poor internet surfing experience caused by excessive traffic and complaint quantity, and can thereby greatly improve users' satisfaction with mobile phone internet surfing.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of time series classification according to the present invention;

FIG. 3 is a schematic diagram of the overall architecture of the predictive network according to the present invention.

Detailed Description

In order to make the objects, technical solutions, and advantages of the present invention more apparent, the embodiments of the present invention will be further described in detail with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are some, but not all, embodiments of the present invention, are intended to be illustrative only and not limiting of the embodiments of the present invention, and that all other embodiments obtained by persons of ordinary skill in the art without making any inventive effort are within the scope of the present invention.

Referring to FIG. 1, the present invention provides a technical solution: a method for predicting the complaint quantity and service capacity of mobile phone internet surfing users, which comprises the following steps:

An AI algorithm is used to model historical data on mobile phone internet traffic and the number of complaining users, and the mobile phone internet traffic and complaint quantity indexes for a future period are predicted from the historical data. The number of mobile phone internet surfing users, obtained from signaling data, is taken as the prediction object. Future index values are predicted from the index trend over a past period. The prediction model can predict the trend from the single-valued time series alone, or it can use the holiday-related features of past periods to predict, through the AI model, how future holidays will affect the index trend.

The index fields to be predicted are input separately, and the values for a future date range are predicted by a model built on historical data. The current actual value y is compared with the model's predicted value ŷ: if the absolute error |y - ŷ| is within a given threshold, the value is normal; otherwise it is abnormal. The threshold is adjustable.
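As an illustrative, non-limiting sketch, the comparison of the actual value with the predicted value might look like the following. The function name `classify_point`, the sample numbers and the choice to treat an error exactly equal to the threshold as normal are assumptions made for the example, not details given in this description.

```python
def classify_point(actual: float, predicted: float, threshold: float) -> str:
    """Label a point 'normal' when |y - y_hat| is within the threshold, else 'abnormal'."""
    # Assumption: an error exactly equal to the threshold still counts as normal.
    return "normal" if abs(actual - predicted) <= threshold else "abnormal"


# Hypothetical observed complaint counts and the model's predictions for the same slots.
actual = [120.0, 135.0, 310.0]
predicted = [118.0, 140.0, 150.0]

labels = [classify_point(y, y_hat, threshold=20.0) for y, y_hat in zip(actual, predicted)]
print(labels)  # ['normal', 'normal', 'abnormal']
```

Because the threshold is an ordinary argument, it can be tuned per index without changing the comparison itself.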

Future prediction from historical data uses the ARIMA+LSTM algorithm. The algorithm takes 3 months of historical data as input and outputs data for 1 month in the future. The length of the input data is typically 3 times the length of the output data.

Input: 3 months of historical data for the mobile phone internet traffic and user complaint quantity indexes under different user information.

Output: predicted data for the mobile phone internet traffic and user complaint quantity indexes over a given future period.
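One possible way to prepare training samples that respect the 3:1 ratio between input length and output length is a sliding window, sketched below. The helper name `make_windows` and the toy series are hypothetical; the description does not state how samples are actually cut from the history.

```python
def make_windows(series, input_len, output_len):
    """Split a series into (history, target) pairs of the requested lengths."""
    pairs = []
    last_start = len(series) - input_len - output_len
    for start in range(last_start + 1):
        split = start + input_len
        pairs.append((series[start:split], series[split:split + output_len]))
    return pairs


# Hypothetical series of 10 points; 6 history points predict 2 future points (3:1 ratio).
daily = list(range(10))
pairs = make_windows(daily, input_len=6, output_len=2)
print(len(pairs))  # 3
print(pairs[0])    # ([0, 1, 2, 3, 4, 5], [6, 7])
```

With real data, `input_len` would correspond to about 3 months of samples at the chosen time granularity and `output_len` to about 1 month.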

b) Data index information

Region granularity: mobile phone internet traffic and user complaint quantity can be predicted independently for an individual city.

Time granularity: 15 minutes, hours, days, months

Prediction model: ARIMA+LSTM with configurable threshold

Prediction index: indexes obtained from signaling data, such as the number of 5G message users, the number of 5G message terminals, the number of UP2.4 terminal 5G message users, the number of UP2.4 terminal 5G message terminals and the user complaint quantity, are used as prediction objects.

Training data period: default 3 months

c) Data preprocessing

The data preprocessing in this scheme comprises missing value processing and outlier processing.

Missing value processing

Missing values are a common problem in data analysis, because data sets often contain missing data. Missing values may bias the analysis results or introduce errors, so they must be handled. Forward filling, moving average, exponential smoothing and linear interpolation are four common methods for handling missing values.

Forward filling (forward fill): each missing value is filled with the last known value before it. When several consecutive values are missing, all of them are filled with that last known value until the next known value is encountered.

Moving average: a missing value is filled with the average of the data around it. For example, if a missing value lies at some position in the time series, it can be filled with the average of several data points before and after that position. The size of the moving window should be chosen to suit the actual situation; in general, the larger the window, the smoother the estimate, but the greater the lag.

Exponential smoothing: missing values are filled with a weighted average in which the most recent data points receive higher weights. Exponential smoothing reflects the trend and seasonality of the data well and is also effective at filling consecutive missing values.

Linear interpolation: a missing value is filled using the linear relationship between the two nearest known data points on either side of it. This method is typically used for continuous time series data and can give a better estimate of the likely range of the missing value.
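The four filling methods above can be sketched in plain Python as follows. This is a minimal illustration only: missing values are represented by `None`, and the function names, the window of one point on each side and the smoothing factor `alpha=0.5` are assumptions chosen for the example.

```python
def forward_fill(values):
    """Replace each None with the last known value before it."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def moving_average_fill(values, window=1):
    """Replace each None with the mean of known values within `window` positions on either side."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            neighbours = [x for x in values[max(0, i - window): i + window + 1] if x is not None]
            if neighbours:
                filled[i] = sum(neighbours) / len(neighbours)
    return filled


def exponential_smoothing_fill(values, alpha=0.5):
    """Replace each None with the exponentially weighted level of the earlier known values."""
    filled, level = [], None
    for v in values:
        if v is None:
            filled.append(level)
        else:
            # Recent observations get weight alpha; older history decays geometrically.
            level = v if level is None else alpha * v + (1 - alpha) * level
            filled.append(v)
    return filled


def linear_interpolate(values):
    """Fill interior runs of None along a straight line between the known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


series = [10.0, None, None, 16.0, 18.0]  # hypothetical index values with two gaps
print(forward_fill(series))                # [10.0, 10.0, 10.0, 16.0, 18.0]
print(moving_average_fill(series))         # [10.0, 10.0, 16.0, 16.0, 18.0]
print(exponential_smoothing_fill(series))  # [10.0, 10.0, 10.0, 16.0, 18.0]
print(linear_interpolate(series))          # [10.0, 12.0, 14.0, 16.0, 18.0]
```

On this short example only linear interpolation recovers the upward trend across the gap, which matches the remark that it suits continuous time series data.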

If a variable's missing rate is high (above 80%) and its coverage and importance are low, the variable can be deleted outright.

Outlier processing

Outliers are data values that differ significantly from the other data values, that is, values that are much too high or too low compared with the rest. Outliers may arise from variability in the measurement or from experimental error; the latter are sometimes excluded from the data set. Other causes include changes in sensor sensitivity and data processing errors. Outliers can cause serious problems in statistical analysis.

Outliers are processed by one of four methods: the median absolute deviation (MAD) method, the 3-sigma standard deviation method, the percentile method and the scatter plot method.

Median absolute deviation (MAD) method:

This method detects outliers from the deviation of each value from the median. First, find the median X_median of all data; second, compute the absolute deviation |X_i - X_median| of each value from the median; third, take the median of these absolute deviations, MAD; finally, choose the parameter n, which gives the reasonable range [X_median - n·MAD, X_median + n·MAD], and adjust the factor values that fall outside this range.
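The MAD steps can be illustrated with the short sketch below. The description only says that out-of-range values are adjusted; clipping them to the nearest bound, as done here, is one assumed form of adjustment, and the names `mad_bounds` and `clip_to_bounds` are hypothetical.

```python
from statistics import median


def mad_bounds(values, n=3.0):
    """Return the range [X_median - n*MAD, X_median + n*MAD]."""
    med = median(values)
    mad = median(abs(x - med) for x in values)
    return med - n * mad, med + n * mad


def clip_to_bounds(values, low, high):
    """Assumed adjustment: pull values outside the range back to the nearest bound."""
    return [min(max(x, low), high) for x in values]


data = [10.0, 11.0, 12.0, 13.0, 100.0]  # hypothetical factor values with one extreme point
low, high = mad_bounds(data, n=3.0)
print(low, high)                        # 9.0 15.0
print(clip_to_bounds(data, low, high))  # [10.0, 11.0, 12.0, 13.0, 15.0]
```

Because both the centre and the spread are medians, the single extreme value 100.0 barely influences the range it is tested against.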

3-sigma standard deviation method: the 3-sigma method is also known as the standard deviation method. The standard deviation itself represents the degree of dispersion of a factor around its mean X_mean. In outlier processing, the distance of a factor value from the mean can be measured by X_mean ± nσ. The logic is similar to that of the MAD method: first calculate the mean and standard deviation of the factor, then choose the parameter n, which gives the reasonable range [X_mean - nσ, X_mean + nσ], and adjust the factor values that fall outside this range. The 3-sigma principle can be stated simply: if the data follow a normal distribution, an outlier is a value that deviates from the mean by more than three standard deviations. Under the normal distribution assumption, the probability of a value lying more than 3σ from the mean is very small, as the following expression shows, so such a value can be regarded as abnormal.

P(|x - μ| > 3σ) ≤ 0.003
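A minimal sketch of the 3-sigma check follows. Using the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation is an assumption, as are the function names and the sample data.

```python
from statistics import mean, pstdev


def sigma_bounds(values, n=3.0):
    """Return the range [X_mean - n*sigma, X_mean + n*sigma]."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return mu - n * sigma, mu + n * sigma


def find_outliers(values, n=3.0):
    """Return the values lying outside the n-sigma range."""
    low, high = sigma_bounds(values, n)
    return [x for x in values if x < low or x > high]


# Hypothetical data: 19 ordinary readings and one spike.
data = [50.0] * 19 + [500.0]
print(find_outliers(data))  # [500.0]
```

Unlike the MAD range, the mean and standard deviation are themselves pulled by extreme values, so with very small samples a single spike may not exceed three standard deviations.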

Percentile method: all values are sorted in ascending order, and values ranked above the 97.5th percentile or below the 2.5th percentile are outliers.
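The percentile rule can be sketched as below. The linear interpolation between ranks used to compute a percentile is an assumed convention (several conventions exist), and the helper names are hypothetical.

```python
def percentile(sorted_values, q):
    """q-th percentile (0-100) of an ascending list, linearly interpolated between ranks."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (pos - lower)


def percentile_outliers(values, low_q=2.5, high_q=97.5):
    """Return the values below the low_q percentile or above the high_q percentile."""
    ordered = sorted(values)
    low, high = percentile(ordered, low_q), percentile(ordered, high_q)
    return [x for x in values if x < low or x > high]


data = [float(x) for x in range(1, 101)]  # hypothetical values 1.0 .. 100.0
print(percentile_outliers(data))  # [1.0, 2.0, 3.0, 98.0, 99.0, 100.0]
```

Note that this rule always flags roughly 5% of the data, whether or not those values are genuinely abnormal.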

Scatter plot method: a clustering algorithm is applied to the scatter plot, and discrete data points lying far from the concentrated data are treated as outliers.

d) Prediction algorithm

The scheme is based on an ARIMA+LSTM composite model. The number of mobile phone internet surfing users, obtained from signaling data, is taken as the prediction object, and future index values are predicted from the index trend over a past period. The time granularity, the number of sample periods and the number of prediction periods are configurable; by default, one week of data is predicted. The prediction model can predict the trend from the single-valued time series alone, or it can use the holiday-related features of past periods to predict, through the AI model, how future holidays will affect the index trend.

ARIMA

Time series analysis is an important branch of statistics. It studies the regularities in how things develop and change over time in order to predict their future development.

According to the stationarity of the index, time series are divided into stationary time series and non-stationary time series;

according to the nature of the index, they are divided into total index time series, relative index time series and average index time series;

according to the time attribute of the index, they are divided into period index time series and point-in-time index time series.

The full name of the ARMA model is the autoregressive moving average model; it is currently the most commonly used model for fitting stationary sequences.

The ARMA model consists of two parts:

p-order autoregressive model AR(p)

When the constant term is zero, the autoregressive model is called the centered AR(p) model. A non-centered AR(p) sequence can also be converted into a centered AR(p) model by translation.

The AR model expresses the value at time t as a linear combination of the values at the past times t-1 to t-p plus noise.

q-order moving average model MA(q)

y_t = μ + ε_t - θ_1 ε_{t-1} - θ_2 ε_{t-2} - … - θ_q ε_{t-q}

When μ = 0, the model is called the centered MA(q) model; a non-centered MA(q) model can be converted into a centered MA(q) model by a simple translation.

The MA model expresses the value at the current time as a linear combination of the noise at past time points.

The ARMA model is in essence a combination of AR(p) and MA(q).

Likewise, when the constant term is zero, the model is called the centered ARMA(p, q) model. It combines the features of both models: the AR part handles the relationship between the current data and the earlier data, and the MA part handles the influence of random variation.
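For reference, the three models can be written in their usual textbook form as follows. The AR coefficients φ_0, …, φ_p are assumed notation, since the corresponding formulas are not reproduced in this text; the MA part follows the sign convention of the MA(q) expression given above, and ε_t denotes the noise term.

```latex
\begin{aligned}
\mathrm{AR}(p):\quad & y_t = \phi_0 + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t \\
\mathrm{MA}(q):\quad & y_t = \mu + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q} \\
\mathrm{ARMA}(p,q):\quad & y_t = \phi_0 + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p}
  + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q}
\end{aligned}
```

In this notation the centered AR(p) and ARMA(p, q) models correspond to φ_0 = 0, just as the centered MA(q) model corresponds to μ = 0.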

A stationary time series can be fitted directly with an ARMA model. In practice, however, time series usually exhibit a trend, that is, they are generally non-stationary, so they must first be made stationary. Differencing is the most common way of doing this, and ARMA analysis is carried out once the series is stationary.

This process is in fact ARIMA: the original non-stationary time series is made stationary by first-order or second-order differencing, and the ARMA model is then applied to the resulting stationary series. The ARIMA(p, d, q) model has three orders: it adds the differencing order d to the two orders of ARMA(p, q).
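The role of the differencing order d can be illustrated with a small sketch. Only the differencing step is shown, not the ARMA fit; the function names and the toy quadratic series are hypothetical.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing: y'_t = y_t - y_{t-1}."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def undo_first_difference(first_value, diffs):
    """Rebuild the original series from its first value and its first differences."""
    out = [first_value]
    for delta in diffs:
        out.append(out[-1] + delta)
    return out


trend = [t * t for t in range(6)]  # 0, 1, 4, 9, 16, 25: a series with a curved trend
print(difference(trend, d=1))      # [1, 3, 5, 7, 9]  still trending
print(difference(trend, d=2))      # [2, 2, 2, 2]     trend removed
print(undo_first_difference(trend[0], difference(trend, d=1)) == trend)  # True
```

Forecasts made on the differenced series are mapped back to the original scale by reversing the differencing, which is what `undo_first_difference` does for d = 1.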

LSTM

In recent years, long short-term memory (LSTM) networks have become a popular time series modeling framework because of their end-to-end modeling, the ease with which they incorporate exogenous variables, and their automatic feature extraction. The LSTM approach uses large amounts of multi-dimensional data to model complex nonlinear feature interactions, which is critical for predicting extreme events.

The function is based on an LSTM network plus fully connected layers. The complete structure of the neural network mainly comprises two parts:

1. An encoder-decoder framework, which captures the inherent relationships within the time series and is learned during pre-training.

2. A prediction network, whose input comes from the embedding layer learned by the encoder-decoder framework and from the potential external feature (alarm cause).

Before fitting the prediction model, pre-training is first carried out to fit an encoder that can extract useful and representative embeddings from the time series. This has two goals:

1. ensuring that the learned embedding provides useful features for prediction;

2. ensuring that abnormal inputs can be captured in the embedding and thereby propagated further into the prediction network.

e) Prediction result

The prediction results are generally presented as trend graphs. A threshold can be configured against which the predictions are judged, and an early warning is issued for equipment or complaint quantities that are about to exceed their limits, so that corresponding measures can be taken in advance.
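By way of a non-limiting example, the early-warning check on a forecast could be sketched as follows. The function name `first_breach`, the forecast values and the limit of 500 are hypothetical.

```python
def first_breach(forecast, threshold):
    """Return the index of the first forecast value above the threshold, or None if none exceeds it."""
    for step, value in enumerate(forecast):
        if value > threshold:
            return step
    return None


# Hypothetical forecast of daily complaint quantity and a configured limit.
forecast = [410.0, 450.0, 495.0, 530.0, 560.0]
step = first_breach(forecast, threshold=500.0)
if step is not None:
    print(f"early warning: limit expected to be exceeded at forecast step {step}")
```

Reporting the first step at which the limit is crossed indicates how much lead time remains for taking measures.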

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A method for predicting complaint quantity and service capacity of a mobile phone internet surfing user, characterized by comprising the following steps:

carrying out data preprocessing;

establishing an ARIMA+LSTM prediction model;

predicting future data through an ARIMA+LSTM prediction model;

2. The method for predicting complaint volume and service capacity of a mobile phone internet surfing user according to claim 1, characterized in that: the historical index data collected for mobile phone internet surfing users and equipment cover all 4G and 5G-ToC network elements maintained by the unit, including converged network elements, as well as network, service, user and market data related to the 4G/5G wireless network.

3. The method for predicting complaint volume and service capacity of a mobile phone internet surfing user according to claim 1, characterized in that: the method comprises two stages, data preprocessing and data prediction; a data set meeting the model requirements is obtained through the data preprocessing, and the historical data are decomposed into trend sequences by the ARIMA+LSTM prediction model in order to make predictions.

4. The method for predicting complaint volume and service capacity of a mobile phone internet surfing user according to claim 1, characterized in that: the data preprocessing further comprises filling in missing data items and outlier processing.

5. The method for predicting complaint volume and service capacity of a mobile phone internet surfing user according to claim 4, characterized in that: the outlier processing adopts one of four methods, namely the median absolute deviation (MAD) method, the 3-sigma standard deviation method, the percentile method and the scatter plot method.

6. The method for predicting complaint volume and service capacity of a mobile phone internet surfing user according to claim 1, characterized in that: the ARIMA+LSTM prediction model is constructed using the ARIMA+LSTM algorithm, and the time series are divided into stationary and non-stationary time series according to the stationarity of the index.

7. The method for predicting complaint volume and service capacity of the mobile phone internet surfing user according to claim 6 is characterized in that: the stationary data sequence is predicted using an autoregressive moving average model.

8. The method for predicting complaint volume and service capacity of a mobile phone internet surfing user according to claim 6, characterized in that: for massive, non-stationary, multi-dimensional data sequences, complex nonlinear feature interactions are modeled using LSTM, which outputs the final result.