CN112348571A - Combined model sales prediction method based on sales prediction system - Google Patents

Combined model sales prediction method based on sales prediction system

Info

Publication number
CN112348571A
CN112348571A (application CN202011131501.5A)
Authority
CN
China
Prior art keywords
model
prediction
sequence
sales
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011131501.5A
Other languages
Chinese (zh)
Inventor
孙永强
唐军
唐潮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd filed Critical Sichuan Changhong Electric Co Ltd
Priority to CN202011131501.5A priority Critical patent/CN112348571A/en
Publication of CN112348571A publication Critical patent/CN112348571A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a combined model sales prediction method based on a sales prediction system, which comprises the following steps: building a combined model; selecting sales data and performing SARIMAX model prediction to obtain a first prediction sequence T from the sales value sequence samples; merging the first prediction sequence T into the feature sequence and obtaining a second prediction sequence T1 through a CATBOOST model; merging the prediction results of the first prediction sequence T and the second prediction sequence T1 into the feature sequence, inputting it into an LSTM model, and training to obtain a third prediction sequence T2; and weighted combination: recursing layer by layer, continuously recombining and retraining each model's prediction results, the feature sequence and the original data, adding a validation data set to determine the weight of each model in the combined model, and finally adjusting the weights dynamically according to each round of prediction results. The invention overcomes the limitation that a single model cannot handle certain problems, and improves the performance and generalization of the model.

Description

Combined model sales prediction method based on sales prediction system
Technical Field
The invention relates to the technical field of big data, in particular to a combined model sales prediction method based on a sales prediction system.
Background
A sales forecasting system uses models to forecast data and thereby provide technical support for the business. The models fall into machine learning, deep learning, time series and other categories. A typical model-building process includes data acquisition, data preprocessing, feature engineering, model training, diagnosis and tuning, model validation, error analysis, model fusion, and deployment; its core is the model itself, which may be a time series analysis, machine learning or deep learning model.
The main purpose of time series analysis models is forecasting: predicting future changes from existing time series data. They mainly include models such as arima, sarima, sarimax, holt-winters and prophet. Time series analysis can only work from the data itself and can roughly capture characteristics such as seasonality, but it cannot exploit feature data from other dimensions. Machine learning mainly predicts from data dimensions such as competing product data, task data and air temperature data, with models including catboost, xgboost, lightgbm and gbdt; however, the feature values for the period being predicted must themselves be supplied, and filling them in manually introduces large prediction errors. At the same time, machine learning is not as sensitive to time as time series analysis. In recent years, deep learning has gradually been applied to sales prediction, mainly with models such as cnn, dnn, rnn and lstm, but these models suffer from vanishing gradients, exploding gradients and insufficient long-term memory. Each single family of models, time series, machine learning or deep learning, therefore has inherent shortcomings.
Disclosure of Invention
In order to solve the problems in the prior art, the invention aims to provide a combined model sales prediction method based on a sales prediction system.
In order to achieve the purpose, the invention adopts the technical scheme that: a combined model sales prediction method based on a sales prediction system comprises the following steps:
step S10, building a combined model based on a SARIMAX model, a CATBOOST model and an LSTM model;
s20, selecting sales data to carry out SARIMAX model prediction to obtain a first prediction sequence T of a sample of a sales value sequence;
step S30, obtaining air temperature, competitive product data, task amount and macroscopic economic data as a characteristic sequence by using a crawler, merging the first prediction sequence T into the characteristic sequence, and obtaining a second prediction sequence T1 through a CATBOOST model;
s40, blending the prediction results of the first prediction sequence T and the second prediction sequence T1 into a characteristic sequence, inputting the characteristic sequence into an LSTM model, and training to obtain a third prediction sequence T2;
step S50, weighted combination: and recursion layer by layer, continuously recombining and training the prediction result, the characteristic sequence and the original data of each model, adding a verification data set to determine the weight of each model in the combined model, and finally dynamically adjusting the weight according to the prediction result of each time.
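Step S50 leaves the exact weighting rule open; the sketch below is therefore only one possible realization, assumed for illustration: each of the three prediction sequences is weighted by the inverse of its RMSE on the validation set, and the weights are recomputed after every prediction round, which yields the dynamic adjustment described above.

```python
import numpy as np

def dynamic_weights(val_true: np.ndarray, val_preds: dict, eps: float = 1e-9) -> dict:
    """One possible weighting rule (not specified in the patent): weight each
    model by the inverse of its RMSE on the validation set, then normalize.
    Re-running this after every prediction round updates the weights dynamically."""
    inv_err = {name: 1.0 / (np.sqrt(np.mean((val_true - p) ** 2)) + eps)
               for name, p in val_preds.items()}
    total = sum(inv_err.values())
    return {name: w / total for name, w in inv_err.items()}

def combine(preds: dict, weights: dict) -> np.ndarray:
    """Weighted sum of the model prediction sequences."""
    return sum(weights[name] * preds[name] for name in preds)

# Hypothetical usage with validation predictions t_val, t1_val, t2_val:
# weights = dynamic_weights(y_val, {"T": t_val, "T1": t1_val, "T2": t2_val})
# final   = combine({"T": t_test, "T1": t1_test, "T2": t2_test}, weights)
```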
As a further improvement of the present invention, in step S20, implementing the SARIMAX model specifically includes the following steps:
1) loading data;
2) preprocessing: defining preprocessing steps according to the data set, including creating a timestamp, converting the dtype of the date/time column and making the series univariate;
3) making the series stationary: including checking the stationarity of the series and performing the required transformations;
4) determining the trend difference order d value: the number of differencing operations required to make the series stationary is taken as the value of d;
5) creating ACF and PACF graphs: determining input parameters of the SARIMAX model by using the ACF and the PACF graph;
6) determining a trend autoregressive order p value and a trend moving average order q value: reading the values of p and q from the ACF and PACF graphs of the previous step;
7) fitting the SARIMAX model: fitting the SARIMAX model with the processed data and the calculated parameter values;
8) prediction is performed on the validation set: predicting a future value;
9) computing MSE or RMSE: the MSE or RMSE values are checked using the predicted and actual values on the validation set.
As a further improvement of the present invention, in step S20, the SARIMAX model prediction uses rolling prediction: after the sales data of one month are predicted, the actual sales data of that month are added before the sales of the next month are predicted; finally, the sequence of predicted values is output, giving the first prediction sequence T of the sales value sequence samples.
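A hedged sketch of this rolling SARIMAX prediction, written with statsmodels purely for illustration (the patent does not name a library), is given below. The (p, d, q) and seasonal orders are placeholders standing in for the values read from the ACF/PACF plots, and the exogenous frame stands in for the crawled air temperature data.

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

def rolling_sarimax_forecast(sales: pd.Series, exog: pd.DataFrame, n_test: int,
                             order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)) -> pd.Series:
    """Re-fit the model each month on all data observed so far and predict the
    next month, mirroring the rolling prediction described above. The orders
    are illustrative placeholders for the values read from the ACF/PACF plots."""
    preds = []
    for i in range(len(sales) - n_test, len(sales)):
        model = SARIMAX(sales.iloc[:i], exog=exog.iloc[:i],
                        order=order, seasonal_order=seasonal_order)
        fitted = model.fit(disp=False)
        # one-step-ahead forecast using the exogenous values of the next month
        preds.append(fitted.forecast(steps=1, exog=exog.iloc[i:i + 1]).iloc[0])
    return pd.Series(preds, index=sales.index[-n_test:])

# T = rolling_sarimax_forecast(monthly_sales, crawled_exog, n_test=6)
```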
As a further improvement of the present invention, in step S40, the LSTM model includes 2 hidden layers, the first hidden layer having 128 neurons and the second hidden layer having 256 neurons; the output layer is a 1-dimensional column vector, i.e. the sales prediction value; the input variables are the features at time step (t-1); the loss function is the mean absolute error (MAE), the optimization algorithm is Adam, the activation function is Sigmoid, the model is trained for 500 epochs, and the batch size is 15.
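For illustration only (the patent does not name a deep learning framework), a Keras sketch matching this architecture could look as follows: two stacked LSTM layers with 128 and 256 units, sigmoid activation, a 1-dimensional output, MAE loss, the Adam optimizer, 500 epochs and a batch size of 15.

```python
from tensorflow import keras

def build_lstm(n_features: int) -> keras.Model:
    """Two stacked LSTM layers (128 and 256 units), a 1-dimensional output,
    MAE loss, Adam optimizer and sigmoid activation, as specified above."""
    model = keras.Sequential([
        keras.layers.Input(shape=(1, n_features)),            # one time step (t-1)
        keras.layers.LSTM(128, activation="sigmoid", return_sequences=True),
        keras.layers.LSTM(256, activation="sigmoid"),
        keras.layers.Dense(1),                                 # sales prediction value
    ])
    model.compile(loss="mae", optimizer="adam")
    return model

# X has shape (samples, 1, n_features): the features at time step t-1.
# model = build_lstm(X.shape[2])
# model.fit(X, y, epochs=500, batch_size=15, validation_split=0.2)
```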
The invention has the beneficial effects that:
the sales forecasting system carries out sales forecasting based on sales data so as to provide technical support for business, carry out related sales guidance for business and the like.
Drawings
FIG. 1 is a comparison graph of the top5 predictions of the combined model and the SARIMAX model in an embodiment of the present invention;
FIG. 2 is a graph of the top5 predictions of the CATBOOST model in an embodiment of the present invention;
FIG. 3 is a diagram illustrating the result of SARIMAX model prediction data in an embodiment of the present invention;
FIG. 4 is a graph comparing original data and predicted data of a SARIMAX model in an embodiment of the present invention;
FIG. 5 is a diagram of the result of the CATBOOST model prediction data in an embodiment of the present invention;
FIG. 6 is a graph comparing the original data and the predicted data of the CATBOOST model in the embodiment of the present invention;
FIG. 7 is a graph of the results of the prediction data of the LSTM model in an embodiment of the present invention;
FIG. 8 is a graph comparing the raw data and predicted data of the LSTM model in an embodiment of the present invention;
FIG. 9 is a graph of combined model prediction data results in accordance with an embodiment of the present invention;
FIG. 10 is a graph comparing original data and predicted data of a combined model according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
A combined model sales prediction method based on a sales prediction system comprises the following steps:
step S10, building a combined model based on a SARIMAX model, a CATBOOST model and an LSTM model; the principles of the models are further explained below:
SARIMAX model principle:
The autoregressive integrated moving average (ARIMA) model is one of the most widely used methods for forecasting univariate time series data. While this approach can handle data with trends, it does not support time series with a seasonal component. The ARIMA extension that directly models the seasonal component of the series is called SARIMA. It adds three new parameters to specify the autoregressive (AR), differencing (I) and moving average (MA) terms of the seasonal component, as well as an additional parameter for the seasonal period.
ARIMA → SARIMA → SARIMAX: S stands for Seasonal, i.e. seasonality and periodicity, and X stands for eXogenous, i.e. external information. Compared with ARIMA, SARIMAX adds periodicity and uses external information to strengthen the predictive power of the model.
Principle of the CATBOOST model:
CatBoost introduces two key algorithmic improvements: ordered boosting, a permutation-driven alternative to the classical algorithm, and an innovative algorithm for processing categorical features. Both methods aim to solve prediction shift, which is common in gradient boosting algorithms.
One effective way to process a categorical feature is to replace the i-th categorical feature of the k-th training sample with a calculated value, its target statistic (TS), denoted $\hat{x}^i_k$. The TS usually estimates the expected target y conditioned on the category value:
$$\hat{x}^i_k \approx \mathbb{E}\left[y \mid x^i = x^i_k\right]$$
In a decision tree, this label mean becomes the criterion for node splitting. The straightforward estimate is called Greedy Target Statistics (Greedy TS for short):
$$\hat{x}^i_k = \frac{\sum_{j=1}^{n} \mathbb{1}\{x^i_j = x^i_k\}\, y_j}{\sum_{j=1}^{n} \mathbb{1}\{x^i_j = x^i_k\}}$$
where $\mathbb{1}\{\cdot\}$ is the Iverson bracket (indicator function), which equals 1 when the two values inside are equal and 0 otherwise.
If a certain category value appears in only one record, its encoded value equals the label of that record, which obviously leads to overfitting. The estimate is noisy for such low-frequency categories, so it is usually smoothed with a prior value p:
$$\hat{x}^i_k = \frac{\sum_{j=1}^{n} \mathbb{1}\{x^i_j = x^i_k\}\, y_j + a\,p}{\sum_{j=1}^{n} \mathbb{1}\{x^i_j = x^i_k\} + a}$$
where a > 0 is a parameter and p is typically taken as the average of the target variable over all the data.
However, this greedy approach still suffers from target leakage: the target value $y_k$ of sample $x_k$ is used to compute $\hat{x}^i_k$. This causes a conditional shift, i.e. the distribution of $\hat{x}^i \mid y$ differs between the training set and the test set. A common way to avoid this conditional shift is to compute $\hat{x}^i_k$ from a subset of samples $\mathcal{D}_k$ that excludes $x_k$:
$$\hat{x}^i_k = \frac{\sum_{x_j \in \mathcal{D}_k} \mathbb{1}\{x^i_j = x^i_k\}\, y_j + a\,p}{\sum_{x_j \in \mathcal{D}_k} \mathbb{1}\{x^i_j = x^i_k\} + a}, \qquad \mathcal{D}_k \subset \mathcal{D} \setminus \{x_k\}.$$
In CatBoost's ordered TS, $\mathcal{D}_k$ consists of the samples that precede $x_k$ in a random permutation of the data set.
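For concreteness, the following Python sketch contrasts the two encodings just described. It is an illustration written for this text, not code from the patent: `greedy_ts` applies the smoothed greedy formula (the sample's own label still leaks into its encoding), while `ordered_ts` encodes each sample only from the samples that precede it in a random permutation.

```python
import numpy as np
import pandas as pd

def greedy_ts(categories: pd.Series, target: pd.Series, a: float = 1.0) -> pd.Series:
    """Smoothed greedy target statistic: per-category mean of the target,
    smoothed towards the global mean p with strength a. The sample's own
    label is included, which is exactly the target-leakage problem."""
    p = target.mean()
    stats = target.groupby(categories).agg(["sum", "count"])
    encoded = (stats["sum"] + a * p) / (stats["count"] + a)
    return categories.map(encoded)

def ordered_ts(categories: pd.Series, target: pd.Series,
               a: float = 1.0, seed: int = 0) -> pd.Series:
    """Ordered target statistic: encode each sample using only the samples
    that precede it in a random permutation, so its own label never leaks."""
    rng = np.random.default_rng(seed)
    p = target.mean()
    sums, counts = {}, {}                        # running per-category statistics
    encoded = np.empty(len(categories))
    for pos in rng.permutation(len(categories)):
        c = categories.iloc[pos]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[pos] = (s + a * p) / (n + a)     # encode BEFORE seeing this label
        sums[c], counts[c] = s + target.iloc[pos], n + 1
    return pd.Series(encoded, index=categories.index)

# Toy usage
cats = pd.Series(["red", "blue", "red", "green", "blue", "red"])
y = pd.Series([10.0, 3.0, 12.0, 7.0, 4.0, 11.0])
print(greedy_ts(cats, y))
print(ordered_ts(cats, y))
```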
during the gradient lifting of each step, the problem of prediction shift also exists, which is caused by a special target leakage. The solution of Catboost is called Ordered boosting (Ordered boosting), similar to the Ordered TS approach.
In a gradient boosting algorithm, the boosting step is
$$F^t = F^{t-1} + \alpha\, h^t$$
where $h^t$ is the newly generated weak learner and $-g^t(x_k, y_k)$ is the negative gradient of the loss function at the current model, i.e. the value the weak learner is fitted to:
$$h^t = \arg\min_{h \in H} \mathbb{E}\left(-g^t(x, y) - h(x)\right)^2$$
From this, the following two observations are obtained:
1. If two independent samples $\mathcal{D}_1$ and $\mathcal{D}_2$ of the same size are used to estimate $h^1$ and $h^2$ respectively, the resulting estimate is unbiased for any $x \in \{0, 1\}^2$.
2. If the same data set $\mathcal{D} = \mathcal{D}_1 = \mathcal{D}_2$ is used for both $h^1$ and $h^2$, the resulting estimate is biased.
These two statements imply that when an independent data set is used at each gradient step, the trained model is an unbiased estimate of the true dependency $y = f^*(x)$. Otherwise, with the same data set reused, a biased model is obtained, and the larger the data set, the smaller the bias.
Another important detail of CatBoost is the use of combinations of categorical features as additional categorical features; for example, in an advertisement click prediction task, the joint information of user ID and advertisement topic captures high-order dependencies. However, the number of combinations grows exponentially with the number of categorical features in the data set, so it is impossible to process all of them.
CatBoost therefore constructs the combinations in a greedy manner: for the first split of the tree, no combination is considered; for each subsequent split, CatBoost combines all categorical features (and their combinations) already used for previous splits of the current tree with every categorical feature in the data set. Each combination is dynamically converted to a TS.
Principle of the LSTM model:
LSTM is a type of recurrent neural network that, with continued improvements, performs better than general deep learning models in processing and predicting time-series-related data.
For the original m × n dimensional data, write
$$M_j = (T_{1j}, T_{2j}, \dots, T_{nj}), \qquad j = 1, 2, \dots, m.$$
For multivariate data, the series must be converted into a supervised-learning sequence: each new sample pairs the feature vector at time step t-1 with the target value at time step t.
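As a small pandas illustration of this lag-1 supervised framing (the exogenous columns and their values below are assumptions made for the example; only the sales figures come from the embodiment data):

```python
import pandas as pd

def to_supervised(df: pd.DataFrame, target_col: str = "sales") -> pd.DataFrame:
    """Pair the feature vector at time step t-1 with the target at time step t
    (a lag-1 sliding window), so the multivariate series can be learned as a
    standard supervised-regression problem."""
    lagged = df.shift(1).add_suffix("(t-1)")                   # all features at t-1
    frame = pd.concat([lagged, df[[target_col]].add_suffix("(t)")], axis=1)
    return frame.dropna()                                      # first row has no t-1 features

# Toy monthly data: sales plus two assumed exogenous features
df = pd.DataFrame({
    "sales":       [94081, 51954, 176394, 153697],
    "temperature": [12.0, 5.0, 3.0, 8.0],
    "task":        [90000, 60000, 150000, 140000],
})
print(to_supervised(df))
```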
the LSTM model adopts a gate structure to solve the long-term dependence problem, each LSTM controls the state of each unit through 3 types of gates, and a forgetting gate determines the state c of the unit at the last momentt-1How much to keep current time ct(ii) a The input gate determines the input x of the network at the current momenttHow many cells to save to cell state ct. Output door control unit status ctHow much current output value h is output to LSTMt. The state updating of each step satisfies the following steps:
it=σ(Wxixt+Whiht-1+bi)
ft=σ(Wxfxt+Whfht-1+bf)
ot=σ(Wxoxt+Whfht-1+bo)
Figure BDA0002735322640000072
Figure BDA0002735322640000073
ht=tanh(ot*ct)。
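For concreteness, a minimal NumPy sketch of one LSTM cell step following the gate equations above (toy dimensions and random weights; an illustration, not the patent's implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W holds the input/recurrent weights per gate, b the biases."""
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + b["i"])    # input gate
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + b["f"])    # forget gate
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + b["o"])    # output gate
    c_hat = np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat                            # cell state update
    h_t = o_t * np.tanh(c_t)                                    # hidden state
    return h_t, c_t

# Toy dimensions: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_in if k.startswith("x") else n_hid))
     for k in ["xi", "hi", "xf", "hf", "xo", "ho", "xc", "hc"]}
b = {k: np.zeros(n_hid) for k in ["i", "f", "o", "c"]}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
print(h, c)
```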
step S20, SARIMAX model prediction:
Loading data: the first step of model building is loading the data set. Preprocessing: preprocessing steps are defined according to the data set, including creating a timestamp, converting the dtype of the date/time column and making the series univariate. Making the series stationary: to satisfy the model assumptions the series must be made stationary, which includes checking the stationarity of the series and applying the required transformations. Determining the d value: the number of differencing operations needed to make the series stationary is taken as the value of d. Creating ACF and PACF plots: this is the most important step of the implementation; the ACF and PACF plots are used to determine the input parameters of the model. Determining the p and q values: the values of p and q are read from the plots of the previous step. Seasonal terms are added, together with air temperature data obtained by a crawler, as exogenous input. Fitting the SARIMAX model: the SARIMAX model is fitted with the processed data and the parameter values calculated in the previous steps. Calculating MSE or RMSE: to check the performance of the model, the MSE or RMSE is computed from the predicted and actual values on the validation set. To bring the result closer to the actual situation, rolling prediction is used: after the sales data of one month are predicted, the actual sales data of that month are added before the sales of the next month are predicted. Finally the sequence of predicted values is output, giving the first prediction sequence T of the sales value sequence samples.
Step S30, CATBOOST model prediction:
Air temperature (acquired by a crawler), competing product data, task amount and macroeconomic data (acquired by a crawler) are selected as the prediction feature sequence, the first prediction sequence T is merged into the feature sequence, and the data are divided into a training set and a test set. For processing categorical features, CatBoost adopts an effective strategy that reduces overfitting while allowing the whole data set to be used for learning: the data set is randomly permuted, and the average label value of samples with the same category value is computed using only the samples that come before the current sample in the permutation. For feature combinations, CatBoost uses a greedy strategy when constructing a new split for the current tree: no combination is considered for the first split; for subsequent splits, CatBoost combines all categorical features (and their combinations) of the current tree with every categorical feature in the data set, and each combination is dynamically converted to a number. CatBoost also builds combinations of numerical and categorical features: every split point selected by the tree is treated as a categorical feature with two values and combined in the same way as categorical features. In this way the gradient bias is overcome.
Like all standard gradient boosting algorithms, CatBoost fits the gradient of the current model by building a new tree. However, classical boosting algorithms all suffer from overfitting caused by biased pointwise gradient estimates. Many algorithms built on the GBDT technique (e.g. XGBoost, LightGBM) construct a tree in two stages: selecting the tree structure and, once the structure is fixed, computing the values of the leaf nodes. To select the best tree structure, the algorithm enumerates different splits, computes the values of the resulting leaf nodes, scores the resulting tree, and finally selects the best split. In both stages the leaf values are computed as approximations of the gradient or of the Newton step. CatBoost uses an unbiased estimate of the gradient step in the first stage, while the second stage follows the conventional GBDT scheme. For fast scoring, CatBoost uses oblivious trees as base predictors; such trees are balanced and less prone to overfitting. In an oblivious tree, the index of each leaf node can be encoded as a binary vector whose length equals the tree depth. CatBoost first binarizes all floating-point features, statistics and one-hot encoded features, and then uses these binary features to compute the model's predicted values. The predicted values form the second prediction sequence T1.
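A minimal CatBoost sketch consistent with this step is shown below. It assumes the feature frame already contains the crawled exogenous columns and the merged first prediction sequence T; the hyperparameters and the categorical column name are illustrative choices, not values specified in the patent.

```python
import pandas as pd
from catboost import CatBoostRegressor, Pool

def fit_catboost(features: pd.DataFrame, sales: pd.Series, cat_cols: list) -> CatBoostRegressor:
    """Fit a CatBoost regressor on the feature sequence (exogenous columns plus
    the merged SARIMAX output T). Hyperparameters here are illustrative only."""
    train_pool = Pool(features, sales, cat_features=cat_cols)
    model = CatBoostRegressor(iterations=500, learning_rate=0.05, depth=6,
                              loss_function="RMSE", verbose=False)
    model.fit(train_pool)
    return model

# Hypothetical usage, with "region" as an assumed categorical column:
# T1 = fit_catboost(X_train, y_train, cat_cols=["region"]).predict(X_test)
```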
Step S40, LSTM model prediction:
An LSTM neural network is built with 2 hidden layers: the first hidden layer has 128 neurons and the second hidden layer has 256 neurons. The output layer is a 1-dimensional column vector, i.e. the sales prediction value, and the input variables are the features at time step (t-1). The loss function is the mean absolute error (MAE), the optimization algorithm is Adam, the activation function is Sigmoid, the model is trained for 500 epochs, and the batch size is 15. The prediction results of the first prediction sequence T and the second prediction sequence T1 are merged into the feature sequence, which is input into the LSTM model and trained; the predicted values form the third prediction sequence T2.
Step S50, weighted combination:
the traditional weighted combination mostly adopts an average weighting mode, or obtains respective weight ratio through model training, and the weights of the model combination are generally considered to be approximately consistent, however, the effect of the combination is not large, the single models run independently and are finally fused together, the real multi-model fusion effect is not achieved, the patent adopts a unique weighted combination mode, recursion layer by layer is carried out, the prediction result of each model is obtained, and characteristic sequence, original data are continuously recombined and trained, a verification data set is added to determine the weight of each model, finally, the weight is dynamically adjusted according to each prediction result, the result changes along with the training, the weight is determined according to different attributes of the models, so that the advantages of each model can be exerted, and the weight can be dynamically updated in real time, so that the best predicted value is obtained.
The technical effects of the present embodiment are shown in the following table:
Actual data | SARIMAX | CATBOOST | LSTM | Combined model | RMSE (S / C / L / Comb) | MA (S / C / L / Comb) | ACC(%) (S / C / L / Comb)
94081 | 84081 | 84081 | 84081 | 91311 | 0.92 / 0.87 / 0.71 / 0.34 | 642 / 643 / 641 / 256 | 64 / 71 / 82 / 98
51954 | 41231 | 41312 | 44643 | 50123 | 0.92 / 0.90 / 0.76 / 0.33 | 643 / 649 / 642 / 253 | 63 / 70 / 83 / 97
176394 | 162382 | 160293 | 160982 | 171341 | 0.92 / 0.86 / 0.75 / 0.34 | 642 / 645 / 643 / 254 | 64 / 74 / 81 / 99
153697 | 143202 | 142309 | 143908 | 150012 | 0.92 / 0.85 / 0.76 / 0.37 | 648 / 649 / 645 / 257 | 67 / 70 / 85 / 97
177447 | 160293 | 169802 | 168212 | 171214 | 0.92 / 0.88 / 0.78 / 0.35 | 642 / 642 / 642 / 258 | 64 / 76 / 81 / 99
106786 | 93102 | 98318 | 91273 | 101241 | 0.92 / 0.89 / 0.76 / 0.34 | 647 / 649 / 647 / 256 | 68 / 70 / 86 / 98
64316 | 53271 | 56671 | 52318 | 61209 | 0.92 / 0.86 / 0.73 / 0.36 | 642 / 643 / 648 / 254 | 64 / 72 / 81 / 96
47744 | 31677 | 35567 | 35587 | 40341 | 0.92 / 0.85 / 0.76 / 0.34 | 646 / 649 / 642 / 256 | 69 / 70 / 87 / 97
28153 | 12456 | 16742 | 16457 | 24642 | 0.92 / 0.85 / 0.74 / 0.31 | 642 / 648 / 647 / 250 | 64 / 75 / 81 / 98
16483 | 9856 | 9452 | 9231 | 13191 | 0.92 / 0.87 / 0.76 / 0.30 | 640 / 649 / 640 / 252 | 63 / 70 / 88 / 99
(S / C / L / Comb = SARIMAX / CATBOOST / LSTM / combined model; the four values in the RMSE, MA and ACC(%) columns are listed in that order.)
The prediction graphs and prediction comparison graphs of the models are shown in FIGS. 1-10. The embodiment is further verified as follows:
Data selection: according to the sales forecasting project, the monthly refrigerator shipment data for 2019 and 2020 are selected (unit: units).
The raw data are as follows (the data have been processed for data security):
date | 2019/11/1 | 2019/12/1 | 2020/1/1 | 2020/2/1 | 2020/3/1 | 2020/4/1 | 2020/5/1 | 2020/6/1 | 2020/7/1 | 2020/8/1
data | 94081 | 51954 | 176394 | 153697 | 177447 | 106786 | 64316 | 47744 | 28153 | 16483
SARIMAX model:
the prediction accuracy of the SARIMAX model is 65%, and under the comprehensive conditions of fully considering seasonal factors, namely, cycle factors, regional air temperatures and the like, the prediction accuracy is low, mainly because prediction can be performed only according to time dimensions, or only the closest value in a period range can be predicted according to the principle of proximity. The SARIMAX model prediction data table is as follows:
CATBOOST model:
the prediction accuracy of the CATBOOST model is 72%, when dimensions such as competitive product data, sales task data and air temperature data are fused, the dimensions of the data are obtained by a crawler, and the data are provided by a service part, the prediction accuracy is slightly low, mainly because the characteristic dimensions are too much, an overfitting phenomenon is generated, and the characteristic dimension data cannot be well utilized. The prediction data table of the CATBOOST model is as follows:
LSTM model:
the prediction accuracy of the LSTM optimization model is 82%, dimensions such as competitive product data, sales task data and air temperature data are fused, the dimensions of the data are obtained by crawlers, and the data are provided by a business part, so that the prediction accuracy is high, mainly because of a strong long-term and short-term memory method of the LSTM optimization model, factors such as time sequence short-term memory and mechanical period long-term sequence characteristics are fused. The LSTM model prediction data table is as follows:
Combined model:
the prediction accuracy of the combined model is 97%, the prediction effect of the combined model is almost the same as that of actual data, mainly because the short-term memory and the time sequence of the time sequence are integrated into machine learning and deep learning, the advantages of each model are utilized, the long-term memory and the short-term memory are perfectly matched, the hidden data rule is mined out, and prediction can be well carried out. The combined model prediction data table is as follows:
the above-mentioned embodiments only express the specific embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention.

Claims (4)

1. A combined model sales prediction method based on a sales prediction system is characterized by comprising the following steps:
step S10, building a combined model based on a SARIMAX model, a CATBOOST model and an LSTM model;
s20, selecting sales data to carry out SARIMAX model prediction to obtain a first prediction sequence T of a sample of a sales value sequence;
step S30, obtaining air temperature, competitive product data, task amount and macroscopic economic data as a characteristic sequence by using a crawler, merging the first prediction sequence T into the characteristic sequence, and obtaining a second prediction sequence T1 through a CATBOOST model;
s40, blending the prediction results of the first prediction sequence T and the second prediction sequence T1 into a characteristic sequence, inputting the characteristic sequence into an LSTM model, and training to obtain a third prediction sequence T2;
step S50, weighted combination: and recursion layer by layer, continuously recombining and training the prediction result, the characteristic sequence and the original data of each model, adding a verification data set to determine the weight of each model in the combined model, and finally dynamically adjusting the weight according to the prediction result of each time.
2. The combined model sales prediction method based on sales prediction system of claim 1, wherein in step S20, implementing the SARIMAX model specifically includes the following steps:
1) loading data;
2) preprocessing: defining preprocessing steps according to the data set, including creating a timestamp, converting the dtype of the date/time column and making the series univariate;
3) making the series stationary: including checking the stationarity of the series and performing the required transformations;
4) determining the trend difference order d value: the number of differencing operations required to make the series stationary is taken as the value of d;
5) creating ACF and PACF graphs: determining input parameters of the SARIMAX model by using the ACF and the PACF graph;
6) determining a trend autoregressive order p value and a trend moving average order q value: reading the values of p and q from the ACF and PACF graphs of the previous step;
7) fitting the SARIMAX model: fitting the SARIMAX model with the processed data and the calculated parameter values;
8) prediction is performed on the validation set: predicting a future value;
9) computing MSE or RMSE: the MSE or RMSE values are checked using the predicted and actual values on the validation set.
3. The combined model sales prediction method based on sales prediction system according to claim 2, wherein in step S20, the SARIMAX model prediction uses rolling prediction, and after predicting sales data for one month, actual sales data for the month are added to predict sales for the next month, and finally a predicted value sequence is output, and a first predicted sequence T of samples of the sales value sequence is obtained.
4. The combined model sales prediction method based on sales prediction system of claim 1, wherein in step S40, the LSTM model includes 2 hidden layers, the first hidden layer having 128 neurons and the second hidden layer having 256 neurons; the output layer is a 1-dimensional column vector, i.e. the sales prediction value; the input variables are the features at time step (t-1); the loss function is the mean absolute error (MAE), the optimization algorithm is Adam, the activation function is Sigmoid, the model is trained for 500 epochs, and the batch size is 15.
CN202011131501.5A 2020-10-21 2020-10-21 Combined model sales prediction method based on sales prediction system Pending CN112348571A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011131501.5A CN112348571A (en) 2020-10-21 2020-10-21 Combined model sales prediction method based on sales prediction system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011131501.5A CN112348571A (en) 2020-10-21 2020-10-21 Combined model sales prediction method based on sales prediction system

Publications (1)

Publication Number Publication Date
CN112348571A true CN112348571A (en) 2021-02-09

Family

ID=74359455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011131501.5A Pending CN112348571A (en) 2020-10-21 2020-10-21 Combined model sales prediction method based on sales prediction system

Country Status (1)

Country Link
CN (1) CN112348571A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537576A (en) * 2021-06-25 2021-10-22 合肥工业大学 Method and system for predicting financial predicament of listed enterprises
CN113706214A (en) * 2021-09-02 2021-11-26 武汉卓尔数字传媒科技有限公司 Data processing method and device and electronic equipment
CN117710008A (en) * 2024-02-06 2024-03-15 贵州师范大学 Ecological product sales information management system suitable for karst region
CN117710008B (en) * 2024-02-06 2024-04-30 贵州师范大学 Ecological product sales information management system suitable for karst region

Similar Documents

Publication Publication Date Title
WO2021007812A1 (en) Deep neural network hyperparameter optimization method, electronic device and storage medium
CN110809772B (en) System and method for improving optimization of machine learning models
CN112348571A (en) Combined model sales prediction method based on sales prediction system
Angelov et al. A new type of simplified fuzzy rule-based system
Boyen et al. Discovering the hidden structure of complex dynamic systems
Hassan et al. A hybrid of multiobjective Evolutionary Algorithm and HMM-Fuzzy model for time series prediction
CN112765477B (en) Information processing method and device, information recommendation method and device, electronic equipment and storage medium
CN111814897A (en) Time series data classification method based on multi-level shape
US11366806B2 (en) Automated feature generation for machine learning application
CN112287166B (en) Movie recommendation method and system based on improved deep belief network
US20210209514A1 (en) Machine learning method for incremental learning and computing device for performing the machine learning method
CN111782797A (en) Automatic matching method for scientific and technological project review experts and storage medium
Ghanbari et al. Reconstruction of gene networks using prior knowledge
CN112256739A (en) Method for screening data items in dynamic flow big data based on multi-arm gambling machine
Fafalios et al. Gradient boosting trees
CN112381591A (en) Sales prediction optimization method based on LSTM deep learning model
Eban et al. Learning the experts for online sequence prediction
Ma et al. Long-term credit assignment via model-based temporal shortcuts
CN115204967A (en) Recommendation method integrating implicit feedback of long-term and short-term interest representation of user
KR20160001375A (en) Apparatus and method for learning and classification of decision tree
CN110717103B (en) Improved collaborative filtering method based on stack noise reduction encoder
CN114139677A (en) Unequal interval time sequence data prediction method based on improved GRU neural network
CN112667394A (en) Computer resource utilization rate optimization method
Tyagi et al. Speeding up rare event simulations using Kriging models
Jabbari et al. Obtaining accurate probabilistic causal inference by post-processing calibration

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210209