CN113379125A - Logistics storage sales prediction method based on TCN and LightGBM combined model - Google Patents

Logistics storage sales prediction method based on TCN and LightGBM combined model

Info

Publication number
CN113379125A
CN113379125A
Authority
CN
China
Prior art keywords
data
model
sales
tcn
lightgbm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110653522.1A
Other languages
Chinese (zh)
Other versions
CN113379125B (en)
Inventor
李石君
陶雯雯
余伟
余放
杨济海
杨俊成
李宇轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110653522.1A priority Critical patent/CN113379125B/en
Publication of CN113379125A publication Critical patent/CN113379125A/en
Application granted granted Critical
Publication of CN113379125B publication Critical patent/CN113379125B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067: Enterprise or organisation modelling
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • G06Q30/0202: Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Finance (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Tourism & Hospitality (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Educational Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a logistics storage sales prediction method based on a TCN and LightGBM combined model. It belongs to the research field of time-series analysis and classification/regression and relates to technologies such as TCN and LightGBM. For historical sales and distribution record information, TCN and LightGBM models are constructed separately, and a weighted combination is searched to find the optimal combination as the final prediction model; the trained model is then used for the prediction task. The advantages of the invention are: model training can be carried out automatically on the historical sales and distribution data of the past twelve months together with external factors that influence sales volume, the store sales volume for the next three days is predicted, and the utilization of the various resources in warehouse logistics is improved. Combining the predictions of the two models also improves the prediction accuracy.

Description

Logistics storage sales prediction method based on TCN and LightGBM combined model
Technical Field
The invention relates to the technical field of time sequence analysis and classification regression, in particular to a logistics storage sales prediction method based on a TCN and LightGBM combined model.
Background
Logistics storage management system: warehousing plays a crucial role in an enterprise's supply chain. If correct replenishment, inventory control, and delivery cannot be guaranteed, management costs rise, service quality is hard to guarantee, and the enterprise's competitiveness suffers. Traditional, static warehousing management cannot guarantee the efficient use of enterprise resources. Accurate sales prediction for a specific store can greatly improve warehousing efficiency and save time, labor, and storage space.
Historical sales and distribution records: the logistics storage management system records the distribution history of the warehouses corresponding to each store over a past period. These records contain various kinds of structured data, such as commodity names, selling prices, and commodity quantities, which form the basis for big data analysis.
Store sales prediction accuracy: store sales may be influenced by external factors such as weather conditions, holidays, the types of goods on promotion, and store scale. If the accuracy of a store's sales prediction cannot be guaranteed, its sales are strongly affected and large amounts of resources may be wasted. It is therefore necessary to predict store sales accurately and prepare accordingly.
Most existing store sales prediction methods analyze future sales from historical sales data with a single time-series neural network model. Such methods do not consider the influence of external factors on sales volume, and a single trained model is limited in prediction accuracy.
Disclosure of Invention
The invention provides a logistics storage sales prediction method based on a TCN and LightGBM combined model, to solve, or at least partially solve, the technical problem of low sales prediction accuracy for logistics storage in the prior art.
To solve the above technical problem, the invention provides a logistics storage sales prediction method based on a TCN and LightGBM combined model, comprising:
S1: acquiring historical sales and distribution data, extracting the records of the past twelve months with the time to be predicted as the cut-off point, and preprocessing the extracted historical sales and distribution record data;
S2: constructing a TCN model that integrates a one-dimensional fully convolutional network, causal convolution, and dilated convolution, and training the TCN model with the preprocessed historical sales and distribution record data;
S3: constructing a LightGBM model, splicing external influence factors onto the preprocessed historical sales and distribution record data, and then training the LightGBM model;
S4: according to the prediction results of the trained TCN model and the trained LightGBM model on logistics storage sales, finding the optimal weighted combination as the final prediction model, and predicting logistics storage sales with the final prediction model.
In one embodiment, the step S1 of preprocessing the extracted historical sales distribution record data includes:
processing the sales time and sales volume of the extracted data into time-series form;
replacing missing values, and sales surges caused by promotions, with the mean sales of the corresponding month;
converting categorical label features into 0/1 features with one-hot encoding, and finally aggregating the statistics into the time-series form of the historical sales and distribution record data.
In one embodiment, step S2 includes:
S2.1: constructing an input layer and an output layer, wherein the number of nodes of the input layer is the same as the number of nodes of the output layer, both being determined by the feature number of the data obtained by the data preprocessing of step S1.
The input data vector D_i is:
D_i = (x_1, x_2, ..., x_M)^T, i ∈ 1, 2, 3 ... N
where M is the feature number of the data, x_1 is the first feature of the input data, and so on up to x_M, the M-th feature of the input data; D_i is the data of the i-th record, and N is the number of data records.
The output data vector R_i is:
R_i = (r_1, r_2, ..., r_M)^T, i ∈ 1, 2, 3 ... N
where M is the feature number of the data, r_1 is the first feature of the output data, and so on up to r_M, the M-th feature of the output data; N is the number of data records.
S2.2: constructing a dilated convolution kernel, wherein the output value of the s-th neuron after the dilated convolution is calculated as:
F(s) = (X ∗_d f)(s) = Σ_{i=0}^{k-1} f(i) · x_{s−d·i}
where d is the dilation coefficient, k is the convolution kernel size, and "∗_d" denotes the dilated convolution operation; f(i) is the i-th element of the convolution kernel; x_{s−d·i} is the sequence element multiplied by the corresponding kernel element, which in warehouse sales volume prediction is the sales data of the past twelve months, i.e., the input data.
S2.3: constructing residual modules, wherein one residual module comprises two parts: the input, and the output after a series of operations. The output expression of the residual module is:
o = Activation(x + F(x))
In the above formula, the residual module contains a branch that applies a series of transformations F(x); its output is added to the input x of the residual module, and an activation function finally produces the output o of the residual module.
S2.4: training the TCN model, wherein the loss function in the training process is the mean squared error:
Loss = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
where n is the training data length, y_i is the value returned by the network model, and ŷ_i is the true value of the sample. Minimizing the loss function is set as the optimization target; given a network initialization seed, a learning rate μ, and a training step count steps, a gradient descent optimization algorithm continuously updates the network weights, finally yielding the trained TCN model.
In one embodiment, step S3 includes:
S3.1: splicing external influence factors onto the time-series data obtained by the preprocessing of step S1 as feature attributes of the samples, the external influence factors including whether the date falls in a holiday or festival period, the temperature, and the unit price;
S3.2: screening the feature attributes by calculating the correlation between different features;
S3.3: balancing accuracy and efficiency with the two methods of gradient-based one-side sampling and exclusive feature bundling, thereby optimizing the LightGBM model;
s3.4: training the LightGBM model, wherein a loss function in the model training process adopts a loss function of a decision tree and is defined as:
C_α(T) = Σ_{t=1}^{|T|} N_t · H_t(T) + α|T|
where |T| is the number of leaf nodes, N_t is the number of samples at a particular leaf node, and H_t(T) is the empirical entropy of that leaf node; the first term represents the prediction error of the model on the training data, i.e., the degree of fit, the second term represents the complexity of the model, and the influence of the two is controlled by the parameter α.
In one embodiment, in step S4, a weighted combination method is adopted to find an optimal combination method as a final prediction model, specifically:
TCN+LightGBM = a * TCN.result + b * LightGBM.result
where TCN.result represents the prediction probability of the trained TCN model, LightGBM.result represents the prediction probability of the trained LightGBM model, TCN+LightGBM represents the final prediction result, and a + b = 1.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
according to the method, a TCN model and a LightGBM model are respectively constructed according to historical sales distribution record data, an optimal combination mode is found out to serve as a final prediction model in a weighted combination mode, and a classification task is carried out by using a trained model. The method can automatically perform model training on historical sales distribution data of past twelve months and other factors influencing sales volume externally, and predict the sales volume of stores in the next three days, thereby improving the utilization rate of various resources in warehouse logistics. And meanwhile, the combined prediction of the two models is adopted, so that the prediction accuracy is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart illustrating an implementation of a prediction method according to an embodiment of the present invention.
Detailed Description
In order to improve the accuracy of sales prediction and reduce the waste of commodity resources and storage space, the invention trains on the time-series information of historical sales records with a TCN (temporal convolutional network), adds other external factor information that influences sales, performs a second prediction with a LightGBM model, and finally combines the model results linearly to complete the sales evaluation and prediction.
For the existing historical sales and distribution data and the external factor data acquired from the sales records, the invention provides a combined model based on TCN and LightGBM to predict logistics storage sales volume. Unlike RNN architectures, a TCN can be processed massively in parallel, so the network is faster in both training and validation. A TCN can change its receptive field by adding layers or changing the dilation coefficients and filter sizes, handles histories of flexible length, and avoids the gradient vanishing and gradient explosion problems of RNNs; the TCN model is therefore trained on the historical sales and distribution data of a past period. Because the LightGBM differs from the TCN in its sensitivity to time order, it supplements the features extracted from the historical sales and distribution data with other statistical features, and the LightGBM model is trained on those. Finally, the two models are combined by weighting.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, an embodiment of the present invention provides a method for predicting logistics warehouse sales based on a TCN and LightGBM combined model, including:
s1: acquiring historical sales distribution data, extracting records of past twelve months with time to be predicted as a node, and preprocessing the extracted historical sales distribution record data;
s2: constructing a TCN model, wherein the TCN model is fused with a one-dimensional full convolution network, a causal convolution and an expansion convolution, and the TCN model is trained by utilizing preprocessed historical sales distribution record data;
s3: constructing a LightGBM model, splicing the preprocessed historical sales distribution record data with external influence factors, and then training the LightGBM model;
s4: and finding out an optimal combination mode as a final prediction model by adopting a weighted combination mode according to the prediction result of the trained TCN model on the logistics storage sales and the prediction result of the trained LightGBM model on the logistics storage sales, and predicting the logistics storage sales by utilizing the final prediction model.
Specifically, in S1, the historical sales and distribution data in the database of the logistics storage sales system may be read with Python. The preprocessing includes arranging the data in time-series form, normalization, and the like.
In S2, TCN is the abbreviation of Temporal Convolutional Network, which consists of dilated causal 1D convolutional layers with equal input and output lengths. The temporal convolutional network is a sequence modeling architecture that integrates a one-dimensional fully convolutional network, causal convolution, and dilated convolution; it effectively avoids the gradient vanishing and gradient explosion problems faced by recurrent neural networks, and offers parallel computation, low memory consumption, and control of the sequence memory length by changing the receptive field.
A TCN is based on two principles: (1) the input and output lengths of the network are the same, and (2) no information leaks from the future into the past. For the first point, the TCN uses a 1D fully convolutional network (FCN): each hidden layer has the same length as the input layer, and zero padding of length (kernel_size − 1) keeps each subsequent layer the same length as the previous one. For the second point, the TCN uses causal convolution: the output at time t is convolved only with elements at time t or earlier in the previous layer (i.e., the output at time step t of each layer is computed only from the region no later than time step t of the previous layer). So TCN = 1D FCN + causal convolution.
In S3, LightGBM is a newer boosting framework from Microsoft. Like XGBoost, it uses decision trees as base learners, but it optimizes the framework with an emphasis on the training speed of the model. Most importantly, LightGBM uses a histogram-based decision tree algorithm: continuous floating-point feature values are first discretized into k integers while a histogram of width k is constructed. When the data are traversed, statistics are accumulated in the histogram using the discretized values as indices; after one traversal, the histogram holds the needed statistics, and the optimal split point is then found by traversing the discretized values of the histogram.
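A minimal NumPy sketch of this histogram idea (the function and variable names, and the bin count k, are illustrative, not from the patent or the LightGBM source):

```python
import numpy as np

def histogram_split_stats(feature, gradients, k=255):
    """Discretize a continuous feature into k bins and accumulate
    per-bin gradient statistics, as histogram-based GBDT does."""
    # Bin edges over the observed value range; each value maps to an integer bin.
    edges = np.linspace(feature.min(), feature.max(), k + 1)
    bins = np.clip(np.digitize(feature, edges) - 1, 0, k - 1)

    grad_hist = np.zeros(k)   # sum of gradients per bin
    count_hist = np.zeros(k)  # sample count per bin
    np.add.at(grad_hist, bins, gradients)
    np.add.at(count_hist, bins, 1)

    # A split point is then chosen by scanning the k bins once,
    # instead of sorting and scanning every raw feature value.
    return grad_hist, count_hist

g, c = histogram_split_stats(np.random.rand(1000), np.random.randn(1000), k=16)
```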
Step S4 weights and combines the models. Steps S2 and S3 construct two prediction models based on TCN and LightGBM respectively. When merging the prediction results, after comprehensive consideration and verification of the different strengths of the two models in processing the data, a linear model is selected for the combination. Through repeated training and prediction, the parameters that best combine the results of the TCN model and the LightGBM model are selected as the final parameters.
In one embodiment, the step S1 of preprocessing the extracted historical sales distribution record data includes:
processing the sales time and sales volume of the extracted data into time-series form;
replacing missing values, and sales surges caused by promotions, with the mean sales of the corresponding month;
converting categorical label features into 0/1 features with one-hot encoding, and finally aggregating the statistics into the time-series form of the historical sales and distribution record data.
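As an illustration of this preprocessing, a sketch in pandas; the column names (sale_time, sales, promo_flag, category) are assumptions for the example, not from the patent:

```python
import pandas as pd

def preprocess(records: pd.DataFrame) -> pd.DataFrame:
    """Arrange records as a time series, impute missing values and
    promotion surges with the monthly mean, and one-hot encode labels."""
    df = records.copy()
    df["sale_time"] = pd.to_datetime(df["sale_time"])
    df = df.set_index("sale_time").sort_index()

    # Mean sales of the corresponding month, used as the replacement value.
    monthly_mean = df["sales"].groupby(df.index.to_period("M")).transform("mean")
    df["sales"] = df["sales"].fillna(monthly_mean)   # missing values
    if "promo_flag" in df.columns:                   # promotion-driven surges
        df.loc[df["promo_flag"] == 1, "sales"] = monthly_mean

    # One-hot encode categorical label features into 0/1 columns.
    label_cols = [c for c in ("category",) if c in df.columns]
    return pd.get_dummies(df, columns=label_cols, dtype=int)

records = pd.DataFrame({
    "sale_time": ["2020-01-01", "2020-01-02", "2020-01-15"],
    "sales": [10.0, None, 50.0],
    "promo_flag": [0, 0, 1],
    "category": ["A", "B", "A"],
})
print(preprocess(records))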
In one embodiment, step S2 includes:
S2.1: constructing an input layer and an output layer, wherein the number of nodes of the input layer is the same as the number of nodes of the output layer, both being determined by the feature number of the data obtained by the data preprocessing of step S1.
The input data vector D_i is:
D_i = (x_1, x_2, ..., x_M)^T, i ∈ 1, 2, 3 ... N
where M is the feature number of the data, x_1 is the first feature of the input data, and so on up to x_M, the M-th feature of the input data; D_i is the data of the i-th record, and N is the number of data records.
The output data vector R_i is:
R_i = (r_1, r_2, ..., r_M)^T, i ∈ 1, 2, 3 ... N
where M is the feature number of the data, r_1 is the first feature of the output data, and so on up to r_M, the M-th feature of the output data; N is the number of data records.
S2.2: constructing a dilated convolution kernel, wherein the output value of the s-th neuron after the dilated convolution is calculated as:
F(s) = (X ∗_d f)(s) = Σ_{i=0}^{k-1} f(i) · x_{s−d·i}
where d is the dilation coefficient, k is the convolution kernel size, and "∗_d" denotes the dilated convolution operation; f(i) is the i-th element of the convolution kernel; x_{s−d·i} is the sequence element multiplied by the corresponding kernel element, which in warehouse sales volume prediction is the sales data of the past twelve months, i.e., the input data.
S2.3: constructing residual modules, wherein one residual module comprises two parts: the input, and the output after a series of operations. The output expression of the residual module is:
o = Activation(x + F(x))
In the above formula, the residual module contains a branch that applies a series of transformations F(x); its output is added to the input x of the residual module, and an activation function finally produces the output o of the residual module.
S2.4: training the TCN model, wherein the loss function in the training process is the mean squared error:
Loss = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
where n is the training data length, y_i is the value returned by the network model, and ŷ_i is the true value of the sample. Minimizing the loss function is set as the optimization target; given a network initialization seed, a learning rate μ, and a training step count steps, a gradient descent optimization algorithm continuously updates the network weights, finally yielding the trained TCN model.
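A minimal PyTorch sketch of the S2.2 to S2.4 building blocks under stated assumptions: the channel count, kernel size 3, dilations (1, 2, 4), and SGD learning rate are illustrative, and ReLU stands in for the unspecified Activation:

```python
import torch
import torch.nn as nn

class CausalDilatedConv(nn.Module):
    """1D dilated convolution kept causal by trimming the trailing padding."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=self.pad, dilation=dilation)

    def forward(self, x):                  # x: (batch, channels, time)
        out = self.conv(x)
        return out[:, :, :-self.pad] if self.pad else out

class ResidualBlock(nn.Module):
    """The residual module of S2.3: o = Activation(x + F(x))."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.f = CausalDilatedConv(channels, kernel_size, dilation)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.f(x))

# Stacked blocks with growing dilation (1, 2, 4) widen the receptive field.
tcn = nn.Sequential(*[ResidualBlock(1, kernel_size=3, dilation=d) for d in (1, 2, 4)])

# S2.4 training step: MSE loss minimized by gradient descent.
x = torch.randn(8, 1, 12)                  # 8 samples, 12 months of sales
y = torch.randn(8, 1, 12)
opt = torch.optim.SGD(tcn.parameters(), lr=0.01)
loss = nn.MSELoss()(tcn(x), y)
opt.zero_grad(); loss.backward(); opt.step()
```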
During implementation of the application, it was found that simple causal convolution needs a very large number of layers or a very large convolution kernel to widen the receptive field, and a large receptive field is necessary for building long-term memory. Constructing dilated convolution kernels therefore avoids the gradient vanishing, complex training, and poor fitting that come with enlarging the receptive field by depth alone.
Since the receptive field of a TCN depends on the network depth, the filter size, and the dilation coefficient d, the stability of deeper and larger TCN networks is important. In the design of the TCN model in this application, residual modules therefore replace plain convolutional layers.
Because abnormal data in the training set have already been handled in detail, between MSE and MAE the MSE is more helpful to the training of this application's model: MSE squares the error (e = true value − predicted value), so if e > 1, MSE amplifies the error further. The loss function above is therefore adopted.
In one embodiment, step S3 includes:
S3.1: splicing external influence factors onto the time-series data obtained by the preprocessing of step S1 as feature attributes of the samples, the external influence factors including whether the date falls in a holiday or festival period, the temperature, and the unit price;
S3.2: screening the feature attributes by calculating the correlation between different features;
S3.3: balancing accuracy and efficiency with the two methods of gradient-based one-side sampling and exclusive feature bundling, thereby optimizing the LightGBM model;
s3.4: training the LightGBM model, wherein a loss function in the model training process adopts a loss function of a decision tree and is defined as:
C_α(T) = Σ_{t=1}^{|T|} N_t · H_t(T) + α|T|
where |T| is the number of leaf nodes, N_t is the number of samples at a particular leaf node, and H_t(T) is the empirical entropy of that leaf node; the first term represents the prediction error of the model on the training data, i.e., the degree of fit, the second term represents the complexity of the model, and the influence of the two is controlled by the parameter α.
Specifically, S3.1 performs data analysis and preprocessing. The commodity market is full of uncertain factors: commodity sales are influenced not only by price and by whether the date falls in a holiday or festival period, but also by factors such as unit price, highest temperature, lowest temperature, weather condition, wind direction, wind power, air pollution index, air pollution grade, and past sales volume, all of which correlate with sales in ways that are hard to measure in advance. The feature extraction of these external factors therefore has a great influence on prediction accuracy. External influence factors, such as whether the date is a holiday period and the temperature, are spliced onto the time-series data obtained by the preprocessing of step S1 to form the feature attributes of the samples.
Step S3.2 is feature engineering. Features determine the upper bound of model performance, so feature engineering is critical. The overall idea is to construct sales-domain features and polynomial features respectively. In practice, feature engineering is an iterative process: only by repeatedly screening, building, selecting, and validating features can more accurate outputs be predicted. Step S3.1 selected a large number of external influence factors beyond the historical sales records as features; to confirm that the extracted feature information is meaningful and to reduce computation, features that are less meaningful or highly correlated must be removed. The correlation between different features is calculated; if the correlation coefficient between two features is greater than or equal to 0.8, the two features are considered highly correlated, and only the one that is more reasonably explained in the sales process is kept.
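A sketch of this screening step in pandas, dropping one feature from each highly correlated pair; the 0.8 threshold follows the text, while which member of a pair to keep is treated here as "keep the first", standing in for the domain judgment the patent describes:

```python
import pandas as pd

def drop_correlated(features: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Drop one feature from every pair whose |Pearson correlation| >= threshold."""
    corr = features.corr().abs()
    cols = corr.columns
    to_drop = set()
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] >= threshold and cols[j] not in to_drop:
                to_drop.add(cols[j])   # keep the first feature, drop the second
    return features.drop(columns=sorted(to_drop))
```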
Step S3.3 performs model optimization. LightGBM addresses this with two methods, Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), to balance accuracy and efficiency: they accelerate the splitting behavior and effectively reduce the number of features while preserving split-point accuracy, speeding up traditional GBDT training by more than 20 times and improving real-time performance. The GOSS algorithm traverses every split point of every feature, finds and computes the maximum information gain, and divides the data into left and right nodes by the feature's split point. EFB turns several mutually exclusive features into a low-dimensional dense feature, avoids computation on unnecessary zero values, and can reduce the time complexity of the algorithm from O(#data) to O(#non_zero_data).
Step S3.4 performs model training. LightGBM trains with the Histogram and leaf-wise decision tree optimization algorithms. Compared with the traditional pre-sorting idea, the histogram only stores the discretized feature values, which markedly reduces memory consumption and speeds up training. The leaf-wise decision tree adds a maximum-depth limit on top of the leaf-splitting strategy, avoiding overfitting as far as possible. The leaf nodes cover all features, such as holiday period, unit price, highest temperature, lowest temperature, weather condition, wind direction, wind power, air pollution index, and air pollution grade.
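A hedged sketch of such a training setup with the lightgbm package; the histogram width (max_bin), leaf count, depth cap, and learning rate are illustrative values, not parameters given in the patent:

```python
import lightgbm as lgb

model = lgb.LGBMRegressor(
    boosting_type="gbdt",
    max_bin=255,        # histogram width k: floats discretized into 255 bins
    num_leaves=31,      # leaf-wise growth with a bounded number of leaves
    max_depth=6,        # maximum-depth cap to curb overfitting
    learning_rate=0.05,
    n_estimators=500,
)
# X_train would hold the spliced features (holiday flag, unit price,
# temperature, ...) and y_train the sales volumes; both are assumed to exist:
# model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])
```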
In a specific implementation process, the data preprocessing of step S3.1 includes the following: splice features such as whether the current time is a holiday period, unit price, highest temperature, lowest temperature, weather condition, wind direction, wind power, air pollution index, and air pollution grade onto the time-series data obtained by the preprocessing of step S1. For missing values and sales surges caused by promotions, the mean sales of the corresponding month is used instead. Finally, the input data are normalized:
X' = (X_i − X_min) / (X_max − X_min)
where X' is the normalized value, X_i the data point to be processed, X_min the minimum of the input data, and X_max the maximum of the input data.
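The same min-max normalization in NumPy, on illustrative values:

```python
import numpy as np

X = np.array([3.0, 7.0, 10.0, 5.0])
X_norm = (X - X.min()) / (X.max() - X.min())   # maps the data into [0, 1]
```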
Feature engineering in S3.2: the result of feature engineering plays an important role in improving the model effect. Feature engineering is often used to reduce redundant features and to improve the speed and accuracy of model training. Common feature selection methods include the Pearson correlation coefficient, mutual information and the maximal information coefficient, Pearson's chi-square test, and the distance correlation coefficient.
The Pearson correlation coefficient, a relatively simple feature selection method, is chosen in this embodiment. It reflects the linear correlation between a feature and the predicted value; the result lies in [−1, 1], where −1 means the feature and the predicted value are completely negatively correlated, 0 means no linear correlation exists, and 1 means they are completely positively correlated. The calculation formula is:
ρ(X, Y) = cov(X, Y) / (σ_X · σ_Y)
where cov(X, Y) is the covariance of the two variables and σ_X, σ_Y are their standard deviations, computed as:
cov(X, Y) = (1/n) Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ)
σ_X = sqrt( (1/n) Σ_{i=1}^{n} (X_i − X̄)² )
σ_Y = sqrt( (1/n) Σ_{i=1}^{n} (Y_i − Ȳ)² )
where X_i and Y_i are the i-th elements of the two input variables, X̄ is the mean of X, and Ȳ is the mean of Y.
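The formulas above translate directly into NumPy; this sketch cross-checks the result against np.corrcoef:

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation from the covariance / standard-deviation formulas."""
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())   # np.std defaults to the population (1/n) form

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
assert np.isclose(pearson(x, y), np.corrcoef(x, y)[0, 1])
```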
The optimization of the model in step S3.3 specifically includes the following steps:
S3.3.1: gradient-based one-side sampling (GOSS). The algorithm traverses every split point of every feature, finds and computes the maximum information gain, and divides the data into left and right nodes by the feature's split point. The overall GOSS procedure is: (1) with N training samples, select the top a% by gradient magnitude as the large-gradient training samples; (2) from the remaining (1 − a%) of samples with smaller gradients, randomly select b% as the small-gradient training samples; (3) when computing the information gain, scale the contribution of the small-gradient samples (b% · N of them) by (1 − a)/b.
In total, a%·N + b%·N samples are used as training samples. This keeps the sample as consistent as possible with the overall data distribution while ensuring that the small-gradient samples are still trained on.
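A simplified sketch of the GOSS sampling step (not LightGBM's internal implementation); a and b are expressed as fractions rather than percentages:

```python
import numpy as np

def goss_sample(gradients: np.ndarray, a: float = 0.2, b: float = 0.1):
    """Keep the top-a fraction by |gradient|, sample a b fraction of the rest,
    and up-weight the sampled small-gradient rows by (1 - a) / b."""
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))    # descending |gradient|
    top_k = int(a * n)
    large = order[:top_k]                     # large-gradient samples, kept as-is
    small = np.random.choice(order[top_k:], size=int(b * n), replace=False)

    idx = np.concatenate([large, small])
    weights = np.ones(len(idx))
    weights[top_k:] = (1.0 - a) / b           # compensate the information gain
    return idx, weights

idx, w = goss_sample(np.random.randn(1000))
```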
S3.3.2: exclusive feature bundling (EFB). The algorithm flow is: (1) build a graph in which every vertex is a feature and every edge carries a weight related to the overall conflict between the two features; (2) sort the features in descending order of their degree in the graph; (3) traverse each feature and try to merge it into an existing bundle so as to minimize the conflict ratio.
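A simplified greedy sketch of that flow, under stated assumptions: conflict is reduced to the count of samples where two features are both nonzero, and the value-offsetting that LightGBM applies inside a bundle is omitted:

```python
import numpy as np

def greedy_bundles(X: np.ndarray, max_conflict: int = 0):
    """Greedily bundle features whose nonzero rows overlap in at most
    max_conflict samples, following the sort-by-degree heuristic."""
    d = X.shape[1]
    nonzero = [set(np.flatnonzero(X[:, j])) for j in range(d)]
    # Degree of a feature = number of other features it conflicts with.
    degree = [sum(len(nonzero[j] & nonzero[k]) > max_conflict
                  for k in range(d) if k != j) for j in range(d)]
    bundles = []
    for j in sorted(range(d), key=lambda j: -degree[j]):
        for bundle in bundles:                 # try existing bundles first
            if all(len(nonzero[j] & nonzero[k]) <= max_conflict for k in bundle):
                bundle.append(j)
                break
        else:                                  # no compatible bundle found
            bundles.append([j])
    return bundles

print(greedy_bundles(np.eye(4)))   # mutually exclusive features -> one bundle
```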
In S3.4, the leaf node empirical entropy is calculated as follows:
H_t(T) = − Σ_k (N_tk / N_t) · log(N_tk / N_t)
where N_t is the number of samples at a particular leaf node and N_tk is the number of samples of class k at leaf node t.
The first term in the loss function represents the prediction error of the model on the training data, i.e., the degree of fit; the second term represents the complexity of the model; the influence of the two is controlled by the parameter α. Once α is fixed, the model with the smallest loss function is selected.
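The two formulas combine into a short function; the input layout, a list with one array of class counts N_tk per leaf, is an assumption for the example:

```python
import numpy as np

def tree_loss(leaf_counts: list, alpha: float) -> float:
    """C_alpha(T) = sum_t N_t * H_t(T) + alpha * |T|, with empirical entropy
    H_t(T) = -sum_k (N_tk / N_t) * log(N_tk / N_t)."""
    loss = 0.0
    for counts in leaf_counts:            # one array of class counts per leaf
        n_t = counts.sum()
        p = counts[counts > 0] / n_t      # skip empty classes (0 * log 0 = 0)
        loss += n_t * -(p * np.log(p)).sum()
    return loss + alpha * len(leaf_counts)

print(tree_loss([np.array([8, 2]), np.array([5, 5])], alpha=1.0))
```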
In one embodiment, in step S4, a weighted combination method is adopted to find an optimal combination method as a final prediction model, specifically:
TCN+LightGBM = a * TCN.result + b * LightGBM.result
where TCN.result represents the prediction probability of the trained TCN model, LightGBM.result represents the prediction probability of the trained LightGBM model, TCN+LightGBM represents the final prediction result, and a + b = 1.
Specifically, the values of a and b are determined by the final evaluation index, i.e., the a and b that perform best on the verification set are selected.
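A sketch of that selection, scanning a over a grid with b = 1 − a and scoring the blend by RMSE on the verification set (the array names and the RMSE metric are assumptions, since the patent does not name the evaluation index):

```python
import numpy as np

def best_weights(tcn_pred, lgbm_pred, y_valid, grid=np.linspace(0, 1, 101)):
    """Pick a (and b = 1 - a) minimizing validation RMSE of the blend."""
    def rmse(a):
        blend = a * tcn_pred + (1 - a) * lgbm_pred
        return np.sqrt(np.mean((blend - y_valid) ** 2))
    a = min(grid, key=rmse)
    return a, 1 - a

# a, b = best_weights(tcn_valid_pred, lgbm_valid_pred, y_valid)
```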
In a specific implementation process, as shown in fig. 1, after the data preprocessing of step S1, the data are divided into a training set, a prediction set, and a verification set. The training set trains the TCN model to obtain the trained TCN model; after a second round of data preprocessing (the preprocessing of step S3.1), the training set trains the LightGBM model to obtain the trained LightGBM model; weighted combination then yields the combined model (the final prediction model), whose output can be used to predict the logistics storage sales volume.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art to which the invention relates may modify, supplement or substitute the specific embodiments described, without however departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims (5)

1. A logistics storage sales prediction method based on a TCN and LightGBM combined model is characterized by comprising the following steps:
S1: acquiring historical sales and distribution data, extracting the records of the past twelve months with the time to be predicted as the cut-off point, and preprocessing the extracted historical sales and distribution record data;
S2: constructing a TCN model that integrates a one-dimensional fully convolutional network, causal convolution, and dilated convolution, and training the TCN model with the preprocessed historical sales and distribution record data;
S3: constructing a LightGBM model, splicing external influence factors onto the preprocessed historical sales and distribution record data, and then training the LightGBM model;
S4: according to the prediction results of the trained TCN model and the trained LightGBM model on logistics storage sales, finding the optimal weighted combination as the final prediction model, and predicting logistics storage sales with the final prediction model.
2. The logistics warehouse sales prediction method of claim 1, wherein the preprocessing of the extracted historical sales distribution record data in step S1 comprises:
processing the sales time and sales volume of the extracted data into time-series form;
replacing missing values, and sales surges caused by promotions, with the mean sales of the corresponding month;
converting categorical label features into 0/1 features with one-hot encoding, and finally aggregating the statistics into the time-series form of the historical sales and distribution record data.
3. The logistics warehouse sales prediction method of claim 1, wherein the step S2 comprises:
S2.1: constructing an input layer and an output layer, wherein the number of nodes of the input layer is the same as the number of nodes of the output layer, both being determined by the feature number of the data obtained by the data preprocessing of step S1.
The input data vector D_i is:
D_i = (x_1, x_2, ..., x_M)^T, i ∈ 1, 2, 3 ... N
where M is the feature number of the data, x_1 is the first feature of the input data, and so on up to x_M, the M-th feature of the input data; D_i is the data of the i-th record, and N is the number of data records.
The output data vector R_i is:
R_i = (r_1, r_2, ..., r_M)^T, i ∈ 1, 2, 3 ... N
where M is the feature number of the data, r_1 is the first feature of the output data, and so on up to r_M, the M-th feature of the output data; N is the number of data records.
S2.2: constructing a dilated convolution kernel, wherein the output value of the s-th neuron after the dilated convolution is calculated as:
F(s) = (X ∗_d f)(s) = Σ_{i=0}^{k-1} f(i) · x_{s−d·i}
where d is the dilation coefficient, k is the convolution kernel size, and "∗_d" denotes the dilated convolution operation; f(i) is the i-th element of the convolution kernel; x_{s−d·i} is the sequence element multiplied by the corresponding kernel element, which in warehouse sales volume prediction is the sales data of the past twelve months, i.e., the input data.
S2.3: constructing residual modules, wherein one residual module comprises two parts: the input, and the output after a series of operations. The output expression of the residual module is:
o = Activation(x + F(x))
In the above formula, the residual module contains a branch that applies a series of transformations F(x); its output is added to the input x of the residual module, and an activation function finally produces the output o of the residual module.
S2.4: training the TCN model, wherein the loss function in the training process is the mean squared error:
Loss = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
where n is the training data length, y_i is the value returned by the network model, and ŷ_i is the true value of the sample. Minimizing the loss function is set as the optimization target; given a network initialization seed, a learning rate μ, and a training step count steps, a gradient descent optimization algorithm continuously updates the network weights, finally yielding the trained TCN model.
4. The logistics warehouse sales prediction method of claim 1, wherein the step S3 comprises:
S3.1: splicing external influence factors onto the time-series data obtained by the preprocessing of step S1 as feature attributes of the samples, the external influence factors including whether the date falls in a holiday or festival period, the temperature, and the unit price;
S3.2: screening the feature attributes by calculating the correlation between different features;
S3.3: balancing accuracy and efficiency with the two methods of gradient-based one-side sampling and exclusive feature bundling, thereby optimizing the LightGBM model;
s3.4: training the LightGBM model, wherein a loss function in the model training process adopts a loss function of a decision tree and is defined as:
C_α(T) = Σ_{t=1}^{|T|} N_t · H_t(T) + α|T|
where |T| is the number of leaf nodes, N_t is the number of samples at a particular leaf node, and H_t(T) is the empirical entropy of that leaf node; the first term represents the prediction error of the model on the training data, i.e., the degree of fit, the second term represents the complexity of the model, and the influence of the two is controlled by the parameter α.
5. The logistics warehouse sales prediction method of claim 1, wherein in step S4, an optimal combination mode is found as a final prediction model by a weighted combination mode, specifically:
TCN+LightGBM = a * TCN.result + b * LightGBM.result
where TCN.result represents the prediction probability of the trained TCN model, LightGBM.result represents the prediction probability of the trained LightGBM model, TCN+LightGBM represents the final prediction result, and a + b = 1.
CN202110653522.1A 2021-06-11 2021-06-11 Logistics storage sales prediction method based on TCN and LightGBM combined model Active CN113379125B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110653522.1A CN113379125B (en) 2021-06-11 2021-06-11 Logistics storage sales prediction method based on TCN and LightGBM combined model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110653522.1A CN113379125B (en) 2021-06-11 2021-06-11 Logistics storage sales prediction method based on TCN and LightGBM combined model

Publications (2)

Publication Number Publication Date
CN113379125A true CN113379125A (en) 2021-09-10
CN113379125B CN113379125B (en) 2022-05-13

Family

ID=77573883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110653522.1A Active CN113379125B (en) 2021-06-11 2021-06-11 Logistics storage sales prediction method based on TCN and LightGBM combined model

Country Status (1)

Country Link
CN (1) CN113379125B (en)

Citations (7)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019109790A1 (en) * 2017-12-08 2019-06-13 北京京东尚科信息技术有限公司 Sales volume prediction method and device, and computer-readable storage medium
CN109559163A (en) * 2018-11-16 2019-04-02 广州麦优网络科技有限公司 A kind of model building method and sales forecasting method based on machine learning
CN109784979A (en) * 2018-12-19 2019-05-21 重庆邮电大学 A kind of supply chain needing forecasting method of big data driving
CN111652654A (en) * 2020-06-10 2020-09-11 创新奇智(南京)科技有限公司 Sales prediction and neural network construction method, device, equipment and storage medium
CN111882157A (en) * 2020-06-24 2020-11-03 东莞理工学院 Demand prediction method and system based on deep space-time neural network and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TINGYU WENG: ""Supply chain sales forecasting based on lightGBM and LSTM combination model"", 《INDUSTRIAL MANAGEMENT & DATA SYSTEMS》 *
YUANYUAN WANG: ""Short-Term Load Forecasting for Industrial Customers Based on TCN-LightGBM"", 《IEEE TRANSACTIONS ON POWER SYSTEMS》 *

Also Published As

Publication number Publication date
CN113379125B (en) 2022-05-13

Similar Documents

Publication Publication Date Title
US11586880B2 (en) System and method for multi-horizon time series forecasting with dynamic temporal context learning
Liu et al. Bridge condition rating data modeling using deep learning algorithm
CN109034861B (en) User loss prediction method and device based on mobile terminal log behavior data
Chang et al. Trend discovery in financial time series data using a case based fuzzy decision tree
Zheng et al. Investigation of model ensemble for fine-grained air quality prediction
CN109086926B (en) Short-time rail transit passenger flow prediction method based on combined neural network structure
CN111582538A (en) Community value prediction method and system based on graph neural network
Biard et al. Automated detection of weather fronts using a deep learning neural network
Yarragunta et al. Prediction of air pollutants using supervised machine learning
US9324026B2 (en) Hierarchical latent variable model estimation device, hierarchical latent variable model estimation method, supply amount prediction device, supply amount prediction method, and recording medium
CN113706291A (en) Fraud risk prediction method, device, equipment and storage medium
Wambura et al. Robust anomaly detection in feature-evolving time series
CN116340726A (en) Energy economy big data cleaning method, system, equipment and storage medium
CN112052990B (en) CNN-BilSTM hybrid model-based next activity prediction method for multi-angle business process
CN116245259B (en) Photovoltaic power generation prediction method and device based on depth feature selection and electronic equipment
CN113569048A (en) Method and system for automatically dividing affiliated industries based on enterprise operation range
CN113379125B (en) Logistics storage sales prediction method based on TCN and LightGBM combined model
Li et al. An improved genetic-XGBoost classifier for customer consumption behavior prediction
CN115841345A (en) Cross-border big data intelligent analysis method, system and storage medium
Wu et al. Customer churn prediction for commercial banks using customer-value-weighted machine learning models
CN115187312A (en) Customer loss prediction method and system based on deep learning
CN115510948A (en) Block chain fishing detection method based on robust graph classification
Leverger et al. Toward a framework for seasonal time series forecasting using clustering
CN111160419A (en) Electronic transformer data classification prediction method and device based on deep learning
Pérez-Chacón et al. Pattern sequence-based algorithm for multivariate big data time series forecasting: Application to electricity consumption

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant