CN113379125A - Logistics storage sales prediction method based on TCN and LightGBM combined model - Google Patents

Logistics storage sales prediction method based on TCN and LightGBM combined model

Info

Publication number
CN113379125A
CN113379125A
Authority
CN
China
Prior art keywords
data
model
sales
tcn
lightgbm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110653522.1A
Other languages
Chinese (zh)
Other versions
CN113379125B (en)
Inventor
李石君
陶雯雯
余伟
余放
杨济海
杨俊成
李宇轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110653522.1A priority Critical patent/CN113379125B/en
Publication of CN113379125A publication Critical patent/CN113379125A/en
Application granted granted Critical
Publication of CN113379125B publication Critical patent/CN113379125B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067: Enterprise or organisation modelling
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • G06Q30/0202: Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Finance (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Tourism & Hospitality (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Educational Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a logistics storage sales prediction method based on a TCN and LightGBM combined model. It belongs to the research field of time-series analysis and classification/regression and relates to technologies such as TCN and LightGBM. For historical sales and distribution record information, TCN and LightGBM models are constructed separately, and a weighted combination is searched to find the optimal combination as the final prediction model; the trained model is then used for the prediction task. The advantages of the invention are: model training can be carried out automatically on the historical sales and distribution data of the past twelve months together with external factors that influence sales volume, the store sales volume for the next three days is predicted, and the utilization of the various resources in warehouse logistics is improved. Combining the predictions of the two models also improves the prediction accuracy.

Description

Logistics storage sales prediction method based on TCN and LightGBM combined model
Technical Field
The invention relates to the technical field of time sequence analysis and classification regression, in particular to a logistics storage sales prediction method based on a TCN and LightGBM combined model.
Background
Logistics storage management system: warehousing plays a crucial role in an enterprise's supply chain. If correct replenishment, inventory control, and delivery cannot be guaranteed, management costs rise, service quality is hard to guarantee, and the enterprise's competitiveness suffers. Traditional, static warehousing management cannot guarantee the efficient use of enterprise resources. Accurate sales prediction for a specific store can greatly improve warehousing efficiency and save time, labor, and storage space.
Historical sales and distribution records: the logistics storage management system records the distribution history of the warehouses corresponding to each store over a past period. These records contain various kinds of structured data, such as commodity names, selling prices, and commodity quantities, which form the basis for big data analysis.
Store sales prediction accuracy: store sales may be influenced by external factors such as weather conditions, holidays, the types of goods on promotion, and store scale. If the accuracy of a store's sales prediction cannot be guaranteed, its sales are strongly affected and large amounts of resources may be wasted. It is therefore necessary to predict store sales accurately and prepare accordingly.
Most existing store sales prediction methods analyze future sales from historical sales data with a single time-series neural network model. Such methods do not consider the influence of external factors on sales volume, and a single trained model is limited in prediction accuracy.
Disclosure of Invention
The invention provides a logistics storage sales prediction method based on a TCN and LightGBM combined model, to solve, or at least partially solve, the technical problem of low sales prediction accuracy for logistics storage in the prior art.
To solve the above technical problem, the invention provides a logistics storage sales prediction method based on a TCN and LightGBM combined model, comprising:
S1: acquiring historical sales and distribution data, extracting the records of the past twelve months with the time to be predicted as the cut-off point, and preprocessing the extracted historical sales and distribution record data;
S2: constructing a TCN model that integrates a one-dimensional fully convolutional network, causal convolution, and dilated convolution, and training the TCN model with the preprocessed historical sales and distribution record data;
S3: constructing a LightGBM model, splicing external influence factors onto the preprocessed historical sales and distribution record data, and then training the LightGBM model;
S4: according to the prediction results of the trained TCN model and the trained LightGBM model on logistics storage sales, finding the optimal weighted combination as the final prediction model, and predicting logistics storage sales with the final prediction model.
In one embodiment, the step S1 of preprocessing the extracted historical sales distribution record data includes:
processing the sales time and sales volume of the extracted data into time-series form;
replacing missing values, and sales surges caused by promotions, with the mean sales of the corresponding month;
converting categorical label features into 0/1 features with one-hot encoding, and finally aggregating the statistics into the time-series form of the historical sales and distribution record data.
In one embodiment, step S2 includes:
S2.1: constructing an input layer and an output layer, wherein the number of nodes of the input layer is the same as the number of nodes of the output layer, both being determined by the feature number of the data obtained by the data preprocessing of step S1.
The input data vector D_i is:
D_i = (x_1, x_2, ..., x_M)^T, i ∈ 1, 2, 3 ... N
where M is the feature number of the data, x_1 is the first feature of the input data, and so on up to x_M, the M-th feature of the input data; D_i is the data of the i-th record, and N is the number of data records.
The output data vector R_i is:
R_i = (r_1, r_2, ..., r_M)^T, i ∈ 1, 2, 3 ... N
where M is the feature number of the data, r_1 is the first feature of the output data, and so on up to r_M, the M-th feature of the output data; N is the number of data records.
S2.2: constructing a dilated convolution kernel, wherein the output value of the s-th neuron after the dilated convolution is calculated as:
F(s) = (X ∗_d f)(s) = Σ_{i=0}^{k-1} f(i) · x_{s−d·i}
where d is the dilation coefficient, k is the convolution kernel size, and "∗_d" denotes the dilated convolution operation; f(i) is the i-th element of the convolution kernel; x_{s−d·i} is the sequence element multiplied by the corresponding kernel element, which in warehouse sales volume prediction is the sales data of the past twelve months, i.e., the input data.
S2.3: constructing residual modules, wherein one residual module comprises two parts: the input, and the output after a series of operations. The output expression of the residual module is:
o = Activation(x + F(x))
In the above formula, the residual module contains a branch that applies a series of transformations F(x); its output is added to the input x of the residual module, and an activation function finally produces the output o of the residual module.
S2.4: training the TCN model, wherein the loss function in the training process is the mean squared error:
Loss = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
where n is the training data length, y_i is the value returned by the network model, and ŷ_i is the true value of the sample. Minimizing the loss function is set as the optimization target; given a network initialization seed, a learning rate μ, and a training step count steps, a gradient descent optimization algorithm continuously updates the network weights, finally yielding the trained TCN model.
In one embodiment, step S3 includes:
S3.1: splicing external influence factors onto the time-series data obtained by the preprocessing of step S1 as feature attributes of the samples, the external influence factors including whether the date falls in a holiday or festival period, the temperature, and the unit price;
S3.2: screening the feature attributes by calculating the correlation between different features;
S3.3: balancing accuracy and efficiency with the two methods of gradient-based one-side sampling and exclusive feature bundling, thereby optimizing the LightGBM model;
s3.4: training the LightGBM model, wherein a loss function in the model training process adopts a loss function of a decision tree and is defined as:
C_α(T) = Σ_{t=1}^{|T|} N_t · H_t(T) + α|T|
where |T| is the number of leaf nodes, N_t is the number of samples at a particular leaf node, and H_t(T) is the empirical entropy of that leaf node; the first term represents the prediction error of the model on the training data, i.e., the degree of fit, the second term represents the complexity of the model, and the influence of the two is controlled by the parameter α.
In one embodiment, in step S4, a weighted combination method is adopted to find an optimal combination method as a final prediction model, specifically:
TCN+LightGBM = a * TCN.result + b * LightGBM.result
where TCN.result represents the prediction probability of the trained TCN model, LightGBM.result represents the prediction probability of the trained LightGBM model, TCN+LightGBM represents the final prediction result, and a + b = 1.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
according to the method, a TCN model and a LightGBM model are respectively constructed according to historical sales distribution record data, an optimal combination mode is found out to serve as a final prediction model in a weighted combination mode, and a classification task is carried out by using a trained model. The method can automatically perform model training on historical sales distribution data of past twelve months and other factors influencing sales volume externally, and predict the sales volume of stores in the next three days, thereby improving the utilization rate of various resources in warehouse logistics. And meanwhile, the combined prediction of the two models is adopted, so that the prediction accuracy is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart illustrating an implementation of a prediction method according to an embodiment of the present invention.
Detailed Description
In order to improve the accuracy of sales prediction and reduce the waste of commodity resources and storage space, the invention trains on the time-series information of historical sales records with a TCN (temporal convolutional network), adds other external factor information that influences sales, performs a second prediction with a LightGBM model, and finally combines the model results linearly to complete the sales evaluation and prediction.
For the existing historical sales and distribution data and the external factor data acquired from the sales records, the invention provides a combined model based on TCN and LightGBM to predict logistics storage sales volume. Unlike RNN architectures, a TCN can be processed massively in parallel, so the network is faster in both training and validation. A TCN can change its receptive field by adding layers or changing the dilation coefficients and filter sizes, handles histories of flexible length, and avoids the gradient vanishing and gradient explosion problems of RNNs; the TCN model is therefore trained on the historical sales and distribution data of a past period. Because the LightGBM differs from the TCN in its sensitivity to time order, it supplements the features extracted from the historical sales and distribution data with other statistical features, and the LightGBM model is trained on those. Finally, the two models are combined by weighting.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, an embodiment of the present invention provides a method for predicting logistics warehouse sales based on a TCN and LightGBM combined model, including:
s1: acquiring historical sales distribution data, extracting records of past twelve months with time to be predicted as a node, and preprocessing the extracted historical sales distribution record data;
s2: constructing a TCN model, wherein the TCN model is fused with a one-dimensional full convolution network, a causal convolution and an expansion convolution, and the TCN model is trained by utilizing preprocessed historical sales distribution record data;
s3: constructing a LightGBM model, splicing the preprocessed historical sales distribution record data with external influence factors, and then training the LightGBM model;
s4: and finding out an optimal combination mode as a final prediction model by adopting a weighted combination mode according to the prediction result of the trained TCN model on the logistics storage sales and the prediction result of the trained LightGBM model on the logistics storage sales, and predicting the logistics storage sales by utilizing the final prediction model.
Specifically, in S1, the historical sales and distribution data in the database of the logistics storage sales system may be read with Python. The preprocessing includes arranging the data in time-series form, normalization, and the like.
In S2, TCN is the abbreviation of Temporal Convolutional Network, which consists of dilated causal 1D convolutional layers with equal input and output lengths. The temporal convolutional network is a sequence modeling architecture that integrates a one-dimensional fully convolutional network, causal convolution, and dilated convolution; it effectively avoids the gradient vanishing and gradient explosion problems faced by recurrent neural networks, and offers parallel computation, low memory consumption, and control of the sequence memory length by changing the receptive field.
A TCN is based on two principles: (1) the input and output lengths of the network are the same, and (2) no information leaks from the future into the past. For the first point, the TCN uses a 1D fully convolutional network (FCN): each hidden layer has the same length as the input layer, and zero padding of length (kernel_size − 1) keeps each subsequent layer the same length as the previous one. For the second point, the TCN uses causal convolution: the output at time t is convolved only with elements at time t or earlier in the previous layer (i.e., the output at time step t of each layer is computed only from the region no later than time step t of the previous layer). So TCN = 1D FCN + causal convolution.
In S3, LightGBM is a newer boosting framework from Microsoft. Like XGBoost, it uses decision trees as base learners, but it optimizes the framework with an emphasis on the training speed of the model. Most importantly, LightGBM uses a histogram-based decision tree algorithm: continuous floating-point feature values are first discretized into k integers while a histogram of width k is constructed. When the data are traversed, statistics are accumulated in the histogram using the discretized values as indices; after one traversal, the histogram holds the needed statistics, and the optimal split point is then found by traversing the discretized values of the histogram.
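A minimal NumPy sketch of this histogram idea (the function and variable names, and the bin count k, are illustrative, not from the patent or the LightGBM source):

```python
import numpy as np

def histogram_split_stats(feature, gradients, k=255):
    """Discretize a continuous feature into k bins and accumulate
    per-bin gradient statistics, as histogram-based GBDT does."""
    # Bin edges over the observed value range; each value maps to an integer bin.
    edges = np.linspace(feature.min(), feature.max(), k + 1)
    bins = np.clip(np.digitize(feature, edges) - 1, 0, k - 1)

    grad_hist = np.zeros(k)   # sum of gradients per bin
    count_hist = np.zeros(k)  # sample count per bin
    np.add.at(grad_hist, bins, gradients)
    np.add.at(count_hist, bins, 1)

    # A split point is then chosen by scanning the k bins once,
    # instead of sorting and scanning every raw feature value.
    return grad_hist, count_hist

g, c = histogram_split_stats(np.random.rand(1000), np.random.randn(1000), k=16)
```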
Step S4 weights and combines the models. Steps S2 and S3 construct two prediction models based on TCN and LightGBM respectively. When merging the prediction results, after comprehensive consideration and verification of the different strengths of the two models in processing the data, a linear model is selected for the combination. Through repeated training and prediction, the parameters that best combine the results of the TCN model and the LightGBM model are selected as the final parameters.
In one embodiment, the step S1 of preprocessing the extracted historical sales distribution record data includes:
processing the sales time and sales volume of the extracted data into time-series form;
replacing missing values, and sales surges caused by promotions, with the mean sales of the corresponding month;
converting categorical label features into 0/1 features with one-hot encoding, and finally aggregating the statistics into the time-series form of the historical sales and distribution record data.
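As an illustration of this preprocessing, a sketch in pandas; the column names (sale_time, sales, promo_flag, category) are assumptions for the example, not from the patent:

```python
import pandas as pd

def preprocess(records: pd.DataFrame) -> pd.DataFrame:
    """Arrange records as a time series, impute missing values and
    promotion surges with the monthly mean, and one-hot encode labels."""
    df = records.copy()
    df["sale_time"] = pd.to_datetime(df["sale_time"])
    df = df.set_index("sale_time").sort_index()

    # Mean sales of the corresponding month, used as the replacement value.
    monthly_mean = df["sales"].groupby(df.index.to_period("M")).transform("mean")
    df["sales"] = df["sales"].fillna(monthly_mean)   # missing values
    if "promo_flag" in df.columns:                   # promotion-driven surges
        df.loc[df["promo_flag"] == 1, "sales"] = monthly_mean

    # One-hot encode categorical label features into 0/1 columns.
    label_cols = [c for c in ("category",) if c in df.columns]
    return pd.get_dummies(df, columns=label_cols, dtype=int)

records = pd.DataFrame({
    "sale_time": ["2020-01-01", "2020-01-02", "2020-01-15"],
    "sales": [10.0, None, 50.0],
    "promo_flag": [0, 0, 1],
    "category": ["A", "B", "A"],
})
print(preprocess(records))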
In one embodiment, step S2 includes:
S2.1: constructing an input layer and an output layer, wherein the number of nodes of the input layer is the same as the number of nodes of the output layer, both being determined by the feature number of the data obtained by the data preprocessing of step S1.
The input data vector D_i is:
D_i = (x_1, x_2, ..., x_M)^T, i ∈ 1, 2, 3 ... N
where M is the feature number of the data, x_1 is the first feature of the input data, and so on up to x_M, the M-th feature of the input data; D_i is the data of the i-th record, and N is the number of data records.
The output data vector R_i is:
R_i = (r_1, r_2, ..., r_M)^T, i ∈ 1, 2, 3 ... N
where M is the feature number of the data, r_1 is the first feature of the output data, and so on up to r_M, the M-th feature of the output data; N is the number of data records.
S2.2: constructing a dilated convolution kernel, wherein the output value of the s-th neuron after the dilated convolution is calculated as:
F(s) = (X ∗_d f)(s) = Σ_{i=0}^{k-1} f(i) · x_{s−d·i}
where d is the dilation coefficient, k is the convolution kernel size, and "∗_d" denotes the dilated convolution operation; f(i) is the i-th element of the convolution kernel; x_{s−d·i} is the sequence element multiplied by the corresponding kernel element, which in warehouse sales volume prediction is the sales data of the past twelve months, i.e., the input data.
S2.3: constructing residual modules, wherein one residual module comprises two parts: the input, and the output after a series of operations. The output expression of the residual module is:
o = Activation(x + F(x))
In the above formula, the residual module contains a branch that applies a series of transformations F(x); its output is added to the input x of the residual module, and an activation function finally produces the output o of the residual module.
S2.4: training the TCN model, wherein the loss function in the training process is the mean squared error:
Loss = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
where n is the training data length, y_i is the value returned by the network model, and ŷ_i is the true value of the sample. Minimizing the loss function is set as the optimization target; given a network initialization seed, a learning rate μ, and a training step count steps, a gradient descent optimization algorithm continuously updates the network weights, finally yielding the trained TCN model.
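A minimal PyTorch sketch of the S2.2 to S2.4 building blocks under stated assumptions: the channel count, kernel size 3, dilations (1, 2, 4), and SGD learning rate are illustrative, and ReLU stands in for the unspecified Activation:

```python
import torch
import torch.nn as nn

class CausalDilatedConv(nn.Module):
    """1D dilated convolution kept causal by trimming the trailing padding."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=self.pad, dilation=dilation)

    def forward(self, x):                  # x: (batch, channels, time)
        out = self.conv(x)
        return out[:, :, :-self.pad] if self.pad else out

class ResidualBlock(nn.Module):
    """The residual module of S2.3: o = Activation(x + F(x))."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.f = CausalDilatedConv(channels, kernel_size, dilation)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.f(x))

# Stacked blocks with growing dilation (1, 2, 4) widen the receptive field.
tcn = nn.Sequential(*[ResidualBlock(1, kernel_size=3, dilation=d) for d in (1, 2, 4)])

# S2.4 training step: MSE loss minimized by gradient descent.
x = torch.randn(8, 1, 12)                  # 8 samples, 12 months of sales
y = torch.randn(8, 1, 12)
opt = torch.optim.SGD(tcn.parameters(), lr=0.01)
loss = nn.MSELoss()(tcn(x), y)
opt.zero_grad(); loss.backward(); opt.step()
```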
During implementation of the application, it was found that simple causal convolution needs a very large number of layers or a very large convolution kernel to widen the receptive field, and a large receptive field is necessary for building long-term memory. Constructing dilated convolution kernels therefore avoids the gradient vanishing, complex training, and poor fitting that come with enlarging the receptive field by depth alone.
Since the receptive field of a TCN depends on the network depth, the filter size, and the dilation coefficient d, the stability of deeper and larger TCN networks is important. In the design of the TCN model in this application, residual modules therefore replace plain convolutional layers.
Because abnormal data in the training set have already been handled in detail, between MSE and MAE the MSE is more helpful to the training of this application's model: MSE squares the error (e = true value − predicted value), so if e > 1, MSE amplifies the error further. The loss function above is therefore adopted.
In one embodiment, step S3 includes:
S3.1: splicing external influence factors onto the time-series data obtained by the preprocessing of step S1 as feature attributes of the samples, the external influence factors including whether the date falls in a holiday or festival period, the temperature, and the unit price;
S3.2: screening the feature attributes by calculating the correlation between different features;
S3.3: balancing accuracy and efficiency with the two methods of gradient-based one-side sampling and exclusive feature bundling, thereby optimizing the LightGBM model;
s3.4: training the LightGBM model, wherein a loss function in the model training process adopts a loss function of a decision tree and is defined as:
C_α(T) = Σ_{t=1}^{|T|} N_t · H_t(T) + α|T|
where |T| is the number of leaf nodes, N_t is the number of samples at a particular leaf node, and H_t(T) is the empirical entropy of that leaf node; the first term represents the prediction error of the model on the training data, i.e., the degree of fit, the second term represents the complexity of the model, and the influence of the two is controlled by the parameter α.
Specifically, S3.1 performs data analysis and preprocessing. The commodity market is full of uncertain factors: commodity sales are influenced not only by price and by whether the date falls in a holiday or festival period, but also by factors such as unit price, highest temperature, lowest temperature, weather condition, wind direction, wind power, air pollution index, air pollution grade, and past sales volume, all of which correlate with sales in ways that are hard to measure in advance. The feature extraction of these external factors therefore has a great influence on prediction accuracy. External influence factors, such as whether the date is a holiday period and the temperature, are spliced onto the time-series data obtained by the preprocessing of step S1 to form the feature attributes of the samples.
Step S3.2 is feature engineering. Features determine the upper bound of model performance, so feature engineering is critical. The overall idea is to construct sales-domain features and polynomial features respectively. In practice, feature engineering is an iterative process: only by repeatedly screening, building, selecting, and validating features can more accurate outputs be predicted. Step S3.1 selected a large number of external influence factors beyond the historical sales records as features; to confirm that the extracted feature information is meaningful and to reduce computation, features that are less meaningful or highly correlated must be removed. The correlation between different features is calculated; if the correlation coefficient between two features is greater than or equal to 0.8, the two features are considered highly correlated, and only the one that is more reasonably explained in the sales process is kept.
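A sketch of this screening step in pandas, dropping one feature from each highly correlated pair; the 0.8 threshold follows the text, while which member of a pair to keep is treated here as "keep the first", standing in for the domain judgment the patent describes:

```python
import pandas as pd

def drop_correlated(features: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Drop one feature from every pair whose |Pearson correlation| >= threshold."""
    corr = features.corr().abs()
    cols = corr.columns
    to_drop = set()
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] >= threshold and cols[j] not in to_drop:
                to_drop.add(cols[j])   # keep the first feature, drop the second
    return features.drop(columns=sorted(to_drop))
```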
Step S3.3 performs model optimization. LightGBM addresses this with two methods, Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), to balance accuracy and efficiency: they accelerate the splitting behavior and effectively reduce the number of features while preserving split-point accuracy, speeding up traditional GBDT training by more than 20 times and improving real-time performance. The GOSS algorithm traverses every split point of every feature, finds and computes the maximum information gain, and divides the data into left and right nodes by the feature's split point. EFB turns several mutually exclusive features into a low-dimensional dense feature, avoids computation on unnecessary zero values, and can reduce the time complexity of the algorithm from O(#data) to O(#non_zero_data).
Step S3.4 performs model training. LightGBM trains with the Histogram and leaf-wise decision tree optimization algorithms. Compared with the traditional pre-sorting idea, the histogram only stores the discretized feature values, which markedly reduces memory consumption and speeds up training. The leaf-wise decision tree adds a maximum-depth limit on top of the leaf-splitting strategy, avoiding overfitting as far as possible. The leaf nodes cover all features, such as holiday period, unit price, highest temperature, lowest temperature, weather condition, wind direction, wind power, air pollution index, and air pollution grade.
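A hedged sketch of such a training setup with the lightgbm package; the histogram width (max_bin), leaf count, depth cap, and learning rate are illustrative values, not parameters given in the patent:

```python
import lightgbm as lgb

model = lgb.LGBMRegressor(
    boosting_type="gbdt",
    max_bin=255,        # histogram width k: floats discretized into 255 bins
    num_leaves=31,      # leaf-wise growth with a bounded number of leaves
    max_depth=6,        # maximum-depth cap to curb overfitting
    learning_rate=0.05,
    n_estimators=500,
)
# X_train would hold the spliced features (holiday flag, unit price,
# temperature, ...) and y_train the sales volumes; both are assumed to exist:
# model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])
```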
In a specific implementation process, the data preprocessing of step S3.1 includes the following: splice features such as whether the current time is a holiday period, unit price, highest temperature, lowest temperature, weather condition, wind direction, wind power, air pollution index, and air pollution grade onto the time-series data obtained by the preprocessing of step S1. For missing values and sales surges caused by promotions, the mean sales of the corresponding month is used instead. Finally, the input data are normalized:
X' = (X_i − X_min) / (X_max − X_min)
where X' is the normalized value, X_i the data point to be processed, X_min the minimum of the input data, and X_max the maximum of the input data.
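The same min-max normalization in NumPy, on illustrative values:

```python
import numpy as np

X = np.array([3.0, 7.0, 10.0, 5.0])
X_norm = (X - X.min()) / (X.max() - X.min())   # maps the data into [0, 1]
```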
Feature engineering in S3.2: the result of feature engineering plays an important role in improving the model effect. Feature engineering is often used to reduce redundant features and to improve the speed and accuracy of model training. Common feature selection methods include the Pearson correlation coefficient, mutual information and the maximal information coefficient, Pearson's chi-square test, and the distance correlation coefficient.
The Pearson correlation coefficient, a relatively simple feature selection method, is chosen in this embodiment. It reflects the linear correlation between a feature and the predicted value; the result lies in [−1, 1], where −1 means the feature and the predicted value are completely negatively correlated, 0 means no linear correlation exists, and 1 means they are completely positively correlated. The calculation formula is:
ρ(X, Y) = cov(X, Y) / (σ_X · σ_Y)
where cov(X, Y) is the covariance of the two variables and σ_X, σ_Y are their standard deviations, computed as:
cov(X, Y) = (1/n) Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ)
σ_X = sqrt( (1/n) Σ_{i=1}^{n} (X_i − X̄)² )
σ_Y = sqrt( (1/n) Σ_{i=1}^{n} (Y_i − Ȳ)² )
where X_i and Y_i are the i-th elements of the two input variables, X̄ is the mean of X, and Ȳ is the mean of Y.
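The formulas above translate directly into NumPy; this sketch cross-checks the result against np.corrcoef:

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation from the covariance / standard-deviation formulas."""
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())   # np.std defaults to the population (1/n) form

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
assert np.isclose(pearson(x, y), np.corrcoef(x, y)[0, 1])
```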
The optimization of the model in step S3.3 specifically includes the following steps:
S3.3.1: gradient-based one-side sampling (GOSS). The algorithm traverses every split point of every feature, finds and computes the maximum information gain, and divides the data into left and right nodes by the feature's split point. The overall GOSS procedure is: (1) with N training samples, select the top a% by gradient magnitude as the large-gradient training samples; (2) from the remaining (1 − a%) of samples with smaller gradients, randomly select b% as the small-gradient training samples; (3) when computing the information gain, scale the contribution of the small-gradient samples (b% · N of them) by (1 − a)/b.
In total, a%·N + b%·N samples are used as training samples. This keeps the sample as consistent as possible with the overall data distribution while ensuring that the small-gradient samples are still trained on.
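A simplified sketch of the GOSS sampling step (not LightGBM's internal implementation); a and b are expressed as fractions rather than percentages:

```python
import numpy as np

def goss_sample(gradients: np.ndarray, a: float = 0.2, b: float = 0.1):
    """Keep the top-a fraction by |gradient|, sample a b fraction of the rest,
    and up-weight the sampled small-gradient rows by (1 - a) / b."""
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))    # descending |gradient|
    top_k = int(a * n)
    large = order[:top_k]                     # large-gradient samples, kept as-is
    small = np.random.choice(order[top_k:], size=int(b * n), replace=False)

    idx = np.concatenate([large, small])
    weights = np.ones(len(idx))
    weights[top_k:] = (1.0 - a) / b           # compensate the information gain
    return idx, weights

idx, w = goss_sample(np.random.randn(1000))
```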
S3.3.2: exclusive feature bundling (EFB). The algorithm flow is: (1) build a graph in which every vertex is a feature and every edge carries a weight related to the overall conflict between the two features; (2) sort the features in descending order of their degree in the graph; (3) traverse each feature and try to merge it into an existing bundle so as to minimize the conflict ratio.
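A simplified greedy sketch of that flow, under stated assumptions: conflict is reduced to the count of samples where two features are both nonzero, and the value-offsetting that LightGBM applies inside a bundle is omitted:

```python
import numpy as np

def greedy_bundles(X: np.ndarray, max_conflict: int = 0):
    """Greedily bundle features whose nonzero rows overlap in at most
    max_conflict samples, following the sort-by-degree heuristic."""
    d = X.shape[1]
    nonzero = [set(np.flatnonzero(X[:, j])) for j in range(d)]
    # Degree of a feature = number of other features it conflicts with.
    degree = [sum(len(nonzero[j] & nonzero[k]) > max_conflict
                  for k in range(d) if k != j) for j in range(d)]
    bundles = []
    for j in sorted(range(d), key=lambda j: -degree[j]):
        for bundle in bundles:                 # try existing bundles first
            if all(len(nonzero[j] & nonzero[k]) <= max_conflict for k in bundle):
                bundle.append(j)
                break
        else:                                  # no compatible bundle found
            bundles.append([j])
    return bundles

print(greedy_bundles(np.eye(4)))   # mutually exclusive features -> one bundle
```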
In S3.4, the leaf node empirical entropy is calculated as follows:
H_t(T) = − Σ_k (N_tk / N_t) · log(N_tk / N_t)
where N_t is the number of samples at a particular leaf node and N_tk is the number of samples of class k at leaf node t.
The first term in the loss function represents the prediction error of the model on the training data, i.e., the degree of fit; the second term represents the complexity of the model; the influence of the two is controlled by the parameter α. Once α is fixed, the model with the smallest loss function is selected.
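The two formulas combine into a short function; the input layout, a list with one array of class counts N_tk per leaf, is an assumption for the example:

```python
import numpy as np

def tree_loss(leaf_counts: list, alpha: float) -> float:
    """C_alpha(T) = sum_t N_t * H_t(T) + alpha * |T|, with empirical entropy
    H_t(T) = -sum_k (N_tk / N_t) * log(N_tk / N_t)."""
    loss = 0.0
    for counts in leaf_counts:            # one array of class counts per leaf
        n_t = counts.sum()
        p = counts[counts > 0] / n_t      # skip empty classes (0 * log 0 = 0)
        loss += n_t * -(p * np.log(p)).sum()
    return loss + alpha * len(leaf_counts)

print(tree_loss([np.array([8, 2]), np.array([5, 5])], alpha=1.0))
```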
In one embodiment, in step S4, a weighted combination method is adopted to find an optimal combination method as a final prediction model, specifically:
TCN+LightGBM = a * TCN.result + b * LightGBM.result
where TCN.result represents the prediction probability of the trained TCN model, LightGBM.result represents the prediction probability of the trained LightGBM model, TCN+LightGBM represents the final prediction result, and a + b = 1.
Specifically, the values of a and b are determined by the final evaluation index, i.e., the a and b that perform best on the verification set are selected.
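A sketch of that selection, scanning a over a grid with b = 1 − a and scoring the blend by RMSE on the verification set (the array names and the RMSE metric are assumptions, since the patent does not name the evaluation index):

```python
import numpy as np

def best_weights(tcn_pred, lgbm_pred, y_valid, grid=np.linspace(0, 1, 101)):
    """Pick a (and b = 1 - a) minimizing validation RMSE of the blend."""
    def rmse(a):
        blend = a * tcn_pred + (1 - a) * lgbm_pred
        return np.sqrt(np.mean((blend - y_valid) ** 2))
    a = min(grid, key=rmse)
    return a, 1 - a

# a, b = best_weights(tcn_valid_pred, lgbm_valid_pred, y_valid)
```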
In a specific implementation process, as shown in fig. 1, after the data preprocessing of step S1, the data are divided into a training set, a prediction set, and a verification set. The training set trains the TCN model to obtain the trained TCN model; after a second round of data preprocessing (the preprocessing of step S3.1), the training set trains the LightGBM model to obtain the trained LightGBM model; weighted combination then yields the combined model (the final prediction model), whose output can be used to predict the logistics storage sales volume.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art to which the invention relates may modify, supplement or substitute the specific embodiments described, without however departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims (5)

1. A logistics storage sales prediction method based on a TCN and LightGBM combined model is characterized by comprising the following steps:
S1: acquiring historical sales and distribution data, extracting the records of the past twelve months with the time to be predicted as the cut-off point, and preprocessing the extracted historical sales and distribution record data;
S2: constructing a TCN model that integrates a one-dimensional fully convolutional network, causal convolution, and dilated convolution, and training the TCN model with the preprocessed historical sales and distribution record data;
S3: constructing a LightGBM model, splicing external influence factors onto the preprocessed historical sales and distribution record data, and then training the LightGBM model;
S4: according to the prediction results of the trained TCN model and the trained LightGBM model on logistics storage sales, finding the optimal weighted combination as the final prediction model, and predicting logistics storage sales with the final prediction model.
2. The logistics warehouse sales prediction method of claim 1, wherein the preprocessing of the extracted historical sales distribution record data in step S1 comprises:
processing the sales time and sales volume of the extracted data into time-series form;
replacing missing values, and sales surges caused by promotions, with the mean sales of the corresponding month;
converting categorical label features into 0/1 features with one-hot encoding, and finally aggregating the statistics into the time-series form of the historical sales and distribution record data.
3. The logistics warehouse sales prediction method of claim 1, wherein the step S2 comprises:
S2.1: constructing an input layer and an output layer, wherein the number of nodes of the input layer is the same as the number of nodes of the output layer, both being determined by the feature number of the data obtained by the data preprocessing of step S1.
The input data vector D_i is:
D_i = (x_1, x_2, ..., x_M)^T, i ∈ 1, 2, 3 ... N
where M is the feature number of the data, x_1 is the first feature of the input data, and so on up to x_M, the M-th feature of the input data; D_i is the data of the i-th record, and N is the number of data records.
The output data vector R_i is:
R_i = (r_1, r_2, ..., r_M)^T, i ∈ 1, 2, 3 ... N
where M is the feature number of the data, r_1 is the first feature of the output data, and so on up to r_M, the M-th feature of the output data; N is the number of data records.
S2.2: constructing a dilated convolution kernel, wherein the output value of the s-th neuron after the dilated convolution is calculated as:
F(s) = (X ∗_d f)(s) = Σ_{i=0}^{k-1} f(i) · x_{s−d·i}
where d is the dilation coefficient, k is the convolution kernel size, and "∗_d" denotes the dilated convolution operation; f(i) is the i-th element of the convolution kernel; x_{s−d·i} is the sequence element multiplied by the corresponding kernel element, which in warehouse sales volume prediction is the sales data of the past twelve months, i.e., the input data.
S2.3: constructing residual modules, wherein one residual module comprises two parts: the input, and the output after a series of operations. The output expression of the residual module is:
o = Activation(x + F(x))
In the above formula, the residual module contains a branch that applies a series of transformations F(x); its output is added to the input x of the residual module, and an activation function finally produces the output o of the residual module.
S2.4: training the TCN model, wherein the loss function in the training process is the mean squared error:
Loss = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
where n is the training data length, y_i is the value returned by the network model, and ŷ_i is the true value of the sample. Minimizing the loss function is set as the optimization target; given a network initialization seed, a learning rate μ, and a training step count steps, a gradient descent optimization algorithm continuously updates the network weights, finally yielding the trained TCN model.
4. The logistics warehouse sales prediction method of claim 1, wherein the step S3 comprises:
S3.1: splicing external influence factors onto the time-series data obtained by the preprocessing of step S1 as feature attributes of the samples, the external influence factors including whether the date falls in a holiday or festival period, the temperature, and the unit price;
S3.2: screening the feature attributes by calculating the correlation between different features;
S3.3: balancing accuracy and efficiency with the two methods of gradient-based one-side sampling and exclusive feature bundling, thereby optimizing the LightGBM model;
s3.4: training the LightGBM model, wherein a loss function in the model training process adopts a loss function of a decision tree and is defined as:
C_α(T) = Σ_{t=1}^{|T|} N_t · H_t(T) + α|T|
where |T| is the number of leaf nodes, N_t is the number of samples at a particular leaf node, and H_t(T) is the empirical entropy of that leaf node; the first term represents the prediction error of the model on the training data, i.e., the degree of fit, the second term represents the complexity of the model, and the influence of the two is controlled by the parameter α.
5. The logistics warehouse sales prediction method of claim 1, wherein in step S4, an optimal combination mode is found as a final prediction model by a weighted combination mode, specifically:
TCN+LightGBM = a * TCN.result + b * LightGBM.result
where TCN.result represents the prediction probability of the trained TCN model, LightGBM.result represents the prediction probability of the trained LightGBM model, TCN+LightGBM represents the final prediction result, and a + b = 1.
CN202110653522.1A 2021-06-11 2021-06-11 Logistics storage sales prediction method based on TCN and LightGBM combined model Active CN113379125B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110653522.1A CN113379125B (en) 2021-06-11 2021-06-11 Logistics storage sales prediction method based on TCN and LightGBM combined model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110653522.1A CN113379125B (en) 2021-06-11 2021-06-11 Logistics storage sales prediction method based on TCN and LightGBM combined model

Publications (2)

Publication Number Publication Date
CN113379125A true CN113379125A (en) 2021-09-10
CN113379125B CN113379125B (en) 2022-05-13

Family

ID=77573883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110653522.1A Active CN113379125B (en) 2021-06-11 2021-06-11 Logistics storage sales prediction method based on TCN and LightGBM combined model

Country Status (1)

Country Link
CN (1) CN113379125B (en)

Citations (7)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019109790A1 (en) * 2017-12-08 2019-06-13 北京京东尚科信息技术有限公司 Sales volume prediction method and device, and computer-readable storage medium
CN109559163A (en) * 2018-11-16 2019-04-02 广州麦优网络科技有限公司 A kind of model building method and sales forecasting method based on machine learning
CN109784979A (en) * 2018-12-19 2019-05-21 重庆邮电大学 A kind of supply chain needing forecasting method of big data driving
CN111652654A (en) * 2020-06-10 2020-09-11 创新奇智(南京)科技有限公司 Sales prediction and neural network construction method, device, equipment and storage medium
CN111882157A (en) * 2020-06-24 2020-11-03 东莞理工学院 Demand prediction method and system based on deep space-time neural network and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TINGYU WENG: ""Supply chain sales forecasting based on lightGBM and LSTM combination model"", 《INDUSTRIAL MANAGEMENT & DATA SYSTEMS》 *
YUANYUAN WANG: ""Short-Term Load Forecasting for Industrial Customers Based on TCN-LightGBM"", 《IEEE TRANSACTIONS ON POWER SYSTEMS》 *

Also Published As

Publication number Publication date
CN113379125B (en) 2022-05-13

Similar Documents

Publication Publication Date Title
US11586880B2 (en) System and method for multi-horizon time series forecasting with dynamic temporal context learning
Liu et al. Bridge condition rating data modeling using deep learning algorithm
CN109034861B (en) User loss prediction method and device based on mobile terminal log behavior data
Chang et al. Trend discovery in financial time series data using a case based fuzzy decision tree
Zheng et al. Investigation of model ensemble for fine-grained air quality prediction
CN109086926B (en) Short-time rail transit passenger flow prediction method based on combined neural network structure
CN111582538A (en) Community value prediction method and system based on graph neural network
Biard et al. Automated detection of weather fronts using a deep learning neural network
Yarragunta et al. Prediction of air pollutants using supervised machine learning
US9324026B2 (en) Hierarchical latent variable model estimation device, hierarchical latent variable model estimation method, supply amount prediction device, supply amount prediction method, and recording medium
CN113706291A (en) Fraud risk prediction method, device, equipment and storage medium
Wambura et al. Robust anomaly detection in feature-evolving time series
CN116340726A (en) Energy economy big data cleaning method, system, equipment and storage medium
CN112052990B (en) CNN-BilSTM hybrid model-based next activity prediction method for multi-angle business process
CN116245259B (en) Photovoltaic power generation prediction method and device based on depth feature selection and electronic equipment
CN113569048A (en) Method and system for automatically dividing affiliated industries based on enterprise operation range
CN113379125B (en) Logistics storage sales prediction method based on TCN and LightGBM combined model
Li et al. An improved genetic-XGBoost classifier for customer consumption behavior prediction
CN115841345A (en) Cross-border big data intelligent analysis method, system and storage medium
Wu et al. Customer churn prediction for commercial banks using customer-value-weighted machine learning models
CN115187312A (en) Customer loss prediction method and system based on deep learning
CN115510948A (en) Block chain fishing detection method based on robust graph classification
Leverger et al. Toward a framework for seasonal time series forecasting using clustering
CN111160419A (en) Electronic transformer data classification prediction method and device based on deep learning
Pérez-Chacón et al. Pattern sequence-based algorithm for multivariate big data time series forecasting: Application to electricity consumption

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant