CN115439045A - Logistics storage demand prediction method based on MAGRU - Google Patents

Logistics storage demand prediction method based on MAGRU

Info

Publication number
CN115439045A
Authority
CN
China
Prior art keywords
sequence
demand
data
output
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210833998.8A
Other languages
Chinese (zh)
Inventor
田冉
王灏篷
马忠彧
刘颜星
王楚
王晶霞
李新梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest Normal University
Original Assignee
Northwest Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest Normal University filed Critical Northwest Normal University
Priority to CN202210833998.8A priority Critical patent/CN115439045A/en
Publication of CN115439045A publication Critical patent/CN115439045A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/08 - Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/087 - Inventory or stock management, e.g. order filling, procurement or balancing against orders
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Lean management of the logistics warehouse provides an important basis for warehouse operation; an unreasonable ordering quantity reduces warehouse management efficiency, wastes operating cost, and leads to idle resources or stockpiled goods. Aiming at the low accuracy of goods demand prediction, a method for predicting the demand quantity of goods based on MAGRU is proposed. First, the temporal features of the original sequence are encoded and embedded into the feature vector. Second, in the encoder stage an attention mechanism and a GRU (gated recurrent unit) extract the input features and capture the key sub-sequences. The encoder process is repeated in the decoder stage, where an attention mechanism and a GRU again extract the features of the encoded sequence. Finally, the model is trained and optimized on a storage demand data set and evaluated with the RMSE (root mean square error) and MAE (mean absolute error) indexes. Taking the demand data of a certain commodity as an example, a large number of experiments under different parameters verify the effectiveness of the model. The experimental results show that MAGRU improves markedly over the prior methods.

Description

Logistics storage demand prediction method based on MAGRU
Technical Field
The invention relates to a warehouse commodity demand prediction method, which has an extremely important application prospect in the field of logistics warehouse management.
Background
In the logistics industry, lean management of the warehouse provides an important basis for product manufacturing and warehouse operation, greatly reduces commercial cost, and receives more and more attention from the industry. However, unreasonable commodity reserves seriously affect warehouse management efficiency, waste operating cost, degrade the user experience, and lead to idle resources or stockpiled goods. Commodities with strict storage time limits, such as food and medicine, require accurate inventory prediction; commodities with a high degree of customization, such as special parts and chips, are made to order and require long-horizon time series prediction.
To address these problems, research on commodity demand prediction is particularly important. Demand prediction takes commodity information and historical demand information as its basic inputs, mines the latent demand patterns from them, and discovers their periodic and seasonal characteristics through a specific algorithm, thereby improving the utilization efficiency of the warehouse and reducing its management cost.
The storage demand forecasting process comprises the following steps: (1) determine the basic information of the commodities to be predicted, such as commodity category, brand, and the browsing and favoriting records on the various sales platforms, together with the historical demand information, such as monthly, weekly or daily sales records; (2) find an algorithm suited to predicting the future demand of the commodity from its various information, including the historical demand data and the commodity's own characteristics; (3) comprehensively consider the commodity characteristics and the historical demand, and predict future values with a statistical method.
At present, most research on logistics demand prediction focuses on traditional time series methods. Attention to historical demand data is the core of demand prediction, but considering only the relations among historical demands, with little regard for the characteristics of the commodity itself, omits external features that influence demand and degrades the prediction. Moreover, for highly customized commodities such as mechanical parts and chips, long-sequence prediction is required; traditional time series methods and machine learning methods cannot meet this practical demand, which limits their applicability. The present method improves the capture of long-range dependence: it captures the key sub-sequences within the sequence formed by the historical demand data while also considering the influence of the commodity's own characteristics on demand. An attention mechanism is introduced on top of the gated recurrent unit and this process is repeated several times, which greatly improves the prediction accuracy. A large number of experiments show that the accuracy of long-sequence prediction is greatly improved, and the method has a simple structure suitable for real scenarios.
Disclosure of Invention
The invention overcomes the defects of the existing warehouse commodity demand prediction methods, namely inaccurate long-sequence prediction, the excessive implementation cost of complex prediction methods, and the lack of consideration of holidays and of the commodity's own characteristics. It provides a logistics warehouse demand prediction method based on MA (Multi-layer Attention)-GRU (Gated Recurrent Unit), gives a reliable prediction of the demand for commodities in a logistics warehouse, saves the operating cost of logistics merchants, and improves the user experience.
The invention mainly comprises five parts: (1) determining the input and output of the method; (2) holiday encoding of the timestamps of the commodity history information; (3) constructing a multi-layer attention gated recurrent unit encoder; (4) constructing a multi-layer attention gated recurrent unit decoder; (5) verifying the validity of the invention.
The contents of the above five parts are respectively described as follows:
1. Determine the input and output of the method. The various types of commodity information, comprising the historical demand information and the commodity's own information, form the model training data set. The historical demand information comprises the sale date and sales volume of the commodity, so that the relation between commodity volume and time can be grasped and the seasonal demand factors, holiday influence factors and the like can be mined. The commodity's own information includes its brand information, product type information and so on; changes in these external conditions also affect demand. The matrix formed by the various commodity information is the input of the method, comprising attributes such as date, commodity type, browsing times on the sales platform, favoriting times, commodity brand, transaction amount and transaction count. The output of the method is a predicted value of commodity demand, i.e. the future commodity demand. A record of this form is sketched below.
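For concreteness, a single training record with the attributes listed above might look as follows in Python; the field names and values are hypothetical, since the patent does not fix a schema:

```python
# A hypothetical single record x = (D, A, Q): the date D, the commodity
# attributes A, and the demand Q for that day. Field names are illustrative.
record = {
    "date": "2021-06-18",          # D: sale date
    "category": "chip",            # A: commodity type
    "brand": "BrandX",             # A: commodity brand
    "page_views": 1530,            # A: browsing times on the sales platform
    "favorites": 87,               # A: favoriting (collection) times
    "transaction_amount": 45200.0, # A: transaction amount
    "transaction_count": 112,      # A: number of transactions
    "demand": 96,                  # Q: demand quantity on that day
}
```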
2. Holiday encoding of the timestamps of the commodity history information. Holidays and seasons are important factors in commodity demand prediction: the demand for some commodities is unaffected by holidays, while for others it is closely tied to them, so holiday encoding and analysis of the commodities is the basis for predicting their demand. The encoded information is embedded into the historical demand data and analyzed as a commodity feature, which plays an important role in improving prediction accuracy.
3. Construct a multi-layer attention gated recurrent unit encoder. While the gated recurrent unit captures the linear and non-linear dependences in the commodity demand relation, the attention mechanism can find the key variation sub-sequence. Based on this, the invention proposes an encoder that combines a gated recurrent unit with self-attention and stacks multiple layers. Specifically, the data in one time slice is taken as the input of the encoder: the attention mechanism first finds the attention weight of each node within the time slice, the calculated weights are assigned to the components of the time slice, and each weighted component is taken as the input of the gated recurrent unit (hereinafter GRU); the output of the encoder is the vector composed of the GRU hidden layers.
4. Construct a multi-layer attention gated recurrent unit decoder. To further extract the features of the vector produced by the encoder and capture the linear and nonlinear relations of the sequence, the invention likewise uses an attention layer and a GRU in the decoder stage. First the attention weights of the encoder output vector are calculated; the weights are then multiplied by the vector components and accumulated, and the result is used as the input of the GRU, whose hidden layer output is the prediction of future demand.
5. Validity verification of the method. Experiments on a public warehouse cargo demand data set show that, compared with other leading-edge research, the method provides higher prediction accuracy for prediction tasks of different time lengths and for different commodities.
The detailed implementation steps adopted by the invention to achieve the aim are as follows:
step 1: firstly, input and output of the model are determined, and an appropriate training data set is selected. The model requires input of commodity demand time series data collected in warehouse operations. A single piece of data may be represented as a representation
Figure BDA0003746690130000031
Wherein D represents the date, A is the various attributes of the commodity, and Q is the demand of the commodity on the day. Collecting a data set of m sample sizes { x ] from a data set (1) ,...,x (m) As training samples for the model.
Step 2: The date is embedded according to the time information of the commodity demand. The date data D of each commodity is encoded into 4 attributes (year, month, week and holiday), which are embedded into the original commodity data in place of the original date data. The data after date embedding can be defined as
x = (Y, M, W, H, A, Q)
In this way the time characteristics of the data are extracted and the date factors that influence demand are captured effectively.
Step 3: The data set is preprocessed. The composition of the data set influences the training process, so preprocessing is one of the essential steps of the invention. Abnormal and extreme values in the original data set are deleted and filled with the subsequent data; in addition, the data are normalized, using max-min normalization to constrain the data to a fixed range.
Step 4: Based on the training data set, the sequence is divided according to time steps and the attention weights of the components in each sequence are calculated: the data set is first partitioned by the set time step (step 4.1) and the attention weights are then calculated (step 4.2). By extracting the weights of the input sequence, the key sub-sequences influencing demand are found, so that more attention is given to them and the prediction accuracy is improved.
The specific steps for constructing the encoder are as follows:
step 4.1: the input sequence is divided according to time steps. The training data set may be represented as { x } 1 ,x 2 ,...,x t And inputting the sequence under a time step into an attention layer, and calculating the weight of each component.
Step 4.2: the attention weight of each component in the sequence is calculated according to a time step sequence of step 4.1. The formula for calculating attention is:
Figure BDA0003746690130000041
the weights are compressed to [0,1] and the sum of the weights of the components is 1, and the calculation formula is as follows:
Figure BDA0003746690130000042
weights are assigned to the sequences so that each component represents a different importance. We record the vector at this time slice as
Figure BDA0003746690130000043
Step 4.3: the features of this time slice sequence are extracted by the GRU, with the vector output from step 4.2 as input to the GRU. Firstly, initializing a GRU network hidden layer state h, and secondly, initializing x' t As the output of each GRU unit. By the formula
R t =σ(X t W xz +h t-1 W hz +b z ) (4)
Z t =σ(X t W xz +h t-1 W hz +b z ) (5)
Figure BDA0003746690130000051
Figure BDA0003746690130000052
For hidden layer h t After the updating, the output of each hidden layer is recorded, and the output of the hidden layers is the output of the encoder after passing through a layer of fully-connected neural network.
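As an illustration only, the gate computations of formulas (4)-(7) can be written as a small NumPy routine; the layer sizes and random parameters below are assumptions, not the trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU hidden-state update following formulas (4)-(7)."""
    r_t = sigmoid(x_t @ p["W_xr"] + h_prev @ p["W_hr"] + p["b_r"])             # reset gate (4)
    z_t = sigmoid(x_t @ p["W_xz"] + h_prev @ p["W_hz"] + p["b_z"])             # update gate (5)
    h_cand = np.tanh(x_t @ p["W_xh"] + (r_t * h_prev) @ p["W_hh"] + p["b_h"])  # candidate (6)
    return z_t * h_prev + (1.0 - z_t) * h_cand                                 # new state (7)

# Illustrative sizes: 8 input features, 16 hidden units, 7-day time slice.
rng = np.random.default_rng(0)
d_in, d_h = 8, 16
p = {name: rng.normal(scale=0.1, size=shape) for name, shape in [
    ("W_xr", (d_in, d_h)), ("W_hr", (d_h, d_h)), ("b_r", (d_h,)),
    ("W_xz", (d_in, d_h)), ("W_hz", (d_h, d_h)), ("b_z", (d_h,)),
    ("W_xh", (d_in, d_h)), ("W_hh", (d_h, d_h)), ("b_h", (d_h,))]}
h = np.zeros(d_h)
for x_t in rng.normal(size=(7, d_in)):   # the weighted components of one time slice
    h = gru_step(x_t, h, p)
```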
Step 5: A decoder is constructed. Features are further extracted from the vector output by the encoder, so that a more accurate prediction of the future commodity demand is provided.
Step 5.1: the calculation is based on the attention of the respective hidden layer state h output by the encoder to calculate the hidden layer state again in the decoder. The attention calculation formula is as follows:
Figure BDA0003746690130000055
the decoder state d, s will be initialized followed by computing the attention weights of the encoder output sequence and making the sum of the weights of the components in the sequence 1 by the softmax layer.
Figure BDA0003746690130000053
And multiplying and summing the weight and the components in the sequence to obtain a result which is a driving vector.
Figure BDA0003746690130000054
Step 5.2: the predicted demand is calculated from the GRU output. And finally obtaining a driving vector c in a time slice, splicing c and a subsequent real demand value of the time slice in the training process, and jointly using the c as the input of one unit of the GRU, wherein each component is the input of one GRU in a sequence formed from the first time slice to the last time slice divided by the training set, and finally outputting the future demand.
Step 5.3: and outputting the predicted demand. The result output by the decoder is a hidden layer sequence output by the gate control cycle unit, and the expression capability of the hidden layer sequence is limited when the hidden layer sequence is directly used as a prediction result. In the formula, V and W are parameters which need to be continuously and iteratively trained.
Figure BDA0003746690130000062
Figure BDA0003746690130000061
Representing the predictor, which may be defined as a vector of a size that is consistent with the length of the prediction task. And predicting the demand of tau days in the future, and then the vector size is tau.
Figure BDA0003746690130000063
The invention provides a MAGRU-based logistics storage demand prediction method that uses deep learning, combines an attention mechanism with an encoder-decoder structure, and repeats the attention process in both the encoder and decoder stages, improving the capture of long-range dependences while fully considering the influence of the various external features and time features on commodity demand, thereby improving model accuracy. In addition, the model reduces warehouse operating costs. A large number of experiments show that, in the accuracy of long-horizon prediction tasks and in generalization to commodities with different distribution patterns, the method improves considerably on previous research and can be applied in small and medium-sized logistics enterprises.
Drawings
FIG. 1 is a diagram of a multi-layer attention cycle gating unit model according to the present invention
FIG. 2 is a stage attention calculation process of an encoder in the present invention
FIG. 3 is a stage prediction calculation process of the decoder in the present invention
FIG. 4 is a diagram of specific demand and variation trend of a product in accordance with the present invention
FIG. 5 is a line diagram of the predicted effect of the present invention on a certain product
FIG. 6 is a graph comparing the present invention with a recurrent neural network with respect to prediction accuracy
FIG. 7 is a graph comparing the prediction accuracy of the present invention with the extreme gradient boosting (XGBoost) algorithm
FIG. 8 is a graph comparing the prediction accuracy of the present invention with the dual-stage attention recurrent neural network (DA-RNN)
FIG. 9 is a graph comparing the prediction accuracy of ablated variants of the invention, with some components removed, against the full invention
Detailed Description
The invention is further illustrated by the following examples in conjunction with the drawings.
The method performs prediction modeling for the problem of insufficient accuracy in warehouse goods demand prediction. The MA-GRU-based logistics storage commodity demand prediction method is applicable to any commodity time series data collected in warehouse management. The model is implemented in Python in the Jupyter Notebook environment, and its implementation is explained in detail below with an example.
FIG. 1 is a diagram of the network model architecture of the present invention, which contains 2 attention layers and a 2-layer GRU network. A piece of data is
x = (D, A, Q)
where D is a numeric string representing the date, Q is the daily demand for the good, and A comprises the attributes that affect the demand for the good, such as the number of times clients access the commodity, the number of times the commodity is favorited, the commodity page traffic, the commodity price and so on. First, the time features are extracted and embedded into the original data as features, organizing the data into
x = (Y, M, W, H, A, Q)
where Y represents the year, M the month, W the day of the week, and H whether the day is a holiday. Second, a weight is assigned to each day of the input sequence through the attention mechanism; the differently weighted x form a vector, and each component of this vector is the input of one unit of the GRU network. The hidden layers of the GRU network then pass through another attention layer to obtain their weights; the weights are multiplied by the GRU hidden layers and accumulated to obtain the driving vector. Finally, the driving vector is combined with the true demand sequence in the training data set, and the predicted demand is obtained through a further GRU network layer.
The method focuses on the scenario of predicting the future demand of warehouse commodities with the constructed MA-GRU network. Given a suitable data set, the network model is built from the attention mechanism and the neural network, the network is parameterized, and the parameters are then trained and optimized so that, within the set number of training iterations, the loss value and the index values of the network are minimized and the best network performance is achieved.
The framework of the MA-GRU is shown in FIG. 1. The invention adopts the real daily demand data of a certain commodity, and the concrete implementation is as follows:
step 1: firstly, input and output of the model are determined, and a proper training data set is selected. The model needs to input commodity demand time sequence data collected in warehouse operation. A single piece of data may be represented as a representation
Figure BDA0003746690130000072
Wherein D represents date, A represents commodity attribute, and Q is data set { x) of m sample sizes collected from training set for commodity demand (1) ,...,x (m) As training samples for the model.
And then the step 2 is carried out.
Step 2: according toThe time information of the commodity demand is embedded with the date. And (3) encoding the date data D of each commodity into 4 attributes of year, month, week and festival, embedding the 4 attributes into the original commodity data, and replacing the original date data. Date embedded data can be defined as
Figure BDA0003746690130000073
Therefore, the time characteristics of the data can be extracted, and date factors influencing requirements can be effectively captured. And then the step 3 is carried out.
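A minimal pandas sketch of this date embedding; the column names and the holiday list are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical raw data: one row per day with the date string and the demand Q.
df = pd.DataFrame({
    "date":   ["2022-01-01", "2022-01-02", "2022-01-03"],
    "demand": [120, 95, 88],
})

# Assumed holiday calendar; in practice this comes from a holiday table.
holidays = {"2022-01-01"}

d = pd.to_datetime(df["date"])
df["year"]    = d.dt.year                                # Y
df["month"]   = d.dt.month                               # M
df["weekday"] = d.dt.dayofweek                           # W (0 = Monday)
df["holiday"] = df["date"].isin(holidays).astype(int)    # H: 1 if a holiday, else 0

# The 4 embedded attributes replace the original date column.
df = df.drop(columns=["date"])
```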
Step 3: The data set is preprocessed. First the input training data set is normalized; max-min normalization is used, which linearly maps the training data to the range [0, 1]:
x' = (x - X_min) / (X_max - X_min)    (13)
where X_max is the maximum value and X_min the minimum value of the training data set. The normalized data set is taken as the final training data set. Then go to step 4.
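A sketch of formula (13) applied column-wise; note that the statistics are taken from the training split only, which the patent implies but does not spell out:

```python
import numpy as np

def min_max_normalize(train, other=None):
    """Map values to [0, 1] with formula (13), using training-set statistics."""
    x_min = train.min(axis=0)
    x_max = train.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # guard against constant columns
    scale = lambda a: (a - x_min) / span
    return scale(train) if other is None else (scale(train), scale(other))

train = np.array([[10.0, 200.0], [20.0, 150.0], [30.0, 180.0]])
train_norm = min_max_normalize(train)
```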
Step 4: Based on the training data set obtained in step 3, the sequence is divided according to time steps and the attention weights of the components in each sequence are calculated: the data set is first partitioned by the set time step (step 4.1), the attention weights are then calculated (step 4.2), and the hidden layers are computed afterwards (step 4.3). Then go to step 5.
the specific steps for constructing the encoder are as follows:
step 4.1: the input sequence is divided according to time steps. The training data set may be represented as { x } (1) ,...,x (m) The invention takes 7 days history demand as an example, then the data in one time slice is { x } 1 ,x 2 ,...,x 7 The sequence at one time step is entered into the attention layer and the component weights are calculated.
Step 4.2: the attention weight of each component in the sequence is calculated according to a time step sequence of step 4.1. The formula for calculating attention is:
Figure BDA0003746690130000082
the weights are then compressed to [0,1] by softmax and the sum of the component weights is 1, the calculation formula is as follows:
Figure BDA0003746690130000083
weights are assigned to the sequences so that each component represents a different importance. We record the vector under this time slice as
Figure BDA0003746690130000084
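The following NumPy sketch applies formulas (14)-(15) to one time slice and re-weights its components; the additive scoring form and the random parameters are assumptions consistent with the description above:

```python
import numpy as np

def input_attention(x_slice, W_e, v_e, b_e):
    """x_slice: (7, d) one time slice; returns the re-weighted slice and the weights."""
    scores = np.tanh(x_slice @ W_e + b_e) @ v_e        # e_t, formula (14)
    scores = scores - scores.max()                     # subtract max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()      # softmax weights, formula (15)
    return alpha[:, None] * x_slice, alpha             # x'_t = alpha_t * x_t for each day

rng = np.random.default_rng(1)
d = 8
x_slice = rng.normal(size=(7, d))
x_prime, alpha = input_attention(
    x_slice, rng.normal(size=(d, d)), rng.normal(size=d), np.zeros(d))
print(alpha.sum())   # the 7 weights sum to 1
```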
Step 4.3: the features of this time slice sequence are extracted by the GRU, with the vector output from step 4.2 as input to the GRU. Firstly initializing a hidden layer state h of a GRU network, and secondly initializing x' t As the output of each GRU unit. By the formula
R t =σ(X t W xz +h t-1 W hz +b z ) (16)
Z t =σ(X t W xz +h t-1 W hz +b z ) (17)
Figure BDA0003746690130000091
Figure BDA0003746690130000092
For hidden layer h t After the updating, the output of each hidden layer is recorded, and the output of the hidden layers is the output of the encoder after passing through a layer of fully-connected neural network.
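A compact module combining steps 4.1-4.3 (attention re-weighting, GRU, fully-connected layer) could be written as follows; PyTorch and all layer sizes are assumptions here, since the patent only states that the model is built in Python:

```python
import torch
import torch.nn as nn

class AttentionGRUEncoder(nn.Module):
    """Self-attention over each 7-day slice, then a GRU, then a linear layer."""
    def __init__(self, n_features, hidden_size=64):
        super().__init__()
        self.score = nn.Sequential(                       # additive score per time step
            nn.Linear(n_features, n_features), nn.Tanh(), nn.Linear(n_features, 1))
        self.gru = nn.GRU(n_features, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, hidden_size)     # fully-connected output layer

    def forward(self, x):                                 # x: (batch, 7, n_features)
        alpha = torch.softmax(self.score(x), dim=1)       # weights sum to 1 over the slice
        h_seq, _ = self.gru(alpha * x)                    # weighted components into the GRU
        return self.fc(h_seq)                             # encoder output: (batch, 7, hidden)

encoder = AttentionGRUEncoder(n_features=8)
enc_out = encoder(torch.randn(4, 7, 8))                   # 4 slices of 7 days, 8 attributes
```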
Then, turning to step 5;
and 5: a decoder is constructed. The vector output by the encoder is further subjected to feature extraction, so that more accurate prediction is provided for future commodity demand.
Step 5.1: the calculation is based on the attention of the respective hidden layer state h output by the encoder to calculate the hidden layer state again in the decoder. The attention calculation formula is as follows:
Figure BDA0003746690130000093
first, the decoder states d, s are initialized, then the attention weights of the output sequence of the encoder are calculated, the sum of the weights of each component in the sequence is 1 through the softmax layer,
Figure BDA0003746690130000094
and multiplying and summing the weight and the components in the sequence to obtain a result which is a driving vector. Then go to step 5.2
Figure BDA0003746690130000095
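A sketch of this decoder-stage attention (formulas (20)-(22)): each encoder hidden state is scored against the current decoder state, the scores are normalized with softmax, and the weighted sum gives the driving vector c. The exact scoring form and the PyTorch usage are assumptions:

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Attention over encoder hidden states, producing the driving vector c."""
    def __init__(self, enc_hidden, dec_hidden):
        super().__init__()
        self.w_dec = nn.Linear(dec_hidden, enc_hidden)    # maps the decoder state d
        self.w_enc = nn.Linear(enc_hidden, enc_hidden)    # maps each encoder state h_i
        self.v = nn.Linear(enc_hidden, 1)

    def forward(self, enc_out, d_state):
        # enc_out: (batch, T, enc_hidden); d_state: (batch, dec_hidden)
        scores = self.v(torch.tanh(self.w_enc(enc_out)
                                   + self.w_dec(d_state).unsqueeze(1)))   # l_i, (20)
        beta = torch.softmax(scores, dim=1)                               # (21)
        c = (beta * enc_out).sum(dim=1)                                   # driving vector (22)
        return c, beta

attn = TemporalAttention(enc_hidden=64, dec_hidden=64)
c, beta = attn(torch.randn(4, 7, 64), torch.zeros(4, 64))                 # c: (4, 64)
```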
Step 5.2: the predicted demand is calculated from the GRU output. And finally obtaining a driving vector c in a time slice, splicing c and a subsequent real demand value of the time slice in the training process, and jointly using the c as the input of one unit of the GRU, wherein each component is the input of one GRU in a sequence formed from the first time slice to the last time slice divided by the training set, and finally outputting the future demand. Then go to step 5.3
Step 5.3: and outputting the predicted demand. In order to enhance the model expression capability of the invention, two full-connection layers are added in the subsequent process based on the output of 5.2, and the prediction target is output after passing through the full-connection layers.
Figure BDA0003746690130000096
The output result is a vector, the length of the vector is different according to different prediction tasks, for example, the demand of the commodity is predicted in the subsequent 1 day, then y is the vector with the length of 1, and the demand predicted in the future t days can be expressed as
Figure BDA0003746690130000101
The invention can still provide more accurate prediction for the prediction task with larger t value.
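Steps 5.2-5.3 together might look like the sketch below: the driving vector is concatenated with the known demand, passed through a GRU cell, and two fully-connected layers map the decoder state to a vector of length tau. The layer sizes and the use of PyTorch are assumptions:

```python
import torch
import torch.nn as nn

class DemandHead(nn.Module):
    """GRU cell over [driving vector ; previous demand], then two FC layers (step 5.3)."""
    def __init__(self, enc_hidden=64, dec_hidden=64, tau=7):
        super().__init__()
        self.cell = nn.GRUCell(enc_hidden + 1, dec_hidden)   # input: driving vector + demand
        self.fc1 = nn.Linear(dec_hidden, dec_hidden)         # first fully-connected layer
        self.fc2 = nn.Linear(dec_hidden, tau)                # second layer: tau future days

    def forward(self, c, prev_demand, d_state):
        d_state = self.cell(torch.cat([c, prev_demand], dim=-1), d_state)
        y_hat = self.fc2(torch.relu(self.fc1(d_state)))      # predicted demand vector
        return y_hat, d_state

head = DemandHead(enc_hidden=64, dec_hidden=64, tau=7)
y_hat, d = head(torch.randn(4, 64), torch.randn(4, 1), torch.zeros(4, 64))   # y_hat: (4, 7)
```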
After the model is trained, the predictions are compared with the test data set and the MAE and RMSE index values of the predictions are calculated; the final results are obtained through comparison experiments. The index formulas are as follows:
RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i - ŷ_i)^2 )
MAE = (1/n) Σ_{i=1}^{n} | y_i - ŷ_i |
where y_i is the true demand in the test data set, ŷ_i is the predicted future demand, and n is the size of the test data set.
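These two indexes computed directly from the test targets and predictions, as a simple NumPy sketch:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error over the test set."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean square error over the test set."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([96.0, 88.0, 110.0, 102.0])   # toy test-set demand
y_pred = np.array([90.0, 91.0, 105.0, 100.0])   # toy predictions
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```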
The performance of the invention is described by MAE and RMSE indexes under the same training data set, and the performance of the invention is evaluated by comparing with other existing methods. The results are presented using line graphs as shown.
Fig. 5 is a line diagram of the effect predicted for a certain product in the present invention. As can be seen from the graph, the method can better fit the change trend of the future demand of the commodity and can give more accurate prediction on the future demand.
FIG. 6 is a graph comparing the present invention with a recurrent neural network with respect to prediction accuracy. The cyclic neural network is used as a time series model and widely applied to various prediction problems, the horizontal axis of the graph 6 represents prediction time, the vertical axis represents demand of a certain commodity, and the graph shows that the method is closer to a real demand curve and can also give more accurate prediction for long-series prediction.
FIG. 7 compares the prediction accuracy of the present invention with the extreme gradient boosting (XGBoost) algorithm. As a statistical method, XGBoost has limited accuracy when predicting future demand: it can only give the general pattern of the demand change and performs poorly in predicting the fluctuation trend.
FIG. 8 compares the prediction accuracy of the present invention with the dual-stage attention recurrent neural network (DA-RNN). DA-RNN shows limited performance for long-sequence prediction compared with the present invention, and its predicted daily demand deviates from the actual demand by more than the present invention does. Experimental comparison shows that the MAE and RMSE of the invention are reduced by 13.47% and 14.90%, respectively, relative to the DA-RNN model.
FIG. 9 compares the prediction accuracy of the invention with versions of the invention from which some components are removed. Since the invention is composed of several components, the necessity of each component is evaluated; comparing the MAE values obtained when the attention-weight modules in the encoder and decoder steps are omitted shows that the full invention provides higher accuracy, so none of its steps can be omitted.

Claims (1)

1. A logistics storage demand prediction method based on MAGRU, comprising adding data embedding to original data and combining a self-attention mechanism with a gated recurrent unit, characterized in that data embedding is performed on the dates of the commodities to extract the holiday and seasonal features influencing demand; in the encoder stage an attention mechanism extracts the key part of the input sequence and a gated recurrent unit encodes the sequence; in the decoder stage an attention mechanism again captures the key sequence output by the encoder and a gated recurrent unit produces the prediction of future demand; the multi-layer attention mechanism thereby improves the prediction accuracy; the specific steps comprise the following:
step 1: first determining the input and the output of the model and selecting a suitable training data set, the input of the model being the commodity demand time series data acquired in warehouse operation, a single piece of data being represented as
x = (D, A, Q)
wherein D represents the date, A the various attributes of the commodity and Q the demand for the commodity on that day, and a data set of m samples {x^(1), ..., x^(m)} is collected as the training samples of the model;
step 2: embedding the date according to the time information of the commodity demand, encoding the date data D of each commodity into 4 attributes (year, month, week and holiday), and embedding the 4 attributes into the original commodity data to replace the original date data, the data after date embedding being defined as
x = (Y, M, W, H, A, Q)
so that the time characteristics of the data are extracted and the date factors influencing demand are captured effectively;
step 3: preprocessing the data set, wherein the composition of the data set influences the training process of the method, so preprocessing is one of the necessary steps of the method: abnormal and extreme values in the original data set are deleted and filled with subsequent data, and the data are additionally normalized, max-min normalization being used to constrain the data to a fixed range;
step 4: based on the training data set, dividing the sequence according to time steps and calculating the attention weights of the components in the sequence, the data set being first divided by the set time step in step 4.1 and the attention weights being then calculated in step 4.2, so that by extracting the weights of the input sequence the key sub-sequences influencing demand are found, more attention is given to them, and the prediction accuracy is improved;
the specific steps for constructing the encoder are as follows:
step 4.1: dividing the input sequence according to time steps, the training data set being represented as {x_1, x_2, ..., x_t}, inputting the sequence under one time step into the attention layer, and calculating the weight of each component;
step 4.2: calculating the attention weight of each component in the sequence for the time-step sequence of step 4.1, the attention score being
e_t = v_e^T tanh(W_e x_t + b_e)    (1)
the weights being compressed to [0, 1] with the component weights summing to 1:
α_t = exp(e_t) / Σ_k exp(e_k)    (2)
the weights being assigned to the sequence so that each component represents a different importance, and the vector under the time slice being recorded as
x'_t = (α_1 x_1, α_2 x_2, ..., α_t x_t)    (3)
step 4.3: taking the vector output in step 4.2 as the input of the GRU and extracting the features of the time-slice sequence through the GRU, first initializing the hidden layer state h of the GRU network, then taking each component of x'_t as the input of one GRU unit, and updating the hidden layer h_t by the formulas
R_t = σ(X_t W_xr + h_{t-1} W_hr + b_r);    (4)
Z_t = σ(X_t W_xz + h_{t-1} W_hz + b_z);    (5)
h̃_t = tanh(X_t W_xh + (R_t ⊙ h_{t-1}) W_hh + b_h);    (6)
h_t = Z_t ⊙ h_{t-1} + (1 - Z_t) ⊙ h̃_t;    (7)
after the update, recording the output of each hidden layer, the hidden-layer outputs forming, after one fully-connected neural network layer, the output of the encoder;
step 5: constructing a decoder and further extracting the features of the vector output by the encoder, thereby providing a more accurate prediction of the future commodity demand;
step 5.1: computing attention over each hidden layer state h output by the encoder and computing the hidden layer state again in the decoder, the attention score being
l_i = v_d^T tanh(W_d [d_{t-1}; s_{t-1}] + U_d h_i)    (8)
the decoder states d and s being initialized first, the attention weights of the encoder output sequence being then computed, and the softmax layer making the weights of the components in the sequence sum to 1:
β_i = exp(l_i) / Σ_k exp(l_k)    (9)
then multiplying the weights by the corresponding components of the sequence and summing to obtain the driving vector
c = Σ_i β_i h_i    (10)
step 5.2: calculating the predicted demand through the GRU output, a driving vector c being finally obtained for each time slice, c being spliced during training with the true demand value that follows the time slice and used together as the input of one GRU unit, each such component being the input of one GRU unit over the sequence formed from the first to the last time slice of the training set, and the future demand being finally output;
step 5.3: outputting the predicted demand, the result output by the decoder being the hidden layer sequence of the gated recurrent unit, whose expressive power is limited if used directly as the prediction result, so that it is further mapped as
ŷ = V (W [d_T; c_T] + b_w) + b_v    (11)
wherein ŷ represents the prediction result, which may be defined as a vector whose size is consistent with the length of the prediction task, the vector size being τ when the demand of the future τ days is predicted:
ŷ = (ŷ_1, ŷ_2, ..., ŷ_τ)    (12)
CN202210833998.8A 2022-07-14 2022-07-14 Logistics storage demand prediction method based on MAGRU Pending CN115439045A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210833998.8A CN115439045A (en) 2022-07-14 2022-07-14 Logistics storage demand prediction method based on MAGRU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210833998.8A CN115439045A (en) 2022-07-14 2022-07-14 Logistics storage demand prediction method based on MAGRU

Publications (1)

Publication Number Publication Date
CN115439045A true CN115439045A (en) 2022-12-06

Family

ID=84241111

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210833998.8A Pending CN115439045A (en) 2022-07-14 2022-07-14 Logistics storage demand prediction method based on MAGRU

Country Status (1)

Country Link
CN (1) CN115439045A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116703301A (en) * 2023-06-15 2023-09-05 山东盛太锆业资源有限公司 Zirconia waste material management and control method based on information of multiple recovery warehouses
CN116703301B (en) * 2023-06-15 2023-12-08 山东盛太锆业资源有限公司 Zirconia waste material management and control method based on information of multiple recovery warehouses
CN117787838A (en) * 2024-02-27 2024-03-29 广州一链通互联网科技有限公司 Logistics digital management system and method based on AI large model
CN117787838B (en) * 2024-02-27 2024-05-28 广州一链通互联网科技有限公司 Logistics digital management system and method based on AI large model

Similar Documents

Publication Publication Date Title
CN115439045A (en) Logistics storage demand prediction method based on MAGRU
US20230034554A1 (en) Automated evaluation of project acceleration
CN111209386A (en) Personalized text recommendation method based on deep learning
Li et al. RETRACTED ARTICLE: Data mining optimization model for financial management information system based on improved genetic algorithm
CN112508265A (en) Time and activity multi-task prediction method and system for business process management
Akerkar Advanced data analytics for business
CN115145993A (en) Railway freight big data visualization display platform based on self-learning rule operation
Deshpande et al. Streaming adaptation of deep forecasting models using adaptive recurrent units
CN117557304B (en) Electric quantity and electricity price level fusion prediction method based on modal decomposition and neural network
CN114693409A (en) Product matching method, device, computer equipment, storage medium and program product
Feng Data Analysis and Prediction Modeling Based on Deep Learning in E‐Commerce
Tian et al. An end-to-end deep learning model for solving data-driven newsvendor problem with accessibility to textual review data
CN117422490A (en) User loss prediction method, device, apparatus, medium and program product
Li et al. An improved genetic-XGBoost classifier for customer consumption behavior prediction
Wang et al. Mtmd: multi-scale temporal memory learning and efficient debiasing framework for stock trend forecasting
CN111782802B (en) Method and system for obtaining commodity corresponding to national economy manufacturing industry based on machine learning
CN113468203A (en) Financial user image drawing method based on recurrent neural network and attention mechanism
ZHAO et al. A mining algorithm to improve lstm for predicting customer churn in railway freight traffic
Tian et al. MAGRU: Multi-layer Attention with GRU for Logistics Warehousing Demand Prediction.
Ahmadian Yazdi et al. Effective data reduction for time-aware recommender systems
CN116579722B (en) Commodity distribution warehouse-in and warehouse-out management method based on deep learning
CN113379125B (en) Logistics storage sales prediction method based on TCN and LightGBM combined model
Chen et al. Attentive capsule graph neural networks for session-based recommendation
CN116976956A (en) CRM system business opportunity deal prediction method, device, equipment and storage medium
He et al. Feature Engineering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination