CN113159355A - Data prediction method, data prediction device, logistics cargo quantity prediction method, medium and equipment

Info

Publication number: CN113159355A
Application number: CN202010014953.9A
Authority: CN
Inventors: 耿东阳
Original assignee: Beijing Jingbangda Trade Co Ltd; Beijing Jingdong Zhenshi Information Technology Co Ltd
Current assignee: Beijing Jingbangda Trade Co Ltd; Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority date: 2020-01-07
Filing date: 2020-01-07
Publication date: 2021-07-23

Abstract

The embodiment of the invention relates to a data prediction method, a data prediction device, a logistics cargo quantity prediction method, a medium and equipment, relating to the technical field of big data processing, wherein the data prediction method based on time series comprises the following steps: acquiring historical time sequence data, and acquiring a time sequence feature matrix of each time sequence data according to the time sequence feature of each time sequence data in the historical time sequence data; classifying a plurality of time sequence prediction models by using a time sequence characteristic matrix of each time sequence data and a preset model classifier to obtain a target prediction model of each time sequence data; and predicting the data of each time sequence data in the future time period by using the target prediction model to obtain a prediction result, and displaying the prediction result. The embodiment of the invention improves the selection efficiency of the target prediction model and simultaneously improves the efficiency of predicting the data of each time sequence data in the future time period.

Description

Data prediction method, data prediction device, logistics cargo quantity prediction method, medium and equipment

Technical Field

The embodiment of the invention relates to the technical field of big data processing, in particular to a data prediction method based on a time sequence, a data prediction device based on the time sequence, a logistics cargo volume prediction method based on the time sequence, a logistics cargo volume prediction device based on the time sequence, a computer-readable storage medium and electronic equipment.

Background

Time series prediction trains a model on historical observations and then outputs predictions for future times. Like other machine learning methods, time series prediction models face an "overfitting" problem: some models fit the historical training data well but have large prediction errors at future times. Because data at future times cannot yet be observed, the model with the minimum error cannot be chosen using future data, so model selection needs to be performed based on historical training data.

Most existing model selection methods are based on time series cross validation. Specifically, for most machine learning models such as tree models, model selection cannot be performed using an information criterion, so time series cross validation is generally adopted: the historical data is divided into a training period and a validation period, all candidate models are trained on the training-period data, their prediction accuracy on the validation period is compared, and the winning model is taken to be the one with the better predictive performance.

However, the above model selection method has the following drawback: model selection based on time series cross validation must be performed independently for each time series using every candidate model, so the time complexity is of order M × N, where N is the number of time series and M is the number of candidate prediction models. The efficiency of model selection is therefore low.

Therefore, it is desirable to provide a new method and apparatus for predicting data based on time series.

It is to be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present invention, and therefore may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The invention aims to provide a time series-based data prediction method, a time series-based data prediction device, a time series-based logistics cargo quantity prediction method, a time series-based logistics cargo quantity prediction device, a computer-readable storage medium and an electronic device, so as to overcome, at least to a certain extent, the problem of low model selection efficiency caused by the limitations and defects of the related art.

According to an aspect of the present disclosure, there is provided a time series-based data prediction method, including:

acquiring historical time sequence data, and acquiring a time sequence feature matrix of each time sequence data according to the time sequence feature of each time sequence data in the historical time sequence data;

classifying a plurality of time sequence prediction models by using a time sequence characteristic matrix of each time sequence data and a preset model classifier to obtain a target prediction model of each time sequence data; the model classifier is obtained by training an initial network model by using the historical time series data;

and predicting the data of each time sequence data in the future time period by using the target prediction model to obtain a prediction result, and displaying the prediction result.

In an exemplary embodiment of the present disclosure, the time-series based data prediction method further includes:

obtaining training set data and verification set data according to the historical time sequence data, and respectively training each time sequence prediction model by using each time sequence data in the training set data;

predicting each time sequence data in the verification set data by using each trained time sequence prediction model to obtain a plurality of prediction results, and calculating a difference value between each prediction result and an actual result corresponding to each prediction result;

taking the time sequence prediction model with the minimum difference value as the current prediction model of the time sequence data corresponding to the prediction result;

and training an initial network model by using the current prediction model of each time sequence data in the verification set data and the time sequence characteristic matrix of each time sequence data to obtain the model classifier.
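By way of illustration only, the labelling step described above can be sketched in Python as follows. The error table, the series identifiers, the feature rows and the function name `best_model_labels` are hypothetical, and the error values are made up for the example; the actual difference measure is not specified here.

```python
def best_model_labels(errors):
    """Map each series id to the candidate model with the smallest validation error.

    errors: {series_id: {model_name: validation_error}}
    """
    return {sid: min(per_model, key=per_model.get) for sid, per_model in errors.items()}


# Made-up validation errors for two series and three candidate models.
errors = {
    "s1": {"arima": 0.12, "ets": 0.09, "theta": 0.15},
    "s2": {"arima": 0.30, "ets": 0.41, "theta": 0.28},
}
# Hypothetical feature rows (for example trend strength and seasonal strength).
features = {"s1": [0.2, 0.8], "s2": [0.9, 0.1]}

labels = best_model_labels(errors)
# Each (feature row, label) pair is one training sample for the model classifier.
training_pairs = [(features[sid], labels[sid]) for sid in errors]
```

In this sketch the label of a series is simply the name of its best-performing model, so training the classifier becomes an ordinary multi-class classification problem.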

In an exemplary embodiment of the disclosure, deriving training set data and validation set data from the historical time series data comprises:

and sampling the historical time sequence data by using a bootstrap sampling method to obtain the training set data and the verification set data.
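One common reading of the sampling step above is bootstrap sampling, that is, sampling with replacement. The following Python sketch assumes that whole series are sampled and that the series never drawn (the out-of-bag series) form the verification set; both are assumptions, and the function name `bootstrap_split` is hypothetical.

```python
import random


def bootstrap_split(series_ids, seed=0):
    """Draw len(series_ids) ids with replacement; ids never drawn form the validation set."""
    rng = random.Random(seed)
    n = len(series_ids)
    train = [series_ids[rng.randrange(n)] for _ in range(n)]
    drawn = set(train)
    validation = [sid for sid in series_ids if sid not in drawn]
    return train, validation


ids = [f"series_{i}" for i in range(10)]
train_ids, val_ids = bootstrap_split(ids, seed=42)
```

Because the draw is with replacement, the training list may contain repeated series, while the two sets never overlap.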

In an exemplary embodiment of the present disclosure, training an initial network model by using a current prediction model of each time series data in the validation set data and a time series feature matrix of each time series data, and obtaining the model classifier includes:

respectively inputting the time sequence feature matrix of each time sequence data in the verification set data into the initial network model to obtain a plurality of output results; wherein the initial network model comprises at least one of a decision tree model, a boosting tree model, a random forest model and a neural network model;

judging whether the output results are the same as the current prediction models of the time sequence data or not;

and taking the initial network model as the model classifier when determining that each output result is the same as each current prediction model.
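As a hedged illustration of the acceptance check above, the following Python sketch fits a one-level decision tree (a decision stump), which stands in for the decision tree, boosting tree, random forest or neural network models named in the text. The feature rows, labels and function names are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """Fit a one-level decision tree: (feature index, threshold, left label, right label)."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue  # a split must leave samples on both sides
            left_label, right_label = majority(left), majority(right)
            correct = left.count(left_label) + right.count(right_label)
            if best is None or correct > best[0]:
                best = (correct, j, threshold, left_label, right_label)
    if best is None:
        raise ValueError("no feature separates the samples")
    return best[1:]


def predict_stump(stump, row):
    """Return the label of the side of the threshold on which the row falls."""
    j, threshold, left_label, right_label = stump
    return left_label if row[j] <= threshold else right_label


# Hypothetical feature rows and current prediction models (labels).
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.2, 0.9]]
y = ["theta", "theta", "ets", "ets"]

stump = fit_stump(X, y)
outputs = [predict_stump(stump, row) for row in X]
accepted = outputs == y  # accept the classifier only if every output matches its label
```

On real data an exact match on every sample may be unattainable or may indicate overfitting, so a practical implementation might accept the classifier at an accuracy threshold instead; that relaxation is an assumption, not something stated in the text.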

In an exemplary embodiment of the disclosure, obtaining a time series feature matrix of each time series according to the time series feature of each time series data in the historical time series includes:

extracting time sequence characteristics of each time sequence data in the historical time sequence; wherein the time sequence characteristics include a plurality of the following: time series length, trend, seasonality, linearity, steepness, spectral entropy, compartmentalization, volatility, autocorrelation, and partial autocorrelation;

and obtaining a time sequence characteristic matrix of each time sequence according to each time sequence characteristic.
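The following Python sketch shows one way such a feature matrix could be assembled, with one row per series and one column per feature. Only four simple stand-in features are computed (length, least-squares slope as a trend proxy, lag-1 autocorrelation, and standard deviation as a volatility proxy); the exact feature definitions used in the embodiments are not reproduced here, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def lag1_autocorrelation(y):
    """Sample autocorrelation at lag 1 (0.0 for a constant series)."""
    m = mean(y)
    denom = sum((v - m) ** 2 for v in y)
    if denom == 0:
        return 0.0
    return sum((y[t] - m) * (y[t - 1] - m) for t in range(1, len(y))) / denom


def linear_slope(y):
    """Least-squares slope of y against the time index (needs at least 2 points)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = mean(y)
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def feature_row(y):
    """One row of the feature matrix: [length, slope, lag-1 autocorrelation, std]."""
    return [float(len(y)), linear_slope(y), lag1_autocorrelation(y), pstdev(y)]


def feature_matrix(series_list):
    """Stack one feature row per series."""
    return [feature_row(y) for y in series_list]


matrix = feature_matrix([[1, 2, 3, 4, 5], [5, 5, 5, 5]])
```

Series of different lengths yield rows of the same width, which is what allows a single classifier to be applied to all of them.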

In an exemplary embodiment of the present disclosure, the time series prediction model includes a plurality of the following models: an autoregressive integrated moving average (ARIMA) model, an exponential smoothing model, a time series decomposition model, a Theta model, and a […] model.

According to one aspect of the present disclosure, there is provided a time-series-based logistics cargo volume prediction method, including:

acquiring historical cargo quantity time-series data, and obtaining a time-series characteristic matrix of the historical cargo quantity time-series data according to the time-series characteristics of the historical cargo quantity time-series data;

classifying a plurality of time sequence prediction models by using a time sequence characteristic matrix and a preset model classifier to obtain a target prediction model of the historical cargo quantity time sequence data; the model classifier is obtained by training a boosting tree algorithm model by utilizing the historical cargo quantity time series data;

and predicting the data of the historical goods quantity time-series data in the future time period by using the target prediction model to obtain a prediction result, and displaying the prediction result so that a user configures the required logistics goods quantity according to the prediction result.

In an exemplary embodiment of the present disclosure, the time-series-based logistics cargo quantity prediction method further includes:

carrying out normalization processing on the historical cargo quantity time-series data, and obtaining training set data and verification set data according to the normalized historical cargo quantity time-series data;

respectively training each time sequence prediction model by using the training set data, and predicting the verification set data by using each trained time sequence prediction model to obtain a plurality of prediction results;

calculating a difference value between each prediction result and an actual result corresponding to each prediction result, and using a time sequence prediction model with the minimum difference value as a target prediction model of the historical cargo quantity time sequence data;

and training an initial classifier according to the target prediction model and the time sequence feature matrix of the verification set data to obtain the model classifier of the historical cargo quantity time series data.

According to an aspect of the present disclosure, there is provided a time-series-based data prediction apparatus including:

the data acquisition module is used for acquiring historical time sequence data and acquiring a time sequence characteristic matrix of each time sequence data according to the time sequence characteristics of each time sequence data in the historical time sequence data;

the target prediction model determining module is used for classifying a plurality of time sequence prediction models by utilizing the time sequence characteristic matrix of each time sequence data and a preset model classifier to obtain a target prediction model of each time sequence data; the model classifier is obtained by training an initial network model by using the historical time series data;

and the data prediction module is used for predicting the data of each time sequence data in the future time period by using the target prediction model to obtain a prediction result and displaying the prediction result.

According to an aspect of the present disclosure, there is provided a time-series-based logistics cargo amount prediction apparatus, including:

the time sequence characteristic matrix determining module is used for acquiring historical cargo quantity time sequence data and obtaining a time sequence characteristic matrix of the historical cargo quantity time sequence data according to the time sequence characteristics of the historical cargo quantity time sequence data;

the time sequence prediction model classification module is used for classifying a plurality of time sequence prediction models by utilizing a time sequence characteristic matrix and a preset model classifier to obtain a target prediction model of the historical cargo quantity time sequence data; the model classifier is obtained by training a boosting tree algorithm model by utilizing the historical cargo quantity time series data;

and the prediction result display module is used for predicting the data of the historical goods quantity time-series data in the future time period by using the target prediction model to obtain a prediction result, and displaying the prediction result so that the user configures the needed goods quantity according to the prediction result.

According to an aspect of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the time-series-based data prediction method according to any one of the above embodiments or the time-series-based logistics cargo amount prediction method according to any one of the above embodiments.

According to an aspect of the present disclosure, there is provided an electronic device including:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to execute the executable instructions to perform the time-series-based data prediction method of any one of the above embodiments or the time-series-based logistics cargo volume prediction method of any one of the above embodiments.

According to the time series-based data prediction method, in a first aspect, historical time series data is acquired, and a time series feature matrix of each time series data is obtained according to its time series features; the plurality of time series prediction models are then classified by using the time series feature matrix of each time series data and a preset model classifier to obtain a target prediction model of each time series data; finally, the target prediction model is used to predict the data of each time series data in a future time period, and the prediction result is displayed. This solves the problem in the prior art that the model selection method based on time series cross validation must perform model selection independently for each time series using every model, with time complexity of order M × N and therefore low model selection efficiency; it improves the selection efficiency of the target prediction model and also the efficiency of predicting the data of each time series data in the future time period. In a second aspect, because the target prediction model of each time series data is obtained by classifying the plurality of time series prediction models with the time series feature matrix and the preset model classifier, and the prediction is made with that target prediction model, the accuracy of the prediction result is improved. In a third aspect, because the prediction result is displayed, relevant personnel can prepare accordingly, for example by preparing the corresponding goods or assigning the corresponding logistics personnel according to the prediction result, which further avoids problems such as economic loss to enterprises caused by lack of preparation.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

Fig. 1 schematically shows a flowchart of a time series-based data prediction method according to an exemplary embodiment of the present invention.

Fig. 2 schematically shows a flow chart of another time series-based data prediction method according to an exemplary embodiment of the present invention.

Fig. 3 schematically shows a flow chart of another time series-based data prediction method according to an exemplary embodiment of the present invention.

Fig. 4 schematically shows a flowchart of a time-series-based logistics cargo volume prediction method according to an exemplary embodiment of the present invention.

Fig. 5 schematically shows a flowchart of another time-series-based logistics cargo volume prediction method according to an exemplary embodiment of the present invention.

Figs. 6, 7 and 8 schematically illustrate application scenarios of a time-series-based logistics cargo quantity prediction method according to an exemplary embodiment of the invention.

Fig. 9 schematically shows a block diagram of a logistics system according to an exemplary embodiment of the present invention.

Fig. 10 schematically shows a block diagram of a time-series based data prediction apparatus according to an exemplary embodiment of the present invention.

Fig. 11 is a block diagram schematically illustrating a time-series-based logistics cargo amount prediction apparatus according to an exemplary embodiment of the present invention.

Fig. 12 schematically illustrates an electronic device for implementing the above-described time-series-based data prediction method or the above-described time-series-based logistics cargo volume prediction method according to an exemplary embodiment of the present invention.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the invention.

Furthermore, the drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

A time series, also called a chronological series, historical complex or dynamic series, is a numerical sequence formed by arranging the values of a statistical indicator in time order. A time series prediction method compiles and analyzes a time series and extrapolates from the development process, direction and trend that the series reflects, so as to predict the level that may be reached at the next time point or over the next period. By prediction time span, time series prediction can be classified into short-term, medium-term and long-term prediction. By data analysis method, it can be further divided into the simple chronological average method, the weighted chronological average method, the simple moving average method, the weighted moving average method, the exponential smoothing method and the like.
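For illustration, two of the methods named above, the simple moving average method and the exponential smoothing method, can be sketched in Python as one-step-ahead forecasts. The window size, the smoothing factor `alpha` and the sample values are assumptions chosen for the example.

```python
def simple_moving_average_forecast(y, window):
    """Forecast the next value as the mean of the last `window` observations."""
    return sum(y[-window:]) / window


def exponential_smoothing_forecast(y, alpha):
    """Simple exponential smoothing: the final smoothed level is the next-step forecast."""
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


history = [10, 12, 14, 16]
sma = simple_moving_average_forecast(history, window=2)   # mean of 14 and 16
ses = exponential_smoothing_forecast(history, alpha=0.5)  # levels: 10, 11, 12.5, 14.25
```

Both methods lag behind a rising series, which is one reason no single method suits every series and model selection is needed.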

Time series prediction has a wide range of applications, such as demand prediction in the retail industry, financial market prediction and logistics capacity prediction. It plays a very important role in automating many business processes and making them intelligent. For example, on some online shopping websites, the sales volume of each type of goods over a future period is a variable that must be considered in a series of business decisions such as stocking and promotion, so prediction capability can ultimately have a great influence on sales revenue, inventory cost and the like. Meanwhile, the number of commodities sold by a large online shopping website can reach the millions, and time series at this scale pose a new challenge to modern time series prediction technology.

The main existing methods for selecting a time series prediction model are as follows:

One is the model selection method based on an information criterion. Specifically, for parametric time series prediction models built within a statistical framework, such as the ARIMA (Autoregressive Integrated Moving Average) model and the ETS model, model selection may be performed according to an information criterion. For example, criteria such as AIC, BIC and AICc may be used to select the orders of the AR (autoregressive) and MA (moving average) components of an ARIMA model, and a model whose fitting result has a smaller criterion value is generally considered to generalize better.
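A minimal Python sketch of criterion-based selection follows. It compares only a constant-mean model, written AR(0), with an AR(1) model fitted by least squares, and uses the Gaussian least-squares form of AIC up to an additive constant, n ln(RSS/n) + 2k. The series values and function names are made up for the example, and a full ARIMA order search is not attempted.

```python
import math


def aic(rss, n, k):
    """Least-squares AIC up to an additive constant: n * ln(RSS / n) + 2k."""
    return n * math.log(rss / n) + 2 * k


def compare_ar0_ar1(y):
    """Score a constant-mean model and an AR(1) model on the same n - 1 observations."""
    target, lagged = y[1:], y[:-1]
    n = len(target)
    t_mean = sum(target) / n
    l_mean = sum(lagged) / n
    rss0 = sum((v - t_mean) ** 2 for v in target)
    sxx = sum((x - l_mean) ** 2 for x in lagged)
    sxy = sum((x - l_mean) * (v - t_mean) for x, v in zip(lagged, target))
    phi = sxy / sxx
    intercept = t_mean - phi * l_mean
    rss1 = sum((v - (intercept + phi * x)) ** 2 for x, v in zip(lagged, target))
    # k counts the fitted coefficients plus the noise variance.
    return {"AR(0)": aic(rss0, n, 2), "AR(1)": aic(rss1, n, 3)}


y = [1.0, 2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9]  # made-up series
scores = compare_ar0_ar1(y)
selected = min(scores, key=scores.get)
```

Both models are deliberately scored on the same observations, since criterion values computed on different samples are not comparable.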

The other is the model selection method based on time series cross validation. Specifically, for most machine learning models such as tree models, model selection cannot be performed using an information criterion, so time series cross validation is generally adopted: the historical data is divided into a training period and a validation period, all candidate models are trained on the training-period data, their prediction accuracy on the validation period is compared, and the winning model is taken to be the one with the better predictive performance.
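The train/validation procedure described above can be sketched in Python as follows. The three candidate forecasters (naive, mean and drift) are simple stand-ins for real candidate models, mean absolute error is an assumed accuracy measure, and all names are hypothetical.

```python
def naive(train, h):
    """Repeat the last observed value."""
    return [train[-1]] * h


def mean_forecast(train, h):
    """Repeat the mean of the training period."""
    m = sum(train) / len(train)
    return [m] * h


def drift(train, h):
    """Extend the straight line through the first and last training points."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (i + 1) for i in range(h)]


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def select_by_holdout(y, candidates, horizon):
    """Train on all but the last `horizon` points and pick the lowest validation MAE."""
    train, valid = y[:-horizon], y[-horizon:]
    scores = {name: mae(valid, f(train, horizon)) for name, f in candidates.items()}
    return min(scores, key=scores.get), scores


candidates = {"naive": naive, "mean": mean_forecast, "drift": drift}
winner, scores = select_by_holdout([2, 4, 6, 8, 10, 12, 14, 16], candidates, horizon=2)
```

Every candidate is fitted and scored for the series, which is the source of the per-series cost that grows with the number of candidates.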

However, the above model selection methods have the following drawbacks. Specifically, the model selection method based on an information criterion has the following disadvantages:

First, the candidate models must allow an information criterion to be calculated, and for most nonparametric models such as tree models it cannot be calculated. The predictive performance of different kinds of models therefore cannot be compared on the basis of an information criterion; for example, an ARIMA prediction model and a tree-based prediction model cannot be compared by calculating AIC.

Second, the candidate models must all be fitted to the same data. Taking the AIC criterion as an example, the AIC of an ARIMA model with a differencing term and that of an ARIMA model without one cannot be compared directly on the assumption that the smaller value indicates the better model, because differencing reduces the number of sample observations and makes the AIC values incomparable. Therefore, when an information criterion is used for model selection, an expert usually needs to specify a suitable model type, such as the ARIMA model, before models are compared and selected, which greatly limits the scope of prediction model selection.

Further, the model selection method based on time series cross validation has the following disadvantages:

Cross validation requires dividing the data set into a training set and a validation set. When a time series has few observations, this division is difficult: too little training data makes the model hard to train, and too little validation data makes the model selection result unreliable. Researchers have proposed an improved time series cross validation that uses as much historical data as possible through rolling one-step prediction, which improves the robustness of model selection but increases its computational time complexity.
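The rolling one-step scheme mentioned above can be sketched in Python as follows. The forecasters, the minimum training length `min_train` and the sample series are assumptions for the example.

```python
def rolling_one_step_mae(y, forecaster, min_train):
    """Refit at every origin from min_train onward and score the one-step-ahead forecast."""
    errors = []
    for origin in range(min_train, len(y)):
        prediction = forecaster(y[:origin])  # uses only data before the origin
        errors.append(abs(y[origin] - prediction))
    return sum(errors) / len(errors)


def last_value(train):
    """One-step forecast: repeat the last observation."""
    return train[-1]


def train_mean(train):
    """One-step forecast: mean of everything seen so far."""
    return sum(train) / len(train)


series = [1, 2, 3, 4, 5, 6]
naive_score = rolling_one_step_mae(series, last_value, min_train=3)
mean_score = rolling_one_step_mae(series, train_mean, min_train=3)
```

Each candidate is refitted `len(y) - min_train` times per series instead of once, which illustrates the extra computation that this form of validation incurs.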

In addition, both the model selection method based on an information criterion and the model selection method based on time series cross validation currently need to perform model selection independently for each time series using every model. The time complexity is of order M × N, where N is the number of time series and M is the number of candidate prediction models.

In summary, a time series prediction model selection method with a high degree of automation and strong scalability is currently lacking.

In the present exemplary embodiment, a time series-based data prediction method is first provided, which is a data-driven, highly scalable method of selecting a time series prediction model based on time series features. Compared with the two model selection methods above, it can select a model quickly and automatically from the time series features of a sample. Its use scenarios include large-scale time series prediction systems for retail demand prediction, financial market prediction, logistics cargo volume prediction and the like, where it improves both the efficiency of prediction model selection and the prediction performance. Further, the method can run on a server, a server cluster, a cloud server or the like, and can also run on a terminal device; of course, those skilled in the art may also run the method of the present invention on other platforms as needed, which is not particularly limited in this exemplary embodiment. Referring to Fig. 1, the time series-based data prediction method may include the following steps:

S110, obtaining historical time sequence data, and obtaining a time sequence feature matrix of each time sequence data according to the time sequence features of each time sequence data in the historical time sequence data.

S120, classifying a plurality of time sequence prediction models by using a time sequence feature matrix of each time sequence data and a preset model classifier to obtain a target prediction model of each time sequence data; and the model classifier is obtained by training an initial network model by using the historical time series data.

S130, predicting the data of each time sequence data in the future time period by using the target prediction model to obtain a prediction result, and displaying the prediction result.
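The three steps above can be sketched end to end in Python as follows. This is a minimal sketch under stated assumptions: the single feature, the two candidate models, the hand-written rule standing in for the pre-trained model classifier, and its 0.5 threshold are all hypothetical.

```python
def naive_forecast(y, horizon):
    """Repeat the last observed value."""
    return [y[-1]] * horizon


def drift_forecast(y, horizon):
    """Extend the straight line through the first and last points."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + slope * (i + 1) for i in range(horizon)]


MODELS = {"naive": naive_forecast, "drift": drift_forecast}


def extract_features(y):
    """S110: a one-column feature row (average change per step)."""
    return [(y[-1] - y[0]) / (len(y) - 1)]


def classify(features):
    """S120: stand-in for the pre-trained model classifier (hypothetical rule)."""
    return "drift" if abs(features[0]) > 0.5 else "naive"


def predict_all(history, horizon):
    """S130: forecast each series with the single model the classifier selected."""
    results = {}
    for sid, y in history.items():
        name = classify(extract_features(y))
        results[sid] = (name, MODELS[name](y, horizon))
    return results


history = {"a": [1, 2, 3, 4], "b": [5, 5, 5, 5]}
results = predict_all(history, horizon=2)
```

At prediction time each series requires one feature extraction, one classifier call and one model fit, rather than a fit of every candidate model.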

In the time series-based data prediction method described above, in a first aspect, historical time series data is acquired, and a time series feature matrix of each time series data is obtained according to its time series features; the plurality of time series prediction models are then classified by using the time series feature matrix of each time series data and a preset model classifier to obtain a target prediction model of each time series data; finally, the target prediction model is used to predict the data of each time series data in a future time period, and the prediction result is displayed. This solves the problem in the prior art that the model selection method based on time series cross validation must perform model selection independently for each time series using every model, with time complexity of order M × N and therefore low model selection efficiency; it improves the selection efficiency of the target prediction model and also the efficiency of predicting the data of each time series data in the future time period. In a second aspect, because the target prediction model of each time series data is obtained by classifying the plurality of time series prediction models with the time series feature matrix and the preset model classifier, and the prediction is made with that target prediction model, the accuracy of the prediction result is improved. In a third aspect, because the prediction result is displayed, relevant personnel can prepare accordingly, for example by preparing the corresponding goods or assigning the corresponding logistics personnel according to the prediction result, which further avoids problems such as economic loss to enterprises caused by lack of preparation.

Hereinafter, each step involved in the data prediction method based on time series according to the exemplary embodiment of the present invention will be explained and described in detail with reference to the drawings.

In step S110, historical time-series data is acquired, and a time-series feature matrix of each time-series data is obtained according to a time-series feature of each time-series data in the historical time-series data.

In the present exemplary embodiment, historical time series data may first be acquired from a data set; the data set may be, for example, the original M4 data set of time series, or another data set, which is not limited in this example. After the historical time series data is acquired, a time series feature matrix of each time series data can be obtained according to its time series features. Specifically: first, the time series features of each time series data in the historical time series data are extracted, wherein the time series features comprise time series length, trend, seasonality, linearity, steepness, spectral entropy, compartmentalization, volatility, autocorrelation, partial autocorrelation and the like; second, the time series feature matrix of each time series data is obtained from these features. The time series features may be specifically as shown in Table 1 below:

TABLE 1

As can be seen from Table 1 above, by using these features together with data on the most suitable prediction model, the classifier can "learn" the most suitable prediction method for time series with different features and apply this information to model selection for other time series with similar features.

In step S120, classifying a plurality of time series prediction models by using a time series feature matrix of each time series data and a preset model classifier to obtain a target prediction model of each time series data; and the model classifier is obtained by training an initial network model by using the historical time series data.

In the present exemplary embodiment, the time series prediction models may include, for example, an autoregressive integrated moving average model (ARIMA model), an exponential smoothing model (ETS model), a time series decomposition model (STL-AR model), a Theta model, and

models, and the like. Here, the trend prediction and the seasonal prediction of the time series are exemplified by a time series decomposition model (STL-AR model), and other models are similar and therefore are not described in detail.

Specifically, the time series decomposition model can be used to measure the trend and the seasonality of a time series, and the decomposition formula is $y_t = T_t + S_t + R_t$, where $T_t$ denotes the smoothed trend term, $S_t$ denotes the seasonal term, and $R_t$ denotes the residual term. For data with a strong trend, the seasonally adjusted data $T_t + R_t$ should vary much more than the residual term, so $\mathrm{Var}(R_t)/\mathrm{Var}(T_t + R_t)$ will be relatively small. However, for time series with no trend or a weak trend, the two variances should be approximately the same. Thus, the trend strength can be defined as:

$$F_T = \max\left(0,\ 1 - \frac{\mathrm{Var}(R_t)}{\mathrm{Var}(T_t + R_t)}\right)$$

This gives a measure of the strength of the trend between 0 and 1. Since in some cases the variance of the residual term is even larger than that of the seasonally adjusted sequence, the minimum value of $F_T$ is set to 0. Similarly, the seasonal strength is defined as follows, where the data used is the detrended data rather than the seasonally adjusted data:

$$F_s = \max\left(0,\ 1 - \frac{\mathrm{Var}(R_t)}{\mathrm{Var}(S_t + R_t)}\right)$$

When the seasonal strength $F_s$ is close to 0, the sequence has almost no seasonality; when $F_s$ is close to 1, $\mathrm{Var}(R_t)$ of the sequence is much smaller than $\mathrm{Var}(S_t + R_t)$.
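As a minimal sketch of these two measures, assuming the decomposition into trend, seasonal and residual components has already been computed by some other routine, each strength can be taken as one minus the ratio of the residual variance to the variance of the component plus the residual, floored at 0. The function names are hypothetical.

```python
from statistics import pvariance


def strength(component, residual):
    """max(0, 1 - Var(R) / Var(component + R)) for aligned sequences."""
    combined = [c + r for c, r in zip(component, residual)]
    denom = pvariance(combined)
    if denom == 0:
        return 0.0  # no variation at all: treat the strength as zero
    return max(0.0, 1.0 - pvariance(residual) / denom)


def trend_strength(trend, residual):
    """Uses the seasonally adjusted data T_t + R_t."""
    return strength(trend, residual)


def seasonal_strength(seasonal, residual):
    """Uses the detrended data S_t + R_t."""
    return strength(seasonal, residual)
```

A steadily rising trend with small residuals yields a value close to 1, whereas a flat trend yields a value close to 0.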

Therefore, after the time series feature matrix of each time series data is obtained, the time series feature matrices and a preset classification model can be used for classifying the plurality of time series prediction models to obtain target prediction models of different types of time series data. Specifically, the time series characteristic matrix may be used as an input of a classification model, and then the classification model may predict an optimal prediction model for the time series data according to the time series characteristic matrix. By the method, the accuracy of the prediction model can be improved, and the accuracy of the prediction result can be further improved.

It should be further added that the model classifier is obtained by training the initial network model using the historical time series data; the specific training process is described in detail below and is therefore not elaborated here.

In step S130, the target prediction model is used to predict data of each time series data in a future time period to obtain a prediction result, and the prediction result is displayed.

In the exemplary embodiment, the target prediction model is used for predicting the data of each time series data in the future time period to obtain the prediction result, and the prediction result is displayed, so that relevant personnel can prepare correspondingly according to the prediction result, for example, corresponding goods are prepared or corresponding logistics personnel are configured according to the prediction result, and further, the problems of economic loss and the like caused to enterprises due to no preparation can be avoided. Meanwhile, the financial market quotation can be predicted by the method, and loss can be stopped in time according to the prediction result, so that larger economic loss is avoided.

Fig. 2 schematically illustrates another time series-based data prediction method according to an exemplary embodiment of the present invention. Referring to fig. 2, the time-series based data prediction method may include steps S210 to S240, which will be described in detail below.

In step S210, training set data and verification set data are obtained according to the historical time series data, and each time series prediction model is trained by using each time series data in the training set data.

In this example embodiment, the historical time series data is sampled using a bootstrap sampling method (Bootstrapping) to obtain the training set data and the verification set data, and then each time series prediction model is trained (that is, the parameters in each time series prediction model are adjusted) using each time series data in the training set data. Specifically, the Bootstrapping process is formally described as follows: for a given processing task, a specific guided method for training the classification model is chosen. Two data sets are then required, typically a small labeled data set L and an unlabeled data set U. The labeled data set L is then expanded step by step using the unlabeled data set U, and the final classifier is trained on it to accomplish the specific processing task. The process of enlarging the labeled data set using the unlabeled data set is as follows:

First, a classifier h is trained on the labeled data set L (which may be very small) using the selected classification method; h is mainly used to assign labels to the data in the unlabeled data set U, and some heuristic rules or the like can generally be adopted. Then, h is used to label and classify U, the purpose being to obtain labeled data from U. Further, the data labeled with higher confidence is selected from the newly labeled data and added to the labeled data set. Finally, the above process is repeated until an iteration end condition is met. This method addresses a problem in the prior art: cross validation must divide the data set into a training set and a validation set, so when a series has few observations the division is difficult, too little training data makes the model hard to train, and too little validation data makes the model selection result unreliable. By increasing the amount of training set data and verification set data, the method can improve the accuracy of the model classifier.
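The iterative expansion of the labeled data set L using the unlabeled data set U can be sketched as follows. This is only an illustration under stated assumptions: the classifier h is replaced by a one-dimensional nearest-centroid rule, at least two classes are assumed, and the confidence score and threshold are hypothetical choices that do not come from the embodiment.

```python
def train_centroids(labeled):
    """Classifier h: the mean of the 1-D points carrying each label."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_confidence(centroids, x):
    """Nearest-centroid label plus a confidence in [0.5, 1]."""
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    conf = d2 / (d1 + d2) if (d1 + d2) > 0 else 0.5
    return label, conf


def self_train(labeled, unlabeled, threshold=0.75, max_rounds=10):
    """Grow the labeled set L with confidently labeled points from U."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        centroids = train_centroids(labeled)          # train h on L
        confident, remaining = [], []
        for x in pool:                                # label U with h
            label, conf = predict_with_confidence(centroids, x)
            target = confident if conf >= threshold else remaining
            target.append((x, label))
        if not confident:                             # end condition
            break
        labeled.extend(confident)                     # enlarge L
        pool = [x for x, _ in remaining]
        if not pool:
            break
    return train_centroids(labeled), labeled, pool
```

Points that are never labeled with sufficient confidence remain in the unlabeled pool, so the iteration ends either when the pool is empty or when no further confident labels are produced.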

In step S220, a plurality of prediction results are obtained by predicting each time series data in the verification set data by using each trained time series prediction model, and a difference between each prediction result and an actual result corresponding to each prediction result is calculated.

In this exemplary embodiment, first, a plurality of predicted results are obtained by predicting each time series data in the verification set data by using each trained time series prediction model, and then, a difference between each predicted result and an actual result is calculated, where the difference may include a mean square error value or a root mean square error value, and the like.

In step S230, the time-series prediction model with the smallest difference is used as the current prediction model of the time-series data corresponding to the prediction result.
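Steps S220 and S230 can be sketched as follows, using the root mean square error as the difference measure. The three candidate forecasters are deliberately simple, hypothetical stand-ins rather than the ARIMA, ETS, STL-AR or Theta models named in the embodiment.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def naive_forecast(train, horizon):
    """Repeat the last observed value."""
    return [train[-1]] * horizon


def mean_forecast(train, horizon):
    """Repeat the mean of the training period."""
    m = sum(train) / len(train)
    return [m] * horizon


def drift_forecast(train, horizon):
    """Extend the line through the first and last points (needs >= 2)."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (h + 1) for h in range(horizon)]


CANDIDATES = {"naive": naive_forecast,
              "mean": mean_forecast,
              "drift": drift_forecast}


def select_best_model(train, validation, candidates=CANDIDATES):
    """Return the name of the model with the smallest validation RMSE."""
    errors = {name: rmse(forecast(train, len(validation)), validation)
              for name, forecast in candidates.items()}
    return min(errors, key=errors.get), errors
```

For a steadily increasing series the drift forecaster is selected, while for a series oscillating around a constant level the mean forecaster is selected.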

In step S240, an initial network model is trained by using the current prediction model of each time series data in the verification set data and the time series feature matrix of each time series data, so as to obtain the model classifier.

In this exemplary embodiment, first, the time series feature matrix of each time series data in the verification set data is input into the initial network model to obtain a plurality of output results, where the initial network model comprises at least one of a decision tree model, a boosted tree model (XGBoost model), a random forest model, and a neural network model; secondly, it is judged whether each output result is the same as the current prediction model of the corresponding time series data; and finally, the initial network model is taken as the model classifier when each output result is determined to be the same as the corresponding current prediction model.

Hereinafter, the data prediction method based on time series according to the exemplary embodiment of the present invention will be further explained and described with reference to fig. 3. Referring to fig. 3, the time series-based data prediction method may include the following steps:

Stage one: training the model classifier

Step S301, acquiring historical time series data from a data set;

Step S302, performing Bootstrap sampling on the historical time series data to obtain training set data and verification set data;

Step S303, training a plurality of time series prediction models by using the training set data, and predicting the verification set data by using each trained time series prediction model to obtain a plurality of prediction results;

Step S304, calculating the difference between each prediction result and the real result, and taking the time series prediction model with the minimum difference as the optimal prediction model;

Step S305, extracting a time series feature matrix of the training set data, and obtaining a model classifier according to the time series feature matrix and the optimal prediction model;

Stage two: prediction model selection

Step S306, extracting the time series feature matrix of each time series data included in the historical time series data;

Step S307, selecting a corresponding target prediction model for each time series data according to each time series feature matrix and the model classifier;

Step S308, predicting by using the corresponding target prediction model.
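The two stages can be sketched end to end as follows. This is a simplified illustration under several assumptions: only two toy features and two toy forecasters are used, and the model classifier is replaced by a 1-nearest-neighbour lookup rather than the decision tree, boosted tree, random forest or neural network models named in the embodiment. All function names are hypothetical.

```python
import math


def slope(series):
    """Least-squares slope of the series against its time index."""
    n = len(series)
    t_mean, y_mean = (n - 1) / 2, sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den if den else 0.0


def lag1_autocorr(series):
    """Sample autocorrelation at lag 1."""
    m = sum(series) / len(series)
    den = sum((x - m) ** 2 for x in series)
    if den == 0:
        return 0.0
    return sum((series[i] - m) * (series[i - 1] - m)
               for i in range(1, len(series))) / den


def features(series):
    return [slope(series), lag1_autocorr(series)]


def mean_forecast(train, horizon):
    return [sum(train) / len(train)] * horizon


def drift_forecast(train, horizon):
    step = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + step * (h + 1) for h in range(horizon)]


MODELS = {"mean": mean_forecast, "drift": drift_forecast}


def rmse(predicted, actual):
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def label_samples(sampled_series, holdout=2):
    """Stage one: pair each sampled series' features with its best model."""
    rows = []
    for s in sampled_series:
        train, valid = s[:-holdout], s[-holdout:]
        errors = {name: rmse(f(train, holdout), valid)
                  for name, f in MODELS.items()}
        rows.append((features(train), min(errors, key=errors.get)))
    return rows


def classify(rows, feature_vec):
    """Stand-in classifier: label of the nearest labelled feature row."""
    return min(rows, key=lambda r: math.dist(r[0], feature_vec))[1]


def forecast_all(rows, all_series, horizon):
    """Stage two: pick a model per series from its features, then predict."""
    results = []
    for s in all_series:
        name = classify(rows, features(s))
        results.append((name, MODELS[name](s, horizon)))
    return results
```

In stage two no candidate model is fitted for comparison; only the model chosen by the classifier is run for each series, which is where the saving in model selection time comes from.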

Fig. 4 schematically illustrates a time-series-based logistics cargo volume prediction method according to an exemplary embodiment of the present invention. Referring to fig. 4, the method for predicting the logistics cargo volume based on the time series may include steps S410 to S440, which will be described in detail below.

In step S410, historical cargo quantity time-series data is acquired, and a time-series feature matrix of the historical cargo quantity time-series data is obtained according to a time-series feature of the historical cargo quantity time-series data.

In step S420, classifying a plurality of time series prediction models by using a time series feature matrix and a preset model classifier to obtain a target prediction model of the historical cargo quantity time series data; and the model classifier is obtained by training a boosted tree algorithm model by utilizing the historical cargo quantity time series data.

In step S430, the target prediction model is used to predict the data of the historical cargo quantity time-series data in the future time period to obtain a prediction result, and the prediction result is displayed, so that the user configures the required physical distribution cargo quantity according to the prediction result.

In the exemplary embodiment schematically illustrated in fig. 4, in a first aspect, the following problem of the prior art is solved: the model selection method based on time series cross validation needs to fit every model to every time series separately in order to select a model, so the time complexity is of a high order and the model selection efficiency is low. The selection efficiency of the target prediction model is thus improved, as is the efficiency of predicting the data of each time series data in the future time period. In a second aspect, the accuracy of the prediction result is improved. In a third aspect, related personnel can prepare corresponding goods or configure corresponding logistics personnel and the like according to the prediction result, thereby avoiding problems such as economic loss caused to enterprises by lack of preparation. For example, based on the prediction, enough goods or enough courier delivery personnel may be allocated before the Double 11 event begins.

It should be further added that, the parts related to defining and explaining the methods in steps S110 to S130 in the foregoing data prediction method based on time series also apply to steps S410 to S430, and are not described herein again to avoid redundant contents.

Fig. 5 schematically illustrates another time-series-based logistics cargo volume prediction method according to an exemplary embodiment of the present invention. Referring to fig. 5, the method for predicting the logistics cargo volume based on the time series may further include steps S510 to S540, which will be described in detail below.

In step S510, the historical cargo quantity time-series data is normalized, and training set data and verification set data are obtained according to the normalized historical cargo quantity time-series data.

In the exemplary embodiment, first, the historical cargo quantity time series data (hereinafter referred to as time series data) is normalized, so that the training set data and the verification set data are more generally applicable. Then, 1000 pieces of sample data are drawn from the whole historical cargo quantity time series data using the bootstrap method, and each time series in this sample is divided into training set data and verification set data. Specifically, fig. 6 lists 3 pieces of time series data, where part 601 of each time series graph is training set data and part 602 is verification set data.
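The preparation in step S510 can be sketched as follows. The embodiment does not state which normalization is used, so min-max scaling is assumed here; the sampling is drawing with replacement, and the length of the verification period is a hypothetical parameter.

```python
import random


def min_max_normalize(series):
    """Scale a series to [0, 1] (assumed normalization method)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)  # constant series: nothing to scale
    return [(x - lo) / (hi - lo) for x in series]


def bootstrap_sample(collection, n_samples=1000, seed=0):
    """Draw n_samples time series from the collection with replacement."""
    rng = random.Random(seed)
    return [rng.choice(collection) for _ in range(n_samples)]


def split_train_validation(series, validation_len):
    """Earlier part for training, final validation_len points for checking."""
    if not 0 < validation_len < len(series):
        raise ValueError("validation_len must be between 1 and len(series) - 1")
    return series[:-validation_len], series[-validation_len:]
```

Splitting by time rather than at random keeps the verification period strictly after the training period, which matches the layout of parts 601 and 602 in fig. 6.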

In step S520, the training set data is used to train each time sequence prediction model, and the trained time sequence prediction models are used to predict the verification set data to obtain a plurality of prediction results.

In step S530, a difference between each of the predicted results and an actual result corresponding to each of the predicted results is calculated, and a time-series prediction model having the smallest difference is used as a target prediction model of the historical cargo amount time-series data.

In step S540, an initial classifier is trained according to the target prediction model and the time sequence feature matrix of the verification set data, so as to obtain a model classifier of the historical cargo amount time series data.

Hereinafter, steps S520 to S540 will be explained and described. Specifically, first, the 5 candidate time series prediction models are each fitted to the training set data, and a time series feature matrix (statistics) is extracted from the training period data; partial calculation results are listed in the time series feature matrix in fig. 6. Then, the 5 candidate prediction models are used to predict the data in the verification period, and the root mean square error is calculated by comparing the predicted values with the true values. The model with the minimum root mean square error is recorded as the optimal model for that time series. As shown in the model result table of fig. 7, for example, for the time series data with id 50725 in the historical cargo quantity time series data, the ETS model has the smallest error, so the ETS model should be selected as its optimal prediction model. Finally, for each time series data, a new sample is formed from its time series feature matrix and its optimal model label, which describes the model with the best prediction performance for a time series with those features. The time series features of all time series and their model labels are collected, and a model classifier is trained on this data using the XGBoost tree algorithm.

Further, the time series feature matrix is used as the input of the model classifier trained with the XGBoost tree algorithm, and the classifier predicts the "optimal" prediction model for each time series. As shown in fig. 8, taking the time series data with id 56712 in the historical cargo quantity time series data as an example, after the value of each feature is calculated, the model selection classifier obtained in stage one is used for prediction, and the result is the ETS model. As a verification, the prediction performance of each of the 5 models on this time series can be computed directly; the root mean square error obtained with the ETS model is found to be the smallest, which is consistent with the prediction of the model selection classifier.

Furthermore, the method provided by the invention can avoid directly processing all time sequence data through data sampling, then establishes a model selection classifier based on the sampled data, extracts information of different models with different time sequence characteristics, and uses the information for model selection of all time sequence data. Compared with the model selection of performing time sequence cross validation on all time sequence data one by one, the method can save a large amount of model selection calculation time by using the knowledge of the classifier.

The embodiment of the invention also provides a logistics system. Referring to fig. 9, the logistics system may include a distribution station 910, a delivery device 920 and a receiving device 930; the distribution station, the delivery device and the receiving device may be communicatively connected, and the delivery device may be, for example, an unmanned aerial vehicle, an unmanned vehicle, a robot, or the like, which will be described in detail below.

Specifically, taking the large annual Double 11 event as an example, the logistics information involved in the next Double 11 event can be predicted using the time-series-based logistics cargo quantity prediction method, so that the goods required and the corresponding delivery devices, receiving devices and the like can be better prepared, which further increases the logistics speed and improves the user experience.

Specifically, first, time series data of each category in previous Double 11 events is acquired, and a time series feature matrix of the time series data of each category is obtained according to the time series features of the time series data of each category; the time series data can include the sales volume, order sending time, receiving time and logistics information of each category in each region;

secondly, the plurality of time series prediction models are classified using the time series feature matrix and a preset model classifier to obtain a target prediction model for the time series data of each category; each model classifier is obtained by training a boosted tree algorithm model using the time series data of each category in the previous year's Double 11 event;

finally, each target prediction model is used to predict the time series data of each category in the next year and the year after, so that the distribution station can configure the required stock, delivery devices, receiving devices and the like according to the prediction result, and users can receive the purchased goods in time, thereby improving the user experience; in addition, more users can be attracted, which improves competitiveness and increases enterprise benefits.

The embodiment of the invention also provides a data prediction device based on the time series. Referring to fig. 10, the time-series based data prediction apparatus may include a data acquisition module 1010, a target prediction model determination module 1020, and a data prediction module 1030. Wherein:

the data obtaining module 1010 may be configured to obtain historical time series data, and obtain a time series feature matrix of each time series data according to a time series feature of each time series data in the historical time series data.

The target prediction model determining module 1020 may be configured to classify a plurality of time series prediction models by using a time series feature matrix of each time series data and a preset model classifier, so as to obtain a target prediction model of each time series data; and the model classifier is obtained by training an initial network model by using the historical time series data.

The data prediction module 1030 may be configured to predict, by using the target prediction model, data of each time series data in a future time period to obtain a prediction result, and display the prediction result.
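The cooperation of the three modules can be sketched as a single class whose collaborators are supplied as plain callables. The class name, field names and signatures are hypothetical, and displaying the result is reduced to returning it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Series = Sequence[float]
Forecaster = Callable[[Series, int], List[float]]


@dataclass
class DataPredictionDevice:
    # Role of the data acquisition module 1010: features per series.
    extract_features: Callable[[Series], List[float]]
    # Role of the target prediction model determination module 1020.
    classify: Callable[[List[float]], str]
    # Candidate models available to the data prediction module 1030.
    models: Dict[str, Forecaster]

    def predict(self, history: Sequence[Series],
                horizon: int) -> List[Tuple[str, List[float]]]:
        """Select a target model for each series and forecast with it."""
        results = []
        for series in history:
            name = self.classify(self.extract_features(series))
            results.append((name, self.models[name](series, horizon)))
        return results
```

Passing the feature extractor, classifier and models in as arguments keeps the modules independent, which is consistent with the later statement that the division into modules is not mandatory.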

In an exemplary embodiment of the present disclosure, the time-series based data prediction apparatus further includes:

the first training module may be configured to obtain training set data and verification set data according to the historical time series data, and train each time series prediction model by using each time series data in the training set data.

The first difference calculation module may be configured to predict each time series data in the verification set data by using each trained time series prediction model to obtain a plurality of prediction results, and calculate a difference between each prediction result and an actual result corresponding to each prediction result.

The first current prediction model determining module may be configured to use a time-series prediction model with a smallest difference as a current prediction model of the time-series data corresponding to the prediction result.

The first model classifier determining module may be configured to train an initial network model by using a current prediction model of each time series data in the verification set data and a time series feature matrix of each time series data, so as to obtain the model classifier.

In an exemplary embodiment of the present disclosure, training the initial network model by using the current prediction model of each time series data in the verification set data and the time series feature matrix of each time series data to obtain the model classifier includes: respectively inputting the time series feature matrix of each time series data in the verification set data into the initial network model to obtain a plurality of output results, wherein the initial network model comprises at least one of a decision tree model, a boosted tree model, a random forest model and a neural network model; judging whether each output result is the same as the current prediction model of the corresponding time series data; and taking the initial network model as the model classifier when it is determined that each output result is the same as the corresponding current prediction model.

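In an exemplary embodiment of the present disclosure, obtaining the time series feature matrix of each time series data according to the time series features of each time series data in the historical time series data includes: extracting the time series features of each time series data in the historical time series data, wherein the time series features include a plurality of the following: time series length, trend, seasonality, linearity, steepness increase, spectral entropy, interval, volatility, autocorrelation, and partial autocorrelation; and obtaining the time series feature matrix of each time series according to each time series feature.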

In an exemplary embodiment of the present disclosure, the time series prediction models include a plurality of the following: an autoregressive integrated moving average model, an exponential smoothing model, a time series decomposition model, a probabilistic graphical model, and a naive Bayes model.

The specific details of each module in the time-series-based data prediction apparatus have been described in detail in the corresponding time-series-based data prediction method, and therefore are not described herein again.

The embodiment of the invention also provides a logistics cargo quantity prediction device based on the time series. Referring to fig. 11, the time-series-based logistics cargo quantity prediction apparatus may include a time-series feature matrix determination module 1110, a time-series prediction model classification module 1120, and a prediction result presentation module 1130. Wherein:

the time sequence feature matrix determination module 1110 may be configured to obtain historical cargo quantity time series data, and obtain a time sequence feature matrix of the historical cargo quantity time series data according to a time sequence feature of the historical cargo quantity time series data.

The time series prediction model classification module 1120 may be configured to classify a plurality of time series prediction models by using a time series feature matrix and a preset model classifier to obtain a target prediction model of the historical cargo quantity time series data; and the model classifier is obtained by training a boosted tree algorithm model by utilizing the historical cargo quantity time series data.

The prediction result display module 1130 may be configured to predict data of the historical cargo time-series data in a future time period by using the target prediction model to obtain a prediction result, and display the prediction result, so that a user configures a required cargo amount according to the prediction result.

In an exemplary embodiment of the present disclosure, the time-series-based logistics cargo amount prediction apparatus further includes:

The normalization processing module may be configured to normalize the historical cargo quantity time series data and to obtain training set data and verification set data according to the normalized historical cargo quantity time series data.

The data prediction module may be configured to train each time sequence prediction model by using the training set data, and predict the verification set data by using each trained time sequence prediction model to obtain a plurality of prediction results;

a second difference calculation module, configured to calculate a difference between each of the predicted results and an actual result corresponding to each of the predicted results, and use a time series prediction model with a smallest difference as a target prediction model of the historical cargo amount time series data;

and the second model classifier determining module is used for training an initial classifier according to the target prediction model and the time sequence feature matrix of the verification set data to obtain the model classifier of the historical cargo quantity time sequence data.

The specific details of each module in the above mentioned device for predicting the logistics cargo volume based on time series have been described in detail in the corresponding method for predicting the logistics cargo volume based on time series, and therefore are not described herein again.

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

Moreover, although the steps of the methods of the present invention are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.

In an exemplary embodiment of the present invention, there is also provided an electronic device capable of implementing the above method.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module" or "system."

An electronic device 1200 according to this embodiment of the invention is described below with reference to fig. 12. The electronic device 1200 shown in fig. 12 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 12, the electronic device 1200 is embodied in the form of a general purpose computing device. The components of the electronic device 1200 may include, but are not limited to: the at least one processing unit 1210, the at least one memory unit 1220, and a bus 1230 connecting the various system components including the memory unit 1220 and the processing unit 1210.

Wherein the memory unit stores program code that is executable by the processing unit 1210 such that the processing unit 1210 performs steps according to various exemplary embodiments of the present invention as described in the above section "exemplary methods" of the present specification. For example, the processing unit 1210 may perform step S110 as shown in fig. 1: acquiring historical time sequence data, and acquiring a time sequence feature matrix of each time sequence data according to the time sequence feature of each time sequence data in the historical time sequence data; step S120: classifying a plurality of time sequence prediction models by using a time sequence characteristic matrix of each time sequence data and a preset model classifier to obtain a target prediction model of each time sequence data; the model classifier is obtained by training an initial network model by using the historical time series data; step S130: and predicting the data of each time sequence data in the future time period by using the target prediction model to obtain a prediction result, and displaying the prediction result.

The processing unit 1210 may also perform step S410 as shown in fig. 4: acquiring historical cargo quantity time series data, and obtaining a time series feature matrix of the historical cargo quantity time series data according to the time series features of the historical cargo quantity time series data; step S420: classifying a plurality of time series prediction models by using a time series feature matrix and a preset model classifier to obtain a target prediction model of the historical cargo quantity time series data, wherein the model classifier is obtained by training a boosted tree algorithm model by utilizing the historical cargo quantity time series data; step S430: predicting the data of the historical cargo quantity time series data in the future time period by using the target prediction model to obtain a prediction result, and displaying the prediction result so that a user configures the required logistics cargo quantity according to the prediction result.

The storage unit 1220 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 12201 and/or a cache memory unit 12202, and may further include a read only memory unit (ROM) 12203.

Storage unit 1220 may also include a program/utility 12204 having a set (at least one) of program modules 12205, such program modules 12205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 1230 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 1200 may also communicate with one or more external devices 1300 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1200, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1200 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 1250. Also, the electronic device 1200 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 1260. As shown, the network adapter 1260 communicates with the other modules of the electronic device 1200 via the bus 1230. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiment of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to execute the method according to the embodiment of the present invention.

In an exemplary embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.

The program product for implementing the above method according to an embodiment of the present invention may take the form of a portable compact disc read-only memory (CD-ROM) that includes program code, and may be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited in this regard; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).

Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims

1. A data prediction method based on time series is characterized by comprising the following steps:

2. The time-series based data prediction method of claim 1, further comprising:

3. The method of claim 2, wherein deriving training set data and validation set data from the historical time series data comprises:

4. The method of claim 2, wherein training an initial network model using the current prediction model of each time series data in the validation set data and the time series feature matrix of each time series data to obtain the model classifier comprises:

5. The method of claim 1, wherein obtaining the time series feature matrix of each time series according to the time series features of each time series data in the historical time series comprises:

6. The time-series based data prediction method of any one of claims 1-5, wherein the time-series prediction model comprises an autoregressive integrated moving average (ARIMA) model, an exponential smoothing model, a time-series decomposition model, a Theta model, and

a plurality of models.

7. A logistics cargo quantity prediction method based on time series is characterized by comprising the following steps:

8. The method of predicting the logistics cargo volume based on time series according to claim 7, wherein the method of predicting the logistics cargo volume based on time series further comprises:

9. A time-series-based data prediction apparatus, comprising:

10. A time-series-based logistics cargo quantity prediction apparatus is characterized by comprising:

11. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the time-series based data prediction method of any one of claims 1 to 6 or the time-series based logistics cargo amount prediction method of any one of claims 7 to 8.

12. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the time series based data prediction method of any one of claims 1-6 or the time series based logistics cargo volume prediction method of any one of claims 7-8 via execution of the executable instructions.