WO2022106297A1

WO2022106297A1 - Univariate forecasting

Info

Publication number: WO2022106297A1
Application number: PCT/EP2021/081413
Authority: WO
Inventors: Uday SHARMA
Original assignee: Unilever Ip Holdings B.V.; Unilever Global Ip Limited; Conopco, Inc., D/B/A Unilever
Priority date: 2020-11-20
Filing date: 2021-11-11
Publication date: 2022-05-27

Abstract

A method and apparatus for generating a demand model for a resource, the apparatus comprising: one or more processors; an output device; and computer memory, the computer memory comprising computer program code configured to: receive historical demand data for the resource; determine, from the historical demand data, a stage in a lifecycle of the resource, and if the resource is available and at least a predetermined amount of historical data is associated with the resource: separate the historical demand data into a training dataset and a testing dataset; apply a plurality of models and sets of parameters to the training data to generate results using each of the models and corresponding parameters; and select one of the models and parameters as the demand model based on a comparison of an error between the testing dataset and the results for each model and corresponding parameters.

Description

UNIVARIATE FORECASTING

Field of the invention

The invention relates to a system and method for generating a resource consumption forecast, and in particular, although not exclusively, to generating hardware resources usage forecasts or product sales forecasts.

Background of the invention

Univariate forecasting methods allow a prediction to be made from past data values for a single series. For example, future values of an attribute in a time series may be assumed to be based solely on past values. Such analysis methods are known in the art and can be used to perform forecasting, such as demand forecasting. Demand forecasting is of practical application in a wide range of fields in order to determine the likely future consumption of a resource, such as the use of energy by users in a power network, the use of computational resources by physical machines or virtual machines on a network or in another system, or the consumption of a commodity by a population. Such forecasting has also found application to predict demand for a product, such as a consumer product. For example, products in the fast-moving consumer goods (FMCG) market, including foods, beauty and household care segments, typically exhibit unique behaviour with respect to sales trends, market externalities, seasonality and other sales up-lifts/down-lifts. The nature of differences in sales patterns of such products makes it impracticable for a demand planner to use a common timeseries-based model to forecast sales for a wide range of products in a large portfolio. Moreover, where a wide range of products are present in such a product portfolio, it can be challenging and time consuming for a demand planner to select an appropriate model in order to model the sales behaviour of each product. Such a process would typically require substantial expertise and skill on the part of the demand planner in order to determine a suitable model. Some aspects of the present disclosure relate to alleviating the above difficulties.

Summary of the invention

According to a first aspect of the present disclosure, there is provided a method for generating a demand model for a resource, comprising: receiving historical demand data for the resource; determining, from the historical demand data, a stage in a lifecycle of the resource, and if the resource is available and at least a predetermined amount of historical data is associated with the resource: separating the historical demand data into a training dataset and a testing dataset; applying a plurality of models and sets of parameters to the training data to generate results using each of the models and corresponding parameters; and selecting one of the models and parameters as the demand model based on a comparison of an error between the testing dataset and the results for each model and corresponding parameters.

In this way, the method allows the selection of an appropriate model and parameter set for a given set of demand data without the need for detailed operator input or skill. As such, the process of demand planning may be made more efficient and less time consuming for the demand planner.

The method may comprise applying a smoothing function to the training data set. The smoothing function may include a residual outlier cleaning function. The provision of the outlier cleaning function may allow events that cannot be forecast to be removed. The historical demand data may comprise time series data for the resource. The results, or predictions, may correspond to the same time series as the testing dataset.

The resource may have a lifecycle. The method may include determining a stage of the resource in its lifecycle. The resource may be mid-lifecycle if it is available and established, in which at least a predetermined amount of historical data is associated with the resource.

The demand model may be a sales forecast. The historical demand data may be historical sales data. The resource may be a product, such as a consumer product. A product that is available and established may be considered to be mid-lifecycle. An available product may also be referred to as an active product, and may be a product that a supplier is willing to supply at present or in the future. The product may be considered to be mid-lifecycle if more than a predetermined number of the most recent available sales periods indicate that sales have been made.

The resource may be a technical resource. The historical demand data may be data concerning a physical system.

Determining the stage in the lifecycle may comprise classifying the resource as one of: a new resource, a mid-lifecycle resource, a decommissioned resource. A new resource may be a resource for which less than a predetermined period of historical data is available. The predetermined period of historical data may be one of 1 month, 3 months, 6 months, 12 months, 24 months, or 36 months, for example.

A decommissioned resource is one for which there was less than a predetermined level of demand in a preceding period of historical data. The predetermined level of demand may be expressed as a minimum number of units of the resource.

A new product may have fewer than 12 periods (e.g. months) of sales data. A decommissioned product may have fewer than a predetermined number of sales in the past 12 months. A decommissioned product may have had no sales within a predetermined period. Automated model selection may not be used for new products. The future sales of decommissioned products may be forecast as zero sales.

The method may comprise determining if the resource is a decommissioned resource. The method may comprise forecasting the future demand for a decommissioned resource as no demand.
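The lifecycle classification described above can be illustrated with a minimal Python sketch. The function names, the default of 12 periods, the 12-period lookback and the minimum demand of one unit are illustrative assumptions rather than values fixed by the disclosure, as is the choice to test for a new resource before testing for a decommissioned one.

```python
from typing import List, Sequence


def classify_lifecycle(
    demand: Sequence[float],
    min_periods: int = 12,           # assumed "predetermined period" of history
    lookback: int = 12,              # assumed window used to detect decommissioning
    min_recent_demand: float = 1.0,  # assumed "predetermined level of demand"
) -> str:
    """Classify a resource as 'new', 'decommissioned' or 'mid-lifecycle'."""
    if len(demand) < min_periods:
        # Too little history for automated model selection.
        return "new"
    if sum(demand[-lookback:]) < min_recent_demand:
        # Effectively no demand in the most recent periods.
        return "decommissioned"
    return "mid-lifecycle"


def forecast_decommissioned(horizon: int) -> List[float]:
    """Forecast future demand for a decommissioned resource as no demand."""
    return [0.0] * horizon
```

Only resources classified as mid-lifecycle would then proceed to the training/testing split and model selection.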

The plurality of model algorithms may include one or more of: TBATS, Additive ETS (ETS AAA), Dynamic ETS (ETS ZZZ), Dynamic ETS with fixed additive seasonality (ETS ZZA), ARIMA, and a completely customized implementation of ETS known as HUL ETS.

The method may be performed on historical data for a plurality of resources.

There is also provided a method and associated apparatus for generating a demand model for a resource, the method comprising: receiving historical demand data for the resource; determining, from the historical demand data, a stage in a lifecycle of the resource, and if the resource is available and at least a predetermined amount of historical data is associated with the resource: separating the historical demand data into a training dataset and a testing dataset; applying a plurality of models to the training data to determine a set of parameters for each model; generating results using each of the models and corresponding determined parameters; and selecting one of the models and parameters as the demand model based on a comparison of an error between the testing dataset and the results for each model and corresponding determined parameters.

According to a further aspect of the disclosure there is provided an apparatus for generating a demand model for a resource, the apparatus comprising: one or more processors; an output device; and computer memory, the computer memory comprising computer program code configured to: receive historical demand data for the resource; determine, from the historical demand data, a stage in a lifecycle of the resource, and if the resource is available and at least a predetermined amount of historical data is associated with the resource: separate the historical demand data into a training dataset and a testing dataset; apply a plurality of models and sets of parameters to the training data to generate results using each of the models and corresponding parameters; and select one of the models and parameters as the demand model based on a comparison of an error between the testing dataset and the results for each model and corresponding parameters.

Brief Description of Figures

One or more embodiments will now be described by way of example only with reference to the accompanying drawings in which:

Figure 1 illustrates a method according to a first embodiment;

Figure 2 illustrates a method according to a second embodiment;

Figures 3a and 3b illustrate additive and multiplicative variants of ETS applied to the same historical data; and

Figure 4 illustrates a schematic block diagram of a computer system.

Detailed Description of the Invention

Demand Model

According to a first aspect there is provided a method for generating a demand model for a resource according to a first embodiment. The method may be performed as part of an automated process.

The demand model may be a model for forecasting future consumption of a resource, such as energy, computer or natural resources, or sales data, for example. The terms “demand” and “consumption” may be used interchangeably herein with reference to the model.

The method comprises receiving historical demand data for the resource. The historical demand data may comprise resource consumption data, such as energy in a power network, computational resource, or natural resources such as commodities from a mine, or sales data for a particular product, for example. Typically, the historical demand data is time series data for the resource.

From the historical demand data, a stage in a lifecycle of the resource is determined. Determining the stage in the lifecycle may comprise classifying the resource as one of: a new resource, a mid-lifecycle resource, a decommissioned resource. Alternatively, determining a stage in a lifecycle of the resource may simply comprise determining whether the resource is available and established. The method tests whether the resource is available and established, which corresponds to a mid-lifecycle resource. For mid-lifecycle resources, predictions of future behaviour in consumption of the resource may be based on the historical data, whereas if the resource is only recently available then there may not be sufficient historical data to make such predictions with a suitable degree of confidence. Particularly where the method is performed as part of an automated process, it may be desirable for the method not to provide predictions where there is insufficient historical data because those predictions may be erroneously relied upon by the user of the automated method.

Data concerning mid-lifecycle resources may be processed. If the resource is found to be available and established (in that a predetermined amount of historical data is available), the historical demand data are separated into a training dataset and a testing dataset. In examples where the historical data are timeseries data, the training dataset may be provided by an earlier portion of the timeseries data than the testing dataset.

In some examples, it may be preferable to apply a smoothing function to the training data set in order to remove statistical outliers. A residual outlier cleaning function is an example of a class of smoothing function that may be applied using principles known in the art.

A plurality of model algorithms is applied to the training data to determine a set of parameters for each model algorithm, and a prediction is generated using each of the model algorithms and corresponding determined parameters. The determination may be based on trialing predetermined combinations of parameters. The plurality of algorithms may include disparate algorithms, in that the various algorithms may be suited to modelling different types of data. The plurality of algorithms may be provided by different standard statistical algorithms that are known in the field of applying univariate forecasting techniques. One of the model algorithms and parameters is selected as the forecasting model based on, for each model algorithm and parameter combination, a comparison of a variance, which may also be referred to as an error, between the testing dataset and the prediction.

The selected algorithm and parameter set determined by the method may be used to predict future demand for the corresponding product using other resource demand data that is not part of the training dataset or the testing dataset. In this way, the most relevant of a number of different types of algorithms may be selected without a priori knowledge of the nature of the data that the process is being applied to.
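The split-and-select procedure described above can be sketched in Python as follows. The three candidate forecasters (naive, mean and drift) are deliberately simple stand-ins chosen for illustration and are not the algorithms named in this disclosure; all function names are hypothetical, and the sum of squared errors is used as one possible error measure.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Forecaster = Callable[[Sequence[float], int], List[float]]


def split_series(series: Sequence[float], test_size: int) -> Tuple[List[float], List[float]]:
    """Chronological split: the earlier portion trains, the later portion tests."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return list(series[:-test_size]), list(series[-test_size:])


def naive(train: Sequence[float], h: int) -> List[float]:
    return [train[-1]] * h


def mean_model(train: Sequence[float], h: int) -> List[float]:
    return [sum(train) / len(train)] * h


def drift(train: Sequence[float], h: int) -> List[float]:
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (k + 1) for k in range(h)]


def sse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def select_model(
    series: Sequence[float], candidates: Dict[str, Forecaster], test_size: int
) -> Tuple[str, Dict[str, float]]:
    """Fit each candidate on the training data and keep the lowest test error."""
    train, test = split_series(series, test_size)
    errors = {name: sse(test, model(train, len(test))) for name, model in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors


best, errors = select_model(
    [10, 12, 14, 16, 18, 20, 22, 24],
    {"naive": naive, "mean": mean_model, "drift": drift},
    test_size=2,
)
```

For this steadily increasing series the drift model reproduces the held-out values exactly and is therefore selected.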

In general, it will be appreciated that the order in which steps are performed may be unimportant for some steps. For example, the historical demand data may be separated into training data and testing data before the stage in the resource lifecycle is established.

The method may be performed on historical data for a plurality of resources.

In one version, if the resource is a decommissioned resource, that is a resource that is no longer in use, or no longer available, then future demand may be forecast as zero.

Computer-Resource Control System

In a further aspect, the method may be used to predict the state of a memory allocation in a computer system. In this example, the resource is physical or virtual computer memory and the historical data relates to memory allocation data for a particular computer program operating on the system. The prediction of the future memory requirements for the system running the program enables virtual memory to be paged in advance to ensure the efficient functioning of the system, in the case where physical memory is limited and there are conflicting demands on the storage space that would provide the virtual memory. In this example, memory usage associated with a particular application can be considered to be a resource and the method may be used to forecast memory usage for a plurality of applications.

E-commerce Platform

In another aspect, the method may be used to generate a sales forecasting model for use, for example, in an e-commerce platform. Such a method may comprise: receiving historical sales data for a product; determining, from the historical sales data, a stage in a lifecycle of the product, and if the product is mid-lifecycle: separating the historical sales data into a training dataset and a testing dataset; applying a plurality of models and sets of parameters to the training data to generate results using each of the models and corresponding parameters; and selecting one of the model algorithms and parameters as the forecasting model based on a comparison of a variance between the testing dataset and the prediction for each model algorithm.

Computer Programs and Hardware

According to a further aspect of the disclosure there is provided a data processing unit configured to perform any computer-implementable method described herein. The data processing unit may comprise one or more processors and memory, the memory comprising computer program code configured to cause the processor to perform any method described herein.

According to a further aspect of the disclosure there is provided a computer readable storage medium comprising computer program code configured to cause a processor to perform any computer-implementable method described herein. The computer readable storage medium may be a non-transitory computer readable storage medium.

There may be provided a computer program, which when run on a computer, causes the computer to configure any apparatus, including a circuit, unit, controller, device or system disclosed herein to perform any method disclosed herein. The computer program may be a software implementation. The computer may comprise appropriate hardware, including one or more processors and memory that are configured to perform the method defined by the computer program.

The computer program may be provided on a computer readable medium, which may be a physical computer readable medium such as a disc or a memory device, or may be embodied as a transient signal. Such a transient signal may be a network download, including an internet download. The computer readable medium may be a computer readable storage medium or non-transitory computer readable medium.

Examples

Figure 1 illustrates a flow diagram of a method 100 for generating a demand model for a resource according to a first embodiment. The method may be performed as part of an automated process.

The method 100 comprises receiving 102 historical demand data for the resource. From the historical demand data, a stage in a lifecycle of the resource is determined 104. Data concerning mid-lifecycle resources are processed. If the resource is found to be available and established at step 106, the historical demand data are separated 108 into a training dataset 110 and a testing dataset 112.

A plurality of model algorithms is applied to the training data to determine a set of parameters for each model algorithm, and a prediction 116 is generated 114 using each of the model algorithms and corresponding determined parameters.

One of the model algorithms and parameters is selected 118 as the forecasting model based on, for each model algorithm and parameter combination, a comparison of a variance, which may also be referred to as an error, between the testing dataset and the prediction.

Various aspects of the method described previously with reference to Figure 1 may be better understood by considering the specific embodiments discussed below regarding a commerce platform described with reference to Figure 2.

Figure 2 illustrates a flow diagram of a method 200 for generating a demand model for a resource according to a second embodiment. It will be appreciated that the second embodiment comprises all of the steps performed in the first embodiment, and includes further implementation details in the context of product sales forecasting. The method 100 may be used to provide an algorithm and associated parameters that are suitable for modelling the sales behaviour of a particular product, or basepack. The sales may be primary sales or offtakes.

The method 200 comprises the steps of:

1. Splitting 208 of data into training and testing sets.

2. Cleaning 220 the data. Negative sales or sales returns may be systematically rounded off to zero, for example. Such cleaning may be used to account for events that cannot be forecast.

3. Training data are then smoothed 222 using, for example, a residual outlier smoothing method. A box plot method may be applied to error residuals obtained after fitting the timeseries with Super Smoother Regression (supsmu) for non-seasonal packs and an STL decomposition for seasonal packs.

4. Then it is determined 204 whether the basepack is a new launch, decommissioned, or a regular pack (one with ongoing sales).
a. If it is decommissioned, sales are forecast as 0.
b. If it is a new launch, customized logic is used (designed with the demand planning team). The customized logic is not the subject of the present disclosure.

5. For regular packs, a plurality of models is then applied 214 in parallel. In this example, these models include the following algorithms:
a. TBATS - to handle timeseries with multiple seasonality
b. ETS ZZZ/ZZA - dynamic exponential smoothing (an exponential time series method) to handle multiplicative trend and additive seasonality cases
c. ARIMA - autoregressive integrated moving average model to handle cases where there is strong autocorrelation with previous sales
d. HUL ETS - a customised ETS which gives disproportionate weighting to recent sales over historic sales
e. MLP - multi-layer perceptron model to handle cases where traditional statistical methods fail
f. ETS AAA - baseline additive model to ensure fallback in case of failure of all the above methods

Properties of the above example algorithms are discussed below.
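The cleaning and residual outlier smoothing of steps 2 and 3 can be sketched in Python as follows. This is a simplified illustration: a centred rolling median is used as a stand-in for the Super Smoother or STL fit named above, and the window of 5 periods and the 1.5 x IQR box plot fences are assumptions, as are all function names.

```python
import statistics
from typing import List, Sequence


def clip_negative(sales: Sequence[float]) -> List[float]:
    """Round negative sales (e.g. returns) up to zero."""
    return [max(0.0, float(x)) for x in sales]


def rolling_median(values: Sequence[float], window: int = 5) -> List[float]:
    """Centred rolling median; a simple stand-in for supsmu or STL."""
    half = window // 2
    fitted = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        fitted.append(statistics.median(values[lo:hi]))
    return fitted


def smooth_outliers(values: Sequence[float], window: int = 5, k: float = 1.5) -> List[float]:
    """Replace points whose residual falls outside the box plot fences."""
    fitted = rolling_median(values, window)
    residuals = [v - f for v, f in zip(values, fitted)]
    q1, _, q3 = statistics.quantiles(residuals, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [
        f if (r < lower or r > upper) else v
        for v, f, r in zip(values, fitted, residuals)
    ]


raw = [10, 11, 10, 12, 11, 100, 10, 11, 12, 10, 11, 12]
cleaned = smooth_outliers(clip_negative(raw))
```

In this example the isolated spike of 100 is replaced by the local median, while the remaining points are left unchanged.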

ETS (Error Trend and Seasonality)

ETS, also known as exponential smoothing, is a widely used statistical forecasting method. The model includes methods to forecast the trend (T), seasonality (S), and error (E) components of a timeseries. Exponentially decreasing weighting is given to historic values of E, T and S using four control variables known as alpha, beta, gamma, and a dampening factor. The ETS model is robust, easy to understand and requires little data to run. This makes the ETS model a good base from which to start any ensemble forecasting data model, and it can be run in multiple variations based on how its subcomponents interact with each other.

An implementation of the ETS model is available in the R programming language from R. Hyndman et al., “Forecasting Functions for Time Series and Linear Models”, Package ‘forecast’, 31 March 2020, Version 8.12 (https://cran.r-project.org/web/packages/forecast/forecast.pdf).

Figures 3a and 3b illustrate additive and multiplicative variants 330, 332 of ETS applied to the same historical data.

Additive: In this variant 330, each component (L, T or S), where L is an error or base level of the forecast, is assumed to have a linear effect on the forecast. That is, the trend is for the value to increase over time but the rate of increase is constant.

Multiplicative: In this variant 332, each component (L, T or S) has a multiplicative effect on forecast values. That is, the trend increases at a multiplicative rate.
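The difference between the two variants can be illustrated for the trend component alone with a short Python sketch. Seasonality and damping are omitted for brevity, and the function names and numeric values are illustrative assumptions.

```python
from typing import List


def additive_trend_forecast(level: float, trend: float, horizon: int) -> List[float]:
    """Additive trend: the forecast grows by a constant amount per period."""
    return [level + h * trend for h in range(1, horizon + 1)]


def multiplicative_trend_forecast(level: float, trend: float, horizon: int) -> List[float]:
    """Multiplicative trend: the forecast grows by a constant factor per period."""
    return [level * trend ** h for h in range(1, horizon + 1)]


additive = additive_trend_forecast(100.0, 10.0, 3)              # +10 units per period
multiplicative = multiplicative_trend_forecast(100.0, 1.1, 3)   # +10% per period
```

Both forecasts start from the same level, but the multiplicative forecast pulls away from the additive one as the horizon increases.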

Three implementations of the ETS model are used as algorithms in the above method -

AAA - Additive Trend, Additive Level/Error and Additive Seasonality.

ZZA - Dynamically determined level/error and trend, additive seasonality. All variations of E and T are taken while S is fixed at additive (i.e. AAA, AMA, MAA, MMA). The forecast with the lowest Akaike information criterion (AIC) value is selected.

ZZZ - E, T and S are all dynamically determined. In this implementation, all three E, T and S parameters are dynamically determined by trying all the additive and multiplicative combinations, and then the forecast with the lowest Akaike information criterion (AIC) value is selected. The dampening factor may also be applied by default.

ARIMA

ARIMA stands for autoregressive integrated moving average forecasting process. It consists of three components:

AR: Relationship of the current value with p lagged (historic) values

MA: Relationship of the current deviation with q lagged deviations

I: d-order differencing done between successive values to make the time series stationary

An ARIMA model thus has a representation of the form ARIMA(p,d,q). ARIMA models are well suited to determining forecasts for series in which current values depend strongly on previous values. Also, ARIMA models do not assume any inherent pattern in the data and use an iterative approach to fit the data; this model is thus able to provide forecasts based on data with mixed or no clear pattern more effectively than ETS.

Auto ARIMA may be used, which dynamically checks each combination of p, d and q values and then selects a final model based on the lowest AIC score.
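Two of the ARIMA building blocks described above, d-order differencing and a first-order autoregressive relationship, can be illustrated with the following Python sketch. It is not an ARIMA implementation: the AR(1) coefficient is estimated by a simple least-squares fit without an intercept, which is an assumption made for brevity, and the function names are hypothetical.

```python
from typing import List, Sequence


def difference(series: Sequence[float], d: int = 1) -> List[float]:
    """Apply d rounds of differencing between successive values."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def fit_ar1(series: Sequence[float]) -> float:
    """Least-squares estimate of phi in x[t] = phi * x[t-1] (no intercept)."""
    prev, cur = series[:-1], series[1:]
    denominator = sum(p * p for p in prev)
    if denominator == 0:
        raise ValueError("series has no variation to fit")
    return sum(p * c for p, c in zip(prev, cur)) / denominator


quadratic = [1, 4, 9, 16, 25]
once = difference(quadratic, d=1)    # still trending
twice = difference(quadratic, d=2)   # constant after two rounds
phi = fit_ar1([8, 4, 2, 1])          # each value is half the previous one
```

A quadratic series needs two rounds of differencing before its level stops changing, which corresponds to d = 2 in the ARIMA(p,d,q) notation.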

An implementation of the ARIMA model is also available in the R programming language from R. Hyndman et al., “Forecasting Functions for Time Series and Linear Models”, Package ‘forecast’, 31 March 2020, Version 8.12 (https://cran.r-project.org/web/packages/forecast/forecast.pdf).

TBATS

TBATS, also known as Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components, is a widely used, publicly available statistical method to forecast demand for timeseries data. This algorithm uses a trigonometric representation of seasonality to forecast complex seasonal patterns. TBATS is able to represent non-integer length and multiple seasonality. TBATS builds on top of exponential smoothening methods and combines them with ARMA for residuals and trigonometric method to account for seasonality.
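The trigonometric representation of seasonality can be illustrated with the following Python sketch, which generates sine and cosine terms for a seasonal period that need not be an integer. This shows only the underlying idea; it is not the TBATS state-space formulation, and the function name, the period of 52.18 weeks and the number of harmonics are illustrative assumptions.

```python
import math
from typing import List


def fourier_terms(t: float, period: float, harmonics: int) -> List[float]:
    """Return [sin_1, cos_1, ..., sin_K, cos_K] evaluated at time index t."""
    terms = []
    for k in range(1, harmonics + 1):
        angle = 2.0 * math.pi * k * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


# Weekly data with a yearly cycle of roughly 52.18 weeks (non-integer period).
week_3 = fourier_terms(3, 52.18, harmonics=2)
one_year_later = fourier_terms(3 + 52.18, 52.18, harmonics=2)
```

Because the terms are periodic in t with the given period, the same values recur one full cycle later even though the period is not a whole number of observations.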

An implementation of the TBATS model is also available in the R programming language from R. Hyndman et al., “Forecasting Functions for Time Series and Linear Models”, Package ‘forecast’, 31 March 2020, Version 8.12 (https://cran.r-project.org/web/packages/forecast/forecast.pdf).

MLP

MLP, also known as multi-layer perceptron, is a basic neural network. MLP may be implemented to handle cases which otherwise cannot be predicted by traditional statistical methods. MLP trains a perceptron-based neural network, wherein the number of hidden nodes is determined automatically by random 20% sampling and 5-fold cross validation.

Implementations of the MLP model in the R programming language are available from N. Kourentzes, “nnfor” package, version 0.9.6 (https://www.rdocumentation.org/packages/nnfor/versions/0.9.6/topics/mlp).

HUL ETS

HUL ETS is a customized ETS implementation. HUL ETS uses standard additive Error, Trend, Seasonality and Dampening factors together with an additional parameter, “Residual-Weight”, which is added to give exponentially decreasing weighting to historic residual errors (on an absolute scale) while optimizing the model for the lowest sum of squared errors during the curve fitting stage. This technique emphasises the importance of recent data in determining the forecast.

An implementation of the HUL ETS model (in R code) based on the standard implementation referred to above is provided below -

xhat[i] = level[i-1] + trend[i-1]*phi + seasonal[i-period]
level[i] = alpha*(sales[i] - seasonal[i-period]) + (1-alpha)*(level[i-1] + trend[i-1])
trend[i] = beta*(level[i] - level[i-1]) + (1-beta)*trend[i-1]
seasonal[i] = gamma*(sales[i] - level[i-1] - trend[i-1]) + (1-gamma)*seasonal[i-period]
res[i] = sales[i] - xhat[i]

SSE = SSE + (res[i]^2) * rf^(n-i), with n the maximum value of i

Alpha, beta, gamma and phi are the traditional level, trend, seasonal and dampening factors, respectively. i is the time index value, with 0 being the first period in the historic values and the maximum value of i being the most recently available historic value. rf is a recency factor: rf = 0 means no weighting is given to historic residual values (strong recency in the timeseries) and rf = 1 means equal weighting (low recency). rf is different from alpha, beta, gamma and phi. The traditional factors are relative and describe how a current value is derived from exponentially decreasing contributions of historic values, whereas rf is a follow-up parameter that helps fit the optimized curve closer to recent values, rather than historic values, on an absolute scale of the time series.

HUL ETS may be particularly useful for resources in which the demand behaviour has changed during the time series and thus equal weighting should not be given to historic data while preparing the forecast.
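The recency-weighted error used during curve fitting can be sketched in Python as follows. The recurrences follow the R fragment given above, but two points are assumptions made for this illustration: the residual weight is taken to be rf raised to the power (n - i), which is consistent with the description of rf = 0 and rf = 1 above, and the initial level, trend and seasonal values are taken from the first seasonal cycle. The function name is hypothetical.

```python
from typing import Sequence


def hul_ets_sse(
    sales: Sequence[float],
    period: int,
    alpha: float,
    beta: float,
    gamma: float,
    phi: float,
    rf: float,
) -> float:
    """Recency-weighted sum of squared one-step-ahead errors."""
    if len(sales) <= period:
        raise ValueError("need more than one seasonal period of data")
    n = len(sales) - 1  # index of the most recent observation
    # Assumed initialisation: level is the mean of the first cycle,
    # trend is zero, seasonal terms are deviations from that mean.
    level = sum(sales[:period]) / period
    trend = 0.0
    seasonal = [s - level for s in sales[:period]]
    sse = 0.0
    for i in range(period, len(sales)):
        s_prev = seasonal[i - period]
        xhat = level + trend * phi + s_prev
        new_level = alpha * (sales[i] - s_prev) + (1 - alpha) * (level + trend)
        new_trend = beta * (new_level - level) + (1 - beta) * trend
        seasonal.append(gamma * (sales[i] - level - trend) + (1 - gamma) * s_prev)
        residual = sales[i] - xhat
        # Assumed weighting: older residuals are discounted by rf^(n - i).
        sse += residual ** 2 * rf ** (n - i)
        level, trend = new_level, new_trend
    return sse


shifted = [10, 20, 10, 20, 30, 40]  # demand level changes in the last cycle
equal_weight = hul_ets_sse(shifted, 2, 0.5, 0.5, 0.5, 1.0, rf=1.0)
recent_weight = hul_ets_sse(shifted, 2, 0.5, 0.5, 0.5, 1.0, rf=0.5)
```

An optimiser would minimise this quantity over alpha, beta, gamma and phi; with rf below 1 the fit is pulled towards the most recent observations.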

With reference to Figure 2, the method 200 further comprises the following steps:

6. The forecasts from these 6 models are compared 218 with the testing set data from step 1. The model which gives the best output, or lowest error (one of RMSE, MAPE or SSE, selectable by a demand planner), may be selected for that basepack.

7. Future sales are then forecast 219 using the selected model.
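The comparison in step 6 can be sketched in Python as follows, with the three error measures selectable by name. The function names, the model labels and the treatment of zero actual values in MAPE (they are skipped) are assumptions made for this illustration.

```python
import math
from typing import Callable, Dict, Sequence


def sse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sse(actual, predicted) / len(actual))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error; periods with zero actuals are skipped."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not ratios:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100.0 * sum(ratios) / len(ratios)


METRICS: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "SSE": sse,
    "RMSE": rmse,
    "MAPE": mape,
}


def best_model(test: Sequence[float], forecasts: Dict[str, Sequence[float]], metric: str = "RMSE") -> str:
    """Return the name of the forecast with the lowest error on the test set."""
    score = METRICS[metric]
    return min(forecasts, key=lambda name: score(test, forecasts[name]))


test_set = [100, 200]
forecasts = {"model_a": [110, 190], "model_b": [100, 240]}
chosen = best_model(test_set, forecasts, metric="MAPE")
```

The selected model would then be used in step 7 to forecast future sales.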

In some examples, the method described above with reference to Figure 2 may be extended to detect anomalies. Such additional steps may include that:

8. The forecast for every basepack is checked against the historic RR (run rate, or last 12 month average) for non-seasonal packs, or against trend and like-to-like seasonal periods for seasonal packs. If there is more than a certain variation (10%) in any of these components, the forecast is termed an anomaly and flagged for manual review.

9. Forecasts for new launches, decommissioned packs or ones which contribute more than a certain percentage (10%) to the overall error are also termed anomalies and flagged for manual review.

It has been found that an automated implementation of such a method is able to automatically classify a basepack as a new launch/standard pack, determine seasonality, select the best out of 6 models and then forecast sales for it. Further, the method also provides an anomaly score for every forecast to ensure any spurious uplifts/downlifts are highlighted to planners for manual review.
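The run rate check of step 8 for non-seasonal packs can be sketched in Python as follows. Comparing the average forecast value with the run rate is an assumption made for this illustration, as are the function names; the 12-period window and the 10% tolerance follow the example values given above.

```python
from typing import Sequence


def run_rate(history: Sequence[float], periods: int = 12) -> float:
    """Average demand over the most recent periods (the historic run rate)."""
    recent = history[-periods:]
    return sum(recent) / len(recent)


def is_anomalous(
    history: Sequence[float],
    forecast: Sequence[float],
    tolerance: float = 0.10,
    periods: int = 12,
) -> bool:
    """Flag a forecast whose average deviates from the run rate by more than tolerance."""
    rr = run_rate(history, periods)
    if rr == 0:
        # With no recent demand, any non-zero forecast is treated as an anomaly.
        return any(f != 0 for f in forecast)
    average_forecast = sum(forecast) / len(forecast)
    return abs(average_forecast - rr) / rr > tolerance


history = [100.0] * 12
flag_small = is_anomalous(history, [105.0, 105.0, 105.0])  # 5% above the run rate
flag_large = is_anomalous(history, [120.0, 120.0, 120.0])  # 20% above the run rate
```

Only forecasts for which the flag is raised would be passed to a planner for manual review.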

Advantages of the method described with reference to Figure 2 include:

1. It is able to forecast sales for basepacks from all FMCG subcategories.

2. It can handle basepacks/decommissioned packs automatically.

3. It is able to give disproportionate weight to recency in trend fitting (which is not done in conventional methods). This is particularly suitable for resources where demand changes over time, such as FMCG.

4. It is able to automatically adjust/tune models (ETS/ARIMA/MLP/HUL ETS) to optimize forecasts.

5. Anomaly detection allows the planner to review only packs flagged as requiring attention rather than going through each forecast.

6. It does not require manual intervention from the planner for regular packs and is scalable depending on portfolio size.

Figure 4 illustrates a schematic block diagram of a computer system 400 which may be used to implement the method described previously with reference to Figures 1 and 2. The system 400 comprises one or more processors 402 in communication with memory 404. The memory 404 is an example of a computer readable storage medium. The one or more processors 402 are also in communication with one or more input devices 406 and one or more output devices 408. The various components of the system 400 may be implemented using generic means for computing known in the art. For example, the input devices 406 may comprise a keyboard or mouse and the output devices 408 may comprise a monitor or display, and an audio output device such as a speaker.

Claims

1. An apparatus for generating a demand model for a resource, the apparatus comprising: one or more processors; an output device; and computer memory, the computer memory comprising computer program code configured to: receive historical demand data for the resource; determine, from the historical demand data, a stage in a lifecycle of the resource, and if the resource is available and at least a predetermined amount of historical data is associated with the resource: separate the historical demand data into a training dataset and a testing dataset; apply a plurality of models and sets of parameters to the training data to generate results using each of the models and corresponding parameters; and select one of the models and parameters as the demand model based on a comparison of an error between the testing dataset and the results for each model and corresponding set of parameters.

2. An apparatus according to claim 1, wherein the apparatus is configured to apply a smoothing function to the training data set.

3. An apparatus according to claim 2, wherein the smoothing function includes a residual outlier cleaning function.

4. An apparatus according to any of the preceding claims, wherein the historical demand data comprises time series data for the resource.

5. An apparatus according to any of the preceding claims, wherein the results correspond to the same time series as the testing dataset.

6. An apparatus according to any of the preceding claims, wherein the apparatus is configured to classify the resource as one of: a new resource, a mid-lifecycle resource, and a decommissioned resource.

7. An apparatus according to claim 6, wherein the new resource is a resource for which less than a predetermined period of historical data is available.

8. An apparatus according to claim 6 or claim 7, wherein the decommissioned resource is one for which there was less than a predetermined level of demand in a preceding period of historical data.

9. An apparatus according to any of claims 6 to 8, wherein the apparatus is configured to determine if the resource is a decommissioned resource, and to forecast the future demand for a decommissioned resource as no demand.

10. An apparatus according to any one of the preceding claims, wherein the demand model may be a sales forecast.

11. An apparatus according to claim 10, wherein the historical demand data are historical sales data.

12. An apparatus according to any one of the preceding claims, wherein the resource is a product.

13. An apparatus according to any one of the preceding claims, wherein the apparatus is configured to forecast future demand using the demand model.

14. A method for generating a demand model for a resource, comprising: receiving historical demand data for the resource; determining, from the historical demand data, a stage in a lifecycle of the resource, and if the resource is available and at least a predetermined amount of historical data is associated with the resource: separating the historical demand data into a training dataset and a testing dataset; applying a plurality of models and sets of parameters to the training data to generate results using each of the models and corresponding parameters; and selecting one of the models and parameters as the demand model based on a comparison of an error between the testing dataset and the results for each model and corresponding set of parameters.

15. A computer readable storage medium comprising computer program code configured to cause a processor to perform the method of claim 14.