CN110555537A - Multi-factor multi-time point correlated prediction - Google Patents


Info

Publication number
CN110555537A
Authority
CN
China
Prior art keywords
historical
time
factors
points
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810541283.9A
Other languages
Chinese (zh)
Inventor
柯国霖 (Guolin Ke)
边江 (Jiang Bian)
刘铁岩 (Tie-Yan Liu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC filed Critical Microsoft Technology Licensing LLC
Priority to CN201810541283.9A priority Critical patent/CN110555537A/en
Priority to PCT/US2019/031909 priority patent/WO2019231636A1/en
Publication of CN110555537A publication Critical patent/CN110555537A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0633 Workflow analysis
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • G06Q10/0838 Historical data
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Educational Administration (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

In accordance with implementations of the present disclosure, a scheme for multi-factor multi-time-point correlated prediction is presented. In this scheme, a prediction query related to a plurality of factors at a plurality of time points is received. A plurality of historical features are extracted from historical data related to the plurality of factors at a plurality of historical time points prior to the plurality of time points. Based at least on the plurality of historical features, prediction results related to the plurality of factors at the plurality of time points are determined in chronological order. The determination includes: determining, based on the plurality of historical features, a first prediction result related to the plurality of factors at a first time point of the plurality of time points; and determining, based on the plurality of historical features and the first prediction result, a second prediction result related to the plurality of factors at a second time point after the first time point. With this scheme, prediction for a single time point can be quickly extended to multiple time points, improving both prediction efficiency and accuracy.

Description

Multi-factor multi-time point correlated prediction
Background
In many application scenarios, data over a future period of time needs to be predicted. Examples of such scenarios include predicting demand for goods or services, predicting traffic conditions, and so forth. Accurate prediction results help organizations, individuals, and the like to plan in advance, yielding benefits in economic efficiency, labor cost, risk avoidance, and other aspects. In some cases, such predictions also involve multiple factors. For example, it may be desirable to predict the demand for a certain good in different regions over a future period of time; in this example, the different regions are the factors involved in the prediction problem. Multi-factor multi-time-point correlated prediction remains a challenge: many existing prediction methods fail to give accurate and efficient prediction results.
Disclosure of Invention
In accordance with implementations of the present disclosure, a scheme for multi-factor multi-time-point correlated prediction is presented. In this scheme, a prediction query related to a plurality of factors at a plurality of time points is received. A plurality of historical features are extracted from historical data related to the plurality of factors at a plurality of historical time points prior to the plurality of time points. Based at least on the plurality of historical features, prediction results related to the plurality of factors at the plurality of time points are determined in chronological order. The determination includes: determining, based on the plurality of historical features, a first prediction result related to the plurality of factors at a first time point of the plurality of time points; and determining, based on the plurality of historical features and the first prediction result, a second prediction result related to the plurality of factors at a second time point after the first time point. With this scheme, prediction for a single time point can be quickly extended to multiple time points, improving both prediction efficiency and accuracy.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Drawings
FIG. 1 illustrates a block diagram of a computing device capable of implementing various implementations of the present disclosure;
FIG. 2 illustrates a flow diagram of a multi-factor multi-time point correlated prediction process in accordance with some implementations of the present disclosure;
FIG. 3 illustrates a block diagram of an example of a prediction module in the computing device of FIG. 1, in accordance with some implementations of the present disclosure; and
FIG. 4 illustrates a flow diagram of a process for training a predictive model in accordance with some implementations of the present disclosure.
In the drawings, the same or similar reference characters are used to designate the same or similar elements.
Detailed Description
The present disclosure will now be discussed with reference to several example implementations. It should be understood that these implementations are discussed only to enable those of ordinary skill in the art to better understand and thus implement the present disclosure, and are not intended to imply any limitation on the scope of the present disclosure.
As used herein, the term "include" and its variants are to be read as open-ended terms meaning "including, but not limited to". The term "based on" is to be read as "based, at least in part, on". The terms "one implementation" and "an implementation" are to be read as "at least one implementation". The term "another implementation" is to be read as "at least one other implementation". The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
FIG. 1 illustrates a block diagram of a computing device 100 capable of implementing multiple implementations of the present disclosure. It should be understood that the computing device 100 shown in FIG. 1 is merely exemplary and should not be construed as limiting in any way the functionality or scope of the implementations described in this disclosure. As shown in FIG. 1, the computing device 100 takes the form of a general-purpose computing device. Components of the computing device 100 may include, but are not limited to, one or more processors or processing units 110, a memory 120, a storage device 130, one or more communication units 140, one or more input devices 150, and one or more output devices 160.
In some implementations, the computing device 100 may be implemented as various user terminals or service terminals having computing capabilities. The service terminals may be servers, mainframe computing devices, and the like provided by various service providers. The user terminals may be any type of mobile, fixed, or portable terminal, including a mobile handset, station, unit, device, multimedia computer, multimedia tablet, Internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, Personal Communication System (PCS) device, personal navigation device, Personal Digital Assistant (PDA), audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, electronic book device, or game device, or any combination thereof, including the accessories and peripherals of these devices. It is also contemplated that the computing device 100 can support any type of user interface (such as "wearable" circuitry).
The processing unit 110 may be a real or virtual processor and can perform various processes according to programs stored in the memory 120. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capability of computing device 100. The processing unit 110 may also be referred to as a Central Processing Unit (CPU), microprocessor, controller, microcontroller.
Computing device 100 typically includes a number of computer storage media. Such media may be any available media that is accessible by computing device 100 and includes, but is not limited to, volatile and non-volatile media, removable and non-removable media. Memory 120 may be volatile memory (e.g., registers, cache, Random Access Memory (RAM)), non-volatile memory (e.g., Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory), or some combination thereof. Memory 120 may include a prediction module 122 configured to perform the functions of the various implementations described herein. The prediction module 122 may be accessed and executed by the processing unit 110 to implement the corresponding functionality.
Storage device 130 may be a removable or non-removable medium and may include a machine-readable medium that can be used to store information and/or data and that can be accessed within computing device 100. The computing device 100 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in FIG. 1, a magnetic disk drive for reading from or writing to a removable, nonvolatile magnetic disk and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces.
The communication unit 140 enables communication with another computing device over a communication medium. Additionally, the functionality of the components of computing device 100 may be implemented in a single computing cluster or multiple computing machines, which are capable of communicating over a communications connection. Thus, the computing device 100 may operate in a networked environment using logical connections to one or more other servers, Personal Computers (PCs), or another general network node.
The input device 150 may be one or more of a variety of input devices, such as a mouse, keyboard, trackball, or voice input device. The output device 160 may be one or more output devices, such as a display, speakers, or printer. Through the communication unit 140, the computing device 100 may also communicate, as needed, with one or more external devices (not shown) such as storage devices or display devices, with one or more devices that enable a user to interact with the computing device 100, or with any device (e.g., a network card or modem) that enables the computing device 100 to communicate with one or more other computing devices. Such communication may be performed via input/output (I/O) interfaces (not shown).
In some implementations, rather than being integrated on a single device, some or all of the components of the computing device 100 may be provided in the form of a cloud computing architecture. In a cloud computing architecture, these components may be remotely located and may work together to implement the functionality described in this disclosure. In some implementations, cloud computing provides computing, software, data access, and storage services that do not require end users to know the physical location or configuration of the systems or hardware providing these services. In various implementations, cloud computing provides services over a wide area network (such as the Internet) using appropriate protocols. For example, cloud computing providers deliver applications over a wide area network, and they may be accessed through a web browser or any other computing component. The software or components of the cloud computing architecture and the corresponding data may be stored on servers at remote locations. The computing resources in a cloud computing environment may be consolidated at a remote data center or dispersed. Cloud computing infrastructures can provide services through shared data centers even though they appear as a single point of access to users. Accordingly, the components and functionality described herein may be provided from a service provider at a remote location using a cloud computing architecture. Alternatively, they may be provided from a conventional server, or installed directly or otherwise on a client device.
In performing prediction, the computing device 100 can receive, via the input device 150, a predictive query 170 related to a plurality of factors at a plurality of points in time. The predictive query 170 can be input by a user, for example. In the example of FIG. 1, the predictive query 170 may be "How much demand is there for X commodity in cities A and B in the next three months?" The processing unit 110 of the computing device 100 can determine a prediction result 180 for the predictive query 170 by running the prediction module 122 and output the prediction result via the output device 160. For example, the prediction result 180 of the prediction module 122 contains a demand schedule for X commodity in cities A and B for the next three months (May to July 2018).
It should be appreciated that the predictive query 170 and prediction result 180 referred to in FIG. 1 are only one example. In other implementations, the computing device 100 may also process other, different types of predictions. While the predictive query 170 and prediction result 180 are presented in text and tabular form, respectively, in the example of FIG. 1, in other examples they may be received and/or presented in other forms, such as audio, image, and/or video. Implementations of the present disclosure are not limited in this respect.
Most existing prediction schemes can only achieve prediction related to a single factor at a single point in time in the future. Such prediction is not only inefficient, but its accuracy is also difficult to guarantee. For example, to predict data for multiple points in time in the future, a separate model would need to be trained for each point in time. This increases the complexity of training and use, consuming more time, labor, and computational cost.
According to some implementations of the present disclosure, a scheme for multi-factor multi-time-point correlated prediction is presented. According to this scheme, for a prediction query related to a plurality of factors at a plurality of time points, historical data related to the same factors at a plurality of historical time points before those time points is acquired, and a plurality of historical features are extracted from the historical data to determine prediction results related to the plurality of factors at the plurality of time points. In determining the prediction results, in addition to the extracted historical features, the prediction result determined for an earlier time point is also used to determine the prediction result for a later time point. In this way, prediction for a single time point can be quickly extended to multiple time points, improving both prediction efficiency and accuracy.
Example implementations of the present disclosure will be described in detail below with reference to the accompanying drawings.
FIG. 2 illustrates a flow diagram of a multi-factor multi-time-point correlated prediction process 200 according to some implementations of the present disclosure. Process 200 may be implemented by the computing device 100 of FIG. 1, for example, at the prediction module 122. For convenience of description, the process is described in conjunction with FIG. 1.
At 210, the computing device 100 receives the predictive query 170 relating to a plurality of factors at a plurality of points in time. The predictive query 170 may be input by the user to the computing device 100 via a particular input device 150. The predictive query 170 can relate to multiple future points in time and multiple factors, indicating that the user desires the computing device 100 to provide prediction results for those factors at those points in time. Herein, the plurality of time points refers to time points divided by any time unit, such as hours, days, weeks, months, or years. The multiple factors refer to entities (also referred to as "variables") other than the time dimension in the predictive query, such as multiple locations, multiple items, or multiple organizations. In the example of FIG. 1, the predictive query 170 may be "How much demand is there for X commodity in cities A and B in the next three months?", where "the next three months" refers to three points in time (e.g., May, June, and July 2018) after the predictive query 170 was issued, and "cities A and B" refers to the plurality of factors (here, locations) to which the predictive query 170 relates.
In some implementations, the predictive query 170 may be initiated in various forms of text, speech, etc., or may be selected through a user interface of the output device 160 of the computing device 100. The computing device 100 may convert the initiated predictive query 170 into a computer-recognizable language, such as Structured Query Language (SQL), or the like, using a variety of suitable techniques. The computing device 100 may identify key information from the predictive query 170 regarding the point in time, factors, etc. to which the query relates. It should be understood that the scope of implementations of the present disclosure is not limited in this respect.
In response to receiving the predictive query 170, the computing device 100 may perform the corresponding prediction to provide a prediction result for the predictive query 170. Specifically, at 220, the computing device 100 extracts a plurality of historical features from historical data relating to the plurality of factors at a plurality of historical points in time prior to the plurality of points in time. In implementations of the present disclosure, prediction for future points in time depends on historical data. Here, the historical data used for prediction is related to the same plurality of factors at a plurality of historical time points before the time points involved in the predictive query 170. For example, if the predictive query 170 relates to "demand for X commodity in cities A and B for three months in the future," the historical data may be the actual demand for X commodity in cities A and B over the previous months.
In some implementations, the historical data may be stored in a storage device of computing device 100 (e.g., storage device 130) or in other storage devices or databases accessible to computing device 100. In some implementations, the historical data to be used is related to the historical features to be extracted, as will be described in detail below.
In implementations of the present disclosure, to perform the prediction, the computing device 100 extracts historical features from the historical data. These historical features indicate characteristics of the historical data, related to the plurality of factors, that are useful for predicting future trends. In some implementations, the types of historical features to be extracted may be predetermined.
In some implementations, the plurality of historical features may include characteristics of historical data portions related to a single factor during the plurality of historical time points, also referred to as single-factor historical features. In this case, the computing device 100 identifies, from the historical data, a plurality of historical data portions each related to a respective factor, and then extracts a set of historical features (referred to as a first set of historical features for convenience of description) from each historical data portion. Each historical data portion includes data items corresponding to the respective factor at different ones of the plurality of historical time points. For example, in the historical demand data for X commodity in cities A and B, each historical data portion includes the demand for X commodity in the past months in city A or in city B. The historical features extracted from each historical data portion indicate characteristics of that portion relevant to the respective factor at the plurality of historical time points.
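The per-factor partitioning described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the record layout and all numbers are hypothetical.

```python
# Hypothetical flat records of (factor, month, demand), split into the
# per-factor historical data portions used for single-factor features.
records = [("A", "2018-03", 125), ("B", "2018-03", 90), ("A", "2018-04", 140)]

def partition_by_factor(records):
    # Group flat (factor, time point, value) records so that each portion
    # holds the data items of exactly one factor.
    parts = {}
    for factor, month, value in records:
        parts.setdefault(factor, {})[month] = value
    return parts

parts = partition_by_factor(records)
# parts["A"] is the historical data portion for city A.
```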
In some implementations, for each historical data portion, the computing device 100 may extract a first historical feature, which may indicate short-term point-in-time historical information, and in particular may indicate data items related to the respective factor at a predetermined historical point in time. For example, for a given target time point among the plurality of time points to be predicted, data items related to the respective factor at the i-th most recent historical time point may be extracted from the historical data portion, where i takes a small value such as 1, 2, or 3. In the example of FIG. 1, for the historical data portion related to city A, the demand of city A for X commodity 3 months before the first future month (e.g., February 2018) may be extracted as the first historical feature. A corresponding first historical feature can be extracted from the historical data portion related to city B.
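The short-term "lag" feature above can be sketched in a few lines. This is an illustrative assumption about the data layout, with hypothetical demand numbers; it is not taken from the patent.

```python
# Demand history per city, oldest to newest (hypothetical: Jan-Apr 2018).
demand = {
    "A": [120, 130, 125, 140],
    "B": [80, 85, 90, 95],
}

def lag_feature(city, i):
    # First historical feature: the data item for this factor at the
    # i-th most recent historical time point (i near 1, e.g. 1, 2, 3).
    return demand[city][-i]

# lag_feature("A", 3) is city A's demand 3 months back, i.e. February.
```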
In some implementations, for each historical data portion, the computing device 100 may extract a second historical feature, which may indicate a long-term historical average, and in particular may indicate an average of data items related to the respective factor at two or more consecutive time points of the plurality of historical time points. The extraction of the second historical feature is suited to numerical historical data. For example, for a given target time point among the plurality of time points to be predicted, the average of the k data items related to the respective factor at the k most recent historical time points may be extracted from the historical data portion, where k may take a value greater than 2, such as 7, 14, or 28. As a specific example, in FIG. 1, for the historical data portion related to city A, the average demand of city A for X commodity over the 3 months before the first future month (e.g., February, March, and April 2018) may be extracted as the second historical feature. A corresponding second historical feature can also be extracted from the historical data portion related to city B.
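The long-term average feature is a rolling mean over the k most recent data items. A minimal sketch with hypothetical numbers:

```python
def long_term_mean(values, k):
    # Second historical feature: average of the k most recent data items
    # for one factor (k > 2, e.g. 7, 14, 28).
    window = values[-k:]
    return sum(window) / len(window)

# Hypothetical monthly demand for city A, Jan-Apr 2018.
demand_a = [120, 130, 125, 140]
feature = long_term_mean(demand_a, 3)  # mean over Feb, Mar, Apr
```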
Alternatively or additionally, for each historical data portion, the computing device 100 may also extract a third historical feature, which may indicate a periodic historical average, and in particular may indicate an average of data items related to the respective factor at a plurality of periodic time points among the plurality of historical time points. Periodic time points are time points in which the interval between every two consecutive points is equal; the period may be set, for example, to a week, month, quarter, or year. The extraction of the third historical feature is likewise suited to numerical historical data. For example, for a given target time point among the plurality of time points to be predicted, if the target time point falls in April, the average of the April data items of recent years (e.g., 2 or more years) may be extracted from the historical data portion.
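The periodic average can be sketched as averaging the same calendar month across years. The (year, month) keys and values below are hypothetical:

```python
# Hypothetical monthly demand for one factor, keyed by (year, month).
monthly = {(2016, 4): 100, (2017, 4): 110, (2016, 5): 90, (2017, 5): 96}

def periodic_mean(monthly, target_month):
    # Third historical feature: average of data items from the same
    # calendar month (the periodic time points) across past years.
    vals = [v for (_, m), v in monthly.items() if m == target_month]
    return sum(vals) / len(vals)

# For a target time point in April, average the April values of past years.
```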
The historical features extracted above are each associated with a single factor. Additionally or alternatively, the plurality of historical features may also include historical features extracted from historical data portions related to two or more factors during the plurality of historical time points, also referred to as multi-factor historical features or multi-granularity historical features. In particular, the computing device 100 may extract a set of historical features (referred to as a second set of historical features) across multiple historical data portions. The second set of historical features indicates characteristics of the historical data portions related to at least two of the plurality of factors at the plurality of historical time points. Such historical features are particularly suited to factors having a hierarchical structure. For example, if the factors relate to different regions (e.g., cities), those factors fall within a hierarchy of regional divisions that may include, from coarse to fine, continents, countries, provinces/states, cities, administrative districts, streets, and so forth.
The computing device 100 may determine the hierarchical structure involved based on the factors, and then extract historical features from historical data portions related to two or more factors belonging to the same level. The computing device 100 may extract historical features directly from the historical data portions related to factors at the same level, or may also consider historical data portions related to factors above or below that level. For example, the computing device 100 may extract features similar to the first set of historical features described above, as a second set of historical features, from the data items included in historical data portions related to different cities, different provinces/states, different countries, and so forth. In this way, multi-factor characteristics can be taken into account.
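One simple multi-factor feature over a hierarchy is an aggregate across sibling factors under the same parent. The two-level city/province hierarchy and the numbers below are hypothetical, chosen only to illustrate the idea:

```python
# Hypothetical hierarchy: cities grouped under provinces.
city_to_province = {"A": "P1", "B": "P1", "C": "P2"}
city_demand = {"A": 140, "B": 95, "C": 60}

def province_total(city):
    # Multi-factor feature: total demand over all cities sharing the
    # same parent (province) in the hierarchy as `city`.
    prov = city_to_province[city]
    return sum(v for c, v in city_demand.items() if city_to_province[c] == prov)

# Cities A and B share province P1, so they share this coarse-level feature.
```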
In some implementations, in addition to the historical features, the computing device 100 may determine at least one temporal feature based on the plurality of time points, the temporal feature indicating a temporal pattern before, within, and/or after the plurality of time points. In general, viewed along the time dimension, many events behave differently on holidays than on non-holidays. Thus, the temporal features may indicate the status of holidays, including whether there is a holiday, the duration of the holiday, and/or the type of holiday (e.g., New Year's Day, Christmas, or Lunar New Year). In particular, the computing device 100 may extract a first temporal feature indicating the status of historical holidays within a first predetermined time period (e.g., a month, quarter, or year) before the plurality of time points, such as whether one or more holidays exist and the duration and/or type of each holiday. The computing device 100 may also extract a second temporal feature indicating the status of holidays contained within the plurality of time points, and may additionally or alternatively extract a third temporal feature indicating the status of future holidays within a second predetermined time period (e.g., a month, quarter, or year) after the plurality of time points.
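The three temporal features reduce to holiday indicators over three windows. A minimal sketch; the holiday calendar below is a hypothetical stand-in:

```python
# Hypothetical calendar: (year, month) pairs that contain a holiday.
holidays = {(2018, 2), (2018, 12)}

def holiday_flags(before, window, after):
    # First / second / third temporal features as booleans: is there a
    # holiday in the period before, within, or after the queried time points?
    def any_holiday(months):
        return any(m in holidays for m in months)
    return any_holiday(before), any_holiday(window), any_holiday(after)

# Query window May-July 2018; lookback Feb-Apr; lookahead August.
flags = holiday_flags(
    [(2018, 2), (2018, 3), (2018, 4)],
    [(2018, 5), (2018, 6), (2018, 7)],
    [(2018, 8)],
)
```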
With continued reference to FIG. 2, at 230, the computing device 100 determines, in chronological order and based at least on the plurality of historical features, the prediction results 180 related to the plurality of factors at the plurality of time points. The computing device 100 determines the prediction result associated with each factor at each time point. For example, in the example of FIG. 1, the computing device 100 determines the demand for X commodity in cities A and B in each of the next three months, such as the demand table shown in FIG. 1. In some implementations, if one or more temporal features are extracted in addition to the historical features, the computing device 100 may also determine the prediction results based on the temporal features.
In implementations of the present disclosure, in addition to the extracted historical features, the prediction result determined for a previous time point is also used to determine the prediction result for a subsequent time point. Specifically, the computing device 100 determines, based on the plurality of historical features, a first prediction result related to the plurality of factors at a first time point of the plurality of time points (e.g., a prediction result associated with each of the factors at the first time point). Then, the computing device 100 determines, based on the plurality of historical features and the first prediction result, a second prediction result related to the plurality of factors at a second time point after the first time point (e.g., prediction results associated with the respective factors at the second time point). Computing device 100 may determine prediction results for subsequent time points in a similar manner.
In some implementations, associations between historical features and prediction results may be established in advance, and such associations may be represented as a model (referred to as a predictive model, for example). A prediction result may then be determined from the extracted plurality of historical features using such a predictive model. The predictive model may be pre-trained with training data; the training process is discussed in detail below.
In some implementations, the computing device 100 may utilize a Gradient Boosting Decision Tree (GBDT) with a predetermined set of parameters to determine the prediction results. GBDT is a widely used decision tree-based model that includes a plurality of nodes (also referred to as leaves or leaf nodes) arranged in a tree structure. GBDT uses the boosting concept: the algorithm consists of several decision trees, and the conclusions of all the decision trees are accumulated to form the final answer. Thus, instead of requiring any single decision tree to learn too much knowledge, GBDT allows each tree to learn a portion of the knowledge and then accumulates the learned knowledge to form a powerful model.
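The accumulation idea can be made concrete with a minimal sketch. The following is an illustrative toy implementation of gradient boosting for squared loss using depth-1 regression "stumps" on a single feature, an assumption made for brevity rather than the actual GBDT of this disclosure: each stump fits the residual left by the current ensemble, and the final answer is the accumulated sum of all stumps' shrunken outputs.

```python
def fit_stump(x, residual):
    """Find the single threshold split on x that minimizes squared error."""
    best = None
    for thr in sorted(set(x)):
        left = [r for xi, r in zip(x, residual) if xi <= thr]
        right = [r for xi, r in zip(x, residual) if xi > thr]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda xi: lm if xi <= thr else rm

def fit_gbdt(x, y, n_trees=20, lr=0.5):
    """Each stump learns the residual of the ensemble so far."""
    stumps, pred = [], [0.0] * len(y)
    for _ in range(n_trees):
        stump = fit_stump(x, [yi - pi for yi, pi in zip(y, pred)])
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
        stumps.append(stump)
    return stumps

def predict(stumps, xi, lr=0.5):
    """The prediction is the accumulated (shrunken) sum over all stumps."""
    return lr * sum(s(xi) for s in stumps)
```

On a toy series where the target steps from 1 to 3 at x = 2, the accumulated stumps converge to the step function, illustrating how many weak learners combine into a strong one.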
GBDTs typically require explicit feature input. That is, rather than learning through a training process which features in the historical data can contribute to the prediction result, the GBDT learns a combination of the input predetermined features, weights, thresholds, and the like, and obtains a corresponding set of parameters, thereby obtaining the desired prediction capability. The construction and learning of the decision trees internal to the GBDT fall within the scope of GBDT algorithms, and implementations of the present disclosure are not limited in this respect. The process of training the GBDT for prediction purposes in implementations of the present disclosure will be described in detail below.
In some implementations, a plurality of nodes in the GBDT are respectively associated with the extracted plurality of historical features. The GBDT will determine the prediction result from the input of these historical features according to the predetermined set of parameters. Additionally, the GBDT may include additional nodes that are respectively associated with the extracted temporal features. Both the historical features and the temporal features are extracted in advance. The GBDT may perform decisions based on the extracted features and ultimately determine a prediction result.
In some implementations, to reduce complexity, a GBDT is trained to predict a single time point. Based on the input features (historical features and temporal features), the GBDT determines a prediction result for one time point at a time. The computing device 100 may, in chronological order, determine a first prediction result for a first time point of the plurality of time points based on the input features using the GBDT, and then determine, based on the input features and the first prediction result, a second prediction result for a second time point subsequent to the first time point.
In some implementations, the computing device 100 may modify the plurality of historical features (and temporal features) based on the first prediction result, and determine the second prediction result based on the modified features. Since some of the historical features and/or temporal features may be extracted with the earliest time point among the plurality of time points as a reference, once a prediction result has been determined for the earliest time point, that prediction result may be regarded as historical data for the subsequent time points and may thus be used to modify the extracted features.
For example, for a second time point of the plurality of time points, if a first historical feature requires the data item related to a certain factor at the immediately preceding time point, the first historical feature may be modified from the data item at the time point before the earliest time point to the data item related to that factor in the prediction result determined for the earliest time point. As another example, for a second time point of the plurality of time points, if a first temporal feature indicates the condition of holidays within a period of time before the earliest time point, the first temporal feature may be modified to indicate the condition of holidays within a period of time that includes the earliest time point. It should be understood that other historical features and/or temporal features may be modified similarly. Of course, in some cases, certain features (e.g., the second temporal feature) may remain unchanged even when the first prediction result is taken into account.
It should be understood that, for each subsequent time point of the plurality of time points, the prediction results of the previous time points may likewise be used to modify the features used subsequently, so that the prediction result for that subsequent time point can be determined. In this way, the prediction model used (e.g., a GBDT) need only be trained to determine a prediction result for a single time point (a prediction related to the multiple factors at a certain time point). Each determined prediction result continues to be used for subsequent predictions, and such a prediction model may therefore be considered a recursive prediction model (e.g., a recursive GBDT). Such a recursive approach simplifies the design, training, and use of the prediction model, thereby greatly reducing complexity and improving effectiveness.
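The recursive scheme just described can be sketched independently of the concrete model. In the sketch below, `model` stands for any trained single-step predictor (e.g., a GBDT) mapping a feature vector to one value; the lag-based features and the helper names are assumptions for illustration only. Each prediction is appended to the history so that the features for later time points are recomputed from it.

```python
def lag_features(history, n_lags=3):
    """Toy feature extraction: the last n_lags observed (or predicted) values."""
    return history[-n_lags:]

def predict_recursively(model, history, n_steps):
    """Predict n_steps points in chronological order with a single-step model."""
    history = list(history)
    results = []
    for _ in range(n_steps):
        y_hat = model(lag_features(history))
        results.append(y_hat)
        history.append(y_hat)  # treat the prediction as history for later steps
    return results
```

For instance, with a stand-in `model` that averages its inputs, a history of [1.0, 2.0, 3.0] yields 2.0 for the first future point, which then feeds into the features for the second point.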
FIG. 3 shows an example of the GBDT-based prediction module 122. The prediction module 122 may be configured to implement the prediction process of FIG. 2. As shown in FIG. 3, the prediction module 122 includes a feature extraction sub-module 310 and a GBDT model 320. In response to the computing device 100 receiving the prediction query 170, the prediction module 122 is configured to extract historical features 312 from the historical data 302. In some implementations, the prediction module 122 is further configured to extract one or more temporal features 312 based on the plurality of time points involved in the prediction query 170. The extracted historical and/or temporal features 312 are provided to the GBDT model 320 as model inputs. Using a set of parameters that has been trained, the GBDT model 320 determines, in temporal order, a prediction result 322 for a first time point of the plurality of time points based on the input features 312. The prediction result 322 may be provided for output as part of the prediction results 180 of the computing device 100, and is also provided to the feature extraction sub-module 310 for use in modifying the historical and/or temporal features 312 extracted by the feature extraction sub-module 310. The modified features 312 are then provided to the GBDT model 320 for use in determining a prediction result 322 for the second, subsequent time point. Prediction results for further time points of the plurality of time points may be determined similarly.
Example implementations that utilize a trained predictive model (e.g., a GBDT) to perform prediction have been described above. The training process of the predictive model is discussed below. FIG. 4 illustrates a flow diagram of a process 400 for training a predictive model according to some implementations of the present disclosure. Process 400 may be implemented by the computing device 100 of FIG. 1 or by other computing devices; that is, the predictive model may be trained and used by the same computing device or by different computing devices. For convenience of description, the process is described in conjunction with FIG. 1. The predictive model to be trained may be a GBDT model, such as the GBDT model 320 described with reference to FIG. 3.
At 410, the computing device 100 obtains historical data related to a plurality of factors at a plurality of historical time points. The historical data is used as training data for the prediction model. Depending on the needs of model training, historical data related to different factors over any longer available period of time may be considered. In some implementations, if the predictive model is to be used to predict data related to multiple factors at multiple future time points, the acquired historical data may relate to those same factors. In other implementations, the acquired historical data may relate to a different number of factors. For example, if it is desired to utilize the predictive model to predict the demand of the A and B markets for the X goods, historical demand amounts of the X goods (or Y goods) for different C and D markets (or the A and C markets) may also be obtained as historical data for training. Further, such historical data may be data available at any historical time point. All such historical data may help the predictive model learn the associations between features and prediction results.
At 420, the computing device 100 extracts a plurality of historical features from the historical data. The features to be extracted during training may be of the same type as those extracted during use, even though the historical data may relate to different factors and historical points in time. These types of history features may be as described above with reference to fig. 2 and, for brevity, will not be described in detail herein.
In some implementations, in addition to historical features, computing device 100 may determine at least one temporal feature based on a plurality of historical points in time. Such temporal characteristics may be the conditions of holidays before, within, and after each historical time point to be considered, as determined with reference to that time point. Such temporal features are also similar to the temporal features extracted during use and are therefore not described in detail.
At 430, the computing device 100 determines a set of parameters for the predictive model based at least on the plurality of historical features. During the training process, the values of the parameter set may be initialized, for example to random values. The parameter set may then be continuously updated with training data until a predetermined convergence condition is reached. Such parameter set updates may be performed using various training methods, and implementations of the present disclosure are not limited in this respect.
Since the prediction model is designed to perform prediction for a single time point, during the training process the set of parameters of the prediction model is updated each time based on the plurality of historical features and a first historical data portion related to the plurality of factors at a certain historical time point (e.g., a first historical time point). Here, the historical features serve as inputs to the prediction model. The prediction model processes the input historical features with the current set of parameters, and the difference between this processing result and the real data at that historical time point (i.e., the first historical data portion) may be used to update the set of parameters of the prediction model.
In some implementations, the parameter set may also continue to be updated with a portion of the historical data that is relevant at a second historical point in time after the first historical point in time.
Since the prediction model is designed to determine the prediction results at different time points in a recursive manner, during training the parameter set of the prediction model is updated again based not only on the plurality of historical features extracted from the historical data and the historical data portion related to the plurality of factors at the second historical time point, but also on the first historical data portion related to the preceding first historical time point. In this case, the first historical data portion is regarded as the prediction result that the predictive model is expected to determine for the first historical time point. In some implementations, the computing device 100 may modify the plurality of historical features based on the first historical data portion and then take the modified historical features as input to the prediction model, such that the parameter set is updated again based on the difference between the result produced by the prediction model with the current parameter set and the second historical data portion.
Recursive prediction may lead to inaccurate long-term predictions, since prediction errors at previous time points may propagate to subsequent time points. To reduce such error propagation, random noise may be introduced into the historical data used for training the predictive model. For example, the historical data may be changed in a random manner. Specifically, the data items (e.g., numerical values) corresponding to the respective factors at different historical time points in the historical data may be changed in a random manner, thereby introducing random noise into the historical data. Introducing random noise can improve the robustness of a prediction model (such as a GBDT model), thereby achieving better long-term prediction accuracy.
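One way to realize such a random change is a small multiplicative perturbation of each data item. The 5% scale and the fixed seed in the sketch below are arbitrary illustrative choices, not parameters specified by this disclosure.

```python
import random

def add_random_noise(data_items, scale=0.05, rng=None):
    """Perturb each numeric data item by a random factor in [1 - scale, 1 + scale]."""
    rng = rng or random.Random(0)  # fixed seed only to make the sketch reproducible
    return [v * (1.0 + rng.uniform(-scale, scale)) for v in data_items]
```

Applied to the training history, this makes the model see slightly imperfect values in place of exact past observations, mimicking the imperfect "previous predictions" it will consume at inference time.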
some example implementations of the present disclosure are listed below.
In one aspect, the present disclosure provides a computer-implemented method. The method comprises the following steps: receiving a predictive query relating to a plurality of factors at a plurality of points in time; extracting a plurality of historical features from historical data related to a plurality of factors at a plurality of historical time points before the plurality of time points; and determining chronologically predicted outcomes related to the plurality of factors at the plurality of points in time based at least on the plurality of historical characteristics, the determining comprising: based on the plurality of historical features, a first predicted outcome associated with the plurality of factors at a first time point of the plurality of time points is determined, and based on the plurality of historical features and the first predicted outcome, a second predicted outcome associated with the plurality of factors at a second time point after the first time point is determined.
In some implementations, determining the prediction result includes: the prediction result is determined using a Gradient Boosting Decision Tree (GBDT) having a predetermined set of parameters, the GBDT including a plurality of nodes arranged in a tree structure, and the plurality of nodes being respectively associated with a plurality of historical features.
In some implementations, the set of parameters of the GBDT is determined based on additional historical data relating to additional factors at multiple historical points in time, and the additional historical data is changed in a random manner when the set of parameters is determined.
In some implementations, extracting the plurality of historical features includes: identifying, from the historical data, a plurality of historical data portions respectively associated with respective ones of the plurality of factors at a plurality of historical points in time; extracting a first set of historical features from the plurality of historical data portions, respectively, the first set of historical features indicating characteristics of the historical data portions that are related to the respective factors at the plurality of historical time points; and extracting a second set of historical features across the plurality of historical data portions, the second set of historical features indicating characteristics of the historical data portions that are related to at least two of the plurality of factors at the plurality of historical points in time.
In some implementations, each historical data portion includes data items corresponding to respective factors at different ones of a plurality of historical time points, and wherein extracting the first set of historical features includes: extracting at least one of the following historical features from each historical data portion: a first history feature indicating data items related to the respective factor at a predetermined one of the plurality of history time points, the predetermined time point being spaced from one of the plurality of time points by a predetermined length of time, a second history feature indicating an average of the data items related to the respective factor at least two consecutive ones of the plurality of history time points, and a third history feature indicating an average of the data items related to the respective factor at a plurality of periodic ones of the plurality of history time points.
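For a concrete reading of these three per-factor features, the sketch below computes them for a monthly series: a lag value a fixed number of points back, a mean over consecutive recent points, and a mean over periodic (same-phase, e.g., same-month) points. The function names and window choices are illustrative assumptions, not terms from the claims.

```python
def lag_feature(series, lag):
    """First-type feature: the data item a fixed number of time points back."""
    return series[-lag]

def rolling_mean(series, window):
    """Second-type feature: average over the most recent consecutive points."""
    tail = series[-window:]
    return sum(tail) / len(tail)

def periodic_mean(series, period):
    """Third-type feature: average over points spaced one period apart."""
    picks = series[-period::-period]  # same phase in each earlier period
    return sum(picks) / len(picks)
```

With 24 months of data and `period=12`, for example, `periodic_mean` averages the same calendar month across the two years, capturing seasonality.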
In some implementations, at least two factors belong to the same level in the predetermined hierarchy.
In some implementations, determining the prediction result further includes: determining at least one temporal feature based on the plurality of time points, the at least one temporal feature indicating a temporal pattern before, within and/or after the plurality of time points; and determining the prediction result further based on the at least one temporal feature.
In some implementations, determining the at least one temporal characteristic includes determining at least one of: a first time characteristic indicative of a condition of a holiday within a first predetermined time period before the plurality of time points, a second time characteristic indicative of a condition of a holiday contained within the plurality of time points, and a third time characteristic indicative of a condition of a holiday within a second predetermined time period after the plurality of time points.
In some implementations, the condition of the holiday indicates at least one of: whether there are holidays, the duration of the holidays, and the type of the holiday.
In some implementations, determining the second prediction result includes: modifying the plurality of historical features based on the first prediction result; and determining a second prediction result based on the modified plurality of historical features.
In another aspect, the present disclosure provides an apparatus. The apparatus comprises: a processing unit; and a memory coupled to the processing unit and containing instructions stored thereon that, when executed by the processing unit, cause the apparatus to perform actions. The actions include: receiving a predictive query relating to a plurality of factors at a plurality of points in time; extracting a plurality of historical features from historical data related to a plurality of factors at a plurality of historical time points before the plurality of time points; and determining chronologically predicted outcomes related to the plurality of factors at the plurality of points in time based at least on the plurality of historical characteristics, the determining comprising: based on the plurality of historical features, a first predicted outcome associated with the plurality of factors at a first time point of the plurality of time points is determined, and based on the plurality of historical features and the first predicted outcome, a second predicted outcome associated with the plurality of factors at a second time point after the first time point is determined.
In some implementations, determining the prediction result includes: the prediction result is determined using a Gradient Boosting Decision Tree (GBDT) having a predetermined set of parameters, the GBDT including a plurality of nodes arranged in a tree structure, and the plurality of nodes being respectively associated with a plurality of historical features.
In some implementations, the set of parameters of the GBDT is determined based on additional historical data relating to additional factors at multiple historical points in time, and the additional historical data is changed in a random manner when the set of parameters is determined.
In some implementations, extracting the plurality of historical features includes: identifying, from the historical data, a plurality of historical data portions respectively associated with respective ones of the plurality of factors at a plurality of historical points in time; extracting a first set of historical features from the plurality of historical data portions, respectively, the first set of historical features indicating characteristics of the historical data portions that are related to the respective factors at the plurality of historical time points; and extracting a second set of historical features across the plurality of historical data portions, the second set of historical features indicating characteristics of the historical data portions that are related to at least two of the plurality of factors at the plurality of historical points in time.
In some implementations, each historical data portion includes data items corresponding to respective factors at different ones of a plurality of historical time points, and wherein extracting the first set of historical features includes: extracting at least one of the following historical features from each historical data portion: a first history feature indicating data items related to the respective factor at a predetermined one of the plurality of history time points, the predetermined time point being spaced from one of the plurality of time points by a predetermined length of time, a second history feature indicating an average of the data items related to the respective factor at least two consecutive ones of the plurality of history time points, and a third history feature indicating an average of the data items related to the respective factor at a plurality of periodic ones of the plurality of history time points.
In some implementations, at least two factors belong to the same level in the predetermined hierarchy.
In some implementations, determining the prediction result further includes: determining at least one temporal feature based on the plurality of time points, the at least one temporal feature indicating a temporal pattern before, within and/or after the plurality of time points; and determining the prediction result further based on the at least one temporal feature.
In some implementations, determining the at least one temporal characteristic includes determining at least one of: a first time characteristic indicative of a condition of a holiday within a first predetermined time period before the plurality of time points, a second time characteristic indicative of a condition of a holiday contained within the plurality of time points, and a third time characteristic indicative of a condition of a holiday within a second predetermined time period after the plurality of time points.
In some implementations, the condition of the holiday indicates at least one of: whether there are holidays, the duration of the holidays, and the type of the holiday.
In some implementations, determining the second prediction result includes: modifying the plurality of historical features based on the first prediction result; and determining a second prediction result based on the modified plurality of historical features.
In yet another aspect, the present disclosure provides a computer program product. A computer program product is stored in a computer storage medium and includes machine executable instructions that, when executed by a device, cause the device to: receiving a predictive query relating to a plurality of factors at a plurality of points in time; extracting a plurality of historical features from historical data related to a plurality of factors at a plurality of historical time points before the plurality of time points; and determining chronologically predicted outcomes related to the plurality of factors at the plurality of points in time based at least on the plurality of historical characteristics, the determining comprising: based on the plurality of historical features, a first predicted outcome associated with the plurality of factors at a first time point of the plurality of time points is determined, and based on the plurality of historical features and the first predicted outcome, a second predicted outcome associated with the plurality of factors at a second time point after the first time point is determined.
In some implementations, the machine executable instructions, when executed by the device, further cause the device to: determine the prediction result using a Gradient Boosting Decision Tree (GBDT) having a predetermined set of parameters, the GBDT including a plurality of nodes arranged in a tree structure, and the plurality of nodes being respectively associated with a plurality of historical features.
In some implementations, the set of parameters of the GBDT is determined based on additional historical data relating to additional factors at multiple historical points in time, and the additional historical data is changed in a random manner when the set of parameters is determined.
In some implementations, the machine executable instructions, when executed by the device, further cause the device to: identify, from the historical data, a plurality of historical data portions respectively associated with respective ones of the plurality of factors at a plurality of historical points in time; extract a first set of historical features from the plurality of historical data portions, respectively, the first set of historical features indicating characteristics of the historical data portions that are related to the respective factors at the plurality of historical time points; and extract a second set of historical features across the plurality of historical data portions, the second set of historical features indicating characteristics of the historical data portions that are related to at least two of the plurality of factors at the plurality of historical points in time.
In some implementations, each historical data portion includes data items corresponding to respective factors at different ones of a plurality of historical points in time, and the machine-executable instructions, when executed by the device, further cause the device to: extracting at least one of the following historical features from each historical data portion: a first history feature indicating data items related to the respective factor at a predetermined one of the plurality of history time points, the predetermined time point being spaced from one of the plurality of time points by a predetermined length of time, a second history feature indicating an average of the data items related to the respective factor at least two consecutive ones of the plurality of history time points, and a third history feature indicating an average of the data items related to the respective factor at a plurality of periodic ones of the plurality of history time points.
In some implementations, at least two factors belong to the same level in the predetermined hierarchy.
In some implementations, the machine executable instructions, when executed by the apparatus, further cause the apparatus to: determining at least one temporal feature based on the plurality of time points, the at least one temporal feature indicating a temporal pattern before, within and/or after the plurality of time points; and determining the prediction result further based on the at least one temporal feature.
In some implementations, the machine executable instructions, when executed by the device, further cause the device to determine at least one of: a first time characteristic indicative of a condition of a holiday within a first predetermined time period before the plurality of time points, a second time characteristic indicative of a condition of a holiday contained within the plurality of time points, and a third time characteristic indicative of a condition of a holiday within a second predetermined time period after the plurality of time points.
In some implementations, the condition of the holiday indicates at least one of: whether there are holidays, the duration of the holidays, and the type of the holiday.
In some implementations, the machine executable instructions, when executed by the device, further cause the device to: modify the plurality of historical features based on the first prediction result; and determine a second prediction result based on the modified plurality of historical features.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. Such program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the disclosure has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (20)

1. A computer-implemented method, comprising:
receiving a predictive query related to a plurality of factors at a plurality of time points;
extracting a plurality of historical features from historical data related to the plurality of factors at a plurality of historical time points prior to the plurality of time points; and
determining, in chronological order, prediction results related to the plurality of factors at the plurality of time points based at least on the plurality of historical features, the determining comprising:
determining, based on the plurality of historical features, a first prediction result related to the plurality of factors at a first time point of the plurality of time points, and
determining, based on the plurality of historical features and the first prediction result, a second prediction result related to the plurality of factors at a second time point after the first time point.
2. The method of claim 1, wherein determining the prediction results comprises:
determining the prediction results using a Gradient Boosting Decision Tree (GBDT) having a predetermined set of parameters, the GBDT comprising a plurality of nodes arranged in a tree structure and the plurality of nodes being respectively associated with the plurality of historical features.
3. The method of claim 2, wherein the set of parameters of the GBDT is determined based on further historical data relating to further factors at further historical points in time, and the further historical data is changed in a random manner when determining the set of parameters.
4. The method of claim 1, wherein extracting the plurality of historical features comprises:
identifying, from the historical data, a plurality of historical data portions that are respectively associated with respective ones of the plurality of factors at the plurality of historical points in time;
extracting a first set of historical features from the plurality of historical data portions, respectively, the first set of historical features indicating characteristics of the historical data portions that are related to the respective factors at the plurality of historical points in time; and
extracting a second set of historical features across the plurality of historical data portions, the second set of historical features indicating characteristics of the historical data portions that are related to at least two of the plurality of factors at the plurality of historical points in time.
5. The method of claim 4, wherein each historical data portion comprises data items corresponding to respective factors at different ones of the plurality of historical time points, and wherein extracting the first set of historical features comprises: extracting at least one of the following historical features from each historical data portion:
a first historical feature indicating a data item associated with the respective factor at a predetermined historical time point of the plurality of historical time points, the predetermined historical time point being spaced from one of the plurality of time points by a predetermined length of time,
a second historical feature indicating an average of data items associated with the respective factor at at least two consecutive time points of the plurality of historical time points, and
a third historical feature indicating an average of data items associated with the respective factor at a plurality of periodic time points of the plurality of historical time points.
6. The method of claim 4, wherein the at least two factors belong to the same level in a predetermined hierarchy.
7. The method of claim 1, wherein determining the prediction results further comprises:
determining at least one temporal feature based on the plurality of time points, the at least one temporal feature indicating a temporal pattern before, within, and/or after the plurality of time points; and
determining the prediction results further based on the at least one temporal feature.
8. The method of claim 7, wherein determining the at least one temporal feature comprises determining at least one of:
a first temporal feature indicating a holiday condition within a first predetermined time period prior to the plurality of time points,
a second temporal feature indicating a holiday condition within the plurality of time points, and
a third temporal feature indicating a holiday condition within a second predetermined time period after the plurality of time points.
9. The method of claim 8, wherein the holiday condition indicates at least one of: whether there is a holiday, a duration of the holiday, and a type of the holiday.
10. The method of claim 1, wherein determining the second prediction result comprises:
modifying the plurality of historical features based on the first prediction result; and
determining the second prediction result based on the modified plurality of historical features.
11. An apparatus, comprising:
a processing unit; and
a memory coupled to the processing unit and having instructions stored thereon that, when executed by the processing unit, cause the apparatus to perform acts comprising:
receiving a predictive query related to a plurality of factors at a plurality of time points;
extracting a plurality of historical features from historical data related to the plurality of factors at a plurality of historical time points prior to the plurality of time points; and
determining, in chronological order, prediction results related to the plurality of factors at the plurality of time points based at least on the plurality of historical features, the determining comprising:
determining, based on the plurality of historical features, a first prediction result related to the plurality of factors at a first time point of the plurality of time points, and
determining, based on the plurality of historical features and the first prediction result, a second prediction result related to the plurality of factors at a second time point after the first time point.
12. The apparatus of claim 11, wherein determining the prediction results comprises:
determining the prediction results using a Gradient Boosting Decision Tree (GBDT) having a predetermined set of parameters, the GBDT comprising a plurality of nodes arranged in a tree structure and the plurality of nodes being respectively associated with the plurality of historical features.
13. The apparatus of claim 12, wherein the set of parameters of the GBDT is determined based on further historical data relating to further factors at further historical points in time, and the further historical data is changed in a random manner when determining the set of parameters.
14. The apparatus of claim 11, wherein extracting the plurality of historical features comprises:
identifying, from the historical data, a plurality of historical data portions that are respectively associated with respective ones of the plurality of factors at the plurality of historical points in time;
extracting a first set of historical features from the plurality of historical data portions, respectively, the first set of historical features indicating characteristics of the historical data portions that are related to the respective factors at the plurality of historical points in time; and
extracting a second set of historical features across the plurality of historical data portions, the second set of historical features indicating characteristics of the historical data portions that are related to at least two of the plurality of factors at the plurality of historical points in time.
15. The apparatus of claim 14, wherein each historical data portion comprises data items corresponding to respective factors at different ones of the plurality of historical time points, and wherein extracting the first set of historical features comprises: extracting at least one of the following historical features from each historical data portion:
a first historical feature indicating a data item associated with the respective factor at a predetermined historical time point of the plurality of historical time points, the predetermined historical time point being spaced from one of the plurality of time points by a predetermined length of time,
a second historical feature indicating an average of data items associated with the respective factor at at least two consecutive time points of the plurality of historical time points, and
a third historical feature indicating an average of data items associated with the respective factor at a plurality of periodic time points of the plurality of historical time points.
16. The apparatus of claim 14, wherein the at least two factors belong to a same level in a predetermined hierarchy.
17. The apparatus of claim 11, wherein determining the prediction results further comprises:
determining at least one temporal feature based on the plurality of time points, the at least one temporal feature indicating a temporal pattern before, within, and/or after the plurality of time points; and
determining the prediction results further based on the at least one temporal feature.
18. The apparatus of claim 17, wherein determining the at least one temporal feature comprises determining at least one of:
a first temporal feature indicating a holiday condition within a first predetermined time period prior to the plurality of time points,
a second temporal feature indicating a holiday condition within the plurality of time points, and
a third temporal feature indicating a holiday condition within a second predetermined time period after the plurality of time points.
19. The apparatus of claim 11, wherein determining the second prediction result comprises:
modifying the plurality of historical features based on the first prediction result; and
determining the second prediction result based on the modified plurality of historical features.
20. A computer program product stored in a computer storage medium and comprising machine-executable instructions that, when executed by an apparatus, cause the apparatus to:
receive a predictive query related to a plurality of factors at a plurality of time points;
extract a plurality of historical features from historical data related to the plurality of factors at a plurality of historical time points prior to the plurality of time points; and
determine, in chronological order, prediction results related to the plurality of factors at the plurality of time points based at least on the plurality of historical features, the determining comprising:
determining, based on the plurality of historical features, a first prediction result related to the plurality of factors at a first time point of the plurality of time points, and
determining, based on the plurality of historical features and the first prediction result, a second prediction result related to the plurality of factors at a second time point after the first time point.
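The chronological two-step procedure recited in claims 1 and 10 can be illustrated with a short sketch. This is not the patented implementation: the toy feature layout, the use of scikit-learn's `GradientBoostingRegressor` as a stand-in for the claimed GBDT, and the choice of which feature column receives the first prediction are all assumptions made purely for demonstration.

```python
# Illustrative sketch only. A GBDT produces a first prediction at time t1;
# that prediction is folded back into the historical features (claim 10),
# and the modified features produce the second prediction at time t2.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy historical features: column 4 is treated as "the most recent
# observation" slot, which is overwritten with the first prediction
# before the second step (an invented layout for this example).
X_train = rng.normal(size=(200, 5))
y_train = 2.0 * X_train[:, 0] + 0.5 * X_train[:, 4] + rng.normal(scale=0.1, size=200)

gbdt = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

def predict_two_steps(model, hist_features):
    """Chronological prediction: the result at t1 feeds the features for t2."""
    first = model.predict(hist_features)   # first prediction result, at t1
    modified = hist_features.copy()
    modified[:, 4] = first                 # modify historical features with t1 result
    second = model.predict(modified)       # second prediction result, at t2
    return first, second

hist = rng.normal(size=(3, 5))             # three factors queried at once
p1, p2 = predict_two_steps(gbdt, hist)
print(p1.shape, p2.shape)                  # (3,) (3,)
```

The same recursion extends to further time points: each new prediction is appended to the feature set before predicting the next point, which is what allows a single trained model to produce a whole chronological sequence of results.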
CN201810541283.9A 2018-05-30 2018-05-30 Multi-factor multi-time point correlated prediction Pending CN110555537A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810541283.9A CN110555537A (en) 2018-05-30 2018-05-30 Multi-factor multi-time point correlated prediction
PCT/US2019/031909 WO2019231636A1 (en) 2018-05-30 2019-05-13 Multivariate multi-time point forecasting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810541283.9A CN110555537A (en) 2018-05-30 2018-05-30 Multi-factor multi-time point correlated prediction

Publications (1)

Publication Number Publication Date
CN110555537A true CN110555537A (en) 2019-12-10

Family

ID=66867764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810541283.9A Pending CN110555537A (en) 2018-05-30 2018-05-30 Multi-factor multi-time point correlated prediction

Country Status (2)

Country Link
CN (1) CN110555537A (en)
WO (1) WO2019231636A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469394A (en) * 2020-03-30 2021-10-01 富士通株式会社 Information processing apparatus, information processing method, and computer-readable storage medium

Citations (15)

Publication number Priority date Publication date Assignee Title
CN1662876A (zh) * 2000-11-09 2005-08-31 SPSS Inc. System and method for building a time series model
US20120095800A1 (en) * 2010-10-15 2012-04-19 International Business Machines Corporation Predicting financial status of a project
US20150081398A1 (en) * 2013-09-17 2015-03-19 International Business Machines Corporation Determining a performance target setting
US20150371244A1 (en) * 2014-06-23 2015-12-24 Ca, Inc. Forecasting information technology workload demand
US20160005055A1 (en) * 2014-07-01 2016-01-07 Siar SARFERAZ Generic time series forecasting
US20160034615A1 (en) * 2014-08-01 2016-02-04 Tata Consultancy Services Limited System and method for forecasting a time series data
US20160055494A1 (en) * 2014-08-19 2016-02-25 Boyi Ni Booking based demand forecast
US20170270543A1 (en) * 2016-03-15 2017-09-21 Accenture Global Solutions Limited Projecting resource demand using a computing device
US20170270427A1 (en) * 2016-03-17 2017-09-21 Microsoft Technology Licensing, Llc Multi-view machine learning
US20170300964A1 (en) * 2016-04-18 2017-10-19 International Business Machines Corporation Estimating the future bounds of time-sensitive metrics
CN107274009A (zh) * 2017-05-27 2017-10-20 中国科学院计算技术研究所 Correlation-based multi-step forecasting method and system for time series data
CN107437199A (zh) * 2017-06-16 2017-12-05 北京小度信息科技有限公司 Platform earnings forecasting method and device
CN107832866A (zh) * 2017-09-26 2018-03-23 晶赞广告(上海)有限公司 Prediction method and device, storage medium, and terminal
CN108053242A (zh) * 2017-12-12 2018-05-18 携程旅游信息技术(上海)有限公司 Scenic spot admission ticket volume forecasting method, system, device, and storage medium
US20180365274A1 (en) * 2015-06-21 2018-12-20 Blackhawk Network, Inc. Computer-Based Data Collection, Management, and Forecasting

Also Published As

Publication number Publication date
WO2019231636A1 (en) 2019-12-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination