WO2021044192A1 - Detection, prediction and/or compensation of data drift in distributed clouds - Google Patents


Info

Publication number
WO2021044192A1
Authority
WO
WIPO (PCT)
Prior art keywords
drift
data
machine learning
compensation function
model
Prior art date
Application number
PCT/IB2019/057460
Other languages
French (fr)
Inventor
Mbarka SOUALHIA
Chunyan Fu
Yves Lemieux
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/IB2019/057460 priority Critical patent/WO2021044192A1/en
Publication of WO2021044192A1 publication Critical patent/WO2021044192A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the underlying relationships learned by machine learning models can change over time in unforeseen ways.
  • changes in the statistical properties over time can result in a change in the pattern that was discovered when training a model using the same data. Consequently, this can result in degradation in performance of models due to the changes in the underlying relationships between the input features and the output targets in a trained model.
  • In the area of machine learning, the changing of the underlying relationships between the input and output variables in a model is called “concept drift” (also referred to interchangeably herein as “data drift”). Data drift has a direct impact on the accuracy of predictive models.
  • a method implemented in a machine learning system includes: a. at least one of detecting and predicting a data drift in an input data stream of the machine learning system; b. determining a compensation function, the compensation function configured to offset at least part of the data drift, the compensation function being based at least in part on the at least one of the detecting and the predicting; and c. applying the compensation function to the input data stream to offset at least part of the data drift.
  • applying the compensation function to the input data stream further includes applying the compensation function to the input data stream of a trained machine learning model to compensate for the at least part of the data drift, the at least part of the data drift associated with data that is input into the trained machine learning model.
  • the method further includes obtaining output values of the compensation function that is applied to the input data stream.
  • the method further includes providing the output values of the compensation function as input values to the trained machine learning model.
  • the method further includes obtaining an output from the machine learning system according to, at least in part, the compensation function.
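Steps (a) through (c) above can be sketched minimally in Python; every name and numeric value below is hypothetical, and a simple additive mean-shift stands in for the general compensation function:

```python
from statistics import mean

def detect_drift(window, train_mean, train_dev):
    # (a) Flag drift when the window mean leaves train_mean +/- train_dev.
    return abs(mean(window) - train_mean) > train_dev

def make_compensation(window, train_mean):
    # (b) Additive offset that shifts the drifted window back to the training mean.
    offset = train_mean - mean(window)
    return lambda x: x + offset

train_mean, train_dev = 10.0, 1.5          # statistics learned offline (assumed)
stream = [14.2, 14.8, 13.9, 14.5]          # drifted online window (assumed)

if detect_drift(stream, train_mean, train_dev):
    compensate = make_compensation(stream, train_mean)
    # (c) Apply the compensation to the input stream before the trained
    # model consumes it; the output values become the model's input values.
    stream = [compensate(x) for x in stream]
```

After compensation the window is centred back on the training mean and can be fed to the trained machine learning model unchanged.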
  • the at least one of the detecting and predicting the data drift further includes detecting the data drift in the input data stream using a drift detector model.
  • the drift detector model is configured to map at least one training window to at least one feature and at least one corresponding drift time of the at least one feature.
  • the drift detector model is configured to detect whether at least one feature is drifting during at least one data window, the at least one data window being based at least in part on the at least one training window that is mapped to the at least one feature.
  • the at least one of the detecting and predicting the data drift further includes predicting the data drift in the input data stream using a drift predictor model.
  • the drift predictor model is configured to estimate at least one next value and a distribution of at least one feature based on the input data stream.
  • the drift predictor model is configured to predict whether at least one feature is drifting based at least in part on at least one predetermined drift pattern and the estimated next value and the distribution.
  • the at least one predetermined drift pattern includes one or more of a sudden drift pattern, a gradual drift pattern, an incremental drift pattern and a reoccurring drift pattern.
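The four predetermined drift patterns can be illustrated with tiny synthetic one-dimensional generators (all parameter values here are illustrative, not from the disclosure):

```python
import random

def sudden(n, t0, lo=0.0, hi=5.0):
    # Sudden drift: the value jumps abruptly at time t0.
    return [lo if t < t0 else hi for t in range(n)]

def gradual(n, t0, lo=0.0, hi=5.0):
    # Gradual drift: after t0, old and new concepts alternate, with the
    # new concept becoming progressively more likely.
    random.seed(0)
    return [hi if t >= t0 and random.random() < (t - t0) / (n - t0) else lo
            for t in range(n)]

def incremental(n, lo=0.0, hi=5.0):
    # Incremental drift: the value moves smoothly from old to new.
    return [lo + (hi - lo) * t / (n - 1) for t in range(n)]

def reoccurring(n, period, lo=0.0, hi=5.0):
    # Reoccurring drift: the old concept periodically returns.
    return [hi if (t // period) % 2 else lo for t in range(n)]
```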
  • the method further includes determining whether an amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds a predetermined drift threshold. In some embodiments of this aspect, the method further includes if the amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds the predetermined drift threshold, initiating retraining a machine learning model associated with the data drift. In some embodiments of this aspect, applying the compensation function to the input data stream further includes applying the compensation function to the input data stream if the amount of the at least one of the detected data drift and the predicted data drift does not at least meet or exceed a predetermined drift threshold. In some embodiments of this aspect, the determining the compensation function further includes determining at least one action that offsets at least part of the data drift.
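The retrain-versus-compensate threshold logic above can be sketched as follows (the function names, callbacks and threshold value are hypothetical):

```python
def handle_drift(drift_amount, threshold, retrain, compensate):
    # Retrain the model when the detected/predicted drift meets or exceeds
    # the predetermined threshold; otherwise apply the (cheaper)
    # compensation function to the input stream.
    if drift_amount >= threshold:
        retrain()
        return "retrain"
    compensate()
    return "compensate"

log = []
handle_drift(0.9, 0.5, retrain=lambda: log.append("retrain"),
             compensate=lambda: log.append("compensate"))
handle_drift(0.2, 0.5, retrain=lambda: log.append("retrain"),
             compensate=lambda: log.append("compensate"))
# log == ["retrain", "compensate"]
```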
  • a machine learning system comprising processing circuitry.
  • the processing circuitry is configured to at least one of detect and predict a data drift in an input data stream of the machine learning system.
  • the processing circuitry is configured to determine a compensation function, the compensation function configured to offset at least part of the data drift, the compensation function being based at least in part on the at least one of the detection and the prediction.
  • the processing circuitry is configured to apply the compensation function to the input data stream to offset at least part of the data drift.
  • the processing circuitry is further configured to apply the compensation function to the input data stream by being configured to apply the compensation function to the input data stream of a trained machine learning model to compensate for the at least part of the data drift, the at least part of the data drift associated with data that is input into the trained machine learning model.
  • the processing circuitry is further configured to obtain output values of the compensation function that is applied to the input data stream.
  • the processing circuitry is further configured to provide the output values of the compensation function as input values to the trained machine learning model.
  • the processing circuitry is further configured to obtain an output from the machine learning system according to, at least in part, the compensation function.
  • the processing circuitry is further configured to at least one of detect and predict the data drift by being further configured to detect the data drift in the input data stream using a drift detector model.
  • the drift detector model is configured to map at least one training window to at least one feature and at least one corresponding drift time of the at least one feature.
  • the drift detector model is configured to detect whether at least one feature is drifting during at least one data window, the at least one data window being based at least in part on the at least one training window that is mapped to the at least one feature.
  • the processing circuitry is further configured to at least one of detect and predict the data drift by being further configured to predict the data drift in the input data stream using a drift predictor model.
  • the drift predictor model is configured to estimate at least one next value and a distribution of at least one feature based on the input data stream.
  • the drift predictor model is configured to predict whether at least one feature is drifting based at least in part on at least one predetermined drift pattern and the estimated next value and the distribution.
  • the at least one predetermined drift pattern includes one or more of a sudden drift pattern, a gradual drift pattern, an incremental drift pattern and a reoccurring drift pattern.
  • the processing circuitry is further configured to determine whether an amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds a predetermined drift threshold. In some embodiments of this aspect, the processing circuitry is further configured to if the amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds the predetermined drift threshold, initiate retraining a machine learning model associated with the data drift. In some embodiments of this aspect, the processing circuitry is further configured to apply the compensation function to the input data stream by being configured to apply the compensation function to the input data stream if the amount of the at least one of the detected data drift and the predicted data drift does not at least meet or exceed a predetermined drift threshold. In some embodiments of this aspect, the processing circuitry is further configured to determine the compensation function by being configured to determine at least one action that offsets at least part of the data drift.
  • FIG. 1 is a schematic diagram illustrating an example architecture for a machine learning (ML) system in a cloud environment, according to some embodiments of the present disclosure.
  • FIG. 2 is a block diagram of another example ML system architecture according to the principles in the present disclosure.
  • FIG. 3 is a flowchart of an example process in an ML system according to some embodiments of the present disclosure.
  • FIG. 4 is a flowchart of yet another example process for compensating for data drift according to some embodiments of the present disclosure.
  • FIG. 5 is a schematic diagram of an example features and training learner according to some embodiments of the present disclosure.
  • FIG. 6 is a schematic diagram of an example data drift detector according to some embodiments of the present disclosure.
  • FIG. 7 is a schematic diagram of an example data drift predictor according to some embodiments of the present disclosure.
  • FIG. 8 is a schematic diagram of an example compensation learner according to some embodiments of the present disclosure.
  • FIG. 9 illustrates an example incremental drift detection and compensation according to the principles of the present disclosure.
  • FIG. 10 illustrates a reoccurring drift detection, prediction and compensation according to the principles of the present disclosure.
  • an integrated machine learning approach for detecting drift in network traffic has been proposed.
  • the proposed approach includes two main parts: (1) an online support vector machine classifier to detect the drift, and
  • Naive Bayes is a commonly used classifier that makes classifications using the Maximum A Posteriori decision rule in a Bayesian setting. It shows good results on problems such as text classification and spam detection.
  • the proposed approach can detect the drift using the Hoeffding algorithm in the learning stage while using a weighted Bayes classifier in the tree leaves in the classification stage. Furthermore, this approach uses a sliding window to replace the data tree and to handle the drifting data. Nevertheless, this approach incurs a high cost when updating the model and does not propose any mechanism to update the streamed data.
  • a drift detection method has been presented that uses data about data labels over time, which can be predicted by a classifier trained with a limited number of labeled data samples.
  • this approach was found to rely on the availability of those labels and cannot be applied when data changes its structure (e.g., missing values) over time.
  • Another study proposes a system that detects drift using a set of features of an online classifier and a block-based classifier. In addition, the study analyzes drifts to identify their root cause.
  • PINE (Predictive and parameter INsensitive Ensemble)
  • PINE can detect changes of the streamed data from the cloud and map it to the trained data to obtain the differences. Given the resulting differences, PINE can identify whether the detected changes follow the same trend as in the four existing drift patterns (sudden, incremental, reoccurring, gradual). However, this approach does not propose any change to the drifting features to follow the same pattern as the pattern in the trained model.
  • Some embodiments of the present disclosure include one or more of: defining a “drift detector/predictor” to early detect/predict the occurrence of drifting data according to data changes occurring in the cloud and different training windows for a given model; and/or defining a “compensation learner” to propose one or more compensation changes to be applied in the presence of a drift and to update the drifting data accordingly in order to improve the accuracy of a trained model.
  • Some embodiments described in the present disclosure may include one or more of the following advantages: early detection of the occurrence of non-stationary data and updating the model and/or data accordingly; obtaining an adjusted value of the drifted features’ values (especially for the most relevant features in a model); learning the distribution for the drift points/changes and predicting when a feature will be drifting; no need to update the trained model very frequently, which reduces retraining costs; and/or obtaining more accurate results from detection/prediction models.
  • relational terms such as “first” and “second,” “top” and “bottom,” and the like, may be used solely to distinguish one entity or element from another entity or element without necessarily requiring or implying any physical or logical relationship or order between such entities or elements.
  • the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the concepts described herein.
  • the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise.
  • the joining term, “in communication with” and the like may be used to indicate electrical or data communication, which may be accomplished by physical contact, induction, electromagnetic radiation, radio signaling, infrared signaling or optical signaling, for example.
  • electrical or data communication may be accomplished by physical contact, induction, electromagnetic radiation, radio signaling, infrared signaling or optical signaling, for example.
  • Coupled may be used herein to indicate a connection, although not necessarily directly, and may include wired and/or wireless connections.
  • the term “raw data” is used herein and refers to the data collected from a monitored system.
  • Raw data is usually preprocessed before a machine learning model can use it.
  • the preprocessing procedures may include removing empty data samples, imputing missing data samples, normalizing the data, etc.
  • the term “data stream” (or “streamed data”) is used herein and refers to a sequence of data packets used to transmit or receive information that is in the process of being transmitted from a monitored system.
  • the “data stream” may also be called “online data” in some contexts.
  • Online data may contain one or multiple data samples at a time and is usually used by a trained machine learning model for inferring a real-time system status.
  • The “data stream” is in contrast to “offline data,” which is collected offline, contains a batch of data samples and is stored in a data repository. Offline data is often used for training, testing and validating machine learning models.
  • the term “stationary data” is used herein and refers to the data whose mean, variance and autocorrelation structure does not change over time. In contrast, drifted/drifting data is considered as “non-stationary data.”
  • the term “data window” (or a window of data) is used herein and refers to a defined time range for data collection. For example, given a window of 10 minutes, data from the previous 10 minutes can be collected from a monitored system at a time. A window can slide. For example, if online data is collected every 10 seconds with a window size of 10 minutes, then the 10-minute window is considered to “slide” forward every 10-second interval. The sliding window is a technique that is often used for time-series predictions.
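The 10-minute/10-second sliding-window example above can be sketched with a fixed-length buffer (a simplification; a real collector would key samples by timestamp):

```python
from collections import deque

# A 10-minute window over data arriving every 10 seconds holds 60 samples;
# each new sample "slides" the window forward by one 10-second interval.
WINDOW_SECONDS, SAMPLE_INTERVAL = 600, 10
window = deque(maxlen=WINDOW_SECONDS // SAMPLE_INTERVAL)   # 60 samples

for t in range(100):          # simulated stream of 100 samples
    window.append(float(t))   # the oldest sample is evicted automatically

# The window now covers the 60 most recent samples (40 .. 99).
```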
  • training window refers to the time range/interval for collecting training data.
  • a training window of 1 hour, 1 day, 1 week or 1 month means that training data has been collected for 1 hour, 1 day, 1 week or 1 month, respectively.
  • the term “feature” is used herein and refers to an input used for machine learning.
  • a feature may be a column in the dataset table.
  • a feature may represent an observable attribute/quality and a value combination (e.g., rank of value 2).
  • the term “drift time” is used herein and may indicate a time at which, or during which, a feature is detected to drift and/or a time at which, or during which, a feature is predicted to drift.
  • machine learning (ML) system can be any kind of ML system, such as, for example, a computing device, one or more processors, one or more processing circuitries, a machine, a mobile wireless device, a user equipment, a central processing unit (CPU), a graphics processing unit (GPU), a field programmable gate array (FPGA), a server, a network node, a base station, etc. that may implement one or more machine learning models and, in particular, may apply one or more of the techniques disclosed herein to detect, predict and/or compensate for data drift.
  • FIG. 1 is a schematic diagram of a communication system 10 in a distributed cloud environment, according to an embodiment, constructed in accordance with the principles of the present disclosure.
  • the communication system 10 in FIG. 1 is a non-limiting example and other embodiments of the present disclosure may be implemented by one or more other systems and/or networks.
  • the system 10 includes an example ML system 12 that detects, predicts and/or compensates for data drift according to the principles of the present disclosure.
  • the ML system 12 may include an original, trained ML model 20 and one or more components that can be used to detect, predict and/or compensate for data drift in an input data stream/online data of the ML model 20 using techniques in the present disclosure, such as, one or more of a features and training learner 30, a drift detector 32, a drift predictor 34 and a compensator 36.
  • one or more of the training learner 30, drift detector 32, drift predictor 34 and compensator 36 may be software code executed by processing hardware and stored in memory in the ML system 12.
  • one or more of the training learner 30, drift detector 32, drift predictor 34 and compensator 36 may be hardware storing and executing instructions corresponding to the functions/processes described herein.
  • the features and training learner 30 obtains and analyzes the offline data and learns when a data drift would occur according to different training windows. For example, in some embodiments, the features and training learner 30 uses different sizes of the training dataset and calculates their statistical properties (e.g., mean, bias, variance, distribution, etc.). The features and training learner 30 may then observe the features that will be drifting according to different training windows using probabilistic or machine learning algorithms, such as unsupervised or supervised algorithms, Naive Bayes, Decision Trees, Random Forest, Support Vector Machine (SVM), Neural Networks, etc.
  • One objective of the features and training learner 30 is to provide a map between a drifting feature and the feature’s mean and deviation calculated using the data collected from the training window excluding the data from the drifting time period.
  • the map is called “feature map” hereinafter.
  • the feature map is then provided to the drift detector that detects a data drift for an online stream data.
  • although the features and training learner 30 is shown in FIG. 1 as internal to the ML system 12, in some embodiments, the features and training learner 30 is run offline to learn the drift detector model from the training set/offline data and is therefore external to a live ML system 12. Furthermore, in some embodiments, the features and training learner 30 may learn new relationships between the recent offline data and the target features and update the features map. The features and training learner 30 can be updated over time to obtain an updated features map of the drifting features.
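A minimal sketch of building the features map described above, assuming the per-feature drifting time periods are already known to the learner; the feature name and sample values are hypothetical:

```python
from statistics import mean, stdev

def build_feature_map(training_data, drift_periods):
    # Map each feature to its (mean, deviation) computed over the training
    # window, excluding samples that fall in the feature's drifting periods.
    feature_map = {}
    for feature, samples in training_data.items():
        drifting = drift_periods.get(feature, set())
        clean = [v for t, v in enumerate(samples) if t not in drifting]
        feature_map[feature] = (mean(clean), stdev(clean))
    return feature_map

# Hypothetical training window: feature "cpu_load" drifts at times 3 and 4.
training_data = {"cpu_load": [1.0, 2.0, 3.0, 50.0, 60.0, 2.0, 1.0]}
fmap = build_feature_map(training_data, {"cpu_load": {3, 4}})
# fmap["cpu_load"] holds the mean/deviation of [1.0, 2.0, 3.0, 2.0, 1.0].
```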
  • the drift detector 32 may implement a drift detector function.
  • the drift detector 32 is run online and is internal to a live ML system 12. Given the online data with the features’ names and their values, the drift detector 32 can identify whether a feature is drifting or not according to the obtained features map discussed herein above and the data window size. For example, if feature F1 is in the feature map and the online value of F1 is beyond the scope of F1’s mean +/- deviation, F1’s drifting is detected. On the other hand, in some embodiments, if feature F1 is in the feature map and the online value of F1 is within the scope of F1’s mean +/- deviation, then F1 may be determined to be not drifting. Furthermore, in some embodiments, the drift detector 32 may identify new drifting features based on an updated features map.
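The mean +/- deviation check described above can be sketched as follows (the feature name "F1" and its statistics are hypothetical):

```python
def is_drifting(feature, value, features_map):
    # A mapped feature is drifting when its online value falls outside the
    # feature's mean +/- deviation learned during training; unmapped
    # features are treated as not drifting.
    if feature not in features_map:
        return False
    mean_, dev = features_map[feature]
    return abs(value - mean_) > dev

features_map = {"F1": (10.0, 1.5)}       # hypothetical learned statistics

# is_drifting("F1", 13.0, features_map)  -> True  (beyond 10 +/- 1.5)
# is_drifting("F1", 10.8, features_map)  -> False (within range)
```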
  • the drift predictor 34 may be configured to estimate the values of the next data samples for the different features using, for example, Naive Bayes, autoregressive integrated moving average (ARIMA), recurrent neural networks (RNN) and convolutional neural networks (CNN).
  • the drift predictor 34 can predict the data drift using the estimated values of features and known drift patterns, such as one or more of sudden, incremental, reoccurring and gradual drift patterns.
  • the parameters of a drift pattern include, e.g., the data range for a sudden drift; the drift recurrence frequency and data range for a reoccurring drift; the data distribution for an incremental drift; and the frequency and distribution for a gradual drift.
  • new data drift pattern parameters may be learned from the recent offline data using statistical models and/or probabilistic models.
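A minimal sketch of such prediction, using a moving-average next-value estimate as a stand-in for the ARIMA/RNN/CNN estimators named above and a simple "normal data range" as a hypothetical sudden-drift pattern parameter (all values are illustrative):

```python
def estimate_next(history, order=3):
    # Naive next-value estimate: the mean of the last `order` samples.
    recent = history[-order:]
    return sum(recent) / len(recent)

def predict_sudden_drift(history, normal_range, order=3):
    # Predict a sudden drift when the estimated next value leaves the
    # normal data range learned for this feature.
    lo, hi = normal_range
    return not (lo <= estimate_next(history, order) <= hi)

history = [1.0, 1.1, 0.9, 6.0, 6.2, 6.1]   # values jumping out of range

# predict_sudden_drift(history, (0.0, 2.0))  -> True
```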
  • the drift predictor 34 and the drift detector 32 may be complementary functions that assist with determining an appropriate compensation for data drift.
  • the drift detector 32 may be considered to use simple value checking on the current data sample to perform the drift detection, while the drift predictor 34 may be considered to use an ML model plus pattern matching to predict a future drifting.
  • the drift detector 32 detects data drift by, e.g., using the features map obtained from the features and training learner 30 to identify the drifted features based on the means/deviations of the features; while the drift predictor 34 predicts data drift by, e.g., learning drift patterns using ML models and estimating data values to identify data drift early.
  • the compensator 36 may compensate the drift of online data/features to be used as input for the ML model 20 (e.g., a neural network model for network performance prediction and an SVM model for traffic classification). In some embodiments, given the outputs from the drift detector 32 and/or drift predictor 34 (e.g., predicted drift features and drift times) the compensator 36 may be configured to determine changes to the online data (as compared to the training set data).
  • the compensator 36 can be run in an online system (e.g., system 12) and may use, e.g., the difference between the online data stream and the features’ map (mean, deviation, etc.) to learn the appropriate compensation action to be applied to compensate for the data drift.
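One way to realize such a compensation action, as a sketch: learn a shift-and-scale correction that maps the online window's mean and deviation back onto the statistics stored in the features' map. The disclosure does not prescribe this particular correction, and all values below are hypothetical:

```python
def learn_compensation(online_window, feature_stats):
    # Learn a correction mapping the online window's distribution back onto
    # the training (mean, deviation) recorded in the features' map.
    train_mean, train_dev = feature_stats
    n = len(online_window)
    online_mean = sum(online_window) / n
    online_dev = (sum((x - online_mean) ** 2 for x in online_window) / n) ** 0.5
    scale = train_dev / online_dev if online_dev else 1.0
    return lambda x: train_mean + scale * (x - online_mean)

stats = (10.0, 1.5)            # (mean, deviation) from the features' map
online = [13.0, 14.0, 15.0]    # drifted online window

compensate = learn_compensation(online, stats)
corrected = [compensate(x) for x in online]
# `corrected` has mean 10.0 and deviation 1.5, matching the training data.
```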
  • the system 12 includes a drift detector 32 which is configured to detect a data drift in an input data stream of the ML system 12.
  • the system 12 includes a drift predictor 34 which is configured to predict a data drift in an input data stream of the ML system 12.
  • the system 12 includes a compensator 36 which is configured to determine a compensation function, the compensation function configured to offset at least part of the data drift, the compensation function being based at least in part on the at least one of the detecting and the predicting.
  • the compensator 36 is configured to apply the compensation function to the input data stream to offset at least part of the data drift.
  • the communication system 10 may include many more ML systems 12.
  • the functions described herein as being performed by an ML system may be distributed over a plurality of ML systems.
  • the functions of the ML system (or one or more components of the ML system, such as, the features and training learner 30, drift detector 32, drift predictor 34 and compensator 36) described herein are not limited to performance by a single physical device and, in fact, can be distributed among several physical devices.
  • Referring to FIG. 2, another example ML system 12 in accordance with the present disclosure is shown, including drift detector 32, drift predictor 34 and compensator 36. Note that although only a single drift detector 32, a single drift predictor 34 and a single compensator 36 are shown in FIGS. 1 and 2 for convenience, the ML system 12 may include many more drift detectors 32, drift predictors 34 and compensators 36.
  • the ML system 12 includes (and/or uses) a communication interface 50, processing circuitry 52, and memory 54.
  • the communication interface 50 may include an interface configured to receive data (e.g., a live input data stream, streamed data/online data, non-stationary data, drift pattern, etc.), for which a data drift may be detected, predicted and/or compensated for according to the principles in the present disclosure.
  • the communication interface 50 may include an interface that transmits information, which may be based on the application of a compensation function to compensate for the data drift occurring in a trained ML model according to the principles in the present disclosure.
  • the communication interface 50 may be formed as or may include, for example, one or more radio frequency (RF) transmitters, one or more RF receivers, and/or one or more RF transceivers, and/or may be considered a radio interface.
  • the communication interface 50 may include a wired interface, such as one or more network interface cards.
  • the processing circuitry 52 may include one or more processors 56 and memory, such as, the memory 54.
  • the processing circuitry 52 may comprise integrated circuitry for processing and/or control, e.g., one or more processors and/or processor cores and/or FPGAs (Field Programmable Gate Array) and/or ASICs (Application Specific Integrated Circuitry) adapted to execute instructions.
  • the processor 56 may be configured to access (e.g., write to and/or read from) the memory 54, which may comprise any kind of volatile and/or nonvolatile memory, e.g., cache and/or buffer memory and/or RAM (Random Access Memory) and/or ROM (Read-Only Memory) and/or optical memory and/or EPROM (Erasable Programmable Read-Only Memory).
  • the ML system 12 may further include software stored internally in, for example, memory 54, or stored in external memory (e.g., storage resource in the cloud) accessible by the ML system 12 via an external connection.
  • the software may be executable by the processing circuitry 52.
  • the processing circuitry 52 may be configured to control any of the methods and/or processes described herein and/or to cause such methods, and/or processes to be performed, e.g., by the ML system 12.
  • the memory 54 is configured to store data, programmatic software code and/or other information described herein.
  • the software may include instructions stored in memory 54 that, when executed by the processor 56, drift detector 32, drift predictor 34 and/or compensator 36, cause the processing circuitry 52 and/or configure the ML system 12 to perform the processes described herein with respect to the ML system 12 (e.g., processes described with reference to FIG. 3 and/or any of the other flowcharts and description).
  • although FIG. 2 shows drift detector 32, drift predictor 34 and compensator 36 as being within a respective processor, it is contemplated that these elements may be implemented such that a portion of the elements is stored in a corresponding memory within the processing circuitry. In other words, the elements may be implemented in hardware or in a combination of hardware and software within the processing circuitry.
  • FIG. 3 is a flowchart of an example process in an ML system 12 for detection, prediction and/or compensation of data drift according to one or more of the principles of the present disclosure.
  • One or more Blocks and/or functions and/or methods performed by the ML system 12 may be performed by one or more elements of the ML system 12 such as by drift detector 32, drift predictor 34, compensator 36, processing circuitry 52, processor 56, memory 54, communication interface 50, etc. according to the example method.
  • the example method includes at least one of detecting and predicting (Block S100), such as via drift detector 32, drift predictor 34, compensator 36 and/or processing circuitry 52, a data drift in an input data stream of the machine learning system 12.
  • the method includes determining (Block S102), such as via drift detector 32, drift predictor 34, compensator 36 and/or processing circuitry 52, a compensation function, the compensation function configured to offset at least part of the data drift.
  • the compensation function is based at least in part on the at least one of the detecting and the predicting.
  • the method includes applying (Block S104), such as via drift detector 32, drift predictor 34, compensator 36 and/or processing circuitry 52, the compensation function to the input data stream to offset at least part of the data drift.
  • applying the compensation function to the input data stream further includes applying, such as via drift detector 32, drift predictor 34, compensator 36 and/or processing circuitry 52, the compensation function to the input data stream of a trained machine learning model to compensate for the at least part of the data drift, the at least part of the data drift associated with data that is input into the trained machine learning model.
  • the method further includes obtaining, such as via drift detector 32, drift predictor 34, compensator 36 and/or processing circuitry 52, output values of the compensation function that is applied to the input data stream.
  • the method includes providing, such as via drift detector 32, drift predictor 34, compensator 36, processing circuitry 52 and/or communication interface 50, the output values of the compensation function as input values to the trained machine learning model 20. In some embodiments, the method includes obtaining, such as via drift detector 32, drift predictor 34, compensator 36, processing circuitry 52 and/or communication interface 50, an output from the machine learning system 12 according to, at least in part, the compensation function.
  • the at least one of the detecting and predicting the data drift further includes detecting, such as via drift detector 32 and/or processing circuitry 52, the data drift in the input data stream using a drift detector model.
  • the drift detector model is configured to map at least one training window to at least one feature and at least one corresponding drift time of the at least one feature.
  • the drift detector model is configured to detect whether at least one feature is drifting during at least one data window, the at least one data window being based at least in part on the at least one training window that is mapped to the at least one feature.
  • the at least one of the detecting and predicting the data drift further includes predicting, such as via drift predictor 34 and/or processing circuitry 52, the data drift in the input data stream using a drift predictor model.
  • the drift predictor model is configured to estimate at least one next value and a distribution of at least one feature based on the input data stream.
  • the drift predictor model is configured to predict whether at least one feature is drifting based at least in part on at least one predetermined drift pattern and the estimated next value and the distribution.
  • the at least one predetermined drift pattern includes one or more of a sudden drift pattern, a gradual drift pattern, an incremental drift pattern and a reoccurring drift pattern.
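For illustration only, the four predetermined drift patterns might be represented as a small enumeration; the class and member names below are assumptions, not identifiers from the disclosure.

```python
from enum import Enum

class DriftPattern(Enum):
    """Hypothetical labels for the predetermined drift patterns."""
    SUDDEN = "sudden"            # abrupt switch to a new distribution
    GRADUAL = "gradual"          # old and new distributions alternate before settling
    INCREMENTAL = "incremental"  # distribution shifts in small successive steps
    REOCCURRING = "reoccurring"  # a previously seen distribution returns periodically
```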
  • the method further includes determining, such as via drift detector 32, drift predictor 34, compensator 36 and/or processing circuitry 52, whether an amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds a predetermined drift threshold. In some embodiments, the method further includes if the amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds the predetermined drift threshold, initiating (e.g., requesting), such as via drift detector 32, drift predictor 34, compensator 36, processing circuitry 52 and/or communication interface 50, retraining a machine learning model associated with the data drift.
  • applying the compensation function to the input data stream further includes applying, such as via drift detector 32, drift predictor 34, compensator 36, processing circuitry 52 and/or communication interface 50, the compensation function to the input data stream if the amount of the at least one of the detected data drift and the predicted data drift does not at least meet or exceed a predetermined drift threshold.
  • determining the compensation function further includes determining at least one action that offsets at least part of the data drift.
  • FIG. 4 is a flowchart of yet another example process in an ML system 12 according to some embodiments of the present disclosure.
  • the process includes analyzing, such as via the features and training learner 30, the relationship between the features and the training windows and calculating the mean and deviation of each feature based on the training window used.
  • the input to this analysis may be offline data and the output may be drifting features and their corresponding means and deviations.
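As a rough sketch of this offline analysis, the per-feature mean and deviation over each feature's training window could be computed as below; `build_features_map`, the data layout, and the per-feature window length (expressed here as a sample count) are all assumptions for illustration.

```python
import statistics

def build_features_map(offline_data, training_windows):
    """Map each feature to the mean and deviation of its values within
    its assigned training window (a simplified offline analysis)."""
    features_map = {}
    for feature, window in training_windows.items():
        values = offline_data[feature][:window]  # samples inside the window
        features_map[feature] = {
            "window": window,
            "mean": statistics.mean(values),
            "deviation": statistics.pstdev(values),
        }
    return features_map

# Toy data: F1 uses a 4-sample window, F2 a 5-sample window.
offline = {"F1": [10, 11, 9, 10, 30], "F2": [5, 5, 6, 5, 5]}
fmap = build_features_map(offline, {"F1": 4, "F2": 5})
```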
  • the process includes obtaining, such as via drift detector 32, online data samples from the input data stream.
  • the process includes detecting, such as via drift detector 32, data drift based on the data values of the obtained online data samples.
  • the detecting may include comparing a feature’s mean and deviation from a baseline dataset (e.g., training set data) to the feature’s value from a second dataset (e.g., obtained online data samples) to determine whether a drift has occurred and/or to determine an amount of the drift and/or deviation between the baseline dataset and the second dataset.
  • the process includes determining, such as via drift detector 32, whether a drift is detected. For example, in some embodiments, a drift is detected if the feature value is larger than the mean plus the deviation or less than the mean minus the deviation.
  • the mean and deviation may be calculated from the offline data window.
  • the mean and deviation may also be obtained from the “Features Map”, which is the output from the features and training learner 30.
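The detection rule described above might be sketched as follows; `detect_drift` is a hypothetical helper, with the mean and deviation assumed to come from the Features Map, and the returned drift amount measured as the distance outside the normal range.

```python
def detect_drift(value, mean, deviation):
    """Flag a drift when an online value falls outside
    [mean - deviation, mean + deviation]; return the drift
    amount, or 0.0 when no drift is detected."""
    if value > mean + deviation:
        return value - (mean + deviation)
    if value < mean - deviation:
        return (mean - deviation) - value
    return 0.0
```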
  • If a drift is not detected, the process returns to Block S202, where another online data sample is obtained from the input data stream. If a drift is detected, the process proceeds to Block S208, where the drift amount is compared, such as via drift detector 32, to a threshold drift amount.
  • the techniques disclosed herein may be used to compensate for data drift in a live system to avoid the high costs associated with retraining the ML model 20 while also improving the accuracy of the ML model 20 by compensating for data drift.
  • the data drift may reach a threshold level at which the ML model 20 should be retrained. If the detected amount of data drift meets or exceeds the threshold drift amount, the process proceeds to Block S210 where the ML model 20 is retrained. On the other hand, if the detected amount of data drift is less than the threshold drift amount, the process proceeds to Block S212 to determine if data drift was predicted by e.g., the drift predictor 34. If a drift was not predicted, the process proceeds to Block S216 where a drift pattern may be identified and compensation actions to be applied to compensate for the detected drift are identified by e.g., compensator 36. In Block S218, the compensation actions are applied by e.g., compensator 36.
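The retrain-versus-compensate decision of Blocks S208/S210/S216 could be reduced to a comparison like the one below; the function name and return labels are illustrative only.

```python
def handle_detected_drift(drift_amount, drift_threshold):
    """Choose between offline retraining (Block S210) and in-stream
    compensation (Blocks S216/S218) based on the drift amount."""
    if drift_amount >= drift_threshold:
        return "retrain"      # drift too large to offset reliably
    return "compensate"       # identify and apply compensation actions
```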
  • the process may include a drift prediction process (in parallel with, or instead of, the drift detection process), as can be seen in FIG. 4. Similar to the drift detection process, the drift prediction process may include obtaining offline data/training set data. In Block S220, one or more drift patterns may be identified, such as via drift predictor 34, in the offline data. In Block S222, the process includes obtaining, such as via drift predictor 34, a windowed online data sample (when a moving window technique is used for prediction) from the input data stream. In Block S224, the process includes estimating, such as via drift predictor 34, next values of the features.
  • the process includes predicting, such as via drift predictor 34, the data drift based on the estimated values and the identified drift patterns. For example, the predicting may include comparing the estimated next values to the identified drift patterns to determine whether a data drift is predicted to occur.
  • the drift predictor 34 may determine whether a drift is predicted according to the input drift pattern and the estimated values of the features. If a drift is not predicted, the process may return to Block S222 where another windowed online data sample is obtained from the input data stream. On the other hand, if a drift is predicted, the process may proceed to Block S230 where compensation actions to be applied to compensate for the predicted drift are identified by e.g., compensator 36. In some embodiments, the drift prediction results may be used to verify the detected drift.
  • such compensation actions are applied as in Block S218.
  • the compensation actions identified by the drift predictor 34 in e.g., Block S230 can be applied in Block S218, which can save time as compared to using the drift detector 32 without a drift predictor 34.
  • the compensator 36 may have to store all the previous detection results and calculate the drift pattern.
  • the detecting, predicting and compensating steps may be repeatedly performed by ML system 12 to continuously monitor and compensate for data drift occurring at the input data stream of the ML model 20 to e.g., avoid or at least prolong retraining the ML model 20 offline due to reduced accuracy caused by data drift.
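Taken together, the repeated detect/compensate cycle might look roughly like the driver loop below; every name is hypothetical, `retrain_model` is a caller-supplied hook, and clamping to the normal range stands in for the richer compensation actions described above.

```python
def monitor_stream(samples, features_map, drift_threshold, retrain_model):
    """Detect drift per (feature, value) sample; compensate small drifts
    in-stream and request retraining for large ones."""
    corrected = []
    for name, value in samples:
        mean = features_map[name]["mean"]
        dev = features_map[name]["deviation"]
        low, high = mean - dev, mean + dev
        if low <= value <= high:          # no drift for this sample
            corrected.append(value)
            continue
        amount = value - high if value > high else low - value
        if amount >= drift_threshold:     # drift too large: retrain offline
            retrain_model(name)
            corrected.append(value)
        else:                             # offset the drift in-stream
            corrected.append(max(low, min(value, high)))
    return corrected
```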
  • FIGS. 5-8 are block diagrams illustrating an example of the features and training learner 30, drift detector 32, drift predictor 34 and compensator 36, respectively. Precisely, an example structure of each component/element is shown, and a description of each element's input(s), output(s), objective(s) and functions and/or steps is provided below.
  • FIG. 5 illustrates an example features and training learner 30 according to some embodiments of the present disclosure.
  • the features and training learner 30 may include a training windows learner 60, a drifting features learner 62, a drifting times learner 64 and a feature map creator 66.
  • the learners 60, 62, 64 may be software code used to learn training windows, drifting features and drift times, respectively.
  • the training windows can be learned in parallel with the features learner 62 using heuristic methods (e.g., Tabu Search, Swarm Intelligence, Genetic Algorithm). For a very simplified example, using 1-month data, drifts are detected for features F1, F3 and F5. Testing with 1-week data, the same drifts are detected for F3 and F5. Testing with 1-day data, only F5's drift is detected.
  • the training window sizes for F1, F3 and F5 are 1-month, 1-week and 1-day, respectively.
  • the feature map creator 66 may take as inputs, the training windows, the drifting features and the drift times and output the features map, discussed herein above.
  • the features and training learner 30 may be characterized according to one or more of the following:
  • Inputs: offline data including features' names and values/training set data, different training window intervals (1-hour, 4-hour, 1-day, 1-week/month data set, etc.).
  • At least one objective is to obtain a function to determine the window sizes and the corresponding drifting features.
  • the features and training learner 30 may perform one or more of the following steps: - Define a function/model that takes into account the parameters of the selected model, the number of data samples and the number of input features.
  • FIG. 6 illustrates an example drift detector 32 according to some embodiments of the present disclosure.
  • the drift detector 32 may be characterized according to one or more of the following:
  • Outputs: a list of the one or more features that are being detected as drifting (e.g., according to a drift detector model learned from the features and training learner 30) and the amount of drift.
  • At least one objective is to detect on-the-fly (e.g., in real-time) the features experiencing drift (e.g., drift/no drift) and/or the difference between the detected drift as compared to the means and deviations from the features map (e.g., drift amount).
  • the drift detector 32 may perform one or more of the following steps:
  • FIG. 7 illustrates an example drift predictor 34 according to some embodiments of the present disclosure.
  • the drift predictor 34 may include a next value estimator 70, a next distribution estimator 72, a data changes mapper 74 and a predicting model 78.
  • the drift predictor 34 may be characterized according to one or more of the following:
  • Inputs: a window of online data samples, the existing patterns of drift (e.g., sudden, gradual, incremental, and reoccurring) and the offline data.
  • Outputs: expected data values of features, predicted drift for certain features and their corresponding drift amount.
  • At least one objective is to early identify drifted data based on estimated next values of the online data.
  • the drift predictor 34 may perform one or more of the following steps:
  • Use, e.g., Naive Bayes and a window of online data samples to estimate, such as via next value estimator 70, the next values and estimate, such as via next distribution estimator 72, the next distribution of a set of features over time.
  • Predict, such as via predicting model 78, the presence of a data drift using the defined drift pattern and the estimated values of the features. If the estimated values present a change compared to the statistical properties of the trained data, determine, such as via predicting model 78, that there is a drift.
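These two steps might be approximated as below; note that simple linear extrapolation is substituted here for the Naive Bayes estimator named above, purely to keep the sketch self-contained.

```python
def estimate_next_value(window):
    """Estimate a feature's next value from a moving window by linear
    extrapolation (a simple stand-in for next value estimator 70)."""
    if len(window) < 2:
        return window[-1]
    slope = (window[-1] - window[0]) / (len(window) - 1)
    return window[-1] + slope

def predict_drift(window, mean, deviation):
    """Predict a drift when the estimated next value would leave the
    trained range [mean - deviation, mean + deviation]."""
    nxt = estimate_next_value(window)
    return not (mean - deviation <= nxt <= mean + deviation)
```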
  • FIG. 8 illustrates an example compensator 36 according to some embodiments of the present disclosure.
  • the compensator 36 may be characterized according to one or more of the following: - Inputs: drifting features and the amount of drift from the drift detector 32 and/or the drift predictor 34.
  • At least one objective is to update the drifted data when the drift is detected according to the data used while training the original ML model 20.
  • the compensator 36 may perform one or more of the following steps:
  • FIG. 9 illustrates an example of incremental drift detection and compensation according to one or more of the techniques in the present disclosure.
  • FIG. 9 presents an example of data drift experienced by feature “F1” in a dataset.
  • the drift detector 32 may measure the deviation amount between the new values (from the online data) and the trained values of F1 (on which the original ML model 20 is based). Such measured deviation amount for F1 may be sent to the compensator 36.
  • the compensator 36 may determine whether the received drift amount was previously predicted or not (by e.g., the drift predictor 34). In this example, this drift was not predicted; therefore, compensator 36 considers this drift as a sudden drift and compensates for the deviation amount in order to return the F1 values back to the “normal” range (that is, “normal” according to the statistical properties of previous data (e.g., mean, deviation) such as the training set data). In addition, the compensator 36 may store the drifted feature, the deviation amount and the applied action in e.g., memory 54.
  • the compensator 36 determines that the detected deviation is incremental, as compared to “Drift 1”. Therefore, the compensator 36 compensates (e.g., by applying an offset) for the data drift to return the F1 feature values back to a normal range. Then, the compensator 36 may create a function (e.g., compensation function) that represents the incremental deviation for such F1 feature and propose a corresponding compensation action (e.g., to substitute or subtract the incremented deviation from the value so that the value goes back to the normal range according to the mean and deviation values from the Features Map). When the drift detector 32 detects “Drift 3”, a call may be made to the created function to find the appropriate compensation action.
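A compensation function for such an incremental drift might be built as a closure over the observed per-drift increment; the `step` parameter and the occurrence counter are assumptions for illustration, since the disclosure derives the increment from successive measured deviations.

```python
def make_incremental_compensator(step):
    """Build a compensation function for an incremental drift whose
    deviation grows by `step` with each detected occurrence."""
    def compensate(value, occurrence):
        # Subtract the accumulated increment so the value returns to the
        # normal range implied by the Features Map statistics.
        return value - step * occurrence
    return compensate

comp = make_incremental_compensator(step=2.0)
```

For example, the second occurrence of a drift that grows by 2.0 per occurrence would be offset by 4.0.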
  • FIG. 10 illustrates an example of reoccurring drift detection, prediction and compensation according to one or more of the techniques in the present disclosure.
  • FIG. 10 shows an example of a reoccurring drift experienced by feature F2 in a given dataset.
  • the drift predictor 34 can early identify “Drift 1” in “Day 1”.
  • the drift predictor 34 measures the drift amount for F2 and identifies the appropriate action to be applied (e.g., substitute to get back to the previous mean of trained data in the features map).
  • the compensator 36 can determine that this drift was previously identified by the drift predictor 34. Accordingly, the compensator 36 may obtain the drift prediction results for “Drift 1” (i.e., drift amount and compensation action) and apply and store those results. When “Drift 2” and “Drift 3” are detected and predicted, the compensator 36 will apply the same compensation action to compensate for the identified drift, since the drifts (Drifts 2 and 3) follow the same distribution as Drift 1.
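Reusing a previously identified action for a reoccurring drift could be sketched as a small lookup keyed by feature and drift amount; the class, its rounding-based key, and the string-valued actions are all illustrative assumptions.

```python
class CompensationCache:
    """Remember the compensation applied for a drift signature so a
    reoccurring drift can reuse it without recomputation."""
    def __init__(self):
        self._actions = {}

    def remember(self, feature, drift_amount, action):
        # Round the amount so reoccurring drifts with the same
        # distribution map to the same key.
        self._actions[(feature, round(drift_amount, 3))] = action

    def lookup(self, feature, drift_amount):
        return self._actions.get((feature, round(drift_amount, 3)))
```

For instance, after “Drift 1” the compensator might call `remember("F2", 1.5, "subtract 1.5")`, then retrieve the same action when Drifts 2 and 3 are later detected.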
  • One or more embodiments of the present disclosure may be implemented and deployed within any distributed or centralized cloud system that uses online stream data.
  • One or more embodiments of the present disclosure provide a solution to detect and/or predict drifting data in distributed clouds on-the-fly and/or to propose adaptive changes to such drifting data accordingly to, e.g., improve the accuracy of machine learning-based models as compared with existing models and systems.
  • One or more embodiments of the present disclosure may include one or more of a:
  • drift detector that can identify whether a feature is drifting or not according to the statistical properties of the features within their corresponding training windows and provide the amount of drift for each feature.
  • drift predictor that can identify early whether a feature would experience a drift and determine the amount of such drift.
  • the concepts described herein may be embodied as a method, data processing system, and/or computer program product. Accordingly, the concepts described herein may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects, all generally referred to herein as a “circuit” or “module.” Furthermore, the disclosure may take the form of a computer program product on a tangible computer usable storage medium having computer program code embodied in the medium that can be executed by a computer. Any suitable tangible computer readable medium may be utilized including hard disks, CD-ROMs, electronic storage devices, optical storage devices, or magnetic storage devices.
  • These computer program instructions may also be stored in a computer readable memory or storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • Computer program code for carrying out operations of the concepts described herein may be written in an object-oriented programming language such as Java®, C++ or Python.
  • the computer program code for carrying out operations of the disclosure may also be written in conventional procedural programming languages, such as the "C" programming language.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer.
  • the remote computer may be connected to the user's computer through a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

An apparatus and method are disclosed for detection, prediction and/or compensation of data drift in distributed clouds. In one embodiment, a method implemented in a machine learning system includes at least one of detecting and predicting a data drift in an input data stream of the machine learning system; determining a compensation function, the compensation function configured to offset at least part of the data drift, the compensation function being based at least in part on the at least one of the detecting and the predicting; and applying the compensation function to the input data stream to offset at least part of the data drift.

Description

DETECTION, PREDICTION AND/OR COMPENSATION OF DATA DRIFT
IN DISTRIBUTED CLOUDS
TECHNICAL FIELD
Wireless communication and in particular, detection, prediction and/or compensation of data drift in distributed clouds.
BACKGROUND
Several data processing methods assume that the underlying relationships in the data are static over time. However, in practice, the underlying relationships in data can change and evolve over time. Thus, the data can be considered ‘non-stationary data’ due to the underlying relationships in the data changing and evolving over time. For example, data can change and evolve in a cloud system because both the workload and the cloud infrastructure can change over time due to the dynamic behavior of the cloud and/or hidden or unpredictable context(s). Concretely, the statistical properties of features used to train machine learning models (“models”) can change over time in unforeseen ways. Hence, such changes in the statistical properties over time can result in a change in the pattern that was discovered when training a model using the same data. Consequently, this can result in degradation in performance of models due to the changes in the underlying relationships between the input features and the output targets in a trained model.
In the area of machine learning, the changing of the underlying relationships between the input and output variables in a model is called “concept drift” (also referred to interchangeably herein as “data drift”). Data drift has a direct impact on the accuracy of predictive models.
SUMMARY
Some embodiments advantageously provide a method and system for detection and/or prediction of data drift in e.g., distributed clouds. Due to the direct impact of data drift on the accuracy of predictive models, it may be desirable to identify when the data drift can occur and to determine the variables/features that would experience such drift. According to one aspect of the present disclosure, a method implemented in a machine learning system is provided. The method includes: a. at least one of detecting and predicting a data drift in an input data stream of the machine learning system; b. determining a compensation function, the compensation function configured to offset at least part of the data drift, the compensation function being based at least in part on the at least one of the detecting and the predicting; and c. applying the compensation function to the input data stream to offset at least part of the data drift.
In some embodiments of this aspect, applying the compensation function to the input data stream further includes applying the compensation function to the input data stream of a trained machine learning model to compensate for the at least part of the data drift, the at least part of the data drift associated with data that is input into the trained machine learning model. In some embodiments of this aspect, the method further includes obtaining output values of the compensation function that is applied to the input data stream. In some embodiments of this aspect, the method further includes providing the output values of the compensation function as input values to the trained machine learning model. In some embodiments of this aspect, the method further includes obtaining an output from the machine learning system according to, at least in part, the compensation function.
In some embodiments of this aspect, the at least one of the detecting and predicting the data drift further includes detecting the data drift in the input data stream using a drift detector model. In some embodiments of this aspect, the drift detector model is configured to map at least one training window to at least one feature and at least one corresponding drift time of the at least one feature. In some embodiments of this aspect, the drift detector model is configured to detect whether at least one feature is drifting during at least one data window, the at least one data window being based at least in part on the at least one training window that is mapped to the at least one feature.
In some embodiments of this aspect, the at least one of the detecting and predicting the data drift further includes predicting the data drift in the input data stream using a drift predictor model. In some embodiments of this aspect, the drift predictor model is configured to estimate at least one next value and a distribution of at least one feature based on the input data stream. In some embodiments of this aspect, the drift predictor model is configured to predict whether at least one feature is drifting based at least in part on at least one predetermined drift pattern and the estimated next value and the distribution. In some embodiments of this aspect, the at least one predetermined drift pattern includes one or more of a sudden drift pattern, a gradual drift pattern, an incremental drift pattern and a reoccurring drift pattern.
In some embodiments of this aspect, the method further includes determining whether an amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds a predetermined drift threshold. In some embodiments of this aspect, the method further includes if the amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds the predetermined drift threshold, initiating retraining a machine learning model associated with the data drift. In some embodiments of this aspect, applying the compensation function to the input data stream further includes applying the compensation function to the input data stream if the amount of the at least one of the detected data drift and the predicted data drift does not at least meet or exceed a predetermined drift threshold. In some embodiments of this aspect, the determining the compensation function further includes determining at least one action that offsets at least part of the data drift.
According to another aspect of the present disclosure, a machine learning system is provided. The machine learning system comprising processing circuitry. The processing circuitry is configured to at least one of detect and predict a data drift in an input data stream of the machine learning system. The processing circuitry is configured to determine a compensation function, the compensation function configured to offset at least part of the data drift, the compensation function being based at least in part on the at least one of the detection and the prediction. The processing circuitry is configured to apply the compensation function to the input data stream to offset at least part of the data drift.
In some embodiments of this aspect, the processing circuitry is further configured to apply the compensation function to the input data stream by being configured to apply the compensation function to the input data stream of a trained machine learning model to compensate for the at least part of the data drift, the at least part of the data drift associated with data that is input into the trained machine learning model. In some embodiments of this aspect, the processing circuitry is further configured to obtain output values of the compensation function that is applied to the input data stream. In some embodiments of this aspect, the processing circuitry is further configured to provide the output values of the compensation function as input values to the trained machine learning model. In some embodiments of this aspect, the processing circuitry is further configured to obtain an output from the machine learning system according to, at least in part, the compensation function.
In some embodiments of this aspect, the processing circuitry is further configured to at least one of detect and predict the data drift by being further configured to detect the data drift in the input data stream using a drift detector model. In some embodiments of this aspect, the drift detector model is configured to map at least one training window to at least one feature and at least one corresponding drift time of the at least one feature. In some embodiments of this aspect, the drift detector model is configured to detect whether at least one feature is drifting during at least one data window, the at least one data window being based at least in part on the at least one training window that is mapped to the at least one feature.
In some embodiments of this aspect, the processing circuitry is further configured to at least one of detect and predict the data drift by being further configured to predict the data drift in the input data stream using a drift predictor model. In some embodiments of this aspect, the drift predictor model is configured to estimate at least one next value and a distribution of at least one feature based on the input data stream. In some embodiments of this aspect, the drift predictor model is configured to predict whether at least one feature is drifting based at least in part on at least one predetermined drift pattern and the estimated next value and the distribution. In some embodiments of this aspect, the at least one predetermined drift pattern includes one or more of a sudden drift pattern, a gradual drift pattern, an incremental drift pattern and a reoccurring drift pattern.
In some embodiments of this aspect, the processing circuitry is further configured to determine whether an amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds a predetermined drift threshold. In some embodiments of this aspect, the processing circuitry is further configured to if the amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds the predetermined drift threshold, initiate retraining a machine learning model associated with the data drift. In some embodiments of this aspect, the processing circuitry is further configured to apply the compensation function to the input data stream by being configured to apply the compensation function to the input data stream if the amount of the at least one of the detected data drift and the predicted data drift does not at least meet or exceed a predetermined drift threshold. In some embodiments of this aspect, the processing circuitry is further configured to determine the compensation function by being configured to determine at least one action that offsets at least part of the data drift.
BRIEF DESCRIPTION OF THE DRAWINGS
A more complete understanding of the present embodiments, and the attendant advantages and features thereof, will be more readily understood by reference to the following detailed description when considered in conjunction with the accompanying drawings wherein:
FIG. 1 is a schematic diagram illustrating an example architecture for a machine learning (ML) system in a cloud environment, according to some embodiments of the present disclosure;
FIG. 2 is a block diagram of another example ML system architecture according to the principles in the present disclosure;
FIG. 3 is a flowchart of an example process in an ML system according to some embodiments of the present disclosure;
FIG. 4 is a flowchart of yet another example process for compensating for data drift according to some embodiments of the present disclosure;
FIG. 5 is a schematic diagram of an example features and training learner according to some embodiments of the present disclosure;
FIG. 6 is a schematic diagram of an example data drift detector according to some embodiments of the present disclosure;
FIG. 7 is a schematic diagram of an example data drift predictor according to some embodiments of the present disclosure;
FIG. 8 is a schematic diagram of an example compensation learner according to some embodiments of the present disclosure;
FIG. 9 illustrates an example incremental drift detection and compensation according to the principles of the present disclosure; and
FIG. 10 illustrates a reoccurring drift detection, prediction and compensation according to the principles of the present disclosure.
DETAILED DESCRIPTION
Some studies have attempted to address the problem of data drift in distributed clouds and a summary of such studies is provided below:
In one study, there is proposed an approach to classify the new streamed data in the presence of data drift using the Hoeffding tree algorithm. This approach can update the model with the new changes in the streamed data. However, this solution can generate several modifications to the trained model because the streamed data may experience multiple changes over time, especially when the model has several input features. In addition, this approach requires updating the model very frequently.
In another study, an integrated machine learning approach for detecting drift in network traffic is proposed. The proposed approach includes two main parts: (1) an online support vector machine classifier to detect the drift, and (2) a Kullback-Leibler divergence measurement scheme to quantify the data drift in the network. However, the proposed approach does not include any component to apply changes to the collected network data, or to the trained model.
Yet another work describes a weighted Bayes approach to quickly detect drift using a decision tree for an uncertain data stream. Naive Bayes is a commonly used classifier that makes classifications using the Maximum A Posteriori decision rule in a Bayesian setting. It shows good results on different problems such as text classification and spam detection. The proposed approach can detect the drift using the Hoeffding algorithm in the learning stage while using a weighted Bayes classifier in the tree leaves in the classification stage. Furthermore, this approach uses a sliding window to replace the data tree and to handle the drifting data. Nevertheless, this approach incurs a high cost when updating the model and does not propose any mechanism to update the streamed data.
In yet another study, a drift detection method is presented using data about data labels over time that can be predicted by a classifier trained with the limited number of labeled data samples. However, this approach was found to rely on the availability of those labels and cannot be applied when data changes its structure (e.g., missing values) over time.
Another study proposes a system that detects drift using a set of features of an online classifier and a block-based classifier. In addition, the study analyzes drifts to identify their root cause.
Yet another work addresses the problem of data drift using a proposed framework called PINE (Predictive and parameter INsensitive Ensemble). PINE can detect changes of the streamed data from the cloud and map it to the trained data to obtain the differences. Given the resulting differences, PINE can identify whether the detected changes follow the same trend as in the four existing drift patterns (sudden, incremental, reoccurring, gradual). However, this approach does not propose any change to the drifting features to follow the same pattern as the pattern in the trained model.
Thus, the existing approaches to address data drift in the cloud are lacking.
For example, existing approaches do not consider the change in relationships between the input features and output targets and do not propose adaptive changes to the drifting data. Consequently, the accuracy of the trained model can decrease due to e.g., using the same training data (especially when the most relevant features in a model experience data drift).
Furthermore, the existing solutions that take into account data drift require updating and retraining the original model frequently. However, updating and retraining the model frequently incurs a high cost in terms of training time and computing resources.
Existing solutions do not include early identification of the presence of data drift and adjusting the drifting data accordingly.
Some embodiments of the present disclosure include one or more of: defining a “drift detector/predictor” to early detect/predict the occurrence of drifting data according to data changes occurring in the cloud and different training windows for a given model; and/or defining a “compensation learner” to propose one or more compensation changes to be applied in the presence of a drift and to update the drifting data accordingly in order to improve the accuracy of a trained model.
Some embodiments described in the present disclosure may include one or more of the following advantages: early detection of the occurrence of non-stationary data and updating the model and/or data accordingly; obtaining adjusted values of the drifted features (especially for the most relevant features in a model); learning the distribution of the drift points/changes and predicting when a feature will be drifting; no need to update the trained model very frequently, which reduces retraining costs; and/or obtaining more accurate results from detection/prediction models.
Before describing in detail exemplary embodiments, it is noted that the embodiments reside primarily in combinations of apparatus components and processing steps related to detection, prediction and/or compensation of data drift in distributed clouds. Accordingly, components have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
As used herein, relational terms, such as “first” and “second,” “top” and “bottom,” and the like, may be used solely to distinguish one entity or element from another entity or element without necessarily requiring or implying any physical or logical relationship or order between such entities or elements. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the concepts described herein. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In embodiments described herein, the joining term, “in communication with” and the like, may be used to indicate electrical or data communication, which may be accomplished by physical contact, induction, electromagnetic radiation, radio signaling, infrared signaling or optical signaling, for example. One having ordinary skill in the art will appreciate that multiple components may interoperate and modifications and variations are possible of achieving the electrical and data communication.
In some embodiments described herein, the term “coupled,” “connected,” and the like, may be used herein to indicate a connection, although not necessarily directly, and may include wired and/or wireless connections.
In some embodiments, the term “raw data” is used herein and refers to the data collected from a monitored system. Raw data is usually preprocessed before a machine learning model can use it. The preprocessing procedures may include removing empty data samples, imputing missing data samples, normalizing the data, etc.
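The preprocessing steps mentioned above can be illustrated with a minimal sketch. The sample data, the mean-imputation strategy, and the min-max normalization are illustrative assumptions, not procedures mandated by the disclosure:

```python
from statistics import mean

def preprocess(raw_samples):
    """Drop empty samples, impute missing values (None) with the
    per-feature mean, and min-max normalize each feature to [0, 1]."""
    # Remove empty data samples
    rows = [r for r in raw_samples if r and any(v is not None for v in r)]
    cols = list(zip(*rows))
    # Impute missing values with the per-feature mean
    imputed_cols = []
    for col in cols:
        present = [v for v in col if v is not None]
        m = mean(present)
        imputed_cols.append([v if v is not None else m for v in col])
    # Min-max normalize each feature column
    normed_cols = []
    for col in imputed_cols:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0
        normed_cols.append([(v - lo) / span for v in col])
    return [list(r) for r in zip(*normed_cols)]

raw = [[1.0, 10.0], [], [3.0, None], [5.0, 30.0]]
print(preprocess(raw))  # -> [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

The empty second sample is dropped, the missing value in the third sample is imputed with the column mean (20.0), and both columns are scaled to [0, 1].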
In some embodiments, the term “data stream” (or streamed data) is used herein and refers to a sequence of packets of data or data packets used to transmit or receive information that is in the process of being transmitted from a monitored system. The “data stream” may also be called “online data” in some contexts.
Online data may contain one or multiple data samples at a time and is usually used by a trained machine learning model for inferring a real-time system status.
Online data stands in contrast to “offline data,” which is collected offline, contains a batch of data samples, and is stored in a data repository. Offline data is often used for training, testing and validating machine learning models.
In some embodiments, the term “stationary data” is used herein and refers to the data whose mean, variance and autocorrelation structure does not change over time. In contrast, drifted/drifting data is considered as “non-stationary data.”
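As a rough illustration of this definition, the mean and variance of two halves of a series can be compared; stable statistics suggest stationary data. The split-in-half heuristic and the tolerance value below are arbitrary assumptions for the sketch, not a test prescribed by the disclosure:

```python
from statistics import mean, pvariance

def looks_stationary(series, tol=0.5):
    """Heuristic: a series whose mean and variance stay roughly
    constant across its two halves is treated as stationary."""
    half = len(series) // 2
    a, b = series[:half], series[half:]
    mean_shift = abs(mean(a) - mean(b))
    var_shift = abs(pvariance(a) - pvariance(b))
    return mean_shift <= tol and var_shift <= tol

flat = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.02]
drifting = [1.0, 1.1, 1.3, 1.6, 2.0, 2.5, 3.1, 3.8]
print(looks_stationary(flat))      # True: stationary data
print(looks_stationary(drifting))  # False: non-stationary (drifting) data
```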
In some embodiments, the term “data window” (or a window of data) is used herein and refers to a defined time range for data collection. For example, given a window of 10 minutes, data from the previous 10 minutes can be collected from a monitored system at a time. A window can slide. For example, if online data is collected every 10 seconds with a window size of 10 minutes, then the 10 minutes’ window is considered to “slide” forward every 10-second interval. The sliding window is a technique that is often used for time-series predictions.
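The sliding-window behavior described above can be sketched as follows, with each arriving sample standing in for one collection interval; the window size of 3 is an arbitrary illustration:

```python
from collections import deque

def sliding_windows(stream, window_size):
    """Yield the most recent `window_size` samples each time a new
    sample arrives, once the window has filled."""
    window = deque(maxlen=window_size)  # old samples fall off the left
    for sample in stream:
        window.append(sample)
        if len(window) == window_size:
            yield list(window)

for w in sliding_windows([1, 2, 3, 4, 5], 3):
    print(w)  # [1, 2, 3], then [2, 3, 4], then [3, 4, 5]
```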
In some embodiments, the term “training window” is used herein and refers to the time range/interval for collecting training data. For example, a training window of 1 hour, 1 day, 1 week or 1 month means that training data has been collected for 1 hour, 1 day, 1 week or 1 month, respectively.
The terms “concept drift” and “data drift” are used interchangeably herein.
The term “feature” is used herein and refers to an input used for machine learning. For example, with respect to a dataset table or matrix, a feature may be a column in the dataset table. A feature may represent an observable attribute/quality and a value combination (e.g., rank of value 2).
As used herein, the term “drift time” is used herein and may indicate a time that or during which a feature is detected to drift and/or a time that or during which a feature is predicted to drift.
The term “machine learning (ML) system” used herein can be any kind of ML system, such as, for example, a computing device, one or more processors, one or more processing circuitries, a machine, a mobile wireless device, a user equipment, a central processing unit (CPU), a graphics processing unit (GPU), a field programmable gate array (FPGA), a server, a network node, a base station, etc. that may implement one or more machine learning models and, in particular, may apply one or more of the techniques disclosed herein to detect, predict and/or compensate for data drift.
Note that although some of the embodiments are described with reference to data processing in the cloud, it should be understood that the techniques disclosed herein may be beneficial and applicable to other types of machine learning problems and/or systems in which data drift is experienced.
Any two or more embodiments described in this disclosure may be combined in any way with each other.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Referring again to the drawing figures, in which like elements are referred to by like reference numerals, there is shown in FIG. 1 a schematic diagram of a communication system 10 in a distributed cloud environment, according to an embodiment, constructed in accordance with the principles of the present disclosure. The communication system 10 in FIG. 1 is a non-limiting example and other embodiments of the present disclosure may be implemented by one or more other systems and/or networks. The system 10 includes an example ML system 12 that detects, predicts and/or compensates for data drift according to the principles of the present disclosure. The ML system 12 may include an original, trained ML model 20 and one or more components that can be used to detect, predict and/or compensate for data drift in an input data stream/online data of the ML model 20 using techniques in the present disclosure, such as, one or more of a features and training learner 30, a drift detector 32, a drift predictor 34 and a compensator 36. In some embodiments, one or more of the training learner 30, drift detector 32, drift predictor 34 and compensator 36 may be software code executed by processing hardware and stored in memory in the ML system 12. In some embodiments, one or more of the training learner 30, drift detector 32, drift predictor 34 and compensator 36 may be hardware storing and executing instructions corresponding to the functions/processes described herein.
Some embodiments of the features and training learner 30, drift detector 32, drift predictor 34 and compensator 36 are described below. In the example embodiments, it may be assumed that the offline data and the online data depicted in FIG. 1 have already been preprocessed from raw data.
Features and Training Learner
The features and training learner 30 obtains and analyzes the offline data and learns when a data drift would occur according to different training windows. For example, in some embodiments, the features and training learner 30 uses different sizes of the training dataset and calculates their statistical properties (e.g., mean, bias, variance, distribution, etc.). The features and training learner 30 may then observe the features that will be drifting according to different training windows using probabilistic or machine learning algorithms, such as unsupervised or supervised algorithms, Naive Bayes, Decision Trees, Random Forest, Support Vector Machine (SVM), Neural Networks, etc. One objective of the features and training learner 30 is to provide a map between a drifting feature and the feature’s mean and deviation calculated using the data collected from the training window excluding the data from the drifting time period. The map is called “feature map” hereinafter. The feature map is then provided to the drift detector that detects a data drift for an online stream data. Although the features and training learner 30 is shown in FIG. 1 as internal to the ML system 12, in some embodiments, the features and training learner 30 is run offline to learn the drift detector model from the training set/offline data and is therefore external to a live ML system 12. Furthermore, in some embodiments, the features and training learner 30 may learn new relationships between the recent offline data and the target features and update the features map. The features and training learner 30 can be updated over time to get an updated features map about the drifting features.
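A minimal sketch of building such a feature map (per-feature mean and deviation computed from training-window data, excluding samples from the drifting time period) could look like the following; the data layout, the timestamp-set exclusion mask, and the feature name "F1" are illustrative assumptions:

```python
from statistics import mean, pstdev

def build_feature_map(training_data, drift_periods):
    """training_data: {feature_name: [(timestamp, value), ...]}
    drift_periods:  {feature_name: set of timestamps flagged as drifting}
    Returns {feature_name: (mean, deviation)} over non-drifting samples."""
    feature_map = {}
    for name, samples in training_data.items():
        excluded = drift_periods.get(name, set())
        # Exclude the data from the drifting time period
        values = [v for t, v in samples if t not in excluded]
        feature_map[name] = (mean(values), pstdev(values))
    return feature_map

data = {"F1": [(0, 1.0), (1, 1.2), (2, 9.0), (3, 0.8)]}
fmap = build_feature_map(data, {"F1": {2}})  # timestamp 2 was drifting
print(fmap["F1"])  # mean and deviation of [1.0, 1.2, 0.8]
```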
Drift Detector
The drift detector 32 may implement a drift detector function. In some embodiments, the drift detector 32 is run online and is internal to a live ML system 12. Given the online data with the features’ names and their values, the drift detector 32 can identify whether a feature is drifting or not according to the obtained features map discussed herein above and the data window size. For example, if feature F1 is in the feature map and the online value of F1 is beyond the scope of F1’s mean +/- deviation, F1’s drifting is detected. On the other hand, in some embodiments, if feature F1 is in the feature map and the online value of F1 is within the scope of F1’s mean +/- deviation, then F1 may be determined to be not drifting. Furthermore, in some embodiments, the drift detector 32 may identify the new drifting features based on an updated features map.
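The value check described above can be sketched directly; the (mean, deviation) pair format of the feature map is this sketch's own assumption:

```python
def is_drifting(feature_name, online_value, feature_map):
    """A feature is drifting when its online value falls outside
    the range mean +/- deviation recorded in the feature map."""
    if feature_name not in feature_map:
        return False  # no baseline for this feature: cannot flag a drift
    m, dev = feature_map[feature_name]
    return online_value > m + dev or online_value < m - dev

feature_map = {"F1": (10.0, 2.0)}            # mean 10, deviation 2
print(is_drifting("F1", 13.5, feature_map))  # True: outside [8, 12]
print(is_drifting("F1", 11.0, feature_map))  # False: within range
```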
Drift Predictor
The drift predictor 34 may be configured to estimate the values of the next data samples for the different features using, for example, Naive Bayes, autoregressive integrated moving average (ARIMA), recurrent neural networks (RNN) and convolutional neural networks (CNN). Next, the drift predictor 34 can predict the data drift using the estimated values of features and known drift patterns, such as one or more of sudden, incremental, reoccurring and gradual drift patterns. The parameters of a drift pattern (e.g., data range for sudden drift, drift recurrence frequency and data range for reoccurring drift, data distribution for incremental drift, and frequency and distribution for gradual drift) are learned from the offline data. In some embodiments, new data drift pattern parameters may be learned from the recent offline data using statistical models and/or probabilistic models.
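One way to approximate this behavior is a simple next-value estimate matched against a "sudden drift" data-range parameter learned offline. The linear extrapolation below is purely an illustrative stand-in for the ARIMA/RNN estimators named above, and the range values are invented for the example:

```python
def estimate_next(series):
    """Naive one-step forecast: extend the last observed slope.
    Stands in for an ARIMA/RNN estimator in this sketch."""
    if len(series) < 2:
        return series[-1]
    return series[-1] + (series[-1] - series[-2])

def predict_sudden_drift(series, normal_range):
    """Predict a sudden drift if the estimated next value would leave
    the data range learned for the feature from offline data."""
    lo, hi = normal_range
    nxt = estimate_next(series)
    return not (lo <= nxt <= hi), nxt

drift, nxt = predict_sudden_drift([1.0, 1.1, 1.3, 1.8], (0.5, 2.0))
print(drift)  # True: the projected next value exceeds the learned range
```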
In some embodiments, the drift predictor 34 and the drift detector 32 may be complementary functions that assist with determining an appropriate compensation for data drift. For example, in some embodiments the drift detector 32 may be considered to use a simple value check on the current data sample to perform the drift detection, while the drift predictor 34 may be considered to use ML plus pattern matching to predict a future drift. Stated another way, in some embodiments, the drift detector 32 detects data drift by, e.g., using the features map obtained from the features and training learner 30 to identify the drifted features based on the means/deviations of the features; while the drift predictor 34 predicts data drift by, e.g., learning drift patterns using ML models and estimating data values to early identify data drift.
Compensator
The compensator 36 may compensate the drift of online data/features to be used as input for the ML model 20 (e.g., a neural network model for network performance prediction and an SVM model for traffic classification). In some embodiments, given the outputs from the drift detector 32 and/or drift predictor 34 (e.g., predicted drift features and drift times), the compensator 36 may be configured to determine changes to the online data (as compared to the training set data).
In some embodiments, compensator 36 can be run in an online system (e.g., system 12) and may use, e.g., the difference between the online data stream and the features’ map (mean, deviation, etc.) to learn the appropriate compensation action to be applied to compensate for the data drift.
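As one illustrative compensation action (the disclosure leaves the learned action open; clamping toward the training-time statistics is this sketch's own assumption), a drifted value could be pulled back into the feature's training-time range:

```python
def compensate(feature_name, online_value, feature_map):
    """Clamp a drifted online value back into the feature's
    training-time range [mean - deviation, mean + deviation]."""
    m, dev = feature_map[feature_name]
    lo, hi = m - dev, m + dev
    if online_value > hi:
        return hi
    if online_value < lo:
        return lo
    return online_value  # within range: no compensation needed

feature_map = {"F1": (10.0, 2.0)}
print(compensate("F1", 15.0, feature_map))  # 12.0: clamped to mean + dev
print(compensate("F1", 11.0, feature_map))  # 11.0: unchanged
```

The compensated values would then be fed to the ML model 20 in place of the raw drifted values.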
In some embodiments, the system 12 includes a drift detector 32 which is configured to detect a data drift in an input data stream of the ML system 12. In some embodiments, the system 12 includes a drift predictor 34 which is configured to predict a data drift in an input data stream of the ML system 12. In some embodiments, the system 12 includes a compensator 36 which is configured to determine a compensation function, the compensation function configured to offset at least part of the data drift, the compensation function being based at least in part on the at least one of the detecting and the predicting. In some embodiments, the compensator 36 is configured to apply the compensation function to the input data stream to offset at least part of the data drift.
Note that although only one ML system 12 is shown in FIG. 1 for convenience, it is understood that the communication system 10 may include many more ML systems 12. In addition, the functions described herein as being performed by an ML system may be distributed over a plurality of ML systems. In other words, it is contemplated that the functions of the ML system (or one or more components of the ML system, such as, the features and training learner 30, drift detector 32, drift predictor 34 and compensator 36) described herein are not limited to performance by a single physical device and, in fact, can be distributed among several physical devices.
Referring now to FIG. 2, another example ML system 12 in accordance with the present disclosure is shown, including drift detector 32, drift predictor 34 and compensator 36. Note that although only a single drift detector 32, a single drift predictor 34 and a single compensator 36 are shown in FIGS. 1 and 2 for convenience, the ML system 12 may include many more drift detectors 32, drift predictors 34 and compensators 36.
The ML system 12 includes (and/or uses) a communication interface 50, processing circuitry 52, and memory 54. The communication interface 50 may include an interface configured to receive data (e.g., a live input data stream, streamed data/online data, non-stationary data, drift pattern, etc.), for which a data drift may be detected, predicted and/or compensated for according to the principles in the present disclosure. The communication interface 50 may include an interface that transmits information, which may be based on the application of a compensation function to compensate for the data drift occurring in a trained ML model according to the principles in the present disclosure. In some embodiments, the communication interface 50 may be formed as or may include, for example, one or more radio frequency (RF) transmitters, one or more RF receivers, and/or one or more RF transceivers, and/or may be considered a radio interface. In some embodiments, the communication interface 50 may include a wired interface, such as one or more network interface cards.
The processing circuitry 52 may include one or more processors 56 and memory, such as, the memory 54. In particular, in addition to a traditional processor and memory, the processing circuitry 52 may comprise integrated circuitry for processing and/or control, e.g., one or more processors and/or processor cores and/or FPGAs (Field Programmable Gate Array) and/or ASICs (Application Specific Integrated Circuitry) adapted to execute instructions. The processor 56 may be configured to access (e.g., write to and/or read from) the memory 54, which may comprise any kind of volatile and/or nonvolatile memory, e.g., cache and/or buffer memory and/or RAM (Random Access Memory) and/or ROM (Read-Only Memory) and/or optical memory and/or EPROM (Erasable Programmable Read-Only Memory).
Thus, the ML system 12 may further include software stored internally in, for example, memory 54, or stored in external memory (e.g., storage resource in the cloud) accessible by the ML system 12 via an external connection. The software may be executable by the processing circuitry 52. The processing circuitry 52 may be configured to control any of the methods and/or processes described herein and/or to cause such methods, and/or processes to be performed, e.g., by the ML system 12.
The memory 54 is configured to store data, programmatic software code and/or other information described herein. In some embodiments, the software may include instructions stored in memory 54 that, when executed by the processor 56, drift detector 32, drift predictor 34 and/or compensator 36, causes the processing circuitry 52 and/or configures the ML system 12 to perform the processes described herein with respect to the ML system 12 (e.g., processes described with reference to FIG. 3 and/or any of the other flowcharts and description).
Although FIG. 2 shows drift detector 32, drift predictor 34 and compensator 36, as being within a respective processor, it is contemplated that these elements may be implemented such that a portion of the elements is stored in a corresponding memory within the processing circuitry. In other words, the elements may be implemented in hardware or in a combination of hardware and software within the processing circuitry.
FIG. 3 is a flowchart of an example process in an ML system 12 for detection, prediction and/or compensation of data drift according to one or more of the principles of the present disclosure. One or more Blocks and/or functions and/or methods performed by the ML system 12 may be performed by one or more elements of the ML system 12 such as by drift detector 32, drift predictor 34, compensator 36, processing circuitry 52, processor 56, memory 54, communication interface 50, etc. according to the example method. The example method includes at least one of detecting and predicting (Block S100), such as via drift detector 32, drift predictor 34, compensator 36 and/or processing circuitry 52, a data drift in an input data stream of the machine learning system 12. The method includes determining (Block S102), such as via drift detector 32, drift predictor 34, compensator 36 and/or processing circuitry 52, a compensation function, the compensation function configured to offset at least part of the data drift. The compensation function is based at least in part on the at least one of the detecting and the predicting. The method includes applying (Block S104), such as via drift detector 32, drift predictor 34, compensator 36 and/or processing circuitry 52, the compensation function to the input data stream to offset at least part of the data drift. In some embodiments, applying the compensation function to the input data stream further includes applying, such as via drift detector 32, drift predictor 34, compensator 36 and/or processing circuitry 52, the compensation function to the input data stream of a trained machine learning model to compensate for the at least part of the data drift, the at least part of the data drift associated with data that is input into the trained machine learning model. 
In some embodiments, the method further includes obtaining, such as via drift detector 32, drift predictor 34, compensator 36 and/or processing circuitry 52, output values of the compensation function that is applied to the input data stream. In some embodiments, the method includes providing, such as via drift detector 32, drift predictor 34, compensator 36, processing circuitry 52 and/or communication interface 50, the output values of the compensation function as input values to the trained machine learning model 20. In some embodiments, the method includes obtaining, such as via drift detector 32, drift predictor 34, compensator 36, processing circuitry 52 and/or communication interface 50, an output from the machine learning system 12 according to, at least in part, the compensation function.
In some embodiments, the at least one of the detecting and predicting the data drift further includes detecting, such as via drift detector 32 and/or processing circuitry 52, the data drift in the input data stream using a drift detector model. In some embodiments, the drift detector model is configured to map at least one training window to at least one feature and at least one corresponding drift time of the at least one feature. In some embodiments, the drift detector model is configured to detect whether at least one feature is drifting during at least one data window, the at least one data window being based at least in part on the at least one training window that is mapped to the at least one feature.
In some embodiments, the at least one of the detecting and predicting the data drift further includes predicting, such as via drift predictor 34 and/or processing circuitry 52, the data drift in the input data stream using a drift predictor model. In some embodiments, the drift predictor model is configured to estimate at least one next value and a distribution of at least one feature based on the input data stream. In some embodiments, the drift predictor model is configured to predict whether at least one feature is drifting based at least in part on at least one predetermined drift pattern and the estimated next value and the distribution. In some embodiments, the at least one predetermined drift pattern includes one or more of a sudden drift pattern, a gradual drift pattern, an incremental drift pattern and a reoccurring drift pattern.
In some embodiments, the method further includes determining, such as via drift detector 32, drift predictor 34, compensator 36 and/or processing circuitry 52, whether an amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds a predetermined drift threshold. In some embodiments, the method further includes if the amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds the predetermined drift threshold, initiating (e.g., requesting), such as via drift detector 32, drift predictor 34, compensator 36, processing circuitry 52 and/or communication interface 50, retraining a machine learning model associated with the data drift. In some embodiments, applying the compensation function to the input data stream further includes applying, such as via drift detector 32, drift predictor 34, compensator 36, processing circuitry 52 and/or communication interface 50, the compensation function to the input data stream if the amount of the at least one of the detected data drift and the predicted data drift does not at least meet or exceed a predetermined drift threshold. In some embodiments, determining the compensation function further includes determining at least one action that offsets at least part of the data drift.
Having described some embodiments for detection, prediction and/or compensation of data drift according to one or more of the principles of the present disclosure, a more detailed description of some of the embodiments is described below, which may be implemented by one or more ML systems 12 and/or one or more elements in an ML system 12, such as, drift detector 32, drift predictor 34, and compensator 36.
FIG. 4 is a flowchart of yet another example process in an ML system 12 according to some embodiments of the present disclosure. In Block S200, the process includes analyzing, such as via the features and training learner 30, the relationship between the features and the training windows and calculating the mean and deviation of each feature based on the used training window. The input to this analysis may be offline data and the output may be drifting features and their corresponding means and deviations. In Block S202, the process includes obtaining, such as via drift detector 32, online data samples from the input data stream. In Block S204, the process includes detecting, such as via drift detector 32, data drift based on the data values of the obtained online data samples. For example, the detecting may include comparing a feature’s mean and deviation from a baseline dataset (e.g., training set data) to the feature’s value from a second dataset (e.g., obtained online data samples) to determine whether a drift has occurred and/or to determine an amount of the drift and/or deviation between the baseline dataset and the second dataset. In Block S206, the process includes determining, such as via drift detector 32, whether a drift is detected. For example, in some embodiments, a drift is detected if the feature value is larger than the mean plus the deviation or less than the mean minus the deviation. The mean and deviation may be calculated from the offline data window. The mean and deviation may also be obtained from the “Features Map”, which is the output from the features and training learner 30.
If a drift is not detected, the process returns to Block S202, where another online data sample is obtained from the input data stream. If a drift is detected, the process proceeds to Block S208, where the drift amount is compared, such as via drift detector 32, to a threshold drift amount. In some embodiments, one or more of the techniques disclosed herein may be used to compensate for data drift in a live system to avoid the high costs associated with retraining the ML model 20 while also improving the accuracy of the ML model 20 by compensating for data drift.
However, the data drift may reach a threshold level at which the ML model 20 should be retrained. If the detected amount of data drift meets or exceeds the threshold drift amount, the process proceeds to Block S210 where the ML model 20 is retrained. On the other hand, if the detected amount of data drift is less than the threshold drift amount, the process proceeds to Block S212 to determine if data drift was predicted by e.g., the drift predictor 34. If a drift was not predicted, the process proceeds to Block S216 where a drift pattern may be identified and compensation actions to be applied to compensate for the detected drift are identified by e.g., compensator 36. In Block S218, the compensation actions are applied by e.g., compensator 36. Examples of such application of compensation actions are discussed in more detail below with reference to FIGS. 9 and 10. In some embodiments, the process may include a prediction process (in parallel, or instead of the drift prediction process), as can be seen in FIG. 4. Similar to the drift detection process, the drift prediction process may include obtaining offline data/training set data. In Block S220, one or more drift patterns may be identified, such as via drift predictor 34, in the offline data. In Block S222, the process includes obtaining, such as via drift predictor 34, a windowed online data sample (when a moving window technique is used for prediction) from the input data stream. In Block S224, the process includes estimating, such as via drift predictor 34, next values of the features. In Block S226, the process includes predicting, such as via drift predictor 34, the data drift based on the estimated values and the identified drift patterns. For example, the predicting may include comparing the estimated next values to the identified drift patterns to determine whether a data drift is predicted to occur. 
In Block S228, the drift predictor 34 may determine whether a drift is predicted according to the input drift pattern and the estimated values of the features. If a drift is not predicted, the process may return to Block S222 where another window online data sample is obtained from the input data stream. On the other hand, if a drift is predicted, the process may proceed to Block S230 where compensation actions to be applied to compensate for the predicted drift are identified by e.g., compensator 36. In some embodiments, the drift prediction results may be used to verify the detected drift. In some embodiments, such compensation actions are applied as in Block S218. As can be seen in FIG. 4, if the drift is predicted (e.g., yes drift predicted in Block S212), the compensation actions identified by the drift predictor 34 in e.g., Block S230, can be applied in Block S218, which can save time as compared to using the drift detector 32 without a drift predictor 34. On the other hand, if a drift has not been predicted (e.g., no drift predicted in Block S212), the compensator 36 may have to store all the previous detection results and calculate the drift pattern.
In some embodiments, the detecting, predicting and compensating steps may be repeatedly performed by ML system 12 to continuously monitor and compensate for data drift occurring at the input data stream of the ML model 20 to e.g., avoid or at least prolong retraining the ML model 20 offline due to reduced accuracy caused by data drift. FIGS. 5-8 are block diagrams illustrating an example of the features and training learner 30, drift detector 32, drift predictor 34 and compensator 36, respectively. Specifically, an example structure of each component/element is shown, and a description of each element’s input(s), output(s), objective(s) and functions and/or steps is provided below.
Features and Training Learner
FIG. 5 illustrates an example features and training learner 30 according to some embodiments of the present disclosure. The features and training learner 30 may include a training windows learner 60, a drifting features learner 62, a drifting times learner 64 and a feature map creator 66. The learners 60, 62, 64 may be software code used to learn training windows, drifting features and drift times, respectively. The training windows can be learned in parallel with the drifting features learner 62 using heuristic methods (e.g., Tabu Search, Swarm Intelligence, Genetic Algorithm). For a very simplified example, using 1-month data, drifts are detected for features F1, F3 and F5. Testing with 1-week data, the same drifts are detected for F3 and F5. Testing with 1-day data, the same drift is only detected for F5. Thus, the training window sizes for F1, F3 and F5 are 1-month, 1-week and 1-day, respectively. The feature map creator 66 may take as inputs the training windows, the drifting features and the drift times, and output the features map discussed herein above. In some embodiments, the features and training learner 30 may be characterized according to one or more of the following:
- Inputs: offline data including features’ names and values/training set data, different training window intervals (1-hour, 4-hour, 1-day, 1-week/month data set, etc.).
- Outputs: function that maps the training windows and their corresponding drifting features over time. The feature map that stores the features and their corresponding means and deviations.
- Objective: at least one objective is to obtain a function to determine the window sizes and the corresponding drifting features.
In some embodiments, the features and training learner 30 may perform one or more of the following steps: - Define a function/model that takes into account the parameters of the selected model, the number of data samples and the number of input features.
- Given the defined model/function, evaluate the drifting features based on different training windows (e.g., one hour, 4 hours, 1-week data set, etc.) and their drifting times.
- Calculate the means and deviations for the drifted features.
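The steps above can be sketched, for illustration only and with hypothetical names and data layout, as a routine that computes each feature’s mean and deviation per candidate training window and records them in a “features map”:

```python
import statistics

# Illustrative sketch only (hypothetical names and data layout): build a
# features map keyed by (window, feature) holding each feature's mean and
# deviation over the most recent samples of that training window.
def build_features_map(offline_data, windows):
    """offline_data: {feature: [values]}; windows: {window name: sample count}."""
    features_map = {}
    for wname, size in windows.items():
        for feature, values in offline_data.items():
            recent = values[-size:]  # most recent `size` offline samples
            features_map[(wname, feature)] = (
                statistics.mean(recent),
                statistics.pstdev(recent),  # population deviation of the window
            )
    return features_map
```

The drift detector 32 may then consult such a map, per feature and per its learned training window, when comparing online values against the trained statistics.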
Drift Detector
FIG. 6 illustrates an example drift detector 32 according to some embodiments of the present disclosure. In some embodiments, the drift detector 32 may be characterized according to one or more of the following:
- Inputs: online data values of the ML model 20 features from the input data stream.
- Outputs: list of the one or more features that are being detected as drifting (e.g., according to a drift detector model learned from the features and training learner 30) and the amount of drift.
- Objective: at least one objective is to detect on-the-fly (e.g., in real-time) the features experiencing drift (e.g., drift/no drift) and/or the difference between the detected drift as compared to the means and deviations from the features map (e.g., drift amount).
In some embodiments, the drift detector 32 may perform one or more of the following steps:
- Use the given online data and map the online data to the obtained drift detection model.
- Detect whether at a given time a feature is drifting or not via comparing the values of the online data and the means and the deviations from the feature map.
- Measure the amount of drift given the statistical properties (i.e., means and/or deviations) of the features based on data from the features map.
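As a non-limiting sketch of the steps above (names hypothetical), an online data sample can be mapped over the features map to produce the detector’s output: the list of drifting features and, for each, the amount of drift relative to the trained statistical properties:

```python
# Illustrative sketch only (hypothetical names): scan one online sample
# against the features map and report drifting features and drift amounts.
def scan_sample(sample, features_map):
    """sample: {feature: value}; features_map: {feature: (mean, deviation)}."""
    drifting = {}
    for feature, value in sample.items():
        mean, dev = features_map[feature]
        if value > mean + dev:
            drifting[feature] = value - (mean + dev)   # drift above trained band
        elif value < mean - dev:
            drifting[feature] = (mean - dev) - value   # drift below trained band
    return drifting  # empty dict means no drift detected at this time
```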
Drift Predictor
FIG. 7 illustrates an example drift predictor 34 according to some embodiments of the present disclosure. The drift predictor 34 may include a next value estimator 70, a next distribution estimator 72, a data changes mapper 74 and a predicting model 78.
In some embodiments, the drift predictor 34 may be characterized according to one or more of the following:
Inputs: a window of online data samples, the existing patterns of drift (e.g., sudden, gradual, incremental, and reoccurring) and the offline data.
Outputs: expected data values of features, predicted drift for certain features and their corresponding drift amount.
Objective: at least one objective is to early identify drifted data based on estimated next values of the online data.
In some embodiments, the drift predictor 34 may perform one or more of the following steps:
Learn the types of changes, such as by data changes mapper 74, that may occur to features values including the sudden, gradual, incremental/weird, reoccurring changes based on the input offline data using machine learning algorithms such as neural network, random forest, support vector machine, etc.
Use, for example, Naive Bayes and a window of online data sample to estimate, such as via next value estimator 70, the next values and estimate, such as via next distribution estimator 72, the next distribution of a set of features over time.
Predict, such as via predicting model 78, the presence of a data drift using the defined drift pattern and the estimated values of the features. If the estimated values present a change compared to the statistical properties of the trained data, determine, such as via predicting model 78, that there is a drift.
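The estimation step can be sketched as follows. This is an illustration only: the disclosure mentions, for example, Naive Bayes for the next value estimator 70; here a simple linear extrapolation over the online data window stands in for the estimator, and all names are hypothetical:

```python
# Illustrative sketch only: a simple trend extrapolation stands in for the
# next value estimator 70 (the disclosure mentions e.g. Naive Bayes); the
# extrapolated value is then checked against the trained band.
def predict_drift(window, mean, deviation):
    """window: recent online values of one feature; returns (predicted, next)."""
    step = (window[-1] - window[0]) / (len(window) - 1)  # average trend per step
    next_value = window[-1] + step                       # estimated next value
    drift_predicted = abs(next_value - mean) > deviation
    return drift_predicted, next_value
```

In a fuller implementation, the predicted values would also be matched against the learned drift patterns (sudden, gradual, incremental, reoccurring) to classify the expected drift.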
Compensator
FIG. 8 illustrates an example compensator 36 according to some embodiments of the present disclosure. In some embodiments, the compensator 36 may be characterized according to one or more of the following: - Inputs: drifting features and the amount of drift from the drift detector 32 and/or the drift predictor 34.
- Outputs: list of one or more compensation actions (e.g., changes/adjustments to be made to the online data) to be applied on the drifting features.
- Objectives: at least one objective is to update the drifted data when the drift is detected according to the data used while training the original ML model 20.
In some embodiments, the compensator 36 may perform one or more of the following steps:
- Determine if the number of drifted features and/or the amount of the drift exceeds a compensation threshold. If yes, request model retraining. For example, given a trained model with 100 features, if > 50% of the features drifted larger than 70% of the trained data range, the compensator 36 determines to request a retrain of the original ML model 20.
- Otherwise (i.e., if compensation threshold is not met or exceeded), determine if the same detected drift has been predicted (i.e., same features and the same (or close to the same) amount of drift) and if there exists a compensation function in the drift predictor 34. If yes, apply the compensation function to the online data to remove the expected drift and thereby provide a better accuracy for the original ML model 20.
- If a drift is not predicted, match the detected drift to sudden, incremental, reoccurring or gradual patterns, and learn a function of compensation and apply the compensation function to the online data.
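The compensator’s decision logic described in the steps above can be sketched as follows, for illustration only and with hypothetical names; the 50% feature fraction and 70% range fraction reflect the non-limiting example given above:

```python
# Illustrative sketch only (hypothetical names): decide between retraining,
# applying a previously predicted compensation function, or learning one.
def decide(drifts, n_features, ranges, predicted_fn=None,
           feature_frac=0.5, range_frac=0.7):
    """drifts: {feature: drift amount}; ranges: {feature: trained data range}."""
    # Features whose drift exceeds the given fraction of the trained range.
    severe = [f for f, amount in drifts.items() if amount > range_frac * ranges[f]]
    if len(severe) / n_features > feature_frac:
        return "retrain"            # too many features drifted too far
    if predicted_fn is not None:
        return "apply_predicted"    # reuse the predictor's compensation function
    return "learn_and_apply"        # match a drift pattern and learn a function
```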
FIG. 9 illustrates an example of incremental drift detection and compensation according to one or more of the techniques in the present disclosure. Specifically, FIG. 9 presents an example of data drift experienced by feature “F1” in a dataset. When the drift detector 32 identifies that the F1 values have drifted (“Drift 1”), the drift detector 32 may measure the deviation amount between the new values (from the online data) and the trained values of F1 (on which the original ML model 20 is based). Such measured deviation amount for F1 may be sent to the compensator 36.
The compensator 36 may determine whether the received drift amount was previously predicted or not (by e.g., the drift predictor 34). In this example, this drift was not predicted; therefore, compensator 36 considers this drift as a sudden drift and compensates for the deviation amount in order to return the F1 values back to the “normal” range (that is, “normal” according to the statistical properties of previous data (e.g., mean, deviation) such as the training set data). In addition, the compensator 36 may store the drifted feature, the deviation amount and the applied action in e.g., memory 54.
When “Drift 2” is detected by e.g., the drift detector 32, the compensator 36 determines that the detected deviation is incremental, as compared to the “Drift 1”. Therefore, the compensator 36 compensates (e.g., such as by applying an offset) for the data drift to return the F1 feature values back to a normal range. Then, the compensator 36 may create a function (e.g., compensation function) that represents the incremental deviation for such F1 feature and proposes a corresponding compensation action (e.g., to substitute or subtract the incremented deviation from the value so that the value goes back to the normal range according to the means and deviations values from the Features Map). When the drift detector 32 detects “Drift 3”, a call may be made to the created function to find the appropriate compensation action.
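The incremental compensation function of this example can be sketched, for illustration only and with hypothetical names, as a function that accumulates one step per detected drift and subtracts the accumulated offset from subsequent online values:

```python
# Illustrative sketch only (hypothetical names): compensation function for an
# incremental drift; each detected drift adds one step to the accumulated
# offset, which is subtracted to return values to the trained range.
def make_incremental_compensator(step):
    drift_count = 0
    def compensate(value, new_drift=False):
        nonlocal drift_count
        if new_drift:
            drift_count += 1      # another increment of the drift was detected
        return value - drift_count * step  # remove the accumulated deviation
    return compensate
```

For instance, with a per-drift step of 2.0, the drifted values 12.0 (Drift 1) and 14.0 (Drift 2) would both be returned to 10.0.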
FIG. 10 illustrates an example of reoccurring drift detection, prediction and compensation according to one or more of the techniques in the present disclosure. Specifically, FIG. 10 shows an example of a reoccurring drift experienced by feature F2 in a given dataset. The drift predictor 34 can early identify “Drift 1” in “Day 1”. The drift predictor 34 measures the drift amount for F2 and identifies the appropriate action to be applied (e.g., substitute to get back to the previous mean of trained data in the features map).
When such drift is captured by the drift detector 32, the compensator 36 can determine that this drift was previously identified by the drift predictor 34. Accordingly, the compensator 36 may obtain the drift prediction results for “Drift 1” (i.e., drift amount and compensation action), apply the corresponding compensation action and store the results. When “Drift 2” and “Drift 3” are detected and predicted, the compensator 36 will apply the same compensation action to compensate for the identified drift since the drifts (Drifts 2 and 3) follow the same distribution as Drift 1.
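The reuse of a stored compensation action for reoccurring drifts can be sketched, for illustration only and with hypothetical names, as a store keyed by the drifting feature and (approximate) drift amount:

```python
# Illustrative sketch only (hypothetical names): store a compensation action
# when a drift is first predicted, and reuse it for later drifts of the same
# feature with (approximately) the same drift amount, as in FIG. 10.
class CompensationStore:
    def __init__(self):
        self.actions = {}

    def store(self, feature, amount, action):
        # Round the amount so drifts that are "close to the same" match.
        self.actions[(feature, round(amount, 1))] = action

    def lookup(self, feature, amount):
        return self.actions.get((feature, round(amount, 1)))
```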
FIGS. 9 and 10, as well as FIGS. 1-8, depict non-limiting examples of using the principles in the present disclosure to offset at least part of the data drift in a live ML system. One or more embodiments of the present disclosure may be implemented and deployed within any distributed or centralized cloud system that uses online stream data. One or more embodiments of the present disclosure provide a solution to detect and/or predict drifting data in distributed clouds on-the- fly and/or to propose adaptive changes to such drifting data accordingly to, e.g., improve the accuracy of machine learning-based models as compared with existing models and systems. One or more embodiments of the present disclosure may include one or more of a:
- “features and training learner” that learns when a data drift would occur according to different training windows.
- “drift detector” that can identify whether a feature is drifting or not according to the statistical properties of the features within their corresponding training windows and provides the amount of drift for each feature.
- “compensator” or “compensation learner” that identifies a list of changes to be applied on the drifting features to adapt such drifting data to the trained model and its data set.
- “drift predictor” that can early identify whether a feature would experience a drift and determines the amount of such drift.
As will be appreciated by one of skill in the art, the concepts described herein may be embodied as a method, data processing system, and/or computer program product. Accordingly, the concepts described herein may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects all generally referred to herein as a “circuit” or “module.” Furthermore, the disclosure may take the form of a computer program product on a tangible computer usable storage medium having computer program code embodied in the medium that can be executed by a computer. Any suitable tangible computer readable medium may be utilized including hard disks, CD-ROMs, electronic storage devices, optical storage devices, or magnetic storage devices.
Some embodiments are described herein with reference to flowchart illustrations and/or block diagrams of methods, systems and computer program products. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable memory or storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
It is to be understood that the functions/acts noted in the blocks may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.
Computer program code for carrying out operations of the concepts described herein may be written in an object-oriented programming language such as Java® or C++ or Python. However, the computer program code for carrying out operations of the disclosure may also be written in conventional procedural programming languages, such as the "C" programming language. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer. In the latter scenario, the remote computer may be connected to the user's computer through a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Many different embodiments have been disclosed herein, in connection with the above description and the drawings. It will be understood that it would be unduly repetitious and obfuscating to literally describe and illustrate every combination and subcombination of these embodiments. Accordingly, all embodiments can be combined in any way and/or combination, and the present specification, including the drawings, shall be construed to constitute a complete written description of all combinations and subcombinations of the embodiments described herein, and of the manner and process of making and using them, and shall support claims to any such combination or subcombination.
It will be appreciated by persons skilled in the art that the embodiments described herein are not limited to what has been particularly shown and described herein above. In addition, unless mention was made above to the contrary, it should be noted that all of the accompanying drawings are not to scale. A variety of modifications and variations are possible in light of the above teachings without departing from the scope of the following claims.

Claims

What is claimed is:
1. A method implemented in a machine learning system (12), the method comprising: a. at least one of detecting and predicting (S100) a data drift in an input data stream of the machine learning system (12); b. determining (S102) a compensation function, the compensation function configured to offset at least part of the data drift, the compensation function being based at least in part on the at least one of the detecting and the predicting; and c. applying (S104) the compensation function to the input data stream to offset at least part of the data drift.
2. The method of Claim 1, wherein applying the compensation function to the input data stream further comprises: applying the compensation function to the input data stream of a trained machine learning model (20) to compensate for the at least part of the data drift, the at least part of the data drift associated with data that is input into the trained machine learning model (20).
3. The method of any one of Claims 1 and 2, further comprising one or more of: obtaining output values of the compensation function that is applied to the input data stream; providing the output values of the compensation function as input values to the trained machine learning model (20); and obtaining an output from the machine learning system (12) according to, at least in part, the compensation function.
4. The method of any one of Claims 1-3, wherein the at least one of the detecting and predicting the data drift further comprises: detecting the data drift in the input data stream using a drift detector model (32).
5. The method of Claim 4, wherein one or more of: the drift detector model (32) is configured to map at least one training window to at least one feature and at least one corresponding drift time of the at least one feature; and the drift detector model (32) is configured to detect whether at least one feature is drifting during at least one data window, the at least one data window being based at least in part on the at least one training window that is mapped to the at least one feature.
6. The method of any one of Claims 1-5, wherein the at least one of the detecting and predicting the data drift further comprises: predicting the data drift in the input data stream using a drift predictor model (34).
7. The method of Claim 6, wherein one or more of: the drift predictor model (34) is configured to estimate at least one next value and a distribution of at least one feature based on the input data stream; the drift predictor model (34) is configured to predict whether at least one feature is drifting based at least in part on at least one predetermined drift pattern and the estimated next value and the distribution; and the at least one predetermined drift pattern includes one or more of a sudden drift pattern, a gradual drift pattern, an incremental drift pattern and a reoccurring drift pattern.
8. The method of any one of Claims 1-7, further comprising: determining whether an amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds a predetermined drift threshold.
9. The method of Claim 8, further comprising: if the amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds the predetermined drift threshold, initiating retraining a machine learning model associated with the data drift.
10. The method of any one of Claims 1-9, wherein applying the compensation function to the input data stream further comprises: applying the compensation function to the input data stream if the amount of the at least one of the detected data drift and the predicted data drift does not at least meet or exceed a predetermined drift threshold.
11. The method of any one of Claims 1-10, wherein the determining the compensation function further comprises: determining at least one action that offsets at least part of the data drift.
12. A machine learning system (12), the machine learning system (12) comprising processing circuitry (52), the processing circuitry (52) configured to: a. at least one of detect and predict a data drift in an input data stream of the machine learning system (12); b. determine a compensation function, the compensation function configured to offset at least part of the data drift, the compensation function being based at least in part on the at least one of the detection and the prediction; and c. apply the compensation function to the input data stream to offset at least part of the data drift.
13. The machine learning system (12) of Claim 12, wherein the processing circuitry (52) is further configured to apply the compensation function to the input data stream by being configured to: apply the compensation function to the input data stream of a trained machine learning model (20) to compensate for the at least part of the data drift, the at least part of the data drift associated with data that is input into the trained machine learning model (20).
14. The machine learning system (12) of any one of Claims 12 and 13, wherein the processing circuitry (52) is further configured to one or more of: obtain output values of the compensation function that is applied to the input data stream; provide the output values of the compensation function as input values to the trained machine learning model (20); and obtain an output from the machine learning system (12) according to, at least in part, the compensation function.
15. The machine learning system (12) of any one of Claims 12-14, wherein the processing circuitry (52) is further configured to at least one of detect and predict the data drift by being further configured to: detect the data drift in the input data stream using a drift detector model (32).
16. The machine learning system (12) of Claim 15, wherein one or more of: the drift detector model (32) is configured to map at least one training window to at least one feature and at least one corresponding drift time of the at least one feature; and the drift detector model (32) is configured to detect whether at least one feature is drifting during at least one data window, the at least one data window being based at least in part on the at least one training window that is mapped to the at least one feature.
17. The machine learning system (12) of any one of Claims 12-16, wherein the processing circuitry (52) is further configured to at least one of detect and predict the data drift by being further configured to: predict the data drift in the input data stream using a drift predictor model (34).
18. The machine learning system (12) of Claim 17, wherein one or more of: the drift predictor model (34) is configured to estimate at least one next value and a distribution of at least one feature based on the input data stream; the drift predictor model (34) is configured to predict whether at least one feature is drifting based at least in part on at least one predetermined drift pattern and the estimated next value and the distribution; and the at least one predetermined drift pattern includes one or more of a sudden drift pattern, a gradual drift pattern, an incremental drift pattern and a reoccurring drift pattern.
19. The machine learning system (12) of any one of Claims 12-18, wherein the processing circuitry (52) is further configured to: determine whether an amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds a predetermined drift threshold.
20. The machine learning system (12) of Claim 19, wherein the processing circuitry (52) is further configured to: if the amount of the at least one of the detected data drift and the predicted data drift at least meets or exceeds the predetermined drift threshold, initiate retraining a machine learning model associated with the data drift.
21. The machine learning system (12) of any one of Claims 12-20, wherein the processing circuitry (52) is further configured to apply the compensation function to the input data stream by being configured to: apply the compensation function to the input data stream if the amount of the at least one of the detected data drift and the predicted data drift does not at least meet or exceed a predetermined drift threshold.
22. The machine learning system (12) of any one of Claims 12-21, wherein the processing circuitry (52) is further configured to determine the compensation function by being configured to: determine at least one action that offsets at least part of the data drift.
Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2019/057460 WO2021044192A1 (en) 2019-09-04 2019-09-04 Detection, prediction and/or compensation of data drift in distributed clouds

Publications (1)

Publication Number Publication Date
WO2021044192A1 true WO2021044192A1 (en) 2021-03-11

Family

ID=67998534

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2019/057460 WO2021044192A1 (en) 2019-09-04 2019-09-04 Detection, prediction and/or compensation of data drift in distributed clouds

Country Status (1)

Country Link
WO (1) WO2021044192A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022214843A1 (en) 2021-04-05 2022-10-13 Telefonaktiebolaget Lm Ericsson (Publ) Automated handling of data drift in edge cloud environments
WO2023119304A1 (en) * 2021-12-24 2023-06-29 Telefonaktiebolaget Lm Ericsson (Publ) Node and methods performed thereby for handling drift in data
US11816186B2 (en) 2021-07-26 2023-11-14 Raytheon Company Architecture for dynamic ML model drift evaluation and visualization on a GUI

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ESCOVEDO TATIANA ET AL: "DetectA: abrupt concept drift detection in non-stationary environments", APPLIED SOFT COMPUTING, ELSEVIER, AMSTERDAM, NL, vol. 62, 24 October 2017 (2017-10-24), pages 119 - 133, XP085314320, ISSN: 1568-4946, DOI: 10.1016/J.ASOC.2017.10.031 *
MA MINGHUA ET AL: "Robust and Rapid Adaption for Concept Drift in Software System Anomaly Detection", 2018 IEEE 29TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE), IEEE, 15 October 2018 (2018-10-15), pages 13 - 24, XP033449886, DOI: 10.1109/ISSRE.2018.00013 *

Similar Documents

Publication Publication Date Title
US10614373B1 (en) Processing dynamic data within an adaptive oracle-trained learning system using curated training data for incremental re-training of a predictive model
CN107765347B (en) Short-term wind speed prediction method based on Gaussian process regression and particle filtering
WO2018121690A1 (en) Object attribute detection method and device, neural network training method and device, and regional detection method and device
US20200302337A1 (en) Automatic selection of high quality training data using an adaptive oracle-trained learning framework
CN112561191A (en) Prediction model training method, prediction method, device, apparatus, program, and medium
WO2021044192A1 (en) Detection, prediction and/or compensation of data drift in distributed clouds
US20210224696A1 (en) Resource-aware and adaptive robustness against concept drift in machine learning models for streaming systems
US11100388B2 (en) Learning apparatus and method for learning a model corresponding to real number time-series input data
US11823058B2 (en) Data valuation using reinforcement learning
Das et al. A probabilistic approach for weather forecast using spatio-temporal inter-relationships among climate variables
US20200151545A1 (en) Update of attenuation coefficient for a model corresponding to time-series input data
US20220172378A1 (en) Image processing apparatus, image processing method and non-transitory computer readable medium
CN110458022B (en) Autonomous learning target detection method based on domain adaptation
Pampari et al. Unsupervised calibration under covariate shift
Zhang et al. Robust KPI anomaly detection for large-scale software services with partial labels
US20180197082A1 (en) Learning apparatus and method for bidirectional learning of predictive model based on data sequence
US20220180250A1 (en) Processing dynamic data within an adaptive oracle-trained learning system using dynamic data set distribution optimization
CN111144567A (en) Training method and device of neural network model
WO2021169478A1 (en) Fusion training method and apparatus for neural network model
Guo et al. Calibrated Conversion Rate Prediction via Knowledge Distillation under Delayed Feedback in Online Advertising
Leest et al. Evolvability of machine learning-based systems: An architectural design decision framework
US11907663B2 (en) Methods for unsupervised prediction of performance drop due to domain shift
US11475255B2 (en) Method for adaptive context length control for on-line edge learning
US11989626B2 (en) Generating performance predictions with uncertainty intervals
CN111539477B (en) Water quality monitoring management method, device, server and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19772875; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19772875; Country of ref document: EP; Kind code of ref document: A1)