US20220284245A1 - Randomized method for improving approximations for nonlinear support vector machines - Google Patents
- Publication number: US20220284245A1
- Application number: US 17/191,379
- Authority: US (United States)
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G06N20/10 — Machine learning using kernel methods, e.g. support vector machines [SVM]
- G06F18/2148 — Generating training patterns; bootstrap methods, e.g. bagging or boosting, characterised by the process organisation or structure
- G06F18/2193 — Validation; performance evaluation; active pattern learning techniques based on specific statistical tests
- G06F18/2411 — Classification techniques based on the proximity to a decision surface, e.g. support vector machines
- G06V10/62 — Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
- G06N5/01 — Dynamic search techniques; heuristics; dynamic trees; branch-and-bound
- G06N7/01 — Probabilistic graphical models, e.g. probabilistic networks
- G06K9/6269; G06K9/6257; G06K9/6265 (legacy codes)
Definitions
- a nonlinear SVM (in its dual form) can be formulated as follows:

  minimize (1/2) Σ_i Σ_j α_i α_j y_i y_j k(x_i, x_j) − Σ_i α_i
  subject to Σ_i y_i α_i = 0 and 0 ≤ α_i ≤ C for i = 1, …, M,

  where x_i are data samples (observations), M is the number of observations, y_i are class labels, C is the misclassification penalty, and k(·,·) is the nonlinear kernel function.
- in standard quadratic-programming form this reads

  minimize (1/2) xᵀQx + cᵀx subject to Ax = b and l ≤ x ≤ u,

  where x is the vector of search variables, Q is a symmetric positive-semidefinite matrix with entries Q_ij = y_i y_j k(x_i, x_j), c represents the linear part of the objective function, l is the vector of lower bounds, u is the vector of upper bounds, and A is a matrix of linear equality constraints. In addition, d_1 and d_2 are dual variables associated with the lower and upper bounds, respectively, and y is the vector of dual variables associated with the linear equality constraints.
- to find a search direction (Δx, Δy), the predictor-corrector interior-point algorithm will solve (twice at each step) the following system of equations, known as the reduced Karush-Kuhn-Tucker (KKT) system:

  (Q + D) Δx − Aᵀ Δy = r_1
  A Δx = r_2

  where D is a positive diagonal matrix arising from the barrier terms for the bound constraints, and r_1 and r_2 are residual vectors computed from the current iterate.
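As a concrete sketch, the dual QP data above can be assembled as follows, assuming a Gaussian (RBF) kernel; the function names and the gamma default are illustrative choices, not taken from the patent:

```python
import numpy as np

def gaussian_kernel(a, b, gamma=0.5):
    """RBF kernel: k(a, b) = exp(-gamma * ||a - b||^2), computed pairwise."""
    diff = a[:, None, :] - b[None, :, :]
    return np.exp(-gamma * np.sum(diff * diff, axis=2))

def assemble_dual_qp(X, y, C, gamma=0.5):
    """Build (Q, c, l, u, A) for the dual SVM in standard QP form:
       min 1/2 x^T Q x + c^T x  s.t.  A x = 0,  l <= x <= u."""
    K = gaussian_kernel(X, X, gamma)
    Q = np.outer(y, y) * K                 # Q_ij = y_i y_j k(x_i, x_j), PSD
    c = -np.ones(len(y))                   # linear part of the objective
    l = np.zeros(len(y))                   # lower bounds: 0 <= x_i
    u = np.full(len(y), float(C))          # upper bounds: x_i <= C
    A = y.reshape(1, -1).astype(float)     # one equality constraint: y^T x = 0
    return Q, c, l, u, A
```

An interior-point solver then operates on these five objects; the reduced KKT system is what it factorizes at each iteration.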
- each partition X_p does not necessarily have the same number of rows, so correspondingly the Q_p can be of different sizes. See FIG. 4, which presents an example of a block-diagonal matrix, wherein Q_1, Q_2 and Q_3 are square matrices of any size, which capture all nonzero elements.
- μ, v, β > 0 are associated parameters, whose values can be chosen via, e.g., Bayesian optimization.
- At the next step we solve the nonlinear SVM model on the union S ∪ X_0. This procedure can be repeated a number of times.
- the stopping criteria can be, for example, that the number of SVM misclassifications no longer decreases by more than a minimum amount between iterations.
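The block-diagonal initialization can be sketched as follows; the function name and the partition argument are illustrative assumptions:

```python
import numpy as np

def block_diagonal_kernel(X, kernel, partitions):
    """Approximate the full kernel matrix by its diagonal blocks.
    `partitions` is a list of index arrays; blocks may differ in size,
    matching the Q_1, Q_2, Q_3 example of FIG. 4.  Off-block entries
    are treated as zero."""
    n = len(X)
    Q = np.zeros((n, n))
    for idx in partitions:
        Q[np.ix_(idx, idx)] = kernel(X[idx], X[idx])
    return Q
```

Each diagonal block can be formed and solved independently, which is what makes this initialization cheap relative to allocating the full kernel.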
- FIG. 6 presents a flowchart illustrating operations the system performs to improve operation of a monitored system in accordance with the disclosed embodiments.
- the system uses a training data set comprising labeled data points received from the monitored system to train the SVM model to detect one or more conditions-of-interest.
- the system uses the trained SVM model to detect the one or more conditions-of-interest based on monitored data points received from the monitored system (step 606).
- when one or more conditions-of-interest are detected, the system performs an action to improve operation of the monitored system (step 608).
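A minimal sketch of the detect-and-act portion of this flow, using the standard kernel-SVM decision function; the coefficients (alphas, bias) are assumed to come from a previously trained model, and treating the negative class as the condition-of-interest is our convention for the sketch:

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """Gaussian kernel between two individual points."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def svm_decision(x, support_vectors, alphas, labels, bias, gamma=0.5):
    """Kernel-SVM decision function f(x) = sum_i alpha_i y_i k(s_i, x) + b;
    sign(f) gives the predicted class."""
    return sum(a * y * rbf(s, x, gamma)
               for s, a, y in zip(support_vectors, alphas, labels)) + bias

def surveil(points, support_vectors, alphas, labels, bias, act):
    """Classify each monitored point and invoke `act` (e.g., notify an
    administrator or schedule maintenance) when f(x) < 0."""
    for x in points:
        if svm_decision(x, support_vectors, alphas, labels, bias) < 0:
            act(x)
```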
- FIG. 7 presents a flowchart illustrating the process of training the SVM model in accordance with the disclosed embodiments.
- the system uses a block-diagonal approximation to initialize an active set of support vectors for the SVM model (step 702).
- the system iteratively performs the following operations to improve the SVM model while SVM misclassifications continue to decrease by more than a minimum amount.
- the system randomly selects additional points from the training data set based on an inverse distance to the separating hyperplane for the SVM model (step 704).
- the system solves a nonlinear kernel for the SVM model based on the active set of support vectors and the additional data points to compute a new active set of support vectors (step 706). If the new active set of support vectors produces fewer misclassifications than the active set of support vectors, the system updates the active support vectors with the new active set of support vectors (step 708).
Abstract
The disclosed embodiments relate to a system that improves operation of a monitored system. During a training mode, the system uses a training data set comprising labeled data points received from the monitored system to train an SVM model to detect one or more conditions-of-interest. While training the SVM model, the system makes approximations to reduce computing costs, wherein the approximations involve stochastically discarding points from the training data set based on an inverse distance to a separating hyperplane for the SVM model. Next, during a surveillance mode, the system uses the trained SVM model to detect the one or more conditions-of-interest based on monitored data points received from the monitored system. When one or more conditions-of-interest are detected, the system performs an action to improve operation of the monitored system.
Description
- The disclosed embodiments generally relate to techniques for improving the performance of supervised-learning models, such as support vector machines (SVMs). More specifically, the disclosed embodiments provide a randomized technique that iteratively improves approximations for nonlinear SVM models.
- Support vector machines (SVMs) comprise a popular class of supervised machine-learning techniques, which can be used for both classification and regression purposes. For large scale data sets, the task of allocating and computing the associated large kernels (e.g., Gaussian), which are used to solve the SVM model, becomes prohibitively expensive. More specifically, for such nonlinear kernels, the complexity of an SVM solution technique grows quadratically in memory space and cubically in running time as a function of the number of observations in the data set. This means it is impractical to use SVMs for larger data sets with more than hundreds of thousands of observations, which are becoming increasingly common in many application domains.
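To make the quadratic memory growth concrete, here is a quick back-of-the-envelope sketch (the helper name is ours, purely illustrative): a dense kernel matrix stores one floating-point entry per pair of observations.

```python
def kernel_memory_bytes(m, bytes_per_entry=8):
    """Memory required for a dense m-by-m kernel matrix of float64 entries."""
    return m * m * bytes_per_entry

# 10,000 observations: the dense kernel still fits in RAM.
print(kernel_memory_bytes(10_000) / 1e9)   # 0.8 (gigabytes)
# 123 million observations: over 100 petabytes, hopeless to allocate.
print(kernel_memory_bytes(123_000_000) / 1e15)
```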
- To remedy this computing-cost problem, people perform various types of approximations, such as: sampling data points; computing block-diagonal approximations for nonlinear kernels; and performing incomplete Cholesky factorizations. These approximations can significantly reduce computation costs, which makes it practical to analyze large data sets. Unfortunately, the use of such approximations generally produces suboptimal results during classification and regression operations. Moreover, there presently do not exist any techniques for effectively improving these suboptimal results.
- Hence, what is needed is a technique for improving approximations for nonlinear SVMs.
- The disclosed embodiments relate to a system that improves operation of a monitored system. During a training mode, the system uses a training data set comprising labeled data points received from the monitored system to train an SVM model to detect one or more conditions-of-interest. While training the SVM model, the system makes approximations to reduce computing costs, wherein the approximations involve stochastically discarding points from the training data set based on an inverse distance to a separating hyperplane for the SVM model. Next, during a surveillance mode, the system uses the trained SVM model to detect the one or more conditions-of-interest based on monitored data points received from the monitored system. When one or more conditions-of-interest are detected, the system performs an action to improve operation of the monitored system.
- In some embodiments, while training the SVM model, the system uses a block-diagonal approximation to initialize an active set of support vectors for the SVM model. Next, the system iteratively performs the following operations to improve the SVM model while SVM misclassifications continue to decrease by more than a minimum amount. First, the system randomly selects additional points from the training data set based on an inverse distance to the separating hyperplane for the SVM model. The system then solves a nonlinear kernel for the SVM model based on the active set of support vectors and the additional data points to compute a new active set of support vectors. Then, if the new active set of support vectors produces fewer misclassifications than the active set of support vectors, the system updates the active support vectors with the new active set of support vectors.
- In some embodiments, while randomly selecting the additional points, the system selects an additional point x from the training data set with a probability P(x) = (μ + v·d(x))^(−β), wherein d(x) represents a distance from x to the separating hyperplane, and μ, v and β represent associated parameters.
- In some embodiments, the SVM model is formulated based on one of the following types of kernels: a linear kernel; a polynomial kernel; a hyperbolic tangent kernel; and a radial basis function kernel.
- In some embodiments, the monitored system comprises one of the following: a computer system; a database system; a website; an online customer-support system; a vehicle; an aircraft; a utility system asset; and a piece of machinery.
- In some embodiments, the data points received from the monitored system include one or more of the following: time-series sensor signals; computer parameters; textual data; numerical data; and image data. In some embodiments, detecting the one or more conditions-of-interest involves detecting one or more of the following: an impending failure of the monitored system; a malicious-intrusion event in the monitored system; a preventive-maintenance condition for the monitored system; a fraud condition for the monitored system; a product-purchasing condition for the monitored system; and a consumer-attrition condition for the monitored system.
- In some embodiments, performing the action to improve operation of the monitored system involves one or more of the following: sending a notification to an administrator of the monitored system; performing an action to stop a malicious-intrusion event in the monitored system; scheduling a maintenance operation for the monitored system; performing an action to stop an instance of fraud associated with the monitored system; performing an action to make relevant offers to customers associated with the monitored system; and performing an action to improve satisfaction of a customer associated with the monitored system.
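The stochastic point-selection rule used during training above, P(x) = (μ + v·d(x))^(−β), can be sketched as follows; the function names and parameter defaults are illustrative only:

```python
import numpy as np

def selection_probability(d, mu=1.0, v=1.0, beta=2.0):
    """P(x) = (mu + v * d(x)) ** (-beta): the closer a point is to the
    separating hyperplane (small d), the likelier it is re-selected."""
    return (mu + v * np.asarray(d, dtype=float)) ** (-beta)

def sample_points(X, distances, rng, mu=1.0, v=1.0, beta=2.0):
    """Stochastically keep each candidate point with probability P(x)."""
    p = selection_probability(distances, mu, v, beta)
    return X[rng.random(len(X)) < p]
```

With mu = 1, a point lying exactly on the hyperplane is re-selected with probability 1, and the probability decays polynomially with distance.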
- FIG. 1 illustrates an exemplary computing environment including an application and associated customer-support system in accordance with the disclosed embodiments.
- FIG. 2 illustrates an exemplary prognostic-surveillance system, which operates on time-series signals obtained from sensors in a monitored system, in accordance with the disclosed embodiments.
- FIG. 3A illustrates a maximum margin separating hyperplane for a linear kernel SVM in accordance with the disclosed embodiments.
- FIG. 3B illustrates exemplary classes that are not linearly separable in accordance with the disclosed embodiments.
- FIG. 4 illustrates an exemplary block-diagonal matrix in accordance with the disclosed embodiments.
- FIG. 5 presents pseudocode for a nonlinear kernel SVM in accordance with the disclosed embodiments.
- FIG. 6 presents a flowchart illustrating operations the system performs to improve operation of the monitored system in accordance with the disclosed embodiments.
- FIG. 7 presents a flowchart illustrating the process of training the SVM model in accordance with the disclosed embodiments.
- The following description is presented to enable any person skilled in the art to make and use the present embodiments, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present embodiments. Thus, the present embodiments are not limited to the embodiments shown, but are to be accorded the widest scope consistent with the principles and features disclosed herein.
- The data structures and code described in this detailed description are typically stored on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. The computer-readable storage medium includes, but is not limited to, volatile memory, non-volatile memory, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), DVDs (digital versatile discs or digital video discs), or other media capable of storing code and/or data now known or later developed.
- The methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a computer-readable storage medium as described above. When a computer system reads and executes the code and/or data stored on the computer-readable storage medium, the computer system performs the methods and processes embodied as data structures and code and stored within the computer-readable storage medium. Furthermore, the methods and processes described below can be included in hardware modules. For example, the hardware modules can include, but are not limited to, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), and other programmable-logic devices now known or later developed. When the hardware modules are activated, the hardware modules perform the methods and processes included within the hardware modules.
- FIG. 1 illustrates an exemplary computing system 100, which includes an application 120 and a customer-support system 124 in accordance with the disclosed embodiments. Within computing system 100, a number of customers 102-104 interact with application 120 through client systems 112-114, respectively. Application 120 is provided by an organization, such as a commercial enterprise, to enable customers 102-104 to perform various operations associated with the organization, or to access one or more services provided by the organization. For example, application 120 can include online accounting software that customers 102-104 can access to prepare and file tax returns online. In another example, application 120 provides a commercial website for selling merchandise. Note that application 120 can be hosted on a local or remote server.
- During operation, customer-support system 124 receives various signals from application 120 and associated database system 122. Next, customer-support system 124 analyzes these signals using an associated SVM model 126 to produce information, which is presented to an analyst 111 through client system 115 to facilitate interactions with customers 102-104. For example, SVM model 126 can perform a classification operation based on the signals received from application 120 and database 122 to detect: a possible malicious-intrusion event; a possible fraudulent transaction; or a set of customer interactions that indicate possible dissatisfaction of a customer. Finally, a notification about a detected problem can be presented to analyst 111, which enables analyst 111 to take action to remedy the problem.
- An SVM model can also be used to facilitate the operation of a prognostic-surveillance system. As illustrated in FIG. 2, prognostic-surveillance system 200 operates on a set of time-series sensor signals 204 obtained from sensors in monitored system 202. Note that monitored system 202 can generally include any type of machinery or facility, which includes sensors and generates time-series signals. Moreover, time-series signals 204 can originate from any type of sensor, which can be located in a component in monitored system 202, including: a voltage sensor; a current sensor; a pressure sensor; a rotational speed sensor; and a vibration sensor.
- During operation of prognostic-surveillance system 200, time-series signals 204 feed into a time-series database 206, which stores the time-series signals 204 for subsequent analysis. Next, the time-series signals 204 either feed directly from monitored system 202 or from time-series database 206 into analysis module 208. Analysis module 208 uses an associated SVM model 210 to analyze time-series signals 204 to detect various problematic conditions for monitored system 202. For example, analysis module 208 can be used to detect: an impending failure of the monitored system 202; a malicious-intrusion event in monitored system 202; or a condition indicating that preventive maintenance is required for the monitored system 202. A notification about a detected problem can then be sent to analyst 212, which enables analyst 212 to take action to remedy the problem.
- We now present details of our new randomized technique that iteratively improves approximations to support nonlinear SVMs. As mentioned above, for large scale data sets, allocating and computing a nonlinear (e.g., Gaussian) kernel for an SVM is often prohibitively expensive. To address the problem, we propose a novel technique. In the first step, it constructs a block-diagonal approximation of the kernel to find an initial set of support vectors S. It then generates new random samples of observations based on their proximity to the separating hyperplane, which improves S after each iteration.
- Let X be the input data set. Any point x∈X\S that is not a support vector can be safely dropped from the SVM model, because an inactive constraint can be dropped from an optimization problem without changing the optimal solution. Once an initial set of support vectors S has been found, we first drop all points in X\S from the data set. It is intuitively clear that any point that lies too far from the separating hyperplane (in the transformed feature space) has little chance of ever entering the set of optimal support vectors. Therefore, at the next iteration of our technique, we add a point x back with probability
-
P(x) = (μ + v d(x))^(−β)   (1)
- where d(x) is the distance from x to the hyperplane, and μ, v, β > 0 are associated parameters. In other words, the closer a point lies to the current separating hyperplane, the greater the chance it will be added back to the model. We then solve the new model, and repeat.
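As a quick numeric illustration of formula (1) (the parameter values below are arbitrary choices for illustration, not values prescribed by this disclosure):

```python
import numpy as np

def inclusion_probability(d, mu=1.0, v=1.0, beta=2.0):
    """Formula (1): P(x) = (mu + v*d(x))**(-beta).

    The probability decays as the point moves away from the current
    separating hyperplane, so near-boundary points are favored."""
    d = np.asarray(d, dtype=float)
    return (mu + v * d) ** (-beta)

# A point on the hyperplane (d = 0) gets probability mu**(-beta) = 1.0,
# while a distant point (d = 10) is re-sampled less than 1% of the time.
p = inclusion_probability([0.0, 1.0, 10.0])
```

Any monotonically decreasing function of d(x) would exhibit the same qualitative behavior; the (μ + v d)^(−β) form simply gives three knobs for tuning how aggressively distant points are discarded.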
- Let us illustrate our approach on the airline on-time data set. Because it has approximately 123 million observations, solving a nonlinear SVM is out of the question (because it is impractical with existing technology to allocate a 123 million-by-123 million square matrix). So we first construct a block-diagonal approximation to find an initial set of support vectors S0. Say, for example, S0 has 300 support vectors, which approximate the optimal solution (the optimal set of support vectors). We, of course, cannot allocate a nonlinear kernel for the original data set, but for, say, a 10,300-observation data set, we surely can. So at the next step, we randomly choose 10,000 observations X0, such that the probability of an observation to be added to the new model is given by formula (1), and solve the SVM model on S0∪X0 observations, which gives us S1. The process is then repeated until some stopping criteria are met.
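A miniature, runnable stand-in for this procedure (scikit-learn on a small synthetic set; the library, the block count, the iteration count, and all parameter values are our illustrative assumptions, not part of the disclosure):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=3000, n_features=5, random_state=0)

# Step 1: block-diagonal initialization -- solve small independent SVMs
# on row blocks and pool their support vectors into an initial set S0.
S_idx = np.array([], dtype=int)
for block in np.array_split(np.arange(len(X)), 10):
    svc = SVC(kernel="rbf").fit(X[block], y[block])
    S_idx = np.union1d(S_idx, block[svc.support_])

# Step 2: iterate -- draw extra points with probability (1) and re-solve
# the SVM on S ∪ X0, keeping the new support vectors each time.
mu, v, beta = 1.0, 1.0, 2.0
for _ in range(3):
    model = SVC(kernel="rbf").fit(X[S_idx], y[S_idx])
    d = np.abs(model.decision_function(X))      # stand-in for d(x)
    p = (mu + v * d) ** (-beta)                 # formula (1)
    extra = np.flatnonzero(rng.random(len(X)) < p)
    keep = np.union1d(S_idx, extra)
    refit = SVC(kernel="rbf").fit(X[keep], y[keep])
    S_idx = keep[refit.support_]                # new support set S_{k+1}

final = SVC(kernel="rbf").fit(X[S_idx], y[S_idx])
```

At no point does the sketch form a kernel matrix over all of X; every fit touches only a block or the current S ∪ X0 sample, which is the entire point of the technique.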
- Imagine we have two sets of points and wish to construct a maximum-margin separating hyperplane (see
FIG. 3A). This model is known as a linear SVM. Linear SVM models can be solved very effectively by modern predictor-corrector Interior-Point Methods (IPMs). A parallel distributed IPM implementation can handle billions of observations and a relatively large number of features, including high-cardinality factors. Generally speaking, predictor-corrector interior-point techniques exhibit fast, robust convergence and are among the most accurate techniques available. In addition, IPMs have just a few user-controlled parameters (e.g., primal and dual infeasibility tolerances and a maximum number of iterations); their default values usually work well in practice and do not require tweaking. A careful IPM implementation is a powerful and reliable optimization engine. - Whenever the classes are not linearly separable (see
FIG. 3B), a nonlinear kernel SVM can be an effective solution. However, in stark contrast to the linear SVM, a nonlinear kernel SVM is often a remarkably more challenging problem. A nonlinear SVM (in its dual form) can be formulated as follows:
-
maximize Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j k(x_i, x_j)
subject to 0 ≤ α_i ≤ C for i = 1, …, M, and Σ_i α_i y_i = 0   (2)
- Commonly, the following kernels are used in practice:
-
- linear kernel: k(x_i, x_j) = x_i^T x_j
- polynomial kernel: k(x_i, x_j) = (1 + x_i^T x_j)^d for some d > 0
- radial basis function: k(x_i, x_j) = exp(−γ‖x_i − x_j‖²) for some γ > 0
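Written out in NumPy (a routine transcription of the formulas above; the d and γ values are placeholders):

```python
import numpy as np

def linear_kernel(xi, xj):
    # k(xi, xj) = xi^T xj
    return float(xi @ xj)

def polynomial_kernel(xi, xj, d=3):
    # k(xi, xj) = (1 + xi^T xj)^d
    return float((1.0 + xi @ xj) ** d)

def rbf_kernel(xi, xj, gamma=0.5):
    # k(xi, xj) = exp(-gamma * ||xi - xj||^2)
    return float(np.exp(-gamma * np.sum((xi - xj) ** 2)))

xi = np.array([1.0, 0.0])
xj = np.array([0.0, 1.0])
```

Note the RBF kernel satisfies k(x, x) = 1 for any x, since ‖x − x‖² = 0.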
- The biggest challenge in formulation (2) lies in constructing the quadratic matrix Q, where q_ij ≡ k(x_i, x_j). Q can become prohibitively large even for medium-sized data sets. To illustrate this, consider a one-million-observation data set, which nowadays would be viewed as rather small. Storing the lower (or upper) triangular part of Q requires 3.7 terabytes. Note that this number (3.7 terabytes) does not depend upon the number of columns in the data set, because Q ∈ ℝ^(M×M); it grows quadratically with the number of observations M.
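The 3.7-terabyte figure follows from simple arithmetic, assuming 8-byte double-precision entries:

```python
M = 1_000_000                      # number of observations
entries = M * (M + 1) // 2         # lower-triangular part of Q, diagonal included
bytes_needed = entries * 8         # 8 bytes per double-precision value
terabytes = bytes_needed / 2**40   # ~3.64 binary terabytes, i.e. roughly 3.7 TB
```

The count involves only M, never the number of columns N, which is why the storage requirement is independent of dimensionality and grows quadratically with the number of observations.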
- In this section we give a brief overview of the predictor-corrector interior-point method for SVM. As stated earlier, a nonlinear SVM formulation is a classical quadratic programming (QP) model. Let us consider the following standard QP formulation, which is identical to (2), except we no longer use SVM specific notation, but switch to the standard QP nomenclature:
-
minimize ½ x^T Q x + c^T x
subject to A x = b, l ≤ x ≤ u   (3)
- where x is the vector of search variables, Q is a symmetric positive-semidefinite matrix, c represents the linear part of the objective function, l is the vector of lower bounds, u is the vector of upper bounds, and A is the matrix of linear equality constraints (with right-hand side b).
- The dual program to (3) can be stated as follows:
-
maximize b^T y − ½ x^T Q x + l^T d_1 − u^T d_2
subject to A^T y − Q x + d_1 − d_2 = c, d_1, d_2 ≥ 0   (4)
- where d_1 and d_2 are the dual variables associated with the lower and upper bounds, respectively, and y is the vector of dual variables associated with the linear equality constraints.
- The predictor-corrector interior-point algorithm will solve (twice at each step) the following system of equations, known as the reduced Karush-Kuhn-Tucker (KKT) system:
-
[ Q + D   −A^T ] [Δx]   [ρ_1]
[ −A         0 ] [Δy] = [ρ_2]   (5)
where D is a positive diagonal matrix arising from the barrier terms for the bound constraints.
- where the right-hand sides ρ1 and ρ2 are defined as follows:
-
- During the predictor step, μ and the delta terms are dropped, and the resultant system is solved for an initial estimate of the delta terms. During the corrector step, an estimate of μ is reinstated into the system, along with the nonlinear delta terms, and the system is solved again.
- To solve KKT, one has to compute the Cholesky factorization
-
Q + D = L L^T   (6)
(D being the diagonal matrix contributed by the barrier terms)
- and then proceed to solve for Δy
-
A L^(−T) L^(−1) A^T Δy = −ρ_2 − A L^(−T) L^(−1) ρ_1   (7)
-
Δx = L^(−T) L^(−1) (ρ_1 + A^T Δy)   (8) - Of course, no explicit inverses of the lower-triangular L and upper-triangular L^T matrices are computed; instead, one carries out forward and backward substitutions.
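On a small random instance, the solve sequence can be sketched in NumPy (the signs are chosen to be consistent with equation (7), which forces Δx = L^(−T) L^(−1) (ρ_1 + A^T Δy); here `np.linalg.solve` applied to the triangular factors stands in for true forward and backward substitutions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3
B = rng.standard_normal((n, n))
H = B @ B.T + n * np.eye(n)        # plays the role of the (1,1) KKT block
A = rng.standard_normal((m, n))
rho1 = rng.standard_normal(n)
rho2 = rng.standard_normal(m)

L = np.linalg.cholesky(H)          # (6): H = L L^T

def apply_H_inverse(v):
    """L^{-T} L^{-1} v via two triangular solves (no explicit inverse)."""
    return np.linalg.solve(L.T, np.linalg.solve(L, v))

# (7): (A L^{-T} L^{-1} A^T) dy = -rho2 - A L^{-T} L^{-1} rho1
schur = A @ apply_H_inverse(A.T)
dy = np.linalg.solve(schur, -rho2 - A @ apply_H_inverse(rho1))

# (8): dx = L^{-T} L^{-1} (rho1 + A^T dy)
dx = apply_H_inverse(rho1 + A.T @ dy)
```

Substituting dx back confirms that the pair (dx, dy) satisfies both block equations of the reduced system.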
- Recall that Q (the SVM kernel matrix) can be prohibitively large; for most medium- to large-scale data inputs, it simply cannot be allocated. We next present an approximation to the nonlinear SVM model, and then show how to improve it.
- We consider the most typical case: "tall and skinny" matrices, where M ≫ N. When storing such matrices on a cluster of compute nodes, X is usually partitioned into a collection of row blocks
-
X = [X_1^T, X_2^T, …, X_P^T]^T,
- where X_p ∈ ℝ^(M_p × N). The granularity of the partitions and their number can be arbitrary. By reducing the number of rows in each partition (we can always increase the number of partitions P), we can assume that for each row block X_p, its corresponding part of the nonlinear kernel, Q_p = k(x_i, x_j) for all x_i, x_j ∈ X_p, can also be stored in memory. In other words, instead of the full matrix Q (which we cannot allocate for all but the smallest input data sets), we store only its block-diagonal part
-
Q̃ = diag(Q_1, Q_2, …, Q_P).
- Note that because the partitions X_p do not necessarily have the same number of rows, the corresponding blocks Q_p can be of different sizes. See
FIG. 4, which presents an example of a block-diagonal matrix, wherein Q_1, Q_2 and Q_3 are square matrices of any size that capture all nonzero elements. - Some of the obvious properties of the Q̃ matrix:
-
- it is also positive-semidefinite;
- its inverse is also a block-diagonal matrix of the same shape;
- its Cholesky factorization, see (6), is carried out by each worker independently (an "embarrassingly parallel" computation).
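A sketch of the storage scheme (the block sizes and the small diagonal jitter, added for numerical safety before the Cholesky step, are our assumptions; in a distributed setting each worker would hold and factorize only its own Q_p):

```python
import numpy as np

def rbf_block(Xp, gamma=0.5):
    """Dense RBF kernel for a single row block X_p (size |X_p| x |X_p|)."""
    sq = np.sum(Xp ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * Xp @ Xp.T))

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 4))
blocks = np.array_split(X, 3)              # P = 3 row blocks

# Only the diagonal blocks of Q-tilde are ever materialized; each one is
# then factorized independently ("embarrassingly parallel").
Q_blocks = [rbf_block(Xp) for Xp in blocks]
L_blocks = [np.linalg.cholesky(Qp + 1e-6 * np.eye(len(Qp))) for Qp in Q_blocks]

stored = sum(Qp.size for Qp in Q_blocks)   # 3 * 100^2 entries
full = len(X) ** 2                         # 300^2 entries for the full Q
```

Even in this toy case the block-diagonal part holds one third of the full kernel's entries; the savings grow with the number of partitions P.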
- Introducing Q̃ into the reduced KKT system (5) makes it tractable to store and solve. Understandably, we would no longer be solving the original nonlinear SVM model, but its block-diagonal approximation, which we denote dSVM, where 'd' stands for "diagonal."
- Having solved dSVM, we obtain a set of support vectors, which to some extent approximates the optimal solution. Consider a hyperplane w^T x + b = 0 and an arbitrary observation g. The distance from g to the hyperplane is given by
-
d(g) = |w^T g + b| / ‖w‖.
- It is intuitively clear that if the distance d is large, the chance of g being a support vector is small; therefore, we do not need to keep the observation in the optimization model. In the transformed feature space, the core expression |w^T g + b| translates to
-
| Σ_{i∈S} α_i y_i k(x_i, g) + b |,
where the α_i are the dual variables associated with the support vectors.
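This kernel expansion is standard SVM algebra and can be checked against scikit-learn, whose `dual_coef_` attribute stores the products α_i·y_i for the support vectors (a verification sketch, not part of the disclosure):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=3)
model = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

def decision(g):
    """w^T phi(g) + b expanded over the support vectors via the kernel."""
    sv = model.support_vectors_
    k = np.exp(-0.5 * np.sum((sv - g) ** 2, axis=1))   # RBF with gamma = 0.5
    return (model.dual_coef_ @ k + model.intercept_).item()

g = X[0]
manual = decision(g)                                    # kernel-space expression
reference = model.decision_function(g[None, :]).item()  # library's w^T phi(g)+b
```

The magnitude of this decision value serves as the (scaled) distance d(g) used by the resampling probability.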
- Let S be the initial set of support vectors, obtained by solving the dSVM. To improve it, we randomly choose N (e.g., N = 20,000) observations from the input data set X, where each point x is drawn with probability
-
P(x) = (μ + v d(x))^(−β),
as in formula (1). The iterations stop when one of the following criteria is met:
- 1. maximum number of models (maxIterations > 0)
- 2. minimal improvement of the solution quality (0 < minProgress < 1)
The resultant technique is illustrated by the pseudocode that appears in FIG. 5.
-
FIG. 6 presents a flowchart illustrating the operations the system performs to improve operation of a monitored system in accordance with the disclosed embodiments. During a training mode, the system uses a training data set comprising labeled data points received from the monitored system to train the SVM model to detect one or more conditions-of-interest (step 602). While training the SVM model, the system makes approximations to reduce computing costs, wherein the approximations involve stochastically discarding points from the training data set based on an inverse distance to a separating hyperplane for the SVM model (step 604). Next, during a surveillance mode, the system uses the trained SVM model to detect the one or more conditions-of-interest based on monitored data points received from the monitored system (step 606). When one or more conditions-of-interest are detected, the system performs an action to improve operation of the monitored system (step 608).
-
FIG. 7 presents a flowchart illustrating the process of training the SVM model in accordance with the disclosed embodiments. First, the system uses a block-diagonal approximation to initialize an active set of support vectors for the SVM model (step 702). Next, the system iteratively performs the following operations to improve the SVM model while SVM misclassifications continue to decrease by more than a minimum amount. First, the system randomly selects additional points from the training data set based on an inverse distance to the separating hyperplane for the SVM model (step 704). Next, the system solves a nonlinear kernel for the SVM model based on the active set of support vectors and the additional data points to compute a new active set of support vectors (step 706). If the new active set of support vectors produces fewer misclassifications than the active set of support vectors, the system updates the active support vectors with the new active set of support vectors (step 708). - We propose using a block-diagonal approximation to produce an initial set of support vectors. We also propose a way to generate random samples, which provides a higher probability of inclusion for points that are closer to the separating hyperplane (in the transformed feature space). Indeed, the standard way of solving large scale SVM models today would focus on random sampling of the input data, which produces significantly lower model accuracy than our new technique.
- Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
- The foregoing descriptions of embodiments have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the present description to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the present description. The scope of the present description is defined by the appended claims.
Claims (20)
1. A method for improving operation of a monitored system, comprising:
during a training mode,
using a training data set comprising labeled data points received from the monitored system to train the SVM to detect one or more conditions-of-interest, and
while training the SVM model, making approximations to reduce computing costs, wherein making the approximations comprises stochastically discarding points from the training data set based on an inverse distance to a separating hyperplane for the SVM model; and
during a surveillance mode,
using the trained SVM model to detect the one or more conditions-of-interest based on monitored data points received from the monitored system, and
when one or more conditions-of-interest are detected, performing an action to improve operation of the monitored system.
2. The method of claim 1 , wherein while training the SVM model, the method performs the following operations:
using a block-diagonal approximation to initialize an active set of support vectors for the SVM model; and
iteratively performing the following operations to improve the SVM model while SVM misclassifications continue to decrease by more than a minimum amount,
randomly selecting additional points from the training data set based on an inverse distance to the separating hyperplane for the SVM model,
solving a nonlinear kernel for the SVM model based on the active set of support vectors and the additional data points to compute a new active set of support vectors, and
if the new active set of support vectors produces fewer misclassifications than the active set of support vectors, updating the active support vectors with the new active set of support vectors.
3. The method of claim 2 , wherein while randomly selecting the additional points, the method selects an additional point x from the training data set with a probability P(x)=(μ+v d(x))−β, wherein d(x) represents a distance from x to the separating hyperplane, and μ, v and β represent associated parameters.
4. The method of claim 1 , wherein the SVM model is formulated based on one of the following types of kernels:
a linear kernel;
a polynomial kernel;
a hyperbolic tangent kernel; and
a radial basis function kernel.
5. The method of claim 1 , wherein the monitored system comprises one of the following:
a computer system;
a database system;
a website;
an online customer-support system;
a vehicle;
an aircraft;
a utility system asset; and
a piece of machinery.
6. The method of claim 1 , wherein data points received from the monitored system include one or more of the following:
time-series sensor signals;
computer parameters;
textual data;
numerical data; and
image data.
7. The method of claim 1 , wherein detecting the one or more conditions-of-interest comprises detecting one or more of the following:
an impending failure of the monitored system;
a malicious-intrusion event in the monitored system;
a preventive-maintenance condition for the monitored system;
a fraud condition for the monitored system;
a product-purchasing condition for the monitored system; and
a consumer-attrition condition for the monitored system.
8. The method of claim 1 , wherein performing the action to improve operation of the monitored system comprises one or more of the following:
sending a notification to an administrator of the monitored system;
performing an action to stop a malicious-intrusion event in the monitored system;
scheduling a maintenance operation for the monitored system;
performing an action to stop an instance of fraud associated with the monitored system;
performing an action to make relevant offers to customers associated with the monitored system; and
performing an action to improve satisfaction of a customer associated with the monitored system.
9. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for improving operation of a monitored system, the method comprising:
during a training mode,
using a training data set comprising labeled data points received from the monitored system to train the SVM to detect one or more conditions-of-interest, and
while training the SVM model, making approximations to reduce computing costs, wherein making the approximations comprises stochastically discarding points from the training data set based on an inverse distance to a separating hyperplane for the SVM model; and
during a surveillance mode,
using the trained SVM model to detect the one or more conditions-of-interest based on monitored data points received from the monitored system, and
when one or more conditions-of-interest are detected, performing an action to improve operation of the monitored system.
10. The non-transitory computer-readable storage medium of claim 9 , wherein while training the SVM model, the method performs the following operations:
using a block-diagonal approximation to initialize an active set of support vectors for the SVM model; and
iteratively performing the following operations to improve the SVM model while SVM misclassifications continue to decrease by more than a minimum amount,
randomly selecting additional points from the training data set based on an inverse distance to the separating hyperplane for the SVM model,
solving a nonlinear kernel for the SVM model based on the active set of support vectors and the additional data points to compute a new active set of support vectors, and
if the new active set of support vectors produces fewer misclassifications than the active set of support vectors, updating the active support vectors with the new active set of support vectors.
11. The non-transitory computer-readable storage medium of claim 10 , wherein while randomly selecting the additional points, the method selects an additional point x from the training data set with a probability P(x)=(μ+v d(x))−β, wherein d(x) represents a distance from x to the separating hyperplane, and μ, v and β represent associated parameters.
12. The non-transitory computer-readable storage medium of claim 9 , wherein the SVM model is formulated based on one of the following types of kernels:
a linear kernel;
a polynomial kernel;
a hyperbolic tangent kernel; and
a radial basis function kernel.
13. The non-transitory computer-readable storage medium of claim 9 , wherein the monitored system comprises one of the following:
a computer system;
a database system;
a website;
an online customer-support system;
a vehicle;
an aircraft;
a utility system asset; and
a piece of machinery.
14. The non-transitory computer-readable storage medium of claim 9 , wherein data points received from the monitored system include one or more of the following:
time-series sensor signals;
computer parameters;
textual data;
numerical data; and
image data.
15. The non-transitory computer-readable storage medium of claim 9 , wherein detecting the one or more conditions-of-interest comprises detecting one or more of the following:
an impending failure of the monitored system;
a malicious-intrusion event in the monitored system;
a preventive-maintenance condition for the monitored system;
a fraud condition for the monitored system;
a product-purchasing condition for the monitored system; and
a consumer-attrition condition for the monitored system.
16. The non-transitory computer-readable storage medium of claim 9 , wherein performing the action to improve operation of the monitored system comprises one or more of the following:
sending a notification to an administrator of the monitored system;
performing an action to stop a malicious-intrusion event in the monitored system;
scheduling a maintenance operation for the monitored system;
performing an action to stop an instance of fraud associated with the monitored system;
performing an action to make relevant offers to customers associated with the monitored system; and
performing an action to improve satisfaction of a customer associated with the monitored system.
17. A system that improves operation of a monitored system, comprising:
at least one processor and at least one associated memory; and
an optimization mechanism that executes on the at least one processor,
wherein during a training mode, the optimization mechanism,
uses a training data set comprising labeled data points received from the monitored system to train the SVM to detect one or more conditions-of-interest, and
while training the SVM model, makes approximations to reduce computing costs, wherein making the approximations comprises stochastically discarding points from the training data set based on an inverse distance to a separating hyperplane for the SVM model; and
wherein during a surveillance mode, the optimization mechanism,
uses the trained SVM model to detect the one or more conditions-of-interest based on monitored data points received from the monitored system, and
when one or more conditions-of-interest are detected, performs an action to improve operation of the monitored system.
18. The system of claim 17 , wherein while training the SVM model, the optimization mechanism performs the following operations:
uses a block-diagonal approximation to initialize an active set of support vectors for the SVM model; and
iteratively performs the following operations to improve the SVM model while SVM misclassifications continue to decrease by more than a minimum amount,
randomly selecting additional points from the training data set based on an inverse distance to the separating hyperplane for the SVM model,
solving a nonlinear kernel for the SVM model based on the active set of support vectors and the additional data points to compute a new active set of support vectors, and
if the new active set of support vectors produces fewer misclassifications than the active set of support vectors, updating the active support vectors with the new active set of support vectors.
19. The system of claim 18 , wherein while randomly selecting the additional points, the optimization mechanism selects an additional point x from the training data set with a probability P(x)=(μ+v d(x))−β, wherein d(x) represents a distance from x to the separating hyperplane, and μ, v and β represent associated parameters.
20. The system of claim 17 , wherein the SVM model is formulated based on one of the following types of kernels:
a linear kernel;
a polynomial kernel;
a hyperbolic tangent kernel; and
a radial basis function kernel.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/191,379 US20220284245A1 (en) | 2021-03-03 | 2021-03-03 | Randomized method for improving approximations for nonlinear support vector machines |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220284245A1 true US20220284245A1 (en) | 2022-09-08 |
Family
ID=83117262
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9336494B1 (en) * | 2012-08-20 | 2016-05-10 | Context Relevant, Inc. | Re-training a machine learning model |
US11507785B2 (en) * | 2020-04-30 | 2022-11-22 | Bae Systems Information And Electronic Systems Integration Inc. | Anomaly detection system using multi-layer support vector machines and method thereof |
Non-Patent Citations (1)
Title |
---|
A Fast Parallel Optimization for Training Support Vector Machine, Dong et al. (Year: 2003) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: ORACLE INTERNATIONAL CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GOLOVASHKIN, DMITRY;HORNICK, MARK F.;ARANCIBIA CODDOU, MARCOS R.;AND OTHERS;SIGNING DATES FROM 20210225 TO 20210226;REEL/FRAME:055594/0794 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |