WO2017217881A1 - Acceleration of SVM clustering technology using Chebyshev iteration technique - Google Patents

Acceleration of SVM clustering technology using Chebyshev iteration technique

Info

Publication number
WO2017217881A1
WO2017217881A1 (PCT/RU2016/000359)
Authority
WO
WIPO (PCT)
Prior art keywords
clustering
training
optimization
classifier
data items
Prior art date
Application number
PCT/RU2016/000359
Other languages
French (fr)
Inventor
Mikhail Petrovich LEVIN
Alexander Nikolaevich Filippov
Xuecang ZHANG
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201680085128.XA (CN109416687B)
Priority to PCT/RU2016/000359
Publication of WO2017217881A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Definitions

  • The optimization problem (QOP) for searching the hyperplane used by the clustering classifier 204 for clustering the training samples 202 into two or more clusters may be formulated as shown in equation set 1 below.
  • Where u is the separation hyperplane, eventually the hyperplane used by the clustering classifier 204 to classify the training samples 202,
  • w is a set of separation parameters,
  • x is the set of training samples 202, and
  • b is the threshold.
  • Equation (1a) is subject to the inequality constraints expressed in equation (1b) below, where $y_i \in \{-1, +1\}$ and n is the number of dimensions of the QOP, i.e. the number of characteristics of the training samples 202 used for clustering the training samples 202.
  • The QOP of equation (1a) subject to the constraint of equation (1b) may be reduced to a minimization problem of the squared norm $\frac{1}{2}\lVert w \rVert^2$ of the separation parameters.
  • The QOP of equation set 1 under the constraint of equation (1b) is equivalent to a dual problem of evaluating the saddle point of the Lagrange function, as expressed in equation 2 below.
  • The QOP of equation 2 may be rewritten with respect to the Lagrange multipliers as a QPP, as expressed in equation set 3 below.
  • C is a predefined arbitrary value, for example, a constant.
  • Training the clustering classifier 204 consists of evaluating α by calculating a solution for the QPP of equation set 3 and then evaluating w and b.
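For reference, the standard soft-margin SVM formulation matching the definitions above is sketched here; this is a reconstruction from the surrounding text, and the patent's exact rendering of equation sets 1-3 may differ in notation:

$$u = w \cdot x - b \qquad (1a)$$

$$y_i \, (w \cdot x_i - b) \ge 1, \qquad y_i \in \{-1, +1\}, \quad i = 1, \dots, n \qquad (1b)$$

Minimizing $\frac{1}{2}\lVert w \rVert^2$ subject to (1b) is equivalent to finding the saddle point of the Lagrange function

$$L(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w \cdot x_i - b) - 1 \right], \qquad \alpha_i \ge 0 \qquad (2)$$

which, rewritten with respect to the Lagrange multipliers, gives the dual QPP

$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j \, (x_i \cdot x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \ \ \sum_{i=1}^{n} \alpha_i y_i = 0 \qquad (3)$$

with $w = \sum_i \alpha_i y_i x_i$ and $b$ recovered from any unbounded support vector.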
  • The SMO is executed in two aspects: the first is to evaluate all training samples 202 for each pair of Lagrange multipliers, and the second is to evaluate all pairs of the Lagrange multipliers, as expressed in equation set 4 below.
  • Each of the QPP sub-problems, each with respect to one pair $\{\alpha_h, \alpha_m\}$ of Lagrange multipliers, may be solved analytically.
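As an illustration of the analytic two-variable solution, the sketch below follows Platt's classical clipped SMO step for a linear kernel. It is not the patent's exact procedure (the layered variant additionally applies the per-layer optimization factor), and the function and variable names are illustrative:

```python
import numpy as np

def smo_pair_update(h, m, alpha, X, y, b, C):
    """Analytically solve the two-dimensional QPP sub-problem for the
    Lagrange pair (alpha[h], alpha[m]); classical clipped SMO step."""
    # Current classifier outputs u(x_h), u(x_m) and their errors.
    u = (alpha * y) @ (X @ np.stack([X[h], X[m]]).T) - b
    E_h, E_m = u[0] - y[h], u[1] - y[m]
    # Curvature of the two-variable objective (linear kernel).
    eta = X[h] @ X[h] + X[m] @ X[m] - 2.0 * (X[h] @ X[m])
    if eta <= 0.0:
        return alpha[h], alpha[m]          # degenerate pair: leave unchanged
    a_m = alpha[m] + y[m] * (E_h - E_m) / eta
    # The joint constraint y_h*a_h + y_m*a_m = const confines a_m to a box.
    if y[h] == y[m]:
        lo, hi = max(0.0, alpha[h] + alpha[m] - C), min(C, alpha[h] + alpha[m])
    else:
        lo, hi = max(0.0, alpha[m] - alpha[h]), min(C, C + alpha[m] - alpha[h])
    a_m = min(max(a_m, lo), hi)
    a_h = alpha[h] + y[h] * y[m] * (alpha[m] - a_m)
    return a_h, a_m
```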
  • The optimization iterations are repeated until one or more optimality conditions, for example, the KKT conditions, are satisfied (met), as expressed in equation set 5 below.
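The KKT optimality conditions for the soft-margin dual take the standard form below, given as a standard-form sketch of what equation set 5 expresses:

$$\alpha_i = 0 \;\Rightarrow\; y_i \, u(x_i) \ge 1, \qquad 0 < \alpha_i < C \;\Rightarrow\; y_i \, u(x_i) = 1, \qquad \alpha_i = C \;\Rightarrow\; y_i \, u(x_i) \le 1$$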
  • The iterative SMO process for training the clustering classifier 204 is done by applying the SMO process to solve the QPP as expressed in equation set 4, searching for the optimal separation plane parameter(s) of a separation plane separating between two or more clusters of the training samples 202, where the optimal separation plane parameter(s) are calculated with respect to the optimality conditions expressed in equation set 5.
  • The SMO may be expressed as shown in equation set 6 below.
  • The iterative SMO processes of equation set 6 may be further formulated as shown in equation 7 below.
  • Where A is a vector operator used by the SMO process,
  • and τ is an optimization factor used during the SMO sub-problems optimization.
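One plausible operator form for equation 7, by analogy with the classical Chebyshev iterative scheme for solving an operator equation; the identification of A with the dual's Gram operator and f with its linear term is an assumption, not the patent's stated mapping:

$$\alpha^{(k+1)} = \alpha^{(k)} - \tau_k \left( A \, \alpha^{(k)} - f \right)$$

where A would correspond to the matrix $A_{ij} = y_i y_j (x_i \cdot x_j)$ of the dual QPP and f to its all-ones linear term.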
  • The searching vector directly depends on the type of the training samples 202 and on one or more characteristics of the training samples 202.
  • For visual objects, the characteristic(s) may be, for example, object(s) detected in the images, image resolution, image size and/or the like.
  • For experiment results data items, the characteristic(s) may be, for example, type of experiment(s), results range, time of experiment, location of experiment and/or the like.
  • The process 100 starts with the management module 220 designating the plurality of training data items (samples) 202 using, for example, the I/O interface 210.
  • The management module 220 selects a subset of the training samples 202 for profiling, to allow appropriate setting of the hybrid scheme for training the clustering classifier 204 according to the training samples 202.
  • The training samples subset is defined to be large enough to allow accurate profiling, while small enough not to require excessive computation resources, for example, computation time and/or computation load.
  • The profiling may be done once to identify one or more parameters of the optimization process 100 and may be applicable for training the clustering classifier 204 for multiple types of data sets such as the training samples 202.
  • The management module 220 profiles the training samples subset to determine a layers number (M) that indicates the number of layers employed during the hybrid scheme as described hereinafter. In addition, during the profiling process, the management module 220 calculates a minimal eigenvalue ($\lambda_{min}$) and/or a maximal eigenvalue ($\lambda_{max}$) used by the vector operator A during the optimization process. The profiling made by the management module 220 is directed at defining the parameters of the vector operator A as well as the number of layers required for the optimization process 100 and may therefore be independent of the type of the data set(s), i.e. the training samples 202 clustered by the clustering classifier 204.
  • The management module 220 profiles the training samples subset using one or more iterative methods, for example, alternative-variable descent, for calculating the layers number M and/or the vector operator eigenvalues $\lambda_{min}$ and/or $\lambda_{max}$, as expressed in function 1 below.
  • The management module 220 may update the eigenvalues with a step value of −0.001.
  • The alternative-variable descent process decreases the step value near a local minimum point in descent order and evaluates the number of iterations that will be required during the process 100 with the calculated eigenvalues.
  • The management module 220 applies the same alternative-variable descent process for the other candidates of M, for example, 6 and 8.
  • The profiling produces values for the layers number M and the eigenvalues $\lambda_{min}$ and $\lambda_{max}$.
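A minimal sketch of the profiling step, assuming a linear kernel and using a dense eigen-decomposition as a simple stand-in for the patent's alternative-variable descent; the subset size and the rule mapping the spectrum to the layers number M are illustrative assumptions:

```python
import numpy as np

def profile_subset(X, y, subset_size=200, seed=0):
    """Estimate lambda_min / lambda_max of the SMO vector operator from a
    randomly selected subset of training samples, and pick a layers
    number M from the resulting condition number."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(subset_size, len(X)), replace=False)
    Xs, ys = X[idx], y[idx]
    # Operator built from the dual QPP: A_ij = y_i y_j <x_i, x_j>.
    A = (ys[:, None] * ys[None, :]) * (Xs @ Xs.T)
    spectrum = np.linalg.eigvalsh(A)           # symmetric -> real, ascending
    lam_min, lam_max = float(spectrum[0]), float(spectrum[-1])
    # Chebyshev-style heuristic: more layers for worse conditioning.
    M = int(np.ceil(np.sqrt(max(lam_max / max(lam_min, 1e-12), 1.0))))
    return lam_min, lam_max, M
```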
  • The management module 220 sets an initial value for a set of Lagrange multipliers used to solve the Lagrange function of equation set 3, transformed to equation set 7.
  • The management module 220 may arbitrarily select initial values for the Lagrange multipliers, for example, 1.
  • Training the clustering classifier 204 is done by applying the iterative optimization process as expressed in equation set 7, searching for the optimal separation plane parameter(s) of a separation plane separating between two or more clusters of the training samples 202.
  • The hybrid scheme employs M layers 110, each comprising a plurality of optimization iterations for analytically calculating an optimal solution for each of the Lagrange pairs $\{\alpha_h, \alpha_m\}$ over the training samples 202 $x_i$ of $x$.
  • Each layer 110 may be considered as comprising two iterative loops: the first calculates an optimal solution for a specific pair of the Lagrange multipliers over the entire set of training samples 202, and the second evaluates all pairs in the selected set of Lagrange multipliers.
  • The management module 220 sets the optimization factor τ. Initially (for the first layer 110), the management module 220 may select the optimization factor to have a default value, for example, 1. During each successive layer 110, the optimization factor may be updated, i.e. improved, to force a better optimization during the following layer 110. The optimization factor may be updated for each layer 110 according to a formula as presented in equation set 8 below.
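Equation set 8 itself is not reproduced here; for orientation, the classical Chebyshev parameter set that a multi-layer Chebyshev technique of depth M conventionally uses reads as follows, presented as a plausible form rather than the patent's verbatim formula:

$$\tau_k = \frac{\tau_0}{1 + \rho_0 t_k}, \qquad t_k = \cos\frac{(2k-1)\pi}{2M}, \qquad k = 1, \dots, M$$

$$\tau_0 = \frac{2}{\lambda_{min} + \lambda_{max}}, \qquad \rho_0 = \frac{1 - \xi}{1 + \xi}, \qquad \xi = \frac{\lambda_{min}}{\lambda_{max}}$$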
  • The optimizer 222 analytically calculates optimal solutions for the optimization problem of equation set 7.
  • The optimizer 222 uses the updated optimization factor provided by the management module 220 for executing the optimization process.
  • The optimizer 222 executes the plurality of optimization iterations for evaluating each pair of the pairs of Lagrange multipliers over all training samples 202.
  • The Lagrange multipliers may be updated during each iteration to identify the set of the Lagrange multipliers that produces the maximal minimal value of the Lagrange function.
  • The optimizer 222 may update the value of one or more of the Lagrange multipliers in order to reduce the maximal minimal value of the Lagrange function calculated during the current iteration compared to that calculated during the previous iteration. In case the minimal value of the Lagrange function does not decrease on the current iteration, the solution (the set of Lagrange multipliers) from the previous step is chosen as the final solution.
  • The optimization processes may be executed concurrently (in parallel) by multiple optimizer 222 processes (instances), each executed by a respective one of the processing cores 214.
  • Each of the optimizer 222 processes may be assigned a respective pair of the pairs of Lagrange multipliers and evaluates the optimal solution for the optimization problem through one or more iterations over the training samples 202, as sketched below.
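A minimal sketch of the concurrent dispatch, reusing the smo_pair_update sketch from earlier; the task layout and pool size are illustrative. Each worker's sub-problem is independent, which is what turns the SMO into a PMO in practice:

```python
from concurrent.futures import ProcessPoolExecutor

def optimize_pair(task):
    """Worker: solve one two-dimensional sub-problem for its assigned pair."""
    h, m, alpha, X, y, b, C = task
    a_h, a_m = smo_pair_update(h, m, alpha, X, y, b, C)
    return h, m, a_h, a_m

def optimize_all_pairs(pairs, alpha, X, y, b, C, workers=8):
    """Dispatch each Lagrange pair to its own processing core."""
    tasks = [(h, m, alpha, X, y, b, C) for h, m in pairs]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for h, m, a_h, a_m in pool.map(optimize_pair, tasks):
            alpha[h], alpha[m] = a_h, a_m   # merge the independent solutions
    return alpha
```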
  • The management module 220 evaluates the calculated optimal solutions provided by the optimizer 222 for all pairs of Lagrange multipliers over all training samples 202.
  • The management module 220 determines whether the calculated optimal solutions for the separation plane parameter(s) satisfy one or more optimality conditions, for example, a KKT condition. In case the optimal solutions satisfy, i.e. meet and/or fulfill, the optimality condition(s), the process 100 branches to 120. In case the optimal solutions do not satisfy the optimality condition(s), the process 100 branches to 112 to execute an additional layer of the optimization process with the updated (improved) optimization factor. As shown at 120, the management module 220 outputs the clustering classifier 204 through the I/O interface 210. The clustering classifier 204 may then be used for classifying new samples, i.e. data items of the same type as the training samples 202, to one or more clusters.
  • The convergence accuracy is the maximal difference between the absolute values of the separation plane parameter(s) on the current optimization iteration and those on the previous optimization iteration.
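Expressed as code, the convergence accuracy described above is simply the following (a direct transcription, with hypothetical parameter names):

```python
def convergence_accuracy(params_curr, params_prev):
    """Maximal difference between the absolute values of the separation
    plane parameters on two successive optimization iterations."""
    return max(abs(abs(c) - abs(p))
               for c, p in zip(params_curr, params_prev))
```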
  • In an exemplary evaluation, a clustering classifier such as the clustering classifier 204 is trained using a standard SMO sequence and compared to the hybrid scheme optimization process such as the process 100 implemented with 6 layers. Results are presented in table 2 below.
  • The improvement factor is calculated as the number of iterations performed while using the standard SMO optimization process divided by the number of iterations performed using the hybrid scheme optimization process 100.
  • The hybrid scheme may present a significant reduction in the number of optimization iterations, converging up to over 13 times faster (example 2).
  • The hybrid scheme may thus achieve fast convergence and/or reduce the computation resources for training the clustering classifier 204.
  • A trained clustering classifier such as the clustering classifier 204, for example, an SVM based clustering classifier, may be provided.
  • The trained clustering classifier 204, trained through a process such as the process 100, may be used for clustering data items of the same type as the training samples used to train the clustering classifier.
  • FIG. 3 is a flowchart of an exemplary process for clustering data items using a clustering classifier trained through a hybrid scheme, according to some embodiments of the present invention.
  • A process 300 is used to classify a plurality of data items into two or more separate clusters based on one or more characteristics of the data items.
  • The clustering classifier 204 used for the process 300 is trained through a hybrid scheme training process such as the process 100.
  • The clustering classifier 204 may be implemented through one or more software modules executed by a processor such as the processor(s) 212 from a storage such as the storage 216 in a system such as the system 200.
  • The process 300 starts with designating a plurality of data items, for example, visual objects, audio objects, text objects, big-data data items, research collected data items, experiments results data items and/or the like.
  • The designated data items are new data items not previously "seen" by the clustering classifier 204.
  • The designated data items are of the same type as the training samples 202 used for training the clustering classifier 204 during the process 100.
  • The clustering classifier 204 is applied to the plurality of data items to cluster the data items into two or more separate clusters.
  • The clustering classifier 204 analyzes one or more characteristics of the data items and applies one or more plane separation parameters for clustering the data items into the clusters.
  • The plane separation parameter(s) are learned during the hybrid scheme training process 100 in which the clustering classifier 204 is trained.
  • The data items are arranged in the clusters identified by the clustering classifier 204. As shown at 308, the data items are output arranged in the clusters.
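A minimal sketch of this clustering step, recovering the separation plane from the trained multipliers in the standard SVM way; the function names are illustrative:

```python
import numpy as np

def separation_parameters(alpha, X, y, sv_tol=1e-8):
    """Recover w and b from trained multipliers: w = sum_i alpha_i y_i x_i,
    b from any support vector with alpha_i > 0 (standard SVM recovery)."""
    w = (alpha * y) @ X
    sv = int(np.argmax(alpha > sv_tol))     # index of one support vector
    b = float(w @ X[sv]) - float(y[sv])     # from y_sv * (w.x_sv - b) = 1
    return w, b

def cluster(items, w, b):
    """Assign each new data item to a cluster by its side of the plane."""
    return np.where(items @ w - b >= 0.0, 1, -1)
```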

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A system for training a classifier for clustering a plurality of data items, comprising a processor adapted to profile a subset of a plurality of training samples to determine a layers number of training layers and values of a vector operator used by an iterative optimization process for evaluating separation plane parameter(s) of a clustering classifier, train the clustering classifier using a multi-layer scheme comprising a layers number of training layers each comprising a plurality of optimization iterations and output the clustering classifier for clustering new samples. Each optimization iteration comprises calculating optimal solutions for the separation plane parameter(s), each of the optimal solutions is calculated by applying the iterative optimization process to the plurality of training samples using a respective pair of Lagrange multipliers, evaluating the plurality of optimal solutions compared to optimality condition(s) and initiating a next optimization iteration in case the optimality condition(s) is not satisfied.

Description

APPLICATION FOR PATENT
Inventor(s): Mikhail Petrovich LEVIN, Alexander Nikolaevich FILIPPOV, Xuecang ZHANG
Title: ACCELERATION OF SVM CLUSTERING TECHNOLOGY USING CHEBYSHEV ITERATION TECHNIQUE
BACKGROUND

The present invention relates to training a classifier for clustering a plurality of data items and, more specifically, but not exclusively, to training a classifier for clustering a plurality of data items using a hybrid scheme employing a multi-layer iteration technique combined with an iterative optimization process.
Classification of data items in general and clustering in particular has gained increasing importance with the advancements in machine learning applications. The data items may relate to many types depending on the application of the learning machine, for example, visual objects, audio objects, big data items, research collected data items, experiments results data items and the like. The clustering of the data items aims to divide the data items into clusters (groups) according to one or more characteristics of the data items such that data items that substantially share one or more characteristics are clustered together.
One of the main practices for implementing clustering classifiers using learning machine algorithms is support vector machines (SVM), which may use different kernels and/or different measure types to provide a best fit for the clustered data. The SVM training may be formulated as optimizing a quadratic programming problem (QPP) with constraints. This formulation may significantly speed up training of the SVM classifiers, thus reducing processing resources and/or training session time. Recent progress in SVM research introduced the possibility to replace the multi-dimension QPP optimization with an iterative two-dimensional QPP optimization that may reduce the complexity of the SVM training process.

SUMMARY
According to an aspect of some embodiments of the present invention there is provided a system for training a classifier for clustering a plurality of data items, comprising a processor adapted to:
- Profile a subset of training samples selected from a plurality of training samples to determine a layers number of training layers and values of a vector operator used by an iterative optimization process for evaluating one or more separation plane parameters of a clustering classifier;
- Train the clustering classifier using a multi-layer technique comprising a layers number of training layers each comprising a plurality of optimization iterations, each optimization iteration comprising:
Calculating a plurality of optimal solutions for the one or more separation plane parameters. Each of the optimal solutions is calculated by applying the iterative optimization process to the plurality of training samples using a respective pair of a plurality of pairs of Lagrange multipliers.
Evaluating the plurality of optimal solutions compared to one or more optimality conditions.
Initiating a next optimization iteration in case the one or more optimality conditions is not satisfied.
Output the clustering classifier for clustering new samples.
The calculation is done concurrently by a plurality of processing pipelines of the processor each executing independently the iterative optimization process to calculate a respective one of the plurality of optimal solutions using the respective pair of Lagrange multipliers.
The clustering classifier is a support vector machine (SVM) clustering classifier.
The one or more separation plane parameters define one or more separation planes separating between two or more clusters each comprising a respective portion of the training samples.
The iterative optimization process evaluates a quadratic programming problem (QPP) which is an equivalent formulation of a quadratic optimization problem (QOP) used for evaluating the one or more separation plane parameters. The QPP equivalent formulation is expressed through the plurality of Lagrange multipliers. The multi-layer technique employs a Chebyshev multi-layer technique.
Optionally, the iterative optimization process employs a sequential minimal optimization (SMO) process.
The values of the vector operator include a minimum eigenvalue and/or a maximum eigenvalue.
The subset of training samples is selected randomly from the plurality of training samples.
The profile is an iterative process in which an alternative-variable descent minimization process is applied to the subset during each of a plurality of profiling iterations until the minimum eigenvalue and/or a maximum eigenvalue are identified.
The one or more optimality conditions is a Karush-Kuhn-Tucker (KKT) optimality condition.
For each of the plurality of training layers an improved optimization factor is applied to the iterative optimization process.
According to an aspect of some embodiments of the present invention there is provided a computer implemented method of creating a classifier for clustering a plurality of data items, comprising:
Profiling a subset of training samples selected from a plurality of training samples to determine a layers number of training layers and values of a vector operator used for an iterative optimization process for evaluating one or more separation plane parameters of a clustering classifier;
Training the clustering classifier using a multi-layer technique comprising a layers number of training layers each comprising a plurality of optimization iterations, each optimization iteration comprising:
Calculating a plurality of optimal solutions for the one or more separation plane parameters. Each of the optimal solutions is calculated by applying the iterative optimization process to the plurality of training samples using a respective pair of a plurality of pairs of Lagrange multipliers.
■ Evaluating the plurality of optimal solutions compared to one or more optimality conditions.
■ Initiating a next optimization iteration in case the one or more optimality conditions is not satisfied.
Outputting the clustering classifier for classifying new samples.

According to an aspect of some embodiments of the present invention there is provided a computer implemented method of clustering a plurality of data items using a trained clustering classifier, comprising:
Designating a plurality of data items;
Applying a clustering classifier for clustering the data items to two or more clusters by analyzing one or more characteristics of the data items with respect to one or more separation plane parameters learned during a training process; and
Outputting the plurality of data items arranged in the two or more clusters;
Wherein the training process employs a hybrid scheme combining a multi-layer technique with an iterative optimization process.
The multi-layer technique is a Chebyshev multi-layer technique.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.
In the drawings:
FIG. 1 is a flowchart of an exemplary process for training a clustering classifier through a hybrid multi-layer optimization process, according to some embodiments of the present invention;
FIG. 2 is a schematic illustration of an exemplary system for training a clustering classifier through a hybrid multi-layer optimization process, according to some embodiments of the present invention; and FIG. 3 is a flowchart of an exemplary process for clustering data items using a clustering classifier trained through a hybrid scheme, according to some embodiments of the present invention.
DETAILED DESCRIPTION
The present invention, in some embodiments thereof, relates to training a classifier for clustering a plurality of data items and, more specifically, but not exclusively, to training a classifier for clustering a plurality of data items using a hybrid scheme employing a multi-layer iteration technique combined with an iterative optimization process.
The present invention presents systems and methods for creating and/or training a clustering classifier such as, for example, an SVM clustering classifier using a hybrid scheme. The hybrid scheme combines a multi-layer iteration technique in which an iterative optimization process is executed through a plurality of iterations in each of several layers (stages) to calculate an optimal solution for one or more separation plane parameters of a separation hyperplane for separating between two or more clusters of training samples. The separation plane parameter(s) define a distance between two or more of the training samples based on comparison of one or more characteristics of the training samples. The training samples may include, for example, visual objects, audio objects, text objects, big-data data items, research collected data items, experiments results data items and the like. The multi-layer iteration scheme, for example, a multi-layer Chebyshev iteration technique, is employed such that the iterative optimization process, for example, a sequential minimal optimization (SMO) process, is executed in several stages, i.e. layers. For each layer, an optimization factor dictating the required accuracy of the optimization is updated (improved) for the SMO process until one or more optimality conditions, for example, Karush-Kuhn-Tucker (KKT) condition(s), are satisfied (met). This means that in case the optimality condition(s) are not satisfied, the iterative optimization process is repeated in an additional layer with the improved optimization factor. The iterative SMO process is employed to solve a multi-dimensional quadratic optimization problem (QOP) formulated as a QPP in order to calculate the optimal one or more separation plane parameters of the hyperplane. While the QOP is a multi-dimensional optimization problem with respect to multiple characteristics of the training samples, the QPP formulation allows solving the multi-dimensional problem as multiple two-dimensional optimization problems using Lagrange multipliers over the training samples. The calculated optimal results for the two-dimensional problems may then be combined. The iterative SMO is used for calculating the two-dimensional optimization problems to calculate the optimal solution for the hyperplane separation parameter(s) using pairs of Lagrange multipliers that are updated during the iterative optimization process.
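A compact sketch of this layered control flow, reusing the smo_pair_update sketch given earlier; the use of a Chebyshev-node schedule as the per-layer accuracy factor and the helper kkt_satisfied are assumptions for illustration, not the patent's verbatim scheme:

```python
import numpy as np
from itertools import combinations

def kkt_satisfied(alpha, b, X, y, C, tol=1e-3):
    """Check the standard soft-margin KKT conditions within a tolerance."""
    margin = y * ((alpha * y) @ (X @ X.T) - b)
    low = alpha <= tol
    mid = (alpha > tol) & (alpha < C - tol)
    high = alpha >= C - tol
    return (np.all(margin[low] >= 1.0 - tol)
            and np.all(np.abs(margin[mid] - 1.0) <= tol)
            and np.all(margin[high] <= 1.0 + tol))

def train_layers(X, y, C, M, lam_min, lam_max):
    """Hybrid driver: each layer k sweeps all pairs to accuracy eps_k and
    stops as soon as the KKT conditions hold."""
    alpha, b = np.ones(len(X)), 0.0       # arbitrary initial multipliers
    xi = lam_min / lam_max
    tau0, rho0 = 2.0 / (lam_min + lam_max), (1.0 - xi) / (1.0 + xi)
    for k in range(1, M + 1):
        # Per-layer factor on the Chebyshev nodes, tightening with k
        # (an assumed use of the patent's equation set 8).
        eps_k = tau0 * (1.0 + rho0 * np.cos((2 * k - 1) * np.pi / (2 * M)))
        delta = np.inf
        while delta > eps_k:              # repeat sweeps until layer accuracy met
            delta = 0.0
            for h, m in combinations(range(len(X)), 2):
                a_h, a_m = smo_pair_update(h, m, alpha, X, y, b, C)
                delta = max(delta, abs(a_h - alpha[h]), abs(a_m - alpha[m]))
                alpha[h], alpha[m] = a_h, a_m
        if kkt_satisfied(alpha, b, X, y, C):
            break                         # optimality met: no further layers
    return alpha, b
```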
Before applying the hybrid technique to the training samples, at least a subset of the training samples is profiled to estimate the number of layers required during the optimization process and/or to define one or more values for the vector operator used by the SMO. The profiling is done since the training process directly depends on the type of the data items the clustering classifier needs to cluster. Therefore, the optimization process layers number as well as the vector operator's values, for example, a minimal and/or a maximal eigenvalue, must first be extracted from the training samples by profiling them. The subset of training samples used for the profiling may be randomly selected from the training samples. The profiling may be done using one or more methods, for example, an alternative-variable descent minimization, for determining a type and/or characteristic(s) of the training samples.
Optionally, the calculation of the plurality of the two-dimensional optimization problems is done concurrently (in parallel) by two or more processing cores, for example, a processor, a core, a processing node, a vector processor, a thread and/or the like. Each of the processing cores may execute the iterative optimization process using a different pair of the Lagrange multipliers. The optimization may be simultaneously executed by the plurality of processing cores since each of the optimization processes is independent from the other optimization processes.
After the clustering classifier is created and/or trained, it may be provided for clustering a plurality of new (unseen) data items of the same type as the training samples possessing the same characteristic(s) as the training samples used for creating and/or training the clustering classifier.
The hybrid scheme for training the clustering classifier may present significant advantages compared to currently existing training methods. The hybrid scheme may significantly reduce the number of iterations executed during the optimization process and thus achieve a faster convergence. The fast convergence to the optimization goal, i.e. the separation plane parameter(s) satisfying the optimality condition(s), may be achieved through the gradual refinement (improvement) of the optimization factor such that the optimization factor is improved for every additional layer only when the previous layer does not meet the optimization goal. Reducing the number of iterations may also reduce the computation resources for identifying the optimal solution, for example, computation time and/or computation load. This may be particularly beneficial for big data clustering classifiers.
In addition, the possibility to reduce the QPP to a plurality of independent two-dimensional optimization problems may allow concurrent (parallel and/or simultaneous) execution of multiple optimization processes using multiple processing cores, for example, processor(s), core(s), thread(s) and/or the like. The processing cores may further include vector processor(s), graphic processing unit(s) (GPU), single instruction multiple data (SIMD) engine(s) and/or the like. That way the SMO is actually performed in parallel by the plurality of processing cores such that in practice a parallel minimal optimization (PMO) is executed, which may significantly improve the convergence time and/or computation load involved with the optimization process for training the clustering classifier.
Moreover, improvements to clustering classifiers, for example, SVM clustering classifiers, may be easily integrated into the hybrid scheme since the hybrid scheme may employ commonly used optimization algorithm(s) during the layered process.

The present invention, in some embodiments thereof, relates to using a classifier trained using the hybrid scheme training process for clustering a plurality of new data items not previously "seen" by the classifier. The clustering classifier trained through the hybrid scheme as presented above may be used for clustering a plurality of new data items of the same type as the training samples to two or more clusters.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
Reference is now made to FIG. 1, which is a flowchart of an exemplary process for training a clustering classifier through a hybrid multi-layer optimization process, according to some embodiments of the present invention. A process 100 trains a clustering classifier such as, for example, an SVM clustering classifier by applying a hybrid optimization process that combines a multi-layer iteration with an iterative optimization process to calculate an optimal solution for one or more separation plane parameters of a hyperplane separating two or more clusters of data items. The separation plane parameter(s) define an optimized distance between two or more of the data items based on comparison of one or more characteristics of the data items. The iterative optimization process is executed in a staged sequence, i.e. layers of the multi-layer iteration, for example, a multi-layer Chebyshev iteration technique. During each layer, an optimization factor dictating the required accuracy of the optimization is updated (improved) for the iterative optimization process until one or more optimality conditions are satisfied (met). This means that in case the optimality condition(s), for example, one or more KKT condition(s), are not satisfied, the iterative optimization process is repeated with the improved optimization factor. The iterative optimization process may employ an SMO process for solving a multi-dimensional QOP formulated as a QPP to calculate optimal values for the separation plane parameter(s) of the hyperplane separating between cluster(s) of the training samples. The QPP is formulated as multiple two-dimensional optimization problems, each addressing one of the characteristics of the training samples. For each of the two-dimensional optimization problems, the iterative SMO is used to calculate the optimal solution for the hyperplane separation parameter(s) using a pair of Lagrange multipliers updated during the optimization process.
Reference is also made to FIG. 2, which is a schematic illustration of an exemplary system for training a clustering classifier through a hybrid multi-layer optimization process, according to some embodiments of the present invention. A system 200 includes an input/output (I/O) interface 210 for designating training samples 202 and/or outputting a clustering classifier 204, a processor(s) 212 for training the clustering classifier 204 and a storage 216. The training samples 202 may include, for example, visual objects, audio objects, text objects, big-data data items, research collected data items, experiments results data items and/or the like. The clustering classifier 204 may be, for example, an SVM clustering classifier. The I/O interface 210 may include one or more interfaces, for example, a network interface, a memory interface and/or a storage interface for connecting to the respective resource(s), i.e. network resources, memory resources and/or storage resources. The I/O interface 210 may be used for designating, receiving and/or fetching the training samples 202 from, for example, the memory, the network, storage such as the storage 216 and/or the like. Similarly, the I/O interface 210 may be used for storing and/or transmitting the clustering classifier 204 to, for example, the memory, the network, the storage 216 and/or the like. The processor(s) 212, homogeneous or heterogeneous, may be arranged for parallel processing, as clusters and/or as one or more multi-core processor(s) each having one or more processing cores 214. Each processing core 214 may be, for example, a processor, a processing core, a thread, a processing node and/or the like. The processor(s) 212 may comprise one or more distributed processing clusters each comprising one or more processing nodes with processors having one or more processing cores such as the processing core 214. The distributed processing clusters may communicate with each other through one or more interfaces, for example, a network, a fabric, a direct interconnection, a link and/or the like. The processor(s) 212 may further include one or more vector processors each having multiple processing pipelines that may be considered as the processing cores 214 capable of independently executing program instructions. The processor(s) 212 may further include one or more SIMD engines and/or GPUs capable of concurrently executing a similar instruction and/or process on a plurality of different data sets. For implementations where the same process is applied to different data sets, as may be the case for the process 100, the SIMD engine may be considered as a processor having multiple processing cores such as the processing core 214. The storage 216 may include one or more non-transitory persistent storage devices, for example, a hard drive, a Flash array and the like. The storage 216 may further comprise one or more network storage devices, for example, a storage server, a network accessible storage (NAS), a network drive and/or the like.
The optimization process 100 may be executed by one or more software modules, for example, a management module 220 and/or an optimizer 222, each comprising a plurality of program instructions executed by the processor(s) 212 and/or the processing cores 214 from the storage 216. A software module may be, for example, a process, an application, a utility and/or the like comprising a plurality of program instructions stored in a non-transitory medium such as the storage 216 and executed by a processor such as the processor(s) 212 and/or the processing cores 214. The processor(s) 212 may execute the management module 220 to control the entire training process 100 and the optimizer 222 for executing the multiple SMO processes. Optionally, multiple optimizer 222 processes are executed by the plurality of processing cores 214, where each of the processing cores 214 executes an instance of the optimizer 222 such that the plurality of optimizer 222 processes are executed concurrently by the plurality of processing cores 214. The concurrent execution of the optimization processes in practice turns the SMO into a parallel minimal optimization (PMO).
Before further describing the present invention, some background is provided. The quadratic optimization problem (QOP) for searching the hyperplane used by the clustering classifier 204 for clustering the training samples 202 into two or more clusters may be formulated as shown in equation set 1 below.
Equation Set 1:

$$u = w \cdot x - b \tag{1a}$$

$$y_i \left( w \cdot x_i - b \right) \geq 0, \quad 1 \leq i \leq n \tag{1b}$$
Where u is the separation hyperplane that is eventually used by the clustering classifier 204 to classify the training samples 202, w is a set of separation parameters, x is the set of training samples 202 $x_i$, and b is the threshold.

The QOP as expressed in equation (1a) is subject to the inequality constraint expressed in equation (1b), where $y_i \in \{-1, +1\}$ and n is the number of dimensions of the QOP, i.e. the number of characteristics of the training samples 202 used for clustering them.

The QOP of equation (1a) subject to the constraint of equation (1b) may be reduced to a minimization problem of $\|w\|$ subject to the constraint of equation (1b).
According to the Kuhn-Tucker theorem, the QOP of equation set 1 under the constraint of equation (1b) is equivalent to a dual problem of evaluation of the saddle point of the Lagrange function as expressed in equation 2 below.
Equation 2:

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^{2} - \sum_{i=1}^{n} \alpha_i y_i \left( w \cdot x_i - b \right) \;\rightarrow\; \min_{w,b}\,\max_{\alpha}$$

Where $\alpha = \{\alpha_i\}_{i=1}^{n}$ are Lagrange multipliers and L is a Lagrange function.
The QOP of equation 2 may be rewritten with respect to the Lagrange multipliers as a QPP as expressed in equation set 3 below.
Equation Set 3:

$$-L(\alpha) = -\sum_{i=1}^{n} \alpha_i + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \left( x_i \cdot x_j \right) \;\rightarrow\; \min_{\alpha}$$

$$\text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad 0 \leq \alpha_i \leq C, \quad 1 \leq i \leq n$$

Where C is a predefined arbitrary value, for example, a constant value.
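For illustration, the QPP objective of equation set 3 may be evaluated numerically as in the following sketch; the toy data set, the linear kernel and the multiplier values are invented for the example.

```python
import numpy as np

# Toy two-cluster data set (invented for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def dual_objective(alpha, X, y):
    K = X @ X.T                                   # linear kernel x_i . x_j
    return -alpha.sum() + 0.5 * (alpha * y) @ K @ (alpha * y)

alpha = np.full(len(X), 0.1)                      # feasible: sum(alpha * y) = 0
print(dual_objective(alpha, X, y))                # objective to be minimized
```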
Training the clustering classifier 204 consists of evaluating the set of Lagrange multipliers $\alpha = \{\alpha_i\}_{i=1}^{n}$ by calculating a solution for the QPP of equation set 3 and evaluation of w and b. In other words, the SMO is executed in two aspects: one is to evaluate all training samples 202 for each pair of Lagrange multipliers, and the second is to evaluate all pairs of the Lagrange multipliers. This is expressed in equation set 4 below.
Equation Set 4:

$$w = \sum_{i=1}^{n} \alpha_i y_i x_i \tag{4a}$$

$$b = w \cdot x_i - y_i, \quad \alpha_i > 0 \tag{4b}$$
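A short sketch of equation set 4 on the same style of toy data follows; averaging b over all support vectors, rather than taking a single one, is an assumed choice made for numerical robustness.

```python
import numpy as np

# Same style of toy data as above; the multiplier values are invented.
X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([0.25, 0.0, 0.25, 0.0])

w = (alpha * y) @ X                      # equation (4a): w = sum_i alpha_i y_i x_i
sv = alpha > 0                           # support vectors have alpha_i > 0
b = float((X[sv] @ w - y[sv]).mean())    # equation (4b), averaged over the SVs
print(w, b)
```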
As said before, the SMO process for calculating a solution to the QPP may be regarded as an iterative process consisting of a plurality of optimization iterations for optimizing each pair $\{\alpha_l, \alpha_m\}$ of the plurality of Lagrange multipliers $\alpha = \{\alpha_i\}_{i=1}^{n}$ over the training samples 202 $x_i$ of x. Each of the QPP sub-problems, each with respect to one of the pairs $\{\alpha_l, \alpha_m\}$ of Lagrange multipliers, may be solved analytically. The optimization iterations are repeated until one or more optimality conditions, for example, a KKT condition, are satisfied (met) as expressed in equation set 5 below.
Equation Set 5:

$$\alpha_i = 0 \;\Rightarrow\; y_i u(x_i) \geq 1, \qquad 0 < \alpha_i < C \;\Rightarrow\; y_i u(x_i) = 1, \qquad \alpha_i = C \;\Rightarrow\; y_i u(x_i) \leq 1$$
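A hedged sketch of checking the optimality conditions of equation set 5 within a numerical tolerance is given below; the function name and the tolerance value are assumptions made for the illustration.

```python
import numpy as np

def kkt_satisfied(alpha, X, y, w, b, C, tol=1e-3):
    # y_i * u(x_i) with u(x) = w . x - b
    margins = y * (X @ w - b)
    free = (alpha > tol) & (alpha < C - tol)
    ok_zero = (alpha > tol) | (margins >= 1.0 - tol)       # alpha_i = 0 case
    ok_free = ~free | (np.abs(margins - 1.0) <= tol)       # 0 < alpha_i < C
    ok_bound = (alpha < C - tol) | (margins <= 1.0 + tol)  # alpha_i = C case
    return bool(np.all(ok_zero & ok_free & ok_bound))
```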
The iterative SMO process for training the clustering classifier 204 is done by applying the SMO process to solve the QPP as expressed in the equation set 4 for searching the optimal separation plane parameter(s) of a separation plane separating two or more clusters of the training samples 202. The optimal separation plane parameter(s) are calculated with respect to the optimality conditions as expressed in the equation set 5. When executed through the plurality of optimization iterations, the SMO may be expressed as shown in equation set 6 below.
Equation Set 6:

$$w^{(k+1)} = \sum_{i=1}^{n} \alpha_i^{(k)} y_i x_i, \qquad b^{(k+1)} = w^{(k+1)} \cdot x_i - y_i, \quad \alpha_i^{(k)} > 0$$

Where k is the index of the optimization iteration.
The iterative SMO process of equation set 6 may be further formulated as shown in equation set 7 below.
Equation Set 7:

$$z^{k+1} = z^{k} - \tau_{k+1}\, A\!\left( z^{k} \right), \quad k = 0, 1, 2, \ldots$$
Where $z = \{w, b\}$ is a searching vector for searching for the optimal separation parameter(s), A is a vector operator used by the SMO process and τ is an optimization factor used during the SMO sub-problems optimization. The searching vector directly depends on the type of the training samples 202 and on one or more characteristics of the training samples 202. For example, assuming the training samples are images, the characteristic(s) may be, for example, object(s) detected in the images, image(s) resolution, image size and/or the like. As another example, assuming the training samples are experimental data results, the characteristic(s) may be, for example, type of experiment(s), results range, time of experiment, location of experiment and/or the like.
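To illustrate the iteration of equation set 7 in isolation, the following sketch runs a Chebyshev-style iteration on a small symmetric positive-definite stand-in system; the operator A, the right-hand side and the eigenvalue bounds are invented and do not represent the patent's operator acting on z = {w, b}.

```python
import numpy as np

# Stand-in residual operator A(z) = B z - f for a small SPD system.
B = np.array([[2.0, 0.5], [0.5, 1.0]])
f = np.array([1.0, 1.0])
A = lambda z: B @ z - f

lam = np.linalg.eigvalsh(B)                     # eigenvalue bounds of B
lam_min, lam_max = float(lam[0]), float(lam[-1])

M = 6                                           # layers number
tau0 = 2.0 / (lam_max + lam_min)
rho0 = (lam_max - lam_min) / (lam_max + lam_min)

z = np.zeros(2)
for cycle in range(4):                          # repeat the M-layer schedule
    for m in range(1, M + 1):
        tau = tau0 / (1.0 + rho0 * np.cos((2 * m - 1) * np.pi / (2 * M)))
        z = z - tau * A(z)                      # equation set 7 update

print(z, np.linalg.solve(B, f))                 # iterate vs. exact solution
```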
Reference is made once again to FIG. 1. As shown at 102, the process 100 starts with the management module 220 designating the plurality of training data items (samples) 202 using, for example, the I/O interface 210.
As shown at 104, the management module 220 selects a subset of training samples 202 for profiling the training samples to allow appropriate setting of the hybrid scheme for training the clustering classifier 204 according to the training samples 202. The training samples subset is defined to be large enough to allow accurate profiling while small enough so as not to require excessive computation resources, for example, computation time and/or computation load. The profiling may be done once to identify one or more parameters of the optimization process 100 and may be applicable for training the clustering classifier 204 for multiple types of data sets such as the training samples 202.
The management module 220 profiles the training samples subset to determine a layers number (M) that indicates the number of layers employed during the hybrid scheme as described hereinafter. In addition, during the profiling process, the management module 220 calculates a minimal eigenvalue ($\lambda_{min}$) and/or maximal eigenvalue ($\lambda_{max}$) used by the vector operator A during the optimization process. The profiling made by the management module 220 is directed at defining the parameters of the vector operator A as well as the number of layers required for the optimization process 100 and may therefore be independent of the type of the data set(s), i.e. the training samples 202 clustered by the clustering classifier 204.
The management module 220 profiles the training samples subset using one or more iterative methods, for example, alternative-variable descent, for calculating the layers number M and/or the vector operator eigenvalues $\lambda_{min}$ and/or $\lambda_{max}$ as expressed in function 1 below.
Function 1:

$$\text{Find}\; N_{iter}\left( \lambda_{min}, \lambda_{max}, M \right) \rightarrow \min$$
For example, assuming $M \in \{4, 6, 8\}$. First the management module 220 may set the number of layers to M = 4, $\lambda_{min}$ = 1.0010 and $\lambda_{max}$ = 0.7955. During each of one or more following steps (iterations) of the profiling iterative alternative-variable descent process, the management module 220 may update the eigenvalues with a step value of -0.001. The alternative-variable descent process decreases the step value near a local minimum point in descending order and evaluates the number of iterations that will be required during the process 100 with the calculated eigenvalues. The management module 220 applies the same alternative-variable descent process for the other candidates of M, for example, 6 and 8. Eventually, the profiling may produce the following values:
$\lambda_{min}$ = 1.0010, $\lambda_{max}$ = 0.7955 and M = 6.
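The profiling search itself may be sketched generically as follows; the objective below is a smooth stand-in, whereas the actual profiling would evaluate the predicted iteration count $N_{iter}$ for each candidate of M and each pair of eigenvalues.

```python
import numpy as np

def alt_var_descent(objective, x0, step0=0.1, shrink=0.5, sweeps=50):
    # Generic alternating-variable descent: nudge one variable at a time,
    # shrinking the step near a local minimum. A sketch, not the patent's
    # exact profiling routine.
    x = np.asarray(x0, dtype=float)
    step = step0
    best = objective(*x)
    for _ in range(sweeps):
        improved = False
        for i in range(len(x)):                 # one variable at a time
            for sign in (1.0, -1.0):
                trial = x.copy()
                trial[i] += sign * step
                val = objective(*trial)
                if val < best:
                    x, best, improved = trial, val, True
        if not improved:
            step *= shrink                      # refine near a local minimum
    return x, best

# Stand-in objective in place of N_iter(lam_min, lam_max) for a fixed M:
x_opt, v = alt_var_descent(lambda a, b: (a - 1.0) ** 2 + (b - 0.8) ** 2,
                           [1.5, 0.5])
print(x_opt, v)
```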
As shown at 106, the management module 220 sets an initial value for a set of Lagrange multipliers used to solve the Lagrange function as expressed in the equation set 3 transformed to the equation set 7. The management module 220 may arbitrarily select initial values for the Lagrange multipliers, for example, 1.
As shown at 108, training the clustering classifier 204 is done by applying the iterative optimization process as expressed in the equation set 7 for searching the optimal separation plane parameter(s) of a separation plane separating two or more clusters of the training samples 202.
The hybrid scheme employs M layers 110, each comprising a plurality of optimization iterations for calculating analytically an optimal solution for each of the Lagrange pairs $\{\alpha_l, \alpha_m\}$ over the training samples 202 $x_i$ of x. As discussed before, each layer 110 may be considered as comprising two iterative loops: the first calculates an optimal solution for a specific pair of the Lagrange multipliers over the entire training samples 202 and the second evaluates all pairs in the selected set of Lagrange multipliers.
As shown at 112, the management module 220 sets the optimization factor τ. Initially (for the first layer 110) the management module 220 may select the optimization factor to have a default value, for example, 1. During each successive layer 110, the optimization factor may be updated, i.e. improved, to force a better optimization during the following layer 110. The optimization factor may be updated for each layer 110 according to the formula presented in equation set 8 below.

Equation Set 8:
$$\tau_{k+m} = \frac{\tau_0}{1 + \rho_0 \cos\!\left[ (2m-1)\pi / 2M \right]}, \quad m = 1, 2, 3, 4, \ldots, M \tag{8a}$$

$$\tau_0 = \frac{2}{\lambda_{max} + \lambda_{min}}, \qquad \rho_0 = \frac{\lambda_{max} - \lambda_{min}}{\lambda_{max} + \lambda_{min}} \tag{8b}$$
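As a numeric illustration, the schedule of equation set 8 may be computed directly from the profiled values of step 104 ($\lambda_{min}$ = 1.0010, $\lambda_{max}$ = 0.7955, M = 6); the snippet is a sketch, not part of the claimed method.

```python
import numpy as np

# Optimization factor schedule of equation set 8 with the profiled values.
lam_min, lam_max, M = 1.0010, 0.7955, 6
tau0 = 2.0 / (lam_max + lam_min)
rho0 = (lam_max - lam_min) / (lam_max + lam_min)
taus = [tau0 / (1.0 + rho0 * np.cos((2 * m - 1) * np.pi / (2 * M)))
        for m in range(1, M + 1)]
print(taus)          # one improved optimization factor per layer 110
```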
As shown at 114, the optimizer 222 analytically calculates optimal solutions for the optimization problem of the equation set 7. The optimizer 222 uses the updated optimization factor provided by the management module 220 for executing the optimization process. The optimizer 222 executes the plurality of optimization iterations for evaluating each pair of the pairs of Lagrange multipliers over all training samples 202. The Lagrange multipliers may be updated during each iteration to identify the set of the Lagrange multipliers that produces the maximal minimal value for the Lagrange function. The optimizer 222 may update the value of one or more of the Lagrange multipliers in order to reduce the maximal minimal value of the Lagrange function calculated during the current iteration compared to the maximal minimal value of the Lagrange function calculated during a previous iteration. In case the minimal value of the Lagrange function does not decrease on the current iteration, the solution (the set of Lagrange multipliers) from the previous step is chosen as the final solution.
Optionally, since the optimization paths for evaluating the pairs of the Lagrange multipliers are independent from each other, the optimization processes may be executed concurrently (in parallel) by multiple optimizer 222 processes (instances), each executed by a respective one of the processing cores 214. Each of the optimizer 222 processes may be assigned a respective pair of the pairs of Lagrange multipliers and evaluate the optimal solution for the optimization problem through one or more iterations over the training samples 202.
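A minimal sketch of this parallel arrangement follows, assuming Python's standard process pool as a stand-in for the processing cores 214; the pair optimization body is a placeholder.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations

def optimize_pair(pair):
    # Stand-in for one optimizer 222 instance: analytically optimize one
    # Lagrange pair {alpha_l, alpha_m} over all training samples.
    l, m = pair
    return (l, m, 0.0)                           # placeholder optimal solution

if __name__ == "__main__":
    pairs = list(combinations(range(8), 2))      # pairs of multiplier indices
    with ProcessPoolExecutor() as pool:          # one instance per core 214
        solutions = list(pool.map(optimize_pair, pairs))
    print(len(solutions), "independent pair optimizations")
```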
As shown at 116, the management module 220 evaluates the calculated optimal solutions provided by the optimizer 222 for all pairs of Lagrange multipliers over all training samples 202.
As shown at 118, which is a decision point, the management module 220 determines whether the calculated optimal solutions for the separation plane parameter(s) satisfy one or more optimality conditions, for example, a KKT condition. In case the optimal solutions satisfy, i.e. meet and/or fulfill, the optimality condition(s), the process 100 branches to 120. In case the optimal solutions do not satisfy the optimality condition(s), the process 100 branches to 112 to execute an additional layer of the optimization process with the updated (improved) optimization factor.

As shown at 120, the management module 220 outputs the clustering classifier 204 through the I/O interface 210. The clustering classifier 204 may then be used for classifying new samples, i.e. data items of the same type as the training samples 202, to one or more clusters.
The following embodiment exemplifies the invention.
Two exemplary data files are shown in table 1 below.
Table 1 :
[Table 1 appears as an image (imgf000018_0001) in the source publication; its contents, the parameters of the two exemplary data files, are not reproduced here.]
The convergence accuracy is the maximal difference between absolute values of the separation plane parameter(s) on the current optimization iteration compared to the previous optimization iteration.
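For illustration, the convergence accuracy may be computed as in the following sketch; the parameter vectors are invented values.

```python
import numpy as np

def convergence_accuracy(w_curr, w_prev):
    # Maximal difference between absolute values of the separation plane
    # parameter(s) on the current vs. the previous optimization iteration.
    return float(np.max(np.abs(np.abs(np.asarray(w_curr)) -
                               np.abs(np.asarray(w_prev)))))

print(convergence_accuracy([0.52, -0.49], [0.50, -0.50]))   # 0.02
```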
A clustering classifier such as the clustering classifier 204 is trained using a standard SMO sequence and, for comparison, using the hybrid scheme optimization process such as the process 100 implemented with 6 layers. Results are presented in table 2 below.
Table 2:
[Table 2 appears as an image (imgf000018_0002) in the source publication; it compares the iteration counts of the standard SMO process and the 6-layer hybrid scheme for the two exemplary data files.]
Where the improvement factor is calculated as the number of iterations performed using the standard SMO optimization process divided by the number of iterations performed using the hybrid scheme optimization process 100. As evident from table 2, the hybrid scheme may present a significant reduction in the number of optimization iterations, over 13 times fewer iterations for example 2. The hybrid scheme may thus achieve fast convergence and/or reduce the computation resources for training the clustering classifier 204.
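Expressed as a formula, with $N^{SMO}$ and $N^{hybrid}$ denoting the respective iteration counts:

$$\text{improvement factor} = \frac{N^{SMO}}{N^{hybrid}}$$

So, for purely illustrative counts not taken from table 2, 13,000 standard SMO iterations against 1,000 hybrid iterations would give an improvement factor of 13.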
According to some embodiments of the present invention there are provided methods and systems for clustering data items using a trained clustering classifier such as the clustering classifier 204, for example, an SVM based clustering classifier.
The trained clustering classifier 204, trained through a process such as the process 100, may be used for clustering data items of the same type as the training samples used to train it.
Reference is now made to FIG. 3, which is a flowchart of an exemplary process for clustering data items using a clustering classifier trained through a hybrid scheme, according to some embodiments of the present invention. A process 300 is used to classify a plurality of data items into two or more separate clusters based on one or more characteristics of the data items. The clustering classifier 204 used for the process 300 is trained through a hybrid scheme training process such as the process 100. The clustering classifier 204 may be implemented through one or more software modules executed by a processor such as the processor(s) 212 from a storage such as the storage 216 in a system such as the system 200.
As shown at 302, the process 300 starts with designating a plurality of data items, for example, visual objects, audio objects, text objects, big-data data items, research collected data items, experiments results data items and/or the like. The designated data items are new data items not previously "seen" by the clustering classifier 204. The designated data items are of the same type as the training samples 202 used for training the clustering classifier 204 during the process 100.
As shown at 304, the clustering classifier 204 is applied to the plurality of data items to cluster the data items to two or more separate clusters. The clustering classifier 204 analyzes one or more characteristics of the data items and applies one or more plane separation parameters for clustering the data items to the clusters. The plane separation parameter(s) are learned during the hybrid scheme training process 100 in which the clustering classifier 204 is trained.
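A minimal sketch of this clustering step, assuming a learned two-dimensional hyperplane, is given below; the parameters w, b and the new items are hypothetical values, not outputs of the process 100.

```python
import numpy as np

def cluster(items, w, b):
    # Assign each new data item to one of two clusters by the side of the
    # learned hyperplane u = w . x - b it falls on.
    u = items @ w - b
    return np.where(u >= 0.0, 1, -1)

# Hypothetical learned parameters and new items of the same type as the
# training samples:
w, b = np.array([0.8, 0.6]), 0.1
items = np.array([[1.0, 1.0], [-1.5, -0.5]])
print(cluster(items, w, b))            # [ 1 -1 ]
```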
As shown at 306, the data items are arranged in the clusters identified by the clustering classifier 204. As shown at 308, the data items are output arranged in the clusters.
The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application many relevant SVM based clustering algorithms will be developed and the scope of the term SVM clustering classifier is intended to include all such new technologies a priori.
As used herein the term "about" refers to ± 10 %.
The terms "comprises", "comprising", "includes", "including", "having" and 15 their conjugates mean "including but not limited to". This term encompasses the terms "consisting of and "consisting essentially of.
The phrase "consisting essentially of means that the composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed 20 composition or method.
As used herein, the singular form "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.
The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments.
The word "optionally" is used herein to mean "is provided in some 30 embodiments and not provided in other embodiments". Any particular embodiment of the invention may include a plurality of "optional" features unless such features conflict. Throughout this application, various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as 5 well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range. 10
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases "ranging/ranges between" a first indicated number and a second indicated number and "ranging/ranges from" a first indicated number "to" a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.

Claims

WHAT IS CLAIMED IS:
1. A system for training a classifier for clustering a plurality of data items, comprising:
a processor adapted to:
profile a subset of training samples selected from a plurality of training samples to determine a layers number of training layers and values of a vector operator used by an iterative optimization process for evaluating at least one separation plane parameter of a clustering classifier;
train the clustering classifier using a multi-layer technique comprising a layers number of training layers each comprising a plurality of optimization iterations, each optimization iteration comprising:
calculating a plurality of optimal solutions for the at least one separation plane parameter, each of the optimal solutions is calculated by applying the iterative optimization process to the plurality of training samples using a respective pair of a plurality of pairs of Lagrange multipliers,
evaluating the plurality of optimal solutions compared to at least one optimality condition, and
initiating a next optimization iteration in case the at least one optimality condition is not satisfied; and
output the clustering classifier for clustering new samples.
2. The system of claim 1, wherein the processor is further adapted to calculate the plurality of optimal solutions concurrently by a plurality of processing pipelines of the processor, each processing pipeline executing independently the iterative optimization process to calculate a respective one of the plurality of optimal solutions using the respective pair of Lagrange multipliers.
3. The system of any of the previous claims, wherein the clustering classifier is a support vector machine (SVM) clustering classifier.
4. The system of any of the previous claims, wherein the at least one separation plane parameter defines at least one separation plane separating between at least two clusters, each cluster comprising a respective portion of the training samples.
5. The system of any of the previous claims, wherein the iterative optimization process evaluates a quadratic programming problem, QPP, which is an equivalent formulation of a quadratic optimization problem, QOP, used for evaluating the at least one separation plane parameter, wherein the QPP equivalent formulation is expressed through the plurality of Lagrange multipliers.
6. The system of any of the previous claims, wherein the multi-layer technique is a Chebyshev multi-layer technique.
7. The system of any of the previous claims, wherein the iterative optimization process employs a sequential minimization optimization, SMO, process.
8. The system of any of the previous claims, wherein the values of the vector operator include a minimum eigenvalue and/or a maximum eigenvalue.
9. The system of any of the previous claims, wherein the subset of training samples is selected randomly from the plurality of training samples.
10. The system of any of the previous claims, wherein the profile is an iterative process in which an alternative-variable descent minimization process is applied to the subset during each of a plurality of profiling iterations until the minimum eigenvalue and/or the maximum eigenvalue are identified.
11. The system of any of the previous claims, wherein the at least one optimality condition is a Karush-Kuhn-Tucker, KKT, optimality condition.
12. The system of any of the previous claims, wherein for each of the plurality of training layers an improved optimization factor is applied to the iterative optimization process.
13. A computer implemented method of creating a classifier for clustering a plurality of data items, comprising:
profiling a subset of training samples selected from a plurality of training samples to determine a layers number of training layers and values of a vector operator used for an iterative optimization process for evaluating at least one separation plane parameter of a clustering classifier;
training the clustering classifier using a multi-layer technique comprising a layers number of training layers each comprising a plurality of optimization iterations, each optimization iteration comprising:
calculating a plurality of optimal solutions for the at least one separation plane parameter, each of the optimal solutions is calculated by applying the iterative optimization process to the plurality of training samples using a respective pair of a plurality of pairs of Lagrange multipliers,
evaluating the plurality of optimal solutions compared to at least one optimality condition, and
initiating a next optimization iteration in case the at least one optimality condition is not satisfied; and
outputting the clustering classifier for classifying new samples.
14. A computer implemented method of clustering a plurality of data items using a trained clustering classifier, comprising:
designating a plurality of data items;
applying a clustering classifier for clustering the data items to at least two clusters by analyzing at least one characteristic of the data items with respect to at least one separation plane parameter learned during a training process; and
outputting the plurality of data items arranged in the at least two clusters;
wherein the training process employs a hybrid scheme combining a multi-layer technique with an iterative optimization process.
15. The method of claim 14, wherein the multi-layer technique is a Chebyshev multi-layer technique.
PCT/RU2016/000359 2016-06-14 2016-06-14 Acceleration of svm clustering technology using chebyshev iteration technique WO2017217881A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201680085128.XA CN109416687B (en) 2016-06-14 2016-06-14 SVM clustering acceleration technology using Chebyshev iteration method
PCT/RU2016/000359 WO2017217881A1 (en) 2016-06-14 2016-06-14 Acceleration of svm clustering technology using chebyshev iteration technique

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2016/000359 WO2017217881A1 (en) 2016-06-14 2016-06-14 Acceleration of svm clustering technology using chebyshev iteration technique

Publications (1)

Publication Number Publication Date
WO2017217881A1 true WO2017217881A1 (en) 2017-12-21

Family ID=57868306

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2016/000359 WO2017217881A1 (en) 2016-06-14 2016-06-14 Acceleration of svm clustering technology using chebyshev iteration technique

Country Status (2)

Country Link
CN (1) CN109416687B (en)
WO (1) WO2017217881A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110570108B (en) * 2019-08-29 2021-06-29 天津大学 Optimal load reduction algorithm based on Lagrange multiplier and application thereof
CN112686342B (en) * 2021-03-12 2021-06-18 北京大学 Training method, device and equipment of SVM (support vector machine) model and computer-readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060112026A1 (en) * 2004-10-29 2006-05-25 Nec Laboratories America, Inc. Parallel support vector method and apparatus
US20080243731A1 (en) * 2007-03-27 2008-10-02 Nec Laboratories America, Inc. Generalized sequential minimal optimization for svm+ computations

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013138513A1 (en) * 2012-03-13 2013-09-19 The Regents Of The University Of California Application of light scattering patterns to determine differentiation status of stem cells and stem cell colonies
WO2014186387A1 (en) * 2013-05-14 2014-11-20 The Regents Of The University Of California Context-aware prediction in medical systems
CN103310195B (en) * 2013-06-09 2016-12-28 西北工业大学 Based on LLC feature the Weakly supervised recognition methods of vehicle high score remote sensing images
US9063710B2 (en) * 2013-06-21 2015-06-23 Sap Se Parallel programming of in memory database utilizing extensible skeletons
CN104966076A (en) * 2015-07-21 2015-10-07 北方工业大学 Optical fiber intrusion signal classification and identification method based on support vector machine

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060112026A1 (en) * 2004-10-29 2006-05-25 Nec Laboratories America, Inc. Parallel support vector method and apparatus
US20080243731A1 (en) * 2007-03-27 2008-10-02 Nec Laboratories America, Inc. Generalized sequential minimal optimization for svm+ computations

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CAO L J ET AL: "Developing parallel sequential minimal optimization for fast training support vector machine", NEUROCOMPUTING, ELSEVIER, AMSTERDAM, NL, vol. 70, no. 1-3, 1 December 2006 (2006-12-01), pages 93 - 104, XP027970337, ISSN: 0925-2312, [retrieved on 20061201] *
PLATT J C: "Fast Training of support vector machines using sequential minimal optimization", ADVANCES IN KERNEL METHODS. SUPPORT VECTOR LEARNING, MIT PRESS, CAMBRIDGE, MA, US, 31 December 1998 (1998-12-31), pages 41 - 65, XP002402596 *

Also Published As

Publication number Publication date
CN109416687B (en) 2021-01-29
CN109416687A (en) 2019-03-01

Similar Documents

Publication Publication Date Title
US10534999B2 (en) Apparatus for classifying data using boost pooling neural network, and neural network training method therefor
US9542621B2 (en) Spatial pyramid pooling networks for image processing
CN105765609B (en) Memory facilitation using directed acyclic graphs
EP4044071A1 (en) Exploiting input data sparsity in neural network compute units
US9372928B2 (en) System and method for parallel search on explicitly represented graphs
US9626426B2 (en) Clustering using locality-sensitive hashing with improved cost model
US10956535B2 (en) Operating a neural network defined by user code
CN109190758B (en) Method and apparatus for unwrapping tensor data for convolutional neural networks
US20160162805A1 (en) Method and apparatus for classifying data, and method and apparatus for segmenting region of interest (roi)
US11880763B2 (en) Partially-frozen neural networks for efficient computer vision systems
CN114707114A (en) Blocking method and device, convolution operation method and device, and storage medium
US20220067526A1 (en) Hardware accelerator extension to transfer learning - extending/finishing training to the edge
Shafei et al. Segmentation of images with separating layers by fuzzy c-means and convex optimization
WO2017217881A1 (en) Acceleration of svm clustering technology using chebyshev iteration technique
WO2018196676A1 (en) Non-convex optimization by gradient-accelerated simulated annealing
CN112799852B (en) Multi-dimensional SBP distributed signature decision system and method for logic node
KR102039244B1 (en) Data clustering method using firefly algorithm and the system thereof
US20220180150A1 (en) Data processing method and apparatus using neural network and electronic device including the same
Asfoor et al. Computing fuzzy rough approximations in large scale information systems
US8943511B2 (en) Parallel allocation optimization device, parallel allocation optimization method, and computer-readable recording medium
CN111507456A (en) Method and apparatus with convolutional neural network processing
EP4009240A1 (en) Method and apparatus for performing deep learning operations
KR102127855B1 (en) Feature selection method with maximum repeatability
Motta et al. A user-friendly interactive framework for unsteady fluid flow segmentation and visualization
US20220292710A1 (en) System and method of detecting at least one object depicted in an image

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16829326

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16829326

Country of ref document: EP

Kind code of ref document: A1