WO2005122065A1 - System and method for elimination of irrelevant and redundant features to improve cad performance - Google Patents

System and method for elimination of irrelevant and redundant features to improve cad performance

Info

Publication number
WO2005122065A1
WO2005122065A1 (PCT/US2005/019116)
Authority
WO
WIPO (PCT)
Prior art keywords
feature set
determining
reduced
vector
discriminant
Prior art date
Application number
PCT/US2005/019116
Other languages
French (fr)
Inventor
Murat Dundar
Original Assignee
Siemens Medical Solutions Usa, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens Medical Solutions USA, Inc.
Publication of WO2005122065A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211 Selection of the most significant subset of features
    • G06F 18/2115 Selection of the most significant subset of features by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G06F 18/21322 Rendering the within-class scatter matrix non-singular

Definitions

  • the present invention relates to image processing, and more particularly to a system and method for feature selection in an object detection system.
  • a computer-implemented method for processing an image includes identifying a plurality of candidates for an object of interest in the image, extracting a feature set for each candidate, determining a reduced feature set by removing at least one redundant feature from the feature set to maximize a Rayleigh quotient, determining at least one candidate of the plurality of candidates as a positive candidate based on the reduced feature set, and displaying the positive candidate for analysis of the object.
  • Determining the reduced feature set comprises initializing a discriminant vector and a regularization parameter, and determining, iteratively, the reduced feature set.
  • Determining, iteratively, the reduced feature set includes determining the reduced feature set according to the discriminant vector, wherein features of the feature set with an element of the discriminant vector greater than a threshold are selected as the reduced feature set, determining a class scatter matrix and mean in a reduced dimensional space defined by the reduced feature set, determining a transformation vector, updating the class scatter matrix and means according to the transformation vector, and determining the discriminant vector.
  • the method comprises comparing, at each iteration, each element of the discriminant vector to a threshold, and stopping the iterative determination of the reduced feature set upon determining that all elements are greater than the threshold.
  • the threshold is a user defined variable for controlling a degree to which features are eliminated.
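The iterative determination described in the bullets above can be illustrated in code. The following is a hedged numpy sketch under assumed details; the function name, the relative threshold, and the small ridge term `reg` are illustrative choices, not the patent's:

```python
import numpy as np

def iterative_feature_selection(X_pos, X_neg, threshold=0.05, reg=1e-6):
    """Illustrative sketch: repeatedly fit a Fisher-style discriminant,
    drop features whose normalized discriminant element falls below
    `threshold`, and stop once every surviving element exceeds it."""
    keep = np.arange(X_pos.shape[1])
    while True:
        Xp, Xn = X_pos[:, keep], X_neg[:, keep]
        mp, mn = Xp.mean(axis=0), Xn.mean(axis=0)
        # Within-class scatter in the reduced dimensional space.
        Sw = (Xp - mp).T @ (Xp - mp) + (Xn - mn).T @ (Xn - mn)
        alpha = np.linalg.solve(Sw + reg * np.eye(len(keep)), mp - mn)
        mask = np.abs(alpha) / np.abs(alpha).max() > threshold
        if mask.all():          # all elements exceed the threshold: stop
            return keep, alpha
        keep = keep[mask]       # eliminate low-weight features and iterate
```

Here `threshold` plays the role of the user-defined variable controlling the degree to which features are eliminated.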
  • a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for processing an image.
  • the method includes identifying a plurality of candidates for an object of interest in the image, extracting a feature set for each candidate, determining a reduced feature set by removing at least one redundant feature from the feature set to maximize a Rayleigh quotient, determining at least one candidate of the plurality of candidates as a positive candidate based on the reduced feature set, and displaying the positive candidate for analysis of the object.
  • a computer-implemented detection system comprises an object detection module determining a candidate object and a feature set for the candidate object, and a feature selection module coupled to the object detection module, wherein the feature selection module receives the feature set and generates a reduced feature set having a desirable value of a Rayleigh quotient, wherein the object detection module implements the reduced feature set for detecting an object in an image.
  • the feature selection module further includes an initialization module setting an initial value of a discriminant vector and a regularization parameter, a reduction module determining the reduced feature set according to the discriminant vector, wherein features of the feature set with an element of the discriminant vector greater than a threshold are selected as the reduced feature set, and a discriminant module determining a class scatter matrix and mean in a reduced dimensional space defined by the reduced feature set.
  • the feature selection module further includes a sparsity module determining a transformation vector, and an update module updating the class scatter matrix and means according to the transformation vector, wherein the sparsity module determines the discriminant vector given the updated class scatter matrix and means.
  • Figure 1 is a system according to an embodiment of the present disclosure
  • Figure 2 is a flow chart of a method according to an embodiment of the present disclosure
  • Figure 3 is a graph of testing error according to an embodiment of the present disclosure
  • Figure 4A is a graph of receiver operating characteristics (ROC) curves for training results according to an embodiment of the present disclosure
  • Figure 4B is a graph of receiver operating characteristics (ROC) curves for testing results according to an embodiment of the present disclosure
  • Figure 5 is a flow chart of a method according to an embodiment of the present disclosure
  • Figure 6 is a diagram of an object detection system according to an embodiment of the present disclosure.
  • the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof.
  • the present invention may be implemented in software as an application program tangibly embodied on a program storage device.
  • the application program may be uploaded to, and executed by, a machine comprising any suitable architecture.
  • a computer system 101 for implementing an image processing method can comprise, inter alia, a central processing unit (CPU) 102, a memory 103 and an input/output (I/O) interface 104.
  • the computer system 101 is generally coupled through the I/O interface 104 to a display 105 and various input devices 106 such as a mouse and keyboard.
  • the support circuits can include circuits such as cache, power supplies, clock circuits, and a communications bus.
  • the memory 103 can include random access memory (RAM), read only memory (ROM), disk drive, tape drive, etc., or a combination thereof.
  • the present invention can be implemented as a routine 107 that is stored in memory 103 and executed by the CPU 102 to process the signal from the signal source 108.
  • the computer system 101 is a general purpose computer system that becomes a specific purpose computer system when executing the routine 107 of the present invention.
  • the computer platform 101 also includes an operating system and micro instruction code.
  • the various processes and functions described herein may either be part of the micro instruction code or part of the application program (or a combination thereof) which is executed via the operating system.
  • various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.
  • a Computer-Aided Detection (CAD) system automatically identifies candidates for an object of interest in an image 201 given known characteristics such as the shape of an abnormality, e.g., a polyp, extracts features for each candidate 202, wherein a determined feature set is reduced (e.g., see Figure 5), labels candidates as positive or negative 203, and displays positive candidates to a radiologist for diagnosis 204.
  • the labeling or classification is performed by a classifier that has been trained offline from a training dataset and then frozen for use in the CAD system.
  • the training dataset is a database of images in which candidates have been labeled by an expert. The ability to generalize is important to the CAD system and thus the classifier: the classifier must correctly label new datasets.
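The four stages (201-204) can be sketched as a pipeline; every callable below is a hypothetical stand-in for a CAD component, not part of the patent:

```python
import numpy as np

def cad_pipeline(image, generate_candidates, extract_features, classify):
    """Illustrative CAD flow: candidate generation (201), feature
    extraction (202), labeling (203), and returning the positive
    candidates for display to a radiologist (204)."""
    candidates = generate_candidates(image)
    feats = np.array([extract_features(c) for c in candidates])
    labels = classify(feats)  # classifier trained offline, frozen at run time
    return [c for c, y in zip(candidates, labels) if y == 1]
```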
  • Classification performance is determined by the classification method used and the inherent class information available in the features provided.
  • the classification methods determine the best achievable separation between classes by exploiting the potential information available within the feature set. In real-world settings the number of features available can be more than needed, and while it might be expected that a large number of features would provide more discriminating power, this is not always the case.
  • with a limited number of training examples in a high-dimensional feature space, two classes can be separated in many ways; however, few of these separations will generalize well on new datasets. Thus, feature selection is important.
  • an automatic feature selection method is built into Fisher's Linear Discriminant (FLD).
  • the method identifies a feature subset by iteratively maximizing a ratio between and within class scatter matrices with respect to the discriminant coefficients and feature weights, respectively (see Figure 5).
  • the FLD arises in a special case when classes have a common covariance matrix.
  • FLD is a classification method that projects the high-dimensional data onto a line for a binary classification problem and performs classification in this one-dimensional space. This projection is chosen such that the ratio of the between- and within-class scatter matrices, i.e., the Rayleigh quotient, is maximized.
  • let X_i ∈ R^(d×l_i) be a matrix containing the training data points in d-dimensional space and l_i the number of labeled samples for class ω_i, i ∈ {±}. FLD is the projection α which maximizes the Rayleigh quotient J(α) = (α^T S_B α) / (α^T S_W α).
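The maximizer of the Rayleigh quotient has the classical closed form α ∝ S_W^(-1)(m_+ - m_-); a minimal numpy sketch (an illustration, not the patent's implementation):

```python
import numpy as np

def fld_direction(X_pos, X_neg):
    """Return the unit-norm FLD projection direction for two classes,
    maximizing (a^T S_B a) / (a^T S_W a)."""
    m_pos, m_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Within-class scatter: sum of per-class centered outer products.
    Sw = (X_pos - m_pos).T @ (X_pos - m_pos) + (X_neg - m_neg).T @ (X_neg - m_neg)
    a = np.linalg.solve(Sw, m_pos - m_neg)  # direction prop. to Sw^{-1}(m+ - m-)
    return a / np.linalg.norm(a)
```

Projecting the data onto this direction reduces the binary classification problem to thresholding a scalar.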
  • according to an embodiment of the present disclosure, a sparse formulation of FLD is provided, incorporating a regularization constraint on the FLD.
  • a system and method eliminate those features determined to have limited impact on the objective function. Sparse Fisher Discriminant Analysis: Blindly fitting classifiers without appropriate regularization conditions yields over-fitted models. Methods for controlling model complexity are needed in modern data analysis. In particular, when the number of features available is large, an appropriate regularization can dramatically reduce the dimensionality and produce better generalization performance, as supported by learning theory.
  • a 1-norm penalty P(f) has been implemented in a sparse FLD formulation, which generates sparser feature subsets than a 2-norm penalty.
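The difference can be seen from the penalties' shrinkage operators: the 1-norm's proximal operator (soft-thresholding) zeroes small coefficients exactly, while 2-norm shrinkage only rescales them. A small illustrative sketch:

```python
import numpy as np

def prox_l1(v, lam):
    """Soft-thresholding, the proximal operator of the 1-norm penalty:
    coefficients within lam of zero are set exactly to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def shrink_l2(v, lam):
    """Ridge-style 2-norm shrinkage: scales every coefficient,
    never producing exact zeros."""
    return v / (1.0 + lam)
```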
  • the feasible region is empty whenever δ_max ≤ 0 or δ_min > δ.
  • δ < δ_max should hold to achieve a sparse solution.
  • a linear transformation will ensure the class-mean difference is positive and standardize the sparsity constraint.
  • the noise features are added to the feature set one by one allowing us to observe the gradual change in the prediction capability of both approaches.
  • the error bars in Figure 3 are obtained by repeating the above process 100 times for each d, each time using a different training and testing set. Figure 3 illustrates testing error vs. d for artificial data.
  • Curve 301 corresponds to FLD and curve 302 corresponds to a sparse method according to an embodiment of the present disclosure.
  • for d = 3, with two redundant features, the prediction accuracy of the conventional FLD is decent.
  • the standard deviation in prediction error is smaller under a method according to an embodiment of the present disclosure indicating the elimination of one or both of the redundant features.
  • as d gets larger and noise features are added to the feature set, the performance of the conventional FLD deteriorates significantly, whereas the average prediction error for the proposed formulation remains around its initial level with some increase in the standard deviation.
  • the proposed method selects features two and three together 90% of the time.
  • Example 2, Colon Cancer; Data Sources and Domain Description: Colorectal cancer is the third most common cancer in both men and women. It is estimated that in 2004, nearly 147,000 cases of colon and rectal cancer will be diagnosed in the US, and more than 56,730 people will die from colon cancer. While there is wide consensus that screening patients is effective in decreasing advanced disease, only 44% of the eligible population undergoes any colorectal cancer screening. Multiple reasons have been identified for non-compliance, key among them patient comfort, bowel preparation, and cost.
  • Non-invasive virtual colonoscopy derived from computer tomographic (CT) images of the colon holds great promise as a screening method for colorectal cancer, particularly if CAD tools are developed to facilitate the efficiency of radiologists' efforts in detecting lesions.
  • CT computer tomographic
  • identifying (and removing) lesions (polyps) while the disease is still in a local stage yields very high survival rates, illustrating the critical need for early diagnosis.
  • the database of high-resolution CT images used in this study was obtained from NYU Medical Center, Cleveland Clinic Foundation, and two EU sites in Vienna and Belgium.
  • Training Data Patient and Polyp Info: There were 96 patients with 187 volumes. A total of 76 polyps were identified in this set with a total number of 9830 candidates. Testing Data Patient and Polyp Info: There were 67 patients with 133 volumes. A total of 53 polyps were identified in this set with a total number of 6616 candidates. A combined total of 207 features were extracted for each candidate by three imaging scientists. Feature Selection and Classification: In this experiment three feature selection methods were considered in a wrapper framework and their prediction performance compared on the Colon Dataset.
  • SFLD sparse formulation proposed in this study
  • SKFD Kernel Fisher Discriminant with linear loss and linear regularizer
  • GFLD greedy sequential forward-backward feature selection algorithm implemented with FLD
  • SFLD Sparse Fisher Linear Discriminant
  • LOPO Leave-One-Patient-Out
  • both views, e.g., the supine and the prone views, of one patient are left out of the training data.
  • the classifier is trained using the patients from the remaining set, and tested on both views of the "left-out" patient.
  • LOPO is superior to other cross-validation metrics such as leave-one-volume-out, leave-one-polyp-out or k-fold cross-validation because it simulates the actual use, wherein the CAD system processes both volumes for a new patient.
  • under such schemes, if a polyp is visible in both views, the corresponding candidates could be assigned to different folds; thus a classifier may be trained and tested on the same polyp (albeit in different views).
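LOPO fold generation can be sketched as follows; the (patient_id, view, features) tuple layout is an assumption made here for illustration:

```python
from collections import defaultdict

def lopo_folds(candidates):
    """Yield (patient_id, train, test) folds in which BOTH views of the
    held-out patient go to the test fold together, so a polyp visible in
    supine and prone views never spans train and test."""
    by_patient = defaultdict(list)
    for cand in candidates:
        by_patient[cand[0]].append(cand)
    for pid, test in by_patient.items():
        train = [c for c in candidates if c[0] != pid]
        yield pid, train, test
```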
  • a method is run for varying values of the regularization parameter δ. For each value, the Receiver Operating Characteristics (ROC) curve is obtained by evaluating the Leave-One-Patient-Out (LOPO) cross-validation performance of the sparse FLD method.
  • Kernel Fisher Discriminant with linear loss and linear regularizer (SKFD): In this approach there is a set of constraints for every data point in the training set, which leads to large optimization problems. To alleviate the computational burden of the mathematical programming formulation, Laplacian models may be implemented for both the loss function and the regularizer. This choice leads to a linear programming formulation instead of the quadratic programming formulation that is obtained when a Gaussian model is assumed for both the loss function and the regularizer. The linear programming formulation used is written as:
  • Greedy sequential forward-backward feature selection algorithm with FLD (GFLD): This approach starts with an empty subset and performs a forward selection succeeded by a backward attempt to eliminate a feature from the subset. During each iteration of the forward selection exactly one feature is added to the feature subset. To determine which feature to add, the algorithm tentatively adds to the candidate feature subset one feature that is not already selected and tests the LOPO performance of a classifier built on the tentative feature subset. The feature that results in the largest area under the ROC curve is added to the feature subset. During each iteration of the backward elimination the algorithm attempts to eliminate the feature whose removal results in the largest gain in ROC area. This process goes on until no or negligible improvement is gained.
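The greedy wrapper can be sketched generically; here `score` is a stand-in for the LOPO ROC-area evaluation, and the structure (strict-improvement forward steps, one-feature backward attempts) is an illustration of the scheme described above:

```python
def greedy_forward_backward(features, score):
    """Forward step: add the unselected feature giving the best score
    improvement. Backward step: drop one feature if that improves the
    score. Stop when a forward pass yields no improvement."""
    selected, best = [], float("-inf")
    improved = True
    while improved:
        improved = False
        best_f = None
        for f in features:                     # forward selection
            if f in selected:
                continue
            s = score(selected + [f])
            if s > best:
                best_f, best, improved = f, s, True
        if best_f is not None:
            selected.append(best_f)
        for f in list(selected):               # backward elimination attempt
            reduced = [g for g in selected if g != f]
            if reduced:
                s = score(reduced)
                if s > best:
                    selected, best = reduced, s
    return selected
```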
  • GFLD Greedy sequential forward-backward feature selection algorithm implemented with FLD
  • SKFD was run on a subset of the training dataset in which all the positive candidates and a random subset of size 1000 of the negative candidates were included.
  • Table 1 shows the number of features selected (d), the area under the ROC curve scaled by 100 (Area), and the sensitivity corresponding to 90% specificity (Sens) for all algorithms considered in this study. The values in parentheses show the corresponding values for the testing results.

    Algorithm   d    Area         Sens (%)
    SFLD        25   94.8 (94.9)  89 (87)
    SFLD-sub    17   94.7 (94.1)  92 (85)
    GFLD        17   94.3 (94.7)  85 (83)
    SKFD        18   88.0 (82.0)  65 (60)
    FLD         207  80.3 (89.1)  63 (77)

    TABLE 1
  • the ROC curves in Figure 4A demonstrate the LOPO performance of each method and those in Figure 4B show the performance on the test data set.
  • Table 1 shows the number of features selected (d), the area of the ROC curve scaled by 100 (Area) and the sensitivity corresponding to 90% specificity (Sens) for all algorithms considered in this study.
  • the sparse method (SFLD) and SFLD-sub outperform the greedy method, the conventional FLD, and SKFD on both the training and testing datasets.
  • SFLD-sub performs better than SFLD on the training data, while SFLD generalizes slightly better on the testing data. This is not surprising because SFLD-sub uses only a subset of the original training data.
  • GFLD performs almost as well as the SFLD-sub and SFLD methods, but at a much greater computational cost for selecting the features.
  • a computer-implemented detection system includes an object detection module determining a candidate object and a feature set for the candidate object 601.
  • the system includes a feature selection module 602 coupled to the object detection module 601, wherein the feature selection module 602 receives the feature set and generates a reduced feature set having a desirable value of a Rayleigh quotient, wherein the object detection module 601 implements the reduced feature set for detecting an object in an image.
  • a feature selection module includes an initialization module 603 setting an initial value of a discriminant vector and a regularization parameter, a reduction module 604 determining the reduced feature set according to the discriminant vector, wherein features of the feature set with an element of the discriminant vector greater than a threshold are selected as the reduced feature set, a discriminant module 605 determining a class scatter matrix and mean in a reduced dimensional space defined by the reduced feature set, a sparsity module 606 determining a transformation vector, and an update module 607 updating the class scatter matrix and means according to the transformation vector, wherein the sparsity module 606 determines the discriminant vector given the updated class scatter matrix and means.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

A computer-implemented method for processing an image includes identifying a plurality of candidates for an object of interest in the image (201), extracting a feature set for each candidate, determining a reduced feature set by removing at least one redundant feature from the feature set to maximize a Rayleigh quotient (202), determining at least one candidate of the plurality of candidates as a positive candidate based on the reduced feature set (203), and displaying the positive candidate for analysis of the object (204).

Description

SYSTEM AND METHOD FOR ELIMINATION OF IRRELEVANT AND REDUNDANT FEATURES TO IMPROVE CAD PERFORMANCE
This application claims priority to U.S. Provisional Application Serial No. 60/576,115, filed on June 2, 2004, which is herein incorporated by reference in its entirety.
BACKGROUND OF THE INVENTION
1. Technical Field: The present invention relates to image processing, and more particularly to a system and method for feature selection in an object detection system.
2. Discussion of Related Art: Features of medical images are typically identified by several imaging technicians working independently. As a result, technicians often identify the same or similar features. These features may be redundant or irrelevant, which may in turn impact classifier performance. Therefore, a need exists for a system and method of eliminating redundant and irrelevant features from a feature set.
SUMMARY OF THE INVENTION According to an embodiment of the present disclosure, a computer-implemented method for processing an image includes identifying a plurality of candidates for an object of interest in the image, extracting a feature set for each candidate, determining a reduced feature set by removing at least one redundant feature from the feature set to maximize a Rayleigh quotient, determining at least one candidate of the plurality of candidates as a positive candidate based on the reduced feature set, and displaying the positive candidate for analysis of the object. Determining the reduced feature set comprises initializing a discriminant vector and a regularization parameter, and determining, iteratively, the reduced feature set. Determining, iteratively, the reduced feature set includes determining the reduced feature set according to the discriminant vector, wherein features of the feature set with an element of the discriminant vector greater than a threshold are selected as the reduced feature set, determining a class scatter matrix and mean in a reduced dimensional space defined by the reduced feature set, determining a transformation vector, updating the class
scatter matrix and means according to the transformation vector, and determining the discriminant vector. The method comprises comparing, at each iteration, each element of the discriminant vector to a threshold, and stopping the iterative determination of the reduced feature set upon determining that all elements are greater than the threshold. The threshold is a user-defined variable for controlling a degree to which features are eliminated. The transformation vector and the discriminant vector can be determined as: min_{a ∈ R^d} tr(S_W ∘ (a a^T)) s.t. (m_+ - m_-)^T a = b, -δe ≤ a ≤ δe. According to an embodiment of the present disclosure, a program storage device is provided readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for processing an image. The method includes identifying a plurality of candidates for an object of interest in the image, extracting a feature set for each candidate, determining a reduced feature set by removing at least one redundant feature from the feature set to maximize a Rayleigh quotient, determining at least one candidate of the
plurality of candidates as a positive candidate based on the reduced feature set, and displaying the positive candidate for analysis of the object. According to an embodiment of the present disclosure, a computer- implemented detection system comprises an object detection module determining a candidate object and a feature set for the candidate object, and a feature selection module coupled to the object detection module, wherein the feature selection module receives the feature set and generates a reduced feature set having a desirable value of a Rayleigh quotient, wherein the object detection modules implements the reduced feature set for detecting an object in an image. The feature selection module further includes an initialization module setting an initial value of a discriminant vector and a regularization parameter, a reduction module determining the reduced feature set according to the discriminant vector, wherein features of the feature set with an element of the discriminant vector greater than a threshold are selected as the reduced feature set, and a discriminant module determining a class scatter matrix and mean in a reduced dimensional space defined by the reduced feature set. The feature selection module further includes a sparsity module determining a transformation vector, and an update module updating the class scatter matrix and means according to the transformation vector, wherein the sparsity module determines the discriminant vector given the updated class scatter matrix and means.
BRIEF DESCRIPTION OF THE DRAWINGS Preferred embodiments of the present invention will be described below in more detail, with reference to the accompanying drawings: Figure 1 is a system according to an embodiment of the present disclosure; Figure 2 is a flow chart of a method according to an embodiment of the present disclosure; Figure 3 is a graph of testing error according to an embodiment of the present disclosure; Figure 4A is a graph of receiver operating characteristics (ROC) curves for training results according to an embodiment of the present disclosure; Figure 4B is a graph of receiver operating characteristics (ROC) curves for testing results according to an embodiment of the present disclosure; Figure 5 is a flow chart of a method according to an embodiment of the present disclosure; and Figure 6 is a diagram of an object detection system according to an embodiment of the present disclosure.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS According to an embodiment of the present disclosure, irrelevant and redundant features are automatically eliminated from a feature set extracted from images, such as CT or MRI images. It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. In one embodiment, the present invention may be implemented in software as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Referring to Figure 1, according to an embodiment of the present disclosure, a computer system 101 for implementing an image processing method can comprise, inter alia, a central processing unit (CPU) 102, a memory 103 and an input/output (I/O) interface 104. The computer system 101 is generally coupled through the I/O interface 104 to a display 105 and various input devices 106 such as a mouse and keyboard. The support circuits can include circuits such as cache, power supplies, clock circuits, and a communications bus. The memory 103 can include random access memory (RAM), read only memory (ROM), disk drive, tape drive, etc., or a combination thereof. The present invention can be implemented as a routine 107 that is stored in memory 103 and executed by the CPU 102 to process the signal from the signal source 108. As such, the computer system 101 is a general purpose computer system that becomes a specific purpose computer system when executing the routine 107 of the present invention. The computer platform 101 also includes an operating system and micro instruction code. The various processes and functions described herein may either be part of the micro instruction code or part of the application program (or a combination thereof) which is executed via the operating system.
In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device. It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying figures may be implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the present invention is programmed. Given the teachings of the present invention provided herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention. Referring to Figure 2, a Computer-Aided Detection (CAD) system automatically identifies candidates for an object of interest in an image 201 given known characteristics such as the shape of an abnormality, e.g., a polyp, extracts features for each candidate 202, wherein a determined feature set is reduced (e.g., see Figure 5), labels candidates as positive or negative 203, and displays positive candidates to a radiologist for diagnosis 204. The labeling or classification is performed by a classifier that has been trained offline from a training dataset and then frozen for use in the CAD system. The training dataset is a database of images in which candidates have been labeled by an expert. The ability to generalize is important to the CAD system and thus the classifier: the classifier must correctly label new datasets. Because a large number of different classifiers can be built from the training data using classification methods, each with adjustable parameters, the choice of the classifier is important. Classification performance is determined by the classification method used and the inherent class information available in the features provided.
The classification methods determine the best achievable separation between classes by exploiting the potential information available within the feature set. In real-world settings the number of features available can be more than needed, and while it might be expected that a large number of features would provide more discriminating power, this is not always the case. With a limited number of training examples in a high-dimensional feature space two classes can be separated in many ways. However, few separations will generalize well on new datasets. Thus, feature selection is important. According to an embodiment of the present disclosure, an automatic feature selection method is built into Fisher's Linear Discriminant (FLD). The method identifies a feature subset by iteratively maximizing a ratio of between and within class scatter matrices with respect to the discriminant coefficients and feature weights, respectively (see Figure 5). The FLD arises in a special case when classes have a common covariance matrix. FLD is a classification method that projects the high-dimensional data onto a line for a binary classification problem and performs classification in this one-dimensional space. This projection is chosen such that the ratio of the between- and within-class scatter matrices, i.e., the Rayleigh quotient, is maximized. Let X_i ∈ R^(d×l_i) be a matrix containing the training data points in d-dimensional space and l_i the number of labeled samples for class ω_i, i ∈ {±}. FLD is the projection α which maximizes the Rayleigh quotient J(α) = (α^T S_B α) / (α^T S_W α),
where
S_B = (m_+ − m_−)(m_+ − m_−)^T
S_W = Σ_{i∈{+,−}} (X_i − m_i e_{l_i}^T)(X_i − m_i e_{l_i}^T)^T
are the between and within class scatter matrices respectively and
m_i = (1/l_i) X_i e_{l_i}
is the mean of class ω_i, and e_{l_i} is an l_i-dimensional vector of ones. Transforming the above problem into a convex quadratic programming problem provides algorithmic advantages. For example, notice that if α is a solution to Eq. (1), then so is any scalar multiple of it. Therefore, to avoid multiplicity of solutions, the constraint α^T S_B α = b^2 is imposed, which is equivalent to α^T (m_+ − m_−) = b, where b is some arbitrary positive scalar. The optimization problem of Eq. (1) then becomes,
Problem 1: min_{α∈R^d} α^T S_W α  subject to  α^T (m_+ − m_−) = b.

For binary classification problems the solution of this problem is

α* = b S_W^{−1} (m_+ − m_−) / ((m_+ − m_−)^T S_W^{−1} (m_+ − m_−)),

so that each element of the discriminant vector is a weighted sum of the differences between the class mean vectors, where the weighting coefficients are the rows of S_W^{−1}. According to this expansion, since S_W^{−1} is positive definite, every feature contributes to the final discriminant unless the difference of the class means along that feature is zero. If a given feature in the training set is redundant, its contribution to the final discriminant is artificial and undesirable. As a linear classifier, FLD is well suited to handle features of this sort provided that they do not dominate the feature set, that is, provided the ratio of redundant to relevant features is not significant. Although the contribution of a single redundant feature to the final discriminant is negligible, when several such features are present at the same time the overall impact can be significant, leading to poor prediction accuracy. Apart from this impact, in the context of FLD these undesirable features also pose numerical constraints on the computation of S_W^{−1}, especially when the number of training samples is limited. Indeed, when the number of features d is larger than the number of training samples l, S_W becomes ill-conditioned and its inverse does not exist. Hence eliminating the irrelevant and redundant features may provide a two-fold boost in performance. According to an embodiment of the present disclosure, a sparse formulation of FLD incorporates a regularization constraint on the FLD, and a system and method eliminate those features determined to have limited impact on the objective function. Sparse Fisher Discriminant Analysis: Blindly fitting classifiers without appropriate regularization conditions yields over-fitted models. Methods for controlling model complexity are needed in modern data analysis. 
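The closed-form solution of Problem 1 can be computed directly when S_W is invertible; a minimal NumPy sketch, with assumed variable names, where the columns of Xp and Xn are the d-dimensional samples of each class:

```python
import numpy as np

def fld_discriminant(Xp, Xn, b=1.0):
    """alpha* = b * Sw^-1 (m+ - m-) / ((m+ - m-)^T Sw^-1 (m+ - m-)),
    which by construction satisfies alpha^T (m+ - m-) = b."""
    mp, mn = Xp.mean(axis=1), Xn.mean(axis=1)
    # within-class scatter: sum over both classes of centered outer products
    Sw = np.cov(Xp, bias=True) * Xp.shape[1] + np.cov(Xn, bias=True) * Xn.shape[1]
    diff = mp - mn
    w = np.linalg.solve(Sw, diff)
    return b * w / (diff @ w)
```

When d exceeds the number of training samples, `np.linalg.solve` fails because S_W is singular, which is exactly the numerical difficulty the text describes and one motivation for the sparse formulation.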
In particular, when the number of features available is large, appropriate regularization can dramatically reduce the dimensionality and produce better generalization performance, a result supported by learning theory. For linear models of the form α^T x, as considered here, well-established regularization conditions include the 2-norm penalty and the 1-norm penalty on the weight vector α. A regularized model fitting problem can be written as:

f = min_α (error(α) + λ P(α))    (2)

where λ is called the regularization parameter. According to an embodiment of the present disclosure, the 1-norm penalty
P(α) = ||α||_1 = Σ_{i=1}^d |α_i|
has been implemented in a sparse FLD formulation, which generates sparser feature subsets than the 2-norm penalty. The regularized model fitting formulation of Eq. (2) has an equivalent formulation:

f = min_α error(α), subject to: P(α) ≤ γ    (3)

where the parameter γ plays a role similar to that of the regularization parameter λ in Eq. (2), trading off the training error against the penalty term. If α is required to be non-negative, the 1-norm of α can be determined as α^T e. With these new constraints, Problem 1 can be updated as follows to obtain Problem 2,
Problem 2 :
min_{α∈R^d} α^T S_W α  subject to  α^T (m_+ − m_−) = b,  α^T e ≤ γ,  α ≥ 0
The feasible set associated with Problem 1 is denoted by
Ω_1 = {α ∈ R^d : α^T (m_+ − m_−) = b}
and that associated with Problem 2 by
Ω_2 = {α ∈ R^d : α^T (m_+ − m_−) = b, α^T e ≤ γ, α ≥ 0},
and observe that Ω_2 ⊂ Ω_1.
The quantities

δ_max = max_i b / (m_+ − m_−)_i  and  δ_min = min_i b / (m_+ − m_−)_i

are defined, where i ∈ {1,...,d}. The set
Ω_2 is empty whenever δ_max < 0 or δ_min > γ. In addition to the feasibility constraints, γ < δ_max should hold to achieve a sparse solution. According to an embodiment of the present disclosure, a linear transformation will ensure α ≥ 0 and standardize the sparsity constraint. For simplicity and without loss of generality, S_W is assumed to be a diagonal matrix with elements λ_i, i = 1,...,d, where the λ_i are the eigenvalues of S_W.
Under this scenario a solution to Problem 1 is α* = b̂ [(m_+ − m_−)_1/λ_1, ..., (m_+ − m_−)_d/λ_d]^T, where b̂ = b / Σ_{i=1}^d (m_+ − m_−)_i^2/λ_i. A linear transformation is defined as D = diag(d_1, ..., d_d) = b̂ diag((m_+ − m_−)_1/λ_1, ..., (m_+ − m_−)_d/λ_d), such that x ↦ Dx, where diag indicates a diagonal matrix. With this transformation, Problem 2 takes the following form
Problem 3: min_{α∈R^d} α^T (D S_W D) α  subject to  α^T D (m_+ − m_−) = b,  α^T e ≤ γ,  α ≥ 0
Correspondingly, δ_max = max_i b λ_i / (b̂ (m_+ − m_−)_i^2) and δ_min = min_i b λ_i / (b̂ (m_+ − m_−)_i^2) are defined, where i ∈ {1,...,d}. Note that δ_min and δ_max are nonnegative, and hence both feasibility constraints are satisfied whenever γ ≥ δ_min. For γ ≥ d the globally optimum solution α* to Problem 3 is α* = [1,...,1]^T, i.e., the nonsparse solution. For γ < d, sparse solutions can be obtained. Unlike Problem 2, where the upper bound on γ depends on the mean vectors, here the upper bound is d, i.e., the number of features. The sparse formulation is a biconvex programming problem.
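The claim that the all-ones vector satisfies the transformed equality constraint can be checked numerically. The sketch below builds D for a diagonal S_W; the particular numbers are made up for illustration:

```python
import numpy as np

def scaling_matrix(diff, lam, b=1.0):
    """D = b_hat * diag((m+ - m-)_i / lambda_i), with b_hat chosen so that
    alpha = ones satisfies alpha^T D (m+ - m-) = b (diagonal Sw assumed)."""
    b_hat = b / np.sum(diff**2 / lam)
    return np.diag(b_hat * diff / lam)

diff = np.array([2.0, -1.0, 0.5])   # class-mean differences (made up)
lam = np.array([1.0, 4.0, 2.0])     # eigenvalues of a diagonal Sw (made up)
D = scaling_matrix(diff, lam, b=1.0)
# the equality constraint of Problem 3 holds at the all-ones (nonsparse) point
assert np.isclose(np.ones(3) @ (D @ diff), 1.0)
```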
Problem 4: min_{α,a∈R^d} α^T (S_W ∗ (aa^T)) α  subject to  α^T ((m_+ − m_−) ∗ a) = b,  a^T e ≤ γ,  a ≥ 0

where ∗ denotes the elementwise product. An initialization a = [1,...,1]^T is performed, and α* is solved for, e.g., as a solution to Problem 1. Then α is fixed at α* and a* is solved for, e.g., as a solution to Problem 3. The Iterative Feature Selection Method: Referring to Figure 5, successive feature elimination can be obtained by iteratively solving the above biconvex programming problem. (501) Set the discriminant vector to all ones and the dimensionality to d, and choose the regularization parameter γ much smaller than d: α^0 = e_d, d^0 = d, γ ≪ d. For each iteration i do the following: (502) Select the d^i features with a values greater than ε, d^i ≤ d^{i−1}, e.g., select the features whose corresponding element of the discriminant vector is greater than ε. (503) Determine the class scatter matrices and means in the d^i-dimensional (reduced) feature space. (504) Solve Problem 4 to obtain a^i, the transformation vector. (505) Using the newly obtained transformation vector, fix a to a^i and update the class scatter matrices and means. (506) Solve Problem 4 to obtain α^i, the discriminant. (507) Stop when all
α_j^i ≥ ε, e.g., stop if none of the elements of the discriminant vector is less than ε. ε is a threshold controlling how aggressively feature elimination is performed; ε may be user selected. Since α is truncated at each iteration, the above method is not guaranteed to converge. However, at any iteration i when d^i ≤ γ, sparseness would already be achieved and hence all α_j^i would be equal to one. Therefore the algorithm stops when d^i ≤ γ, at the latest.
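Steps 501-507 can be summarized in a short skeleton. The Problem 4 solver itself is not reproduced here; `solve_p4` is an assumed callable, supplied by the caller, that returns a vector over the currently selected feature indices:

```python
import numpy as np

def iterative_feature_selection(X, solve_p4, eps=0.1, gamma=None):
    """Skeleton of the iterative scheme (steps 501-507). solve_p4(idx, fix)
    stands in for the biconvex Problem 4 solver: called once with the
    discriminant fixed (returning a, steps 503-504) and once with the
    transformation fixed (returning alpha, steps 505-506)."""
    d = X.shape[1]
    if gamma is None:
        gamma = 0.1 * d                    # gamma << d (step 501)
    idx = np.arange(d)
    alpha = np.ones(d)                     # alpha^0 = e_d (step 501)
    while True:
        idx = idx[alpha > eps]             # (502) keep surviving features
        a = solve_p4(idx, fix="alpha")     # (503-504) transformation vector
        alpha = solve_p4(idx, fix="a")     # (505-506) discriminant
        if np.all(alpha >= eps) or len(idx) <= gamma:
            return idx, alpha              # (507) stop; at latest when d^i <= gamma
```

At each pass, features whose discriminant element falls below ε are dropped, so d^i is non-increasing, and the loop ends when no element falls below ε or when d^i ≤ γ.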
Experimental Results: A Toy Example. This experiment is adapted from Weston et al., Feature Selection for SVMs, Advances in Neural Information Processing Systems 13, pp. 668-674. Using artificial data, it is demonstrated that the performance of conventional FLD suffers from the presence of too many irrelevant features, whereas the proposed sparse approach produces better prediction accuracy by successfully handling these irrelevant features. The probability of y = 1 or y = −1 is equal. The first three features x_1, x_2, x_3 are drawn as x_i = y N(i, 5). Note that only one of these features is needed to discriminate one class from the other; the other two are redundant. The remaining features are drawn as x_i = N(0, 20); these features are noise. The noise features are added to the feature set one by one, allowing observation of the gradual change in the prediction capability of both approaches. The method is initialized with d = 3, e.g., starting with the first three features, and proceeds as follows. Samples are generated for training (e.g., 200) and for testing (e.g., 1000). Both approaches are trained and tested, the corresponding prediction errors are recorded, d is increased by one, and the above procedure is repeated until d = 20. For the proposed approach the best two features are selected. The error bars in Figure 3 are obtained by repeating the above process 100 times for each d, each time using a different training and testing set. Figure 3 illustrates testing error vs. d for the artificial data, comparing full dimensionality with the two-dimensional feature subset: curve 301 corresponds to FLD and curve 302 to a sparse method according to an embodiment of the present disclosure. Looking at the results, at d = 3, with two redundant features, the prediction accuracy of the conventional FLD is decent. 
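The artificial dataset just described can be reproduced with a short generator. The text's N(i, 5) and N(0, 20) are read here as mean/standard-deviation pairs; that parameterization is an assumption:

```python
import numpy as np

def make_toy_data(n, d, rng):
    """y is +/-1 with equal probability; features 1-3 are drawn as y*N(i, 5)
    (only their means carry class information); features 4..d are N(0, 20) noise."""
    y = rng.choice([-1, 1], size=n)
    X = np.empty((n, d))
    for i in range(3):
        X[:, i] = y * rng.normal(i + 1, 5, size=n)
    if d > 3:
        X[:, 3:] = rng.normal(0, 20, size=(n, d - 3))
    return X, y

X, y = make_toy_data(200, 20, np.random.default_rng(0))
```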
With the same two redundant features at d = 3, the standard deviation in prediction error is smaller under a method according to an embodiment of the present disclosure, indicating the elimination of one or both of the redundant features. As d gets larger and noise features are added to the feature set, the performance of the conventional FLD deteriorates significantly, whereas the average prediction error for the proposed formulation remains around its initial level with some increase in the standard deviation. Also, 90% of the time a method according to an embodiment of the present disclosure selects features two and three together; these are the two most powerful features in the set. Example 2: Colon Cancer; Data Sources and Domain Description. Colorectal cancer is the third most common cancer in both men and women. It is estimated that in 2004, nearly 147,000 cases of colon and rectal cancer will be diagnosed in the US, and more than 56,730 people will die from colon cancer. While there is wide consensus that screening patients is effective in decreasing advanced disease, only 44% of the eligible population undergoes any colorectal cancer screening. Multiple reasons have been identified for non-compliance, key among them patient comfort, bowel preparation, and cost. Non-invasive virtual colonoscopy derived from computer tomographic (CT) images of the colon holds great promise as a screening method for colorectal cancer, particularly if CAD tools are developed to facilitate the efficiency of radiologists' efforts in detecting lesions. In over 90% of cases colon cancer progresses from local stages (polyp adenomas) to advanced stages (colorectal cancer), which have very poor survival rates. However, identifying (and removing) lesions (polyps) while the disease is still in a local stage yields very high survival rates, illustrating the critical need for early diagnosis. 
The database of high-resolution CT images used in this study was obtained from NYU Medical Center, Cleveland Clinic Foundation, and two EU sites in Vienna and Belgium. The 163 patients were randomly partitioned into two groups: training (n=96) and test (n=67). The test group was sequestered and only used to evaluate the performance of the final system.
Training Data Patient and Polyp Information: There were 96 patients with 187 volumes. A total of 76 polyps were identified in this set, with a total of 9830 candidates. Testing Data Patient and Polyp Information: There were 67 patients with 133 volumes. A total of 53 polyps were identified in this set, with a total of 6616 candidates. A combined total of 207 features were extracted for each candidate by three imaging scientists. Feature Selection and Classification: In this experiment three feature selection methods were considered in a wrapper framework and their prediction performance compared on the colon dataset. These techniques are the sparse formulation proposed in this study (SFLD), the sparse formulation for Kernel Fisher Discriminant with linear loss and linear regularizer (SKFD), and a greedy sequential forward-backward feature selection algorithm implemented with FLD (GFLD). Sparse Fisher Linear Discriminant (SFLD): The choice of γ plays an important role in the generalization performance of a method according to an embodiment of the present disclosure. It regularizes the FLD by seeking a balance between the "goodness of fit", e.g., the Rayleigh
Quotient, and the number of features used to achieve this performance. The value of this parameter is estimated by cross validation. Leave-One-Patient-Out (LOPO) cross validation may be implemented. In this scheme, both views of one patient, e.g., the supine and the prone views, are left out of the training data. The classifier is trained using the patients from the remaining set and tested on both views of the "left-out" patient. LOPO is superior to other cross-validation schemes such as leave-one-volume-out, leave-one-polyp-out, or k-fold cross-validation because it simulates actual use, wherein the CAD system processes both volumes for a new patient. For instance, with any of the above alternative methods, if a polyp is visible in both views, the corresponding candidates could be assigned to different folds; thus a classifier could be trained and tested on the same polyp (albeit in different views). To find the optimum value of γ, the method is run for varying values of γ ∈ [1, d]. For each value of γ, the Receiver Operating Characteristic (ROC) curve is obtained by evaluating the Leave-One-Patient-Out (LOPO) cross validation performance of the sparse FLD method and
determining the area under this curve. The optimum value of γ is chosen as the value that results in the largest area. Kernel Fisher Discriminant with linear loss and linear regularizer (SKFD): In this approach there is a set of constraints for every data point in the training set, which leads to large optimization problems. To alleviate the computational burden of the mathematical programming formulation for this approach, Laplacian models may be implemented for both the loss function and the regularizer. This choice leads to a linear programming formulation instead of the quadratic programming formulation obtained when a Gaussian model is assumed for both the loss function and the regularizer. The linear programming formulation used is written as:
min_{α,β,ξ} e^T |ξ| + λ e^T |α|  subject to  X^T α − e β = y + ξ,  e_+^T ξ_+ = 0,  e_−^T ξ_− = 0
where e_± is the vector of ones whose size is the number of points in class ±. The final classifier for an unseen data point x is given by sign(α^T x − β). The regularization parameter is estimated by
LOPO. Greedy sequential forward-backward feature selection algorithm with FLD (GFLD): This approach starts with an empty subset and performs a forward selection followed by a backward attempt to eliminate a feature from the subset. During each iteration of the forward selection exactly one feature is added to the feature subset. To determine which feature to add, the algorithm tentatively adds to the candidate feature subset one feature that is not already selected and tests the LOPO performance of a classifier built on the tentative feature subset. The feature that results in the largest area under the ROC curve is added to the feature subset. During each iteration of the backward elimination the algorithm attempts to eliminate the feature whose removal results in the largest ROC area gain. This process continues until no or negligible improvement is gained. In this study the algorithm stops when the increase in the ROC area after a forward selection is less than 0.005. A total of 17 features is selected before this criterion is met. SKFD was run on a subset of the training dataset in which all of the positive candidates and a random subset of 1000 of the negative candidates were included. The five algorithms run were: 1. SFLD on the original training set. 2. GFLD on the original training set. 3. Conventional FLD on the original training set. 4. SKFD on the subset training set. 5. SFLD on the subset training set (denoted SFLD-sub).
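The forward-backward wrapper can be sketched as follows; `score` stands in for the LOPO ROC-area evaluation and is supplied by the caller:

```python
def greedy_forward_backward(features, score, min_gain=0.005):
    """Greedy wrapper selection: each round adds the feature whose inclusion
    gives the largest score (e.g. ROC area), then tries dropping one member;
    stops when the forward gain falls below min_gain."""
    selected, best = [], 0.0
    while True:
        # forward step: try each unselected feature
        gains = {f: score(selected + [f]) for f in features if f not in selected}
        if not gains:
            break
        f_best = max(gains, key=gains.get)
        if gains[f_best] - best < min_gain:
            break
        selected.append(f_best)
        best = gains[f_best]
        # backward step: drop a feature if doing so improves the score
        if len(selected) > 1:
            drops = {f: score([g for g in selected if g != f]) for f in selected}
            f_drop = max(drops, key=drops.get)
            if drops[f_drop] > best:
                selected.remove(f_drop)
                best = drops[f_drop]
    return selected, best
```

Because each forward step re-evaluates every remaining feature, the wrapper's cost grows much faster with d than that of the sparse formulation, which matches the d^3 vs. d^2 comparison reported below Table 1.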
Table 1: The number of features selected (d), the area under the ROC curve scaled by 100 (Area), and the sensitivity corresponding to 90% specificity (Sens) are shown for all algorithms considered in this study. The values in parentheses show the corresponding values for the testing results.

Algorithm   d    Area         Sens (%)
SFLD        25   94.8 (94.9)  89 (87)
SFLD-sub    17   94.7 (94.1)  92 (85)
GFLD        17   94.3 (94.7)  85 (83)
SKFD        18   88.0 (82.0)  65 (60)
FLD         207  80.3 (89.1)  63 (77)

TABLE 1

The ROC curves in Figure 3 demonstrate the LOPO performance of each method, and those in Figure 4 show the performance on the test data set. These results show that SFLD and SFLD-sub outperform the greedy method (GFLD), conventional FLD, and SKFD on both the training and testing datasets. Although SFLD-sub performs better than SFLD on the training data, SFLD generalizes slightly better on the testing data. This is not surprising because SFLD-sub uses only a subset of the original training data. GFLD performs almost as well as the SFLD-sub and SFLD methods, but the difference lies in the computational cost needed to select the features in GFLD: the computational cost of GFLD is proportional to d^3 whereas that of SFLD is proportional to d^2. According to an embodiment of the present disclosure, a method for sparse formulation of the Fisher Linear Discriminant is applied to medical images. The method is applicable to other images. Experimental results favor the proposed algorithm over two other feature selection/regularization techniques implemented in the FLD framework, both in terms of prediction accuracy and in terms of computational cost for large datasets. 
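The LOPO split used by all of the wrapper evaluations above can be sketched as a generator over patient identifiers (the function and variable names are illustrative):

```python
def lopo_splits(patient_ids):
    """Yield (train_idx, test_idx) pairs in which every candidate belonging
    to the held-out patient -- supine and prone views alike -- falls in the
    test fold, so a polyp is never split across folds."""
    patients = sorted(set(patient_ids))
    for p in patients:
        test = [i for i, pid in enumerate(patient_ids) if pid == p]
        train = [i for i, pid in enumerate(patient_ids) if pid != p]
        yield train, test
```

Both views of a patient share one identifier, so candidates from the same polyp always land in the same fold.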
Referring to Figure 6, a computer-implemented detection system includes an object detection module 601 determining a candidate object and a feature set for the candidate object. The system includes a feature selection module 602 coupled to the object detection module 601, wherein the feature selection module 602 receives the feature set and generates a reduced feature set having a desirable value of a Rayleigh quotient, and wherein the object detection module 601 implements the reduced feature set for detecting an object in an image. The feature selection module includes an initialization module 603 setting an initial value of a discriminant vector and a regularization parameter, a reduction module 604 determining the reduced feature set according to the discriminant vector, wherein features of the feature set with an element of the discriminant vector greater than a threshold are selected as the reduced feature set, a discriminant module 605 determining a class scatter matrix and mean in a reduced dimensional space defined by the reduced feature set, a sparsity module 606 determining a transformation vector, and an update module 607 updating the class scatter matrix and means according to the transformation vector, wherein the sparsity module 606 determines the discriminant vector given the updated class scatter matrix and means. Having described embodiments for a system and method for feature selection in an object detection system, it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments of the invention disclosed which are within the scope and spirit of the invention as defined by the appended claims. Having thus described the invention with the details and particularity required by the patent laws, what is claimed and desired to be protected by Letters Patent is set forth in the appended claims.

Claims

WHAT IS CLAIMED IS:
1. A computer-implemented method for processing an image comprising: identifying a plurality of candidates for an object of interest in the image; extracting a feature set for each candidate; determining a reduced feature set by removing at least one redundant feature from the feature set to maximize a Rayleigh quotient; determining at least one candidate of the plurality of candidates as a positive candidate based on the reduced feature set; and displaying the positive candidate for analysis of the object.
2. The computer-implemented method of claim 1 , wherein determining the reduced feature set comprises: initializing a discriminant vector and a regularization parameter; and determining, iteratively, the reduced feature set.
3. The computer-implemented method of claim 2, wherein determining, iteratively, the reduced feature set comprises: determining the reduced feature set according to the discriminant vector, wherein features of the feature set with an element of the discriminant vector greater than a threshold are selected as the reduced feature set; determining a class scatter matrix and mean in a reduced dimensional space defined by the reduced feature set; determining a transformation vector; updating the class scatter matrix and means according to the transformation vector; and determining the discriminant vector.
4. The computer-implemented method of claim 2, further comprising: comparing, at each iteration, each element of the discriminant vector to a threshold; and stopping the iterative determination of the reduced feature set upon determining that all elements are greater than the threshold.
5. The computer-implemented method of claim 4, wherein the threshold is a user defined variable for controlling a degree to which features are eliminated.
6. The computer-implemented method of claim 2, wherein the transformation vector and the discriminant vector can be determined as: min_{α,a∈R^d} α^T (S_W ∗ (aa^T)) α  subject to  α^T ((m_+ − m_−) ∗ a) = b,  a^T e ≤ γ,  a ≥ 0.
7. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for processing an image, the method steps comprising: identifying a plurality of candidates for an object of interest in the image; extracting a feature set for each candidate; determining a reduced feature set by removing at least one redundant feature from the feature set to maximize a Rayleigh quotient; determining at least one candidate of the plurality of candidates as a positive candidate based on the reduced feature set; and displaying the positive candidate for analysis of the object.
8. The method of claim 7, wherein determining the reduced feature set comprises: initializing a discriminant vector and a regularization parameter; and determining, iteratively, the reduced feature set.
9. The method of claim 8, wherein determining, iteratively, the reduced feature set comprises: determining the reduced feature set according to the discriminant vector, wherein features of the feature set with an element of the discriminant vector greater than a threshold are selected as the reduced feature set; determining a class scatter matrix and mean in a reduced dimensional space defined by the reduced feature set; determining a transformation vector; updating the class scatter matrix and means according to the transformation vector; and determining the discriminant vector.
10. The method of claim 8, further comprising: comparing, at each iteration, each element of the discriminant vector to a threshold; and stopping the iterative determination of the reduced feature set upon determining that all elements are greater than the threshold.
11. The method of claim 10, wherein the threshold is a user defined variable for controlling a degree to which features are eliminated.
12. The method of claim 8, wherein the transformation vector and the discriminant vector can be determined as: min_{α,a∈R^d} α^T (S_W ∗ (aa^T)) α  subject to  α^T ((m_+ − m_−) ∗ a) = b,  a^T e ≤ γ,  a ≥ 0.
13. A computer-implemented detection system comprising: an object detection module determining a candidate object and a feature set for the candidate object; and a feature selection module coupled to the object detection module, wherein the feature selection module receives the feature set and generates a reduced feature set having a desirable value of a Rayleigh quotient, wherein the object detection modules implements the reduced feature set for detecting an object in an image.
14. The computer-implemented detection system of claim 13, wherein the feature selection module further comprises: an initialization module setting an initial value of a discriminant vector and a regularization parameter; a reduction module determining the reduced feature set according to the discriminant vector, wherein features of the feature set with an element of the discriminant vector greater than a threshold are selected as the reduced feature set; a discriminant module determining a class scatter matrix and mean in a reduced dimensional space defined by the reduced feature set; a sparsity module determining a transformation vector; and an update module updating the class scatter matrix and means according to the transformation vector, wherein the sparsity module determines the discriminant vector given the updated class scatter matrix and means.
PCT/US2005/019116 2004-06-02 2005-06-01 System and method for elimination of irrelevant and redundant features to improve cad performance WO2005122065A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US57611504P 2004-06-02 2004-06-02
US60/576,115 2004-06-02
US11/140,290 US20050281457A1 (en) 2004-06-02 2005-05-27 System and method for elimination of irrelevant and redundant features to improve cad performance
US11/140,290 2005-05-27

Publications (1)

Publication Number Publication Date
WO2005122065A1 true WO2005122065A1 (en) 2005-12-22

Family

ID=35480622

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/019116 WO2005122065A1 (en) 2004-06-02 2005-06-01 System and method for elimination of irrelevant and redundant features to improve cad performance

Country Status (2)

Country Link
US (1) US20050281457A1 (en)
WO (1) WO2005122065A1 (en)


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030172043A1 (en) * 1998-05-01 2003-09-11 Isabelle Guyon Methods of identifying patterns in biological systems and uses thereof



Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
I. GUYON AND A. ELISSEEFF: "An introduction to variable and feature selection", JOURNAL OF MACHINE LEARNING RESEARCH, vol. 3, March 2003 (2003-03-01), pages 1157 - 1182, XP002343161 *
J. WESTON ET AL: "Feature selection for SVMs", NEURAL INFORMATION PROCESSING SYSTEMS, vol. 13, 27 November 2000 (2000-11-27), pages 668 - 674, XP002343162 *
S. MIKA ET AL: "An improved training algorithm for kernel Fisher discriminants", PROCEEDINGS OF THE EIGHTH INTERNATIONAL WORKSHOP ON ARTIFICIAL INTELLIGENCE AND STATISTICS, 4 January 2001 (2001-01-04), pages 98 - 104, XP002343160 *

Also Published As

Publication number Publication date
US20050281457A1 (en) 2005-12-22


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase