CN107636678B

CN107636678B - Method and apparatus for predicting attributes of image samples

Info

Publication number: CN107636678B
Application number: CN201580080731.4A
Authority: CN
Inventors: 汤晓鸥; 黄琛; 吕健勤
Original assignee: Beijing Sensetime Technology Development Co Ltd
Current assignee: Beijing Sensetime Technology Development Co Ltd
Priority date: 2015-06-29
Filing date: 2015-06-29
Publication date: 2021-12-14
Anticipated expiration: 2035-06-29
Also published as: CN107636678A; WO2017000118A1

Abstract

The present application relates to methods and systems for predicting attributes of image samples. The method for predicting attributes of image samples includes: obtaining a plurality of image subsets from a training set comprising a plurality of training image samples; gradually splitting each subset of images to generate a decision forest for prediction; determining paths of nodes of the test image sample in the decision forest; merging training image samples at all leaf nodes in each determined path; clustering all the merged training image samples to obtain overlapping clusters, each merged training image sample being clustered into at least one of the overlapping clusters; and predicting attributes of the test image sample from the overlapping clusters.

Description

Method and apparatus for predicting attributes of image samples

Technical Field

The present application relates to machine learning, and in particular to a method and apparatus for predicting attributes of image samples.

Background

Data imbalance exists in many visual tasks, and high-level face age estimation and head pose estimation are detected from low-level edges. In the widely used FG-NET and MORPH databases, young images are often much more numerous than older images, the human head rarely exhibits extreme poses, and the various image edge image data structures obey a power-law distribution on the BSDS500 database.

Without dealing with this imbalance problem, conventional vision algorithms tend to have a strong learning bias for the majority classes and poor prediction accuracy for the minority classes, and the interestingness is usually equal or greater (e.g., few edges may convey the most important semantic information about natural images). The lack of learning for a few classes is due to a complete lack of representation due to a limited number of examples or even no examples, especially in the presence of small data sets. For example, FG-NET and Pointing' 04 head pose datasets have only 1002 and 2790 images total (8 images with 60+ age and 60 images with 90 ° pitch angle), respectively; and FG-NET has no images for some age classes above 60. This may present a greater challenge in inferring invisible (unseen) data from a few classes of samples that typically have high variability. Worse yet, small unbalanced data sets may be accompanied by class overlap issues, which further exacerbate the difficulty of learning.

In the field of machine learning, there are three common approaches to deal with the negative effects of data imbalances: resampling, cost-sensitive learning (cost-sensitive learning), and ensemble learning (ensemble learning). The goal of the resampling method is to equalize the classes a priori by undersampling the majority class or oversampling the minority class (or both), but it will be easy to exclude valuable information or introduce noise. By adjusting the misclassification cost (misclassification cost) associated with the samples, the cost-sensitive learning (cost-sensitive learning) method is generally considered to work better than the random resampling method, however, the true cost (true cost) is often unknown. One effective technique to be further improved is to resort to ensemble learning (even without any a priori). Chen et al combines bagging with cost sensitive decision trees to generate a weighted version of a random forest, which, to our knowledge, is only based on an unbalanced learning method of the random forest. They use class weights to balance the Gini criterion during node splitting and aggregation at leaf nodes.

The above methods have two common disadvantages: 1) they are designed for classification or regression, but there is no general solution for both. 2) They have limited interpretation ability with respect to unseen appearance and synthesis of new tags on the observed data space. This is more severe in the case of a combination of imbalance and small number of samples, where a few classes are represented too little due to an excessively reduced (or even no) number of samples/labels. Herein, the problems of data imbalance and invisible data inference in both classification and regression cases are addressed.

Disclosure of Invention

One aspect of the present application discloses a method for predicting attributes of image samples. The method for predicting attributes of image samples may comprise: obtaining a plurality of image subsets from a training set comprising a plurality of training image samples; gradually splitting (splitting) each of these image subsets to generate a decision forest for prediction; determining paths of nodes of the test image sample in the decision forest; merging training image samples at all leaf nodes in each determined path; clustering all of the merged training image samples to obtain overlapping clusters, each of the merged training image samples being clustered into at least one of the overlapping clusters; and predicting attributes of the test image sample from the overlapping clusters.

According to embodiments of the present application, splitting may comprise: clustering training image samples into different classes at each node of a decision forest; assigning weights to the clustered classes, wherein greater weights are assigned to classes with fewer training image samples and lesser weights are assigned to classes with more training image samples; and splitting the training image samples based on the assigned weights.

According to embodiments of the present application, the decision forest may have a depth such that all training image samples in each class have the same attributes.

According to embodiments of the present application, the information gain of a decision forest may be below a first fixed threshold.

According to an embodiment of the present application, the training image sample size at a leaf node of the decision forest may have a size below a second fixed threshold.

According to embodiments of the present application, splitting may comprise: the training image samples are split by a cost sensitive linear support vector machine for classification.

According to embodiments of the present application, splitting may comprise: the training image samples are split by cost sensitive linear support vector regression for regression.

According to an embodiment of the present application, clustering may include: calculating the distance between the offset points between two samples in the combined training image samples; and assigning each of the merged training image samples to at least one cluster based on the inter-bias-point distance to obtain an overlapping cluster, wherein the inter-bias-point distance is the euclidean distance of two of the merged training image samples multiplied by a factor equal to or greater than 1 if the two of the merged training image samples have the same attribute, and otherwise the inter-bias-point distance is the euclidean distance multiplied by a factor less than 1.

According to an embodiment of the present application, the predicting may include: finding clusters of the overlapping clusters that approximate the test image sample; calculating a coefficient estimate for the test image sample from the found cluster; updating the coefficient estimate via a nearest neighbor approximation; the updated coefficient estimates are used to predict properties of the test image sample.

Another aspect of the application discloses a system for predicting attributes of image samples. The system for predicting attributes of image samples may comprise: splitting means for obtaining a plurality of image subsets from a training set comprising a plurality of training image samples and progressively splitting each of these subsets to generate a decision forest for prediction; a determining device electrically connected with the splitting device and used for determining the path of the node of the test image sample in the decision forest; clustering means electrically connected to the determining means and adapted to combine the training samples at all leaf nodes in each determined path and to locally cluster all combined training samples to obtain overlapping clusters, each of the overlapping clusters having at least two attributes; and a prediction means electrically connected to the clustering means and for predicting the property of the test sample from the overlapping clusters.

According to embodiments of the present application, the cleaving apparatus may further comprise: a clustering unit for clustering the training image samples into different classes at each node of the decision forest; a first assigning unit electrically connected to the clustering unit and configured to assign weights to the clusters-processed classes, wherein a greater weight is assigned to the class having fewer training image samples and a smaller weight is assigned to the class having more training image samples; and a splitting unit electrically connected with the assigning unit and configured to split the training image samples based on the assigned weights.

According to an embodiment of the application, the splitting unit may be a cost sensitive linear support vector machine for classification.

According to embodiments of the present application, the splitting unit may be a cost sensitive linear support vector regression for regression.

According to an embodiment of the present application, the clustering device may further include: a calculating unit for calculating an inter-bias point distance between two of the merged training image samples; and a second assigning unit electrically connected to the calculating unit and configured to assign each of the merged training image samples to at least one cluster based on the inter-offset-point distance to obtain an overlapped cluster, wherein if two of the merged training image samples have the same attribute, the calculating unit may calculate the inter-offset-point distance by calculating a euclidean distance of the two of the merged training image samples multiplied by a factor equal to or greater than 1, and otherwise calculate the inter-offset-point distance by calculating the euclidean distance multiplied by a factor less than 1.

According to an embodiment of the present application, the prediction apparatus may further include: a finding unit for finding a cluster of the overlapping clusters that approximates the test image sample; an estimating unit electrically connected to the finding unit and configured to calculate a coefficient estimation value of the test image sample from the found cluster; an updating unit electrically connected to the estimating unit and configured to update the coefficient estimation value via a nearest neighbor approximation; and a prediction unit electrically connected to the update unit and configured to predict an attribute of the test image sample using the updated coefficient estimation value.

Another aspect of the application relates to a system for predicting attributes of image samples. The system may include: a memory that may store executable components; and a processor electrically coupled to the memory that can execute components to perform operations of the system, wherein the executable components can include: a splitting component for obtaining a plurality of image subsets from a training set comprising a plurality of training image samples and progressively splitting each subset to generate a decision forest for prediction; determining means for determining paths of nodes of the test image sample in the decision forest; clustering means for merging training samples at all leaf nodes in each determined path and locally clustering all merged training samples to obtain overlapping clusters; and a prediction component for predicting an attribute of the test sample from the overlapping cluster.

According to embodiments of the present application, the splitting component may further comprise: a clustering subcomponent for clustering the training image samples into distinct classes at each node of the decision forest; a first assigning sub-component for assigning weights to the cluster-processed classes, wherein a greater weight is assigned to the class with fewer training image samples and a lesser weight is assigned to the class with more training image samples; and a splitting subcomponent for splitting the training image samples based on the assigned weights.

According to an embodiment of the present application, the clustering means may further include: a calculating subcomponent for calculating an offset point-to-point distance between two of the merged training image samples; and a second assigning subcomponent for assigning one of the merged training image samples to at least one cluster based on the inter-bias-point distance to obtain an overlapping cluster, wherein the calculating subcomponent can calculate the inter-bias-point distance by calculating the euclidean distance of two of the merged training image samples multiplied by a factor equal to or greater than 1 if the two of the merged training image samples have the same attribute, and otherwise by calculating the euclidean distance multiplied by a factor less than 1.

According to an embodiment of the application, the prediction component may further comprise: a finding subcomponent for finding a cluster in the overlapping clusters that approximates the test image sample; an estimation subcomponent for calculating coefficient estimates for the test image samples from the found clusters; an update subcomponent for updating the coefficient estimates via a near-neighbor like approximation; and a prediction subcomponent for predicting an attribute of the test image sample using the updated coefficient estimate.

The present application combines ensemble learning with cost sensitive learning in a natural way without resampling, thereby avoiding information loss and adding noise.

Drawings

Illustrative, non-limiting embodiments of the invention are described below with reference to the accompanying drawings. The figures are illustrative and are generally not drawn to exact scale. The same reference numbers will be used throughout the drawings to refer to the same or like elements.

Fig. 1 illustrates a method for predicting attributes of image samples according to an embodiment of the present application.

Fig. 2 illustrates sub-steps of generating a decision forest according to an embodiment of the present application.

Fig. 3 illustrates sub-steps of obtaining overlapping clusters according to an embodiment of the present application.

Fig. 4 illustrates a system for predicting attributes of image samples according to an embodiment of the present application.

Figure 5 illustrates a schematic block diagram of a cleaving apparatus according to an embodiment of the present application.

Fig. 6 illustrates a schematic block diagram of a clustering apparatus according to an embodiment of the present application.

Fig. 7 illustrates a schematic block diagram of a prediction apparatus according to an embodiment of the present application.

Fig. 8 illustrates a system for predicting attributes of image samples according to an embodiment of the present application.

Detailed Description

Hereinafter, embodiments of the present application will be described in detail with reference to the detailed description and the accompanying drawings.

Will refer to the training set

Various embodiments of the present application are described, wherein x_i∈R^DIs a sample s_iA feature vector of, and y_iIs a sample s_iThe label of (1). The present application aims at unbiased prediction of sample features x (even in the presence of severe imbalance and small data sets). The label y ∈ C refers to class coefficients for classification (e.g., edge classes) and numerical values for regression (e.g., age and pose angle). To identify the correct decision regions for the majority classes and the more important decision regions for the minority classes, the present application relies on efficient and robust random decision forests. A random decision forest is an ensemble of decision trees learned from a plurality of random subsets of data. Each tree recursively divides the input space into unconnected partitions, generating candidate decision regions in a coarse-to-fine manner.

Fig. 1 illustrates a method 1000 for predicting attributes of an image sample according to an embodiment of the present application.

In step S100, a training set is received

And multiple image subsets are obtained from the training set by, for example, sampling.

Then, in step S200, each of the image subsets is progressively split to generate a decision tree. The generated decision tree constitutes a decision forest for predicting attributes of the test image sample.

Step S200 will now be described in detail with reference to fig. 2.

As shown in fig. 2Shown therein, in step S210, training samples S at node j are formed, for example, by employing the well-known K-means technique_jClustering into two classes

For classification, training sample S at node j_jAre clustered into two classes to be split into left or right nodes. For a multi-class case (e.g., a 10-class case), 10 classes are classified into one part including 5 similar classes and another part including the other 5 classes, and then the two parts are gradually split. Then, in step S220, to prevent the bias toward the majority class (majority class), a weight is assigned to the class subjected to the clustering process. In this application, weights are defined as a function of cluster distribution. For example, the weight may be associated with a factor f (p)_k)＝(1-p_k)/p_kAre associated with wherein

Obviously, f (p)_k) The minority class is given more weight without losing overall performance. Then, in step S230, for the local sample S_jFor node j, S_jCan be cost-sensitively split into

And

in particular, cost sensitive splitting may employ a factor f (p)_k). When the maximum depth or local sample size | S is reached_jWhen | falls below the second fixed threshold, step S230 stops. For classification, step S230 may also stop if the information gain described in equation (1) falls below the first fixed threshold. The information gain is defined as:

where H represents class entropy. To enter intoLine regression, the information gain may be replaced with a label variance defined as: h(s) ═ Σ_y(y-μ)²V S | where μ ═ Σ_yy/| S |. Thus, a decision forest is obtained.

For example, to perform the classification, the splitting function used in step S230 may be determined from a cost-sensitive version of the linear SVM:

where w is the weight vector, C is the regularization parameter, and if

Then z is_i1, otherwise z_iIs-1. Finally, pass sgn (w)^Tx_i) Sending each training sample to

Or

To perform regression, the splitting function used in step S230 may be determined from a linear SVR that is sensitive to cost:

wherein epsilon is more than or equal to 0. Node by predicting value w^Tx_iLocal mean of } and label

And branches to the left or right in comparison.

Then, in step S300, a test image sample is input to each decision tree of the decision forest generated in step S200. According to the splitting criterion of each node of the decision tree, the nodes that can be reached by the test image sample can be determined in each of the decision trees, and thus the paths of the nodes of the test image sample in the decision forest can be determined.

Then, in step S400, the training samples at all leaf nodes in each determined path are merged, so that a broader decision region covering as many as possible few samples is segmented out. That is, all sample sets of leaf nodes reachable by test samples

Are combined into a larger sample set

Then, in step S500, the merged training samples are clustered into overlapping clusters. That is, each training image sample that has been merged may belong to at least one cluster, such that overlapping clusters may have complementary shapes, thereby enriching the cluster representation.

Step S500 will now be described in detail with reference to fig. 3.

As shown in fig. 3, in step S510, the inter-bias point distance between two of the merged training samples is calculated. For example, x_iAnd x_jInter-dot distance therebetween

Attribute with label bias (label-biased):

where d is the Euclidean distance, if x_iAnd x_jClass of (2) is different, then 1 (y)_i≠y_j) 1 and g (y) τ y/(max { y } -y) is a reciprocal increasing function, τ being a trade-off parameter. The offset distance makes clustering discriminable because it favors "same class" data over different classes of data. In extreme cases (e.g., in the case of classification), it still forms multiple clusters all from exactly one class (even if the cluster members differ significantly in shape), which is suitable for proceedingAnd (5) line classification processing. The offset inter-point distance may be used in a K-means technique for clustering.

Then, in S520, each of the merged training samples is assigned to at least one cluster based on the inter-bias point distance. For example, by in each iteration being based on the sample x_iNearest centroid of

Sample x_iIs released to more than one centroid

Allowing the clusters to overlap each other (empirically ω 0.8).

Hereinafter, step S500 will now be discussed in detail in examples according to embodiments of the present application.

Given N training samples, to cluster the N training samples into K overlapping clusters, the following steps are performed:

I. determining the centroids of the K clusters;

calculating offset distances between the N-K image samples and the centroids of the K clusters;

assigning each of the image samples to more than one centroid, i.e. allowing the clusters to overlap each other by: in each iteration, based on sample x_iNearest centroid of

Sample x_iIs released to more than one centroid

The N image samples are then clustered into K clusters by using a modified K-means technique (based on offset distance and multiple assignments). This results in overlapping clusters each containing some "inter-class" samples, but with complementary shapes to enrich the cluster representation;

updating the centroids of the K clusters; and

v. repeat II-IV until the centroids of the K overlapping clusters converge.

Then, in step S600, the properties of the test samples may be predicted from the overlapping clusters.

Step S600 will be described in detail below. Given step 500 to generate K overlapping clusters

These overlapping clusters have their feature matrices

And

the label of the sample q is predicted in step S600.

Specifically, in step S600, first, the affine bag model (affine hull model) AH is passed through_kEach of the overlapping clusters is modeled, and the affine package model is capable of accounting for different patterns of invisible data. Each AH_kCovers all possible affine combinations of its samples and can be parameterized as:

AH_k＝{x＝μ_k+U_kv_k,k＝1,...,K} (5)

wherein

Is the center of mass, U_kIs from the central L_kThe obtained orthonormal base of SVD, and v_kIs a coefficient vector.

Then, through calculation

(i.e., determine the exponent k) to determine which affine package model is used to approximate the sample q. Using mu_k+U_kv_kThe sample q is updated.

Then, based on the updated q and the determined exponent k, the robust estimation value α is estimated_kEstimated as:

based on the estimated

The sparse coefficient α can be determined by_k：

Therefore, the sparse coefficient α_kEstimated value

Of (3) is performed.

Then, the joint optimization of the cluster and its approximation to the nearest neighbor sparse prior is formulated as:

where ε ≧ 0, and λ and γ are the tuning parameters.

These operations are repeated until convergence is reached. The labels for sample q are then predicted as being used for regression

Or by reaction at y_k(with sparse components for classification) among them is majority voted to predict the label of sample q.

The following presents a list of operations for a method for predicting properties of a sample according to an embodiment of the present application.

The application also relates to a system for predicting attributes of image samples according to embodiments of the application.

Fig. 4 illustrates a system 2000 for predicting attributes of image samples according to an embodiment of the present application. Reference will be made to the training set as mentioned above

System 2000 is described.

As shown in fig. 4, the system 2000 includes a splitting means 100, a determining means 200, a clustering means 300, and a predicting means 400.

As shown in fig. 5, the splitting apparatus 100 includes a clustering unit 110, a first allocating unit 120, and a splitting unit 130. Training set

Is input into the clustering unit 110. The clustering unit 110 is used to generate a plurality of image subsets from the training set, e.g. by sampling. In addition, the clustering unit 110 clusters the training samples S at node j by employing, for example, the well-known K-means technique_jClustering into two classes

The first allocation unit 120 is electrically connected to the clustering unit 110. The first assigning unit 120 is configured to assign weights to the clusters processed classes according to the output of the clustering unit 110. The weight is the same as the weight mentioned in step S220, and a detailed description thereof will not be repeated herein.

The splitting unit 130 is electrically connected to the first distribution unit 120. Based on the assigned weights, splitting unit 130 may split the local sample S at node j_jCost sensitive splitting into

And

the splitting unit 130 may employ a factor f (p)_k) To exchangeSample S_jA cost sensitive split is performed. When the maximum depth or local sample size | S is reached_jWhen | falls below a second fixed threshold, splitting unit 130 may stop splitting. For classification, the splitting unit 130 may also stop splitting if the information gain described in equation (1) falls below a first fixed threshold. To perform regression, the information gain may be replaced with the above-mentioned tag variance. Thus, a decision forest is obtained.

The determination device 200 is electrically connected to the splitting device 100. The generated decision forest is output by the splitting means 100 to the determining means 200. The test sample is input to the determination device 200. The determining means 200 is then used to determine the nodes in each decision tree that can be reached by the test image sample, and hence determine the path of the nodes of the test image sample in the decision forest.

The clustering means 300 is electrically connected to the determining means 200. The clustering means 300 is arranged to merge the training samples at all leaf nodes in each determined path, thereby segmenting a broader decision region covering as many as a few samples as possible. That is, the clustering apparatus 300 sets all samples of the leaf nodes that can be reached by the test sample

Combined into a larger sample set

Then, the clustering means 300 clusters the merged training samples into overlapping clusters.

As shown in fig. 6, the clustering apparatus 300 further includes a calculating unit 310 and a second allocating unit 320.

The calculation unit 310 is configured to calculate an inter-bias distance between two of the merged training samples. For example, the inter-bias-point distance may be an inter-bias-point distance as defined in equation (4).

The second dispensing unit 320 is electrically connected to the calculating unit 310. The inter-bias-point distance is output by the calculation unit 310 to the second dispensing unit 320. The second dispensing unit 320 is then used to dispense the material based on the offset inter-point distanceEach of the combined training samples is assigned to at least one cluster. For example, the second allocating unit 320 may allocate the first allocation unit by, in each iteration, based on the sample x_iNearest centroid of

Sample x_iIs released to more than one centroid

Allowing the clusters to overlap each other (empirically ω 0.8).

The prediction apparatus 400 is electrically connected to the clustering apparatus 300. The overlapping clusters are output by the clustering means 300 to the prediction means 400. The prediction means 400 is then used to predict the properties of the test samples from the overlapping clusters.

As shown in fig. 7, the prediction apparatus 400 includes a discovery unit 410, an estimation unit 420, an update unit 430, and a prediction unit 440.

The finding unit 410 is used to find clusters that approximate the test sample among the overlapping clusters. The estimation unit 420 is electrically connected to the discovery unit 410 and is configured to calculate coefficient estimates for the test image samples from the discovered clusters. The updating unit 430 is electrically connected to the estimating unit 420 and is configured to update the coefficient estimation values via a nearest neighbor approximation. The prediction unit 440 is electrically connected to the update unit 430 and is configured to predict an attribute of the test image sample using the updated coefficient estimation value. The operation of the prediction means is substantially the same as the step described in step S600.

The present application also relates to a system 3000 for predicting properties of a test sample according to embodiments of the present application.

As shown in fig. 8, system 3000 includes: a memory 3100 storing executable components; and a processor 3200 coupled to memory 3100 and for executing the executable components to perform the operations of system 3000. These executable components include: a splitting component 3110 for obtaining a plurality of image subsets from a training set comprising a plurality of training image samples, and progressively splitting each of the subsets to generate a decision forest for prediction; a determining component 3120 for determining paths of nodes of the test image sample in a decision forest; clustering component 3130 for merging training samples at all leaf nodes in each determined path and locally clustering all merged training samples to obtain overlapping clusters; and a prediction unit 3140 for predicting the properties of the test samples from the overlapping clusters.

According to an embodiment of the present application, the splitting component 3110 further comprises: a clustering subcomponent for clustering the training image samples into distinct classes at each node of the decision forest; a first assigning sub-component for assigning weights to the cluster-processed classes, wherein a greater weight is assigned to the class with fewer training image samples and a lesser weight is assigned to the class with more training image samples; and a splitting subcomponent for splitting the training image samples based on the assigned weights.

According to an embodiment of the present application, clustering component 3130 further comprises: a calculating subcomponent for calculating an offset point-to-point distance between two of the merged training image samples; and a second assigning subcomponent for assigning the merged one of the training image samples to at least one cluster based on the inter-bias-point distance to obtain an overlapping cluster, wherein the calculating subcomponent calculates the inter-bias-point distance by calculating a euclidean distance of two of the merged training image samples multiplied by a factor equal to or greater than 1 if the two of the merged training image samples have the same attribute, and otherwise calculates the inter-bias-point distance by calculating a euclidean distance multiplied by a factor less than 1.

According to an embodiment of the present application, the prediction part 3140 further includes: a finding subcomponent for finding a cluster in the overlapping clusters that approximates the test image sample; a coefficient estimation value calculation sub-component for calculating a coefficient estimation value of the test image sample from the found cluster; an update subcomponent for updating the coefficient estimates via a near-neighbor like approximation; and a prediction subcomponent for predicting an attribute of the test image sample using the updated coefficient estimate.

Embodiments within the scope of the present invention may be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. Apparatus within the scope of the invention may be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method acts within the scope of the invention may be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output.

Embodiments within the scope of the present invention are advantageously implemented in one or more computer programs that are executable on a programmable processor including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program may be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language (if desired); and in any case, the language may be a compiled or interpreted language. By way of example, suitable processors include both general purpose processors and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Generally, a computer will include one or more mass storage devices for storing data files.

Embodiments within the scope of the present invention include computer-readable media for carrying or having computer-executable instructions, computer-readable instructions, or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Examples of computer readable media may include: physical storage media such as RAM, ROM, EPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices; or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions, computer-readable instructions, or data structures and which can be accessed by a general purpose or special purpose computer system. Any of the above may be supplemented by, or incorporated in, ASICs (application-specific integrated circuits). While embodiments of the present invention have been shown and described, changes and modifications may be made to such embodiments without departing from the true scope of the present invention.

While preferred examples of the present invention have been described, variations or modifications in those examples may occur to those skilled in the art upon learning of the basic inventive concepts. It is intended that the appended claims be construed to include preferred examples and that all such variations or modifications are within the scope of the invention.

It will be apparent to those skilled in the art that variations and modifications of the present invention can be made without departing from the spirit and scope of the invention. Thus, if these changes or modifications fall within the scope of claims and equivalent techniques, they may also fall within the scope of the present invention.

Claims

1. A method for predicting attributes of image samples, comprising:

obtaining a plurality of image subsets from a training set comprising a plurality of training image samples;

gradually splitting each of the image subsets to generate a decision forest for prediction;

determining paths of nodes of the test image sample in the decision forest;

merging training image samples at all leaf nodes in each determined path;

clustering all merged training image samples to obtain overlapping clusters, each merged training image sample being clustered into at least one of the overlapping clusters; and

predicting properties of the test image sample from the overlapping clusters,

wherein the splitting comprises:

clustering the training image samples into different classes at each node of the decision forest;

the clustered classes are assigned weights, with classes with fewer training image samples being assigned more weight and classes with more training image samples being assigned less weight.

2. The method of claim 1, wherein the splitting further comprises:

splitting the training image samples based on the assigned weights.

3. A method as claimed in claim 2, wherein the decision forest has a depth such that all training image samples of each said class have the same attribute.

4. The method of claim 2, wherein an information gain of the decision forest is below a first fixed threshold.

5. The method of claim 1, wherein a training image sample size at the leaf nodes of the decision forest has a size below a second fixed threshold.

6. The method of claim 2, wherein the splitting comprises:

the training image samples are split by a cost sensitive linear support vector machine for classification.

7. The method of claim 2, wherein the splitting comprises:

the training image samples are split by cost sensitive linear support vector regression for regression.

8. The method of claim 1, wherein the clustering comprises:

calculating the distance between the offset points between two samples in the combined training image samples; and

assigning each of the training image samples that has been combined to at least one cluster based on the inter-bias point distance to obtain the overlapping cluster,

wherein the inter-bias-point distance is the Euclidean distance multiplied by a factor equal to or greater than 1 for the two samples of the merged training image samples if the two samples of the merged training image samples have the same attribute, otherwise the inter-bias-point distance is the Euclidean distance multiplied by a factor less than 1.

9. A system for predicting attributes of image samples, comprising:

splitting means for obtaining a plurality of image subsets from a training set comprising a plurality of training image samples and progressively splitting each of said image subsets to generate a decision forest for prediction;

determining means electrically connected to the splitting means and for determining paths of nodes of a test image sample in the decision forest;

clustering means electrically connected to the determining means and configured to combine the training samples at all leaf nodes in each determined path and to locally cluster all combined training samples to obtain overlapping clusters, each of the overlapping clusters having at least two attributes; and

prediction means electrically connected to said clustering means and for predicting an attribute of a test sample from said overlapping clusters,

wherein the splitting apparatus further comprises:

a clustering unit that clusters the training image samples into different classes at each node of the decision forest;

a first assigning unit electrically connected to the clustering unit and assigning weights to the clustered classes, wherein a greater weight is assigned to a class having fewer training image samples and a lesser weight is assigned to the class having more training image samples.

10. The system of claim 9, wherein the cleaving device further comprises:

a splitting unit electrically connected with the assigning unit and splitting the training image samples based on the assigned weights.

11. The system of claim 10, wherein the splitting unit is a cost sensitive linear support vector machine for classification.

12. The system of claim 10, wherein the splitting unit is a cost sensitive linear support vector regression for regression.

13. The system of claim 9, wherein the clustering means further comprises:

a calculating unit for calculating an offset point-to-point distance between two samples of the merged training image samples; and

a second assigning unit electrically connected to the calculating unit and assigning one of the merged training image samples to at least one cluster based on the inter-bias point distance to obtain the overlapping cluster,

wherein the calculation unit calculates the inter-bias-point distance by calculating a euclidean distance of the two samples of the combined training image samples multiplied by a factor equal to or greater than 1 if the two samples of the combined training image samples have the same attribute, and calculates the inter-bias-point distance by calculating the euclidean distance multiplied by a factor less than 1 otherwise.

14. A system for predicting attributes of image samples, comprising:

a memory storing executable components; and

a processor electrically coupled to the memory to execute the executable components to perform operations of the system, wherein the executable components comprise:

a splitting component for obtaining a plurality of image subsets from a training set comprising a plurality of training image samples and progressively splitting each of said image subsets to generate a decision forest for prediction;

determining means for determining paths of nodes of a test image sample in the decision forest;

clustering means for merging training image samples at all leaf nodes in each determined path and clustering all merged training samples to obtain overlapping clusters, each merged training image sample being clustered into at least one of the overlapping clusters; and

a prediction component for predicting an attribute of a test sample from the overlapping clusters,

wherein the splitting component further comprises:

a clustering subcomponent for clustering the training image samples into distinct classes at each node of the decision forest;

a first assigning subcomponent for assigning weights to the cluster-processed classes, wherein greater weights are assigned to classes with fewer training image samples and lesser weights are assigned to classes with more training image samples.

15. The system of claim 14, wherein the splitting component further comprises:

a splitting subcomponent for splitting the training image samples based on the assigned weights.

16. The system of claim 14, wherein the clustering component further comprises:

a calculating subcomponent for calculating an offset point-to-point distance between two of the merged training image samples; and

a second assigning component for assigning one of the merged training image samples to at least one cluster based on the inter-bias point distance to obtain the overlapping cluster,

wherein the computation subcomponent computes the inter-bias-point distance by computing the Euclidean distance of the two samples in the combined training image samples multiplied by a factor equal to or greater than 1 if the two samples in the combined training image samples have the same attribute, and otherwise computes the inter-bias-point distance by computing the Euclidean distance multiplied by a factor less than 1.