WO2017000118A1 - Method and apparatus for predicting attribute for image sample


Info

Publication number
WO2017000118A1
Authority
WO
WIPO (PCT)
Prior art keywords
training image
image samples
splitting
predicting
samples
Application number
PCT/CN2015/082645
Other languages
French (fr)
Inventor
Xiaoou Tang
Chen Huang
Chen Change Loy
Original Assignee
Xiaoou Tang
Application filed by Xiaoou Tang
Priority to CN201580080731.4A (CN107636678B)
Priority to PCT/CN2015/082645 (WO2017000118A1)
Publication of WO2017000118A1

Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06F - ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 - Pattern recognition
            • G06F18/20 - Analysing
              • G06F18/23 - Clustering techniques
                • G06F18/232 - Non-hierarchical techniques
                  • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
                    • G06F18/23213 - with fixed number of clusters, e.g. K-means clustering
              • G06F18/24 - Classification techniques
                • G06F18/243 - Classification techniques relating to the number of classes
                  • G06F18/24323 - Tree-organised classifiers

Definitions

  • Embodiments within the scope of the present invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof. Apparatus within the scope of the present invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method actions within the scope of the present invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output.
  • Embodiments within the scope of the present invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device.
  • Each computer program can be implemented in a high-level procedural or object oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language.
  • Suitable processors include, by way of example, both general and special purpose microprocessors.
  • a processor will receive instructions and data from a read-only memory and/or a random access memory.
  • a computer will include one or more mass storage devices for storing data files.
  • Embodiments within the scope of the present invention include computer-readable media for carrying or having computer-executable instructions, computer-readable instructions, or data structures stored thereon.
  • Such computer-readable media may be any available media, which is accessible by a general-purpose or special-purpose computer system.
  • Examples of computer-readable media may include physical storage media such as RAM, ROM, EPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other media which can be used to carry or store desired program code means in the form of computer-executable instructions, computer-readable instructions, or data structures and which may be accessed by a general-purpose or special-purpose computer system. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits). While particular embodiments of the present invention have been shown and described, changes and modifications may be made to such embodiments without departing from the true scope of the invention.

Abstract

The present application relates to a method and a system for predicting an attribute for an image sample. The method for predicting the attribute for the image sample comprises: obtaining a plurality of image subsets from a training set comprising a plurality of training image samples; splitting progressively each of the image subsets to generate a decision forest for prediction; determining paths of nodes in the decision forest for a test image sample; merging the training image samples at all leaf nodes in each of the determined paths; clustering all the merged training image samples to obtain overlapping clusters, each of the merged training image samples being clustered into at least one of the overlapping clusters; and predicting, from the overlapping clusters, an attribute for the test image sample.

Description

METHOD AND APPARATUS FOR PREDICTING ATTRIBUTE FOR IMAGE SAMPLE Technical Field
The present application relates to machine learning, and in particular to a method and an apparatus for predicting an attribute for an image sample.
Background of the Application
Data imbalance exists in many vision tasks ranging from low-level edge detection to high-level facial age estimation and head pose estimation. There are often many more images of the young than of the old in the widely used FG-NET and MORPH datasets, the human head rarely exhibits extreme poses, and the various image edge structures obey a power-law distribution on the BSDS500 dataset.
Without handling this imbalance issue, conventional vision algorithms have a very strong learning bias towards the majority class, with poor predictive accuracy for the minority class, which is usually of equal or greater interest (e.g. rare edges may convey the most important semantic information about natural images). The insufficient learning for the minority class is due to its underrepresentation by a limited number of, or even no, examples, especially in the presence of small datasets. For instance, the FG-NET and Pointing'04 head pose datasets have only 1002 and 2790 images in total, with 8 images of ages 60+ and 60 images with pitch angle 90°, respectively; and FG-NET has no images for certain age classes above 60. This presents a bigger challenge to unseen data extrapolation from the few minority class samples, which usually have high variability. Even worse, small imbalanced datasets can be accompanied by the class overlap problem, which further compounds the learning difficulty.
In the machine learning community, there are three common approaches to counter the negative impact of data imbalance: resampling, cost-sensitive learning and ensemble learning. Resampling approaches aim to make class priors equal by under-sampling the majority class or over-sampling the minority class (or both), but can easily eliminate valuable information or introduce noise. Cost-sensitive learning is often reported to outperform random resampling by adjusting the misclassification costs associated with samples; however, the true costs are often unknown. An effective technique for further improvement is to resort to ensemble learning, even without any priors. Chen et al. combined bagging and cost-sensitive decision trees to generate a weighted version of random forest, which is, to the best of our knowledge, the only imbalanced learning method based on random forest. They used the class weights for balancing the Gini criterion during node splitting and for aggregation at the leaf nodes.
The above approaches have two common drawbacks: 1) They are designed for either classification or regression, without a universal solution to both. 2) They have a limited ability to account for unseen appearances or synthesize novel labels beyond the observed data space. This is more critical in the case of combined imbalance and small sample size, where the minority class is underrepresented by an excessively reduced number of samples/labels, or even none. In the present application, the problems of data imbalance and unseen data extrapolation are addressed in both classification and regression scenarios.
Summary of the Application
One aspect of the present application discloses a method for predicting an attribute for an image sample. The method for predicting the attribute for the image sample may comprise: obtaining a plurality of image subsets from a training set comprising a plurality of training image samples; splitting progressively each of the image subsets to generate a decision forest for prediction; determining paths of nodes in the decision forest for a test image sample; merging the training image samples at all leaf nodes in each of the determined paths; clustering all the merged training image samples to obtain overlapping clusters, each of the merged training image samples being clustered into at least one of the overlapping clusters; and predicting, from the overlapping clusters, an attribute for the test image sample.
According to an embodiment of the present application, the splitting may comprise: clustering the training image samples into different classes at each node of the decision forest; assigning weights to the clustered classes, wherein a greater weight is assigned to a class having fewer training image samples, and a smaller weight is assigned to a class having more training image samples; and splitting the training image samples based on the assigned weights.
According to an embodiment of the present application, the decision forest may have a depth such that all training image samples in each of the classes have a same attribute.
According to an embodiment of the present application, an information gain of the decision forest may be lower than a fixed threshold.
According to an embodiment of the present application, the training image samples at the leaf node of the decision forest may have a size lower than a fixed threshold.
According to an embodiment of the present application, the splitting may comprise: splitting the training image samples by a cost-sensitive linear support vector machine for classification.
According to an embodiment of the present application, the splitting may comprise: splitting the training image samples by a cost-sensitive linear support vector regression for regression.
According to an embodiment of the present application, the clustering may comprise: calculating a biased inter-point distance between two of the merged training image samples; and assigning, based on the biased inter-point distance, each of the merged training image samples to at least one cluster to obtain the overlapping clusters, wherein the biased inter-point distance is a Euclidean distance between the two of the merged training image samples multiplied by a factor equal to or greater than one if the two of the merged training image samples have a same attribute, and otherwise the biased inter-point distance is the Euclidean distance multiplied by a factor less than one.
According to an embodiment of the present application, the predicting may comprise: finding a cluster of the overlapping clusters which approximates the test image sample; calculating a coefficient estimate for the test image sample from the found cluster; updating the coefficient estimate via a class-neighbor approximation; and predicting the attribute for the test image sample using the updated coefficient estimate.
Another aspect of the present application discloses a system for predicting an attribute for an image sample. The system for predicting the attribute for the image sample may comprise: a splitting device for obtaining a plurality of image subsets from a training set comprising a plurality of training image samples, and splitting progressively each of the subsets to generate a decision forest for prediction; a determining device being electrically connected with the splitting device and for determining paths of the nodes in the decision forest for a test image sample; a clustering device being electrically connected with the determining device and for merging the training samples at all leaf nodes in each of the determined paths, and clustering locally all the merged training samples to obtain overlapping clusters, each of which has at least two attributes; and a predicting device being electrically connected with the clustering device and for predicting, from the overlapping clusters, an attribute for the test sample.
According to an embodiment of the present application, the splitting device may further comprise: a clustering unit for clustering the training image samples into different classes at each node of the decision forest; a first assigning unit being electrically connected with the clustering unit and for assigning weights to the clustered classes, wherein a greater weight is assigned to a class having fewer training image samples, and a smaller weight is assigned to a class having more training image samples; and a splitting unit being electrically connected with the first assigning unit and for splitting the training image samples based on the assigned weights.
According to an embodiment of the present application, the splitting unit may be a cost-sensitive linear support vector machine for classification.
According to an embodiment of the present application, the splitting unit may be a cost-sensitive linear support vector regression for regression.
According to an embodiment of the present application, the clustering device may further comprise: a calculating unit for calculating a biased inter-point distance between two of the merged training image samples; and a second assigning unit being electrically connected with the calculating unit and for assigning, based on the biased inter-point distance, one of the merged training image samples to at least one cluster to obtain the overlapping clusters, wherein the calculating unit may calculate the biased inter-point distance by calculating a Euclidean distance between the two of the merged training image samples multiplied by a factor equal to or greater than one if the two of the merged training image samples have a same attribute, and otherwise by calculating the Euclidean distance multiplied by a factor less than one.
According to an embodiment of the present application, the predicting device may further comprise: a finding unit for finding a cluster of the overlapping clusters which approximates the test image sample; an estimating unit being electrically connected with the finding unit and for calculating a coefficient estimate for the test image sample from the found cluster; an updating unit being electrically connected with the estimating unit and for updating the coefficient estimate via a class-neighbor approximation; and a predicting unit being electrically connected with the updating unit and for predicting the attribute for the test image sample using the updated coefficient estimate.
Another aspect of the present application relates to a system for predicting an attribute for an image sample. The system may comprise a memory that may store executable components; and a processor electrically coupled to the memory that may execute the executable components to perform operations of the system, wherein the executable components may comprise: a splitting component configured for obtaining a plurality of image subsets from a training set comprising a plurality of training image samples, and splitting progressively each of the subsets to generate a decision forest for prediction; a determining component configured for determining paths of the nodes in the decision forest for a test image sample; a clustering component configured for merging the training samples at all leaf nodes in each of the determined paths, and clustering locally all the merged training samples to obtain overlapping clusters; and a predicting component configured for predicting, from the overlapping clusters, an attribute for the test sample.
According to an embodiment of the present application, the splitting component may further comprise: a clustering sub-component configured for clustering the training image samples into different classes at each node of the decision forest; a first assigning sub-component configured for assigning weights to the clustered classes, wherein a greater weight is assigned to a class having fewer training image samples, and a smaller weight is assigned to a class having more training image samples; and a splitting sub-component configured for splitting the training image samples based on the assigned weights.
According to an embodiment of the present application, the clustering component may further comprise: a calculating sub-component configured for calculating a biased inter-point distance between two of the merged training image samples; and a second assigning sub-component configured for assigning, based on the biased inter-point distance, one of the merged training image samples to at least one cluster to obtain the overlapping clusters, wherein the calculating sub-component may calculate the biased inter-point distance by calculating a Euclidean distance between the two of the merged training image samples multiplied by a factor equal to or greater than one if the two of the merged training image samples have a same attribute, and otherwise by calculating the Euclidean distance multiplied by a factor less than one.
According to an embodiment of the present application, the predicting component may further comprise: a finding sub-component configured for finding a cluster of the overlapping clusters which approximates the test image sample; an estimating sub-component configured for calculating a coefficient estimate for the test image sample from the found cluster; an updating sub-component configured for updating the coefficient estimate via a class-neighbor approximation; and a predicting sub-component configured for predicting the attribute for the test image sample using the updated coefficient estimate.
The present application combines ensemble learning and cost-sensitive learning in a natural manner and without resampling, thereby avoiding information loss and added noise.
Brief Description of the Drawing
Exemplary non-limiting embodiments of the present invention are described below with reference to the attached drawings. The drawings are illustrative and generally not to an exact scale. The same or similar elements on different figures are referenced with the same reference numbers.
Fig. 1 illustrates a method for predicting an attribute for an image sample according to an embodiment of the present application.
Fig. 2 illustrates sub-steps of generating a decision forest according to an embodiment of the present application.
Fig. 3 illustrates sub-steps of obtaining overlapping clusters according to an embodiment of the present application.
Fig. 4 illustrates a system for predicting an attribute for an image sample according to an embodiment of the present application.
Fig. 5 illustrates a schematic block diagram of a splitting device according to an embodiment of the present application.
Fig. 6 illustrates a schematic block diagram of a clustering device according to an embodiment of the present application.
Fig. 7 illustrates a schematic block diagram of a predicting device according to an embodiment of the present application.
Fig. 8 illustrates a system for predicting an attribute for an image sample according to an embodiment of the present application.
Detailed Description
Hereinafter, the embodiments of the present application will be described in detail with reference to the detailed description as well as the drawings.
Various embodiments of the present application will be described with reference to a training set $S = \{s_i = (x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^D$ is the feature vector of sample $s_i$, and $y_i$ is the label of the sample $s_i$. The present application aims to make unbiased predictions for a sample feature $x$ even in the presence of severely imbalanced and small datasets. The label $y \in C$ refers to a class index (e.g. edge class) for classification and a numeric value (e.g. age and pose angle) for regression. In order to identify correct decision regions for the majority class, and more importantly for the minority class, the present application resorts to a random decision forest, which is efficient and robust. The random decision forest is an ensemble of decision trees learned from multiple random data subsets. Each tree recursively divides the input space into disjoint partitions, generating candidate decision regions in a coarse-to-fine manner.
Fig. 1 illustrates a method 1000 for predicting an attribute for an image sample according to an embodiment of the present application.
In step S100, the training set $S$ is received and a plurality of image subsets are obtained from the training set by, for example, sampling.
Then, in step S200, each of the image subsets is progressively split to generate a decision tree. The generated decision trees constitute the decision forest, which is used for predicting an attribute of the test image sample.
Step S200 will now be described in detail with reference to Fig. 2.
As shown in Fig. 2, in step S210, the training samples $S_j$ at a node $j$ are clustered into two classes $\{c_k\}_{k=1,2}$, for example by adopting the well-known K-means technique. For classification, the training samples $S_j$ at the node $j$ are clustered into two classes so as to be split into the left node or the right node. For a multi-class scenario, for example a ten-class scenario, the ten classes are clustered into a part comprising five similar classes and another part comprising the other five classes, and then the two parts are progressively split. Then, in step S220, in order to prevent the split from being biased toward the majority class, weights are assigned to the clustered classes. In the present application, the weight is defined as a function of the cluster distribution. For example, the weight may be associated with a factor $f(p_k) = (1 - p_k)/p_k$, where $p_k$ is the fraction of the samples in $S_j$ that fall into class $c_k$. Obviously, $f(p_k)$ gives larger weights to the minority classes without losing the overall performance. Then, in step S230, for a node $j$ with local samples $S_j$, $S_j$ may be cost-sensitively split into a left subset $S_j^L$ and a right subset $S_j^R$. Specifically, the cost-sensitive splitting may employ the factor $f(p_k)$. Step S230 stops when a maximum depth is reached or the local sample size $|S_j|$ falls below a fixed threshold. For classification, step S230 may also stop if the information gain described in Equation (1) falls below a fixed threshold. The information gain is defined as:

$I = H(S_j) - \sum_{i \in \{L, R\}} \frac{|S_j^i|}{|S_j|} H(S_j^i)$   (1)

where $H$ denotes the class entropy. For regression, the information gain can be replaced by the label variance, which is defined as $H(S) = \sum_{y} (y - \mu)^2 / |S|$, where $\mu = \sum_{y} y / |S|$. Accordingly, the decision forest is obtained.
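As a minimal illustration of the weighting factor and the regression stopping criterion (the function names below are ours, not from the application):

```python
import numpy as np

def minority_weight(p_k):
    """Cost factor f(p_k) = (1 - p_k) / p_k from step S220: clusters
    holding a small fraction p_k of the node samples get large weights."""
    return (1.0 - p_k) / p_k

def label_variance(y):
    """Regression stopping criterion H(S) = sum_y (y - mu)^2 / |S|,
    used in place of the class entropy of Equation (1)."""
    y = np.asarray(y, dtype=float)
    return float(np.mean((y - y.mean()) ** 2))

# A node whose two clusters hold 90% and 10% of the local samples:
print(minority_weight(0.90))  # ~0.11, the majority cluster is down-weighted
print(minority_weight(0.10))  # 9.0, the minority cluster is up-weighted
```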
For example, for classification, the splitting function used in step S230 may be determined by a cost-sensitive version of linear SVM:

$\min_{w} \frac{1}{2}\|w\|^2 + C \sum_{i} f(p_{k_i}) \max\big(0,\, 1 - z_i w^T x_i\big)$   (2)

where $w$ is the weight vector, $C$ is a regularization parameter, $f(p_{k_i})$ is the cost factor of the class containing $x_i$, and $z_i = 1$ if $x_i$ belongs to the first cluster $c_1$, otherwise $z_i = -1$. Each training sample is finally sent to $S_j^L$ or $S_j^R$ by $\mathrm{sgn}(w^T x_i)$. For regression, the splitting function used in step S230 may be determined by a cost-sensitive version of linear SVR:

$\min_{w} \frac{1}{2}\|w\|^2 + C \sum_{i} f(p_{k_i}) \max\big(0,\, |y_i - w^T x_i| - \varepsilon\big)$   (3)

where $\varepsilon \ge 0$. The node branches left or right by comparing the numeric predictions $\{w^T x_i\}$ with the local mean of labels $\bar{y}_j = \sum_{i \in S_j} y_i / |S_j|$.
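A sketch of one cost-sensitive node split is given below, assuming that scikit-learn's standard weighted linear SVM with per-class weights f(p_k) stands in for Equation (2); the intercept handling and the exact objective are assumptions on our part. For regression, sklearn.svm.LinearSVR with weights derived the same way would play the role of Equation (3).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def cost_sensitive_split(X_node):
    """One node split (steps S210-S230): K-means produces two pseudo-
    classes, f(p_k) = (1 - p_k)/p_k weights them, and a weighted linear
    SVM learns the splitting direction w."""
    z = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_node)
    p = np.bincount(z, minlength=2) / len(z)              # cluster fractions p_k
    class_weight = {k: (1.0 - p[k]) / p[k] for k in (0, 1)}
    svm = LinearSVC(C=1.0, class_weight=class_weight).fit(X_node, z)
    side = svm.decision_function(X_node) >= 0             # sign of w^T x_i
    return svm, X_node[~side], X_node[side]               # S_j^L, S_j^R
```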
Then, in step S300, the test image sample is inputted into each decision tree of the decision forest generated in step S200. According to the splitting criteria of each node of the decision trees, the nodes that can be reached by the test image sample can be determined in each of the decision trees, and thus paths of the nodes for the test image sample in the decision forest can be determined.
Then, in step S400, the training samples at all leaf nodes in each of the determined paths are merged, carving a broader decision region covering as many minority samples as possible. That is, all the sample sets $\{S_t\}_{t=1}^{T}$ of the leaf nodes that may be reached by the test sample in the $T$ trees are merged into a larger set $S^{*} = \cup_{t=1}^{T} S_t$.
Then, in step S500, the merged training samples are clustered into overlapping clusters. That is, one of the merged training image samples may belong to more than one cluster, so that the overlapping clusters may have complementary appearances, enriching the cluster representations.
Step S500 will now be described in detail with reference to Fig. 3.
As shown in Fig. 3, in step S510, a biased inter-point distance between two of the merged training samples is calculated. For example, the inter-point distance $\tilde{d}(x_i, x_j)$ between $x_i$ and $x_j$ is label-biased (Equation (4)): the Euclidean distance $d(x_i, x_j)$ is scaled by a label-dependent factor built from the indicator $\mathbb{1}(y_i \neq y_j)$, which equals 1 if the class labels of $x_i$ and $x_j$ are different, and from the reciprocal increasing function $g(y) = \tau y/(\max\{y\} - y)$, where $\tau$ is the trade-off parameter. The biased distance makes the clustering discriminative by preferring the “same-class” data pairs to those from different classes. In extreme cases, for example in classification scenarios, it forms clusters each purely from one class even if the cluster members differ remarkably in appearance, which is suitable for classification. The biased inter-point distance may be used in the K-means technique for clustering.
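Equation (4) itself is reproduced only as an image in the source, so the sketch below assumes one plausible form consistent with the surrounding text: the Euclidean distance is kept for same-label pairs and inflated by g applied to the label gap otherwise.

```python
import numpy as np

def biased_distance(x_i, x_j, y_i, y_j, y_max, tau=1.0):
    """Label-biased inter-point distance of step S510 (ASSUMED form;
    Equation (4) is not reproduced in the source text)."""
    d = float(np.linalg.norm(np.asarray(x_i, float) - np.asarray(x_j, float)))
    if y_i == y_j:
        return d                                 # same-class pairs stay close
    gap = abs(y_i - y_j)
    g = tau * gap / max(y_max - gap, 1e-12)      # g(y) = tau*y / (max{y} - y)
    return d * (1.0 + g)                         # different-class pairs pushed apart
```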
Then, in step S520, each of the merged training samples is assigned to at least one cluster based on the biased inter-point distance. For example, the clusters are allowed to overlap with each other by relaxing the cluster assignment of a sample $x_i$ from only its nearest centroid to more than one sufficiently close centroid in each iteration ($\omega = 0.8$ empirically, where $\omega$ controls how close to the nearest-centroid distance a further centroid must be in order to also receive the sample).
Hereinafter, step S500 will be discussed in detail with an example according to an embodiment of the present application.
Given N training samples, in order to cluster the N training samples into K overlapping clusters, the following steps are performed (a code sketch follows the list):
I. Determining centroids of K clusters;
II. Calculating the biased distance between the remaining N - K image samples and the centroids of the K clusters;
III. Assigning each of the image samples to more than one centroid; that is, the clusters are allowed to overlap with each other by relaxing the cluster assignment of a sample $x_i$ from only its nearest centroid to more than one sufficiently close centroid in each iteration, and then clustering the N image samples into the K clusters by using the modified K-means technique, which is based on the biased distance and the multi-assignment. This results in overlapping clusters, each containing some "inter-class" samples but with complementary appearances that enrich the cluster representations;
IV. Updating the centroids of the K clusters; and
V. Repeating II to IV until the centroids of the K overlapping clusters converge.
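A compact sketch of steps I to V under stated assumptions: plain Euclidean sample-to-centroid distances (the biased distance above would plug into the same place), and a relaxed membership rule in which a sample joins every centroid within a factor 1/ω of its nearest-centroid distance.

```python
import numpy as np

def overlapping_kmeans(X, n_clusters, omega=0.8, n_iters=20, seed=0):
    """Overlapping K-means (steps I-V). Returns centroids and, for each
    cluster, the indices of its (possibly shared) member samples."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_clusters, replace=False)]   # step I
    for _ in range(n_iters):                                       # repeat II-IV (step V)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        nearest = d.min(axis=1, keepdims=True)                     # step II
        member = d <= nearest / omega                              # step III: multi-assignment
        for k in range(n_clusters):                                # step IV
            if member[:, k].any():
                centroids[k] = X[member[:, k]].mean(axis=0)
    return centroids, [np.flatnonzero(member[:, k]) for k in range(n_clusters)]
```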
Then, in step S600, the attribute of the test sample can be predicted from the overlapping clusters.
Step S600 will be described in detail hereinafter. Given that step S500 generates $K$ overlapping clusters $\{\mathcal{C}_k\}_{k=1}^{K}$ with their feature matrices $\{L_k\}_{k=1}^{K}$ and labels $\{y_k\}_{k=1}^{K}$, the label for a sample $q$ is predicted in step S600.
Specifically, in step S600, first, each of the overlapping clusters is modeled by an affine hull model $AH_k$ that is able to account for unseen data of different modes. Every single $AH_k$ covers all possible affine combinations of its samples and can be parameterized as

$AH_k = \{x = \mu_k + U_k v_k\},\ k = 1, \dots, K$   (5)

where $\mu_k$ is the centroid (the sample mean of cluster $k$), $U_k$ is the orthonormal basis obtained from the SVD of the centered $L_k$, and $v_k$ is the coefficient vector.
Then, it is determined which affine hull model is used to approximate the sample $q$, by calculating $k^{*} = \arg\min_{k} \min_{v_k} \|q - (\mu_k + U_k v_k)\|_2^2$; that is, the index $k$ is determined. The sample $q$ is then updated using $\mu_k + U_k v_k$.
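A sketch of the affine hull construction of Equation (5) and the nearest-hull search; the energy threshold used to truncate the SVD basis is our assumption.

```python
import numpy as np

def affine_hull(L_k, energy=0.98):
    """AH_k = {x = mu_k + U_k v_k} per Equation (5): mu_k is the sample
    mean of the cluster's feature matrix L_k (one sample per column) and
    U_k an orthonormal basis of the centered data from its SVD."""
    mu = L_k.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(L_k - mu, full_matrices=False)
    r = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy)) + 1
    return mu.ravel(), U[:, :r]

def nearest_hull(q, hulls):
    """Find k* minimizing ||q - (mu_k + U_k v_k)|| over v_k; with U_k
    orthonormal the minimizer is v_k = U_k^T (q - mu_k)."""
    errs, projs = [], []
    for mu, U in hulls:
        proj = mu + U @ (U.T @ (q - mu))     # projection onto the hull
        projs.append(proj)
        errs.append(np.linalg.norm(q - proj))
    k_star = int(np.argmin(errs))
    return k_star, projs[k_star]             # index k* and updated sample
```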
Then, based on the updated $q$ and the determined index $k$, a robust coefficient estimate $\hat{\alpha}_k$ is computed from the found cluster. Based on the estimate $\hat{\alpha}_k$, a sparse coefficient $\alpha_k$ may be determined; accordingly, the sparse coefficient $\alpha_k$ is constrained by the estimate $\hat{\alpha}_k$. Then, a joint optimization, with a tolerance $\varepsilon \ge 0$ and regularization parameters $\lambda$ and $\gamma$, is formulated over the belonging cluster and its approximation with a class-neighbor sparsity prior.
These operations are repeated until convergence is reached. Then, the label for the sample $q$ is predicted from the sparse coefficients: by a weighted combination of the cluster labels $y_k$ for regression, or by majority voting among $y_k$ with sparse components for classification.
A list of operations of the method for predicting an attribute for a sample according to an embodiment of the present application is given in the original publication as an algorithm listing summarizing steps S100 to S600.
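For orientation, here is a hedged end-to-end sketch of steps S300 to S600; tree.leaf_samples is a hypothetical accessor returning the training samples stored at the leaf reached by q, and the sparse class-neighbor refinement is deliberately simplified to the affine-hull projection, since its equations are not reproduced in the text.

```python
import numpy as np

def predict_attribute(forest, q, regression=True, n_clusters=5):
    """End-to-end sketch of steps S300-S600 for one test sample q,
    reusing overlapping_kmeans, affine_hull and nearest_hull above."""
    # S300-S400: follow q down every tree and merge the reached leaves.
    X_parts, y_parts = zip(*(tree.leaf_samples(q) for tree in forest))
    X, y = np.vstack(X_parts), np.concatenate(y_parts)
    # S500: cluster the merged samples into overlapping clusters.
    _, clusters = overlapping_kmeans(X, min(n_clusters, len(X)))
    valid = [idx for idx in clusters if len(idx) >= 2]
    hulls = [affine_hull(X[idx].T) for idx in valid]
    # S600: approximate q by its nearest affine hull, then fuse labels.
    k_star, _ = nearest_hull(np.asarray(q, float), hulls)
    members = valid[k_star]
    if regression:
        return float(np.mean(y[members]))            # simplified label fusion
    vals, counts = np.unique(y[members], return_counts=True)
    return vals[np.argmax(counts)]                   # majority vote
```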
The present application also relates to a system for predicting an attribute for an image sample according to an embodiment of the present application.
Fig. 4 illustrates a system 2000 for predicting an attribute for an image sample according to an embodiment of the present application. The system 2000 will be described with reference to the training set $S$ mentioned above.
As shown in Fig. 4, the system 2000 comprises a splitting device 100, a determining device 200, a clustering device 300 and a predicting device 400.
As shown in Fig. 5, the splitting device 100 comprises a clustering unit 110, a first assigning unit 120, and a splitting unit 130. The training set $S$ is inputted into the clustering unit 110. The clustering unit 110 is configured for generating a plurality of image subsets from the training set by, for example, sampling. Further, the clustering unit 110 clusters the training samples $S_j$ at the node $j$ into two classes $\{c_k\}_{k=1,2}$ by adopting, for example, the well-known K-means technique.
The first assigning unit 120 is electrically connected with the clustering unit 110. The first assigning unit 120 is configured for assigning weights to the clustered classes according to the output of the clustering unit 110. The weight is the same as that mentioned in step S220, the detailed description of which will not be repeated herein.
The splitting unit 130 is electrically connected with the first assigning unit 120. Based on the assigned weights, the splitting unit 130 may cost-sensitively split the local samples $S_j$ at a node $j$ into $S_j^L$ and $S_j^R$. The splitting unit 130 may employ the factor $f(p_k)$ to perform the cost-sensitive splitting of the local samples $S_j$. The splitting unit 130 may stop splitting when a maximum depth is reached or the local sample size $|S_j|$ falls below a fixed threshold. For classification, the splitting unit 130 may also stop splitting if the information gain described in Equation (1) falls below a fixed threshold. For regression, the information gain can be replaced by the above-mentioned label variance. Accordingly, the decision forest is obtained.
The determining device 200 is electrically connected with the splitting device 100. The generated decision forest is outputted by the splitting device 100 to the determining device 200. A test sample is inputted into the determining device 200. Then, the determining device 200 is configured for determining the nodes that can be reached by the test image sample in each of the decision trees and thus determines paths of the nodes for the test image sample in the decision forest.
The clustering device 300 is electrically connected with the determining device 200. The clustering device 300 is configured for merging the training samples at all leaf nodes in each of the determined paths, carving a broader decision region covering as many minority samples as possible. That is, the clustering device 300 merges all the sample sets $\{S_t\}_{t=1}^{T}$ of the leaf nodes that may be reached by the test sample into a larger set $S^{*} = \cup_{t=1}^{T} S_t$. Then, the clustering device 300 clusters the merged training samples into overlapping clusters.
As shown in Fig. 6, the clustering device 300 further comprises a calculating unit 310 and a second assigning unit 320.
The calculating unit 310 is configured for calculating a biased inter-point distance between two of the merged training samples. For example, the biased inter-point distance may be the one defined in Equation (4).
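Since Equation (4) is not reproduced in this excerpt, the following sketch only mirrors its verbal definition (same attribute: Euclidean distance scaled by a factor of at least one; different attributes: scaled by a factor below one); the factor values are illustrative:

    import numpy as np

    def biased_distance(x_i, x_j, y_i, y_j, same=1.0, diff=0.5):
        # Euclidean distance, biased by whether the attributes agree.
        d = np.linalg.norm(np.asarray(x_i, float) - np.asarray(x_j, float))
        return d * (same if y_i == y_j else diff)      # same >= 1, diff < 1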
The second assigning unit 320 is electrically connected with the calculating unit 310. The biased inter-point distance is outputted by the calculating unit 310 to the second assigning unit 320. The second assigning unit 320 is configured for assigning each of the merged training samples to at least one cluster based on the biased inter-point distance. For example, the second assigning unit 320 allows the clusters to overlap with each other by relaxing the cluster assignment of a sample xi from its nearest centroid alone to more than one nearby centroid in each iteration (ω = 0.8 empirically).
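The precise relaxation rule is not spelled out in this excerpt; one natural reading, sketched below under that assumption, is that a sample joins every cluster whose centroid lies within 1/ω of its nearest-centroid distance:

    import numpy as np

    def overlapping_assignments(X, centroids, omega=0.8):
        # Pairwise sample-to-centroid distances: shape (n_samples, n_centroids).
        D = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        nearest = D.min(axis=1, keepdims=True)
        # Assumed rule: admit every centroid within nearest/omega, so each
        # sample belongs to at least one, and possibly several, clusters.
        return D <= nearest / omega                    # boolean membership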
The predicting device 400 is electrically connected with the clustering device 300. The overlapping clusters are outputted by the clustering device 300 to the predicting device 400. Then, the predicting device 400 is configured for predicting an attribute for the test sample from the overlapping clusters.
As shown in Fig. 7, the predicting device 400 comprises a finding unit 410, an estimating unit 420, an updating unit 430 and a predicting unit 440.
The finding unit 410 is configured for finding a cluster of the overlapping clusters which approximates the test sample. The estimating unit 420 is electrically connected with the finding unit 410 and is configured for calculating a coefficient estimate for the test image sample from the found cluster. The updating unit 430 is electrically connected with the estimating unit 420 and is configured for updating the coefficient estimate via a class-neighbor approximation. The predicting unit 440 is electrically connected with the updating unit 430 and is configured for predicting the attribute for the test image sample using the updated coefficient estimate. The operations of the predicting device 400 are substantially the same as those described in step S600.
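A hedged skeleton of that pipeline in Python, with ridge-regularized least squares standing in for the coefficient estimate and a simple disagreement-damping loop standing in for the class-neighbor approximation; lam, gamma and eps echo the parameters λ, γ and ε above, while everything else is illustrative:

    import numpy as np

    def ridge_coefficients(A, q, lam):
        # Regularized least-squares coefficients of q over the columns of A.
        return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ q)

    def predict_via_cluster(q, clusters, lam=0.1, gamma=0.1, eps=1e-4,
                            max_iter=50):
        # clusters: list of (X_c, y_c) pairs; rows of X_c are cluster samples.
        def recon_error(X_c):                          # finding unit
            b = ridge_coefficients(X_c.T, q, lam)
            return np.linalg.norm(q - X_c.T @ b)
        X_c, y_c = min(clusters, key=lambda c: recon_error(c[0]))

        beta = ridge_coefficients(X_c.T, q, lam)       # estimating unit
        y_hat = y_c[np.argmax(np.abs(beta))]
        for _ in range(max_iter):                      # updating unit
            # Damp samples whose label disagrees with the tentative class,
            # then re-estimate until the coefficients stabilize.
            scale = np.where(y_c == y_hat, 1.0, 1.0 / (1.0 + gamma))
            beta_new = scale * ridge_coefficients(X_c.T * scale, q, lam)
            if np.linalg.norm(beta_new - beta) < eps:
                beta = beta_new
                break
            beta = beta_new
            y_hat = y_c[np.argmax(np.abs(beta))]
        # Predicting unit: the final label then follows the prediction rule
        # sketched after the convergence discussion above.
        return beta, y_hat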
The present application also relates to a system 3000 for predicting an attribute for a test sample according to an embodiment of the present application.
As shown in Fig. 8, the system 3000 comprises a memory 3100 that stores executable components and a processor 3200 coupled to the memory 3100 and configured for executing the executable components to perform operations of the system 3000. The executable components comprise: a splitting component 3110 configured for obtaining a plurality of image subsets from a training set comprising a plurality of training image samples, and splitting progressively each of the subsets to generate a decision forest for prediction; a determining component 3120 configured for determining paths of the nodes in the decision forest for a test image sample; a clustering component 3130 configured for merging the training samples at all leaf nodes in each of the determined paths, and clustering locally all the merged training samples to obtain overlapping clusters; and a predicting component 3140 configured for predicting, from the overlapping clusters, an attribute for the test sample.
According to an embodiment of the present application, the splitting component 3110 further comprises: a clustering sub-component for clustering the training image samples into different classes at each node of the decision forest; a first assigning sub-component for assigning weights to the clustered classes, wherein a greater weight is assigned to the class having fewer training image samples, and a smaller weight is assigned to the class having more training image samples; and a splitting sub-component for splitting the training image samples based on the assigned weights.
According to an embodiment of the present application, the clustering component 3130 further comprises: a calculating sub-component for calculating a biased inter-point distance between two of the merged training image samples; and a second assigning sub-component for assigning, based on the biased inter-point distance, one of the merged training image samples to at least one cluster to obtain the overlapping clusters, wherein the calculating sub-component calculates the biased inter-point distance by calculating a Euclidean distance of the two of the merged training image samples multiplied by a factor equal to or greater than one if the two of the merged training image samples have a same attribute, and otherwise by calculating the Euclidean distance multiplied by a factor less than one.
According to an embodiment of the present application, the predicting component 3140 further comprises: a finding sub-component for finding a cluster of the overlapping clusters which approximates the test image sample; a coefficient estimate calculating sub-component for calculating a coefficient estimate for the test image sample from the found cluster; an updating sub-component for updating the coefficient estimate via a class-neighbor approximation; and a predicting sub-component for predicting the attribute for the test image sample using the updated coefficient estimate.
Embodiments within the scope of the present invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof. Apparatus within the scope of the present invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method actions within the scope of the present invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output.
Embodiments within the scope of the present invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Generally, a computer will include one or more mass storage devices for storing data files.
Embodiments within the scope of the present invention include computer-readable media for carrying or having computer-executable instructions, computer-readable instructions, or data structures stored thereon. Such computer-readable media may be any available media that are accessible by a general-purpose or special-purpose computer system. Examples of computer-readable media include physical storage media such as RAM, ROM, EPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other media which can be used to carry or store desired program code means in the form of computer-executable instructions, computer-readable instructions, or data structures and which may be accessed by a general-purpose or special-purpose computer system. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits). While particular embodiments of the present invention have been shown and described, changes and modifications may be made to such embodiments without departing from the true scope of the invention.
Although the preferred examples of the present invention have been described, those skilled in the art can make variations or modifications to these examples upon knowing the basic inventive concept. The appended claims are intended to be construed as comprising the preferred examples and all the variations or modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make variations or modifications to the present invention without departing from the spirit and scope of the present invention. As such, if these variations or modifications belong to the scope of the claims and their equivalents, they also fall within the scope of the present invention.

Claims (19)

  1. A method for predicting an attribute for an image sample, comprising:
    obtaining a plurality of image subsets from a training set comprising a plurality of training image samples;
    splitting progressively each of the image subsets to generate a decision forest for prediction;
    determining paths of nodes in the decision forest for a test image sample;
    merging the training image samples at all leaf nodes in each of the determined paths;
    clustering all the merged training image samples to obtain overlapping clusters, each of the merged training image samples being clustered into at least one of the overlapping clusters; and
    predicting, from the overlapping clusters, an attribute for the test image sample.
  2. The method according to claim 1, wherein the splitting comprises:
    clustering the training image samples into different classes at each node of the decision forest;
    assigning weights to the clustered classes, wherein a greater weight is assigned to the class having fewer training image samples, and a smaller weight is assigned to the class having more training image samples; and
    splitting the training image samples based on the assigned weights.
  3. The method according to claim 2, wherein the decision forest has a depth such that all training image samples in each of the classes have a same attribute.
  4. The method according to claim 2, wherein an information gain of the decision forest is lower than a fixed threshold.
  5. The method according to claim 1, wherein the training image samples at the leaf node of the decision forest have a size lower than a fixed threshold.
  6. The method according to claim 2, wherein the splitting comprises:
    splitting the training image samples by a cost-sensitive linear support vector machine for classification.
  7. The method according to claim 2, wherein the splitting comprises:
    splitting the training image samples by a cost-sensitive linear support vector regression for regression.
  8. The method according to claim 1, wherein the clustering comprises:
    calculating a biased inter-point distance between two of the merged training image samples; and
    assigning, based on the biased inter-point distance, each of the merged training image samples to at least one cluster to obtain the overlapping clusters,
    wherein the biased inter-point distance is a Euclidean distance of the two of the merged training image samples multiplied by a factor equal to or greater than one if the two of the merged training image samples have a same attribute, and otherwise the biased inter-point distance is the Euclidean distance multiplied by a factor less than one.
  9. The method according to claim 1, wherein the predicting comprises:
    finding a cluster of the overlapping clusters which approximates the test image sample;
    calculating a coefficient estimate for the test image sample from the found cluster;
    updating the coefficient estimate via a class-neighbor approximation;
    predicting the attribute for the test image sample using the updated coefficient estimate.
  10. A system for predicting an attribute for an image sample, comprising:
    a splitting device for obtaining a plurality of image subsets from a training set comprising a plurality of training image samples, and splitting progressively each of the subsets to generate a decision forest for prediction;
    a determining device being electrically connected with the splitting device and for determining paths of the nodes in the decision forest for a test image sample;
    a clustering device being electrically connected with the determining device and for merging the training samples at all leaf nodes in each of the determined paths, and clustering locally all the merged training samples to obtain overlapping clusters, each of which has at least two attributes; and
    a predicting device being electrically connected with the clustering device and for predicting, from the overlapping clusters, an attribute for the test sample.
  11. The system according to claim 10, wherein the splitting device further comprises:
    a clustering unit for clustering the training image samples into different classes at each node of the decision forest;
    a first assigning unit being electrically connected with the clustering unit and for assigning weights to the clustered classes, wherein a greater weight is assigned to the class having fewer training image samples, and a smaller weight is assigned to the class having more training image samples; and
    a splitting unit being electrically connected with the first assigning unit and for splitting the training image samples based on the assigned weights.
  12. The system according to claim 11, wherein the splitting unit is a cost-sensitive linear support vector machine for classification.
  13. The system according to claim 11, wherein the splitting unit is a cost-sensitive linear support vector regression for regression.
  14. The system according to claim 10, wherein the clustering device further comprises:
    a calculating unit for calculating a biased inter-point distance between two of the merged training image samples; and
    a second assigning unit being electrically connected with the calculating unit and for assigning, based on the biased inter-point distance, one of the merged training image samples to at least one cluster to obtain the overlapping clusters,
    wherein the calculating unit calculates the biased inter-point distance by calculating a Euclidean distance of the two of the merged training image samples multiplied by a factor equal to or greater than one if the two of the merged training image samples have a same attribute, and otherwise by calculating the Euclidean distance multiplied by a factor less than one.
  15. The system according to claim 10, wherein the predicting device further comprises:
    a finding unit for finding a cluster of the overlapping clusters which approximates the test image sample;
    an estimating unit being electrically connected with the finding unit and for calculating a coefficient estimate for the test image sample from the found cluster;
    an updating unit being electrically connected with the estimating unit and for updating the coefficient estimate via a class-neighbor approximation; and
    a predicting unit being electrically connected with the updating unit and for predicting the attribute for the test image sample using the updated coefficient estimate.
  16. A system for predicting an attribute for an image sample, comprising:
    a memory that stores executable components; and
    a processor electrically coupled to the memory that executes the executable components to perform operations of the system, wherein the executable components comprise:
    a splitting component configured for obtaining a plurality of image subsets from a training set comprising a plurality of training image samples, and splitting progressively each of the subsets to generate a decision forest for prediction;
    a determining component configured for determining paths of the nodes in the decision forest for a test image sample;
    a clustering component configured for merging the training samples at all leaf nodes in each of the determined paths, and clustering locally all the merged training samples to obtain overlapping clusters; and
    a predicting component configured for predicting, from the overlapping clusters, an attribute for the test sample.
  17. The system according to claim 16, wherein the splitting component further comprises:
    a clustering sub-component configured for clustering the training image samples into different classes at each node of the decision forest;
    a first assigning sub-component configured for assigning weights to the clustered classes, wherein a greater weight is assigned to the class having fewer training image samples, and a smaller weight is assigned to the class having more training image samples; and
    a splitting sub-component configured for splitting the training image samples based on the assigned weights.
  18. The system according to claim 16, wherein the clustering component further comprises:
    a calculating sub-component configured for calculating a biased inter-point distance between two of the merged training image samples; and
    a second assigning sub-component configured for assigning, based on the biased inter-point distance, one of the merged training image samples to at least one cluster to obtain the overlapping clusters,
    wherein the calculating sub-component calculates the biased inter-point distance by calculating a Euclidean distance of the two of the merged training image samples multiplied by a factor equal to or greater than one if the two of the merged training image samples have a same attribute, and otherwise by calculating the Euclidean distance multiplied by a factor less than one.
  19. The system according to claim 16, wherein the predicting component further comprises:
    a finding sub-component configured for finding a cluster of the overlapping clusters which approximates the test image sample;
    an estimating sub-component configured for calculating a coefficient estimate for the test image sample from the found cluster;
    an updating sub-component configured for updating the coefficient estimate via a class-neighbor approximation; and
    a predicting sub-component configured for predicting the attribute for the test image sample using the updated coefficient estimate.