EP2100259A2 - Classification method for an object's image and corresponding device - Google Patents

Classification method for an object's image and corresponding device

Info

Publication number
EP2100259A2
EP2100259A2 (application EP07871938A)
Authority
EP
European Patent Office
Prior art keywords
images
subset
image
classification
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP07871938A
Other languages
German (de)
French (fr)
Inventor
Sid Ahmed Berrani
Christophe Garcia
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
France Telecom SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by France Telecom SA filed Critical France Telecom SA
Publication of EP2100259A2 publication Critical patent/EP2100259A2/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns

Definitions

  • the present invention is in the field of image processing.
  • the invention relates to a method of classifying images of objects of the same category, according to visual criteria related to this category.
  • the location of objects in the image or video is an essential first step before recognition.
  • This step aims to extract only the pieces of the image each containing an object to detect. These pieces of images are then passed to the recognition module for identification.
  • object recognition techniques are very sensitive to the quality of the extracted objects and in particular to their positions in the image pieces.
  • object recognition modules have generally learned to recognize an object in a well-defined position in an image. The performance of object recognition systems therefore degrades significantly when the object is not in this determined position.
  • This classification is essential in the learning phase of object recognition systems, so that learning is effective, but also in the identification phase, so that recognition is most likely to succeed. If several object recognition systems are available, each specialized in a specific position of the object to be recognized, the classifier may be used, during the identification phase, to route the image pieces to the appropriate recognition system, depending on the positions of the objects in these image pieces.
  • the classification of faces in poses has given rise to many works following essentially three approaches.
  • the first approach is based on tracking algorithms to estimate, in the images of a video, a model of a face, and also to deduce the pose of this face.
  • Two examples of work following this first approach are described in the following articles:
  • the methods according to this second approach are limited by the performance of the automatic methods for detecting facial elements: these provide positions of facial elements which are not sufficiently precise to ensure a satisfactory classification rate, in particular for profile poses.
  • the third approach is based on statistical models of poses to be searched, constructed from a principal component analysis from thumbnail examples, or on binary masks to characterize a particular pose. Examples of work using principal component analysis are described in the following articles:
  • a binary mask is a reference mask that is applied to a grayscale face image, which is transformed into a binary image in black and white by thresholding from this mask, then a correlation is performed to determine the pose of the face.
  • the methods based on binary masks therefore use the more or less dark areas and shadows of the faces, which makes their effectiveness very sensitive to lighting: indeed a side lighting for example distorts the shadows that one expects to find on a face illuminated from the front in a grayscale image.
  • methods using a principal component analysis are linear methods, which makes them poorly robust to lighting variations. More generally, the techniques according to this third approach are very sensitive to significant variations in the faces, such as the presence of occluding elements such as glasses, a beard or a mustache, which significantly reduce their rates of good classification.
  • the distances used in these systems are, for example, Euclidean distances, or the so-called Mahalanobis distance. These distances are applicable between two image vectors only and are associated with other classification methods such as nearest neighbor classification. The comparisons made for the classification are thus image-to-image comparisons. These systems are used to group images very close to each other. Unlike the invention, they do not take into account all the characteristics of the vectors of a class and their dispersion.
  • the present invention aims to solve the disadvantages of the prior art by providing a method and a classification device of images, which use a new applicable distance between an image descriptor vector and a set of image descriptor vectors.
  • the invention proposes a method of classifying an object image belonging to a category of objects, said method comprising a preliminary step of obtaining a subset of object images of said object category, said subset being associated with a classification criterion,
  • said method being characterized in that it further comprises:
  • Dist(q, Ej) = DMC({v1, v2, ..., vN} ∪ {q}) − DMC({v1, v2, ..., vN})
  • - Dist(q, Ej) is a distance between an image vector q corresponding to said image and the vectors of the images of said subset Ej, - {v1, v2, ..., vN} are the vectors v1 to vN of the images of said subset Ej,
  • - DMC({x1, ..., xP}) is an operator which returns the determinant of the covariance matrix of the vectors x1 to xP, assimilated to a number P of observations of a random vector variable xi, i being an index varying from 1 to P,
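As a hedged illustration (not part of the patent text), the DMC operator and the resulting set distance can be sketched in Python with NumPy; the names dmc and dist_to_subset are illustrative choices, not taken from the patent:

```python
import numpy as np

def dmc(vectors):
    """DMC: determinant of the covariance matrix of a set of vectors,
    treated as P observations of a random vector variable."""
    x = np.asarray(vectors, dtype=float)           # shape (P, d)
    return np.linalg.det(np.cov(x, rowvar=False))  # d x d covariance matrix

def dist_to_subset(q, subset_vectors):
    """Dist(q, Ej): how much the covariance determinant grows when the
    query vector q is added to the subset's image vectors."""
    return dmc(list(subset_vectors) + [list(q)]) - dmc(subset_vectors)
```

Note that for d-dimensional descriptors the sample covariance matrix is singular (zero determinant) unless the number of vectors exceeds d, so a sketch like this presumes descriptors of modest dimension or a prior dimensionality reduction.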
  • an image classification method applicable to still images is thus obtained, which is fast, independent of the efficiency of any system used upstream (for example for the detection of facial elements), and robust to significant variations between the images.
  • it makes it possible to classify different types of faces into frontal, semi-frontal and profile poses in a way that is robust to lighting variations and to the presence of occluding elements, both during the learning phase and during the identification phase.
  • said decision threshold is associated with said subset of images.
  • its distance with respect to several subsets of images corresponding to this criterion is calculated and compared with a decision threshold. Adapting the decision threshold to each subset makes it possible to take into account the heterogeneity of the subsets that obey the same classification criterion, so as to optimize the rate of good classification of the method according to the invention.
  • said decision threshold is equal to the smallest distance calculated between said subset and the images of a set of negative images that do not correspond to a classification criterion associated with said subset.
  • This choice of decision threshold makes it possible to obtain a classification that is coherent with respect to the learning images, which are divided into subsets of positive images, corresponding to a classification criterion, and a set of negative images, which do not obey this criterion.
  • said decision threshold is chosen so as to maximize the sum of: the rate of good image classification of another subset of images obeying the same classification criterion as said subset, and the image rejection rate of a set of negative images not corresponding to said classification criterion.
  • This choice of decision threshold makes it possible to improve the rate of good classification of the process according to the invention.
  • the invention also relates to a face recognition method using the object image classification method according to the invention.
  • the invention also relates to a device implementing the classification method according to the invention and the method of face recognition using this method.
  • the device and the method of face recognition have advantages similar to those of the classification method according to the invention.
  • the invention also relates to a computer program comprising instructions for implementing the classification method according to the invention or the face recognition method using it when it is executed on a computer.
  • FIG. 1 represents different phases of the classification method according to the invention
  • FIG. 2 represents a device implementing the classification method according to the invention
  • FIG. 3 represents different steps of a learning phase of the classification method according to the invention
  • FIG. 4 represents the contents of a learning database
  • FIG. 5 represents different stages of a phase of use of the classification method according to the invention
  • FIG. 6 represents more precisely the obtaining of a classification result by the method according to the invention during this phase of use
  • FIG. 7 represents a mode of obtaining the decision threshold associated with a subset of learning images.
  • the method according to the invention is applied to the classification of faces in poses, and more precisely it is used to determine if a face in an image is in frontal pose or not.
  • the object image classification method according to the invention can be used to classify any other type of object according to various classification criteria, for example to classify images of logos.
  • the use of the classification method according to the invention, which is carried out in a utilization phase Φ2 shown in FIG. 1, first requires the execution of a learning phase Φ1, detailed below, which is not repeated at each use of the method according to the invention.
  • the classification method according to the invention is typically implemented in software in a computer ORD represented in FIG. 2.
  • the learning phase Φ1 is for example implemented using a learning module MA, and the utilization phase Φ2 is implemented in a classification module MC, which receives an image I to be classified as input and returns a classification result Res.
  • the learning phase Φ1 makes it possible to fill the learning database BDD, to which the learning module MA and the classification module MC are connected.
  • This learning phase Φ1 comprises two steps a1 and a2, represented in FIG. 3, whose objective is to provide the training data necessary for the utilization phase Φ2.
  • The first step a1 consists in obtaining subsets of learning images. These learning images are in fact face images corresponding to face bounding boxes extracted after face detection in larger images.
  • Step a1 requires a learning image database, which comprises grayscale images representative of the pose to be learned, that is to say images of faces in the frontal position. These images are called "positive images", and form the set IP shown in FIG. 4.
  • the image database also includes a set IN of grayscale face images representing the other, non-frontal poses, referred to as "negative images". These positive and negative images represent different faces, represented on the same scale in each of these images. For frontal-pose classification, not all facial details are needed, so the images in the learning base have a resolution of only 40 pixels × 40 pixels, which is sufficient.
  • In step a1, the set IP of positive images is partitioned into subsets Ej, j being an index varying from 1 to M.
  • These subsets E1 to EM are associated with the classification criterion of the frontal pose, and are homogeneous, that is to say that each subset contains faces in frontal position sharing common visual criteria.
  • the set IP contains four subsets E1 to E4 such that:
  • - E1 contains images of faces with glasses
  • - E2 contains images of faces with whiskers
  • the subsets E1 to EM preferably characterize most types of faces in frontal pose.
  • This partition of the set of positive images IP is performed manually or automatically using grouping algorithms. These algorithms make it possible to group images that are very close to one another by using, for example, similarity measurements or Euclidean distances between two image vectors.
  • This partition into homogeneous subsets subsequently makes it possible to obtain more specialized, and therefore more powerful, sub-classifications of faces in frontal position.
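Purely as an illustrative sketch of such an automatic grouping (the patent does not specify a particular algorithm), a naive Euclidean-distance grouping of image vectors might look like this; the name group_images and the max_dist parameter are assumptions:

```python
import numpy as np

def group_images(vectors, max_dist):
    """Naive grouping: assign each image vector to the first group whose
    representative (its first member) lies within max_dist (Euclidean)."""
    groups = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        for g in groups:
            if np.linalg.norm(v - g[0]) < max_dist:
                g.append(v)
                break
        else:
            groups.append([v])
    return groups
```

A real system would more likely use an established clustering method, but the principle is the same: images whose vectors are close under a Euclidean or similarity measure end up in the same homogeneous subset.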
  • the set IN of negative images contains images of faces of all types in non-frontal poses, for example faces in left profile, in right profile, or in a semi-profile pose, with or without glasses, with a beard, etc.
  • the second step a2 of the learning phase Φ1 is the calculation of the classification parameters associated with each subset Ej of images.
  • the covariance matrix Σj of the subset Ej is then defined by: Σj = (1/N) Σi=1..N (vi − m)(vi − m)T, where m is the mean of the vectors v1 to vN of Ej and T denotes transposition.
  • the parameters Σj and det j make it possible, in the utilization phase Φ2, to calculate the distance of an image I to be classified from the subset Ej, while the decision threshold δj makes it possible to determine, in view of this distance, whether the image I can be classified in the subset Ej.
  • the parameters of a subset make it possible to classify the image I according to the invention in a category that is finer than the frontal pose only.
  • this embodiment of the classification method according to the invention thus makes it possible to select images of faces in frontal pose corresponding to this visual criterion only.
  • the utilization phase Φ2 is therefore divided into two steps b1 and b2, represented in FIG. 5.
  • - Dist(q, Ej) is the distance between an image vector q corresponding to the image I and the vectors of the images of the subset Ej, - {v1, v2, ..., vN} are the vectors v1 to vN of the images of the subset Ej,
  • - DMC({x1, ..., xP}) is an operator that returns the determinant of the covariance matrix of the vectors x1 to xP, assimilated to a number P of observations of a random vector variable xi, i being an index varying from 1 to P.
  • the distance between the image I and the subset Ej is thus calculated by subtracting the determinant det j of the matrix Σj from the determinant of the covariance matrix of the set formed by the vectors v1 to vN and the vector q.
  • the classification method according to the invention is a statistical classification method which is based on the study of the distribution of the descriptor vectors of the images in space.
  • the second step b2 is the comparison of the distance Dist(q, Ej) previously calculated with the decision threshold δj of the subset Ej, as represented in FIG. 6: - If the distance Dist(q, Ej) is below the decision threshold δj, the result of this sub-classification in the subset Ej is 1, that is to say that the image I can be classified in this subset of images,
  • - Otherwise, the result of this sub-classification in the subset Ej is 0, that is to say that the image I is either not frontal or does not correspond to the visual criteria associated with the subset of images Ej.
  • a value 1 of the result Res indicates a frontal pose
  • a value 0 of the result Res indicates a non-frontal pose.
  • the subsets E1 to EM being representative of most types of faces in frontal pose, if the image I represents a face in frontal pose, the result of at least one of these sub-classifications will be 1 and the final result Res will also be 1.
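Assuming descriptor vectors and per-subset thresholds are already available, steps b1 and b2 over all subsets E1 to EM can be sketched as follows (the function names dmc_dist and classify are illustrative, not from the patent):

```python
import numpy as np

def dmc_dist(q, vectors):
    """Dist(q, Ej): det(cov(Ej ∪ {q})) minus det(cov(Ej))."""
    def det_cov(x):
        return np.linalg.det(np.cov(np.asarray(x, dtype=float), rowvar=False))
    return det_cov(list(vectors) + [list(q)]) - det_cov(vectors)

def classify(q, subsets, thresholds):
    """Res is 1 (frontal pose) as soon as one sub-classification accepts
    the image vector q, i.e. its distance falls below that subset's
    decision threshold delta; 0 (non-frontal pose) otherwise."""
    for vectors, delta in zip(subsets, thresholds):
        if dmc_dist(q, vectors) < delta:   # step b1 then step b2
            return 1
    return 0
```

The early return mirrors the OR over sub-classifications: the remaining subsets need not be tested once one accepts the image.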
  • a first way of obtaining the decision threshold δj consists in choosing the value of this threshold equal to the smallest distance calculated between each negative image contained in the set IN and the subset Ej.
  • the calculation of these distances between the negative images and the subset Ej uses the same formula as in step b1.
  • a second way of obtaining the decision threshold δj uses another subset E'j of positive images, with visual criteria similar to those of the images of the subset Ej, and the set IN of negative images.
  • the threshold value δj is chosen so as to maximize the sum of the rate of good classification of the images of the subset E'j and the rejection rate of the images of the set IN.
  • the decision threshold δj retained is the one that maximizes this sum.
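This second way of choosing the threshold can be sketched as a scan over candidate values; this is a hedged illustration, the candidate set and the function name best_threshold being assumptions rather than the patent's exact procedure:

```python
import numpy as np

def dmc_dist(q, vectors):
    """Dist(q, Ej) = det(cov(Ej ∪ {q})) - det(cov(Ej))."""
    def det_cov(x):
        return np.linalg.det(np.cov(np.asarray(x, dtype=float), rowvar=False))
    return det_cov(list(vectors) + [list(q)]) - det_cov(vectors)

def best_threshold(subset, positives, negatives):
    """Pick the threshold maximizing (good-classification rate on the
    positive validation images E'j) + (rejection rate on the set IN)."""
    d_pos = [dmc_dist(p, subset) for p in positives]
    d_neg = [dmc_dist(n, subset) for n in negatives]
    best, best_score = None, -1.0
    for t in d_pos + d_neg:                            # candidate thresholds
        accept = sum(d < t for d in d_pos) / len(d_pos)
        reject = sum(d >= t for d in d_neg) / len(d_neg)
        if accept + reject > best_score:
            best_score, best = accept + reject, t
    return best
```

Scanning only the observed distances is sufficient here because the two rates change only at those values.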
  • In this embodiment, images of faces are classified according to a single classification criterion, that of the frontal pose, but it is possible to adapt the method to classify images of faces according to several classification criteria. For example, once an image I has been classified as non-frontal by the method described in this embodiment, a new execution of the classification method according to the invention is carried out on this image in a similar manner but with a different classification criterion; two classification criteria are then used. To do this, it suffices to appropriately adapt the positive-image and negative-image learning sets to each new classification criterion.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a classification method of an object's image belonging to a category of objects, said method comprising a prior step consisting of obtaining a subset (Ej) of object's images of said category, said subset being associated with a classification criterion, and said method being characterized in that it also comprises: - a calculation step (b1) of the distance between said image and said subset (Ej) according to the formula: Dist(q, Ej) = DMC({v1, v2, ..., vN} ∪ {q}) − DMC({v1, v2, ..., vN}) in which: - Dist(q, Ej) is the distance between an image vector q corresponding to said image and the vectors of the images of the subset Ej, - {v1, v2, ..., vN} are the vectors v1 to vN of the images of said subset Ej, - and DMC({x1, ..., xP}) is an operator which gives the determinant of the covariance matrix of the vectors x1 to xP, assimilated to P observations of a random vector variable xi, i being an index varying from 1 to P, - and a comparison step (b2) of the thus calculated distance with a decision threshold (δj).

Description

Procédé de classification d'une i mage d'objet et dispositif correspondant Method of classifying an object image and corresponding device
La présente invention se situe dans le domaine du traitement d'images.The present invention is in the field of image processing.
Plus précisément, l'invention concerne un procédé de classification d'images d'objets d'une même catégorie, suivant des critères visuels liés à cette catégorie.More specifically, the invention relates to a method of classifying images of objects of the same category, according to visual criteria related to this category.
Dans les systèmes automatiques de reconnaissance d'objets dans une image ou dans une vidéo, la localisation des objets dans l'image ou la vidéo constitue une première étape indispensable avant la reconnaissance. Il s'agit de la détection d'objet. Cette étape a pour objectif d'extraire uniquement les morceaux de l'image contenant chacun un objet à détecter. Ces morceaux d'images sont ensuite passés au module de reconnaissance pour identification.In automatic object recognition systems in an image or a video, the location of objects in the image or video is an essential first step before recognition. This is the object detection. This step aims to extract only the pieces of the image each containing an object to detect. These pieces of images are then passed to the recognition module for identification.
Ces techniques de reconnaissance d'objets sont cependant très sensibles à la qualité des objets extraits et en particulier à leurs positions dans les morceaux d'image. En effet les modules de reconnaissance d'objets ont en général appris à reconnaître un objet dans une position bien déterminée dans une image. Les performances des systèmes de reconnaissance d'objets se dégradent donc significativement lorsque l'objet n'est pas dans cette position déterminée. Il est donc essentiel de disposer d'un classificateur d'images qui permette de classer les morceaux d'image suivant la position de l'objet extrait dans ces morceaux, afin de ne présenter au module de reconnaissance que les morceaux d'image dans lesquels l'objet extrait est dans une position adaptée pour sa reconnaissance. Cette classification est indispensable dans la phase d'apprentissage des systèmes de reconnaissance d'objets, pour que l'apprentissage soit efficace, mais aussi dans la phase d'identification, pour que la reconnaissance ait le plus de chances d'aboutir. Si l'on dispose de plusieurs systèmes de reconnaissance d'objets spécialisés chacun dans une position déterminée de l'objet à reconnaître, le classificateur permet éventuellement d'aiguiller, lors de la phase d'identification, les morceaux d'image vers les systèmes de reconnaissance d'images adéquats, en fonction des positions des objets dans ces morceaux d'image.These object recognition techniques, however, are very sensitive to the quality of the extracted objects and in particular to their positions in the image pieces. Indeed, object recognition modules have generally learned to recognize an object in a well-defined position in an image. The performance of the object recognition systems therefore degrade significantly when the object is not in this determined position. 
It is therefore essential to have an image classifier that can classify the image pieces according to the position of the extracted object in these pieces, in order to present to the recognition module only the image pieces in which the extracted object is in a position adapted for its recognition. This classification is essential in the learning phase of object recognition systems, so that learning is effective, but also in the identification phase, so that recognition is most likely to succeed. If there are several specialized object recognition systems each in a specific position of the object to be recognized, the classifier may be used to refer, during the identification phase, the image pieces to the systems. recognition of appropriate images, depending on the positions of the objects in these image pieces.
Ces systèmes sont surtout appliqués à la reconnaissance de visages. Actuellement les systèmes de reconnaissance de visages ne fonctionnent de manière optimale que sur des images de visages en position frontale. C'est pourquoi la classification d'images de visages suivant leur pose, frontale, semi- frontale ou de profil, est un enjeu important dans le domaine de l'analyse faciale.These systems are mostly applied to face recognition. Currently face recognition systems work optimally only on face images in the frontal position. This is why the classification of face images according to their pose, frontal, semi-frontal or profile, is an important issue in the field of facial analysis.
La classification de visages en poses a ainsi donné lieu à de nombreux travaux suivant essentiellement trois approches. La première approche se base sur des algorithmes de suivi pour estimer dans les images d'une vidéo le modèle d'un visage, et pour déduire également la pose de ce visage. Deux exemples de travaux suivant cette première approche sont décrits dans les articles suivants:The classification of faces in poses has given rise to many works following essentially three approaches. The first approach is based on tracking algorithms to estimate the image of a video model of a face, and also to deduce the pose of this face. Two examples of work following this first approach are described in the following articles:
- "Face Tracking and Pose Estimation Using Affine Motion Parameters", de P. Yao et G. Evans, publié en 2001 à l'occasion de la douzième conférence SCIA, d'après l'anglais "Scandinavian Conférence on Image Analysis",- "Face Tracking and Pose Estimation Using Affine Motion Parameters", by P. Yao and G. Evans, published in 2001 on the occasion of the twelfth SCIA conference, according to the English "Scandinavian Conference on Image Analysis",
- et " Face pose estimation System by combining hybrid ICA-SVM learning and re-registration" de K. Seo, I. Cohen, S. You et U. Neumann, publié en janvier 2004 à l'occasion d'une conférence ACCV, d'après l'anglais " Asian Conférence on Computer Vision". Les techniques selon cette approche utilisent une information temporelle pour estimer le mouvement d'un visage d'une image à une autre, et présentent donc l'inconvénient d'être limitées aux vidéos, et de ne pas être applicables à la classification de visages extraits à partir d'images fixes. La deuxième approche utilise les positions des éléments faciaux, tels que les yeux, le nez et la bouche, et des règles de biométrie du visage, pour déduire la pose d'un visage dans une image. Un exemple de classification de visages en poses utilisant la détection d'éléments faciaux est donné dans l'article "Pose classification of human faces by weighting mask function approach", de C. Lin et K. -C. Fan, publié dans le numéro 24 de la revue "Pattern Récognition Letters".- and "Face pose estimation System by combining hybrid ICA-SVM learning and re-registration" by K. Seo, I. Cohen, S. You and U. Neumann, published in January 2004 at the time of a conference ACCV, according to English "Asian Conference on Computer Vision". The techniques according to this approach use time information to estimate the movement of a face from one image to another, and therefore have the disadvantage of being limited to videos, and not to be applicable to the classification of extracted faces. from still images. The second approach uses the positions of the facial elements, such as the eyes, the nose and the mouth, and biometrics rules of the face, to deduce the pose of a face in an image. An example of classification of faces in poses using the detection of facial elements is given in the article "Pose classification of human faces by weighting mask function approach", of C. Lin and K. -C. 
Fan, published in issue 24 of the journal "Pattern Recognition Letters".
Les méthodes selon cette deuxième approche sont limitées par les performances des méthodes automatiques de détection d'éléments faciaux: celles-ci fournissent des positions d'éléments faciaux qui ne sont pas suffisamment précises pour assurer un taux de bonne classification satisfaisant, en particulier pour des poses de profil.The methods according to this second approach are limited by the performance of the automatic methods for detecting facial elements: these provide positions of facial elements which are not sufficiently precise to ensure a satisfactory classification rate, in particular for profile poses.
La troisième approche se base sur des modèles statistiques des poses à rechercher, construits à partir d'une analyse en composantes principales à partir d'exemples d'imagettes, ou encore sur des masques binaires pour caractériser une pose particulière. Des exemples de travaux utilisant une analyse en composantes principales sont décrits dans les articles suivants:The third approach is based on statistical models of poses to be searched, constructed from a principal component analysis from thumbnail examples, or on binary masks to characterize a particular pose. Examples of work using principal component analysis are described in the following articles:
- "View-Based and Modular Eigenspaces for Face Récognition", de Alex Pentland, Baback Moghaddam et Thad Starner, publié en 1994 à l'occasion de la treizième conférence "Institute of Electrical and- "View-Based and Modular Eigenspaces for Face Recognition", by Alex Pentland, Baback Moghaddam and Thad Starner, published in 1994 on the occasion of the thirteenth "Institute of Electrical and
Electronic Engineer (IEEE) Conférence on Computer Vision and Pattern Récognition ",Electronic Engineer (IEEE) Conference on Computer Vision and Pattern Recognition ",
- et "Eigenfaces for récognition", de M. Turk et A. Pentland, publié dans le troisième volume du premier numéro de la revue "Journal of Cognitive Neuroscience".- and "Eigenfaces for Recognition", by M. Turk and A. Pentland, published in the third volume of the first issue of the journal "Journal of Cognitive Neuroscience".
Il est de plus à noter que dans l'article de C. Lin et K. -C. Fan cité précédemment, on utilise des masques binaires pour classifier des visages en poses, combinant ainsi détection d'éléments faciaux et utilisation de modèles caractéristiques. Un masque binaire est un masque de référence que l'on applique à une image de visage en niveaux de gris, laquelle est transformée en image binaire en noir et blanc par seuillage à partir de ce masque, puis une corrélation est effectuée pour déterminer la pose du visage. Les méthodes à base de masques binaires utilisent donc les zones plus ou moins foncées et les zones d'ombre des visages, ce qui rend leur efficacité très sensible à l'éclairage: en effet un éclairage de côté par exemple fausse les zones d'ombre que l'on s'attend à trouver sur un visage éclairé de face dans une image en niveaux de gris. De même les méthodes utilisant une analyse en composantes principales sont des méthodes linéaires ce qui les rend très peu robustes aux variations lumineuses. Plus généralement les techniques selon cette troisième approche sont très sensibles aux variations importantes des visages, telles que la présence d'éléments occultants comme des lunettes, de la barbe ou de la moustache, qui diminuent significativement leurs taux de bonne classification.It is further noted that in the article by C. Lin and K. -C. Fan cited above, we use binary masks to classify faces in poses, thus combining detection of facial elements and use of characteristic models. A binary mask is a reference mask that is applied to a grayscale face image, which is transformed into a binary image in black and white by thresholding from this mask, then a correlation is performed to determine the pose of the face. 
The methods based on binary masks therefore use the more or less dark areas and shadows of the faces, which makes their effectiveness very sensitive to lighting: indeed a side lighting for example distorts the shadows that one expects to find on a face illuminated from the front in a grayscale image. Similarly methods using a principal component analysis are linear methods which makes them very resistant to light variations. More generally, the techniques according to this third approach are very sensitive to significant variations in the faces, such as the presence of blackout elements such as glasses, a beard or a mustache, which significantly reduce their rates of good classification.
Other image classification systems, such as the classification system described in the article "Application d'un processus itératif de classification dirigée à un site urbain et périurbain algérien" by N. Ouarab and Y. Smara, published in 1997 in the book "Télédétection des milieux urbains et périurbains" by the Agence Universitaire de la Francophonie, use the notion of distance between images.
The distances used in these systems are, for example, Euclidean distances, or the so-called Mahalanobis distance. These distances apply only between two image vectors and are combined with other classification methods such as nearest-neighbor classification. The comparisons performed for classification are thus image-to-image comparisons. These systems are used to group images that are very close to one another. Unlike the invention, however, they do not take into account the full set of characteristics of the vectors of a class and their dispersion.
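As an illustration of the image-to-image comparison used by these prior-art systems, the following minimal Python sketch (not part of the patent; NumPy and the toy 2-D vectors are assumptions for illustration) assigns a query vector the label of its nearest reference under the Euclidean distance:

```python
import numpy as np

def nearest_neighbor_label(q, references, labels):
    # Prior-art style: compare the query vector q to each reference
    # vector individually, using the Euclidean distance.
    dists = [np.linalg.norm(np.asarray(q, float) - np.asarray(r, float))
             for r in references]
    return labels[int(np.argmin(dists))]
```

Such a comparison ignores the dispersion of the vectors inside a class, which is precisely the limitation addressed by the invention.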
The aim of the present invention is to overcome the drawbacks of the prior art by providing an image classification method and device that use a new distance applicable between an image descriptor vector and a set of image descriptor vectors.
To this end, the invention proposes a method of classifying an object image belonging to a category of objects, said method comprising a preliminary step of obtaining a subset of images of objects of said category, said subset being associated with a classification criterion, and said method being characterized in that it further comprises:
- a step (b1) of calculating the distance between said image (I) and said subset (Ej) according to the formula: Dist(q, Ej) = DMC({v1, v2, ..., vN} ∪ {q}) − DMC({v1, v2, ..., vN}), where:
- Dist(q, Ej) is a distance between an image vector q corresponding to said image and the vectors of the images of said subset Ej, - {v1, v2, ..., vN} are the vectors v1 to vN of the images of said subset Ej,
- and DMC({x1, ..., xP}) is an operator that returns the determinant of the covariance matrix of the vectors x1 to xP, treated as a number P of observations of a random vector variable xi, i being an index varying from 1 to P,
- and a step (b2) of comparing the distance thus calculated with a decision threshold (δj).
Thanks to the invention, an image classification method applicable to still images is obtained that is fast, independent of the efficiency of any system used upstream (for example for element detection), and robust to significant variations between images. In particular, when used for face recognition, it makes it possible to classify different types of faces into frontal, semi-frontal and profile poses in a way that is robust to lighting variations and to the presence of occluding elements, both during the learning phase and during the identification phase.
According to a preferred characteristic, said decision threshold is associated with said subset of images. To determine whether an image to be classified meets a classification criterion, its distance to several subsets of images corresponding to this criterion is calculated and compared with a decision threshold. Adapting the decision threshold to each subset makes it possible to take into account the heterogeneity of the subsets obeying the same classification criterion, so as to optimize the correct-classification rate of the method according to the invention.
According to another preferred characteristic, said decision threshold is equal to the smallest distance calculated between said subset and the images of a set of negative images that do not correspond to a classification criterion associated with said subset.
This choice of decision threshold makes it possible to obtain a classification that is consistent with the training images, which are divided into subsets of positive images corresponding to a classification criterion and a set of negative images that do not obey this criterion. According to another preferred characteristic, said decision threshold is chosen so as to maximize the sum of: the correct-classification rate of the images of another subset of images obeying the same classification criterion as said subset, and the rejection rate of the images of a set of negative images not corresponding to said classification criterion.
This choice of decision threshold makes it possible to improve the correct-classification rate of the method according to the invention.
The invention also relates to a face recognition method using the object image classification method according to the invention. The invention also relates to a device implementing the classification method according to the invention and the face recognition method using it.
The device and the face recognition method have advantages similar to those of the classification method according to the invention.
The invention further relates to a computer program comprising instructions for implementing the classification method according to the invention, or the face recognition method using it, when executed on a computer.
Other features and advantages will become apparent on reading the description of a preferred embodiment given with reference to the figures, in which:
- figure 1 shows the different phases of the classification method according to the invention,
- figure 2 shows a device implementing the classification method according to the invention,
- figure 3 shows the different steps of a learning phase of the classification method according to the invention, - figure 4 shows the contents of a learning database,
- figure 5 shows the different steps of a phase of use of the classification method according to the invention,
- figure 6 shows in more detail how a classification result is obtained by the method according to the invention during this phase of use, - figure 7 shows one way of obtaining the decision threshold associated with a subset of training images.
According to a preferred embodiment of the invention, the method according to the invention is applied to the classification of faces by pose, and more precisely it is used to determine whether a face in an image is in a frontal pose or not. However, the object image classification method according to the invention can be used to classify any other type of object according to various classification criteria, for example to classify images of logos.
Moreover, the use of the classification method according to the invention, which takes place in a utilization phase φ2 shown in figure 1, first requires the execution of a learning phase φ1, detailed below, which is not repeated on each subsequent use of the method according to the invention.
The classification method according to the invention is typically implemented in software on a computer ORD shown in figure 2. The learning phase φ1 is, for example, implemented by a learning module MA, and the utilization phase φ2 is implemented in a classification module MC, which receives an image I to be classified as input and returns a classification result Res. The learning phase φ1 fills the learning database BDD, to which the learning module MA and the classification module MC are both connected.
This learning phase φ1 comprises two steps a1 and a2, shown in figure 3, whose purpose is to provide the training data needed by the utilization phase φ2.
The first step a1 is obtaining subsets of training images. These training images are in fact face images corresponding to bounding boxes of faces extracted after face detection in larger images. Step a1 requires a training image database comprising grayscale images representative of the pose to be learned, that is to say images of faces in the frontal position. These images are called "positive images" and form the set IP shown in figure 4. The training image database also comprises a set IN of grayscale face images representing the other, non-frontal poses, called "negative images". These positive and negative images show different faces, represented at the same scale in each image. For frontal-pose classification, since not all facial details are needed, the images of the training database have an image resolution of only 40 pixels * 40 pixels, which proves sufficient.
In step a1, the set IP of positive images is partitioned into subsets Ej, j being an index varying from 1 to M. These subsets E1 to EM are associated with the frontal-pose classification criterion and are homogeneous, that is to say that each subset contains faces in the frontal position sharing common visual criteria. For example, the set IP contains four subsets E1 to E4 such that:
- E1 contains images of faces with glasses, - E2 contains images of faces with mustaches,
- E3 contains images of smiling faces without mustaches or glasses,
- E4 contains images of neutral faces without mustaches or glasses. The subsets E1 to EM preferably characterize most types of faces in a frontal pose. This partition of the set IP of positive images is performed manually, or automatically using clustering algorithms. These algorithms group images that are very close to one another, using for example similarity measures or Euclidean distances between two image vectors. This partition into homogeneous subsets subsequently makes it possible to obtain more specialized sub-classifications of faces in the frontal position, in an efficient manner.
The set IN of negative images contains images of faces of all types in non-frontal poses, for example showing a left profile or a right profile, or in a semi-profile pose, with or without glasses, with a beard, etc.
The second step a2 of the learning phase φ1 is the computation of the classification parameters associated with each subset Ej of images.
These parameters are part of the training data and are as follows:
- The covariance matrix Σj of the subset Ej. If, for example, it contains N face images, N being an integer, these images are described by N vectors vi, i being an index varying from 1 to N. These vectors contain the 40*40 gray-level values corresponding to the 40*40 pixels of each image of the subset Ej. The covariance matrix Σj is then defined by the standard covariance of these vectors: Σj = (1/N) · Σi=1..N (vi − v̄)(vi − v̄)^T, where v̄ is the mean of the vectors v1 to vN,
- the determinant detj of the covariance matrix Σj,
- and a decision threshold δj, the obtaining of which is detailed below. The parameters Σj and detj are used in the utilization phase φ2 to calculate the distance between an image I to be classified and the subset Ej, while the decision threshold δj makes it possible to determine, in view of this distance, whether the image I could be classified in the subset Ej. In other words, the parameters of a subset make it possible to classify the image I according to the invention into a finer category than the frontal pose alone. Indeed, if the subsets Ej were limited, for example, to a single subset E1 containing images of faces with glasses, this embodiment of the classification method according to the invention would select only the face images in a frontal pose matching this visual criterion. The utilization phase φ2 is thus divided into two steps b1 and b2, shown in figure 5.
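The computation of the parameters Σj and detj for a subset can be sketched as follows in Python (a sketch, not part of the patent: NumPy's np.cov with its default normalization is an assumption, since the patent does not specify one, and toy low-dimensional vectors stand in for the 40*40-pixel image vectors, whose 1600x1600 covariance matrix would be singular for small N):

```python
import numpy as np

def subset_parameters(vectors):
    # vectors: one row per image of the subset E_j,
    # one column per pixel (gray-level value).
    X = np.asarray(vectors, dtype=float)
    sigma = np.cov(X, rowvar=False)             # covariance matrix Sigma_j
    det = np.linalg.det(np.atleast_2d(sigma))   # determinant det_j
    return sigma, det
```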
The first step b1 is the calculation of the distance between the image I to be classified and each subset Ej, using the following formula: Dist(q, Ej) = DMC({v1, v2, ..., vN} ∪ {q}) − DMC({v1, v2, ..., vN}), where:
- Dist(q, Ej) is the distance between an image vector q corresponding to the image I and the set of vectors of the images of the subset Ej, - {v1, v2, ..., vN} are the vectors v1 to vN of the images of the subset Ej,
- and DMC({x1, ..., xP}) is an operator that returns the determinant of the covariance matrix of the vectors x1 to xP, treated as a number P of observations of a random vector variable xi, i being an index varying from 1 to P.
The distance between the image I and the subset Ej is therefore calculated by subtracting the determinant detj of the matrix Σj from the determinant of the covariance matrix of the set formed by the vectors v1 to vN and the vector q.
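Step b1 can be sketched as follows in Python (an illustrative sketch, not the patent's implementation: NumPy's np.cov with its default normalization is an assumption, and toy 2-D vectors replace the 1600-dimensional image vectors so that the covariance determinants are non-zero):

```python
import numpy as np

def dmc(vectors):
    # DMC operator: determinant of the covariance matrix of a set of
    # vectors, each row being one observation.
    X = np.asarray(vectors, dtype=float)
    return np.linalg.det(np.atleast_2d(np.cov(X, rowvar=False)))

def dist(q, subset):
    # Dist(q, E_j) = DMC({v_1, ..., v_N} U {q}) - DMC({v_1, ..., v_N})
    return dmc(list(subset) + [list(q)]) - dmc(subset)
```

Adding a vector close to the subset barely changes its covariance determinant, while an outlying vector inflates it, so the distance measures the impact of the image on the subset's homogeneity.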
The choice of this new metric is motivated by the fact that, given the homogeneity of the training subsets, classifying an image I into a subset Ej amounts to evaluating the impact that adding the image I to the subset Ej has on the homogeneity of that subset Ej. Compared with the Mahalanobis distance used in the state of the art, which applies only to two vectors, the distance used by the method according to the invention has the advantage of considering a vector with respect to a set of vectors, which makes it possible to better take into account the distribution of the vectors of a given reference set. In other words, the classification method according to the invention is a statistical classification method based on the study of the spatial distribution of the descriptor vectors of the images.
The second step b2 is the comparison of the previously calculated distance Dist(q, Ej) with the decision threshold δj of the subset Ej, as shown in figure 6: - If the distance Dist(q, Ej) is below the decision threshold δj, the result of this sub-classification in the subset Ej is 1, that is to say that the image I could be classified in this subset of images,
- If the distance Dist(q, Ej) is above the decision threshold δj, the result of this sub-classification in the subset Ej is 0, that is to say that the image I is not frontal or does not match the visual criteria associated with the subset of images Ej.
The results of each of these sub-classifications, for each of the subsets E1 to EM, are combined by a logical "OR" to give the final result Res of the classification of the image I in a frontal pose:
- a value of 1 for the result Res indicates a frontal pose, and a value of 0 for the result Res indicates a non-frontal pose. Indeed, since the subsets E1 to EM are representative of most types of faces in a frontal pose, if the image I shows a face in a frontal pose, the result of at least one of these sub-classifications will be 1 and the final result Res will also be 1.
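Putting steps b1 and b2 and the logical OR together, a minimal sketch (again with NumPy, toy 2-D vectors, and np.cov's default normalization as assumptions not specified by the patent) could look like:

```python
import numpy as np

def dmc(vectors):
    # Determinant of the covariance matrix (rows = observations).
    X = np.asarray(vectors, dtype=float)
    return np.linalg.det(np.atleast_2d(np.cov(X, rowvar=False)))

def dist(q, subset):
    # Step b1: Dist(q, E_j) = DMC(E_j U {q}) - DMC(E_j)
    return dmc(list(subset) + [list(q)]) - dmc(subset)

def classify(q, subsets, thresholds):
    # Step b2 for each subset E_j, combined by a logical OR:
    # Res = 1 (frontal) if at least one sub-classification accepts q.
    return int(any(dist(q, E) < t for E, t in zip(subsets, thresholds)))
```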
Different ways of obtaining the decision threshold δj associated with the subset of images Ej are now detailed. Other ways of obtaining it are possible, the value of the decision threshold δj having to yield a good rate of correct classification of images in a frontal pose.
A first way of obtaining the decision threshold δj consists in setting the value of this threshold equal to the smallest distance calculated between each negative image contained in the set IN and the subset Ej. The calculation of the distances between the negative images and the subset Ej uses the same formula as in step b1.
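This first mode reduces to one line once the distance of step b1 is available; the sketch below (with the same assumed NumPy-based distance helper and toy 2-D vectors as above) makes it explicit:

```python
import numpy as np

def dmc(vectors):
    X = np.asarray(vectors, dtype=float)
    return np.linalg.det(np.atleast_2d(np.cov(X, rowvar=False)))

def dist(q, subset):
    return dmc(list(subset) + [list(q)]) - dmc(subset)

def threshold_min_negative(subset, negatives):
    # delta_j = smallest distance between the subset E_j and the
    # images of the negative set IN.
    return min(dist(n, subset) for n in negatives)
```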
A second way of obtaining the decision threshold δj, shown in figure 7, uses another subset E'j of positive images whose visual criteria are similar to those of the images of the subset Ej, together with the set IN of negative images. First, the distances d'i between the images of the subset E'j and the subset Ej are calculated, as well as the distances di between the images of the set IN and the subset Ej, using the same formula as in step b1. The threshold value δj is then chosen so as to maximize the correct-classification rate of the images of the subset E'j and the rejection rate of the images of the set IN. If the same importance is given to each of these rates, the threshold value δj that maximizes the sum of these rates is chosen. Thus: - If the largest distance d' of the images of the subset E'j is smaller than the smallest distance d of the images of the set IN, the decision threshold δj is set to the median value between these two distances.
- If, conversely, the largest distance d' of the images of the subset E'j is greater than the smallest distance d of the images of the set IN, a threshold search maximizing the sum of the correct-classification rate of the images of the subset E'j and the rejection rate of the images of the set IN is carried out over the interval [d; d'] formed by these two distances. This interval is partitioned at regular steps and, for each value, the sum of these rates is evaluated. The decision threshold δj retained is the one that maximizes this sum.
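The two cases of this second mode can be sketched as follows. This is an illustrative sketch: the function name `search_threshold` and the 100-step granularity are our assumptions (the patent only requires regular steps over [d; d']), and an image is taken as accepted when its distance does not exceed the threshold.

```python
import numpy as np

def search_threshold(pos_dists, neg_dists, steps=100):
    """Second mode: choose the threshold maximizing the sum of the
    correct-classification rate of validation positives (distances d'i)
    and the rejection rate of negatives (distances di)."""
    d_prime = max(pos_dists)  # largest positive distance d'
    d = min(neg_dists)        # smallest negative distance d
    if d_prime < d:
        # Separable case: take the midpoint between the two distances.
        return 0.5 * (d_prime + d)
    # Overlapping case: search [d, d'] at regular steps.
    pos = np.asarray(pos_dists)
    neg = np.asarray(neg_dists)
    best_t, best_score = d, -1.0
    for t in np.linspace(d, d_prime, steps):
        accept = np.mean(pos <= t)  # correct-classification rate
        reject = np.mean(neg > t)   # rejection rate
        if accept + reject > best_score:
            best_score, best_t = accept + reject, t
    return best_t
```

With equal weights on the two rates, as in the text, the sum is simply maximized; unequal weights would amount to a weighted sum of `accept` and `reject`.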
It should be noted that in this embodiment of the invention, face images are classified according to a single classification criterion, that of the frontal pose, but the method can be adapted to classify face images according to several classification criteria. For example, once an image I has been classified as non-frontal by the method described in this embodiment, the classification method according to the invention is executed on that image in a similar manner but with a different classification criterion. Two classification criteria are thus used. To do so, it suffices to appropriately adapt the training sets of positive and negative images to each new classification criterion.
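The chaining of several classification criteria described above could look like the following hypothetical sketch. The function name `classify_multi`, the criterion labels, and the acceptance rule (distance not exceeding δj) are illustrative assumptions, not the patent's wording.

```python
def classify_multi(q, criteria, dist_fn):
    """Try each (name, subset, threshold) criterion in turn, each with
    its own training subset Ej and decision threshold, and return the
    name of the first criterion the image satisfies, or None (rejected).
    """
    for name, subset, threshold in criteria:
        if dist_fn(q, subset) <= threshold:
            return name
    return None
```

Each criterion carries its own subset of positive training images and its own threshold, obtained by either of the two modes described above.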

Claims

1. A method for classifying an image (I) of an object belonging to a category of objects, said method comprising a preliminary step of obtaining (a1) a subset (Ej) of images of objects of said category, said subset being associated with a classification criterion, and said method being characterized in that it further comprises:
- a step of calculating (b1) a distance between said image (I) and said subset (Ej) according to the formula:
Dist(q, Ej) = DMC({v1, v2, ..., vN} ∪ {q}) − DMC({v1, v2, ..., vN})
where:
- Dist(q, Ej) est une distance entre un vecteur d'image q correspondant à ladite image et les vecteurs des images dudit sous-ensemble Ej,Dist (q, E j ) is a distance between an image vector q corresponding to said image and the vectors of the images of said subset Ej,
- {vi , V2 VN} sont les vecteurs Vi à VN des images dudit sous- ensemble Ej,- {vi, V 2 VN} are the vectors Vi to VN of the images of said subset E j ,
- and DMC({x1, ..., xP}) is an operator that returns the determinant of the covariance matrix of the vectors x1 to xP, treated as a number P of observations of a random vector variable xi, i being an index varying from 1 to P,
- and a step of comparing (b2) the distance thus calculated with a decision threshold (δj).
2. The classification method according to claim 1, characterized in that said decision threshold (δj) is associated with said subset (Ej) of images.
3. The classification method according to claim 2, characterized in that said decision threshold (δj) is equal to the smallest distance calculated between said subset (Ej) and the images of a set of negative images (IN) that do not correspond to a classification criterion associated with said subset (Ej).
4. The classification method according to claim 2, characterized in that said decision threshold (δj) is chosen so as to maximize the sum of:
- the rate of correct classification of images of another subset of images obeying the same classification criterion as said subset (Ej),
- and the rejection rate of images of a set of negative images (IN) that do not correspond to said classification criterion.
5. A face recognition method using the object image classification method according to any one of claims 1 to 4.
6. A device comprising means adapted to implement one of the methods according to any one of claims 1 to 5.
7. A computer program comprising instructions for implementing one of the methods according to any one of claims 1 to 5, when executed on a computer.
EP07871938A 2006-12-21 2007-12-14 Classification method for an object's image and corresponding device Withdrawn EP2100259A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR0655795A FR2910668A1 (en) 2006-12-21 2006-12-21 Image classifying method for recognizing face, involves calculating distance between image of category of objects and sub-assembly according to specific formula, and comparing calculated distance with decision threshold
PCT/FR2007/052522 WO2008081143A2 (en) 2006-12-21 2007-12-14 Classification method for an object's image and corresponding device

Publications (1)

Publication Number Publication Date
EP2100259A2 true EP2100259A2 (en) 2009-09-16

Family

ID=38222291

Family Applications (1)

Application Number Title Priority Date Filing Date
EP07871938A Withdrawn EP2100259A2 (en) 2006-12-21 2007-12-14 Classification method for an object's image and corresponding device

Country Status (3)

Country Link
EP (1) EP2100259A2 (en)
FR (1) FR2910668A1 (en)
WO (1) WO2008081143A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10366430B2 (en) * 2017-02-06 2019-07-30 Qualcomm Incorporated Systems and methods for customizing amenities in shared vehicles
CN112115740B (en) * 2019-06-19 2024-04-09 京东科技信息技术有限公司 Method and apparatus for processing image

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2300746B (en) * 1995-05-09 1999-04-07 Mars Inc Validation
FR2884007A1 (en) * 2005-03-29 2006-10-06 France Telecom FACIAL IDENTIFICATION METHOD FROM FACE IMAGES, CORRESPONDING COMPUTER DEVICE AND PROGRAM

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2008081143A3 *

Also Published As

Publication number Publication date
WO2008081143A2 (en) 2008-07-10
FR2910668A1 (en) 2008-06-27
WO2008081143A3 (en) 2008-10-23

Similar Documents

Publication Publication Date Title
Chen et al. Joint domain alignment and discriminative feature learning for unsupervised deep domain adaptation
Sun et al. Face spoofing detection based on local ternary label supervision in fully convolutional networks
Wang et al. Beyond object recognition: Visual sentiment analysis with deep coupled adjective and noun neural networks.
US8712157B2 (en) Image quality assessment
Ortiz et al. Face recognition in movie trailers via mean sequence sparse representation-based classification
Shepley Deep learning for face recognition: a critical analysis
JP2007109229A (en) Apparatus and method for detecting prescribed subject
Zhang et al. Boosting-based face detection and adaptation
Nadeem et al. Real time surveillance for low resolution and limited data scenarios: An image set classification approach
An et al. Face recognition in multi-camera surveillance videos using dynamic Bayesian network
Alafif et al. On detecting partially occluded faces with pose variations
EP2100259A2 (en) Classification method for an object's image and corresponding device
Sabaghi et al. Deep learning meets liveness detection: recent advancements and challenges
Yang Face Detection.
Chen et al. A view-based statistical system for multi-view face detection and pose estimation
Zuo Fast human face detection using successive face detectors with incremental detection capability
Ramirez et al. Face detection using combinations of classifiers
Li et al. Video face recognition system: Retinaface-mnet-faster and secondary search
Favorskaya et al. Image-based anomaly detection using CNN cues generalisation in face recognition system
Almansour et al. I-privacy photo: Face recognition and filtering
Sugandi et al. Face recognition based on pca and neural network
CN111353353A (en) Cross-posture face recognition method and device
Brenner et al. Graph-based recognition in photo collections using social semantics
Iqbal et al. Who is the hero? semi-supervised person re-identification in videos
Thakral et al. Efficient Object Recognition using Convolution Neural Networks Theorem

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20090619

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20110315