EP2100259A2 - Classification method for an object's image and corresponding device - Google Patents

Classification method for an object's image and corresponding device

Info

Publication number
EP2100259A2
EP2100259A2 (application EP07871938A)
Authority
EP
European Patent Office
Prior art keywords
images
subset
image
classification
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP07871938A
Other languages
German (de)
French (fr)
Inventor
Sid Ahmed Berrani
Christophe Garcia
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
France Telecom SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by France Telecom SA filed Critical France Telecom SA
Publication of EP2100259A2 publication Critical patent/EP2100259A2/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns

Definitions

  • the present invention is in the field of image processing.
  • the invention relates to a method of classifying images of objects of the same category, according to visual criteria related to this category.
  • the location of objects in the image or video is an essential first step before recognition.
  • This step aims to extract only the pieces of the image each containing an object to detect. These pieces of images are then passed to the recognition module for identification.
  • object recognition techniques are very sensitive to the quality of the extracted objects and in particular to their positions in the image pieces.
  • object recognition modules have generally learned to recognize an object in a well-defined position in an image. The performance of object recognition systems therefore degrades significantly when the object is not in this determined position.
  • This classification is essential in the learning phase of object recognition systems, so that learning is effective, but also in the identification phase, so that recognition is most likely to succeed. If several object recognition systems are available, each specialized in a specific position of the object to be recognized, the classifier may be used, during the identification phase, to route the image pieces to the appropriate recognition system, depending on the positions of the objects in these image pieces.
  • the classification of faces in poses has given rise to many works following essentially three approaches.
  • the first approach is based on tracking algorithms to estimate, in the images of a video, a model of a face, and also to deduce the pose of this face.
  • Two examples of work following this first approach are described in the following articles:
  • the methods according to this second approach are limited by the performance of the automatic methods for detecting facial elements: these provide positions of facial elements which are not sufficiently precise to ensure a satisfactory classification rate, in particular for profile poses.
  • the third approach is based on statistical models of poses to be searched, constructed from a principal component analysis from thumbnail examples, or on binary masks to characterize a particular pose. Examples of work using principal component analysis are described in the following articles:
  • a binary mask is a reference mask that is applied to a grayscale face image, which is transformed into a binary image in black and white by thresholding from this mask, then a correlation is performed to determine the pose of the face.
  • the methods based on binary masks therefore use the more or less dark areas and shadows of the faces, which makes their effectiveness very sensitive to lighting: indeed a side lighting for example distorts the shadows that one expects to find on a face illuminated from the front in a grayscale image.
  • methods using a principal component analysis are linear methods, which makes them poorly robust to lighting variations. More generally, the techniques according to this third approach are very sensitive to significant variations in the faces, such as the presence of occluding elements such as glasses, a beard or a mustache, which significantly reduce their rates of good classification.
  • the distances used in these systems are, for example, Euclidean distances, or the so-called Mahalanobis distance. These distances are applicable between two image vectors only and are associated with other classification methods such as nearest neighbor classification. The comparisons made for the classification are thus image-to-image comparisons. These systems are used to group images very close to each other. Unlike the invention, they do not take into account all the characteristics of the vectors of a class and their dispersion.
  • the present invention aims to solve the disadvantages of the prior art by providing a method and a classification device of images, which use a new applicable distance between an image descriptor vector and a set of image descriptor vectors.
  • the invention proposes a method of classifying an object image belonging to a category of objects, said method comprising a preliminary step of obtaining a subset of object images of said object category, said subset being associated with a classification criterion,
  • said method being characterized in that it further comprises:
  • Dist(q, Ej) = DMC({v1, v2, ..., vN} ∪ {q}) − DMC({v1, v2, ..., vN})
  • - Dist(q, Ej) is a distance between an image vector q corresponding to said image and the vectors of the images of said subset Ej, - {v1, v2, ..., vN} are the vectors v1 to vN of the images of said subset Ej,
  • - DMC({x1, ..., xP}) is an operator which returns the determinant of the covariance matrix of the vectors x1 to xP, assimilated to a number P of observations of a random vector variable xi, i being an index varying from 1 to P,
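As a hedged illustration (not part of the patent text), the DMC operator and the resulting set distance can be sketched in Python with NumPy; the names dmc and dist_to_subset are illustrative choices, not taken from the patent:

```python
import numpy as np

def dmc(vectors):
    """DMC: determinant of the covariance matrix of a set of vectors,
    treated as P observations of a random vector variable."""
    x = np.asarray(vectors, dtype=float)           # shape (P, d)
    return np.linalg.det(np.cov(x, rowvar=False))  # d x d covariance matrix

def dist_to_subset(q, subset_vectors):
    """Dist(q, Ej): how much the covariance determinant grows when the
    query vector q is added to the subset's image vectors."""
    return dmc(list(subset_vectors) + [list(q)]) - dmc(subset_vectors)
```

Note that for d-dimensional descriptors the sample covariance matrix is singular (zero determinant) unless the number of vectors exceeds d, so a sketch like this presumes descriptors of modest dimension or a prior dimensionality reduction.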
  • an image classification method applicable to still images is thus obtained, which is fast, independent of the efficiency of any system used upstream (for example for the detection of facial elements), and robust to significant variations between the images.
  • it makes it possible to classify different types of faces into frontal, semi-frontal and profile poses in a way that is robust to lighting variations and to the presence of occluding elements, both during the learning phase and during the identification phase.
  • said decision threshold is associated with said subset of images.
  • its distance with respect to several subsets of images corresponding to this criterion is calculated and compared with a decision threshold. Adapting the decision threshold to each subset makes it possible to take into account the heterogeneity of the subsets that obey the same classification criterion, so as to optimize the rate of good classification of the method according to the invention.
  • said decision threshold is equal to the smallest distance calculated between said subset and the images of a set of negative images that do not correspond to a classification criterion associated with said subset.
  • This choice of decision threshold makes it possible to obtain a classification that is coherent with respect to the learning images, which are divided into subsets of positive images, corresponding to a classification criterion, and a set of negative images, which do not obey this criterion.
  • said decision threshold is chosen so as to maximize the sum of: the rate of good image classification of another subset of images obeying the same classification criterion as said subset, and the image rejection rate of a set of negative images not corresponding to said classification criterion.
  • This choice of decision threshold makes it possible to improve the rate of good classification of the process according to the invention.
  • the invention also relates to a face recognition method using the object image classification method according to the invention.
  • the invention also relates to a device implementing the classification method according to the invention and the method of face recognition using this method.
  • the device and the method of face recognition have advantages similar to those of the classification method according to the invention.
  • the invention also relates to a computer program comprising instructions for implementing the classification method according to the invention or the face recognition method using it when it is executed on a computer.
  • FIG. 1 represents different phases of the classification method according to the invention
  • FIG. 2 represents a device implementing the classification method according to the invention
  • FIG. 3 represents different steps of a learning phase of the classification method according to the invention
  • FIG. 4 represents the contents of a learning database
  • FIG. 5 represents different stages of a phase of use of the classification method according to the invention
  • FIG. 6 represents more precisely the obtaining of a classification result by the method according to the invention during this phase of use
  • FIG. 7 represents a mode of obtaining the decision threshold associated with a subset of learning images.
  • the method according to the invention is applied to the classification of faces in poses, and more precisely it is used to determine if a face in an image is in frontal pose or not.
  • the object image classification method according to the invention can be used to classify any other type of object according to various classification criteria, for example to classify images of logos.
  • the use of the classification method according to the invention, which is carried out in a utilization phase Φ2 shown in FIG. 1, first requires the execution of a learning phase Φ1, detailed below, which is not repeated at each use of the method according to the invention.
  • the classification method according to the invention is typically implemented in software in a computer ORD represented in FIG. 2.
  • the learning phase Φ1 is for example implemented using a learning module MA, and the utilization phase Φ2 is implemented in a classification module MC, which receives an image I to be classified as input and returns a classification result Res.
  • the learning phase Φ1 makes it possible to fill the learning database BDD, to which the learning module MA and the classification module MC are connected.
  • This learning phase Φ1 comprises two steps a1 and a2, represented in FIG. 3, whose objective is to provide the training data necessary for the utilization phase Φ2.
  • The first step a1 consists in obtaining subsets of learning images. These learning images are in fact face images corresponding to face bounding boxes extracted after face detection in larger images.
  • Step a1 requires a learning image database, which comprises grayscale images representative of the pose to be learned, that is to say images of faces in the frontal position. These images are called "positive images", and form the set IP shown in FIG. 4.
  • the image database also includes a set IN of grayscale face images representing the other, non-frontal poses, referred to as "negative images". These positive and negative images represent different faces, represented on the same scale in each of these images. For frontal-pose classification, not all facial details are needed, so the images in the learning base have a resolution of only 40 pixels × 40 pixels, which is sufficient.
  • In step a1, the set IP of positive images is partitioned into subsets Ej, j being an index varying from 1 to M.
  • These subsets E1 to EM are associated with the classification criterion of the frontal pose, and are homogeneous, that is to say that each subset contains faces in frontal position sharing common visual criteria.
  • the set IP contains four subsets E1 to E4 such that:
  • - E1 contains images of faces with glasses
  • - E2 contains images of faces with whiskers
  • the subsets E1 to EM preferably characterize most types of faces in frontal pose.
  • This partition of the set of positive images IP is performed manually or automatically using grouping algorithms. These algorithms make it possible to group images that are very close to one another by using, for example, similarity measurements or Euclidean distances between two image vectors.
  • This partition into homogeneous subsets subsequently makes it possible to obtain more specialized, and therefore more powerful, sub-classifications of faces in frontal position.
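Purely as an illustrative sketch of such an automatic grouping (the patent does not specify a particular algorithm), a naive Euclidean-distance grouping of image vectors might look like this; the name group_images and the max_dist parameter are assumptions:

```python
import numpy as np

def group_images(vectors, max_dist):
    """Naive grouping: assign each image vector to the first group whose
    representative (its first member) lies within max_dist (Euclidean)."""
    groups = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        for g in groups:
            if np.linalg.norm(v - g[0]) < max_dist:
                g.append(v)
                break
        else:
            groups.append([v])
    return groups
```

A real system would more likely use an established clustering method, but the principle is the same: images whose vectors are close under a Euclidean or similarity measure end up in the same homogeneous subset.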
  • the set IN of negative images contains images of faces of all types in non-frontal poses, for example faces in left profile, in right profile, or in a semi-profile pose, with or without glasses, with a beard, etc.
  • the second step a2 of the learning phase Φ1 is the calculation of the classification parameters associated with each subset Ej of images.
  • the covariance matrix Σj of the subset Ej is then defined by: Σj = (1/N) Σi=1..N (vi − m)(vi − m)T, where m is the mean of the vectors v1 to vN of Ej and T denotes transposition.
  • the parameters Σj and det j make it possible, in the utilization phase Φ2, to calculate the distance of an image I to be classified from the subset Ej, while the decision threshold δj makes it possible to determine, in view of this distance, whether the image I can be classified in the subset Ej.
  • the parameters of a subset make it possible to classify the image I according to the invention in a category that is finer than the frontal pose only.
  • this embodiment of the classification method according to the invention thus makes it possible to select images of faces in frontal pose corresponding to this visual criterion only.
  • the utilization phase Φ2 is therefore divided into two steps b1 and b2, represented in FIG. 5.
  • - Dist(q, Ej) is the distance between an image vector q corresponding to the image I and the vectors of the images of the subset Ej, - {v1, v2, ..., vN} are the vectors v1 to vN of the images of the subset Ej,
  • - DMC({x1, ..., xP}) is an operator that returns the determinant of the covariance matrix of the vectors x1 to xP, assimilated to a number P of observations of a random vector variable xi, i being an index varying from 1 to P.
  • the distance between the image I and the subset Ej is thus calculated by subtracting the determinant det j of the matrix Σj from the determinant of the covariance matrix of the set formed by the vectors v1 to vN and the vector q.
  • the classification method according to the invention is a statistical classification method which is based on the study of the distribution of the descriptor vectors of the images in space.
  • the second step b2 is the comparison of the distance Dist(q, Ej) previously calculated with the decision threshold δj of the subset Ej, as represented in FIG. 6: - If the distance Dist(q, Ej) is below the decision threshold δj, the result of this sub-classification in the subset Ej is 1, that is to say that the image I can be classified in this subset of images,
  • - Otherwise, the result of this sub-classification in the subset Ej is 0, that is to say that the image I is either not frontal or does not correspond to the visual criteria associated with the subset of images Ej.
  • a value 1 of the result Res indicates a frontal pose
  • a value 0 of the result Res indicates a non-frontal pose.
  • the subsets E1 to EM being representative of most types of faces in frontal pose, if the image I represents a face in frontal pose, the result of at least one of these sub-classifications will be 1 and the final result Res will also be 1.
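Assuming descriptor vectors and per-subset thresholds are already available, steps b1 and b2 over all subsets E1 to EM can be sketched as follows (the function names dmc_dist and classify are illustrative, not from the patent):

```python
import numpy as np

def dmc_dist(q, vectors):
    """Dist(q, Ej): det(cov(Ej ∪ {q})) minus det(cov(Ej))."""
    def det_cov(x):
        return np.linalg.det(np.cov(np.asarray(x, dtype=float), rowvar=False))
    return det_cov(list(vectors) + [list(q)]) - det_cov(vectors)

def classify(q, subsets, thresholds):
    """Res is 1 (frontal pose) as soon as one sub-classification accepts
    the image vector q, i.e. its distance falls below that subset's
    decision threshold delta; 0 (non-frontal pose) otherwise."""
    for vectors, delta in zip(subsets, thresholds):
        if dmc_dist(q, vectors) < delta:   # step b1 then step b2
            return 1
    return 0
```

The early return mirrors the OR over sub-classifications: the remaining subsets need not be tested once one accepts the image.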
  • a first way of obtaining the decision threshold δj consists in choosing the value of this threshold equal to the smallest distance calculated between each negative image contained in the set IN and the subset Ej.
  • the calculation of these distances between the negative images and the subset Ej uses the same formula as in step b1.
  • a second way of obtaining the decision threshold δj uses another subset E'j of positive images, with visual criteria similar to those of the images of the subset Ej, and the set IN of negative images.
  • the threshold value δj is chosen so as to maximize the sum of the rate of good classification of the images of the subset E'j and the rejection rate of the images of the set IN.
  • the decision threshold δj retained is the one that maximizes this sum.
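This second way of choosing the threshold can be sketched as a scan over candidate values; this is a hedged illustration, the candidate set and the function name best_threshold being assumptions rather than the patent's exact procedure:

```python
import numpy as np

def dmc_dist(q, vectors):
    """Dist(q, Ej) = det(cov(Ej ∪ {q})) - det(cov(Ej))."""
    def det_cov(x):
        return np.linalg.det(np.cov(np.asarray(x, dtype=float), rowvar=False))
    return det_cov(list(vectors) + [list(q)]) - det_cov(vectors)

def best_threshold(subset, positives, negatives):
    """Pick the threshold maximizing (good-classification rate on the
    positive validation images E'j) + (rejection rate on the set IN)."""
    d_pos = [dmc_dist(p, subset) for p in positives]
    d_neg = [dmc_dist(n, subset) for n in negatives]
    best, best_score = None, -1.0
    for t in d_pos + d_neg:                            # candidate thresholds
        accept = sum(d < t for d in d_pos) / len(d_pos)
        reject = sum(d >= t for d in d_neg) / len(d_neg)
        if accept + reject > best_score:
            best_score, best = accept + reject, t
    return best
```

Scanning only the observed distances is sufficient here because the two rates change only at those values.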
  • In this embodiment, images of faces are classified according to a single classification criterion, that of the frontal pose, but it is possible to adapt the method to classify images of faces according to several classification criteria. For example, once an image I has been classified as non-frontal by the method described in this embodiment, a new execution of the classification method according to the invention is carried out on this image in a similar manner but with a different classification criterion; two classification criteria are then used. To do this, it suffices to appropriately adapt the positive-image and negative-image learning sets to each new classification criterion.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a classification method of an object's image belonging to a category of objects, said method comprising a prior step consisting of obtaining a subset (Ej) of object's images of said category, said subset being associated with a classification criterion, and said method being characterized in that it also comprises: - a calculation step (b1) of the distance between said image and said subset (Ej) according to the formula: Dist(q, Ej) = DMC({v1, v2, ..., vN} ∪ {q}) − DMC({v1, v2, ..., vN}) in which: - Dist(q, Ej) is the distance between an image vector q corresponding to said image and the vectors of the images of the subset Ej, - {v1, v2, ..., vN} are the vectors v1 to vN of the images of said subset Ej, - and DMC({x1, ..., xP}) is an operator which gives the determinant of the covariance matrix of the vectors x1 to xP, assimilated to P observations of a random vector variable xi, i being an index varying from 1 to P, - and a comparison step (b2) of the thus calculated distance with a decision threshold (δj).

Description

Procédé de classification d'une i mage d'objet et dispositif correspondant Method of classifying an object image and corresponding device
La présente invention se situe dans le domaine du traitement d'images.The present invention is in the field of image processing.
Plus précisément, l'invention concerne un procédé de classification d'images d'objets d'une même catégorie, suivant des critères visuels liés à cette catégorie.More specifically, the invention relates to a method of classifying images of objects of the same category, according to visual criteria related to this category.
Dans les systèmes automatiques de reconnaissance d'objets dans une image ou dans une vidéo, la localisation des objets dans l'image ou la vidéo constitue une première étape indispensable avant la reconnaissance. Il s'agit de la détection d'objet. Cette étape a pour objectif d'extraire uniquement les morceaux de l'image contenant chacun un objet à détecter. Ces morceaux d'images sont ensuite passés au module de reconnaissance pour identification.In automatic object recognition systems in an image or a video, the location of objects in the image or video is an essential first step before recognition. This is the object detection. This step aims to extract only the pieces of the image each containing an object to detect. These pieces of images are then passed to the recognition module for identification.
Ces techniques de reconnaissance d'objets sont cependant très sensibles à la qualité des objets extraits et en particulier à leurs positions dans les morceaux d'image. En effet les modules de reconnaissance d'objets ont en général appris à reconnaître un objet dans une position bien déterminée dans une image. Les performances des systèmes de reconnaissance d'objets se dégradent donc significativement lorsque l'objet n'est pas dans cette position déterminée. Il est donc essentiel de disposer d'un classificateur d'images qui permette de classer les morceaux d'image suivant la position de l'objet extrait dans ces morceaux, afin de ne présenter au module de reconnaissance que les morceaux d'image dans lesquels l'objet extrait est dans une position adaptée pour sa reconnaissance. Cette classification est indispensable dans la phase d'apprentissage des systèmes de reconnaissance d'objets, pour que l'apprentissage soit efficace, mais aussi dans la phase d'identification, pour que la reconnaissance ait le plus de chances d'aboutir. Si l'on dispose de plusieurs systèmes de reconnaissance d'objets spécialisés chacun dans une position déterminée de l'objet à reconnaître, le classificateur permet éventuellement d'aiguiller, lors de la phase d'identification, les morceaux d'image vers les systèmes de reconnaissance d'images adéquats, en fonction des positions des objets dans ces morceaux d'image.These object recognition techniques, however, are very sensitive to the quality of the extracted objects and in particular to their positions in the image pieces. Indeed, object recognition modules have generally learned to recognize an object in a well-defined position in an image. The performance of the object recognition systems therefore degrade significantly when the object is not in this determined position. 
It is therefore essential to have an image classifier that can classify the image pieces according to the position of the extracted object in these pieces, in order to present to the recognition module only the image pieces in which the extracted object is in a position adapted for its recognition. This classification is essential in the learning phase of object recognition systems, so that learning is effective, but also in the identification phase, so that recognition is most likely to succeed. If there are several specialized object recognition systems each in a specific position of the object to be recognized, the classifier may be used to refer, during the identification phase, the image pieces to the systems. recognition of appropriate images, depending on the positions of the objects in these image pieces.
Ces systèmes sont surtout appliqués à la reconnaissance de visages. Actuellement les systèmes de reconnaissance de visages ne fonctionnent de manière optimale que sur des images de visages en position frontale. C'est pourquoi la classification d'images de visages suivant leur pose, frontale, semi- frontale ou de profil, est un enjeu important dans le domaine de l'analyse faciale.These systems are mostly applied to face recognition. Currently face recognition systems work optimally only on face images in the frontal position. This is why the classification of face images according to their pose, frontal, semi-frontal or profile, is an important issue in the field of facial analysis.
La classification de visages en poses a ainsi donné lieu à de nombreux travaux suivant essentiellement trois approches. La première approche se base sur des algorithmes de suivi pour estimer dans les images d'une vidéo le modèle d'un visage, et pour déduire également la pose de ce visage. Deux exemples de travaux suivant cette première approche sont décrits dans les articles suivants:The classification of faces in poses has given rise to many works following essentially three approaches. The first approach is based on tracking algorithms to estimate the image of a video model of a face, and also to deduce the pose of this face. Two examples of work following this first approach are described in the following articles:
- "Face Tracking and Pose Estimation Using Affine Motion Parameters", de P. Yao et G. Evans, publié en 2001 à l'occasion de la douzième conférence SCIA, d'après l'anglais "Scandinavian Conférence on Image Analysis",- "Face Tracking and Pose Estimation Using Affine Motion Parameters", by P. Yao and G. Evans, published in 2001 on the occasion of the twelfth SCIA conference, according to the English "Scandinavian Conference on Image Analysis",
- et " Face pose estimation System by combining hybrid ICA-SVM learning and re-registration" de K. Seo, I. Cohen, S. You et U. Neumann, publié en janvier 2004 à l'occasion d'une conférence ACCV, d'après l'anglais " Asian Conférence on Computer Vision". Les techniques selon cette approche utilisent une information temporelle pour estimer le mouvement d'un visage d'une image à une autre, et présentent donc l'inconvénient d'être limitées aux vidéos, et de ne pas être applicables à la classification de visages extraits à partir d'images fixes. La deuxième approche utilise les positions des éléments faciaux, tels que les yeux, le nez et la bouche, et des règles de biométrie du visage, pour déduire la pose d'un visage dans une image. Un exemple de classification de visages en poses utilisant la détection d'éléments faciaux est donné dans l'article "Pose classification of human faces by weighting mask function approach", de C. Lin et K. -C. Fan, publié dans le numéro 24 de la revue "Pattern Récognition Letters".- and "Face pose estimation System by combining hybrid ICA-SVM learning and re-registration" by K. Seo, I. Cohen, S. You and U. Neumann, published in January 2004 at the time of a conference ACCV, according to English "Asian Conference on Computer Vision". The techniques according to this approach use time information to estimate the movement of a face from one image to another, and therefore have the disadvantage of being limited to videos, and not to be applicable to the classification of extracted faces. from still images. The second approach uses the positions of the facial elements, such as the eyes, the nose and the mouth, and biometrics rules of the face, to deduce the pose of a face in an image. An example of classification of faces in poses using the detection of facial elements is given in the article "Pose classification of human faces by weighting mask function approach", of C. Lin and K. -C. 
Fan, published in issue 24 of the journal "Pattern Recognition Letters".
Les méthodes selon cette deuxième approche sont limitées par les performances des méthodes automatiques de détection d'éléments faciaux: celles-ci fournissent des positions d'éléments faciaux qui ne sont pas suffisamment précises pour assurer un taux de bonne classification satisfaisant, en particulier pour des poses de profil.The methods according to this second approach are limited by the performance of the automatic methods for detecting facial elements: these provide positions of facial elements which are not sufficiently precise to ensure a satisfactory classification rate, in particular for profile poses.
La troisième approche se base sur des modèles statistiques des poses à rechercher, construits à partir d'une analyse en composantes principales à partir d'exemples d'imagettes, ou encore sur des masques binaires pour caractériser une pose particulière. Des exemples de travaux utilisant une analyse en composantes principales sont décrits dans les articles suivants:The third approach is based on statistical models of poses to be searched, constructed from a principal component analysis from thumbnail examples, or on binary masks to characterize a particular pose. Examples of work using principal component analysis are described in the following articles:
- "View-Based and Modular Eigenspaces for Face Récognition", de Alex Pentland, Baback Moghaddam et Thad Starner, publié en 1994 à l'occasion de la treizième conférence "Institute of Electrical and- "View-Based and Modular Eigenspaces for Face Recognition", by Alex Pentland, Baback Moghaddam and Thad Starner, published in 1994 on the occasion of the thirteenth "Institute of Electrical and
Electronic Engineer (IEEE) Conférence on Computer Vision and Pattern Récognition ",Electronic Engineer (IEEE) Conference on Computer Vision and Pattern Recognition ",
- et "Eigenfaces for récognition", de M. Turk et A. Pentland, publié dans le troisième volume du premier numéro de la revue "Journal of Cognitive Neuroscience".- and "Eigenfaces for Recognition", by M. Turk and A. Pentland, published in the third volume of the first issue of the journal "Journal of Cognitive Neuroscience".
Il est de plus à noter que dans l'article de C. Lin et K. -C. Fan cité précédemment, on utilise des masques binaires pour classifier des visages en poses, combinant ainsi détection d'éléments faciaux et utilisation de modèles caractéristiques. Un masque binaire est un masque de référence que l'on applique à une image de visage en niveaux de gris, laquelle est transformée en image binaire en noir et blanc par seuillage à partir de ce masque, puis une corrélation est effectuée pour déterminer la pose du visage. Les méthodes à base de masques binaires utilisent donc les zones plus ou moins foncées et les zones d'ombre des visages, ce qui rend leur efficacité très sensible à l'éclairage: en effet un éclairage de côté par exemple fausse les zones d'ombre que l'on s'attend à trouver sur un visage éclairé de face dans une image en niveaux de gris. De même les méthodes utilisant une analyse en composantes principales sont des méthodes linéaires ce qui les rend très peu robustes aux variations lumineuses. Plus généralement les techniques selon cette troisième approche sont très sensibles aux variations importantes des visages, telles que la présence d'éléments occultants comme des lunettes, de la barbe ou de la moustache, qui diminuent significativement leurs taux de bonne classification.It is further noted that in the article by C. Lin and K. -C. Fan cited above, we use binary masks to classify faces in poses, thus combining detection of facial elements and use of characteristic models. A binary mask is a reference mask that is applied to a grayscale face image, which is transformed into a binary image in black and white by thresholding from this mask, then a correlation is performed to determine the pose of the face. 
The methods based on binary masks therefore use the more or less dark areas and shadows of the faces, which makes their effectiveness very sensitive to lighting: indeed a side lighting for example distorts the shadows that one expects to find on a face illuminated from the front in a grayscale image. Similarly methods using a principal component analysis are linear methods which makes them very resistant to light variations. More generally, the techniques according to this third approach are very sensitive to significant variations in the faces, such as the presence of blackout elements such as glasses, a beard or a mustache, which significantly reduce their rates of good classification.
Other image classification systems, such as the classification system described in the article "Application d'un processus itératif de classification dirigée à un site urbain et périurbain algérien" by N. Ouarab and Y. Smara, published in 1997 in the book "Télédétection des milieux urbains et périurbains" by the Agence Universitaire de la Francophonie, use the notion of distance between images.
The distances used in these systems are, for example, Euclidean distances, or the so-called Mahalanobis distance. These distances apply only between two image vectors and are combined with other classification methods such as nearest-neighbor classification. The comparisons performed for classification are thus image-to-image comparisons. These systems are used to group images that are very close to one another. Unlike the invention, however, they do not take into account the full set of characteristics of the vectors of a class and their dispersion.
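As an illustration of the image-to-image comparison used by these prior-art systems, the following minimal Python sketch (not part of the patent; NumPy and the toy 2-D vectors are assumptions for illustration) assigns a query vector the label of its nearest reference under the Euclidean distance:

```python
import numpy as np

def nearest_neighbor_label(q, references, labels):
    # Prior-art style: compare the query vector q to each reference
    # vector individually, using the Euclidean distance.
    dists = [np.linalg.norm(np.asarray(q, float) - np.asarray(r, float))
             for r in references]
    return labels[int(np.argmin(dists))]
```

Such a comparison ignores the dispersion of the vectors inside a class, which is precisely the limitation addressed by the invention.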
The aim of the present invention is to overcome the drawbacks of the prior art by providing an image classification method and device that use a new distance applicable between an image descriptor vector and a set of image descriptor vectors.
To this end, the invention proposes a method of classifying an object image belonging to a category of objects, said method comprising a preliminary step of obtaining a subset of images of objects of said category, said subset being associated with a classification criterion, and said method being characterized in that it further comprises:
- a step (b1) of calculating the distance between said image (I) and said subset (Ej) according to the formula: Dist(q, Ej) = DMC({v1, v2, ..., vN} ∪ {q}) − DMC({v1, v2, ..., vN}), where:
- Dist(q, Ej) is a distance between an image vector q corresponding to said image and the vectors of the images of said subset Ej, - {v1, v2, ..., vN} are the vectors v1 to vN of the images of said subset Ej,
- and DMC({x1, ..., xP}) is an operator that returns the determinant of the covariance matrix of the vectors x1 to xP, treated as a number P of observations of a random vector variable xi, i being an index varying from 1 to P,
- and a step (b2) of comparing the distance thus calculated with a decision threshold (δj).
Thanks to the invention, an image classification method applicable to still images is obtained that is fast, independent of the efficiency of any system used upstream (for example for element detection), and robust to significant variations between images. In particular, when used for face recognition, it makes it possible to classify different types of faces into frontal, semi-frontal and profile poses in a way that is robust to lighting variations and to the presence of occluding elements, both during the learning phase and during the identification phase.
According to a preferred characteristic, said decision threshold is associated with said subset of images. To determine whether an image to be classified meets a classification criterion, its distance to several subsets of images corresponding to this criterion is calculated and compared with a decision threshold. Adapting the decision threshold to each subset makes it possible to take into account the heterogeneity of the subsets obeying the same classification criterion, so as to optimize the correct-classification rate of the method according to the invention.
According to another preferred characteristic, said decision threshold is equal to the smallest distance calculated between said subset and the images of a set of negative images that do not correspond to a classification criterion associated with said subset.
This choice of decision threshold makes it possible to obtain a classification that is consistent with the training images, which are divided into subsets of positive images corresponding to a classification criterion and a set of negative images that do not obey this criterion. According to another preferred characteristic, said decision threshold is chosen so as to maximize the sum of: the correct-classification rate of the images of another subset of images obeying the same classification criterion as said subset, and the rejection rate of the images of a set of negative images not corresponding to said classification criterion.
This choice of decision threshold makes it possible to improve the correct-classification rate of the method according to the invention.
The invention also relates to a face recognition method using the object image classification method according to the invention. The invention also relates to a device implementing the classification method according to the invention and the face recognition method using it.
The device and the face recognition method have advantages similar to those of the classification method according to the invention.
The invention further relates to a computer program comprising instructions for implementing the classification method according to the invention, or the face recognition method using it, when executed on a computer.
Other features and advantages will become apparent on reading the description of a preferred embodiment given with reference to the figures, in which:
- figure 1 shows the different phases of the classification method according to the invention,
- figure 2 shows a device implementing the classification method according to the invention,
- figure 3 shows the different steps of a learning phase of the classification method according to the invention, - figure 4 shows the contents of a learning database,
- figure 5 shows the different steps of a phase of use of the classification method according to the invention,
- figure 6 shows in more detail how a classification result is obtained by the method according to the invention during this phase of use, - figure 7 shows one way of obtaining the decision threshold associated with a subset of training images.
According to a preferred embodiment of the invention, the method according to the invention is applied to the classification of faces by pose, and more precisely it is used to determine whether a face in an image is in a frontal pose or not. However, the object image classification method according to the invention can be used to classify any other type of object according to various classification criteria, for example to classify images of logos.
Moreover, the use of the classification method according to the invention, which takes place in a utilization phase φ2 shown in figure 1, first requires the execution of a learning phase φ1, detailed below, which is not repeated on each subsequent use of the method according to the invention.
The classification method according to the invention is typically implemented in software on a computer ORD shown in figure 2. The learning phase φ1 is, for example, implemented by a learning module MA, and the utilization phase φ2 is implemented in a classification module MC, which receives an image I to be classified as input and returns a classification result Res. The learning phase φ1 fills the learning database BDD, to which the learning module MA and the classification module MC are both connected.
This learning phase φ1 comprises two steps a1 and a2, shown in figure 3, whose purpose is to provide the training data needed by the utilization phase φ2.
The first step a1 is obtaining subsets of training images. These training images are in fact face images corresponding to bounding boxes of faces extracted after face detection in larger images. Step a1 requires a training image database comprising grayscale images representative of the pose to be learned, that is to say images of faces in the frontal position. These images are called "positive images" and form the set IP shown in figure 4. The training image database also comprises a set IN of grayscale face images representing the other, non-frontal poses, called "negative images". These positive and negative images show different faces, represented at the same scale in each image. For frontal-pose classification, since not all facial details are needed, the images of the training database have an image resolution of only 40 pixels * 40 pixels, which proves sufficient.
In step a1, the set IP of positive images is partitioned into subsets Ej, j being an index varying from 1 to M. These subsets E1 to EM are associated with the frontal-pose classification criterion and are homogeneous, that is to say that each subset contains faces in the frontal position sharing common visual criteria. For example, the set IP contains four subsets E1 to E4 such that:
- E1 contains images of faces with glasses, - E2 contains images of faces with mustaches,
- E3 contains images of smiling faces without mustaches or glasses,
- E4 contains images of neutral faces without mustaches or glasses. The subsets E1 to EM preferably characterize most types of faces in a frontal pose. This partition of the set IP of positive images is performed manually, or automatically using clustering algorithms. These algorithms group images that are very close to one another, using for example similarity measures or Euclidean distances between two image vectors. This partition into homogeneous subsets subsequently makes it possible to obtain more specialized sub-classifications of faces in the frontal position, in an efficient manner.
The set IN of negative images contains images of faces of all types in non-frontal poses, for example showing a left profile or a right profile, or in a semi-profile pose, with or without glasses, with a beard, etc.
The second step a2 of the learning phase φ1 is the computation of the classification parameters associated with each subset Ej of images.
These parameters are part of the training data and are as follows:
- The covariance matrix Σj of the subset Ej. If, for example, it contains N face images, N being an integer, these images are described by N vectors vi, i being an index varying from 1 to N. These vectors contain the 40*40 gray-level values corresponding to the 40*40 pixels of each image of the subset Ej. The covariance matrix Σj is then defined by the standard covariance of these vectors: Σj = (1/N) · Σi=1..N (vi − v̄)(vi − v̄)^T, where v̄ is the mean of the vectors v1 to vN,
- the determinant detj of the covariance matrix Σj,
- and a decision threshold δj, the obtaining of which is detailed below. The parameters Σj and detj are used in the utilization phase φ2 to calculate the distance between an image I to be classified and the subset Ej, while the decision threshold δj makes it possible to determine, in view of this distance, whether the image I could be classified in the subset Ej. In other words, the parameters of a subset make it possible to classify the image I according to the invention into a finer category than the frontal pose alone. Indeed, if the subsets Ej were limited, for example, to a single subset E1 containing images of faces with glasses, this embodiment of the classification method according to the invention would select only the face images in a frontal pose matching this visual criterion. The utilization phase φ2 is thus divided into two steps b1 and b2, shown in figure 5.
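The computation of the parameters Σj and detj for a subset can be sketched as follows in Python (a sketch, not part of the patent: NumPy's np.cov with its default normalization is an assumption, since the patent does not specify one, and toy low-dimensional vectors stand in for the 40*40-pixel image vectors, whose 1600x1600 covariance matrix would be singular for small N):

```python
import numpy as np

def subset_parameters(vectors):
    # vectors: one row per image of the subset E_j,
    # one column per pixel (gray-level value).
    X = np.asarray(vectors, dtype=float)
    sigma = np.cov(X, rowvar=False)             # covariance matrix Sigma_j
    det = np.linalg.det(np.atleast_2d(sigma))   # determinant det_j
    return sigma, det
```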
The first step b1 is the calculation of the distance between the image I to be classified and each subset Ej, using the following formula: Dist(q, Ej) = DMC({v1, v2, ..., vN} ∪ {q}) − DMC({v1, v2, ..., vN}), where:
- Dist(q, Ej) is the distance between an image vector q corresponding to the image I and the set of vectors of the images of the subset Ej, - {v1, v2, ..., vN} are the vectors v1 to vN of the images of the subset Ej,
- and DMC({x1, ..., xP}) is an operator that returns the determinant of the covariance matrix of the vectors x1 to xP, treated as a number P of observations of a random vector variable xi, i being an index varying from 1 to P.
The distance between the image I and the subset Ej is therefore calculated by subtracting the determinant detj of the matrix Σj from the determinant of the covariance matrix of the set formed by the vectors v1 to vN and the vector q.
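Step b1 can be sketched as follows in Python (an illustrative sketch, not the patent's implementation: NumPy's np.cov with its default normalization is an assumption, and toy 2-D vectors replace the 1600-dimensional image vectors so that the covariance determinants are non-zero):

```python
import numpy as np

def dmc(vectors):
    # DMC operator: determinant of the covariance matrix of a set of
    # vectors, each row being one observation.
    X = np.asarray(vectors, dtype=float)
    return np.linalg.det(np.atleast_2d(np.cov(X, rowvar=False)))

def dist(q, subset):
    # Dist(q, E_j) = DMC({v_1, ..., v_N} U {q}) - DMC({v_1, ..., v_N})
    return dmc(list(subset) + [list(q)]) - dmc(subset)
```

Adding a vector close to the subset barely changes its covariance determinant, while an outlying vector inflates it, so the distance measures the impact of the image on the subset's homogeneity.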
The choice of this new metric is motivated by the fact that, given the homogeneity of the training subsets, classifying an image I into a subset Ej amounts to evaluating the impact that adding the image I to the subset Ej has on the homogeneity of that subset Ej. Compared with the Mahalanobis distance used in the state of the art, which applies only to two vectors, the distance used by the method according to the invention has the advantage of considering a vector with respect to a set of vectors, which makes it possible to better take into account the distribution of the vectors of a given reference set. In other words, the classification method according to the invention is a statistical classification method based on the study of the spatial distribution of the descriptor vectors of the images.
The second step b2 is the comparison of the previously calculated distance Dist(q, Ej) with the decision threshold δj of the subset Ej, as shown in figure 6: - If the distance Dist(q, Ej) is below the decision threshold δj, the result of this sub-classification in the subset Ej is 1, that is to say that the image I could be classified in this subset of images,
- If the distance Dist(q, Ej) is above the decision threshold δj, the result of this sub-classification in the subset Ej is 0, that is to say that the image I is not frontal or does not match the visual criteria associated with the subset of images Ej.
The results of each of these sub-classifications, for each of the subsets E1 to EM, are combined by a logical "OR" to give the final result Res of the classification of the image I in a frontal pose:
- a value of 1 for the result Res indicates a frontal pose, and a value of 0 for the result Res indicates a non-frontal pose. Indeed, since the subsets E1 to EM are representative of most types of faces in a frontal pose, if the image I shows a face in a frontal pose, the result of at least one of these sub-classifications will be 1 and the final result Res will also be 1.
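Putting steps b1 and b2 and the logical OR together, a minimal sketch (again with NumPy, toy 2-D vectors, and np.cov's default normalization as assumptions not specified by the patent) could look like:

```python
import numpy as np

def dmc(vectors):
    # Determinant of the covariance matrix (rows = observations).
    X = np.asarray(vectors, dtype=float)
    return np.linalg.det(np.atleast_2d(np.cov(X, rowvar=False)))

def dist(q, subset):
    # Step b1: Dist(q, E_j) = DMC(E_j U {q}) - DMC(E_j)
    return dmc(list(subset) + [list(q)]) - dmc(subset)

def classify(q, subsets, thresholds):
    # Step b2 for each subset E_j, combined by a logical OR:
    # Res = 1 (frontal) if at least one sub-classification accepts q.
    return int(any(dist(q, E) < t for E, t in zip(subsets, thresholds)))
```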
Different ways of obtaining the decision threshold δj associated with the subset of images Ej are now detailed. Other ways of obtaining it are possible, the value of the decision threshold δj having to yield a good rate of correct classification of images in a frontal pose.
A first way of obtaining the decision threshold δj consists in setting the value of this threshold equal to the smallest distance calculated between each negative image contained in the set IN and the subset Ej. The calculation of the distances between the negative images and the subset Ej uses the same formula as in step b1.
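This first mode reduces to one line once the distance of step b1 is available; the sketch below (with the same assumed NumPy-based distance helper and toy 2-D vectors as above) makes it explicit:

```python
import numpy as np

def dmc(vectors):
    X = np.asarray(vectors, dtype=float)
    return np.linalg.det(np.atleast_2d(np.cov(X, rowvar=False)))

def dist(q, subset):
    return dmc(list(subset) + [list(q)]) - dmc(subset)

def threshold_min_negative(subset, negatives):
    # delta_j = smallest distance between the subset E_j and the
    # images of the negative set IN.
    return min(dist(n, subset) for n in negatives)
```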
A second way of obtaining the decision threshold δj, shown in figure 7, uses another subset E'j of positive images whose visual criteria are similar to those of the images of the subset Ej, together with the set IN of negative images. First, the distances d'i between the images of the subset E'j and the subset Ej are calculated, as well as the distances di between the images of the set IN and the subset Ej, using the same formula as in step b1. The threshold value δj is then chosen so as to maximize the correct-classification rate of the images of the subset E'j and the rejection rate of the images of the set IN. If the same importance is given to each of these rates, the threshold value δj that maximizes the sum of these rates is chosen. Thus: - If the largest distance d' of the images of the subset E'j is smaller than the smallest distance d of the images of the set IN, the decision threshold δj is set to the median value between these two distances.
- If, conversely, the largest distance d' of the images of the subset E'j is greater than the smallest distance d of the images of the set IN, a threshold search maximizing the sum of the correct-classification rate of the images of the subset E'j and the rejection rate of the images of the set IN is carried out over the interval [d; d'] formed by these two distances. This interval is partitioned at regular steps and, for each value, the sum of these rates is evaluated. The decision threshold δj retained is the one that maximizes this sum.
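The two cases of this second mode can be sketched as follows. This is an illustrative sketch: the function name `search_threshold` and the 100-step granularity are our assumptions (the patent only requires regular steps over [d; d']), and an image is taken as accepted when its distance does not exceed the threshold.

```python
import numpy as np

def search_threshold(pos_dists, neg_dists, steps=100):
    """Second mode: choose the threshold maximizing the sum of the
    correct-classification rate of validation positives (distances d'i)
    and the rejection rate of negatives (distances di)."""
    d_prime = max(pos_dists)  # largest positive distance d'
    d = min(neg_dists)        # smallest negative distance d
    if d_prime < d:
        # Separable case: take the midpoint between the two distances.
        return 0.5 * (d_prime + d)
    # Overlapping case: search [d, d'] at regular steps.
    pos = np.asarray(pos_dists)
    neg = np.asarray(neg_dists)
    best_t, best_score = d, -1.0
    for t in np.linspace(d, d_prime, steps):
        accept = np.mean(pos <= t)  # correct-classification rate
        reject = np.mean(neg > t)   # rejection rate
        if accept + reject > best_score:
            best_score, best_t = accept + reject, t
    return best_t
```

With equal weights on the two rates, as in the text, the sum is simply maximized; unequal weights would amount to a weighted sum of `accept` and `reject`.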
It should be noted that in this embodiment of the invention, face images are classified according to a single classification criterion, that of the frontal pose, but the method can be adapted to classify face images according to several classification criteria. For example, once an image I has been classified as non-frontal by the method described in this embodiment, the classification method according to the invention is executed on that image in a similar manner but with a different classification criterion. Two classification criteria are thus used. To do so, it suffices to appropriately adapt the training sets of positive and negative images to each new classification criterion.
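The chaining of several classification criteria described above could look like the following hypothetical sketch. The function name `classify_multi`, the criterion labels, and the acceptance rule (distance not exceeding δj) are illustrative assumptions, not the patent's wording.

```python
def classify_multi(q, criteria, dist_fn):
    """Try each (name, subset, threshold) criterion in turn, each with
    its own training subset Ej and decision threshold, and return the
    name of the first criterion the image satisfies, or None (rejected).
    """
    for name, subset, threshold in criteria:
        if dist_fn(q, subset) <= threshold:
            return name
    return None
```

Each criterion carries its own subset of positive training images and its own threshold, obtained by either of the two modes described above.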

Claims

1. A method for classifying an image (I) of an object belonging to a category of objects, said method comprising a preliminary step of obtaining (a1) a subset (Ej) of images of objects of said category, said subset being associated with a classification criterion, and said method being characterized in that it further comprises:
- a step of calculating (b1) a distance between said image (I) and said subset (Ej) according to the formula:
Dist(q, Ej) = DMC({v1, v2, ..., vN} ∪ {q}) − DMC({v1, v2, ..., vN})
where:
- Dist(q, Ej) est une distance entre un vecteur d'image q correspondant à ladite image et les vecteurs des images dudit sous-ensemble Ej,Dist (q, E j ) is a distance between an image vector q corresponding to said image and the vectors of the images of said subset Ej,
- {vi , V2 VN} sont les vecteurs Vi à VN des images dudit sous- ensemble Ej,- {vi, V 2 VN} are the vectors Vi to VN of the images of said subset E j ,
- and DMC({x1, ..., xP}) is an operator that returns the determinant of the covariance matrix of the vectors x1 to xP, treated as a number P of observations of a random vector variable xi, i being an index varying from 1 to P,
- and a step of comparing (b2) the distance thus calculated with a decision threshold (δj).
2. The classification method according to claim 1, characterized in that said decision threshold (δj) is associated with said subset (Ej) of images.
3. The classification method according to claim 2, characterized in that said decision threshold (δj) is equal to the smallest distance calculated between said subset (Ej) and the images of a set of negative images (IN) that do not correspond to a classification criterion associated with said subset (Ej).
4. The classification method according to claim 2, characterized in that said decision threshold (δj) is chosen so as to maximize the sum of:
- the rate of correct classification of images of another subset of images obeying the same classification criterion as said subset (Ej),
- and the rejection rate of images of a set of negative images (IN) that do not correspond to said classification criterion.
5. A face recognition method using the object image classification method according to any one of claims 1 to 4.
6. A device comprising means adapted to implement one of the methods according to any one of claims 1 to 5.
7. A computer program comprising instructions for implementing one of the methods according to any one of claims 1 to 5, when executed on a computer.
EP07871938A 2006-12-21 2007-12-14 Classification method for an object's image and corresponding device Withdrawn EP2100259A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR0655795A FR2910668A1 (en) 2006-12-21 2006-12-21 Image classifying method for recognizing face, involves calculating distance between image of category of objects and sub-assembly according to specific formula, and comparing calculated distance with decision threshold
PCT/FR2007/052522 WO2008081143A2 (en) 2006-12-21 2007-12-14 Classification method for an object's image and corresponding device

Publications (1)

Publication Number Publication Date
EP2100259A2 true EP2100259A2 (en) 2009-09-16

Family

ID=38222291

Family Applications (1)

Application Number Title Priority Date Filing Date
EP07871938A Withdrawn EP2100259A2 (en) 2006-12-21 2007-12-14 Classification method for an object's image and corresponding device

Country Status (3)

Country Link
EP (1) EP2100259A2 (en)
FR (1) FR2910668A1 (en)
WO (1) WO2008081143A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10366430B2 (en) * 2017-02-06 2019-07-30 Qualcomm Incorporated Systems and methods for customizing amenities in shared vehicles
CN112115740B (en) * 2019-06-19 2024-04-09 京东科技信息技术有限公司 Method and apparatus for processing image

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2300746B (en) * 1995-05-09 1999-04-07 Mars Inc Validation
FR2884007A1 (en) * 2005-03-29 2006-10-06 France Telecom FACIAL IDENTIFICATION METHOD FROM FACE IMAGES, CORRESPONDING COMPUTER DEVICE AND PROGRAM

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2008081143A3 *

Also Published As

Publication number Publication date
WO2008081143A2 (en) 2008-07-10
FR2910668A1 (en) 2008-06-27
WO2008081143A3 (en) 2008-10-23

Similar Documents

Publication Publication Date Title
Chen et al. Joint domain alignment and discriminative feature learning for unsupervised deep domain adaptation
Sun et al. Face spoofing detection based on local ternary label supervision in fully convolutional networks
Wang et al. Beyond object recognition: Visual sentiment analysis with deep coupled adjective and noun neural networks.
US8712157B2 (en) Image quality assessment
Ortiz et al. Face recognition in movie trailers via mean sequence sparse representation-based classification
Shepley Deep learning for face recognition: a critical analysis
JP2007109229A (en) Apparatus and method for detecting prescribed subject
Zhang et al. Boosting-based face detection and adaptation
Nadeem et al. Real time surveillance for low resolution and limited data scenarios: An image set classification approach
An et al. Face recognition in multi-camera surveillance videos using dynamic Bayesian network
Alafif et al. On detecting partially occluded faces with pose variations
EP2100259A2 (en) Classification method for an object's image and corresponding device
Sabaghi et al. Deep learning meets liveness detection: recent advancements and challenges
Yang Face Detection.
Chen et al. A view-based statistical system for multi-view face detection and pose estimation
Zuo Fast human face detection using successive face detectors with incremental detection capability
Ramirez et al. Face detection using combinations of classifiers
Li et al. Video face recognition system: Retinaface-mnet-faster and secondary search
Favorskaya et al. Image-based anomaly detection using CNN cues generalisation in face recognition system
Almansour et al. I-privacy photo: Face recognition and filtering
Sugandi et al. Face recognition based on pca and neural network
CN111353353A (en) Cross-posture face recognition method and device
Brenner et al. Graph-based recognition in photo collections using social semantics
Iqbal et al. Who is the hero? semi-supervised person re-identification in videos
Thakral et al. Efficient Object Recognition using Convolution Neural Networks Theorem

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20090619

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20110315