FR2864300A1

FR2864300A1 - Person localizing process for e.g. video telephone scenes filtering field, involves choosing active contour algorithm for video image to apply to initial contour for obtaining final contour localizing person to be extracted in image

Info

Publication number: FR2864300A1
Application number: FR0351171A
Authority: FR
Inventors: Anis Rojbi; Jean Claude Schmitt
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2003-12-22
Filing date: 2003-12-22
Publication date: 2005-06-24
Also published as: WO2005071612A1

Abstract

The process involves performing an initial segmentation of a video image based on the average saturation of the image and its comparison with preset thresholds. A shape model is applied to the segmented image and an approximate initial contour is deduced from it. An active contour algorithm is chosen and applied to the initial contour to obtain a final contour locating the person to be extracted in the image. An independent claim is also included for a computer program product having instructions for implementing a process of localizing a person in a video image.

Description

PROCESS FOR LOCALIZATION AND FUZZY SEGMENTATION OF A PERSON IN A VIDEO IMAGE

The present invention relates to a method of locating a person in a video image. It also relates to a method of fuzzy segmentation of said person.

The invention has a particularly advantageous, but non-limiting, application in the field of filtering videophone scenes. Other fields of application may also be considered, such as surveillance or television transmission over ADSL.

In applications related to videophone services, particularly those intended for the general public, two types of difficulty arise:
- in the residential setting, showing an image of one's personal surroundings is often perceived by users as an intrusion into their private life. This invasion of privacy is a significant psychological barrier to the use of videotelephony at home;

- in the field of mobile phones, a different difficulty arises: transmitting a large amount of complex information (streets, cars, buildings, moving background, ...) only serves to clutter the videophone message that the mobile user wishes to send.

In both situations, though for different reasons, it is necessary to remove the background of the image and keep only the people taking part in the call. For residential videotelephony, the benefit is preserving users' privacy. For mobile videotelephony, the benefit is a higher compression ratio, since only the useful image elements are transmitted.

To achieve these results, it must be technically possible to segment the video image so as to separate the areas of interest to be kept, that is, those in which the person to be isolated is present, from the rest of the image, which is to be removed. This so-called fuzzy segmentation clearly requires a prior step of locating the person in the video image.

Known localization methods include the use of a neural network to classify picture elements (pixels) as belonging to the face or not. The drawback of this approach is its computation time, which in general does not allow real-time processing.

There are also localization methods based on so-called deformable models. These models rely on an active contour algorithm whose general principle is first to enclose the subject to be extracted in an approximate initial contour, then to evolve this contour by algorithmic deformation toward a final contour that gives a good approximation of the location of the subject of interest in the image.

In some active contour algorithms, the evolution of the contour is guided by the simultaneous optimization of two criteria. The first measures the geometric regularity of the contour, for example by computing its local curvature. The second measures a property of the light intensity at the image points crossed by the deformable contour, for example the norm of the intensity gradient, so as to attract the contour toward points of strong contrast. These methods work well when the user can roughly initialize the model around the region of interest; they therefore require an initialization close to the solution, which presupposes a priori knowledge of the position of the person to be extracted in the image. As a result, no practical implementation of these algorithms can be envisaged in this context. Finally, an initialization far from the solution requires frequent re-parameterizations, which may have difficulty adapting to the geometry of the shape being studied.
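For reference, the two criteria described above correspond to the classical snake energy of Kass, Witkin and Terzopoulos, written here in its usual notation. This is background on the general technique, not a formula taken from the patent: v(s) = (x(s), y(s)) is the parametric contour, α and β weight the first-order (stretching) and second-order (curvature) regularity terms, I is the image and G_σ a Gaussian smoothing kernel, so that minimizing E_ext pulls the contour toward high-gradient points.

```latex
E_{\text{snake}} = \int_0^1 \Big[ \tfrac{1}{2}\big(\alpha\,|\mathbf{v}'(s)|^2 + \beta\,|\mathbf{v}''(s)|^2\big) + E_{\text{ext}}\big(\mathbf{v}(s)\big) \Big]\, ds,
\qquad
E_{\text{ext}}(x, y) = -\big|\nabla\big(G_\sigma * I\big)(x, y)\big|^2
```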

The technical problem to be solved by the present invention is therefore to propose a method of locating a person in a video image color-coded according to a given color coding system, the method using at least one active contour algorithm to determine a final contour of the person to be extracted from an approximate initial contour. The method should provide automatic localization with a processing time optimized according to the complexity of the video sequence, allowing real-time implementation, good temporal stability and ease of use by reducing the number of parameters the user must set.

According to the present invention, the solution to this technical problem is that the method comprises the following steps:
-a) determining an approximate initial contour of the person to be extracted, by:
-a1) transforming the color coding system of the image into a TSL (hue, saturation, luminance) coding,
-a2) performing an initial segmentation of the image based on the computation of the mean saturation Sm of the image and its comparison with two predetermined thresholds S1 and S2 (S2 > S1):
-a21) if Sm > S2: applying a skin-tone filter to the image,
-a22) if S1 < Sm < S2: applying a histogram equalization operation to the image, then a skin-tone filter,
-a23) if Sm < S1: applying a histogram equalization operation followed by a histogram thresholding operation to the image, then a first motion-based segmentation operation,
-a3) applying a shape model to the image thus segmented and deducing said approximate initial contour from it,
-b) choosing an active contour algorithm,
-c) applying said active contour algorithm to the approximate initial contour to obtain the final contour locating the person to be extracted in the video image.
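Steps a1) and a2) can be sketched as follows. This is a minimal illustration, not the patented implementation: it uses Python's standard `colorsys` HLS conversion as a stand-in for the TSL coding, and the threshold values `S1`, `S2`, the skin hue band `SKIN_HUE_MAX` / `MIN_SKIN_SAT` and the operation names are hypothetical, since the source only states that S1 < S2 are predetermined.

```python
import colorsys

# Hypothetical values: the patent only says S1 < S2 are predetermined thresholds.
S1, S2 = 0.15, 0.35
# Hypothetical skin-tone hue band (colorsys hue is in [0, 1), so 0.14 is about 50 degrees).
SKIN_HUE_MAX = 0.14
MIN_SKIN_SAT = 0.1


def mean_saturation(pixels):
    """Mean HLS saturation of (r, g, b) tuples with components in [0, 1]."""
    sats = [colorsys.rgb_to_hls(r, g, b)[2] for r, g, b in pixels]
    return sum(sats) / len(sats) if sats else 0.0


def skin_tone_mask(pixels):
    """1 where the hue falls in the hypothetical skin band (and is not grey), else 0."""
    mask = []
    for r, g, b in pixels:
        h, _, s = colorsys.rgb_to_hls(r, g, b)
        mask.append(1 if h <= SKIN_HUE_MAX and s >= MIN_SKIN_SAT else 0)
    return mask


def initial_segmentation_plan(sm, s1=S1, s2=S2):
    """Ordered operations for step a2, chosen from the mean saturation sm."""
    if sm > s2:
        return ["skin_tone_filter"]  # a21
    if sm < s1:
        return ["histogram_equalization", "histogram_thresholding",
                "motion_segmentation"]  # a23
    # a22: s1 <= sm <= s2. The source does not cover sm equal to a threshold,
    # so boundary values are treated like the middle case here (an assumption).
    return ["histogram_equalization", "skin_tone_filter"]
```

For example, `initial_segmentation_plan(mean_saturation(pixels))` returns the list of operations for branch a21, a22 or a23. Choosing the branch from a single scalar is what keeps the parameterization small.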

Thus, since the first step a), the search for an approximate initial contour, is always performed over the entire video image, the localization method according to the invention requires no a priori knowledge of the topology of the subject to be detected and therefore lends itself well to automatic image processing.

Furthermore, relying on the mean saturation Sm alone to select the type of processing applied for the initial segmentation considerably simplifies the localization method of the invention: it limits the number of parameters and associates a well-defined processing with each range of values of this parameter. This makes real-time implementation of the method possible, a substantial advantage over other known localization methods.

As will be seen in detail below, the localization method according to the invention can use two types of active contour algorithm:
- the algorithm known as B-snake, described for example in the following publications: "Active Contour Models: Overview, Implementation and Applications" (S. Menet, P. Saint-Marc and G. Medioni, IEEE, 1990, pp. 194-199) and "Representation of Range Data with B-spline Surface Patches" (Chia-Wei Liao and G. Medioni, IEEE, 1992, pp. 745-748);

- an algorithm called B-snake-GVF, derived from the previous one, which takes into account the external force known as GVF (Gradient Vector Flow), described in the following publications: "Snakes, Shapes, and Gradient Vector Flow" (Chenyang Xu, IEEE Transactions on Image Processing, vol. 7, no. 3, March 1998), "Gradient Vector Flow: A New External Force for Snakes" (Chenyang Xu and Jerry L. Prince, Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR'97), pp. 66-71) and "Global Optimality of Gradient Vector Flow" (Chenyang Xu and Jerry L. Prince, 2000 Conference on Information Sciences and Systems, Princeton University, March 15-17, 2000).
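For reference, the GVF field of the Xu and Prince publications cited above is defined as the vector field V = (u, v) minimizing the functional below, where f is an edge map of the image (for example f = |∇(G_σ * I)|²) and μ is a regularization weight. The field is then used as the external force of the snake. This is the standard published definition; how the patent combines it with the B-snake is not detailed in this section.

```latex
\mathbf{V}^{*} = \arg\min_{\mathbf{V}=(u,v)} \iint \mu\,\big(u_x^2 + u_y^2 + v_x^2 + v_y^2\big) + |\nabla f|^2\,\big|\mathbf{V} - \nabla f\big|^2 \; dx\,dy
```

Far from edges, |∇f| is small and the smoothness term dominates, which extends the attraction range; this is why GVF tolerates distant initial contours and concavities.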

The B-snake algorithm is the most conventionally used. It is simple to implement and relatively fast to compute. On the other hand, it requires an initial contour fairly close to the object to be extracted: if the initial contour is too far away, convergence toward the subject is difficult and, above all, very slow.
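In the B-snake publications, the contour is represented by a B-spline, so only a few control points need to be moved during optimization. The sketch below shows that representation alone, sampling a closed uniform cubic B-spline from its control polygon. The function name `closed_bspline` and the sampling density are illustrative assumptions, and the energy minimization of the cited papers is not reproduced.

```python
def closed_bspline(control, samples_per_segment=10):
    """Sample a closed uniform cubic B-spline defined by a list of (x, y) control points."""
    n = len(control)
    if n < 4:
        raise ValueError("need at least 4 control points")
    curve = []
    for i in range(n):
        # Each segment is influenced by four consecutive control points (indices wrap around).
        p0, p1, p2, p3 = (control[(i + k) % n] for k in (-1, 0, 1, 2))
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            # Uniform cubic B-spline basis functions: non-negative on [0, 1] and summing to 1.
            b0 = (1 - t) ** 3 / 6
            b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
            b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
            b3 = t ** 3 / 6
            x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
            y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
            curve.append((x, y))
    return curve
```

Because the basis is non-negative and sums to 1, the sampled curve stays inside the convex hull of its control points, which keeps the contour smooth and well behaved as the control points move.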

The B-snake-GVF algorithm is much more robust and gives good results even when the subject to be extracted has concavities or when the initial contours are far away. On the other hand, it is more complex to implement than the B-snake algorithm, and its main drawback is that it cannot be used if the video image contains two subjects of interest that are too close together: interference then occurs that makes the algorithm unusable.

In this respect, recall that active contour algorithms are based on continuously minimizing the energy E_T of the curve representing the contour. This energy is defined so as to have a local minimum on the outer contour of objects in an image. The initial contour thus evolves until it matches its final shape.
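The energy-minimization principle can be illustrated with a discrete contour and a simple greedy local search. This is a swapped-in technique for illustration, not the B-snake or B-snake-GVF optimizer: `snake_energy`, `greedy_step`, the weights `alpha` / `beta` and the callable external energy `ext(x, y)` are all hypothetical names. Each point moves to the neighbouring lattice position that lowers the total energy, so the energy never increases.

```python
import math


def snake_energy(pts, ext, alpha=1.0, beta=1.0):
    """Discrete energy of a closed contour given as a list of (x, y) points.

    alpha weights the continuity (stretching) term, beta the curvature term,
    and ext(x, y) is the external (image) energy sampled at each point.
    """
    n = len(pts)
    total = 0.0
    for i in range(n):
        x0, y0 = pts[i - 1]        # previous point (wraps around for i = 0)
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]  # next point
        total += alpha * ((x1 - x0) ** 2 + (y1 - y0) ** 2)
        total += beta * ((x0 - 2 * x1 + x2) ** 2 + (y0 - 2 * y1 + y2) ** 2)
        total += ext(x1, y1)
    return total


def greedy_step(pts, ext, alpha=1.0, beta=1.0, step=1):
    """One sweep: move each point to the 8-neighbour position that most lowers the total energy."""
    pts = list(pts)
    current = snake_energy(pts, ext, alpha, beta)
    moved = False
    for i in range(len(pts)):
        x, y = pts[i]
        best = (x, y)
        for dx in (-step, 0, step):
            for dy in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue
                pts[i] = (x + dx, y + dy)
                e = snake_energy(pts, ext, alpha, beta)
                if e < current:
                    current, best = e, pts[i]
        pts[i] = best
        moved = moved or best != (x, y)
    return pts, current, moved


if __name__ == "__main__":
    # External energy with its minimum on a circle of radius 5 (a stand-in for image edges).
    def ext(x, y):
        return 10.0 * (math.hypot(x, y) - 5.0) ** 2

    contour = [(8, 0), (6, 6), (0, 8), (-6, 6), (-8, 0), (-6, -6), (0, -8), (6, -6)]
    moved = True
    while moved:
        contour, energy, moved = greedy_step(contour, ext, alpha=0.1, beta=0.1)
    print(contour, energy)
```

Recomputing the whole energy for each candidate is quadratic per sweep but keeps the monotone-decrease argument obvious; a practical implementation would only re-evaluate the terms that involve the moved point.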

In the B-snake-GVF algorithm, the total energy to be minimized is expressed in terms of an external energy Eext [equation not reproduced], α and β being experimental tuning coefficients and λ(Sm) = exp(−η·Sm) being a coefficient that makes it possible to select between the segmentation modes of the classical B-snake algorithm and of the B-snake-GVF algorithm, depending on the value of the mean saturation Sm of the image.

The invention proposes an optimized choice of the algorithm to be used, in which step b) of choosing an active contour algorithm comprises the following steps:
-b1) determining the number NBP of persons in the video image,
-b2) choosing the algorithm:
-b21) if NBP = 1: choosing the B-snake-GVF algorithm,
-b22) if NBP > 1: choosing the B-snake algorithm.
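Step b) can be sketched as below. How NBP is actually determined (Figure 3) is not described in this section, so `count_blobs` is a hypothetical stand-in that counts 4-connected regions of a binary skin or motion mask, ignoring regions smaller than an assumed `min_area`; `choose_active_contour` implements rules b21) and b22).

```python
from collections import deque


def count_blobs(mask, min_area=1):
    """Count 4-connected regions of 1s in a 2D list, ignoring regions smaller than min_area."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill of one region, measuring its area.
            area = 0
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                count += 1
    return count


def choose_active_contour(nbp):
    """Rules b21/b22: GVF variant for a single person, plain B-snake otherwise."""
    if nbp == 1:
        return "B-snake-GVF"
    if nbp > 1:
        return "B-snake"
    raise ValueError("NBP must be at least 1")
```

A typical call would be `choose_active_contour(count_blobs(mask, min_area=...))`, with the minimum area tuned so that small skin-coloured background patches are not counted as people.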

In this way, if only one person is present in the image, the more robust B-snake-GVF algorithm is used, handling the contour efficiently even under difficult initial conditions. Conversely, if several people are present, the B-snake algorithm, which gives the best results in this context, is used.

It should also be noted that, to refine the initial segmentation obtained when the mean saturation Sm lies between the two thresholds S1 and S2, the invention provides for step a22) to be supplemented by a second motion-based segmentation operation on the image. In general, this second motion-based segmentation operation is a simple image-difference method, as described later.
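A simple image-difference segmentation can be sketched as follows, assuming two consecutive grey-level frames of equal size stored as lists of rows. The threshold is a hypothetical tuning parameter, and `refine` (a pixel-wise AND of the skin-tone mask and the motion mask) is only one plausible way of combining the two results; the source does not specify the combination.

```python
def difference_mask(prev, curr, threshold):
    """Binary motion mask: 1 where the grey level changed by more than threshold."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same size")
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]


def refine(skin_mask, motion_mask):
    """Hypothetical refinement: keep skin-tone pixels that are also moving."""
    return [[1 if s and m else 0 for s, m in zip(row_s, row_m)]
            for row_s, row_m in zip(skin_mask, motion_mask)]
```

Frame differencing is cheap and only approximate, which fits the text's remark that a very precise motion segmentation is not needed here.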

The localization method according to the invention as just characterized makes it possible to define a method for fuzzy segmentation of a person in a video image which, when a single active contour algorithm is used, is remarkable, according to the invention, in that it consists of locating said person by means of the above localization method, then performing the following fuzzy segmentation operations:
-d1) computing a confidence index IC,
-d2) comparing said confidence index IC with a predetermined threshold ICs:
-d21) if IC ≥ ICs: performing a simple segmentation of the image by multiplying the inside and the outside of the final contour by 1 and 0 respectively,
-d22) if IC < ICs: performing a complex segmentation of the image by multiplying the inside of the final contour by 1 and the outside of said contour by a number between 0 and 1 provided by a Gaussian filter.
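Operations d21) and d22) can be sketched as per-pixel weight maps. Assumptions: the final contour is given as a binary "inside" mask, the confidence index IC is supplied by the caller (its computation is not described in this section), and the "number between 0 and 1 provided by a Gaussian filter" is taken to be the value of the Gaussian-blurred inside mask at each outside pixel. All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel with radius 3*sigma (at least 1)."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]


def _clamp(v, hi):
    return min(hi, max(0, v))


def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2D list, replicating border pixels."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    rows = [[sum(k[j] * img[y][_clamp(x + j - r, w - 1)] for j in range(len(k)))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j] * rows[_clamp(y + j - r, h - 1)][x] for j in range(len(k)))
             for x in range(w)] for y in range(h)]


def fuzzy_weights(inside, ic, ics, sigma=1.0):
    """Weights: 1 inside the final contour; outside 0 (d21) or a Gaussian value in [0, 1) (d22)."""
    binary = [[1.0 if v else 0.0 for v in row] for row in inside]
    if ic >= ics:
        return binary  # d21: simple segmentation
    soft = gaussian_blur(binary, sigma)  # d22: complex segmentation
    return [[1.0 if inside[y][x] else soft[y][x] for x in range(len(inside[0]))]
            for y in range(len(inside))]


def apply_weights(image, weights):
    """Multiply each grey-level pixel by its weight (background attenuated or removed)."""
    return [[p * wt for p, wt in zip(row_p, row_w)] for row_p, row_w in zip(image, weights)]
```

With a low confidence index, the background is faded progressively away from the contour instead of being cut sharply, which hides small contour errors.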

When the localization method of the invention uses either of the two active contour algorithms depending on the number NBP of persons present in the video image, said segmentation method is remarkable, according to the invention, in that it consists of locating said person by means of the above localization method, then performing the following fuzzy segmentation operations:
-d1) computing a confidence index IC,
-d2) comparing said confidence index IC with a predetermined threshold ICs:
-d21) if IC ≥ ICs, or if IC < ICs with NBP = 1: performing a simple segmentation of the image by multiplying the inside and the outside of the final contour by 1 and 0 respectively,
-d22) if IC < ICs with NBP > 1: performing a complex segmentation of the image by multiplying the inside of the final contour by 1 and the outside of said contour by a number between 0 and 1 provided by a Gaussian filter.
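The decision rule of this two-algorithm variant reduces to a small function. The name `segmentation_mode` and the string labels are illustrative; IC, ICs and NBP are assumed to be computed upstream.

```python
def segmentation_mode(ic, ics, nbp):
    """Return "simple" (d21) or "complex" (d22) for the two-algorithm variant."""
    if nbp < 1:
        raise ValueError("NBP must be at least 1")
    if ic >= ics or nbp == 1:
        return "simple"  # multiply inside by 1 and outside by 0
    return "complex"     # IC < ICs and NBP > 1: Gaussian-weighted outside
```

Compared with the single-algorithm variant, a low confidence index triggers the complex segmentation only when several people are present, that is, when the less robust B-snake algorithm has been used.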

The following description, with reference to the accompanying drawings, given as non-limiting examples, will make clear what the invention consists of and how it can be carried out.

Fig. 1 is a diagram showing a method for locating and fuzzily segmenting a person in a video image according to the invention.

Fig. 2 is a diagram showing the initialization step A of the method of Fig. 1.

Fig. 3 is a diagram showing a method of determining the number NBP of persons in the video image under consideration.

Fig. 4 is a diagram illustrating two examples of application of the localization and fuzzy segmentation method according to the invention.

Figure 1 schematically shows a method of fuzzy segmentation of a person in a video image.

This method essentially comprises four steps designated by the letters A, B, C and D. Steps A and B implement a method of locating the person to be extracted from the video image under consideration. Step C is the fuzzy segmentation step proper, which consists of separating the person from the background of the image by processing the background so as to blur or remove it, without affecting the visual rendering of the person. The last step D is a data processing step intended in particular to track the evolution over time of the relevant parameters, in order to update the contour of the person frame by frame and to be able to apply the best localization algorithm.

In the context of the invention, the localization method described here is of the type that uses at least one active contour algorithm to determine a final contour of the person to be extracted from an approximate initial contour.

The first step of the localization method according to the invention is represented by step A of Figure 1 and consists of defining an approximate initial contour of the person to be extracted. This operation is performed by an automatic initialization module, whose implementation will now be described with reference to Figure 2. First, it is worth stressing the automatic nature of this initialization step: unlike other known algorithms, the approximate initial contour is, according to the invention, determined by a global analysis of the video image under consideration, without any need for a priori knowledge of the position of the person sought in this image.

As shown in Figure 2, step A of the localization method according to the invention begins by transforming the color coding of the video image (generally the conventional RGB (Red-Green-Blue) coding) into a TSL coding, known per se, built around three parameters: the Hue, Saturation and Luminance of the image. The TSL coding makes it possible in particular to compute the mean saturation Sm of the image. This quantity is particularly relevant to the invention since it splits the localization method into three possible cases according to the value of the mean saturation Sm, each case being optimized in number of parameters and computation time. The basic idea for finding the approximate initial contour is to apply a so-called skin-tone filter to the Hue component of the TSL-coded image, so as to reveal at least the face of the person to be extracted. However, the relevance of the skin-tone filter depends on the mean saturation Sm of the image: the higher the saturation, the better the results the skin-tone filter will give in defining the approximate initial contour. In practice, the invention sets two thresholds S1 and S2 > S1 for the mean saturation Sm:
- if the saturation Sm is high, above S2, skin-tone filtering alone is sufficient to locate accurately the position of the person to be extracted in the image,
- if the saturation Sm is medium, that is, between S1 and S2, skin-tone filtering cannot be used alone to give an acceptable approximate localization of the person. In this case, a luminance histogram equalization is performed on the Luminance component of the TSL coding.
Cette opération a pour effet d'éliminer les ombres, sans influence importante sur les caractéristiques de l'image, ce qui rend le filtrage teinte chair plus pertinent et plus significatif. Au besoin, on peut encore améliorer les résultats en appliquant une segmentation complémentaire par le mouvement de l'image en partant de l'hypothèse que dans les images visiophoniques le sujet d'intérêt fait partie des éléments de l'image susceptibles de bouger; dans la visiophonie fixe c'est en général le io seul élément en mouvement, alors qu'en visiophonie mobile d'autres éléments peuvent être en mouvement tels que des feuilles qui bougent ou des voitures qui passent. Dans la mesure où, dans ce cas, une segmentation par le mouvement d'image très précise n'est pas nécessaire, on a recours à une segmentation par simple différence d'images. Cette méthode est expliquée en détail dans l'article Localisation des personnes dans une séquence vidéo (A. Rojbi, J-C. Schmitt, A. Georges, P. Boissanade, 17 janvier 2003, Université Claude-Bernard Lyonl). As indicated in FIG. 2, step A of the localization method according to the invention begins with the transformation of the color coding of the video image (generally the standard RGB (Red-Green-Blue) coding) into a coding TLS, known in itself, builds around the three parameters constituted by the Complexion, the Luminance and the Saturation of the image. The TLS coding makes it possible in particular to calculate the average saturation Sm of the image. This quantity is particularly relevant to the invention since it makes it possible to segment the location method in three possible cases as a function of the value of the average saturation Sm, each of these cases being optimized in number of parameters and in calculation time. Indeed, the basic idea for the search of the initial approach contour is to apply a so-called flesh color filter on the Tint component of the TLS coded image, so as to show at least the face of the person to extract. 
However, the degree of relevance of the flesh color filter depends on the average saturation Sm of the image; the greater the saturation, the more the application of the flesh color filter will lead to good results in terms of definition of the approximate initial contour. In practice, the invention sets two thresholds S1 and S2> S1 for the average saturation Sm: - if the saturation Sm is high, greater than S2, we find ourselves in a case where the screening of flesh tint is sufficient on its own to precisely locate the position in the image of the person to be extracted, - if the saturation Sm is average, ie between SI and S2, the flesh color filtering can not be used alone to give an approximate location acceptable to the person. In this case, luminous histogram equalization is used on the Luminance component of the TLS coding. This operation has the effect of eliminating shadows, without significant influence on the characteristics of the image, which makes the filtering shade flesh more relevant and more meaningful. If necessary, the results can be further improved by applying a complementary segmentation by the movement of the image on the assumption that in the videophone images the subject of interest is part of the image elements that can move; in fixed video telephony it is usually the only moving element, while in mobile videophone other elements may be moving such as moving leaves or passing cars. Insofar as, in this case, a segmentation by the very precise image movement is not necessary, segmentation is used by simple difference of images. This method is explained in detail in the article Localization of people in a video sequence (A. Rojbi, J.-C. Schmitt, A. Georges, P. Boissanade, January 17, 2003, University Claude-Bernard Lyonl).

- si la saturation Sm est faible, inférieure à st le filtrage teinte chair n'est plus significatif, il faut alors effectuer comme précédemment une égalisation d'histogramme de luminance suivie, cette fois, d'un seuillage d'histogramme de luminance qui consiste à calculer le gradient de luminance afin de déterminer les zones de transition et les contours d'objets. Dans ce cas, la segmentation par le mouvement est réalisée par une autre méthode connue sous le nom de maximum de vraisemblance, beaucoup plus précise que la simple différence d'images. Cette méthode est également décrite dans l'article Localisation des personnes dans une séquence vidéo cité ci-dessus. if the saturation Sm is small, less than st the flesh-tone filtering is no longer significant, then a luminance histogram equalization, followed this time by a luminance histogram thresholding consisting of calculate the luminance gradient to determine the transition zones and the contours of objects. In this case, the motion segmentation is performed by another method known as the maximum likelihood, much more accurate than the simple difference of images. This method is also described in the article Locating People in a Video Sequence Above.
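The luminance histogram equalization used in the medium- and low-saturation cases, and the simple image difference used for coarse motion segmentation, can be sketched as follows. This is a minimal pure-Python illustration assuming 8-bit luminance images stored as lists of rows; the function names and the difference threshold of 25 gray levels are hypothetical and do not come from the text.

```python
def equalize_histogram(img, levels=256):
    """Classic histogram equalization of a 2-D luminance image (ints in [0, levels-1])."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution function of the gray levels.
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to spread out
        return [row[:] for row in img]
    # Look-up table mapping each gray level to its equalized value.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in img]


def motion_mask(prev, curr, threshold=25):
    """Simple image difference: 1 where the luminance changed by more than `threshold`."""
    return [[1 if abs(a - b) > threshold else 0 for a, b in zip(r0, r1)]
            for r0, r1 in zip(prev, curr)]
```

For example, `equalize_histogram([[50, 50], [100, 200]])` spreads the three gray levels over the full range, giving `[[0, 0], [128, 255]]`.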

Finally, FIG. 2 shows that a human shape model is applied to the segmented image obtained from whichever of the three possible treatments was selected according to the value of the saturation Sm. This operation yields the approximate initial contour sought for the person to be extracted and eliminates the objects that were retained by the motion segmentation but are not people. The application of the shape model is also described in the article "Localisation des personnes dans une séquence vidéo" mentioned above.

The saturation thresholds S1 and S2 are determined experimentally. The applicant has found that values of 0.4 for S1 and 0.7 for S2 give satisfactory results.
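A minimal sketch of the saturation-based dispatch of step A, using Python's `colorsys.rgb_to_hls` for the TLS (HLS) conversion and the thresholds S1 = 0.4 and S2 = 0.7 given above. The step names returned are illustrative labels only, and assigning the boundary values Sm = S1 and Sm = S2 to the lower case is an assumption, since the text does not say which case they belong to.

```python
import colorsys

S1, S2 = 0.4, 0.7  # experimental thresholds reported in the text


def mean_saturation(pixels):
    """Mean HLS saturation of an iterable of (r, g, b) tuples with components in [0, 1]."""
    sats = [colorsys.rgb_to_hls(r, g, b)[2] for r, g, b in pixels]  # (h, l, s)
    return sum(sats) / len(sats)


def step_a_chain(sm, s1=S1, s2=S2):
    """Return the (illustrative) processing chain applied before the shape model."""
    if sm > s2:
        # High saturation: the flesh-tone filter on the hue component is enough.
        return ["flesh_tone_filter"]
    if sm > s1:
        # Medium saturation: equalize luminance first; simple-difference
        # motion segmentation is an optional complement.
        return ["luminance_equalization", "flesh_tone_filter",
                "motion_simple_difference (optional)"]
    # Low saturation: flesh tone is no longer meaningful.
    return ["luminance_equalization", "luminance_thresholding",
            "motion_maximum_likelihood"]
```

With one pure red pixel (saturation 1) and one mid-gray pixel (saturation 0), the mean saturation is 0.5, so the medium-saturation chain is selected.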

Returning to FIG. 1, after initialization step A, a step B is carried out to choose the active contour algorithm that is most efficient in terms of computation time and parameterization for reaching the final contour of the person to be extracted, starting from the approximate initial contour previously obtained.

As shown in FIG. 1, the active contour algorithm is chosen between two possibilities: on the one hand, a robust algorithm, the B-snake-GVF algorithm, which can converge very quickly toward the final contour even from a distant approximate initial contour, but which cannot be used when several people are present in the image; and on the other hand, a less robust algorithm, the B-snake algorithm, which nevertheless has the advantage of being usable even if several people are present in the image.

To choose between these two algorithms, the number NBP of people present in the video image to be analyzed must therefore be determined (B1). If NBP = 1, the B-snake-GVF algorithm is used (B3); if NBP > 1, the B-snake algorithm is preferred (B2).

FIG. 3 shows how the number NBP can be determined. The presence or absence of one or more valleys is identified from the luminance histogram of the image. Here, the term valley means a zone of global minimum of gray levels defined from the luminance histogram. If the image contains no such valley, it is concluded that there is only one person in the image, hence NBP = 1.
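The text does not specify how valleys are detected, so the following is only a hypothetical criterion: smooth the histogram with a moving average, then keep interior local minima whose height is at most `min_depth` times the lower of the highest bins on either side. The names `find_valleys`, `min_depth` and `window` and their default values are assumptions made for illustration.

```python
def find_valleys(hist, min_depth=0.5, window=5):
    """Return gray levels that look like valleys in a luminance histogram (hypothetical rule)."""
    n = len(hist)
    half = window // 2
    # Moving-average smoothing to suppress noise between adjacent bins.
    smooth = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smooth.append(sum(hist[lo:hi]) / (hi - lo))
    valleys = []
    for i in range(1, n - 1):
        # Local minimum (first bin of a flat bottom counts once).
        if smooth[i] < smooth[i - 1] and smooth[i] <= smooth[i + 1]:
            left_peak = max(smooth[:i])
            right_peak = max(smooth[i + 1:])
            # Keep it only if it is deep relative to both surrounding peaks.
            if smooth[i] <= min_depth * min(left_peak, right_peak):
                valleys.append(i)
    return valleys
```

An empty result corresponds to the NBP = 1 case; each returned level is a candidate separation point.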

When one or more valleys are detected, the corresponding separation points define zones that potentially represent people or objects. To determine whether they are people or objects, a classical human shape model is applied to the zones thus obtained by correlating each zone with the shape model. A correlation error of up to 20% is accepted for a zone to be retained as a person. Zones delimited by the detected separation points that are not validated as people, but rather as objects, are discarded. The number NBP of people is then the number of remaining zones retained and validated.
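Once candidate zones have been correlated with the shape model, counting NBP and choosing the active contour algorithm (B1 to B3) reduce to a few lines. This sketch assumes each zone's correlation error is already available as a fraction (0 = perfect match) and uses the 20% tolerance from the text; how the error is computed, and what to do when valleys exist but no zone is validated, are not specified, so `choose_active_contour` simply raises in that case.

```python
def count_people(zone_errors, max_error=0.20):
    """NBP from the correlation errors of candidate zones (empty list = no valley found)."""
    if not zone_errors:
        return 1  # no valley: a single person is assumed
    # Keep only zones that match the human shape model within tolerance.
    return sum(1 for e in zone_errors if e <= max_error)


def choose_active_contour(nbp):
    """Steps B2/B3: B-snake-GVF for a single person, B-snake for several."""
    if nbp == 1:
        return "B-snake-GVF"
    if nbp > 1:
        return "B-snake"
    raise ValueError("no validated person zone; behaviour not specified in the text")
```

For instance, zones with errors 0.10, 0.35 and 0.20 give NBP = 2 (the 0.35 zone is treated as an object), so the B-snake algorithm is chosen.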

Finally, the active contour algorithm selected from the number NBP is applied to the approximate initial contour to obtain, at the output of step B, the final contour of the person to be extracted.

Step C of FIG. 1 corresponds to the implementation of the fuzzy segmentation method itself, which uses the final contour obtained at the end of step B of the person-locating method just described.

As can be seen in FIG. 1, fuzzy segmentation, which consists in blurring the background of the image so as to keep only the person of interest, involves a confidence index IC between 0 and 1. This index IC is used to choose between a simple fuzzy segmentation of the image, which consists in multiplying the inside and the outside of the final contour by 1 and 0 respectively, and a complex fuzzy segmentation, in which the inside of the final contour is multiplied by 1 while the outside is multiplied by a number between 0 and 1 provided by a Gaussian filter, in order to account for possible uncertainty in the definition of the final contour.
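The two weighting schemes can be sketched in pure Python. The assumptions here are that the final contour is given as a binary mask (1 inside), and that the "number between 0 and 1 provided by a Gaussian filter" is obtained by Gaussian-blurring that mask and keeping the blurred value only outside the contour; `sigma = 2.0` and the function names are hypothetical choices.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def _clamp(v, hi):
    return min(max(v, 0), hi)


def blur(img, kernel):
    """Separable Gaussian blur of a 2-D list of floats, replicating edge pixels."""
    r = len(kernel) // 2
    h, w = len(img), len(img[0])
    rows = [[sum(kernel[j + r] * img[y][_clamp(x + j, w - 1)] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[j + r] * rows[_clamp(y + j, h - 1)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def fuzzy_weights(mask, complex_mode, sigma=2.0):
    """Weights: 1 inside the final contour; 0 (simple) or a Gaussian falloff (complex) outside."""
    hard = [[1.0 if m else 0.0 for m in row] for row in mask]
    if not complex_mode:
        return hard  # simple fuzzy segmentation
    soft = blur(hard, gaussian_kernel(sigma))
    return [[1.0 if m else min(1.0, s) for m, s in zip(mrow, srow)]
            for mrow, srow in zip(mask, soft)]


def apply_weights(img, weights):
    """Multiply each pixel of a grayscale image by its weight."""
    return [[p * wt for p, wt in zip(prow, wrow)] for prow, wrow in zip(img, weights)]
```

With the complex scheme, background pixels close to the contour keep part of their value, which softens errors in the contour position instead of cutting them off sharply.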

In general, the index IC is expressed as follows:

$$IC = \lambda \, S_m - (1 - \lambda) \, B_m - \chi - \xi$$

where Sm is the mean saturation of the image, Bm is the mean noise level in the image, χ is a parameter characterizing a rotational movement in the image, equal to 0.08 when such a movement is present and 0 otherwise, and ξ is a parameter characterizing the texture of the image, equal to 0.06 for a textured image and 0 for a non-textured image (an image is considered textured if it contains more than 12 homogeneous regions).

Experimentally, λ is set to 0.9, which shows that the saturation Sm is the most significant parameter in the above relation.
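The confidence index can be computed directly from the relation IC = λ·Sm − (1 − λ)·Bm − χ − ξ, with λ = 0.9, χ = 0.08 for rotation and ξ = 0.06 for texture. The minus signs are inferred from the worked example of the text (Sm = 0.8, Bm = 0.1, rotation and texture present, IC = 0.57); the function and parameter names are illustrative, and the caller is assumed to supply the homogeneous-region count.

```python
LAMBDA = 0.9          # weight of the mean saturation (experimental)
CHI_ROTATION = 0.08   # penalty when a rotational movement is present
XI_TEXTURE = 0.06     # penalty for a textured image


def confidence_index(sm, bm, rotation, homogeneous_regions, lam=LAMBDA):
    """IC = lam*Sm - (1 - lam)*Bm - chi - xi."""
    chi = CHI_ROTATION if rotation else 0.0
    # An image counts as textured if it has more than 12 homogeneous regions.
    xi = XI_TEXTURE if homogeneous_regions > 12 else 0.0
    return lam * sm - (1 - lam) * bm - chi - xi
```

Calling `confidence_index(0.8, 0.1, True, 20)` reproduces the value of about 0.57 used in the numerical example.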

The confidence index IC thus defined is then compared (C1) with a threshold ICs, itself determined experimentally, for example 0.52.

If the locating method uses only one active contour algorithm (B-snake, for example), simple segmentation is used if IC ≥ ICs and complex segmentation if IC < ICs.

If, as in FIG. 1, two active contour algorithms are used depending on the value of the parameter NBP, the choice of segmentation depends on both parameters IC and NBP. If IC ≥ ICs, or if IC < ICs with NBP = 1, simple fuzzy segmentation (C2) is performed, whereas if IC < ICs with NBP > 1, complex fuzzy segmentation (C3) is performed.
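Both decision rules (a single active contour algorithm, or two algorithms selected by NBP) fit in one small function, sketched here with the example threshold ICs = 0.52. Passing `nbp=None` to represent the single-algorithm case is an illustrative convention, not something the text defines.

```python
IC_THRESHOLD = 0.52  # example value of ICs given in the text


def choose_segmentation(ic, nbp=None, ics=IC_THRESHOLD):
    """Return "simple" (C2) or "complex" (C3) fuzzy segmentation."""
    if ic >= ics:
        return "simple"
    # IC below threshold: with two algorithms, a single person (B-snake-GVF)
    # still gets simple segmentation; otherwise use the Gaussian falloff.
    if nbp == 1:
        return "simple"
    return "complex"
```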

As a numerical example, taking Sm = 0.8 and Bm = 0.1 in the equation for IC, with a rotational movement present and a highly textured image, gives IC = 0.9 × 0.8 − 0.1 × 0.1 − 0.08 − 0.06 = 0.57 > ICs = 0.52. In this case, simple fuzzy segmentation is therefore used regardless of the number NBP.

Finally, a last step, designated D in FIG. 1, consists in processing the segmentation data from step C. This processing is intended not only to follow the temporal evolution of the final contour but also to detect any sudden changes in the video image. In that case, automatic initialization step A is restarted from the beginning in order to rebuild the approximate initial contour and obtain the final contour by applying the appropriate active contour algorithm.

FIG. 1 also shows that the data processing allows the confidence index IC to be recalculated periodically, for example every N images, where N may be 30.
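The per-frame scheduling of step D can be sketched as follows. The text does not say how a sudden change is detected, so the mean-absolute-difference test and its threshold of 40 gray levels are hypothetical; the recomputation period N = 30 is the example value from the text, and the action labels are illustrative.

```python
def sudden_change(prev, curr, threshold=40.0):
    """Hypothetical detector: mean absolute luminance difference above a threshold."""
    diffs = [abs(a - b) for r0, r1 in zip(prev, curr) for a, b in zip(r0, r1)]
    return sum(diffs) / len(diffs) > threshold


def frame_actions(index, changed, n=30):
    """Step D actions for frame `index`: re-run step A (then B) or track; recompute IC every n frames."""
    actions = ["reinitialize"] if index == 0 or changed else ["track"]
    if index % n == 0:
        actions.append("recompute_ic")
    return actions
```

Frame 0 always triggers a full initialization and an IC computation; afterwards the contour is tracked, except when a sudden change forces step A to run again.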

FIG. 4 illustrates two possible, non-limiting applications of the locating and fuzzy segmentation method according to the invention in the field of videotelephony.

The first concerns mobile telephony, where the aim is to remove all unnecessary complex elements from the background of the image in order to increase compression and to simplify video coding on the transmitter side and video decompression on the receiver side.

The second concerns residential videotelephony, where the aim is rather to suppress the background of the image in order to preserve the privacy of the sender, through fuzzy segmentation performed at the receiver.

In all cases, the method of the invention is implemented by a computer program comprising code instructions recorded on a medium readable by a software module installed downstream of the camera on the transmitter side and, symmetrically, on the receiver side.

Claims

1. A method of locating a person in a video image color-coded according to a given color coding system, said method using at least one active contour algorithm to determine a final contour of the person to be extracted from an approximate initial contour, characterized in that said method comprises the following steps: -a) determining an approximate initial contour of the person to be extracted, consisting in: -a1) converting the color coding system of the image into a TLS coding, -a2) performing an initial segmentation of the image based on the calculation of the mean saturation Sm of the image and its comparison with two predetermined thresholds S1 and S2 (> S1): -a21) if Sm > S2: applying a flesh-tone filter to the image, -a22) if S1 < Sm < S2: applying a histogram equalization operation to the image, then a flesh-tone filter, -a23) if Sm < S1: applying to the image a histogram equalization operation followed by a histogram thresholding operation, then a first operation of segmentation by the motion of the image, -a3) applying a shape model to the image thus segmented and deducing the approximate initial contour from it; -b) choosing an active contour algorithm; -c) applying said active contour algorithm to the approximate initial contour to obtain the final contour locating the person to be extracted in the video image.

2. Location method according to claim 1, characterized in that said method also comprises a data processing step for determining the temporal evolution of the final contour so as to update it image by image.

3. Location method according to one of claims 1 or 2, characterized in that step a) is resumed in the case of a sudden change in the video image.

4. Location method according to any one of claims 1 to 3, characterized in that step a22) is completed by a second segmentation operation by the movement of the image.

5. Location method according to one of claims 1 to 4, characterized in that said first segmentation operation by the movement of the image is a maximum likelihood method.


6. Location method according to one of claims 4 or 5, characterized in that said second segmentation operation by the movement of the image is a simple image difference method.

7. A localization method according to any one of claims 1 to 6, characterized in that the active contour algorithm chosen in step b) is the B-snake algorithm.

8. Location method according to any one of claims 1 to 6, characterized in that the active contour algorithm chosen in step b) is the B-snake-GVF algorithm.

9. A method of locating according to any one of claims 1 to 6, characterized in that step b) of choosing an active contour algorithm comprises the following steps: -b1) determination of the NBP number of people in the video image, -b2) choice of the algorithm: -b21) if NBP = 1: choice of the algorithm B-snake-GVF, -b22) if NBP> 1: choice of the algorithm B-snake.

10. A method of fuzzy segmentation of a person in a video image, characterized in that said method consists in locating said person by means of the locating method according to any one of claims 1 to 8, then performing the following fuzzy segmentation operations: -d1) calculating a confidence index IC, -d2) comparing said confidence index IC with a predetermined threshold ICs: -d21) if IC ≥ ICs: performing a simple segmentation of the image by multiplying the inside and the outside of the final contour by 1 and 0 respectively, -d22) if IC < ICs: performing a complex segmentation of the image by multiplying the inside of the final contour by 1 and the outside of said contour by a number between 0 and 1 provided by a Gaussian filter.

11. A method of fuzzy segmentation of a person in a video image, characterized in that said method consists in locating said person by means of the locating method according to claim 9, then performing the following fuzzy segmentation operations: -d1) calculating a confidence index IC, -d2) comparing said confidence index IC with a predetermined threshold ICs: -d21) if IC ≥ ICs, or if IC < ICs with NBP = 1: performing a simple segmentation of the image by multiplying the inside and the outside of the final contour by 1 and 0 respectively, -d22) if IC < ICs with NBP > 1: performing a complex segmentation of the image by multiplying the inside of the final contour by 1 and the outside of said contour by a number between 0 and 1 provided by a Gaussian filter.

12. Segmentation method according to one of claims 10 or 11, characterized in that the confidence index IC is calculated every N images.

13. A computer program product comprising program code instructions recorded on a computer-readable medium, for carrying out the steps of the method according to any one of claims 1 to 12 when said program is executed by a computer.