EP2076886A1

EP2076886A1 - Method and device for the virtual simulation of a sequence of video images

Info

Publication number: EP2076886A1
Application number: EP07858653A
Authority: EP
Inventors: Jean-Marc Robin; Christophe Blanc
Original assignee: Individual
Current assignee: Individual
Priority date: 2006-10-24
Filing date: 2007-10-23
Publication date: 2009-07-08
Also published as: BRPI0718306A2; FR2907569B1; FR2907569A1; JP2010507854A; US20100189357A1; EP2450852A1; KR20090098798A; WO2008050062A1; CA2667526A1

Abstract

The invention relates to a method for the virtual simulation of a sequence of video images from a sequence of video images of a moving face/head, comprising: an acquisition and initialisation phase of a face/head image of the real video sequence; an evolution phase for determining specific parametric models from characteristic points extracted from said image and used as initial priming points, for deforming said specific models to fit the outlines of the features of the analysed face, and for detecting and analysing the cutaneous structure of one or more regions of the face/head; and a tracking and transformation phase for modifying the characteristic features of other images in the video sequence and the colours of the cutaneous structure, said modifications being carried out according to predetermined criteria stored in at least one database and/or according to decision criteria of at least one expert system of order 0+ or 1.

Description

Method and device for virtual simulation of a sequence of video images

The present invention relates to a method and a device for automatically simulating and processing in real time one or more aesthetic images of a real objective, for example a face and / or a head of a moving character in a scene, by detecting and monitoring its characteristic features.

The features of the face participate in the act of communication between human beings. Nevertheless, it should be noted that the visualization of the characteristic features of the face is a support for communication only if these features are extracted with sufficient precision. In the opposite case, the information resulting from an analysis that is too brief constitutes more an inconvenience than an aid, particularly for high-level industrial applications for the aesthetic embellishment of a face / head, for example.

It is known that the beauty industry has a number of devices for virtually carrying out one's own aesthetic makeover, for example the digital application of makeup, a hair coloring or a hairstyle in the form of a hairpiece. The method they implement remains supervised, and therefore non-automated, and relies on computer graphics tools for placing points as close as possible to the contours. A Bézier curve or parametric polynomial curves connect these points to one another. A palette then allows the desired transformations to be applied manually, under the control of an operator and/or of the user himself.

On a larger scale, an online user can, from his own PC or Mac, post a color portrait photograph in JPEG format on the Internet, subject to a certain number of constraints, and obtain, after a delay of more than 24 hours, the services of an ASP ("Application Service Provider") web server platform. This platform provides the users of a third-party Web site with functions for outlining contours and for detecting the color of the hair, complexion and eyes, which are obtained by statistical and manual methods carried out by teams of human technicians.

The photograph can also be retouched using different layer techniques. Each element can be placed on a different layer, the final retouched photograph being obtained by superimposing all the layers. In this way, the work can be broken down, which makes the task easier for the user. The retouched photograph can then be used locally in a dedicated application of the Microsoft® ActiveX type. This technology and the set of tools developed by Microsoft® make it possible to program components that allow the content of a Web page to interact with applications executable on the Internet user's PC-type computer, in particular under the Windows® operating system. Another equivalent technique is to use a Java® application.

These applications are called "Virtual Makeover" for local use on PC or MAC and "Virtual Makeover On One" for the Internet. The advantage of this type of system is that aesthetic images can be obtained without explicit manipulation of professional computer graphics software, such as Adobe Photoshop® or Paintshop Pro®, or of other software for retouching, processing and computer-assisted drawing. Such software is mainly used for processing digital photographs, but is also used to create images from scratch.

More recently, equipment has been developed that uses automated image processing techniques, which make it possible, from digitized images, to produce other digital images or to extract information from them in order to improve their local use. On the other hand, the quality of the coding related to the segmentation of facial features requires standardized, booth-type shooting parameters in order to improve the robustness of the processing, which starts from a static color image in JPEG ("Joint Photographic Experts Group") or BMP ("Bitmap") format. The simulation remains supervised and is performed sequentially, with a processing time that varies from 5 to 10 minutes to obtain a 2-dimensional aesthetic image and up to about 60 minutes for a 3-dimensional image, for example the simulation of a make-up.

WO 01/75796 discloses a system for virtually transforming a still image, such as a photo of a face.

However, all the methods and devices listed above remain impractical because they lack instantaneousness and because their accuracy is too uncertain, owing to their weak robustness with respect to the various pose constraints of the subject and the various environmental conditions of the physical and/or artificial world. In addition, current techniques are not able to offer robust, good-quality methods for analyzing and transforming a face in real time, especially for the face/head of a character moving in an a priori arbitrary scene.

Robustness with respect to the great diversity of individuals and acquisition conditions, including different poses of a character, different equipment, uncertain lighting conditions, different fixed or moving backgrounds, etc., is the crucial point and represents a number of technological and scientific obstacles to be overcome before a large-scale industrialization of such processes in the form of professional or domestic devices can be considered.

The present invention aims to provide an image processing method and device that do not reproduce the disadvantages of the prior art.

The present invention is intended in particular to provide such a method and device for processing video image sequences, in particular for moving subjects.

The present invention also aims to provide such a method and device, which is simple, reliable and inexpensive to make and use.

The present invention therefore relates to a method and a high-performance digital device which, in a video stream composed of a succession of images, for example 25 to 30 images per second, giving the illusion of movement, make it possible to obtain by computer and in real time an automatic and precise extraction of the contours of all the characteristic features of the face and of the hair, and of the color of the eyes, the complexion and the hair, while taking certain occlusions into account. The whole leads to an aesthetic and individualized virtual simulation of an initial objective, for example the face/head of a character moving in a scene, in a video stream, by robust processing performed in real time or deferred ("play-back"). This simulation can comprise an encoding followed by a transformation upon a new reading of the sequence.

The subject of the present invention is therefore an automatic method for the virtual simulation of a video image sequence individualized for each user, which can be produced from a real video image sequence of a moving face/head, comprising, during an acquisition and initialization phase: the detection and analysis of the shapes and/or contours and/or dynamic components of an image of the face/head of the real video sequence, and the extraction of characteristic points of the face/head, such as the corners of the eyes and mouth, using predefined parametric models; during an evolution phase: the definition of specific parametric models from said extracted characteristic points, which serve as initial priming points, the deformation of said specific models to fit the contours of the features present on the analyzed face, and the detection and analysis of the cutaneous structure of one or more regions of the face/head; and during a tracking and transformation phase: the modification of the characteristic features of the other images of the video sequence and the modification of the colors of the cutaneous structure, said modifications being performed according to criteria provided in at least one database and/or according to decision criteria of at least one expert system of order 0+ or 1.

Advantageously, the detection and analysis step, which determines spatial (region/contour) and temporal information, is performed by maximizing the luminance and/or chrominance gradient fluxes.

Advantageously, said modifications are obtained by translating the neighborhoods of the characteristic points of the preceding image into the following image; affine models, including a deformation matrix, can be used when the neighborhood of the characteristic points may also undergo a deformation.

Advantageously, the tracking phase uses an algorithm to follow characteristic points from one image to another.

Advantageously, said algorithm uses only the neighborhood of characteristic points.

Advantageously, to avoid the accumulation of tracking errors, the characteristic points are readjusted using a simplified version of the active contours, and/or by deforming the curves of a model obtained in the previous image.

Advantageously, the method comprises a step of modeling the closed and / or open mouth by means of a plurality of characteristic points connected by a plurality of cubic curves.

The present invention also relates to a device for implementing the method described above, comprising a computer system, a light source, an electronic message management system, at least one database, local or remote on digital networks such as the Internet, and/or at least one expert system of order 0+ or 1, for obtaining a sequence of real digital images and transforming it into a virtual image sequence, preferably at a rate of 25 frames per second, said virtual image sequence being transformed according to decision criteria of at least one expert system of order 0+ or 1.

Advantageously, said computer system is based on a microprocessor of the CPU ("Central Processing Unit") type, with one, two, four or more cores, or on conventional multi-core processors of the Pentium, Athlon or higher type, or of the SPU ("Streaming Processor Unit") type, equipped with a main core and up to eight specific cores. It is arranged in a booth, a console, a self-service device, a pocket or mobile device, a digital television, or a server that is local or remote on digital networks such as the Internet, and is associated with at least one digital video camera, at least one screen, at least one printer and/or a connection to digital networks such as the Internet. The computer system providing the image processing comprises a computer having a hard disk, preferably with a capacity of at least 500 Kbytes, and/or a digital storage memory, one or more media such as CD-ROM, DVD, Multimedia Card®, Memory Stick®, MicroDrive®, XD Card®, SmartMedia®, SD Card®, Compact Flash® Type 1 and 2 or USB stick, a modem or a wired or radio-frequency connection module to digital networks such as the Internet, and one or more connection modules for local networks of the Ethernet, Bluetooth®, infrared, wifi®, wimax® or similar type.

Advantageously, after the virtual image sequence has been displayed on a screen, a printer prints, locally or remotely and preferably in color, at least one photograph selected from all or part of the sequence of virtual images.

Advantageously, the image processing module that carries out the steps of acquisition, detection, transformation and tracking is integrated in one or more processors specialized in signal processing, of the DSP ("Digital Signal Processor") type.

The present invention therefore relates to a method and a device for simulating, by automatic processing and in all environmental conditions and poses of the moving subject, an aesthetic image or a sequence of aesthetic images in a video stream, starting from one or more images of a real objective, for example the face/head of a character moving in a scene. The contours of the dynamic components of the face/head of the moving character are extracted at the real-time rate from the real image and/or sequence of real images, preferably captured by a digital color video camera, in order to produce relevant parameters for synchronizing the virtual aesthetic transformation tools, for example in the regions of the eyes, eyebrows, mouth and their neighborhoods, according to multiple criteria provided in at least one database, local and/or remote on digital networks such as the Internet, and/or according to decision criteria previously defined in the knowledge base of at least one expert system of order 0+ or 1.

The computer system implemented may be installed in a cabin or console or a self-service device or a pocket or mobile device or a digital television or a local server or remote on digital networks, including the Internet, or any form of possible devices to come.

In its first destination, it may include a computer or a processing microprocessor of the CPU ("Central Processing Unit") type, with one, two, four or more cores, or conventional multi-core processors of the Pentium, Athlon or higher type, or of the SPU ("Streaming Processor Unit") type, equipped with a main core and up to eight or more specific cores; a hard disk of at least 500 Kbytes and/or a digital storage memory; one or more media of the CD-ROM, DVD, Multimedia Card®, Memory Stick®, MicroDrive®, XD Card®, SmartMedia®, SD Card®, Compact Flash® type 1 and 2, USB key or other type; all types of modems or modules for wired or wireless connection to digital networks such as the Internet; one or more connection modules for the local environment, such as Bluetooth®, infrared, wifi®, wimax® and those to come; a fixed color video or digital television camera, of the CCD type or higher; a light source, discreet or not; all types of screens, preferably color, current and future; all types of monochrome or color printers, current and future; one or more databases, local or remote on digital networks including the Internet; and, depending on the case, an expert system of order 0+ or 1.

In a second destination, if such a simulation system is to be installed on the shelves of cosmetics stores or in a specialized institute or practice, it should be as unobtrusive as possible, or even mobile: it may be desirable that, depending on the height of the client and for his comfort, the simulator moves by itself to the level of the client's face. In other words, it takes up no floor space.

In this case, the vision system defined above can be composed of a discreet light source of the daylight or white light-emitting diode type, a single-CCD or higher camera, a graphics card, a flat, touch-sensitive color screen and a printer of the receipt type or for A4, A3 or larger color paper. The assembly can then be fully integrated in an ultra-light "panel PC" whose size is given by the dimensions of any type of flat screen, preferably color. The processing is local, and all technical or content updates, as well as maintenance, can be carried out via a wired or radio-frequency connection to digital networks including the Internet.

The system operates by visual servoing of the device via its camera, automatically locking onto the user's face by means of a module for detecting and tracking the face/head of a character. Starting from a so-called equilibrium position, when a user requests a simulation, the system stops once it has detected its objective, for example the image of a face/head. Depending on the environmental conditions, the vision system can automatically adjust the lighting and the zoom of the camera in order to obtain, on the screen, an image size and a facial image code of optimal and almost constant quality.

In a third destination, the system controlling the functions of the process can be a terminal, such as an alphanumeric keyboard, a mouse or any other means. The camera, depending on the terminal, can be connected by any connection or any type of digital network to an editing system, preferably with color screen and/or paper outputs assembled in a single device next to the user. The processing and calculation part can be managed by one or more servers, local or remote on digital networks including the Internet, equipped with at least one microprocessor, for example of the 32- or 64-bit CPU ("Central Processing Unit") type, with one, two, four or more cores, conventional multi-core processors of the Pentium or Athlon type, or of the SPU ("Streaming Processor Unit") type or of the Cell type with a main core and eight specific cores, and with all types of electronic or magnetic memories.

Whatever the device implemented, the color images are advantageously captured in real time by any type of digital video camera, preferably color, such as a single-CCD (charge-coupled device) or higher digital color video camera, a CMOS (complementary metal-oxide-semiconductor) digital color video camera, or the like, for example a webcam, in order to provide in real time an aesthetic simulation of value through high-quality detection of the geometric and dynamic components of the face and through appropriate image processing. To be sufficiently user-friendly, the processing can be carried out locally or on a remote server and, depending on the computing speed, in real time or what is perceived as such, or in playback mode.

The whole treatment can be done without too many lighting constraints and poses for each character present in the image, considering an uncertain background fixed or mobile, and a number of occlusions.

Thus, in various environmental conditions of the physical and/or artificial world, experiments demonstrate that the method implemented according to the present invention remains robust and sufficiently precise during the phase of extraction and evolution of the contours of the permanent features of the face/head, namely the eyes, eyebrows, lips, hair and other morphological elements, depending on the aesthetic transformation sought.

For each of the features considered, for example a smiling or talking mouth, various specific parametric models capable of accounting for all the possible deformations can be predefined and implemented according to the decision criteria of the expert system base.

The method advantageously comprises three synchronized phases:

1. An acquisition and initialization phase: the shapes and contours of the face/head are analyzed and detected in a digitized video image corresponding to the first image of a sequence. Characteristic points and areas of interest of the face/head are extracted, for example the corners of the eyes and mouth, and serve as initial anchor points for each of the predefined, adapted parametric models. In the evolution phase, each model is deformed so as to coincide with the contours of the features present on the analyzed face. This deformation is performed by maximizing a flux of luminance and/or chrominance gradient along the contours defined by each curve of the model. The definition of models naturally introduces a regularization constraint on the contours sought. Nevertheless, the chosen models remain flexible enough to allow a realistic extraction of the contours of the eyes, the eyebrows and the mouth.

2. A tracking and transformation phase. Tracking makes segmentation more robust and faster in the subsequent images of the video sequence. The transformation modifies the fine characteristic areas of the face/head tracked in the video sequence according to multiple criteria provided in the database and/or, as the case may be, according to decision criteria of an expert system of order 0+ or 1.

3. A rendering phase offering on a screen and / or paper and / or via a server on all digital networks, the results of the transformation phase for the entire video sequence.

During the first phase, the video processing system will coordinate several successive operations.

First, in the first image of the sequence, it locates the face/head of the character moving in a scene by considering the typical chrominance information associated with the skin. The detected face/head corresponds to an area of interest in the image.
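
Purely by way of illustration, the localization by skin chrominance can be sketched as follows. The sketch assumes a YCbCr representation computed with the JPEG (BT.601 full-range) conversion and a fixed rectangular skin interval in the (Cb, Cr) plane; the threshold values, the function names and the small synthetic frame are assumptions made for the example and are not specified in the present description.

```python
def rgb_to_cbcr(r, g, b):
    """Chrominance (Cb, Cr) of an 8-bit RGB pixel, JPEG / BT.601 full-range convention."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

# Assumed skin interval in the (Cb, Cr) plane: illustrative values only.
CB_RANGE = (77.0, 127.0)
CR_RANGE = (133.0, 173.0)

def is_skin(pixel):
    cb, cr = rgb_to_cbcr(*pixel)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]

def skin_bounding_rectangle(image):
    """Return (left, top, right, bottom) enclosing all skin-coloured pixels, or None."""
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if is_skin(pixel):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)

# Tiny synthetic frame: a skin-coloured block on a blue background (hypothetical data).
BACKGROUND = (30, 60, 200)
SKIN = (220, 170, 140)
frame = [[BACKGROUND] * 8 for _ in range(6)]
for y in range(2, 5):
    for x in range(3, 6):
        frame[y][x] = SKIN

roi = skin_bounding_rectangle(frame)
```

Here `roi` is the bounding rectangle of the skin-coloured pixels, which plays the role of the area of interest on which the subsequent steps operate.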

Following this extraction, the method makes it possible to overcome illumination variations by using a filter adapted to the behavior of the retina, for the region of interest.

The system then proceeds, for the region of interest thus filtered, to the extraction of the characteristic features of the face, preferably using suitable parametric models, namely the irises, eyes, eyebrows, lips, the contour of the face and the hair helmet.

For the iris, we search for the semicircle that maximizes the normalized luminance gradient flux in each right and left quarter of the rectangle encompassing the face.

The initial positioning of each model on the image to be processed takes place after the automatic extraction of characteristic points on the face.

A process that tracks points of maximum luminance gradient can be used to detect the corners of the eyes. Two Bézier curves, one of which is bent towards its end so as to follow naturally the shape of the lower contour of the eye, are the models chosen for the upper and lower contours of the eye. They can be initialized by the two corners of the eye and the lowest point of the circle detected for the iris for the lower contour, and by the two corners of the eye and the center of the circle detected for the iris for the upper contour.

For the initialization of the Bezier curve associated with the eyebrows, one can extract the two inner and outer corners of each eyebrow.

The model proposed for the lips is advantageously composed of at least five independent curves, each of which naturally describes part of the outer lip contour, and of at least two curves for the inner contours. The characteristic points of the mouth used to initialize the model can be found by jointly using discriminant information combining luminance and chrominance and the convergence of a type of active contour that dispenses with the tuning of the contour parameters and with its high dependence on the initial position.

The modeling of the contour of the face advantageously uses eight characteristic points situated on this contour. These eight points initialize an outline modeled by deformable ellipse quarters, according to the position of the face in a temporal dimension.

The hair helmet can be segmented from the detection of the contour of the face by associating the filtering of the background of the image with the use of active contours. Characteristic points located on the contour of the hair are thus detected. Between each of these points, the model used can be a cubic polynomial curve.

All proposed initial models can then be deformed so that each desired contour is a set of points of maximum luminance gradient. The selected curves will preferably be those that maximize the normalized luminance gradient flux across the contour.

During the second phase, the tracking step allows the segmentation in the following images of the video sequence. During this step, the results obtained in previous images provide additional information that can make segmentation more robust and faster. The precise tracking procedure, according to an advantageous embodiment of the present invention, uses an algorithm that makes it possible to follow characteristic points from one image to another. This differential method, using only the neighborhood of points, brings a significant time saving compared to a direct extraction technique. To avoid an accumulation of tracking errors, the characteristic points can be recalculated by using a simplified version of the active contours, and / or by deforming the curves of a model obtained in the previous image.
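
As a non-limiting sketch of such a differential method, the displacement of one characteristic point can be estimated from its neighborhood alone by a least-squares solution of the brightness-constancy equation. This is a Lucas-Kanade-type estimate, named here as an assumption since the description does not identify the algorithm; the function names, the window size and the synthetic quadratic patch are hypothetical.

```python
def track_translation(prev, curr, cx, cy, half=3):
    """Estimate the displacement (u, v) of the neighbourhood centred on (cx, cy).

    Least-squares solution of Ix*u + Iy*v + It = 0 over a (2*half+1)^2 window,
    using only pixels around the characteristic point.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0   # spatial gradient along x
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0   # spatial gradient along y
            it = curr[y][x] - prev[y][x]                   # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None  # gradients too uniform: the displacement is not observable
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

def quadratic_patch(shift_x=0.0, shift_y=0.0, size=17, centre=8):
    """Smooth synthetic luminance patch, optionally translated by (shift_x, shift_y)."""
    patch = []
    for y in range(size):
        row = []
        for x in range(size):
            a, b = x - centre - shift_x, y - centre - shift_y
            row.append(a * a + 2.0 * b * b + a * b)
        patch.append(row)
    return patch

previous_image = quadratic_patch()
next_image = quadratic_patch(shift_x=0.4, shift_y=-0.3)
displacement = track_translation(previous_image, next_image, 8, 8)
```

For this quadratic patch the estimate equals the imposed shift up to rounding; on real images it is only a first-order approximation, which is one reason why a readjustment step such as the one described above is useful.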

The transformation step may lead to the modification of the fine characteristic areas of the face/head tracked in the video sequence according to multiple criteria provided in the database(s) and/or, as the case may be, according to decision criteria of at least one expert system of order 0+ or 1. The present invention may offer the user different looks and palettes, present in the database, for viewing on his face. In order to propose a precise and realistic aesthetic simulation that depends on the face being processed, the system can search, from anthropometric ratios calculated by an expert system of order 0+ or 1, for the characteristic zones to transform, for example the cheekbones. In addition, for each face, the expert system can define makeup procedures that depend on the shape of the face (round, elongated, square, triangular or oval) and on certain features (eyes far apart, close together or evenly spaced, nose size, etc.). These rules can be communicated to the transformation module for a realistic simulation that depends on each face to be transformed. During this phase, the method also classifies faces, in particular as man, woman, child or teenager.
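
The following sketch merely illustrates how decision criteria of this kind could be expressed as rules over attribute/value facts, which is how an expert system of order 0+ is read here (an assumption about the term). Every rule, threshold, attribute name and procedure label below is a hypothetical placeholder and does not come from the present description.

```python
# Each rule: (name, condition on the fact base, conclusions added when it fires).
# All thresholds and labels are illustrative placeholders.
RULES = [
    ("elongated face", lambda f: f["face_height"] / f["face_width"] > 1.5,
     {"face_shape": "elongated"}),
    ("round face", lambda f: f["face_height"] / f["face_width"] < 1.15,
     {"face_shape": "round"}),
    ("eyes apart", lambda f: f["eye_gap"] / f["eye_width"] > 1.2,
     {"eye_spacing": "apart"}),
    ("eyes close", lambda f: f["eye_gap"] / f["eye_width"] < 0.8,
     {"eye_spacing": "close"}),
    ("procedure for round faces", lambda f: f.get("face_shape") == "round",
     {"makeup_procedure": "round-face procedure (placeholder)"}),
]

def infer(measurements):
    """Forward chaining: apply the rules until no new fact is produced."""
    facts = dict(measurements)
    changed = True
    while changed:
        changed = False
        for _name, condition, conclusions in RULES:
            if condition(facts) and any(facts.get(k) != v for k, v in conclusions.items()):
                facts.update(conclusions)
                changed = True
    return facts

# Hypothetical anthropometric measurements, in pixels.
facts = infer({"face_height": 200.0, "face_width": 180.0,
               "eye_gap": 45.0, "eye_width": 30.0})
```

The inferred facts (face shape, eye spacing, selected procedure) stand for the rules that would be communicated to the transformation module.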

Finally, the rendering phase offers, on a screen and/or on paper and/or via a server on all digital networks, the results of the transformation phase for the entire video sequence and/or for part of this sequence.

Other features and advantages of the present invention will appear in the course of the following detailed description, made with reference to the accompanying drawings, given by way of non-limiting examples, and in which:

FIG. 1 is a block diagram of a virtual image simulation system according to an advantageous embodiment of the present invention;

FIG. 2 is a block diagram illustrating the phase of extraction of the faces / heads of characters and the characteristic zones according to an advantageous embodiment of the present invention;

FIG. 3 represents the block diagram of the retinal filtering;

FIG. 4 is a drawing of one of the parametric models adapted to the tracking of moving lips;

FIG. 5 represents the result of the automatic extraction of the characteristic areas of the face from a video sequence presenting a single character with the head moving in front of the camera lens along the orientation axes X, Y and Z symbolized in this same figure, namely the contour of the face, the iris, the eyes, the mouth, the eyebrows and the hair helmet;

FIG. 6 represents the result of an aesthetic simulation, such as a look, before and after transformation.

FIG. 1 represents an example of a system for the real-time automatic detection and tracking of the characteristic features of a real objective, such as the face/head of a character moving in a scene, with the possibility of virtual image simulation, comprising an image acquisition and initialization module 1, a tracking and transformation module 2 and a rendering module 3. Each module will be described in detail below.

The image acquisition and initialization module 1 is implemented using any type of digital color video camera, such as a single-CCD (charge-coupled device) or higher digital color video camera, a CMOS (complementary metal-oxide-semiconductor) digital color video camera, or the like.

The sequence of images taken by the acquisition module is analyzed in order to detect the characteristic zones and points of the face/head. This analysis is carried out on a microprocessor of the 32- or 64-bit CPU type, with one, two, four or more cores, of the SPU type or of the Cell type with a main core and eight specific cores, on conventional multi-core processors of the Pentium or Athlon type, on a personal computer or on a digital signal processor. The characteristic zones and points of the face/head of the character moving in a scene, thus extracted and coupled to the image stream, are sent to the tracking and transformation module which, according to the multiple criteria provided in one or more databases or, depending on the case, according to decision criteria of one or more expert systems 21, sends its results to the rendering module 3: a video sequence with, for example, the made-up face. The rendering module presents, according to the present invention, the results on any type of screen (cathode-ray, LCD, plasma or the like) and/or on any paper format and/or via a server on all digital networks, for example the Internet.

Figure 2 shows a block diagram illustrating the extraction phase of the face/head of the character and of the characteristic areas according to the present invention. At the level of the initialization module 1, the software that processes the video sequence, clocked at the acquisition rate of the digital video sensor, coordinates several successive operations according to the invention. First, it locates the face/head of the character in a scene. For this purpose, the typical chrominance information associated with the skin is considered. This defines the region of interest of the image by a bounding rectangle. A preprocessing phase 12 of this region of interest makes it possible to overcome illumination variations by using an adapted filtering inspired by the behavior of the retina. By performing a succession of filterings and adaptive compressions, this filtering achieves a local smoothing of the illumination variations. Let G be a Gaussian filter of size 15x15 and standard deviation σ = 2. Let I_in be the initial image and I_1 the result of its filtering by G. From the image I_1, we define the image X_0 by the relation:

$$X_0 = 0.1 + \frac{410\, I_1}{105.5 + I_1}$$

The image X_0 makes it possible to define the compression function C by the relation:

$$C : I \mapsto \frac{(255 + X_0)\, I}{X_0 + I}$$

Figure 3 gives the block diagram of the retinal filtering, the output of this filtering is noted I _out . For example, at the end of the filtering, on a laterally illuminated face which therefore has a significant variation in luminance between the left and right parts of the face, the luminance variations will be greatly reduced.
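
A minimal sketch of one filtering and compression stage is given below for illustration. It assumes the relations read from the description, namely X_0 = 0.1 + 410 I_1 / (105.5 + I_1) with I_1 the Gaussian-filtered input, and C(I) = (255 + X_0) I / (X_0 + I); it does not reproduce the complete chain of Figure 3. The border handling, the function names and the synthetic region are assumptions.

```python
import math

def gaussian_kernel(size=15, sigma=2.0):
    """1-D normalized Gaussian; applied along rows then columns it gives the 15x15 filter G."""
    half = size // 2
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_filter(image, kernel):
    """Separable filtering with replicated borders (border handling is an assumption)."""
    height, width = len(image), len(image[0])
    half = len(kernel) // 2

    def clamp(i, last):
        return max(0, min(last, i))

    rows = [[sum(kernel[k + half] * image[y][clamp(x + k, width - 1)]
                 for k in range(-half, half + 1)) for x in range(width)]
            for y in range(height)]
    return [[sum(kernel[k + half] * rows[clamp(y + k, height - 1)][x]
                 for k in range(-half, half + 1)) for x in range(width)]
            for y in range(height)]

def retinal_compression(i_in):
    """Apply C to i_in, with X_0 computed from the Gaussian-filtered image I_1."""
    i_1 = gaussian_filter(i_in, gaussian_kernel())
    result = []
    for row_in, row_1 in zip(i_in, i_1):
        out_row = []
        for value, smoothed in zip(row_in, row_1):
            x_0 = 0.1 + 410.0 * smoothed / (105.5 + smoothed)
            out_row.append((255.0 + x_0) * value / (x_0 + value))
        result.append(out_row)
    return result

# Synthetic laterally lit region: dark left half, bright right half (hypothetical data).
lit = [[40.0] * 16 + [200.0] * 16 for _ in range(8)]
compressed = retinal_compression(lit)
left, right = compressed[4][0], compressed[4][31]
```

In this example the ratio between the bright and dark sides after compression (`right / left`) is smaller than the ratio of 5 present in the input, which is the local attenuation of illumination differences that the preprocessing aims at.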

The automatic extraction of the contours of the permanent features of the face, namely the contour of the face, whose homogeneity is taken into account, the irises, the eyes, the eyebrows, the lips and the hair helmet, follows in a second step. For each of the features considered, a specific parametric model (cubic polynomial curves, Bézier curves, circle, etc.) capable of accounting for all possible deformations is defined. For the iris, we search for the semicircle that maximizes the normalized luminance gradient flux in each right and left quarter of the rectangle encompassing the face, since the contour of the iris is the border between a dark area, the iris, and a light area, the white of the eye. Maximizing the normalized gradient flux has the advantage of being very fast and of requiring no parameter adjustment, and it leads unambiguously to the selection of the right semicircle, since the normalized gradient flux always has a very marked peak corresponding to the correct position of the desired semicircle.
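
The semicircle search can be sketched as an exhaustive maximization, given here only as an illustration. The sketch assumes that the lower semicircle is used, that the flux is normalized by the number of sampled points, and that the gradient is taken at the nearest pixel; the search ranges, the function names and the synthetic eye image are hypothetical.

```python
import math

def gradient(image):
    """Central-difference luminance gradient; border pixels are left at zero."""
    h, w = len(image), len(image[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0
    return gx, gy

def normalized_flux(gx, gy, cx, cy, r, samples=32):
    """Mean of grad(I) . n over the lower semicircle of centre (cx, cy) and radius r."""
    total = 0.0
    for k in range(samples):
        theta = math.pi * (k + 0.5) / samples      # 0..pi: lower half, y axis pointing down
        nx, ny = math.cos(theta), math.sin(theta)  # outward normal of the circle
        x, y = round(cx + r * nx), round(cy + r * ny)
        total += gx[y][x] * nx + gy[y][x] * ny
    return total / samples

def find_iris(image, x_range, y_range, r_range):
    """Return the (cx, cy, r) whose semicircle maximizes the normalized flux."""
    gx, gy = gradient(image)
    candidates = ((cx, cy, r) for cx in x_range for cy in y_range for r in r_range)
    return max(candidates, key=lambda c: normalized_flux(gx, gy, *c))

def synthetic_eye(width=40, height=30, cx=20, cy=15, radius=6):
    """Dark disc (iris) on a light background (white of the eye), with a soft edge."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            d = math.hypot(x - cx, y - cy) - radius
            row.append(40.0 + 180.0 / (1.0 + math.exp(-d)))
        image.append(row)
    return image

iris = find_iris(synthetic_eye(), range(12, 29), range(8, 20), range(3, 10))
```

Because the iris is darker than its surroundings, the gradient points outwards along its border, so the flux through the outward normal is largest when the candidate semicircle lies on that border.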

Characteristic points of the face are extracted (corners of the eyes and mouth for example) and serve as initial anchors 13 for each of the other models.

The models chosen for the upper and lower contours of the eye are Bezier curves, one of which is curved towards its end. They are initialized from the two corners of the eye, detected by tracking points of maximum luminance gradient, together with the lowest point of the circle detected for the iris (for the lower contour) or the center of that circle (for the upper contour).
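For illustration, a Bezier curve defined by such points can be evaluated with de Casteljau's algorithm. Treating the two eye corners and the iris point as the three control points of a quadratic Bezier curve is an assumption made for this sketch, and the coordinates used in the example are hypothetical.

```python
def bezier_point(ctrl, t):
    """Point at parameter t in [0, 1] on the Bezier curve with control points ctrl."""
    pts = [(float(x), float(y)) for x, y in ctrl]
    while len(pts) > 1:
        # Repeated linear interpolation between consecutive points (de Casteljau).
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

def sample_curve(ctrl, n=20):
    """n + 1 points along the curve, e.g. for drawing or for a flux computation."""
    return [bezier_point(ctrl, k / n) for k in range(n + 1)]

# Hypothetical lower eyelid: left corner, lowest iris point, right corner.
lower_lid = sample_curve([(0, 0), (5, 10), (10, 0)])
```

The curve passes through the two corners and is only attracted towards the middle control point, which is why the control points are later readjusted during the deformation phase.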

For the initialization of the Bezier curves associated with the eyebrows, the inner and outer corners of each eyebrow are advantageously extracted. For each eyebrow, the search area for these points is reduced to the area of the image above the detected iris. To compute the abscissae of the inner and outer corners, we search for the abscissae of the points at which the derivative of the horizontal projection of the valley image along the lines changes sign or vanishes. To compute the ordinates of these points, we search for the position of the maximum of the vertical projection of the valley image along the columns. The inner and outer corners and the midpoint of these two corners serve as initial control points for the Bezier curve associated with each eyebrow. Since this method is sensitive to noise, the points thus detected are readjusted during the deformation phase of the model associated with the eyebrows.

The proposed model for the lips can be composed of five independent cubic curves, each of which describes a part of the outer lip contour. Figure 4 shows a drawing of this model for a closed mouth. Unlike most models proposed in the literature, this original model is sufficiently deformable to faithfully represent the specificities of very different lips. Between Q2 and Q4, Cupid's bow is described by a broken line, while the other portions of the outline are described by cubic polynomial curves. In addition, the model is constrained to have a zero derivative at points Q2, Q4 and Q6. For example, the cubic between Q1 and Q2 must have a zero derivative at Q2. The characteristic points Q1, Q2, Q3, Q4, Q5 and Q6 of the mouth, used to initialize the model, are extracted by jointly using discriminant information combining luminance and chrominance and the convergence of a type of active contour that avoids both the tuning of the active contour parameters and its strong dependence on the initial position. The same goes for the inner labial contours, where two curves make it possible to fit the inner contours closely.

Detecting the inner contour is more difficult when the mouth is open, because of non-linear variations in appearance inside the mouth. Indeed, during a conversation, the area between the lips can take different configurations: teeth, oral cavity, gums and tongue.

The parametric model for the inner contour, when the mouth is open, can be composed of four cubics. For an open mouth, the inner "Cupid's bow" is less pronounced than for a closed mouth; thus two cubics are enough to extract the upper inner contour of the lips precisely. With four cubics, the model is flexible and overcomes the problem of segmenting the inner contour for asymmetrical mouths.

Two active contours called jumping snakes can be used to adjust the model; one for the upper contour and one for the lower contour.

The convergence of a jumping snake is a succession of growth and jump phases. The "snake" is initialized from a seed, then it grows by adding points to the left and right of the seed. Each new point is found by maximizing a gradient flux through the segment formed by the current point to be added and the previous point. Finally, the seed jumps to a new position closer to the desired contour. The growth and jump processes are repeated until the jump amplitude falls below a certain threshold. The initialization of the two "snakes" begins with the search for two points on the upper and lower contours, belonging to the vertical passing through Q3 in Figure 4. The difficulty of the task lies in the fact that, when the mouth is open, there can be different areas between the lips whose characteristics (color, texture or luminance) are similar to or completely different from those of the lips.

From the key points detected, the final inner contour can be given by four cubics. The two cubics for the upper contour can be calculated by the least squares method. Similarly, the two cubics of the lower contour can also be calculated by the least squares method.
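A least-squares cubic fit of this kind can be sketched as follows. The sketch fits y as a cubic function of x, which suits roughly horizontal lip contours, and solves the normal equations directly; the function name and the sample points are hypothetical, and the zero-derivative constraints of the outer model are not imposed here.

```python
def fit_cubic(points):
    """Least-squares coefficients [a0, a1, a2, a3] of y = a0 + a1*x + a2*x^2 + a3*x^3."""
    n = 4
    # Normal equations (A^T A) c = A^T y for the Vandermonde matrix A.
    ata = [[sum(x ** (i + j) for x, _ in points) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in points) for i in range(n)]
    m = [row + [rhs] for row, rhs in zip(ata, aty)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - s) / m[i][i]
    return coeffs

# Hypothetical key points lying exactly on y = 3 - x + 2x^3.
key_points = [(x, 2 * x ** 3 - x + 3) for x in range(-2, 4)]
coefficients = fit_cubic(key_points)
```

At least four key points with distinct abscissae are needed for each cubic; with more points the fit averages out detection noise.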

The modeling of the contour of the face preferably uses eight characteristic points assumed a priori to lie on this contour, since a face can have very long hair that completely covers the forehead and possibly the eyebrows and the eyes: two points at the level of the eyes, two points at the level of the eyebrows, two points at the level of the mouth, a point at the chin and a point at the forehead. These points are extracted by thresholding in the V plane of the HSV representation of the image. These eight points initialize a contour modeled by quarter ellipses.
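The thresholding in the V plane can be illustrated with the standard `colorsys` module. The threshold value, the assumption that the face is brighter than its surroundings, and the helper `row_extent` (which returns the leftmost and rightmost retained pixels on one image row, for instance at eye level) are all hypothetical choices for this sketch.

```python
import colorsys

def v_plane_mask(rgb_image, threshold=0.5):
    """True where the V (value) component of HSV is at least the threshold."""
    return [[colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)[2] >= threshold
             for (r, g, b) in row]
            for row in rgb_image]

def row_extent(mask, y):
    """Leftmost and rightmost retained columns on row y, or None if the row is empty."""
    cols = [x for x, on in enumerate(mask[y]) if on]
    return (cols[0], cols[-1]) if cols else None

# One hypothetical image row: dark background, two skin pixels, dark background.
row = [(10, 10, 10), (200, 150, 120), (210, 160, 130), (20, 15, 10)]
mask = v_plane_mask([row])
```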

The helmet of the hair can be segmented, starting from the detected contour of the face, by combining the filtering of the image background with the use of active contours. Characteristic points located on the contour of the hair are thus detected. Between each pair of these points, the model used can be a cubic polynomial curve. The automatic extraction of one or more points may fail; in this case the point or points can very easily be repositioned manually in order to reposition the model or models correctly and proceed to their evolution phase.

In the evolution phase of the models, each model is deformed to coincide with the contours of the features present on the analyzed face. This deformation is done by maximizing a luminance and / or chrominance gradient flux along the contours defined by each curve of the model.

The definition of models makes it possible to naturally introduce a regularization constraint on the contours sought. Nevertheless, the chosen models remain flexible enough to allow a realistic extraction of the contours of the eyes, the eyebrows and the mouth. Figure 5 represents the result of the automatic extraction of the characteristic zones of the face, namely the contour of the face, the irises, the eyes, the mouth, the eyebrows and the helmet of the hair, which respectively form anthropometric modules of the face, according to an aspect of the present invention.

Third, the software tracks the face / head and the facial features in the video sequence. During tracking, the results obtained in the previous images provide additional information that can make the segmentation more robust and faster.

The precise tracking procedure, according to an advantageous embodiment of the present invention, uses an algorithm that makes it possible to follow characteristic points from one image to the next. This differential method, which uses only the neighborhood of the points, brings a significant time saving compared with a direct extraction technique. It is based on the apparent motion constraint equation, derived from a Taylor expansion of the equation below:

$$I_t(x - d(x)) = I_{t+1}(x)$$

We assume that the neighborhood of the tracked point in image I_t is found in the following image I_{t+1} by a translation. d(x) is the displacement vector of the pixel of coordinate x, where x is a vector. Consider a neighborhood R of size n x n in the reference image taken at time t. The goal is therefore to find in the next image the region most similar to R. If I_t(x) and I_{t+1}(x) denote the gray-level values in these two images, the method minimizes a cost function equal to the sum of the squared inter-pixel differences.
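The cost function can be illustrated with a simple sketch that evaluates the sum of squared differences for every integer displacement in a small search window and keeps the minimum. This exhaustive search is a deliberately simpler substitute for the differential solution described above, not the method itself; the function names, window size and search range are assumptions.

```python
def ssd(img_t, img_t1, x, y, dx, dy, half):
    """Sum of squared differences between the neighborhood R of (x, y) in I_t
    and the neighborhood of (x + dx, y + dy) in I_{t+1}."""
    total = 0.0
    for v in range(-half, half + 1):
        for u in range(-half, half + 1):
            diff = img_t1[y + dy + v][x + dx + u] - img_t[y + v][x + u]
            total += diff * diff
    return total

def track_point(img_t, img_t1, x, y, half=2, search=3):
    """Displacement (dx, dy) minimizing the SSD; (x, y) must lie at least
    `half` pixels inside the image."""
    h, w = len(img_t), len(img_t[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            nx, ny = x + dx, y + dy
            if not (half <= nx < w - half and half <= ny < h - half):
                continue                      # candidate window leaves the image
            cost = ssd(img_t, img_t1, x, y, dx, dy, half)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (best[1], best[2]) if best else (0, 0)
```

Here the neighborhood R has size n = 2 * half + 1. The differential method reaches a comparable minimum without scanning all candidates, which is where its time saving comes from.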

In addition, to avoid the accumulation of tracking errors, which would give approximate results, the method advantageously realigns the characteristic points by using a simplified version of the active contours and / or by deforming the curves of the model obtained in the previous image. Finally, the final contours are extracted. For this, the shape of the characteristic zones in the previous image as well as the characteristic points are used to calculate the optimal curves constituting the different models.

During the transformation phase, the tools for recognizing and tracking the anthropometric areas of the face in the image communicate all the data they have extracted. Depending on the multiple criteria provided for in the database and / or on the decision criteria of an expert system of order 0+ or 1, the module then determines the treatments to be performed. These are determined by the theme or themes that the user has chosen. For example, in a make-up operation, the characteristic areas of the face, defined according to the extraction results and to the function chosen by the user (look / palette), are automatically modified in the sequence of consecutive images according to harmonious and personalized choices. For example, for a round face, the process shades the sides of the face in a darker tone. Conversely, for a triangular face, the process shades the sides of the face in a lighter tone. The user can choose the look, present in a database, that he wishes to apply to the face appearing in the consecutive images. The looks are particular drawings defined in advance with a person skilled in the art. These drawings and appropriate forms are predefined virtual templates that are recalculated and readjusted to the areas of the face on which they operate, depending on the information from the extraction and tracking module, the context of the image and the effects they must suggest.
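The face-shape rule given above can be written as a small lookup, shown here purely as a sketch. The shape labels, the 15% adjustment and the function name `side_tone` are hypothetical; the description does not specify how the rules of the database or expert system are encoded.

```python
# Rule from the description: round face -> darker sides, triangular face -> lighter sides.
SIDE_SHADING = {"round": -1, "triangular": +1}

def side_tone(skin_rgb, face_shape, amount=0.15):
    """Tone to apply on the sides of the face for the given face shape.
    Shapes with no rule leave the skin tone unchanged."""
    direction = SIDE_SHADING.get(face_shape, 0)
    factor = 1 + direction * amount
    return tuple(min(255, max(0, round(c * factor))) for c in skin_rgb)

darker = side_tone((200, 160, 140), "round")
lighter = side_tone((200, 160, 140), "triangular")
```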

The user can also choose, zone by zone (lips, eyes, cheekbones, face, etc.), the color he wishes to apply. These colors will be in harmony with the characteristics of the face. Thus, the expert system determines a range of available colors, correlated with those of a range available in its database or databases, according to the data from the initialization and evolution phases.

Thus, during the restitution phase, the tool will be able to make a coloring proposal in harmony with the face for example, but also offer a selection of colors, from a range, in perfect harmony with the face. The colors complemented by their original textures are analyzed, calculated and defined in their particular context (lipsticks or glosses or powders in particular).

The tools then apply, depending on the texture of the area (lip, cheek, hair, etc.), the color corresponding to the makeup and also, in a transparent manner, the effect of the cosmetic product; that is, they reproduce its real appearance, for example its brilliance or its powdered, glittery or matte appearance (glittery lipstick in Figure 6). This operation takes into account the context of the sequence of consecutive images in each of their respective areas (lighting, brightness, shadows, reflections, etc.), which makes it possible, with the help of algorithmic tools, to calculate their textures and to define them in their real aspect, as they would be reproduced in reality.
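One simple way to apply a color while keeping the lighting and shadows of the zone is to modulate the product color by each pixel's luminance relative to the zone average before blending. The following is only a sketch under that assumption; the blending rule, the opacity value, the Rec. 601 luminance weights and the function names are not taken from the description.

```python
def luminance(rgb):
    """Approximate luminance with Rec. 601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def apply_color(pixels, mask, color, opacity=0.6):
    """Blend `color` into the masked pixels of a flat pixel list.
    Brighter pixels of the zone receive a brighter version of the color,
    so highlights and shadows survive the recoloring."""
    zone = [p for p, m in zip(pixels, mask) if m]
    if not zone:
        return list(pixels)
    mean_l = sum(luminance(p) for p in zone) / len(zone)
    if mean_l == 0:
        mean_l = 1.0
    out = []
    for p, m in zip(pixels, mask):
        if not m:
            out.append(p)                       # outside the zone: unchanged
            continue
        gain = luminance(p) / mean_l
        out.append(tuple(min(255, max(0, round((1 - opacity) * c + opacity * k * gain)))
                         for c, k in zip(p, color)))
    return out

# Two hypothetical lip pixels (one in shadow, one lit) and one pixel outside the zone.
result = apply_color([(100, 80, 70), (140, 110, 100), (50, 50, 50)],
                     [True, True, False], (180, 30, 60))
```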

By this method, the quality and realistic properties of the consecutive image sequence are substantially improved. In addition, some features of the face are improved. For example, face wrinkles, crow's feet wrinkles, dark circles, lion wrinkles, nasolabial folds, bitter creases, perioral wrinkles, freckles, acne and rosacea are strongly blurred.

Also, aesthetic treatments such as face whitening, tanning, teeth whitening, eyelid lifting, thickening of the lips, slight correction of the oval of the face, correction of the shape of the chin and / or the nose, and raising of the cheekbones are simulated automatically for a face appearing in a video sequence.

The aesthetics of the face can also be improved in relation to a new hairstyle and / or hair coloring. It is also possible to match the color, material, shape and / or dimensions of eyeglass frames, jewelry and / or ornamental accessories to the face, or to fit colored or novelty contact lenses in keeping with the hue of the iris. It is also possible to apply the invention to facial biometric techniques, for example to identify with an optimal reliability rate a known face whose characteristic information is loaded into the database of the expert system. Digital passport photos can also be produced to the biometric passport standard.

The invention also makes it possible to recognize visemes that describe the different configurations, or different pronounced phonemes, of a talking mouth. It thus makes it possible to determine the personality and the character of a person, examined from the morphological observation of his / her face / head, such as, for example, the presence of the folds of bitterness, the size and the spacing of the eyes, the size and shape of the nose, the lobe of the ears, the database corresponding to the observation of the faces being then supplemented by the techniques implemented by the morpho psychologists, the psychiatrists, the profilers and the anatomists in the domain considered.

It is also conceivable to apply the invention to digital photography carried out in particular in identity and / or amusement photo booths, on automatic terminals for developing digital snapshots, and on computer graphics systems for retouching and developing images, making it possible to create, improve or enhance the aesthetics of the image of a user, the database then being supplemented by a collection of aesthetic rules and looks, usable simultaneously or not, concerning make-up, fun, hairstyle, hair techniques, skin texture and accessorization. All the RGB (red, green, blue) elements, supplemented by drawing indications, thresholds and coordinates, that constitute a "Look" or the natural rendering of a lipstick in a palette, for example, can be produced and recorded in the form of a simple file consisting of a lightweight alphanumeric string, distributable on any digital medium or downloadable from a server on digital networks such as the Internet. This file can be used for the artistic update of the database or the expert system in a flexible and fast way, or be used immediately by the user after a simple download from a web page, for example.
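As an illustration of such a lightweight file, a look can be serialized to a compact text string and read back. The JSON encoding and every field name below are hypothetical; the description does not specify the actual file format.

```python
import json

def encode_look(look):
    """Serialize a look to a compact, deterministic text string."""
    return json.dumps(look, separators=(",", ":"), sort_keys=True)

def decode_look(text):
    """Rebuild the look from its string form."""
    return json.loads(text)

# Hypothetical look: per-zone RGB color, threshold and template coordinates.
look = {
    "name": "evening",
    "zones": {
        "lips": {"rgb": [180, 30, 60], "threshold": 0.4,
                 "coords": [[12, 40], [30, 36], [48, 40]]},
        "eyelids": {"rgb": [90, 60, 120], "threshold": 0.25, "coords": []},
    },
}
payload = encode_look(look)
```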

In general, the database associated with the expert system is enriched with specific rules relating to the field of application of the invention, for example cosmetics and / or dermatology, plastic surgery and / or aesthetic medicine, ophthalmology, hairdressing, facial biometrics, etc.

Thus, the treatment is independent of the content which allows a use of the process on an industrial scale and a very strong propagation of its use with a strong increase in yield.

More generally, the characteristic features of the face in the video sequence are modified according to decisions of the database and / or the expert system. Figure 6 shows the before / after result of a makeup simulation (look), accessories (color lenses, piercing), and hair coloring for an image extracted from a video sequence acquired by a color video camera.

According to the present invention, the rendering module displays the sequence of transformed images on any type of color screen and / or subsequently prints one or more simulated images on any paper format and / or delivers them via a server on any digital network.

For the simulation, the restitution phase results in an aesthetic proposal characterized by the transformation of the initial video sequence into a new virtual video sequence on which the desired aesthetic modifications appear in perfect harmony, for example a make-up complete with accessories and a hair color, together with the references and sale prices of the corresponding products in one or more brands.

A static image chosen by the user from the video sequence can then be printed locally on a color printer of the dot-matrix, inkjet, solid inkjet, laser or thermal sublimation type, in A4 format or any other technically available format.

The content of its information formulates a beauty prescription, taking the initial image and the transformed image, technical and scientific advice, professional tips, facial features (shape, color, etc.), product photography, personal color palette in harmony with the transformed facial features, a color garment board compared to our colors etc. The results can be similarly edited on high definition delocalized printers from an Internet server which will send them to the user's postal address.

These same results can be rendered, in the same way, on or in different pre-printed or non-printed media (CV, virtual postcard, multimedia clip, video, calendar, banner, poster, photo album, etc.) available through the applications of the server. They can be archived in any type of memory of the terminal or on the Internet server for later use.

The new image and / or the new video sequence, whether or not supplemented with information, can be sent by e-mail, using the "Insert attachment" command, to one or more correspondents having an e-mail address. The same is true with a mobile phone device having an MMS, e-mail or future mode.

It is easy to imagine that this system can lend itself to a large number of applications by supplementing the expert system(s) and / or the local or remote database(s) with specific scientific and technical data. The invention can find an application for image processing in two or three dimensions. In a 3D application, a 3D model of the face can be built in order to apply 3D makeup precisely. The 3D reconstruction of the face, from a static image of the face or a stream of images of faces, is carried out using conventional algorithms and procedures, such as the analysis of shadows, texture and movement, the use of generic 3D models of the face, or the use of a stereoscopic system.

Although the invention has been described with reference to various advantageous embodiments, it is understood that it is not limited by this description, and that the person skilled in the art can make any modifications without departing from the scope of the present invention defined by the appended claims.

Claims

1.- Automatic method for virtual simulation of an individualized video image sequence for each user, realizable from a sequence of real video images of a moving face / head, characterized in that it comprises:

- during an acquisition and initialization phase:

the detection and analysis of the shapes and / or contours and / or dynamic components of an image of the face / head of the actual video sequence,

the extraction of characteristic points of the face / head, such as the corners of the eyes and the mouth, using predefined parametric models,

- during an evolution phase:

the definition of specific parametric models from said extracted characteristic points serving as initial initiation points,

the deformation of said specific models to adapt to the contours of the features present on the analyzed face,

the detection and analysis of the cutaneous structure of one or more regions of the face / head, and

- during a monitoring and transformation phase:

the modification of the characteristic features of the other images of the video sequence,

the modification of the colors of the cutaneous structure,

said modifications being performed according to criteria provided for in at least one database and / or according to decision criteria of at least one expert system of order 0+ or 1.

2. Method according to claim 1, wherein the step of detection and analysis, for the determination of region / contour spatial information and temporal information, is achieved by maximizing a luminance and / or chrominance gradient flux.

3. A method according to claim 1 or 2, wherein said modifications are obtained by translations of the neighborhoods of the characteristic points of the preceding image into the following image, affine models, including a deformation matrix, being usable when the neighborhood of the characteristic points may also undergo deformation.

4. A method according to any one of the preceding claims, wherein the tracking phase uses an algorithm to follow characteristic points from one image to another.

5. A method according to claim 4, wherein said algorithm uses only the neighborhood of the characteristic points.

6. A method according to claim 4 or 5, wherein, to avoid the accumulation of tracking errors, the characteristic points are realigned using a simplified version of the active contours, and / or by deforming the curves of a model obtained in the previous image.

7. A method according to any one of the preceding claims, comprising a step of modeling the closed and / or open mouth by means of a plurality of characteristic points connected by a plurality of cubic curves.

8.- Device for implementing the method according to any one of the preceding claims, characterized in that it comprises a computer system, a light source, an electronic message management system, at least one database, local or remote on digital networks such as the Internet, and / or at least one expert system of order 0+ or 1, for obtaining and transforming a sequence of real digital images into a virtual image sequence, preferably at a rate of 25 frames per second, said virtual image sequence being transformed according to decision criteria of at least one expert system of order 0+ or 1.

9. A device according to claim 8, wherein said computer system is based on a microprocessor of the CPU ("Central Processing Unit") type with single, dual, quad or more cores, or on conventional multicore processors of the Pentium, Athlon or higher type, or of the SPU ("Streaming Processor Unit") type equipped with a main core and up to eight specific cores, arranged in a booth, a console, a self-service device, a pocket or mobile device, a digital television, or a server that is local or remote over digital networks such as the Internet, together with at least one digital video camera, at least one screen, at least one printer and / or a connection to digital networks such as the Internet, in which the computer system providing the image processing comprises a computer having a hard disk, preferably with a capacity of at least 500K bytes, and / or a digital storage memory, one or more media, notably such as CD-ROM, DVD, Multimedia Card®, Memory Stick®, MicroDrive®, XD Card®, SmartMedia®, SD Card®, Compact Flash® Type 1 and 2, or USB stick, a modem or a wired or radio-frequency connection to digital networks such as the Internet, and one or more local area connection modules of the Ethernet LAN, Bluetooth®, infrared, wifi®, wimax® or similar type.

10.- Device according to claim 8 or 9, wherein, after the display of the virtual image sequence on a screen, a printer prints, locally or remotely, and preferably in color, at least one photograph selected from all or part of the virtual image sequence.

11.- Device according to any one of claims 8 to 10, wherein the image processing module, for performing the steps of acquisition, detection, transformation and tracking, is integrated in one or more processors specialized in signal processing, of the DSP ("Digital Signal Processor") type.