MXPA98004381A - Method for generating photo-realistic animated characters - Google Patents

Method for generating photo-realistic animated characters

Info

Publication number
MXPA98004381A
MXPA98004381A MXPA/A/1998/004381A MX9804381A
Authority
MX
Mexico
Prior art keywords
parameterized
face
facial parts
facial
animated
Prior art date
Application number
MXPA/A/1998/004381A
Other languages
Spanish (es)
Inventor
Hans Peter Graf
Eric Cosatto
Original Assignee
At&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by At&T Corp
Publication of MXPA98004381A

Links

Abstract

The present invention relates to a method for generating photorealistic characters in which one or more images of an individual are decomposed into a plurality of parameterized facial parts. The facial parts are stored in memory. To create animated frames, designated facial parts are retrieved from memory and superimposed on a base face to form an entire face, which in turn can be superimposed on a background image to form an animated frame.

Description

FIELD OF THE INVENTION This invention relates to the field of animation, and more particularly to techniques for generating photo-realistic animated faces for communications, computer applications and other high-technology operations. BACKGROUND OF THE INVENTION In a modern technological era characterized by complex computer interfaces and sophisticated communications and video devices, techniques that promote consumer interest in high-technology products have become increasingly prevalent.
The computer industry, seeking to draw consumers into the mainstream of its technological innovations, regularly searches for methods by which consumers can interact with devices that exhibit friendlier, more human-like qualities. In addition to increasing the attractiveness of technology to the average, non-technical consumer, these methods have the potential to generate a group of new and useful computer and video applications. In recent years, the prospect of using animated talking faces in various high-technology operations has become particularly tempting. Animation techniques can be used for a diverse set of applications. Adding talking faces to user-computer interfaces, for example, can improve the user-friendliness of the computer, promote the user's familiarity with computers, and increase the computer's entertainment value. Generating animated agents in user-computer interfaces would be useful, among other applications, for performing tasks such as reading e-mail, making announcements, or guiding a user through an application. The use of lifelike synthesized characters would also be valuable in any application where avatars (visual representations of people) are used. Such applications include virtual conference rooms, multi-user video games, lectures, seminars, security or access devices, scenarios requiring human-like instruction, and a host of other operations. Those skilled in the art are also considering model-based coding for video telephony over low-bit-rate data connections. This application would involve generating synthesized characters or faces at the receiving station, with the shape and structure of the images governed by parameters originating at the transmitting station. Regardless of the particular application, the synthesis of human faces ideally involves the use of lifelike, natural-looking ("photo-realistic") characters. The use of photorealistic characters has many benefits. It has great entertainment value compared with simple animated characters. In addition, using photorealistic characters as part of a human interface lends a realistic element to a computer. Consumers who are otherwise intimidated by computer technology may feel more comfortable using a computer with a human-like interface. As another illustration, using a photorealistic character to give an office presentation can create a more favorable impression on the attendees than if simple animated characters were used; with simple animation, the characters cannot speak while concurrently producing the realistic facial and mouth movements typical of a person who is speaking. Photorealistic characters can convey realistic and meaningful facial expressions. Simple animation, in contrast, is cartoon-like and unimpressive, particularly in an arena such as a corporate meeting. Photorealistic characters can also be used as icons in virtual reality applications. These characters can likewise be used in media such as the Internet, where the bandwidth is otherwise too narrow to allow high-frequency video signals. Using photorealistic techniques, human characters with realistic movements can be transmitted over the Internet instead of video. Those skilled in the art have made numerous efforts to synthesize photorealistic characters. A problem common to most of these methods is their inability to make the characters appear sufficiently human or human-like.
The remaining methods, which can at least in theory generate more realistic-looking characters, require prohibitively large memory allocations and processing times to achieve this goal. The utility of these methods is consequently restricted to high-capacity media. Thus, an important goal not previously achieved by those skilled in the art is to describe a technique for generating photorealistic faces that requires a minimum amount of computation for the synthesis of animated sequences. Naturally, the minimum computation is required in the case where all the parts and their corresponding bitmaps are produced in advance and stored in a library. The synthesis of a face would then involve simply superimposing the parts. Modern graphics processors have become so powerful, however, that deforming images to generate animated shapes can be done in real time. In that event, only the control points are stored in the library, which substantially reduces the memory required to store the model. The approaches employed by those skilled in the art generally fall into four categories: (1) three-dimensional ("3-D") modeling techniques; (2) deformation and metamorphosis techniques; (3) interpolation between views; and (4) techniques for creating the impression of motion by flipping through a sequence of images, similar to the frames of a movie (the flip-book approach). These approaches are described below. (1) Three-Dimensional Modeling Those skilled in the art have developed 3-D models to create faces and talking heads. Many 3-D modeling techniques use generic mesh models onto which images of people are texture-mapped. In general, the physical structure, such as the muscle and bone structure, is designed with great precision and detail to derive the shape of the face. While 3-D modeling is useful for certain applications, the technique is replete with disadvantages when deployed to construct a talking character. These methods require extensive computer processing and consume substantial memory; consequently, they are not suitable for real-time applications on a personal computer. Furthermore, the facial movements created by standard 3-D models typically appear "robotic" and do not look natural. (2) Deformation or Metamorphosis Other techniques for generating animated characters are based on deformation (warping) or metamorphosis (morphing) of two-dimensional ("2-D") face images. Deformation can be defined generally as the intentional distortion of an image in a pre-defined way. Deformation can be used, inter alia, to create expressions on a face by distorting that face from a neutral expression (for example, to create a frown). Metamorphosis is the transformation of one image into another image using interpolation and/or distortion. Unlike deformation, which simply distorts an image in a pre-defined way, metamorphosis typically uses two sets of fixed parameters comprising an initial image and a target image. Various commercial products use deformation or metamorphosis techniques, including certain toys that allow children to produce sequences of funny animated faces. One disadvantage of using deformation or metamorphosis in isolation is the inability of these techniques to provide natural-looking, realistic facial movements. Another disadvantage is that the metamorphosis process is possible only between pre-defined pairs of images.
Thus, this technique is not adequate for generating models that must produce unplanned movements, such as when the model is required to pronounce previously unknown text. (3) Interpolation Between Reference Views Accordingly, researchers have sought to overcome the disadvantages of existing metamorphosis methods. To allow for the possibility of extemporaneously generating previously unknown facial movements, researchers have developed techniques to determine metamorphosis parameters automatically. These techniques start with a set of reference views, and new views are generated automatically by interpolating between the reference views. Researchers have shown sequences of turning and tilting heads using this interpolation method. However, sequences of a talking head have not been synthesized, partly because speech generates transient facial features such as grooves and wrinkles that are impossible to duplicate using simple interpolation. Speech also exposes areas, such as the teeth, that are not usually seen in the reference frames. Elements such as teeth that are not present in the reference frames cannot be generated using interpolation. (4) Flip-Book Technique The flip-book technique, probably the oldest of all animation techniques, involves storing all the possible expressions of a face in a memory. Individual expressions are subsequently retrieved from memory to generate a sequence of expressions. The use of this technique has several practical limitations. For example, making an adequate number of expressions and mouth shapes available requires generating and storing a tremendous number of frames in memory. As such, the flexibility of the flip-book approach is significantly reduced, since only facial expressions generated ahead of time are available for animation. Thus, among other problems, the flip-book approach is not suitable for use on a personal computer, which typically has non-trivial limitations with respect to memory size and processing power. In sum, the present techniques for synthesizing animated faces have at least the following disadvantages: (1) the rendered characters are not sufficiently lifelike or natural-looking, especially when they speak; (2) the methods require a considerable, often infeasible, allocation of computer memory as well as processing time; (3) the characters are often unable to spontaneously pronounce words that were not previously known; (4) the methods lack variety in the possible facial expressions; (5) many of the methods lack the ability to present various important aspects of a character, facial or otherwise; and (6) many of the methods lack co-articulation or conflict-resolution techniques. Accordingly, an object of the invention is to describe a technique for generating natural-looking, lifelike, photo-realistic characters and faces that avoids the complexities of 3-D modeling and requires less memory and computer processing power than conventional methods.
Another object of the invention is to describe such a method that is capable of generating lifelike facial movements to accompany concurrent speech, including speech comprising previously unknown text. Another object of the invention is to describe a method that is capable of running in real-time applications, including applications on a personal computer. Another object of the invention is to describe a method that provides a wide variety of possible facial expressions and features. SUMMARY OF THE INVENTION The present invention comprises a method for creating animated, photorealistic characters and faces, such as characters and faces of humans and animals, which are capable of speaking and expressing emotions. The model on which the method relies is based on one or more images of an individual. These images are decomposed into a hierarchy of parameterized structures and stored in a model library. For the synthesis of faces, the appropriate parts are loaded from the model library and superimposed onto a base face to form a complete face with the desired expression. To synthesize a wide range of facial expressions, each part of the face is parameterized using control points stored in memory. These parameterized face parts comprise shapes that cover the possible deformations that may be applied to the face part when different actions are executed, such as smiling, frowning, articulating a phoneme, etc. In a preferred embodiment, the head model is combined with a speech synthesizer from which the model derives the sequence and duration of the phonemes to be pronounced. The parameters for the face parts are calculated from the phoneme sequence. Also, in a preferred embodiment, co-articulation effects are calculated as part of the calculations for one or more face parts. Preferably, conflicts between the requirements to express an emotion and to pronounce a phoneme are resolved. A version of each part of the face, corresponding to the desired parameter values, is retrieved from the library for the synthesis of the entire face. Additional features of the invention, its nature and various advantages, will be more apparent from the accompanying drawings and the following description of the preferred embodiments. BRIEF DESCRIPTION OF THE DRAWINGS The file of this patent contains at least one drawing executed in color. Copies of this patent with the color drawing or drawings will be provided by the U.S. Patent and Trademark Office upon request and payment of the necessary fee. Figure 1 shows the decomposition of a face into parameterized parts according to the method of the present invention. Figures 2a and 2b, collectively Figure 2, show a method of parameterizing face parts according to a preferred embodiment of the invention.
Figure 3 shows three visemes (a viseme is the visual equivalent of a phoneme, or unit of sound, in spoken language) generated from photographs according to a preferred embodiment of the invention. Figure 4 shows three visemes generated by deformation according to a preferred embodiment of the invention. Figure 5 shows the transition from a neutral expression to a smile according to a preferred embodiment of the invention. Figure 6 shows an exemplary flow chart illustrating the synthesis of an animation sequence according to a preferred embodiment of the invention. DESCRIPTION OF THE PREFERRED EMBODIMENTS The method according to the present invention is well suited, among other operations, to virtually any application that involves reading text or giving a talk to a user or another person. While the invention can be fully understood from the following explanation of its use in the context of human characters and faces, the invention is also intended to apply to the faces and characters of animals. In a preferred embodiment, the technique according to the invention comprises four groups of elements: (1) Base face: an image of a face that is used as a background for the animated sequence. Typically, although it is not necessary, the base face contains a neutral expression and a closed mouth. (2) Parameterized facial parts ("PFPs"): individual facial parts that are superimposed on the base face during the course of the animation sequence. Preferably, the PFPs include the mouth, eyes, nose, cheeks, chin, forehead, teeth, tongue and oral cavity. (3) Parameterized facial parts with hair ("PFPHs"): individual facial parts with hair that are superimposed on the base face during the course of the animation sequence. Preferably, the PFPHs comprise the hair, eyebrows, mustache, beard and whiskers. (4) Secondary effects: additional features that may be included during the animation sequence. These may include, for example, grooves, wrinkles and salient features. These four groups of elements are described in more detail below in the context of various preferred embodiments of the invention. (1) Base Face The base face is an image of the entire face on which parts are superimposed for the animation sequence. The use of a base face in an animation sequence can save computer processing time and reduce memory requirements. This is because, for many expressions, only a particular section of the base face changes; all other sections can remain undisturbed. If a facial image contains pronounced grooves and wrinkles, these are preferably removed before using the image as a base face. This removal procedure may be necessary to avoid interference with the wrinkles and grooves added during the course of the animation sequence. (2) Parameterized Facial Parts ("PFPs") PFPs are the individual elements that are superimposed on the base face to compose the whole-face image for an animated frame. The characteristics of the PFPs are defined by one or more parameters. Examples of parameters include shape and depth. The parameters of each PFP are stored as control points in a memory. In a preferred embodiment, a PFP is represented by one or more rectangular bitmaps, with each bitmap comprising a set of parameters and a mask. In this embodiment, the control points are placed on the bitmap. A bitmap can be used, for example, to represent the PFP with a particular shape. The associated bitmap contains the actual pixels of the image to be displayed.
The mask defines the transparency of each pixel comprising the image. In this way, the mask can describe which part within the bounding box is visible and how the transparency increases gradually toward the edges to blend the PFP with the background. Examples of PFPs, and how they relate to an entire face, are illustrated in Figure 1. (The procedure by which PFPs are parameterized is explained in more detail with respect to Figure 3.) Figure 1 shows a face that has been decomposed into PFPs. In this illustration, the entire face image 100 is divided into the following parts: forehead 110, eyes 120, nose 130, mouth 140, hair 150, cheeks 160, teeth 170 and chin 180. Forehead 110 has been retouched to remove the hair. All other parts are left in their original state. In a preferred embodiment, the parameters stored with each PFP include at least its width and height, and additional parameters define how to place the part on the image. As an illustration, for a mouth, points corresponding to the edges of the upper and lower teeth can be used to define the placement of the mouth (see Figure 2). Other parameters can be used to define the profile of a face part and the profiles of other prominent features. The inventors presently prefer profiles encoded with piecewise splines. A spline approximates each piece of the profile with a function, such as a third-order polynomial. This polynomial function describes the profile between two control points; a second polynomial function describes the profile between the next two points, and so on. For the mouth, the contours of the lips can be encoded in this way. For the eyes, the profile of the shape, the outline of the eye opening, and the upper edge of the eyelid can be encoded.
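As an informal illustration of the representation just described (not part of the patent text), the following Python sketch shows one way a PFP could be stored as a bitmap plus an alpha mask and a list of control points, with the profile between control points evaluated piecewise by third-order (cubic) segments. The Catmull-Rom form, the class layout and all identifiers are assumptions made for the example.

```python
# Hypothetical sketch of a parameterized facial part (PFP): a bitmap, an alpha
# mask used to blend it onto the base face, control points for its profile, and
# a depth value for stacking order.  Names and the cubic form are assumptions.
from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

@dataclass
class PFP:
    bitmap: np.ndarray                          # H x W x 3 pixel data for this shape
    mask: np.ndarray                            # H x W alpha in [0, 1]; soft edges blend with the background
    control_points: List[Tuple[float, float]]   # profile points, e.g. a lip contour
    depth: int = 0                              # position relative to other PFPs along the viewing axis

def profile(points: List[Tuple[float, float]], samples_per_segment: int = 16) -> np.ndarray:
    """Evaluate a piecewise-cubic (Catmull-Rom) curve through the control points."""
    pts = np.asarray(points, dtype=float)
    padded = np.vstack([pts[0], pts, pts[-1]])   # pad ends so every segment has four neighbours
    out = []
    for i in range(1, len(padded) - 2):
        p0, p1, p2, p3 = padded[i - 1], padded[i], padded[i + 1], padded[i + 2]
        for t in np.linspace(0.0, 1.0, samples_per_segment, endpoint=False):
            out.append(0.5 * ((2 * p1) + (-p0 + p2) * t
                              + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                              + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3))
    out.append(pts[-1])
    return np.array(out)

def overlay(base: np.ndarray, part: PFP, top: int, left: int) -> np.ndarray:
    """Alpha-blend a PFP bitmap onto the base face at the given position."""
    h, w = part.mask.shape
    region = base[top:top + h, left:left + w].astype(float)
    alpha = part.mask[..., None]
    base[top:top + h, left:left + w] = (alpha * part.bitmap + (1 - alpha) * region).astype(base.dtype)
    return base
```

The overlay helper illustrates how the mask can fade a part gradually into the base face at its edges; the third-order segments between consecutive control points correspond to the piecewise profile encoding described in the text.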
Another parameter that may be useful is the depth of the PFP, that is, the position of the PFP along the axis perpendicular to the bitmap with respect to the other PFPs. Using this parameter, the PFP can take into account the 3-D structure of the face. In this way, the present invention can incorporate some of the value of 3-D modeling without the associated consequences of increased memory allocation and processing time. For each PFP, several bitmaps can be stored, with each bitmap comprising a representation of the PFP in a different shape. The different shapes of the PFP can in turn be obtained by decomposing separate images of the face, in a process called photographic extraction. Each of these images can have a different expression, and the individual PFPs unique to that expression can be extracted and stored in memory as one or more bitmaps. Alternatively, or in addition to this method, the shapes of the PFPs can be produced by another method of decomposition: deformation/metamorphosis from key shapes. The preferred methods for decomposing images to obtain PFPs are explained more fully below. An example of the parameterization of a facial part is illustrated in Figure 2. Image 200, comprising Figure 2a, illustrates the parameters of the mouth used to position it and define its appearance. These parameters are the internal height (H-m1), the external height (H-m2) and the width (W-m). Point c comprises the lower edge of the upper teeth and point d comprises the upper edge of the lower teeth. Points c and d are used to place the mouth on the image. The placement of the mouth is done with respect to point (a), between the eyes, and point (b), the center between the nostrils. In this illustration, all distances are normalized to the separation of the eyes (W-e). The eye-nose (H-en) and eye-mouth (H-em) distances are used to estimate the inclination of the head. Figure 2b shows the lip contours 210, which comprise the curves describing the inner contour of the lips (ic), the outer contour (oc) and the shape or edge of the part. Marked on the lip contours 210 are control points 220, which define the deformation in a deformation or metamorphosis operation. As in this example, deformation or metamorphosis can be used to manipulate individual PFPs. As described above, PFPs can be generated by decomposing single face images. One embodiment of this decomposition method is called photographic extraction. Photographed parts provide a good basis for generating animated sequences that closely resemble the person photographed. To generate the shapes, a person can be asked to pronounce a phoneme or express an emotion. From these images, the shapes are simply cut out and scanned, digitized or otherwise placed in memory. Two additional considerations are important to the photographic extraction process. First, the total number of frames required to cover all possible shapes for a given animation sequence can be large. Second, while the quality of the individual frames synthesized using this photographic extraction method tends to be high, one skilled in the art must ensure that the individual features look natural in the context of the full facial expression. This last consideration is related to the behavior of the person being photographed. Specifically, when a person intentionally expresses an emotion or utters an isolated phoneme, he or she tends to exaggerate the articulation. In this way, a photographed shape is often not immediately appropriate for animation.
For these reasons, images generated by photographic extraction are usually insufficient to create a complete set of shapes for a particular application (that is, a set in which all the possible shapes to be encountered in an animation sequence have been produced). These images are, however, an important part of the process of generating a complete set of shapes. Additional shapes can be produced using metamorphosis, deformation and/or interpolation techniques. Using metamorphosis and/or interpolation, all shapes intermediate between a neutral expression and an exaggerated one can be generated. Among these intermediate images, several may be suitable for animation. In another embodiment, deformation alone can be used as the decomposition method to create the PFP library. Here, animated frames are produced by generating all the necessary shapes from a single shape. Generating facial parts by deformation involves (1) retrieving from memory the control points that define an original facial shape; (2) adjusting the control points to new control points that define the new, deformed shape; and (3) recording the new control points in memory for the new shape. Not surprisingly, this method in general requires a detailed understanding of how a facial part deforms. Once created, a library of individual shapes derived by deformation can advantageously be used with any number of images of photorealistic people, since the facial parts are individually parameterized. When dictating the new control points, the splines that describe the characteristic shapes can easily be adapted to take into account the characteristics of a different person (for example, the width of the face, the thickness of the lips, etc.). While the final photorealistic person created using facial parts derived by deformation may not look exactly like the real person, the deformation is sufficient to create a realistic representation. Furthermore, deformation is quick and convenient, since a single image of a person can be used to generate all the animated frames. The process of generating images of the mouth is probably the most complex of all the facial features, particularly when the photorealistic person speaks. Indeed, the mouth shows the widest variations of all the PFPs. On a talking face, it is also the feature to which the observer is most attentive. Humans are sensitive to slight irregularities in the shape or movement of the mouth. Therefore, one skilled in the art should ordinarily devote special attention to the animation of the mouth. A mouth shape that articulates a phoneme is often referred to as a viseme. While more than 50 spoken visemes can be distinguished in the English language, most researchers consider between 10 and 25 different visemes sufficient for use in an animated sequence. The number of visemes will, of course, vary depending on the application. In a preferred embodiment of the invention, 12 main visemes are employed, namely: a, e, ee, o, u, f, k, l, m, t, w, and closed mouth. All other possible phonemes are mapped onto this set of twelve. Figures 3 and 4 show examples of visemes generated using the two techniques described above. The visemes in Figure 3 are cut from separate images. In Figure 3, the mouth area 300 is cut from a photograph of the person pronouncing the phoneme "u". Similarly, areas 310 and 320 were cut from photographs of the person pronouncing the phonemes "m" and "a", respectively. The bottom row shows the three visemes superimposed on the base face.
The resulting face is then placed on a background to form frames 330, 340 and 350. The lady in Figure 3 pronounced the phonemes in isolation; as such, the visemes appear strongly articulated.
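To make the mapping from phonemes onto the twelve visemes concrete, the following Python sketch (not from the patent) shows a simple lookup-table approach. The ARPAbet-like phoneme symbols and the particular groupings are illustrative assumptions, not the inventors' actual mapping.

```python
# Hypothetical phoneme-to-viseme lookup for the twelve visemes named in the
# text: a, e, ee, o, u, f, k, l, m, t, w and a closed mouth.  The groupings
# below are assumptions chosen only to illustrate the many-to-one mapping.
VISEMES = ["a", "e", "ee", "o", "u", "f", "k", "l", "m", "t", "w", "closed"]

PHONEME_TO_VISEME = {
    "aa": "a", "ae": "a", "ah": "a",      # open-mouth vowels
    "eh": "e", "ey": "e",
    "iy": "ee", "ih": "ee",
    "ao": "o", "ow": "o",
    "uw": "u", "uh": "u",
    "f": "f", "v": "f",                   # lower lip against upper teeth
    "k": "k", "g": "k", "ng": "k",
    "l": "l",
    "m": "m", "b": "m", "p": "m",         # lips pressed together
    "t": "t", "d": "t", "s": "t", "z": "t", "n": "t",
    "w": "w", "r": "w",
    "sil": "closed", "sp": "closed",      # silence maps to a closed mouth
}

def viseme_for(phoneme: str) -> str:
    """Look up the viseme for a phoneme, defaulting to a closed mouth."""
    return PHONEME_TO_VISEME.get(phoneme.lower(), "closed")
```

In some embodiments (as noted later in the text) this mapping may simply be a table stored in memory, consulted once per phoneme produced by the text-to-speech front end.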
The visemes in Figure 4 are generated by deformation from a single image. In this approach, one image of the person is photographed, and from this image all the visemes and other expressions are generated by deformation. Visemes 410, 420 and 430 were generated for the phonemes "u", "m" and "a", respectively. The bottom row shows the three frames 440, 450 and 460 with visemes 410, 420 and 430 superimposed on the base face, together with eye and eyebrow variations. Judging from these individual frames, most people would consider the appearance of the faces in Figure 3 more natural than those in Figure 4. However, when the visemes in Figure 3 are used for animation, the result is an exaggerated, jerky movement of the mouth that looks unnatural. The much less pronounced articulation obtained with the visemes of Figure 4 is perceived as resembling a real person more closely. This observation highlights the importance of designing the motion rather than concentrating only on the appearance of the individual frames. For this reason, it is preferable to generate two or three versions of each viseme, each version representing a different intensity of articulation. In practice, a truly natural-looking mouth movement can be achieved only when co-articulation effects are taken into account. Co-articulation means that the appearance of a mouth shape depends not only on the phoneme being produced at the moment, but also on the phonemes that precede and follow it. For example, when an individual articulates the word "boo," the shape of the mouth for "b" reflects the individual's intent to pronounce the "oo" that follows. In short, taking co-articulation into account produces a smoother mouth animation, which in turn avoids unnatural exaggeration of the lip movements during the animation sequence. Accordingly, a preferred embodiment of the invention uses co-articulation. The preferred co-articulation method involves assigning a mathematical time constant to the parameters of the mouth. Using this time constant, the present shape of the mouth can be made to influence the shape, and the extent to which the mouth can deform, in the following time interval. (3) Parameterized Facial Parts with Hair ("PFPHs") The PFPHs are a group that includes all the parts of the face covered with hair. In a preferred embodiment of the invention, the PFPHs are grouped separately from the PFPs because their deformations are typically processed differently. Standard metamorphosis and deformation techniques tend to smear the characteristic textures of the hair, or deform the textures in ways that look artificial. Filling contours with copied textures usually produces better results for beards and mustaches. In some embodiments, hair animation is done only in a limited way. If the internal texture of the hair is not recognizable, the hair can simply be treated as a smooth surface. For images of very low resolution, this crude modeling may be appropriate. However, if individual tufts of hair or even individual hairs are visible, a more sophisticated approach is preferred. For example, movement can be added to whole tufts of hair by twisting and skewing parts of the hair. In practice, hair movement is of limited significance for a talking head; therefore, only some random movements need to be added to the hair. The most explicit and deterministic movement of a hairy part on a talking face is that of a mustache. A mustache can be deformed by tilting and by cut-and-paste operations.
For example, when the mouth changes from a neutral expression to an "o", the profile of the mustache can be estimated from the shape of the upper lip. In response, the original mustache is bent to follow the upper contour using local tilt or skew operations. Where parts of the contour are left blank, neighboring sections can be copied in to fill the blank sections. (4) Secondary Effects The generation of realistic-looking images involves additional intricate details. These intricate details include wrinkles, grooves and salient features. Extreme deformations, such as a broad smile, are difficult to generate without the addition of grooves. Figure 5 illustrates the effect of adding grooves to a smile. Instead of generating PFPs with grooves, the grooves are superimposed on the PFP. Figure 5 shows a frame 510 having a base face with a neutral expression. Frame 520 has the same base face, but with eyes 515 and mouth 518 superimposed on it. The result in frame 520 is an unnatural expression. Frame 530 is the same as frame 520, except that frame 530 has grooves superimposed on its base face. The character's smile in frame 530 looks more natural as a result of the grooves. Grooves and wrinkles are preferably categorized as a distinct group of the head model because their synthesis and deformation are treated differently from those of the PFPs. The deformation and metamorphosis used to generate the different shapes of the PFPs would distort the grooves unnaturally. Thus, in a preferred embodiment, grooves and wrinkles are instead represented by splines. The splines define the position of the grooves. Adding grooves to a bitmap can be achieved by modulating the color of the pixels with a luminance factor. The spline defines the extent of this modulation as well as the gradients in the direction perpendicular to the groove direction. For the synthesis of a natural-looking talking head, the movements of all the facial parts and of the head must be planned scrupulously. Conversational cues include subtle movements of the facial parts and of the head that accentuate, emphasize or otherwise accompany speech. For example, a raised eyebrow can be used to accentuate a vowel or to indicate a question. Eye blinks also occur frequently and usually synchronize with the flow of speech. Slight movements of the head also generally accompany speech. When these movements stop, it often means that the speaker has finished and is waiting for the listener to take some action. The emotional state of the speaker is also reflected by changes in the appearance of parts of the face. For example, raised and drawn-together eyebrows may indicate tension or fear. An illustration of the synthesis process is shown in Figure 6. In response to an ASCII input comprising the words to be spoken (oval 600), a text-to-speech synthesizer (box 610) produces a sequence of phonemes 620, together with their duration and stress. Each phoneme is mapped to a mouth viseme (box 630). In some embodiments, this mapping may comprise a simple lookup operation, such as a table in memory. Once a viseme is selected, the viseme parameters 640 are available for that viseme. These parameters may include, for example, the width and height of the mouth. The viseme parameters can be provided to a co-articulation module to take co-articulation effects into account (box 660).
The co-articulation module can also take into account information regarding the desired facial expression (oval 650). As an illustration, if a smile is requested and the viseme calls for a closed mouth, the co-articulation module will increase the width of the mouth. The output of the co-articulation module is a new set of mouth parameters 670. These modified parameters are then used to search the PFP library for the shape with the closest match, and the closest-matching PFP is chosen (box 680). Preferably, the other PFPs are chosen using the same procedure. For the other PFPs affected by mouth movement, phoneme information as well as facial expression information is considered. For the eyes and everything above the eyes, only facial expressions need to be taken into account, since the movement of the mouth usually does not affect these parts. After the appropriate visemes have been chosen, they are ready to be blended onto the base face for the final generation of a frame of the animated sequence (box 690). The frame is synchronized with the corresponding speech. When the visemes are generated using deformation techniques, they no longer blend seamlessly into the base head. For this reason, one skilled in the art must carefully shape the alpha blending mask. In a preferred embodiment, blending is performed first with the deepest parts, such as the eyes and teeth. Next, the parts that lie on top are added, and finally the hair and wrinkles. Once the head contains all the facial parts, it can be superimposed on the background image. The movement of the whole head can then be added. Movement vectors are calculated in a semi-random way; for example, the speech drives a random mix of pre-defined movements. A model according to a preferred embodiment includes translating, rotating and tilting the head. Rotation of the head around the axis perpendicular to the image plane is easily achieved. Rotations around the other two axes can be approximated by simple and fast image-deformation techniques. The dynamics of the movements must be carefully designed; exaggerated movements appear jerky and unnatural. The frame is then output to a file or to a screen (box 695). In some embodiments, these pixel operations may be performed either within a frame buffer in computer memory or directly in the frame buffer of the screen. At the time of filing this application, the inventors had implemented at least two versions of the invention. One version uses the Microsoft AVI API and generates AVI files from an ASCII text input. The other version outputs the animation directly to the screen using the OpenGL graphics library. Also at the time of filing this application, the inventors had made an experimental implementation of the head model on a personal computer (150 MHz PentiumPro). In this implementation, the speed of synthesis was approximately one frame every 100 milliseconds. The size of the PFP library was 500 kilobytes. The PFP library contained all the visemes for speaking and all the PFPs for frowning and smiling, and for happy and neutral expressions. The size of the frames was 240 pixels. It will be understood that these parameters will vary widely depending on the particular embodiment of the invention. The inventors' experiment was designed for maximum flexibility rather than for ultimate speed or maximum compactness.
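The synthesis loop of Figure 6 and the time-constant treatment of co-articulation described above can be sketched informally as follows. This Python fragment is a hypothetical reconstruction: the library object, its helper methods, overlay_background, the parameter names and the 80 ms time constant are all assumptions, and overlay refers to the helper sketched earlier.

```python
# Hypothetical sketch of the per-frame synthesis loop: phoneme -> target viseme
# parameters -> co-articulated mouth parameters -> closest PFPs -> composited
# frame.  Co-articulation is modelled as exponential smoothing with a time
# constant, in the spirit of the description above.
import math
from dataclasses import dataclass

@dataclass
class MouthParams:
    width: float
    inner_height: float
    outer_height: float

def coarticulate(prev: MouthParams, target: MouthParams, dt: float, tau: float = 0.08) -> MouthParams:
    """Move the mouth only part of the way toward the target shape in each time step."""
    k = 1.0 - math.exp(-dt / tau)   # fraction of the remaining distance covered in dt seconds
    return MouthParams(
        width=prev.width + k * (target.width - prev.width),
        inner_height=prev.inner_height + k * (target.inner_height - prev.inner_height),
        outer_height=prev.outer_height + k * (target.outer_height - prev.outer_height),
    )

def synthesize(phonemes, library, base_face, background, fps=10):
    """Walk the phoneme sequence from the TTS front end and compose one frame per step."""
    frames = []
    mouth = library.neutral_mouth_params()            # assumed library helper
    for phoneme, duration in phonemes:                # (phoneme, seconds) pairs
        target = library.viseme_params(phoneme)       # assumed lookup of target mouth parameters
        steps = max(1, int(duration * fps))
        for _ in range(steps):
            mouth = coarticulate(mouth, target, dt=1.0 / fps)
            parts = library.closest_parts(mouth)      # PFPs whose parameters best match
            face = base_face.copy()
            for part in sorted(parts, key=lambda p: p.depth):   # deepest parts (eyes, teeth) first
                face = overlay(face, part, *part.position)
            frames.append(overlay_background(background, face))  # assumed compositing helper
    return frames
```

Modelling co-articulation as exponential smoothing means the mouth covers only part of the distance toward each target viseme per frame, which is one simple way to obtain the smoother, less exaggerated articulation the text calls for.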
Furthermore, while the present invention contemplates a wide variety of numerical ranges depending on the state of the technology, the particular application, and so on, the inventors estimate that speed optimization would allow synthesis of at least 30 frames per second and that the size of the library could be reduced by more than a factor of two. Without using the method according to this invention, the inventors estimate that a library of faces covering the same range of expressions would be more than an order of magnitude larger. It will be understood that the foregoing is merely illustrative of the principles of the invention, and that various modifications and variations may be made by those skilled in the art without departing from the scope and spirit of the invention. The appended claims are intended to cover all such modifications and variations. It is noted that, as of this date, the best method known to the applicant for carrying out the aforementioned invention is that which is clear from the present description of the invention. Having described the invention as above, the content of the following claims is claimed as property:

Claims (37)

  1. A method for generating an animated image of a photorealistic face, characterized in that it comprises the steps of: decomposing one or more images of a face into a hierarchy of parameterized facial parts; storing the parameterized facial parts in a memory; loading, in a designated manner, one or more parameterized facial parts from the memory; and superimposing the one or more parameterized facial parts on a base face to form a whole face.
  2. The method according to claim 1, characterized in that the loading step and the superimposing step are performed in real time.
  3. The method according to claim 1, characterized in that the decomposition step comprises photographic extraction.
  4. The method according to claim 3, characterized in that the photographic extraction step further comprises deformation.
  5. The method according to claim 3, characterized in that the photographic extraction step further comprises metamorphosis.
  6. The method according to claim 3, characterized in that the photographic extraction step further comprises interpolation between reference views.
  7. The method according to claim 1, characterized in that the decomposition step comprises deformation.
  8. The method according to claim 1, characterized in that the parameterized facial parts comprise parameters stored in memory.
  9. The method according to claim 8, characterized in that the parameters for each parameterized facial part comprise control points placed on a bitmap.
  10. The method according to claim 1, characterized in that the superimposing step further comprises the step of: superimposing one or more secondary effects on the whole face.
  11. The method according to claim 1, characterized in that the parameterized facial parts include the mouth, eyes, nose, cheeks or cheekbones, chin, forehead, teeth, tongue and oral cavity.
  12. The method according to claim 1, characterized in that it further comprises the steps of: superimposing the whole face on a background image to form a frame, and outputting the frame to a screen.
  13. The method according to claim 12, characterized in that the loading step, both superimposing steps and the output step are performed in real time.
  14. A method for generating animated frames of photorealistic characters, characterized in that it comprises the steps of: decomposing one or more images of a face into a group of parameterized facial parts; storing the group of parameterized facial parts in a model library; superimposing designated parameterized facial parts of the group on a base face to form an entire face; and superimposing the entire face on a background image to form an animated frame.
  15. The method according to claim 14, characterized in that both superimposing steps are performed in real time on a computer.
  16. The method according to claim 14, characterized in that the designated parameterized facial parts of the group are chosen based on a designated viseme.
  17. The method according to claim 14, characterized in that the decomposition step comprises photographic extraction.
  18. The method according to claim 14, characterized in that the decomposition step comprises deformation.
  19. The method according to claim 14, characterized in that the designated viseme is obtained from a memory by mapping a specified phoneme.
  20. The method according to claim 14, characterized in that it further comprises the step of: outputting the animated frame to a screen.
  21. The method according to claim 20, characterized in that the superimposing steps and the output step are performed in real time on a computer.
  22. The method according to claim 14, characterized in that it further comprises the step of: outputting the animated frame to a file.
  23. The method according to claim 19, characterized in that it further comprises the step of synchronizing the frame with the specified phoneme.
  24. The method according to claim 14, characterized in that it further comprises the steps of: decomposing one or more images into a group of parameterized facial parts with hair; and superimposing designated parameterized facial parts with hair on the base face.
  25. A method for generating an animated sequence of frames of photorealistic faces, characterized in that it comprises the steps of: decomposing one or more images of a character into a plurality of parameterized facial parts, each parameterized facial part corresponding to a specific expression of that facial part and comprising parameters stored in memory; loading from memory, for each of a plurality of desired facial expressions, a set of parameterized facial parts; superimposing, for each of the plurality of desired facial expressions, the set of parameterized facial parts on a base face to form an entire face having the facial expression, wherein the parameters of each parameterized facial part of the set are determined based on the desired facial expression; and superimposing, for each of the plurality of desired facial expressions, the entire face on a background to form a sequence of animated frames.
  26. The method according to claim 25, characterized in that it further comprises the step of: synchronizing, using a speech synthesizer, each frame of the sequence with a phoneme corresponding to the facial expression used in the frame.
  27. A method for generating an animated frame of a photorealistic face that speaks, characterized in that it comprises the steps of: obtaining from memory a predetermined viseme corresponding to a specified phoneme input; selecting from memory a set of one or more parameterized facial parts based on parameters corresponding to the viseme; superimposing the set of one or more parameterized facial parts on a base face to form an entire face; and superimposing the entire face on a background to form a frame.
  28. The method according to claim 27, characterized in that the specified phoneme comprises the output of a text-to-speech synthesizer fed with text to be pronounced concurrently with the animated frame.
  29. The method according to claim 27, characterized in that it further comprises the step of: synchronizing the frame with a sound corresponding to the phoneme.
  30. A method for generating a sequence of animated frames of photorealistic characters, characterized in that it comprises the steps of: decomposing one or more images of a face into parameterized facial parts comprising a plurality of different expressions; storing the plurality of parameterized facial parts in memory; loading, for each of a plurality of frames, a designated set of parameterized facial parts from the memory; and superimposing, for each of the plurality of frames, the designated parameterized facial parts on a base face to form an animated image.
  31. The method according to claim 30, characterized in that the decomposition step comprises photographic extraction.
  32. The method according to claim 30, characterized in that the decomposition step comprises deformation.
  33. The method according to claim 30, characterized in that the loading and superimposing steps are performed in real time on a computer.
  34. The method according to claim 33, characterized in that the computer is a personal computer.
  35. The method according to claim 30, characterized in that each designated set of parameterized facial parts is determined based on a viseme.
  36. A method for synthesizing an animated frame of a photorealistic character, characterized in that it comprises the steps of: obtaining a viseme from memory based on a phoneme input; selecting one or more parameterized facial parts based on parameters corresponding to the viseme; calculating co-articulation effects based on the parameters corresponding to the viseme; producing modified parameters based at least on the calculating step; selecting modified parameterized facial parts based on the modified parameters; and generating an animated frame by superimposing the facial parts on a base face superimposed on a background image.
  37. The method according to claim 36, characterized in that the step of producing modified parameters is based on an input comprising facial expression information.
MXPA/A/1998/004381A 1997-06-06 1998-06-02 Method for generating photo-realistic animated characters MXPA98004381A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US08869531 1997-06-06

Publications (1)

Publication Number Publication Date
MXPA98004381A 1999-09-20


Similar Documents

Publication Publication Date Title
CA2239402C (en) Method for generating photo-realistic animated characters
US10540817B2 (en) System and method for creating a full head 3D morphable model
US6147692A (en) Method and apparatus for controlling transformation of two and three-dimensional images
US20100182325A1 (en) Apparatus and method for efficient animation of believable speaking 3d characters in real time
CN110874557A (en) Video generation method and device for voice-driven virtual human face
US20020024519A1 (en) System and method for producing three-dimensional moving picture authoring tool supporting synthesis of motion, facial expression, lip synchronizing and lip synchronized voice of three-dimensional character
US10600226B2 (en) System and method for manipulating a facial image and a system for animating a facial image
Breton et al. FaceEngine a 3D facial animation engine for real time applications
Parke Control parameterization for facial animation
KR100813034B1 (en) Method for formulating character
Breen et al. An investigation into the generation of mouth shapes for a talking head
Wang Langwidere: a hierarchical spline based facial animation system with simulated muscles.
Perng et al. Image talk: a real time synthetic talking head using one single image with chinese text-to-speech capability
MXPA98004381A (en) 1999-09-20 Method for generating photo-realistic animated characters
JPH11306372A (en) Method and device for picture processing and storage medium for storing the method
KR102652652B1 (en) Apparatus and method for generating avatar
Barakonyi et al. A 3D agent with synthetic face and semiautonomous behavior for multimodal presentations
Bibliowicz An automated rigging system for facial animation
Derouet-Jourdan et al. Flexible eye design for japanese animation
King et al. A muscle-based 3d parametric lip model for speech-synchronized facial animation
Erol Modeling and Animating Personalized Faces
King et al. TalkingHead: A Text-to-Audiovisual-Speech system.
KR20230096393A (en) Apparatus and method for generating conversational digital human based on photo
Huang et al. Features and Issues in 3D Facial Animation
Kuratate Talking Head Animation System Driven by Facial Motion Mapping and a 3D face Database