WO1997046974A1 - Dispositif et procede de transmission d'images animees et sonorisees - Google Patents
- Publication number
- WO1997046974A1 (PCT/FR1997/000981)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- face
- interlocutor
- voice
- image
- message
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
Definitions
- the present invention relates to a device and a method for transmitting animated and sound images representative of at least one face of a person.
- the word “person” is equivalent to the word “interlocutor” and designates the person whose image of the face is transmitted.
- the invention applies equally to the formation of images at a distance, for example for viewers, listeners of radio stations or users of television sets, and to the formation of images locally, for insertion into a video game.
- the currently known moving image transmission devices require the use of a photosensitive sensor which supplies an electrical signal representing perceived luminosities.
- the amount of information representative of the image is then very high and, although image compression technologies exist, the transmission of images requires a transmission medium capable of transferring large amounts of information per second.
- the use of the photosensitive sensor involves mastering the shooting conditions, such as lighting, diaphragm, focal length and adjustment of the focus.
- the capture and transmission of moving images is then of high cost because of the quantities of information to be transmitted.
- Document GB-2 250 405 A is also known, which presents a device for voice analysis and image synthesis.
- This device analyzes speech sequences to produce sequences of code words identifying the spoken letter, then the probability that a voice corresponds to a mouth shape.
- This document does not suggest any correspondence between the person who produced the analyzed voice and the computer generated image providing a speaking face ("talking face").
- this document does not suggest any remote transmission of the face thus synthesized and animated by text data.
- the present invention intends to remedy these drawbacks by proposing to model an image of a face of an interlocutor to form a modeled face which can be animated, to analyze a message from this interlocutor to determine a facial expression corresponding to a voice pronouncing this message, and then to animate the modeled face image to give it said facial expression while emitting the voice. Thanks to these provisions: the animation can be performed in real time, since it uses signals corresponding to a voice and therefore a quantity of information small enough to allow rapid, if not instantaneous, processing; the face and the voice correspond to the same interlocutor; and the image of the face of the interlocutor considered is animated at a distance, by the message signal representative of this interlocutor.
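These provisions can be pictured with a minimal sketch in which each voice frame is mapped to an expression so that image and sound stay paired at every instant. The frame representation, the rule and all names below are illustrative assumptions, not the patent's actual implementation.

```python
# Toy synchronization sketch: pair each voice frame with an expression.
def transmit(voice_frames, expression_rule):
    """Pair each voice frame with the expression the rule assigns to it,
    so image and sound stay synchronized at every instant."""
    return [(frame, expression_rule(frame)) for frame in voice_frames]

def loudness_rule(level):
    # Invented rule: loud frames open the mouth, quiet frames close it.
    return "open_mouth" if level > 0.5 else "closed_mouth"

animated = transmit([0.1, 0.8, 0.9, 0.2], loudness_rule)
```

Because only the expression labels travel with the voice, the quantity of information stays far below that of a video signal, which is the point made above.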
- the capture of the movements and expressions of the face of the interlocutor is carried out by capturing not light rays reflected by the face but a message capable of being pronounced by said face, and more particularly a vocal message pronounced by the mouth, itself representative of the facial expression.
- the cost of the device of the invention is thus limited to that of producing a still image, for example with a camera or by computer, and to that of capturing a message, for example by a telephone apparatus or by a computer.
- the present invention relates to a device for transmitting animated and sound images representative of an interlocutor, characterized in that it comprises:
- a means of analysis of said signals adapted to determine a succession of facial expressions corresponding to the pronunciation of said message by a voice, according to predetermined rules taking into account said signals;
- an animation means adapted to animate the image of the modeled face of the interlocutor so that said face successively presents each expression of said succession of facial expressions corresponding to the pronunciation of said message by said voice; and
- a means of image display and emission of vocal sounds adapted to emit said voice and to simultaneously display said modeled face presenting the expression corresponding at each instant to the pronunciation of said message by said voice.
- the present invention relates to a device for transmitting animated and sound images representative of the face of an interlocutor, characterized in that it comprises:
- a means of analysis of said signals adapted to determine a succession of facial expressions corresponding to the pronunciation of said message by a voice, according to predetermined rules taking into account said signals;
- an animation means adapted to animate the image of the modeled face of the interlocutor so that said face successively presents each expression of said succession of facial expressions corresponding to the pronunciation of said message by said voice; and
- a means of transmitting signals simultaneously representative of vocal sounds corresponding to said voice and of images corresponding to said modeled face presenting, at each instant, the expression corresponding to the pronunciation of said message by said voice.
- the moving image transmission medium is a voice or text transmission medium and it is therefore not necessary for it to be capable of transmitting more than the voice frequencies.
- the complexity and the cost of this transmission and of the formation of moving images are therefore very limited.
- the remote transmission means is adapted to transmit signals representative of a voice message spoken by said interlocutor. Thanks to these provisions, capturing a voice message pronounced by the interlocutor, for example by using a microphone, is sufficient to cause the animation, from a distance, of a face representative of this interlocutor.
- the remote transmission means is adapted to transmit signals representative of a text capable of being spoken by the interlocutor.
- the transmission device as succinctly explained above further comprises an image modeling means adapted to provide an image intended to be animated, as a function of an image taken by a photosensitive sensor, and in that the storage means stores said image.
- the capture of an electronic image for example by an electronic camera or a scanner is sufficient for memorizing the image intended to be modeled.
- the transmission of a still image intended to be animated is carried out by transmission of a photograph or of a video image, and the cost of capture and transmission is very limited.
- the transmission device as succinctly described above, further comprises a telephone receiver connected to a telephone line.
- the signals representative of the message and/or the signals representative of vocal sounds and of images can be conveyed over a long distance, for example over a telephone or computer network, whether switched or digital.
- a television service can thus be set up.
- the transmission device as succinctly explained above, further comprises a radio antenna. Thanks to these provisions: radio signals can allow the remote animation of faces representative of the faces of the interlocutors of the radio station, and / or
- the invention also relates to a game console, a computer, an audiovisual editing bench, a television set characterized in that they comprise a transmission device as succinctly presented above.
- the present invention relates to a process for forming animated and sound images representative of the face of an interlocutor, characterized in that it consists in successively carrying out the following steps: firstly, an operation of receiving an image of a face of said interlocutor;
- an animated image of a viewer participating in the program can be transmitted in a television program.
- the invention also relates to a game console, a computer, an audiovisual editing bench and a television set, characterized in that they implement an image transmission method as succinctly presented above.
- FIG. 2 shows a second embodiment of the present invention, implemented in a radio station
- FIG. 3 shows a third embodiment of the present invention, implemented in a computer network
- FIG. 4 shows a fourth embodiment of the present invention, implemented in a computer network
- FIG. 5 represents a block diagram of image transmission devices implemented in the first embodiment of the present invention
- FIG. 6 shows the successive operations performed by the devices illustrated in FIG. 5
- FIG. 7 represents a hardware layout of a device presented in FIG. 5
- FIG. 8 shows a photograph of a child's face
- FIG. 10 represents the spectrum of a sound signal representative of a voice pronouncing a particular phoneme;
- FIG. 11 represents the expression corresponding to the sound signal presented in FIG. 10;
- FIG. 12 shows a model animated by the voice whose spectrum is shown in FIG. 10, intended to be matched with the face shown in the photograph presented in FIG. 8 to provide the image of the interlocutor.
- the first embodiment is implemented for a production of television works in which a viewer can intervene, his image animated by his voice being broadcast live as part of the program.
- This viewer also called “interlocutor” in the following description, uses a telephone comprising a handset 101 provided with a microphone 102 and a keyboard 103.
- This telephone is connected to a telephone network 105 by a telephone socket 104.
- a telephone socket 106 makes it possible to receive the signals transmitted by the telephone of the viewer considered.
- a sound signal analysis means 107 analyzes the signals received on the telephone socket 106, and provides information representative of expressions of the face of the interlocutor corresponding to the pronunciation of the voice message carried by said sound signals. The operation of the analysis means 107 is detailed with reference to FIG. 5.
- An image capture means 108, here consisting of a desktop scanner, has previously made it possible to provide a digital electronic image of at least one face photograph that the interlocutor has sent, by post or by fax.
- the digital electronic image has been associated with a so-called “modeled” face consisting of a three-dimensional model which represents the face of the interlocutor and has characteristic points whose displacements allow facial expressions to be presented to this face. These points are, for example, the corners of the lips, the highest and lowest points of the lips, the corners of the eyes.
- a model 109 memory means here consisting of a mass memory, of the hard disk type, of a computer terminal, stores the data necessary for the reconstruction of the so-called modeled face.
- a model animation means 110 animates the modeled face which is kept in the model memorization means 109, to give it the facial expressions provided by the sound signal analysis means 107. To this end, it moves the characteristic points of the modeled face according to known displacement rules, and the other points of the modeled face are displaced according to known deformation rules. For an understanding of these displacements, one can refer to the proceedings of the 1997 IMAGINA conference and, in particular, to its pages 246 to 257, which present the work carried out at the Institut National de l'Audiovisuel.
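The displacement rules described above can be sketched as follows, assuming the modeled face is stored as named 2-D characteristic points and an expression is given as per-point displacements. The point names, coordinates and displacement values are all invented for illustration.

```python
def apply_expression(model_points, displacements):
    """Move each characteristic point by the displacement the expression
    prescribes; points without a rule keep their position."""
    moved = {}
    for name, (x, y) in model_points.items():
        dx, dy = displacements.get(name, (0.0, 0.0))
        moved[name] = (x + dx, y + dy)
    return moved

# Hypothetical characteristic points (pixel-like coordinates).
face = {"lip_corner_left": (40.0, 80.0), "lip_corner_right": (60.0, 80.0),
        "chin": (50.0, 95.0)}
# Hypothetical displacement rule for a smile: lip corners up and outward.
smile = {"lip_corner_left": (-2.0, -3.0), "lip_corner_right": (2.0, -3.0)}
smiling = apply_expression(face, smile)
```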
- the animated image of the modeled face representative of the face of the interlocutor and the sound signal representative of the voice of the interlocutor are broadcast simultaneously by a radio transmitter 111, comprising a radio antenna and transmitting to a multitude of television receivers connected to terrestrial antennas, making the expression of the modeled face correspond, at every instant, to the pronunciation of the voice of the interlocutor.
- Each television receiver 112 is provided with a display screen 114 and a loudspeaker 113. It broadcasts, according to known television techniques:
- the image capture means 108 is a modem used for the reception of image files transmitted by a computer or a fax machine.
- a second embodiment of the present invention is implemented in a radio station.
- Each presenter or guest of the station considered, also called an "interlocutor" in the following description, is surrounded by three microphones 120, 121 and 122 connected to a sound processing means 133, and is in the optical field of a camera 123 associated with an image processing means 124.
- the sound processing means 133, the image processing means 124 and a digital console 125 are connected to a means for inserting digital data 134 itself connected to a radio transmitter 126 comprising a radio antenna.
- the microphone 120 is placed to the right of the presenter at the level of his mouth.
- the microphone 122 is placed to the left of the presenter at the level of his mouth.
- the microphone 121 is placed above the presenter's head, in the median axis of the other two microphones.
- the sound processing means 133 analyzes the ratios of the sound intensities represented by the signals emitted by each of the three microphones and provides information representative of the position of the presenter's head. Indeed, the more the head of the presenter is turned towards one of the microphones, the higher the sound intensity received by this microphone and the lower the sound intensity received by the other microphones.
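Under the assumption of comparable microphone gains, the intensity-ratio idea can be sketched like this. The margin threshold, the three-way classification and the use of the overhead microphone as a mere presence check are illustrative simplifications, not the patent's actual processing.

```python
def head_orientation(level_right, level_top, level_left, margin=1.2):
    """Classify head position from the three intensity levels: a head
    turned toward a microphone raises that microphone's level."""
    if level_top <= 0.0:
        return "absent"  # no voice picked up by the overhead microphone
    if level_right > margin * level_left:
        return "right"
    if level_left > margin * level_right:
        return "left"
    return "centre"
```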
- the camera 123 includes an electronic sensor which supplies a signal representative of the image of the presenter's face, according to techniques known in the field of video cameras.
- the image processing means 124 analyzes the signal from the camera 123 and supplies information on facial expressions, such as closing of the eyes, smiles and frowns of the forehead or the eyebrows, by implementing known image-processing algorithms.
- the contrasts of different areas of the face are analyzed to determine whether folds have appeared on the skin of each of these areas, which makes it possible to detect smiles and expressions based on forehead folds.
- the orientation of the head can also be determined.
- the digital console 125 is operated by a technician and provides information representative of the presenter and his guests, these different interlocutors being each associated with one or more microphones.
- the sound signal from each microphone is automatically associated with a signal representative of the identity of the person speaking.
- only one of the interlocutors is thus identified by the signal leaving the digital console 125 or the sound processing means 133.
- the visual expression information determined by the image processing means 124 and the signals from the digital console 125 and the sound processing means 133 are associated with the stereophonic sound signal by the digital data insertion means 134, according to techniques known in the field of the transmission of alphanumeric information on a radio channel, for example by modulation of the subcarrier of the signal carried on this channel. It is easy to understand that the signals transmitted by the transmitter 126 to a multitude of receivers 127 are simultaneously representative of: the voice of each interlocutor who speaks, via the microphones, in stereophony; the identity of this interlocutor, via the digital console 125 or the sound processing means 133; the position of the head of this interlocutor, through the sound processing means 133; and
- Each receiver 127 has a radio antenna and is adapted to receive the signal broadcast by the radio transmitter 126. This signal is demodulated by the receiver 127 and transmitted to a sound signal analysis means 128, which analyzes the sound signals and provides information representative of the pronounced phonemes, the head position and the facial expressions of the speaker.
- the model memorization means is here a compact disc 150 associated with a compact disc reader, for example of read-only memory type known as CD-ROM.
- This CD-ROM preserves image data representative of models corresponding to a large number of possible presenters and guests and characteristic points which make it possible to animate these modeled faces.
- the model animation means 129, consisting here of a computer which jointly performs the functions of the sound signal analysis means 128, animates the modeled face of the interlocutor which is kept in the model memorization means (here the compact disc 150) and which corresponds to the signal associated by the digital console 125 with each microphone.
- the model animation means gives this modeled face:
- the animated and sound image of the interlocutor considered is broadcast: via a loudspeaker 131, for the voice of the interlocutor, possibly combined with other sound signals characteristic of the program, and
- the third embodiment of the present invention is implemented in a computer network and animates a face by facial expressions which correspond to a text transmitted remotely.
- Each computer 141 connected to this network here includes a modem 144 and a model storage means 142.
- the computer 141 is associated with a display screen 141 and a keyboard 143.
- the computer 141 is of known type, for example using a PENTIUM processor
- Modem 144 is of known type. It is suitable for transmitting digital data over any telephone network.
- the model storage means 142 is here constituted by a hard disk on which graphic information is recorded representing the model intended to be animated as well as the characteristic points intended for its animation.
- the interlocutor enters a text in the memory of the computer 140. All or part of the words in this text are associated with particular facial expressions, face or body movements and a decor chosen from a multitude of decors, using software.
- said software presents the text considered on lines parallel to so-called "expression" lines on which the interlocutor can position icons representative of expressions, movements, decors, graphic figures or digital images, next to each word. It is observed that these indications are sufficient for a person skilled in the art of data processing to associate the text file considered with data representative of the icons positioned by the interlocutor. Consequently, this software is not further detailed here.
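A hypothetical data structure for such an annotated text might pair each word with optional expression, movement and decor icons. The field names and icon labels are assumptions for illustration, not the software described above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnnotatedWord:
    """One word of the message, with optional icons placed next to it."""
    word: str
    expression: Optional[str] = None   # e.g. "smile"
    movement: Optional[str] = None     # e.g. "nod"
    decor: Optional[str] = None        # e.g. "beach"

message = [
    AnnotatedWord("Hello", expression="smile"),
    AnnotatedWord("there", movement="nod", decor="beach"),
]
```

Only this compact structure, plus the modeled face, needs to cross the telephone line, which is why the bandwidth of a voice channel suffices.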
- the modem 144 modulates, on the telephone socket 145, an audible signal in the frequencies of the bandwidth of a telephone line, so that this signal represents: - the model of the face of the interlocutor,
- a modem 150, connected via a telephone line 146 and a telephone socket 147 to the transmitting modem 144, receives this signal and restores the files corresponding to the three types of information mentioned in the previous paragraph.
- the modem 150 is, in the third embodiment, incorporated in each computer 148 connected to said network.
- Each computer 148 also constitutes a message analysis means 153 and a model animation means 152.
- This computer 148 is associated, in a known manner, with a display screen 149, with a loudspeaker 154 and to a keyboard 151.
- the message analysis means 153 consists of the processor of the computer 148 and of textual data analysis software of known type, which associates with this text a series of phonemes corresponding to the pronunciation of this text.
- the model animation means 152, consisting here of the processor of the computer 148 and of appropriate animation software, associates with each phoneme provided by the message analysis means 153 a facial expression corresponding to the pronunciation of this phoneme, according to techniques recalled, for example, in the documents of the prior art cited in the preamble of this application and in the documents to which they refer, which are all jointly incorporated here by reference.
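The chain described above, from text to phonemes to facial expressions, can be sketched with toy lookup tables. Real letter-to-phoneme rules depend on language and context, and real phoneme-to-expression rules would be far richer; both tables below are assumptions.

```python
# Toy tables, invented for illustration only.
LETTER_TO_PHONEME = {"o": "O", "m": "M", "a": "A"}
PHONEME_TO_EXPRESSION = {
    "O": "round_open_mouth",
    "M": "closed_lips",
    "A": "wide_open_mouth",
}

def text_to_expressions(text):
    """Turn a text into (phoneme, facial expression) pairs, one per
    recognized letter, in pronunciation order."""
    phonemes = [LETTER_TO_PHONEME[c] for c in text.lower()
                if c in LETTER_TO_PHONEME]
    return [(p, PHONEME_TO_EXPRESSION[p]) for p in phonemes]
```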
- the model animation means 152 animates the modeled face of the interlocutor, as received via the modem 150, to give this modeled face: - the facial expressions corresponding to the pronunciation of the phonemes, accessory expressions associated with the text by the interlocutor, and
- the model animation means 152 supplies sound data to a sound emission card, according to techniques known in voice synthesis or automatic text reading systems, such as, for example, the voice synthesizers of interactive electronic switchboards.
- the animated and sound image of the interlocutor considered is broadcast:
- a loudspeaker 154 for the voice of the interlocutor, possibly combined with other sound signals characteristic of choices of the interlocutor, such as pronunciation accent or words spoken in a low voice, and
- the fourth embodiment is implemented in a computer network.
- the computer 140 further comprises a means of sound capture 161, known as a sound digitization card, associated with a microphone 162, of known type.
- the fourth embodiment works in the same way as the third embodiment illustrated in FIG. 3, with the difference that the text data file is replaced by a sound data file representing the voice of the interlocutor.
- the interlocutor associates animation, expression, movement, decor or image data with this file.
- For the recipient of this file, the sound signal analysis means 160, made up of the processor of the computer 148 and dedicated software, analyzes the signals received via the modem 150 and provides information representative of the facial expressions of the interlocutor corresponding to the pronunciation of the voice message carried by said sound signals.
- In the block diagram of the image transmission device implemented in the first embodiment of the present invention (FIG. 5) are represented: a telephone apparatus 1, comprising a handset 2 having a microphone 30 and a keyboard 3, and adapted to emit an electrical signal representative of the sounds which reach the microphone 30 and of the presses made on the keys of the keyboard 3, according to techniques known in voice-frequency telephone apparatuses;
- a telephone network 4 of known type, represented in the form of two rectangles diagramming telephone sockets separated by a broken line; a voice analysis means 5, the operation of which is set out with reference to FIG. 6 and which provides data representative of oral expressions to a face animation means 9,
- the face animation means 9 adapted to animate the model combined with facial expressions corresponding to the oral expressions coming from the voice analysis means 5;
- a person animation means 10 of known type, adapted to provide information representative of animated images of a person having the face produced by the face animation means 9, as a function of data coming from the voice frequency analysis means 6;
- a means of person and scene combination 11 adapted to insert the image of the person represented by the information leaving the person animation means 10 with information on a scene comprising, for example mobiles, decorations and characters, according to known techniques; a display means 12 of known type, for example consisting of a television set, adapted to display the image emerging from the combination means 11; a video transmitter 13 of known type, for example consisting of a radio transmitter or a transmitter on a cable television network; and a recording means 14 of known type, for example consisting of a video recorder.
- the voice analysis means 5 which provides data representative of oral expressions consists, for example, of a computer, a sound acquisition card of known type, and a detection software known as "FV" (initials of the French words "Fréquences Vocales", that is, vocal frequencies), which determines animation keys according to the vocal frequencies used.
- Known suppliers of these types of software are SILICLONE and SOFTIMAGE VIEWER. It is observed that, preferably, an initial learning step is carried out with the interlocutor, off the air.
- the modeling means 8 is adapted to combine data representative of an image received from the image digitizing means 7 with a model. It is, for example, composed of a computer and image processing software suitable for performing the image processing known by the English name of "morphing", sometimes translated into French as "métamorphose", and which matches:
- the modeled face resulting from the operation of the modeling means 8 corresponds to an intermediate state of the metamorphosis between a photograph of a real face and a model intended to be animated.
- the resulting modeled face has the appearance of the face of the interlocutor represented in the photograph, the features of this face being able to be animated by displacement of particular points or lines of the model with which the photograph was associated to form the modeled face.
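One way to picture the intermediate state of the metamorphosis is a weighted blend of corresponding characteristic points of the photograph and of the model. The point names, coordinates and blend weight are illustrative assumptions; production morphing also blends textures, not only point positions.

```python
def morph(photo_points, model_points, alpha=0.5):
    """Blend corresponding points; alpha=0 gives the photograph,
    alpha=1 the animatable model."""
    blended = {}
    for name, (px, py) in photo_points.items():
        mx, my = model_points[name]
        blended[name] = ((1 - alpha) * px + alpha * mx,
                         (1 - alpha) * py + alpha * my)
    return blended

# Hypothetical corresponding points on the photograph and the model.
photo = {"lip_corner_left": (40.0, 80.0)}
model = {"lip_corner_left": (44.0, 84.0)}
halfway = morph(photo, model)  # the hybrid, animatable face
```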
- the face animation means 9 adapted to animate the model combined with facial expressions corresponding to the oral expressions coming from the voice analysis means 5 is of known type in the animation of cartoon characters.
- the reader will be able, for a better understanding, to refer to the reference works cited above, as well as to the "Tool Book" and the user manual of the corresponding software from the company ALIAS WAVE FRONT, and the user guide of the "Morph" software from GRYPHON SOFTWARE CORPORATION, these five documents being incorporated by reference in the description of the invention. All of the image processing functions described with reference to FIGS. 5 and 6 can be performed by computers using software such as those mentioned above, as well as SOFTIMAGE VIEWER and SILICLONE brand software, and software from the Institut National de l'Audiovisuel performing morphings or metamorphoses in three dimensions automatically.
- the operation of the device presented in FIG. 5 is simple: to transmit an animated image, the device uses on the one hand a processing of a fixed image in order to make it suitable for being animated (by means of the digitization means of image 7 and the modeling means 8) and on the other hand a voice processing to determine oral expressions (carried out by the voice analysis means 5) then facial which animate the still image (animation performed by the face animation means 9).
- when the telephone apparatus is not a voice-frequency apparatus, the voice frequency analysis means 6, the person animation means 10, the person and scene combination means 11, the video transmitter 13 and the recording means 14 are omitted, the display means 12 directly displaying the image emerging from the face animation means 9.
- FIG. 6 represents the successive operations carried out by the device illustrated in FIG. 5, by implementing a program stored in the read-only memory 17 of the computer 15 (FIG. 7).
- the first two operations, referenced 200 and 201, are carried out prior to the reception of the sound signal which carries the voice.
- During operation 200, the face is digitized by the digitizing means 7.
- During operation 201, the face digitized during operation 200 is combined, by metamorphosis, with a model intended to be animated.
- the resulting facial features are therefore those of the digitized face, that is to say those of the interlocutor, but the elements of this face are set in motion according to procedures depending on the digitized face, but also linked to the model intended to be animated.
- the resulting face is therefore hybrid, its appearance being that of the digitized face and its movements being those of the model.
- the movements are controlled, by means of the analysis of the voice of the interlocutor, by the movements of the face of the interlocutor.
- Operation 202 corresponds to the reception of telephone sound on a signal input of the device presented in FIG. 5.
- Operation 203 corresponds to the spectral analysis of the sound received during operation 202 in order to provide a frequency spectrum of this signal.
- In said spectrum, each frequency, on the abscissa, is associated with the amplitude of the signal component having this frequency, on the ordinate, in the sound signal received (FIG. 10).
- Operation 204 consists in extracting the synthetic voice frequencies to determine whether the keyboard 3 of the telephone apparatus 1 has been used. It is noted that the vocal frequencies used always correspond to a combination of at least two non-harmonic primary frequencies, so that the risks of detection error are limited.
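The keyboard tones evoked here are dual-tone multi-frequency (DTMF) signalling, in which each key is the sum of one low and one high frequency chosen to be mutually non-harmonic, precisely so that a single spurious tone cannot be mistaken for a key. A simplified detection sketch, correlating the signal against each candidate frequency; the sample rate and tone length are illustrative, and real detectors add amplitude and twist checks.

```python
import math

LOW = [697, 770, 852, 941]        # DTMF row frequencies (Hz)
HIGH = [1209, 1336, 1477, 1633]   # DTMF column frequencies (Hz)
KEYS = ["123A", "456B", "789C", "*0#D"]
RATE = 8000                        # telephone-grade sample rate (Hz)

def magnitude(samples, freq):
    """Correlate the samples against a sine/cosine pair at freq."""
    s = sum(x * math.sin(2 * math.pi * freq * n / RATE)
            for n, x in enumerate(samples))
    c = sum(x * math.cos(2 * math.pi * freq * n / RATE)
            for n, x in enumerate(samples))
    return math.hypot(s, c)

def detect_key(samples):
    """Pick the strongest row and column frequency, hence the key."""
    row = max(LOW, key=lambda f: magnitude(samples, f))
    col = max(HIGH, key=lambda f: magnitude(samples, f))
    return KEYS[LOW.index(row)][HIGH.index(col)]

# Synthesize 50 ms of the key '5' (770 Hz + 1336 Hz).
tone = [math.sin(2 * math.pi * 770 * n / RATE)
        + math.sin(2 * math.pi * 1336 * n / RATE)
        for n in range(400)]
```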
- the result of the extraction of the vocal frequencies is the storing in random access memory 16, in the register freqvoc, of the value of the keyboard key on which a press was detected (FIG. 7).
- the voice analysis means determines, by analysis of the spectrum carried out during operation 203, the phonemes and other oral expressions used by the interlocutor.
- the spectrum of the sound signal is compared to characteristic spectra of phonemes and oral expressions (such as laughter), said characteristic spectra being stored in the read only memory 17 of the computer 15.
- this analysis is carried out dynamically, which means that it is not always a single instantaneous spectrum which makes it possible to determine the oral expression but also sometimes a succession of spectra, said succession possibly being characteristic of an oral expression.
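The comparison against stored characteristic spectra can be sketched as a nearest-template search; the stored spectra, the four-band resolution and the squared-distance measure below are invented for illustration. The dynamic case described above would run the same search over a short succession of spectra instead of one.

```python
# Hypothetical characteristic spectra (amplitude in four frequency bands).
CHARACTERISTIC_SPECTRA = {
    "O":     [0.9, 0.4, 0.1, 0.0],
    "A":     [0.8, 0.7, 0.5, 0.2],
    "laugh": [0.3, 0.3, 0.3, 0.3],
}

def classify_spectrum(spec):
    """Return the phoneme or oral expression whose stored spectrum is
    closest (squared Euclidean distance) to the received spectrum."""
    def distance(template):
        return sum((a - b) ** 2 for a, b in zip(spec, template))
    return min(CHARACTERISTIC_SPECTRA,
               key=lambda k: distance(CHARACTERISTIC_SPECTRA[k]))
```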
- each oral expression is related to a facial expression, for example the phoneme pronounced on reading the letter "O" corresponds to a facial expression in which the jaws are slightly apart and the lips form a small round opening.
- Each of these facial expressions is matched, during operation 207, with a succession of movements of characteristic points of the face model which supports the face of the interlocutor. For example, if the pronunciation of the letter "O" is carried out for a period of one second, between two rests, the successive movements of the model, and therefore of the modeled person, correspond:
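The succession of movements for a phoneme held between two rests can be sketched as an open-hold-close envelope of the mouth opening. The one-second duration follows the example above; the linear ramps and their timings are assumptions.

```python
def mouth_opening(t, duration=1.0, attack=0.2, release=0.2, peak=1.0):
    """Mouth opening (0 = rest) at time t seconds into a held phoneme:
    ramp open, hold, then ramp closed back to rest."""
    if t <= 0 or t >= duration:
        return 0.0                            # at rest before and after
    if t < attack:
        return peak * t / attack              # opening
    if t > duration - release:
        return peak * (duration - t) / release  # closing
    return peak                               # held open
```

Sampling this envelope at the display frame rate yields the successive poses of the model.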
- Operation 208 consists in putting the face in motion according to the successive values kept in the freqvoc register (FIG. 7), to take account of the pressing of the keys of the keyboard 3. This operation 208 corresponds, for example, to a game played by the interlocutor.
- Operation 209 consists in inserting the face resulting from operations 207 and 208 into a scene whose characteristics also depend on the oral expressions and the keyboard keys used. For example, during a game, oral expressions can be used to distort an object, and pressing keys can be used to move the object.
- Operation 210 corresponds to the display of the scene comprising the face, to its memorization and to the emission of the image.
- the interlocutor sees his image on a television and he uses, on the one hand, the keyboard 3 of his telephone apparatus 1 and, on the other hand, his voice, to play the game considered while seeing the animated image of his face on his television screen.
- FIG. 7 represents a hardware layout of a device presented in FIG. 5
- This device is, here, organized according to architectures known in the field of computers, around a computer communication bus 20 to which are connected a central computing unit 21, of known type, comprising in particular a processor, a random access memory 16 which includes memory registers and in particular the freqvoc register, a read only memory 17 which stores the operating program of the device,
- the video output port 18 is of known type; it provides, according to variants, either a signal adapted to television standards, for example the CCIR standard in Europe, or a signal adapted to computer display standards, for example the Super VGA standard.
- the sound input and processing port 19 is of known type. On the one hand it digitizes the sound, and on the other hand it extracts an instantaneous spectrum (FIG. 10). The information resulting from this processing is stored in the RAM 16.
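The two roles of port 19 (digitization, then extraction of an instantaneous spectrum) can be illustrated with a plain discrete Fourier transform over one analysis window. This is a pedagogical sketch, not the patent's circuit: a real device would use dedicated hardware or an FFT, and the window length and sample rate below are assumptions.

```python
# Minimal sketch of the spectral analysis attributed to port 19: take one
# window of digitized samples and compute the magnitude of its DFT.
# Pure-stdlib, O(n^2) DFT for clarity; window/rate values are illustrative.
import math

def instantaneous_spectrum(samples):
    """Magnitude spectrum of one analysis window of digitized sound."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):  # keep the non-redundant half of the bins
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        spectrum.append(math.hypot(re, im) / n)
    return spectrum

# A pure 1 kHz tone sampled at 8 kHz over a 64-sample window has exactly
# 8 cycles in the window, so the energy lands in bin 8.
window = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
spec = instantaneous_spectrum(window)
print(spec.index(max(spec)))  # bin of the dominant frequency
```

The resulting spectrum (FIG. 10) is what the later operations compare against stored oral-expression signatures.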
- the image input and processing port 22 is of known type. It makes it possible to place in RAM 16 digital data representative of a visual scene.
- Figure 8 shows a photograph of a child's face. We observe that this child has flat hair and glasses, each lens of which is noticeably wider in the upper part than in the lower part, that his ears lie flat against his head, and that he sports a smile while keeping his mouth closed.
- Figure 9 shows a model of the face shown in Figure 8, to which an articulated arm microphone has been added.
- this child model has flat hair and glasses, each lens of which is noticeably wider in the upper part than in the lower part; his ears lie flat against his head and he sports a smile while keeping his mouth closed.
- the modeled face created by the modeling means 8 during operation 201 provides an image which is intermediate in the metamorphosis between the images presented in FIGS. 8 and 9.
- the faces of FIGS. 8 and 9 can be associated via characteristic points: 60 on the forehead, 61 on the cheekbones, 62 at the corners of the lips and 63 on the chin.
- These points, called "primary" points, are the sources or landmarks of facial deformation; that is, the other points of the face are displaced as a function of the displacement of these primary points, so as to represent the elasticity of the facial skin.
- FIG. 10 represents the spectrum of a sound signal representative of a voice.
- FIG. 11 represents the expression which corresponds to the sound signal presented in FIG. 10.
- This expression of laughter includes, compared to the same expressionless face:
- FIG. 12 represents the animated image corresponding to the expression carried by the voice presented in FIG. 10.
- Each point of the modeled face represented in FIG. 9 is associated with three primary points, the vertices of a triangle which surrounds the point considered. This point is moved in proportion to the deformation and displacement of the vertices of this triangle, so that the displacements are continuous over the entire surface of the triangle.
- the modeler 8 no longer works from a photograph, but from information transmitted by the interlocutor using the telephone 2.
- this user indicates the characteristics of his face: hair, glasses, braces, skin color, thinness and other visual characteristics.
- he can use many known techniques, for example stating these characteristics, using a computer and a modem, or using the keyboard 3, in place of the information provided by the scanner 7.
- the modeler 8 receives information representative of an image of a modeled person kept in memory and a voice intended to be associated with this modeled person.
- the memorized image includes the marks or primary points presented above.
- the representative information may indicate which political or media figure is represented, and the voice may be that person's voice or an imitation of it.
- the storage of modeled persons adapted to be animated by voices can be carried out either by remote transmission, for example via the telephone network or a hertzian (radio) broadcast, or by distribution of memory media, such as optical or magneto-optical discs or compact discs (CD-ROM).
- the invention thus allows the creation of news or entertainment television channels using only the bandwidth corresponding to human voices, possibly supplemented by digital information processed on reception so as not to be audible to viewers.
- a broadcasting station can add to its normal programs information which, processed on reception, allows the device according to the invention to determine the modeled person to be animated by the voice transmitted by said station (this modeled person may, in addition, be animated by non-audible digital information carried in the broadcast signal).
- users of communication networks, such as the INTERNET network, will be able not only to transmit their image and animate it with their voice, but also to transmit the images and voices of other people.
- the image of the interlocutor is captured by a video camera combined with an image digitization card.
- the message or sound analysis means performs only the measurement of an instantaneous sound intensity, and the model animation means animates only the mouth of the modeled face, opening it by vertical separation of the lips, the opening being all the greater as the instantaneous sound intensity analyzed by the analysis means is high.
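This simplified variant is easy to sketch: measure an instantaneous intensity (here taken as the RMS level of an analysis window, an assumption since the patent does not fix the measure) and open the lips in proportion. Function names, the scale factor and the clamp are illustrative.

```python
# Minimal sketch of the simplified variant: only an instantaneous sound
# intensity is measured, and only the mouth is animated, the vertical lip
# separation growing with that intensity. All names/values are assumptions.
import math

def instantaneous_intensity(samples):
    """RMS level of one window of digitized sound."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def lip_opening(samples, max_opening=1.0, full_scale=0.5):
    """Vertical lip separation: proportional to intensity, clamped at max."""
    level = instantaneous_intensity(samples)
    return min(max_opening, max_opening * level / full_scale)

silence = [0.0] * 100
loud = [0.5 if t % 2 == 0 else -0.5 for t in range(100)]
print(lip_opening(silence), lip_opening(loud))  # closed vs fully open
```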
- the interlocutor whose image is transmitted sends his electronic image, captured by a known photoelectric sensor, himself positioning the primary points or landmarks that allow this image to be animated using the oral expressions conveyed by the voice, as described above.
- a computer system comprising a computer and a pointing device, such as a mouse, may be used according to techniques known to those skilled in the art.
- the invention applies in particular to the transmission of audio-visual messages on a computer network of the INTERNET type, to the broadcasting of television or radio programs, and to incorporation into game consoles, computers, and audiovisual or television editing benches (not shown).
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP97928304A EP0907934A1 (fr) | 1996-06-03 | 1997-06-03 | Dispositif et procede de transmission d'images animees et sonorisees |
AU32653/97A AU3265397A (en) | 1996-06-03 | 1997-06-03 | Device and method for transmitting animated and sound images |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR9606813A FR2749420B1 (fr) | 1996-06-03 | 1996-06-03 | Procede et dispositif de formation d'images animees d'un interlocuteur |
FR96/06813 | 1996-06-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1997046974A1 true WO1997046974A1 (fr) | 1997-12-11 |
Family
ID=9492658
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR1997/000981 WO1997046974A1 (fr) | 1996-06-03 | 1997-06-03 | Dispositif et procede de transmission d'images animees et sonorisees |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP0907934A1 (fr) |
AU (1) | AU3265397A (fr) |
FR (1) | FR2749420B1 (fr) |
WO (1) | WO1997046974A1 (fr) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2250405A (en) * | 1990-09-11 | 1992-06-03 | British Telecomm | Speech analysis and image synthesis |
1996
- 1996-06-03: FR application FR9606813A, patent FR2749420B1 (fr), not active (Expired - Fee Related)
1997
- 1997-06-03: EP application EP97928304A, patent EP0907934A1 (fr), not active (Withdrawn)
- 1997-06-03: AU application AU32653/97A, patent AU3265397A (en), not active (Abandoned)
- 1997-06-03: WO application PCT/FR1997/000981, patent WO1997046974A1 (fr), not active (Application Discontinuation)
Non-Patent Citations (3)
Title |
---|
MORISHIMA E.A.: "A FACIAL MOTION SYNTHESIS FOR INTELLIGENT MAN-MACHINE INTERFACE", SYSTEMS & COMPUTERS IN JAPAN, vol. 22, no. 5, 1991, NEW YORK US, pages 50 - 59, XP000240754 * |
MORISHIMA E.A.: "FACIAL EXPRESSION SYNTHESIS BASED ON NATURAL VOICE FOR VIRTUAL FACE-TO-FACE COMMUNICATION WITH MACHINE", IEEE VIRTUAL REALITY ANNUAL INTERNATIONAL SYMPOSIUM, 18 September 1993 (1993-09-18) - 22 September 1993 (1993-09-22), NEW-YORK NY US, pages 486 - 491, XP000457717 * |
TAKEUCHI AND NAGAO: "COMMUNICATIVE FACIAL DISPLAYS AS A NEW CONVERSATIONAL MODALITY", INTERCHI '93 CONFERENCE PROCEEDINGS, 24 April 1993 (1993-04-24) - 29 April 1993 (1993-04-29), AMSTERDAM, pages 187 - 193, XP000473765 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2359459A (en) * | 2000-02-18 | 2001-08-22 | Sensei Ltd | Mobile telephone with animated display |
WO2001075805A1 (fr) * | 2000-03-31 | 2001-10-11 | Telecom Italia Lab S.P.A. | Procede d'animation d'une maquette synthetisee d'un visage humain actionnee par un signal acoustique |
US7123262B2 (en) | 2000-03-31 | 2006-10-17 | Telecom Italia Lab S.P.A. | Method of animating a synthesized model of a human face driven by an acoustic signal |
Also Published As
Publication number | Publication date |
---|---|
EP0907934A1 (fr) | 1999-04-14 |
FR2749420A1 (fr) | 1997-12-05 |
FR2749420B1 (fr) | 1998-10-02 |
AU3265397A (en) | 1998-01-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6925438B2 (en) | Method and apparatus for providing an animated display with translated speech | |
US8447065B2 (en) | Method of facial image reproduction and related device | |
US20020007276A1 (en) | Virtual representatives for use as communications tools | |
WO2018108013A1 (fr) | Media display method and terminal | |
CA2228901A1 (fr) | Automated voice signal alignment for image synthesis | |
US20030163315A1 (en) | Method and system for generating caricaturized talking heads | |
WO2009071795A1 (fr) | Automatic simultaneous interpretation system | |
US20040107106A1 (en) | Apparatus and methods for generating visual representations of speech verbalized by any of a population of personas | |
CN110324702B (zh) | Information pushing method and device during video playback | |
WO1997046974A1 (fr) | Device and method for transmitting animated and sound images | |
KR100453500B1 (ko) | 3D avatar mail service method and apparatus | |
CN114727120B (zh) | Method, apparatus, electronic device and storage medium for acquiring a live audio stream | |
WO2007110551A1 (fr) | System for a hearing-impaired person | |
EP0179701A1 (fr) | Television method for multilingual programmes | |
FR3058253B1 (fr) | Method for processing audio data from a voice exchange, and corresponding system and computer program | |
KR20040076524A (ko) | Animation character production method and Internet service system using animation characters | |
KR20100134022A (ko) | Photorealistic talking-head generation, content generation, and distribution system and method | |
CN213935598U (zh) | Sound-and-light interactive system for balloon projection | |
CN117636897A (zh) | Digital human audio and video generation system | |
Theobald et al. | Lip-reading enhancement for law enforcement | |
EP4349006A1 (fr) | Mixed-reality communication method, communication system, computer program and information medium | |
FR2835087A1 (fr) | Personalization of the audio presentation of synthesized messages in a terminal | |
CN113840152A (zh) | Live-streaming key point processing method and device | |
CN117857891A (zh) | Video generation method and apparatus, electronic device and storage medium | |
WO2021214097A1 (fr) | Method for transposing an audiovisual stream |
Legal Events

- AK (Designated states): Kind code of ref document: A1. Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH HU IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG US UZ VN YU AM AZ BY KG KZ MD RU TJ TM
- AL (Designated countries for regional patents): Kind code of ref document: A1. Designated state(s): GH KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF
- DFPE: Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
- 121: Ep: the epo has been informed by wipo that ep was designated in this application
- WWE (Wipo information: entry into national phase): Ref document number: 1997928304. Country of ref document: EP
- NENP (Non-entry into the national phase): Ref country code: JP. Ref document number: 98500277. Format of ref document f/p: F
- REG (Reference to national code): Ref country code: DE. Ref legal event code: 8642
- WWP (Wipo information: published in national office): Ref document number: 1997928304. Country of ref document: EP
- NENP (Non-entry into the national phase): Ref country code: CA
- WWW (Wipo information: withdrawn in national office): Ref document number: 1997928304. Country of ref document: EP