GB2468140A - A character animation tool which associates stress values with the locations of vowels - Google Patents

A character animation tool which associates stress values with the locations of vowels

Info

Publication number
GB2468140A
GB2468140A (application GB0903270A)
Authority
GB
United Kingdom
Prior art keywords
vowel
speech
character
stress
piece
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0903270A
Other versions
GB0903270D0 (en)
Inventor
Charles Cullen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dublin Institute of Technology
Original Assignee
Dublin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dublin Institute of Technology filed Critical Dublin Institute of Technology
Priority to GB0903270A priority Critical patent/GB2468140A/en
Publication of GB0903270D0 publication Critical patent/GB0903270D0/en
Priority to PCT/EP2010/052445 priority patent/WO2010097452A1/en
Publication of GB2468140A publication Critical patent/GB2468140A/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/205 3D [Three Dimensional] animation driven by audio data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L 21/10 Transforming into visible information
    • G10L 2021/105 Synthesis of the lips movements from speech, e.g. for talking heads

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Character animation is key to successful and engaging computer animation, whether for media such as movies or for computer games. A known difficulty in animation is linking the motions of an animated character with spoken words. The present application solves this problem by detecting the locations of vowels in a piece of speech, determining a stress value for each detected vowel, and then animating the character at the vowel locations in a manner consistent with the determined stress values. The locations of vowels are used as a trigger for a character's motion.

Description

A CHARACTER ANIMATION TOOL
Field
The present application is directed to the field of computer animation, in particular to software tools and production workflow solutions for computer animation.
Background
Character animation is key to successful and engaging computer animation, whether for media such as movies or for computer games. A known difficulty in animation is linking the motions of an animated character with spoken words. Software is known that animates a character's mouth in response to a speech signal, as a result of which an animated character appears to utter the words. Whilst this is useful, the results tend to be regarded by viewers as unnatural. Other techniques have been employed which attempt to process speech to match an animated character's mouth. Again, the results tend to be unnatural.
Some systems have investigated the role of more extensive face and body movements (notably the MIT BEAT prototype). The BEAT system performs linguistic analysis of synthesized text-to-speech (TTS) audio output in an attempt to predict the formal structure of the associated gestures and movements. However, the benefits of this system are limited and artificial insofar as the system only operates on synthesized speech.
Summary
To date, however, none of the prior art has considered the overarching importance of speech rhythm in relation to these gestures and movements, and none has considered the prioritization of speech events in relation to their prominence within the signal. In this regard, it has been identified by the inventor that the prior art methods, whilst somewhat effective, are lacking. In particular, the inventor has appreciated that in human communication linguistic content only accounts for about 7%, with the acoustic properties of speech (rhythm and prosody) accounting for a further 38% or so. Moreover, he has appreciated that the majority of human communication relies on subtle movements and more expansive gestures that comprise 55% of our interactions. The present application focuses on providing these subtle movements and more expansive gestures and relies upon the rhythm and prosody of the speech signal rather than the linguistic content (as with current speech recognition and lip-synching algorithms) to provide a simple system for assigning these movements and gestures to an animated character. Thus the technology places the emphasis of animation on the same criteria that humans use in communication. The approach of the "stress tagging animation" technique may be compared with human-operated characters such as the Muppets, which adopt a similar approach of concentrating on the rhythms of hand/head movements rather than lip-synching accuracy. The system presented herein, by providing a simple list of events prioritized by rhythm and prosody, allows developers to easily match speech with movements, in contrast to most animations, which are built from scratch. With "stress tagging", content may be re-used and characters and voices easily changed, as the timing and priority of animation events resides with the speech signal. This allows for automated tools to be provided that allocate animation to speech events, rather than the converse. In addition, as the "stress tagging framework" focuses on acoustic attributes, it is completely language independent. Thus, tools are envisaged where a particular character may be developed to respond to "stress tags", allowing it to be re-purposed in any language desired as often as needed.
The present application employs a pre-defined library of movements and gestures for several distinct characters (as examples), which may be quickly allocated to the prioritized speech events on a manual, semi-automatic or fully automated basis as required by the production.
A first embodiment provides a speech analysis system for assisting in the animation of at least one character in response to a piece of speech. The system comprises a memory for storing the piece of speech and a vowel locator for identifying the locations of vowels within the piece of speech. A vowel stress detector identifies the degree of stress associated with each identified vowel and stores the associated degree of stress for each location.
The vowel locator may determine the duration of each vowel. Suitably, the piece of speech is stored in a database in the memory. The database may store timestamps indicating locations of vowels and durations of vowels and/or a stress value for each vowel.
The vowel stress detector may score at least one characteristic of each vowel against a reference value for the characteristic. This reference value may be determined by averaging the characteristic over a windowed section of the piece of speech. The windowed section may comprise the entire piece of speech. The characteristic may comprise one or more of the following: a) pitch, b) intensity, c) duration, d) voice quality, e) jitter, and f) voice breaks.
Preferably, the at least one characteristic comprises the following characteristics: a) pitch, b) intensity and c) duration.
The animation tool may provide a character animation feature employing the locations of vowels as a trigger for a character's motion. The motion of a character selected at a particular location may be determined with reference to the degree of stress for that location. The motion of the character may be automatically selected based upon the degree of stress. The animation tool may allow an animator to select a particular motion from a list presented, suitably where the list is populated with possible character motions based upon the degree of stress. The list may be presented for each vowel location, allowing an animator to select an animated character's motion at each vowel location.
Description of Drawings
Figure 1 is a block diagram of an exemplary system according to the present application, Figure 2 is a flow chart for exemplary methods according to the present application, and Figure 3 is a graphical user interface for use with the system or method of Figures 1 or 2.
Detailed Description
The present invention will now be described with reference to some exemplary methods and systems, in which speech data is provided to a voice analysis system 2 which in turn analyses the speech data to identify the locations of vowels and the corresponding stress levels of those vowels. The inputted speech is desirably monophonic in nature. The speech 1 may be directly inputted, for example by means of a microphone. Alternatively, a pre-recorded piece of speech may be employed. A database 3 stored in local memory or external memory may be employed to store different items of speech content. It will be appreciated that such a database may be readily constructed by one skilled in the art. In addition to storing the items of speech content, the database may store the results of analysis performed upon the items of speech content by a vowel locator engine 4 and a vowel stress detector 5, the operation of which will be explained in greater detail below. The voice analysis system may be any general purpose computer, including those operating under the Windows™, Macintosh™ or Linux™ operating systems. The analysis stage of the system may be performed by any suitable set of DSP audio analysis algorithms, such as those provided within MATLAB™ as provided by The MathWorks, Inc., Natick, USA, the specific speech software Praat (Boersma, Paul & Weenink, David (2009). Praat: doing phonetics by computer (Version 5.1) [Computer program]. Retrieved January 31, 2009, from http://www.praat.org/), or purpose-built SDKs such as MS Speech. The animation tool may be implemented by any suitably configured animation engine, such as Adobe Flash in AS3.
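By way of illustration only, the following sketch shows one way in which the prosodic measurements discussed herein (pitch, intensity and duration) might be obtained programmatically; it assumes the Python parselmouth bridge to Praat, and the file name and function names are illustrative assumptions rather than part of the system described.

```python
# Illustrative sketch (assumption): extracting pitch, intensity and duration
# via parselmouth, a Python interface to Praat. The file name is hypothetical.
import parselmouth

def prosodic_summary(wav_path):
    sound = parselmouth.Sound(wav_path)    # load the (monophonic) speech piece
    pitch = sound.to_pitch()               # Praat pitch analysis
    intensity = sound.to_intensity()       # Praat intensity analysis
    return {
        "duration_s": sound.duration,
        "pitch_hz": pitch.selected_array["frequency"],   # 0 where unvoiced
        "intensity_db": intensity.values[0],
    }

summary = prosodic_summary("speech_clip.wav")
```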
Once the voice analysis system has performed an analysis, the system can provide vowel stress information 6 to an animation tool 7. The manner and mode of use of the vowel stress information by the animation tool is explained below.
The animation tool 7 may operate on the same computing system as the voice analysis system 2 or operate on a separate computing system. Similarly, the animation tool may be provided within the same software program as the vowel locator and vowel stress detectors or separate programs may be employed for each.
The mode and manner of operation of the system 2 and animation tool 7 will now be explained with reference to some exemplary modes of operation, shown in Figure 2, in which the analysis steps 20 are shown separate to the animation steps 23.
The method commences with a recorded piece of speech content which is to be used with an animated character. The piece of speech may be a single item, e.g. a sentence, or it may comprise an entire vocabulary for the character, in which different phrases are combined into an overall speech recording.
This overall speech recording may be used for example as a library of speech from which different pieces may be retrieved as required.
A preliminary step in the method, where necessary, may be employed to convert the piece of speech from stereo to monophonic speech. It will become apparent that, whilst stereo speech may be employed by analysing the left and right channels, for the present purposes it is simpler and more efficient to use a monophonic form of speech. A variety of techniques are known for creating a monophonic signal from a stereo signal, including the abandonment of one channel or the simple addition of the two channels.
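A minimal sketch of the two conversion approaches just mentioned, assuming the speech is held as a NumPy array of samples, is given below; the function and parameter names are illustrative only.

```python
# Illustrative sketch: stereo-to-mono conversion by either dropping one channel
# or combining both, assuming `stereo` is an array of shape (n_samples, 2).
import numpy as np

def to_mono(stereo, method="combine"):
    if method == "drop":
        return stereo[:, 0]          # abandon the right channel, keep the left
    # average the two channels (a scaled addition) to avoid clipping
    return stereo.mean(axis=1)
```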
The monophonic speech piece is then passed through a vowel detector which is employed to detect the positions of vowels in the piece of speech. Where a vowel is located, its position is marked with a time stamp. Each time stamp suitably identifies the location and duration of the associated vowel. The piece of speech and the associated time stamps may be stored together in the database. Vowel detection techniques are well known in the art. One exemplary technique would employ a simple intensity derivative detector, which takes the differential of the input wave to obtain maxima (vowel peaks). The vowel analysis may, for example, be performed using the FFT algorithm provided as part of the Flash AS3 core sound classes available from Adobe Systems Incorporated.
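One possible realisation of such an intensity-based detector is sketched below; it computes a short-time intensity envelope and takes its maxima as candidate vowel peaks. The frame size, threshold and use of a generic peak finder are assumptions for illustration, not the specific detector employed.

```python
# Illustrative sketch of a simple intensity-based vowel peak detector.
# Frame size and threshold are arbitrary assumptions.
import numpy as np
from scipy.signal import find_peaks

def locate_vowel_peaks(mono, sample_rate, frame_ms=10):
    frame = int(sample_rate * frame_ms / 1000)
    n_frames = len(mono) // frame
    # short-time RMS intensity envelope
    env = np.array([
        np.sqrt(np.mean(mono[i * frame:(i + 1) * frame] ** 2))
        for i in range(n_frames)
    ])
    # maxima of the envelope are taken as candidate vowel peaks
    peaks, _ = find_peaks(env, height=env.mean())
    return [p * frame_ms / 1000.0 for p in peaks]   # timestamps in seconds
```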
Each vowel is analysed for a number of prosodic characteristics. Each prosodic characteristic is then compared with an overall mean for the particular prosodic characteristic for the entire speech clip.
Exemplary prosodic characteristics which are employed include pitch, intensity and duration. These characteristics have been identified as being particularly important prosodic attributes in human speech.
Other characteristics that may be employed would include, for example but not limited to, voice quality, jitter and voice breaks.
The exemplary method described herein uses a simple scoring system and applies it to the characteristics of each vowel. This scoring system ignores interrelationships between characteristics and treats individual characteristics separately and evenly, i.e. each characteristic is scored identically. It will be appreciated that the scoring system may however be adapted to include a weighted scoring formula.
In the exemplary method, however, the individual characteristics (pitch, intensity and duration) of each vowel are compared with the means for the piece of speech as a whole. Where one characteristic for a vowel has a value which exceeds the average, the vowel receives a score of 1; where two characteristics exceed their mean values, the vowel receives a score of 2, and so on. Thus, where the pitch of a vowel is above the average pitch for the piece of speech, the duration of the vowel exceeds the average duration and the intensity exceeds the average intensity, the vowel would receive a score of 3.
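The unweighted scoring just described can be expressed compactly as follows; the sketch assumes each located vowel already carries measured pitch, intensity and duration values, and the field names are hypothetical.

```python
# Illustrative sketch of the 0-3 scoring scheme: one point for each prosodic
# characteristic that exceeds the mean for the whole piece of speech.
CHARACTERISTICS = ("pitch", "intensity", "duration")

def score_vowels(vowels):
    means = {c: sum(v[c] for v in vowels) / len(vowels) for c in CHARACTERISTICS}
    for v in vowels:
        v["score"] = sum(1 for c in CHARACTERISTICS if v[c] > means[c])
    return vowels
```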
This score is stored with the timestamp for the vowel in the database. As a result, the speech, vowel locations and importance (score) of each vowel location are stored or related together within the database.
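The application does not prescribe a particular database layout; the table below is merely one plausible shape, given as an assumption, for relating a speech item to its vowel timestamps, durations and stress scores.

```python
# Illustrative sketch (assumption): an SQLite table relating each piece of
# speech to its vowel timestamps, durations and stress scores.
import sqlite3

conn = sqlite3.connect("speech_analysis.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS vowel_events (
        speech_id   TEXT,     -- identifies the stored piece of speech
        start_s     REAL,     -- timestamp of the vowel
        duration_s  REAL,     -- duration of the vowel
        score       INTEGER   -- stress score (0 to 3 in the exemplary scheme)
    )
""")
conn.commit()
```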
The values stored in the database may then be employed with a character animation tool by automatically or semi-automatically linking gestures to the locations of the time-stamped vowels. In particular, the analysis tool may export an XML file for a piece of speech to the animation tool in which the speech is embedded along with information identifying the locations and scores of vowels.
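The exact XML schema is not specified herein; the fragment below is a hypothetical example of the kind of export the analysis tool might produce, with element and attribute names invented for illustration.

```python
# Illustrative sketch: exporting vowel locations and scores as XML for the
# animation tool. Element and attribute names are hypothetical.
import xml.etree.ElementTree as ET

def export_stress_tags(speech_file, vowels, out_path):
    root = ET.Element("speech", attrib={"file": speech_file})
    for v in vowels:
        ET.SubElement(root, "vowel", attrib={
            "start": f"{v['start_s']:.3f}",
            "duration": f"{v['duration_s']:.3f}",
            "score": str(v["score"]),
        })
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
```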
Character animation tools are well known in the art and the techniques employed would be readily familiar to the skilled person. One common technique is the use of games physics to animate characters based on particular inputs as provided, for example, by an animator. These inputs are converted into motion of the character on the screen. The advantage of these animation tools is that the animator does not have to specify the precise movements for a character between frames. Instead, for example, the start and end points might be detailed over a particular time span and the animation tool, using appropriate mathematics, can effectively interpolate the character's movements for each frame between the start and end points.
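By way of a simplified example of such interpolation, and purely as an assumption about how an engine might fill in the intermediate frames, a linear tween between two keyframe values could be computed as follows.

```python
# Illustrative sketch: linear interpolation ("tweening") of a character
# parameter between start and end keyframes. Frame rate is an assumption.
def tween(start_value, end_value, start_s, end_s, fps=25):
    n_frames = max(1, int(round((end_s - start_s) * fps)))
    return [
        start_value + (end_value - start_value) * i / n_frames
        for i in range(n_frames + 1)
    ]

# e.g. raise a hand from rest (0.0) to raised (1.0) over a 0.2 s stressed vowel
hand_positions = tween(0.0, 1.0, 10.0, 10.2)
```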
The present system employs such a tool and provides the timestamp and scoring data with the speech data to the character animation tool. The character animation tool employs the scoring data as an input at each identified time stamp. In an automated mode of operation, different scores may be associated with different character actions or character features. For example, a score of one might be associated with a character winking, whereas a score of two might be associated with movement of the hands and a score of three might be associated with head movement. The character's action is timed to occur at the timestamp and for the duration of the timestamp. An exemplary screen shot from an animation tool using the present methods is shown in Figure 3, in which a section of speech content is represented along an abbreviated time line 64. The section of speech is selectable from the entire piece of speech content, which is represented at a smaller scale (graphical section 53). One or more slider features 55a, 55b allow a user to select a section of speech from an overall time line 52 for the speech content. Other features, including for example a moving window, allow a user to select the region of the speech content to be represented by the abbreviated time line. The vowel stress information is represented by dots 57 for the complete item of speech and by diamonds 60, 58, 56, 54 in a separate region 50 for the abbreviated time line.
The character to be animated is represented in a character region 62 above the time line with a variety of different actions (in this example, hand movements). Each diamond represents a vowel, with the degree of stress indicated by differently marked diamonds. In the exemplary screenshot shown, the scoring system described above was used with a maximum score of 3. The stress is thus represented by the relative height of the diamonds on the screen, with diamonds having a score of 3 placed higher than diamonds having a score of 2, and so on. In addition, the diamonds contain a numeric representation of the score.
Similarly, the colours of the diamonds may differ to identify different scores, e.g. a diamond representing a score of three could be red, one with a score of two could be blue and one with a score of one might be coloured green. To assist the animator, sections with no speech may also be represented 54. When an animator is using the tool, they may move along the time line selecting individual diamonds. As a diamond is selected, using a mouse for example, a motion selection tool may appear, e.g. a drop-down list, allowing the animator to select an action for a character. Different actions can be pre-assigned to each drop-down list at different levels, i.e. minor actions assigned to lower stress levels and major actions assigned to higher stress levels. The animator can thus select a major action from the list of major actions for a diamond with a value of 3 and a minor action from a list of minor actions presented for a diamond with a value of 1.
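The mapping from stress scores to candidate actions may be held in a simple look-up, as sketched below: in the automated mode an action is selected from the pre-assigned entries for the score, while in the semi-automatic mode the same entries populate the drop-down list presented to the animator. The action names follow the examples above, supplemented by invented placeholders.

```python
# Illustrative sketch: actions pre-assigned to each stress score. Example
# actions follow the description above; the remainder are placeholders.
import random

ACTIONS_BY_SCORE = {
    1: ["wink", "eyebrow raise"],            # minor actions for low stress
    2: ["hand movement", "shoulder shrug"],
    3: ["head movement", "full gesture"],    # major actions for high stress
}

def auto_select_action(score):
    # automated mode: choose any pre-assigned action for this stress level
    return random.choice(ACTIONS_BY_SCORE.get(score, ["idle"]))

def dropdown_options(score):
    # semi-automatic mode: list presented to the animator at this vowel
    return ACTIONS_BY_SCORE.get(score, [])
```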
The animation tool generates and stores the character actions in response to the animator's selection. It will be appreciated that the speed with which the animation may be completed is extremely fast, since the animator does not need to focus on timing or content. Suitably, the animation tool is one that allows for layering; thus the animator may use one layer to store the character's actions resulting from the speech above, with other layers employed to account for a character's general movements about a scene.
Whilst this approach may appear relatively primitive compared to animation generally, the reality is that lip/mouth movement is only used by humans for linguistic information, which in itself accounts for a very small percentage of communication (approx 7%), with the large majority of communication hence being performed by motion of other features (55%). More importantly, the context of the exact gesture is less important than the rhythm of the gestures, and the present method, by tying the gestures into vowel locations and into the relative importance of vowels in the speech, provides an effective animation tool. The automatic animation tool is obviously of importance in situations where an animator is not involved in producing the final piece of content, e.g. in a video game, where a character's actions, whilst depending on pre-recorded speech content, may have other inputs, e.g. from a player.
In a semi-automatic arrangement, the tool allows a user to select from different actions for each time stamp. Thus an animator can select different actions from a dropdown box for each timestamp. In this scenario, the contents of the drop down list may be selected based on the associated score for the timestamp.
This character animation technique employs the use of acoustic, linguistic and emotional speech analysis to semi-automatically generate gestures and body movements in response to the acoustic parameters in a character's voice.
The invention is a platform that enables the creation of computer animations for use in a wide number of applications. It is cutting edge in that, instead of basing animation on lip-synch, it uses speech events (acoustic, linguistic and emotional) to both manually and automatically define character movements, gestures and facial positions. The techniques have been demonstrated to work in practice.
A software front end as described above has been implemented that takes in user data (speech) and produces a corresponding animation that is close to half complete in a fraction of the time that would be required by an animator using traditional techniques.
The techniques described herein may be used to produce cheaper, faster and more effective character animations in films, games, children's TV programmes and advertisements.
The advantages include lower costs: the overall production overhead is reduced because character animation events may be characterised by non-animators based on a speech clip, freeing animators to work on other aspects of the animation process. Moreover, the animation process is faster since it is semi-automated using pre-defined libraries that allow up to 70% of the animation to be achieved without customization by an animator. The system is character independent, so that the gesture and movement libraries and characters may easily be changed.
In contrast to prior art methods, the system is largely language independent in that the techniques may be used to semi-automate characters in any spoken language.
The technology is character and language independent, and the use of re-usable and pre-defined gesture/movement libraries makes it a cheap, fast and effective alternative to conventional character animation techniques.
The systems of the present application have been implemented with a variety of different characters, tested in various languages and with various voices, and the potential to reduce production costs, save time and streamline workflows has been clearly demonstrated. The process and resulting system is essentially a labour-saving device that allows animators to achieve better production values in a shorter period of time, given that it takes care of 70% of the ground work, allowing animators to focus on the nuance and detail of the overall animated output.

Claims (29)

  1. A speech analysis system for assisting in the animation of at least one character to a piece of speech, the system comprising: a memory for storing the piece of speech, a vowel locator for identifying the locations of vowels within the piece of speech, and a vowel stress detector for identifying the degree of stress associated with each identified vowel and storing the associated degree of stress for each location.
  2. A speech analysis system according to claim 1, wherein the vowel locator identifies the duration of each vowel.
  3. A system according to claim 1 or claim 2, wherein the piece of speech is stored in a database in the memory.
  4. A system according to claim 3, wherein the database stores timestamps indicating locations of vowels and durations of vowels.
  5. A system according to claim 4, wherein the database further stores a stress value for each vowel.
  6. A system according to any preceding claim, wherein the vowel stress detector scores at least one characteristic of each vowel against a reference value for the characteristic.
  7. A system according to claim 6, wherein the reference value is determined by averaging the characteristic over a windowed section of the piece of speech.
  8. A system according to claim 7, wherein the windowed section comprises the entire piece of speech.
  9. A system according to any one of claims 6 to 8, wherein the at least one characteristic comprises one or more of the following: a) pitch, b) intensity, c) duration, d) voice quality, e) jitter, and f) voice breaks.
  10. A system according to any one of claims 6 to 9, wherein the at least one characteristic comprises the following characteristics: a) pitch, b) intensity and c) duration.
  11. An animation system comprising the system according to any preceding claim, wherein the animation tool provides a character animation feature employing the locations of vowels as a trigger for a character's motion.
  12. A system according to claim 11, wherein the motion of a character selected at a particular location is determined with reference to the degree of stress for that location.
  13. A system according to claim 12, wherein the motion of the character is automatically selected based upon the degree of stress.
  14. A system according to claim 12, wherein the animation tool allows an animator to select a particular motion from a list presented.
  15. A system according to claim 14, wherein the list is populated with possible character motions based upon the degree of stress.
  16. A system according to claim 14 or claim 15, wherein the list is presented for each vowel location allowing an animator to select an animated character's motion at each vowel location.
  17. A computer implemented method of animating a character's actions to a piece of speech, the method comprising the steps of: analysing the piece of speech to identify at least one location of a vowel, determining the degree of stress associated with the at least one identified vowel location, and selecting the character's action at the at least one location based on the determined degree of stress.
  18. A method according to claim 17, wherein the duration of the at least one vowel is determined.
  19. A method according to claim 17 or claim 18, wherein the degree of stress is determined by comparing at least one characteristic of each vowel against a reference value for the characteristic.
  20. A method according to claim 19, wherein the reference value is determined by averaging the characteristic over a windowed section of the piece of speech.
  21. A method according to claim 20, wherein the windowed section comprises the entire piece of speech.
  22. A method according to any one of claims 17 to 21, wherein the at least one characteristic comprises one or more of the following: a) pitch, b) intensity, c) duration, d) voice quality, e) jitter, and f) voice breaks.
  23. A method according to any one of claims 17 to 22, wherein the at least one characteristic comprises the following characteristics: a) pitch, b) intensity and c) duration.
  24. A method wherein the locations of vowels are used as a trigger for a character's motion in the animation.
  25. A method according to claim 24, wherein the character's motion at a location is determined with reference to the degree of stress for that location.
  26. A method according to claim 25, wherein the motion of the character is automatically selected based upon the degree of stress.
  27. A method according to claim 25, further comprising presenting an animator with a list of possible character motions and allowing the animator to select a particular motion from the list.
  28. A method according to claim 27, wherein the list is populated with possible character motions based upon the degree of stress.
  29. A method according to claim 27 or claim 28, wherein the list is presented for each vowel location allowing an animator to select an animated character's motion at each vowel location.
GB0903270A 2009-02-26 2009-02-26 A character animation tool which associates stress values with the locations of vowels Withdrawn GB2468140A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB0903270A GB2468140A (en) 2009-02-26 2009-02-26 A character animation tool which associates stress values with the locations of vowels
PCT/EP2010/052445 WO2010097452A1 (en) 2009-02-26 2010-02-25 A character animation tool

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0903270A GB2468140A (en) 2009-02-26 2009-02-26 A character animation tool which associates stress values with the locations of vowels

Publications (2)

Publication Number Publication Date
GB0903270D0 GB0903270D0 (en) 2009-04-08
GB2468140A 2010-09-01

Family

ID=40565755

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0903270A Withdrawn GB2468140A (en) 2009-02-26 2009-02-26 A character animation tool which associates stress values with the locations of vowels

Country Status (2)

Country Link
GB (1) GB2468140A (en)
WO (1) WO2010097452A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5111409A (en) * 1989-07-21 1992-05-05 Elon Gasper Authoring and use systems for sound synchronized animation
KR100317036B1 (en) 1999-10-27 2001-12-22 최창석 Automatic and adaptive synchronization method of image frame using speech duration time in the system integrated with speech and face animation
DE60224776T2 (en) * 2001-12-20 2009-01-22 Matsushita Electric Industrial Co., Ltd., Kadoma-shi Virtual Videophone
FR2906056B1 (en) * 2006-09-15 2009-02-06 Cantoche Production Sa METHOD AND SYSTEM FOR ANIMATING A REAL-TIME AVATAR FROM THE VOICE OF AN INTERLOCUTOR

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997036288A1 (en) * 1996-03-26 1997-10-02 British Telecommunications Plc Image synthesis
WO2007076279A2 (en) * 2005-12-29 2007-07-05 Motorola Inc. Method for classifying speech data
WO2008025918A1 (en) * 2006-09-01 2008-03-06 Voxler Procedure for analyzing the voice in real time for the control in real time of a digital device and associated device

Also Published As

Publication number Publication date
GB0903270D0 (en) 2009-04-08
WO2010097452A1 (en) 2010-09-02

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)