EP2059926A2 - Method and system for animating an avatar in real time using the voice of a speaker - Google Patents
Method and system for animating an avatar in real time using the voice of a speaker
- Publication number
- EP2059926A2 (application EP07848234A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- avatar
- state
- animation
- parameters
- elementary
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
- 238000000034 method Methods 0.000 title claims abstract description 23
- 230000033001 locomotion Effects 0.000 claims abstract description 14
- 230000005236 sound signal Effects 0.000 claims abstract description 12
- 238000004891 communication Methods 0.000 claims abstract description 9
- 238000004364 calculation method Methods 0.000 claims description 18
- 230000007704 transition Effects 0.000 claims description 14
- 238000001514 detection method Methods 0.000 claims description 8
- 238000004458 analytical method Methods 0.000 claims description 7
- 238000009877 rendering Methods 0.000 claims description 7
- 230000033764 rhythmic process Effects 0.000 claims description 7
- 230000001256 tonic effect Effects 0.000 claims description 5
- 230000001427 coherent effect Effects 0.000 claims description 4
- 230000001131 transforming effect Effects 0.000 claims description 2
- 230000001360 synchronised effect Effects 0.000 abstract description 4
- 230000007935 neutral effect Effects 0.000 description 29
- 230000006870 function Effects 0.000 description 4
- 230000006399 behavior Effects 0.000 description 3
- 230000000694 effects Effects 0.000 description 2
- 230000001154 acute effect Effects 0.000 description 1
- 230000006835 compression Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000009792 diffusion process Methods 0.000 description 1
- 230000008451 emotion Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 210000004709 eyebrow Anatomy 0.000 description 1
- 230000008921 facial expression Effects 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 210000004209 hair Anatomy 0.000 description 1
- 238000010295 mobile communication Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 238000000844 transformation Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/205—3D [Three Dimensional] animation driven by audio data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
- G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
Definitions
- the present invention relates to a method for animating an avatar in real time from the voice of an interlocutor.
- the invention finds a particularly important, although not exclusive, application in the field of mobile devices such as mobile phones or, more generally, portable personal communication devices or PDAs (Personal Digital Assistants).
- such graphics can then be integrated into the phone beforehand and called up when necessary during a telephone conversation.
- such a system does not solve the problem of controlling the facial expressions of the avatar as a function of the speaker, in particular in a synchronized manner.
- also known is a method of animating an entity on a mobile phone that consists in selecting and digitally processing the words of a message, from which "visemes" are identified and used to modify the mouth of the entity while the voice message is played.
- such a method, in addition to being based on words rather than on sounds as such, is limited and gives the visual image of the entity a mechanical appearance.
- the present invention aims to provide a method and a system for animating an avatar in real time that meet the requirements of practice better than those previously known, in particular in that they allow real-time animation not only of the mouth but also of the body of an avatar, on a mobile device of reduced capacity such as a mobile phone, with excellent synchronization of movements.
- the invention starts with the idea of using the richness of sound and not just the words themselves.
- the present invention notably proposes a method of animation on a mobile device screen of an avatar equipped with a mouth from a sound input signal corresponding to the voice of a telephone communication interlocutor.
- the sound input signal is converted in real time into an audio and video stream in which on the one hand the movements of the mouth of the avatar are synchronized with the phonemes detected in said sound input signal.
- at least one other part of the avatar is animated coherently with said signal, by changes of attitude and movement, through analysis of said signal; in addition to the phonemes, so-called level 1 parameters are extracted, namely the periods of silence, the periods of speech and/or other elements contained in said sound signal taken from the prosody, intonation, rhythm and/or tonic accent, so that the entire avatar moves and seems to speak, in real time or substantially in real time, in place of the interlocutor.
- the parts of the avatar other than the mouth proper include the body and/or the arms, neck, legs, eyes, eyebrows, hair, etc.; these are therefore not set in motion independently of the signal.
- the avatar is chosen and / or configured through an on-line service on the Internet;
- the mobile device is a mobile phone;
- to animate the avatar, elementary sequences are exploited, consisting of images generated by a 3D rendering calculation or from drawings; the elementary sequences are loaded into memory at the beginning of the animation and kept in said memory for the duration of the animation, for several simultaneous and/or successive interlocutors;
- the elementary sequence to be played is selected in real time, according to previously calculated and / or determined parameters;
- the list of elementary sequences being common to all the avatars usable on the mobile device, an animation graph is defined in which each node represents a transition point or state between two elementary sequences; each connection between two transition states is unidirectional, and all the elementary sequences connected through the same state must be visually compatible, from the end of one elementary sequence to the beginning of the other; each elementary sequence is duplicated so as to show a character who speaks or is silent, depending on whether or not a voice sound is detected;
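The graph structure just described (states as transition points, unidirectional connections, each elementary sequence duplicated into a speaking and a silent variant, loop-back allowed) can be sketched as follows. This is a minimal illustration only; all names, states and coefficient values are assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class ElementarySequence:
    # Each sequence exists in two variants: one where the character
    # speaks and one where it is silent (field names are hypothetical).
    name: str
    frames_speaking: list = field(default_factory=list)
    frames_silent: list = field(default_factory=list)

@dataclass
class State:
    # A node of the animation graph: a transition point between
    # elementary sequences, with one coefficient per level-2 dimension.
    name: str
    coefficients: dict  # dimension name -> Ci
    # Unidirectional outgoing connections: states reachable from here.
    successors: list = field(default_factory=list)

# Minimal star-shaped graph: a neutral hub connected to two moods.
neutral = State("neutral", {"SPEAK": 0.5, "IDLE": 0.5})
happy = State("happy", {"SPEAK": 1.0, "IDLE": 0.0})
sad = State("sad", {"SPEAK": 0.2, "IDLE": 0.8})
neutral.successors = [happy, sad, neutral]  # loop back to itself allowed
happy.successors = [neutral]
sad.successors = [neutral]

greet_seq = ElementarySequence("greeting_wave")  # variants filled elsewhere
```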
- for a state e, the relative probability value Pe = Σi (Pi × Ci) is calculated, where Pi is the value of the level 2 parameter i computed from the level 1 parameters detected in the voice, and Ci is the coefficient of the state e along dimension i; this calculation is carried out for all the states connected to the state at which the current sequence ends in the graph. While an elementary sequence is in progress, it is either allowed to run to its end, or we switch to its duplicated speaking variant when a voice is detected (and vice versa); then, when the sequence ends and a new state is reached, the next target state is chosen according to a probability defined by the probability values calculated for the states connected to the current state.
- the invention also proposes a system implementing the above method.
- an animation system for an avatar equipped with a mouth, from a sound input signal corresponding to the voice of a telephone communication interlocutor, characterized in that it comprises: a mobile telecommunication device for receiving the sound input signal emitted by an external telephone source; a proprietary signal-receiving server comprising means for analyzing said signal and transforming said sound input signal in real time into an audio and video stream; calculating means arranged, on the one hand, to synchronize the movements of the mouth of the avatar transmitted in said stream with the phonemes detected in said input sound signal and, on the other hand, to animate at least one other portion of the avatar coherently with said signal by changes of attitude and movement; means for analyzing the input sound signal in order to detect, and use for the animation, one or more additional so-called level 1 parameters, namely silence periods, speech periods and/or other elements contained in said sound signal taken from the prosody, intonation, rhythm and/or tonic accent; and means for transmitting the images of the avatar and the corresponding sound signal, so that the avatar seems to move and speak in place of the interlocutor.
- the system comprises means for configuring the avatar through an online service on the Internet network.
- it comprises means for constituting and storing on a server, elementary animated sequences for animating the avatar, consisting of images generated by a 3D rendering calculation, or generated from drawings.
- it comprises means for selecting in real time the elementary sequence to be played, according to parameters previously calculated and / or determined.
- each node represents a transition point or state between two elementary sequences, each connection between two transition states being unidirectional and all the sequences connected through the same state being visually compatible, from the end of one elementary sequence to the beginning of the other.
- it comprises means for duplicating each elementary sequence so as to make it possible to show a character who speaks or is silent, depending on whether or not a voice is detected.
- the level 1 parameters are used to calculate so-called level 2 parameters, which correspond to characteristics such as a slow, fast, jerky, happy, sad or other equivalent type of character, the avatar being animated at least in part from said level 2 parameters.
- by a parameter of a type equivalent to a level 2 parameter is meant a more complex parameter built from the level 1 parameters, which are themselves simpler.
- the level 2 parameters correspond to an analysis and/or a grouping of the level 1 parameters, which makes it possible to further refine the states of the characters by making them better suited to what one wishes to represent.
- level 2 parameters are considered as dimensions along which a series of coefficients is defined, with values fixed for each state of the animation graph.
- computing means are arranged to calculate, for a state e, the probability value Pe = Σi (Pi × Ci).
- FIG. 1 is a block diagram showing an animation system for an avatar according to the invention
- FIG. 2 gives a state graph as implemented according to the embodiment of the invention more particularly described here.
- Figure 3 shows three types of image sequences, including that obtained with the invention in connection with a sound input signal.
- FIG. 4 schematically illustrates another mode of implementation of the state graph implemented according to the invention.
- Figure 5 shows schematically the method of selecting a state from the relative probabilities, according to an embodiment of the invention.
- FIG. 6 shows an example of a sound input signal allowing the construction of a series of states, to be used for constructing the behavior of the avatar according to the invention.
- Figure 7 shows an example of initial setting made from the mobile phone of the calling party.
- FIG. 1 schematically shows the principle of an animation system 1 for avatar 2, 2 'on a screen 3, 3', 3 '' of mobile apparatus 4, 4 ', 4' '.
- the avatar 2 is provided with a mouth 5, 5' and is animated from a sound input signal 6 corresponding to the voice 7 of a communication interlocutor 8, by means of a mobile phone 9 or any other means of sound communication (fixed telephone, computer, etc.).
- the system 1 comprises, from a server 10 belonging to a network (telephone, Internet ...), a proprietary server 11 for receiving signals 6.
- this server comprises means 12 for analyzing the signal and transforming it in real time into an audio and video multiplexed stream 13, over two channels 14, 15 (14', 15') in the case of 3D or 2D mobile reception, or over a single channel 16 in the case of said video mobile.
- the text is scripted at 20 so as to be transmitted as sound and image files 21, before compression at 22 and transmission to the mobile 4'' in the form of a video stream 23.
- the result obtained is that the avatar 2, and in particular its mouth 5, seems to speak in real time in the place of the interlocutor 8 and that the behavior of the avatar (attitude, gestures) is coherent with the voice.
- the sound signal is analyzed from a buffer corresponding to a small time interval (approximately 10 milliseconds).
- each sequence consists of a series of images produced by 3D or 2D animation software known per se, such as, for example, the 3dsMax and Maya software from the American company Autodesk and XSI from the French company Softimage, or classic proprietary 3D rendering tools, or even digitized drawings.
- a graph 24 of states is then defined (see FIG. 2) in which each node (or state) 26, 27, 28, 29, 30 is defined as a point of transition between elementary sequences.
- each connection between two states is unidirectional, in one direction or the other (arrows 25).
- each elementary sequence is duplicated to show a character who speaks or a character who is silent, depending on whether or not speech is detected in the voice.
- FIG. 3 shows a sequence of images as obtained with speech 32, the same sequence without speech 33, and depending on the sound input (curve 34) transmitted by the interlocutor, the resulting sequence 35.
- level 1 parameters whose value varies over time and whose average is calculated over a certain interval, for example 100 milliseconds.
- these parameters are, for example: the speech activity (silence or speech); the rhythm of the speech; the pitch (high or low), if the language is not a tonal language; the length of the vowels; the presence of a more or less pronounced tonic accent.
- the speech activity parameter can be calculated, as a first approximation, from the power of the sound signal (the integral of the squared signal), by considering that there is speech above a certain threshold.
- the threshold can be calculated dynamically according to the signal-to-noise ratio. Frequency filtering is also possible, so as to avoid, for example, treating the passing of a truck as voice.
- the rhythm of the speech is calculated from the average frequency of the periods of silence and speech.
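The two level 1 parameters just described (speech activity from the signal power, and rhythm from the average frequency of the silence/speech periods) can be sketched as follows. This is a first-approximation illustration; the fixed threshold, buffer length, and function names are assumptions, not taken from the patent.

```python
def speech_activity(samples, threshold):
    """First-approximation voice activity detection: the signal power
    (average of the squared samples over the buffer) is compared to a
    threshold; above it, we consider that there is speech."""
    power = sum(s * s for s in samples) / len(samples)
    return power > threshold

def speech_rhythm(activity_flags, buffer_ms=10.0):
    """Rhythm estimated from the frequency of the silence/speech
    alternations over a window of successive buffers."""
    transitions = sum(
        1 for a, b in zip(activity_flags, activity_flags[1:]) if a != b
    )
    window_s = len(activity_flags) * buffer_ms / 1000.0
    return transitions / window_s  # alternations per second

# Four 10 ms buffers: silence, speech, silence, speech.
flags = [speech_activity(buf, threshold=0.01)
         for buf in ([0.0] * 80, [0.5] * 80, [0.0] * 80, [0.5] * 80)]
```

In practice the threshold would be derived from an estimate of the noise floor, as the text suggests, rather than fixed.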
- other parameters can also be calculated from a frequency analysis of the signal. According to the embodiment of the invention more particularly described here, simple mathematical formulas (linear combinations, threshold functions, Boolean functions) make it possible to pass from these level 1 parameters to so-called level 2 parameters, which correspond to characteristics such as, for example, a slow, fast, jerky, happy or sad character.
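A mapping of this kind can be sketched with one example of each of the three formula types the text names: a linear combination, a threshold function and a Boolean function. The weights, thresholds and parameter names below are illustrative assumptions only; the patent does not give the actual formulas.

```python
def level2_parameters(p1):
    """Illustrative level 1 -> level 2 mapping (all values assumed)."""
    return {
        # linear combination of rhythm and speech activity
        "jerky": 0.7 * p1["rhythm"] + 0.3 * p1["speech_ratio"],
        # threshold function on the average pitch
        "happy": 1.0 if p1["pitch"] > 200.0 else 0.0,
        # Boolean function: speaking, but with a low rhythm
        "slow": p1["speech_ratio"] > 0.5 and p1["rhythm"] < 1.0,
    }

p2 = level2_parameters({"rhythm": 0.5, "speech_ratio": 0.8, "pitch": 220.0})
```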
- the level 2 parameters are considered as dimensions along which a series of coefficients Ci with fixed values is defined for each state e of the animation graph. Examples of such a parameterization are given below.
- the level 1 parameters are calculated.
- This sum is a relative probability of the state e (relative to the other states) of being selected.
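The sum in question, Pe = Σi (Pi × Ci), can be sketched directly; the state names and coefficient values below are illustrative assumptions.

```python
def relative_probability(state_coeffs, p2):
    """Pe = sum over dimensions i of Pi * Ci, where Pi is the level-2
    parameter value and Ci the coefficient of state e for dimension i."""
    return sum(p2.get(dim, 0.0) * c for dim, c in state_coeffs.items())

# Coefficients per state (values are illustrative assumptions).
states = {
    "neutral": {"SPEAK": 0.5, "IDLE": 0.5},
    "happy": {"SPEAK": 1.0, "IDLE": 0.0},
}
p2 = {"SPEAK": 0.8, "IDLE": 0.2}
probs = {name: relative_probability(c, p2) for name, c in states.items()}
```

In the method, this computation would run only over the states connected to the state at which the current sequence ends.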
- some sequences are loops that start from a state and return to it (arrow 31); they are used when the sequencer decides to keep the avatar in its current state, that is to say, chooses the current state itself as the next target state.
- Example of animation generation:
  - initialize the current state to a predefined starting state
  - initialize the target state to null
  - initialize the current sequence to the null sequence
  - as long as an incoming audio stream is received:
    - decode the incoming audio stream
    - calculate the level 1 parameters
    - if the current animation sequence is complete: set the target state to the null state
    - if the target state is null:
- the level 1 parameters indicate the presence of speech
- the level 2 parameters indicate a cheerful voice (corresponding to "Hello")
- the probabilistic draw selects the cheerful target state.
- Level 2 parameters indicate an interrogative voice
- the relative probability of the state 40 is determined with respect to the value calculated above; if the drawn value (arrow 45) falls at a certain level, the corresponding state is selected (in the figure, state 42).
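One plausible reading of this selection step is a roulette-wheel draw over the relative probabilities: a value is drawn between 0 and their sum, and the state whose cumulative interval contains it is chosen. A sketch under that assumption (state names and values are illustrative):

```python
import random

def select_state(relative_probs, rng=random.Random(0)):
    """Roulette-wheel selection over relative probabilities: states with
    a larger Pe occupy a larger interval and are drawn more often."""
    total = sum(relative_probs.values())
    draw = rng.uniform(0.0, total)
    cumulative = 0.0
    for state, p in relative_probs.items():
        cumulative += p
        if draw <= cumulative:
            return state
    return state  # numerical safety net for rounding at the boundary

choice = select_state({"neutral": 0.5, "happy": 0.8, "sad": 0.1})
```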
- the state graph connects all these states in a star configuration (link 52), with unidirectional links in both directions.
- the following dimensions are thus defined for the calculation of the relative probabilities (the dimensions of the parameters and the coefficients):
- IDLE values indicating a silence period
- SPEAK values indicating a speech period
- NEUTRAL values indicating a neutrality period
- GREETING values indicating a reception or presentation phase
- Formulas for passing from first level to second level parameters are also defined:
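The actual formulas are not reproduced in this text, so the following is only a hedged sketch of what such first-level-to-second-level formulas might look like for the four dimensions named above (IDLE, SPEAK, NEUTRAL, GREETING); every formula, weight and threshold here is an assumption.

```python
def dimension_values(p1, call_elapsed_s):
    """Illustrative (assumed) formulas for the four named dimensions,
    computed from level 1 parameters such as the fraction of recent
    buffers containing speech."""
    speaking = p1["speech_ratio"]
    return {
        "IDLE": 1.0 - speaking,                       # high during silence periods
        "SPEAK": speaking,                            # high during speech periods
        "NEUTRAL": 1.0 - abs(2.0 * speaking - 1.0),   # highest for mixed activity
        # greeting phase assumed to occur near the start of the call
        "GREETING": 1.0 if call_elapsed_s < 5.0 else 0.0,
    }

dims = dimension_values({"speech_ratio": 0.75}, call_elapsed_s=2.0)
```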
- step 1: the user 8 configures the settings of the movie he wants to customize.
- step 2 the parameters are transmitted in the form of requests to the server application (server 11) which interprets them, creates the video, and sends it (link 13) to the encoding application.
- step 3: the video sequences are compressed to the "right" format, that is to say one readable by the mobile terminals, before step 4, where the compressed video sequences are transmitted (links 18, 19, 18', 19', 23) to the recipient, for example by MMS.
- the invention is not limited to the embodiment more particularly described, but encompasses all variants, in particular those in which the broadcast takes place offline rather than in real time or near real time.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- General Engineering & Computer Science (AREA)
- Processing Or Creating Images (AREA)
- Telephone Function (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0608078A FR2906056B1 (fr) | 2006-09-15 | 2006-09-15 | Procede et systeme d'animation d'un avatar en temps reel a partir de la voix d'un interlocuteur. |
PCT/FR2007/001495 WO2008031955A2 (fr) | 2006-09-15 | 2007-09-14 | Procede et systeme d'animation d'un avatar en temps reel a partir de la voix d'un interlocuteur |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2059926A2 true EP2059926A2 (de) | 2009-05-20 |
Family
ID=37882253
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP07848234A Withdrawn EP2059926A2 (de) | 2006-09-15 | 2007-09-14 | Verfahren und system zur animation eines avatars in echtzeit unter verwendung der stimme eines sprechers |
Country Status (4)
Country | Link |
---|---|
US (1) | US20090278851A1 (de) |
EP (1) | EP2059926A2 (de) |
FR (1) | FR2906056B1 (de) |
WO (1) | WO2008031955A2 (de) |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2468140A (en) * | 2009-02-26 | 2010-09-01 | Dublin Inst Of Technology | A character animation tool which associates stress values with the locations of vowels |
US9665563B2 (en) * | 2009-05-28 | 2017-05-30 | Samsung Electronics Co., Ltd. | Animation system and methods for generating animation based on text-based data and user information |
US20120058747A1 (en) * | 2010-09-08 | 2012-03-08 | James Yiannios | Method For Communicating and Displaying Interactive Avatar |
US20120069028A1 (en) * | 2010-09-20 | 2012-03-22 | Yahoo! Inc. | Real-time animations of emoticons using facial recognition during a video chat |
US8948893B2 (en) | 2011-06-06 | 2015-02-03 | International Business Machines Corporation | Audio media mood visualization method and system |
EP2783349A4 (de) * | 2011-11-24 | 2015-05-27 | Nokia Corp | Verfahren, vorrichtung und computerprogrammprodukt zur erzeugung von mit multimedia-inhalten assoziierten bewegten bildern |
RU2481640C1 (ru) * | 2011-12-01 | 2013-05-10 | Корпорация "Самсунг Электроникс Ко., Лтд" | Способ и система генерации анимированных художественных эффектов на статичных изображениях |
US9035955B2 (en) | 2012-05-16 | 2015-05-19 | Microsoft Technology Licensing, Llc | Synchronizing virtual actor's performances to a speaker's voice |
US9325809B1 (en) * | 2012-09-07 | 2016-04-26 | Mindmeld, Inc. | Audio recall during voice conversations |
GB201301981D0 (en) * | 2013-02-04 | 2013-03-20 | Headcast Ltd | Presenting audio/visual animations |
GB201315142D0 (en) * | 2013-08-23 | 2013-10-09 | Ucl Business Plc | Audio-Visual Dialogue System and Method |
US20150287403A1 (en) * | 2014-04-07 | 2015-10-08 | Neta Holzer Zaslansky | Device, system, and method of automatically generating an animated content-item |
US11289077B2 (en) * | 2014-07-15 | 2022-03-29 | Avaya Inc. | Systems and methods for speech analytics and phrase spotting using phoneme sequences |
US10291597B2 (en) | 2014-08-14 | 2019-05-14 | Cisco Technology, Inc. | Sharing resources across multiple devices in online meetings |
US10542126B2 (en) | 2014-12-22 | 2020-01-21 | Cisco Technology, Inc. | Offline virtual participation in an online conference meeting |
US9948786B2 (en) | 2015-04-17 | 2018-04-17 | Cisco Technology, Inc. | Handling conferences using highly-distributed agents |
US10592867B2 (en) | 2016-11-11 | 2020-03-17 | Cisco Technology, Inc. | In-meeting graphical user interface display using calendar information and system |
US10516707B2 (en) | 2016-12-15 | 2019-12-24 | Cisco Technology, Inc. | Initiating a conferencing meeting using a conference room device |
US10440073B2 (en) | 2017-04-11 | 2019-10-08 | Cisco Technology, Inc. | User interface for proximity based teleconference transfer |
US10375125B2 (en) | 2017-04-27 | 2019-08-06 | Cisco Technology, Inc. | Automatically joining devices to a video conference |
US10375474B2 (en) | 2017-06-12 | 2019-08-06 | Cisco Technology, Inc. | Hybrid horn microphone |
US10477148B2 (en) | 2017-06-23 | 2019-11-12 | Cisco Technology, Inc. | Speaker anticipation |
US10516709B2 (en) | 2017-06-29 | 2019-12-24 | Cisco Technology, Inc. | Files automatically shared at conference initiation |
US10706391B2 (en) | 2017-07-13 | 2020-07-07 | Cisco Technology, Inc. | Protecting scheduled meeting in physical room |
US10091348B1 (en) | 2017-07-25 | 2018-10-02 | Cisco Technology, Inc. | Predictive model for voice/video over IP calls |
US10812430B2 (en) * | 2018-02-22 | 2020-10-20 | Mercury Universe, LLC | Method and system for creating a mercemoji |
US10580187B2 (en) * | 2018-05-01 | 2020-03-03 | Enas TARAWNEH | System and method for rendering of an animated avatar |
KR20210117066A (ko) * | 2020-03-18 | 2021-09-28 | 라인플러스 주식회사 | 음향 기반 아바타 모션 제어 방법 및 장치 |
CN111988658B (zh) * | 2020-08-28 | 2022-12-06 | 网易(杭州)网络有限公司 | 视频生成方法及装置 |
CN116762103A (zh) * | 2021-01-13 | 2023-09-15 | 三星电子株式会社 | 电子装置及在该电子装置中运行化身视频服务的方法 |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6839672B1 (en) * | 1998-01-30 | 2005-01-04 | At&T Corp. | Integration of talking heads and text-to-speech synthesizers for visual TTS |
US6539354B1 (en) * | 2000-03-24 | 2003-03-25 | Fluent Speech Technologies, Inc. | Methods and devices for producing and using synthetic visual speech based on natural coarticulation |
EP1345179A3 (de) * | 2002-03-13 | 2004-01-21 | Matsushita Electric Industrial Co., Ltd. | Vorrichtung und Verfahren zur Animation von Computergrafiken |
AU2003218320A1 (en) * | 2002-03-21 | 2003-10-08 | U.S. Army Medical Research And Materiel Command | Methods and systems for detecting, measuring, and monitoring stress in speech |
US7136818B1 (en) * | 2002-05-16 | 2006-11-14 | At&T Corp. | System and method of providing conversational visual prosody for talking heads |
GB2423905A (en) * | 2005-03-03 | 2006-09-06 | Sean Smith | Animated messaging |
US7983910B2 (en) * | 2006-03-03 | 2011-07-19 | International Business Machines Corporation | Communicating across voice and text channels with emotion preservation |
-
2006
- 2006-09-15 FR FR0608078A patent/FR2906056B1/fr not_active Expired - Fee Related
-
2007
- 2007-09-14 EP EP07848234A patent/EP2059926A2/de not_active Withdrawn
- 2007-09-14 WO PCT/FR2007/001495 patent/WO2008031955A2/fr active Application Filing
- 2007-09-14 US US12/441,293 patent/US20090278851A1/en not_active Abandoned
Non-Patent Citations (1)
Title |
---|
See references of WO2008031955A3 * |
Also Published As
Publication number | Publication date |
---|---|
WO2008031955A2 (fr) | 2008-03-20 |
US20090278851A1 (en) | 2009-11-12 |
WO2008031955A3 (fr) | 2008-06-05 |
FR2906056B1 (fr) | 2009-02-06 |
FR2906056A1 (fr) | 2008-03-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2008031955A2 (fr) | Procede et systeme d'animation d'un avatar en temps reel a partir de la voix d'un interlocuteur | |
US8326596B2 (en) | Method and apparatus for translating speech during a call | |
US20150287403A1 (en) | Device, system, and method of automatically generating an animated content-item | |
KR101628050B1 (ko) | 텍스트 기반 데이터를 애니메이션으로 재생하는 애니메이션 시스템 | |
JP2008529345A (ja) | 個人化メディアの生成及び配布のためのシステム及び方法 | |
JP2014512049A (ja) | 音声対話型メッセージ交換 | |
TW200947422A (en) | Systems, methods, and apparatus for context suppression using receivers | |
US20180315438A1 (en) | Voice data compensation with machine learning | |
FR3071689A1 (fr) | Presentation de communications | |
FR2923928A1 (fr) | Systeme d'interpretation simultanee automatique. | |
US20200211540A1 (en) | Context-based speech synthesis | |
CN113257218B (zh) | 语音合成方法、装置、电子设备和存储介质 | |
US20090201297A1 (en) | Electronic device with animated character and method | |
JP2022020659A (ja) | 通話中の感情を認識し、認識された感情を活用する方法およびシステム | |
JP2005078427A (ja) | 携帯端末及びコンピュータ・ソフトウエア | |
US20120013620A1 (en) | Animating Speech Of An Avatar Representing A Participant In A Mobile Communications With Background Media | |
WO2022169534A1 (en) | Systems and methods of handling speech audio stream interruptions | |
CN115312079A (zh) | 信息展示方法、装置、电子设备和计算机可读介质 | |
CN110798393B (zh) | 声纹气泡的展示方法及使用声纹气泡的终端 | |
CN112492400A (zh) | 互动方法、装置、设备以及通信方法、拍摄方法 | |
CN111787986A (zh) | 基于面部表情的语音效果 | |
JP2012518308A (ja) | メッセージングシステム | |
CN111091807A (zh) | 语音合成方法、装置、计算机设备及存储介质 | |
CN114866856B (zh) | 音频信号的处理方法、音频生成模型的训练方法及装置 | |
WO2024001462A1 (zh) | 歌曲播放方法、装置、计算机设备和计算机可读存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20090319 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR |
|
AX | Request for extension of the european patent |
Extension state: AL BA HR MK RS |
|
DAX | Request for extension of the european patent (deleted) | ||
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20130403 |