EP1147516A1 - Voice driven mouth animation system - Google Patents

Voice driven mouth animation system

Info

Publication number
EP1147516A1
Authority
EP
European Patent Office
Prior art keywords
character
animation system
mouth
sample
animation
Prior art date
Legal status
Withdrawn
Application number
EP00900764A
Other languages
German (de)
English (en)
Inventor
Ronald Leslie Major
Current Assignee
Bright Spark Technologies Pty Ltd
Original Assignee
Bright Spark Technologies Pty Ltd
Priority date
Application filed by Bright Spark Technologies Pty Ltd
Publication of EP1147516A1
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transforming into visible information
    • G10L2021/105 Synthesis of the lips movements from speech, e.g. for talking heads

Definitions

  • This invention relates to an animation system which is voice activated.
  • An animation system which is sound activated, the system comprising:
  • sampling means for sampling the input sound signal;
  • processing means for generating a value characteristic of each sample; and
  • comparison means for comparing each value to a plurality of pre-stored value ranges, each corresponding to a predetermined graphic.
  • The input sound signal may be an analogue signal, and the sampling means may comprise an analogue to digital converter.
  • The processing means may be arranged to generate a value characteristic of each sample which is related to the maximum amplitude of the sample.
  • The predetermined graphic may be, for example, a mouth graphic representing a character's mouth. In a preferred embodiment of the invention the display means is arranged to display the predetermined graphics superimposed upon a display of an animated character or object.
  • The display means comprises a monitor on which a software generated display window is shown, the animated character and the predetermined graphics being displayed within the display window.
  • The predetermined graphics may be stored in a specified directory on the hard drive of a computer.
  • A plurality of sets of predetermined graphics, each corresponding to a basic expression of an animated character, are stored in respective subdirectories.
  • The system may include a software based user interface for allowing the user to select a desired one of a plurality of character expressions, the system selecting the set of predetermined graphics corresponding to the selected expression.
  • Means for allowing the character to perform pre-determined actions or gestures is included.
  • The invention further allows the selection of a variety of camera shots, for example a close-up shot, a medium shot or any other kind of camera shot.
  • The invention includes means for controlling the speed at which the value characteristic of each sample is generated.
  • Figure 1 is a schematic block diagram showing the major components of the live performance animation system according to the invention.
  • Figure 2 is a schematic flow chart showing the method used in the voice engine component of the invention.
  • Figure 3 is a graphical representation of the selection method used in determining which mouth position is to be displayed.
  • Figure 4 shows the various mouth positions which may be displayed, as well as the associated letters or sounds.
  • Figure 5 shows the character display window component of the invention.
  • Figure 6 shows the user interface component of the invention.
  • Figure 7 is a schematic flow chart showing the routine followed when the user interface component of the invention is initiated.
  • Figure 8 is a schematic flow chart showing the relationship between the voice engine component and the user interface component.
  • Figure 9 is a schematic illustration of the directory arrangement employed by the invention.

DESCRIPTION OF PREFERRED EMBODIMENTS
  • A graphic animation system 10 of the invention comprises a voice engine 12 with a headset microphone 14 and an analogue to digital converter 16 which is connected to a processor 18.
  • The system further comprises a user interface 20 as well as a character display interface using a monitor 22.
  • The voice engine is connected to the microphone 14, into which the user speaks, with the resulting continuous analogue speech signal from the microphone then being amplified by a pre-amplifier 24.
  • The continuous speech signal f(t) is sampled by means of the analogue to digital converter 16, at a sampling rate of 16 kHz, resulting in a digital sampled speech signal f(n).
  • The sampled speech signal f(n) is then multiplied by a Hamming window w(n), defined below, in which N is the number of samples and n is the sample number:

    w(n) = 0.54 − 0.46 cos(2πn / (N − 1)), 0 ≤ n ≤ N − 1

  • The resulting weighted signal F(n) is stored in an array called input(n).
  • A Discrete Fast Fourier Transform, achieved via the Radix-2 method, is then performed on the weighted signal F(n), resulting in an array of complex Fourier coefficients f(k).
  • The magnitude of each sample's complex coefficients is calculated using the following formula:

    |f(k)| = √( Re(f(k))² + Im(f(k))² )

  • The maximum magnitude and the corresponding sample number n are then found. This n is compared to a stored set of previously derived ranges for n, and the range with the lowest comparative variance is determined. This result governs which of a plurality of possible predetermined mouth positions corresponds to that particular sample of the incoming speech signal.
  • The predetermined ranges for n and the corresponding mouth values are shown in Figure 3.
  • The actual graphic mouth representations (mouth graphics) corresponding to the various mouth values are shown in Figure 4, from which it may be seen that the user's speech pattern is broken up into nine possible mouth positions which are then displayed to give the illusion of animated speech. The result of this is that as the user speaks into the microphone, an animated character is able to mimic the user's speech with real time lip or mouth synchronisation by superimposing the resultant sequence of mouth graphics on a graphic representation of the character.
  • A typical character display window 26 appearing on the character display monitor 22 is shown in Figure 5.
  • Although the character shown in the display window 26 in Figure 5 is a two-dimensional image of a person, it will be appreciated that the character can also be three-dimensional, with there being no limitation on the animation style or the design of the character used. It will also be appreciated that the "character" need not be a human or humanoid character at all, but could be any object which is made to "speak".
  • The window 26 comprises an eye picture box 28, a mouth picture box 30 as well as a body picture box 32.
  • The mouth picture box 30 displays the selected mouth position corresponding to the sample of the input speech signal, according to the output of the voice engine.
  • The eye and body picture boxes 28, 32 display expressions and/or actions which the user has assigned to the character, as will be described further below with reference to the user interface.
  • The character display window 26 further comprises a "blink timer" 34, which is a timer object which waits for three seconds and then triggers an event. On this trigger event, five bitmap files are displayed in the eye picture box 28, one after the other, to give the impression that the character is blinking.
  • The user interface 34 of the invention allows the user to control the character. If the user wants to change the expression of the character, for example to neutral, happy or angry, he or she clicks the relevant icon in the expressions box 36.
  • The ability to change expressions is made possible in that, for each expression, all nine frames needed for the different mouth positions are provided, adapted to that expression. These sets of frames are each stored in a separate directory, and when the user clicks on one of the expression buttons, the software changes to the corresponding directory and loads the nine new images needed.
  • If the user wants the character to perform one of the pre-animated sequences of actions, he or she clicks the relevant icon in the actions box 38.
  • All images are stored in either Windows Bitmap (BMP), CompuServe GIF (GIF), Joint Photographic Experts Group (JPG) or Windows Metafile (WMF) format, which are decoded by appropriate decompression routines within the software.
  • When the user interface is initiated, the system reverts to all of the default settings, and the character display window 26 is opened.
  • The user interface 34 includes a timer 42 which runs continuously and processes the incoming value from the voice engine, as shown in Figure 8.
  • The system first checks whether any actions are currently running. If the result is "NO", the application takes the value obtained by the voice engine and compares it to the set of stored values, as described earlier. Based upon the result of this comparison, the relevant mouth graphic bitmap file is loaded and displayed in the mouth picture box 30 of the character display window 26. If, on the other hand, the result of the check in Figure 8 is "YES", meaning that an action is currently playing, no further processing takes place.
  • Since the graphic bitmap files are relatively small, they load and display quickly, giving the illusion of real time animation.
  • The rapid change of expressions is achieved by exploiting the character's directory structure on the drive, which is shown in Figure 9.
  • The drive includes a character's base directory having an expressions sub-directory on level B. Within this sub-directory, further sub-directories on level C are provided for each possible expression.
  • The invention further provides for three different camera positions on level D, typically a close-up, a medium shot and a long shot.
  • A further sub-directory on level E specifies the direction in which the character is looking.
  • A further sub-directory on level F contains the actual bitmap files representing each mouth position.
  • The system includes a speech speed control, shown in Figure 6, which is in the form of a horizontal slider with a range of 1 to 100.
  • The setting of this slider decides the speed at which the voice engine value is interpreted. If the speed setting is increased from, say, 10 to 30, the timer object's value changes, with the visual effect that the character's speech becomes slower, and vice versa. This value may thus be adjusted to suit a particular artistic style.
  • The dominant feature of the present invention is thus its unique ability to convert human speech into graphically represented character speech in real time or near real time. It further allows the user the opportunity of manipulating the character in order to obtain the desired animation.
  • The invention is thus a real time animation system which is positioned between conventional animation software and motion capture.
  • The invention allows a single user to control an animated character in real time by speaking into a microphone and triggering gestures and actions on the fly.
  • There is no need to synchronise the voice signal manually with the generated image since, because of the method used by the invention, the audio signal is automatically synchronised with the visual images.
  • The main advantage of the invention is that the animated character mimics the operator with real time, voice-driven lip synch. Since the system is mainly software based, no motion capture devices are required, which greatly simplifies implementation of the present invention.
  • The character may thus be any two-dimensional or three-dimensional image, including human or non-human characters or objects.
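The voice-engine pipeline described above (16 kHz sampling, Hamming weighting, radix-2 FFT, peak-magnitude search) can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: numpy's FFT stands in for a hand-rolled radix-2 routine, and the 256-sample frame length is an assumption, since the text does not state one.

```python
import numpy as np

SAMPLE_RATE = 16_000   # sampling rate stated in the description
FRAME_LEN = 256        # assumed power-of-two frame length (radix-2 FFT)

def hamming(N: int) -> np.ndarray:
    """Standard Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def peak_bin(frame: np.ndarray) -> int:
    """Return the index of the FFT bin with the largest magnitude."""
    weighted = frame * hamming(len(frame))   # F(n) = f(n) * w(n)
    coeffs = np.fft.rfft(weighted)           # stands in for a radix-2 FFT
    mags = np.abs(coeffs)                    # sqrt(Re^2 + Im^2) per bin
    return int(np.argmax(mags))

# A pure tone peaks at its own frequency bin:
# 1000 Hz * 256 / 16000 Hz = bin 16.
t = np.arange(FRAME_LEN) / SAMPLE_RATE
print(peak_bin(np.sin(2 * np.pi * 1000 * t)))  # → 16
```

Note that, per the description, it is the winning bin index n itself (not a frequency estimate) that gets compared against the stored ranges of Figure 3.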
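The range comparison itself is not reproduced in the text, but the selection rule it describes ("the set that has the lowest comparative variance") amounts to picking the stored range nearest the peak bin. A sketch, with entirely hypothetical ranges since Figure 3 is not available here:

```python
# Hypothetical bin ranges per mouth position (0-8); the real ranges are
# those of Figure 3, which is not reproduced in this text.
MOUTH_RANGES = {
    0: (0, 2),   1: (3, 6),   2: (7, 10),
    3: (11, 15), 4: (16, 21), 5: (22, 28),
    6: (29, 36), 7: (37, 45), 8: (46, 128),
}

def mouth_for_bin(n: int) -> int:
    """Pick the mouth position whose stored range is closest to bin n:
    the range containing n, else the one at the smallest distance."""
    def distance(bounds):
        lo, hi = bounds
        return 0 if lo <= n <= hi else min(abs(n - lo), abs(n - hi))
    return min(MOUTH_RANGES, key=lambda m: distance(MOUTH_RANGES[m]))

print(mouth_for_bin(16))   # → 4 (falls inside the (16, 21) range)
print(mouth_for_bin(200))  # → 8 (nearest range is (46, 128))
```

Because every incoming value maps to *some* nearest range, the display always has a mouth frame to show, which matches the continuous-animation behaviour the description relies on.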
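The level B-F directory layout of Figure 9 maps naturally onto path construction. Every directory and file name below is an illustrative assumption; only the ordering of the levels (expressions directory, expression, camera shot, direction, mouth frames) comes from the text.

```python
from pathlib import Path

def mouth_frame_path(base: Path, expression: str, shot: str,
                     direction: str, mouth: int) -> Path:
    """Resolve one mouth bitmap following the level B-F layout:
    base(A) / expressions(B) / <expression>(C) / <shot>(D)
    / <direction>(E) / mouths(F) / mouth<k>.bmp.
    All names here are hypothetical."""
    return (base / "expressions" / expression / shot / direction
            / "mouths" / f"mouth{mouth}.bmp")

p = mouth_frame_path(Path("character"), "happy", "closeup", "left", 4)
print(p.as_posix())
# → character/expressions/happy/closeup/left/mouths/mouth4.bmp
```

Switching expression or camera shot then reduces to swapping one path component, which is why the described system can reload all nine mouth frames quickly enough to feel instantaneous.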

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Processing Or Creating Images (AREA)

Abstract

This invention relates to an animation system which comprises a voice engine that processes audio input signals, typically speech signals, and converts them into a digital signal for processing. The digital signal is analysed to produce a value characteristic of each sample of the input signal, the value being related to the maximum amplitude of the sample. The voice engine compares each value so obtained with a number of possible predetermined value ranges, each corresponding to a predetermined graphic showing a mouth position, and thus matches the input speech signal with a variety of possible mouth positions. The mouth graphics are superimposed on an image of a character substantially in real time, so as to create an animated display of a character whose mouth is synchronised with the input speech signal.
EP00900764A 1999-01-27 2000-01-25 Voice driven mouth animation system Withdrawn EP1147516A1 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
ZA99602 1999-01-27
ZA9900602 1999-01-27
PCT/IB2000/000067 WO2000045380A1 (fr) 1999-01-27 2000-01-25 Voice driven mouth animation system

Publications (1)

Publication Number Publication Date
EP1147516A1 (fr) 2001-10-24

Family

ID=25587535

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00900764A Withdrawn EP1147516A1 (fr) 1999-01-27 2000-01-25 Voice driven mouth animation system

Country Status (3)

Country Link
EP (1) EP1147516A1 (fr)
AU (1) AU3069200A (fr)
WO (1) WO2000045380A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108831424A (zh) * 2018-06-15 2018-11-16 Guangzhou Kugou Computer Technology Co., Ltd. Audio splicing method, apparatus and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL1020733C2 (nl) * 2002-05-31 2004-01-13 Isioux B V Method and system for producing an animation, as well as a computer program for producing and playing an animation made according to the method.

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4913539A (en) * 1988-04-04 1990-04-03 New York Institute Of Technology Apparatus and method for lip-synching animation
GB9019829D0 (en) * 1990-09-11 1990-10-24 British Telecomm Speech analysis and image synthesis
US5537662A (en) * 1992-05-29 1996-07-16 Casio Computer Co., Ltd. Electronic montage composing apparatus
US5426460A (en) * 1993-12-17 1995-06-20 At&T Corp. Virtual multimedia service for mass market connectivity

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0045380A1 *


Also Published As

Publication number Publication date
AU3069200A (en) 2000-08-18
WO2000045380A9 (fr) 2001-07-26
WO2000045380A1 (fr) 2000-08-03

Similar Documents

Publication Publication Date Title
US6208359B1 (en) Systems and methods for communicating through computer animated images
KR102035596B1 (ko) System and method for automatically generating facial animation of a virtual character based on artificial intelligence
JP2002150317A (ja) Video display device
JPH06342365A (ja) Linear voice control method and device
JPH11219446A (ja) Video and audio reproduction system
US20030040916A1 (en) Voice driven mouth animation system
JPH02234285A (ja) Image synthesis method and apparatus
WO2008087621A1 (fr) Apparatus and method for animating emotionally responsive virtual objects
WO2020129959A1 (fr) Computer program, server device, terminal device, and display method
Waters et al. An automatic lip-synchronization algorithm for synthetic faces
US6577998B1 (en) Systems and methods for communicating through computer animated images
JP2019136797A (ja) Communication device and control program therefor
CN112512649A (zh) Techniques for providing audio and video effects
JP4599606B2 (ja) Head motion learning device and head motion synthesis device for automatic head motion generation, and computer program
JP3978506B2 (ja) Musical sound generation method
EP1147516A1 (fr) Voice driven mouth animation system
CN112492400B (zh) Interaction method, apparatus and device, and communication and photographing methods
JPH08123977A (ja) Animation system
JP4254400B2 (ja) Image generating device, image generating method, and computer-readable recording medium
JP7152908B2 (ja) Gesture control device and gesture control program
JPH10143151A (ja) Conducting device
US11323662B2 (en) Special effects communication techniques
WO2023167212A1 (fr) Computer program, method, and information processing device
JP2005189846A (ja) Voice-controlled screen system
JP2003296753A (ja) Dialogue system for hearing-impaired persons

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010824

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20030801