WO2010081395A1 - Method and apparatus for changing lip shape and obtaining lip animation in voice-driven animation - Google Patents

Method and apparatus for changing lip shape and obtaining lip animation in voice-driven animation

Info

Publication number
WO2010081395A1
WO2010081395A1 (application PCT/CN2010/070026)
Authority
WO
WIPO (PCT)
Prior art keywords
lip
shape
lip shape
model
motion amplitude
Prior art date
Application number
PCT/CN2010/070026
Other languages
English (en)
Chinese (zh)
Inventor
路依莎
王建宇
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed (Darts-ip global patent litigation dataset, CC BY 4.0).
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Priority to RU2011124736/08A priority Critical patent/RU2487411C2/ru
Priority to BRPI1006026A priority patent/BRPI1006026B1/pt
Priority to MX2011006703A priority patent/MX2011006703A/es
Priority to CA2744347A priority patent/CA2744347C/fr
Publication of WO2010081395A1 publication Critical patent/WO2010081395A1/fr
Priority to US13/117,244 priority patent/US8350859B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/80 2D [Two Dimensional] animation, e.g. using sprites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/205 3D [Three Dimensional] animation driven by audio data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transforming into visible information
    • G10L2021/105 Synthesis of the lips movements from speech, e.g. for talking heads

Definitions

  • The present invention relates to video animation technology, and more particularly to a method and apparatus for changing lip shape and acquiring lip animation in a voice-driven animation. Background technique.
  • Interactive Voice Response (IVR) is a type of product based on sound transmission.
  • IVR needs technological and conceptual innovation to give sound greater expressiveness, and video animation technology can meet this need.
  • Video animation technology uses a mobile phone or web page as a platform and accompanies the sound with a customized, personalized video animation, giving the sound more vivid expression.
  • Accordingly, the present invention provides a method and apparatus for changing a lip shape in a voice-driven animation, and for acquiring a lip animation.
  • The present invention provides a method for changing the shape of a lip in a voice-driven animation, comprising: acquiring an audio signal, and obtaining a motion amplitude ratio of the lip shape according to the characteristics of the audio signal; acquiring an initial lip shape model input by the user, and generating motion amplitude values of the lip shape according to the initial lip shape model and the motion amplitude ratio of the lip shape; and
  • generating a set of changed lip shape mesh models based on the motion amplitude values of the lip shape and a pre-established lip pronunciation model library.
  • The present invention further provides an apparatus for changing the shape of a lip in a voice-driven animation, comprising: an acquisition module, configured to acquire an audio signal and obtain a motion amplitude ratio of the lip shape according to the characteristics of the audio signal;
  • a first generating module, configured to acquire an initial lip shape model input by the user, and to generate motion amplitude values of the lip shape according to the initial lip shape model and the obtained motion amplitude ratio of the lip shape; and
  • a second generating module, configured to generate a changed lip shape mesh model set according to the obtained motion amplitude values of the lip shape and the pre-established lip pronunciation model library.
  • In these solutions the lip shape change is realized by a voice-driven model library, and the technical solution provided by the embodiments of the present invention is simple and low in cost.
  • The invention also provides a method for obtaining a lip animation, comprising: acquiring an audio signal and obtaining a motion amplitude ratio of the lip shape according to its characteristics; acquiring an initial lip shape model input by the user and generating motion amplitude values of the lip shape; generating a changed set of lip shape mesh models according to those amplitude values and a pre-established lip pronunciation model library; and
  • generating a lip animation based on the changed set of lip shape mesh models.
  • An embodiment of the present invention provides an apparatus for acquiring a lip animation, including:
  • an acquisition module, configured to acquire an audio signal, and obtain a motion amplitude ratio of the lip shape according to the characteristics of the audio signal;
  • a first generating module, configured to acquire an initial lip shape model input by the user, and to generate motion amplitude values of the lip shape according to the initial lip shape model and the obtained motion amplitude ratio of the lip shape;
  • a second generating module, configured to generate a changed lip shape mesh model set according to the obtained motion amplitude values of the lip shape and the pre-established lip pronunciation model library; and
  • a third generating module, configured to generate a lip animation according to the changed lip shape mesh model set.
  • In these solutions the lip shape change is realized by voice driving, and the lip animation is obtained by voice driving.
  • The technical solution provided by the embodiments of the present invention is simple and low in cost.
  • FIG. 1 is a flow chart of a method for changing the shape of a lip in a voice-driven animation according to Embodiment 1 of the present invention;
  • FIG. 2 is a schematic diagram of the relationship between the video frame number and the motion amplitude ratio of the lip shape according to Embodiment 1 of the present invention;
  • FIG. 3 is a schematic diagram of a lip pronunciation model library provided by Embodiment 1 of the present invention;
  • FIG. 4 is a flow chart of a method for acquiring a lip animation according to Embodiment 2 of the present invention;
  • FIG. 5 is a structural diagram of an apparatus for changing a lip shape in a voice-driven animation according to Embodiment 3 of the present invention;
  • FIG. 6 is a structural diagram of another apparatus for changing the shape of a lip in a voice-driven animation according to Embodiment 3 of the present invention;
  • FIG. 7 is a structural diagram of a third apparatus for changing the shape of a lip in a voice-driven animation provided in Embodiment 3 of the present invention;
  • FIG. 8 is a structural diagram of an apparatus for acquiring a lip animation according to Embodiment 4 of the present invention.
  • DETAILED DESCRIPTION OF THE EMBODIMENTS In order to make the objects, technical solutions, and advantages of the present invention more comprehensible, the embodiments of the present invention will be further described in detail below. It is apparent that the described embodiments are only a part of the embodiments of the invention, and not all of the embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.
  • Embodiment 1 of the present invention provides a method for voice-driven lip change, as shown in FIG. 1, including:
  • Step 101: Acquire an audio signal, and obtain a motion amplitude ratio of the lip shape according to the characteristics of the audio signal.
  • The step of obtaining the motion amplitude ratio of the lip shape according to the characteristics of the audio signal includes:
  • Step 101A: Traverse the audio signal to obtain the maximum sampled data value maxSampleValue of the audio signal.
  • Step 101B: Perform window and group division on the audio signal, obtain an array avgGroup of the average sampled data values in each group, and obtain an array windowPeak of the maximum values among the group averages within each window.
  • A syllable is a natural unit of speech. Specifically, in Chinese each syllable corresponds to one lip shape, and when speaking it takes about 200 to 300 milliseconds to complete a syllable. The sound also changes during the pronunciation of each syllable, so a syllable needs to be divided further and subdivided into phonemes.
  • Therefore, the acquired audio signal can be divided into windows of a certain length, one window corresponding to one syllable; within each window, groups of a certain length are divided, one group corresponding to one phoneme.
  • The average of the sampled data values in each group is the sum of all sampled data values in the group divided by groupLen; these averages are placed in the array avgGroup. Within each window, the maximum of the group averages in avgGroup is placed in the array windowPeak.
  • Optionally, the audio signal may be denoised when it is acquired.
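As an illustration, steps 101A and 101B might be implemented as the minimal Python sketch below. The function name, the use of absolute sample values, and the NumPy arrays are assumptions; the patent itself names only maxSampleValue, groupLen, avgGroup and windowPeak.

```python
import numpy as np

def window_group_stats(samples, windowLen, groupLen):
    """Steps 101A/101B: per-group averages and per-window peaks of an audio signal."""
    # Step 101A: traverse the audio signal for its maximum sampled data value.
    maxSampleValue = np.abs(samples).max()
    avgGroup = []      # average sampled data value of each group (~ one phoneme)
    windowPeak = []    # per window (~ one syllable), the maximum group average
    for w in range(0, len(samples), windowLen):
        window = samples[w:w + windowLen]
        # Step 101B: each group average is the sum of its samples divided by groupLen.
        groups = [np.abs(window[g:g + groupLen]).sum() / groupLen
                  for g in range(0, len(window), groupLen)]
        avgGroup.extend(groups)
        windowPeak.append(max(groups))
    return maxSampleValue, avgGroup, windowPeak
```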
  • Step 101C: Obtain the maximum motion amplitude value of the lip shape corresponding to the current window according to the obtained array windowPeak and the maximum sampled data value maxSampleValue.
  • Step 101D: Obtain the motion amplitude ratio of the lip shape for each video frame corresponding to the current window according to the maximum motion amplitude value of the lip shape for that window.
  • The video sampling rate defaults to 30 frames/second, and the user may also adjust it as required; j ranges from 0 up to frameNumber/2 and then decreases from frameNumber/2 back to 0, as shown in FIG. 2.
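Under stated assumptions, steps 101C and 101D might look like the sketch below. The device description later gives scale[i] = windowPeak[i]/maxSampleValue but truncates the final maxLen[i] formula, so the maximum amplitude is taken here as scale[i] itself; the triangular per-frame ramp follows the description of j and FIG. 2, and frameNumber=8 approximates a 200-300 ms syllable at 30 frames/second.

```python
def frame_amplitude_ratios(windowPeak, maxSampleValue, frameNumber=8):
    """Steps 101C/101D: per-frame lip shape motion amplitude ratios, window by window."""
    half = frameNumber // 2
    # j rises from 0 to frameNumber/2, then falls back to 0 (FIG. 2).
    ramp = [j / half for j in range(half + 1)] + \
           [j / half for j in range(half - 1, -1, -1)]
    scaleForFrame = []
    for peak in windowPeak:
        # Step 101C (assumed formula): maximum motion amplitude for this window.
        maxLen = peak / maxSampleValue
        # Step 101D: triangular ramp of per-frame ratios across the window.
        scaleForFrame.extend(maxLen * r for r in ramp)
    return scaleForFrame
```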
  • Step 102: Acquire an initial lip shape model input by the user, and generate motion amplitude values of the lip shape according to the initial lip shape model and the obtained motion amplitude ratio of the lip shape.
  • The motion amplitude of the lip shape is divided into amplitude values in the horizontal and vertical directions. The motion amplitude value of the lip shape in the horizontal direction for frame k is calculated as length*scaleForFrame[k]; the motion amplitude value in the vertical direction is calculated analogously from the vertical extent of the lip shape.
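A sketch of step 102, assuming the initial lip shape model is an (N, 2) NumPy vertex array whose horizontal and vertical extents provide length and width; the vertical formula width * scaleForFrame[k] is an assumption made symmetric to the documented horizontal one.

```python
def motion_amplitudes(initial_vertices, scaleForFrame):
    """Step 102: absolute per-frame amplitudes from the initial lip shape model."""
    # Horizontal and vertical extent of the user-input initial lip shape model.
    length, width = initial_vertices.max(axis=0) - initial_vertices.min(axis=0)
    # One (horizontal, vertical) amplitude pair per video frame k.
    return [(length * s, width * s) for s in scaleForFrame]
```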
  • Step 103: Generate a changed lip shape mesh model set according to the obtained motion amplitude values of the lip shape and the pre-established lip pronunciation model library.
  • The lip pronunciation model library is based on the pronunciation characteristics of Chinese.
  • In Chinese, a syllable consists of an initial and a final, and the lip shape is determined mainly by the pronunciation of the final.
  • Finals are divided into single finals, compound finals and nasal finals.
  • The pronunciation of a single final consists of one vowel, and the shape of the lips is constant from beginning to end; a compound final consists of two or three vowels, the sound changes gradually, and the shape of the lips changes gradually as well; the effect of a nasal final on the shape of the lips is small. Therefore, the lip pronunciation models are established mainly from the pronunciation characteristics of the single finals.
  • The single finals are a (啊), o (哦), e (鹅), i (衣), u (屋) and ü (迂). Among them the lip shapes of u and ü are similar, so they are merged into one model; the lip shapes of e and i are also similar, so they are merged into one model. This finally yields a library of four lip pronunciation models representing the lip shapes of the single finals, as shown in FIG. 3. The lip pronunciation model library must include an original lip model and the various lip pronunciation models built on the basis of that model according to the above principles. It should be noted that the library is not limited to the above four single-final models: the lip pronunciation models in the library can differ according to the pronunciation characteristics of the language. For example, according to the pronunciation characteristics of English, the library can contain lip pronunciation models simulating the pronunciation of the English vowels a, e, i, o, u.
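Such a library might be represented as follows; the vertex coordinates are placeholders invented for illustration, not the patent's models.

```python
import numpy as np

# Hypothetical miniature library: each model is an (N, 2) vertex array sharing one
# topology with the original lip model.
lip_model_library = {
    "original": np.array([[-1.0, 0.0], [0.0, 0.3], [1.0, 0.0], [0.0, -0.3]]),
    "a":        np.array([[-0.9, 0.0], [0.0, 0.6], [0.9, 0.0], [0.0, -0.6]]),  # a (ah)
    "o":        np.array([[-0.6, 0.0], [0.0, 0.5], [0.6, 0.0], [0.0, -0.5]]),  # o (oh)
    "e":        np.array([[-1.1, 0.0], [0.0, 0.2], [1.1, 0.0], [0.0, -0.2]]),  # e merged with i
    "u":        np.array([[-0.5, 0.0], [0.0, 0.4], [0.5, 0.0], [0.0, -0.4]]),  # u merged with ü
}
```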
  • The step of generating a changed lip shape mesh model set based on the pre-established lip pronunciation model library and the motion amplitude values of the lip shape includes:
  • Step 103A: Randomly select a lip pronunciation model from the pre-established lip pronunciation model library as the original pronunciation model of the current lip shape.
  • Step 103B: Acquire the vertices of the original pronunciation model and of the original lip model in the lip pronunciation model library, and calculate the offset ratio of each vertex of the original pronunciation model, i.e. the offset of each vertex of the original pronunciation model relative to the corresponding vertex of the original lip model in the lip pronunciation model library.
  • Step 103C: Multiply the offset ratios of the vertices of the original pronunciation model by the lip shape motion amplitude values of the current frame to obtain the vertex offsets of the current frame.
  • Step 103D: Accumulate the vertex offsets of the current frame onto the acquired initial lip shape model input by the user to obtain the lip shape model of the current frame.
  • Step 103E: Arrange the lip shape models of all frames in the corresponding audio order to generate the changed set of lip shape mesh models.
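Steps 103A through 103E might then be sketched as below, reusing the hypothetical lip_model_library above. Normalizing the vertex offsets by the original model's extent in step 103B is an assumption; the source states only that offsets are computed relative to the original lip model.

```python
import random
import numpy as np

def changed_lip_mesh_models(initial_vertices, amplitudes, lip_model_library):
    """Steps 103A-103E: one changed lip shape model per video frame."""
    neutral = lip_model_library["original"]
    extent = neutral.max(axis=0) - neutral.min(axis=0)      # (length, width)
    visemes = [v for k, v in lip_model_library.items() if k != "original"]
    frames = []
    for horiz, vert in amplitudes:                          # one amplitude pair per frame
        model = random.choice(visemes)                      # step 103A
        offset_ratio = (model - neutral) / extent           # step 103B (assumed normalization)
        offset = offset_ratio * np.array([horiz, vert])     # step 103C
        frames.append(initial_vertices + offset)            # step 103D
    return frames                                           # step 103E: kept in audio order
```

Feeding the output of the motion_amplitudes sketch above into this function yields the per-frame models that step 103E arranges in audio order.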
  • In this solution the lip shape change is realized by the voice-driven model library, and the technical solution provided by the embodiment of the present invention is simple and low in cost.
  • Embodiment 2 of the present invention provides a method for acquiring a lip animation. As shown in FIG. 4, the method includes the following steps: Step 201: Acquire an audio signal, and obtain a motion amplitude ratio of the lip shape according to the characteristics of the audio signal.
  • Step 201 is the same as step 101, and details are not described herein again.
  • Step 202: Acquire an initial lip shape model input by the user, and generate motion amplitude values of the lip shape according to the initial lip shape model and the obtained motion amplitude ratio of the lip shape.
  • Step 202 is the same as step 102, and details are not described herein again.
  • Step 203: Generate a changed lip shape mesh model set according to the obtained motion amplitude values of the lip shape and the pre-established lip pronunciation model library.
  • Step 203 is the same as step 103, and details are not described herein again.
  • Step 204: Generate a lip animation according to the changed lip shape mesh model set.
  • The lip animation can be obtained by ordinary interpolation techniques.
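As one common reading of "ordinary interpolation", the sketch below linearly blends consecutive key lip models; the number of in-between frames is an arbitrary parameter, not taken from the patent.

```python
def interpolate_lip_models(key_models, inbetweens=2):
    """Insert `inbetweens` linearly blended meshes between consecutive key lip models."""
    frames = []
    for a, b in zip(key_models, key_models[1:]):
        for t in range(inbetweens + 1):
            frames.append(a + (b - a) * (t / (inbetweens + 1)))
    frames.append(key_models[-1])
    return frames
```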
  • In this solution the lip shape change is realized by voice driving, and the lip animation is obtained by voice driving.
  • The technical solution provided by the embodiment of the present invention is simple and low in cost.
  • Embodiment 3 of the present invention provides a device for changing a shape of a lip in a voice-driven animation, as shown in FIG. 5, comprising:
  • an obtaining module 501, configured to acquire an audio signal, and obtain a motion amplitude ratio of the lip shape according to the characteristics of the audio signal;
  • a first generating module 502, configured to acquire an initial lip shape model input by the user, and to generate motion amplitude values of the lip shape according to the initial lip shape model and the obtained motion amplitude ratio of the lip shape; and
  • a second generating module 503, configured to generate a changed lip shape mesh model set according to the obtained motion amplitude values of the lip shape and the pre-established lip pronunciation model library.
  • The obtaining module 501 includes:
  • a first obtaining module 5011, configured to traverse the audio signal to obtain the maximum sampled data value maxSampleValue;
  • a second obtaining module 5012, configured to perform window and group division on the audio signal, obtain the array avgGroup of average sampled data values in each group, and obtain the array windowPeak of the maximum values among the group averages within each window;
  • a third obtaining module 5013, configured to obtain, according to the obtained array windowPeak and the maximum sampled data value, the maximum motion amplitude value of the lip shape corresponding to the current window i; and
  • a fourth obtaining module 5014, configured to obtain, from the maximum motion amplitude value of the lip shape corresponding to the current window i, the motion amplitude ratio of the lip shape for each video frame corresponding to the current window i.
  • The second obtaining module 5012 includes:
  • a fifth obtaining module, configured to obtain the average values of the sampled data of all groups in the current window i;
  • a sixth obtaining module, configured to obtain the maximum value windowPeak[i] of those average sampled data values;
  • a seventh obtaining module, configured to calculate the ratio scale[i] of windowPeak[i] to the maximum audio sampled data value maxSampleValue; and
  • an eighth obtaining module, configured to calculate the maximum motion amplitude value of the lip shape corresponding to the current window i.
  • The first generating module 502 generates the motion amplitude values of the lip shape according to the initial lip shape model and the motion amplitude ratio of the lip shape, as described in step 102 of Embodiment 1.
  • The second generating module 503 includes:
  • a selection module 5031, configured to randomly select a lip pronunciation model from the pre-established lip pronunciation model library as the original pronunciation model of the current lip shape;
  • a ninth obtaining module 5032, configured to obtain the vertices of the original pronunciation model and of the original lip model in the lip pronunciation model library, and to calculate the offset ratio of each vertex of the original pronunciation model;
  • a tenth obtaining module 5033, configured to multiply the offset ratios of the vertices of the original pronunciation model by the lip shape motion amplitude values of the current frame to obtain the vertex offsets of the current frame;
  • an eleventh obtaining module 5034, configured to accumulate the vertex offsets of the current frame onto the acquired initial lip shape model input by the user, to obtain the lip shape model of the current frame; and
  • a model set generation module 5035, configured to arrange the lip shape models of all frames to generate the changed set of lip shape mesh models.
  • The ninth obtaining module 5032 calculates the offset ratio of each vertex of the original pronunciation model as described in step 103B of Embodiment 1.
  • The obtaining module 501 is further configured to perform denoising processing on the audio signal.
  • The detailed process by which the obtaining module 501 acquires an audio signal and obtains the motion amplitude ratio of the lip shape according to its characteristics can be found in step 101 of Embodiment 1.
  • The detailed process by which the first generating module 502 acquires an initial lip shape model and generates the motion amplitude values of the lip shape according to the initial lip shape model and the obtained motion amplitude ratio of the lip shape can be found in step 102 of Embodiment 1.
  • The detailed process by which the second generating module 503 generates the changed lip shape mesh model set according to the obtained motion amplitude values of the lip shape and the pre-established lip pronunciation model library can be found in step 103 of Embodiment 1.
  • In this solution the lip shape change is realized by the voice-driven model library, and the technical solution provided by the embodiment of the present invention is simple and low in cost.
  • Embodiment 4 of the present invention provides an apparatus for acquiring a lip animation, as shown in FIG. 8, comprising: an obtaining module 601, configured to acquire an audio signal, and obtain a motion amplitude ratio of the lip shape according to the characteristics of the audio signal;
  • a first generating module 602, configured to acquire an initial lip shape model input by the user, and to generate motion amplitude values of the lip shape according to the initial lip shape model and the obtained motion amplitude ratio of the lip shape;
  • a second generating module 603, configured to generate a changed lip shape mesh model set according to the obtained motion amplitude values of the lip shape and the pre-established lip pronunciation model library; and
  • a third generating module 604, configured to generate a lip animation according to the changed lip shape mesh model set.
  • The obtaining module 601, the first generating module 602 and the second generating module 603 are equivalent to the obtaining module, the first generating module and the second generating module in Embodiment 3, respectively, and details are not described herein again.
  • The detailed process by which the obtaining module 601 acquires an audio signal and obtains the motion amplitude ratio of the lip shape according to its characteristics can be found in step 101 of Embodiment 1.
  • The detailed process by which the first generating module 602 acquires an initial lip shape model and generates the motion amplitude values of the lip shape according to the initial lip shape model and the obtained motion amplitude ratio can be found in step 102 of Embodiment 1.
  • The detailed process by which the second generating module 603 generates the changed lip shape mesh model set according to the obtained motion amplitude values of the lip shape and the pre-established lip pronunciation model library can be found in step 103 of Embodiment 1.
  • In this solution the lip shape change is realized by voice driving, and the lip animation is obtained by voice driving.
  • The technical solution provided by the embodiment of the present invention is simple and low in cost.
  • The technical solutions of Embodiments 1-4 of the present invention can be applied to, but are not limited to, terminal video animation or entertainment web video animation, and are applicable not only to Chinese but also to English, French or other languages.
  • The technical solutions of Embodiments 1-4 take Chinese only as an example; the processing of other languages is similar and is not described again.
  • The initial lip shape model input by the user may come from a human face, an animal face, a cartoon image, etc.; the audio signal is likewise user-defined, such as normal speech, singing, or specially processed audio.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a voice-driven method for changing lip shape in an animation, and to a method for obtaining a lip animation, both belonging to the field of computing. The voice-driven lip shape changing method comprises: acquiring an audio signal, and obtaining a motion amplitude ratio of the lip shape according to the characteristics of the audio signal; acquiring an initial lip shape model and generating motion amplitude values of the lip shape according to the initial lip shape model and the motion amplitude ratio of the lip shape; and generating a set of lip shape mesh models according to a pre-established lip pronunciation model library. The voice-driven lip shape changing apparatus comprises: an acquisition module, a first generation module and a second generation module. The invention also provides an apparatus and a method for obtaining the lip animation. The technical solution provided by the invention uses a simple algorithm and is low in cost.
PCT/CN2010/070026 2009-01-19 2010-01-05 Procédé et dispositif vocaux entraînant la modification de l'expression labiale dans une animation et acquisition de l'animation labiale WO2010081395A1 (fr)

Priority Applications (5)

Application Number Priority Date Filing Date Title
RU2011124736/08A RU2487411C2 (ru) 2009-01-19 2010-01-05 Способ и устройство для изменения формы губ и получения анимации губ в управляемой голосом анимации
BRPI1006026A BRPI1006026B1 (pt) 2009-01-19 2010-01-05 método e aparelho para alteração do formato de lábio e obtenção de animação de lábio em animação acionada por voz
MX2011006703A MX2011006703A (es) 2009-01-19 2010-01-05 Metodo y aparato para cambiar la forma de los labios y obtener animacion de los labios en animacion estimulada por voz.
CA2744347A CA2744347C (fr) 2009-01-19 2010-01-05 Procede et dispositif vocaux entrainant la modification de l'expression labiale dans une animation et acquisition de l'animation labiale
US13/117,244 US8350859B2 (en) 2009-01-19 2011-05-27 Method and apparatus for changing lip shape and obtaining lip animation in voice-driven animation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2009100030839A CN101482976B (zh) 2009-01-19 2009-01-19 语音驱动嘴唇形状变化的方法、获取嘴唇动画的方法及装置
CN200910003083.9 2009-01-19

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/117,244 Continuation US8350859B2 (en) 2009-01-19 2011-05-27 Method and apparatus for changing lip shape and obtaining lip animation in voice-driven animation

Publications (1)

Publication Number Publication Date
WO2010081395A1 (fr) 2010-07-22

Family

ID=40880071

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/070026 WO2010081395A1 (fr) 2009-01-19 2010-01-05 Procédé et dispositif vocaux entraînant la modification de l'expression labiale dans une animation et acquisition de l'animation labiale

Country Status (7)

Country Link
US (1) US8350859B2 (fr)
CN (1) CN101482976B (fr)
BR (1) BRPI1006026B1 (fr)
CA (1) CA2744347C (fr)
MX (1) MX2011006703A (fr)
RU (1) RU2487411C2 (fr)
WO (1) WO2010081395A1 (fr)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101482976B (zh) * 2009-01-19 2010-10-27 腾讯科技(深圳)有限公司 语音驱动嘴唇形状变化的方法、获取嘴唇动画的方法及装置
CN102054287B (zh) * 2009-11-09 2015-05-06 腾讯科技(深圳)有限公司 面部动画视频生成的方法及装置
CN102368198A (zh) * 2011-10-04 2012-03-07 上海量明科技发展有限公司 通过嘴唇图像进行信息提示的方法及系统
CN110164437B (zh) * 2012-03-02 2021-04-16 腾讯科技(深圳)有限公司 一种即时通信的语音识别方法和终端
CN104392729B (zh) * 2013-11-04 2018-10-12 贵阳朗玛信息技术股份有限公司 一种动画内容的提供方法及装置
CN103705218B (zh) * 2013-12-20 2015-11-18 中国科学院深圳先进技术研究院 构音障碍识别的方法、系统和装置
CN104298961B (zh) * 2014-06-30 2018-02-16 中国传媒大学 基于口型识别的视频编排方法
CN106203235B (zh) * 2015-04-30 2020-06-30 腾讯科技(深圳)有限公司 活体鉴别方法和装置
CN104869326B (zh) * 2015-05-27 2018-09-11 网易(杭州)网络有限公司 一种配合音频的图像显示方法和设备
CN105405160B (zh) * 2015-10-14 2018-05-01 佛山精鹰传媒股份有限公司 一种简单规则模型变化效果的实现方法
CN105632497A (zh) * 2016-01-06 2016-06-01 昆山龙腾光电有限公司 一种语音输出方法、语音输出系统
CN107808191A (zh) * 2017-09-13 2018-03-16 北京光年无限科技有限公司 虚拟人多模态交互的输出方法和系统
US10586368B2 (en) * 2017-10-26 2020-03-10 Snap Inc. Joint audio-video facial animation system
US10635893B2 (en) * 2017-10-31 2020-04-28 Baidu Usa Llc Identity authentication method, terminal device, and computer-readable storage medium
CN108538308B (zh) * 2018-01-09 2020-09-29 网易(杭州)网络有限公司 基于语音的口型和/或表情模拟方法及装置
US10657972B2 (en) * 2018-02-02 2020-05-19 Max T. Hall Method of translating and synthesizing a foreign language
CN108538282B (zh) * 2018-03-15 2021-10-08 上海电力学院 一种由唇部视频直接生成语音的方法
WO2019219968A1 (fr) * 2018-05-18 2019-11-21 Deepmind Technologies Limited Reconnaissance visuelle de la parole par prédiction d'un phonème
CN108847234B (zh) * 2018-06-28 2020-10-30 广州华多网络科技有限公司 唇语合成方法、装置、电子设备及存储介质
CN108986191B (zh) * 2018-07-03 2023-06-27 百度在线网络技术(北京)有限公司 人物动作的生成方法、装置及终端设备
US11568864B2 (en) * 2018-08-13 2023-01-31 Carnegie Mellon University Processing speech signals of a user to generate a visual representation of the user
CN111953922B (zh) * 2019-05-16 2022-05-27 南宁富联富桂精密工业有限公司 视频会议的人脸辨识方法、服务器及计算机可读存储介质
CN110277099A (zh) * 2019-06-13 2019-09-24 北京百度网讯科技有限公司 基于语音的嘴型生成方法和装置
CN111415677B (zh) * 2020-03-16 2020-12-25 北京字节跳动网络技术有限公司 用于生成视频的方法、装置、设备和介质
CN113240781A (zh) * 2021-05-20 2021-08-10 东营友帮建安有限公司 基于语音驱动及图像识别的影视动画制作方法、系统
CN113506563A (zh) * 2021-07-06 2021-10-15 北京一起教育科技有限责任公司 一种发音识别的方法、装置及电子设备
CN115222856B (zh) * 2022-05-20 2023-09-26 一点灵犀信息技术(广州)有限公司 表情动画生成方法及电子设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1731833A (zh) * 2005-08-23 2006-02-08 孙丹 语音驱动头部图像合成影音文件的方法
JP2006162760A (ja) * 2004-12-03 2006-06-22 Yamaha Corp 語学学習装置
CN1936889A (zh) * 2005-09-20 2007-03-28 文化传信科技(澳门)有限公司 动画生成系统以及方法
CN101482976A (zh) * 2009-01-19 2009-07-15 腾讯科技(深圳)有限公司 语音驱动嘴唇形状变化的方法、获取嘴唇动画的方法及装置

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5426460A (en) * 1993-12-17 1995-06-20 At&T Corp. Virtual multimedia service for mass market connectivity
US5657426A (en) * 1994-06-10 1997-08-12 Digital Equipment Corporation Method and apparatus for producing audio-visual synthetic speech
US6737572B1 (en) * 1999-05-20 2004-05-18 Alto Research, Llc Voice controlled electronic musical instrument
US6654018B1 (en) * 2001-03-29 2003-11-25 At&T Corp. Audio-visual selection process for the synthesis of photo-realistic talking-head animations
CN1320497C (zh) * 2002-07-03 2007-06-06 中国科学院计算技术研究所 基于统计与规则结合的语音驱动人脸动画方法
RU2358319C2 (ru) * 2003-08-29 2009-06-10 Самсунг Электроникс Ко., Лтд. Способ и устройство для фотореалистического трехмерного моделирования лица на основе изображения
CN100476877C (zh) * 2006-11-10 2009-04-08 中国科学院计算技术研究所 语音和文本联合驱动的卡通人脸动画生成方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006162760A (ja) * 2004-12-03 2006-06-22 Yamaha Corp 語学学習装置
CN1731833A (zh) * 2005-08-23 2006-02-08 孙丹 语音驱动头部图像合成影音文件的方法
CN1936889A (zh) * 2005-09-20 2007-03-28 文化传信科技(澳门)有限公司 动画生成系统以及方法
CN101482976A (zh) * 2009-01-19 2009-07-15 腾讯科技(深圳)有限公司 语音驱动嘴唇形状变化的方法、获取嘴唇动画的方法及装置

Also Published As

Publication number Publication date
CN101482976B (zh) 2010-10-27
CN101482976A (zh) 2009-07-15
BRPI1006026A2 (pt) 2016-05-10
CA2744347C (fr) 2014-02-25
RU2487411C2 (ru) 2013-07-10
US8350859B2 (en) 2013-01-08
US20110227931A1 (en) 2011-09-22
BRPI1006026A8 (pt) 2017-10-10
RU2011124736A (ru) 2013-02-27
CA2744347A1 (fr) 2010-07-22
MX2011006703A (es) 2011-07-28
BRPI1006026B1 (pt) 2020-04-07

Similar Documents

Publication Publication Date Title
WO2010081395A1 (fr) Procédé et dispositif vocaux entraînant la modification de l'expression labiale dans une animation et acquisition de l'animation labiale
US9361722B2 (en) Synthetic audiovisual storyteller
US7636662B2 (en) System and method for audio-visual content synthesis
CN112562722A (zh) 基于语义的音频驱动数字人生成方法及系统
Xie et al. Realistic mouth-synching for speech-driven talking face using articulatory modelling
CN112001992A (zh) 基于深度学习的语音驱动3d虚拟人表情音画同步方法及系统
CN113256821B (zh) 一种三维虚拟形象唇形生成方法、装置及电子设备
CN113744755A (zh) 一种从音频信号生成语音动画的装置及方法
CN106327555A (zh) 一种获得唇形动画的方法及装置
Alexander et al. A modular architecture for articulatory synthesis from gestural specification
CN117275485B (zh) 一种音视频的生成方法、装置、设备及存储介质
CN116912375A (zh) 面部动画生成方法、装置、电子设备及存储介质
CN109525787B (zh) 面向直播场景的实时字幕翻译及系统实现方法
CN116366872A (zh) 基于中之人和人工智能的直播方法、装置及系统
JP4631077B2 (ja) アニメーション作成装置
CN115223224A (zh) 数字人说话视频生成方法、系统、终端设备及介质
CN113362432B (zh) 一种面部动画生成方法及装置
CN114999440A (zh) 虚拟形象生成方法、装置、设备、存储介质以及程序产品
CN113990295A (zh) 一种视频生成方法和装置
Craig et al. A linear model of acoustic-to-facial mapping: Model parameters, data set size, and generalization across speakers
CN112331184A (zh) 语音口型同步方法、装置、电子设备及存储介质
Ra et al. Visual-to-speech conversion based on maximum likelihood estimation
CN114255307A (zh) 虚拟人脸的控制方法、装置、设备及存储介质
CN114972589A (zh) 虚拟数字形象的驱动方法及其装置
CN117975991A (zh) 基于人工智能的数字人驱动方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 10731029; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2744347; Country of ref document: CA)
WWE Wipo information: entry into national phase (Ref document number: MX/A/2011/006703; Country of ref document: MX)
NENP Non-entry into the national phase (Ref country code: DE)
WWE Wipo information: entry into national phase (Ref document number: 2011124736; Country of ref document: RU)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC, EPO FORM 1205A DATED 14.12.2011.)
122 Ep: pct application non-entry in european phase (Ref document number: 10731029; Country of ref document: EP; Kind code of ref document: A1)
REG Reference to national code (Ref country code: BR; Ref legal event code: B01A; Ref document number: PI1006026; Country of ref document: BR)
ENP Entry into the national phase (Ref document number: PI1006026; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20110624)