JP7500582B2 - Real-time generation of speech animation - Google Patents
Real-time generation of speech animation
- Publication number
- JP7500582B2 (application JP2021541507A)
- Authority
- JP
- Japan
- Prior art keywords
- animation
- speech
- phoneme
- snippets
- collection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/205—3D [Three Dimensional] animation driven by audio data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/80—2D [Two Dimensional] animation, e.g. using sprites
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
- G10L21/12—Transforming into visible information by displaying time domain information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
- G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Processing Or Creating Images (AREA)
Description
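1. Hierarchical lookup and polyphone concatenation
Technical problem
Technical solution
Detailed description
Model creation
Lookup table creation
Real-time generation of animation:
Generation of speech animation
Selection of animation snippets
2. Blending with model visemes
Technical problem
Technical solution
Detailed description
Model visemes
Activation curves
Smoothing of concatenated animation
3. Generation of head and eyebrow animation through concatenation
1. Generate sentence and word timestamps and contextual information for the input sentence.
2. Concatenate time series of head rotation and translation, selected based on the information provided in step 1.
3. Concatenate time series of eyebrow animation, selected based on the information provided in step 1.
4. Smooth and blend the animations.
5. Add emotion to the animation signal.
6. Play back the animation synchronized with the audio.
Phrase collection
Keyword collection
Generation of tongue animation from example poses
4. Emotional speech
Technical problem
Technical solution
Detailed description
Muscle-based descriptor class weighting
Priority weighting
Animation composer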
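$$w = \alpha_s \cdot w_s + \alpha_e \cdot w_e$$
$$\alpha_s = p_s + p_e \cdot (c_s - c_e)$$
$$\alpha_e = p_e + p_s \cdot (c_e - c_s)$$
where:
$w_s$ = input speech weight
$w_e$ = input expression weight
$p_s$ = priority weighting for speech
$p_e$ = priority weighting for expression
$c_s$ = muscle-based descriptor class weighting for speech (categorization weight)
$c_e$ = muscle-based descriptor class weighting for expression
$\alpha_s$ = output multiplier for speech
$\alpha_e$ = output multiplier for expression
and $\alpha_s$ and $\alpha_e$ are bounded between 0 and 1.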
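The output weighting formulas above can be sketched in a few lines of Python. This is a minimal illustration for a single muscle-based descriptor, not the patented implementation: the function names `clamp01` and `compose_weight` are hypothetical, and bounding the multipliers by clamping is an assumption about how the 0 to 1 limit is enforced.

```python
def clamp01(x: float) -> float:
    """Bound a value to the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


def compose_weight(w_s: float, w_e: float,
                   p_s: float, p_e: float,
                   c_s: float, c_e: float) -> float:
    """Blend a speech weight and an expression weight for one descriptor.

    w_s, w_e: input speech / expression weights
    p_s, p_e: priority weightings for speech / expression
    c_s, c_e: muscle-based descriptor class weightings for speech / expression
    """
    # Output multipliers, bounded between 0 and 1 (clamping is an assumption).
    alpha_s = clamp01(p_s + p_e * (c_s - c_e))
    alpha_e = clamp01(p_e + p_s * (c_e - c_s))
    return alpha_s * w_s + alpha_e * w_e


# A descriptor classed entirely as "speech" lets the speech input through
# and suppresses the expression input.
speech_dominant = compose_weight(0.8, 0.6, p_s=0.5, p_e=0.5, c_s=1.0, c_e=0.0)

# Equal class weightings fall back to the priority weightings alone.
balanced = compose_weight(0.8, 0.6, p_s=0.5, p_e=0.5, c_s=0.5, c_e=0.5)
```

With equal priorities, the class weighting difference shifts the multipliers in opposite directions, so a descriptor that matters more for speech gives up expression influence by the same amount.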
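Customization of the "accent" of speech animation
Combination with interpolation
Exemplary embodiment
- Speech mouth AUs, e.g., AU08 (lips toward each other), AU18 (lip pucker), AU22 (lip funneler), etc.
- Emotion mouth AUs, e.g., AU12 (lip corner puller), AU15 (lip corner depressor), AU21 (neck tightener), etc.
- Other mouth AUs, e.g., AU16 (lower lip depressor), AU25 (lips part), AU35 (cheek suck), etc.
- Non-mouth AUs, e.g., AU01 (inner brow raiser), AU05 (upper lid raiser), AU09 (nose wrinkler), etc.
- Speech mouth AUs: 100% reduction
- Emotion mouth AUs: 50% reduction
- Other mouth AUs: 100% reduction
- Non-mouth AUs: 10% reduction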
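The four class reductions listed above can be read as per-class scale factors. The sketch below assumes (the surviving text does not say so) that each percentage is applied multiplicatively to the weight of every AU in that class. `AU_CLASS`, `REDUCTION`, and `reduce_weights` are hypothetical names, and only the AUs named in the lists are classified.

```python
from typing import Dict

# Class membership taken from the example AUs listed above.
AU_CLASS: Dict[str, str] = {
    "AU08": "speech_mouth", "AU18": "speech_mouth", "AU22": "speech_mouth",
    "AU12": "emotion_mouth", "AU15": "emotion_mouth", "AU21": "emotion_mouth",
    "AU16": "other_mouth", "AU25": "other_mouth", "AU35": "other_mouth",
    "AU01": "non_mouth", "AU05": "non_mouth", "AU09": "non_mouth",
}

# Fractional reduction per class (100%, 50%, 100%, 10%).
REDUCTION: Dict[str, float] = {
    "speech_mouth": 1.0,
    "emotion_mouth": 0.5,
    "other_mouth": 1.0,
    "non_mouth": 0.1,
}


def reduce_weights(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale each AU weight by (1 - reduction) for its class.

    Assumption: the reduction is multiplicative. Unknown AUs raise KeyError.
    """
    return {au: w * (1.0 - REDUCTION[AU_CLASS[au]]) for au, w in weights.items()}


reduced = reduce_weights({"AU12": 0.8, "AU01": 1.0, "AU18": 0.6})
```

Under this reading, a smile (AU12) is halved, a brow raise (AU01) is nearly untouched, and a lip pucker (AU18) is removed entirely.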
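The methods and techniques above have been described with reference to English, but the invention is not limited in this respect. Embodiments may be modified to facilitate speech animation in any language. Bone-based animation rigging, or any other suitable animation technique, may be used in place of blendshape animation.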
2 Lookup table
3 Collection
4 Item
5 Instance
6 Model viseme
7 String
8 Muscle-based descriptor class weighting
9 Priority weighting
10 Output weighting function
11 Speech
12 Expression
13 Muscle-based descriptor
14 Animation composer
Claims (9)
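- A method for animating a communicative utterance, comprising:
receiving:
a string containing the content of a sequence of communicative utterances to be animated; and
a plurality of collections, each collection being hierarchically organized based on items that divide the string linguistically or phonetically and including animation snippets representing fragments of animation for the items, the plurality of collections including a phoneme collection containing animation snippets for a plurality of phonemes;
hierarchically searching the collections for items that match substrings, each substring being a portion of the string obtained by dividing the string;
retrieving animation snippets for the matched items; and
combining the retrieved animation snippets to animate the string,
wherein the gap between each pair of two consecutive syllables, comprising the right half of the last phoneme of the preceding syllable and the left half of the first phoneme of the following syllable, is filled by combining matching animation snippets from the phoneme collection.
- The method of claim 1, wherein the communicative utterance is speech.
- The method of claim 1, wherein the hierarchical order favors longer items.
- The method of claim 1, wherein at least one item includes a plurality of animation snippets, and an animation snippet is retrieved based on its duration.
- The method of claim 1, wherein at least one item includes a plurality of animation snippets, and an animation snippet is retrieved based on corresponding speech features.
- The method of claim 1, wherein the animation snippets are associated with sound corresponding to the animation.
- The method of claim 6, including the step of compressing and/or stretching animation snippets to match the sound corresponding to the animation.
- The method of claim 1, wherein the items include partial phoneme strings.
- The method of claim 1, wherein the animation snippets store muscle-based descriptor weights.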
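The hierarchical search in claim 1, with longer items preferred as in claim 3, can be sketched as a greedy longest-match lookup. Everything below is an illustrative assumption rather than the patent's own implementation: the string is taken to be already converted to phoneme symbols, each collection is a dictionary keyed by phoneme tuples, snippets are placeholder strings, and the half-phoneme gap filling between syllables is not modeled.

```python
from typing import Dict, List, Sequence, Tuple

Snippet = str  # stand-in for real animation data (e.g. per-frame weights)
Collection = Dict[Tuple[str, ...], Snippet]


def lookup_snippets(phonemes: Sequence[str],
                    collections: List[Collection]) -> List[Snippet]:
    """Cover the phoneme sequence with snippets, preferring longer items.

    `collections` is ordered from the highest level (longest items, such as
    words) down to the phoneme collection.
    """
    snippets: List[Snippet] = []
    i = 0
    while i < len(phonemes):
        found = None
        for collection in collections:
            # Within a collection, try the longest remaining substring first.
            for length in range(len(phonemes) - i, 0, -1):
                key = tuple(phonemes[i:i + length])
                if key in collection:
                    found = (collection[key], length)
                    break
            if found is not None:
                break
        if found is None:
            raise KeyError(f"no item matches phoneme {phonemes[i]!r}")
        snippet, length = found
        snippets.append(snippet)
        i += length
    return snippets


# Hypothetical collections: words, then syllables, then single phonemes.
words: Collection = {("HH", "AH", "L", "OW"): "word:hello"}
syllables: Collection = {("W", "ER"): "syl:wer"}
single: Collection = {("L",): "ph:l", ("D",): "ph:d", ("W",): "ph:w"}

demo = lookup_snippets(["HH", "AH", "L", "OW", "W", "ER", "L", "D"],
                       [words, syllables, single])
```

Falling through to the phoneme collection only when no longer item matches keeps as much recorded coarticulation as possible inside each retrieved snippet.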
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
NZ750233 | 2019-01-25 | ||
NZ75023319 | 2019-01-25 | ||
PCT/IB2020/050620 WO2020152657A1 (en) | 2019-01-25 | 2020-01-27 | Real-time generation of speech animation |
Publications (3)
Publication Number | Publication Date |
---|---|
JP2022518721A (ja) | 2022-03-16 |
JPWO2020152657A5 (ja) | 2023-02-06 |
JP7500582B2 (ja) | 2024-06-17 |
Family
ID=71736559
Family Applications (1)
Application Number | Status | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
JP2021541507A | Active | JP7500582B2 (ja) | 2019-01-25 | 2020-01-27 | Real-time generation of speech animation |
Country Status (8)
Country | Link |
---|---|
US (1) | US20220108510A1 (ja) |
EP (1) | EP3915108B1 (ja) |
JP (1) | JP7500582B2 (ja) |
KR (1) | KR20210114521A (ja) |
CN (1) | CN113383384A (ja) |
AU (1) | AU2020211809A1 (ja) |
CA (1) | CA3128047A1 (ja) |
WO (1) | WO2020152657A1 (ja) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111354370B (zh) * | 2020-02-13 | 2021-06-25 | 百度在线网络技术(北京)有限公司 | Lip shape feature prediction method, apparatus, and electronic device |
US11756251B2 (en) * | 2020-09-03 | 2023-09-12 | Sony Interactive Entertainment Inc. | Facial animation control by automatic generation of facial action units using text and speech |
CN112215927B (zh) * | 2020-09-18 | 2023-06-23 | 腾讯科技(深圳)有限公司 | Face video synthesis method, apparatus, device, and medium |
CN112333179B (zh) * | 2020-10-30 | 2023-11-10 | 腾讯科技(深圳)有限公司 | Virtual video live streaming method, apparatus, device, and readable storage medium |
KR102555103B1 (ko) * | 2021-09-02 | 2023-07-17 | (주)씨유박스 | Method and apparatus for active liveness detection using facial images |
CN116188649B (zh) * | 2023-04-27 | 2023-10-13 | 科大讯飞股份有限公司 | Speech-based three-dimensional face model driving method and related apparatus |
CN117037255B (zh) * | 2023-08-22 | 2024-06-21 | 北京中科深智科技有限公司 | Directed-graph-based 3D expression synthesis method |
CN116912376B (zh) * | 2023-09-14 | 2023-12-22 | 腾讯科技(深圳)有限公司 | Mouth shape animation generation method, apparatus, computer device, and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003248841A (ja) | 2001-12-20 | 2003-09-05 | Matsushita Electric Ind Co Ltd | Virtual television phone apparatus |
JP2007299300A (ja) | 2006-05-02 | 2007-11-15 | Advanced Telecommunication Research Institute International | Animation creation device |
US20120130717A1 (en) | 2010-11-19 | 2012-05-24 | Microsoft Corporation | Real-time Animation for an Expressive Avatar |
JP2016042362A (ja) | 2013-01-29 | 2016-03-31 | 株式会社東芝 | Computer generated head |
Family Cites Families (82)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5657426A (en) * | 1994-06-10 | 1997-08-12 | Digital Equipment Corporation | Method and apparatus for producing audio-visual synthetic speech |
US5880788A (en) * | 1996-03-25 | 1999-03-09 | Interval Research Corporation | Automated synchronization of video image sequences to new soundtracks |
US6208356B1 (en) * | 1997-03-24 | 2001-03-27 | British Telecommunications Public Limited Company | Image synthesis |
US6970172B2 (en) * | 1997-03-27 | 2005-11-29 | At&T Corp. | Method for defining MPEG 4 animation parameters for an animation definition interface |
US6147692A (en) * | 1997-06-25 | 2000-11-14 | Haptek, Inc. | Method and apparatus for controlling transformation of two and three-dimensional images |
WO2000030069A2 (en) * | 1998-11-13 | 2000-05-25 | Lernout & Hauspie Speech Products N.V. | Speech synthesis using concatenation of speech waveforms |
US6504546B1 (en) * | 2000-02-08 | 2003-01-07 | At&T Corp. | Method of modeling objects to synthesize three-dimensional, photo-realistic animations |
US6539354B1 (en) * | 2000-03-24 | 2003-03-25 | Fluent Speech Technologies, Inc. | Methods and devices for producing and using synthetic visual speech based on natural coarticulation |
AU2001292963A1 (en) * | 2000-09-21 | 2002-04-02 | The Regents Of The University Of California | Visual display methods for use in computer-animated speech production models |
CA2432021A1 (en) * | 2000-12-19 | 2002-06-27 | Speechview Ltd. | Generating visual representation of speech by any individuals of a population |
US6654018B1 (en) * | 2001-03-29 | 2003-11-25 | At&T Corp. | Audio-visual selection process for the synthesis of photo-realistic talking-head animations |
US7209882B1 (en) * | 2002-05-10 | 2007-04-24 | At&T Corp. | System and method for triphone-based unit selection for visual speech synthesis |
US20100085363A1 (en) * | 2002-08-14 | 2010-04-08 | PRTH-Brand-CIP | Photo Realistic Talking Head Creation, Content Creation, and Distribution System and Method |
US7257538B2 (en) * | 2002-10-07 | 2007-08-14 | Intel Corporation | Generating animation from visual and audio input |
US7168953B1 (en) * | 2003-01-27 | 2007-01-30 | Massachusetts Institute Of Technology | Trainable videorealistic speech animation |
WO2004100128A1 (en) * | 2003-04-18 | 2004-11-18 | Unisay Sdn. Bhd. | System for generating a timed phomeme and visem list |
GB2404040A (en) * | 2003-07-16 | 2005-01-19 | Canon Kk | Lattice matching |
US7990384B2 (en) * | 2003-09-15 | 2011-08-02 | At&T Intellectual Property Ii, L.P. | Audio-visual selection process for the synthesis of photo-realistic talking-head animations |
WO2005031654A1 (en) * | 2003-09-30 | 2005-04-07 | Koninklijke Philips Electronics, N.V. | System and method for audio-visual content synthesis |
US20060009978A1 (en) * | 2004-07-02 | 2006-01-12 | The Regents Of The University Of Colorado | Methods and systems for synthesis of accurate visible speech via transformation of motion capture data |
US7388586B2 (en) * | 2005-03-31 | 2008-06-17 | Intel Corporation | Method and apparatus for animation of a human speaker |
US20080294433A1 (en) * | 2005-05-27 | 2008-11-27 | Minerva Yeung | Automatic Text-Speech Mapping Tool |
US20070033042A1 (en) * | 2005-08-03 | 2007-02-08 | International Business Machines Corporation | Speech detection fusing multi-class acoustic-phonetic, and energy features |
CN1991982A (zh) * | 2005-12-29 | 2007-07-04 | 摩托罗拉公司 | Method for actuating an image using voice data |
JP2009533786A (ja) * | 2006-04-10 | 2009-09-17 | アヴァワークス インコーポレーテッド | Do-it-yourself photo-realistic talking head creation system and method |
KR100813034B1 (ko) * | 2006-12-07 | 2008-03-14 | 한국전자통신연구원 | Character formation method |
TWI454955B (zh) * | 2006-12-29 | 2014-10-01 | Nuance Communications Inc | Method for generating animation using a model file, and computer-readable signal-bearing medium |
US20090044112A1 (en) * | 2007-08-09 | 2009-02-12 | H-Care Srl | Animated Digital Assistant |
JP5109038B2 (ja) * | 2007-09-10 | 2012-12-26 | 株式会社国際電気通信基礎技術研究所 | Lip-sync animation creation device and computer program |
US8489399B2 (en) * | 2008-06-23 | 2013-07-16 | John Nicholas and Kristin Gross Trust | System and method for verifying origin of input through spoken language analysis |
US8392190B2 (en) * | 2008-12-01 | 2013-03-05 | Educational Testing Service | Systems and methods for assessment of non-native spontaneous speech |
CA2745094A1 (en) * | 2008-12-04 | 2010-07-01 | Total Immersion Software, Inc. | Systems and methods for dynamically injecting expression information into an animated facial mesh |
US20100332229A1 (en) * | 2009-06-30 | 2010-12-30 | Sony Corporation | Apparatus control based on visual lip share recognition |
US20110106792A1 (en) * | 2009-11-05 | 2011-05-05 | I2 Limited | System and method for word matching and indexing |
BRPI0904540B1 (pt) * | 2009-11-27 | 2021-01-26 | Samsung Eletrônica Da Amazônia Ltda | Method for animating virtual faces/heads/characters via voice processing |
KR101153736B1 (ko) * | 2010-05-31 | 2012-06-05 | 봉래 박 | Apparatus and method for generating articulatory organ animation |
US8744856B1 (en) * | 2011-02-22 | 2014-06-03 | Carnegie Speech Company | Computer implemented system and method and computer program product for evaluating pronunciation of phonemes in a language |
JP6019108B2 (ja) | 2011-05-06 | 2016-11-02 | セイヤー インコーポレイテッド | Video generation based on text |
KR101558202B1 (ko) * | 2011-05-23 | 2015-10-12 | 한국전자통신연구원 | Apparatus and method for generating animation using an avatar |
JP5752060B2 (ja) * | 2012-01-19 | 2015-07-22 | International Business Machines Corporation | Information processing apparatus, large-vocabulary continuous speech recognition method, and program |
JP5665780B2 (ja) * | 2012-02-21 | 2015-02-04 | 株式会社東芝 | Speech synthesis apparatus, method, and program |
US9094576B1 (en) * | 2013-03-12 | 2015-07-28 | Amazon Technologies, Inc. | Rendered audiovisual communication |
US10170114B2 (en) * | 2013-05-30 | 2019-01-01 | Promptu Systems Corporation | Systems and methods for adaptive proper name entity recognition and understanding |
KR20140146965A (ko) * | 2013-06-18 | 2014-12-29 | 삼성전자주식회사 | Conversion system including a display apparatus and a server, and method for controlling the display apparatus |
GB2517212B (en) | 2013-08-16 | 2018-04-25 | Toshiba Res Europe Limited | A Computer Generated Emulation of a subject |
US20150287403A1 (en) * | 2014-04-07 | 2015-10-08 | Neta Holzer Zaslansky | Device, system, and method of automatically generating an animated content-item |
US9956407B2 (en) * | 2014-08-04 | 2018-05-01 | Cochlear Limited | Tonal deafness compensation in an auditory prosthesis system |
US20190147838A1 (en) * | 2014-08-22 | 2019-05-16 | Zya, Inc. | Systems and methods for generating animated multimedia compositions |
US10360716B1 (en) * | 2015-09-18 | 2019-07-23 | Amazon Technologies, Inc. | Enhanced avatar animation |
WO2017075452A1 (en) * | 2015-10-29 | 2017-05-04 | True Image Interactive, Inc | Systems and methods for machine-generated avatars |
US9911218B2 (en) * | 2015-12-01 | 2018-03-06 | Disney Enterprises, Inc. | Systems and methods for speech animation using visemes with phonetic boundary context |
US9837069B2 (en) * | 2015-12-22 | 2017-12-05 | Intel Corporation | Technologies for end-of-sentence detection using syntactic coherence |
US10217261B2 (en) * | 2016-02-18 | 2019-02-26 | Pinscreen, Inc. | Deep learning-based facial animation for head-mounted display |
JP6690484B2 (ja) * | 2016-09-15 | 2020-04-28 | 富士通株式会社 | Computer program for speech recognition, speech recognition device, and speech recognition method |
US11145100B2 (en) * | 2017-01-12 | 2021-10-12 | The Regents Of The University Of Colorado, A Body Corporate | Method and system for implementing three-dimensional facial modeling and visual speech synthesis |
US10839825B2 (en) * | 2017-03-03 | 2020-11-17 | The Governing Council Of The University Of Toronto | System and method for animated lip synchronization |
US10530928B1 (en) * | 2017-03-15 | 2020-01-07 | Noble Systems Corporation | Answering machine detection (“AMD”) for a contact center by using AMD meta-data |
JP6866715B2 (ja) * | 2017-03-22 | 2021-04-28 | カシオ計算機株式会社 | Information processing device, emotion recognition method, and program |
US12020686B2 (en) * | 2017-03-23 | 2024-06-25 | D&M Holdings Inc. | System providing expressive and emotive text-to-speech |
US10629223B2 (en) * | 2017-05-31 | 2020-04-21 | International Business Machines Corporation | Fast playback in media files with reduced impact to speech quality |
US10732708B1 (en) * | 2017-11-21 | 2020-08-04 | Amazon Technologies, Inc. | Disambiguation of virtual reality information using multi-modal data including speech |
US10586369B1 (en) * | 2018-01-31 | 2020-03-10 | Amazon Technologies, Inc. | Using dialog and contextual data of a virtual reality environment to create metadata to drive avatar animation |
US10643602B2 (en) * | 2018-03-16 | 2020-05-05 | Microsoft Technology Licensing, Llc | Adversarial teacher-student learning for unsupervised domain adaptation |
US11386900B2 (en) * | 2018-05-18 | 2022-07-12 | Deepmind Technologies Limited | Visual speech recognition by phoneme prediction |
US10699705B2 (en) * | 2018-06-22 | 2020-06-30 | Adobe Inc. | Using machine-learning models to determine movements of a mouth corresponding to live speech |
US11270487B1 (en) * | 2018-09-17 | 2022-03-08 | Facebook Technologies, Llc | Systems and methods for improving animation of computer-generated avatars |
US11238885B2 (en) * | 2018-10-29 | 2022-02-01 | Microsoft Technology Licensing, Llc | Computing system for expressive three-dimensional facial animation |
US10825224B2 (en) * | 2018-11-20 | 2020-11-03 | Adobe Inc. | Automatic viseme detection for generating animatable puppet |
US11024071B2 (en) * | 2019-01-02 | 2021-06-01 | Espiritu Technologies, Llc | Method of converting phoneme transcription data into lip sync animation data for 3D animation software |
US20200279553A1 (en) * | 2019-02-28 | 2020-09-03 | Microsoft Technology Licensing, Llc | Linguistic style matching agent |
US11049308B2 (en) * | 2019-03-21 | 2021-06-29 | Electronic Arts Inc. | Generating facial position data based on audio data |
US11627283B2 (en) * | 2019-05-09 | 2023-04-11 | Present Communications, Inc. | Method for enabling synthetic autopilot video functions and for publishing a synthetic video feed as a virtual camera during a video call |
US11671562B2 (en) * | 2019-05-09 | 2023-06-06 | Present Communications, Inc. | Method for enabling synthetic autopilot video functions and for publishing a synthetic video feed as a virtual camera during a video call |
US20230353707A1 (en) * | 2019-05-09 | 2023-11-02 | Present Communications, Inc. | Method for enabling synthetic autopilot video functions and for publishing a synthetic video feed as a virtual camera during a video call |
US11551393B2 (en) * | 2019-07-23 | 2023-01-10 | LoomAi, Inc. | Systems and methods for animation generation |
US11593984B2 (en) * | 2020-02-07 | 2023-02-28 | Apple Inc. | Using text for avatar animation |
US11417041B2 (en) * | 2020-02-12 | 2022-08-16 | Adobe Inc. | Style-aware audio-driven talking head animation from a single image |
US11244668B2 (en) * | 2020-05-29 | 2022-02-08 | TCL Research America Inc. | Device and method for generating speech animation |
US20210390949A1 (en) * | 2020-06-16 | 2021-12-16 | Netflix, Inc. | Systems and methods for phoneme and viseme recognition |
US11682153B2 (en) * | 2020-09-12 | 2023-06-20 | Jingdong Digits Technology Holding Co., Ltd. | System and method for synthesizing photo-realistic video of a speech |
US20230111633A1 (en) * | 2021-10-08 | 2023-04-13 | Accenture Global Solutions Limited | Lead conversion using conversational virtual avatar |
US20230130287A1 (en) * | 2021-10-27 | 2023-04-27 | Samsung Electronics Co., Ltd. | Light-weight machine learning models for lip sync animation on mobile devices or other devices |
-
2020
- 2020-01-27 AU AU2020211809A patent/AU2020211809A1/en active Pending
- 2020-01-27 CA CA3128047A patent/CA3128047A1/en active Pending
- 2020-01-27 JP JP2021541507A patent/JP7500582B2/ja active Active
- 2020-01-27 US US17/422,167 patent/US20220108510A1/en active Pending
- 2020-01-27 CN CN202080008157.2A patent/CN113383384A/zh active Pending
- 2020-01-27 WO PCT/IB2020/050620 patent/WO2020152657A1/en unknown
- 2020-01-27 KR KR1020217026491A patent/KR20210114521A/ko not_active Application Discontinuation
- 2020-01-27 EP EP20744394.6A patent/EP3915108B1/en active Active
Non-Patent Citations (2)
Title |
---|
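金子 正秀 et al. (2 others), Synthesis of moving facial images with mouth shape changes corresponding to text information, 電子情報通信学会論文誌, 1992-02-25, Vol. J75-D-II, No. 2, pp. 203-215 |
須崎 昌彦 et al. (3 others), Construction of a human interface using facial images, materials of the 25th ヒューマンインタフェースと認知モデル研究会, 社団法人人工知能学会, 1995-05-19, SIG-HICG-9501, pp. 30-37 |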
Also Published As
Publication number | Publication date |
---|---|
EP3915108C0 (en) | 2023-11-29 |
AU2020211809A1 (en) | 2021-07-29 |
EP3915108A1 (en) | 2021-12-01 |
US20220108510A1 (en) | 2022-04-07 |
EP3915108B1 (en) | 2023-11-29 |
CA3128047A1 (en) | 2020-07-30 |
EP3915108A4 (en) | 2022-09-07 |
JP2022518721A (ja) | 2022-03-16 |
WO2020152657A1 (en) | 2020-07-30 |
KR20210114521A (ko) | 2021-09-23 |
CN113383384A (zh) | 2021-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7500582B2 (ja) | Real-time generation of speech animation | |
Karras et al. | Audio-driven facial animation by joint end-to-end learning of pose and emotion | |
Cao et al. | Expressive speech-driven facial animation | |
US9959657B2 (en) | Computer generated head | |
US9361722B2 (en) | Synthetic audiovisual storyteller | |
Brand | Voice puppetry | |
Ezzat et al. | Miketalk: A talking facial display based on morphing visemes | |
US7663628B2 (en) | Apparatus and method for efficient animation of believable speaking 3D characters in real time | |
Chuang et al. | Mood swings: expressive speech animation | |
US8224652B2 (en) | Speech and text driven HMM-based body animation synthesis | |
JP3664474B2 (ja) | 視覚的スピーチの言語透過的合成 | |
Xu et al. | A practical and configurable lip sync method for games | |
US20140210831A1 (en) | Computer generated head | |
Albrecht et al. | Automatic generation of non-verbal facial expressions from speech | |
WO2024088321A1 (zh) | Avatar face driving method and apparatus, electronic device, and medium | |
CN116309984A (zh) | Text-driven mouth shape animation generation method and system | |
Breen et al. | An investigation into the generation of mouth shapes for a talking head | |
Pan et al. | Vocal: Vowel and consonant layering for expressive animator-centric singing animation | |
Minnis et al. | Modeling visual coarticulation in synthetic talking heads using a lip motion unit inventory with concatenative synthesis | |
CN113362432B (zh) | Facial animation generation method and apparatus | |
Verma et al. | Animating expressive faces across languages | |
d’Alessandro et al. | Reactive statistical mapping: Towards the sketching of performative control with data | |
Kolivand et al. | Realistic lip syncing for virtual character using common viseme set | |
Edge et al. | Model-based synthesis of visual speech movements from 3D video | |
Filntisis et al. | Video-realistic expressive audio-visual speech synthesis for the Greek |
Legal Events
Code | Title | Description |
---|---|---|
RD04 | Notification of resignation of power of attorney | Free format text: JAPANESE INTERMEDIATE CODE: A7424; Effective date: 20211115 |
RD02 | Notification of acceptance of power of attorney | Free format text: JAPANESE INTERMEDIATE CODE: A7422; Effective date: 20211214 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20230126 |
A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20230126 |
A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20231107 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20240205 |
TRDD | Decision of grant or rejection written | |
A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 20240507 |
A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20240605 |
R150 | Certificate of patent or registration of utility model | Ref document number: 7500582; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |