KR102649818B1 - Apparatus and method for generating 3D lip-sync video - Google Patents
- Publication number: KR102649818B1 (application KR1020220064510A)
- Authority: KR (South Korea)
- Prior art keywords: video, feature vector, person, lip, speech
Concepts (machine-extracted)
Concept ID | Name | Domain | Query match | Sections | Count |
---|---|---|---|---|---|
238000000034 | method | Methods | 0.000 | title, claims, abstract, description | 29 |
239000000284 | extract | Substances | 0.000 | claims, description | 22 |
230000008602 | contraction | Effects | 0.000 | claims, description | 20 |
210000003205 | muscle | Anatomy | 0.000 | claims, description | 20 |
210000002396 | uvula | Anatomy | 0.000 | claims, description | 20 |
238000010801 | machine learning | Methods | 0.000 | claims, description | 8 |
238000006243 | chemical reaction | Methods | 0.000 | claims, description | 3 |
238000010586 | diagram | Methods | 0.000 | description | 19 |
239000003550 | marker | Substances | 0.000 | description | 9 |
230000006870 | function | Effects | 0.000 | description | 8 |
238000004891 | communication | Methods | 0.000 | description | 5 |
238000011176 | pooling | Methods | 0.000 | description | 4 |
238000013473 | artificial intelligence | Methods | 0.000 | description | 2 |
230000001815 | facial effect | Effects | 0.000 | description | 2 |
230000014509 | gene expression | Effects | 0.000 | description | 2 |
238000003062 | neural network model | Methods | 0.000 | description | 2 |
238000005070 | sampling | Methods | 0.000 | description | 2 |
230000002194 | synthesizing effect | Effects | 0.000 | description | 2 |
238000005516 | engineering process | Methods | 0.000 | description | 1 |
238000003384 | imaging method | Methods | 0.000 | description | 1 |
230000003287 | optical effect | Effects | 0.000 | description | 1 |
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
          - G10L21/10—Transforming into visible information
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T13/00—Animation
        - G06T13/20—3D [Three Dimensional] animation
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T13/00—Animation
        - G06T13/20—3D [Three Dimensional] animation
          - G06T13/205—3D [Three Dimensional] animation driven by audio data
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L13/00—Speech synthesis; Text to speech systems
        - G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/04—Segmentation; Word boundary detection
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
          - G10L21/10—Transforming into visible information
            - G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Processing Or Creating Images (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020220064510A KR102649818B1 (ko) | 2022-05-26 | 2022-05-26 | Apparatus and method for generating 3D lip-sync video |
PCT/KR2022/008364 WO2023229091A1 (fr) | 2022-05-26 | 2022-06-14 | Apparatus and method for generating 3D lip-sync video |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020220064510A KR102649818B1 (ko) | 2022-05-26 | 2022-05-26 | Apparatus and method for generating 3D lip-sync video |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20230164854A (ko) | 2023-12-05 |
KR102649818B1 (ko) | 2024-03-21 |
Family ID: 88919312
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020220064510A KR102649818B1 (ko) | 2022-05-26 | 2022-05-26 | Apparatus and method for generating 3D lip-sync video |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR102649818B1 (fr) |
WO (1) | WO2023229091A1 (fr) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
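KR101177408B1 (ko) | 2010-09-16 | 2012-08-27 | 광운대학교 산학협력단 | Multi-view-based interactive holographic reconstruction apparatus and system for reconstructing holographic images according to the viewer's viewpoint |
KR20170062089A (ko) * | 2015-11-27 | 2017-06-07 | 주식회사 매니아마인드 | Method and program for implementing facial expressions of a 3D avatar |
CN110288682B (zh) * | 2019-06-28 | 2023-09-26 | 北京百度网讯科技有限公司 | Method and apparatus for controlling mouth-shape changes of a 3D virtual portrait |
KR102483416B1 (ko) * | 2020-08-25 | 2022-12-30 | 주식회사 딥브레인에이아이 | Method and apparatus for generating speech video |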
2022
- 2022-05-26: KR1020220064510A (KR), published as KR102649818B1; status: active (IP Right Grant)
- 2022-06-14: PCT/KR2022/008364 (WO), published as WO2023229091A1; status: unknown
Non-Patent Citations (3)
Title |
---|
Avisek Lahiri et al., ‘LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization’, CVPR 2021, 2021.06.19.* |
Rithesh Kumar et al., ‘ObamaNet: Photo-realistic lip-sync from text’, arXiv:1801.01442v1 [cs.CV], 6 Dec 2017.* |
Suzhen Wang et al., ‘One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning’, AAAI-22, February 2022.* |
Also Published As
Publication number | Publication date |
---|---|
WO2023229091A1 (fr) | 2023-11-30 |
KR20230164854A (ko) | 2023-12-05 |
Similar Documents
Publication | Title |
---|---|
KR102360839B1 (ko) | Method and apparatus for generating speech video based on machine learning |
JP6019108B2 (ja) | Text-based video generation |
US11514634B2 (en) | Personalized speech-to-video with three-dimensional (3D) skeleton regularization and expressive body poses |
US20220358703A1 (en) | Method and device for generating speech video on basis of machine learning |
KR102509666B1 (ko) | Real-time face reenactment based on text and audio |
KR102346755B1 (ko) | Method and apparatus for generating speech video using an audio signal |
KR101558202B1 (ko) | Apparatus and method for generating animation using an avatar |
WO2022106654A2 (fr) | Methods and systems for video translation |
KR102437039B1 (ko) | Training apparatus and method for image generation |
US20020024519A1 (en) | System and method for producing three-dimensional moving picture authoring tool supporting synthesis of motion, facial expression, lip synchronizing and lip synchronized voice of three-dimensional character |
CN115004236A (zh) | Photorealistic talking faces from audio |
US20220399025A1 (en) | Method and device for generating speech video using audio signal |
CN110266973A (zh) | Video processing method and apparatus, computer-readable storage medium, and computer device |
US11972516B2 (en) | Method and device for generating speech video by using text |
US20220375190A1 (en) | Device and method for generating speech video |
KR102540763B1 (ko) | Training method for machine-learning-based lip-sync video generation, and lip-sync video generation apparatus for performing the same |
US20220375224A1 (en) | Device and method for generating speech video along with landmark |
KR102360840B1 (ko) | Method and apparatus for generating speech video using text |
KR20220111388A (ko) | Image synthesis apparatus and method capable of improving image quality |
Chen et al. | DualLip: A system for joint lip reading and generation |
KR20220111390A (ko) | Image synthesis apparatus and method capable of improving image quality |
KR102649818B1 (ko) | Apparatus and method for generating 3D lip-sync video |
JP6291265B2 (ja) | Sign language CG synthesis apparatus and program therefor |
KR20200001902A (ko) | Method and system for generating training data for a sign-language-recognition artificial neural network, and system for generating deformed animation data |
Morishima et al. | Audio-visual speech translation with automatic lip synchronization and face tracking based on 3-D head model |
Legal Events
Code | Title |
---|---|
AMND | Amendment |
E601 | Decision to refuse application |
AMND | Amendment |
X701 | Decision to grant (after re-examination) |
GRNT | Written decision to grant |