KR102649818B1 - Apparatus and method for generating 3D lip sync video - Google Patents

Apparatus and method for generating 3D lip sync video

Info

Publication number
KR102649818B1
Authority
KR
South Korea
Prior art keywords
video
feature vector
person
lip
speech
Prior art date
Application number
KR1020220064510A
Other languages
English (en)
Korean (ko)
Other versions
KR20230164854A (ko)
Inventor
채경수
김두현
곽희태
조혜진
이기혁
Original Assignee
주식회사 딥브레인에이아이
Priority date
Filing date
Publication date
Application filed by 주식회사 딥브레인에이아이
Priority to KR1020220064510A
Priority to PCT/KR2022/008364 (WO2023229091A1)
Publication of KR20230164854A
Application granted
Publication of KR102649818B1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transforming into visible information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/205 3D [Three Dimensional] animation driven by audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transforming into visible information
    • G10L2021/105 Synthesis of the lips movements from speech, e.g. for talking heads

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)
KR1020220064510A 2022-05-26 2022-05-26 Apparatus and method for generating 3D lip sync video KR102649818B1 (ko)

Priority Applications (2)

Application Number Publication Priority Date Filing Date Title
KR1020220064510A KR102649818B1 (ko) 2022-05-26 2022-05-26 Apparatus and method for generating 3D lip sync video
PCT/KR2022/008364 WO2023229091A1 (fr) 2022-05-26 2022-06-14 Apparatus and method for generating 3D lip sync video

Applications Claiming Priority (1)

Application Number Publication Priority Date Filing Date Title
KR1020220064510A KR102649818B1 (ko) 2022-05-26 2022-05-26 Apparatus and method for generating 3D lip sync video

Publications (2)

Publication Number Publication Date
KR20230164854A (ko) 2023-12-05
KR102649818B1 (ko) 2024-03-21

Family

ID=88919312

Family Applications (1)

Application Number Publication Priority Date Filing Date Title
KR1020220064510A KR102649818B1 (ko) 2022-05-26 2022-05-26 Apparatus and method for generating 3D lip sync video

Country Status (2)

Country Link
KR (1) KR102649818B1 (fr)
WO (1) WO2023229091A1 (fr)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
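KR101177408B1 (ko) 2010-09-16 2012-08-27 광운대학교 산학협력단 Multi-view based interactive holographic reconstruction apparatus and system that reconstructs holographic images according to the viewer's viewpoint
KR20170062089A (ko) * 2015-11-27 2017-06-07 주식회사 매니아마인드 Method and program for implementing facial expressions of a 3D avatar
CN110288682B (zh) * 2019-06-28 2023-09-26 北京百度网讯科技有限公司 Method and apparatus for controlling mouth shape changes of a three-dimensional virtual portrait
KR102483416B1 (ko) * 2020-08-25 2022-12-30 주식회사 딥브레인에이아이 Method and apparatus for generating speech video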

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Avisek Lahiri et al., ‘LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization’, CVPR 2021, 2021.06.19.*
Rithesh Kumar et al., ‘ObamaNet: Photo-realistic lip-sync from text’, arXiv:1801.01442v1 [cs.CV], 6 Dec 2017.*
Suzhen Wang et al., ‘One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning’, AAAI-22, February 2022.*

Also Published As

Publication number Publication date
WO2023229091A1 (fr) 2023-11-30
KR20230164854A (ko) 2023-12-05

Similar Documents

Publication Publication Date Title
KR102360839B1 (ko) Method and apparatus for generating speech video based on machine learning
JP6019108B2 (ja) Video generation based on text
US11514634B2 (en) Personalized speech-to-video with three-dimensional (3D) skeleton regularization and expressive body poses
US20220358703A1 (en) Method and device for generating speech video on basis of machine learning
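KR102509666B1 (ko) Text and audio-based real-time face reenactment
KR102346755B1 (ko) Method and apparatus for generating speech video using audio signal
KR101558202B1 (ko) Apparatus and method for generating animation using avatar
WO2022106654A2 (fr) Methods and systems for video translation
KR102437039B1 (ko) Learning apparatus and method for image generation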
US20020024519A1 (en) System and method for producing three-dimensional moving picture authoring tool supporting synthesis of motion, facial expression, lip synchronizing and lip synchronized voice of three-dimensional character
CN115004236A (zh) Photorealistic talking faces from audio
US20220399025A1 (en) Method and device for generating speech video using audio signal
CN110266973A (zh) Video processing method, apparatus, computer-readable storage medium, and computer device
US11972516B2 (en) Method and device for generating speech video by using text
US20220375190A1 (en) Device and method for generating speech video
KR102540763B1 (ko) Learning method for generating lip sync video based on machine learning and lip sync video generating apparatus for performing the same
US20220375224A1 (en) Device and method for generating speech video along with landmark
KR102360840B1 (ko) Method and apparatus for generating speech video using text
KR20220111388A (ko) Image synthesis apparatus and method capable of improving image quality
Chen et al. DualLip: A system for joint lip reading and generation
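KR20220111390A (ko) Image synthesis apparatus and method capable of improving image quality
KR102649818B1 (ko) Apparatus and method for generating 3D lip sync video
JP6291265B2 (ja) Sign language CG synthesis apparatus and program therefor
KR20200001902A (ko) Method and system for generating training data for a sign language recognition artificial neural network, and system for generating modified animation data
Morishima et al. Audio-visual speech translation with automatic lip synchronization and face tracking based on 3-d head model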

Legal Events

Date Code Title Description
AMND Amendment
E601 Decision to refuse application
AMND Amendment
X701 Decision to grant (after re-examination)
GRNT Written decision to grant