CN108538282A - A Method for Directly Generating Speech from Lip Video - Google Patents
A Method for Directly Generating Speech from Lip Video
- Publication number
- CN108538282A
- Authority
- CN
- China
- Prior art keywords
- lip
- video
- labial
- vector
- feature vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Concepts (machine-extracted)
Name | Type | Sections | Count |
---|---|---|---|
method | Methods | title, claims, abstract, description | 31 |
vector | Substances | claims, abstract, description | 57 |
biosynthetic process | Effects | claims, abstract, description | 13 |
synthesis reaction | Methods | claims, abstract, description | 13 |
chemical reaction | Methods | claims, abstract, description | 9 |
extract | Substances | claims, description | 5 |
correction | Methods | claims, description | 3 |
training | Methods | abstract, description | 6 |
artificial neural network | Methods | description | 3 |
extraction | Methods | description | 3 |
process | Effects | description | 3 |
processing | Methods | description | 3 |
mixture | Substances | description | 2 |
Data | Species | description | 1 |
adaptive effect | Effects | description | 1 |
change | Effects | description | 1 |
deep learning | Methods | description | 1 |
interaction | Effects | description | 1 |
interactive effect | Effects | description | 1 |
machine learning | Methods | description | 1 |
rendering | Methods | description | 1 |
sampling | Methods | description | 1 |
sequencing technique | Methods | description | 1 |
temporal effect | Effects | description | 1 |
transfer | Methods | description | 1 |
transformation | Effects | description | 1 |
translation | Methods | description | 1 |
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/57—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Signal Processing (AREA)
- Psychiatry (AREA)
- Software Systems (AREA)
- Social Psychology (AREA)
- General Health & Medical Sciences (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810214692.8A CN108538282B (zh) | 2018-03-15 | 2018-03-15 | A method for directly generating speech from lip video |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810214692.8A CN108538282B (zh) | 2018-03-15 | 2018-03-15 | A method for directly generating speech from lip video |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108538282A (zh) | 2018-09-14 |
CN108538282B (zh) | 2021-10-08 |
Family ID: 63483616
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810214692.8A Active CN108538282B (zh) | A method for directly generating speech from lip video | 2018-03-15 | 2018-03-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108538282B (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111916054A (zh) * | 2020-07-08 | 2020-11-10 | 标贝(北京)科技有限公司 | Lip-shape-based speech generation method, apparatus and system, and storage medium |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN2106451U (zh) * | 1991-06-17 | 1992-06-03 | 上海电力学院 | Audio signal gap detection device |
US6332123B1 (en) * | 1989-03-08 | 2001-12-18 | Kokusai Denshin Denwa Kabushiki Kaisha | Mouth shape synthesizing |
US20040120554A1 (en) * | 2002-12-21 | 2004-06-24 | Lin Stephen Ssu-Te | System and method for real time lip synchronization |
CN1556496A (zh) * | 2003-12-31 | 2004-12-22 | 天津大学 | Lip-shape recognition sound generator |
CN101482976A (zh) * | 2009-01-19 | 2009-07-15 | 腾讯科技(深圳)有限公司 | Method for driving lip shape changes with speech, and method and apparatus for obtaining lip animation |
CN101510256A (zh) * | 2009-03-20 | 2009-08-19 | 深圳华为通信技术有限公司 | Mouth-shape language conversion method and apparatus |
JP2009251199A (ja) * | 2008-04-04 | 2009-10-29 | Oki Electric Ind Co Ltd | Speech synthesis apparatus, method and program |
CN101751692A (zh) * | 2009-12-24 | 2010-06-23 | 四川大学 | Method for speech-driven lip animation |
CN105632497A (zh) * | 2016-01-06 | 2016-06-01 | 昆山龙腾光电有限公司 | Speech output method and speech output system |
CN105654952A (zh) * | 2014-11-28 | 2016-06-08 | 三星电子株式会社 | Electronic device, server and method for outputting speech |
US20170039440A1 (en) * | 2015-08-07 | 2017-02-09 | International Business Machines Corporation | Visual liveness detection |
Worldwide applications
- 2018-03-15: CN, CN201810214692.8A (patent CN108538282B), Active
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6332123B1 (en) * | 1989-03-08 | 2001-12-18 | Kokusai Denshin Denwa Kabushiki Kaisha | Mouth shape synthesizing |
CN2106451U (zh) * | 1991-06-17 | 1992-06-03 | 上海电力学院 | Audio signal gap detection device |
US20040120554A1 (en) * | 2002-12-21 | 2004-06-24 | Lin Stephen Ssu-Te | System and method for real time lip synchronization |
US20060204060A1 (en) * | 2002-12-21 | 2006-09-14 | Microsoft Corporation | System and method for real time lip synchronization |
CN1556496A (zh) * | 2003-12-31 | 2004-12-22 | 天津大学 | Lip-shape recognition sound generator |
JP2009251199A (ja) * | 2008-04-04 | 2009-10-29 | Oki Electric Ind Co Ltd | Speech synthesis apparatus, method and program |
CN101482976A (zh) * | 2009-01-19 | 2009-07-15 | 腾讯科技(深圳)有限公司 | Method for driving lip shape changes with speech, and method and apparatus for obtaining lip animation |
CN101510256A (zh) * | 2009-03-20 | 2009-08-19 | 深圳华为通信技术有限公司 | Mouth-shape language conversion method and apparatus |
CN101751692A (zh) * | 2009-12-24 | 2010-06-23 | 四川大学 | Method for speech-driven lip animation |
CN105654952A (zh) * | 2014-11-28 | 2016-06-08 | 三星电子株式会社 | Electronic device, server and method for outputting speech |
US20170039440A1 (en) * | 2015-08-07 | 2017-02-09 | International Business Machines Corporation | Visual liveness detection |
CN105632497A (zh) * | 2016-01-06 | 2016-06-01 | 昆山龙腾光电有限公司 | Speech output method and speech output system |
Non-Patent Citations (3)
Title |
---|
ASSAEL Y M: "LipNet: end-to-end sentence-level", HTTPS://ARXIV.ORG/ABS/1611.01599 * |
J.MA, ET AL.: "Accurate visible speech synthesis based on concatenating variable length motion capture data", IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS * |
陈峰 (CHEN Feng) et al.: "Conversion from LPC-10e to MELP speech coding", Computer Engineering and Applications (计算机工程与应用) * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
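CN111916054A (zh) * | 2020-07-08 | 2020-11-10 | 标贝(北京)科技有限公司 | Lip-shape-based speech generation method, apparatus and system, and storage medium |
CN111916054B (zh) * | 2020-07-08 | 2024-04-26 | 标贝(青岛)科技有限公司 | Lip-shape-based speech generation method, apparatus and system, and storage medium |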
Also Published As
Publication number | Publication date |
---|---|
CN108538282B (zh) | 2021-10-08 |
Similar Documents
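Publication | Title |
---|---|
CN111243626B (zh) | A talking video generation method and system |
CN113192161B (zh) | Virtual human avatar video generation method, system, apparatus and storage medium |
Ephrat et al. | Vid2speech: speech reconstruction from silent video |
CN110853670B (zh) | Music-driven dance generation method |
EP0225729B1 (en) | Image encoding and synthesis |
CN103650002B (zh) | Text-based video generation |
WO2018049979A1 (zh) | Animation synthesis method and apparatus |
CN112562722A (zh) | Semantics-based audio-driven digital human generation method and system |
CN112465935A (zh) | Virtual avatar synthesis method and apparatus, electronic device and storage medium |
CN116250036A (zh) | System and method for synthesizing photorealistic video of speech |
CN113378806B (zh) | Audio-driven facial animation generation method and system incorporating emotion encoding |
CN108538283A (zh) | Method for converting lip image features into speech coding parameters |
CN111459450A (zh) | Driving method, apparatus and device for an interactive object, and storage medium |
KR20220097121A (ko) | Apparatus and method for mouth-shape synthesis using a random-nulling artificial neural network |
CN112308949A (zh) | Model training and face image generation method and apparatus, and storage medium |
ITTO20000303A1 (it) | Method for animating a synthesized human face model driven by an audio signal |
CN114419702B (zh) | Digital human generation model, model training method, and digital human generation method |
CN108648745B (zh) | Method for converting a lip image sequence into speech coding parameters |
CN113838174B (zh) | Audio-driven facial animation generation method, apparatus, device and medium |
CN116597857A (zh) | Speech-driven image method, system, apparatus and storage medium |
CN115187704A (zh) | Virtual anchor generation method, apparatus, device and storage medium |
CN113782042B (zh) | Speech synthesis method, vocoder training method, apparatus, device and medium |
Sui et al. | A 3D audio-visual corpus for speech recognition |
CN108538282A (zh) | A method for directly generating speech from lip video |
CN117409121A (zh) | Fine-grained emotion-controlled talking-face video generation method driven by audio and a single image, system, device and medium |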
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP03 | Change of name, title or address | Address after: 200090 No. 2103, Pingliang Road, Shanghai, Yangpu District; Patentee after: Shanghai University of Electric Power; Country or region after: China; Address before: 200090 No. 2103, Pingliang Road, Shanghai, Yangpu District; Patentee before: SHANGHAI University OF ELECTRIC POWER; Country or region before: China |
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20180914; Assignee: Shanghai Hongce Electric Power Technology Co.,Ltd.; Assignor: Shanghai University of Electric Power; Contract record no.: X2024310000053; Denomination of invention: A Method for Directly Generating Speech from Lip Video; Granted publication date: 20211008; License type: Common License; Record date: 20240509 |