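CN101199207A - Method, system, and program product for measuring audio video synchronization independent of speaker characteristics - Google Patents
Method, system, and program product for measuring audio video synchronization independent of speaker characteristics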
- Publication number
- CN101199207A (application CN200580050133A, CNA2005800501339A)
- Authority
- CN
- China
- Prior art keywords
- video
- audio
- voice
- information
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 94
- 238000004458 analytical method Methods 0.000 claims abstract description 27
- 238000001228 spectrum Methods 0.000 claims description 22
- 230000001815 facial effect Effects 0.000 claims description 20
- 230000005236 sound signal Effects 0.000 claims description 14
- 238000005070 sampling Methods 0.000 claims description 12
- 238000003860 storage Methods 0.000 claims description 12
- 238000001914 filtration Methods 0.000 claims description 11
- 238000009826 distribution Methods 0.000 claims description 9
- 239000000284 extract Substances 0.000 claims description 7
- 239000012634 fragment Substances 0.000 claims description 7
- 230000008859 change Effects 0.000 claims description 6
- 238000006073 displacement reaction Methods 0.000 claims description 6
- 210000004704 glottis Anatomy 0.000 claims description 5
- 230000008676 import Effects 0.000 claims 3
- 230000015572 biosynthetic process Effects 0.000 claims 1
- 230000002596 correlated effect Effects 0.000 abstract 1
- 230000033001 locomotion Effects 0.000 description 26
- 230000001360 synchronised effect Effects 0.000 description 16
- 238000012545 processing Methods 0.000 description 12
- 238000004422 calculation algorithm Methods 0.000 description 9
- 230000005540 biological transmission Effects 0.000 description 7
- 238000001514 detection method Methods 0.000 description 7
- 210000000214 mouth Anatomy 0.000 description 6
- 238000005457 optimization Methods 0.000 description 6
- 230000008569 process Effects 0.000 description 6
- 230000008878 coupling Effects 0.000 description 5
- 238000010168 coupling process Methods 0.000 description 5
- 238000005859 coupling reaction Methods 0.000 description 5
- 238000010606 normalization Methods 0.000 description 5
- 230000004087 circulation Effects 0.000 description 4
- 230000000694 effects Effects 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 4
- 238000005259 measurement Methods 0.000 description 4
- 238000012549 training Methods 0.000 description 4
- 238000012937 correction Methods 0.000 description 3
- 230000003287 optical effect Effects 0.000 description 3
- 210000001260 vocal cord Anatomy 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 2
- 230000001427 coherent effect Effects 0.000 description 2
- 230000006835 compression Effects 0.000 description 2
- 238000007906 compression Methods 0.000 description 2
- 238000007796 conventional method Methods 0.000 description 2
- 238000013500 data storage Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 230000004069 differentiation Effects 0.000 description 2
- 230000008030 elimination Effects 0.000 description 2
- 238000003379 elimination reaction Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 238000012423 maintenance Methods 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 230000033764 rhythmic process Effects 0.000 description 2
- 230000003068 static effect Effects 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 230000001755 vocal effect Effects 0.000 description 2
- 241001269238 Data Species 0.000 description 1
- 101001077478 Homo sapiens RNA guanine-N7 methyltransferase activating subunit Proteins 0.000 description 1
- 241000023320 Luma <angiosperm> Species 0.000 description 1
- 102100025054 RNA guanine-N7 methyltransferase activating subunit Human genes 0.000 description 1
- 239000000654 additive Substances 0.000 description 1
- 230000000996 additive effect Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 239000002131 composite material Substances 0.000 description 1
- 239000012141 concentrate Substances 0.000 description 1
- 238000005520 cutting process Methods 0.000 description 1
- 230000002950 deficient Effects 0.000 description 1
- 238000009795 derivation Methods 0.000 description 1
- 230000005284 excitation Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 239000004744 fabric Substances 0.000 description 1
- 239000012530 fluid Substances 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 238000007689 inspection Methods 0.000 description 1
- 210000000867 larynx Anatomy 0.000 description 1
- 238000012417 linear regression Methods 0.000 description 1
- 230000004807 localization Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000013011 mating Effects 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- OSWPMRLSEDHDFF-UHFFFAOYSA-N methyl salicylate Chemical compound COC(=O)C1=CC=CC=C1O OSWPMRLSEDHDFF-UHFFFAOYSA-N 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 210000000056 organ Anatomy 0.000 description 1
- 230000010349 pulsation Effects 0.000 description 1
- 230000008054 signal transmission Effects 0.000 description 1
- 229910052709 silver Inorganic materials 0.000 description 1
- 239000004332 silver Substances 0.000 description 1
- 230000003595 spectral effect Effects 0.000 description 1
- 238000010183 spectrum analysis Methods 0.000 description 1
- 210000000115 thoracic cavity Anatomy 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Television Signal Processing For Recording (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
Claims (45)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
USPCT/US2005/0012588 | 2005-04-13 | | |
PCT/US2005/012588 WO2005115014A2 (en) | 2004-05-14 | 2005-04-13 | Method, system, and program product for measuring audio video synchronization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101199207A (zh) | 2008-06-11 |
Family
ID=37561747
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
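CNA2005800501339A Pending CN101199207A (zh) | 2005-04-13 | 2005-11-16 | Method, system, and program product for measuring audio video synchronization independent of speaker characteristics |
CNA2006800211843A Pending CN101199208A (zh) | 2005-04-13 | 2006-04-13 | Method, system, and program product for measuring audio video synchronization using lip and teeth characteristics |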
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNA2006800211843A Pending CN101199208A (zh) | 2005-04-13 | 2006-04-13 | Method, system, and program product for measuring audio video synchronization using lip and teeth characteristics |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP1938622A2 (zh) |
CN (2) | CN101199207A (zh) |
AU (1) | AU2005330569A1 (zh) |
CA (1) | CA2565758A1 (zh) |
GB (1) | GB2440384B (zh) |
WO (1) | WO2007035183A2 (zh) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
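CN108924646A (zh) * | 2018-07-18 | 2018-11-30 | 北京奇艺世纪科技有限公司 | Audio and video synchronization detection method and system |
CN109446947A (zh) * | 2011-06-10 | 2019-03-08 | 亚马逊技术公司 | Enhanced face recognition in video |
CN111988654A (zh) * | 2020-08-31 | 2020-11-24 | 维沃移动通信有限公司 | Video data alignment method, apparatus, and electronic device |
CN114466179A (zh) * | 2021-09-09 | 2022-05-10 | 马上消费金融股份有限公司 | Method and apparatus for measuring speech and image synchronization |
CN115965724A (zh) * | 2022-12-26 | 2023-04-14 | 华院计算技术(上海)股份有限公司 | Image generation method and apparatus, computer-readable storage medium, and terminal |
CN116230003A (zh) * | 2023-03-09 | 2023-06-06 | 湖北雅派文化传播有限公司 | Artificial-intelligence-based audio and video synchronization method and system |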
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012098425A1 (en) * | 2011-01-17 | 2012-07-26 | Nokia Corporation | An audio scene processing apparatus |
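CN105100647A (zh) * | 2015-07-31 | 2015-11-25 | 深圳市金立通信设备有限公司 | Method and terminal for correcting subtitles |
CN105512348B (zh) * | 2016-01-28 | 2019-03-26 | 北京旷视科技有限公司 | Method and apparatus for processing video and associated audio, and retrieval method and apparatus |
CN106067989B (zh) * | 2016-04-28 | 2022-05-17 | 江苏大学 | Portrait voice-video synchronization calibration device and method |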
US10997979B2 (en) * | 2018-06-21 | 2021-05-04 | Casio Computer Co., Ltd. | Voice recognition device and voice recognition method |
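CN108924617B (zh) * | 2018-07-11 | 2020-09-18 | 北京大米科技有限公司 | Method for synchronizing video data and audio data, storage medium, and electronic device |
CN109087651B (zh) * | 2018-09-05 | 2021-01-19 | 广州势必可赢网络科技有限公司 | Voiceprint identification method, system, and device based on video and spectrogram |
CN110691204B (zh) * | 2019-09-09 | 2021-04-02 | 苏州臻迪智能科技有限公司 | Audio and video processing method and apparatus, electronic device, and storage medium |
CN112653916B (zh) * | 2019-10-10 | 2023-08-29 | 腾讯科技(深圳)有限公司 | Method and device for audio and video synchronization optimization |
CN113497914B (zh) * | 2020-03-20 | 2024-08-30 | 浙江深象智能科技有限公司 | Information determination method and system, electronic device, autonomous mobile device, and camera |
CN112351273B (zh) * | 2020-11-04 | 2022-03-01 | 新华三大数据技术有限公司 | Video playback quality detection method and apparatus |
CN113242361B (zh) * | 2021-07-13 | 2021-09-24 | 腾讯科技(深圳)有限公司 | Video processing method and apparatus, and computer-readable storage medium |
CN114466178A (zh) * | 2021-09-09 | 2022-05-10 | 马上消费金融股份有限公司 | Method and apparatus for measuring speech and image synchronization |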
EP4344199A4 (en) * | 2021-09-09 | 2024-10-09 | Mashang Consumer Finance Co Ltd | SPEECH AND IMAGE SYNCHRONIZATION MEASURING METHOD AND APPARATUS, AND MODEL TRAINING METHOD AND APPARATUS |
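CN114494930B (zh) * | 2021-09-09 | 2023-09-22 | 马上消费金融股份有限公司 | Training method and apparatus for a speech and image synchronization measurement model |
CN114089285B (zh) * | 2022-01-24 | 2022-05-31 | 安徽京淮健锐电子科技有限公司 | Signal sorting method based on first-order pulse repetition interval (PRI) |
CN114550075A (zh) * | 2022-04-25 | 2022-05-27 | 北京华科海讯科技有限公司 | Parallel signal processing method and system based on video image recognition |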
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4313135B1 (en) * | 1980-07-28 | 1996-01-02 | J Carl Cooper | Method and apparatus for preserving or restoring audio to video |
US4975960A (en) * | 1985-06-03 | 1990-12-04 | Petajan Eric D | Electronic facial tracking and detection system and method and apparatus for automated speech recognition |
JPS62239231A (ja) * | 1986-04-10 | 1987-10-20 | Kiyarii Rabo:Kk | Speech recognition method using lip image input |
US5387943A (en) * | 1992-12-21 | 1995-02-07 | Tektronix, Inc. | Semiautomatic lip sync recovery system |
US5920842A (en) * | 1994-10-12 | 1999-07-06 | Pixel Instruments | Signal synchronization |
US5572261A (en) * | 1995-06-07 | 1996-11-05 | Cooper; J. Carl | Automatic audio to video timing measurement device and method |
US5880788A (en) * | 1996-03-25 | 1999-03-09 | Interval Research Corporation | Automated synchronization of video image sequences to new soundtracks |
US6829018B2 (en) * | 2001-09-17 | 2004-12-07 | Koninklijke Philips Electronics N.V. | Three-dimensional sound creation assisted by visual information |
-
2005
- 2005-11-16 CA CA002565758A patent/CA2565758A1/en not_active Abandoned
- 2005-11-16 EP EP05851741A patent/EP1938622A2/en not_active Withdrawn
- 2005-11-16 AU AU2005330569A patent/AU2005330569A1/en not_active Abandoned
- 2005-11-16 WO PCT/US2005/041623 patent/WO2007035183A2/en active Application Filing
- 2005-11-16 CN CNA2005800501339A patent/CN101199207A/zh active Pending
-
2006
- 2006-04-13 GB GB0622592A patent/GB2440384B/en not_active Expired - Fee Related
- 2006-04-13 CN CNA2006800211843A patent/CN101199208A/zh active Pending
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
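CN109446947A (zh) * | 2011-06-10 | 2019-03-08 | 亚马逊技术公司 | Enhanced face recognition in video |
CN109446947B (zh) * | 2011-06-10 | 2020-07-17 | 亚马逊技术公司 | Enhanced face recognition in video |
CN108924646A (zh) * | 2018-07-18 | 2018-11-30 | 北京奇艺世纪科技有限公司 | Audio and video synchronization detection method and system |
CN111988654A (zh) * | 2020-08-31 | 2020-11-24 | 维沃移动通信有限公司 | Video data alignment method, apparatus, and electronic device |
CN114466179A (zh) * | 2021-09-09 | 2022-05-10 | 马上消费金融股份有限公司 | Method and apparatus for measuring speech and image synchronization |
CN114466179B (zh) * | 2021-09-09 | 2024-09-06 | 马上消费金融股份有限公司 | Method and apparatus for measuring speech and image synchronization |
CN115965724A (zh) * | 2022-12-26 | 2023-04-14 | 华院计算技术(上海)股份有限公司 | Image generation method and apparatus, computer-readable storage medium, and terminal |
CN115965724B (zh) * | 2022-12-26 | 2023-08-08 | 华院计算技术(上海)股份有限公司 | Image generation method and apparatus, computer-readable storage medium, and terminal |
CN116230003A (zh) * | 2023-03-09 | 2023-06-06 | 湖北雅派文化传播有限公司 | Artificial-intelligence-based audio and video synchronization method and system |
CN116230003B (zh) * | 2023-03-09 | 2024-04-26 | 北京安捷智合科技有限公司 | Artificial-intelligence-based audio and video synchronization method and system |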
Also Published As
Publication number | Publication date |
---|---|
AU2005330569A1 (en) | 2006-12-07 |
GB2440384B (en) | 2010-01-13 |
WO2007035183A2 (en) | 2007-03-29 |
CA2565758A1 (en) | 2006-10-13 |
EP1938622A2 (en) | 2008-07-02 |
WO2007035183A3 (en) | 2007-06-21 |
GB0622592D0 (en) | 2006-12-27 |
GB2440384A (en) | 2008-01-30 |
CN101199208A (zh) | 2008-06-11 |
AU2005330569A8 (en) | 2008-08-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101199207A (zh) | Method, system, and program product for measuring audio video synchronization independent of speaker characteristics | |
US10397646B2 (en) | Method, system, and program product for measuring audio video synchronization using lip and teeth characteristics | |
US20080111887A1 (en) | Method, system, and program product for measuring audio video synchronization independent of speaker characteristics | |
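CN105512348B (zh) | Method and apparatus for processing video and associated audio, and retrieval method and apparatus | |
CN112037788B (zh) | Speech correction fusion method | |
CN110807585A (zh) | Online evaluation method and system for students' classroom learning state | |
CN108648527B (zh) | English pronunciation matching and correction method | |
JP2010256391A (ja) | Speech information processing device | |
CN110991238A (zh) | Public-speaking assistance system based on speech emotion analysis and micro-expression recognition | |
CN109326160A (zh) | Spoken English pronunciation proofreading system | |
CN112786052A (zh) | Speech recognition method, electronic device, and storage device | |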
US20070153125A1 (en) | Method, system, and program product for measuring audio video synchronization | |
CN108470476B (zh) | English pronunciation matching and correction system | |
Hassan et al. | Autonomous framework for person identification by analyzing vocal sounds and speech patterns | |
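CN109545196B (zh) | Speech recognition method and apparatus, and computer-readable storage medium | |
CN112584238A (zh) | Film and television resource matching method and apparatus, and smart TV | |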
Eyben et al. | Audiovisual vocal outburst classification in noisy acoustic conditions | |
CN107123420A (zh) | Speech recognition system and interaction method thereof | |
Borgstrom et al. | A low-complexity parabolic lip contour model with speaker normalization for high-level feature extraction in noise-robust audiovisual speech recognition | |
WO2006113409A2 (en) | Method, system, and program product for measuring audio video synchronization using lip and teeth charateristics | |
Wojdeł | Automatic lipreading in the Dutch language | |
Shen et al. | Automatic lip-synchronized video-self-modeling intervention for voice disorders | |
AU2006235990A8 (en) | Method, system, and program product for measuring audio video synchronization using lip and teeth charateristics | |
Nellore et al. | Excitation Source and Vocal Tract System Based Acoustic Features for Detection of Nasals in Continuous Speech. | |
BE | AUDIO AND VISUAL FEATURE ANALYSIS FOR SPEECH RECOGNITION |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| CI01 | Publication of corrected invention patent application | Correction item: Priority; Correct: PCT/US2005/012588; False: PCT/US2005/0012588; Number: 24; Page: 1122; Volume: 24 |
| CI02 | Correction of invention patent application | Correction item: Priority; Correct: PCT/US2005/012588; False: PCT/US2005/0012588; Number: 24; Page: The title page; Volume: 24 |
| COR | Change of bibliographic data | Free format text: CORRECT: PRIORITY; FROM: PCT/US2005/0012588 TO: PCT/US2005/012588 |
| ERR | Gazette correction | Free format text: CORRECT: PRIORITY; FROM: PCT/US2005/0012588 TO: PCT/US2005/012588 |
| C02 | Deemed withdrawal of patent application after publication (patent law 2001) | |
| WD01 | Invention patent application deemed withdrawn after publication | Open date: 20080611 |