GB2440384A - Method,system and program product for measuring audio video synchronization using lip and teeth characteristics - Google Patents
- Publication number: GB2440384A (application number GB0622592A)
- Authority: GB (United Kingdom)
- Prior art keywords: audio, video, information, program product, determined
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
Method, system, and program product for measuring audio video synchronization. Audio video information is first acquired into an audio video synchronization system, after which the audio information and the video information are analyzed. The audio information is analyzed to locate sounds related to the speaker's personal voice characteristics. In the analysis phase, audio and video MuEvs (mutual events) are calculated from the audio and video information, and the audio and video information is classified into sound classes including the vowel sounds AA, EE and OO, the consonant sounds B, V, TH and F, silence, other sounds, and unclassified phonemes. The inner space between the lips is also identified and measured. This information is used to determine the dominant audio class for each video frame and associate it with that frame. Matching locations are then determined, and from them the offset between video and audio is determined.
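The abstract does not spell out how the dominant audio class is chosen or how matching locations yield the offset. The Python sketch below illustrates one plausible reading of those last two steps and is not the patented method. It assumes that audio has already been classified per short analysis frame and video per frame into one shared label set (hypothetical labels such as "AA", "EE", "B", "silence"), that the dominant audio class of a video frame is a simple majority vote, and that the offset is the frame shift with the highest fraction of agreeing classes. The function names `dominant_audio_class` and `estimate_offset` and the parameters `min_overlap` and `max_offset` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional, Sequence


def dominant_audio_class(audio_labels: Sequence[str], audio_rate: int,
                         video_fps: int, n_video_frames: int) -> list[Optional[str]]:
    """Assign each video frame the majority audio class (hypothetical rule).

    audio_rate is audio analysis frames per second and video_fps the video
    frame rate; both are assumed integers so the mapping uses exact arithmetic.
    """
    votes = [Counter() for _ in range(n_video_frames)]
    for i, label in enumerate(audio_labels):
        v = i * video_fps // audio_rate  # video frame containing audio frame i's start
        if v < n_video_frames:
            votes[v][label] += 1
    # most_common(1) breaks ties by first occurrence; None marks frames with no audio.
    return [c.most_common(1)[0][0] if c else None for c in votes]


def estimate_offset(audio_per_frame: Sequence[Optional[str]],
                    video_labels: Sequence[Optional[str]],
                    max_offset: int, min_overlap: int = 3) -> tuple[int, float]:
    """Return (k, score): the shift in video frames where classes agree best.

    A positive k means the audio class at frame f + k matches the video class
    at frame f, i.e. the audio lags the video by k frames. The score is the
    fraction of compared frames that agree; it stays -1.0 if no shift had at
    least min_overlap comparable frames.
    """
    best_k, best_score = 0, -1.0
    # Visit small shifts first so that ties resolve toward the smallest |k|.
    for k in sorted(range(-max_offset, max_offset + 1), key=abs):
        matches = compared = 0
        for f, v in enumerate(video_labels):
            j = f + k
            if not 0 <= j < len(audio_per_frame):
                continue
            a = audio_per_frame[j]
            if a is None or v is None:
                continue  # skip frames lacking a usable class on either side
            compared += 1
            matches += a == v
        if compared >= min_overlap and matches / compared > best_score:
            best_k, best_score = k, matches / compared
    return best_k, best_score


if __name__ == "__main__":
    # 100 audio analysis frames/s folded onto 25 fps video: 4 audio frames per video frame.
    print(dominant_audio_class(
        ["AA", "AA", "EE", "AA", "silence", "silence", "silence", "OO"], 100, 25, 3))
    # ['AA', 'silence', None]
    video = ["AA", "EE", "OO", "B", "silence", "F", "AA", "OO", "EE", "B"]
    audio = [None, None] + video[:-2]  # same classes, delayed by two frames
    print(estimate_offset(audio, video, max_offset=4))  # (2, 1.0)
```

At 25 fps, an estimated offset of 2 frames corresponds to 2/25 s = 80 ms of audio delay. The plain agreement ratio is a stand-in; a practical system would likely weight matches by classification confidence and by how informative each class is (silence, for example, is common and matches easily).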
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2005/012588 WO2005115014A2 (en) | 2004-05-14 | 2005-04-13 | Method, system, and program product for measuring audio video synchronization |
PCT/US2005/041623 WO2007035183A2 (en) | 2005-04-13 | 2005-11-16 | Method, system, and program product for measuring audio video synchronization independent of speaker characteristics |
PCT/US2006/014023 WO2006113409A2 (en) | 2005-04-13 | 2006-04-13 | Method, system, and program product for measuring audio video synchronization using lip and teeth charateristics |
Publications (3)
Publication Number | Publication Date |
---|---|
GB0622592D0 (en) | 2006-12-27 |
GB2440384A (en) | 2008-01-30 |
GB2440384B (en) | 2010-01-13 |
Family
Family ID: 37561747
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
GB0622592A | Expired - Fee Related | GB2440384B (en) | 2005-04-13 | 2006-04-13 | Method,system and program product for measuring audio video synchronization using lip and teeth characteristics |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP1938622A2 (en) |
CN (2) | CN101199207A (en) |
AU (1) | AU2005330569A1 (en) |
CA (1) | CA2565758A1 (en) |
GB (1) | GB2440384B (en) |
WO (1) | WO2007035183A2 (en) |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2666160A4 (en) * | 2011-01-17 | 2014-07-30 | Nokia Corp | An audio scene processing apparatus |
US8705812B2 (en) * | 2011-06-10 | 2014-04-22 | Amazon Technologies, Inc. | Enhanced face recognition in video |
CN105100647A (en) * | 2015-07-31 | 2015-11-25 | 深圳市金立通信设备有限公司 | Subtitle correction method and terminal |
CN105512348B (en) * | 2016-01-28 | 2019-03-26 | 北京旷视科技有限公司 | For handling the method and apparatus and search method and device of video and related audio |
CN106067989B (en) * | 2016-04-28 | 2022-05-17 | 江苏大学 | Portrait voice video synchronous calibration device and method |
US10997979B2 (en) * | 2018-06-21 | 2021-05-04 | Casio Computer Co., Ltd. | Voice recognition device and voice recognition method |
CN108924617B (en) * | 2018-07-11 | 2020-09-18 | 北京大米科技有限公司 | Method of synchronizing video data and audio data, storage medium, and electronic device |
CN108924646B (en) * | 2018-07-18 | 2021-02-09 | 北京奇艺世纪科技有限公司 | Audio and video synchronization detection method and system |
CN109087651B (en) * | 2018-09-05 | 2021-01-19 | 广州势必可赢网络科技有限公司 | Voiceprint identification method, system and equipment based on video and spectrogram |
CN110691204B (en) * | 2019-09-09 | 2021-04-02 | 苏州臻迪智能科技有限公司 | Audio and video processing method and device, electronic equipment and storage medium |
CN112653916B (en) * | 2019-10-10 | 2023-08-29 | 腾讯科技(深圳)有限公司 | Method and equipment for synchronously optimizing audio and video |
CN113497914B (en) * | 2020-03-20 | 2024-08-30 | 浙江深象智能科技有限公司 | Information determination method and system, electronic device, autonomous mobile device and camera |
CN111988654B (en) * | 2020-08-31 | 2022-10-18 | 维沃移动通信有限公司 | Video data alignment method and device and electronic equipment |
CN112351273B (en) * | 2020-11-04 | 2022-03-01 | 新华三大数据技术有限公司 | Video playing quality detection method and device |
CN113242361B (en) * | 2021-07-13 | 2021-09-24 | 腾讯科技(深圳)有限公司 | Video processing method and device and computer readable storage medium |
CN114466178A (en) * | 2021-09-09 | 2022-05-10 | 马上消费金融股份有限公司 | Method and device for measuring synchronism of voice and image |
CN114466179B (en) * | 2021-09-09 | 2024-09-06 | 马上消费金融股份有限公司 | Method and device for measuring synchronism of voice and image |
EP4344199A4 (en) * | 2021-09-09 | 2024-10-09 | Mashang Consumer Finance Co Ltd | Speech and image synchronization measurement method and apparatus, and model training method and apparatus |
CN114494930B (en) * | 2021-09-09 | 2023-09-22 | 马上消费金融股份有限公司 | Training method and device for voice and image synchronism measurement model |
CN114089285B (en) * | 2022-01-24 | 2022-05-31 | 安徽京淮健锐电子科技有限公司 | Signal sorting method based on first-order Pulse Repetition Interval (PRI) |
CN114550075A (en) * | 2022-04-25 | 2022-05-27 | 北京华科海讯科技有限公司 | Parallel signal processing method and system based on video image recognition |
CN115965724B (en) * | 2022-12-26 | 2023-08-08 | 华院计算技术(上海)股份有限公司 | Image generation method and device, computer readable storage medium and terminal |
CN116230003B (en) * | 2023-03-09 | 2024-04-26 | 北京安捷智合科技有限公司 | Audio and video synchronization method and system based on artificial intelligence |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4313135B1 (en) * | 1980-07-28 | 1996-01-02 | J Carl Cooper | Method and apparatus for preserving or restoring audio to video |
US4975960A (en) * | 1985-06-03 | 1990-12-04 | Petajan Eric D | Electronic facial tracking and detection system and method and apparatus for automated speech recognition |
JPS62239231A (en) * | 1986-04-10 | 1987-10-20 | Kiyarii Rabo:Kk | Speech recognition method by inputting lip picture |
US5387943A (en) * | 1992-12-21 | 1995-02-07 | Tektronix, Inc. | Semiautomatic lip sync recovery system |
US5920842A (en) * | 1994-10-12 | 1999-07-06 | Pixel Instruments | Signal synchronization |
US5572261A (en) * | 1995-06-07 | 1996-11-05 | Cooper; J. Carl | Automatic audio to video timing measurement device and method |
US5880788A (en) * | 1996-03-25 | 1999-03-09 | Interval Research Corporation | Automated synchronization of video image sequences to new soundtracks |
US6829018B2 (en) * | 2001-09-17 | 2004-12-07 | Koninklijke Philips Electronics N.V. | Three-dimensional sound creation assisted by visual information |
2005
- 2005-11-16 WO PCT/US2005/041623 (WO2007035183A2): Active, Application Filing
- 2005-11-16 EP EP05851741A (EP1938622A2): Not active, Withdrawn
- 2005-11-16 CN CNA2005800501339A (CN101199207A): Active, Pending
- 2005-11-16 CA CA002565758A (CA2565758A1): Not active, Abandoned
- 2005-11-16 AU AU2005330569A (AU2005330569A1): Not active, Abandoned
2006
- 2006-04-13 GB GB0622592A (GB2440384B): Not active, Expired - Fee Related
- 2006-04-13 CN CNA2006800211843A (CN101199208A): Active, Pending
Non-Patent Citations (1)
Title |
---|
Not yet advised * |
Also Published As
Publication number | Publication date |
---|---|
WO2007035183A2 (en) | 2007-03-29 |
CN101199208A (en) | 2008-06-11 |
GB0622592D0 (en) | 2006-12-27 |
AU2005330569A8 (en) | 2008-08-07 |
CA2565758A1 (en) | 2006-10-13 |
WO2007035183A3 (en) | 2007-06-21 |
GB2440384B (en) | 2010-01-13 |
AU2005330569A1 (en) | 2006-12-07 |
EP1938622A2 (en) | 2008-07-02 |
CN101199207A (en) | 2008-06-11 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
GB2440384A (en) | Method,system and program product for measuring audio video synchronization using lip and teeth characteristics | |
GB2429889A (en) | Method, system, and program product for measuring audio video synchronization | |
EP1922720B1 (en) | System and method for synchronizing sound and manually transcribed text | |
MX2021014721A (en) | Systems and methods for machine learning of voice attributes. | |
ATE362632T1 (en) | MESSAGE TRANSMISSION DEVICE | |
EP1657721A3 (en) | Music content reproduction apparatus, method thereof and recording apparatus | |
AU2003225928A1 (en) | Method for robust voice recognition by analyzing redundant features of source signal | |
WO2010024426A1 (en) | Sound recording device | |
EP2267697A3 (en) | Information processing system, method of processing information, and program for processing information | |
TW200741650A (en) | Method and apparatus for processing a audio signal | |
JP2016535305A (en) | A device for improving language processing in autism | |
WO2006082868A3 (en) | Method and system for identifying speech sound and non-speech sound in an environment | |
DE602006019099D1 (en) | LANGUAGE ANALYSIS SYSTEM | |
US9015044B2 (en) | Formant based speech reconstruction from noisy signals | |
JPH04158397A (en) | Voice quality converting system | |
WO2006113409A3 (en) | Method, system, and program product for measuring audio video synchronization using lip and teeth charateristics | |
CN109545196A (en) | Audio recognition method, device and computer readable storage medium | |
Clemins et al. | Application of speech recognition to African elephant (Loxodonta Africana) vocalizations | |
CN107871492A (en) | Music synthesis method and system | |
ATE554479T1 (en) | APPARATUS AND METHOD FOR TRANSMITTING OR REPLAYING A MULTI-CHANNEL AUDIO SIGNAL | |
DE602005012998D1 (en) | METHOD FOR ESTIMATING A LANGUAGE IMPLEMENTATION FUNCTION | |
WO2009142464A3 (en) | Method and apparatus for processing audio signals | |
ATE374990T1 (en) | METHOD FOR SYNTHESIZING LANGUAGE | |
Goldman et al. | C-PhonoGenre: a 7-hours corpus of 7 speaking styles in French: relations between situational features and prosodic properties. | |
Shen et al. | The ability to glimpse dynamic pitch in noise by younger and older listeners |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PCNP | Patent ceased through non-payment of renewal fee | Effective date: 2010-04-13 |