WO2007035183A3 - Method, system, and program product for measuring audio video synchronization independent of speaker characteristics

Publication number: WO2007035183A3
Authority: WO, WIPO (PCT)
Prior art keywords: audio, video, information, program product, determined
Application number: PCT/US2005/041623
Other languages: French (fr)
Other versions: WO2007035183A2 (en)
Inventors: J Carl Cooper, Mirko Dusan Vojnovic, Jibanananda Roy, Saurabh Jan, Christopher Smith
Original Assignee: Pixel Instr Corp, J Carl Cooper, Mirko Dusan Vojnovic, Jibanananda Roy, Saurabh Jan, Christopher Smith
Priority date: 2003-05-16
Filing date: 2005-11-16
Publication date: 2007-06-21
Priority claimed from PCT/US2005/012588 (WO2005115014A2)
Application filed by Pixel Instr Corp, J Carl Cooper, Mirko Dusan Vojnovic, Jibanananda Roy, Saurabh Jan, Christopher Smith
Priority to AU2005330569A (AU2005330569A1)
Priority to GB0622589A (GB2438691A)
Priority to CA002565758A (CA2565758A1)
Priority to EP05851741A (EP1938622A2)
Priority to GB0622592A (GB2440384B)
Priority to AU2006235990A (AU2006235990A1)
Priority to PCT/US2006/014023 (WO2006113409A2)
Priority to CA002566844A (CA2566844A1)
Priority to EP06750137A (EP1969858A2)
Priority to US11/598,870 (US10397646B2)
Publication of WO2007035183A2
Publication of WO2007035183A3
Priority to US14/460,305 (US9432555B2)
Priority to US15/203,592 (US20160316108A1)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units

Abstract

Method, system, and program product for measuring audio video synchronization. Audio video information is first acquired into an audio video synchronization system. Data acquisition is followed by analysis of the audio information and of the video information. Next, the audio information is analyzed to locate sounds related to a speaker's personal voice characteristics. The audio information is then filtered by removing data related to the speaker's personal voice characteristics, producing filtered audio information. In this phase, the filtered audio information and the video information are analyzed, decision boundaries for audio and video MuEv-s are determined, and related audio and video MuEv-s are correlated. In the Analysis Phase, audio and video MuEv-s are calculated from the audio and video information, and the audio and video information is classified into vowel sounds including AA, EE, OO, silence, and unclassified phones. This classification is used to determine the dominant audio class and associate it with a video frame. Matching locations are determined, and from them the offset between video and audio is determined.
PCT/US2005/041623 2003-05-16 2005-11-16 Method, system, and program product for measuring audio video synchronization independent of speaker characteristics WO2007035183A2 (en)
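
The last two steps of the abstract (finding the dominant audio class per video frame, then locating matches to obtain the offset) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the "SIL" and "UNK" labels for silence and unclassified phones, the majority-vote rule, and the exhaustive lag search are all assumptions made for illustration, and the class labels are assumed to have been produced already by some upstream audio and video classifier.

```python
from collections import Counter

# Vowel classes named in the abstract; "SIL" (silence) and "UNK"
# (unclassified) are hypothetical label names chosen for this sketch.
VOWELS = {"AA", "EE", "OO"}


def dominant_per_frame(audio_labels, labels_per_frame):
    """Collapse a fine-grained audio label stream to one label per video frame.

    Each video frame is assumed to span `labels_per_frame` audio labels;
    the most frequent label in that span is taken as the dominant class.
    """
    frames = []
    for start in range(0, len(audio_labels) - labels_per_frame + 1, labels_per_frame):
        chunk = audio_labels[start:start + labels_per_frame]
        frames.append(Counter(chunk).most_common(1)[0][0])
    return frames


def estimate_offset(audio_classes, video_classes, max_lag):
    """Return (lag, score) for the lag with the most vowel-class matches.

    A positive lag means the audio trails the video by that many frames.
    Only vowel classes count as matches; silence and unclassified frames
    are ignored because they carry little timing information.
    """
    best_lag, best_score = 0, -1
    for lag in range(-max_lag, max_lag + 1):
        score = 0
        for v_idx, v_cls in enumerate(video_classes):
            a_idx = v_idx + lag
            if 0 <= a_idx < len(audio_classes):
                if v_cls in VOWELS and audio_classes[a_idx] == v_cls:
                    score += 1
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


if __name__ == "__main__":
    video = ["SIL", "AA", "AA", "EE", "OO", "SIL", "EE", "AA", "SIL", "OO"]
    audio = ["SIL", "SIL"] + video  # audio delayed by two frames
    print(estimate_offset(audio, video, max_lag=4))
```

The search is quadratic in the lag range, which is acceptable for the short windows a sketch like this would examine. Multiplying the returned lag by the video frame period would convert it to a time offset.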

Priority Applications (12)

Application Number | Publication | Priority Date | Filing Date | Title
AU2005330569A | AU2005330569A1 (en) | 2005-04-13 | 2005-11-16 | Method, system, and program product for measuring audio video synchronization independent of speaker characteristics
GB0622589A | GB2438691A (en) | 2005-04-13 | 2005-11-16 | Method, system, and program product for measuring audio video synchronization independent of speaker characteristics
CA002565758A | CA2565758A1 (en) | 2005-04-13 | 2005-11-16 | Method, system, and program product for measuring audio video synchronization independent of speaker characteristics
EP05851741A | EP1938622A2 (en) | 2005-04-13 | 2005-11-16 | Method, system, and program product for measuring audio video synchronization independent of speaker characteristics
EP06750137A | EP1969858A2 (en) | 2004-05-14 | 2006-04-13 | Method, system, and program product for measuring audio video synchronization using lip and teeth characteristics
PCT/US2006/014023 | WO2006113409A2 (en) | 2005-04-13 | 2006-04-13 | Method, system, and program product for measuring audio video synchronization using lip and teeth characteristics
AU2006235990A | AU2006235990A1 (en) | 2005-04-13 | 2006-04-13 | Method, system, and program product for measuring audio video synchronization using lip and teeth characteristics
GB0622592A | GB2440384B (en) | 2005-04-13 | 2006-04-13 | Method, system and program product for measuring audio video synchronization using lip and teeth characteristics
CA002566844A | CA2566844A1 (en) | 2005-04-13 | 2006-04-13 | Method, system, and program product for measuring audio video synchronization using lip and teeth characteristics
US11/598,870 | US10397646B2 (en) | 2003-05-16 | 2006-11-13 | Method, system, and program product for measuring audio video synchronization using lip and teeth characteristics
US14/460,305 | US9432555B2 (en) | 2003-05-16 | 2014-08-14 | System and method for AV sync correction by remote sensing
US15/203,592 | US20160316108A1 (en) | 2003-05-16 | 2016-07-06 | System and Method for AV Sync Correction by Remote Sensing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
USPCT/2005/0012588 2005-04-13
PCT/US2005/012588 WO2005115014A2 (en) 2004-05-14 2005-04-13 Method, system, and program product for measuring audio video synchronization

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/012588 Continuation-In-Part WO2005115014A2 (en) 2003-05-16 2005-04-13 Method, system, and program product for measuring audio video synchronization

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US11/598,870 Continuation-In-Part US10397646B2 (en) 2003-05-16 2006-11-13 Method, system, and program product for measuring audio video synchronization using lip and teeth characteristics

Publications (2)

Publication Number Publication Date
WO2007035183A2 WO2007035183A2 (en) 2007-03-29
WO2007035183A3 WO2007035183A3 (en) 2007-06-21

Family ID: 37561747

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/041623 WO2007035183A2 (en) 2003-05-16 2005-11-16 Method, system, and program product for measuring audio video synchronization independent of speaker characteristics

Country Status (6)

Country Link
EP (1) EP1938622A2 (en)
CN (2) CN101199207A (en)
AU (1) AU2005330569A1 (en)
CA (1) CA2565758A1 (en)
GB (1) GB2440384B (en)
WO (1) WO2007035183A2 (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2666160A4 (en) * 2011-01-17 2014-07-30 Nokia Corp An audio scene processing apparatus
US8705812B2 (en) * 2011-06-10 2014-04-22 Amazon Technologies, Inc. Enhanced face recognition in video
CN105100647A (en) * 2015-07-31 2015-11-25 深圳市金立通信设备有限公司 Subtitle correction method and terminal
CN105512348B (en) * 2016-01-28 2019-03-26 北京旷视科技有限公司 For handling the method and apparatus and search method and device of video and related audio
CN106067989B (en) * 2016-04-28 2022-05-17 江苏大学 Portrait voice video synchronous calibration device and method
US10997979B2 (en) * 2018-06-21 2021-05-04 Casio Computer Co., Ltd. Voice recognition device and voice recognition method
CN108924617B (en) * 2018-07-11 2020-09-18 北京大米科技有限公司 Method of synchronizing video data and audio data, storage medium, and electronic device
CN108924646B (en) * 2018-07-18 2021-02-09 北京奇艺世纪科技有限公司 Audio and video synchronization detection method and system
CN109087651B (en) * 2018-09-05 2021-01-19 广州势必可赢网络科技有限公司 Voiceprint identification method, system and equipment based on video and spectrogram
CN110691204B (en) * 2019-09-09 2021-04-02 苏州臻迪智能科技有限公司 Audio and video processing method and device, electronic equipment and storage medium
CN112653916B (en) * 2019-10-10 2023-08-29 腾讯科技(深圳)有限公司 Method and equipment for synchronously optimizing audio and video
CN113497914A (en) * 2020-03-20 2021-10-12 阿里巴巴集团控股有限公司 Information determination method and system, electronic equipment, autonomous mobile equipment and camera
CN111988654B (en) * 2020-08-31 2022-10-18 维沃移动通信有限公司 Video data alignment method and device and electronic equipment
CN112351273B (en) * 2020-11-04 2022-03-01 新华三大数据技术有限公司 Video playing quality detection method and device
CN113242361B (en) * 2021-07-13 2021-09-24 腾讯科技(深圳)有限公司 Video processing method and device and computer readable storage medium
CN114466178A (en) * 2021-09-09 2022-05-10 马上消费金融股份有限公司 Method and device for measuring synchronism of voice and image
CN114466179A (en) * 2021-09-09 2022-05-10 马上消费金融股份有限公司 Method and device for measuring synchronism of voice and image
WO2023035969A1 (en) * 2021-09-09 2023-03-16 马上消费金融股份有限公司 Speech and image synchronization measurement method and apparatus, and model training method and apparatus
CN114494930B (en) * 2021-09-09 2023-09-22 马上消费金融股份有限公司 Training method and device for voice and image synchronism measurement model
CN114089285B (en) * 2022-01-24 2022-05-31 安徽京淮健锐电子科技有限公司 Signal sorting method based on first-order Pulse Repetition Interval (PRI)
CN114550075A (en) * 2022-04-25 2022-05-27 北京华科海讯科技有限公司 Parallel signal processing method and system based on video image recognition
CN115965724B (en) * 2022-12-26 2023-08-08 华院计算技术(上海)股份有限公司 Image generation method and device, computer readable storage medium and terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4975960A (en) * 1985-06-03 1990-12-04 Petajan Eric D Electronic facial tracking and detection system and method and apparatus for automated speech recognition
US5387943A (en) * 1992-12-21 1995-02-07 Tektronix, Inc. Semiautomatic lip sync recovery system
US5572261A (en) * 1995-06-07 1996-11-05 Cooper; J. Carl Automatic audio to video timing measurement device and method
US5880788A (en) * 1996-03-25 1999-03-09 Interval Research Corporation Automated synchronization of video image sequences to new soundtracks
US6829018B2 (en) * 2001-09-17 2004-12-07 Koninklijke Philips Electronics N.V. Three-dimensional sound creation assisted by visual information

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4313135B1 (en) * 1980-07-28 1996-01-02 J Carl Cooper Method and apparatus for preserving or restoring audio to video
JPS62239231A (en) * 1986-04-10 1987-10-20 Kiyarii Rabo:Kk Speech recognition method by inputting lip picture
US5920842A (en) * 1994-10-12 1999-07-06 Pixel Instruments Signal synchronization

Also Published As

Publication number Publication date
CN101199208A (en) 2008-06-11
WO2007035183A2 (en) 2007-03-29
AU2005330569A1 (en) 2006-12-07
CA2565758A1 (en) 2006-10-13
GB2440384B (en) 2010-01-13
EP1938622A2 (en) 2008-07-02
GB2440384A (en) 2008-01-30
CN101199207A (en) 2008-06-11
GB0622592D0 (en) 2006-12-27
AU2005330569A8 (en) 2008-08-07

Similar Documents

Publication Publication Date Title
WO2007035183A3 (en) Method, system, and program product for measuring audio video synchronization independent of speaker characteristics
WO2005115014A3 (en) Method, system, and program product for measuring audio video synchronization
US8600743B2 (en) Noise profile determination for voice-related feature
EA201290082A1 (en) METHOD OF IDENTIFICATION OF PHONOGRAMMING OF ARBITRARY ORAL SPEECH BASED ON THE FORMANT ALIGNMENT
WO2009148960A3 (en) Systems, methods, apparatus, and computer program products for spectral contrast enhancement
TW200741650A (en) Method and apparatus for processing a audio signal
WO2010036061A3 (en) An apparatus for processing an audio signal and method thereof
CN102214464B (en) Transient state detecting method of audio signals and duration adjusting method based on same
ATE456847T1 (en) CLASSIFICATION OF AUDIO SIGNALS
MX2021014721A (en) Systems and methods for machine learning of voice attributes.
WO2010004056A3 (en) Method and system for speech enhancement in a room
ATE491202T1 (en) COMPENSATING BETWEEN-SESSION VARIABILITY TO AUTOMATICALLY EXTRACT INFORMATION FROM SPEECH
EP2529370B1 (en) Systems and methods for speech extraction
CN105976829B (en) Audio processing device and audio processing method
CN104078051B (en) A kind of voice extracting method, system and voice audio frequency playing method and device
KR101616112B1 (en) Speaker separation system and method using voice feature vectors
CN112133277B (en) Sample generation method and device
US9240190B2 (en) Formant based speech reconstruction from noisy signals
RU2005110662A (en) PROCESSING TELEPHONE NUMBERS IN SOUND FLOWS
GB2438691A (en) Method, system, and program product for measuring audio video synchronization independent of speaker characteristics
US7886303B2 (en) Method for dynamically adjusting audio decoding process
Sztahó et al. Automatic classification of emotions in spontaneous speech
WO2007095413A3 (en) Method and apparatus for detecting affects in speech
WO2009142464A3 (en) Method and apparatus for processing audio signals
CN107493528A (en) A kind of sound processing method, device and microphone

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase (Ref document number: 200580050133.9; Country of ref document: CN)
ENP Entry into the national phase (Ref document number: 0622589; Country of ref document: GB; Kind code of ref document: A; Free format text: PCT FILING DATE = 20051116)
WWE WIPO information: entry into national phase (Ref document number: 0622589.0; Country of ref document: GB. Ref document number: 11598870; Country of ref document: US. Ref document number: 2005330569; Country of ref document: AU)
WWE WIPO information: entry into national phase (Ref document number: 2565758; Country of ref document: CA)
WWP WIPO information: published in national office (Ref document number: 2005330569; Country of ref document: AU)
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWP WIPO information: published in national office (Ref document number: 11598870; Country of ref document: US)
NENP Non-entry into the national phase (Ref country code: DE)
NENP Non-entry into the national phase (Ref country code: RU)
WWE WIPO information: entry into national phase (Ref document number: 2005851741; Country of ref document: EP)