GB2438691A - Method, system, and program product for measuring audio video synchronization independent of speaker characteristics - Google Patents
Method, system, and program product for measuring audio video synchronization independent of speaker characteristics
Info
- Publication number
- GB2438691A (application number GB0622589A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- audio
- video
- information
- program product
- determined
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/233—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/242—Synchronization processes, e.g. processing of PCR [Program Clock References]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/04—Synchronising
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
- G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/44—Receiver circuitry for the reception of television signals according to analogue transmission standards
- H04N5/60—Receiver circuitry for the reception of television signals according to analogue transmission standards for the sound signals
- H04N5/602—Receiver circuitry for the reception of television signals according to analogue transmission standards for the sound signals for digital sound signals
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Geometry (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
- Image Analysis (AREA)
- Television Signal Processing For Recording (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
A method, system, and program product for measuring audio video synchronization. Audio video information is first acquired into an audio video synchronization system, after which the audio information and the video information are each analyzed. The audio information is then analyzed to locate sounds related to a speaker's personal voice characteristics, and is filtered by removing the data related to those characteristics to produce filtered audio information. The filtered audio information and the video information are then analyzed, decision boundaries for Audio and Video MuEv-s are determined, and related Audio and Video MuEv-s are correlated. In the Analysis Phase, Audio and Video MuEv-s are calculated from the audio and video information, and the audio and video information is classified into vowel sounds including AA, EE, OO, silence, and unclassified phones. This classification is used to determine the dominant audio class in a video frame and associate it with that frame. Matching locations are then determined, and from them the offset between video and audio.
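The last two steps of the abstract (associating a dominant audio class with each video frame, then finding matching locations to obtain the offset) can be illustrated with a minimal Python sketch. It is not the patented method: it assumes the classification into AA, EE, OO, silence, and unclassified has already been done, that a majority vote is an acceptable notion of "dominant", and that the offset is the frame lag with the highest fraction of matching vowel classes. The names `dominant_class_per_frame`, `estimate_offset`, `windows_per_frame`, and `max_lag`, and the labels `SIL` and `UNK`, are hypothetical.

```python
from collections import Counter

# Hypothetical class labels: three vowel classes, silence, unclassified.
VOWELS = {"AA", "EE", "OO"}


def dominant_class_per_frame(audio_labels, windows_per_frame):
    """Majority-vote the audio class inside each video frame's time span.

    audio_labels holds one class label per short audio analysis window;
    windows_per_frame is how many such windows fall in one video frame
    (an assumed fixed ratio). Trailing windows that do not fill a whole
    frame are ignored.
    """
    frames = []
    last_start = len(audio_labels) - windows_per_frame
    for start in range(0, last_start + 1, windows_per_frame):
        span = audio_labels[start:start + windows_per_frame]
        frames.append(Counter(span).most_common(1)[0][0])
    return frames


def estimate_offset(audio_frames, video_frames, max_lag):
    """Return (lag, score) for the lag with the most vowel-class matches.

    A positive lag means the audio is late: audio_frames[i + lag] is
    compared with video_frames[i]. Only vowel classes count as matches,
    so silence and unclassified frames do not inflate the score.
    """
    best_lag, best_score = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        matches = overlap = 0
        for i, video_class in enumerate(video_frames):
            j = i + lag
            if 0 <= j < len(audio_frames):
                overlap += 1
                if video_class in VOWELS and audio_frames[j] == video_class:
                    matches += 1
        if overlap and matches / overlap > best_score:
            best_lag, best_score = lag, matches / overlap
    return best_lag, best_score


# Illustrative data only: the audio track is the video track delayed by 2 frames.
video = ["SIL", "AA", "AA", "EE", "OO", "OO", "SIL", "EE", "AA", "SIL"]
audio = ["SIL", "SIL"] + video[:-2]
lag, score = estimate_offset(audio, video, max_lag=3)
print(lag, score)
```

In this sketch the lag is expressed in video frames; converting it to a time offset would require the frame rate, which the abstract does not specify.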
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2005/012588 WO2005115014A2 (en) | 2004-05-14 | 2005-04-13 | Method, system, and program product for measuring audio video synchronization |
PCT/US2005/041623 WO2007035183A2 (en) | 2005-04-13 | 2005-11-16 | Method, system, and program product for measuring audio video synchronization independent of speaker characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
GB0622589D0 (en) | 2007-02-21 |
GB2438691A (en) | 2007-12-05 |
Family
ID=37115719
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0622589A Withdrawn GB2438691A (en) | 2005-04-13 | 2005-11-16 | Method, system, and program product for measuring audio video synchronization independent of speaker characteristics |
Country Status (3)
Country | Link |
---|---|
CA (1) | CA2566844A1 (en) |
GB (1) | GB2438691A (en) |
WO (1) | WO2006113409A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111081270A (en) * | 2019-12-19 | 2020-04-28 | 大连即时智能科技有限公司 | Real-time audio-driven virtual character mouth shape synchronous control method |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102007039603A1 (en) * | 2007-08-22 | 2009-02-26 | Siemens Ag | Method for synchronizing media data streams |
FR3014675A1 (en) * | 2013-12-12 | 2015-06-19 | Oreal | METHOD FOR EVALUATING AT LEAST ONE CLINICAL FACE SIGN |
CN110750152B (en) * | 2019-09-11 | 2023-08-29 | 云知声智能科技股份有限公司 | Man-machine interaction method and system based on lip actions |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4975960A (en) * | 1985-06-03 | 1990-12-04 | Petajan Eric D | Electronic facial tracking and detection system and method and apparatus for automated speech recognition |
US5387943A (en) * | 1992-12-21 | 1995-02-07 | Tektronix, Inc. | Semiautomatic lip sync recovery system |
US5572261A (en) * | 1995-06-07 | 1996-11-05 | Cooper; J. Carl | Automatic audio to video timing measurement device and method |
US5880788A (en) * | 1996-03-25 | 1999-03-09 | Interval Research Corporation | Automated synchronization of video image sequences to new soundtracks |
US6829018B2 (en) * | 2001-09-17 | 2004-12-07 | Koninklijke Philips Electronics N.V. | Three-dimensional sound creation assisted by visual information |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4313135B1 (en) * | 1980-07-28 | 1996-01-02 | J Carl Cooper | Method and apparatus for preserving or restoring audio to video |
JPS62239231A (en) * | 1986-04-10 | 1987-10-20 | Kiyarii Rabo:Kk | Speech recognition method by inputting lip picture |
US5920842A (en) * | 1994-10-12 | 1999-07-06 | Pixel Instruments | Signal synchronization |
- 2005-11-16: GB, application GB0622589A, published as GB2438691A (en), not active (Withdrawn)
- 2006-04-13: WO, application PCT/US2006/014023, published as WO2006113409A2 (en), active (Application Filing)
- 2006-04-13: CA, application CA002566844A, published as CA2566844A1 (en), not active (Abandoned)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111081270A (en) * | 2019-12-19 | 2020-04-28 | 大连即时智能科技有限公司 | Real-time audio-driven virtual character mouth shape synchronous control method |
CN111081270B (en) * | 2019-12-19 | 2021-06-01 | 大连即时智能科技有限公司 | Real-time audio-driven virtual character mouth shape synchronous control method |
Also Published As
Publication number | Publication date |
---|---|
CA2566844A1 (en) | 2006-10-26 |
WO2006113409A2 (en) | 2006-10-26 |
WO2006113409A3 (en) | 2007-06-07 |
GB0622589D0 (en) | 2007-02-21 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2007035183A3 (en) | Method, system, and program product for measuring audio video synchronization independent of speaker characteristics | |
GB2429889A (en) | Method, system, and program product for measuring audio video synchronization | |
MX2021014721A (en) | Systems and methods for machine learning of voice attributes. | |
EA201290082A1 (en) | METHOD OF IDENTIFICATION OF PHONOGRAMMING OF ARBITRARY ORAL SPEECH BASED ON THE FORMANT ALIGNMENT | |
CN102214464B (en) | Transient state detecting method of audio signals and duration adjusting method based on same | |
ATE491202T1 (en) | COMPENSATING BETWEEN-SESSION VARIABILITY TO AUTOMATICALLY EXTRACT INFORMATION FROM SPEECH | |
JP2016535305A (en) | A device for improving language processing in autism | |
ATE456847T1 (en) | CLASSIFICATION OF AUDIO SIGNALS | |
CN112133277B (en) | Sample generation method and device | |
DE60221408D1 (en) | PICTURE AND SOUND PROCESSING METHOD USING VOICE RECOGNITION | |
EP2169670A3 (en) | An apparatus for processing an audio signal and method thereof | |
Koldovsky et al. | Time-domain blind audio source separation using advanced component clustering and reconstruction | |
WO2007018802A3 (en) | Method and system for operation of a voice activity detector | |
US9240190B2 (en) | Formant based speech reconstruction from noisy signals | |
GB2438691A (en) | Method, system, and program product for measuring audio video synchronization independent of speaker characteristics | |
WO2015131634A1 (en) | Audio noise reduction method and terminal | |
SG151123A1 (en) | A decision analysis system | |
CN106559729A (en) | MIC automatic recognition of speech test system and method | |
CN115022767A (en) | Earphone wind noise reduction method and device, earphone and computer readable storage medium | |
CN104869233B (en) | A kind of way of recording | |
CN107493528A (en) | A kind of sound processing method, device and microphone | |
WO2009142464A3 (en) | Method and apparatus for processing audio signals | |
WO2003058419A3 (en) | Virtual assistant, which outputs audible information to a user of a data terminal by means of at least two electroacoustic converters, and method for presenting audible information of a virtual assistant | |
JP5427622B2 (en) | Voice changing device, voice changing method, program, and recording medium | |
CN103035237B (en) | Chinese speech signal processing method, device and hearing aid device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |