AU2001296459A1 - Audio visual speech processing

Audio visual speech processing

Info

Publication number
AU2001296459A1
Authority
AU
Australia
Prior art keywords
speech processing
audio visual
visual speech
audio
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2001296459A
Inventor
Gamze Erten
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Clarity LLC
Original Assignee
Clarity LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Clarity LLC
Publication of AU2001296459A1
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256 Fusion techniques of classification results relating to different input data, e.g. multimodal recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • G10L15/25 Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
AU2001296459A 2000-10-02 2001-10-01 Audio visual speech processing Abandoned AU2001296459A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US23672000P 2000-10-02 2000-10-02
US60236720 2000-10-02
PCT/US2001/030727 WO2002029784A1 (en) 2000-10-02 2001-10-01 Audio visual speech processing

Publications (1)

Publication Number Publication Date
AU2001296459A1 (en) 2002-04-15

Family

ID=22890663

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
AU2001296459A Abandoned AU2001296459A1 (en) 2000-10-02 2001-10-01 Audio visual speech processing

Country Status (3)

Country Link
US (1) US20020116197A1 (en)
AU (1) AU2001296459A1 (en)
WO (1) WO2002029784A1 (en)

Families Citing this family (62)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT1320002B1 (en) * 2000-03-31 2003-11-12 Cselt Centro Studi Lab Telecom PROCEDURE FOR THE ANIMATION OF A SYNTHESIZED HUMAN FACE MODEL DRIVEN BY AN AUDIO SIGNAL.
SE0004187D0 (en) * 2000-11-15 2000-11-15 Coding Technologies Sweden Ab Enhancing the performance of coding systems that use high frequency reconstruction methods
US20030110038A1 (en) * 2001-10-16 2003-06-12 Rajeev Sharma Multi-modal gender classification using support vector machines (SVMs)
US20030083872A1 (en) * 2001-10-25 2003-05-01 Dan Kikinis Method and apparatus for enhancing voice recognition capabilities of voice recognition software and systems
US7587318B2 (en) * 2002-09-12 2009-09-08 Broadcom Corporation Correlating video images of lip movements with audio signals to improve speech recognition
US7319955B2 (en) * 2002-11-29 2008-01-15 International Business Machines Corporation Audio-visual codebook dependent cepstral normalization
EP1443498B1 (en) * 2003-01-24 2008-03-19 Sony Ericsson Mobile Communications AB Noise reduction and audio-visual speech activity detection
DE60319796T2 (en) * 2003-01-24 2009-05-20 Sony Ericsson Mobile Communications Ab Noise reduction and audiovisual voice activity detection
US7251603B2 (en) * 2003-06-23 2007-07-31 International Business Machines Corporation Audio-only backoff in audio-visual speech recognition system
US7269560B2 (en) * 2003-06-27 2007-09-11 Microsoft Corporation Speech detection and enhancement using audio/video fusion
US20050131744A1 (en) * 2003-12-10 2005-06-16 International Business Machines Corporation Apparatus, system and method of automatically identifying participants at a videoconference who exhibit a particular expression
US20050131697A1 (en) * 2003-12-10 2005-06-16 International Business Machines Corporation Speech improving apparatus, system and method
EP1748387B1 (en) * 2004-05-21 2018-12-05 Asahi Kasei Kabushiki Kaisha Devices for classifying the arousal state of the eyes of a driver, corresponding method and computer readable storage medium
US20060009978A1 (en) * 2004-07-02 2006-01-12 The Regents Of The University Of Colorado Methods and systems for synthesis of accurate visible speech via transformation of motion capture data
WO2006006108A2 (en) * 2004-07-08 2006-01-19 Philips Intellectual Property & Standards Gmbh A method and a system for communication between a user and a system
US9779750B2 (en) * 2004-07-30 2017-10-03 Invention Science Fund I, Llc Cue-aware privacy filter for participants in persistent communications
US20150163342A1 (en) * 2004-07-30 2015-06-11 Searete Llc Context-aware filter for participants in persistent communication
US9704502B2 (en) * 2004-07-30 2017-07-11 Invention Science Fund I, Llc Cue-aware privacy filter for participants in persistent communications
JP4708913B2 (en) * 2005-08-12 2011-06-22 Canon Inc. Information processing method and information processing apparatus
US7697827B2 (en) 2005-10-17 2010-04-13 Konicek Jeffrey C User-friendlier interfaces for a camera
US7860718B2 (en) * 2005-12-08 2010-12-28 Electronics And Telecommunications Research Institute Apparatus and method for speech segment detection and system for speech recognition
WO2007071025A1 (en) * 2005-12-21 2007-06-28 Jimmy Proximity Inc. Device and method for capturing vocal sound and mouth region images
US20080004879A1 (en) * 2006-06-29 2008-01-03 Wen-Chen Huang Method for assessing learner's pronunciation through voice and image
US20100079573A1 (en) * 2008-09-26 2010-04-01 Maycel Isaac System and method for video telephony by converting facial motion to text
US9002713B2 (en) * 2009-06-09 2015-04-07 At&T Intellectual Property I, L.P. System and method for speech personalization by need
US8676581B2 (en) * 2010-01-22 2014-03-18 Microsoft Corporation Speech recognition analysis via identification information
JP2011186351A (en) * 2010-03-11 2011-09-22 Sony Corp Information processor, information processing method, and program
US8751228B2 (en) * 2010-11-04 2014-06-10 Microsoft Corporation Minimum converted trajectory error (MCTE) audio-to-video engine
US20120201472A1 (en) * 2011-02-08 2012-08-09 Autonomy Corporation Ltd System for the tagging and augmentation of geographically-specific locations using a visual data stream
KR20130022607A (en) * 2011-08-25 2013-03-07 Samsung Electronics Co., Ltd. Voice recognition apparatus and method for recognizing voice
US9263044B1 (en) * 2012-06-27 2016-02-16 Amazon Technologies, Inc. Noise reduction based on mouth area movement recognition
US20140365221A1 (en) * 2012-07-31 2014-12-11 Novospeech Ltd. Method and apparatus for speech recognition
US10019983B2 (en) * 2012-08-30 2018-07-10 Aravind Ganapathiraju Method and system for predicting speech recognition performance using accuracy scores
US9020825B1 (en) * 2012-09-25 2015-04-28 Rawles Llc Voice gestures
US20140122086A1 (en) * 2012-10-26 2014-05-01 Microsoft Corporation Augmenting speech recognition with depth imaging
US9190058B2 (en) 2013-01-25 2015-11-17 Microsoft Technology Licensing, Llc Using visual cues to disambiguate speech inputs
KR101442211B1 (en) * 2013-02-07 2014-10-16 Industry-University Cooperation Foundation Sogang University Speech recognition system and method using 3D geometric information
WO2014207752A1 (en) * 2013-06-27 2014-12-31 Hewlett-Packard Development Company, L.P. Authenticating user by correlating speech and corresponding lip shape
CN103617801B (en) * 2013-12-18 2017-09-29 Lenovo (Beijing) Co., Ltd. Speech detection method, device and electronic equipment
US9922667B2 (en) * 2014-04-17 2018-03-20 Microsoft Technology Licensing, Llc Conversation, presence and context detection for hologram suppression
US10529359B2 (en) 2014-04-17 2020-01-07 Microsoft Technology Licensing, Llc Conversation detection
US9870500B2 (en) 2014-06-11 2018-01-16 At&T Intellectual Property I, L.P. Sensor enhanced speech recognition
CN104409075B (en) * 2014-11-28 2018-09-04 Shenzhen Skyworth-RGB Electronic Co., Ltd. Audio recognition method and system
US9747068B2 (en) 2014-12-22 2017-08-29 Nokia Technologies Oy Audio processing based upon camera selection
US9521365B2 (en) 2015-04-02 2016-12-13 At&T Intellectual Property I, L.P. Image-based techniques for audio content
US20170092277A1 (en) * 2015-09-30 2017-03-30 Seagate Technology Llc Search and Access System for Media Content Files
US9940932B2 (en) * 2016-03-02 2018-04-10 Wipro Limited System and method for speech-to-text conversion
US10056083B2 (en) 2016-10-18 2018-08-21 Yen4Ken, Inc. Method and system for processing multimedia content to dynamically generate text transcript
EP3698358A1 (en) * 2017-10-18 2020-08-26 Soapbox Labs Ltd. Methods and systems for processing audio signals containing speech data
JP7081164B2 (en) * 2018-01-17 2022-06-07 JVCKenwood Corporation Display control device, communication device, display control method and communication method
WO2019161229A1 (en) 2018-02-15 2019-08-22 DMAI, Inc. System and method for reconstructing unoccupied 3d space
WO2019161198A1 (en) * 2018-02-15 2019-08-22 DMAI, Inc. System and method for speech understanding via integrated audio and visual based speech recognition
WO2019161207A1 (en) 2018-02-15 2019-08-22 DMAI, Inc. System and method for conversational agent via adaptive caching of dialogue tree
US10878824B2 (en) * 2018-02-21 2020-12-29 Valyant AI, Inc. Speech-to-text generation using video-speech matching from a primary speaker
US10332543B1 (en) * 2018-03-12 2019-06-25 Cypress Semiconductor Corporation Systems and methods for capturing noise for pattern recognition processing
US10679626B2 (en) * 2018-07-24 2020-06-09 Pegah AARABI Generating interactive audio-visual representations of individuals
CN113228710A (en) * 2018-12-21 2021-08-06 GN Hearing A/S Sound source separation in hearing devices and related methods
CN114175147A (en) * 2019-08-02 2022-03-11 NEC Corporation Voice processing apparatus, voice processing method, and recording medium
US11244696B2 (en) 2019-11-06 2022-02-08 Microsoft Technology Licensing, Llc Audio-visual speech enhancement
US20220148050A1 (en) * 2020-11-11 2022-05-12 Cdk Global, Llc Systems and methods for using machine learning for vehicle damage detection and repair cost estimation
CN112634940A (en) * 2020-12-11 2021-04-09 Ping An Technology (Shenzhen) Co., Ltd. Voice endpoint detection method, device, equipment and computer readable storage medium
US11803535B2 (en) 2021-05-24 2023-10-31 Cdk Global, Llc Systems, methods, and apparatuses for simultaneously running parallel databases

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4975960A (en) * 1985-06-03 1990-12-04 Petajan Eric D Electronic facial tracking and detection system and method and apparatus for automated speech recognition
US4757541A (en) * 1985-11-05 1988-07-12 Research Triangle Institute Audio visual speech recognition
JPS62239231A (en) * 1986-04-10 1987-10-20 Kiyarii Rabo:Kk Speech recognition method by inputting lip picture
JPH01158579A (en) * 1987-09-09 1989-06-21 Aisin Seiki Co Ltd Image recognizing device
US5046097A (en) * 1988-09-02 1991-09-03 Qsound Ltd. Sound imaging process
US5440661A (en) * 1990-01-31 1995-08-08 The United States Of America As Represented By The United States Department Of Energy Time series association learning
EP0554437B1 (en) * 1991-07-31 2000-04-05 VYSIS, Inc. Nucleic acid probes for the detection of shigella
US5313522A (en) * 1991-08-23 1994-05-17 Slager Robert P Apparatus for generating from an audio signal a moving visual lip image from which a speech content of the signal can be comprehended by a lipreader
US5621858A (en) * 1992-05-26 1997-04-15 Ricoh Corporation Neural network acoustic and visual speech recognition system training method and apparatus
US5586215A (en) * 1992-05-26 1996-12-17 Ricoh Corporation Neural network acoustic and visual speech recognition system
US5502774A (en) * 1992-06-09 1996-03-26 International Business Machines Corporation Automatic recognition of a consistent message using multiple complimentary sources of information
IT1257073B (en) * 1992-08-11 1996-01-05 Ist Trentino Di Cultura RECOGNITION SYSTEM, ESPECIALLY FOR THE RECOGNITION OF PEOPLE.
US5473759A (en) * 1993-02-22 1995-12-05 Apple Computer, Inc. Sound analysis and resynthesis using correlograms
US5473726A (en) * 1993-07-06 1995-12-05 The United States Of America As Represented By The Secretary Of The Air Force Audio and amplitude modulated photo data collection for speech recognition
US6471420B1 (en) * 1994-05-13 2002-10-29 Matsushita Electric Industrial Co., Ltd. Voice selection apparatus voice response apparatus, and game apparatus using word tables from which selected words are output as voice selections
US5805036A (en) * 1995-05-15 1998-09-08 Illinois Superconductor Magnetically activated switch using a high temperature superconductor component
US6028960A (en) * 1996-09-20 2000-02-22 Lucent Technologies Inc. Face feature analysis for automatic lipreading and character animation
US5995936A (en) * 1997-02-04 1999-11-30 Brais; Louis Report generation system and method for capturing prose, audio, and video by voice command and automatically linking sound and image to formatted text locations
DE19740119A1 (en) * 1997-09-12 1999-03-18 Philips Patentverwaltung System for cutting digital video and audio information
US6185529B1 (en) * 1998-09-14 2001-02-06 International Business Machines Corporation Speech recognition aided by lateral profile image
US20020145610A1 (en) * 1999-07-16 2002-10-10 Steve Barilovits Video processing engine overlay filter scaler
US6219640B1 (en) * 1999-08-06 2001-04-17 International Business Machines Corporation Methods and apparatus for audio-visual speaker recognition and utterance verification
US6581081B1 (en) * 2000-01-24 2003-06-17 3Com Corporation Adaptive size filter for efficient computation of wavelet packet trees

Also Published As

Publication number Publication date
WO2002029784A1 (en) 2002-04-11
US20020116197A1 (en) 2002-08-22

Similar Documents

Publication Publication Date Title
AU2001296459A1 (en) Audio visual speech processing
AU2001243389A1 (en) Voice call processing methods
AU2001294989A1 (en) Speech detection
AU2001250789A1 (en) Audio/visual server
AU2001282568A1 (en) Speech processing device and speech processing method
AU2001261630A1 (en) Audio closure
AU2001276588A1 (en) Adaptive-block-length audio coder
GB0027178D0 (en) Speech processing system
GB2375028B (en) Processing speech signals
AU2001275319A1 (en) Load-adjusted speech recognition
AU2001294835A1 (en) Video processing
GB0028277D0 (en) Speech processing system
AU2002211794A1 (en) Audio on location
AU2001246558A1 (en) Microphone structure
AU2002243386A1 (en) Rf2a and rf2b transcription factors
AU2001273441A1 (en) Audio headset
AUPQ942400A0 (en) Cinema audio processing system
AU2001243642A1 (en) Transcription factors
AU2001256490A1 (en) Digital audio processing
AU2003242903A1 (en) Audio processing
AU2001290122A1 (en) Audio apparatus
AU2001230253A1 (en) Processing method
AU2002256556A1 (en) Sea-trosy and related methods
AU2458201A (en) Spf1-related transcription factors
AU2001238041A1 (en) Separate account processing