AU2001296459A1 - Audio visual speech processing - Google Patents
Audio visual speech processing
Info
- Publication number
- AU2001296459A1 AU2001296459A AU9645901A
- Authority
- AU
- Australia
- Prior art keywords
- speech processing
- audio visual
- visual speech
- audio
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
- G06F18/256—Fusion techniques of classification results relating to different input data, e.g. multimodal recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US23672000P | 2000-10-02 | 2000-10-02 | |
US60236720 | 2000-10-02 | ||
PCT/US2001/030727 WO2002029784A1 (en) | 2000-10-02 | 2001-10-01 | Audio visual speech processing |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2001296459A1 (en) | 2002-04-15 |
Family
ID=22890663
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2001296459A Abandoned AU2001296459A1 (en) | 2000-10-02 | 2001-10-01 | Audio visual speech processing |
Country Status (3)
Country | Link |
---|---|
US (1) | US20020116197A1 (en) |
AU (1) | AU2001296459A1 (en) |
WO (1) | WO2002029784A1 (en) |
Families Citing this family (62)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IT1320002B1 (en) * | 2000-03-31 | 2003-11-12 | Cselt Centro Studi Lab Telecom | PROCEDURE FOR THE ANIMATION OF A SYNTHESIZED HUMAN FACE MODEL DRIVEN BY AN AUDIO SIGNAL. |
SE0004187D0 (en) * | 2000-11-15 | 2000-11-15 | Coding Technologies Sweden Ab | Enhancing the performance of coding systems that use high frequency reconstruction methods |
US20030110038A1 (en) * | 2001-10-16 | 2003-06-12 | Rajeev Sharma | Multi-modal gender classification using support vector machines (SVMs) |
US20030083872A1 (en) * | 2001-10-25 | 2003-05-01 | Dan Kikinis | Method and apparatus for enhancing voice recognition capabilities of voice recognition software and systems |
US7587318B2 (en) * | 2002-09-12 | 2009-09-08 | Broadcom Corporation | Correlating video images of lip movements with audio signals to improve speech recognition |
US7319955B2 (en) * | 2002-11-29 | 2008-01-15 | International Business Machines Corporation | Audio-visual codebook dependent cepstral normalization |
EP1443498B1 (en) * | 2003-01-24 | 2008-03-19 | Sony Ericsson Mobile Communications AB | Noise reduction and audio-visual speech activity detection |
DE60319796T2 (en) * | 2003-01-24 | 2009-05-20 | Sony Ericsson Mobile Communications Ab | Noise reduction and audiovisual voice activity detection |
US7251603B2 (en) * | 2003-06-23 | 2007-07-31 | International Business Machines Corporation | Audio-only backoff in audio-visual speech recognition system |
US7269560B2 (en) * | 2003-06-27 | 2007-09-11 | Microsoft Corporation | Speech detection and enhancement using audio/video fusion |
US20050131744A1 (en) * | 2003-12-10 | 2005-06-16 | International Business Machines Corporation | Apparatus, system and method of automatically identifying participants at a videoconference who exhibit a particular expression |
US20050131697A1 (en) * | 2003-12-10 | 2005-06-16 | International Business Machines Corporation | Speech improving apparatus, system and method |
EP1748387B1 (en) * | 2004-05-21 | 2018-12-05 | Asahi Kasei Kabushiki Kaisha | Devices for classifying the arousal state of the eyes of a driver, corresponding method and computer readable storage medium |
US20060009978A1 (en) * | 2004-07-02 | 2006-01-12 | The Regents Of The University Of Colorado | Methods and systems for synthesis of accurate visible speech via transformation of motion capture data |
WO2006006108A2 (en) * | 2004-07-08 | 2006-01-19 | Philips Intellectual Property & Standards Gmbh | A method and a system for communication between a user and a system |
US9779750B2 (en) * | 2004-07-30 | 2017-10-03 | Invention Science Fund I, Llc | Cue-aware privacy filter for participants in persistent communications |
US20150163342A1 (en) * | 2004-07-30 | 2015-06-11 | Searete Llc | Context-aware filter for participants in persistent communication |
US9704502B2 (en) * | 2004-07-30 | 2017-07-11 | Invention Science Fund I, Llc | Cue-aware privacy filter for participants in persistent communications |
JP4708913B2 (en) * | 2005-08-12 | 2011-06-22 | キヤノン株式会社 | Information processing method and information processing apparatus |
US7697827B2 (en) | 2005-10-17 | 2010-04-13 | Konicek Jeffrey C | User-friendlier interfaces for a camera |
US7860718B2 (en) * | 2005-12-08 | 2010-12-28 | Electronics And Telecommunications Research Institute | Apparatus and method for speech segment detection and system for speech recognition |
WO2007071025A1 (en) * | 2005-12-21 | 2007-06-28 | Jimmy Proximity Inc. | Device and method for capturing vocal sound and mouth region images |
US20080004879A1 (en) * | 2006-06-29 | 2008-01-03 | Wen-Chen Huang | Method for assessing learner's pronunciation through voice and image |
US20100079573A1 (en) * | 2008-09-26 | 2010-04-01 | Maycel Isaac | System and method for video telephony by converting facial motion to text |
US9002713B2 (en) * | 2009-06-09 | 2015-04-07 | At&T Intellectual Property I, L.P. | System and method for speech personalization by need |
US8676581B2 (en) * | 2010-01-22 | 2014-03-18 | Microsoft Corporation | Speech recognition analysis via identification information |
JP2011186351A (en) * | 2010-03-11 | 2011-09-22 | Sony Corp | Information processor, information processing method, and program |
US8751228B2 (en) * | 2010-11-04 | 2014-06-10 | Microsoft Corporation | Minimum converted trajectory error (MCTE) audio-to-video engine |
US20120201472A1 (en) * | 2011-02-08 | 2012-08-09 | Autonomy Corporation Ltd | System for the tagging and augmentation of geographically-specific locations using a visual data stream |
KR20130022607A (en) * | 2011-08-25 | 2013-03-07 | 삼성전자주식회사 | Voice recognition apparatus and method for recognizing voice |
US9263044B1 (en) * | 2012-06-27 | 2016-02-16 | Amazon Technologies, Inc. | Noise reduction based on mouth area movement recognition |
US20140365221A1 (en) * | 2012-07-31 | 2014-12-11 | Novospeech Ltd. | Method and apparatus for speech recognition |
US10019983B2 (en) * | 2012-08-30 | 2018-07-10 | Aravind Ganapathiraju | Method and system for predicting speech recognition performance using accuracy scores |
US9020825B1 (en) * | 2012-09-25 | 2015-04-28 | Rawles Llc | Voice gestures |
US20140122086A1 (en) * | 2012-10-26 | 2014-05-01 | Microsoft Corporation | Augmenting speech recognition with depth imaging |
US9190058B2 (en) | 2013-01-25 | 2015-11-17 | Microsoft Technology Licensing, Llc | Using visual cues to disambiguate speech inputs |
KR101442211B1 (en) * | 2013-02-07 | 2014-10-16 | 서강대학교산학협력단 | Speech recognition system and method using 3D geometric information |
WO2014207752A1 (en) * | 2013-06-27 | 2014-12-31 | Hewlett-Packard Development Company, L.P. | Authenticating user by correlating speech and corresponding lip shape |
CN103617801B (en) * | 2013-12-18 | 2017-09-29 | 联想(北京)有限公司 | Speech detection method, device and electronic equipment |
US9922667B2 (en) * | 2014-04-17 | 2018-03-20 | Microsoft Technology Licensing, Llc | Conversation, presence and context detection for hologram suppression |
US10529359B2 (en) | 2014-04-17 | 2020-01-07 | Microsoft Technology Licensing, Llc | Conversation detection |
US9870500B2 (en) | 2014-06-11 | 2018-01-16 | At&T Intellectual Property I, L.P. | Sensor enhanced speech recognition |
CN104409075B (en) * | 2014-11-28 | 2018-09-04 | 深圳创维-Rgb电子有限公司 | Audio recognition method and system |
US9747068B2 (en) | 2014-12-22 | 2017-08-29 | Nokia Technologies Oy | Audio processing based upon camera selection |
US9521365B2 (en) | 2015-04-02 | 2016-12-13 | At&T Intellectual Property I, L.P. | Image-based techniques for audio content |
US20170092277A1 (en) * | 2015-09-30 | 2017-03-30 | Seagate Technology Llc | Search and Access System for Media Content Files |
US9940932B2 (en) * | 2016-03-02 | 2018-04-10 | Wipro Limited | System and method for speech-to-text conversion |
US10056083B2 (en) | 2016-10-18 | 2018-08-21 | Yen4Ken, Inc. | Method and system for processing multimedia content to dynamically generate text transcript |
EP3698358A1 (en) * | 2017-10-18 | 2020-08-26 | Soapbox Labs Ltd. | Methods and systems for processing audio signals containing speech data |
JP7081164B2 (en) * | 2018-01-17 | 2022-06-07 | 株式会社Jvcケンウッド | Display control device, communication device, display control method and communication method |
WO2019161229A1 (en) | 2018-02-15 | 2019-08-22 | DMAI, Inc. | System and method for reconstructing unoccupied 3d space |
WO2019161198A1 (en) * | 2018-02-15 | 2019-08-22 | DMAI, Inc. | System and method for speech understanding via integrated audio and visual based speech recognition |
WO2019161207A1 (en) | 2018-02-15 | 2019-08-22 | DMAI, Inc. | System and method for conversational agent via adaptive caching of dialogue tree |
US10878824B2 (en) * | 2018-02-21 | 2020-12-29 | Valyant AI, Inc. | Speech-to-text generation using video-speech matching from a primary speaker |
US10332543B1 (en) * | 2018-03-12 | 2019-06-25 | Cypress Semiconductor Corporation | Systems and methods for capturing noise for pattern recognition processing |
US10679626B2 (en) * | 2018-07-24 | 2020-06-09 | Pegah AARABI | Generating interactive audio-visual representations of individuals |
CN113228710A (en) * | 2018-12-21 | 2021-08-06 | 大北欧听力公司 | Sound source separation in hearing devices and related methods |
CN114175147A (en) * | 2019-08-02 | 2022-03-11 | 日本电气株式会社 | Voice processing apparatus, voice processing method, and recording medium |
US11244696B2 (en) | 2019-11-06 | 2022-02-08 | Microsoft Technology Licensing, Llc | Audio-visual speech enhancement |
US20220148050A1 (en) * | 2020-11-11 | 2022-05-12 | Cdk Global, Llc | Systems and methods for using machine learning for vehicle damage detection and repair cost estimation |
CN112634940A (en) * | 2020-12-11 | 2021-04-09 | 平安科技(深圳)有限公司 | Voice endpoint detection method, device, equipment and computer readable storage medium |
US11803535B2 (en) | 2021-05-24 | 2023-10-31 | Cdk Global, Llc | Systems, methods, and apparatuses for simultaneously running parallel databases |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4975960A (en) * | 1985-06-03 | 1990-12-04 | Petajan Eric D | Electronic facial tracking and detection system and method and apparatus for automated speech recognition |
US4757541A (en) * | 1985-11-05 | 1988-07-12 | Research Triangle Institute | Audio visual speech recognition |
JPS62239231A (en) * | 1986-04-10 | 1987-10-20 | Kiyarii Rabo:Kk | Speech recognition method by inputting lip picture |
JPH01158579A (en) * | 1987-09-09 | 1989-06-21 | Aisin Seiki Co Ltd | Image recognizing device |
US5046097A (en) * | 1988-09-02 | 1991-09-03 | Qsound Ltd. | Sound imaging process |
US5440661A (en) * | 1990-01-31 | 1995-08-08 | The United States Of America As Represented By The United States Department Of Energy | Time series association learning |
EP0554437B1 (en) * | 1991-07-31 | 2000-04-05 | VYSIS, Inc. | Nucleic acid probes for the detection of shigella |
US5313522A (en) * | 1991-08-23 | 1994-05-17 | Slager Robert P | Apparatus for generating from an audio signal a moving visual lip image from which a speech content of the signal can be comprehended by a lipreader |
US5621858A (en) * | 1992-05-26 | 1997-04-15 | Ricoh Corporation | Neural network acoustic and visual speech recognition system training method and apparatus |
US5586215A (en) * | 1992-05-26 | 1996-12-17 | Ricoh Corporation | Neural network acoustic and visual speech recognition system |
US5502774A (en) * | 1992-06-09 | 1996-03-26 | International Business Machines Corporation | Automatic recognition of a consistent message using multiple complimentary sources of information |
IT1257073B (en) * | 1992-08-11 | 1996-01-05 | Ist Trentino Di Cultura | RECOGNITION SYSTEM, ESPECIALLY FOR THE RECOGNITION OF PEOPLE. |
US5473759A (en) * | 1993-02-22 | 1995-12-05 | Apple Computer, Inc. | Sound analysis and resynthesis using correlograms |
US5473726A (en) * | 1993-07-06 | 1995-12-05 | The United States Of America As Represented By The Secretary Of The Air Force | Audio and amplitude modulated photo data collection for speech recognition |
US6471420B1 (en) * | 1994-05-13 | 2002-10-29 | Matsushita Electric Industrial Co., Ltd. | Voice selection apparatus voice response apparatus, and game apparatus using word tables from which selected words are output as voice selections |
US5805036A (en) * | 1995-05-15 | 1998-09-08 | Illinois Superconductor | Magnetically activated switch using a high temperature superconductor component |
US6028960A (en) * | 1996-09-20 | 2000-02-22 | Lucent Technologies Inc. | Face feature analysis for automatic lipreading and character animation |
US5995936A (en) * | 1997-02-04 | 1999-11-30 | Brais; Louis | Report generation system and method for capturing prose, audio, and video by voice command and automatically linking sound and image to formatted text locations |
DE19740119A1 (en) * | 1997-09-12 | 1999-03-18 | Philips Patentverwaltung | System for cutting digital video and audio information |
US6185529B1 (en) * | 1998-09-14 | 2001-02-06 | International Business Machines Corporation | Speech recognition aided by lateral profile image |
US20020145610A1 (en) * | 1999-07-16 | 2002-10-10 | Steve Barilovits | Video processing engine overlay filter scaler |
US6219640B1 (en) * | 1999-08-06 | 2001-04-17 | International Business Machines Corporation | Methods and apparatus for audio-visual speaker recognition and utterance verification |
US6581081B1 (en) * | 2000-01-24 | 2003-06-17 | 3Com Corporation | Adaptive size filter for efficient computation of wavelet packet trees |
2001
- 2001-10-01 US application US09/969,406, published as US20020116197A1 (en), status: Abandoned
- 2001-10-01 AU application AU2001296459A, published as AU2001296459A1 (en), status: Abandoned
- 2001-10-01 WO application PCT/US2001/030727, published as WO2002029784A1 (en), status: Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2002029784A1 (en) | 2002-04-11 |
US20020116197A1 (en) | 2002-08-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2001296459A1 (en) | | Audio visual speech processing |
AU2001243389A1 (en) | | Voice call processing methods |
AU2001294989A1 (en) | | Speech detection |
AU2001250789A1 (en) | | Audio/visual server |
AU2001282568A1 (en) | | Speech processing device and speech processing method |
AU2001261630A1 (en) | | Audio closure |
AU2001276588A1 (en) | | Adaptive-block-length audio coder |
GB0027178D0 (en) | | Speech processing system |
GB2375028B (en) | | Processing speech signals |
AU2001275319A1 (en) | | Load-adjusted speech recognition |
AU2001294835A1 (en) | | Video processing |
GB0028277D0 (en) | | Speech processing system |
AU2002211794A1 (en) | | Audio on location |
AU2001246558A1 (en) | | Microphone structure |
AU2002243386A1 (en) | | Rf2a and rf2b transcription factors |
AU2001273441A1 (en) | | Audio headset |
AUPQ942400A0 (en) | | Cinema audio processing system |
AU2001243642A1 (en) | | Transcription factors |
AU2001256490A1 (en) | | Digital audio processing |
AU2003242903A1 (en) | | Audio processing |
AU2001290122A1 (en) | | Audio apparatus |
AU2001230253A1 (en) | | Processing method |
AU2002256556A1 (en) | | Sea-trosy and related methods |
AU2458201A (en) | | Spf1-related transcription factors |
AU2001238041A1 (en) | | Separate account processing |