CN107221324B - Speech processing method and device - Google Patents
- Publication number: CN107221324B
- Application number: CN201710652375.XA
- Authority: CN (China)
- Prior art keywords
- user
- audio signal
- state
- lip
- image
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Concept ID | Concept | Type | Score | Found in | Count |
---|---|---|---|---|---|
238000003672 | processing method | Methods | 0.000 | title, claims, abstract, description | 12 |
230000005236 | sound signal | Effects | 0.000 | claims, abstract, description | 180 |
230000009471 | action | Effects | 0.000 | claims, abstract, description | 72 |
238000000034 | method | Methods | 0.000 | claims, abstract, description | 40 |
230000033001 | locomotion | Effects | 0.000 | claims, abstract, description | 36 |
230000008569 | process | Effects | 0.000 | claims, abstract, description | 28 |
238000012545 | processing | Methods | 0.000 | claims, description | 20 |
230000008859 | change | Effects | 0.000 | claims, description | 11 |
238000005516 | engineering process | Methods | 0.000 | abstract, description | 8 |
230000004807 | localization | Effects | 0.000 | description | 4 |
241001125929 | Trisopterus luscus | Species | 0.000 | description | 2 |
206010048232 | Yawning | Diseases | 0.000 | description | 2 |
238000004364 | calculation method | Methods | 0.000 | description | 2 |
238000010586 | diagram | Methods | 0.000 | description | 2 |
230000004048 | modification | Effects | 0.000 | description | 2 |
238000012986 | modification | Methods | 0.000 | description | 2 |
238000012706 | support-vector machine | Methods | 0.000 | description | 2 |
206010028347 | Muscle twitching | Diseases | 0.000 | description | 1 |
241001282135 | Poromitra oscitans | Species | 0.000 | description | 1 |
230000004075 | alteration | Effects | 0.000 | description | 1 |
238000013528 | artificial neural network | Methods | 0.000 | description | 1 |
238000011161 | development | Methods | 0.000 | description | 1 |
238000003384 | imaging method | Methods | 0.000 | description | 1 |
230000006872 | improvement | Effects | 0.000 | description | 1 |
238000004519 | manufacturing process | Methods | 0.000 | description | 1 |
238000001228 | spectrum | Methods | 0.000 | description | 1 |
238000012549 | training | Methods | 0.000 | description | 1 |
230000001755 | vocal effect | Effects | 0.000 | description | 1 |
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech

- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710652375.XA CN107221324B (zh) | 2017-08-02 | 2017-08-02 | Speech processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710652375.XA CN107221324B (zh) | 2017-08-02 | 2017-08-02 | Speech processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107221324A (zh) | 2017-09-29 |
CN107221324B (zh) | 2021-03-16 |
Family ID: 59955006
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710652375.XA Active CN107221324B (zh) | Speech processing method and device | 2017-08-02 | 2017-08-02 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107221324B (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109145853A (zh) * | 2018-08-31 | 2019-01-04 | 百度在线网络技术(北京)有限公司 | Method and device for identifying noise |
CN111868823A (zh) * | 2019-02-27 | 2020-10-30 | 华为技术有限公司 | Sound source separation method, apparatus, and device |
CN110310668A (zh) * | 2019-05-21 | 2019-10-08 | 深圳壹账通智能科技有限公司 | Silence detection method, system, device, and computer-readable storage medium |
CN111326175A (zh) * | 2020-02-18 | 2020-06-23 | 维沃移动通信有限公司 | Method for prompting a conversation partner, and wearable device |
CN113362849A (zh) * | 2020-03-02 | 2021-09-07 | 阿里巴巴集团控股有限公司 | Voice data processing method and device |
CN111933174A (zh) * | 2020-08-16 | 2020-11-13 | 云知声智能科技股份有限公司 | Speech processing method, apparatus, device, and system |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5680481A (en) * | 1992-05-26 | 1997-10-21 | Ricoh Corporation | Facial feature extraction method and apparatus for a neural network acoustic and visual speech recognition system |
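JP2000338987A (ja) * | 1999-05-28 | 2000-12-08 | Mitsubishi Electric Corp | Utterance start monitoring device, speaker identification device, voice input system, speaker identification system, and communication system |
JP2003255993A (ja) * | 2002-03-04 | 2003-09-10 | Ntt Docomo Inc | Speech recognition system, speech recognition method, speech recognition program, speech synthesis system, speech synthesis method, and speech synthesis program |
KR20100041061A (ko) * | 2008-10-13 | 2010-04-22 | 성균관대학교산학협력단 | Video call method for enlarging a speaker's face, and terminal therefor |
TWI502583B (zh) * | 2013-04-11 | 2015-10-01 | Wistron Corp | Speech processing device and speech processing method |
CN105915798A (zh) * | 2016-06-02 | 2016-08-31 | 北京小米移动软件有限公司 | Method and device for controlling a camera in a video conference |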
2017
- 2017-08-02: application CN201710652375.XA filed in CN (CN107221324B, status Active)
Also Published As
Publication number | Publication date |
---|---|
CN107221324A (zh) | 2017-09-29 |
Similar Documents
Publication | Title |
---|---|
CN107221324B (zh) | Speech processing method and device |
US9595259B2 (en) | Sound source-separating device and sound source-separating method |
CN107799126B (zh) | Voice endpoint detection method and device based on supervised machine learning |
Choudhury et al. | Multimodal person recognition using unconstrained audio and video |
JP2021500616A (ja) | Object identification method, and computer device and computer-device-readable storage medium therefor |
WO2015172630A1 (zh) | Camera device and focusing method thereof |
Sahoo et al. | Emotion recognition from audio-visual data using rule based decision level fusion |
Scanlon et al. | Feature analysis for automatic speechreading |
JPWO2019044157A1 (ja) | Sound pickup device, sound pickup method, and program |
JP4715738B2 (ja) | Utterance detection device and utterance detection method |
JP2001092974A (ja) | Speaker recognition method, apparatus for executing the same, and method and device for confirming voice generation |
EP2721609A1 (en) | Identification of a local speaker |
WO2012128382A1 (en) | Device and method for lip motion detection |
US20160078883A1 (en) | Action analysis device, action analysis method, and action analysis program |
US10964326B2 (en) | System and method for audio-visual speech recognition |
CN108898042B (zh) | Method for detecting abnormal user behavior inside an ATM cabin |
CN110750152A (zh) | Human-computer interaction method and system based on lip movement |
Foggia et al. | Cascade classifiers trained on gammatonegrams for reliably detecting audio events |
May et al. | Environment-aware ideal binary mask estimation using monaural cues |
CN114282621B (zh) | Multimodal-fusion method and system for distinguishing speaker roles |
Hung et al. | Towards audio-visual on-line diarization of participants in group meetings |
Rentzeperis et al. | The 2006 Athens Information Technology speech activity detection and speaker diarization systems |
Canton-Ferrer et al. | Audiovisual event detection towards scene understanding |
Yoshinaga et al. | Audio-visual speech recognition using new lip features extracted from side-face images |
Bratoszewski et al. | Comparison of acoustic and visual voice activity detection for noisy speech recognition |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address (unchanged): 200336 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai. Applicant: SHANGHAI MUYE ROBOT TECHNOLOGY Co.,Ltd. → SHANGHAI MROBOT TECHNOLOGY Co.,Ltd. → Shanghai Zhihui Medical Technology Co.,Ltd. → Shanghai zhihuilin Medical Technology Co.,Ltd. |
GR01 | Patent grant | |
CP03 | Change of name, title or address | Patentee: Shanghai zhihuilin Medical Technology Co.,Ltd. → Shanghai Noah Wood Robot Technology Co.,Ltd. Address: 200336 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai → 202150 room 205, zone W, second floor, building 3, No. 8, Xiushan Road, Chengqiao Town, Chongming District, Shanghai (Shanghai Chongming Industrial Park) |