CN114072875A - Voice signal processing method and related device therefor - Google Patents
Voice signal processing method and related device therefor
- Publication number
- CN114072875A (application number CN202080026583.9A)
- Authority
- CN
- China
- Prior art keywords
- signal
- user
- voice
- sensor
- vibration
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/10—Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Game Theory and Decision Science (AREA)
- Business, Economics & Management (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
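A voice signal processing method and a related device. The method is applicable to the audio field and includes: obtaining a user voice signal collected by a sensor; obtaining a vibration signal corresponding to the user producing the voice, where the vibration signal represents a vibration feature of a body part of the user, and the body part is a part that vibrates in accordance with the vocal behavior when the user is vocalizing; and obtaining target voice information based on the vibration signal and the user voice signal collected by the sensor. This application uses the vibration signal as a basis for speech recognition. Because the vibration signal does not contain the external, non-user speech that is mixed in during complex acoustic transmission, and is only slightly affected by other environmental noise (for example, reverberation), this noise interference can be suppressed relatively well and a better speech recognition result can be achieved.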
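The abstract does not say how the vibration signal and the microphone signal are combined. As one purely illustrative possibility, and not the method claimed in this application, the Python sketch below uses the vibration signal as a voicing gate: microphone frames are kept only where the body-vibration sensor shows energy. The function names, the frame length, the threshold, and the assumption that both signals are sample-aligned lists of floats are all hypothetical.

```python
def frame_energy(x, frame_len):
    """Mean squared amplitude of each non-overlapping, full-length frame."""
    return [
        sum(s * s for s in x[i:i + frame_len]) / frame_len
        for i in range(0, len(x) - frame_len + 1, frame_len)
    ]


def gate_by_vibration(mic, vib, frame_len=4, threshold=0.01):
    """Zero microphone frames in which the vibration sensor shows no voicing.

    Assumption (not from the source): `mic` and `vib` have the same
    sampling rate and are aligned sample by sample.
    """
    out = list(mic)
    for k, energy in enumerate(frame_energy(vib, frame_len)):
        if energy < threshold:
            start = k * frame_len
            # The user's body was not vibrating here, so whatever the
            # microphone picked up is treated as external sound.
            out[start:start + frame_len] = [0.0] * frame_len
    return out


# The microphone hears sound throughout, but the user only speaks
# during the second frame, where the vibration sensor is active.
mic = [0.5] * 8
vib = [0.0] * 4 + [0.3] * 4
cleaned = gate_by_vibration(mic, vib)
```

A hard gate like this is only the simplest way to exploit the fact that the vibration channel is largely free of external speech and reverberation; trailing samples that do not fill a whole frame are left unchanged.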
Description
PCT national-phase application; the description has been published.
Claims (37)
- PCT national-phase application; the claims have been published.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/093523 WO2021237740A1 (zh) | 2020-05-29 | 2020-05-29 | Voice signal processing method and related device therefor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114072875A (zh) | 2022-02-18 |
Family
ID=78745413
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202080026583.9A (Pending; published as CN114072875A, zh) | Voice signal processing method and related device therefor | 2020-05-29 | 2020-05-29 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230098678A1 (zh) |
EP (1) | EP4141867A4 (zh) |
CN (1) | CN114072875A (zh) |
WO (1) | WO2021237740A1 (zh) |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010217453A (ja) * | 2009-03-16 | 2010-09-30 | Fujitsu Ltd | Microphone system for speech recognition |
CN101947152B (zh) * | 2010-09-11 | 2012-09-05 | 山东科技大学 | EEG-voice control system for a humanoid prosthesis and working method thereof |
CN103871419B (zh) * | 2012-12-11 | 2017-05-24 | 联想(北京)有限公司 | Information processing method and electronic device |
US10635800B2 (en) * | 2016-06-07 | 2020-04-28 | Vocalzoom Systems Ltd. | System, device, and method of voice-based user authentication utilizing a challenge |
US10573323B2 (en) * | 2017-12-26 | 2020-02-25 | Intel Corporation | Speaker recognition based on vibration signals |
CN110248281A (zh) * | 2018-03-07 | 2019-09-17 | 四川语文通科技有限责任公司 | Vocal-cord vibration matching as a method for isolating one's own voice in an environment with interference |
EP3582514B1 (en) * | 2018-06-14 | 2023-01-11 | Oticon A/s | Sound processing apparatus |
EP3618457A1 (en) * | 2018-09-02 | 2020-03-04 | Oticon A/s | A hearing device configured to utilize non-audio information to process audio signals |
CN110931031A (zh) * | 2019-10-09 | 2020-03-27 | 大象声科(深圳)科技有限公司 | Deep-learning speech extraction and noise reduction method fusing bone vibration sensor and microphone signals |
-
2020
- 2020-05-29 EP EP20938148.2A patent/EP4141867A4/en active Pending
- 2020-05-29 CN CN202080026583.9A patent/CN114072875A/zh active Pending
- 2020-05-29 WO PCT/CN2020/093523 patent/WO2021237740A1/zh unknown
-
2022
- 2022-11-28 US US17/994,968 patent/US20230098678A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
EP4141867A1 (en) | 2023-03-01 |
EP4141867A4 (en) | 2023-06-14 |
US20230098678A1 (en) | 2023-03-30 |
WO2021237740A1 (zh) | 2021-12-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
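US20210304735A1 (en) | | Keyword detection method and related apparatus |
US20220165288A1 (en) | | Audio signal processing method and apparatus, electronic device, and storage medium |
US20220172737A1 (en) | | Speech signal processing method and speech separation method |
WO2021249053A1 (zh) | | Image processing method and related apparatus |
CN111063342B (zh) | | Speech recognition method and apparatus, computer device, and storage medium |
US20190147875A1 (en) | | Continuous topic detection and adaption in audio environments |
WO2022156654A1 (zh) | | Text data processing method and apparatus |
CN111696570B (zh) | | Speech signal processing method, apparatus, device, and storage medium |
CN113763532B (zh) | | Human-computer interaction method, apparatus, device, and medium based on three-dimensional virtual objects |
CN113539290B (zh) | | Speech noise reduction method and apparatus |
WO2022033556A1 (zh) | | Electronic device, and speech recognition method and medium thereof |
CN111863020B (zh) | | Speech signal processing method, apparatus, device, and storage medium |
CN113705665B (zh) | | Training method for an image transformation network model, and electronic device |
CN114242037A (zh) | | Virtual character generation method and apparatus therefor |
WO2021203880A1 (zh) | | Speech enhancement method, neural network training method, and related device |
CN113750523A (zh) | | Motion generation method, apparatus, device, and storage medium for three-dimensional virtual objects |
CN116861850A (zh) | | Data processing method and apparatus therefor |
CN113646838B (zh) | | Method and system for providing emotion modification during video chat |
CN113611318A (zh) | | Audio data augmentation method and related device |
US20230334907A1 (en) | | Emotion Detection |
CN115620728B (zh) | | Audio processing method and apparatus, storage medium, and smart glasses |
EP4141867A1 (en) | | Voice signal processing method and related device therefor |
WO2022143314A1 (zh) | | Object registration method and apparatus |
CN112750449A (zh) | | Echo cancellation method, apparatus, terminal, server, and storage medium |
WO2022253053A1 (zh) | | Video playback method and apparatus |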
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||