CN113257269A - Beam forming method based on deep learning and storage device - Google Patents
Beam forming method based on deep learning and storage device
- Publication number
- CN113257269A (application CN202110431846.0A)
- Authority
- CN
- China
- Prior art keywords
- voice
- deep learning
- noise
- energy detection
- steps
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The present invention relates to the field of beam processing technologies, and in particular to a beam forming method and a storage device based on deep learning. The beam forming method based on deep learning comprises the following steps: the acquired voice data are processed through a deep learning technique to obtain the human voice and the non-human-voice noise, which recognizes and distinguishes the two more accurately and intelligently than a conventional adaptive beam forming algorithm; and signal energy detection is performed in the identified human voice direction, and weighted superposition of the beam sizes is calculated according to the energy detection result, so that voice can be picked up in multiple directions at the same time, meeting the requirement of picking up multiple speakers in a conference scene or any other scene.
Description
Technical Field
The present invention relates to the field of beam processing technologies, and in particular, to a beam forming method and a storage device based on deep learning.
Background
In conventional adaptive microphone-array beamforming techniques, such as super-directive beamforming, diffuse noise is minimized while the output in the direction of arrival is kept unchanged, thereby suppressing noise. However, such methods usually need to know the direction of arrival in advance, and correlated noise that resembles human voice often makes the direction-of-arrival estimate inaccurate, which degrades the beam effect.
In an actual conference scene there is often a need for several people to speak. With the existing adaptive microphone-array beamforming technology, the direction of arrival cannot be known in advance, so noise cannot be removed well, the beam effect suffers, and the requirement of picking up multiple speakers in a conference scene or any other scene cannot be met.
Disclosure of Invention
Therefore, a beam forming method based on deep learning needs to be provided to solve the problems that the existing adaptive microphone-array beamforming technology removes non-human-voice noise poorly and cannot meet the pickup requirement of multiple speakers. The specific technical scheme is as follows:
a beam forming method based on deep learning comprises the following steps:
processing the obtained voice data through a deep learning technology to obtain voice and non-voice noise;
and carrying out signal energy detection in the identified human voice direction, and carrying out weighted superposition calculation on the beam size according to an energy detection result.
Further, the "processing the acquired voice data through the deep learning technology to obtain the voice and the non-voice noise" specifically includes the following steps:
and calculating the voice existence probability of the acquired voice data through a preset algorithm, and obtaining the voice and the non-voice noise according to the calculation result of the voice existence probability.
Further, the "detecting signal energy in the identified voice direction, and performing weighted superposition calculation on the beam size according to the energy detection result" specifically includes the steps of:
calculating energy weighting coefficients for the output multiple beam directions;
and calculating a final beam weighting coefficient according to the voice existence probability and the energy weighting coefficient to obtain final beam output.
Further, the preset algorithm includes: a neural network trained through deep learning.
In order to solve the above technical problem, a storage device is further provided. The specific technical scheme is as follows:
a storage device having stored therein a set of instructions for performing:
processing the obtained voice data through a deep learning technology to obtain voice and non-voice noise;
and carrying out signal energy detection in the identified human voice direction, and carrying out weighted superposition calculation on the beam size according to an energy detection result.
Further, the set of instructions is further for performing:
the method comprises the following steps of processing the acquired voice data through a deep learning technology to obtain voice and non-voice noise, and specifically comprises the following steps:
and calculating the voice existence probability of the acquired voice data through a preset algorithm, and obtaining the voice and the non-voice noise according to the calculation result of the voice existence probability.
Further, the set of instructions is further for performing:
the method comprises the following steps of performing signal energy detection in the identified human voice direction, and performing weighted superposition calculation on the beam size according to an energy detection result, and specifically comprises the following steps:
calculating energy weighting coefficients for the output multiple beam directions;
and calculating a final beam weighting coefficient according to the voice existence probability and the energy weighting coefficient to obtain final beam output.
Further, the preset algorithm includes: a neural network trained through deep learning.
The invention has the beneficial effects that: the acquired voice data are processed through a deep learning technique to obtain the human voice and the non-human-voice noise, which recognizes and distinguishes the two more accurately and intelligently than a conventional adaptive beam forming algorithm; and signal energy detection is performed in the identified human voice direction, and weighted superposition of the beam sizes is calculated according to the energy detection result, so that voice can be picked up in multiple directions at the same time, meeting the requirement of picking up multiple speakers in a conference scene or any other scene.
Drawings
Fig. 1 is a flowchart illustrating a deep learning based beamforming method according to an embodiment;
FIG. 2 is a schematic diagram of a beam without deep learning processing according to an embodiment;
FIG. 3 is a schematic diagram of a beam processed by a deep learning technique to filter noise according to an embodiment;
FIG. 4 is a schematic diagram of the beams before the weighted superposition calculation according to an embodiment;
FIG. 5 is a schematic diagram of the beams after the weighted superposition calculation according to the embodiment;
fig. 6 is a block diagram of a storage device according to an embodiment.
Description of reference numerals:
600. a storage device.
Detailed Description
To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
Referring to fig. 1 to 5, in the present embodiment, a deep learning based beamforming method can be applied to a storage device, including but not limited to: personal computers, servers, general purpose computers, special purpose computers, network devices, embedded devices, programmable devices, intelligent mobile terminals, etc. The specific implementation mode is as follows:
the technical idea of the present application is explained below for use in a conference:
when the application scene is a conference, the core technical idea of the application is as follows: because human voice is dominant in a conference application scene, beam forming should preferentially point in the direction of the human voice; meanwhile, several people may speak at once during a discussion, so the beam cannot be a single beam. Therefore, the application mainly makes two improvements: one is to introduce a deep learning technique, for example training human-voice recognition through a neural network, so that beam forming can distinguish the human voice from non-human-voice noise; the other is to detect the signal energy in the recognized human voice directions and perform weighted superposition calculation on the beam sizes according to the strength of the voice signals, so as to pick up voice in multiple directions.
It should be noted that, besides the conference scene, the application scene core of the present application is a multi-person conversation scene, and thus may also be an informal tea session occasion, a reading session discussion occasion, and the like, as long as there is a multi-person conversation in the scene.
The following detailed description is made with reference to fig. 1 to 5:
step S101: and processing the acquired voice data through a deep learning technology to obtain voice and non-voice noise.
Step S102: and carrying out signal energy detection in the identified human voice direction, and carrying out weighted superposition calculation on the beam size according to an energy detection result.
In the present embodiment, the processing may be performed with any array, including but not limited to a linear array, a circular array, etc. Steps S101 and S102 are described below using such an array as an example:
suppose that θ is calculated1,θ2And theta3Beams in three directions, where the corresponding beam output is y1=ωbf1x,y2=ωbf2x and y3=ωbf2x。
Step S101 specifically further includes the steps of: and calculating the voice existence probability of the acquired voice data through a preset algorithm, and obtaining the voice and the non-voice noise according to the calculation result of the voice existence probability.
In this embodiment, the preset algorithm is, by way of example, a neural network trained through deep learning, which determines the speech existence probability by the following formula:
ω_dnn1 = dnn_speech_probability_compute(ω_1·x)
In this formula, ω_1·x is the input speech, i.e. the neural network input, and ω_dnn1 is the probability output by the network.
dnn_speech_probability_compute is the whole network flow, which specifically includes: audio input -> framing -> feature extraction -> neural network -> decoding -> judgment -> output voice probability.
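The network flow above can be sketched as follows. This is a toy stand-in only: the real features, network weights and decoding step are not disclosed, so a one-layer placeholder "network" on a log-energy feature is assumed here.

```python
import numpy as np

def dnn_speech_probability_compute(audio, frame_len=4):
    # Toy pipeline following the described flow:
    # input -> framing -> feature extraction -> neural network -> probability.
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    feats = np.log1p((frames ** 2).sum(axis=1, keepdims=True))  # log-energy feature
    W = np.array([[0.8]])                                       # placeholder weight
    b = np.array([-1.0])                                        # placeholder bias
    logits = feats @ W + b
    probs = 1.0 / (1.0 + np.exp(-logits))                       # sigmoid -> P(speech)
    return float(probs.mean())                                  # per-utterance probability

p = dnn_speech_probability_compute(np.sin(np.linspace(0, 8 * np.pi, 64)))
print(0.0 <= p <= 1.0)                                          # True: a valid probability
```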
The beam diagram without deep learning processing is shown in fig. 2, where the beam points at the noise and the speaker spk at the same time; the beam diagram after noise is filtered by the deep learning technique is shown in fig. 3, where the beam points only at the speaker spk.
After denoising, executing step S102, wherein step S102 further includes:
calculating energy weighting coefficients for the output multiple beam directions;
and calculating a final beam weighting coefficient according to the voice existence probability and the energy weighting coefficient to obtain final beam output.
The description is continued by taking a 3-mic circular array as an example:
meanwhile, energy weighting coefficients are calculated for a plurality of output beam directions, and the calculation formula is as follows:
ω_energy1 = energy_weight_compute(ω_1·x)
In this formula, ω_1·x is the input speech and ω_energy1 is the multi-beam speech-segment energy ratio.
energy_weight_compute is the speech-segment energy-ratio calculation process.
The specific calculation process is as follows: 1. compute the total energy of the multi-beam speech segment, y_energy = ω_bf1·x + ω_bf2·x + ω_bf3·x; 2. compute the sub-beam energy fraction ω_energy1 = ω_bf1·x / y_energy.
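The energy-weighting step can be sketched as below. The text writes the total as a sum over the three beam outputs; summing squared samples per beam is assumed here as the energy measure, and the beam sample values are illustrative.

```python
import numpy as np

# Each beam's energy coefficient is its share of the total multi-beam
# speech-segment energy (variable names follow the text's omega_energy_i).
y_beams = np.array([[0.5, 1.0],    # beam 1 samples (illustrative values)
                    [0.1, 0.2],    # beam 2
                    [0.3, 0.1]])   # beam 3

e = (y_beams ** 2).sum(axis=1)     # per-beam speech-segment energy
omega_energy = e / e.sum()         # omega_energy_i = e_i / y_energy
print(round(float(omega_energy.sum()), 6))   # the fractions sum to one
```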
The final beam weighting coefficients are calculated according to the speech existence probability and the energy weighting coefficients, giving the final beam output:
y = ω_dnn1·ω_energy1·ω_bf1·x + ω_dnn2·ω_energy2·ω_bf2·x + ω_dnn3·ω_energy3·ω_bf3·x.
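The final weighted superposition can be sketched as follows, scaling each directional beam by the product of its DNN speech probability and its energy weight and then summing. All numeric values here are illustrative, not from the patent.

```python
import numpy as np

y_beams = np.array([[0.5, 1.0],
                    [0.1, 0.2],
                    [0.3, 0.1]])               # y_i = w_bfi * x per direction
omega_dnn = np.array([0.9, 0.2, 0.7])          # speech presence per beam (assumed)
omega_energy = np.array([0.5, 0.1, 0.4])       # energy fraction per beam (assumed)

w = omega_dnn * omega_energy                   # final weight per beam
y = (w[:, None] * y_beams).sum(axis=0)         # weighted superposition of the beams
print(y.shape)                                 # (2,): one combined output signal
```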
Fig. 4 and 5 show the beam effect after each directional beam is weighted in combination with the energy weighting method. FIG. 4 shows that before weighting, the beams directed at speaker spk1 and speaker spk2 are the same size; FIG. 5 shows the effect after the beams in each direction are weighted in combination with the energy weighting method: because speaker spk1 is louder than speaker spk2, the beam pointing at spk1 is larger.
The acquired voice data are processed through a deep learning technique to obtain the human voice and the non-human-voice noise, which recognizes and distinguishes the two more accurately and intelligently than a conventional adaptive beam forming algorithm; and signal energy detection is performed in the identified human voice direction, and weighted superposition of the beam sizes is calculated according to the energy detection result, so that voice can be picked up in multiple directions at the same time, meeting the requirement of picking up multiple speakers in a conference scene or any other scene.
Referring to fig. 2 to fig. 6, in the present embodiment, an embodiment of a memory device 600 is as follows:
a storage device 600 having stored therein a set of instructions for performing:
processing the obtained voice data through a deep learning technology to obtain voice and non-voice noise;
and carrying out signal energy detection in the identified human voice direction, and carrying out weighted superposition calculation on the beam size according to an energy detection result.
The scheme can be applied with any array, including but not limited to a linear array, a circular array, etc. In the present embodiment, the commands executed by the above instruction set are explained using such an array as an example:
suppose that any array is calculated at θ1,θ2And theta3Beams in three directions, at this timeThe corresponding beam output is y1=ωbf1x,y2=ωbf2x and y3=ωbf2x。
Further, the set of instructions is further for performing:
the method comprises the following steps of processing the acquired voice data through a deep learning technology to obtain voice and non-voice noise, and specifically comprises the following steps:
and calculating the voice existence probability of the acquired voice data through a preset algorithm, and obtaining the voice and the non-voice noise according to the calculation result of the voice existence probability.
In this embodiment, the preset algorithm is, by way of example, a neural network trained through deep learning, which determines the speech existence probability by the following formula:
ω_dnn1 = dnn_speech_probability_compute(ω_1·x)
In this formula, ω_1·x is the input speech, i.e. the neural network input, and ω_dnn1 is the probability output by the network.
dnn_speech_probability_compute is the whole network flow, which specifically includes: audio input -> framing -> feature extraction -> neural network -> decoding -> judgment -> output voice probability.
The beam diagram without deep learning processing is shown in fig. 2, where the beam points at the noise and the speaker spk at the same time; the beam diagram after noise is filtered by the deep learning technique is shown in fig. 3, where the beam points only at the speaker spk.
After denoising, the set of instructions is further configured to perform:
the method comprises the following steps of performing signal energy detection in the identified human voice direction, and performing weighted superposition calculation on the beam size according to an energy detection result, and specifically comprises the following steps:
calculating energy weighting coefficients for the output multiple beam directions;
and calculating a final beam weighting coefficient according to the voice existence probability and the energy weighting coefficient to obtain final beam output.
The description is continued by taking as an example any of the arrays mentioned above:
meanwhile, energy weighting coefficients are calculated for a plurality of beam directions output by any array, and the calculation formula is as follows:
ω_energy1 = energy_weight_compute(ω_1·x)
In this formula, ω_1·x is the input speech and ω_energy1 is the multi-beam speech-segment energy ratio.
energy_weight_compute is the speech-segment energy-ratio calculation process.
The specific calculation process is as follows: 1. compute the total energy of the multi-beam speech segment, y_energy = ω_bf1·x + ω_bf2·x + ω_bf3·x; 2. compute the sub-beam energy fraction ω_energy1 = ω_bf1·x / y_energy.
The final beam weighting coefficients are calculated according to the speech existence probability and the energy weighting coefficients, giving the final beam output:
y = ω_dnn1·ω_energy1·ω_bf1·x + ω_dnn2·ω_energy2·ω_bf2·x + ω_dnn3·ω_energy3·ω_bf3·x.
Fig. 4 and 5 show the beam effect after each directional beam is weighted in combination with the energy weighting method. FIG. 4 shows that before weighting, the beams directed at speaker spk1 and speaker spk2 are the same size; FIG. 5 shows the effect after the beams in each direction are weighted in combination with the energy weighting method: because speaker spk1 is louder than speaker spk2, the beam pointing at spk1 is larger.
By executing the commands of the instruction set on the storage device 600: the acquired voice data are processed through a deep learning technique to obtain the human voice and the non-human-voice noise, which recognizes and distinguishes the two more accurately and intelligently than a conventional adaptive beam forming algorithm; and signal energy detection is performed in the identified human voice direction, and weighted superposition of the beam sizes is calculated according to the energy detection result, so that voice can be picked up in multiple directions at the same time, meeting the requirement of picking up multiple speakers in a conference scene or any other scene.
It should be noted that, although the above embodiments have been described herein, the invention is not limited thereto. Therefore, based on the innovative concepts of the present invention, the technical solutions of the present invention can be directly or indirectly applied to other related technical fields by making changes and modifications to the embodiments described herein, or by using equivalent structures or equivalent processes performed in the content of the present specification and the attached drawings, which are included in the scope of the present invention.
Claims (8)
1. A beam forming method based on deep learning is characterized by comprising the following steps:
processing the obtained voice data through a deep learning technology to obtain voice and non-voice noise;
and carrying out signal energy detection in the identified human voice direction, and carrying out weighted superposition calculation on the beam size according to an energy detection result.
2. The method according to claim 1, wherein the step of processing the acquired voice data through a deep learning technique to obtain the human voice and the non-human-voice noise specifically comprises the steps of:
and calculating the voice existence probability of the acquired voice data through a preset algorithm, and obtaining the voice and the non-voice noise according to the calculation result of the voice existence probability.
3. The method according to claim 2, wherein the step of performing signal energy detection in the identified human voice direction and performing weighted superposition calculation on the beam size according to the energy detection result further comprises the steps of:
calculating energy weighting coefficients for the output multiple beam directions;
and calculating a final beam weighting coefficient according to the voice existence probability and the energy weighting coefficient to obtain final beam output.
4. The deep learning based beamforming method according to claim 2 or 3,
wherein the preset algorithm comprises: a neural network trained through deep learning.
5. A storage device having a set of instructions stored therein, the set of instructions being operable to perform:
processing the obtained voice data through a deep learning technology to obtain voice and non-voice noise;
and carrying out signal energy detection in the identified human voice direction, and carrying out weighted superposition calculation on the beam size according to an energy detection result.
6. The storage device of claim 5, wherein the set of instructions is further configured to perform:
the method comprises the following steps of processing the acquired voice data through a deep learning technology to obtain voice and non-voice noise, and specifically comprises the following steps:
and calculating the voice existence probability of the acquired voice data through a preset algorithm, and obtaining the voice and the non-voice noise according to the calculation result of the voice existence probability.
7. The storage device of claim 6, wherein the set of instructions is further configured to perform:
the method comprises the following steps of performing signal energy detection in the identified human voice direction, and performing weighted superposition calculation on the beam size according to an energy detection result, and specifically comprises the following steps:
calculating energy weighting coefficients for the output multiple beam directions;
and calculating a final beam weighting coefficient according to the voice existence probability and the energy weighting coefficient to obtain final beam output.
8. The storage device according to claim 6 or 7, wherein the preset algorithm comprises: a neural network trained through deep learning.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110431846.0A CN113257269A (en) | 2021-04-21 | 2021-04-21 | Beam forming method based on deep learning and storage device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110431846.0A CN113257269A (en) | 2021-04-21 | 2021-04-21 | Beam forming method based on deep learning and storage device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113257269A true CN113257269A (en) | 2021-08-13 |
Family
ID=77221167
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110431846.0A Pending CN113257269A (en) | 2021-04-21 | 2021-04-21 | Beam forming method based on deep learning and storage device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113257269A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113284505A (en) * | 2021-04-21 | 2021-08-20 | 瑞芯微电子股份有限公司 | Adaptive beam forming method and storage device |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080232607A1 (en) * | 2007-03-22 | 2008-09-25 | Microsoft Corporation | Robust adaptive beamforming with enhanced noise suppression |
CN101685638A (en) * | 2008-09-25 | 2010-03-31 | 华为技术有限公司 | Method and device for enhancing voice signals |
KR20130126318A (en) * | 2012-05-11 | 2013-11-20 | 엘지전자 주식회사 | Apparatus and method for removing noise |
CN109272989A (en) * | 2018-08-29 | 2019-01-25 | 北京京东尚科信息技术有限公司 | Voice awakening method, device and computer readable storage medium |
CN110600051A (en) * | 2019-11-12 | 2019-12-20 | 乐鑫信息科技(上海)股份有限公司 | Method for selecting output beams of a microphone array |
CN110648692A (en) * | 2019-09-26 | 2020-01-03 | 苏州思必驰信息科技有限公司 | Voice endpoint detection method and system |
CN110740412A (en) * | 2018-07-18 | 2020-01-31 | 奥迪康有限公司 | Hearing device comprising a speech presence probability estimator |
US10573321B1 (en) * | 2018-09-25 | 2020-02-25 | Sonos, Inc. | Voice detection optimization based on selected voice assistant service |
CN111025233A (en) * | 2019-11-13 | 2020-04-17 | 阿里巴巴集团控股有限公司 | Sound source direction positioning method and device, voice equipment and system |
CN111640428A (en) * | 2020-05-29 | 2020-09-08 | 北京百度网讯科技有限公司 | Voice recognition method, device, equipment and medium |
CN112652320A (en) * | 2020-12-04 | 2021-04-13 | 深圳地平线机器人科技有限公司 | Sound source positioning method and device, computer readable storage medium and electronic equipment |
CN113284505A (en) * | 2021-04-21 | 2021-08-20 | 瑞芯微电子股份有限公司 | Adaptive beam forming method and storage device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||