CN113257269A - Beam forming method based on deep learning and storage device

Beam forming method based on deep learning and storage device

Info

Publication number
CN113257269A
CN113257269A
Authority
CN
China
Prior art keywords
voice
deep learning
noise
energy detection
steps
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110431846.0A
Other languages
Chinese (zh)
Inventor
李茂发
江正梁
陈时钦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rockchip Electronics Co Ltd
Original Assignee
Rockchip Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rockchip Electronics Co Ltd filed Critical Rockchip Electronics Co Ltd
Priority to CN202110431846.0A priority Critical patent/CN113257269A/en
Publication of CN113257269A publication Critical patent/CN113257269A/en
Pending legal-status Critical Current

Classifications

    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L25/30 Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present invention relates to the field of beam processing technologies, and in particular to a beam forming method and a storage device based on deep learning. The method comprises the following steps: the acquired voice data is processed with a deep learning technique to separate human voice from non-voice noise; compared with traditional adaptive beamforming algorithms, this recognizes and distinguishes voice and non-voice noise more accurately and intelligently. Signal energy detection is then performed in each identified voice direction, and the beam sizes are computed by weighted superposition according to the energy detection results, so that voice can be picked up from multiple directions simultaneously, meeting the requirement of picking up multiple speakers in a conference or any other scene.

Description

Beam forming method based on deep learning and storage device
Technical Field
The present invention relates to the field of beam processing technologies, and in particular, to a beam forming method and a storage device based on deep learning.
Background
In conventional adaptive microphone-array beamforming techniques, such as super-directive beamforming, scattered (diffuse) noise is minimized while the output in the direction of arrival is kept unchanged, thereby suppressing noise. However, such methods usually need to know the direction of arrival in advance, and correlated noise that resembles human voice often makes the direction-of-arrival estimate inaccurate, degrading the beam quality.
In an actual conference scene, multiple people often need to speak. With the existing adaptive microphone-array beamforming techniques, the directions of arrival cannot be known in advance, so noise cannot be removed well, the beam quality suffers, and the requirement of picking up multiple speakers in a conference or any other scene cannot be met.
Disclosure of Invention
Therefore, a beam forming method based on deep learning needs to be provided to solve the problems that existing adaptive microphone-array beamforming techniques remove non-voice noise poorly and cannot meet the pickup requirement of multiple simultaneous speakers. The specific technical scheme is as follows:
a beam forming method based on deep learning comprises the following steps:
processing the obtained voice data through a deep learning technology to obtain voice and non-voice noise;
and carrying out signal energy detection in the identified human voice direction, and carrying out weighted superposition calculation on the beam size according to an energy detection result.
Further, the "processing the acquired voice data through the deep learning technology to obtain the voice and the non-voice noise" specifically includes the following steps:
and calculating the voice existence probability of the acquired voice data through a preset algorithm, and obtaining the voice and the non-voice noise according to the calculation result of the voice existence probability.
Further, the "detecting signal energy in the identified voice direction, and performing weighted superposition calculation on the beam size according to the energy detection result" specifically includes the steps of:
calculating energy weighting coefficients for the output multiple beam directions;
and calculating a final beam weighting coefficient according to the voice existence probability and the energy weighting coefficient to obtain final beam output.
Further, the preset algorithm includes: a neural network trained through deep learning.
In order to solve the technical problem, the storage device is further provided, and the specific technical scheme is as follows:
a storage device having stored therein a set of instructions for performing:
processing the obtained voice data through a deep learning technology to obtain voice and non-voice noise;
and carrying out signal energy detection in the identified human voice direction, and carrying out weighted superposition calculation on the beam size according to an energy detection result.
Further, the set of instructions is further for performing:
the method comprises the following steps of processing the acquired voice data through a deep learning technology to obtain voice and non-voice noise, and specifically comprises the following steps:
and calculating the voice existence probability of the acquired voice data through a preset algorithm, and obtaining the voice and the non-voice noise according to the calculation result of the voice existence probability.
Further, the set of instructions is further for performing:
the method comprises the following steps of performing signal energy detection in the identified human voice direction, and performing weighted superposition calculation on the beam size according to an energy detection result, and specifically comprises the following steps:
calculating energy weighting coefficients for the output multiple beam directions;
and calculating a final beam weighting coefficient according to the voice existence probability and the energy weighting coefficient to obtain final beam output.
Further, the preset algorithm includes: a neural network trained through deep learning.
The invention has the beneficial effects that: the acquired voice data is processed with a deep learning technique to separate human voice from non-voice noise; compared with traditional adaptive beamforming algorithms, this recognizes and distinguishes voice and non-voice noise more accurately and intelligently. Signal energy detection is then performed in each identified voice direction, and the beam sizes are computed by weighted superposition according to the energy detection results, so that voice can be picked up from multiple directions simultaneously, meeting the requirement of picking up multiple speakers in a conference or any other scene.
Drawings
Fig. 1 is a flowchart illustrating a deep learning based beamforming method according to an embodiment;
FIG. 2 is a schematic diagram of a beam without deep learning processing according to an embodiment;
FIG. 3 is a schematic diagram of a beam processed by a deep learning technique to filter noise according to an embodiment;
FIG. 4 is a schematic diagram of the beams before the weighted superposition calculation according to an embodiment;
FIG. 5 is a schematic diagram of the beams after the weighted superposition calculation according to an embodiment;
fig. 6 is a block diagram of a storage device according to an embodiment.
Description of reference numerals:
600: storage device.
Detailed Description
To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
Referring to fig. 1 to 5, in the present embodiment, the deep learning based beamforming method can be applied to a storage device, including but not limited to: personal computers, servers, general-purpose computers, special-purpose computers, network devices, embedded devices, programmable devices, intelligent mobile terminals, etc. The specific implementation is as follows:
the technical idea of the present application is explained below for use in a conference:
when the application scene is a conference, the core technical idea of the application is as follows: because in the conference application scene, the voice is taken as the main voice, the beam forming should be preferentially pointed to the voice direction of the person, and meanwhile, the conference has the situation that when a plurality of persons speak in discussion, the beam cannot be a single beam. Therefore, the application mainly makes two improvements: one is to introduce deep learning techniques such as: training voice recognition of a person through a neural network, and enabling beam forming to recognize voice and non-voice noise; one is to detect the signal energy in the recognized voice direction, and to perform weighted superposition calculation on the beam size according to the strength of the voice signal, so as to pick up the voice in multiple directions.
It should be noted that, besides the conference scene, the core application scenario of the present application is any multi-person conversation, so it may also be an informal tea gathering, a book-club discussion, and the like, as long as multiple people converse in the scene.
The following detailed description is made with reference to fig. 1 to 5:
step S101: and processing the acquired voice data through a deep learning technology to obtain voice and non-voice noise.
Step S102: and carrying out signal energy detection in the identified human voice direction, and carrying out weighted superposition calculation on the beam size according to an energy detection result.
In the present embodiment, the processing can be performed with any array geometry, including but not limited to linear arrays and circular arrays. Steps S101 and S102 are explained below using a three-microphone (3-mic) circular array as an example:
suppose that θ is calculated1,θ2And theta3Beams in three directions, where the corresponding beam output is y1=ωbf1x,y2=ωbf2x and y3=ωbf2x。
Step S101 specifically further includes the steps of: and calculating the voice existence probability of the acquired voice data through a preset algorithm, and obtaining the voice and the non-voice noise according to the calculation result of the voice existence probability.
In this embodiment, a neural network trained through deep learning is taken as an example of the preset algorithm for determining the speech presence probability; the formula is:
ω_dnn1 = dnn_speech_probability_compute(ω1·x)
In this formula, ω1·x is the input speech (the neural network input) and ω_dnn1 is the probability output by the network.
dnn_speech_probability_compute denotes the whole network pipeline: audio input -> framing -> feature extraction -> neural network -> decoding -> decision -> output speech probability.
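That pipeline can be sketched as follows. The framing and log-energy feature are simplified stand-ins, and the single logistic unit is a hypothetical placeholder for the trained neural network, whose architecture the patent does not specify.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Framing step: split a 1-D signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def extract_features(frames):
    """Toy feature: log energy per frame (a real system would use e.g. filterbanks)."""
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def dnn_speech_probability_compute(x):
    """audio input -> framing -> feature extraction -> 'network' -> speech probability."""
    feats = extract_features(frame_signal(x))
    logits = 0.8 * feats - 1.0             # hypothetical trained weight and bias
    probs = 1.0 / (1.0 + np.exp(-logits))  # per-frame speech probability
    return float(np.mean(probs))           # utterance-level omega_dnn

p = dnn_speech_probability_compute(np.sin(np.linspace(0.0, 100.0, 4000)))
```

A real implementation would replace the logistic unit with the trained network and add the decoding and decision steps the patent lists.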
The beam pattern without deep-learning processing is shown in fig. 2: the beam points at both the noise source and the speaker spk. The beam pattern after noise is filtered out by the deep-learning technique is shown in fig. 3: the beam points only at the speaker spk.
After denoising, executing step S102, wherein step S102 further includes:
calculating energy weighting coefficients for the output multiple beam directions;
and calculating a final beam weighting coefficient according to the voice existence probability and the energy weighting coefficient to obtain final beam output.
The description continues with the 3-mic circular array example mentioned above:
meanwhile, energy weighting coefficients are calculated for a plurality of output beam directions, and the calculation formula is as follows:
ωenergy1=energy_weight_compute(ω1x)
in this formula, ω1x is input speech, omegaenergy1Is the multi-beam speech segment energy ratio.
energy _ weight _ computer is a speech segment energy ratio calculation process.
The specific calculation process is as follows: 1. computing total energy y of multi-beam voice segmentenergy=ωbf1x+ωbf2*x+ωbf3X, 2, calculating the sub-beam energy fraction omegaenergy1=ωbf1x/yenergy
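A sketch of this energy-ratio step, interpreting each beam's speech-segment energy as the sum of its squared samples (an assumption; the patent writes the ratio directly in terms of the beam outputs):

```python
import numpy as np

def energy_weight_compute(beam_signals):
    """Per-beam energy fractions omega_energy_i = E_i / y_energy.

    beam_signals: (num_beams, num_samples) speech-segment samples of each
                  beam (the omega_bf_i * x outputs).
    Returns an array of fractions summing to 1.
    """
    energies = np.sum(np.asarray(beam_signals) ** 2, axis=1)  # E_i per beam
    y_energy = np.sum(energies)                               # total energy
    return energies / y_energy

# Hypothetical beam segments: energies 2, 4 and 1, so fractions 2/7, 4/7, 1/7.
w_energy = energy_weight_compute([[1.0, 1.0], [2.0, 0.0], [1.0, 0.0]])
```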
The final beam weighting coefficients are calculated from the speech presence probabilities and the energy weighting coefficients, giving the final beam output:
y = ω_dnn1·ω_energy1·ω_bf1·x + ω_dnn2·ω_energy2·ω_bf2·x + ω_dnn3·ω_energy3·ω_bf3·x.
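A sketch of this final weighted superposition; the inputs are hypothetical per-beam values of the kind produced in the previous steps.

```python
import numpy as np

def final_beam_output(omega_dnn, omega_energy, beam_signals):
    """y = sum_i omega_dnn_i * omega_energy_i * (omega_bf_i * x).

    omega_dnn:    per-beam speech presence probabilities.
    omega_energy: per-beam energy weighting coefficients.
    beam_signals: (num_beams, num_samples) beam outputs omega_bf_i * x.
    """
    p = np.asarray(omega_dnn)[:, None]     # broadcast weights over samples
    e = np.asarray(omega_energy)[:, None]
    return np.sum(p * e * np.asarray(beam_signals), axis=0)

# Hypothetical two-beam example: the confident, louder beam dominates.
y = final_beam_output([1.0, 0.5], [0.5, 0.5], [[2.0, 2.0], [4.0, 0.0]])
# 1.0*0.5*[2,2] + 0.5*0.5*[4,0] = [2.0, 1.0]
```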
Fig. 4 and fig. 5 show the beam effect after each directional beam is weighted in combination with the energy weighting method. FIG. 4 shows that, before weighting, the beams pointing at speaker spk1 and speaker spk2 have the same size; FIG. 5 shows the effect after weighting: because speaker spk1 is louder than speaker spk2, the beam pointing at spk1 is larger than the beam pointing at spk2.
The acquired voice data is processed with a deep learning technique to separate human voice from non-voice noise; compared with traditional adaptive beamforming algorithms, this recognizes and distinguishes voice and non-voice noise more accurately and intelligently. Signal energy detection is then performed in each identified voice direction, and the beam sizes are computed by weighted superposition according to the energy detection results, so that voice can be picked up from multiple directions simultaneously, meeting the requirement of picking up multiple speakers in a conference or any other scene.
Referring to fig. 2 to fig. 6, in the present embodiment, an embodiment of a memory device 600 is as follows:
a storage device 600 having stored therein a set of instructions for performing:
processing the obtained voice data through a deep learning technology to obtain voice and non-voice noise;
and carrying out signal energy detection in the identified human voice direction, and carrying out weighted superposition calculation on the beam size according to an energy detection result.
The scheme can be applied to any array geometry, including but not limited to linear arrays and circular arrays. In the present embodiment, the commands executed by the above instruction set are explained using such an array as an example:
Suppose the array computes beams in three directions θ1, θ2 and θ3, with corresponding beam outputs y1 = ω_bf1·x, y2 = ω_bf2·x and y3 = ω_bf3·x.
Further, the set of instructions is further for performing:
the method comprises the following steps of processing the acquired voice data through a deep learning technology to obtain voice and non-voice noise, and specifically comprises the following steps:
and calculating the voice existence probability of the acquired voice data through a preset algorithm, and obtaining the voice and the non-voice noise according to the calculation result of the voice existence probability.
In this embodiment, a neural network trained through deep learning is taken as an example of the preset algorithm for determining the speech presence probability; the formula is:
ω_dnn1 = dnn_speech_probability_compute(ω1·x)
In this formula, ω1·x is the input speech (the neural network input) and ω_dnn1 is the probability output by the network.
dnn_speech_probability_compute denotes the whole network pipeline: audio input -> framing -> feature extraction -> neural network -> decoding -> decision -> output speech probability.
The beam pattern without deep-learning processing is shown in fig. 2: the beam points at both the noise source and the speaker spk. The beam pattern after noise is filtered out by the deep-learning technique is shown in fig. 3: the beam points only at the speaker spk.
After denoising, the set of instructions is further configured to perform:
the method comprises the following steps of performing signal energy detection in the identified human voice direction, and performing weighted superposition calculation on the beam size according to an energy detection result, and specifically comprises the following steps:
calculating energy weighting coefficients for the output multiple beam directions;
and calculating a final beam weighting coefficient according to the voice existence probability and the energy weighting coefficient to obtain final beam output.
The description continues using the array mentioned above as an example:
Meanwhile, energy weighting coefficients are calculated for the multiple beam directions output by the array; the calculation formula is:
ω_energy1 = energy_weight_compute(ω1·x)
In this formula, ω1·x is the input speech and ω_energy1 is this beam's share of the multi-beam speech-segment energy.
energy_weight_compute is the speech-segment energy-ratio calculation. The specific process is: 1. compute the total energy of the multi-beam speech segment, y_energy = ω_bf1·x + ω_bf2·x + ω_bf3·x; 2. compute each sub-beam's energy fraction, ω_energy1 = ω_bf1·x / y_energy.
The final beam weighting coefficients are calculated from the speech presence probabilities and the energy weighting coefficients, giving the final beam output:
y = ω_dnn1·ω_energy1·ω_bf1·x + ω_dnn2·ω_energy2·ω_bf2·x + ω_dnn3·ω_energy3·ω_bf3·x.
Fig. 4 and fig. 5 show the beam effect after each directional beam is weighted in combination with the energy weighting method. FIG. 4 shows that, before weighting, the beams pointing at speaker spk1 and speaker spk2 have the same size; FIG. 5 shows the effect after weighting: because speaker spk1 is louder than speaker spk2, the beam pointing at spk1 is larger than the beam pointing at spk2.
Through the instruction set executed on the storage device 600: the acquired voice data is processed with a deep learning technique to separate human voice from non-voice noise; compared with traditional adaptive beamforming algorithms, this recognizes and distinguishes voice and non-voice noise more accurately and intelligently. Signal energy detection is then performed in each identified voice direction, and the beam sizes are computed by weighted superposition according to the energy detection results, so that voice can be picked up from multiple directions simultaneously, meeting the requirement of picking up multiple speakers in a conference or any other scene.
It should be noted that, although the above embodiments have been described herein, the invention is not limited thereto. Therefore, based on the innovative concept of the present invention, changes and modifications to the embodiments described herein, or equivalent structures or equivalent process transformations made using the contents of this specification and the drawings, applied directly or indirectly in other related technical fields, are all included in the scope of protection of the present invention.

Claims (8)

1. A beam forming method based on deep learning is characterized by comprising the following steps:
processing the obtained voice data through a deep learning technology to obtain voice and non-voice noise;
and carrying out signal energy detection in the identified human voice direction, and carrying out weighted superposition calculation on the beam size according to an energy detection result.
2. The method according to claim 1, wherein the step of processing the acquired voice data through a deep learning technology to obtain voice and non-voice noise further comprises the steps of:
and calculating the voice existence probability of the acquired voice data through a preset algorithm, and obtaining the voice and the non-voice noise according to the calculation result of the voice existence probability.
3. The method according to claim 2, wherein the step of performing signal energy detection in the identified human voice direction and performing weighted superposition calculation on the beam size according to the energy detection result further comprises the steps of:
calculating energy weighting coefficients for the output multiple beam directions;
and calculating a final beam weighting coefficient according to the voice existence probability and the energy weighting coefficient to obtain final beam output.
4. The deep learning based beamforming method according to claim 2 or 3, wherein
the preset algorithm comprises: a neural network trained through deep learning.
5. A storage device having a set of instructions stored therein, the set of instructions being operable to perform:
processing the obtained voice data through a deep learning technology to obtain voice and non-voice noise;
and carrying out signal energy detection in the identified human voice direction, and carrying out weighted superposition calculation on the beam size according to an energy detection result.
6. The storage device of claim 5, wherein the set of instructions is further configured to perform:
the method comprises the following steps of processing the acquired voice data through a deep learning technology to obtain voice and non-voice noise, and specifically comprises the following steps:
and calculating the voice existence probability of the acquired voice data through a preset algorithm, and obtaining the voice and the non-voice noise according to the calculation result of the voice existence probability.
7. The storage device of claim 6, wherein the set of instructions is further configured to perform:
the method comprises the following steps of performing signal energy detection in the identified human voice direction, and performing weighted superposition calculation on the beam size according to an energy detection result, and specifically comprises the following steps:
calculating energy weighting coefficients for the output multiple beam directions;
and calculating a final beam weighting coefficient according to the voice existence probability and the energy weighting coefficient to obtain final beam output.
8. The storage device according to claim 6 or 7, wherein the preset algorithm comprises: a neural network trained through deep learning.
CN202110431846.0A 2021-04-21 2021-04-21 Beam forming method based on deep learning and storage device Pending CN113257269A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110431846.0A CN113257269A (en) 2021-04-21 2021-04-21 Beam forming method based on deep learning and storage device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110431846.0A CN113257269A (en) 2021-04-21 2021-04-21 Beam forming method based on deep learning and storage device

Publications (1)

Publication Number Publication Date
CN113257269A true CN113257269A (en) 2021-08-13

Family

ID=77221167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110431846.0A Pending CN113257269A (en) 2021-04-21 2021-04-21 Beam forming method based on deep learning and storage device

Country Status (1)

Country Link
CN (1) CN113257269A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113284505A (en) * 2021-04-21 2021-08-20 瑞芯微电子股份有限公司 Adaptive beam forming method and storage device

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080232607A1 (en) * 2007-03-22 2008-09-25 Microsoft Corporation Robust adaptive beamforming with enhanced noise suppression
CN101685638A (en) * 2008-09-25 2010-03-31 华为技术有限公司 Method and device for enhancing voice signals
KR20130126318A (en) * 2012-05-11 2013-11-20 엘지전자 주식회사 Apparatus and method for removing noise
CN109272989A (en) * 2018-08-29 2019-01-25 北京京东尚科信息技术有限公司 Voice awakening method, device and computer readable storage medium
CN110600051A (en) * 2019-11-12 2019-12-20 乐鑫信息科技(上海)股份有限公司 Method for selecting output beams of a microphone array
CN110648692A (en) * 2019-09-26 2020-01-03 苏州思必驰信息科技有限公司 Voice endpoint detection method and system
CN110740412A (en) * 2018-07-18 2020-01-31 奥迪康有限公司 Hearing device comprising a speech presence probability estimator
US10573321B1 (en) * 2018-09-25 2020-02-25 Sonos, Inc. Voice detection optimization based on selected voice assistant service
CN111025233A (en) * 2019-11-13 2020-04-17 阿里巴巴集团控股有限公司 Sound source direction positioning method and device, voice equipment and system
CN111640428A (en) * 2020-05-29 2020-09-08 北京百度网讯科技有限公司 Voice recognition method, device, equipment and medium
CN112652320A (en) * 2020-12-04 2021-04-13 深圳地平线机器人科技有限公司 Sound source positioning method and device, computer readable storage medium and electronic equipment
CN113284505A (en) * 2021-04-21 2021-08-20 瑞芯微电子股份有限公司 Adaptive beam forming method and storage device




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination