CN113782046A - Microphone array pickup method and system for remote speech recognition - Google Patents
- Publication number
- CN113782046A (application number CN202111057434.1A)
- Authority
- CN
- China
- Prior art keywords
- covariance matrix
- vector
- signal
- calculating
- omega
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a microphone array pickup method and system for remote speech recognition, applied in the technical field of array signal processing. The direction of the interference signal can be specified arbitrarily and the human-voice interference is weighted, so that human-voice interference is suppressed more strongly and stationary noise is suppressed less; sound can therefore be picked up accurately.
Description
Technical Field
The invention relates to the technical field of array signal processing, in particular to a microphone array pickup method and system for remote speech recognition.
Background
Speech is used very frequently in daily life and serves very important functions. In practical environments, however, speech is always affected by environmental noise, room reverberation and interfering speakers, which greatly reduce speech quality and impair intelligibility and recognition rate, so the noise-corrupted signal must be enhanced to recover a clean signal. Speech enhancement is widely used in fields such as audio and video communication, audio and video recording and playback, human-machine interaction, and speech recognition.
In the prior art, speech enhancement methods fall into two major categories: single-channel enhancement algorithms and array (multi-channel) enhancement algorithms. Each has its own advantages and disadvantages; the two complement each other and can be used together in most environments to improve enhancement performance. Classical single-channel speech enhancement algorithms include spectral subtraction, Wiener filtering, statistical methods, and deep-learning-based single-channel enhancement. Classical multi-channel speech enhancement algorithms include delay-and-sum beamforming, minimum variance distortionless response (MVDR) beamforming, linearly constrained minimum variance (LCMV) beamforming, generalized sidelobe canceller (GSC) beamforming, and multi-channel Wiener filtering. Among them, MVDR has become one of the most widely used adaptive beamforming algorithms.
However, when the target speech to be recorded is far away or low in energy, so that the speech signal-to-noise ratio is extremely low, the performance of adaptive algorithms is unsatisfactory. First, because the target speech has little energy, noise segments and speech segments are hard to distinguish accurately, so the direction of the target speech is hard to estimate accurately, which may distort the target speech. Second, the human ear and auditory nerve tolerate noise well but tolerate interfering voices poorly, whereas an adaptive algorithm cannot distinguish human-voice interference from stationary noise and suppresses both with the same weight, so the residual human-voice interference greatly degrades the intelligibility of the processed target speech.
Therefore, providing a microphone array pickup method and system for remote speech recognition that solves the above problems is an urgent need for those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a microphone array pickup method and system for remote speech recognition.
To achieve the above purpose, the invention provides the following technical solutions:
In one aspect, a microphone array pickup method for remote speech recognition comprises the following specific steps:
S100: manually selecting the direction of the speech signal, and calculating the steering vector of the speech signal;
S200: manually selecting the direction of the interference signal, calculating the steering vector of the interference signal, and obtaining the covariance matrix of the interference signal from the steering vector of the interference signal direction;
S300: picking up sound with multiple microphones, and calculating a noise covariance matrix from the sound data received by the multiple microphones;
S400: calculating the optimal weight vector of the multiple microphones according to the noise covariance matrix, and obtaining the target speech according to the steering vector of the speech signal and the weight vector.
Preferably, in S100, the steering vector of the speech signal direction is calculated as follows:
manually selecting the speech signal direction, acquiring the positions of the microphones and the speed of sound, and calculating the time delay of sound reaching each microphone according to the speech signal direction, the microphone positions and the speed of sound, to obtain the steering vector of the speech signal:
where $\tau_n$ is the time delay of sound reaching the $n$-th microphone, $n = 1, 2, \ldots, N$, and $d(\omega)$ is the steering vector of the speech signal.
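The delay-to-steering-vector step can be sketched as follows. This is a minimal illustration, not the formula of the publication: it assumes a far-field source in the horizontal plane and the common phase convention $e^{-j\omega\tau_n}$; the function name `steering_vector`, the 343 m/s sound speed, and the 4-microphone line array are hypothetical choices made only for the example.

```python
import numpy as np

def steering_vector(direction_deg, mic_positions, omega, sound_speed=343.0):
    """Far-field steering vector d(omega) for a source in the horizontal plane.

    direction_deg : azimuth of the manually selected source direction (degrees)
    mic_positions : (N, 2) array of microphone x/y coordinates in metres
    omega         : angular frequency in rad/s
    sound_speed   : speed of sound in m/s (343 m/s is an assumed default)
    """
    theta = np.deg2rad(direction_deg)
    unit = np.array([np.cos(theta), np.sin(theta)])  # points from array to source
    # A microphone closer to the source hears the wavefront earlier, so its
    # delay relative to the array origin is negative.
    tau = -(mic_positions @ unit) / sound_speed      # tau_n, n = 1..N
    # Assumed phase convention: exp(-j * omega * tau_n).
    return np.exp(-1j * omega * tau)

# Hypothetical 4-microphone line array with 5 cm spacing, evaluated at 1 kHz.
mics = np.column_stack([np.arange(4) * 0.05, np.zeros(4)])
d = steering_vector(90.0, mics, omega=2 * np.pi * 1000.0)
```

For a broadside source (90 degrees) every microphone of this line array sees the same delay, so all entries of `d` have equal phase.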
Preferably, in S200, the covariance matrix of the interference signal is calculated as follows:
S210: manually selecting the direction of the interference signal, and calculating the time delay of sound reaching each microphone according to the direction of the interference signal, the microphone positions and the speed of sound, to obtain the steering vector of the interference signal:
where $\tau_n$ is the time delay of sound reaching the $n$-th microphone, $n = 1, 2, \ldots, N$, and $d_i(\omega)$ is the steering vector of the interference signal;
S220: according to the definition of the covariance matrix, the covariance matrix of the interference signal is obtained:
$$\Phi_{ii}(\omega) = d_i(\omega)\, d_i^{H}(\omega);$$
where $d_i(\omega)$ is the steering vector of the interference signal, $d_i^{H}(\omega)$ is the conjugate transpose of $d_i(\omega)$, and $\Phi_{ii}(\omega)$ is the covariance matrix of the interference signal.
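As a small illustration of S220, the sketch below builds the interference covariance matrix as the outer product of the steering vector with its conjugate. It assumes a unit-power interferer; the helper name `interference_covariance` and the 3-element steering vector are hypothetical.

```python
import numpy as np

def interference_covariance(d_i):
    """Rank-one covariance Phi_ii = d_i d_i^H built from a steering vector."""
    d_i = np.asarray(d_i, dtype=complex)
    return np.outer(d_i, d_i.conj())

# Hypothetical steering vector for a 3-microphone array.
d_i = np.exp(-1j * np.array([0.0, 0.4, 0.8]))
phi_ii = interference_covariance(d_i)
```

The result is Hermitian and has rank one, because it is built from a single direction.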
Preferably, in S300, the noise covariance matrix is calculated as follows:
picking up sound with the multiple microphones, and calculating the noise covariance matrix from the sound data collected by the multiple microphones:
$$\Phi_{vv}(\omega) = E\left[\, y(\omega)\, y^{H}(\omega) \,\right];$$
where $y(\omega)$ is the frequency-domain representation of the signals received by the multiple microphones, $y^{H}(\omega)$ is the conjugate transpose of $y(\omega)$, and $\Phi_{vv}(\omega)$ is the noise covariance matrix.
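In practice the expectation is usually replaced by an average over short-time frames. The sketch below shows that sample estimate for one frequency bin; the frame-averaging scheme, the function name `estimate_covariance`, and the random test data are assumptions for illustration and are not taken from the publication.

```python
import numpy as np

def estimate_covariance(frames):
    """Sample estimate of Phi_vv = E[y y^H] at one frequency bin.

    frames : (T, N) complex array; row t is the N-microphone spectrum y(omega)
             taken from the t-th short-time frame.
    """
    frames = np.asarray(frames, dtype=complex)
    # Sum of the outer products y_t y_t^H, divided by the number of frames.
    return frames.T @ frames.conj() / frames.shape[0]

# Hypothetical data: 200 frames from a 4-microphone array.
rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 4)) + 1j * rng.standard_normal((200, 4))
phi_vv = estimate_covariance(frames)
```

More frames give a smoother estimate, at the cost of slower tracking when the noise field changes.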
Preferably, after the noise covariance matrix is calculated in S300, the noise covariance matrix is further corrected:
calculating the noise energy and the interference energy, wherein the noise energy is calculated as follows:
$$E_v(\omega) = d^{H}(\omega)\, \Phi_{vv}(\omega)\, d(\omega);$$
where $\Phi_{vv}(\omega)$ is the noise covariance matrix, $d(\omega)$ is the steering vector of the speech signal, $d^{H}(\omega)$ is the conjugate transpose of $d(\omega)$, and $E_v(\omega)$ is the noise energy;
the interference signal energy is calculated as follows:
$$E_i(\omega) = d_i^{H}(\omega)\, \Phi_{vv}(\omega)\, d_i(\omega);$$
where $\Phi_{vv}(\omega)$ is the noise covariance matrix, $d_i(\omega)$ is the steering vector of the interference signal, and $d_i^{H}(\omega)$ is the conjugate transpose of $d_i(\omega)$.
Determining a weighting coefficient $\lambda(\omega)$ according to the ratio of the noise energy to the interference energy, wherein the specific formula is as follows:
and correcting the noise covariance matrix according to the weighting coefficient to obtain the corrected optimization problem:
$$h(\omega) = \arg\min_{h(\omega)}\; h^{H}(\omega)\left[\Phi_{vv}(\omega) + \lambda(\omega)\,\Phi_{ii}(\omega)\right] h(\omega), \quad \text{s.t.}\; h^{H}(\omega)\, d(\omega) = 1;$$
where $\lambda(\omega)$ is the weighting coefficient, $\Phi_{vv}(\omega)$ is the noise covariance matrix, $\Phi_{ii}(\omega)$ is the covariance matrix of the interference signal, $d(\omega)$ is the steering vector of the speech signal, $h^{H}(\omega)$ is the conjugate transpose of $h(\omega)$, and $h(\omega)$ is the filter coefficient.
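The correction step can be sketched as below. The energies follow the quadratic forms given in the text, and the corrected matrix is the noise covariance plus the weighted interference covariance. The formula that maps the energy ratio to the weighting coefficient is not reproduced in this text, so `lam` is passed in as a plain parameter and the value 2.0 is only a placeholder; the function names and the toy vectors are likewise hypothetical.

```python
import numpy as np

def directional_energy(phi, d):
    """Energy d^H Phi d seen when looking along steering vector d."""
    # np.vdot conjugates its first argument, giving d^H (Phi d).
    return float(np.real(np.vdot(d, phi @ d)))

def corrected_covariance(phi_vv, d_i, lam):
    """Phi_vv + lam * Phi_ii, with Phi_ii = d_i d_i^H."""
    return phi_vv + lam * np.outer(d_i, d_i.conj())

# Toy 4-microphone case: white noise, hypothetical target and interferer.
N = 4
phi_vv = np.eye(N, dtype=complex)
d = np.ones(N, dtype=complex)
d_i = np.exp(-1j * np.pi * np.arange(N) / 2)

E_v = directional_energy(phi_vv, d)      # noise energy
E_i = directional_energy(phi_vv, d_i)    # interference energy
lam = 2.0                                # placeholder, not the patent's formula
phi_mod = corrected_covariance(phi_vv, d_i, lam)
```

A larger `lam` makes the interferer direction look louder to the optimizer, so the resulting filter places a deeper null there.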
Preferably, the target speech is obtained in S400 as follows:
solving for the filter coefficient from the corrected optimization problem by the Lagrange multiplier method:
$$h(\omega) = \frac{\left[\Phi_{vv}(\omega) + \lambda(\omega)\,\Phi_{ii}(\omega)\right]^{-1} d(\omega)}{d^{H}(\omega)\left[\Phi_{vv}(\omega) + \lambda(\omega)\,\Phi_{ii}(\omega)\right]^{-1} d(\omega)};$$
where $\Phi_{vv}(\omega)$ is the noise covariance matrix, $\lambda(\omega)$ is the weighting coefficient, $\Phi_{ii}(\omega)$ is the covariance matrix of the interference signal, $d(\omega)$ is the steering vector of the speech signal, and $h(\omega)$ is the resulting filter coefficient;
weighting the multi-microphone signals with the filter coefficient to obtain the target speech:
$$Z(\omega) = h^{H}(\omega)\, y(\omega);$$
where $Z(\omega)$ is the clear remote speech to be recorded.
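A minimal sketch of S400 follows, assuming the corrected covariance matrix is invertible. It computes the closed-form minimizer of the constrained problem and applies it to one multi-microphone spectrum. The names `mvdr_weights` and `beamform`, the toy steering vectors, and `lam = 10.0` are hypothetical.

```python
import numpy as np

def mvdr_weights(phi, d):
    """Solve min h^H Phi h subject to h^H d = 1."""
    phi_inv_d = np.linalg.solve(phi, d)        # Phi^{-1} d without forming the inverse
    return phi_inv_d / np.vdot(d, phi_inv_d)   # divide by d^H Phi^{-1} d

def beamform(h, y):
    """Z(omega) = h^H(omega) y(omega)."""
    return np.vdot(h, y)

N = 4
d = np.ones(N, dtype=complex)                        # target steering vector
d_i = np.exp(-1j * 0.7 * np.arange(N))               # interferer steering vector
phi_vv = np.eye(N, dtype=complex)                    # white-noise covariance
lam = 10.0                                           # placeholder weighting coefficient

h_plain = mvdr_weights(phi_vv, d)
h = mvdr_weights(phi_vv + lam * np.outer(d_i, d_i.conj()), d)

Z = beamform(h, 2.0 * d)   # a target component of amplitude 2 passes undistorted
```

Comparing `abs(np.vdot(h, d_i))` with `abs(np.vdot(h_plain, d_i))` shows how much the added term reduces the response toward the interferer while the response toward the target stays at one.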
In another aspect, a microphone array pickup system for remote speech recognition includes:
a first selection module, used for selecting the direction of the speech signal;
a first calculation module, connected with the first selection module and used for calculating the steering vector of the speech signal according to the direction of the speech signal;
a second selection module, used for selecting the direction of the interference signal;
a second calculation module, connected with the second selection module and used for calculating the steering vector of the interference signal according to the direction of the interference signal;
an acquisition module, used for acquiring the sound data of the multiple microphones;
a third calculation module, connected with the acquisition module and the second calculation module and used for calculating the noise covariance matrix according to the sound data;
and an output module, connected with the third calculation module and the first calculation module and used for calculating the optimal weight vector of the multiple microphones according to the noise covariance matrix, and for obtaining and outputting the target speech according to the steering vector of the speech signal and the weight vector.
According to the above technical solutions, compared with the prior art, the invention provides a microphone array pickup method and system for remote speech recognition. First, manually specifying the direction of the speech reduces the distortion of the target speech. Second, the direction of the interference signal is manually specified and the human-voice interference is weighted, so that human-voice interference is suppressed more strongly and stationary noise less; sound can therefore be picked up accurately, providing a better solution for remote voice control.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of the method provided by the present invention;
FIG. 2 is a schematic diagram of the system provided by the present invention;
FIG. 3 is a schematic view of the environment setup provided in Example 2;
FIG. 4 is a schematic diagram of the processing result of the conventional MVDR method;
FIG. 5 is a schematic diagram of the processing result of the present invention provided in Example 2.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Referring to FIG. 1, an embodiment of the present invention discloses a microphone array pickup method for remote speech recognition, which includes the following specific steps:
S100: manually selecting the direction of the speech signal, and calculating the steering vector of the speech signal;
S200: manually selecting the direction of the interference signal, calculating the steering vector of the interference signal, and obtaining the covariance matrix of the interference signal from the steering vector of the interference signal direction;
S300: picking up sound with multiple microphones, and calculating a noise covariance matrix from the sound data received by the multiple microphones;
S400: calculating the optimal weight vector of the multiple microphones according to the noise covariance matrix, and obtaining the target speech according to the steering vector of the speech signal and the weight vector.
In one embodiment, in S100, the steering vector of the speech signal direction is calculated as follows:
manually selecting the speech signal direction, acquiring the positions of the microphones and the speed of sound, and calculating the time delay of sound reaching each microphone according to the speech signal direction, the microphone positions and the speed of sound, to obtain the steering vector of the speech signal:
where $\tau_n$ is the time delay of sound reaching the $n$-th microphone, $n = 1, 2, \ldots, N$, and $d(\omega)$ is the steering vector of the speech signal.
In one embodiment, in S200, the covariance matrix of the interference signal is calculated as follows:
S210: manually selecting the direction of the interference signal, and calculating the time delay of sound reaching each microphone according to the direction of the interference signal, the microphone positions and the speed of sound, to obtain the steering vector of the interference signal:
where $\tau_n$ is the time delay of sound reaching the $n$-th microphone, $n = 1, 2, \ldots, N$, and $d_i(\omega)$ is the steering vector of the interference signal;
S220: according to the definition of the covariance matrix, the covariance matrix of the interference signal is obtained:
$$\Phi_{ii}(\omega) = d_i(\omega)\, d_i^{H}(\omega);$$
where $d_i(\omega)$ is the steering vector of the interference signal, $d_i^{H}(\omega)$ is the conjugate transpose of $d_i(\omega)$, and $\Phi_{ii}(\omega)$ is the covariance matrix of the interference signal.
In one embodiment, in S300, the noise covariance matrix is calculated as follows:
picking up sound with the multiple microphones, and calculating the noise covariance matrix from the sound data collected by the multiple microphones:
$$\Phi_{vv}(\omega) = E\left[\, y(\omega)\, y^{H}(\omega) \,\right];$$
where $y(\omega)$ is the frequency-domain representation of the signals received by the multiple microphones, $y^{H}(\omega)$ is the conjugate transpose of $y(\omega)$, and $\Phi_{vv}(\omega)$ is the noise covariance matrix.
In a specific embodiment, after the noise covariance matrix is calculated in S300, the noise covariance matrix is further corrected:
calculating the noise energy and the interference energy, wherein the noise energy is calculated as follows:
$$E_v(\omega) = d^{H}(\omega)\, \Phi_{vv}(\omega)\, d(\omega);$$
where $\Phi_{vv}(\omega)$ is the noise covariance matrix, $d(\omega)$ is the steering vector of the speech signal, $d^{H}(\omega)$ is the conjugate transpose of $d(\omega)$, and $E_v(\omega)$ is the noise energy;
the interference signal energy is calculated as follows:
$$E_i(\omega) = d_i^{H}(\omega)\, \Phi_{vv}(\omega)\, d_i(\omega);$$
where $\Phi_{vv}(\omega)$ is the noise covariance matrix, $d_i(\omega)$ is the steering vector of the interference signal, and $d_i^{H}(\omega)$ is the conjugate transpose of $d_i(\omega)$.
Determining a weighting coefficient $\lambda(\omega)$ according to the ratio of the noise energy to the interference energy, wherein the specific formula is as follows:
and correcting the noise covariance matrix according to the weighting coefficient to obtain the corrected optimization problem:
$$h(\omega) = \arg\min_{h(\omega)}\; h^{H}(\omega)\left[\Phi_{vv}(\omega) + \lambda(\omega)\,\Phi_{ii}(\omega)\right] h(\omega), \quad \text{s.t.}\; h^{H}(\omega)\, d(\omega) = 1;$$
where $\lambda(\omega)$ is the weighting coefficient, $\Phi_{vv}(\omega)$ is the noise covariance matrix, $\Phi_{ii}(\omega)$ is the covariance matrix of the interference signal, $d(\omega)$ is the steering vector of the speech signal, $h^{H}(\omega)$ is the conjugate transpose of $h(\omega)$, and $h(\omega)$ is the filter coefficient.
In a specific embodiment, the target speech is obtained in S400 as follows:
solving for the filter coefficient from the corrected optimization problem by the Lagrange multiplier method:
$$h(\omega) = \frac{\left[\Phi_{vv}(\omega) + \lambda(\omega)\,\Phi_{ii}(\omega)\right]^{-1} d(\omega)}{d^{H}(\omega)\left[\Phi_{vv}(\omega) + \lambda(\omega)\,\Phi_{ii}(\omega)\right]^{-1} d(\omega)};$$
where $\Phi_{vv}(\omega)$ is the noise covariance matrix, $\lambda(\omega)$ is the weighting coefficient, $\Phi_{ii}(\omega)$ is the covariance matrix of the interference signal, $d(\omega)$ is the steering vector of the speech signal, and $h(\omega)$ is the resulting filter coefficient.
Specifically, step S300 yields the corrected optimization problem for the filter coefficient $h(\omega)$, and step S400 solves that problem by the Lagrange multiplier method to obtain the closed-form expression for $h(\omega)$.
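The Lagrange multiplier step can be written out as follows. This is a standard derivation given as a sketch; the shorthand $R(\omega)$ and the multiplier $\mu$ are introduced here for convenience, and $R(\omega)$ is assumed to be Hermitian and invertible.

```latex
\begin{aligned}
R(\omega) &= \Phi_{vv}(\omega) + \lambda(\omega)\,\Phi_{ii}(\omega), \\
\mathcal{L}(h,\mu) &= h^{H} R\, h + \mu\,\bigl(1 - h^{H} d\bigr) + \mu^{*}\bigl(1 - d^{H} h\bigr), \\
\frac{\partial \mathcal{L}}{\partial h^{*}} &= R\, h - \mu\, d = 0
  \;\Longrightarrow\; h = \mu\, R^{-1} d, \\
h^{H} d = 1 &\;\Longrightarrow\; \mu^{*}\, d^{H} R^{-1} d = 1
  \;\Longrightarrow\; \mu = \frac{1}{d^{H} R^{-1} d}, \\
h(\omega) &= \frac{R^{-1}(\omega)\, d(\omega)}{d^{H}(\omega)\, R^{-1}(\omega)\, d(\omega)} .
\end{aligned}
```

Because $R$ is Hermitian, $d^{H} R^{-1} d$ is real, which is why $\mu$ equals its own conjugate in the fourth line.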
The multi-microphone signals are weighted with the filter coefficient to obtain the target speech:
$$Z(\omega) = h^{H}(\omega)\, y(\omega);$$
where $Z(\omega)$ is the clear remote speech to be recorded.
Referring to FIG. 2, an embodiment of the present invention further discloses a microphone array pickup system for remote speech recognition, including:
a first selection module, used for selecting the direction of the speech signal;
a first calculation module, connected with the first selection module and used for calculating the steering vector of the speech signal according to the direction of the speech signal;
a second selection module, used for selecting the direction of the interference signal;
a second calculation module, connected with the second selection module and used for calculating the steering vector of the interference signal according to the direction of the interference signal;
an acquisition module, used for acquiring the sound data of the multiple microphones;
a third calculation module, connected with the acquisition module and the second calculation module and used for calculating the noise covariance matrix according to the sound data received by the multiple microphones;
and an output module, connected with the third calculation module and the first calculation module and used for calculating the optimal weight vector of the multiple microphones according to the noise covariance matrix, and for obtaining and outputting the target speech according to the steering vector of the speech signal and the weight vector.
According to the above technical solutions, compared with the prior art, the invention provides a microphone array pickup method and system for remote speech recognition. First, manually specifying the direction of the speech reduces the distortion of the target speech. Second, the direction of the interference signal is manually specified and the human-voice interference is weighted, so that human-voice interference is suppressed more strongly and stationary noise less; sound can therefore be picked up accurately, providing a better solution for remote voice control.
Example 2
A specific application of the pickup method of Example 1 is as follows:
The invention places no requirement on the number of microphones or on the shape and size of the array; it only requires that the positions of all microphones be fixed and known.
As shown in FIG. 3, to compare the conventional MVDR method with the present invention, an array of 8 microphones was set up and placed horizontally. Four noise sources, one interference source and one target speech source were placed in the simulation experiment; they were arranged on a horizontal plane at 60-degree intervals, each at a distance of 20 m.
In addition to the noise and interference sources, this embodiment adds independent white noise to each microphone to simulate a real recording environment.
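A comparable setup can be sketched numerically as below. Only the 8 microphones, the six sources at 60-degree intervals, the 20 m distance and the per-microphone white noise come from the text. Everything else is an assumption made for illustration: the circular array of radius 5 cm, the single 1 kHz frequency, the unit source powers, the white-noise power of 0.1, the assignment of source roles, the analytic (rather than recorded) covariance, and the placeholder `lam = 10.0`. The sketch does not reproduce the results of FIG. 4 or FIG. 5.

```python
import numpy as np

C = 343.0                     # assumed speed of sound, m/s
OMEGA = 2 * np.pi * 1000.0    # single analysis frequency (assumed 1 kHz)

# Assumed geometry: 8 microphones on a horizontal circle of radius 5 cm.
mic_angles = 2 * np.pi * np.arange(8) / 8
mics = 0.05 * np.column_stack([np.cos(mic_angles), np.sin(mic_angles)])

# Six sources on a 20 m circle at 60-degree intervals.
src_angles = np.deg2rad(np.arange(6) * 60.0)
sources = 20.0 * np.column_stack([np.cos(src_angles), np.sin(src_angles)])

def steering_from_position(src):
    """Steering vector from propagation delays tau_n = distance_n / c."""
    dist = np.linalg.norm(mics - src, axis=1)
    tau = (dist - np.linalg.norm(src)) / C     # delay relative to the array centre
    return np.exp(-1j * OMEGA * tau)

d_all = np.array([steering_from_position(s) for s in sources])
d_target, d_interf = d_all[0], d_all[1]        # assumed role assignment

# Noise-plus-interference covariance: unit-power sources plus independent
# white noise of power 0.1 at each microphone (both powers are assumptions).
phi_vv = 0.1 * np.eye(8, dtype=complex)
for dk in d_all[1:]:
    phi_vv += np.outer(dk, dk.conj())

def weights(phi, d):
    """Closed-form minimizer of h^H Phi h subject to h^H d = 1."""
    x = np.linalg.solve(phi, d)
    return x / np.vdot(d, x)

lam = 10.0                                     # placeholder weighting coefficient
h_mvdr = weights(phi_vv, d_target)
h_mod = weights(phi_vv + lam * np.outer(d_interf, d_interf.conj()), d_target)

gain_mvdr = abs(np.vdot(h_mvdr, d_interf))     # response toward the interferer
gain_mod = abs(np.vdot(h_mod, d_interf))
```

Printing `gain_mvdr` and `gain_mod` shows how the extra weighting changes the response toward the interference direction while both filters keep unit response toward the target.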
FIG. 4 shows the processing result of the conventional MVDR method;
FIG. 5 shows the processing result of the present invention provided in Example 2.
As can be seen from FIG. 4 and FIG. 5, compared with the conventional MVDR method, the speech obtained by the method of the present invention is less distorted, the human-voice interference is suppressed, and the speech is clearer. The intelligibility of the target speech is thereby improved and subsequent processing is made easier, so that sound is picked up accurately and a better solution is provided for remote voice control.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (7)
1. A microphone array pickup method for remote speech recognition, characterized by comprising the following specific steps:
S100: selecting an arbitrary speech signal direction, and calculating the steering vector of the speech signal;
S200: selecting an arbitrary interference signal direction, calculating the steering vector of the interference signal, and obtaining the covariance matrix of the interference signal from the steering vector of the interference signal direction;
S300: picking up sound with a multi-microphone array, and calculating a noise covariance matrix from the sound data collected by the multiple microphones;
S400: calculating the optimal weight vector of the multiple microphones according to the noise covariance matrix, and obtaining the target speech according to the steering vector of the speech signal and the weight vector.
2. The microphone array pickup method for remote speech recognition according to claim 1, wherein in S100, the steering vector of the speech signal direction is calculated as follows:
manually selecting the speech signal direction, acquiring the positions of the microphones and the speed of sound, and calculating the time delay of sound reaching each microphone according to the speech signal direction, the microphone positions and the speed of sound, to obtain the steering vector of the speech signal:
where $\tau_n$ is the time delay of sound reaching the $n$-th microphone, $n = 1, 2, \ldots, N$, and $d(\omega)$ is the steering vector of the speech signal.
3. The microphone array pickup method for remote speech recognition according to claim 1, wherein in S200, the covariance matrix of the interference signal is calculated as follows:
S210: manually selecting the direction of the interference signal, and calculating the time delay of sound reaching each microphone according to the direction of the interference signal, the microphone positions and the speed of sound, to obtain the steering vector of the interference signal:
where $\tau_n$ is the time delay of sound reaching the $n$-th microphone, $n = 1, 2, \ldots, N$, and $d_i(\omega)$ is the steering vector of the interference signal;
S220: according to the definition of the covariance matrix, the covariance matrix of the interference signal is obtained:
$$\Phi_{ii}(\omega) = d_i(\omega)\, d_i^{H}(\omega).$$
4. The microphone array pickup method for remote speech recognition according to claim 1, wherein in S300, the noise covariance matrix is calculated as follows:
picking up sound with the multiple microphones, and calculating the noise covariance matrix from the sound data collected by the multiple microphones:
$$\Phi_{vv}(\omega) = E\left[\, y(\omega)\, y^{H}(\omega) \,\right];$$
where $y(\omega)$ is the frequency-domain representation of the signals received by the multiple microphones, $y^{H}(\omega)$ is the conjugate transpose of $y(\omega)$, and $\Phi_{vv}(\omega)$ is the noise covariance matrix.
5. The microphone array pickup method for remote speech recognition according to claim 4, wherein after the noise covariance matrix is calculated in S300, the noise covariance matrix is further corrected by:
calculating the noise energy and the interference energy, wherein the noise energy is calculated as follows:
$$E_v(\omega) = d^{H}(\omega)\, \Phi_{vv}(\omega)\, d(\omega);$$
where $\Phi_{vv}(\omega)$ is the noise covariance matrix, $d(\omega)$ is the steering vector of the speech signal, $d^{H}(\omega)$ is the conjugate transpose of $d(\omega)$, and $E_v(\omega)$ is the noise energy;
the interference signal energy is calculated as follows:
$$E_i(\omega) = d_i^{H}(\omega)\, \Phi_{vv}(\omega)\, d_i(\omega);$$
where $\Phi_{vv}(\omega)$ is the noise covariance matrix, $d_i(\omega)$ is the steering vector of the interference signal, and $d_i^{H}(\omega)$ is the conjugate transpose of $d_i(\omega)$;
determining a weighting coefficient $\lambda(\omega)$ according to the ratio of the noise energy to the interference energy, wherein the specific formula is as follows:
and correcting the noise covariance matrix according to the weighting coefficient to obtain the corrected optimization problem:
$$h(\omega) = \arg\min_{h(\omega)}\; h^{H}(\omega)\left[\Phi_{vv}(\omega) + \lambda(\omega)\,\Phi_{ii}(\omega)\right] h(\omega), \quad \text{s.t.}\; h^{H}(\omega)\, d(\omega) = 1;$$
where $\lambda(\omega)$ is the weighting coefficient, $\Phi_{vv}(\omega)$ is the noise covariance matrix, $\Phi_{ii}(\omega)$ is the covariance matrix of the interference signal, $d(\omega)$ is the steering vector of the speech signal, $h^{H}(\omega)$ is the conjugate transpose of $h(\omega)$, and $h(\omega)$ is the filter coefficient.
6. The microphone array pickup method for remote speech recognition according to claim 5, wherein the target speech is obtained in S400 as follows:
solving for the filter coefficient from the corrected optimization problem by the Lagrange multiplier method:
$$h(\omega) = \frac{\left[\Phi_{vv}(\omega) + \lambda(\omega)\,\Phi_{ii}(\omega)\right]^{-1} d(\omega)}{d^{H}(\omega)\left[\Phi_{vv}(\omega) + \lambda(\omega)\,\Phi_{ii}(\omega)\right]^{-1} d(\omega)};$$
where $\Phi_{vv}(\omega)$ is the noise covariance matrix, $\lambda(\omega)$ is the weighting coefficient, $\Phi_{ii}(\omega)$ is the covariance matrix of the interference signal, $d(\omega)$ is the steering vector of the speech signal, and $h(\omega)$ is the resulting filter coefficient;
weighting the multi-microphone signals with the filter coefficient to obtain the target speech:
$$Z(\omega) = h^{H}(\omega)\, y(\omega);$$
where $Z(\omega)$ is the clear remote speech to be recorded.
7. A microphone array pickup system for remote speech recognition, comprising:
a first selection module, used for selecting the direction of the speech signal;
a first calculation module, connected with the first selection module and used for calculating the steering vector of the speech signal according to the direction of the speech signal;
a second selection module, used for selecting the direction of the interference signal;
a second calculation module, connected with the second selection module and used for calculating the steering vector of the interference signal according to the direction of the interference signal;
an acquisition module, used for acquiring the sound data of the multiple microphones;
a third calculation module, connected with the acquisition module and the second calculation module and used for calculating the noise covariance matrix according to the sound data;
and an output module, connected with the third calculation module and the first calculation module and used for calculating the optimal weight vector of the multiple microphones according to the noise covariance matrix, and for obtaining and outputting the target speech according to the steering vector of the speech signal and the weight vector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111057434.1A CN113782046B (en) | 2021-09-09 | 2021-09-09 | Microphone array pickup method and system for long-distance voice recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113782046A (en) | 2021-12-10 |
CN113782046B (en) | 2024-09-17 |
Family
ID=78842224
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111057434.1A (Active) CN113782046B (en) | 2021-09-09 | 2021-09-09 | Microphone array pickup method and system for long-distance voice recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113782046B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118522285A (en) * | 2024-07-25 | 2024-08-20 | 辽宁汉华信息工程有限公司 | Interactive user voice recognition method for AI intelligent agent |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108181507A (en) * | 2017-12-25 | 2018-06-19 | 中国科学技术大学 | A robust adaptive beamforming method |
US20180176679A1 (en) * | 2016-12-20 | 2018-06-21 | Verizon Patent And Licensing Inc. | Beamforming optimization for receiving audio signals |
JP2018141922A (en) * | 2017-02-28 | 2018-09-13 | 日本電信電話株式会社 | Steering vector estimation device, steering vector estimating method and steering vector estimation program |
CN108694957A (en) * | 2018-04-08 | 2018-10-23 | 湖北工业大学 | Echo cancellation design method based on circular microphone array beamforming |
CN108831495A (en) * | 2018-06-04 | 2018-11-16 | 桂林电子科技大学 | A speech enhancement method applied to speech recognition in noisy environments |
JP2019054344A (en) * | 2017-09-13 | 2019-04-04 | 日本電信電話株式会社 | Filter coefficient calculation device, sound pickup device, method thereof, and program |
CN110503971A (en) * | 2018-05-18 | 2019-11-26 | 英特尔公司 | Neural-network-based time-frequency mask estimation and beamforming for speech processing |
CN110890099A (en) * | 2018-09-10 | 2020-03-17 | 北京京东尚科信息技术有限公司 | Sound signal processing method, device and storage medium |
CN110931036A (en) * | 2019-12-07 | 2020-03-27 | 杭州国芯科技股份有限公司 | Microphone array beam forming method |
CN111052766A (en) * | 2017-09-07 | 2020-04-21 | 三菱电机株式会社 | Noise removing device and noise removing method |
CN111081267A (en) * | 2019-12-31 | 2020-04-28 | 中国科学院声学研究所 | Multi-channel far-field speech enhancement method |
CN112447184A (en) * | 2020-11-10 | 2021-03-05 | 北京小米松果电子有限公司 | Voice signal processing method and device, electronic equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
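杨志伟; 张攀; 陈颖; 许华健: "Robust beamforming algorithm based on joint iterative estimation of the steering vector and covariance matrix", Journal of Electronics & Information Technology, no. 12, 18 October 2018 (2018-10-18) * |
臧守明; 白媛; 马秀荣; 李俊胜: "An improved nested array beamforming algorithm", Computer Simulation, no. 10, 15 October 2016 (2016-10-15) * |
陈明建; 罗景青; 龙国庆: "Robust Capon beamforming algorithm based on covariance matrix estimation", Fire Control & Command Control, no. 10, 15 October 2016 (2016-10-15) * |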
Also Published As
Publication number | Publication date |
---|---|
CN113782046B (en) | 2024-09-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |