CN113782046A - Microphone array pickup method and system for remote speech recognition - Google Patents


Info

Publication number
CN113782046A
CN113782046A (application CN202111057434.1A)
Authority
CN
China
Prior art keywords
covariance matrix
vector
signal
calculating
omega
Prior art date
Legal status
Granted
Application number
CN202111057434.1A
Other languages
Chinese (zh)
Other versions
CN113782046B
Inventor
马超
李冬梅
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202111057434.1A
Publication of CN113782046A
Application granted
Publication of CN113782046B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 - Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 - Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a microphone array pickup method and system for remote speech recognition, applied to the technical field of array signal processing. First, the direction of the target speech is manually specified, which reduces distortion of the target speech. Second, the direction of the interference signal is manually specified and the human-voice interference is weighted, so that human-voice interference is suppressed more strongly while stationary noise is suppressed less, and sound can therefore be picked up accurately.

Description

Microphone array pickup method and system for remote speech recognition
Technical Field
The invention relates to the technical field of array signal processing, in particular to a microphone array pickup method and system for remote speech recognition.
Background
Speech is a tool used very frequently in human daily life and serves very important functions. In practical environments, however, speech is always affected by environmental noise, room reverberation and interfering speakers, which greatly reduce speech quality and degrade speech intelligibility and recognition rate, so the noise-corrupted signal must be enhanced to obtain a clean signal. Speech enhancement technology has wide application in many fields, such as audio and video communication, audio and video recording and playback, human-computer interaction, and speech recognition.
In the prior art, speech enhancement methods fall into two major categories: single-channel enhancement algorithms and array (multi-channel) enhancement algorithms. The two categories have complementary advantages and disadvantages, can be used together in most environments, and jointly improve enhancement performance. Classical single-channel speech enhancement algorithms include spectral subtraction, Wiener filtering, statistical methods, and deep-learning-based single-channel enhancement. Classical multi-channel speech enhancement algorithms include delay-and-sum beamforming, minimum variance distortionless response beamforming, linearly constrained minimum variance beamforming, generalized sidelobe canceller beamforming, and multi-channel Wiener filtering. Among them, the minimum variance distortionless response (MVDR) beamformer has become one of the most widely used adaptive beamforming algorithms.
However, when the target speech to be recorded is far away or has low energy, so that the speech signal-to-noise ratio is extremely low, the performance of adaptive algorithms is unsatisfactory. First, because the target speech has little energy, it is difficult to accurately distinguish noise segments from speech segments, and therefore difficult to accurately estimate the direction of the target speech, which may cause distortion of the target speech. Second, the human ear and auditory nerve are highly robust to noise but weak against human-voice interference, yet the adaptive algorithm cannot distinguish human-voice interference from stationary noise interference and suppresses both with the same weight, so the residual human-voice interference greatly degrades the intelligibility of the processed target speech.
Therefore, there is an urgent need for those skilled in the art to provide a microphone array sound pickup method and system for long-distance speech recognition that solves the above problems.
Disclosure of Invention
In view of the above, the present invention provides a microphone array sound pickup method and system for long-distance speech recognition.
In order to achieve the above purpose, the invention provides the following technical scheme:
In one aspect, a microphone array pickup method for remote speech recognition comprises the following specific steps:
S100: manually selecting the direction of the speech signal, and calculating the steering vector of the speech signal;
S200: manually selecting the direction of the interference signal, calculating the steering vector of the interference signal, and obtaining the covariance matrix of the interference signal from the steering vector of the interference signal direction;
S300: collecting sound with multiple microphones, and calculating the noise covariance matrix from the sound data received by the microphones;
S400: calculating the optimal weight vector of the microphone array from the noise covariance matrix, and obtaining the target speech from the steering vector of the speech signal and the weight vector.
Preferably, in S100, the steering vector of the speech signal is calculated as follows:
manually selecting the speech signal direction, acquiring the positions of the microphones and the speed of sound, and calculating the time delay with which the sound reaches each microphone from the speech signal direction, the microphone positions and the speed of sound, to obtain the steering vector of the speech signal:
d(ω) = [e^(-jωτ_1), e^(-jωτ_2), …, e^(-jωτ_N)]^T;
where τ_n is the time delay of the sound reaching microphone n, n = 1, 2, …, N, and d(ω) is the steering vector of the speech signal.
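Below is a minimal NumPy sketch of this steering-vector computation, assuming a far-field (plane-wave) model; the function name, argument layout and sign convention are illustrative choices and not taken from the patent.

import numpy as np

def steering_vector(mic_positions, direction, omega, c=343.0):
    # Far-field steering vector d(omega) = [exp(-j*omega*tau_1), ..., exp(-j*omega*tau_N)]^T.
    # mic_positions : (N, 3) array of microphone coordinates in metres
    # direction     : vector pointing from the array toward the source
    # omega         : angular frequency in rad/s
    # c             : speed of sound in m/s
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    # Plane-wave delay of the sound at each microphone relative to the array origin.
    tau = -(np.asarray(mic_positions, dtype=float) @ u) / c
    return np.exp(-1j * omega * tau)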
Preferably, in S200, the step of calculating the covariance matrix of the interference signal includes:
S210: manually selecting the interference signal direction, and calculating the time delay with which the sound reaches each microphone from the interference signal direction, the microphone positions and the speed of sound, to obtain the steering vector of the interference signal:
d_i(ω) = [e^(-jωτ_1), e^(-jωτ_2), …, e^(-jωτ_N)]^T;
where τ_n is the time delay of the sound from the interference direction reaching microphone n, n = 1, 2, …, N, and d_i(ω) is the steering vector of the interference signal;
S220: according to the definition of the covariance matrix, obtaining the covariance matrix of the interference signal:
Φ_ii(ω) = d_i(ω) d_i^H(ω);
where d_i(ω) is the steering vector of the interference signal, d_i^H(ω) is the conjugate transpose of d_i(ω), and Φ_ii(ω) is the covariance matrix of the interference signal.
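Since the interference covariance in S220 is just the outer product of the interference steering vector with itself, it can be formed in one line. The sketch below reuses the hypothetical steering_vector function from the previous snippet.

import numpy as np

def interference_covariance(mic_positions, interference_direction, omega):
    # Rank-one covariance model of the interference: Phi_ii(omega) = d_i(omega) d_i(omega)^H.
    d_i = steering_vector(mic_positions, interference_direction, omega)
    return np.outer(d_i, d_i.conj())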
Preferably, in S300, the noise covariance matrix is calculated as follows:
the method comprises the following steps of (1) picking up by multiple microphones, and calculating a noise covariance matrix according to sound data collected by the multiple microphones:
Φvv(ω)=E[y(ω)yH(ω)];
where y (ω) is a frequency domain representation of the signals received by the plurality of microphones, yHAnd (omega) is a conjugate transpose vector of y (omega). Phi is avvAnd (omega) is the noise covariance matrix.
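In practice the expectation E[·] is usually replaced by an average over STFT frames. A minimal sketch of such a sample estimate follows; the frame-averaging and the assumption that the averaged frames are noise/interference dominated are implementation choices, not statements from the patent.

import numpy as np

def noise_covariance(Y):
    # Sample estimate of Phi_vv(omega) = E[y(omega) y(omega)^H].
    # Y: (N, T) array of STFT coefficients of the N microphones at one
    #    frequency bin over T frames.
    n_frames = Y.shape[1]
    return (Y @ Y.conj().T) / n_frames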
Preferably, after the noise covariance matrix is calculated in S300, the noise covariance matrix is further modified:
calculating the noise energy and the interference energy, wherein the noise energy is calculated as:
E_v(ω) = d^H(ω) Φ_vv(ω) d(ω);
where Φ_vv(ω) is the noise covariance matrix, d(ω) is the steering vector of the speech signal, d^H(ω) is the conjugate transpose of d(ω), and E_v(ω) is the noise energy;
the interference signal energy is calculated as:
E_i(ω) = d_i^H(ω) Φ_vv(ω) d_i(ω);
where Φ_vv(ω) is the noise covariance matrix, d_i(ω) is the steering vector of the interference signal, and d_i^H(ω) is the conjugate transpose of d_i(ω).
Determining a weighting coefficient according to the energy ratio of the noise energy and the interference energy, wherein the specific formula is as follows:
Figure BDA0003255133210000035
and correcting the noise covariance matrix according to the weighting coefficient, the corrected equation being:
h(ω) = argmin h^H(ω)(Φ_vv(ω) + λ(ω)Φ_ii(ω))h(ω), s.t. h^H(ω)d(ω) = 1;
where λ(ω) is the weighting coefficient, Φ_vv(ω) is the noise covariance matrix, Φ_ii(ω) is the covariance matrix of the interference signal, d(ω) is the steering vector of the speech signal, h^H(ω) is the conjugate transpose of h(ω), and h(ω) is the filter coefficient vector.
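The sketch below forms the corrected matrix Φ_vv(ω) + λ(ω)Φ_ii(ω). The exact expression for λ(ω) is given in the patent only as an image; the interference-to-noise energy ratio used here is purely an assumed placeholder to make the example runnable.

import numpy as np

def modified_covariance(Phi_vv, Phi_ii, d, d_i):
    # Noise energy E_v = d^H Phi_vv d and interference energy E_i = d_i^H Phi_vv d_i.
    E_v = np.real(d.conj() @ Phi_vv @ d)
    E_i = np.real(d_i.conj() @ Phi_vv @ d_i)
    # Assumed weighting coefficient: the patent's formula for lambda(omega) is not
    # reproduced here; an energy ratio is used only for illustration.
    lam = E_i / max(E_v, 1e-12)
    return Phi_vv + lam * Phi_ii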
Preferably, the step of obtaining the target speech in S400 is as follows:
solving for the filter coefficients from the corrected equation by the Lagrange multiplier method:
h(ω) = (Φ_vv(ω) + λ(ω)Φ_ii(ω))^(-1) d(ω) / (d^H(ω)(Φ_vv(ω) + λ(ω)Φ_ii(ω))^(-1) d(ω));
where Φ_vv(ω) is the noise covariance matrix, λ(ω) is the weighting coefficient, Φ_ii(ω) is the covariance matrix of the interference signal, d(ω) is the steering vector of the speech signal, h^H(ω) is the conjugate transpose of h(ω), and h(ω) is the resulting filter coefficient vector;
weighting the multi-microphone signals with the filter coefficients to obtain the target speech:
Z(ω) = h^H(ω) y(ω);
where Z(ω) is the clean long-distance speech to be recorded.
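A minimal sketch of this closed-form solution and of the final filtering step; the small diagonal loading added before the matrix solve is an implementation choice for numerical robustness, not something stated in the patent.

import numpy as np

def mvdr_weights(Phi_mod, d):
    # Minimiser of h^H Phi_mod h subject to h^H d = 1:
    #   h = Phi_mod^{-1} d / (d^H Phi_mod^{-1} d).
    n = Phi_mod.shape[0]
    loading = 1e-6 * np.real(np.trace(Phi_mod)) / n   # assumed diagonal loading
    num = np.linalg.solve(Phi_mod + loading * np.eye(n), d)
    return num / (d.conj() @ num)

def beamform(h, Y):
    # Z(omega) = h^H y(omega), applied to every STFT frame in Y (N mics x T frames).
    return h.conj() @ Y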
In another aspect, a microphone array sound pickup system for long-distance speech recognition comprises:
a first selection module, used for selecting the direction of the speech signal;
a first calculation module, connected with the first selection module and used for calculating the steering vector of the speech signal from the direction of the speech signal;
a second selection module, used for selecting the direction of the interference signal;
a second calculation module, connected with the second selection module and used for calculating the steering vector of the interference signal from the direction of the interference signal;
an acquisition module, used for acquiring the sound data of the multiple microphones;
a third calculation module, connected with the acquisition module and the second calculation module and used for calculating the noise covariance matrix from the sound data;
and an output module, connected with the third calculation module and the first calculation module and used for calculating the optimal weight vector of the microphone array from the noise covariance matrix, and obtaining and outputting the target speech from the steering vector of the speech signal and the weight vector.
According to the above technical solutions, compared with the prior art, the present invention provides a microphone array pickup method and system for remote speech recognition in which distortion of the target speech is reduced by manually specifying the speech direction. Second, the direction of the interference signal is manually specified and the human-voice interference is weighted, so that the human-voice interference is suppressed more strongly while stationary noise is suppressed less; sound can therefore be picked up accurately, providing a better solution for remote voice control.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only embodiments of the present invention, and that those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of the method provided by the present invention;
FIG. 2 is a schematic diagram of the system provided by the present invention;
FIG. 3 is a schematic view of the environment setup provided in Example 2;
FIG. 4 is a diagram illustrating the processing result of the conventional MVDR method;
FIG. 5 is a diagram illustrating the processing result of the present invention provided in Example 2.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings in the embodiments of the present invention. It is obvious that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Example 1
Referring to FIG. 1, an embodiment of the present invention discloses a microphone array pickup method for remote speech recognition, which comprises the following specific steps:
S100: manually selecting the direction of the speech signal, and calculating the steering vector of the speech signal;
S200: manually selecting the direction of the interference signal, calculating the steering vector of the interference signal, and obtaining the covariance matrix of the interference signal from the steering vector of the interference signal direction;
S300: collecting sound with multiple microphones, and calculating the noise covariance matrix from the sound data received by the microphones;
S400: calculating the optimal weight vector of the microphone array from the noise covariance matrix, and obtaining the target speech from the steering vector of the speech signal and the weight vector.
In one embodiment, in S100, the steering vector of the speech signal is calculated as follows:
manually selecting the speech signal direction, acquiring the positions of the microphones and the speed of sound, and calculating the time delay with which the sound reaches each microphone from the speech signal direction, the microphone positions and the speed of sound, to obtain the steering vector of the speech signal:
d(ω) = [e^(-jωτ_1), e^(-jωτ_2), …, e^(-jωτ_N)]^T;
where τ_n is the time delay of the sound reaching microphone n, n = 1, 2, …, N, and d(ω) is the steering vector of the speech signal.
In one embodiment, in S200, the step of calculating the covariance matrix of the interference signal is as follows:
S210: manually selecting the interference signal direction, and calculating the time delay with which the sound reaches each microphone from the interference signal direction, the microphone positions and the speed of sound, to obtain the steering vector of the interference signal:
d_i(ω) = [e^(-jωτ_1), e^(-jωτ_2), …, e^(-jωτ_N)]^T;
where τ_n is the time delay of the sound from the interference direction reaching microphone n, n = 1, 2, …, N, and d_i(ω) is the steering vector of the interference signal;
S220: according to the definition of the covariance matrix, obtaining the covariance matrix of the interference signal:
Φ_ii(ω) = d_i(ω) d_i^H(ω);
where d_i(ω) is the steering vector of the interference signal, d_i^H(ω) is the conjugate transpose of d_i(ω), and Φ_ii(ω) is the covariance matrix of the interference signal.
In one embodiment, in S300, the noise covariance matrix is calculated as follows:
picking up sound with the multiple microphones, and calculating the noise covariance matrix from the sound data collected by the microphones:
Φ_vv(ω) = E[y(ω) y^H(ω)];
where y(ω) is the frequency-domain representation of the signals received by the microphones, y^H(ω) is the conjugate transpose of y(ω), and Φ_vv(ω) is the noise covariance matrix.
In a specific embodiment, after the noise covariance matrix is calculated in S300, the noise covariance matrix is further modified:
calculating the noise energy and the interference energy, wherein the noise energy is calculated as:
E_v(ω) = d^H(ω) Φ_vv(ω) d(ω);
where Φ_vv(ω) is the noise covariance matrix, d(ω) is the steering vector of the speech signal, d^H(ω) is the conjugate transpose of d(ω), and E_v(ω) is the noise energy;
the interference signal energy is calculated as:
E_i(ω) = d_i^H(ω) Φ_vv(ω) d_i(ω);
where Φ_vv(ω) is the noise covariance matrix, d_i(ω) is the steering vector of the interference signal, and d_i^H(ω) is the conjugate transpose of d_i(ω).
Determining a weighting coefficient according to the energy ratio of the noise energy and the interference energy, wherein the specific formula is as follows:
Figure BDA0003255133210000072
and correcting the noise covariance matrix according to the weighting coefficient, the corrected equation being:
h(ω) = argmin h^H(ω)(Φ_vv(ω) + λ(ω)Φ_ii(ω))h(ω), s.t. h^H(ω)d(ω) = 1;
where λ(ω) is the weighting coefficient, Φ_vv(ω) is the noise covariance matrix, Φ_ii(ω) is the covariance matrix of the interference signal, d(ω) is the steering vector of the speech signal, h^H(ω) is the conjugate transpose of h(ω), and h(ω) is the filter coefficient vector.
In a specific embodiment, the step of obtaining the target speech in S400 is as follows:
solving for the filter coefficients from the corrected equation by the Lagrange multiplier method:
h(ω) = (Φ_vv(ω) + λ(ω)Φ_ii(ω))^(-1) d(ω) / (d^H(ω)(Φ_vv(ω) + λ(ω)Φ_ii(ω))^(-1) d(ω));
where Φ_vv(ω) is the noise covariance matrix, λ(ω) is the weighting coefficient, Φ_ii(ω) is the covariance matrix of the interference signal, d(ω) is the steering vector of the speech signal, h^H(ω) is the conjugate transpose of h(ω), and h(ω) is the resulting filter coefficient vector.
Specifically, the corrected equation for the filter coefficients h(ω) is obtained in step S300, and in step S400 the Lagrange multiplier method is applied to that equation to arrive at the above expression for h(ω).
Weighting the multi-microphone signals with the filter coefficients yields the target speech:
Z(ω) = h^H(ω) y(ω);
where Z(ω) is the clean long-distance speech to be recorded.
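Tying the preceding sketches together, a hypothetical per-frequency-bin driver for steps S100 to S400 could look as follows; all function names reused from the earlier snippets are illustrative assumptions, not part of the patent.

def pickup_bin(Y, mic_positions, omega, speech_dir, interf_dir):
    # Y: (N, T) STFT coefficients of the N microphones at one frequency bin.
    d = steering_vector(mic_positions, speech_dir, omega)                   # S100
    d_i = steering_vector(mic_positions, interf_dir, omega)                 # S200
    Phi_ii = interference_covariance(mic_positions, interf_dir, omega)      # S200
    Phi_vv = noise_covariance(Y)                                            # S300
    Phi_mod = modified_covariance(Phi_vv, Phi_ii, d, d_i)                   # corrected matrix
    h = mvdr_weights(Phi_mod, d)                                            # S400
    return beamform(h, Y)                                                   # target speech Z(omega)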
Referring to FIG. 2, an embodiment of the present invention further discloses a microphone array sound pickup system for long-distance speech recognition, comprising:
a first selection module, used for selecting the direction of the speech signal;
a first calculation module, connected with the first selection module and used for calculating the steering vector of the speech signal from the direction of the speech signal;
a second selection module, used for selecting the direction of the interference signal;
a second calculation module, connected with the second selection module and used for calculating the steering vector of the interference signal from the direction of the interference signal;
an acquisition module, used for acquiring the sound data of the multiple microphones;
a third calculation module, connected with the acquisition module and the second calculation module and used for calculating the noise covariance matrix from the sound data received by the microphones;
and an output module, connected with the third calculation module and the first calculation module and used for calculating the optimal weight vector of the microphone array from the noise covariance matrix, obtaining the target speech from the steering vector of the speech signal and the weight vector, and outputting the target speech.
According to the above technical solutions, compared with the prior art, the present invention provides a microphone array pickup method and system for remote speech recognition in which distortion of the target speech is reduced by manually specifying the speech direction. Second, the direction of the interference signal is manually specified and the human-voice interference is weighted, so that the human-voice interference is suppressed more strongly while stationary noise is suppressed less; sound can therefore be picked up accurately, providing a better solution for remote voice control.
Example 2
An example in which the sound pickup method of Example 1 of the present invention is applied is as follows.
The invention places no requirements on the number of microphones or on the shape and size of the array; it only requires that the positions of all microphones be fixed and known.
As shown in FIG. 3, to compare the conventional MVDR method with the present invention, an 8-microphone array was set up and placed horizontally to test the effect. In the simulation experiment, 4 noise sources, an interference source, and a target speech source were placed on a horizontal plane at 60-degree intervals, each at a distance of 20 m.
In addition to the noise and interference sources, this embodiment adds independent white noise to each microphone to simulate a real recording environment.
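A small sketch of how such a simulation geometry could be set up; the uniform linear array and the 5 cm spacing below are assumptions made only for illustration, since the patent does not specify the array geometry.

import numpy as np

c = 343.0                                   # speed of sound, m/s
n_mics = 8
spacing = 0.05                              # assumed inter-microphone spacing in metres
mic_positions = np.column_stack([np.arange(n_mics) * spacing,
                                 np.zeros(n_mics),
                                 np.zeros(n_mics)])

# Six sources (4 noise, 1 interference, 1 target) on a horizontal plane,
# spaced 60 degrees apart at a range of 20 m, as in the simulation.
angles = np.radians(np.arange(0, 360, 60))
source_positions = np.column_stack([20.0 * np.cos(angles),
                                    20.0 * np.sin(angles),
                                    np.zeros(angles.size)])

# Propagation delay from each source to each microphone (spherical wavefront).
delays = np.linalg.norm(source_positions[:, None, :] - mic_positions[None, :, :],
                        axis=-1) / c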
FIG. 4 is a schematic diagram illustrating the processing result of the conventional MVDR method.
FIG. 5 is a schematic diagram of the processing result of the present invention provided in Example 2.
As can be seen from FIG. 4 and FIG. 5, after testing both the conventional MVDR method and the proposed method, the speech obtained with the proposed method shows less distortion, the human-voice interference is suppressed, and the speech is clearer; the intelligibility of the target speech is therefore improved, subsequent processing becomes easier, sound is picked up accurately, and a better solution for remote voice control is provided.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A microphone array pickup method for remote speech recognition, characterized by comprising the following specific steps:
S100: selecting any speech signal direction, and calculating the steering vector of the speech signal;
S200: selecting any interference signal direction, calculating the steering vector of the interference signal, and obtaining the covariance matrix of the interference signal from the steering vector of the interference signal direction;
S300: picking up sound with a multi-microphone array, and calculating the noise covariance matrix from the sound data collected by the microphones;
S400: calculating the optimal weight vector of the microphone array from the noise covariance matrix, and obtaining the target speech from the steering vector of the speech signal and the weight vector.
2. The microphone array pickup method for long-distance speech recognition according to claim 1, wherein in S100, the steering vector of the speech signal is calculated as follows:
manually selecting the speech signal direction, acquiring the positions of the microphones and the speed of sound, and calculating the time delay with which the sound reaches each microphone from the speech signal direction, the microphone positions and the speed of sound, to obtain the steering vector of the speech signal:
d(ω) = [e^(-jωτ_1), e^(-jωτ_2), …, e^(-jωτ_N)]^T;
where τ_n is the time delay of the sound reaching microphone n, n = 1, 2, …, N, and d(ω) is the steering vector of the speech signal.
3. The method as claimed in claim 1, wherein the step of calculating the covariance matrix of the interference signal in S200 is as follows:
S210: manually selecting the interference signal direction, and calculating the time delay with which the sound reaches each microphone from the interference signal direction, the microphone positions and the speed of sound, to obtain the steering vector of the interference signal:
d_i(ω) = [e^(-jωτ_1), e^(-jωτ_2), …, e^(-jωτ_N)]^T;
where τ_n is the time delay of the sound from the interference direction reaching microphone n, n = 1, 2, …, N, and d_i(ω) is the steering vector of the interference signal;
S220: according to the definition of the covariance matrix, obtaining the covariance matrix of the interference signal:
Φ_ii(ω) = d_i(ω) d_i^H(ω);
where d_i(ω) is the steering vector of the interference signal, d_i^H(ω) is the conjugate transpose of d_i(ω), and Φ_ii(ω) is the covariance matrix of the interference signal.
4. The microphone array pickup method for long-distance speech recognition according to claim 1, wherein in S300, the noise covariance matrix is calculated as follows:
picking up sound with the multiple microphones, and calculating the noise covariance matrix from the sound data collected by the microphones:
Φ_vv(ω) = E[y(ω) y^H(ω)];
where y(ω) is the frequency-domain representation of the signals received by the microphones, y^H(ω) is the conjugate transpose of y(ω), and Φ_vv(ω) is the noise covariance matrix.
5. The microphone array pickup method for long-distance speech recognition according to claim 4, wherein after the noise covariance matrix is calculated in S300, the noise covariance matrix is further modified by:
calculating the noise energy and the interference energy, wherein the noise energy is calculated as:
E_v(ω) = d^H(ω) Φ_vv(ω) d(ω);
where Φ_vv(ω) is the noise covariance matrix, d(ω) is the steering vector of the speech signal, d^H(ω) is the conjugate transpose of d(ω), and E_v(ω) is the noise energy;
the interference signal energy is calculated as:
E_i(ω) = d_i^H(ω) Φ_vv(ω) d_i(ω);
where Φ_vv(ω) is the noise covariance matrix, d_i(ω) is the steering vector of the interference signal, and d_i^H(ω) is the conjugate transpose of d_i(ω);
determining a weighting coefficient according to the energy ratio of the noise energy and the interference energy, wherein the specific formula is as follows:
Figure FDA0003255133200000023
and correcting the noise covariance matrix according to the weighting coefficient, the corrected equation being:
h(ω) = argmin h^H(ω)(Φ_vv(ω) + λ(ω)Φ_ii(ω))h(ω), s.t. h^H(ω)d(ω) = 1;
where λ(ω) is the weighting coefficient, Φ_vv(ω) is the noise covariance matrix, Φ_ii(ω) is the covariance matrix of the interference signal, d(ω) is the steering vector of the speech signal, h^H(ω) is the conjugate transpose of h(ω), and h(ω) is the filter coefficient vector.
6. The microphone array pickup method for remote speech recognition according to claim 5, wherein the step of obtaining the target speech in S400 is as follows:
solving for the filter coefficients from the corrected equation by the Lagrange multiplier method:
h(ω) = (Φ_vv(ω) + λ(ω)Φ_ii(ω))^(-1) d(ω) / (d^H(ω)(Φ_vv(ω) + λ(ω)Φ_ii(ω))^(-1) d(ω));
where Φ_vv(ω) is the noise covariance matrix, λ(ω) is the weighting coefficient, Φ_ii(ω) is the covariance matrix of the interference signal, d(ω) is the steering vector of the speech signal, h^H(ω) is the conjugate transpose of h(ω), and h(ω) is the resulting filter coefficient vector;
and weighting the multi-microphone signals with the filter coefficients to obtain the target speech:
Z(ω) = h^H(ω) y(ω);
where Z(ω) is the clean long-distance speech to be recorded.
7. A microphone array sound pickup system for long-distance speech recognition, characterized by comprising:
a first selection module, used for selecting the direction of the speech signal;
a first calculation module, connected with the first selection module and used for calculating the steering vector of the speech signal from the direction of the speech signal;
a second selection module, connected with the second selection module and used for selecting the direction of the interference signal;
a second calculation module, connected with the second selection module and used for calculating the steering vector of the interference signal from the direction of the interference signal;
an acquisition module, used for acquiring the sound data of the multiple microphones;
a third calculation module, connected with the acquisition module and the second calculation module and used for calculating the noise covariance matrix from the sound data;
and an output module, connected with the third calculation module and the first calculation module and used for calculating the optimal weight vector of the microphone array from the noise covariance matrix, and obtaining and outputting the target speech from the steering vector of the speech signal and the weight vector.
CN202111057434.1A, priority date 2021-09-09, filing date 2021-09-09: Microphone array pickup method and system for long-distance voice recognition. Active. Granted as CN113782046B.

Priority Applications (1)

Application Number: CN202111057434.1A (granted as CN113782046B); Priority Date: 2021-09-09; Filing Date: 2021-09-09; Title: Microphone array pickup method and system for long-distance voice recognition

Applications Claiming Priority (1)

Application Number: CN202111057434.1A (granted as CN113782046B); Priority Date: 2021-09-09; Filing Date: 2021-09-09; Title: Microphone array pickup method and system for long-distance voice recognition

Publications (2)

Publication Number / Publication Date
CN113782046A: 2021-12-10
CN113782046B: 2024-09-17

Family

ID=78842224

Family Applications (1)

Application Number: CN202111057434.1A (Active, granted as CN113782046B); Priority Date: 2021-09-09; Filing Date: 2021-09-09; Title: Microphone array pickup method and system for long-distance voice recognition

Country Status (1)

Country Link
CN (1) CN113782046B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118522285A (en) * 2024-07-25 2024-08-20 辽宁汉华信息工程有限公司 Interactive user voice recognition method for AI intelligent agent


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180176679A1 (en) * 2016-12-20 2018-06-21 Verizon Patent And Licensing Inc. Beamforming optimization for receiving audio signals
JP2018141922A (en) * 2017-02-28 2018-09-13 日本電信電話株式会社 Steering vector estimation device, steering vector estimating method and steering vector estimation program
CN111052766A (en) * 2017-09-07 2020-04-21 三菱电机株式会社 Noise removing device and noise removing method
JP2019054344A (en) * 2017-09-13 2019-04-04 日本電信電話株式会社 Filter coefficient calculation device, sound pickup device, method thereof, and program
CN108181507A (en) * 2017-12-25 2018-06-19 中国科学技术大学 A kind of robust adaptive beamforming method
CN108694957A (en) * 2018-04-08 2018-10-23 湖北工业大学 The echo cancelltion design method formed based on circular microphone array beams
CN110503971A (en) * 2018-05-18 2019-11-26 英特尔公司 Time-frequency mask neural network based estimation and Wave beam forming for speech processes
CN108831495A (en) * 2018-06-04 2018-11-16 桂林电子科技大学 A kind of sound enhancement method applied to speech recognition under noise circumstance
CN110890099A (en) * 2018-09-10 2020-03-17 北京京东尚科信息技术有限公司 Sound signal processing method, device and storage medium
CN110931036A (en) * 2019-12-07 2020-03-27 杭州国芯科技股份有限公司 Microphone array beam forming method
CN111081267A (en) * 2019-12-31 2020-04-28 中国科学院声学研究所 Multi-channel far-field speech enhancement method
CN112447184A (en) * 2020-11-10 2021-03-05 北京小米松果电子有限公司 Voice signal processing method and device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
杨志伟; 张攀; 陈颖; 许华健: "Robust beamforming algorithm based on joint iterative estimation of the steering vector and covariance matrix", Journal of Electronics & Information Technology, no. 12, 18 October 2018 (2018-10-18) *
臧守明; 白媛; 马秀荣; 李俊胜: "An improved nested array beamforming algorithm", Computer Simulation, no. 10, 15 October 2016 (2016-10-15) *
陈明建; 罗景青; 龙国庆: "Robust Capon beamforming algorithm based on covariance matrix estimation", Fire Control & Command Control, no. 10, 15 October 2016 (2016-10-15) *


Also Published As

Publication number Publication date
CN113782046B (en) 2024-09-17

Similar Documents

Publication Publication Date Title
Benesty et al. Fundamentals of differential beamforming
CN106251877B (en) Voice Sounnd source direction estimation method and device
CN106782590B (en) Microphone array beam forming method based on reverberation environment
US9100734B2 (en) Systems, methods, apparatus, and computer-readable media for far-field multi-source tracking and separation
US20090103749A1 (en) Microphone Array Processor Based on Spatial Analysis
Yousefian et al. A dual-microphone algorithm that can cope with competing-talker scenarios
Wang et al. Noise power spectral density estimation using MaxNSR blocking matrix
EP1430472A2 (en) Selective sound enhancement
Jarrett et al. Noise reduction in the spherical harmonic domain using a tradeoff beamformer and narrowband DOA estimates
WO2023108864A1 (en) Regional pickup method and system for miniature microphone array device
CN113257270B (en) Multi-channel voice enhancement method based on reference microphone optimization
CN113782046B (en) Microphone array pickup method and system for long-distance voice recognition
Fejgin et al. BRUDEX database: Binaural room impulse responses with uniformly distributed external microphones
Levin et al. Near-field signal acquisition for smartglasses using two acoustic vector-sensors
Šarić et al. Supervised speech separation combined with adaptive beamforming
Bai et al. Speech Enhancement by Denoising and Dereverberation Using a Generalized Sidelobe Canceller-Based Multichannel Wiener Filter
Geng et al. A speech enhancement method based on the combination of microphone array and parabolic reflector
Koyama et al. Exploring optimal dnn architecture for end-to-end beamformers based on time-frequency references
Zhu et al. Modified complementary joint sparse representations: a novel post-filtering to MVDR beamforming
As’ad et al. Beamforming designs robust to propagation model estimation errors for binaural hearing aids
D'Olne et al. Model-based beamforming for wearable microphone arrays
Schwartz et al. A recursive expectation-maximization algorithm for online multi-microphone noise reduction
Yen et al. Rotor noise-aware noise covariance matrix estimation for unmanned aerial vehicle audition
Šarić et al. Performance analysis of MVDR beamformer applied on an end-fire microphone array composed of unidirectional microphones
Bai et al. Kalman filter-based microphone array signal processing using the equivalent source model

Legal Events

Code / Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant