CN113948101A - Noise suppression method and device based on spatial discrimination detection - Google Patents

Noise suppression method and device based on spatial discrimination detection

Info

Publication number: CN113948101A
Application number: CN202111216600.8A
Authority: CN (China)
Prior art keywords: filter, module, noise, spatial, frequency domain
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 何平, 蒋升
Current Assignee: Suirui Technology Group Co Ltd
Original Assignee: Suirui Technology Group Co Ltd
Priority date (filing date): 2021-10-19
Application filed by Suirui Technology Group Co Ltd
Priority to CN202111216600.8A
Publication of CN113948101A

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 - Processing in the time domain
    • G10L21/0232 - Processing in the frequency domain
    • G10L2021/02082 - Noise filtering the noise being echo, reverberation of the speech

Abstract

The invention discloses a noise suppression method and device based on spatial discrimination detection, belonging to the field of information processing. The method comprises the following steps: S1: calculating a steering vector and a super-directional filter from the time domain signal of each microphone; S2: converting the initialized signals into time-frequency domain signals and constructing frequency domain prediction vectors; S3: performing noise suppression filter calculation on the time-frequency domain signals to construct a noise and reverberation suppression filter, where the calculation comprises computing spatial discrimination coefficients and spatial weight information, updating a weighted autocorrelation matrix, and constructing the noise and reverberation suppression filter; S4: obtaining the frequency domain estimate of the target speech from the resulting filter, and from it the time domain estimate of the target speech. The invention ensures the optimality of the filter and improves noise suppression and reverberation suppression performance.

Description

Noise suppression method and device based on spatial discrimination detection
Technical Field
The present invention belongs to the field of information processing, and in particular, relates to a noise suppression method and apparatus based on spatial discrimination detection.
Background
In practical applications such as online conference systems and smart-home voice interaction, the speaker is at some distance from the microphone, so reverberation and noise are picked up together with the speech, degrading voice communication quality and interaction accuracy. On the one hand, reverberation from multiple wall reflections degrades the performance of the noise suppression filter, especially in highly reverberant scenes such as conference rooms; on the other hand, the presence of background noise likewise degrades reverberation suppression.
The currently common scheme first converts each channel's time domain signal into the time-frequency domain by short-time Fourier transform, then designs a group of filters that estimate the correlation of each time-frequency unit with the historical signal; since this correlation is caused by reverberation, the filters cancel the reverberation based on it. Next, an ideal steering vector is calculated from the azimuth of the speaker relative to the microphone array, and a filter is designed by minimizing the noise energy. The two filters perform reverberation suppression and noise suppression in sequence, which is usually significantly less effective than in scenes where noise or reverberation exists alone.
In the prior art, the noise suppression methods mainly have the following defects:
1) When noise and reverberation exist simultaneously but are modeled independently, the performance of sequential reverberation suppression and noise suppression is generally reduced markedly.
2) In a strongly reverberant scene, a steering vector based purely on azimuth information does not match the true steering vector, causing speech distortion and reducing voice interaction quality.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The invention aims to provide a noise suppression method and apparatus based on spatial discrimination detection, which ensure the optimality of the filter and improve noise suppression and reverberation suppression performance.
In order to achieve the above object, the present invention provides a noise suppression method based on spatial discrimination detection, comprising the following steps:
s1: calculating a steering vector and a super-directional filter from the time domain signal of each microphone;
s2: converting the initialized signals into time-frequency domain signals, and constructing frequency domain prediction vectors;
s3: performing noise suppression filter calculation on the time-frequency domain signals to obtain a filter for constructing noise and reverberation suppression; wherein the noise suppression filter calculation comprises: calculating a spatial discriminative coefficient and spatial weight information, updating a weighted autocorrelation matrix, and constructing a noise and reverberation suppression filter;
s4: and according to the obtained filter, obtaining the frequency domain estimation of the target voice, and further obtaining the time domain estimation of the target voice.
Further, before the step S1, the method further includes acquiring the voice signal x_m(n) of each microphone;
In step S1, the method specifically includes the following steps:
s101: for each frequency band k, a target speech steering vector u(k) is calculated:
u(k) = [e^{-jω_k·τ_1(θ)}, e^{-jω_k·τ_2(θ)}, ..., e^{-jω_k·τ_M(θ)}]^T;
τ_m(θ) = q(θ)·d_m^T / c;
q(θ) = [cos(θ), sin(θ)];
s102: for each frequency band k, compute a super-directional filter h(k):
h(k) = R^{-1}(k)·u(k) / (u^H(k)·R^{-1}(k)·u(k)).
further, the step S2 includes the following steps:
s201: performing a short-time Fourier transform on the time domain signal x_m(n) to obtain the time-frequency domain expression:
X_m(l,k) = Σ_{n=0}^{N-1} w(n)·x_m(l·N_s + n)·e^{-j2πnk/N} (N_s denotes the frame shift);
s202: for each frequency band k, constructing frequency domain prediction vectors X(l,k) and X^(l,k):
X(l,k) = [X_1(l,k), X_2(l,k), ..., X_M(l,k)]^T;
X^(l,k) = [X^T(l,k), X^T(l-1,k), ..., X^T(l-L,k)]^T.
Further, the step S3 includes the following steps:
s301: calculating spatial discrimination coefficients and spatial weight information of the current frame:
the spatial discrimination coefficients are calculated as follows:
ρ_s(l) = α·ρ_s(l-1) + (1-α)·Σ_k |h^H(k)·X(l,k)|^2;
ρ_x(l) = α·ρ_x(l-1) + (1-α)·(1/M)·Σ_k ||X(l,k)||^2;
where ρ_s(l) and ρ_x(l) respectively denote the energy estimates of the speech-direction signal and of the microphone signals for frame l, and the difference between the two energy distributions expresses the spatial discrimination;
the spatial weight information β(l) of the current frame is calculated as follows:
β(l) = f(ρ_s(l), ρ_x(l))  [formula image not reproduced in the source; the weight is small in time-frequency regions where the target speech dominates];
s302: updating the weighted autocorrelation matrix;
for each band k, the weighted autocorrelation matrix R^(l,k) is updated as follows:
R^(l,k) = α·R^(l-1,k) + (1-α)·β(l)·X^(l,k)·(X^(l,k))^H;
s303: updating the noise and reverberation suppression filter:
for each frequency band k, the noise and reverberation suppression filter G(l,k) is constructed as follows:
G(l,k) = (R^(l,k))^{-1}·ū(k) / (ū^H(k)·(R^(l,k))^{-1}·ū(k));
ū(k) = [u^T(k), 0, ..., 0]^T.
still further, the step S4 includes the steps of:
s401: obtaining the frequency domain estimate S^(l,k) of the target speech from the solved noise and reverberation suppression filter:
S^(l,k) = G^H(l,k)·X^(l,k);
s402: performing an inverse Fourier transform on the frequency domain estimate of the target speech to obtain the final target speech estimate s^(n):
s^_l(n) = (1/N)·Σ_{k=0}^{N-1} S^(l,k)·e^{j2πnk/N}, with the frames combined by overlap-add.
The invention also provides a noise suppression device based on spatial discrimination detection, comprising an initialization module, a signal decomposition module, a filter calculation module and a target speech estimation module;
the initialization module is used for calculating a steering vector and a super-directional filter from the time domain signal of each microphone;
the signal decomposition module is used for converting the initialized signals into time-frequency domain signals and constructing frequency domain prediction vectors;
the filter calculation module is used for performing noise suppression filter calculation on the time-frequency domain signals to construct a noise and reverberation suppression filter; wherein the filter calculation module comprises: a first calculation module for calculating spatial discrimination coefficients and spatial weight information, a first update module for updating the weighted autocorrelation matrix, and a first construction module for constructing the noise and reverberation suppression filter;
and the target speech estimation module is used for obtaining the frequency domain estimate of the target speech from the resulting filter, and from it the time domain estimate of the target speech.
Further, the initialization module is also used for acquiring the voice signal x_m(n) of each microphone;
The initialization module is configured to:
for each frequency band k, calculate a target speech steering vector u(k):
u(k) = [e^{-jω_k·τ_1(θ)}, e^{-jω_k·τ_2(θ)}, ..., e^{-jω_k·τ_M(θ)}]^T;
τ_m(θ) = q(θ)·d_m^T / c;
q(θ) = [cos(θ), sin(θ)];
and for each frequency band k, compute a super-directional filter h(k):
h(k) = R^{-1}(k)·u(k) / (u^H(k)·R^{-1}(k)·u(k)).
further, the signal decomposition module comprises a signal conversion module and a vector construction module;
the signal conversion module is used for performing a short-time Fourier transform on the time domain signal x_m(n) to obtain the time-frequency domain expression:
X_m(l,k) = Σ_{n=0}^{N-1} w(n)·x_m(l·N_s + n)·e^{-j2πnk/N};
the vector construction module constructs, for each frequency band k, frequency domain prediction vectors X(l,k) and X^(l,k):
X(l,k) = [X_1(l,k), X_2(l,k), ..., X_M(l,k)]^T;
X^(l,k) = [X^T(l,k), X^T(l-1,k), ..., X^T(l-L,k)]^T.
Further, in the filter calculation module:
in the first calculation module, the spatial discrimination coefficients are calculated as follows:
ρ_s(l) = α·ρ_s(l-1) + (1-α)·Σ_k |h^H(k)·X(l,k)|^2;
ρ_x(l) = α·ρ_x(l-1) + (1-α)·(1/M)·Σ_k ||X(l,k)||^2;
where ρ_s(l) and ρ_x(l) respectively denote the energy estimates of the speech-direction signal and of the microphone signals for frame l, and the difference between the two energy distributions expresses the spatial discrimination;
the spatial weight information β(l) of the current frame is calculated as follows:
β(l) = f(ρ_s(l), ρ_x(l))  [formula image not reproduced in the source; the weight is small in time-frequency regions where the target speech dominates];
in the first update module, for each frequency band k, the weighted autocorrelation matrix R^(l,k) is updated as follows:
R^(l,k) = α·R^(l-1,k) + (1-α)·β(l)·X^(l,k)·(X^(l,k))^H;
in the first construction module, for each frequency band k, the noise and reverberation suppression filter G(l,k) is constructed as follows:
G(l,k) = (R^(l,k))^{-1}·ū(k) / (ū^H(k)·(R^(l,k))^{-1}·ū(k));
ū(k) = [u^T(k), 0, ..., 0]^T.
furthermore, the target speech estimation module comprises a frequency domain estimation module and a target speech estimation module;
the frequency domain estimation module is used for obtaining the frequency domain estimate S^(l,k) of the target speech from the solved noise and reverberation suppression filter:
S^(l,k) = G^H(l,k)·X^(l,k);
the target speech estimation module is used for performing an inverse Fourier transform on the frequency domain estimate of the target speech to obtain the final target speech estimate s^(n):
s^_l(n) = (1/N)·Σ_{k=0}^{N-1} S^(l,k)·e^{j2πnk/N}, with the frames combined by overlap-add.
Compared with the prior art, in particular with cascaded dereverberation and denoising schemes, the noise suppression method and device based on spatial discrimination detection provided by the invention model noise and reverberation jointly and design a single globally optimal filter, so that reverberation and noise suppression achieve a better effect. In addition, the spatial discrimination weight designed by the invention adaptively identifies time-frequency regions where speech is dominant, which further prevents the target speech from being over-cancelled as noise or reverberation and improves robustness.
Drawings
Fig. 1 is a flowchart of a noise suppression method based on spatial discrimination detection in the present embodiment.
Fig. 2 is a diagram of the Hamming window function used in this embodiment.
Fig. 3 is a schematic diagram of noise suppression based on spatial discrimination detection in the present embodiment.
Detailed Description
The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.
Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.
As shown in fig. 1, the noise suppression method based on spatial discrimination detection according to the preferred embodiment of the present invention adopts a unified filter design rule that models noise and reverberation jointly, ensuring the optimality of the filter; it also designs a group of spatial discrimination features that update the speaker steering vector in real time in reverberant scenes, improving noise suppression and reverberation suppression performance.
The method is applied to a system based on a microphone array, and specifically comprises the following four implementation steps:
s1: the time domain signal for each microphone is subjected to steering vector and super-directional filter calculations.
Before step S1, the method further includes acquiring the voice signals of the microphones, as follows: let x_m(n) denote the original time domain signal picked up in real time by the m-th of M microphone elements, where m is the microphone index, ranging from 1 to M, and n is the time index; the direction θ of the target speech relative to the microphone array is known.
The target speech is the speech signal corresponding to the target direction. For a speech separation task, the target direction of the signal to be extracted is known in advance; for example, a large-screen voice communication device may be expected to separate the target speech signal arriving from the 90-degree direction.
Specifically, the step S1 includes the following steps:
s101: for each frequency band k (k = 1, 2, ..., K), where a frequency band is the signal component corresponding to a certain frequency, a target speech steering vector u(k) is calculated. The specific calculation formula is as follows:
u(k) = [e^{-jω_k·τ_1(θ)}, e^{-jω_k·τ_2(θ)}, ..., e^{-jω_k·τ_M(θ)}]^T;
τ_m(θ) = q(θ)·d_m^T / c;
q(θ) = [cos(θ), sin(θ)].
wherein f_k is the frequency corresponding to band k, and K is determined by the subsequent Fourier transform: if the frame length is 512, K is half of the frame length; c is the sound speed, c = 340 m/s; d_m is the two-dimensional coordinate of the m-th microphone; the superscript H denotes the conjugate transpose operator; j is the imaginary unit (j^2 = -1); q(θ) is the direction vector; ω_k = 2π·f_k is the circle (angular) frequency of band k.
Step S101 initializes the steering vector, which represents the signal difference of each microphone element in the θ direction under an ideal scene free of reverberation and array-element mismatch. It is used to calculate the super-directional filter in the subsequent step S102.
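As a concrete illustration of step S101, the following sketch computes the ideal far-field steering vectors for all bands; the sampling rate fs, the exponent sign convention and the helper name steering_vectors are illustrative assumptions, not details taken from the patent.

```python
# Illustrative sketch of step S101 (far-field model; fs and all names are assumptions).
import numpy as np

def steering_vectors(mic_xy, theta_deg, fs=16000, n_fft=512, c=340.0):
    """Ideal steering vectors u(k) for bands k = 0..n_fft//2; returns (K+1, M)."""
    theta = np.deg2rad(theta_deg)
    q = np.array([np.cos(theta), np.sin(theta)])       # q(θ) = [cos θ, sin θ]
    tau = mic_xy @ q / c                               # τ_m(θ) = q(θ)·d_m^T / c, in seconds
    f = np.arange(n_fft // 2 + 1) * fs / n_fft         # band frequencies f_k
    omega = 2.0 * np.pi * f                            # circle frequencies ω_k
    return np.exp(-1j * np.outer(omega, tau))          # u_m(k) = e^{-j ω_k τ_m(θ)}

# Example: 8-microphone linear array with 3.5 cm spacing, target at 90 degrees
mics = np.stack([np.arange(8) * 0.035, np.zeros(8)], axis=1)
U = steering_vectors(mics, 90.0)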
s102: for each frequency band k, a super-directional filter h(k) is calculated. The specific calculation formula is as follows:
h(k) = R^{-1}(k)·u(k) / (u^H(k)·R^{-1}(k)·u(k));
where R(k) denotes the autocorrelation matrix of the uniform scattered (diffuse) field and the superscript -1 denotes the matrix inverse. In theory, the super-directional filter fully preserves the signal from the target direction θ while maximally suppressing uniform scattered field noise.
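A minimal sketch of step S102 follows, assuming the standard sinc coherence model for the uniform scattered (diffuse) field and a small diagonal loading term eps for numerical invertibility; both are practical assumptions rather than details stated in the patent.

```python
# Sketch of step S102: super-directive (MVDR-against-diffuse-noise) filters.
import numpy as np

def superdirective_filters(mic_xy, U, fs=16000, n_fft=512, c=340.0, eps=1e-3):
    """h(k) = R^{-1}(k)u(k) / (u^H(k)R^{-1}(k)u(k)) per band; returns (K+1, M)."""
    M = mic_xy.shape[0]
    dist = np.linalg.norm(mic_xy[:, None, :] - mic_xy[None, :, :], axis=-1)
    f = np.arange(n_fft // 2 + 1) * fs / n_fft
    H = np.zeros((f.size, M), dtype=complex)
    for k, fk in enumerate(f):
        R = np.sinc(2.0 * fk * dist / c)               # diffuse coherence sin(x)/x with x = 2π f d / c
        Rinv = np.linalg.inv(R + eps * np.eye(M))      # diagonal loading for invertibility
        u = U[k]
        H[k] = Rinv @ u / (u.conj() @ Rinv @ u)
    return H
```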
S2: and converting the initialized signals into time-frequency domain signals, and constructing frequency domain prediction vectors.
Specifically, the step S2 includes the steps of:
s201: performing a short-time Fourier transform on the time domain signal x_m(n) to obtain the time-frequency domain expression; the purpose is to convert the time domain signal into a time-frequency domain signal. The specific calculation formula is as follows:
X_m(l,k) = Σ_{n=0}^{N-1} w(n)·x_m(l·N_s + n)·e^{-j2πnk/N};
wherein N is the frame length, N = 512; N_s is the frame shift; w(n) is a Hamming window of length 512, where n is the sample index within the frame, so w(n) is the window value at sample n; l is the time frame index, in units of frames; k is the frequency index. X_m(l,k) is the spectrum of the m-th microphone signal in the l-th frame and the k-th frequency band. The Hamming window function used in the invention is shown in fig. 2.
s202: for each frequency band k, constructing frequency domain prediction vectors X(l,k) and X^(l,k). The specific calculation formulas are as follows:
X(l,k) = [X_1(l,k), X_2(l,k), ..., X_M(l,k)]^T;
X^(l,k) = [X^T(l,k), X^T(l-1,k), ..., X^T(l-L,k)]^T.
wherein the superscript T denotes the transpose operator, and L is the length of the filter's look-back over previous time frames, typically ranging from 3 to 20. From the above formulas it can be seen that X(l,k) is an M×1 column vector and X^(l,k) is an M(L+1)×1 column vector. The frequency domain prediction vector X^(l,k) is constructed in order to predict the noise and reverberation components in subsequent steps.
The method uses L = 12, which effectively saves storage and computation time without noticeably degrading noise suppression performance.
The transformation from the time domain signal to the time-frequency domain can be completed by the above step S2.
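The two sub-steps of S2 can be sketched as follows; the 50% frame shift (hop = 256) and the zero-padding of frames before l = 0 are assumptions, since the patent does not state them.

```python
# Sketch of step S2: STFT analysis and stacking of the prediction vector.
import numpy as np

def stft(x, n_fft=512, hop=256):
    """x: (M, n_samples) time signals -> X: (M, n_frames, n_fft//2 + 1) spectra X_m(l,k)."""
    w = np.hamming(n_fft)                                   # Hamming window w(n), see fig. 2
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    frames = np.stack([x[:, l * hop:l * hop + n_fft] * w for l in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=-1)

def prediction_vector(X, l, k, L=12):
    """Stacked vector X^(l,k) = [X^T(l,k), ..., X^T(l-L,k)]^T for frame l, band k."""
    cols = [X[:, l - i, k] if l - i >= 0 else np.zeros(X.shape[0], dtype=complex)
            for i in range(L + 1)]
    return np.concatenate(cols)                             # shape (M*(L+1),)
```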
S3: and performing noise suppression filter calculation on the time-frequency domain signals to obtain a filter for constructing noise and reverberation suppression.
Wherein the calculation of the noise suppression filter comprises: calculating spatial discriminative coefficients and spatial weight information, updating a weighted autocorrelation matrix, and constructing a noise and reverberation suppression filter.
Specifically, the step S3 includes the steps of:
s301: calculating a spatial discriminative coefficient and spatial weight information of a current frame;
the spatial discrimination coefficients are calculated as follows:
ρ_s(l) = α·ρ_s(l-1) + (1-α)·Σ_k |h^H(k)·X(l,k)|^2;
ρ_x(l) = α·ρ_x(l-1) + (1-α)·(1/M)·Σ_k ||X(l,k)||^2;
wherein |·| denotes the modulus of a complex number; α is the smoothing factor between adjacent frames, with a value between 0 and 1. In the present invention α = 0.92 is preferred: if α is less than 0.88, the variation range of the energy estimate exceeds 20% and the estimate is unstable, and if α is greater than 0.96, the energy estimate is over-smoothed and the spatial discrimination falls below 40 degrees. The value 0.92 balances robustness and accuracy well.
In the formulas, ρ_s(l) and ρ_x(l) respectively denote the energy estimates of the speech-direction signal and of the microphone signals for frame l, and ρ_s(l-1) and ρ_x(l-1) the corresponding estimates for frame l-1; the difference between the two energy distributions expresses the spatial discrimination.
The spatial weight information β(l) of the current frame is calculated as follows:
β(l) = f(ρ_s(l), ρ_x(l))  [formula image not reproduced in the source; per the surrounding description, the weight is small in time-frequency regions where the target speech is dominant]
the spatial weight information calculated in step S301 is used to update the weighted autocorrelation matrix in the subsequent steps.
S302: updating the weighted autocorrelation matrix;
for each band k, the weighted autocorrelation matrix R^(l,k) is updated as follows:
R^(l,k) = α·R^(l-1,k) + (1-α)·β(l)·X^(l,k)·(X^(l,k))^H;
where α is the smoothing factor between adjacent frames, the same as in step S301. Through the dynamic weight information in the formula, the correlation matrix selectively accumulates the noise and reverberation components, so that noise and reverberation can be suppressed without distorting the target speech. The weighted autocorrelation matrix represents the correlation between the weighted microphone signals; because the weight is small where the target speech is dominant, it mainly retains the correlation information of the noise and reverberation. It is used for the computation of the final filter in subsequent steps.
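A sketch of the recursive update of step S302 under the same assumptions (the exact placement of the weight β in the original formula image is not known):

```python
# Sketch of step S302: weighted recursive autocorrelation update.
import numpy as np

def update_weighted_autocorr(R_prev, x_hat, beta, alpha=0.92):
    """R^(l,k) = α R^(l-1,k) + (1-α) β(l) X^(l,k) X^(l,k)^H."""
    return alpha * R_prev + (1.0 - alpha) * beta * np.outer(x_hat, x_hat.conj())
```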
S303: the noise and reverberation suppressed filters are updated.
For each frequency band k, the noise and reverberation suppression filter G(l,k) is constructed as follows:
G(l,k) = (R^(l,k))^{-1}·ū(k) / (ū^H(k)·(R^(l,k))^{-1}·ū(k));
ū(k) = [u^T(k), 0, ..., 0]^T;
wherein ū(k) is the M(L+1)×1 column vector obtained by extending u(k) from step S101 with zero vectors; this extension gives the filter both noise reduction and reverberation reduction capability.
The noise and reverberation suppression filter is used to perform the frequency domain estimation of the target speech in the subsequent step S4.
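Step S303 then amounts to a distortionless solve against the zero-extended steering vector, in the style of a convolutional (WPD-type) beamformer; the diagonal loading term below is a practical assumption.

```python
# Sketch of step S303: construct G(l,k) from the weighted autocorrelation matrix.
import numpy as np

def noise_reverb_filter(R, u, L=12, eps=1e-6):
    """G(l,k) = R^{-1} ū / (ū^H R^{-1} ū) with ū = [u^T, 0, ..., 0]^T."""
    M = u.shape[0]
    u_bar = np.concatenate([u, np.zeros(M * L, dtype=complex)])  # M(L+1)-dim ū(k)
    Rinv = np.linalg.inv(R + eps * np.eye(R.shape[0]))           # loading for invertibility
    return Rinv @ u_bar / (u_bar.conj() @ Rinv @ u_bar)
```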
S4: and according to the obtained filter, obtaining the frequency domain estimation of the target voice, and further obtaining the time domain estimation of the target voice.
The method specifically comprises the following steps:
s401: obtaining the frequency domain estimate S^(l,k) of the target speech from the solved noise and reverberation suppression filter. The specific calculation formula is as follows:
S^(l,k) = G^H(l,k)·X^(l,k);
s402: performing an inverse Fourier transform on the frequency domain estimate of the target speech to obtain the final target speech estimate s^(n). The specific calculation formula is as follows:
s^_l(n) = (1/N)·Σ_{k=0}^{N-1} S^(l,k)·e^{j2πnk/N}, with the frames s^_l combined by overlap-add to give s^(n).
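A sketch of step S4: the per-band filter output gives S^(l,k), and overlap-add of the inverse FFT gives the time-domain estimate (a proper synthesis-window normalization is omitted for brevity, as an assumption).

```python
# Sketch of step S4: per-band filtering followed by inverse FFT and overlap-add.
import numpy as np

def synthesize(S, n_fft=512, hop=256):
    """S: (n_frames, K) target spectra S^(l,k) -> time-domain estimate s^(n)."""
    out = np.zeros(hop * (S.shape[0] - 1) + n_fft)
    for l in range(S.shape[0]):
        out[l * hop:l * hop + n_fft] += np.fft.irfft(S[l], n=n_fft)
    return out

# Per frame l and band k, with g and x_hat as in the sketches above:
#   S[l, k] = g.conj() @ x_hat        # S^(l,k) = G^H(l,k) X^(l,k)
```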
through the steps of the invention, the initialization, the signal decomposition, the filter calculation and the target voice estimation of the target voice estimation signal can be realized.
In practical use, based on an 8-microphone linear array, a recording data test in a conference scene with a microphone spacing of 3.5cm, a length of 8 meters and a width of 4 meters and a height of 2.5 meters shows that by adopting the algorithm, the signal-to-noise ratio can be improved by 10dB (noise energy is suppressed by 90%) and the reverberation suppression ratio is 4.5dB (reverberation energy is suppressed by 65%).
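For orientation, the sketches above can be wired together as follows, using the embodiment's parameters (8 microphones, 3.5 cm spacing, N = 512, L = 12, α = 0.92); again, this is an illustrative reconstruction, not the patent's reference implementation.

```python
# End-to-end sketch assembling the helper functions defined in the sketches above.
import numpy as np

def process(x, mic_xy, theta_deg=90.0, n_fft=512, hop=256, L=12, alpha=0.92):
    M = x.shape[0]
    U = steering_vectors(mic_xy, theta_deg, n_fft=n_fft)           # S1: u(k)
    Hsd = superdirective_filters(mic_xy, U, n_fft=n_fft)           # S1: h(k)
    X = stft(x, n_fft, hop)                                        # S2: X_m(l,k)
    n_frames, K = X.shape[1], X.shape[2]
    D = M * (L + 1)
    R = np.stack([np.eye(D, dtype=complex) for _ in range(K)])     # per-band R^(l,k)
    S = np.zeros((n_frames, K), dtype=complex)
    rho_s = rho_x = 1e-8
    for l in range(n_frames):
        beta, rho_s, rho_x = spatial_weight(X[:, l, :], Hsd, rho_s, rho_x, alpha)  # S301
        for k in range(K):
            xh = prediction_vector(X, l, k, L)
            R[k] = update_weighted_autocorr(R[k], xh, beta, alpha)  # S302
            g = noise_reverb_filter(R[k], U[k], L)                  # S303
            S[l, k] = g.conj() @ xh                                 # S401
    return synthesize(S, n_fft, hop)                                # S402
```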
As shown in fig. 3, an embodiment of the present invention provides a noise suppression apparatus based on spatial discrimination detection, applied to a microphone-array-based system, which includes an initialization module 1, a signal decomposition module 2, a filter calculation module 3 and a target speech estimation module 4.
And the initialization module 1 is used for calculating a steering vector and a super-directional filter of the time domain signal of each microphone.
The initialization module 1 can also be used to obtain the speech signals of the microphones, as follows: let x_m(n) denote the original time domain signal picked up in real time by the m-th of M microphone elements, where m is the microphone index, ranging from 1 to M, and n is the time index; the direction θ of the target speech relative to the microphone array is known.
The target speech is the speech signal corresponding to the target direction. For a speech separation task, the target direction of the signal to be extracted is known in advance; for example, a large-screen voice communication device may be expected to separate the target speech signal arriving from the 90-degree direction.
Specifically, the initialization module 1 is configured to perform the following operations:
for each frequency band k (k = 1, 2, ..., K), where a frequency band is the signal component corresponding to a certain frequency, a target speech steering vector u(k) is calculated. The specific calculation formula is as follows:
u(k) = [e^{-jω_k·τ_1(θ)}, e^{-jω_k·τ_2(θ)}, ..., e^{-jω_k·τ_M(θ)}]^T;
τ_m(θ) = q(θ)·d_m^T / c;
q(θ) = [cos(θ), sin(θ)].
wherein f_k is the frequency corresponding to band k, and K is determined by the subsequent Fourier transform: if the frame length is 512, K is half of the frame length; c is the sound speed, c = 340 m/s; d_m is the two-dimensional coordinate of the m-th microphone; the superscript H denotes the conjugate transpose operator; j is the imaginary unit (j^2 = -1); q(θ) is the direction vector; ω_k = 2π·f_k is the circle (angular) frequency of band k.
The above operation initializes the steering vector, which represents the signal difference of each microphone element in the θ direction under an ideal scene free of reverberation and array-element mismatch. It is used to calculate the super-directional filter in the subsequent operation.
For each frequency band k, a super-directional filter h(k) is calculated. The specific calculation formula is as follows:
h(k) = R^{-1}(k)·u(k) / (u^H(k)·R^{-1}(k)·u(k));
where R(k) denotes the autocorrelation matrix of the uniform scattered (diffuse) field and the superscript -1 denotes the matrix inverse. In theory, the filter fully preserves the signal from the target direction θ while maximally suppressing uniform scattered field noise.
And the signal decomposition module 2 is used for converting the initialized signal into a time-frequency domain signal and constructing a frequency domain prediction vector.
In particular, the signal decomposition module 2 comprises the following sub-modules: a signal conversion module and a vector construction module.
The signal conversion module performs a short-time Fourier transform on the time domain signal x_m(n) to obtain the time-frequency domain expression; the purpose is to convert the time domain signal into a time-frequency domain signal. The specific calculation formula is as follows:
X_m(l,k) = Σ_{n=0}^{N-1} w(n)·x_m(l·N_s + n)·e^{-j2πnk/N};
wherein N is the frame length, N = 512; N_s is the frame shift; w(n) is a Hamming window of length 512, where n is the sample index within the frame, so w(n) is the window value at sample n; l is the time frame index, in units of frames; k is the frequency index. X_m(l,k) is the spectrum of the m-th microphone signal in the l-th frame and the k-th frequency band. The Hamming window function used in the invention is shown in fig. 2.
The vector construction module constructs, for each frequency band k, frequency domain prediction vectors X(l,k) and X^(l,k). The specific calculation formulas are as follows:
X(l,k) = [X_1(l,k), X_2(l,k), ..., X_M(l,k)]^T;
X^(l,k) = [X^T(l,k), X^T(l-1,k), ..., X^T(l-L,k)]^T.
wherein the superscript T denotes the transpose operator, and L is the length of the filter's look-back over previous time frames, typically ranging from 3 to 20. From the above formulas it can be seen that X(l,k) is an M×1 column vector and X^(l,k) is an M(L+1)×1 column vector. The frequency domain prediction vector X^(l,k) is constructed in order to predict the noise and reverberation components in subsequent steps.
The method uses L = 12, which effectively saves storage and computation time without noticeably degrading noise suppression performance.
The transformation from the time domain signal to the time-frequency domain can be completed through the operation.
And the filter calculation module 3 is used for performing noise suppression filter calculation on the time-frequency domain signal to obtain a filter for constructing noise and reverberation suppression.
Wherein, the filter calculation module 3 includes: a first calculation module for calculating the spatial discrimination coefficients and spatial weight information, a first update module for updating the weighted autocorrelation matrix, and a first construction module for constructing the noise and reverberation suppression filter.
Specifically, in the first calculation module, the spatial discrimination coefficients are calculated as follows:
ρ_s(l) = α·ρ_s(l-1) + (1-α)·Σ_k |h^H(k)·X(l,k)|^2;
ρ_x(l) = α·ρ_x(l-1) + (1-α)·(1/M)·Σ_k ||X(l,k)||^2;
wherein |·| denotes the modulus of a complex number; α is the smoothing factor between adjacent frames, with a value between 0 and 1. In the present invention α = 0.92 is preferred: if α is less than 0.88, the variation range of the energy estimate exceeds 20% and the estimate is unstable, and if α is greater than 0.96, the energy estimate is over-smoothed and the spatial discrimination falls below 40 degrees. The value 0.92 balances robustness and accuracy well.
In the formulas, ρ_s(l) and ρ_x(l) respectively denote the energy estimates of the speech-direction signal and of the microphone signals for frame l, and ρ_s(l-1) and ρ_x(l-1) the corresponding estimates for frame l-1.
The spatial weight information β(l) of the current frame is calculated as follows:
β(l) = f(ρ_s(l), ρ_x(l))  [formula image not reproduced in the source; per the surrounding description, the weight is small in time-frequency regions where the target speech is dominant]
The spatial weight information calculated in the above operation is used for the subsequent update of the weighted autocorrelation matrix.
In the first updating module, for each frequency band k, the weighted autocorrelation matrix R^(l,k) is updated as follows:
R^(l,k) = α·R^(l-1,k) + (1-α)·β(l)·X^(l,k)·(X^(l,k))^H;
wherein α is the smoothing factor between adjacent frames, the same as in the first calculation module. Through the dynamic weight information in the formula, the correlation matrix selectively accumulates the noise and reverberation components, so that noise and reverberation can be suppressed without distorting the target speech. The weighted autocorrelation matrix represents the correlation between the weighted microphone signals; because the weight is small where the target speech is dominant, it mainly retains the correlation information of the noise and reverberation. It is used for the subsequent final filter calculation.
In the first construction module, for each frequency band k, the noise and reverberation suppression filter G(l,k) is constructed as follows:
G(l,k) = (R^(l,k))^{-1}·ū(k) / (ū^H(k)·(R^(l,k))^{-1}·ū(k));
ū(k) = [u^T(k), 0, ..., 0]^T;
wherein ū(k) is the M(L+1)×1 column vector obtained by extending u(k) from the initialization module 1 with zero vectors; this extension gives the filter both noise reduction and reverberation reduction capability.
The noise and reverberation suppression filter is used to perform the frequency domain estimation of the target speech in the subsequent operation.
And the target voice estimation module 4 is used for obtaining the frequency domain estimation of the target voice according to the obtained filter, and further obtaining the time domain estimation of the target voice.
Specifically, the target speech estimation module 4 includes the following sub-modules: the device comprises a frequency domain estimation module and a target voice estimation module.
The frequency domain estimation module obtains the frequency domain estimate S^(l,k) of the target speech from the solved noise and reverberation suppression filter. The specific calculation formula is as follows:
S^(l,k) = G^H(l,k)·X^(l,k);
the target speech estimation module performs an inverse Fourier transform on the frequency domain estimate of the target speech to obtain the final target speech estimate s^(n). The specific calculation formula is as follows:
s^_l(n) = (1/N)·Σ_{k=0}^{N-1} S^(l,k)·e^{j2πnk/N}, with the frames s^_l combined by overlap-add to give s^(n).
the 4 modules are all absent from the invention. And the absence of any step can cause that the target voice cannot be extracted.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims (10)

1. A noise suppression method based on spatial discrimination detection is characterized by comprising the following steps:
s1: calculating a steering vector and a super-directional filter from the time domain signal of each microphone;
s2: converting the initialized signals into time-frequency domain signals, and constructing frequency domain prediction vectors;
s3: performing noise suppression filter calculation on the time-frequency domain signals to obtain a filter for constructing noise and reverberation suppression; wherein the noise suppression filter calculation comprises: calculating a spatial discriminative coefficient and spatial weight information, updating a weighted autocorrelation matrix, and constructing a noise and reverberation suppression filter;
s4: and according to the obtained filter, obtaining the frequency domain estimation of the target voice, and further obtaining the time domain estimation of the target voice.
2. The noise suppression method based on spatial discrimination detection according to claim 1, characterized in that before the step S1, the method further comprises acquiring the voice signal x_m(n) of each microphone;
In step S1, the method specifically includes the following steps:
s101: for each frequency band k, a target speech steering vector u(k) is calculated:
u(k) = [e^{-jω_k·τ_1(θ)}, e^{-jω_k·τ_2(θ)}, ..., e^{-jω_k·τ_M(θ)}]^T;
τ_m(θ) = q(θ)·d_m^T / c;
q(θ) = [cos(θ), sin(θ)];
s102: for each frequency band k, compute a super-directional filter h(k):
h(k) = R^{-1}(k)·u(k) / (u^H(k)·R^{-1}(k)·u(k)).
3. The noise suppression method based on spatial discrimination detection according to claim 2, wherein said step S2 includes the steps of:
s201: performing a short-time Fourier transform on the time domain signal x_m(n) to obtain the time-frequency domain expression:
X_m(l,k) = Σ_{n=0}^{N-1} w(n)·x_m(l·N_s + n)·e^{-j2πnk/N} (N_s denotes the frame shift);
s202: for each frequency band k, constructing frequency domain prediction vectors X(l,k) and X^(l,k):
X(l,k) = [X_1(l,k), X_2(l,k), ..., X_M(l,k)]^T;
X^(l,k) = [X^T(l,k), X^T(l-1,k), ..., X^T(l-L,k)]^T.
4. The noise suppression method based on spatial discrimination detection according to claim 3, wherein said step S3 includes the steps of:
s301: calculating the spatial discrimination coefficients and spatial weight information of the current frame:
the spatial discrimination coefficients are calculated as follows:
ρ_s(l) = α·ρ_s(l-1) + (1-α)·Σ_k |h^H(k)·X(l,k)|^2;
ρ_x(l) = α·ρ_x(l-1) + (1-α)·(1/M)·Σ_k ||X(l,k)||^2;
where ρ_s(l) and ρ_x(l) respectively denote the energy estimates of the speech-direction signal and of the microphone signals for frame l, and the difference between the two energy distributions expresses the spatial discrimination;
the spatial weight information β(l) of the current frame is calculated as follows:
β(l) = f(ρ_s(l), ρ_x(l))  [formula image not reproduced in the source; the weight is small in time-frequency regions where the target speech dominates];
s302: updating the weighted autocorrelation matrix;
for each band k, the weighted autocorrelation matrix R^(l,k) is updated as follows:
R^(l,k) = α·R^(l-1,k) + (1-α)·β(l)·X^(l,k)·(X^(l,k))^H;
s303: updating the noise and reverberation suppression filter:
for each frequency band k, the noise and reverberation suppression filter G(l,k) is constructed as follows:
G(l,k) = (R^(l,k))^{-1}·ū(k) / (ū^H(k)·(R^(l,k))^{-1}·ū(k));
ū(k) = [u^T(k), 0, ..., 0]^T.
5. The noise suppression method based on spatial discrimination detection according to claim 4, characterized in that said step S4 comprises the steps of:
s401: obtaining the frequency domain estimate S^(l,k) of the target speech from the solved noise and reverberation suppression filter:
S^(l,k) = G^H(l,k)·X^(l,k);
s402: performing an inverse Fourier transform on the frequency domain estimate of the target speech to obtain the final target speech estimate s^(n):
s^_l(n) = (1/N)·Σ_{k=0}^{N-1} S^(l,k)·e^{j2πnk/N}, with the frames combined by overlap-add.
6. A noise suppression device based on spatial discrimination detection, characterized by comprising an initialization module, a signal decomposition module, a filter calculation module and a target speech estimation module;
the initialization module is used for calculating a steering vector and a super-directional filter from the time domain signal of each microphone;
the signal decomposition module is used for converting the initialized signals into time-frequency domain signals and constructing frequency domain prediction vectors;
the filter calculation module is used for performing noise suppression filter calculation on the time-frequency domain signals to construct a noise and reverberation suppression filter; wherein the filter calculation module comprises: a first calculation module for calculating spatial discrimination coefficients and spatial weight information, a first update module for updating the weighted autocorrelation matrix, and a first construction module for constructing the noise and reverberation suppression filter;
and the target speech estimation module is used for obtaining the frequency domain estimate of the target speech from the resulting filter, and from it the time domain estimate of the target speech.
7. The apparatus according to claim 6, wherein the initialization module is further configured to acquire the voice signal x_m(n) of each microphone;
The initialization module is configured to:
for each frequency band k, calculate a target speech steering vector u(k):
u(k) = [e^{-jω_k·τ_1(θ)}, e^{-jω_k·τ_2(θ)}, ..., e^{-jω_k·τ_M(θ)}]^T;
τ_m(θ) = q(θ)·d_m^T / c;
q(θ) = [cos(θ), sin(θ)];
and for each frequency band k, compute a super-directional filter h(k):
h(k) = R^{-1}(k)·u(k) / (u^H(k)·R^{-1}(k)·u(k)).
8. The apparatus according to claim 7, wherein the signal decomposition module comprises a signal conversion module and a vector construction module;
the signal conversion module is used for performing a short-time Fourier transform on the time domain signal x_m(n) to obtain the time-frequency domain expression:
X_m(l,k) = Σ_{n=0}^{N-1} w(n)·x_m(l·N_s + n)·e^{-j2πnk/N};
the vector construction module constructs, for each frequency band k, frequency domain prediction vectors X(l,k) and X^(l,k):
X(l,k) = [X_1(l,k), X_2(l,k), ..., X_M(l,k)]^T;
X^(l,k) = [X^T(l,k), X^T(l-1,k), ..., X^T(l-L,k)]^T.
9. The noise suppression device based on spatial discrimination detection according to claim 8, wherein:
in the first calculation module, the spatial discrimination coefficients are calculated as follows:
ρ_s(l) = α·ρ_s(l-1) + (1-α)·Σ_k |h^H(k)·X(l,k)|^2;
ρ_x(l) = α·ρ_x(l-1) + (1-α)·(1/M)·Σ_k ||X(l,k)||^2;
where ρ_s(l) and ρ_x(l) respectively denote the energy estimates of the speech-direction signal and of the microphone signals for frame l, and the difference between the two energy distributions expresses the spatial discrimination;
the spatial weight information β(l) of the current frame is calculated as follows:
β(l) = f(ρ_s(l), ρ_x(l))  [formula image not reproduced in the source; the weight is small in time-frequency regions where the target speech dominates];
in the first update module, for each frequency band k, the weighted autocorrelation matrix R^(l,k) is updated as follows:
R^(l,k) = α·R^(l-1,k) + (1-α)·β(l)·X^(l,k)·(X^(l,k))^H;
in the first construction module, for each frequency band k, the noise and reverberation suppression filter G(l,k) is constructed as follows:
G(l,k) = (R^(l,k))^{-1}·ū(k) / (ū^H(k)·(R^(l,k))^{-1}·ū(k));
ū(k) = [u^T(k), 0, ..., 0]^T.
10. The noise suppression device based on spatial discrimination detection according to claim 9, wherein the target speech estimation module comprises a frequency domain estimation module and a target speech estimation module;
the frequency domain estimation module is used for obtaining the frequency domain estimate S^(l,k) of the target speech from the solved noise and reverberation suppression filter:
S^(l,k) = G^H(l,k)·X^(l,k);
the target speech estimation module is used for performing an inverse Fourier transform on the frequency domain estimate of the target speech to obtain the final target speech estimate s^(n):
s^_l(n) = (1/N)·Σ_{k=0}^{N-1} S^(l,k)·e^{j2πnk/N}, with the frames combined by overlap-add.
CN202111216600.8A (filed 2021-10-19, priority 2021-10-19): Noise suppression method and device based on spatial discrimination detection. Status: Pending. Publication: CN113948101A (en).

Priority Applications (1)

CN202111216600.8A (priority date 2021-10-19, filing date 2021-10-19): Noise suppression method and device based on spatial discrimination detection

Publications (1)

CN113948101A, published 2022-01-18

Family

ID: 79331367

Family Applications (1)

CN202111216600.8A (filed 2021-10-19): Noise suppression method and device based on spatial discrimination detection (status: pending)

Country Status (1)

CN: CN113948101A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

CN117935835A * (priority date 2024-03-22, published 2024-04-26): Audio noise reduction method, electronic device and storage medium
CN117935835B * (priority date 2024-03-22, published 2024-06-07): Audio noise reduction method, electronic device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination