CN113948101A - Noise suppression method and device based on spatial discrimination detection - Google Patents
- Publication number
- CN113948101A (application CN202111216600.8A)
- Authority
- CN
- China
- Prior art keywords
- filter
- module
- noise
- spatial
- frequency domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0224—Processing in the time domain
- G10L21/0232—Processing in the frequency domain
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a noise suppression method and device based on spatial discrimination detection, belonging to the field of information processing. The method comprises the following steps: S1: calculating a steering vector and a super-directional filter for the time domain signal of each microphone; S2: converting the initialized signals into time-frequency domain signals and constructing frequency domain prediction vectors; S3: performing the noise suppression filter calculation on the time-frequency domain signals to obtain the noise and reverberation suppression filter, where the calculation comprises: calculating a spatial discriminative coefficient and spatial weight information, updating a weighted autocorrelation matrix, and constructing the noise and reverberation suppression filter; S4: obtaining, from the resulting filter, the frequency domain estimate of the target voice and, from it, the time domain estimate of the target voice. The invention can ensure the optimality of the filter and improve the performance of both noise suppression and reverberation suppression.
Description
Technical Field
The present invention belongs to the field of information processing, and in particular, relates to a noise suppression method and apparatus based on spatial discrimination detection.
Background
In practical applications such as online conference systems and smart-home voice interaction, the speaker is at some distance from the microphone, so reverberation and noise are picked up by the microphone together, degrading the voice communication quality and the interaction accuracy. On the one hand, reverberation from multiple wall reflections degrades the performance of the noise suppression filter, especially in highly reverberant scenes such as conference rooms; on the other hand, the presence of background noise also degrades the performance of reverberation suppression.
A currently common scheme first converts each channel's time domain signal into the time-frequency domain via the short-time Fourier transform, then designs a group of filters that measure the correlation of each time-frequency unit with the historical signal; this correlation is caused by reverberation, and the filters cancel the reverberation based on it. Next, an ideal steering vector is calculated from the azimuth of the speaker relative to the microphone array, and a filter is designed to minimize the noise energy. The two filters perform reverberation suppression and noise suppression in sequence, which is usually significantly less effective than when noise or reverberation exists alone.
In the prior art, noise suppression methods mainly have the following defects:
1) When noise and reverberation exist simultaneously, modeling the reverberation and the noise independently and suppressing them sequentially generally degrades performance significantly.
2) In a strongly reverberant scene, a steering vector based purely on azimuth information does not match the true steering vector, causing voice distortion and reducing the voice interaction quality.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The invention aims to provide a noise suppression method and device based on spatial discrimination detection, which can ensure the optimality of the filter and improve the performance of both noise suppression and reverberation suppression.
In order to achieve the above object, the present invention provides a noise suppression method based on spatial discriminative detection, comprising the steps of:
s1: calculating a guide vector and a super-directional filter for the time domain signal of each microphone;
s2: converting the initialized signals into time-frequency domain signals, and constructing frequency domain prediction vectors;
s3: performing noise suppression filter calculation on the time-frequency domain signals to obtain a filter for constructing noise and reverberation suppression; wherein the noise suppression filter calculation comprises: calculating a spatial discriminative coefficient and spatial weight information, updating a weighted autocorrelation matrix, and constructing a noise and reverberation suppression filter;
s4: and according to the obtained filter, obtaining the frequency domain estimation of the target voice, and further obtaining the time domain estimation of the target voice.
Further, before the step S1, the method further includes acquiring the voice signal x_m(n) of each microphone;
In step S1, the method specifically includes the following steps:
s101: for each frequency band k, a target speech steering vector u (k) is calculated:
q(θ)=[cos(θ),sin(θ)];
s102: for each frequency band k, compute a super-directional filter h (k):
further, the step S2 includes the following steps:
s201: for time domain signal xm(n) performing short-time Fourier transform to obtain a time-frequency domain expression:
X(l,k)=[X1(l,k),X2(l,k),...,XM(l,k)]T;
Further, the step S3 includes the following steps:
s301: calculating spatial discriminative coefficients and spatial weight information of the current frame:
the spatial discriminative coefficient is calculated as follows:
where ρ_s(l) and ρ_x(l) respectively denote the l-th frame's energy estimates in the voice direction and of the signals picked up by the microphones; the difference between the two energy distributions represents the spatial distinctiveness;
the spatial weight information of the current frame is calculated as follows:
s302: updating the weighted autocorrelation matrix;
for each frequency band k, the weighted autocorrelation matrix R̂(l, k) is updated as follows:
s303: the noise and reverberation suppressed filters are updated.
For each frequency band k, the noise and reverberation suppressing filter G (l, k) is constructed as follows:
still further, the step S4 includes the steps of:
s401: obtaining the frequency domain estimation of the target voice according to the solved noise and reverberation suppression filter
S402: carrying out inverse Fourier transform on the frequency domain estimation of the target voice to obtain the final target voice estimation
The invention also provides a noise suppression device based on the spatial discrimination detection, which comprises an initialization module, a signal decomposition module, a filter calculation module and a target voice estimation module;
the initialization module is used for calculating a guide vector and a super-directional filter of a time domain signal of each microphone;
the signal decomposition module is used for converting the initialized signal into a time-frequency domain signal and constructing a frequency domain prediction vector;
the filter calculation module is used for performing noise suppression filter calculation on the time-frequency domain signals to obtain a filter for constructing noise and reverberation suppression; wherein the filter calculation module comprises: a first calculation module for calculating spatial discriminative coefficients and spatial weight information, a first update module for updating a weighted autocorrelation matrix, and a first construction module for constructing a noise and reverberation suppression filter;
and the target voice estimation module is used for obtaining the frequency domain estimation of the target voice according to the obtained filter so as to obtain the time domain estimation of the target voice.
Further, the initialization module is also used for acquiring the voice signal x_m(n) of each microphone;
The initialization module is configured to:
for each frequency band k, a target speech steering vector u (k) is calculated:
q(θ)=[cos(θ),sin(θ)];
for each frequency band k, compute a super-directional filter h (k):
further, the signal decomposition module comprises a signal conversion module and a vector construction module;
the signal conversion module is used for converting the time domain signal xm(n) performing short-time Fourier transform to obtain a time-frequency domain expression:
the vector construction module constructs frequency domain prediction vectors X (l, k) and X (l, k) for each frequency band k
X(l,k)=[X1(l,k),X2(l,k),...,XM(l,k)]T;
Further, the filter calculation module is configured as follows:
in the first calculation module, the spatial discriminative coefficients are calculated as follows:
where ρ_s(l) and ρ_x(l) respectively denote the l-th frame's energy estimates in the voice direction and of the signals picked up by the microphones; the difference between the two energy distributions represents the spatial distinctiveness;
the spatial weight information of the current frame is calculated as follows:
in the first updating module, for each frequency band k, the weighted autocorrelation matrix R̂(l, k) is updated as follows:
in the first building block, for each frequency band k, the noise and reverberation suppressing filter G (l, k) is built as follows:
Furthermore, the target voice estimation module comprises a frequency domain estimation module and a time domain estimation module;
the frequency domain estimation module is used for obtaining the frequency domain estimate of the target voice according to the solved noise and reverberation suppression filter;
the time domain estimation module is used for carrying out an inverse Fourier transform on the frequency domain estimate of the target voice to obtain the final target voice estimate.
Compared with the prior art, and in particular with cascaded dereverberation-then-denoising schemes, the noise suppression method and device based on spatial discrimination detection provided by the invention jointly model noise and reverberation and design a single, globally optimal filter, so that reverberation and noise suppression achieve a better effect. In addition, the spatial discriminative weight designed by the invention adaptively identifies the time-frequency regions where voice dominates, which further prevents the target voice from being over-cancelled as noise or reverberation and improves robustness.
Drawings
Fig. 1 is a flowchart of a noise suppression method based on spatial discrimination detection in the present embodiment.
Fig. 2 is a diagram of a hamming window function used in this embodiment.
Fig. 3 is a schematic diagram of noise suppression based on spatial discrimination detection in the present embodiment.
Detailed Description
The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.
Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.
As shown in fig. 1, according to the noise suppression method based on spatial discriminative detection in the preferred embodiment of the present invention, a unified filter design rule is adopted to jointly model noise and reverberation, so as to ensure the optimality of the filter; a group of spatial distinguishing characteristics is designed, the speaker guide vector is updated in real time in a reverberation scene, and the performance of noise suppression and reverberation suppression is improved.
The method is applied to a system based on a microphone array, and specifically comprises the following four implementation steps:
s1: the time domain signal for each microphone is subjected to steering vector and super-directional filter calculations.
Before step S1, the method further includes acquiring the voice signals of the microphones, as follows: let x_m(n) denote the original time domain signals picked up in real time by M microphone elements, where m is the microphone index, taking values from 1 to M, and n is the time index; the direction θ of the target speech relative to the microphone array is known.
The target voice is a voice signal corresponding to a target direction, and for a voice separation task, the target direction is known in advance according to the extracted signal, for example, for a large-screen voice communication device, a target voice signal in a 90-degree direction is expected to be separated.
Specifically, the step S1 specifically includes the following steps:
s101: for each frequency band K (K ═ 1, 2.. K), which is a signal component corresponding to a certain frequency, a target speech guidance vector u (K) is calculated. The specific calculation formula is as follows:
q(θ)=[cos(θ),sin(θ)]。
where f_k is the frequency of the k-th band, k = 1, ..., K, and K is determined by the subsequent Fourier transform (if the frame length is 512, K is half the frame length); c is the sound speed, c = 340 m/s; d_m is the two-dimensional coordinate of the m-th microphone; the superscript H denotes the conjugate transpose operator; j is the imaginary unit; q(θ) is the direction vector; and ω_k is the angular frequency of band k.
Step S101 initializes a steering vector that represents the inter-microphone signal differences in the θ direction under an ideal scene free of reverberation and array-element mismatch; it is used to calculate the super-directional filter in the subsequent step S102.
S102: for each frequency band k, a super-directional filter h (k) is calculated. The specific calculation formula is as follows:
where R(k) denotes the autocorrelation matrix of the uniform scattered field and the superscript -1 denotes the matrix inverse. The super-directional filter can theoretically fully preserve the signal from the target direction θ while maximally suppressing uniform scattered-field noise.
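The concrete formulas for u(k) and h(k) were rendered as images and are missing from this copy. The sketch below therefore uses the standard far-field steering vector and the standard super-directive solution h(k) = R⁻¹(k)u(k)/(uᴴ(k)R⁻¹(k)u(k)) against a uniform scattered (diffuse) field; these exact forms, the diagonal loading term, and all function names are assumptions, with c = 340 m/s, q(θ) = [cos θ, sin θ], and the 8-microphone, 3.5 cm linear array taken from the embodiment.

```python
import numpy as np

def steering_vector(theta_deg, mic_xy, f, c=340.0):
    """Far-field steering vector u(k) toward theta (assumed standard model)."""
    theta = np.deg2rad(theta_deg)
    q = np.array([np.cos(theta), np.sin(theta)])   # direction vector q(theta)
    tau = mic_xy @ q / c                           # per-microphone delay in seconds
    return np.exp(-2j * np.pi * f * tau)           # one entry per microphone

def superdirective_filter(u, mic_xy, f, c=340.0, loading=1e-3):
    """h(k) = R^-1 u / (u^H R^-1 u), R = diffuse-field coherence matrix."""
    d = np.linalg.norm(mic_xy[:, None, :] - mic_xy[None, :, :], axis=-1)
    R = np.sinc(2.0 * f * d / c)                   # np.sinc(x) = sin(pi x)/(pi x)
    R = R + loading * np.eye(len(u))               # diagonal loading for invertibility
    Ri_u = np.linalg.solve(R, u.astype(complex))
    return Ri_u / (u.conj() @ Ri_u)

# 8-microphone linear array with 3.5 cm spacing, target direction 90 degrees
mic_xy = np.stack([np.arange(8) * 0.035, np.zeros(8)], axis=1)
u = steering_vector(90.0, mic_xy, f=1000.0)
h = superdirective_filter(u, mic_xy, f=1000.0)
print(abs(h.conj() @ u))  # distortionless constraint h^H u = 1 in the target direction
```

The filter passes the target direction undistorted (|hᴴu| = 1) while minimizing diffuse-noise output, matching the property stated for h(k) above.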
S2: and converting the initialized signals into time-frequency domain signals, and constructing frequency domain prediction vectors.
Specifically, the step S2 includes the steps of:
s201: for time domain signal xm(n) performing a short-time Fourier transform to obtain a time-frequency domain representation, the purpose of which is to convert the time-domain signal into a time-frequency domain signal. The specific calculation formula is as follows:
where N is the frame length, N = 512; w(n) is a Hamming window of length 512, with n the sample index within the frame, so w(n) is the window value at sample n; l is the time frame index, in units of frames; and k is the frequency index. X_m(l, k) is the spectrum of the m-th microphone signal in the l-th frame and k-th frequency band. The Hamming window function used in the present invention is shown in Fig. 2.
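Step S201 can be sketched as follows. The hop size of 256 (50% frame overlap) is an assumption, since the text only fixes the frame length N = 512 and the Hamming window of Fig. 2.

```python
import numpy as np

def stft_frames(x, N=512, hop=256):
    """X_m(l, k): frame l, band k of one microphone channel (hop is assumed)."""
    w = np.hamming(N)                              # the length-512 Hamming window w(n)
    n_frames = 1 + (len(x) - N) // hop
    X = np.empty((n_frames, N // 2 + 1), dtype=complex)
    for l in range(n_frames):
        X[l] = np.fft.rfft(w * x[l * hop:l * hop + N])
    return X                                       # only the K = N/2 positive bands (+DC) kept

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)                   # a 1 kHz test tone
X = stft_frames(x)
k_peak = int(np.argmax(abs(X[10])))                # dominant band of frame 10
print(k_peak * fs / 512)                           # 1000.0 Hz, as expected for the tone
```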
S202: for each frequency band k, a frequency domain prediction vector X (l, k) sum is constructedThe specific calculation formula is as follows:
X(l,k)=[X1(l,k),X2(l,k),...,XM(l,k)]T;
wherein, the superscript T represents the transpose operator; l represents the length of the filter forward timeframe and typically ranges from 3 to 20. In the above formulaIt can be seen that the vector X (l, k) is an M X1 dimensional column vector,is a (L +1) × 1 dimension column vector. Constructing the frequency domain prediction vectorTo predict the noise and reverberation components in subsequent steps.
In the present method, L = 12 is used, which effectively saves storage and computation time without noticeably degrading the noise suppression performance.
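The stacked prediction vector X̂(l, k) of step S202 can be sketched directly from its definition, treating the per-band spectra as M-vectors (the variable names are illustrative):

```python
import numpy as np

def prediction_vector(X, l, L=12):
    """Stack X(l,k), X(l-1,k), ..., X(l-L,k) (each an M-vector for one band k)
    into the M(L+1) x 1 prediction vector X_hat(l, k)."""
    return np.concatenate([X[l - i] for i in range(L + 1)])

M, L = 8, 12
rng = np.random.default_rng(0)
X = rng.standard_normal((100, M)) + 1j * rng.standard_normal((100, M))  # one band, 100 frames
v = prediction_vector(X, l=50, L=L)
print(v.shape)  # (104,) = M * (L + 1)
```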
The transformation from the time domain signal to the time-frequency domain can be completed by the above step S2.
S3: and performing noise suppression filter calculation on the time-frequency domain signals to obtain a filter for constructing noise and reverberation suppression.
Wherein the calculation of the noise suppression filter comprises: calculating spatial discriminative coefficients and spatial weight information, updating a weighted autocorrelation matrix, and constructing a noise and reverberation suppression filter.
Specifically, the step S3 includes the steps of:
s301: calculating a spatial discriminative coefficient and spatial weight information of a current frame;
the spatial discriminative coefficient is calculated as follows:
where |·| denotes the modulus of a complex number and α is the smoothing factor between adjacent frames, with a value between 0 and 1. In the present invention, α = 0.92 is preferred: if α is below 0.88, the energy estimate varies by more than 20% and becomes unstable; if α is above 0.96, the energy estimate is over-smoothed and the spatial discrimination falls below 40 degrees. A value of 0.92 balances robustness and accuracy well.
In the formula, ρ_s(l) and ρ_x(l) respectively denote the l-th frame's energy estimates in the voice direction and of the microphone-picked-up signals, while ρ_s(l-1) and ρ_x(l-1) are the corresponding estimates for frame l-1; the difference between the two energy distributions represents the spatial distinctiveness.
The spatial weight information of the current frame is calculated as follows:
the spatial weight information calculated in step S301 is used to update the weighted autocorrelation matrix in the subsequent steps.
S302: updating the weighted autocorrelation matrix;
for each frequency band k, the weighted autocorrelation matrix R̂(l, k) is updated as follows:
where α is the smoothing factor between adjacent frames, the same as in step S301. Through the dynamic weight information in this formula, the correlation matrix selectively accumulates the noise and reverberation components, so noise and reverberation can be suppressed without distorting the target voice. The weighted autocorrelation matrix represents the correlation between the weighted microphone signals; because the weights are smaller where the target voice dominates, the matrix mainly retains the correlation information of noise and reverberation. The weighted autocorrelation matrix is used to compute the final filter in subsequent steps.
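The S302 update formula is likewise missing from this copy; an assumed form consistent with the text, exponential smoothing with the spatial weight scaling the rank-one outer product of the prediction vector, is:

```python
import numpy as np

def update_weighted_autocorr(R, x_hat, weight, alpha=0.92):
    """R_hat(l,k) = alpha * R_hat(l-1,k) + (1-alpha) * weight * x_hat x_hat^H
    (assumed form; a small weight where voice dominates keeps speech out of R)."""
    return alpha * R + (1 - alpha) * weight * np.outer(x_hat, x_hat.conj())

D = 8 * 13                                   # M(L+1) with M = 8, L = 12
R = np.eye(D, dtype=complex)
rng = np.random.default_rng(1)
for _ in range(5):
    x_hat = rng.standard_normal(D) + 1j * rng.standard_normal(D)
    R = update_weighted_autocorr(R, x_hat, weight=0.5)
print(np.allclose(R, R.conj().T))            # the update preserves Hermitian symmetry
```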
S303: the noise and reverberation suppressed filters are updated.
For each frequency band k, the noise and reverberation suppressing filter G (l, k) is constructed as follows:
where ū(k) is the column vector of dimension M(L+1) × 1 obtained by extending u(k) from step S101 with zero vectors; this extension allows the filter to provide both noise reduction and reverberation reduction.
The noise and reverberation suppression filter is used to perform the frequency domain estimation calculation of the target speech in the subsequent step S4.
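The construction formula for G(l, k) is not shown in this copy. Given the description of ū(k) as u(k) zero-extended to the stacked dimension, one assumption consistent with the distortionless behavior described above is the MVDR-like form G = R̂⁻¹ū / (ūᴴR̂⁻¹ū), sketched here with a stand-in positive-definite matrix:

```python
import numpy as np

def joint_filter(R_hat, u, M, L):
    """Assumed MVDR-like construction G = R_hat^-1 u_bar / (u_bar^H R_hat^-1 u_bar),
    with u_bar = [u^T, 0, ..., 0]^T zero-extended to dimension M(L+1)."""
    u_bar = np.concatenate([u, np.zeros(M * L, dtype=complex)])
    Ri_u = np.linalg.solve(R_hat, u_bar)
    return Ri_u / (u_bar.conj() @ Ri_u), u_bar

M, L = 8, 12
D = M * (L + 1)
rng = np.random.default_rng(2)
A = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
R_hat = A @ A.conj().T + np.eye(D)       # Hermitian positive-definite stand-in
u = np.ones(M, dtype=complex)            # stand-in steering vector
G, u_bar = joint_filter(R_hat, u, M, L)
print(abs(G.conj() @ u_bar))             # distortionless on the extended steering vector
```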
S4: and according to the obtained filter, obtaining the frequency domain estimation of the target voice, and further obtaining the time domain estimation of the target voice.
The method specifically comprises the following steps:
s401: obtaining the frequency domain estimation of the target voice according to the solved noise and reverberation suppression filterThe specific calculation formula is as follows:
s402: carrying out inverse Fourier transform on the frequency domain estimation of the target voice to obtain the final target voice estimationThe specific calculation formula is as follows:
through the steps of the invention, the initialization, the signal decomposition, the filter calculation and the target voice estimation of the target voice estimation signal can be realized.
In practical use, a recorded-data test based on an 8-microphone linear array with 3.5 cm microphone spacing, in a conference room 8 m long, 4 m wide, and 2.5 m high, shows that with this algorithm the signal-to-noise ratio can be improved by 10 dB (90% of the noise energy suppressed) and the reverberation suppression ratio reaches 4.5 dB (65% of the reverberation energy suppressed).
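The dB figures quoted above can be checked directly against the energy-suppression percentages:

```python
# Energy suppressed (%) implied by a given dB of attenuation: 1 - 10^(-dB/10).
print(round((1 - 10 ** (-10 / 10)) * 100))   # 10 dB -> 90 (% noise energy suppressed)
print(round((1 - 10 ** (-4.5 / 10)) * 100))  # 4.5 dB -> 65 (% reverberation energy suppressed)
```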
As shown in Fig. 3, an embodiment of the present invention provides a noise suppression apparatus based on spatial discrimination detection, applied to a microphone-array-based system, which comprises an initialization module 1, a signal decomposition module 2, a filter calculation module 3, and a target speech estimation module 4.
And the initialization module 1 is used for calculating a steering vector and a super-directional filter of the time domain signal of each microphone.
The initialization module 1 can also be used to acquire the voice signals of the microphones, as follows: let x_m(n) denote the original time domain signals picked up in real time by M microphone elements, where m is the microphone index, taking values from 1 to M, and n is the time index; the direction θ of the target speech relative to the microphone array is known.
The target voice is a voice signal corresponding to a target direction, and for a voice separation task, the target direction is known in advance according to the extracted signal, for example, for a large-screen voice communication device, a target voice signal in a 90-degree direction is expected to be separated.
Specifically, the initialization module 1 is configured to perform the following operations:
For each frequency band k (k = 1, 2, ..., K), where a frequency band is the signal component corresponding to a certain frequency, calculate the target speech steering vector u(k). The specific calculation formula is as follows:
q(θ)=[cos(θ),sin(θ)]。
where f_k is the frequency of the k-th band, k = 1, ..., K, and K is determined by the subsequent Fourier transform (if the frame length is 512, K is half the frame length); c is the sound speed, c = 340 m/s; d_m is the two-dimensional coordinate of the m-th microphone; the superscript H denotes the conjugate transpose operator; j is the imaginary unit; q(θ) is the direction vector; and ω_k is the angular frequency of band k.
The above operation initializes steering vectors that represent the inter-microphone signal differences in the θ direction under an ideal scene free of reverberation and array-element mismatch; they are used to calculate the super-directional filter in the subsequent operation.
For each frequency band k, a super-directional filter h (k) is calculated. The specific calculation formula is as follows:
where R(k) denotes the autocorrelation matrix of the uniform scattered field and the superscript -1 denotes the matrix inverse. The filter can theoretically fully preserve the signal from the target direction θ while maximally suppressing uniform scattered-field noise.
And the signal decomposition module 2 is used for converting the initialized signal into a time-frequency domain signal and constructing a frequency domain prediction vector.
In particular, the signal decomposition module 2 comprises the following sub-modules: the device comprises a signal conversion module and a vector construction module.
A signal conversion module, for performing a short-time Fourier transform on the time domain signal x_m(n) to obtain its time-frequency domain representation; the purpose is to convert the time domain signal into a time-frequency domain signal. The specific calculation formula is as follows:
where N is the frame length, N = 512; w(n) is a Hamming window of length 512, with n the sample index within the frame, so w(n) is the window value at sample n; l is the time frame index, in units of frames; and k is the frequency index. X_m(l, k) is the spectrum of the m-th microphone signal in the l-th frame and k-th frequency band. The Hamming window function used in the present invention is shown in Fig. 2.
A vector construction module, for constructing, for each frequency band k, the frequency domain prediction vectors X(l, k) and X̂(l, k). The specific calculation formulas are as follows:
X(l,k)=[X_1(l,k),X_2(l,k),...,X_M(l,k)]^T;
X̂(l,k)=[X^T(l,k),X^T(l-1,k),...,X^T(l-L,k)]^T.
where the superscript T denotes the transpose operator; L is the length of the filter's look-back in time frames, typically ranging from 3 to 20. From the above formulas, X(l, k) is an M × 1 column vector and X̂(l, k) is an M(L+1) × 1 column vector. The frequency domain prediction vector X̂(l, k) is constructed to predict the noise and reverberation components in subsequent steps.
In the present apparatus, L = 12 is used, which effectively saves storage and computation time without noticeably degrading the noise suppression performance.
The transformation from the time domain signal to the time-frequency domain can be completed through the operation.
And the filter calculation module 3 is used for performing noise suppression filter calculation on the time-frequency domain signal to obtain a filter for constructing noise and reverberation suppression.
Wherein, the filter calculation module 3 includes: the first computing module is used for computing spatial differentiation coefficients and spatial weight information, the first updating module is used for updating a weighted autocorrelation matrix, and the first constructing module is used for constructing a noise and reverberation suppression filter.
Specifically, in the first calculation module, the spatial discriminative coefficient is calculated as follows:
wherein |·| denotes the modulus of a complex number, and α is a smoothing factor between adjacent frames with a value between 0 and 1. In the present invention α = 0.92 is preferred: if α is smaller than 0.88, the variation range of the energy estimate exceeds 20% and the estimate becomes unstable; if α is larger than 0.96, the energy estimate is over-smoothed and the spatial discrimination drops below 40 degrees. A value of 0.92 balances robustness and accuracy well.
In the formula, ρ_s(l) and ρ_x(l) respectively denote the energy estimates of the speech-direction component and of the microphone pick-up signal in the l-th frame, and ρ_s(l-1) and ρ_x(l-1) denote the corresponding estimates in the (l-1)-th frame.
The spatial weight information of the current frame is calculated as follows:
the spatial weight information calculated in the above operation is used for subsequent updating of the weighted autocorrelation matrix.
In the first updating module, the weighted autocorrelation matrix R̃(l, k) is updated as follows:
wherein α is the inter-frame smoothing factor, identical to the one used in the first calculation module. Through the dynamic weight information in the formula, the correlation matrix selectively accumulates the noise and reverberation components, so that noise and reverberation can be suppressed without distorting the target speech. The weighted autocorrelation matrix represents the correlation between the weighted microphone signals; because the weights are smaller where the target speech is stronger, the matrix mainly retains the correlation information of the noise and reverberation. This weighted autocorrelation matrix is then used in the subsequent, final filter calculation.
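The weighted recursive accumulation can be sketched as a rank-one exponential update; this form is consistent with the description above but is an assumption, since the patent's exact expression is not reproduced on this page.

```python
import numpy as np

def update_correlation(R, x_tilde, weight, alpha=0.92):
    """Recursive update of the weighted autocorrelation matrix for band k.

    R is the running matrix, x_tilde the stacked prediction vector X~(l, k)
    and `weight` the spatial weight of the current frame. The weighted
    rank-one exponential update is a sketch, not the patent's exact formula.
    """
    inst = np.outer(x_tilde, x_tilde.conj())        # instantaneous correlation
    return alpha * R + (1 - alpha) * weight * inst  # weighted accumulation

# Toy update: identity matrix, 2-dimensional stacked vector, small weight.
R = np.eye(2)
R_new = update_correlation(R, np.array([1.0, 0.0]), weight=0.1)
```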
In the first building block, for each frequency band k, the noise and reverberation suppressing filter G (l, k) is built as follows:
wherein u(k) is the steering vector initialized in module 1; it is extended with a zero vector into an M(L+1) × 1 column vector, and through this extension the filter acquires both noise reduction and reverberation reduction capability.
The noise suppression filter is used for carrying out frequency domain estimation calculation of the target voice in subsequent operation.
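Since the patent's exact closed form is not reproduced on this page, the following is an assumed MVDR-style sketch consistent with the description: the filter is distortionless in the direction of the zero-extended steering vector and minimizes output power under the weighted autocorrelation matrix R.

```python
import numpy as np

def build_filter(R, u, L=12):
    """Construct the noise/reverberation-suppressing filter G(l, k) for band k.

    u is the M-dimensional steering vector u(k), extended with zeros to the
    stacked M*(L+1) dimension as the text describes. The MVDR-style closed
    form below is an assumed sketch, not the patent's exact construction.
    """
    M = u.shape[0]
    u_ext = np.concatenate([u.astype(complex), np.zeros(M * L, dtype=complex)])
    Ri_u = np.linalg.solve(R, u_ext)            # R^{-1} u without explicit inverse
    return Ri_u / (u_ext.conj() @ Ri_u)         # distortionless normalization

# Toy check with M = 2 microphones and L = 1 past frame:
G = build_filter(np.eye(4), np.array([1.0 + 0j, 0.0]), L=1)
```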
And the target voice estimation module 4 is used for obtaining the frequency domain estimation of the target voice according to the obtained filter, and further obtaining the time domain estimation of the target voice.
Specifically, the target speech estimation module 4 includes the following sub-modules: the device comprises a frequency domain estimation module and a target voice estimation module.
The frequency domain estimation module obtains the frequency-domain estimate of the target speech according to the solved noise and reverberation suppression filter. The specific calculation formula is as follows:
the target voice estimation module is used for carrying out inverse Fourier transform on the frequency domain estimation of the target voice to obtain the final target voice estimationThe specific calculation formula is as follows:
the 4 modules are all absent from the invention. And the absence of any step can cause that the target voice cannot be extracted.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.
Claims (10)
1. A noise suppression method based on spatial discrimination detection is characterized by comprising the following steps:
s1: calculating a guide vector and a super-directional filter for the time domain signal of each microphone;
s2: converting the initialized signals into time-frequency domain signals, and constructing frequency domain prediction vectors;
s3: performing noise suppression filter calculation on the time-frequency domain signals to obtain a filter for constructing noise and reverberation suppression; wherein the noise suppression filter calculation comprises: calculating a spatial discriminative coefficient and spatial weight information, updating a weighted autocorrelation matrix, and constructing a noise and reverberation suppression filter;
s4: and according to the obtained filter, obtaining the frequency domain estimation of the target voice, and further obtaining the time domain estimation of the target voice.
2. The noise suppression method based on spatial differentiation detection according to claim 1, characterized in that, before step S1, the method further comprises obtaining the speech signal x_m(n) of each microphone;
In step S1, the method specifically includes the following steps:
s101: for each frequency band k, a target speech steering vector u (k) is calculated:
q(θ)=[cos(θ),sin(θ)];
s102: for each frequency band k, compute a super-directional filter h (k):
3. the noise suppression method based on spatial discrimination detection according to claim 2, wherein said step S2 includes the steps of:
S201: performing a short-time Fourier transform on the time-domain signal x_m(n) to obtain the time-frequency domain expression:
X(l, k) = [X_1(l, k), X_2(l, k), ..., X_M(l, k)]^T;
4. The noise suppression method based on spatial discrimination detection according to claim 3, wherein said step S3 includes the steps of:
s301: calculating spatial discriminative coefficients and spatial weight information of the current frame:
the spatial discriminative coefficient is calculated as follows:
where ρ_s(l) and ρ_x(l) respectively represent the energy estimates of the speech direction and of the microphone pick-up signal in the l-th frame, wherein the difference of the energy distributions represents the spatial distinctiveness;
the spatial weight information of the current frame is calculated as follows:
s302: updating the weighted autocorrelation matrix;
for each frequency band k, the cross-correlation coefficient vector R̃(l, k) is updated as follows:
s303: the noise and reverberation suppressed filters are updated.
For each frequency band k, the noise and reverberation suppressing filter G (l, k) is constructed as follows:
5. the noise suppression method based on spatial differentiation detection according to claim 4, characterized in that said step S4 comprises the steps of:
S401: obtaining the frequency-domain estimate of the target speech according to the solved noise and reverberation suppression filter;
S402: carrying out inverse Fourier transform on the frequency domain estimation of the target voice to obtain the final target voice estimation
6. A noise suppression device based on spatial differentiation detection is characterized by comprising an initialization module, a signal decomposition module, a filter calculation module and a target voice estimation module;
the initialization module is used for calculating a guide vector and a super-directional filter of a time domain signal of each microphone;
the signal decomposition module is used for converting the initialized signal into a time-frequency domain signal and constructing a frequency domain prediction vector;
the filter calculation module is used for performing noise suppression filter calculation on the time-frequency domain signals to obtain a filter for constructing noise and reverberation suppression; wherein the filter calculation module comprises: a first calculation module for calculating spatial discriminative coefficients and spatial weight information, a first update module for updating a weighted autocorrelation matrix, and a first construction module for constructing a noise and reverberation suppression filter;
and the target voice estimation module is used for obtaining the frequency domain estimation of the target voice according to the obtained filter so as to obtain the time domain estimation of the target voice.
7. The apparatus according to claim 6, wherein the initialization module is further configured to obtain the speech signal x_m(n) of each microphone;
The initialization module is configured to:
for each frequency band k, a target speech steering vector u (k) is calculated:
q(θ)=[cos(θ),sin(θ)];
for each frequency band k, compute a super-directional filter h (k):
8. the apparatus according to claim 7, wherein the signal decomposition module comprises a signal conversion module and a vector construction module;
the signal conversion module is used for converting the time domain signal xm(n) performing short-time Fourier transform to obtain a time-frequency domain expression:
the vector construction module constructs frequency domain prediction vectors X (l, k) and X (l, k) for each frequency band k
X(l, k) = [X_1(l, k), X_2(l, k), ..., X_M(l, k)]^T;
9. The noise suppression apparatus based on spatial differentiation detection according to claim 8, characterized in that:
in the first calculation module, the spatial discriminative coefficients are calculated as follows:
where ρ_s(l) and ρ_x(l) respectively represent the energy estimates of the speech direction and of the microphone pick-up signal in the l-th frame, wherein the difference of the energy distributions represents the spatial distinctiveness;
the spatial weight information of the current frame is calculated as follows:
in the first updating module, for each frequency band k, the cross-correlation coefficient vector R̃(l, k) is updated as follows:
in the first building block, for each frequency band k, the noise and reverberation suppressing filter G (l, k) is built as follows:
10. the apparatus for noise suppression based on spatial discrimination detection according to claim 9, wherein the target speech estimation module includes a frequency domain estimation module and a target speech estimation module;
said frequencyA domain estimation module for obtaining the frequency domain estimation of the target voice according to the solved noise and reverberation suppression filter
The target voice estimation module is used for performing an inverse Fourier transform on the frequency-domain estimate of the target speech to obtain the final target speech estimate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111216600.8A CN113948101A (en) | 2021-10-19 | 2021-10-19 | Noise suppression method and device based on spatial discrimination detection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113948101A true CN113948101A (en) | 2022-01-18 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117935835A (en) * | 2024-03-22 | 2024-04-26 | 浙江华创视讯科技有限公司 | Audio noise reduction method, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||