CN114373475A - Voice noise reduction method and device based on microphone array and storage medium
- Publication number
- CN114373475A (application CN202111621218.5A)
- Authority
- CN
- China
- Prior art keywords
- noise
- noise reduction
- voice signal
- frequency domain
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Abstract
The application discloses a voice noise reduction method based on a microphone array. It addresses two problems in the prior art: the complexity of solving the filter grows rapidly with the filter length, and the ability to track changes in the statistical characteristics of speech and noise degrades. The method comprises: acquiring a noisy speech signal; preprocessing the noisy speech signal to determine a frequency-domain noisy speech signal; estimating the statistical characteristics of the frequency-domain noisy speech signal and of the noise signal; dividing the microphone array into a plurality of sub-arrays, estimating a plurality of sub-filters respectively, and determining a frequency-domain noise reduction filter; and performing noise reduction on the frequency-domain noisy speech signal with the frequency-domain noise reduction filter and converting the result into a time-domain noise-reduced speech signal. Because the signal covariance matrices required to solve the filter have smaller dimensions, the complexity of solving the noise reduction filter is significantly reduced, and the filter tracks changes in the statistical characteristics of the speech signal and the noise more effectively.
Description
Technical Field
The present disclosure relates to the field of microphone arrays, and in particular, to a method and an apparatus for reducing noise of speech based on a microphone array, and a storage medium.
Background
Voice noise reduction plays a significant role in systems such as intelligent voice, human-machine interaction, teleconferencing, hearing aids, in-vehicle systems, virtual reality, field communication, and military voice communication under very high background noise, and its performance directly affects the voice interaction experience.
Early voice interaction systems were usually equipped with only one microphone, and the corresponding noise reduction methods operated on single-channel speech. Single-channel methods are simple to implement and computationally efficient and achieve a certain effect, but they have significant limitations. Research shows that, under certain conditions, single-channel noise reduction introduces speech distortion, and the larger the signal-to-noise ratio improvement, the larger the distortion introduced. In contrast, multi-channel methods have the potential to improve the signal-to-noise ratio significantly while introducing little or no speech distortion. Classic multi-channel methods include multi-channel Wiener filtering, multi-channel tradeoff filtering, minimum variance distortionless response (MVDR) filtering, linearly constrained minimum variance (LCMV) filtering, and generalized sidelobe cancellation (GSC). In recent years, researchers have proposed speech noise reduction methods based on deep learning, which can achieve better performance, but because their generalization ability is generally weak, they are currently difficult to apply widely in practical systems.
To achieve better noise reduction performance, more microphones are usually needed to capture richer spatial, temporal, and frequency information. This in turn generally means that longer filters must be designed, which brings two problems. First, the complexity of solving the filter increases rapidly with the filter length. Second, the signal covariance matrix required to solve the filter has a larger dimension, so more observation samples are needed to estimate it when computing the filter coefficients; this reduces the ability to track changes in the statistical characteristics of speech and noise, so the non-stationary noise that is common in practice is handled poorly.
Disclosure of Invention
The embodiments of the application provide a voice noise reduction method based on a microphone array that addresses the two problems caused by long filters in the prior art. First, the complexity of solving the filter increases rapidly with the filter length. Second, the signal covariance matrix required to solve the filter has a larger dimension, so more observation samples are needed to estimate it when computing the filter coefficients; this reduces the ability to track changes in the statistical characteristics of speech and noise, so the non-stationary noise that is common in practice is handled poorly. The method significantly reduces the complexity of solving the filter, and because the signal covariance matrices it requires have smaller dimensions, they can be estimated from fewer observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.
In a first aspect, an embodiment of the present invention provides a speech noise reduction method based on a microphone array, where the method includes:
acquiring a voice signal with noise;
preprocessing the voice signal with noise to determine a frequency domain voice signal with noise;
estimating the statistical characteristic of the frequency domain voice signal with noise, and estimating the statistical characteristic of the noise signal;
dividing a microphone array into a plurality of sub-arrays, and respectively estimating a plurality of sub-filters;
determining a frequency domain noise reduction filter according to the plurality of sub-filters;
performing noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter to determine a frequency domain noise reduction voice signal;
and converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.
With reference to the first aspect, in a possible implementation manner, the preprocessing of the voice signal with noise includes: performing framing and windowing on the voice signal with noise, and then performing a fast Fourier transform.
With reference to the first aspect, in a possible implementation manner, the estimating the statistical characteristic of the frequency-domain noisy speech signal includes estimating the statistical characteristic of the noisy speech signal according to a time smoothing estimation manner.
With reference to the first aspect, in a possible implementation manner, the estimating the statistical characteristic of the noise signal includes estimating the statistical characteristic of the noise signal according to an existing noise estimation algorithm.
With reference to the first aspect, in a possible implementation manner, the dividing the microphone array into a plurality of sub-arrays and respectively estimating the plurality of sub-filters includes iteratively estimating the plurality of sub-filters by using a low rank structure of a noise reduction filter.
In a second aspect, an embodiment of the present invention provides a speech noise reduction apparatus based on a microphone array, the apparatus including:
the signal acquisition module is used for acquiring a voice signal with noise;
the signal preprocessing module is used for preprocessing the voice signal with noise and determining a frequency domain voice signal with noise;
the statistical characteristic estimation module is used for estimating the statistical characteristic of the frequency domain voice signal with noise and estimating the statistical characteristic of the noise signal;
the sub-filter determining module is used for dividing the microphone array into a plurality of sub-arrays and respectively estimating a plurality of sub-filters;
a frequency domain noise reduction filter determining module, configured to determine a frequency domain noise reduction filter according to the plurality of sub-filters;
the noise reduction module is used for carrying out noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter and determining a frequency domain noise reduction voice signal;
and the time domain noise reduction voice signal determination module is used for converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.
With reference to the second aspect, in a possible implementation manner, the signal preprocessing module is configured to perform framing and windowing on the voice signal with noise and then perform a fast Fourier transform.
With reference to the second aspect, in a possible implementation manner, the statistical characteristic estimation module is configured to estimate the statistical characteristics of the noisy speech signal by temporal smoothing.
With reference to the second aspect, in a possible implementation manner, the statistical characteristic estimation module is configured to estimate the statistical characteristics of the noise signal according to an existing noise estimation algorithm.
With reference to the second aspect, in a possible implementation manner, the frequency domain noise reduction filter determining module is configured to iteratively estimate the plurality of sub-filters by using the low-rank structure of the noise reduction filter.
In a third aspect, an embodiment of the present invention provides a voice noise reduction server based on a microphone array, including a memory and a processor;
the memory is to store computer-executable instructions;
the processor is configured to execute the computer-executable instructions to implement the method according to the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium storing executable instructions which, when executed by a computer, implement the method according to any implementation of the first aspect.
One or more technical schemes provided in the embodiment of the invention have at least the following technical effects or advantages:
The embodiment of the invention adopts a voice noise reduction method based on a microphone array, which comprises: acquiring a voice signal with noise; preprocessing the voice signal with noise and determining a frequency domain voice signal with noise; estimating the statistical characteristics of the frequency domain voice signal with noise and of the noise signal; dividing the microphone array into a plurality of sub-arrays and respectively estimating a plurality of sub-filters; determining a frequency domain noise reduction filter according to the plurality of sub-filters; performing noise reduction on the frequency domain voice signal with noise according to the frequency domain noise reduction filter and determining a frequency domain noise reduction voice signal; and converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal. This effectively addresses the two problems caused by long filters in the prior art. First, the complexity of solving the filter increases rapidly with the filter length. Second, the signal covariance matrix required to solve the filter has a larger dimension, so more observation samples are needed to estimate it when computing the filter coefficients; this reduces the ability to track changes in the statistical characteristics of speech and noise, so the non-stationary noise that is common in practice is handled poorly.
The embodiment of the invention significantly reduces the complexity of solving the filter. Because the signal covariance matrices required to solve the filter have smaller dimensions, they can be estimated from fewer observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments of the present invention or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart of the steps of a microphone-array-based speech noise reduction method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an apparatus for microphone array based speech noise reduction according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a server for microphone array based speech noise reduction according to an embodiment of the present disclosure;
FIG. 4 is a graph comparing the complexity of the method provided by an embodiment of the present application with the complexity of a conventional method;
FIG. 5 is a plot of the mean square error of the method provided by an embodiment of the present application as a function of the number of iterations;
FIG. 6 is a comparison of the mean square error over time of the method provided by an embodiment of the present application and the conventional method when the statistical characteristics of the noise change suddenly.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It should be apparent that the described embodiments are only some of the embodiments of the present invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Early voice interaction systems were usually equipped with only one microphone, and the corresponding noise reduction method was single-channel voice noise reduction. Single-channel methods are simple to implement and computationally efficient and achieve a certain effect, but they have significant limitations. Research shows that, under certain conditions, single-channel noise reduction introduces speech distortion, and the larger the signal-to-noise ratio improvement, the larger the distortion introduced. In contrast, multi-channel methods have the potential to improve the signal-to-noise ratio significantly while introducing little or no speech distortion. Multi-channel noise reduction typically requires more microphones to capture richer spatial, temporal, and frequency information, which in turn leads to two problems. First, the complexity of solving the filter increases rapidly with the filter length. Second, the signal covariance matrix required to solve the filter has a larger dimension, so more observation samples are needed to estimate it when computing the filter coefficients; this degrades the ability to track changes in the statistical characteristics of speech and noise, so the non-stationary noise that is common in practice is handled poorly.
An embodiment of the present invention provides a speech noise reduction method based on a microphone array. As shown in FIG. 1, the method includes the following steps:
Step S101, acquiring a voice signal with noise.
Step S102, preprocessing the voice signal with noise, and determining a frequency domain voice signal with noise.
Step S103, estimating the statistical characteristics of the frequency domain voice signal with noise and estimating the statistical characteristics of the noise signal.
Step S104, dividing the microphone array into a plurality of sub-arrays, and respectively estimating a plurality of sub-filters.
Step S105, determining a frequency domain noise reduction filter according to the plurality of sub-filters.
Step S106, performing noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter, and determining a frequency domain noise reduction voice signal.
Step S107, converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.
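Steps S102 and S107 form an analysis/synthesis pair around the frequency-domain filtering. The following is a minimal single-channel sketch of that pair, not the patented implementation. It assumes a square-root periodic Hann window, 50% overlap, and an even frame length, none of which the source specifies; the function names `sqrt_hann`, `analysis`, and `synthesis` are hypothetical.

```python
import numpy as np


def sqrt_hann(frame_len):
    # Square root of a periodic Hann window (assumed choice): the product of
    # the analysis and synthesis windows then sums to 1 at 50% overlap.
    t = np.arange(frame_len)
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * t / frame_len))


def analysis(signal, frame_len):
    # Step S102 (sketch): framing, windowing, FFT. Returns (num_frames, bins).
    hop = frame_len // 2
    window = sqrt_hann(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[n * hop:n * hop + frame_len] * window
                       for n in range(num_frames)])
    return np.fft.rfft(frames, axis=1)


def synthesis(spectrum, frame_len):
    # Step S107 (sketch): inverse FFT, synthesis window, overlap-add.
    hop = frame_len // 2
    window = sqrt_hann(frame_len)
    frames = np.fft.irfft(spectrum, n=frame_len, axis=1) * window
    output = np.zeros(hop * (len(frames) - 1) + frame_len)
    for n, frame in enumerate(frames):
        output[n * hop:n * hop + frame_len] += frame
    return output
```

With no filtering in between, every sample covered by two frames is reconstructed exactly; only the first and last half-frames are attenuated by the window.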
Combining the above steps yields a more reasonable filter structure that avoids computing one very long filter as a whole, as conventional multi-channel noise reduction methods do; shorter filters mean fewer filter coefficients. Compared with existing methods, the method provided by the application therefore significantly reduces the complexity of solving the noise reduction filter. Because the signal covariance matrices required to solve the filter have small dimensions, they can be estimated from fewer observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.
In a specific embodiment of the present application, the time-domain noisy speech signal is represented as
$$y_m(t) = x_m(t) + v_m(t), \quad m = 1, 2, \ldots, M \tag{1}$$
where $y_m(t)$ is the noisy speech signal received by the $m$-th microphone; $x_m(t)$ is the clean speech signal received by the $m$-th microphone; $v_m(t)$ is the background noise signal received by the $m$-th microphone; $t$ is the discrete time index; and $M$ is the number of microphones.
In a specific embodiment of the present application, all signals are assumed to be zero-mean broadband signals, and the speech signal and the noise signal are assumed to be uncorrelated. The purpose of voice noise reduction is to recover the clean speech signal from the noisy speech signal. Without loss of generality, microphone 1 is taken as the reference microphone in the present application, i.e. $x_1(t)$ is the desired signal (the signal to be recovered).
Preprocessing the noisy speech signal comprises framing and windowing the noisy speech signal and then performing a fast Fourier transform to obtain the frequency-domain noisy speech signal, which is expressed as:
where $w$ is the window function; $T$ is the length of the window function (which is also the length of a speech frame); $L$ is the step between two adjacent frames; and the zero-mean random variables $Y_m(k,n)$, $X_m(k,n)$, $V_m(k,n)$ are the Fourier transform values of $y_m(t)$, $x_m(t)$, $v_m(t)$, respectively, at the $k$-th frequency band of the $n$-th frame, where K ∈ {0, 1.
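The framing, windowing, and FFT described above can be sketched for all $M$ channels at once. This is an illustration under stated assumptions, not the source's exact transform: the Hann window from `np.hanning`, the one-sided spectrum from `np.fft.rfft`, and the function name `stft` are choices made here. The arguments `frame_len` and `hop` play the roles of $T$ and $L$.

```python
import numpy as np


def stft(y, frame_len, hop):
    """Return Y[m, k, n]: channel m, frequency band k, frame n.

    y has shape (M, num_samples). The Hann window is an assumed choice.
    """
    y = np.atleast_2d(np.asarray(y, dtype=float))
    window = np.hanning(frame_len)
    num_frames = 1 + (y.shape[1] - frame_len) // hop
    # Cut each channel into overlapping frames: shape (M, N, T).
    frames = np.stack(
        [y[:, n * hop:n * hop + frame_len] for n in range(num_frames)], axis=1
    )
    # Window each frame and transform it: shape (M, N, K).
    spectrum = np.fft.rfft(frames * window, axis=-1)
    return spectrum.transpose(0, 2, 1)
```

Because the transform is linear, the additive model carries over to every band and frame: `stft(x + v, ...)` equals `stft(x, ...) + stft(v, ...)`, and the slice `Y[:, k, n]` stacks the $M$ channel values of one band and frame.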
For convenience, the signal model is written in vector form as
$$\mathbf{y}(k,n) = \mathbf{x}(k,n) + \mathbf{v}(k,n) \tag{3}$$
where
$$\mathbf{y}(k,n) = [Y_1(k,n), Y_2(k,n), \ldots, Y_M(k,n)]^T \tag{4}$$
$\mathbf{x}(k,n)$ and $\mathbf{v}(k,n)$ are defined similarly to $\mathbf{y}(k,n)$, and the superscript $T$ denotes the transpose.
In the conventional method, a filter $\mathbf{h}(k,n)$ of length $M$ usually has to be designed to perform speech noise reduction, that is:
$$Z(k,n) = \mathbf{h}^H(k,n)\,\mathbf{y}(k,n) \tag{5}$$
where
$$\mathbf{h}(k,n) = [H_1(k,n), H_2(k,n), \ldots, H_M(k,n)]^T \tag{6}$$
and $Z(k,n)$ is an estimate of $X_1(k,n)$. However, when $M$ is large, this leads to the two problems described in the Background.
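To make the cost of the conventional approach concrete, the sketch below applies equation (5) and solves a full-length filter for one band and frame. The source does not write out the conventional solution; the code assumes the standard multichannel Wiener filter $\mathbf{h}_W = \mathbf{\Phi}_{\mathbf{y}}^{-1}\mathbf{\Phi}_{\mathbf{x}}\mathbf{i}_1$, where $\mathbf{i}_1$ is the first column of the $M \times M$ identity matrix (microphone 1 is the reference). The function names are hypothetical.

```python
import numpy as np


def wiener_filter(phi_y, phi_x):
    # Assumed standard solution h_W = Phi_y^{-1} Phi_x i_1. Phi_x i_1 is the
    # first column of Phi_x; the linear solve involves an M x M matrix.
    return np.linalg.solve(phi_y, phi_x[:, 0])


def apply_filter(h, y):
    # Equation (5): Z = h^H y. np.vdot conjugates its first argument.
    return np.vdot(h, y)
```

In the noise-free case ($\mathbf{\Phi}_{\mathbf{y}} = \mathbf{\Phi}_{\mathbf{x}}$, invertible) this returns $\mathbf{i}_1$, so the output is simply the reference channel $Y_1(k,n)$. The solve involves an $M \times M$ system, which is where the cost grows quickly with $M$.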
Estimating the statistical characteristics of the frequency-domain noisy speech signal comprises estimating them by temporal smoothing. Estimating the statistical characteristics of the noise signal comprises estimating them according to an existing noise estimation algorithm.
Since the speech signal and the noise are uncorrelated, the variance of $Z(k,n)$ can be expressed as:
$$\begin{aligned}\Phi_Z(k,n) &= \mathbf{h}^H(k,n)\,\mathbf{\Phi}_{\mathbf{y}}(k,n)\,\mathbf{h}(k,n) \\ &= \mathbf{h}^H(k,n)\,\mathbf{\Phi}_{\mathbf{x}}(k,n)\,\mathbf{h}(k,n) + \mathbf{h}^H(k,n)\,\mathbf{\Phi}_{\mathbf{v}}(k,n)\,\mathbf{h}(k,n)\end{aligned} \tag{7}$$
where $\mathbf{\Phi}_{\mathbf{a}}(k,n) = E[\mathbf{a}(k,n)\,\mathbf{a}^H(k,n)]$, $\mathbf{a}(k,n) \in \{\mathbf{y}(k,n), \mathbf{x}(k,n), \mathbf{v}(k,n)\}$. In general, $\mathbf{\Phi}_{\mathbf{y}}(k,n)$ can be estimated by temporal smoothing, and $\mathbf{\Phi}_{\mathbf{v}}(k,n)$ can be obtained with a noise estimation method from the prior art. Once estimates of $\mathbf{\Phi}_{\mathbf{y}}(k,n)$ and $\mathbf{\Phi}_{\mathbf{v}}(k,n)$ are available, $\mathbf{\Phi}_{\mathbf{x}}(k,n)$ is obtained as $\mathbf{\Phi}_{\mathbf{y}}(k,n) - \mathbf{\Phi}_{\mathbf{v}}(k,n)$.
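One common form of temporal smoothing is a first-order recursive average. The sketch below assumes that form and a forgetting factor `alpha`; the source names neither, so both are illustrative, as are the function names. The subtraction step follows the text directly.

```python
import numpy as np


def smooth_covariance(phi_prev, y, alpha=0.9):
    # Assumed recursion: Phi_y(k,n) = alpha * Phi_y(k,n-1) + (1 - alpha) * y y^H
    y = np.asarray(y).reshape(-1, 1)
    return alpha * phi_prev + (1.0 - alpha) * (y @ y.conj().T)


def speech_covariance(phi_y, phi_v):
    # Phi_x = Phi_y - Phi_v, valid because speech and noise are uncorrelated.
    return phi_y - phi_v
```

A smaller `alpha` forgets old frames faster and tracks changes more quickly, at the price of a noisier estimate. The larger the matrix, the more frames are needed before the estimate is well conditioned, which is the tracking problem the application targets.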
To derive the method of the invention, the microphone array is divided into $M_2$ sub-arrays, each with $M_1$ microphones, i.e. $M = M_1 M_2$. Microphones $1$ to $M_1$ form the first sub-array, microphones $M_1+1$ to $2M_1$ form the second sub-array, and so on. In the present invention, $M_1 \le M_2$ is assumed. The filter $\mathbf{h}(k,n)$ can be decomposed in the same manner, i.e.
$$\mathbf{h}(k,n) = [\mathbf{h}_1^T(k,n), \mathbf{h}_2^T(k,n), \ldots, \mathbf{h}_{M_2}^T(k,n)]^T \tag{8}$$
where
$$\mathbf{h}_m(k,n) = [H_{(m-1)M_1+1}(k,n), H_{(m-1)M_1+2}(k,n), \ldots, H_{mM_1}(k,n)]^T \tag{9}$$
The sub-filters $\mathbf{h}_m(k,n)$, $m = 1, 2, \ldots, M_2$, can then be arranged into a matrix of dimension $M_1 \times M_2$, namely:
$$\mathbf{H}(k,n) = [\mathbf{h}_1(k,n), \mathbf{h}_2(k,n), \ldots, \mathbf{h}_{M_2}(k,n)] \tag{10}$$
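Equation (10) places the sub-filters side by side as columns. Assuming the full filter stacks the sub-filters in sub-array order, this is a column-major reshape, and vectorizing the matrix returns the original filter. A small sketch with hypothetical function names:

```python
import numpy as np


def filter_to_matrix(h, m1, m2):
    # Column j holds the sub-filter of sub-array j + 1 (column-major fill).
    return np.asarray(h).reshape(m1, m2, order="F")


def matrix_to_filter(H):
    # vec(H): stack the columns of H on top of each other.
    return np.asarray(H).reshape(-1, order="F")
```

For example, with $M_1 = 2$ and $M_2 = 3$, the filter `[0, 1, 2, 3, 4, 5]` becomes a matrix whose columns are `[0, 1]`, `[2, 3]`, and `[4, 5]`.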
Note that $\mathbf{h}(k,n) = \mathrm{vec}[\mathbf{H}(k,n)]$, where $\mathrm{vec}(\cdot)$ denotes the vectorization operator of a matrix. For simplicity, the indices $k$ and $n$ are dropped below wherever no ambiguity arises. Applying singular value decomposition (SVD), the matrix $\mathbf{H}$ can be decomposed into:
$$\mathbf{H} = \mathbf{H}_1 \mathbf{\Sigma} \mathbf{H}_2^H \tag{11}$$
where $\mathbf{H}_1$ is an $M_1 \times M_1$ matrix and $\mathbf{H}_2$ is an $M_2 \times M_2$ matrix. $\mathbf{H}_1$ and $\mathbf{H}_2$ are orthogonal (unitary) matrices, and $\mathbf{\Sigma}$ is an $M_1 \times M_2$ diagonal matrix whose diagonal elements are non-negative real numbers. In this application, they are arranged in descending order, i.e. from large to small. The superscript $H$ denotes the conjugate transpose.
The noisy speech signals received by the channels are strongly correlated, so the sub-filters $\mathbf{h}_m(k,n)$, $m = 1, 2, \ldots, M_2$, are typically strongly correlated as well, and the matrix $\mathbf{H}$ is therefore typically not of full row rank. $\mathbf{H}$ can usually be well approximated by its $P$ largest singular values and the corresponding singular vectors, i.e.:
where
It should be noted that the resulting ambiguity has no effect on the matrix $\mathbf{H}$. Accordingly, the filter $\mathbf{h}$ can be approximately expressed as:
It should be noted that when $P = M_1$, $\mathbf{h}_P = \mathbf{h}$.
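The rank-$P$ approximation can be illustrated with a truncated SVD. The sketch below is a generic illustration, not the source's exact factorization, whose equations are not recoverable here. It assumes the filter stacks the sub-filters in order, so that the matrix is a column-major reshape; `low_rank_filter` is a hypothetical name.

```python
import numpy as np


def low_rank_filter(h, m1, m2, rank):
    # Reshape h into the M1 x M2 matrix whose columns are the sub-filters.
    H = np.asarray(h).reshape(m1, m2, order="F")
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    # Keep the `rank` largest singular values and their singular vectors.
    H_p = (U[:, :rank] * s[:rank]) @ Vh[:rank, :]
    return H_p.reshape(-1, order="F")
```

With `rank = m1` (and $M_1 \le M_2$) the original filter is recovered exactly, which matches the remark that $\mathbf{h}_P = \mathbf{h}$ when $P = M_1$. A filter built as a Kronecker product of two short filters corresponds to a rank-1 matrix, so `rank = 1` already reproduces it.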
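Applying the relation:
$\mathbf{h}_P$ can be written as:
where $\mathbf{H}_{\sigma 1,p}$ has size $M \times M_2$ and $\mathbf{H}_{\sigma 2,p}$ has size $M \times M_1$. The output $Z(k,n)$ of the filter can then be written as:
where
$$\underline{\mathbf{H}}_{\sigma 1,P} = [\mathbf{H}_{\sigma 1,1}\ \mathbf{H}_{\sigma 1,2}\ \cdots\ \mathbf{H}_{\sigma 1,P}]^H \tag{24}$$
$$\underline{\mathbf{H}}_{\sigma 2,P} = [\mathbf{H}_{\sigma 2,1}\ \mathbf{H}_{\sigma 2,2}\ \cdots\ \mathbf{H}_{\sigma 2,P}]^H \tag{25}$$
$\underline{\mathbf{h}}_{\sigma 1,P}$, $\underline{\mathbf{h}}_{\sigma 2,P}$, $\underline{\mathbf{y}}_{\sigma 1,P}(t)$, $\underline{\mathbf{y}}_{\sigma 2,P}(t)$, $\underline{\mathbf{H}}_{\sigma 1,P}$ and $\underline{\mathbf{H}}_{\sigma 2,P}$ have sizes $M_1P \times 1$, $M_2P \times 1$, $M_2P \times 1$, $M_1P \times 1$, $M_2P \times M$ and $M_1P \times M$, respectively. It can be seen that when the parameter $P$ is small, the sub-filters $\underline{\mathbf{h}}_{\sigma 1,P}$ and $\underline{\mathbf{h}}_{\sigma 2,P}$ are much shorter than the filter $\mathbf{h}$.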
The mean square error (MSE) between the desired signal $X_1$ and its estimate $Z$ is
where $E(\cdot)$ denotes the mathematical expectation, $\Re(\cdot)$ denotes the real part, and the superscript $*$ denotes the complex conjugate.
To derive the filter of the invention, the MSE is rewritten as follows:
where
it should be noted that when the parameter $P$ is small, the dimensions of the matrices $\mathbf{\Phi}_{\mathbf{y}\sigma 1,p}$ ($M_2P \times M_2P$) and $\mathbf{\Phi}_{\mathbf{y}\sigma 2,p}$ ($M_1P \times M_1P$) are much smaller than the dimension of the matrix $\mathbf{\Phi}_{\mathbf{y}}$ ($M \times M$).
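This brings two advantages:
1) compared with solving the conventional multi-channel voice noise reduction filter from the inverse of $\mathbf{\Phi}_{\mathbf{y}}$, solving the sub-filters $\underline{\mathbf{h}}_{\sigma 1,P}$ and $\underline{\mathbf{h}}_{\sigma 2,P}$ from the inverses of $\mathbf{\Phi}_{\mathbf{y}\sigma 1,p}$ and $\mathbf{\Phi}_{\mathbf{y}\sigma 2,p}$ requires significantly lower complexity;
2) compared with estimating the matrix $\mathbf{\Phi}_{\mathbf{y}}$, the matrices $\mathbf{\Phi}_{\mathbf{y}\sigma 1,p}$ and $\mathbf{\Phi}_{\mathbf{y}\sigma 2,p}$ can be estimated with fewer signal observation samples, so the sub-filters $\underline{\mathbf{h}}_{\sigma 1,P}$ and $\underline{\mathbf{h}}_{\sigma 2,P}$ can track changes in the statistical properties of the signal more quickly.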
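A rough feel for the first advantage comes from comparing matrix sizes. The sketch below uses the dimensions stated in the text ($M \times M$, $M_2P \times M_2P$, $M_1P \times M_1P$) and assumes, purely for illustration, that inverting an $n \times n$ matrix costs on the order of $n^3$ operations. It counts a single pair of sub-filter solves and ignores the number of iterations and the cost of forming the matrices, so it is not a substitute for the complexity comparison in FIG. 4. The function names are hypothetical.

```python
def covariance_sizes(m1, m2, rank):
    # Side lengths of the square matrices that must be inverted.
    return {"full": m1 * m2, "sub1": m2 * rank, "sub2": m1 * rank}


def cubic_cost_ratio(m1, m2, rank):
    # Assumed O(n^3) inversion cost: two small solves versus one large solve.
    sizes = covariance_sizes(m1, m2, rank)
    small = sizes["sub1"] ** 3 + sizes["sub2"] ** 3
    return small / sizes["full"] ** 3
```

For example, with $M_1 = 4$, $M_2 = 8$, and $P = 2$, the full matrix is $32 \times 32$ while the two smaller ones are $16 \times 16$ and $8 \times 8$; under the cubic assumption one pair of solves costs about 14% of the full solve.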
Operating on an approximation filter, comprising: and obtaining the wiener filter by adopting an iterative solution mode.
Based on equations (27) and (28), it is difficult to derive the sub-filtersh σ1,PAndh σ2,Pclosed-form solution of (1). Therefore, the invention adopts an iterative solution mode. For this reason, when solving for one of the sub-filters, it is assumed that the other sub-filter is fixed, i.e. it is fixed
Sub-filterh σ1,PInitialization is as follows:
wherein the content of the first and second substances,
xpdefinition of (a) and ypSimilarly. It can be seen that hσ1,W,pWiener filter of length M for the p-th sub-matrix1。
Substituting equations (38) and (39) into equation (34) yields:
Taking the derivative of equation (40) with respect to the sub-filter and setting the result to zero gives the Wiener solution of that sub-filter:
Proceeding in the same manner, at the $n$-th iteration we have:
where
The iterative Wiener filter of the present application is then obtained:
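The alternating procedure described above can be illustrated with a minimal NumPy sketch. It is not the application's exact formulation. It assumes a Kronecker-type structure h = sum_p kron(h2_p, h1_p) with M = M1 * M2, assumes that the cross-correlation vector `r` between the observation vector and the desired signal is available, and initializes the second set of sub-filters with unit vectors rather than with the sub-array Wiener filters used in the text. All function and variable names are hypothetical.

```python
import numpy as np


def build_filter(h1, h2):
    """Full filter h = sum_p kron(h2[p], h1[p]); h1 is (P, M1), h2 is (P, M2)."""
    return sum(np.kron(h2[p], h1[p]) for p in range(h1.shape[0]))


def mse(h, phi_y, r, phi_x1):
    """J(h) = phi_x1 - 2 Re(h^H r) + h^H Phi_y h."""
    quad = np.real(np.vdot(h, phi_y @ h))
    return float(phi_x1 - 2.0 * np.real(np.vdot(h, r)) + quad)


def solve_sub_filter(H, phi_y, r):
    """Wiener solution in the reduced space: (H^H Phi_y H) u = H^H r."""
    A = H.conj().T @ phi_y @ H
    b = H.conj().T @ r
    # lstsq also copes with a rank-deficient reduced matrix.
    return np.linalg.lstsq(A, b, rcond=None)[0]


def iterative_wiener(phi_y, r, M1, M2, P, n_iter=10):
    """Alternately update the two stacked sub-filters; return h after each pass."""
    I1, I2 = np.eye(M1), np.eye(M2)
    # Assumed initialization: the p-th sub-filter h2_p is the p-th unit vector (P <= M2).
    h2 = np.eye(M2, dtype=complex)[:P]
    history = []
    for _ in range(n_iter):
        # Fix h2, solve for the stacked h1 (length M1 * P).
        H2 = np.hstack([np.kron(h2[p][:, None], I1) for p in range(P)])
        h1 = solve_sub_filter(H2, phi_y, r).reshape(P, M1)
        # Fix h1, solve for the stacked h2 (length M2 * P).
        H1 = np.hstack([np.kron(I2, h1[p][:, None]) for p in range(P)])
        h2 = solve_sub_filter(H1, phi_y, r).reshape(P, M2)
        history.append(build_filter(h1, h2))
    return history
```

Because each half-step minimizes the MSE over one sub-filter with the other held fixed, the MSE evaluated on the returned filters cannot increase from one pass to the next, and it can never fall below the MSE of the unconstrained Wiener filter obtained from the full matrix.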
An embodiment of the invention provides a voice noise reduction device based on a microphone array. As shown in Fig. 2, the device comprises a signal acquisition module 201, a signal preprocessing module 202, a statistical characteristic estimation module 203, a sub-filter determination module 204, a frequency domain noise reduction filter determination module 205, a noise reduction module 206 and a time domain noise reduction voice signal determination module 207. The signal acquisition module 201 is configured to acquire a voice signal with noise. The signal preprocessing module 202 is configured to preprocess the voice signal with noise and determine a frequency domain voice signal with noise. The statistical characteristic estimation module 203 is configured to estimate the statistical characteristics of the frequency domain voice signal with noise and the statistical characteristics of the noise signal. The sub-filter determination module 204 is configured to divide the microphone array into a plurality of sub-arrays and estimate a plurality of sub-filters respectively. The frequency domain noise reduction filter determination module 205 is configured to determine a frequency domain noise reduction filter according to the plurality of sub-filters. The noise reduction module 206 is configured to perform noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter and determine a frequency domain noise reduction voice signal. The time domain noise reduction voice signal determination module 207 is configured to convert the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.
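The signal path through the preprocessing, noise reduction and time domain conversion modules can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's implementation: it assumes a periodic Hann analysis window with 50% overlap, overlap-add synthesis without a synthesis window, and a per-bin filter supplied as an array `h` of shape (bins, microphones). The function names, frame length and hop size are hypothetical.

```python
import numpy as np


def stft(x, n_fft=256, hop=128):
    """Frame, window and FFT each channel. x: (n_mics, n_samples)."""
    # Periodic Hann window: shifted copies at 50% overlap sum to one.
    win = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n_fft) / n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    frames = np.stack(
        [x[:, t * hop:t * hop + n_fft] * win for t in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=-1)  # (n_mics, n_frames, n_bins)


def apply_filter(Y, h):
    """Z(k, t) = h(k)^H y(k, t). Y: (n_mics, n_frames, n_bins), h: (n_bins, n_mics)."""
    return np.einsum("km,mtk->tk", h.conj(), Y)


def istft(Z, n_fft=256, hop=128):
    """Inverse FFT of each frame followed by overlap-add. Z: (n_frames, n_bins)."""
    frames = np.fft.irfft(Z, n=n_fft, axis=-1)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for t, frame in enumerate(frames):
        out[t * hop:t * hop + n_fft] += frame
    return out
```

In a complete system, `h` would be the frequency domain noise reduction filter assembled from the estimated sub-filters. With a filter that simply selects the first microphone in every bin, the chain reproduces that channel away from the signal edges, which is a convenient sanity check of the framing and overlap-add steps.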
Fig. 4 is a comparison of the complexity of the method provided by the present application with the complexity of the conventional method, fig. 5 is a graph of the mean square error of the method provided by the present application as a function of the number of iterations, and fig. 6 is a graph of the mean square error of the method provided by the present application and the conventional method as a function of time when the statistical properties of noise suddenly change. The method provided by the application effectively reduces the complexity and improves the tracking capability of the filter on the change of the statistical characteristics of the voice signals and the noise.
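How quickly a filter follows a change in the noise statistics depends on how those statistics are estimated over time. A minimal sketch of a time-smoothed covariance estimate for one frequency bin is given below. It assumes a first-order recursion with a hypothetical forgetting factor `alpha`. The text does not specify this estimator, so it should be read as one common choice rather than the application's method.

```python
import numpy as np


def smooth_covariance(frames, alpha=0.9):
    """Recursively smoothed covariance: Phi(t) = alpha * Phi(t-1) + (1 - alpha) * y y^H.

    frames: (n_frames, M) complex observation vectors for one frequency bin.
    alpha: assumed forgetting factor in (0, 1); smaller values track changes faster.
    """
    M = frames.shape[1]
    phi = np.zeros((M, M), dtype=complex)
    for y in frames:
        phi = alpha * phi + (1.0 - alpha) * np.outer(y, y.conj())
    return phi
```

The same recursion applies to the smaller projected matrices, which have fewer entries to estimate and can therefore be smoothed reliably over fewer frames.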
An embodiment of the invention provides a server for voice noise reduction based on a microphone array. As shown in Fig. 3, the server comprises a memory 301 and a processor 302; the memory 301 is used to store computer-executable instructions, and the processor 302 is used to execute the computer-executable instructions.
An embodiment of the invention provides a computer-readable storage medium that stores executable instructions which can be executed by a computer.
The storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a cache, a Hard Disk Drive (HDD) or a memory card. The memory may be used to store computer program instructions.
Although the present application provides method steps as described in an embodiment or flowchart, more or fewer steps may be included based on conventional or non-inventive effort. The sequence of steps recited in this embodiment is only one of many possible execution orders and does not represent a unique order of execution. When an actual apparatus or client product executes the method, the steps may be executed sequentially or in parallel (e.g., in a parallel-processor or multi-threaded environment) according to the method shown in this embodiment or the figures.
The apparatuses or modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by function. When implementing the present application, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Of course, a module that implements a certain function may also be implemented by a combination of a plurality of sub-modules or sub-units.
The methods, apparatuses or modules described herein may be implemented as computer-readable program code embodied in a controller in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, Application Specific Integrated Circuits (ASICs), programmable logic controllers and embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic for the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be implemented by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing the various functions may also be regarded as structures within the hardware component. The means for performing the functions may even be regarded both as software modules for performing the method and as structures within the hardware component.
Some of the modules in the apparatus described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus the necessary hardware. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, and may also be embodied in the implementation process of data migration. The computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk or an optical disk, and includes instructions for causing a computing device (which may be a personal computer, mobile terminal, server, network device, etc.) to perform the methods described in the various embodiments or portions of the embodiments.
The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. All or portions of the present application are operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the present application; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: modifications can be made to the technical solutions described in the foregoing embodiments, or some or all of the technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the present disclosure.
Claims (8)
1. A voice noise reduction method based on a microphone array, characterized by comprising:
acquiring a voice signal with noise;
preprocessing the voice signal with noise to determine a frequency domain voice signal with noise;
estimating the statistical characteristic of the frequency domain voice signal with noise, and estimating the statistical characteristic of the noise signal;
dividing a microphone array into a plurality of sub-arrays, and respectively estimating a plurality of sub-filters;
determining a frequency domain noise reduction filter according to the plurality of sub-filters;
carrying out noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter to determine a frequency domain noise reduction voice signal;
and converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.
2. The method of claim 1, wherein the preprocessing of the voice signal with noise comprises: framing and windowing the voice signal with noise, and then performing a fast Fourier transform.
3. The method according to claim 1, wherein said estimating the statistical properties of the frequency-domain noisy speech signal comprises estimating the statistical properties of the noisy speech signal according to a time-smoothed estimation.
4. The method of claim 1, wherein estimating the statistical properties of the noise signal comprises estimating the statistical properties of the noise signal according to an existing noise estimation algorithm.
5. The method of claim 1, wherein the dividing the microphone array into a plurality of sub-arrays and estimating a plurality of sub-filters separately comprises iteratively estimating the plurality of sub-filters using a low rank architecture of a noise reduction filter.
6. A voice noise reduction device based on a microphone array, characterized by comprising:
the signal acquisition module is used for acquiring a voice signal with noise;
the signal preprocessing module is used for preprocessing the voice signal with the noise and determining a frequency domain voice signal with the noise;
the statistical characteristic estimation module is used for estimating the statistical characteristic of the frequency domain voice signal with noise and estimating the statistical characteristic of the noise signal;
the sub-filter determining module is used for dividing the microphone array into a plurality of sub-arrays and respectively estimating a plurality of sub-filters;
a frequency domain noise reduction filter determining module, configured to determine a frequency domain noise reduction filter according to the plurality of sub-filters;
the noise reduction module is used for carrying out noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter and determining a frequency domain noise reduction voice signal;
and the time domain noise reduction voice signal determination module is used for converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.
7. A microphone array based speech noise reduction server comprising a memory and a processor;
the memory is to store computer-executable instructions;
the processor is configured to execute the computer-executable instructions to implement the method of any of claims 1-5.
8. A computer-readable storage medium having stored thereon executable instructions that, when executed by a computer, are capable of implementing the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111621218.5A CN114373475A (en) | 2021-12-28 | 2021-12-28 | Voice noise reduction method and device based on microphone array and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114373475A true CN114373475A (en) | 2022-04-19 |
Family
ID=81142867
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111621218.5A Pending CN114373475A (en) | 2021-12-28 | 2021-12-28 | Voice noise reduction method and device based on microphone array and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114373475A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5917919A (en) * | 1995-12-04 | 1999-06-29 | Rosenthal; Felix | Method and apparatus for multi-channel active control of noise or vibration or of multi-channel separation of a signal from a noisy environment |
WO2006114100A1 (en) * | 2005-04-26 | 2006-11-02 | Aalborg Universitet | Estimation of signal from noisy observations |
CN110517701A (en) * | 2019-07-25 | 2019-11-29 | 华南理工大学 | A kind of microphone array voice enhancement method and realization device |
CN112802490A (en) * | 2021-03-11 | 2021-05-14 | 北京声加科技有限公司 | Beam forming method and device based on microphone array |
CN113409804A (en) * | 2020-12-22 | 2021-09-17 | 声耕智能科技(西安)研究院有限公司 | Multichannel frequency domain speech enhancement algorithm based on variable-span generalized subspace |
Non-Patent Citations (2)
Title |
---|
XIANGHUI WANG ET AL: "Multichannel Iterative Noise Reduction Filters in the Short-Time-Fourier-Transform Domain Based on Kronecker Product Decomposition", IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING ( VOLUME: 29), pages 2725 - 2739 * |
何成林, 杜利民, 马昕: "麦克风阵列语音增强的研究", 计算机工程与应用, no. 24 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||