CN114373475A - Voice noise reduction method and device based on microphone array and storage medium

Voice noise reduction method and device based on microphone array and storage medium

Info

Publication number
CN114373475A
Authority
CN
China
Prior art keywords
noise
noise reduction
voice signal
frequency domain
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111621218.5A
Other languages
Chinese (zh)
Inventor
王向辉
高朴
韩冬
陈捷
王瑞琪
王姣
李梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi University of Science and Technology
Original Assignee
Shaanxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi University of Science and Technology
Priority to CN202111621218.5A
Publication of CN114373475A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Abstract

The application discloses a speech noise reduction method based on a microphone array. It addresses two problems of the prior art: the complexity of solving the filter grows rapidly with the filter length, and the ability to track changes in the statistical characteristics of the speech signal and the noise degrades. The method comprises: acquiring a noisy speech signal; preprocessing the noisy speech signal to determine a frequency-domain noisy speech signal; estimating the statistical characteristics of the frequency-domain noisy speech signal and of the noise signal; dividing the microphone array into a plurality of sub-arrays, estimating a plurality of sub-filters respectively, and determining a frequency-domain noise reduction filter; and performing noise reduction on the frequency-domain noisy speech signal with the frequency-domain noise reduction filter and converting the result into a time-domain noise-reduced speech signal. Because the signal covariance matrices required to solve the filter have smaller dimensions, the complexity of the noise reduction filter is significantly reduced and its ability to track changes in the statistical characteristics of the speech signal and the noise is improved.

Description

Voice noise reduction method and device based on microphone array and storage medium
Technical Field
The present disclosure relates to the field of microphone arrays, and in particular, to a method and an apparatus for reducing noise of speech based on a microphone array, and a storage medium.
Background
Speech noise reduction plays an important role in systems such as intelligent voice assistants, human-machine interaction, teleconferencing, hearing aids, in-vehicle systems, virtual reality, in-situ communication, and military voice communication under very high background noise. The quality of voice interaction depends directly on noise reduction performance.
Early voice interaction systems were usually equipped with only one microphone, so the corresponding methods performed single-channel speech noise reduction. Single-channel methods are simple to implement and computationally efficient, and they achieve a certain effect, but they have significant limitations. Research shows that, under certain conditions, single-channel noise reduction introduces speech distortion, and the larger the signal-to-noise ratio improvement, the larger the introduced distortion. In contrast, multi-channel speech noise reduction methods have the potential to improve the signal-to-noise ratio significantly while introducing little or no speech distortion. Classic multi-channel methods include multi-channel Wiener filtering, multi-channel tradeoff filtering, minimum variance distortionless response (MVDR) filtering, linearly constrained minimum variance (LCMV) filtering, and generalized sidelobe cancellation (GSC). In recent years, researchers have proposed speech noise reduction methods based on deep learning. These can achieve better performance, but because their generalization ability is usually weak, they are currently difficult to deploy widely in practical systems.
Better speech noise reduction performance usually requires more microphones, which provide richer space-time-frequency information. This in turn generally means that longer filters must be designed, which brings two problems. First, the complexity of solving the filter increases rapidly with the filter length. Second, the signal covariance matrix required to solve the filter has a larger dimension, so more observation samples are needed to estimate it before the filter coefficients can be computed. This reduces the ability to track changes in the statistical characteristics of the speech signal and the noise, so the non-stationary noise that is common in practice cannot be handled well.
Disclosure of Invention
The embodiments of the application provide a speech noise reduction method based on a microphone array that addresses the two problems caused by long filters in the prior art. First, the complexity of solving the filter increases rapidly with the filter length. Second, the signal covariance matrix required to solve the filter has a larger dimension, so more observation samples are needed to estimate it, which reduces the ability to track changes in the statistical characteristics of the speech signal and the noise and makes it hard to handle the non-stationary noise that is common in practice. With the proposed method, the complexity of solving the filter is significantly reduced and the required signal covariance matrices have smaller dimensions. The covariance matrices can therefore be estimated from fewer signal observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.
In a first aspect, an embodiment of the present invention provides a speech noise reduction method based on a microphone array, where the method includes:
acquiring a voice signal with noise;
preprocessing the voice signal with noise to determine a frequency domain voice signal with noise;
estimating the statistical characteristic of the frequency domain voice signal with noise, and estimating the statistical characteristic of the noise signal;
dividing a microphone array into a plurality of sub-arrays, and respectively estimating a plurality of sub-filters;
determining a frequency domain noise reduction filter according to the plurality of sub-filters;
performing noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter to determine a frequency domain noise reduction voice signal;
and converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.
With reference to the first aspect, in a possible implementation manner, preprocessing the noisy speech signal includes: framing and windowing the noisy speech signal, and then performing a fast Fourier transform.
With reference to the first aspect, in a possible implementation manner, the estimating the statistical characteristic of the frequency-domain noisy speech signal includes estimating the statistical characteristic of the noisy speech signal according to a time smoothing estimation manner.
With reference to the first aspect, in a possible implementation manner, the estimating the statistical characteristic of the noise signal includes estimating the statistical characteristic of the noise signal according to an existing noise estimation algorithm.
With reference to the first aspect, in a possible implementation manner, the dividing the microphone array into a plurality of sub-arrays and respectively estimating the plurality of sub-filters includes iteratively estimating the plurality of sub-filters by using a low rank structure of a noise reduction filter.
In a second aspect, an embodiment of the present invention provides a speech noise reduction apparatus based on a microphone array, including:
the signal acquisition module is used for acquiring a voice signal with noise;
the signal preprocessing module is used for preprocessing the voice signal with noise and determining a frequency domain voice signal with noise;
the statistical characteristic estimation module is used for estimating the statistical characteristic of the frequency domain voice signal with noise and estimating the statistical characteristic of the noise signal;
the sub-filter determining module is used for dividing the microphone array into a plurality of sub-arrays and respectively estimating a plurality of sub-filters;
a frequency domain noise reduction filter determining module, configured to determine a frequency domain noise reduction filter according to the plurality of sub-filters;
the noise reduction module is used for carrying out noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter and determining a frequency domain noise reduction voice signal;
and the time domain noise reduction voice signal determination module is used for converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.
With reference to the second aspect, in a possible implementation manner, the signal preprocessing module is configured to frame and window the noisy speech signal and then perform a fast Fourier transform.
With reference to the second aspect, in a possible implementation manner, the statistical characteristic estimation module is configured to estimate the statistical characteristics of the noisy speech signal by temporal smoothing.
With reference to the second aspect, in a possible implementation manner, the statistical characteristic estimation module is configured to estimate the statistical characteristics of the noise signal according to an existing noise estimation algorithm.
With reference to the second aspect, in a possible implementation manner, the frequency domain noise reduction filter determining module is configured to iteratively estimate the plurality of sub-filters by using the low-rank structure of the noise reduction filter.
In a third aspect, an embodiment of the present invention provides a voice noise reduction server based on a microphone array, including a memory and a processor;
the memory is to store computer-executable instructions;
the processor is configured to execute the computer-executable instructions to implement the method according to the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium, which stores executable instructions, and when the computer executes the executable instructions, the computer can implement the method according to any one of the first aspect.
One or more technical solutions provided in the embodiments of the invention have at least the following technical effects or advantages:
The embodiment of the invention adopts a speech noise reduction method based on a microphone array, which comprises: acquiring a noisy speech signal; preprocessing the noisy speech signal to determine a frequency-domain noisy speech signal; estimating the statistical characteristics of the frequency-domain noisy speech signal and of the noise signal; dividing the microphone array into a plurality of sub-arrays and estimating a plurality of sub-filters respectively; determining a frequency-domain noise reduction filter according to the plurality of sub-filters; performing noise reduction on the frequency-domain noisy speech signal according to the frequency-domain noise reduction filter to determine a frequency-domain noise-reduced speech signal; and converting the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal. This effectively addresses the two problems caused by long filters in the prior art: first, the complexity of solving the filter increases rapidly with the filter length; second, the signal covariance matrix required to solve the filter has a larger dimension, so more observation samples are needed to estimate it, which reduces the ability to track changes in the statistical characteristics of the speech signal and the noise and makes it hard to handle the non-stationary noise that is common in practice.
The embodiment of the invention significantly reduces the complexity of solving the filter, and the signal covariance matrices required to solve the filter have smaller dimensions. The covariance matrices can therefore be estimated from fewer signal observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments of the present invention or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart illustrating the steps of a method for microphone array based speech noise reduction according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of an apparatus for microphone array based speech noise reduction according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of a server for microphone array based speech noise reduction according to an embodiment of the present disclosure;
Fig. 4 is a graph comparing the complexity of the method provided by an embodiment of the present application with the complexity of a conventional method;
Fig. 5 is a plot of the mean square error of the method provided by an embodiment of the present application as a function of the number of iterations;
Fig. 6 is a graph comparing the mean square error over time of the method provided by an embodiment of the present application and of the conventional method when the statistical characteristics of the noise change suddenly.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It should be apparent that the described embodiments are only some of the embodiments of the present invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Early voice interaction systems were usually equipped with only one microphone, so the corresponding method was single-channel speech noise reduction. Single-channel methods are simple to implement and computationally efficient, and they achieve a certain effect, but they have significant limitations. Research shows that, under certain conditions, single-channel noise reduction introduces speech distortion, and the larger the signal-to-noise ratio improvement, the larger the introduced distortion. In contrast, multi-channel speech noise reduction has more potential: it can improve the signal-to-noise ratio significantly while introducing little or no speech distortion. Multi-channel speech noise reduction typically requires more microphones in order to acquire richer space-time-frequency information, which in turn leads to two problems. First, the complexity of solving the filter increases rapidly with the filter length. Second, the signal covariance matrix required to solve the filter has a larger dimension, so more observation samples are needed to estimate it before the filter coefficients can be computed. This degrades the ability to track changes in the statistical characteristics of the speech signal and the noise, so the non-stationary noise that is common in practice cannot be handled well.
An embodiment of the present invention provides a speech noise reduction method based on a microphone array. As shown in Fig. 1, the method includes the following steps:
Step S101, acquiring a noisy speech signal.
Step S102, preprocessing the noisy speech signal to determine a frequency-domain noisy speech signal.
Step S103, estimating the statistical characteristics of the frequency-domain noisy speech signal and estimating the statistical characteristics of the noise signal.
Step S104, dividing the microphone array into a plurality of sub-arrays and estimating a plurality of sub-filters respectively.
Step S105, determining a frequency-domain noise reduction filter according to the plurality of sub-filters.
Step S106, performing noise reduction on the frequency-domain noisy speech signal according to the frequency-domain noise reduction filter to determine a frequency-domain noise-reduced speech signal.
Step S107, converting the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal.
Taken together, these steps build the noise reduction filter from several short sub-filters, which avoids computing one very long filter as a whole, as conventional multi-channel speech noise reduction methods do. A shorter filter has fewer coefficients. Compared with existing methods, the method provided by the application therefore significantly reduces the complexity of solving the speech noise reduction filter. The signal covariance matrices required to solve the filter also have smaller dimensions, so they can be estimated from fewer signal observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.
In a specific embodiment of the present application, the time-domain noisy speech signal is represented as

$$y_m(t) = x_m(t) + v_m(t), \quad m = 1, 2, \ldots, M \tag{1}$$

where $y_m(t)$ denotes the noisy speech signal received by the $m$-th microphone; $x_m(t)$ denotes the clean speech signal received by the $m$-th microphone; $v_m(t)$ denotes the background noise signal received by the $m$-th microphone; $t$ denotes the discrete time index; and $M$ denotes the number of microphones.
In a specific embodiment of the present application, all signals are assumed to be zero-mean, broadband signals, and the speech signal and the noise signal are assumed to be uncorrelated. The purpose of speech noise reduction is to recover the clean speech signal from the noisy speech signal. Without loss of generality, microphone 1 is taken as the reference microphone in the present application, i.e. $x_1(t)$ is the desired signal (the signal that needs to be recovered).
Preprocessing a noisy speech signal, comprising: performing framing and windowing on the noisy speech signal, and then performing fast Fourier transform to obtain a frequency domain noisy speech signal, which is expressed as:
Figure BDA0003437550130000071
where $w$ denotes the window function; $T$ denotes the length of the window function (which is also the length of a speech signal frame); $L$ denotes the step between two adjacent frames; and the zero-mean random variables $Y_m(k,n)$, $X_m(k,n)$, $V_m(k,n)$ are the Fourier transform values of $y_m(t)$, $x_m(t)$, $v_m(t)$, respectively, at the $k$-th frequency band of the $n$-th frame, where K ∈ {0, 1.
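The preprocessing step (framing, windowing, FFT) and its inverse in step S107 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names `stft` and `istft`, the periodic square-root Hann window, and the 50% overlap (`T = 256`, `L = 128`) are all assumptions chosen so that analysis followed by synthesis reconstructs the interior of the signal.

```python
import numpy as np

def stft(y, T=256, L=128):
    """Frame, window and FFT one microphone channel.

    Returns an array of shape (n_frames, T); entry [n, k] plays the role
    of Y_m(k, n) in the text.
    """
    w = np.sqrt(np.hanning(T + 1)[:T])      # periodic sqrt-Hann window (assumed choice)
    n_frames = 1 + (len(y) - T) // L
    frames = np.stack([w * y[n * L:n * L + T] for n in range(n_frames)])
    return np.fft.fft(frames, axis=1)

def istft(Y, L=128):
    """Inverse FFT, synthesis window and overlap-add back to the time domain."""
    n_frames, T = Y.shape
    w = np.sqrt(np.hanning(T + 1)[:T])      # same window used for synthesis
    out = np.zeros(T + (n_frames - 1) * L)
    frames = np.fft.ifft(Y, axis=1).real
    for n in range(n_frames):
        out[n * L:n * L + T] += w * frames[n]
    return out
```

In a multi-microphone setting, `stft` would be applied to each of the M channels separately. The first and last `L` samples are covered by a single frame only, so they are attenuated by the window.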
For convenience, the signal model is represented in vector form as

$$\mathbf{y}(k,n) = \mathbf{x}(k,n) + \mathbf{v}(k,n) \tag{3}$$

where

$$\mathbf{y}(k,n) = [Y_1(k,n), Y_2(k,n), \ldots, Y_M(k,n)]^T \tag{4}$$

$\mathbf{x}(k,n)$ and $\mathbf{v}(k,n)$ are defined similarly to $\mathbf{y}(k,n)$, and the superscript $T$ denotes the transpose.
In the conventional method, a filter $\mathbf{h}(k,n)$ of length $M$ usually has to be designed to implement speech noise reduction, that is:

$$Z(k,n) = \mathbf{h}^H(k,n)\,\mathbf{y}(k,n) \tag{5}$$

where

$$\mathbf{h}(k,n) = [H_1(k,n), H_2(k,n), \ldots, H_M(k,n)]^T \tag{6}$$

$Z(k,n)$ is an estimate of $X_1(k,n)$. However, when $M$ is large, this leads to the two problems described in the Background.
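To make equation (5) concrete, the sketch below applies a length-M filter in every time-frequency bin. The array layout `(K, N, M)` and the function names are assumptions for illustration. The helper `wiener_filter` uses the standard textbook form of the multichannel Wiener filter for reference microphone 1, $\mathbf{h}_W = \Phi_y^{-1}\Phi_x \mathbf{i}_1$; that formula is not quoted from the source and is included only as an example of a conventional length-M solution.

```python
import numpy as np

def apply_filter(h, Y):
    """Compute Z(k, n) = h^H(k, n) y(k, n) for every bin k and frame n.

    h, Y: complex arrays of shape (K, N, M) (bins, frames, microphones).
    """
    return np.einsum("knm,knm->kn", h.conj(), Y)

def wiener_filter(phi_y, phi_x):
    """Textbook multichannel Wiener filter for reference microphone 1.

    Solves Phi_y h = Phi_x i_1, where i_1 is the first column of the
    M x M identity, so Phi_x i_1 is simply the first column of Phi_x.
    """
    return np.linalg.solve(phi_y, phi_x[:, 0])
```

Solving the M x M system is the step whose cost and sample requirements grow with the number of microphones.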
Estimating the statistical characteristics of the frequency-domain noisy speech signal includes estimating them by temporal smoothing. Estimating the statistical characteristics of the noise signal includes estimating them according to an existing noise estimation algorithm.
Since the speech signal and the noise are uncorrelated, the variance of $Z(k,n)$ can be expressed as:

$$\Phi_Z(k,n) = \mathbf{h}^H(k,n)\,\Phi_y(k,n)\,\mathbf{h}(k,n) = \mathbf{h}^H(k,n)\,\Phi_x(k,n)\,\mathbf{h}(k,n) + \mathbf{h}^H(k,n)\,\Phi_v(k,n)\,\mathbf{h}(k,n) \tag{7}$$

where $\Phi_a(k,n) = E[\mathbf{a}(k,n)\,\mathbf{a}^H(k,n)]$, $\mathbf{a}(k,n) \in \{\mathbf{y}(k,n), \mathbf{x}(k,n), \mathbf{v}(k,n)\}$. In general, $\Phi_y(k,n)$ can be estimated by temporal smoothing, and $\Phi_v(k,n)$ can be obtained with a noise estimation method from the prior art. Once estimates of $\Phi_y(k,n)$ and $\Phi_v(k,n)$ are available, $\Phi_x(k,n)$ is obtained as $\Phi_y(k,n) - \Phi_v(k,n)$.
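One common way to realise temporal smoothing is a first-order recursive average, sketched below for a single frequency bin. The source does not specify the smoothing rule, so the recursion, the forgetting factor `alpha`, and the function names are assumptions.

```python
import numpy as np

def smooth_covariance(Y, alpha=0.9):
    """Recursive estimate of Phi_y for one frequency bin.

    Y: complex array of shape (n_frames, M), one row y(k, n) per frame.
    Returns shape (n_frames, M, M) with
    Phi(n) = alpha * Phi(n - 1) + (1 - alpha) * y(n) y(n)^H and Phi(-1) = 0.
    """
    n_frames, M = Y.shape
    phi = np.zeros((M, M), dtype=complex)
    out = np.empty((n_frames, M, M), dtype=complex)
    for n in range(n_frames):
        phi = alpha * phi + (1 - alpha) * np.outer(Y[n], Y[n].conj())
        out[n] = phi
    return out

def speech_covariance(phi_y, phi_v):
    """Phi_x = Phi_y - Phi_v, valid because speech and noise are uncorrelated."""
    return phi_y - phi_v
```

A smaller `alpha` forgets old frames faster, which helps tracking but makes the estimate noisier; the larger the matrix, the more frames are needed for a usable estimate.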
To derive the method of the invention, the microphone array is divided into $M_2$ sub-arrays, each with $M_1$ microphones, i.e. $M = M_1 M_2$. Microphones 1 to $M_1$ form the first sub-array, microphones $M_1 + 1$ to $2M_1$ form the second sub-array, and so on. In the present invention, we assume $M_1 \le M_2$. The filter $\mathbf{h}(k,n)$ can be decomposed in the same manner, i.e.
Figure BDA0003437550130000081
where
Figure BDA0003437550130000082
The sub-filters $\mathbf{h}_m(k,n)$, $m = 1, 2, \ldots, M_2$, can then be arranged into a matrix of dimension $M_1 \times M_2$, namely:

$$\mathbf{H}(k,n) = [\mathbf{h}_1(k,n), \mathbf{h}_2(k,n), \ldots, \mathbf{h}_{M_2}(k,n)] \tag{10}$$

Note that $\mathbf{h}(k,n) = \mathrm{vec}[\mathbf{H}(k,n)]$, where $\mathrm{vec}(\cdot)$ denotes the vectorization operator of a matrix. For simplicity, the indices $k$ and $n$ are dropped below wherever no ambiguity arises. Applying singular value decomposition (SVD), the matrix $\mathbf{H}$ can be decomposed as:
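The sub-array partition and the relation $\mathbf{h} = \mathrm{vec}(\mathbf{H})$ amount to a column-major reshape, as the following sketch shows. The helper names and the toy sizes `M1 = 2`, `M2 = 3` are illustrative assumptions.

```python
import numpy as np

M1, M2 = 2, 3            # microphones per sub-array, number of sub-arrays
M = M1 * M2

def to_subfilter_matrix(h, M1, M2):
    """Arrange a length-M filter into the M1 x M2 matrix H = [h_1, ..., h_M2]."""
    return np.asarray(h).reshape((M1, M2), order="F")

def vec(H):
    """Column-stacking vectorisation, the inverse of to_subfilter_matrix."""
    return np.asarray(H).reshape(-1, order="F")

h = np.arange(1, M + 1)  # stand-in coefficients H_1 ... H_6
H = to_subfilter_matrix(h, M1, M2)
```

Column `m` of `H` holds the coefficients applied to the microphones of sub-array `m`, so here the first sub-array uses coefficients 1 and 2, the second 3 and 4, and the third 5 and 6.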
Figure BDA0003437550130000083
where
Figure BDA0003437550130000084
is an $M_1 \times M_1$ matrix,
Figure BDA0003437550130000091
is an $M_2 \times M_2$ matrix, $\mathbf{H}_1$ and $\mathbf{H}_2$ are two orthogonal matrices, and $\boldsymbol{\Sigma}$ is an $M_1 \times M_2$ diagonal matrix whose diagonal elements are non-negative real numbers. In this application, they are arranged in descending order, i.e.
Figure BDA0003437550130000092
The superscript $H$ denotes the conjugate transpose.
The noisy speech signals received by the channels are strongly correlated, so the sub-filters $\mathbf{h}_m(k,n)$, $m = 1, 2, \ldots, M_2$, are usually strongly correlated as well, and the matrix $\mathbf{H}$ is therefore usually not of full row rank. $\mathbf{H}$ can usually be well approximated by its $P$ largest singular values and the corresponding singular vectors, i.e.:
Figure BDA0003437550130000093
where
Figure BDA0003437550130000094
It should be noted that the ambiguity arising from
Figure BDA0003437550130000095
has no effect on the matrix $\mathbf{H}$. Accordingly, the filter $\mathbf{h}$ can be approximately expressed as:
Figure BDA0003437550130000096
It should be noted that when $P = M_1$, $\mathbf{h}_P = \mathbf{h}$.
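The rank-P approximation described above can be sketched with a truncated SVD. The function name `low_rank_filter` is an assumption; the truncation itself is the standard best rank-P approximation in the Frobenius norm.

```python
import numpy as np

def low_rank_filter(h, M1, M2, P):
    """Approximate h = vec(H) by keeping the P largest singular values of H."""
    H = np.asarray(h).reshape((M1, M2), order="F")
    U, s, Vh = np.linalg.svd(H, full_matrices=False)   # s is sorted in descending order
    H_P = (U[:, :P] * s[:P]) @ Vh[:P]                  # sum of the first P rank-one terms
    return H_P.reshape(-1, order="F")
```

With $M_1 \le M_2$ the matrix has at most $M_1$ non-zero singular values, which is why $P = M_1$ reproduces the original filter exactly. For smaller `P`, the approximation error equals the root of the sum of the discarded squared singular values.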
Applying the relation:
Figure BDA0003437550130000097
$\mathbf{h}_P$ can be written as:
Figure BDA0003437550130000098
where
Figure BDA0003437550130000099
is of size $M \times M_2$, and
Figure BDA00034375501300000910
is of size $M \times M_1$. The output $Z(k,n)$ of the filter can then be written as:
Figure BDA0003437550130000101
where
Figure BDA0003437550130000102
Figure BDA0003437550130000103
Figure BDA0003437550130000104
Figure BDA0003437550130000105
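$$\mathbf{H}_{\sigma 1,P} = [\mathbf{H}_{\sigma 1,1}\ \mathbf{H}_{\sigma 1,2}\ \cdots\ \mathbf{H}_{\sigma 1,P}]^H \tag{24}$$

$$\mathbf{H}_{\sigma 2,P} = [\mathbf{H}_{\sigma 2,1}\ \mathbf{H}_{\sigma 2,2}\ \cdots\ \mathbf{H}_{\sigma 2,P}]^H \tag{25}$$

The quantities $\underline{\mathbf{h}}_{\sigma 1,P}$, $\underline{\mathbf{h}}_{\sigma 2,P}$, $\mathbf{y}_{\sigma 1,P}(t)$, $\mathbf{y}_{\sigma 2,P}(t)$, $\mathbf{H}_{\sigma 1,P}$ and $\mathbf{H}_{\sigma 2,P}$ are of size $M_1P \times 1$, $M_2P \times 1$, $M_2P \times 1$, $M_1P \times 1$, $M_2P \times M$ and $M_1P \times M$, respectively. It can be seen that when the parameter $P$ is small, the sub-filters $\underline{\mathbf{h}}_{\sigma 1,P}$ and $\underline{\mathbf{h}}_{\sigma 2,P}$ are much shorter than the filter $\mathbf{h}$.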
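The Kronecker-product structure behind these expressions can be checked numerically. In the sketch below, the columns of `A` (size $M_1 \times P$) and `B` (size $M_2 \times P$) stand in for the scaled singular vectors, and the convention $\mathbf{h}_P = \sum_p \mathbf{b}_p^* \otimes \mathbf{a}_p = \mathrm{vec}(\mathbf{A}\mathbf{B}^H)$ is an assumption: the patent's exact expressions are in equation images that are not reproduced here, and its conjugation conventions may differ.

```python
import numpy as np

def kron_filter(A, B):
    """Build h_P = sum_p conj(b_p) kron a_p from A (M1 x P) and B (M2 x P)."""
    P = A.shape[1]
    return sum(np.kron(B[:, p].conj(), A[:, p]) for p in range(P))

def bilinear_output(A, B, y):
    """Filter output Z = h_P^H y computed without ever forming h_P.

    y is reshaped column-wise into the M1 x M2 matrix Y (one column per
    sub-array), and Z = sum_p a_p^H Y b_p.
    """
    M1, P = A.shape
    M2 = B.shape[0]
    Y = np.asarray(y).reshape((M1, M2), order="F")
    return sum(A[:, p].conj() @ Y @ B[:, p] for p in range(P))
```

Each term first filters within the sub-arrays with the short vector `a_p` and then combines the sub-array outputs with `b_p`, which is the sense in which two short sub-filters replace one long filter.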
The mean square error (MSE) between the desired signal $X_1$ and its estimate $Z$ is
Figure BDA0003437550130000106
where
Figure BDA0003437550130000107
$E(\cdot)$ denotes the mathematical expectation,
Figure BDA0003437550130000108
denotes the real part, and the superscript $*$ denotes the complex conjugate.
To derive the filter in the present invention, the MSE is written as follows:
Figure BDA0003437550130000109
where
Figure BDA0003437550130000111
Figure BDA0003437550130000112
Figure BDA0003437550130000113
Figure BDA0003437550130000114
It should be noted that when the parameter $P$ is small, the dimensions of the matrices $\Phi_{y\sigma 1,p}$ ($M_2P \times M_2P$) and $\Phi_{y\sigma 2,p}$ ($M_1P \times M_1P$) are much smaller than the dimension of the matrix $\Phi_y$ ($M \times M$).
This brings two advantages:
1) Compared with solving the conventional multi-channel speech noise reduction filter from the inverse of $\Phi_y$, solving the sub-filters $\underline{\mathbf{h}}_{\sigma 1,P}$ and $\underline{\mathbf{h}}_{\sigma 2,P}$ from the inverses of $\Phi_{y\sigma 1,p}$ and $\Phi_{y\sigma 2,p}$ requires significantly lower complexity;
2) Compared with estimating the matrix $\Phi_y$, the matrices $\Phi_{y\sigma 1,p}$ and $\Phi_{y\sigma 2,p}$ can be estimated from fewer signal observation samples, so the sub-filters $\underline{\mathbf{h}}_{\sigma 1,P}$ and $\underline{\mathbf{h}}_{\sigma 2,P}$ can track changes in the statistical characteristics of the signal more quickly.
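A back-of-the-envelope comparison illustrates both advantages. The cubic cost model for matrix inversion and the example sizes are assumptions for illustration; they are not the figures behind Fig. 4 of the source, and the proposed cost below is for a single iteration only.

```python
def inversion_cost(dim):
    """Rough cost model: inverting a dim x dim matrix takes about dim**3 operations."""
    return dim ** 3

def compare(M1, M2, P):
    """Return per-solve inversion costs and covariance sizes for both approaches."""
    M = M1 * M2
    conventional = inversion_cost(M)
    # One iteration inverts an (M1*P) x (M1*P) and an (M2*P) x (M2*P) matrix.
    proposed = inversion_cost(M1 * P) + inversion_cost(M2 * P)
    entries_conventional = M * M
    entries_proposed = (M1 * P) ** 2 + (M2 * P) ** 2
    return conventional, proposed, entries_conventional, entries_proposed
```

For example, with 64 microphones split as `M1 = M2 = 8` and `P = 2`, `compare(8, 8, 2)` gives 262144 versus 8192 operations per solve and 4096 versus 512 covariance entries to estimate. The total cost of the iterative scheme also scales with the number of iterations, which this model leaves out.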
Solving the approximate filter includes obtaining the Wiener filter iteratively.
Based on equations (27) and (28), it is difficult to derive closed-form solutions for the sub-filters $\underline{\mathbf{h}}_{\sigma 1,P}$ and $\underline{\mathbf{h}}_{\sigma 2,P}$. The invention therefore solves for them iteratively: when solving for one of the sub-filters, the other sub-filter is assumed to be fixed, i.e.
Figure BDA0003437550130000115
Figure BDA0003437550130000116
The sub-filter $\underline{\mathbf{h}}_{\sigma 1,P}$ is initialized as:
Figure BDA0003437550130000117
where
Figure BDA0003437550130000118
Figure BDA0003437550130000119
Figure BDA00034375501300001110
$\mathbf{x}_p$ is defined similarly to $\mathbf{y}_p$. It can be seen that $\mathbf{h}_{\sigma 1,W,p}$ is the Wiener filter of the $p$-th sub-array, of length $M_1$.
Using
Figure BDA00034375501300001111
to construct
Figure BDA00034375501300001112
and substituting into equations (29) and (30) gives
Figure BDA0003437550130000121
Figure BDA0003437550130000122
Substituting equations (38) and (39) into equation (34) gives:
Figure BDA0003437550130000123
Differentiating equation (40) with respect to
Figure BDA0003437550130000124
and setting the result to zero gives, for the sub-filter
Figure BDA0003437550130000125
the Wiener solution:
Figure BDA0003437550130000126
Using
Figure BDA0003437550130000127
to construct
Figure BDA0003437550130000128
and substituting into equations (31) and (32) gives:
Figure BDA0003437550130000129
Figure BDA00034375501300001210
Substituting
Figure BDA00034375501300001211
and
Figure BDA00034375501300001212
into equation (33) gives:
Figure BDA00034375501300001213
Based on (44), for the sub-filter
Figure BDA00034375501300001214
the Wiener solution is obtained:
Figure BDA00034375501300001215
Proceeding in this manner, at the n-th iteration we have:
Figure BDA00034375501300001216
where
Figure BDA00034375501300001217
Figure BDA00034375501300001218
Figure BDA00034375501300001219
The iterative Wiener filter of the present application is then obtained:
Figure BDA00034375501300001220
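The alternating idea described above (fix one set of sub-filters, solve a Wiener-type problem for the other, then swap) can be sketched as follows. This is an illustrative sketch under stated assumptions, not a transcription of equations (29) to (46), which are not reproduced here: it assumes the full filter has the Kronecker form h = Σ_p h2_p ⊗ h1_p, that the output is z = h^H y, and that the second sub-filters start as unit vectors. The names `iterative_kron_wiener`, `mse`, `demo_statistics` and all parameters are hypothetical.

```python
import numpy as np

def mse(h, phi_y, r, phi_d):
    """Mean square error of z = h^H y against the desired signal."""
    return float(np.real(phi_d - 2 * np.vdot(h, r) + np.vdot(h, phi_y @ h)))

def iterative_kron_wiener(phi_y, r, m1, m2, rank=1, n_iter=5):
    """Alternately solve for the two sets of sub-filters (requires rank <= m2).

    phi_y: (m1*m2, m1*m2) covariance of the noisy observation.
    r:     (m1*m2,) cross-correlation between observation and desired signal.
    Returns the last full-length filter and the filter after each iteration.
    """
    eye1, eye2 = np.eye(m1), np.eye(m2)
    # Assumed start: the p-th second sub-filter is the p-th unit vector.
    h2 = [eye2[:, p].astype(complex) for p in range(rank)]
    history = []
    for _ in range(n_iter):
        # Second sub-filters fixed: h = a2 @ (stacked h1), a2 = [kron(h2_p, I)].
        a2 = np.hstack([np.kron(v[:, None], eye1) for v in h2])
        s1 = np.linalg.lstsq(a2.conj().T @ phi_y @ a2, a2.conj().T @ r, rcond=None)[0]
        h1 = np.split(s1, rank)
        # First sub-filters fixed: h = a1 @ (stacked h2), a1 = [kron(I, h1_p)].
        a1 = np.hstack([np.kron(eye2, u[:, None]) for u in h1])
        s2 = np.linalg.lstsq(a1.conj().T @ phi_y @ a1, a1.conj().T @ r, rcond=None)[0]
        h2 = np.split(s2, rank)
        history.append(a1 @ s2)  # equals sum_p kron(h2_p, h1_p)
    return history[-1], history

def demo_statistics(m, seed=0):
    """Synthetic statistics: one unit-variance source plus coloured noise."""
    rng = np.random.default_rng(seed)
    d = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    d /= d[0]  # reference microphone has unit gain
    a = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    phi_v = a @ a.conj().T / m + 0.1 * np.eye(m)
    phi_x = np.outer(d, d.conj())
    return phi_x + phi_v, phi_x[:, 0], 1.0
```

Each half-step only solves a system of size `rank * m1` or `rank * m2` instead of `m1 * m2`, which is where the complexity saving comes from. Because each half-step minimizes the same quadratic error over a subset of the variables, the error cannot increase from one iteration to the next.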
An embodiment of the invention provides a voice noise reduction device based on a microphone array which, as shown in Fig. 2, comprises a signal acquisition module 201, a signal preprocessing module 202, a statistical characteristic estimation module 203, a sub-filter determination module 204, a frequency domain noise reduction filter determination module 205, a noise reduction module 206 and a time domain noise reduction voice signal determination module 207. The signal acquisition module 201 is configured to acquire a noisy voice signal. The signal preprocessing module 202 is configured to preprocess the noisy voice signal and determine a frequency domain noisy voice signal. The statistical characteristic estimation module 203 is configured to estimate the statistical characteristics of the frequency domain noisy voice signal and the statistical characteristics of the noise signal. The sub-filter determination module 204 is configured to divide the microphone array into a plurality of sub-arrays and estimate a plurality of sub-filters respectively. The frequency domain noise reduction filter determination module 205 is configured to determine a frequency domain noise reduction filter according to the plurality of sub-filters. The noise reduction module 206 is configured to perform noise reduction on the frequency domain noisy voice signal according to the frequency domain noise reduction filter and determine a frequency domain noise-reduced voice signal. The time domain noise reduction voice signal determination module 207 is configured to convert the frequency domain noise-reduced voice signal into a time domain noise-reduced voice signal.
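The data flow through these modules can be sketched in a few lines. The sketch below is only an assumption-laden outline: it covers preprocessing (framing, windowing, FFT), applying a per-frequency filter as z(k, t) = h(k)^H y(k, t), and conversion back to the time domain by overlap-add. The statistics estimation and sub-filter determination steps are not implemented; a ready-made `filters` array stands in for their result. The function names, the square-root Hann window and the frame sizes are hypothetical choices, not taken from the application.

```python
import numpy as np

def _window(n_fft):
    # Periodic square-root Hann: squared windows sum to 1 at 50 % overlap.
    return np.sqrt(np.hanning(n_fft + 1)[:-1])

def stft(x, n_fft=256, hop=128):
    """x: (channels, samples) -> spectrum of shape (channels, frames, bins)."""
    win = _window(n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    frames = np.stack(
        [x[:, t * hop:t * hop + n_fft] * win for t in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=-1)

def istft(spec, n_fft=256, hop=128):
    """spec: (frames, bins) -> time signal rebuilt by overlap-add."""
    frames = np.fft.irfft(spec, n=n_fft, axis=-1) * _window(n_fft)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for t, frame in enumerate(frames):
        out[t * hop:t * hop + n_fft] += frame
    return out

def denoise(noisy, filters, n_fft=256, hop=128):
    """Apply one filter per frequency bin; filters has shape (bins, channels)."""
    spec = stft(noisy, n_fft, hop)                          # preprocessing
    z = np.einsum("km,mtk->tk", filters.conj(), spec)       # z = h^H y per bin
    return istft(z, n_fft, hop)                             # back to time domain
```

With a filter that simply selects the first microphone in every bin, the output reproduces that channel away from the signal edges, which is a convenient check that the analysis and synthesis stages are consistent before a real noise reduction filter is plugged in.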
Fig. 4 compares the complexity of the method provided by the present application with that of the conventional method. Fig. 5 shows the mean square error of the method provided by the present application as a function of the number of iterations. Fig. 6 shows the mean square error of the method provided by the present application and of the conventional method as a function of time when the statistical properties of the noise change suddenly. The method provided by the application effectively reduces complexity and improves the ability of the filter to track changes in the statistical characteristics of the voice signal and the noise.
An embodiment of the invention provides a server for voice noise reduction based on a microphone array which, as shown in Fig. 3, comprises a memory 301 and a processor 302; the memory 301 is used to store computer-executable instructions; the processor 302 is used to execute the computer-executable instructions.
The embodiment of the invention provides a computer-readable storage medium, wherein the computer-readable storage medium stores executable instructions, and the computer can execute the executable instructions.
The storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a cache, a Hard Disk Drive (HDD) or a memory card. The memory may be used to store computer program instructions.
Although the present application provides method steps as described in an embodiment or flowchart, more or fewer steps may be included based on conventional or non-inventive effort. The sequence of steps recited in this embodiment is only one of many possible execution orders and does not represent the only order of execution. When an actual apparatus or client product executes, the steps may be executed sequentially or in parallel (e.g., in a parallel-processor or multi-threaded environment) according to the method shown in this embodiment or the figures.
The apparatuses or modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by function, and the modules are described separately. When implementing the present application, the functionality of the modules may be implemented in one or more pieces of software and/or hardware. Of course, a module that implements a certain function may also be implemented by a combination of several sub-modules or sub-units.
The methods, apparatus or modules described herein may be implemented in computer readable program code embodied in a controller in any suitable manner. For example, the controller may take the form of a microprocessor or processor and a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, Application Specific Integrated Circuits (ASICs), programmable logic controllers and embedded microcontrollers, examples of which include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. The memory controller may also be implemented as part of the control logic for the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming the method steps such that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be considered a hardware component, and the means included therein for performing the various functions may also be considered structures within the hardware component. The means for performing the functions may even be regarded both as software modules for performing the method and as structures within the hardware component.
Some of the modules in the apparatus described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus the necessary hardware. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, and may also be embodied in the implementation process of data migration. The computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk or an optical disk, and includes instructions for causing a computing device (which may be a personal computer, mobile terminal, server, network device, etc.) to perform the methods described in the embodiments or in portions of the embodiments.
The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. All or portions of the present application are operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the present application; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: modifications can be made to the technical solutions described in the foregoing embodiments, or some or all of the technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the present disclosure.

Claims (8)

1. A speech noise reduction method based on a microphone array, characterized by comprising:
acquiring a voice signal with noise;
preprocessing the voice signal with noise to determine a frequency domain voice signal with noise;
estimating the statistical characteristic of the frequency domain voice signal with noise, and estimating the statistical characteristic of the noise signal;
dividing a microphone array into a plurality of sub-arrays, and respectively estimating a plurality of sub-filters;
determining a frequency domain noise reduction filter according to the plurality of sub-filters;
carrying out noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter to determine a frequency domain noise reduction voice signal;
and converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.
2. The method of claim 1, wherein the preprocessing of the voice signal with noise comprises: framing and windowing the voice signal with noise, and then performing a fast Fourier transform.
3. The method according to claim 1, wherein said estimating the statistical properties of the frequency-domain noisy speech signal comprises estimating the statistical properties of the noisy speech signal according to a time-smoothed estimation.
4. The method of claim 1, wherein estimating the statistical properties of the noise signal comprises estimating the statistical properties of the noise signal according to an existing noise estimation algorithm.
5. The method of claim 1, wherein the dividing the microphone array into a plurality of sub-arrays and estimating a plurality of sub-filters separately comprises iteratively estimating the plurality of sub-filters using a low rank architecture of a noise reduction filter.
6. A speech noise reduction device based on a microphone array, characterized by comprising:
the signal acquisition module is used for acquiring a voice signal with noise;
the signal preprocessing module is used for preprocessing the voice signal with the noise and determining a frequency domain voice signal with the noise;
the statistical characteristic estimation module is used for estimating the statistical characteristic of the frequency domain voice signal with noise and estimating the statistical characteristic of the noise signal;
the sub-filter determining module is used for dividing the microphone array into a plurality of sub-arrays and respectively estimating a plurality of sub-filters;
a frequency domain noise reduction filter determining module, configured to determine a frequency domain noise reduction filter according to the plurality of sub-filters;
the noise reduction module is used for carrying out noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter and determining a frequency domain noise reduction voice signal;
and the time domain noise reduction voice signal determination module is used for converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.
7. A microphone array based speech noise reduction server comprising a memory and a processor;
the memory is to store computer-executable instructions;
the processor is configured to execute the computer-executable instructions to implement the method of any of claims 1-5.
8. A computer-readable storage medium having stored thereon executable instructions that, when executed by a computer, are capable of implementing the method of any one of claims 1-5.
CN202111621218.5A 2021-12-28 2021-12-28 Voice noise reduction method and device based on microphone array and storage medium Pending CN114373475A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111621218.5A CN114373475A (en) 2021-12-28 2021-12-28 Voice noise reduction method and device based on microphone array and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111621218.5A CN114373475A (en) 2021-12-28 2021-12-28 Voice noise reduction method and device based on microphone array and storage medium

Publications (1)

Publication Number Publication Date
CN114373475A true CN114373475A (en) 2022-04-19

Family

ID=81142867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111621218.5A Pending CN114373475A (en) 2021-12-28 2021-12-28 Voice noise reduction method and device based on microphone array and storage medium

Country Status (1)

Country Link
CN (1) CN114373475A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5917919A (en) * 1995-12-04 1999-06-29 Rosenthal; Felix Method and apparatus for multi-channel active control of noise or vibration or of multi-channel separation of a signal from a noisy environment
WO2006114100A1 (en) * 2005-04-26 2006-11-02 Aalborg Universitet Estimation of signal from noisy observations
CN110517701A (en) * 2019-07-25 2019-11-29 华南理工大学 A kind of microphone array voice enhancement method and realization device
CN112802490A (en) * 2021-03-11 2021-05-14 北京声加科技有限公司 Beam forming method and device based on microphone array
CN113409804A (en) * 2020-12-22 2021-09-17 声耕智能科技(西安)研究院有限公司 Multichannel frequency domain speech enhancement algorithm based on variable-span generalized subspace

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIANGHUI WANG ET AL: "Multichannel Iterative Noise Reduction Filters in the Short-Time-Fourier-Transform Domain Based on Kronecker Product Decomposition", IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING ( VOLUME: 29), pages 2725 - 2739 *
何成林, 杜利民, 马昕: "麦克风阵列语音增强的研究", 计算机工程与应用, no. 24 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination