CN114373475A

CN114373475A - Voice noise reduction method and device based on microphone array and storage medium

Info

Publication number: CN114373475A
Application number: CN202111621218.5A
Authority: CN
Inventors: 王向辉; 高朴; 韩冬; 陈捷; 王瑞琪; 王姣; 李梅
Original assignee: Shaanxi University of Science and Technology
Current assignee: Shaanxi University of Science and Technology
Priority date: 2021-12-28
Filing date: 2021-12-28
Publication date: 2022-04-19

Abstract

The application discloses a voice noise reduction method based on a microphone array. It addresses two problems of the prior art: the complexity of solving the filter increases rapidly with the filter length, and the ability to track changes in the statistical characteristics of the speech signal and the noise is reduced. The method comprises the following steps: acquiring a noisy speech signal; preprocessing the noisy speech signal to determine a frequency-domain noisy speech signal; estimating the statistical characteristics of the frequency-domain noisy speech signal and of the noise signal; dividing the microphone array into a plurality of sub-arrays, estimating a plurality of sub-filters respectively, and determining a frequency-domain noise reduction filter; and performing noise reduction on the frequency-domain noisy speech signal with the frequency-domain noise reduction filter and converting the result into a time-domain noise-reduced speech signal. As a result, the signal covariance matrices required in solving the filter have smaller dimensions, the complexity of the voice noise reduction filter is significantly reduced, and the filter's ability to track changes in the statistical characteristics of the speech signal and the noise is improved.

Description

Voice noise reduction method and device based on microphone array and storage medium

Technical Field

The present disclosure relates to the field of microphone arrays, and in particular, to a method and an apparatus for reducing noise of speech based on a microphone array, and a storage medium.

Background

Voice noise reduction plays a significant role in systems such as intelligent voice, human-machine interaction, teleconferencing, hearing aids, in-vehicle systems, virtual reality, in-situ communication, and military voice communication under extremely high background noise. Its performance directly affects the voice interaction experience.

Early voice interaction systems were usually equipped with only one microphone, and the corresponding noise reduction methods were single-channel methods. Single-channel voice noise reduction is simple to implement and computationally efficient, and it achieves a certain effect, but it has significant limitations. Research shows that under certain conditions single-channel noise reduction introduces voice distortion, and the larger the signal-to-noise ratio is, the larger the introduced voice distortion is. In contrast, multi-channel voice noise reduction methods have the potential to improve the signal-to-noise ratio significantly while introducing little or no voice distortion. Classic multi-channel methods include multi-channel Wiener filtering, multi-channel tradeoff filtering, minimum variance distortionless response (MVDR) filtering, linearly constrained minimum variance (LCMV) filtering, and generalized sidelobe cancellation (GSC). In recent years, researchers have proposed voice noise reduction methods based on deep learning, which can achieve better performance; however, because their generalization capability is generally weak, they are currently difficult to deploy widely in practical systems.

To achieve better voice noise reduction performance, more microphones are usually required to obtain richer space-time-frequency information. But this also generally means that longer filters need to be designed. The use of longer filters brings about the following two problems. First, the complexity of solving the filter increases rapidly with increasing filter length; second, the dimension of the signal covariance matrix required in the filter solution process is larger, so more observation samples are required to estimate the signal covariance matrix for calculating the filter coefficients, which results in a reduced ability to track changes in the statistical characteristics of speech signals and noise, and fails to better handle the non-stationary noise that is common in practice.

Disclosure of Invention

The embodiments of the application provide a voice noise reduction method based on a microphone array that solves two problems caused by long filters in the prior art. First, the complexity of solving the filter increases rapidly with the filter length. Second, the signal covariance matrix required in solving the filter has a larger dimension, so more observation samples are needed to estimate it when calculating the filter coefficients; this reduces the ability to track changes in the statistical characteristics of the speech signal and the noise, and the non-stationary noise that is common in practice cannot be handled well. The method and the device significantly reduce the complexity of solving the filter, and the signal covariance matrices required in solving the filter have smaller dimensions, so they can be estimated from fewer signal observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.

In a first aspect, an embodiment of the present invention provides a speech noise reduction method based on a microphone array, where the method includes:

acquiring a voice signal with noise;

preprocessing the voice signal with noise to determine a frequency domain voice signal with noise;

estimating the statistical characteristic of the frequency domain voice signal with noise, and estimating the statistical characteristic of the noise signal;

dividing a microphone array into a plurality of sub-arrays, and respectively estimating a plurality of sub-filters;

determining a frequency domain noise reduction filter according to the plurality of sub-filters;

performing noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter to determine a frequency domain noise reduction voice signal;

and converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.

With reference to the first aspect, in a possible implementation manner, preprocessing the noisy speech signal includes: framing and windowing the noisy speech signal, and then performing a fast Fourier transform.

With reference to the first aspect, in a possible implementation manner, estimating the statistical characteristics of the frequency-domain noisy speech signal includes: estimating the statistical characteristics of the noisy speech signal by temporal smoothing.

With reference to the first aspect, in a possible implementation manner, estimating the statistical characteristics of the noise signal includes: estimating the statistical characteristics of the noise signal with an existing noise estimation algorithm.

With reference to the first aspect, in a possible implementation manner, dividing the microphone array into a plurality of sub-arrays and respectively estimating the plurality of sub-filters includes: iteratively estimating the plurality of sub-filters by exploiting a low-rank structure of the noise reduction filter.

In a second aspect, an embodiment of the present invention provides a voice noise reduction apparatus based on a microphone array, comprising:

the signal acquisition module is used for acquiring a voice signal with noise;

the signal preprocessing module is used for preprocessing the voice signal with noise and determining a frequency domain voice signal with noise;

the statistical characteristic estimation module is used for estimating the statistical characteristic of the frequency domain voice signal with noise and estimating the statistical characteristic of the noise signal;

the sub-filter determining module is used for dividing the microphone array into a plurality of sub-arrays and respectively estimating a plurality of sub-filters;

a frequency domain noise reduction filter determining module, configured to determine a frequency domain noise reduction filter according to the plurality of sub-filters;

the noise reduction module is used for carrying out noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter and determining a frequency domain noise reduction voice signal;

and the time domain noise reduction voice signal determination module is used for converting the frequency domain noise reduction voice signal into a time domain noise reduction voice signal.

With reference to the second aspect, in a possible implementation manner, the signal preprocessing module is configured to frame and window the noisy speech signal and then perform a fast Fourier transform.

With reference to the second aspect, in a possible implementation manner, the statistical characteristic estimation module is configured to estimate the statistical characteristics of the noisy speech signal by temporal smoothing.

With reference to the second aspect, in a possible implementation manner, the statistical characteristic estimation module is configured to estimate the statistical characteristics of the noise signal with an existing noise estimation algorithm.

With reference to the second aspect, in a possible implementation manner, the frequency domain noise reduction filter determining module is configured to iteratively estimate the plurality of sub-filters by exploiting a low-rank structure of the noise reduction filter.

In a third aspect, an embodiment of the present invention provides a voice noise reduction server based on a microphone array, including a memory and a processor;

the memory is to store computer-executable instructions;

the processor is configured to execute the computer-executable instructions to implement the method according to the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium that stores executable instructions; when a computer executes the executable instructions, the method according to any implementation of the first aspect is implemented.

One or more technical schemes provided in the embodiment of the invention have at least the following technical effects or advantages:

The embodiment of the invention adopts a voice noise reduction method based on a microphone array, which comprises: acquiring a noisy speech signal; preprocessing the noisy speech signal to determine a frequency-domain noisy speech signal; estimating the statistical characteristics of the frequency-domain noisy speech signal and of the noise signal; dividing the microphone array into a plurality of sub-arrays and respectively estimating a plurality of sub-filters; determining a frequency-domain noise reduction filter from the plurality of sub-filters; performing noise reduction on the frequency-domain noisy speech signal with the frequency-domain noise reduction filter to determine a frequency-domain noise-reduced speech signal; and converting the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal. This effectively solves the two problems caused by long filters in the prior art. First, the complexity of solving the filter increases rapidly with the filter length. Second, the signal covariance matrix required in solving the filter has a larger dimension, so more observation samples are needed to estimate it when calculating the filter coefficients; this reduces the ability to track changes in the statistical characteristics of the speech signal and the noise, and the non-stationary noise that is common in practice cannot be handled well. The embodiment of the invention significantly reduces the complexity of solving the filter, and the signal covariance matrices required in solving the filter have smaller dimensions, so they can be estimated from fewer signal observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments of the present invention or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flowchart of the steps of a microphone-array-based voice noise reduction method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a microphone-array-based voice noise reduction apparatus according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a microphone-array-based voice noise reduction server according to an embodiment of the present application;

FIG. 4 is a graph comparing the complexity of the method provided by an embodiment of the present application with the complexity of a conventional method;

FIG. 5 is a plot of the mean square error of the method provided by an embodiment of the present application as a function of the number of iterations;

FIG. 6 is a graph comparing the mean square error over time of the method provided by an embodiment of the present application and of the conventional method when the statistical characteristics of the noise change suddenly.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It should be apparent that the described embodiments are only some of the embodiments of the present invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In early voice interactive systems, only one microphone was usually provided, and the corresponding voice noise reduction method was single-channel voice noise reduction. The single-channel voice noise reduction method has the advantages of simplicity in implementation, high operation efficiency and the like, can obtain a certain effect, and has great limitations. Research shows that under certain conditions, voice distortion is introduced into single-channel noise reduction, and the larger the signal-to-noise ratio is, the larger the introduced voice distortion is. In contrast, the multi-channel speech noise reduction method has more potential, and the signal-to-noise ratio is remarkably improved on the premise of introducing little or no speech distortion. Multi-channel speech noise reduction typically requires more microphones to be equipped to acquire richer space-time-frequency information. But this in turn leads to two problems, first, the complexity of solving the filter increases rapidly with increasing filter length; second, the dimension of the signal covariance matrix required in the filter solution process is larger, and therefore more measurement samples are required to estimate the signal covariance matrix for calculating the filter coefficients, which results in the degradation of its ability to track statistical variations of speech signals and noise, and the non-stationary noise that is common in practice cannot be better handled.

An embodiment of the present invention provides a voice noise reduction method based on a microphone array. As shown in FIG. 1, the method includes the following steps.

Step S101: acquire a noisy speech signal.

Step S102: preprocess the noisy speech signal and determine a frequency-domain noisy speech signal.

Step S103: estimate the statistical characteristics of the frequency-domain noisy speech signal and the statistical characteristics of the noise signal.

Step S104: divide the microphone array into a plurality of sub-arrays and estimate a plurality of sub-filters respectively.

Step S105: determine a frequency-domain noise reduction filter from the plurality of sub-filters.

Step S106: perform noise reduction on the frequency-domain noisy speech signal with the frequency-domain noise reduction filter and determine a frequency-domain noise-reduced speech signal.

Step S107: convert the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal.

Taken together, these steps construct the filter from several short sub-filters instead of computing one very long filter as a whole, as conventional multi-channel voice noise reduction methods do; shorter filters mean fewer filter coefficients. Compared with existing methods, the method provided by the application therefore significantly reduces the complexity of solving the voice noise reduction filter. In addition, the signal covariance matrices required in solving the filter have small dimensions, so they can be estimated from fewer signal observation samples, which improves the filter's ability to track changes in the statistical characteristics of the speech signal and the noise.

In a specific embodiment of the present application, the time-domain noisy speech signal is expressed as

$$y_m(t) = x_m(t) + v_m(t), \quad m = 1, 2, \ldots, M \tag{1}$$

where $y_m(t)$ is the noisy speech signal received by the $m$-th microphone; $x_m(t)$ is the clean speech signal received by the $m$-th microphone; $v_m(t)$ is the background noise signal received by the $m$-th microphone; $t$ is the discrete time index; and $M$ is the number of microphones.

In a specific embodiment of the present application, all signals are assumed to be zero-mean broadband signals, and the speech signal and the noise signal are assumed to be uncorrelated. The purpose of voice noise reduction is to recover the clean speech signal from the noisy speech signal. Without loss of generality, microphone 1 is taken as the reference microphone in the present application, i.e. $x_1(t)$ is the desired signal (the signal to be recovered).

The noisy speech signal is preprocessed by framing and windowing it and then applying a fast Fourier transform, which gives the frequency-domain noisy speech signal:

$$Y_m(k,n) = \sum_{t=0}^{T-1} w(t)\, y_m(t + nL)\, e^{-j 2\pi k t / T} = X_m(k,n) + V_m(k,n) \tag{2}$$

where $w$ is the window function; $T$ is the length of the window function (which is also the length of a speech signal frame); $L$ is the step between two adjacent frames; and the zero-mean random variables $Y_m(k,n)$, $X_m(k,n)$, $V_m(k,n)$ are the Fourier transform values of $y_m(t)$, $x_m(t)$, $v_m(t)$, respectively, in the $k$-th frequency band of the $n$-th frame, where $k \in \{0, 1, \ldots, T-1\}$.
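The framing, windowing, and FFT step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `stft_multichannel`, the choice of a Hann window, and the array layout are assumptions made for the example; the source only specifies a window of length T and a frame step L.

```python
import numpy as np

def stft_multichannel(y, frame_len, hop):
    """Frame, window, and FFT a multichannel signal.

    y         : real array of shape (M, num_samples), one row per microphone
    frame_len : window/frame length T
    hop       : step L between adjacent frames
    Returns a complex array of shape (M, T, N): microphone, frequency bin k, frame n.
    """
    window = np.hanning(frame_len)  # assumed window; the source does not name one
    num_frames = 1 + (y.shape[1] - frame_len) // hop
    # Stack the frames: shape (M, N, T)
    frames = np.stack(
        [y[:, n * hop : n * hop + frame_len] for n in range(num_frames)], axis=1
    )
    spectrum = np.fft.fft(frames * window, axis=-1)  # FFT over the time axis of each frame
    return spectrum.transpose(0, 2, 1)               # reorder to (M, k, n)
```

Because the transform is linear, applying it to a noisy signal gives the sum of the transforms of the speech and noise components, which is what the additive frequency-domain model relies on.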

For convenience, the signal model is written in vector form as

$$\mathbf{y}(k,n) = \mathbf{x}(k,n) + \mathbf{v}(k,n) \tag{3}$$

where

$$\mathbf{y}(k,n) = [Y_1(k,n), Y_2(k,n), \ldots, Y_M(k,n)]^T \tag{4}$$

$\mathbf{x}(k,n)$ and $\mathbf{v}(k,n)$ are defined similarly to $\mathbf{y}(k,n)$, and the superscript $T$ denotes the transpose.

In the conventional method, a filter $\mathbf{h}(k,n)$ of length $M$ usually has to be designed to perform speech noise reduction, that is:

$$Z(k,n) = \mathbf{h}^H(k,n)\,\mathbf{y}(k,n) \tag{5}$$

where

$$\mathbf{h}(k,n) = [H_1(k,n), H_2(k,n), \ldots, H_M(k,n)]^T \tag{6}$$

and $Z(k,n)$ is an estimate of $X_1(k,n)$. However, when $M$ is large, this leads to the two problems described in the Background section.
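Applying a length-M filter to the frequency-domain observations amounts to an inner product over the microphone axis in every time-frequency bin. The sketch below illustrates this; the function name `apply_filter` and the simplification that the filter depends only on the frequency bin (not on the frame index) are assumptions made for the example.

```python
import numpy as np

def apply_filter(h, Y):
    """Compute Z(k, n) = h(k)^H y(k, n) for every bin k and frame n.

    h : complex array of shape (M, K), one length-M filter per frequency bin
    Y : complex array of shape (M, K, N), multichannel spectra
    Returns Z of shape (K, N).
    """
    # conj(h) implements the conjugate transpose; the sum runs over microphones m
    return np.einsum("mk,mkn->kn", h.conj(), Y)
```

As a sanity check, a filter that is 1 on the reference microphone and 0 elsewhere simply returns the reference channel unchanged.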

The statistical characteristics of the frequency-domain noisy speech signal are estimated by temporal smoothing, and the statistical characteristics of the noise signal are estimated with an existing noise estimation algorithm.

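Since the speech signal and the noise are uncorrelated, the variance of $Z(k,n)$ can be expressed as:

$$\begin{aligned}
\Phi_Z(k,n) &= \mathbf{h}^H(k,n)\,\mathbf{\Phi}_{\mathbf{y}}(k,n)\,\mathbf{h}(k,n) \\
&= \mathbf{h}^H(k,n)\,\mathbf{\Phi}_{\mathbf{x}}(k,n)\,\mathbf{h}(k,n) + \mathbf{h}^H(k,n)\,\mathbf{\Phi}_{\mathbf{v}}(k,n)\,\mathbf{h}(k,n)
\end{aligned} \tag{7}$$

where $\mathbf{\Phi}_{\mathbf{a}}(k,n) = E[\mathbf{a}(k,n)\,\mathbf{a}^H(k,n)]$, $\mathbf{a}(k,n) \in \{\mathbf{y}(k,n), \mathbf{x}(k,n), \mathbf{v}(k,n)\}$. In general, $\mathbf{\Phi}_{\mathbf{y}}(k,n)$ can be estimated by temporal smoothing, and $\mathbf{\Phi}_{\mathbf{v}}(k,n)$ can be obtained with a noise estimation method from the prior art. Once estimates of $\mathbf{\Phi}_{\mathbf{y}}(k,n)$ and $\mathbf{\Phi}_{\mathbf{v}}(k,n)$ are available, $\mathbf{\Phi}_{\mathbf{x}}(k,n)$ is obtained as $\mathbf{\Phi}_{\mathbf{y}}(k,n) - \mathbf{\Phi}_{\mathbf{v}}(k,n)$.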
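One common way to realize temporal smoothing is first-order recursive averaging of the outer products of the observation vectors. The sketch below assumes that form; the forgetting factor `alpha`, the zero initialization, and the function names are illustrative assumptions, since the source does not specify the smoothing rule. The noise covariance is taken as given, because the source delegates it to an existing noise estimator.

```python
import numpy as np

def smooth_covariance(Y_bin, alpha=0.9):
    """Recursively smoothed covariance estimates for one frequency bin.

    Y_bin : complex array of shape (M, N), observation vectors y(k, n) for fixed k
    alpha : forgetting factor in [0, 1); larger values average over more frames
    Returns an array of shape (N, M, M) holding the estimate after each frame.
    """
    num_mics, num_frames = Y_bin.shape
    phi = np.zeros((num_mics, num_mics), dtype=complex)
    estimates = np.empty((num_frames, num_mics, num_mics), dtype=complex)
    for n in range(num_frames):
        y = Y_bin[:, n]
        # Phi(n) = alpha * Phi(n-1) + (1 - alpha) * y y^H
        phi = alpha * phi + (1.0 - alpha) * np.outer(y, y.conj())
        estimates[n] = phi
    return estimates

def speech_covariance(phi_y, phi_v):
    """Speech covariance from the uncorrelatedness assumption: Phi_x = Phi_y - Phi_v."""
    return phi_y - phi_v
```

A smaller `alpha` reacts faster to changes in the signal statistics but gives noisier estimates; the number of frames needed for a well-conditioned estimate grows with the matrix dimension.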

To derive the method of the invention, the microphone array is divided into $M_2$ sub-arrays, each with $M_1$ microphones, i.e. $M = M_1 M_2$: microphones 1 to $M_1$ form the first sub-array, microphones $M_1+1$ to $2M_1$ form the second sub-array, and so on. In the present invention, it is assumed that $M_1 \le M_2$. The filter $\mathbf{h}(k,n)$ can be partitioned in the same way, i.e.

$$\mathbf{h}(k,n) = [\mathbf{h}_1^T(k,n), \mathbf{h}_2^T(k,n), \ldots, \mathbf{h}_{M_2}^T(k,n)]^T \tag{8}$$

where

$$\mathbf{h}_m(k,n) = [H_{(m-1)M_1+1}(k,n), H_{(m-1)M_1+2}(k,n), \ldots, H_{mM_1}(k,n)]^T \tag{9}$$

The sub-filters $\mathbf{h}_m(k,n)$, $m = 1, 2, \ldots, M_2$, can then be arranged into a matrix of dimension $M_1 \times M_2$, namely:

$$\mathbf{H}(k,n) = [\mathbf{h}_1(k,n), \mathbf{h}_2(k,n), \ldots, \mathbf{h}_{M_2}(k,n)] \tag{10}$$

Note that $\mathbf{h}(k,n) = \mathrm{vec}[\mathbf{H}(k,n)]$, where $\mathrm{vec}(\cdot)$ denotes the vectorization operator of a matrix. For simplicity, the indices $k$ and $n$ are dropped below wherever no ambiguity arises. The singular value decomposition (SVD) of the matrix $\mathbf{H}$ is:

$$\mathbf{H} = \mathbf{H}_1 \mathbf{\Sigma} \mathbf{H}_2^H \tag{11}$$

where $\mathbf{H}_1$ is an $M_1 \times M_1$ matrix and $\mathbf{H}_2$ is an $M_2 \times M_2$ matrix. $\mathbf{H}_1$ and $\mathbf{H}_2$ are two orthogonal (unitary) matrices, and $\mathbf{\Sigma}$ is an $M_1 \times M_2$ diagonal matrix whose diagonal elements are non-negative real numbers. In this application, the diagonal elements are arranged in descending order, i.e. from largest to smallest. The superscript $H$ denotes the conjugate transpose.

The noisy speech signals received by the individual channels are strongly correlated, so the sub-filters $\mathbf{h}_m(k,n)$, $m = 1, 2, \ldots, M_2$, are also typically strongly correlated, and the matrix $\mathbf{H}$ is therefore typically not of full row rank. $\mathbf{H}$ can usually be well approximated by its $P$ largest singular values and the corresponding singular vectors, i.e.:

where

It should be noted that

the resulting ambiguity has no effect on the matrix $\mathbf{H}$. Accordingly, the filter $\mathbf{h}$ can be approximately expressed as:

It should be noted that when $P = M_1$, $\mathbf{h}_P = \mathbf{h}$.
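The low-rank approximation can be illustrated numerically. The sketch below reshapes a length-M filter into the M₁ × M₂ matrix whose columns are the sub-filters, truncates its SVD to P terms, and rebuilds the filter as a sum of Kronecker products using the identity vec(a bᴴ) = conj(b) ⊗ a. The function name, the variable names `a` and `b`, and the even split of each singular value between the two factors (square roots) are assumptions made for the example, not notation taken from the source.

```python
import numpy as np

def low_rank_filter(h, m1, m2, p):
    """Rank-P approximation of a length-(m1*m2) filter via the SVD of its matrix form.

    Returns (h_p, a, b) where h_p is the approximated filter, a has shape (m1, p),
    b has shape (m2, p), and the approximated matrix is sum_q a[:, q] b[:, q]^H.
    """
    # Column-major reshape so that the columns of H are the sub-filters and h = vec(H)
    H = h.reshape(m1, m2, order="F")
    U, s, Vh = np.linalg.svd(H, full_matrices=False)  # s is sorted in descending order
    a = U[:, :p] * np.sqrt(s[:p])                     # scaled left singular vectors
    b = Vh[:p, :].conj().T * np.sqrt(s[:p])           # scaled right singular vectors
    # vec(a_q b_q^H) = conj(b_q) kron a_q
    h_p = sum(np.kron(b[:, q].conj(), a[:, q]) for q in range(p))
    return h_p, a, b
```

With `p = m1` (and `m1 <= m2`) the reconstruction is exact; for smaller `p` the approximation error equals the energy in the discarded singular values, so a filter whose sub-filters are strongly correlated is captured well by a few terms.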

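Applying the relation:

$\mathbf{h}_P$ can be written as:

where

is of size $M \times M_2$, and

is of size $M \times M_1$. The output value $Z(k,n)$ of the filter can then be written as:

where

$$\mathbf{H}_{\sigma 1,P} = [\mathbf{H}_{\sigma 1,1}\ \mathbf{H}_{\sigma 1,2}\ \cdots\ \mathbf{H}_{\sigma 1,P}]^H \tag{24}$$

$$\mathbf{H}_{\sigma 2,P} = [\mathbf{H}_{\sigma 2,1}\ \mathbf{H}_{\sigma 2,2}\ \cdots\ \mathbf{H}_{\sigma 2,P}]^H \tag{25}$$

The dimensions of $\underline{\mathbf{h}}_{\sigma 1,P}$, $\underline{\mathbf{h}}_{\sigma 2,P}$, $\mathbf{y}_{\sigma 1,P}(t)$, $\mathbf{y}_{\sigma 2,P}(t)$, $\mathbf{H}_{\sigma 1,P}$ and $\mathbf{H}_{\sigma 2,P}$ are $M_1 P \times 1$, $M_2 P \times 1$, $M_2 P \times 1$, $M_1 P \times 1$, $M_2 P \times M$ and $M_1 P \times M$, respectively. It can be seen that when the parameter $P$ is small, the sub-filters $\underline{\mathbf{h}}_{\sigma 1,P}$ and $\underline{\mathbf{h}}_{\sigma 2,P}$ are much shorter than the filter $\mathbf{h}$.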

The mean square error (MSE) between the desired signal $X_1$ and its estimate $Z$ is

where

$E(\cdot)$ denotes the mathematical expectation, $\Re(\cdot)$ denotes the real part, and the superscript $^*$ denotes complex conjugation.

To derive the filter of the present invention, the MSE is rewritten as follows:

where

It should be noted that when the parameter $P$ is small, the dimensions of the matrices $\mathbf{\Phi}_{\mathbf{y}\sigma 1,p}$ ($M_2 P \times M_2 P$) and $\mathbf{\Phi}_{\mathbf{y}\sigma 2,p}$ ($M_1 P \times M_1 P$) are much smaller than the dimension of the matrix $\mathbf{\Phi}_{\mathbf{y}}$ ($M \times M$).

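This brings two advantages:

1) compared with solving the conventional multi-channel voice noise reduction filter, which requires the inverse of $\mathbf{\Phi}_{\mathbf{y}}$, solving the sub-filters $\underline{\mathbf{h}}_{\sigma 1,P}$ and $\underline{\mathbf{h}}_{\sigma 2,P}$ from the inverses of $\mathbf{\Phi}_{\mathbf{y}\sigma 1,p}$ and $\mathbf{\Phi}_{\mathbf{y}\sigma 2,p}$ requires significantly lower complexity;

2) compared with estimating the matrix $\mathbf{\Phi}_{\mathbf{y}}$, the matrices $\mathbf{\Phi}_{\mathbf{y}\sigma 1,p}$ and $\mathbf{\Phi}_{\mathbf{y}\sigma 2,p}$ can be estimated from fewer signal observation samples, so the sub-filters $\underline{\mathbf{h}}_{\sigma 1,P}$ and $\underline{\mathbf{h}}_{\sigma 2,P}$ can track changes in the statistical characteristics of the signal more quickly.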
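To make the size reduction concrete, the helper below tabulates the matrix dimensions and coefficient counts stated in the text for a given configuration. The function name, the dictionary keys, and the example values M₁ = 4, M₂ = 8, P = 2 are hypothetical and chosen only for illustration.

```python
def problem_sizes(m1, m2, p):
    """Matrix dimensions and coefficient counts for the conventional and sub-filter designs."""
    m = m1 * m2
    return {
        "conventional_matrix": (m, m),       # covariance matrix of the full observation vector
        "sigma1_matrix": (m2 * p, m2 * p),   # first reduced covariance matrix
        "sigma2_matrix": (m1 * p, m1 * p),   # second reduced covariance matrix
        "conventional_coeffs": m,            # length of the full filter
        "sub_filter_coeffs": (m1 + m2) * p,  # combined length of the two stacked sub-filters
    }

sizes = problem_sizes(4, 8, 2)
```

For this example the conventional design works with a 32 × 32 matrix, while the sub-filter design works with a 16 × 16 and an 8 × 8 matrix. Under the rough assumption that inversion cost grows with the cube of the dimension, that is 32³ = 32768 versus 16³ + 8³ = 4608 operations per solve, ignoring constants and the number of iterations.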

The approximate filter is solved as follows: the Wiener filter is obtained iteratively.

Based on equations (27) and (28), it is difficult to derive closed-form solutions for the sub-filters $\underline{\mathbf{h}}_{\sigma 1,P}$ and $\underline{\mathbf{h}}_{\sigma 2,P}$. The invention therefore adopts an iterative solution: when solving for one of the sub-filters, the other sub-filter is assumed to be fixed.

The sub-filter $\underline{\mathbf{h}}_{\sigma 1,P}$ is initialized as:

where

the definition of $\mathbf{x}_p$ is similar to that of $\mathbf{y}_p$. It can be seen that $\mathbf{h}_{\sigma 1,W,p}$ is the Wiener filter of length $M_1$ for the $p$-th sub-array.

Applying

to construct

and substituting into equations (29) and (30) gives

Substituting equations (38) and (39) into equation (34) gives:

Differentiating equation (40) with respect to

and setting the result to zero gives the Wiener solution of the sub-filter

as follows:

Applying

to construct

and substituting into equations (31) and (32) gives:

Substituting

and

into equation (33) gives:

Based on equation (44), the Wiener solution of the sub-filter

is obtained as:

Proceeding in this manner, at the $n$-th iteration we have:

where

The iterative Wiener filter of the present application is then obtained as:
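The alternating procedure can be sketched numerically. The following is a minimal sketch under stated assumptions, not the patent's exact equations: it assumes the filter has the Kronecker form h = Σ_p conj(b_p) ⊗ a_p with a_p of length M₁ and b_p of length M₂, and that the cross-correlation between the observations and X₁ equals the first column of Φ_x (which follows from speech and noise being uncorrelated). All function names, the random initialization (the text instead initializes from per-sub-array Wiener filters), and the fixed iteration count are hypothetical choices. The conventional Wiener filter Φ_y⁻¹ Φ_x i₁ is included as a baseline.

```python
import numpy as np

def wiener_full(phi_y, phi_x):
    """Conventional length-M Wiener filter for reference microphone 1."""
    return np.linalg.solve(phi_y, phi_x[:, 0])

def mse(h, phi_y, phi_x):
    """MSE between X_1 and h^H y, expressed through second-order statistics."""
    r = phi_x[:, 0]
    value = phi_x[0, 0] - 2 * np.real(np.vdot(h, r)) + np.vdot(h, phi_y @ h)
    return float(np.real(value))

def kron_filter(a, b_conj):
    """Assemble h = sum_p b_conj[:, p] kron a[:, p]."""
    return sum(np.kron(b_conj[:, q], a[:, q]) for q in range(a.shape[1]))

def iterative_wiener(phi_y, phi_x, m1, m2, p, n_iter=10, seed=0):
    """Alternately solve for one set of sub-filters while the other is held fixed."""
    rng = np.random.default_rng(seed)
    r = phi_x[:, 0]
    # Hypothetical random start for the length-m1 factors
    a = rng.standard_normal((m1, p)) + 1j * rng.standard_normal((m1, p))
    history = []
    for _ in range(n_iter):
        # Fix a: h = A c, with A of size M x (m2*p) and c the stacked conj(b_p)
        A = np.hstack([np.kron(np.eye(m2), a[:, [q]]) for q in range(p)])
        c = np.linalg.solve(A.conj().T @ phi_y @ A, A.conj().T @ r)
        b_conj = c.reshape(p, m2).T
        # Fix b: h = B s, with B of size M x (m1*p) and s the stacked a_p
        B = np.hstack([np.kron(b_conj[:, [q]], np.eye(m1)) for q in range(p)])
        s = np.linalg.solve(B.conj().T @ phi_y @ B, B.conj().T @ r)
        a = s.reshape(p, m1).T
        history.append(mse(kron_filter(a, b_conj), phi_y, phi_x))
    return kron_filter(a, b_conj), history
```

Each half-step only inverts a matrix of size M₂P × M₂P or M₁P × M₁P, matching the reduced dimensions discussed in the text. Because every half-step minimizes the MSE over one factor, the recorded MSE cannot increase from one iteration to the next, and with P = M₁ the result coincides with the conventional Wiener filter.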

An embodiment of the invention provides a voice noise reduction apparatus based on a microphone array. As shown in FIG. 2, the apparatus comprises a signal acquisition module 201, a signal preprocessing module 202, a statistical characteristic estimation module 203, a sub-filter determining module 204, a frequency domain noise reduction filter determining module 205, a noise reduction module 206 and a time domain noise reduction voice signal determination module 207. The signal acquisition module 201 is configured to acquire a noisy speech signal. The signal preprocessing module 202 is configured to preprocess the noisy speech signal and determine a frequency-domain noisy speech signal. The statistical characteristic estimation module 203 is configured to estimate the statistical characteristics of the frequency-domain noisy speech signal and the statistical characteristics of the noise signal. The sub-filter determining module 204 is configured to divide the microphone array into a plurality of sub-arrays and estimate a plurality of sub-filters respectively. The frequency domain noise reduction filter determining module 205 is configured to determine a frequency-domain noise reduction filter from the plurality of sub-filters. The noise reduction module 206 is configured to perform noise reduction on the frequency-domain noisy speech signal with the frequency-domain noise reduction filter and determine a frequency-domain noise-reduced speech signal. The time domain noise reduction voice signal determination module 207 is configured to convert the frequency-domain noise-reduced speech signal into a time-domain noise-reduced speech signal.

Fig. 4 is a comparison of the complexity of the method provided by the present application with the complexity of the conventional method, fig. 5 is a graph of the mean square error of the method provided by the present application as a function of the number of iterations, and fig. 6 is a graph of the mean square error of the method provided by the present application and the conventional method as a function of time when the statistical properties of noise suddenly change. The method provided by the application effectively reduces the complexity and improves the tracking capability of the filter on the change of the statistical characteristics of the voice signals and the noise.

The embodiment of the invention provides a server for voice noise reduction based on a microphone array, as shown in fig. 3, comprising a memory 301 and a processor 302; the memory 301 is used to store computer executable instructions; processor 302 is used to execute computer-executable instructions.

An embodiment of the invention provides a computer-readable storage medium that stores executable instructions; when a computer executes the executable instructions, the method described above is implemented.

The storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a cache, a Hard Disk Drive (HDD), or a memory card. The memory may be used to store computer program instructions.

Although the present application provides method steps as described in an embodiment or flowchart, more or fewer steps may be included based on conventional or non-inventive effort. The order of steps recited in this embodiment is only one of many possible execution orders and does not represent the only order of execution. When an actual apparatus or client product executes the method, the steps may be executed sequentially or in parallel (e.g., in a parallel-processor or multi-threaded environment) according to the method shown in this embodiment or the figures.

The apparatuses or modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with a certain function. For convenience of description, the above apparatus is described as being divided into modules by function. When implementing the present application, the functions of the modules may be implemented in the same one or more pieces of software and/or hardware. A module that implements a certain function may also be implemented by a combination of a plurality of sub-modules or sub-units.

The methods, apparatuses or modules described herein may be implemented as computer-readable program code embodied in a controller in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, Application Specific Integrated Circuits (ASICs), programmable logic controllers, or embedded microcontrollers. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing the various functions may be regarded as structures within the hardware component. The means for performing the functions may even be regarded both as software modules for performing the method and as structures within a hardware component.

Some of the modules in the apparatus described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus the necessary hardware. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, and may also be embodied in the implementation process of data migration. The computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computing device (which may be a personal computer, mobile terminal, server, network device, etc.) to perform the methods described in the various embodiments or in portions of the embodiments.

The embodiments in the present specification are described in a progressive manner: the same or similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. All or part of the present application can be used in numerous general-purpose or special-purpose computing system environments or configurations. Examples include personal computers, server computers, hand-held or portable devices, tablet-type devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the present application; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: modifications can be made to the technical solutions described in the foregoing embodiments, or some or all of the technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the present disclosure.

Claims

1. A speech noise reduction method based on a microphone array, characterized by comprising:

acquiring a voice signal with noise;

carrying out noise reduction processing on the frequency domain voice signal with noise according to the frequency domain noise reduction filter to determine a frequency domain noise reduction voice signal;

2. The method of claim 1, wherein the pre-processing of the noisy speech signal comprises: performing framing and windowing on the noisy speech signal, and then performing a fast Fourier transform.

3. The method according to claim 1, wherein said estimating the statistical properties of the frequency-domain noisy speech signal comprises estimating the statistical properties of the noisy speech signal according to a time-smoothed estimation.

4. The method of claim 1, wherein estimating the statistical properties of the noise signal comprises estimating the statistical properties of the noise signal according to an existing noise estimation algorithm.

5. The method of claim 1, wherein the dividing the microphone array into a plurality of sub-arrays and estimating a plurality of sub-filters separately comprises iteratively estimating the plurality of sub-filters using a low-rank structure of the noise reduction filter.

6. A speech noise reduction device based on a microphone array, characterized by comprising:

a signal acquisition module, used for acquiring a voice signal with noise;

the signal preprocessing module is used for preprocessing the voice signal with the noise and determining a frequency domain voice signal with the noise;

7. A microphone array based speech noise reduction server comprising a memory and a processor;

the memory is to store computer-executable instructions;

the processor is configured to execute the computer-executable instructions to implement the method of any of claims 1-5.

8. A computer-readable storage medium having stored thereon executable instructions that, when executed by a computer, are capable of implementing the method of any one of claims 1-5.
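
As an illustration only, and not part of the claims, the preprocessing of claim 2 (framing, windowing, fast Fourier transform) and a time-smoothed estimate of the noisy-signal statistics in the sense of claim 3 might be sketched as follows. The function names `stft_frames` and `smooth_covariance`, the Hann window, the frame length, the hop size, and the smoothing factor `alpha` are all assumptions made for this sketch; the application does not specify them here.

```python
import numpy as np


def stft_frames(x, frame_len=256, hop=128):
    """Frame, window, and FFT a multichannel signal (hypothetical helper).

    x: real array of shape (num_mics, num_samples).
    Returns a complex array of shape (num_frames, num_bins, num_mics).
    """
    num_mics, num_samples = x.shape
    window = np.hanning(frame_len)  # assumed window; others are possible
    num_frames = 1 + (num_samples - frame_len) // hop
    # Slice overlapping frames and apply the window to every channel.
    frames = np.stack(
        [x[:, t * hop : t * hop + frame_len] * window for t in range(num_frames)]
    )  # (num_frames, num_mics, frame_len)
    spec = np.fft.rfft(frames, axis=-1)  # (num_frames, num_mics, num_bins)
    return spec.transpose(0, 2, 1)


def smooth_covariance(Y, alpha=0.9):
    """Recursive time-smoothed covariance estimate per frequency bin.

    Y: complex array of shape (num_frames, num_bins, num_mics).
    alpha: assumed smoothing (forgetting) factor in [0, 1).
    Returns the estimate after the last frame, shape (num_bins, num_mics, num_mics).
    """
    num_frames, num_bins, num_mics = Y.shape
    R = np.zeros((num_bins, num_mics, num_mics), dtype=complex)
    for t in range(num_frames):
        y = Y[t]  # (num_bins, num_mics)
        # Outer product y y^H for every frequency bin.
        outer = y[:, :, None] * y[:, None, :].conj()
        R = alpha * R + (1.0 - alpha) * outer
    return R
```

A value of `alpha` closer to 1 gives a smoother but slower-tracking estimate; a smaller value tracks changes faster at the cost of higher variance. This is the same trade-off between estimation accuracy and tracking capability that the application raises for filter design.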