CN101447190A

CN101447190A - Voice enhancement method employing combination of nesting-subarray-based post filtering and spectrum-subtraction

Info

Publication number: CN101447190A
Application number: CNA200810068000XA
Authority: CN
Inventors: 邹月娴; 赵璟; 万波
Original assignee: Peking University Shenzhen Graduate School
Current assignee: Peking University Shenzhen Graduate School
Priority date: 2008-06-25
Filing date: 2008-06-25
Publication date: 2009-06-03

Abstract

The invention discloses a speech enhancement method that combines nested-subarray-based post-filtering with spectral subtraction. It is suitable for enhancing multi-channel speech signals in indoor environments, including in-vehicle environments. The method addresses three problems: speech signals are non-stationary and wideband, microphone-array-based multi-channel speech enhancement methods respond inconsistently across the frequency range of speech, and the noise in different channels is correlated in real noise fields. Speech signals are collected with a microphone array formed by nesting subarrays with different element spacings. The beamformed subarray signals are divided into a high-frequency band and a low-frequency band, and a different speech enhancement algorithm is applied to each band, so that the strengths of the algorithms complement one another and the enhancement effect improves.

Description

Speech enhancement method combining post-filtering and spectral subtraction based on nested subarrays

Technical field

The present invention relates to the field of computer speech signal processing. More particularly, it relates to a speech enhancement method that combines post-filtering and spectral subtraction based on nested subarrays, and it is particularly suitable for enhancing speech signals in noisy indoor environments.

Background technology

Speech enhancement (Speech Enhancement) is the processing of noisy speech to extract the original speech as cleanly as possible, in order to improve speech quality at the receiving end, raise the clarity, intelligibility and comfort of the speech, and make it easier for listeners to accept or improve the performance of speech processing systems. It is commonly used in automatic speech recognition systems, in-vehicle hands-free telephones, multimedia conferencing, wireless communication, on-site recording, military surveillance, hearing aids and intelligent robots. Research on speech enhancement goes back more than forty years. Traditional methods are all based on single-microphone systems, which are limited in pickup range, directivity and noise suppression ability. Adaptive speech enhancement based on microphone arrays combines several key technologies, including array signal processing, speech processing and multi-channel signal acquisition. Its technical advantage is that it can use not only the time-domain and frequency-domain characteristics of the speech signal but also its spatial information to remove noise, thereby enhancing and purifying the speech. The typical workflow of a microphone-array-based speech enhancement method is shown in Fig. 1 and is described as follows:

1) design the microphone array structure according to the application requirements;

2) use the time, frequency and spatial information of the multi-channel speech signal received by the microphone array to detect the start and end points of the speech signal, and at the same time estimate the inter-channel time delays and the direction information of the signal;

3) process the multi-channel signal with a speech enhancement algorithm to enhance the speech signal.

The microphone array structure design in step 1) is a critical step. Traditional array structures include uniform linear arrays, non-uniform linear arrays, uniform circular arrays and spherical arrays. The design of the array structure is closely related to the choice of multi-channel signal model.

Array signal models are divided into near-field and far-field models. The main difference is that in the far-field model the signal amplitude received by every array element is regarded as identical and only the phase differs, whereas the near-field model must also account for the amplitude attenuation caused by the different propagation paths. In other words, the near-field model must consider the distance from the source to each microphone in addition to the direction of arrival of the source. In the near field, a spherical wavefront model is usually adopted in place of the plane wavefront model of the far field.

As with the time-domain sampling theorem, spatial sampling by the microphone array sensors must satisfy a condition in order to prevent spatial aliasing. This condition is called the spatial sampling theorem and is given by formula (1):

d≤λ/2 (1)

where d is the straight-line distance between adjacent microphone elements and λ is the wavelength of the sound wave. Spatial aliasing is avoided only if the spatial sampling rate is high enough. However, if the element spacing is too small, the field is oversampled, and using more microphone sensors does not provide more spatial information about the signal.

In addition, the distance between the signal source and the microphone array also affects the choice of signal model. Let r be the straight-line distance from the sound source to the centre of the microphone array, and let L be the total length of the linear microphone array. If formula (2) is satisfied, the far-field condition holds; otherwise the near-field model must be used.

|r| \gg 2L^{2}/\lambda \qquad (2)
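As a worked illustration of formulas (1) and (2), the following Python sketch computes the largest admissible element spacing and the far-field reference distance. The speed of sound (343 m/s), the 4 kHz band edge and the 0.3 m array length are assumed example values, and the function names are hypothetical; none of them come from the text.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def max_element_spacing(f_max_hz, c=SPEED_OF_SOUND):
    """Largest spacing d that satisfies d <= lambda / 2 at the highest frequency."""
    wavelength = c / f_max_hz
    return wavelength / 2.0


def far_field_distance(array_length_m, f_hz, c=SPEED_OF_SOUND):
    """Reference distance 2 * L^2 / lambda that |r| must greatly exceed."""
    wavelength = c / f_hz
    return 2.0 * array_length_m ** 2 / wavelength


d = max_element_spacing(4000.0)          # spacing limit for a 4 kHz band edge
r_ref = far_field_distance(0.3, 4000.0)  # 0.3 m long array evaluated at 4 kHz
```

Because the wavelength shrinks as frequency rises, a spacing that avoids aliasing at the top of the speech band oversamples the low band, which is the tension the nested subarrays are meant to address.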

For a uniform linear microphone array under the far-field plane wave model, the discrete output signal of the m-th microphone can be expressed as:

x_{m}[n] = s[n - \Delta n_{m}] + \eta_{m}[n] \qquad (3)

where s[n] is the sound source signal, Δn_m is the delay in samples between the signal received by the m-th microphone and the sound source signal, and η_m[n] is the noise signal received by the m-th microphone.

Let Δτ_m be the time delay between the signal received by the m-th microphone and the sound source signal. Then the following relation holds:

\Delta n_{m} = f_{s} \cdot \Delta\tau_{m} = f_{s} \cdot r / c \qquad (4)

In formula (4), f_s is the sampling frequency and c is the speed at which the sound wave propagates in space.
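The relation in formula (4) can be sketched in Python as follows. The sketch assumes a uniform linear array and a plane wave arriving at angle φ, so that the extra path to the m-th element relative to the first is (m-1)·d·cos φ; that geometry, the function name and the numeric values are illustrative assumptions rather than details given in the text.

```python
import math


def sample_delays(num_mics, spacing_m, angle_rad, fs_hz, c=343.0):
    """Delay in samples of each element relative to the first element."""
    delays = []
    for m in range(num_mics):
        # Assumed far-field geometry: extra path length to element m.
        path_difference = m * spacing_m * math.cos(angle_rad)
        # Formula (4): delta_n = f_s * r / c
        delays.append(fs_hz * path_difference / c)
    return delays


delays = sample_delays(num_mics=5, spacing_m=0.04, angle_rad=0.0, fs_hz=16000.0)
```

The delays are generally fractional, so a practical delay compensation stage would need rounding or interpolation.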

The voice activity detection (Voice Activity Detection, VAD) in step 2) may be included or omitted depending on the speech enhancement algorithm. A robust VAD method plays an important role both in estimating the statistics of the noise signal and in the performance of the subsequent speech enhancement algorithm. Common single-channel practice uses VAD based on short-time energy, on zero-crossing rate, on linear prediction, and so on. Commonly used array-based endpoint detection methods include VAD based on a beamformer, VAD based on phase vectors, and spatial VAD based on the GSC.

In step 3), speech enhancement techniques can be divided mainly into single-microphone methods and microphone-array methods. The most mature, simple and effective single-microphone method is the spectral subtraction speech enhancement algorithm. The microphone-array methods in wide use at present include: a) the fixed beamformer (Fixed Beamforming, FBF); b) the adaptive beamformer (Adaptive Beamforming, ABF); c) beamforming with an adaptive postfilter (Microphone Arrays with Adaptive Postfiltering); d) the generalized sidelobe canceller (Generalized Sidelobe Canceller, GSC); and so on. In addition, improved and combined algorithms continue to appear. Commonly used ones include speech enhancement that combines spectral subtraction with a fixed beamformer, speech enhancement that combines a fixed beamformer with adaptive post-filtering, and generalized sidelobe cancelling speech enhancement based on spatial transfer functions. Delay-and-sum beamforming is also commonly used.

The spectral subtraction (Spectral Subtraction, SS) speech enhancement algorithm mentioned above is one of the classic single-channel speech enhancement methods. It was proposed in 1979 by Professor Steven F. Boll of the University of Utah and is widely used for single-channel speech containing additive noise. As shown in Fig. 2, the method subtracts the short-time magnitude spectrum of the estimated noise from that of the contaminated speech signal to obtain the clean speech signal; its effect is equivalent to applying a form of equalization to the noisy speech signal in the transform domain. In practice, however, the noise spectrum follows a Gaussian distribution, and the per-frame power spectrum of the noise varies over a very wide range: the ratio of the maximum to the minimum in the frequency domain often reaches several orders of magnitude, and the ratio of the maximum to the mean reaches 6 to 8. Therefore, after the noise spectrum is subtracted, residual components with relatively large power remain and appear in the spectrum as randomly occurring peaks, which are heard as residual noise. This noise has a rhythmic, fluctuating quality and is called "musical noise". In addition, spectral subtraction affects different parts of speech differently. Fricatives resemble noise and are suppressed together with the noise during processing. Nasals have low energy, and the magnitude of their power spectrum is close to that of the noise, so they are enhanced far less effectively than voiced sounds. The attenuation introduced by spectral subtraction weakens the unvoiced parts and the high-frequency parts of speech, which is why the intelligibility of the enhanced speech decreases.
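A minimal Python sketch of magnitude spectral subtraction on a single frame is given below. It uses a direct DFT so that only the standard library is needed. The function names, the clamping of negative magnitudes to zero (half-wave rectification) and the reuse of the noisy phase are assumptions made for illustration; the text specifies only that the estimated noise magnitude spectrum is subtracted from the noisy magnitude spectrum.

```python
import cmath
import math


def dft(frame):
    """Direct discrete Fourier transform of a real or complex frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; returns the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract(noisy_frame, noise_magnitude):
    """Subtract a noise magnitude estimate per bin and keep the noisy phase."""
    enhanced = []
    for value, noise in zip(dft(noisy_frame), noise_magnitude):
        magnitude = max(abs(value) - noise, 0.0)  # assumed half-wave rectification
        enhanced.append(cmath.rect(magnitude, cmath.phase(value)))
    return idft(enhanced)


# Example: a cosine in bin 2 of an 8-point frame with a flat noise estimate of 1.
frame = [math.cos(2.0 * math.pi * 2 * t / 8) for t in range(8)]
enhanced = spectral_subtract(frame, [1.0] * 8)
```

In this example the two signal bins have magnitude 4 and drop to 3, so the output is the input scaled by 0.75. With real noise, bins where the subtraction leaves isolated positive residues are the source of the musical noise described above.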

The delay-and-sum beamformer (Delay-and-Sum Beamformer, DSBF) is a typical fixed beamformer and consists of two parts: delay compensation and weighted summation. As shown in Fig. 3, under the far-field model and assuming additive noise, the signal received on the m-th channel is:

x'_{m}[n] = s[n - \Delta n_{m}] + \eta_{m}[n] \qquad (5)

A time delay estimation algorithm gives the delay of the speech signal on each channel, and delay compensation then aligns the channel signals in the time domain, giving:

x_{m}[n] = x'_{m}[n + \Delta n_{m}] \qquad (6)

The channel signals are then weighted and summed to give the beamformer output signal:

y[n] = \sum_{m=1}^{M} x_{m}[n] \cdot w_{m}[n] \qquad (7)

In beamforming algorithms, accurate time delay estimation is the basis of multi-channel speech enhancement. The delay-and-sum beamformer is simple, robust and computationally cheap, and can be used in real systems. In theory the algorithm improves the signal-to-noise ratio by 10 log₁₀ M dB, so better enhancement requires more microphone elements. The algorithm also has implicit preconditions: accurate time delay estimates Δn_m must be available, the incident signal must be narrowband, and there must be no spatial loss, reflected signals or reverberation. Its main shortcomings are as follows. Its performance degrades quickly when there is more than one speech source in the space, when directional noise is present, or when reverberation is severe. In addition, its response differs for different frequency components of the signal: spatial resolution is usually poor at low frequencies and comparatively better at high frequencies.
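Formulas (6) and (7) can be sketched in Python as follows. The sketch assumes integer sample delays that are already known and constant weights of 1/M; the function name and the toy signals are hypothetical.

```python
import math


def delay_and_sum(channels, delays, weights=None):
    """Align each channel by its integer sample delay, then form the weighted sum."""
    num_channels = len(channels)
    if weights is None:
        weights = [1.0 / num_channels] * num_channels  # assumed equal weights
    length = min(len(x) - d for x, d in zip(channels, delays))
    output = []
    for n in range(length):
        # Formula (6): x_m[n] = x'_m[n + delta_n_m]; formula (7): weighted sum.
        output.append(sum(w * x[n + d] for x, d, w in zip(channels, delays, weights)))
    return output


source = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
delays = [0, 1, 2]
channels = [[0.0] * d + source for d in delays]  # x'_m[n] = s[n - delta_n_m], no noise
aligned_sum = delay_and_sum(channels, delays)

# Theoretical SNR improvement for M channels with uncorrelated noise.
snr_gain_db = 10.0 * math.log10(len(channels))
```

With noise-free channels the output reproduces the source exactly. The theoretical gain is 10 log₁₀ M dB, which is about 4.8 dB for the three channels used here.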

In 1988, R. Zelinski proposed adding a post adaptive Wiener filter (Wiener Filter) at the output of the delay-and-sum beamformer, which formed the classic postfilter speech enhancement algorithm (Delay-and-Sum Beamforming with an Additional Postfiltering). The post adaptive filtering method combines a linear adaptive beamformer (ABF) with a postfilter (Postfilter). By using the spatial filtering property of the linear ABF and the incoherent-noise suppression property of the postfilter, it performs spatial filtering and frequency-domain filtering at the same time and further improves the output signal-to-noise ratio.

The role of post adaptive filtering is to apply an adaptive Wiener filter to the signal produced by the delay-and-sum method in order to estimate the target speech more accurately. It rests on the following assumptions:

1) the speech signal and the noise signal received on each channel are uncorrelated;

2) the noise signals received by different microphones in the array are uncorrelated;

3) the power spectral density of the noise signal is the same at every microphone.

As shown in Fig. 4, after delay compensation each channel is transformed into the frequency domain by the Fourier transform. The signal of each microphone channel contains the target speech signal and a noise signal. After weighting:

Y_{m}(f) = W_{m}(f)[S(f) + \eta_{m}(f)] \qquad (8)

Y(f) = \sum_{m=1}^{M} Y_{m}(f) \qquad (9)

Based on the three assumptions above, the auto-spectral density of each channel and the cross-spectral density between channels are computed as follows:

\Phi_{y_i y_i}(f) = E\{[W_{i}(f)(S(f) + \eta_{i}(f))][W_{i}(f)(S(f) + \eta_{i}(f))]^{*}\}

= E\{[W_{i}(f) S(f) + W_{i}(f) \eta_{i}(f)][W_{i}^{*}(f) S^{*}(f) + W_{i}^{*}(f) \eta_{i}^{*}(f)]\}

= |W_{i}(f)|^{2} \Phi_{ss}(f) + |W_{i}(f)|^{2} \Phi_{\eta_i \eta_i}(f)

= |W_{i}(f)|^{2} [\Phi_{ss}(f) + \Phi_{\eta_i \eta_i}(f)] \qquad (10)

\Phi_{y_i y_j}(f) = E\{[W_{i}(f)(S(f) + \eta_{i}(f))][W_{j}(f)(S(f) + \eta_{j}(f))]^{*}\}

= E\{[W_{i}(f) S(f) + W_{i}(f) \eta_{i}(f)][W_{j}^{*}(f) S^{*}(f) + W_{j}^{*}(f) \eta_{j}^{*}(f)]\}

= E\{W_{i}(f) W_{j}^{*}(f) S(f) S^{*}(f) + W_{i}(f) W_{j}^{*}(f) S(f) \eta_{j}^{*}(f) + W_{i}(f) W_{j}^{*}(f) \eta_{i}(f) S^{*}(f) + W_{i}(f) W_{j}^{*}(f) \eta_{i}(f) \eta_{j}^{*}(f)\}

= W_{i}(f) W_{j}^{*}(f) \Phi_{ss}(f) \qquad (11)

The optimal transfer function of the Wiener filter is:

H(f) = \frac{\Phi_{ss}(f)}{\Phi_{ss}(f) + \Phi_{\eta\eta}(f)} \qquad (12)

The numerator and denominator of the transfer function can be obtained from the cross-spectral densities and the auto-spectral densities of the input channels, each of which contains the target signal and noise.

Φ_ss(f) follows from formula (11) and Φ_ss(f) + Φ_ηη(f) follows from formula (10), that is:

\Phi_{ss}(f) = \frac{\Phi_{y_i y_j}(f)}{W_{i}(f) W_{j}^{*}(f)} \qquad (13)

\Phi_{ss}(f) + \Phi_{\eta\eta}(f) = \frac{\Phi_{y_i y_i}(f)}{|W_{i}(f)|^{2}} \qquad (14)

The estimate of the transfer function of the post adaptive Wiener filter is therefore:

\hat{H}(f) = \frac{\frac{2}{M(M-1)} \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \Re\{\hat{\Phi}_{y_i y_j}(f) / (W_{i}(f) W_{j}^{*}(f))\}}{\frac{1}{M} \sum_{i=1}^{M} \hat{\Phi}_{y_i y_i}(f) / |W_{i}(f)|^{2}} \qquad (15)

where M is the number of channels, \Re\{\cdot\} takes the real part, ^{*} denotes complex conjugation, and W_i(f) is the delay-and-sum weight of each microphone channel, that is:

W_{i}(f) = \frac{1}{4} e^{j \frac{-2\pi f}{c}(i-1) d \cos\varphi} \qquad (16)

The estimate of the target speech signal at the output of the adaptive Wiener filter is then:

Z(f) = \hat{H}(f) Y(f) \qquad (17)
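The following Python sketch shows how formulas (13) and (14) can be turned into a gain estimate at one frequency bin, using a single-frame estimate averaged over all channels and channel pairs. The averaging, the use of the real part, the clamping of the gain to [0, 1] and the function name are assumptions made for illustration.

```python
def postfilter_gain(weighted_channels, weights):
    """Estimate H at one frequency bin from Y_i = W_i (S + eta_i), i = 1..M."""
    m = len(weighted_channels)
    # Remove the weights so that each value equals S + eta_i.
    x = [y / w for y, w in zip(weighted_channels, weights)]
    # Averaged auto-spectrum: estimates phi_ss + phi_eta_eta, as in formula (14).
    auto = sum(abs(v) ** 2 for v in x) / m
    # Averaged real part of the cross-spectra: estimates phi_ss, as in formula (13).
    cross = 0.0
    for i in range(m - 1):
        for j in range(i + 1, m):
            cross += (x[i] * x[j].conjugate()).real
    cross *= 2.0 / (m * (m - 1))
    gain = cross / auto if auto > 0.0 else 0.0
    return min(max(gain, 0.0), 1.0)  # clamp: an added safeguard, not from the text


# Identical channels (speech only) versus mutually cancelling channels (noise only).
gain_clean = postfilter_gain([0.25 * (2 + 1j)] * 3, [0.25] * 3)
gain_noise = postfilter_gain([1 + 0j, -1 + 0j, 1j, -1j], [1.0] * 4)
```

When all channels carry the same value the gain is close to 1. In the noise-only example the channel values cancel in the cross terms and the gain is clamped to 0.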

As the formulas above show, the post adaptive Wiener filter method is not limited by the number of noise sources. However, the method relies on assumption 2), namely that the noise signals received by different microphones in the array are uncorrelated. In practice, the cross-correlation of the noise signals received on the channels of a microphone array is negligible only at high frequencies. At low frequencies the cross-correlation is significant and cannot be ignored. Consequently, like the fixed beamforming algorithm, this method enhances the high-frequency part of the signal well and the low-frequency part poorly.

Spectral subtraction and post-filtering therefore each have strengths and weaknesses, and neither achieves the desired enhancement on its own. A processing algorithm is needed that suits both the low-frequency and the high-frequency parts of the speech signal.

Summary of the invention

The object of the invention is to solve two problems in current multi-channel speech enhancement: a uniform array responds inconsistently across the frequency range of a wideband speech signal, and traditional speech enhancement methods have difficulty handling both the high band and the low band well.

To solve these technical problems, the invention proposes a speech enhancement method that combines post-filtering based on nested subarrays with spectral subtraction. The technical solution adopted by the invention is as follows:

Step 1: design a microphone array composed of two nested uniform subarrays for multi-channel signal acquisition; the multi-channel speech signal based on nested subarrays comprises at least five channels;

Step 2: detect the start and end points of the speech signal and estimate the power spectrum of the noise-only signal;

Step 3: estimate the time delay of the speech signal on each channel;

Step 4: apply delay compensation to each channel so that the channel speech signals are aligned in the time domain;

Step 5: transform each channel signal from the time domain to the frequency domain with the Fourier transform;

Step 6: estimate the auto-power spectrum of the clean speech signal and the auto-power spectrum of the noisy speech signal, and obtain the frequency response function of the Wiener filter;

Step 7: for the signals of the two subarrays, beamform the channel signals of each subarray separately with a fixed beamformer;

Step 8: apply low-pass filtering and high-pass filtering to the beamformer outputs of the two subarrays respectively;

Step 9: process the filtered beamformer outputs of the two subarrays with spectral subtraction or the post Wiener filtering method to enhance the speech;

Step 10: overlap-add the two enhanced beams and apply the inverse Fourier transform to obtain the enhanced speech signal in the time domain.

The invention has the following advantages:

1) the nested subarrays have a good frequency response to wideband spatial speech signals;

2) the array structure is simple, the shared elements reduce the size of the array, and the computational complexity of the algorithm is low;

3) the multi-channel post-filtering speech enhancement algorithm is applied only to the high-frequency part of the target speech signal, which avoids the degraded performance of post-filtering on the low band of the speech signal;

4) the algorithm is easy to implement and computationally cheap, and is suitable for PC platforms and embedded platforms.

Description of drawings

Fig. 1. Block diagram of a typical speech enhancement method

Fig. 2. Flowchart of the magnitude spectral subtraction speech enhancement method

Fig. 3. Flowchart of the delay-and-sum beamformer

Fig. 4. Flowchart of the post adaptive Wiener filter speech enhancement method

Fig. 5. Flowchart of the speech enhancement method combining post-filtering and spectral subtraction based on nested subarrays

Fig. 6. Design of the nested subarrays

Embodiment

The flowchart of the speech enhancement method combining post-filtering and spectral subtraction based on nested subarrays is shown in Fig. 5. The method consists of four parts: multi-channel signal acquisition, delay compensation, beamforming and post adaptive filtering. The invention is described in further detail below with reference to the drawings and a specific embodiment. This embodiment does not limit the invention. Those skilled in the art may make improvements and variations without departing from the principle of the invention, and such improvements and variations also fall within the scope of protection of the invention.

This embodiment runs on an ordinary PC, configured as follows:

CPU: 2.80 GHz

Internal memory: 1 GB

Operating system: Windows XP Professional Edition

Running environment: MATLAB R2006b

In this embodiment, in view of the sound source characteristics and noise field characteristics of indoor environments, a diffuse noise field (Diffuse Noise Field) model and a harmonically nested subarray (Harmonically Nested Subarrays, HNSA) model are used to model the multi-channel noisy speech signal in a real environment. The speech signal in the space is collected by an array of 7 omnidirectional microphones arranged as two nested subarrays. Each subarray contains 5 elements, so M = 5. Let x'_{Si}[n] and x'_{Lj}[n] denote the signal of a channel of the small subarray (Small) and of the large subarray (Large) respectively, with i = 1, ..., 5 and j = 1, ..., 5. Because of the nesting, some microphone channels are shared:

x'_{S1}[n] = x'_{L2}[n], \quad x'_{S3}[n] = x'_{L3}[n], \quad x'_{S5}[n] = x'_{L4}[n] \qquad (18)
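The sharing pattern in formula (18) follows from the geometry if the two uniform subarrays have a common centre and the large subarray has twice the spacing of the small one. The Python sketch below builds such a layout; the common centre, the 0.04 m small spacing and the function name are assumptions made for illustration.

```python
def nested_subarrays(small_spacing, elements_per_subarray=5):
    """Element positions of two nested uniform linear subarrays with a common centre."""
    half = (elements_per_subarray - 1) / 2.0
    small = [(i - half) * small_spacing for i in range(elements_per_subarray)]
    large = [(j - half) * 2.0 * small_spacing for j in range(elements_per_subarray)]
    # Pairs (i, j), numbered from 1, where small element i and large element j coincide.
    shared = [(i + 1, j + 1)
              for i, s in enumerate(small)
              for j, l in enumerate(large)
              if abs(s - l) < 1e-12]
    all_positions = sorted(set(round(p, 9) for p in small + large))
    return small, large, shared, all_positions


small, large, shared, positions = nested_subarrays(0.04)
```

For this layout the shared pairs are (S1, L2), (S3, L3) and (S5, L4), and the number of distinct microphone positions is 7, which agrees with formula (18) and with the 7-microphone array described above.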

For the signal model given by formulas (5) and (6), after delay compensation and the Fourier transform, the frequency-domain signal of a channel of each subarray is:

X_{Si}(f) = S(f) + \eta_{Si}(f) e^{j \frac{2\pi}{N} f \tau_{Si}} \qquad (19)

X_{Lj}(f) = S(f) + \eta_{Lj}(f) e^{j \frac{2\pi}{N} f \tau_{Lj}} \qquad (20)

where S(f) is the Fourier transform of the clean speech signal, η_{Si}(f) and η_{Lj}(f) are the Fourier transforms of the noise on channel i of the small subarray and channel j of the large subarray respectively, and N is the frame length.

The small and large subarrays are each beamformed by summation:

Y_{S}(f) = \frac{1}{5} \sum_{i=1}^{5} X_{Si}(f) \qquad (21)

Y_{L}(f) = \frac{1}{5} \sum_{j=1}^{5} X_{Lj}(f) \qquad (22)

The beamformer outputs Y_S(f) and Y_L(f) are passed through a high-pass (HP) FIR filter and a low-pass (LP) FIR filter respectively, giving Y'_{S}(f) and Y'_{L}(f). The wideband speech signal is thus divided into two bands, each of which is processed by a different speech enhancement algorithm.
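The text does not specify how the LP and HP FIR filters are designed. One common choice, sketched below in Python, is a Hamming-windowed sinc low-pass filter together with the complementary high-pass filter obtained by spectral inversion, so that the two bands sum back to the delayed original signal. The design method, the 31 taps, the 1000 Hz cutoff and the 16 kHz sampling rate are all assumptions made for illustration.

```python
import math


def lowpass_fir(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass FIR with unity gain at DC (num_taps odd)."""
    fc = cutoff_hz / fs_hz
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def highpass_from_lowpass(lp_taps):
    """Spectral inversion: a unit impulse at the centre tap minus the low-pass taps."""
    hp = [-t for t in lp_taps]
    hp[(len(lp_taps) - 1) // 2] += 1.0
    return hp


lp = lowpass_fir(31, 1000.0, 16000.0)  # hypothetical filter for the large subarray
hp = highpass_from_lowpass(lp)         # complementary filter for the small subarray
```

Because the two filters sum to a pure delay, adding the low-band and high-band outputs of an unprocessed signal returns that signal delayed by 15 samples.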

For the low-frequency signal, spectral subtraction as shown in Fig. 2 is used for denoising and enhancement:

|\hat{S}_{L}(f)| = |Y'_{L}(f)| - \zeta(f) \qquad (23)

where |\hat{S}_{L}(f)| is the estimate of the target speech signal after denoising by spectral subtraction, and ζ(f) is the mean magnitude of the noise signal, estimated in non-speech segments identified by voice activity detection.

For the high-frequency signal, the post adaptive Wiener filter method shown in Fig. 4 is used for speech enhancement. For any two channels i and j of the subarray, i ≠ j, the auto-power spectral density and the cross-power spectral density of the noisy speech signal are respectively:

\Phi_{X_i X_i}(f) = \Phi_{ss}(f) + \Phi_{\eta_i \eta_i}(f) \qquad (24)

\Phi_{X_i X_j}(f) = E\{X_{i}(f) X_{j}^{*}(f)\} = \Phi_{ss}(f) + \Phi_{s \eta_i}(f) + \Phi_{s \eta_j}(f) + \Phi_{\eta_i \eta_j}(f) \qquad (25)

Under the three assumptions of the post adaptive Wiener filter method stated earlier, the noise signals of the channels are mutually uncorrelated and are also uncorrelated with the sound source signal, so:

\Phi_{s \eta_i}(f) = \Phi_{s \eta_j}(f) = \Phi_{\eta_i \eta_j}(f) = 0 \qquad (26)

The power spectral density of the noise signal is the same at every microphone and is defined as:

\Phi_{\eta_i \eta_i}(f) = \Phi_{\eta_j \eta_j}(f) = \Phi_{\eta\eta}(f) \qquad (27)

Formulas (24) and (25) can then be rewritten as:

\Phi_{X_i X_i}(f) = \Phi_{ss}(f) + \Phi_{\eta\eta}(f) \qquad (28)

\Phi_{X_i X_j}(f) = \Phi_{ss}(f) \qquad (29)

where the spectral densities are estimated as:

\hat{\Phi}_{X_i X_i}(f) = \frac{1}{M} \sum_{i=1}^{M} |X_{i}(f)|^{2} \qquad (30)

\hat{\Phi}_{X_i X_j}(f) = \frac{2}{M(M-1)} \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} X_{i}(f) X_{j}^{*}(f) \qquad (31)

Because the signal is only short-time stationary in practice and the FFT length L is finite, the last three terms in formula (25) are not exactly 0 but form a complex number close to 0. Since the signal power spectrum Φ_ss(f) can only be a positive real number, we obtain:

\hat{\Phi}_{ss}(f) = \Re\{\hat{\Phi}_{X_i X_j}(f)\} \qquad (32)

In addition, the spectral estimates of each channel are smoothed. For a frequency bin k, define a smoothing interval [k-p, k+p] of length 2p+1; then

\hat{\Phi}_{ss}[k] = \frac{1}{2p+1} \sum_{l=-p}^{p} \Phi_{ss}[k+l] \qquad (33)

\hat{\Phi}_{X_i X_i}[k] = \frac{1}{2p+1} \sum_{l=-p}^{p} \Phi_{X_i X_i}[k+l] \qquad (34)

\hat{H}[k] = \frac{\hat{\Phi}_{ss}[k]}{\hat{\Phi}_{X_i X_i}[k]} \qquad (35)
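A short Python sketch of the smoothing in formulas (33) to (35) follows. The formulas do not say how the bins nearest the band edges are handled; here the window is simply clipped to the available bins, which is an assumption, as are the function names.

```python
def smooth(values, p=1):
    """Average each bin over [k - p, k + p], clipping the interval at the band edges."""
    n = len(values)
    out = []
    for k in range(n):
        lo, hi = max(0, k - p), min(n - 1, k + p)
        window = values[lo:hi + 1]
        out.append(sum(window) / len(window))
    return out


def wiener_gain(phi_ss, phi_xx, p=1):
    """Formula (35): ratio of the smoothed speech and noisy-speech power spectra."""
    s, x = smooth(phi_ss, p), smooth(phi_xx, p)
    return [a / b if b > 0.0 else 0.0 for a, b in zip(s, x)]


smoothed = smooth([0.0, 3.0, 0.0, 3.0, 0.0], p=1)
gain = wiener_gain([1.0, 1.0, 1.0], [2.0, 2.0, 2.0], p=1)
```

A larger p gives smoother estimates at the cost of frequency resolution and computation.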

Weighing accuracy against computational cost, p = 1 or 2 is usually chosen.

The output signal of the high-pass filter, Y'_{S}(f), is then passed through the adaptive Wiener filter to obtain the enhanced high-band speech signal:

\hat{S}_{S}(f) = Y'_{S}(f) \cdot \hat{H}(f) \qquad (36)

The speech signals of the high and low bands are combined by overlap-add Fourier synthesis (Fourier Synthesis Overlap-Add) and converted into the enhanced speech signal in the time domain.
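The final combination step can be sketched in Python as follows. The sketch assumes that each band has already been transformed back into time-domain frames of equal length and that consecutive frames are spaced by a fixed hop. Analysis and synthesis windows are omitted, and the function names and toy frames are hypothetical.

```python
def combine_bands(low_frames, high_frames):
    """Add the enhanced low-band and high-band frames sample by sample."""
    return [[a + b for a, b in zip(lo, hi)]
            for lo, hi in zip(low_frames, high_frames)]


def overlap_add(frames, hop):
    """Sum equal-length frames whose starting points are `hop` samples apart."""
    frame_len = len(frames[0])
    output = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for index, frame in enumerate(frames):
        start = index * hop
        for n, sample in enumerate(frame):
            output[start + n] += sample
    return output


low = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
high = [[0.5, -0.5, 0.5, -0.5], [0.5, -0.5, 0.5, -0.5]]
signal = overlap_add(combine_bands(low, high), hop=2)
```

In a complete system the frames would be windowed so that the overlapping regions sum to a constant gain. That windowing is left out here to keep the arithmetic easy to follow.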

Claims

1. A speech enhancement method combining post-filtering and spectral subtraction using nested subarrays, used for enhancing multi-channel speech signals in indoor environments, characterized in that the method comprises:

1) designing a microphone array composed of two nested uniform subarrays for multi-channel signal acquisition;

2) detecting the start and end points of the speech signal and estimating the power spectrum of the noise-only signal;

3) estimating the time delay of the speech signal on each channel;

4) applying delay compensation to each channel so that the channel speech signals are aligned in the time domain;

5) transforming each channel signal from the time domain to the frequency domain with the Fourier transform;

6) estimating the auto-power spectrum of the clean speech signal and the auto-power spectrum of the noisy speech signal, and obtaining the frequency response function of the Wiener filter;

7) for the signals of the two subarrays, beamforming the channel signals of each subarray separately with a fixed beamformer;

8) applying low-pass filtering and high-pass filtering to the beamformer outputs of the two subarrays respectively;

9) processing the filtered beamformer outputs of the two subarrays with spectral subtraction or the post Wiener filtering method to enhance the speech;

10) overlap-adding the two enhanced beams and applying the inverse Fourier transform to obtain the enhanced speech signal in the time domain.

2. The microphone array structure of nested subarrays according to claim 1, characterized in that, in step (1), each subarray is a uniform linear array with fixed spacing, the spacing of the large subarray is 2 times the spacing of the small subarray, and some array elements may be shared.

3. The low-pass or high-pass filtering of the beamformed speech signals of the two subarrays according to claim 1 or 2, characterized in that, in step (8), the beamformed speech signal of the large subarray is low-pass filtered and the beamformed speech signal of the small subarray is high-pass filtered, so that the speech signal has a good frequency response over the whole band.

4. The enhancement of the beamformer outputs of the two subarrays with spectral subtraction and a post Wiener filter respectively according to claim 1 or 3, characterized in that, in step (9), the low-pass-filtered beamformer output is processed by power spectral subtraction to enhance the low-frequency part of the speech signal, and the high-pass-filtered beamformer output is filtered by the post Wiener filter to enhance the high-frequency part of the speech signal.

5. The speech enhancement method combining post-filtering and spectral subtraction using nested subarrays according to claim 1 or 2, characterized in that the multi-channel speech signal comprises at least five channels.