CN103854660B - Four-microphone speech enhancement method based on independent component analysis - Google Patents

Four-microphone speech enhancement method based on independent component analysis

Info

Publication number
CN103854660B
Authority
CN
China
Prior art keywords
signal
frequency
voice
source
microphone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410061180.4A
Other languages
Chinese (zh)
Other versions
CN103854660A (en)
Inventor
张彦芳
王芳
周海瑞
王犇
朱冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute filed Critical CETC 28 Research Institute
Priority to CN201410061180.4A priority Critical patent/CN103854660B/en
Publication of CN103854660A publication Critical patent/CN103854660A/en
Application granted granted Critical
Publication of CN103854660B publication Critical patent/CN103854660B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a four-microphone speech enhancement method based on independent component analysis, comprising the following steps: four microphones are used to form an array, and four channels of noisy speech signals are collected and converted into frequency-domain signals; the two signals from microphones on the same side of the array are randomly selected as the first group of input signals and the other two signals are taken as the second group, and a separation matrix $W_f$ is estimated for each group; step 3: the frequency-domain signals obtained in step 1 are processed with the multiple signal classification algorithm to obtain the speech source direction θ; step 4: using the source direction θ calculated in step 3 as reference information, the speech signal is extracted from each group of separated signals obtained in step 2; step 5: the frequency bins are sorted and the amplitude of the resulting speech signal is smoothed, a window function being used to smooth the amplitude over adjacent frequency bins; step 6: the time-domain speech signal is obtained through windowing and an inverse Fourier transform.

Description

Four-microphone speech enhancement method based on independent component analysis
Technical field
The present invention relates to a speech enhancement method, and in particular to a four-microphone speech enhancement method based on independent component analysis.
Background art
In practical communication systems, a single microphone is often used to collect sound, and the target speech is frequently corrupted by background noise during acquisition. The noisy signal collected by the microphone therefore needs to pass through a speech enhancement system to remove the influence of the noise before it is played back, transmitted or stored. Research on speech enhancement algorithms has a history of more than 40 years; the earliest was an algorithm proposed by Schroeder of Bell Laboratories. Many other algorithms have since been proposed, among which spectral subtraction is the most widely used, but spectral subtraction has an obvious defect: while suppressing noise it damages the speech and artificially introduces noise, e.g. "musical" noise.
Speech enhancement algorithms based on a single microphone cannot improve speech quality and speech intelligibility at the same time; an improvement in speech quality is usually accompanied by a reduction in intelligibility. In recent years, speech enhancement algorithms based on microphone arrays have received increasing attention. A microphone array can exploit the spatial information of the source location, so array-based algorithms can suppress noise without damaging the speech, and they have become a new research trend. The more mature array-based speech enhancement algorithms include adaptive beamforming, minimum variance distortionless response beamforming, generalized sidelobe cancellation and generalized singular value decomposition. In the existing array-based algorithms, however, the microphone arrays are large and the space and time complexity of the algorithms is high, so they cannot be applied flexibly in portable communication devices such as mobile phones and transceivers.
Summary of the invention
In order to overcome the problems of the existing speech enhancement algorithms described above, the present invention provides a four-microphone speech enhancement method based on independent component analysis, which reduces the damage to the speech during enhancement and achieves a good noise reduction effect.
The technical scheme adopted by the four-microphone speech enhancement method based on independent component analysis of the present invention is as follows:
Step 1: the four channels of noisy speech signals collected by the microphone array are pre-processed and converted into frequency-domain signals by a Fourier transform;
Step 2: the two signals from microphones on the same side of the four-microphone square array are taken as the first group of input signals and the other two signals as the second group; for each group of pre-processed frequency-domain signals, a frequency-domain independent component analysis algorithm is used to estimate a separation matrix $W_f$ for every frequency bin f, so that each group yields a pair of separated signals consisting of one speech signal and one noise signal;
Step 3: the frequency-domain signals obtained in step 1 are processed with the multiple signal classification algorithm to obtain the speech source direction θ;
Step 4: the speech source direction θ estimated in step 3 is used as reference information to extract the speech signal from each group of separated signals;
Step 5: the amplitude of the frequency-domain speech signal is smoothed;
Step 6: the time-domain speech signal is obtained through windowing and an inverse Fourier transform.
In step 4, using the speech source direction θ estimated in step 3 as reference information, extracting the speech signal from each group of separated signals obtained in step 2 comprises the following steps:
1) the separation matrix $W_f$ of each frequency bin f is used to calculate source directions; each bin yields two source directions $\hat\theta_1(f)$ and $\hat\theta_2(f)$, the number of bins being half the Fourier transform length;
2) the source directions calculated for all bins are clustered by K-means into two classes, and, using the speech source direction θ, the cluster lying within 10 degrees of θ is taken as the incident direction of the speech signal;
3) bins of low credibility are rejected: for a given bin, if the incident direction of one of the signals differs from the speech source direction θ by less than 10° and the incident direction of the other signal lies within 20° of the other cluster centre, the bin is considered credible, and the incident direction of the signal closer to the speech source direction θ is taken as the final speech source direction; otherwise the bin is declared a failed bin;
4) failed bins are re-extracted using the correlation of adjacent spectral envelopes:
first, the sum cor(f) of the correlations between the spectrum $\hat s_i(f)$ of the failed bin and the harmonic structure $\hat s_i(g)$ of the valid bins is computed as
$\mathrm{cor}(f) = \sum_g \dfrac{\hat s_i(f)\,\hat s_i(g)}{|\hat s_i(f)|\,|\hat s_i(g)|}$,
where g = ..., f/3, f/2, 2f, 3f, ... are valid bins;
if the correlation sum cor(f) is greater than 0.9, the bin is extracted as a valid bin;
the above iterative step is repeated until the number of iterations exceeds 100; any bin still not extracted is then extracted by computing its correlation with the adjacent bins that have already been extracted.
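For illustration, a minimal Python (NumPy) sketch of the harmonic-correlation test in sub-step 4) is given below. It assumes, as one reading of the formula, that $\hat s_i(f)$ denotes the per-frame amplitude envelope of the separated output at bin f, so that each term is the normalised inner product of envelopes over frames; the function name and the `valid_bins` container are illustrative and not part of the patent.

```python
import numpy as np

def harmonic_correlation(env, f, valid_bins):
    """Correlation sum cor(f) between the amplitude envelope of a failed bin f
    and the envelopes of the harmonically related valid bins g = f/3, f/2, 2f, 3f.
    env is an (n_bins x n_frames) array of |s_hat_i(f, m)| values."""
    cor = 0.0
    for g in (f // 3, f // 2, 2 * f, 3 * f):
        if g in valid_bins and 0 < g < env.shape[0]:
            num = float(env[f] @ env[g])
            den = np.linalg.norm(env[f]) * np.linalg.norm(env[g]) + 1e-12
            cor += num / den
    return cor   # the bin is accepted as valid when cor(f) exceeds 0.9
```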
In step 5, the extracted speech signal is smoothed by windowing, using a Hanning window, with the window function
$\tilde w_{kl}(f) = \frac{1}{4}\left[w_{kl}(f-\Delta f) + 2\,w_{kl}(f) + w_{kl}(f+\Delta f)\right]$,
where $w_{kl}(f)$ is the element in row k, column l of the separation matrix $W_f$, and $\Delta f$ is the frequency spacing of the Fourier transform, equal to the signal sampling rate divided by the Fourier transform length.
Speech enhancement, as a pre-processing module of many speech signal processing systems, is of great significance for improving system performance. Traditional speech enhancement algorithms based on a single microphone cause considerable damage to the speech while suppressing noise, and artificially introduce noise. To address these problems, the present invention proposes a speech enhancement algorithm based on four microphones: the multiple signal classification (MUSIC) algorithm is used to estimate the direction of the speech source, a frequency-domain independent component analysis (ICA) algorithm is then used to separate the two groups of signals independently, and the estimated speech source direction is used to recombine the separated signals and identify the speech signal among the two separated outputs. Compared with the original single-microphone speech enhancement algorithms, the four-microphone method based on independent component analysis can separate the speech signal well in a variety of noise environments, suppressing the noise while causing almost no damage to the speech. The algorithm of the present invention can be applied to communication devices such as mobile phones and transceivers, and achieves a good speech enhancement effect.
Brief description of the drawings
Fig. 1 shows the placement of the microphone array in the present invention.
Fig. 2 is a flow chart of the four-microphone speech enhancement method based on independent component analysis of the present invention.
Fig. 3 is a flow chart of the inter-frequency sorting algorithm applied to the signals separated by independent component analysis in the present invention.
Fig. 4 illustrates the relationship between the frame length and the frame shift in the pre-processing.
Fig. 5 is a schematic diagram of the data acquisition environment in the embodiment.
Detailed description of the invention
As shown in Fig. 2, the invention discloses a four-microphone speech enhancement method based on independent component analysis, comprising the following steps.
Step 1: as shown in Fig. 1, the four channels of noisy speech signals collected by the microphone array are pre-processed and converted into frequency-domain signals by a Fourier transform;
The signal $s_i'(n)$ collected by the i-th microphone, i = 1, ..., 4, is divided into frames as shown in Fig. 4; the frame length is L, the frame shift is N, and the overlap between adjacent frames is L−N.
After framing, each frame is windowed with a Hanning window function w(n) of length L, giving
$s_i(n, m) = s_i'(n, m)\, w(n)$   (1)
where n is the sample index within a frame, n = 0, 1, 2, ..., L−1, and m is the frame number.
A discrete Fourier transform (DFT) of $s_i(n, m)$ then gives the corresponding frequency-domain signal
$s_i(f, m) = \sum_{n=0}^{L-1} s_i(n, m)\, e^{-j\frac{2\pi}{L} f n}$   (2)
where f is the frequency bin, m is the frame number, e is the base of the natural logarithm and j is the imaginary unit. For convenience of the following description the frame index is omitted and the result of step 1 is written as the frequency-domain signal $s_i(f)$.
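For illustration, a minimal Python (NumPy) sketch of this pre-processing step is given below, assuming L = 1024 and N = 512 as in the embodiment; the function and variable names are illustrative and not part of the patent.

```python
import numpy as np

def stft_frames(x, frame_len=1024, frame_shift=512):
    """Split a channel into overlapping frames, apply a Hanning window w(n)
    and return the per-frame DFT s_i(f, m) of equation (2)."""
    window = np.hanning(frame_len)                       # w(n), length L
    n_frames = 1 + (len(x) - frame_len) // frame_shift   # full frames only
    spectra = np.empty((n_frames, frame_len), dtype=complex)
    for m in range(n_frames):
        frame = x[m * frame_shift : m * frame_shift + frame_len]
        spectra[m] = np.fft.fft(frame * window)          # equations (1)-(2)
    return spectra

# Usage: S = [stft_frames(x_i) for x_i in channels]   # four noisy channels
```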
Step 2: the signals collected by two microphones on the same side of the array, for example microphones No. 1 and No. 2 in Fig. 1, are pre-processed and form one group of input signals for independent component analysis. For each frequency bin a separation matrix $W_f$ is computed and used to obtain the separated signals $s_1(f)$, $s_2(f)$, one speech signal and one noise signal. The observed signals are modelled as
$\begin{bmatrix} y_1(f) \\ y_2(f) \end{bmatrix} = A_f \begin{bmatrix} s_1(f) \\ s_2(f) \end{bmatrix}$   (3)
The mixing matrix $A_f$ is unknown and is assumed to be of full rank. The complex-valued fast fixed-point independent component analysis (fast fixed-point ICA) algorithm is used to estimate $W_f$ such that
$\begin{bmatrix} s_1(f) \\ s_2(f) \end{bmatrix} = W_f \begin{bmatrix} y_1(f) \\ y_2(f) \end{bmatrix}$   (4)
To solve for $W_f$, a cost function is first constructed:
$J_G(w) = E\{G(|w^H y|^2)\}$ subject to $E\{|w^H y|^2\} = 1$   (5)
where G is a smooth real-valued contrast function, $w \in \mathbb{C}^n$ is a weight vector, i.e. a row of $W_f$, and $y = [y_1(f)\; y_2(f)]^T$. The observed signal y must be whitened beforehand so that $E\{y y^H\} = I$.
The source signals are assumed to be mutually independent and to satisfy $E\{s s^T\} = O$, with $s = [s_1(f)\; s_2(f)]^T$. According to the Kuhn–Tucker conditions, the optimum of the above cost function under the constraint $E\{|w^H y|^2\} = \|w\|^2 = 1$ satisfies
$\nabla E\{G(|w^H y|^2)\} - \beta\, \nabla E\{|w^H y|^2\} = 0$   (6)
where $\beta \in \mathbb{R}$.
The Jacobian matrix of $\nabla E\{G(|w^H y|^2)\}$ is evaluated as
$\nabla^2 E\{G(|w^H y|^2)\} \approx 2\,E\{g(|w^H y|^2) + |w^H y|^2 g'(|w^H y|^2)\}\, I$   (7)
The Jacobian matrix of $\beta \nabla E\{|w^H y|^2\}$ is $2\beta I$, so the total Jacobian matrix of formula (6) is
$J = 2\left(E\{g(|w^H y|^2) + |w^H y|^2 g'(|w^H y|^2)\} - \beta\right) I$   (8)
Using the Newton iteration method, the update formula for w is
$w' = w - \dfrac{E\{y\,(w^H y)^{*} g(|w^H y|^2)\} - \beta w}{E\{g(|w^H y|^2) + |w^H y|^2 g'(|w^H y|^2)\} - \beta}$, $\quad w_{\text{new}} = \dfrac{w'}{\|w'\|}$   (9)
The above formulas are used to estimate each vector $w_i$, i = 1, ..., n, of the matrix $W_f$ in turn. To avoid converging to the same local maximum, a Gram–Schmidt-like decorrelation algorithm is applied to the estimated vectors in every iteration.
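A sketch of the per-bin complex fixed-point ICA is given below. It assumes the contrast function G(u) = log(a + u), a common choice that the patent does not specify, and uses the standard complex FastICA update with Gram–Schmidt-like deflation rather than the exact Newton form of formula (9); all names are illustrative.

```python
import numpy as np

def whiten(Y):
    """Whiten the 2 x T observation matrix so that E{y y^H} = I."""
    Y = Y - Y.mean(axis=1, keepdims=True)
    cov = (Y @ Y.conj().T) / Y.shape[1]
    d, E = np.linalg.eigh(cov)
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.conj().T      # whitening matrix
    return V @ Y, V

def complex_fastica_bin(Y, n_iter=100, a=0.1):
    """Estimate a 2 x 2 separation matrix for one frequency bin with the
    complex fixed-point rule; G(u) = log(a + u), g(u) = 1/(a+u),
    g'(u) = -1/(a+u)^2."""
    Z, V = whiten(Y)
    n = Z.shape[0]
    W = np.zeros((n, n), dtype=complex)
    for k in range(n):
        w = np.random.randn(n) + 1j * np.random.randn(n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            u = w.conj() @ Z                             # w^H y for all frames
            mag2 = np.abs(u) ** 2
            g, gp = 1.0 / (a + mag2), -1.0 / (a + mag2) ** 2
            w_new = (Z * (u.conj() * g)).mean(axis=1) - (g + mag2 * gp).mean() * w
            for j in range(k):                           # Gram-Schmidt-like deflation
                w_new -= (W[j].conj() @ w_new) * W[j]
            w = w_new / np.linalg.norm(w_new)
        W[k] = w
    return W @ V    # separation matrix acting on the un-whitened observations
```

Applied bin by bin to the two observations of one microphone pair, it returns a 2 × 2 matrix playing the role of $W_f$ in formula (4).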
Step 3: the four channels of frequency-domain signals from step 1 are subjected to multiple signal classification. Using the MUSIC algorithm to estimate the speech source direction comprises taking the four pre-processed microphone signals of step 1 as the input of the multiple signal classification algorithm. The basic idea of MUSIC is to perform an eigendecomposition of the covariance matrix of the array input signals, yielding a signal subspace and a noise subspace orthogonal to it; the orthogonality between the speech signal and the noise subspace is then used to estimate the incident direction of the speech signal.
The covariance matrix R of the array input signals is estimated, and an eigenvalue decomposition of R yields the noise-subspace eigenvector matrix $E_N$. The steering vector $a(\theta)$ of the signal subspace is orthogonal to $E_N$, but because of the noise the two are not exactly orthogonal; the angle θ that minimizes $a^H(\theta) E_N E_N^H a(\theta)$, found by searching over θ, is the speech source direction θ.
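A minimal sketch of this direction search is given below, assuming for simplicity a uniform linear steering vector with microphone spacing d and sound speed c = 343 m/s (the geometry of Fig. 1 may differ); it evaluates the MUSIC spectrum for one frequency bin.

```python
import numpy as np

def music_doa(Y, d, fs, f_bin, n_fft, n_sources=1, c=343.0):
    """Estimate the source direction at one frequency bin with MUSIC.
    Y: (n_mics x n_frames) frequency-domain snapshots at this bin,
    d: microphone spacing in metres, f_bin: bin index, n_fft: FFT length."""
    f = f_bin * fs / n_fft                               # bin frequency in Hz
    n_mics = Y.shape[0]
    R = (Y @ Y.conj().T) / Y.shape[1]                    # covariance matrix R
    eigval, eigvec = np.linalg.eigh(R)                   # ascending eigenvalues
    En = eigvec[:, : n_mics - n_sources]                 # noise subspace E_N
    thetas = np.deg2rad(np.arange(0.0, 180.0, 0.5))
    spectrum = np.empty_like(thetas)
    for i, th in enumerate(thetas):
        # steering vector of a uniform linear array with spacing d
        a = np.exp(-2j * np.pi * f * d * np.arange(n_mics) * np.cos(th) / c)
        spectrum[i] = 1.0 / np.real(a.conj() @ En @ En.conj().T @ a)
    return np.rad2deg(thetas[np.argmax(spectrum)])       # theta maximising P_MUSIC
```

Averaging the covariance matrices or the MUSIC spectra of several bins before the peak search makes the estimate of θ more robust.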
Step 4: as shown in Fig. 3, an important problem in frequency-domain independent component analysis is extracting the speech signal from the separated signals. In the present invention the Fourier transform length is 1024, so there are 512 frequency bins. The frequency-domain ICA algorithm is applied to every bin separately; when two signals are separated (n = 2), for different bins i, j with i ≠ j, frequency-domain ICA gives
$\begin{bmatrix} \hat S_1(i) \\ \hat S_2(i) \end{bmatrix} = W_i\, y(i), \qquad \begin{bmatrix} \hat S_1(j) \\ \hat S_2(j) \end{bmatrix} = W_j\, y(j)$   (12)
Because there is no prior knowledge of the separated signals, it cannot be judged directly which one is the speech signal; it is therefore necessary to determine which of $\hat S_1, \hat S_2$ is the speech. The separation matrix $W_i$ of each bin is used to estimate the source directions of the two separated signals:
$\hat\theta_1(i) = \arccos\dfrac{\arg\!\left([W_i^{-1}]_{11} / [W_i^{-1}]_{21}\right)}{2\pi f c^{-1} d}$   (13)
$\hat\theta_2(i) = \arccos\dfrac{\arg\!\left([W_i^{-1}]_{12} / [W_i^{-1}]_{22}\right)}{2\pi f c^{-1} d}$   (14)
where c is the speed of sound and d is the microphone spacing. The directions estimated at the different bins are clustered, and the class near the reference direction θ is selected as the speech direction. If $\hat\theta_1(i)$ is the speech direction, then $\hat S_1(i)$ is the speech signal; the speech components of all bins are combined to obtain the speech signal. The specific algorithm steps are as follows:
1) The separation matrix $W_f$ of each frequency bin f is used to calculate source directions; each bin yields two source directions $\hat\theta_1(f)$ and $\hat\theta_2(f)$. In the embodiment the Fourier transform length is 1024, so there are 512 bins in total.
2) The source directions estimated for all bins are clustered by K-means into two classes; using the speech source direction θ, the cluster near θ is selected as the incident direction of the speech signal.
3) Bins of low credibility are rejected: for a given bin, if one of the source directions differs from θ by less than 10° and the other source direction lies within 20° of the other cluster centre, the bin is considered credible, and the source direction closer to θ is taken as the speech source direction; otherwise the bin is declared a failed bin.
4) Failed bins are re-extracted using the correlation of adjacent spectral envelopes: first, the sum cor(f) of the correlations between the spectrum $\hat s_i(f)$ of the failed bin and the harmonic structure $\hat s_i(g)$ of the valid bins is computed as
$\mathrm{cor}(f) = \sum_g \dfrac{\hat s_i(f)\,\hat s_i(g)}{|\hat s_i(f)|\,|\hat s_i(g)|}$,
where g = ..., f/3, f/2, 2f, 3f, ... are valid bins; if the correlation sum cor(f) exceeds 0.9, the bin is extracted as a valid bin.
5) The above iterative steps are repeated until the number of iterations exceeds 100; any bin still not extracted is then extracted by computing its correlation with the adjacent bins that have already been extracted. If the source direction corresponding to the speech signal of a bin is $\hat\theta_1(f)$, the separation matrix is left unchanged; if it is $\hat\theta_2(f)$, the first and second rows of $W_f$ are exchanged.
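A simplified sketch of the selection procedure above is given below: it applies only the direction test of formulas (13)–(14) and the row swap of sub-step 5), omitting the K-means clustering and the harmonic-correlation rescue of failed bins; `theta_ref` stands for the MUSIC estimate θ, and the constants d and c are assumptions about the array geometry.

```python
import numpy as np

def align_permutation(W, fs, n_fft, d, theta_ref, c=343.0):
    """For each frequency bin, estimate the DOAs of the two separated outputs
    from the inverse separation matrix (formulas (13)-(14)); swap the rows of
    W_f whenever the second output, not the first, points at theta_ref."""
    n_bins = W.shape[0]                         # W has shape (n_bins, 2, 2)
    for k in range(1, n_bins):                  # skip the DC bin
        f = k * fs / n_fft
        A = np.linalg.inv(W[k])                 # estimated mixing matrix A_f
        doas = []
        for col in range(2):
            phase = np.angle(A[0, col] / A[1, col])
            arg = np.clip(phase * c / (2.0 * np.pi * f * d), -1.0, 1.0)
            doas.append(np.degrees(np.arccos(arg)))
        # keep output 1 as speech only if it is the one closest to theta_ref
        if abs(doas[1] - theta_ref) < abs(doas[0] - theta_ref):
            W[k] = W[k][::-1, :].copy()         # swap the two rows
    return W
```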
Step 5: amplitude smoothing. The new separation matrices obtained after step 4 are smoothed over adjacent frequency bins with a window function. Because of the cyclic nature of the discrete spectrum, the L frequency bins of a signal sampled at $f_s$ (frequency spacing $f_s/L$) represent in the time domain one period of a periodic signal of period $L/f_s$, and taking only a single period produces spikes at the period boundaries. A widely used remedy is spectral smoothing, obtained by convolving each element $w_{kl}(f)$ of the separation matrix with a window function, for example a Hanning window:
$\tilde w_{kl}(f) = \sum_{\varphi=0}^{f_s - \Delta f} g(\varphi)\, w_{kl}(f - \varphi)$   (15)
where $w_{kl}(f)$ is the element in row k, column l of $W_f$, $g(f)$ is the window function and $\Delta f = f_s / L$. If g(f) is a Hanning window, the formula above becomes
$\tilde w_{kl}(f) = \frac{1}{4}\left[w_{kl}(f-\Delta f) + 2\,w_{kl}(f) + w_{kl}(f+\Delta f)\right]$.
After smoothing, the elements $\tilde w_{kl}(f)$ of $W_f$ form a new separation matrix $\tilde W_f$, and the speech signal can be computed according to formula (4):
$\begin{bmatrix} \hat s_1(f) \\ \hat s_2(f) \end{bmatrix} = \tilde W_f \begin{bmatrix} y_1(f) \\ y_2(f) \end{bmatrix}$   (4)
where $\hat s_1(f)$ is the speech signal estimated in step 4 and $\hat s_2(f)$ is the noise signal.
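A minimal sketch of this smoothing step, treating the bin-wise separation matrices as an array indexed by frequency and applying the three-point Hanning kernel above; leaving the edge bins unsmoothed is an implementation choice not stated in the patent.

```python
import numpy as np

def smooth_separation_matrix(W):
    """Smooth every element of the bin-wise separation matrices across frequency
    with the 3-point Hanning kernel (1/4, 1/2, 1/4):
    w~(f) = [w(f-df) + 2 w(f) + w(f+df)] / 4."""
    W_s = W.copy()                              # W has shape (n_bins, 2, 2)
    W_s[1:-1] = 0.25 * W[:-2] + 0.5 * W[1:-1] + 0.25 * W[2:]
    return W_s

# The smoothed matrices are then applied bin by bin:
#   s_hat[k] = W_s[k] @ y[k]      # formula (4) with the smoothed matrix
```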
Step 6: the speech signal $\hat s_1(f)$ estimated in step 5 is subjected to an inverse Fourier transform and windowing to obtain the time-domain speech signal.
The inverse Fourier transform of the estimated speech signal is
$\hat s_1(n) = \dfrac{1}{L} \sum_{f=0}^{L-1} \hat s_1(f)\, e^{j\frac{2\pi}{L} f n}$   (16)
The estimated speech signal is then windowed with the same window function as in the pre-processing:
$\hat s_1'(n) = \hat s_1(n)\, w(n)$   (17)
$\hat s_1'(n)$ is the final enhanced speech.
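A sketch of this synthesis step is given below, assuming overlap-add reconstruction with the same Hanning window; the normalisation by the summed squared window is an implementation detail not stated in the patent.

```python
import numpy as np

def istft_frames(spectra, frame_len=1024, frame_shift=512):
    """Reconstruct the time-domain speech from per-frame spectra s_hat_1(f, m):
    inverse DFT of each frame, windowing as in equation (17), overlap-add."""
    window = np.hanning(frame_len)
    n_frames = spectra.shape[0]
    out = np.zeros(frame_shift * (n_frames - 1) + frame_len)
    norm = np.zeros_like(out)
    for m in range(n_frames):
        frame = np.real(np.fft.ifft(spectra[m])) * window   # equations (16)-(17)
        out[m * frame_shift : m * frame_shift + frame_len] += frame
        norm[m * frame_shift : m * frame_shift + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-12)
```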
Embodiment
As shown in Fig. 5, the microphone array and the signal sources are placed as illustrated in a room 10 m long, 5 m wide and 3 m high, all at a height of 1.1 m. The plan-position coordinates are as shown: the centre of the microphone array is at (5, 2.5) with a microphone spacing of 3 cm, the speech source is located at (5, 2.4), and the other three sources are noise sources: white noise with a signal-to-noise ratio of 0 dB at (2.5, 4.5), white noise with a signal-to-noise ratio of 0 dB at (2.4, 0.6), and pink noise with a signal-to-noise ratio of 5 dB at (8, 3.4). In this noise environment, noisy signals were collected and enhanced with the speech enhancement algorithm proposed herein, using a Fourier transform length of 1024 and a frame shift of 512, i.e. L = 1024 and N = 512 in Fig. 4. The signal-to-noise ratio of the speech enhanced with the proposed algorithm is 13.5 dB, whereas the speech enhanced with the conventional minimum variance distortionless response (MVDR) beamforming algorithm has a signal-to-noise ratio of 9.6 dB, showing that the noise suppression of the proposed algorithm is very good.

Claims (3)

1. A four-microphone speech enhancement method based on independent component analysis, characterised by comprising the following steps:
Step 1: four microphones are used to form a rectangular array, and four channels of noisy speech signals are collected, pre-processed and converted into frequency-domain signals by a Fourier transform;
Step 2: two signals from microphones on the same side of the four-microphone array are randomly selected as the first group of input signals and the other two signals are taken as the second group of input signals; for each group of pre-processed frequency-domain signals, a frequency-domain independent component analysis algorithm is used to estimate a separation matrix $W_f$ for every frequency bin f, so that each group yields a pair of separated signals consisting of one speech signal and one noise signal;
Step 3: the frequency-domain signals obtained in step 1 are processed with the multiple signal classification algorithm to obtain the speech source direction θ;
Step 4: the speech source direction θ calculated in step 3 is used as reference information to extract the speech signal from each group of separated signals obtained in step 2;
Step 5: the frequency bins are sorted and the amplitude of the resulting speech signal is smoothed, a window function being used to smooth the amplitude over adjacent frequency bins;
Step 6: the time-domain speech signal is obtained through windowing and an inverse Fourier transform.
2. The four-microphone speech enhancement method based on independent component analysis according to claim 1, characterised in that in step 4, using the speech source direction θ estimated in step 3 as reference information, extracting the speech signal from each group of separated signals obtained in step 2 comprises the following steps:
1) the separation matrix $W_f$ of each frequency bin f is used to calculate source directions, each bin yielding two source directions $\hat\theta_1(f)$ and $\hat\theta_2(f)$, the number of bins being half the Fourier transform length;
2) the source directions calculated for all bins are clustered by K-means into two classes, and, using the speech source direction θ, the cluster lying within 10 degrees of θ is taken as the incident direction of the speech signal;
3) bins of low credibility are rejected: for a given bin, if the incident direction of one of the signals differs from the speech source direction θ by less than 10° and the incident direction of the other signal lies within 20° of the other cluster centre, the bin is considered credible and the incident direction of the signal closer to the speech source direction θ is taken as the final speech source direction; otherwise the bin is declared a failed bin;
4) failed bins are re-extracted using the correlation of adjacent spectral envelopes:
first, the sum cor(f) of the correlations between the spectrum $\hat s_i(f)$ of the failed bin and the harmonic structure $\hat s_i(g)$ of the valid bins is computed as
$\mathrm{cor}(f) = \sum_g \dfrac{\hat s_i(f)\,\hat s_i(g)}{|\hat s_i(f)|\,|\hat s_i(g)|}$,
where g = ..., f/3, f/2, 2f, 3f, ... are valid bins;
if the correlation sum cor(f) is greater than 0.9, the bin is extracted as a valid bin;
the above iterative step is repeated until the number of iterations exceeds 100; any bin still not extracted is then extracted by computing its correlation with the adjacent bins that have already been extracted.
3. The four-microphone speech enhancement method based on independent component analysis according to claim 1, characterised in that in step 5 the extracted speech signal is smoothed by windowing, using a Hanning window, with the window function
$\tilde w_{kl}(f) = \frac{1}{4}\left[w_{kl}(f-\Delta f) + 2\,w_{kl}(f) + w_{kl}(f+\Delta f)\right]$,
where $w_{kl}(f)$ is the element in row k, column l of the separation matrix $W_f$, and $\Delta f$ is the frequency spacing of the Fourier transform, equal to the signal sampling rate divided by the Fourier transform length.
CN201410061180.4A 2014-02-24 2014-02-24 Four-microphone speech enhancement method based on independent component analysis Active CN103854660B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410061180.4A CN103854660B (en) 2014-02-24 2014-02-24 Four-microphone speech enhancement method based on independent component analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410061180.4A CN103854660B (en) 2014-02-24 2014-02-24 Four-microphone speech enhancement method based on independent component analysis

Publications (2)

Publication Number Publication Date
CN103854660A CN103854660A (en) 2014-06-11
CN103854660B true CN103854660B (en) 2016-08-17

Family

ID=50862229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410061180.4A Active CN103854660B (en) 2014-02-24 2014-02-24 Four-microphone speech enhancement method based on independent component analysis

Country Status (1)

Country Link
CN (1) CN103854660B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104200813B (en) * 2014-07-01 2017-05-10 东北大学 Dynamic blind signal separation method based on real-time prediction and tracking on sound source direction
CN106504763A (en) * 2015-12-22 2017-03-15 电子科技大学 Based on blind source separating and the microphone array multiple target sound enhancement method of spectrum-subtraction
CN105869627A (en) * 2016-04-28 2016-08-17 成都之达科技有限公司 Vehicle-networking-based speech processing method
US9741360B1 (en) * 2016-10-09 2017-08-22 Spectimbre Inc. Speech enhancement for target speakers
CN108074580B (en) * 2016-11-17 2021-04-02 杭州海康威视数字技术股份有限公司 Noise elimination method and device
CN109285557B (en) * 2017-07-19 2022-11-01 杭州海康威视数字技术股份有限公司 Directional pickup method and device and electronic equipment
CN111344778A (en) * 2017-11-23 2020-06-26 哈曼国际工业有限公司 Method and system for speech enhancement
CN109194422B (en) * 2018-09-04 2021-06-22 南京航空航天大学 SNR estimation method based on subspace
CN110786022A (en) * 2018-11-14 2020-02-11 深圳市大疆创新科技有限公司 Wind noise processing method, device and system based on multiple microphones and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1826019A (en) * 2005-02-24 2006-08-30 索尼株式会社 Microphone apparatus
CN101517941A (en) * 2006-09-14 2009-08-26 美商富迪科技股份有限公司 Small array microphone apparatus and noise suppression methods thereof
CN102707262A (en) * 2012-06-20 2012-10-03 太仓博天网络科技有限公司 Sound localization system based on microphone array
JP5299233B2 (en) * 2009-11-20 2013-09-25 ソニー株式会社 Signal processing apparatus, signal processing method, and program
CN104459625A (en) * 2014-12-14 2015-03-25 南京理工大学 Sound source positioning device and method based on track moving double microphone arrays

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080228470A1 (en) * 2007-02-21 2008-09-18 Atsuo Hiroe Signal separating device, signal separating method, and computer program

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1826019A (en) * 2005-02-24 2006-08-30 索尼株式会社 Microphone apparatus
CN101517941A (en) * 2006-09-14 2009-08-26 美商富迪科技股份有限公司 Small array microphone apparatus and noise suppression methods thereof
JP5299233B2 (en) * 2009-11-20 2013-09-25 ソニー株式会社 Signal processing apparatus, signal processing method, and program
CN102707262A (en) * 2012-06-20 2012-10-03 太仓博天网络科技有限公司 Sound localization system based on microphone array
CN104459625A (en) * 2014-12-14 2015-03-25 南京理工大学 Sound source positioning device and method based on track moving double microphone arrays

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Three-dimensional sound source localization based on a four-microphone array; 王文龙, 张艳萍; 《南京信息工程大学学报》 (Journal of Nanjing University of Information Science and Technology); 2010-05-31; full text *

Also Published As

Publication number Publication date
CN103854660A (en) 2014-06-11

Similar Documents

Publication Publication Date Title
CN103854660B (en) A kind of four Mike's sound enhancement methods based on independent component analysis
CN109830245A (en) A kind of more speaker's speech separating methods and system based on beam forming
CN103426434B (en) Separated by the source of independent component analysis in conjunction with source directional information
CN106251877B (en) Voice Sounnd source direction estimation method and device
CN101667425A (en) Method for carrying out blind source separation on convolutionary aliasing voice signals
Kwan et al. An automated acoustic system to monitor and classify birds
CN106023996B (en) Sound recognition methods based on cross acoustic array broad-band EDFA
CN106782590A (en) Based on microphone array Beamforming Method under reverberant ambiance
CN108986838A (en) A kind of adaptive voice separation method based on auditory localization
CN102799892B (en) Mel frequency cepstrum coefficient (MFCC) underwater target feature extraction and recognition method
CN106057210B (en) Quick speech blind source separation method based on frequency point selection under binaural distance
CN102222508A (en) Matrix-transformation-based method for underdetermined blind source separation
CN105204001A (en) Sound source positioning method and system
CN106772331A (en) Target identification method and Target Identification Unit
CN109977724B (en) Underwater target classification method
CN109884591B (en) Microphone array-based multi-rotor unmanned aerial vehicle acoustic signal enhancement method
CN105580074B (en) Signal processing system and method
CN109637554A (en) MCLP speech dereverberation method based on CDR
Sainath et al. Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction.
Asaei Model-based sparse component analysis for multiparty distant speech recognition
CN110111806A (en) A kind of blind separating method of moving source signal aliasing
CN116559778B (en) Vehicle whistle positioning method and system based on deep learning
CN109741759A (en) A kind of acoustics automatic testing method towards specific birds species
CN114613384B (en) Deep learning-based multi-input voice signal beam forming information complementation method
Guo et al. Underwater target detection and localization with feature map and CNN-based classification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant