CN104715758A

CN104715758A - Branched processing array type speech positioning and enhancement method

Info

Publication number: CN104715758A
Application number: CN201510066532.XA
Authority: CN
Inventors: 樊滨温; 王兆阳; 王明江; 刘明; 蒋贤慧; 张健; 曹彬; 曾伟浩
Original assignee: Shenzhen Graduate School Harbin Institute of Technology
Current assignee: Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2015-02-06
Filing date: 2015-02-06
Publication date: 2015-06-17

Abstract

The invention provides a branched-processing array speech localization and enhancement method. The method covers the basic structure of a generalized sidelobe canceller (GSC), the design of the blocking matrix, the design of the component filters, and a Wiener post-filter. By borrowing a component structure, adding a Wiener post-filter, and applying partially adaptive techniques, the method preserves the denoising performance of the algorithm, effectively suppresses both incoherent and coherent noise, speeds up convergence, and reduces computational complexity. Compared with a microphone array speech enhancement system based on a conventional generalized sidelobe canceller, the improved speech enhancement system achieves a higher output signal-to-noise ratio. Simulation results show that the method also achieves a higher output signal-to-noise ratio than a microphone array speech enhancement system based on a full-band generalized sidelobe canceller.

Description

A branched-processing array speech localization and enhancement method

Technical field

The present invention relates to the field of speech processing, and in particular to a speech localization and enhancement method.

Background art

Adaptive beamforming is widely used in aviation, aerospace, radar, and communication systems. By adjusting the weighting amplitude and weighting phase of each array element, an adaptive beamformer changes the directional pattern of the array so that the main lobe of the array antenna points at the desired user while its nulls and sidelobes point at other users, thereby improving the received signal-to-noise ratio.

The linearly constrained minimum variance (LCMV) criterion is the most commonly used adaptive beamforming method. The generalized sidelobe canceller (GSC) is an equivalent implementation structure of LCMV. The GSC structure converts the constrained optimization problem of adaptive beamforming into an unconstrained one by splitting the processing into a non-adaptive branch and an adaptive branch, called the main branch and the auxiliary branch respectively. The desired signal is required to pass only through the non-adaptive main branch, while the adaptive auxiliary branch should contain only interference and noise components. At high signal-to-noise ratios, however, part of the desired signal leaks into the auxiliary branch and signal cancellation occurs.
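As a reference for the notation, the LCMV-to-GSC conversion described above can be written out as follows. This is the standard textbook form rather than a formula taken from this document; the symbols w, R, C, f, w_q, B, and w_a are assumed here for illustration (R is the covariance matrix of the array input, C the constraint matrix, f the desired response vector).

```latex
% LCMV: constrained problem
\min_{w} \; w^{H} R\, w \quad \text{subject to} \quad C^{H} w = f

% GSC: split the weights into a fixed part and an adaptive part
w = w_q - B\, w_a, \qquad w_q = C \left( C^{H} C \right)^{-1} f, \qquad B^{H} C = 0

% Unconstrained problem over the adaptive weights only
\min_{w_a} \; \left( w_q - B\, w_a \right)^{H} R \left( w_q - B\, w_a \right)
\;\;\Longrightarrow\;\;
w_a = \left( B^{H} R\, B \right)^{-1} B^{H} R\, w_q
```

In this form w_q plays the role of the non-adaptive main branch and B that of the blocking matrix feeding the adaptive auxiliary branch. If B does not fully block the desired signal, the adaptive weights w_a partially cancel it, which is the signal cancellation effect mentioned above.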

Summary of the invention

To solve the problems of the prior art, the present invention proposes a branched-processing array speech localization and enhancement method that adopts a transfer-function generalized sidelobe canceller structure together with a Wiener post-filter structure. The input signal of the adaptive module is decomposed and processed in a component filter structure, and a multichannel Wiener filtering module is introduced into the non-adaptive branch of the transfer-function structure. This suppresses coherent and incoherent noise more effectively and accelerates the convergence of the speech enhancement system.

The present invention is achieved through the following technical solutions:

A branched-processing array speech localization and enhancement method, comprising the following steps:

Step 1: sound pickups are arranged in a linear array, with three or more pickups, to receive external sound signals;

Step 2: the cross-correlation function between the channel signals is used to obtain the time delay of the received signals, and each channel is delay-compensated so that all channels are aligned; the obtained delays also allow the sound source direction to be estimated accurately, thereby localizing the target sound source;

Step 3: the compensated digital signals, which still contain noise, are fed into the optimized adaptive blocking matrix module, which separates out noise signals on one fewer channel than the total number of channels;

Step 4: these noise signals are fed into the adaptive component filters, and filtering yields the directional noise component;

Step 5: this step is carried out in parallel with step 4; the compensated signals are fed into the multichannel Wiener post-filter module to obtain a preliminary denoising result, which at this point still contains directional noise;

Step 6: the result of step 4 is subtracted from the result of step 5 to obtain the final denoised speech signal.

The beneficial effects of the invention are as follows: the proposed method borrows a component structure, adds a Wiener post-filter, and uses partially adaptive techniques. This preserves the denoising performance of the algorithm, effectively suppresses incoherent and coherent noise, accelerates convergence, and reduces computational complexity. Compared with a microphone array speech enhancement system based on a conventional generalized sidelobe canceller, the improved speech enhancement system has a higher output signal-to-noise ratio.

Brief description of the drawings

Fig. 1 is a flowchart of the branched-processing array speech localization and enhancement method of the present invention;

Fig. 2 is a design diagram of the adaptive blocking matrix of the present invention;

Fig. 3 is a design diagram of the M-channel modulated filter bank.

Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

Fig. 1 is a flowchart of the branched-processing array speech localization and enhancement method of the present invention. The method comprises the following steps:

Step 1: the sound pickups are arranged in a linear array, with three or more pickups, to receive external sound signals. As an example, four pickups spaced 2.25 cm apart receive the external analog speech signal, which is then converted to a digital signal by A/D conversion.

Step 2: because the sound source may lie in different directions, the channels receive the signal with relative time delays. The cross-correlation function between the channel signals is used to obtain the delay, and each channel is delay-compensated so that all channels are aligned. The obtained delay also allows the sound source direction to be estimated accurately, thereby localizing the target sound source.

Step 3: the compensated digital signals, which still contain noise, are fed into the optimized adaptive blocking matrix module, which separates out noise signals on one fewer channel than the total number of channels.

Step 4: these noise signals are fed into the adaptive component filters, and filtering yields the directional noise component.

Step 5: this step is carried out in parallel with step 4. The compensated signals are fed into the multichannel Wiener post-filter module to obtain a preliminary denoising result (which still contains directional noise).

Step 6: the result of step 4 is subtracted from the result of step 5 to obtain the final denoised speech signal, giving a more pronounced speech enhancement effect.
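Steps 4 and 6 together behave like an adaptive noise canceller: a filter driven by the noise reference estimates the directional noise, and that estimate is subtracted from the primary branch. The following is a minimal sketch of that subtraction, assuming a single noise reference and a normalized LMS (NLMS) update. The text does not name the adaptation rule, so NLMS and the names `nlms_cancel`, `taps`, and `mu` are illustrative assumptions, and the single full-band filter stands in for the component filters of step 4.

```python
import numpy as np


def nlms_cancel(primary, reference, taps=16, mu=0.2, eps=1e-8):
    """Subtract an adaptively filtered noise reference from the primary signal.

    primary   : output of the non-adaptive branch (speech + directional noise)
    reference : noise-only signal from the blocking matrix
    NLMS is an assumed adaptation rule; the source does not specify one.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(taps)        # adaptive filter weights
    buf = np.zeros(taps)      # most recent reference samples, newest first
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        noise_est = w @ buf                 # step 4: directional noise estimate
        err = primary[n] - noise_est        # step 6: difference of the two branches
        w += mu * err * buf / (buf @ buf + eps)
        out[n] = err
    return out
```

Because the error signal is also the enhanced output, any speech that leaks into `reference` would be partly cancelled, which is why the blocking matrix of step 3 matters.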

Specifically, in step 2 the four time-domain channel signals are transformed to the frequency domain by the Fourier transform, x_i(n) = s(n - τ_i) + v_i(n) → X_i(ω) = S e^{jωτ_i} + V, and the peak of the cross-power spectrum is then computed to obtain the delay τ. Sound propagates in air at a speed v of about 340 m/s, from which the planar angle of the sound source is determined. Finally, the remaining channel signals are delayed by τ, 2τ, 3τ, ... respectively to obtain the aligned speech data.
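The delay estimate of step 2 can be sketched as follows. The code forms the cross-power spectrum of two channels, transforms it back to the lag domain, and takes the lag of the largest value as the delay. The angle formula is not reproduced in the text, so the far-field relation τ = d·cos(θ)/v used in `arrival_angle_deg` is an assumption, as are the function names and the sampling rate used in the usage note.

```python
import numpy as np

SPEED_OF_SOUND = 340.0  # m/s, the value given in the text


def estimate_delay(x_ref, x_other, fs, max_delay_s):
    """Delay in seconds of x_other relative to x_ref (positive if x_other lags)."""
    n = len(x_ref) + len(x_other)            # zero-pad to avoid circular wrap-around
    X_ref = np.fft.rfft(x_ref, n)
    X_other = np.fft.rfft(x_other, n)
    cross_power = X_other * np.conj(X_ref)   # cross-power spectrum
    cc = np.fft.irfft(cross_power, n)        # cross-correlation as a function of lag
    max_shift = max(1, int(round(max_delay_s * fs)))
    # Reorder so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(cc) - max_shift) / fs


def arrival_angle_deg(tau, spacing_m):
    """Angle from the array axis, assuming the far-field model tau = d*cos(theta)/v."""
    cos_theta = np.clip(SPEED_OF_SOUND * tau / spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))
```

With the 2.25 cm spacing of the example, the largest possible delay between adjacent pickups is 0.0225 / 340 s, about 66 microseconds, so `max_delay_s` can be set to that value. At a hypothetical 48 kHz sampling rate this is only about three samples, so the angular resolution of an integer-lag search is coarse.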

In step 3, each transfer function is replaced by a ratio of transfer functions, and an adaptive blocking matrix is designed so that its output contains only noise. The matrix shown in Fig. 2 meets this requirement. The signal component at the output of the non-adaptive branch is not the original signal S but the signal component A_i S received by the first array element, where A_i is the coefficient of the signal after the short-time Fourier transform. The received signal can be expressed as x_i = a_i * s + v_i, i = 1, 2, ..., M. Taking the Fourier transform of both sides gives X_i = A_i S + V_i, where v_i is the noise, * denotes convolution, and M is the number of pickups.
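The effect of using transfer-function ratios can be checked numerically. The matrix of Fig. 2 is not reproduced in the text, so the sketch below assumes the common form in which channel 1 is the reference and each remaining channel has the reference, scaled by A_i / A_1, subtracted from it. Under the model X_i = A_i S + V_i, the speech term cancels and M - 1 noise-only outputs remain. The function name `block_with_tf_ratios` is hypothetical.

```python
import numpy as np


def block_with_tf_ratios(X, A):
    """Return M-1 noise references for one frequency bin.

    X : length-M complex spectra, modeled as X_i = A_i * S + V_i
    A : length-M transfer functions (only the ratios A_i / A_1 are used)
    Assumed form: U_i = X_i - (A_i / A_1) * X_1 for i = 2..M.
    """
    X = np.asarray(X, dtype=complex)
    A = np.asarray(A, dtype=complex)
    ratios = A[1:] / A[0]
    return X[1:] - ratios * X[0]
```

Substituting the signal model gives U_i = V_i - (A_i / A_1) V_1, which contains no S term. In practice only the ratios need to be estimated, not the individual transfer functions.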

The basic idea of the adaptive component filter bank in step 4 is to split the input noise signal into frequency bands, decomposing it into component signals located in different bands, and then to process each component signal according to its characteristics. The adaptive component filter consists of analysis filters and synthesis filters. In Fig. 3, the input signal is decomposed by the analysis filter bank H_i(z) into a series of component signals y_i(n), which become y_i'(n) after processing such as coding, compression, and transmission, and the signal x(n) is finally reconstructed by the synthesis filter bank G_i(z).

The analysis filter transfer functions H_i(z) and the synthesis filter transfer functions G_i(z), i = 1, 2, ..., M, are each obtained by cosine modulation of a low-pass filter l(z) with linear phase. The impulse responses of the analysis filters H_i(z) and synthesis filters G_i(z) are therefore:

h_{i}(k) = 2\, l(k) \cos\left( (2i - 1) \frac{\pi}{2M} \left( k - \frac{K}{2} \right) + (-1)^{k} \frac{\pi}{4} \right)

g_{i}(k) = 2\, l(k) \cos\left( (2i - 1) \frac{\pi}{2M} \left( k - \frac{K}{2} \right) - (-1)^{k} \frac{\pi}{4} \right)

where k = 0, 1, ..., K-1 and i = 1, 2, ..., M; M is the number of decomposition channels and K is the filter order. It follows that the analysis and synthesis filters are related: G_i(z) is the conjugate transpose of H_i(z).
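The two impulse responses can be generated directly from a prototype filter. The sketch below transcribes the expressions for h_i(k) and g_i(k) as given above, including the (-1)^k phase term. The prototype design is not specified in the text, so the Hamming-windowed sinc in `lowpass_prototype` (cutoff π/(2M)) is an assumption chosen only because it is symmetric and therefore linear phase.

```python
import numpy as np


def lowpass_prototype(K, M):
    """Assumed prototype: Hamming-windowed sinc, cutoff pi/(2M), symmetric taps."""
    n = np.arange(K) - (K - 1) / 2.0
    return np.sinc(n / (2.0 * M)) / (2.0 * M) * np.hamming(K)


def cosine_modulated_bank(prototype, M):
    """Analysis (h) and synthesis (g) impulse responses, each of shape (M, K)."""
    prototype = np.asarray(prototype, dtype=float)
    K = len(prototype)
    k = np.arange(K)                              # k = 0, 1, ..., K-1
    i = np.arange(1, M + 1)[:, None]              # i = 1, 2, ..., M
    carrier = (2 * i - 1) * np.pi / (2 * M) * (k - K / 2.0)
    phase = np.where(k % 2 == 0, 1.0, -1.0) * np.pi / 4   # (-1)^k * pi/4
    h = 2.0 * prototype * np.cos(carrier + phase)
    g = 2.0 * prototype * np.cos(carrier - phase)
    return h, g
```

For example, `cosine_modulated_bank(lowpass_prototype(32, 4), 4)` returns two 4-by-32 arrays, one row per channel. Whether the resulting bank reconstructs the input well depends on the prototype, which this sketch does not optimize.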

In step 5, the coefficients of the Wiener filter are obtained from the auto-correlation and cross-correlation of the signals received on the channels, so the Wiener filter coefficients vary adaptively. After delay-and-sum, the noisy speech signal is passed through the Wiener filter to obtain an estimate of the target speech under the minimum mean square error criterion. With a relatively small number of microphones, this approach achieves good denoising performance in uncorrelated-noise environments.
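One way to obtain such a gain from auto- and cross-correlations is sketched below. It assumes time-aligned short-time spectra, noise that is uncorrelated between channels, and a Zelinski-style estimator in which the averaged cross-spectra approximate the speech power and the averaged auto-spectra approximate the speech-plus-noise power. The exact estimator is not stated in the text, so this form and the name `wiener_postfilter` are assumptions.

```python
import numpy as np


def wiener_postfilter(X, eps=1e-12):
    """Apply a correlation-based Wiener gain to the delay-and-sum output.

    X : complex array of shape (M, T, F), time-aligned spectra for
        M channels, T frames, and F frequency bins.
    Returns (filtered delay-and-sum spectra of shape (T, F), gain of shape (F,)).
    """
    M = X.shape[0]
    # Auto-spectra averaged over channels and frames: speech + noise power
    auto = np.mean(np.abs(X) ** 2, axis=(0, 1))
    # Cross-spectra averaged over channel pairs and frames: speech power,
    # since noise that is uncorrelated between channels averages out
    cross = np.zeros(X.shape[2])
    for i in range(M - 1):
        for j in range(i + 1, M):
            cross += np.mean(np.real(X[i] * np.conj(X[j])), axis=0)
    cross *= 2.0 / (M * (M - 1))
    gain = np.clip(cross / (auto + eps), 0.0, 1.0)
    beam = X.mean(axis=0)          # delay-and-sum of the aligned channels
    return gain * beam, gain
```

Because the gain is recomputed from the incoming data, it changes as the signal statistics change, which matches the adaptive behavior described above. Coherent (directional) noise is correlated across channels and is not removed by this gain, consistent with step 5 leaving directional noise for step 6.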

The beneficial effects of the invention are as follows: the proposed method borrows a component structure, adds a Wiener post-filter, and uses partially adaptive techniques. This preserves the denoising performance of the algorithm, effectively suppresses incoherent and coherent noise, accelerates convergence, and reduces computational complexity. Compared with a microphone array speech enhancement system based on a conventional generalized sidelobe canceller, the improved speech enhancement system has a higher output signal-to-noise ratio. Simulation results show that the method of the present invention also has a higher output signal-to-noise ratio than a microphone array speech enhancement system based on a full-band generalized sidelobe canceller.

The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the invention should not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the protection scope of the invention.

Claims

1. A branched-processing array speech localization and enhancement method, comprising the following steps:

2. The method according to claim 1, characterized in that the sound pickups in step 1 are four in number, arranged in a line at a spacing of d = 2.25 cm, and receive the analog speech signal, which is then converted to a digital signal by A/D conversion.

3. The method according to claim 1, characterized in that step 2 is specifically: each time-domain channel signal is transformed to the frequency domain by the Fourier transform,

x_{i}(n) = s(n - \tau_{i}) + v_{i}(n) \rightarrow X_{i}(\omega) = S e^{j\omega\tau_{i}} + V,

the peak of the cross-power spectrum is then computed to obtain the delay τ, the planar angle of the sound source is determined, and finally the remaining channel signals are delayed by τ, 2τ, 3τ, ... respectively to obtain the aligned speech data.

4. The method according to claim 1, characterized in that the adaptive blocking matrix in step 3 is designed by replacing each transfer function with a ratio of transfer functions so that the output contains only noise, the adaptive blocking matrix being:

The signal component at the output of the non-adaptive branch is not the original signal S but the signal component A_i S received by the first array element, where A_i is the coefficient of the signal after the short-time Fourier transform. The received signal can be expressed as x_i = a_i * s + v_i, i = 1, 2, ..., M. Taking the Fourier transform of both sides gives X_i = A_i S + V_i, where v_i is the noise, * denotes convolution, and M is the number of pickups.

5. The method according to claim 1, characterized in that the adaptive component filter bank in step 4 splits the input noise signal into frequency bands, decomposing it into component signals located in different bands, and then processes each component signal according to its characteristics. The adaptive component filter consists of analysis filters and synthesis filters: the input noise signal is decomposed by the analysis filter bank H_i(z) into a series of component signals y_i(n), which become y_i'(n) after processing such as coding, compression, and transmission, and the signal x(n) is finally reconstructed by the synthesis filter bank G_i(z). The analysis filter transfer functions H_i(z) and synthesis filter transfer functions G_i(z), i = 1, 2, ..., M, are each obtained by cosine modulation of a low-pass filter with linear phase.

6. The method according to claim 5, characterized in that the impulse responses of the analysis filters H_i(z) and the synthesis filters G_i(z) are as follows:

h_{i}(k) = 2\, l(k) \cos\left( (2i - 1) \frac{\pi}{2M} \left( k - \frac{K}{2} \right) + (-1)^{k} \frac{\pi}{4} \right)

g_{i}(k) = 2\, l(k) \cos\left( (2i - 1) \frac{\pi}{2M} \left( k - \frac{K}{2} \right) - (-1)^{k} \frac{\pi}{4} \right)

7. The method according to claim 1, characterized in that in step 5 the coefficients of the Wiener filter are obtained from the auto-correlation and cross-correlation of the signals received on the channels, so that the Wiener filter coefficients vary adaptively; after delay-and-sum, the noisy speech signal is passed through the Wiener filter to obtain an estimate of the target speech under the minimum mean square error criterion.