CN102664023A

CN102664023A - Method for optimizing speech enhancement of microphone array

Info

Publication number: CN102664023A
Application number: CN2012101277578A
Authority: CN
Inventors: 王辉; 张玲华
Original assignee: Nanjing Post and Telecommunication University
Current assignee: Nanjing Post and Telecommunication University; Nanjing University of Posts and Telecommunications
Priority date: 2012-04-26
Filing date: 2012-04-26
Publication date: 2012-09-12

Abstract

The invention discloses a method for optimizing speech enhancement with a microphone array. It belongs to the technical field of speech signal processing and relates to speech enhancement technology, in particular microphone array speech enhancement. The method is built on a generalized sidelobe canceller (GSC) structure. To address the target speech leakage that a GSC suffers when the direction of arrival of the signal is estimated incorrectly, the blocking matrix is adjusted adaptively by exploiting the correlation between the GSC output and the blocking matrix output, so that the blocking matrix steers toward the direction of the target speech, target speech leakage through the blocking matrix is reduced, and the robustness of the system is improved.

Description

An optimization method for microphone array speech enhancement

Technical field

The present invention relates to speech enhancement technology, in particular to microphone array speech enhancement, and belongs to the field of speech processing technology.

Background technology

Speech enhancement has long been a research focus in the field of speech signal processing, and the introduction of microphone array processing offers a new approach to it. A microphone array provides information about the signal not only in the time and frequency domains but also in the spatial domain, so signals arriving from different directions in space can be processed jointly in space, time and frequency. The approach takes antenna array algorithms as its theoretical basis and combines them with single-channel speech processing methods. Acting as a spatial filter, the array provides the spatial position of the sound source and passes the source signal while suppressing interfering signals.

The goal of speech enhancement is to reduce, or even eliminate, the noise in the received signal without damaging the structure of the target speech, thereby improving speech clarity.

Microphone array speech enhancement can be divided into a sound source localization stage and a speech enhancement stage. In the localization stage, the system obtains the spatial direction of the speaker. In the enhancement stage, this direction information is used together with array signal processing methods to pass the signal from the source direction and suppress interference from other directions, which achieves speech enhancement.

Microphone array speech enhancement combines array processing techniques, and extensive research has so far produced three mainstream classes of algorithm: fixed beamforming, adaptive beamforming, and beamforming with a post-filter. Among these, adaptive beamforming with the GSC (Generalized Sidelobe Canceller) structure is widely used because it combines low computational cost with high performance. The problem most likely to arise with array processing, however, is that an error in the estimated target direction easily causes leakage of the target signal, which seriously degrades speech enhancement performance. In the GSC structure the key building block is the BM (Blocking Matrix) module, which uses the estimated direction information to filter out the signal from the target direction. Optimizing a microphone array speech enhancement algorithm therefore centers on optimizing the blocking matrix.

Summary of the invention

The object of the present invention is to provide an optimization method for microphone array speech enhancement algorithms that improves the adaptability of the blocking matrix, reduces the speech leaked through the blocking matrix, and improves the robustness of speech enhancement.

The technical solution that achieves this object is an optimization method for microphone array speech enhancement, with the following steps:

First step: preliminary processing. After pre-emphasis, framing and windowing of the input array speech signals, a time delay estimation method is used to obtain the direction of the sound source, and the direction information is used to obtain the steering vector of the signal;

Second step: build the GSC structure model with the microphone array, starting with the fixed beamforming algorithm. Unlike conventional GSC processing, the FBF is split into two parts: signal alignment and beamforming. Signal alignment is performed first, using the direction information from the preliminary processing: the steering vector obtained above converts the microphone array signals, which carry direction-dependent delays, into signals arriving along the array normal, so that in theory the signal is incident on the microphone array from 0°. The aligned signal is split into two paths: one continues through the fixed beamforming process, where the channels are summed and averaged; the other enters the blocking matrix module, which blocks the target signal;

Third step: implement the blocking matrix module. Because the signals were aligned in the second step, the signal direction is in theory 0°. For a uniform linear array, the blocking matrix takes the following form:

[Blocking matrix equation: image not reproduced in the source text]

where B₀ is the blocking matrix, θ₀ is the blocking direction, i.e. the estimated signal direction, d is the array element spacing, λ is the wavelength of sound, and M is the number of input signals. At this point, whatever the direction of arrival, the initial θ₀ is always 0. After passing through the blocking matrix, the signal is fed to the MC module;

Fourth step: implement the MC (Multiple-input Canceller) module. In theory, subtracting the BM output from the FBF (Fixed Beamformer) output yields the clean target speech. Because speech leakage occurs when the direction is estimated incorrectly, the MC output is not yet taken as the final output at this point;

Fifth step: take the MC output and exploit the correlation between the MC output and the BM output. A large correlation value indicates that leaked speech is present. A threshold is set on the correlation value. When the threshold is exceeded, θ₀ = 0 is taken as the initial parameter, an adjustment step size is set, and the parameter is adjusted in multiples, in the direction that reduces the correlation value, until the correlation value falls below the threshold. Only then is speech output from the MC module.
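The five steps can be read as a single signal flow. The notation below is an illustrative assumption rather than the source's own: x̃(t) denotes the M-channel signal after alignment, u(t) the M−1 blocking matrix outputs, w_i the MC weights, and y(t) the MC output. For real-valued signals the conjugate has no effect.

```latex
\begin{aligned}
y_{\mathrm{FBF}}(t) &= \frac{1}{M}\sum_{m=1}^{M}\tilde{x}_m(t) \\
\mathbf{u}(t) &= B_0(\theta_0)\,\tilde{\mathbf{x}}(t), \qquad \mathbf{u}(t)\in\mathbb{C}^{M-1} \\
y(t) &= y_{\mathrm{FBF}}(t) - \sum_{i=1}^{M-1} w_i^{*}\,u_i(t)
\end{aligned}
```

In this reading, the fifth step monitors the correlation between y(t) and the components of u(t), and changes θ₀ only when that correlation exceeds the threshold.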

Compared with the prior art, the invention has the following advantages: it weakens the influence of direction estimation errors on microphone array speech enhancement and improves the robustness of the adaptive beamformer. The direction in which the blocking matrix points converges to the true direction, which reduces target speech leakage, improves the output signal-to-noise ratio and the clarity of the output speech, and overcomes the excessive dependence of the GSC beamformer on the target direction estimate.

The invention is described in further detail below with reference to the accompanying drawing.

Description of drawings

Fig. 1 is a schematic diagram of the GSC-structure microphone array speech enhancement algorithm of the present invention.

Embodiment

With reference to Fig. 1, the optimization method for GSC-structure microphone array speech enhancement of the present invention has the following steps:

First step: pre-processing. The M input speech channels [signal symbol: image not reproduced in the source text] are pre-emphasized, framed and windowed. The DOA (Direction Of Arrival) of the signal is then estimated from the phase shifts between the sampled signals of the microphone elements, which arise because the signal reaches each element at a different time. The detailed procedure is as follows:

(1) Pre-process the speech signal: the pre-emphasis coefficient is 0.96, the sampling rate is 16 kHz, each frame holds 512 samples with a frame shift of 256 samples, and a Hamming window is then applied;

(2) Compare the correlation between the frame signals of two array elements, compute the phase shift and time delay between the two adjacent channels, and estimate the DOA of the signal. Using the resulting DOA, apply angle compensation to the array signals so that the signal DOA becomes the array normal direction. This is FBF step 1 in Fig. 1;
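The pre-processing and delay estimation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the plain cross-correlation peak search, the far-field model and the speed of sound (343 m/s) are assumptions; only the coefficient 0.96, the 16 kHz rate, the 512-sample frame and the 256-sample shift come from the text.

```python
import numpy as np

FS = 16000           # sampling rate in Hz (from the text)
FRAME_LEN = 512      # samples per frame (from the text)
FRAME_SHIFT = 256    # frame shift in samples (from the text)
PRE_EMPH = 0.96      # pre-emphasis coefficient (from the text)
SOUND_SPEED = 343.0  # m/s, assumed value


def preprocess(x):
    """Pre-emphasize, frame and Hamming-window one channel.

    Returns an array of shape (n_frames, FRAME_LEN).
    """
    x = np.asarray(x, dtype=float)
    emphasized = np.append(x[0], x[1:] - PRE_EMPH * x[:-1])
    n_frames = 1 + (len(emphasized) - FRAME_LEN) // FRAME_SHIFT
    idx = (np.arange(FRAME_LEN)[None, :]
           + FRAME_SHIFT * np.arange(n_frames)[:, None])
    return emphasized[idx] * np.hamming(FRAME_LEN)


def estimate_delay(frame_a, frame_b, max_lag):
    """Lag in samples by which frame_b trails frame_a (cross-correlation peak)."""
    corr = np.correlate(frame_b, frame_a, mode="full")
    lags = np.arange(-(len(frame_a) - 1), len(frame_b))
    keep = np.abs(lags) <= max_lag
    return lags[keep][np.argmax(corr[keep])]


def delay_to_doa(lag, d):
    """Far-field angle in degrees from the array normal, for element spacing d in metres."""
    s = np.clip(lag * SOUND_SPEED / (FS * d), -1.0, 1.0)
    return np.degrees(np.arcsin(s))
```

With integer lags at 16 kHz the angular resolution is coarse for small spacings, so a practical system would likely interpolate the correlation peak or work with phase differences, as the text's mention of phase shift suggests.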

Second step: implement FBF step 2. The signals are summed and averaged to obtain the fixed beamformer output. To keep the outputs of the upper and lower paths aligned, in practice the FBF output must be delayed by Q. At the same time, the output of FBF step 1 is fed to the lower path. Implement the blocking matrix module and determine the initial blocking matrix, whose structure is as follows:

[Blocking matrix equation: image not reproduced in the source text]

where B₀ is the blocking matrix, θ₀ is the blocking direction, d is the array element spacing, λ is the wavelength of sound at the sampling frequency, and the value of d satisfies [condition on d: image not reproduced in the source text]. M is the number of input signals. At this point, whatever the direction of arrival, the initial θ₀ is always 0. After passing through the blocking matrix, the signal is fed to the MC module.
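The alignment, fixed beamforming and blocking operations of this step can be sketched for narrowband (single-frequency) snapshots. The blocking matrix equation is not available in the source, so the form used here is an assumption: each row holds 1 and −exp(jφ) in adjacent columns, with φ = 2πd·sin(θ₀)/λ, which reduces to adjacent-channel differences when θ₀ = 0. All function names are hypothetical.

```python
import numpy as np


def steering_vector(M, theta_deg, d, wavelength):
    """Narrowband steering vector of a uniform linear array.

    Element m lags element 0 by a phase of m * 2*pi*d*sin(theta)/wavelength.
    """
    phi = 2.0 * np.pi * d * np.sin(np.radians(theta_deg)) / wavelength
    return np.exp(-1j * phi * np.arange(M))


def align(X, theta_deg, d, wavelength):
    """Compensate the estimated DOA so the target appears to arrive from 0 degrees.

    X has shape (M, T): M channels, T snapshots.
    """
    a = steering_vector(X.shape[0], theta_deg, d, wavelength)
    return X * np.conj(a)[:, None]


def fixed_beamformer(X_aligned):
    """Sum and average the M aligned channels."""
    return X_aligned.mean(axis=0)


def blocking_matrix(M, theta0_deg, d, wavelength):
    """Assumed (M-1) x M blocking matrix for blocking direction theta0."""
    phi = 2.0 * np.pi * d * np.sin(np.radians(theta0_deg)) / wavelength
    B = np.zeros((M - 1, M), dtype=complex)
    rows = np.arange(M - 1)
    B[rows, rows] = 1.0
    B[rows, rows + 1] = -np.exp(1j * phi)
    return B
```

With this form, `blocking_matrix(M, theta, d, wavelength) @ steering_vector(M, theta, d, wavelength)` is zero, so a plane wave from the blocking direction is removed, and the M−1 outputs match the M−1 signals mentioned in claim 3.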

Third step: implement the MC module. The BM outputs are combined in a weighted sum, where [coefficient symbol: image not reproduced in the source text] is the adaptive filter coefficient, with a typical value of 1. In theory, subtracting the BM output from the FBF output yields the clean target speech. Because speech leakage occurs when the direction is estimated incorrectly, the MC output is not yet taken as the final output; at the same time, adaptive filtering is applied in the MC module to reduce speech leakage;
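A minimal sketch of the MC module follows. The source does not name the adaptive algorithm, so normalized LMS (NLMS) with one coefficient per BM channel is used here as a stand-in; the function name, the step size `mu` and the initial weight of 1 (one reading of the "typical value" in the text) are assumptions.

```python
import numpy as np


def multi_input_canceller(y_fbf, U, mu=0.1, eps=1e-8, w_init=1.0):
    """Subtract a weighted sum of BM outputs from the FBF output.

    y_fbf : (T,) FBF output (already delayed to match the BM path).
    U     : (K, T) BM outputs, K = M - 1 channels.
    Returns the MC output (T,) and the final weights (K,).
    The weights are updated by NLMS, which is an assumed choice.
    """
    K, T = U.shape
    w = np.full(K, w_init, dtype=complex)
    out = np.empty(T, dtype=complex)
    for t in range(T):
        u = U[:, t]
        out[t] = y_fbf[t] - np.vdot(w, u)          # e = y_FBF - w^H u
        norm = np.real(np.vdot(u, u)) + eps        # input power, regularized
        w = w + mu * u * np.conj(out[t]) / norm    # NLMS weight update
    return out, w
```

In GSC implementations the weights are usually adapted only while the target is silent, because adapting on leaked target speech is exactly what causes the cancellation described in the text; that gating is omitted here for brevity.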

Fourth step: take the MC output and exploit the correlation between the MC output and the BM output. A large correlation value indicates that leaked speech is present. A threshold is set on the correlation value. When the threshold is exceeded, θ₀ = 0 is taken as the initial parameter, an adjustment step size μ is set, and the parameter is adjusted in multiples, in the direction that reduces the correlation value, until the correlation value falls below the threshold. Only then is speech output from the MC module.

When a direction estimation error occurs in GSC speech enhancement, the blocking matrix cannot completely block the target speech, so part of the speech passes through it. In the subsequent MC module, the target speech in the FBF output is then cancelled by the target speech leaked from the BM module, which causes loss of target speech. The microphone array is a uniform linear array, so the elements of the blocking matrix can be parameterized by the direction of arrival of the signal. The correlation between the GSC output speech and the target speech leaked through the BM is computed and compared with a threshold, which serves as the criterion for starting the adjustment of the blocking matrix parameter. Because of factors such as room reverberation, the noise that passes through the BM module has some correlation with the target speech, so the correlation threshold should be set neither too low nor too high. When the correlation value exceeds the threshold, the blocking matrix parameter adjustment algorithm starts: with the initially estimated direction of arrival as the initial parameter, an adjustment step size is set and the parameter is adjusted in multiples, in the direction that reduces the correlation value, until the correlation value falls below the threshold. The pointing direction of the blocking matrix thus tends toward the target direction, which reduces or even eliminates speech leakage through the blocking matrix.
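The adjustment loop can be sketched as below, under several stated assumptions. The source does not define the correlation function, so the measure used here (largest cross-correlation magnitude between the MC output and any BM channel, normalized by the FBF output power) is an assumption. The adjustment "in multiples" is read as trying θ₀ = ±k·μ for k = 1, 2, ..., the MC weights are fixed at 1, the blocking matrix form is the assumed adjacent-column form, and all function names are hypothetical.

```python
import numpy as np


def blocking_matrix(M, theta0_deg, d, wavelength):
    # Assumed form: row i holds 1 and -exp(j*phi) in columns i and i+1.
    phi = 2.0 * np.pi * d * np.sin(np.radians(theta0_deg)) / wavelength
    B = np.zeros((M - 1, M), dtype=complex)
    i = np.arange(M - 1)
    B[i, i] = 1.0
    B[i, i + 1] = -np.exp(1j * phi)
    return B


def leakage_measure(X, theta0_deg, d, wavelength):
    """Assumed correlation measure between the MC output and the BM outputs.

    X has shape (M, T) and holds the aligned narrowband snapshots.
    """
    y_fbf = X.mean(axis=0)                         # fixed beamformer output
    U = blocking_matrix(X.shape[0], theta0_deg, d, wavelength) @ X
    e = y_fbf - U.sum(axis=0)                      # MC output with unit weights
    cross = np.abs((U * np.conj(e)).mean(axis=1))  # one value per BM channel
    return cross.max() / np.mean(np.abs(y_fbf) ** 2)


def adjust_blocking_direction(X, d, wavelength, threshold, step_deg, max_steps=90):
    """Move theta0 away from 0 in multiples of step_deg while the measure falls."""
    theta = 0.0
    c = leakage_measure(X, theta, d, wavelength)
    if c <= threshold:
        return theta, c
    # Pick the sign whose first step gives the smaller correlation value.
    c_plus = leakage_measure(X, step_deg, d, wavelength)
    c_minus = leakage_measure(X, -step_deg, d, wavelength)
    sign = 1.0 if c_plus < c_minus else -1.0
    for k in range(1, max_steps + 1):
        candidate = sign * k * step_deg
        c_new = leakage_measure(X, candidate, d, wavelength)
        if c_new >= c:
            break                                  # no further reduction
        theta, c = candidate, c_new
        if c < threshold:
            break                                  # leakage is low enough
    return theta, c
```

Normalizing by the FBF output power keeps the measure proportional to the leaked amplitude, so it keeps shrinking as θ₀ approaches the residual direction error; as the text notes, the threshold still has to sit above the level produced by noise alone.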

Claims

1. An optimization method for microphone array speech enhancement, characterized by comprising the following steps:

Second step: build the GSC structure model with the microphone array, starting with the fixed beamforming algorithm. Unlike conventional GSC processing, the FBF is split into two parts: signal alignment and beamforming. Signal alignment is performed first, using the direction information from the preliminary processing: the steering vector obtained in the first step converts the microphone array signals, which carry direction-dependent delays, into signals arriving along the array normal, so that in theory the signal is incident on the microphone array from 0°. The aligned signal is split into two paths: one continues through the fixed beamforming process, where the channels are summed and averaged; the other enters the blocking matrix module, which blocks the target signal;

Third step: implement the blocking matrix module. Because the signals were aligned in the second step, the signal direction is in theory 0°. For a uniform linear array, the blocking matrix takes the following form:

where B₀ is the blocking matrix, θ₀ is the blocking direction, d is the array element spacing, λ is the wavelength of sound, and M is the number of input signals. At this point, whatever the direction of arrival, the initial θ₀ is always 0. After passing through the blocking matrix, the signal is fed to the MC module;

Fourth step: implement the MC module. In theory, subtracting the BM output from the FBF output yields the clean target speech. Because speech leakage occurs when the direction is estimated incorrectly, the MC output is not yet taken as the final output at this point;

Fifth step: take the MC output and exploit the correlation between the MC output and the BM output. A large correlation value indicates that leaked speech is present. A threshold is set on the correlation value. When the threshold is exceeded, θ₀ = 0 is taken as the initial parameter, an adjustment step size is set, and the parameter is adjusted in multiples, in the direction that reduces the correlation value, until the correlation value falls below the threshold. Only then is speech output from the MC module.

2. The optimization method for microphone array speech enhancement according to claim 1, characterized in that the preliminary processing proceeds as follows:

First step: pre-process the speech signal. The pre-emphasis coefficient is 0.96, the sampling rate is 16 kHz, each frame holds 512 samples with a frame shift of 50%, and a Hamming window is then applied;

Second step: receive the signal with the microphone array, estimate the signal direction information, and generate the signal steering vector.

3. The optimization method for microphone array speech enhancement according to claim 1, characterized in that the GSC model is built as follows:

First step: split the FBF process into two steps. The preliminary processing is carried out first: using the resulting signal steering vector, the signal is alignment-compensated so that the signal received by the array arrives along the array normal. The aligned signal is then split into two paths: one is fed to the BM module, and the other continues through the fixed beamforming process, where the channels are summed and averaged to give the FBF output;

Second step: with the blocking matrix set up, the input is the aligned signal. The signal is multiplied by the matrix so that the blocking matrix blocks the signal from the estimated target direction. The output contains the signals from all directions other than the target direction, and the M−1 output signals are combined into one signal;

Third step: implement the MC module. The BM output is subtracted from the FBF output, i.e. the path that contains only interference is subtracted from the path that contains both the target signal and interference, and the target signal is finally output. Here an adaptive filter is used in the MC to further reduce the target speech present in it.