CN109243482A

CN109243482A - Miniature array speech denoising method with improved ACRANC and beamforming

Info

Publication number: CN109243482A
Application number: CN201811275824.4A
Authority: CN
Inventors: 曾庆宁; 罗瀛; 方韶劻; 林凤梅; 谢先明; 龙超
Original assignee: Shenzhen Aangsi Science & Technology Co Ltd
Current assignee: Shenzhen Aangsi Science & Technology Co Ltd
Priority date: 2018-10-30
Filing date: 2018-10-30
Publication date: 2019-01-18
Anticipated expiration: 2038-10-30
Also published as: CN109243482B

Abstract

The invention discloses a miniature array speech denoising method with improved ACRANC and beamforming, and relates to speech signal processing technology. The technical problem addressed is how to further improve the noise suppression performance of the ACRANC method for speech denoising. The method comprises the following steps: (1) improving the ACRANC method, with the sub-steps of: (1) obtaining multi-channel distorted speech signals after noise reduction through multi-channel adaptive noise cancellation; (2) using the multi-channel distorted speech signals as the input of the restoration filter in the ACRANC system to obtain denoised speech; (2) beamforming, with the sub-steps of: (1) establishing multiple improved ACRANC subsystems and an adaptive mode control (AMC) subsystem to obtain multi-channel denoised speech; (2) applying beamforming to the multi-channel denoised speech to obtain better denoised speech. The invention improves the quality of the output speech and further improves the speech denoising effect.

Description

Miniature array speech denoising method with improved ACRANC and beamforming

Technical field

The present invention relates to speech signal processing technology, and in particular to a miniature array speech denoising method with improved ACRANC and beamforming.

Background art

Speech denoising technology can effectively improve speech quality and the recognition rate of speech recognition systems, and miniature array speech denoising is one effective approach. A miniature microphone array is an array with a very small aperture, usually within 5 centimeters, and with few array elements. Because a miniature array is easy to embed in a variety of devices, it has wide application value. The generalized sidelobe cancellation (Generalized Sidelobe Cancellation, GSC) method based on a VAD (Voice Activity Detector), abbreviated VAD-GSC, is a common and effective miniature microphone array speech denoising method. Array crosstalk resistant adaptive noise cancellation (Array Crosstalk Resistant Adaptive Noise Cancellation, abbreviated ACRANC) is another effective miniature microphone array speech denoising method. In many situations, especially when the speech source is close to the array, ACRANC achieves better noise reduction than VAD-GSC and many of its improved variants.

In ACRANC, the input of the second-stage filter is a single channel: the distorted speech signal output by the first-stage filter. The function of the second-stage filter is to restore the clean speech signal from this distorted speech signal, that is, to make its output approach the clean speech signal at the main microphone. Because of the complexity of audio propagation in real environments and the distortion that the ACRANC first-stage filter introduces into the speech signal, the speech restored by the second-stage filter still falls short.

Summary of the invention

In view of the deficiencies of the prior art, the technical problem solved by the present invention is how to further improve the noise suppression performance of the ACRANC method.

To solve the above technical problem, the technical solution adopted by the present invention is a miniature array speech denoising method with improved ACRANC and beamforming, in which multi-channel distorted speech is input to the restoration filter and the result is combined with DAS (Delay And Sum) beamforming to denoise speech, comprising the following steps:

(1) Improving the ACRANC method, with the following sub-steps:

(1) Multi-channel distorted speech signals after noise reduction are obtained through multi-channel adaptive noise cancellation. The detailed process is as follows:

Assume the speech signal is s(k) and the noise signal is n(k). They reach microphone M_i over multiple paths and become the signals s_i(k) and n_i(k). The propagation impulse responses from the speech source and from the noise source to microphone M_i are assumed to be h_si(k) and h_ni(k). The signal actually picked up by microphone M_i is expressed as x_i(k) = s_i(k) + n_i(k), where i = 1, 2, ..., N and k = 0, 1, 2, ...; N is the number of microphones in the array and k is the discrete time index. This gives:

x_i(k)=s_i(k)+n_i(k) (1)

s_i(k)=h_si(k)*s(k) (2)

n_i(k)=h_ni(k)*n(k), i=1,2,...,N (3)

where * denotes convolution;
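The signal model of formulas (1) to (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `convolve` and `simulate_array`, the two-microphone setup and every impulse-response value are hypothetical and do not come from the patent.

```python
import random

def convolve(h, x):
    """Causal FIR convolution h(k) * x(k), truncated to len(x) samples."""
    y = [0.0] * len(x)
    for k in range(len(x)):
        acc = 0.0
        for m, hm in enumerate(h):
            if k - m >= 0:
                acc += hm * x[k - m]
        y[k] = acc
    return y

def simulate_array(s, n, h_s, h_n):
    """Return x_i(k) = h_si(k)*s(k) + h_ni(k)*n(k) for every microphone i."""
    channels = []
    for h_si, h_ni in zip(h_s, h_n):
        s_i = convolve(h_si, s)   # formula (2)
        n_i = convolve(h_ni, n)   # formula (3)
        channels.append([a + b for a, b in zip(s_i, n_i)])  # formula (1)
    return channels

rng = random.Random(0)
K = 200
# Hypothetical source signals: "speech" is absent for the first 50 samples.
s = [0.0] * 50 + [rng.uniform(-1.0, 1.0) for _ in range(K - 50)]
n = [rng.gauss(0.0, 0.3) for _ in range(K)]
h_s = [[1.0, 0.2], [0.0, 0.9, 0.1]]   # assumed speech paths, N = 2
h_n = [[0.5, 0.1], [0.6, -0.2]]       # assumed noise paths
x = simulate_array(s, n, h_s, h_n)
```

During the first 50 samples every channel contains noise only, which corresponds to the global silent period used later to adapt the first-stage filters.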

Let the intermediate propagation impulse response from speech signal s_i to speech signal s_j be h_{s,ij}(k), and that from noise signal n_i to noise signal n_j be h_{n,ij}(k). Then:

s_j(k) = h_{s,ij}(k) * s_i(k) (4)

n_j(k) = h_{n,ij}(k) * n_i(k) (5)

In this sub-step, for each microphone M_i, the signal x_i(k) acquired by M_i serves as the main channel signal, and the signals x_j(k) (j = 1, ..., i-1, i+1, ..., N) acquired by the other N-1 microphones serve as reference signals. In the global silent period, that is, when every channel is in its silent period, filter A_i uses the noise in the multi-channel reference signals to adaptively cancel the noise in the main channel. Outside the global silent period, the coefficients of filter A_i are kept fixed and the filter only produces its filtered output. Multi-channel distorted speech signals are thereby obtained. The reason is as follows:

Because the speech signals satisfy s_i(k) = 0, i = 1, 2, ..., N, in the global silent period, we have

x_i(k) = y_i1(k) + e_i1(k) (6)

n_i(k) = w_i \mathbf{n}_i(k) + err_i(k) (7)

where x_i(k) = n_i(k), e_i1(k) = err_i(k) is the prediction error, y_i1(k) = w_i \mathbf{n}_i(k) is the output of filter A_i, and w_i is the 1 × (N-1)(L+1) coefficient row vector of filter A_i, namely:

w_i = (w_i1, …, w_i(i-1), w_i(i+1), …, w_iN) (8)

where w_ij = (w_ij0, w_ij1, …, w_ijL), and \mathbf{n}_i(k) is the (N-1)(L+1) × 1 noise signal column vector:

\mathbf{n}_i(k) = [\mathbf{n}_i1(k)^T, …, \mathbf{n}_i(i-1)(k)^T, \mathbf{n}_i(i+1)(k)^T, …, \mathbf{n}_iN(k)^T]^T (9)

where \mathbf{n}_ij(k) = [n_ij(k), n_ij(k-1), …, n_ij(k-L)]^T and L is the number of delay samples of the reference channel noise signals;

Let the minimal error power be P[err_i^0(k)], with the corresponding optimal coefficient vector:

w_i^0 = \arg\min_{w_i} P[err_i(k)] (10)

To obtain w_i^0 and P[err_i^0(k)], it suffices to adjust the coefficients of filter A_i so that the sum of squares of e_i1 is minimized;

In the stage immediately following the global silent period, assuming that the noise environment is constant or slowly varying, the optimal coefficients of filter A_i are kept fixed and the filter only produces its filtered output. Then:

y_i1(k) = w_i^0 \mathbf{x}_i(k) = w_i^0 [\mathbf{s}_i(k) + \mathbf{n}_i(k)] (11)

where \mathbf{x}_i(k) and \mathbf{s}_i(k) denote the picked-up noisy speech vector and the clean speech vector, respectively. From formulas (6) and (11):

e_i1(k) = x_i(k) - y_i1(k) = p_i(k) + err_i^0(k) (12)

where:

p_i(k) = s_i(k) - w_i^0 \mathbf{s}_i(k) (13)

Here e_i1(k) is one channel of distorted speech containing residual noise, and p_i(k) is the distorted speech within it. Formula (13) shows that p_i(k) is in fact formed by distorting the N channels of clean speech signals;

e_i1(k) is obtained by taking the i-th channel as the main signal and the other channels as reference signals. Letting i run from 1 to N, that is, taking each channel in turn as the main signal and the remaining channels as reference signals, yields N channels of distorted speech signals containing residual noise, e_j1(k) (j = 1, 2, ..., N).
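The behaviour of the first-stage filter A_i (adapt during the global silent period, freeze the coefficients otherwise) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `stage_a`, the boolean `silent` mask, the step size and the use of a normalised LMS update are all assumptions made here; the source only mentions LMS as one possible adaptive algorithm.

```python
import random

def stage_a(x, i, L, silent, mu=0.5, eps=1e-8):
    """First-stage filter A_i: cancel noise in channel i using the other channels.

    x      : list of N equal-length channels x_j(k)
    i      : index of the main channel
    L      : delay samples per reference channel (L + 1 coefficients each)
    silent : silent[k] is True inside the global silent period
    Returns (e_i1, w): the error output and the final coefficient vector.
    """
    refs = [j for j in range(len(x)) if j != i]
    w = [0.0] * (len(refs) * (L + 1))
    e = []
    for k in range(len(x[i])):
        # Stacked reference vector [x_j(k), x_j(k-1), ..., x_j(k-L)] for all j != i.
        u = [x[j][k - m] if k - m >= 0 else 0.0 for j in refs for m in range(L + 1)]
        y = sum(wi * ui for wi, ui in zip(w, u))   # y_i1(k)
        err = x[i][k] - y                          # e_i1(k) = x_i(k) - y_i1(k)
        e.append(err)
        if silent[k]:  # adapt only in global silence; otherwise coefficients stay frozen
            norm = sum(ui * ui for ui in u) + eps
            w = [wi + mu * err * ui / norm for wi, ui in zip(w, u)]
    return e, w

# Hypothetical noise-only test: channel 0 is a filtered copy of channel 1.
rng = random.Random(1)
n = [rng.gauss(0.0, 1.0) for _ in range(2000)]
x1 = n
x0 = [0.8 * n[k] + (0.3 * n[k - 1] if k else 0.0) for k in range(2000)]
e, w = stage_a([x0, x1], i=0, L=2, silent=[True] * 2000)
e_frozen, w_frozen = stage_a([x0, x1], i=0, L=2, silent=[False] * 2000)
```

Calling `stage_a` once per main channel i = 1, ..., N produces the N distorted speech signals e_j1(k) that the improved method feeds into every second-stage filter.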

(2) The multi-channel distorted speech signals are used as the input of the restoration filter in the ACRANC system to obtain the denoised speech signal. The detailed process is as follows:

The multi-channel distorted speech signals e_j1(k) (j = 1, 2, ..., N) are input to the second-stage filter B_i of the ACRANC system. In the stages outside the global silent period, the coefficients of filter B_i are adjusted so that the sum of squares of its error output e_i2(k) is minimized, where:

||e_i2(k)||² = ||x_i(k) - y_i2(k)||²

= ||s_i(k) + n_i(k) - y_i2(k)||²

= ||n_i(k)||² + ||s_i(k) - y_i2(k)||² + 2n_i(k)[s_i(k) - y_i2(k)] (14)

Assuming the noise n_i(k) is uncorrelated with s_i(k) - y_i2(k), the cross term vanishes in expectation:

E[e_i2²(k)] = E[n_i²(k)] + E[(s_i(k) - y_i2(k))²] (15)

Formula (15) shows that minimizing E[e_i2²(k)] is equivalent to minimizing E[(s_i(k) - y_i2(k))²], that is, to minimizing the error between y_i2(k) and the speech s_i(k). The output y_i2(k) of filter B_i can therefore approach the clean speech signal s_i(k). Because the input of filter B_i is not a single channel but multiple channels of distorted speech signals, a better speech denoising effect than that of ACRANC is obtained, and this output serves as the improved denoised speech signal.
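The second-stage restoration can be sketched in the same style. The sketch below assumes a normalised LMS update, a boolean `active` mask marking samples outside the global silent period, and synthetic distorted channels; the name `stage_b` and all numeric values are hypothetical. White noise stands in for speech purely to keep the example short.

```python
import random

def stage_b(e1, x_i, L_B, active, mu=0.02, eps=1e-8):
    """Second-stage filter B_i fed with all N first-stage outputs e_j1(k).

    e1     : list of N distorted-speech channels e_j1(k)
    x_i    : main-channel signal x_i(k), used as the desired signal
    L_B    : order of B_i (L_B + 1 coefficients per input channel)
    active : active[k] is True outside the global silent period (adapt there)
    Returns y_i2, the restored-speech estimate.
    """
    w = [0.0] * (len(e1) * (L_B + 1))
    y_out = []
    for k in range(len(x_i)):
        u = [ch[k - m] if k - m >= 0 else 0.0 for ch in e1 for m in range(L_B + 1)]
        y = sum(wi * ui for wi, ui in zip(w, u))   # y_i2(k)
        y_out.append(y)
        if active[k]:
            err = x_i[k] - y                       # e_i2(k) = x_i(k) - y_i2(k)
            norm = sum(ui * ui for ui in u) + eps
            w = [wi + mu * err * ui / norm for wi, ui in zip(w, u)]
    return y_out

rng = random.Random(2)
K = 5000
s = [rng.gauss(0.0, 1.0) for _ in range(K)]   # stand-in for clean speech s_i(k)
n = [rng.gauss(0.0, 1.0) for _ in range(K)]   # noise, uncorrelated with s
x = [a + b for a, b in zip(s, n)]             # x_i(k) = s_i(k) + n_i(k)
s_prev = [0.0] + s[:-1]
# Two hypothetical distorted-speech channels, each a different mix of s(k) and s(k-1).
e1 = [[0.8 * a + 0.3 * b for a, b in zip(s, s_prev)],
      [a - 0.5 * b for a, b in zip(s, s_prev)]]
y = stage_b(e1, x, L_B=1, active=[True] * K)
mse_out = sum((y[k] - s[k]) ** 2 for k in range(K - 1000, K)) / 1000
mse_in = sum((x[k] - s[k]) ** 2 for k in range(K - 1000, K)) / 1000
```

In this toy setup neither distorted channel alone equals s(k), but a fixed combination of the two does, which is the intuition behind feeding B_i with all N channels instead of one.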

(2) Beamforming: the improved ACRANC is combined with beamforming to further improve the speech denoising effect, with the following sub-steps:

(1) Multiple improved ACRANC subsystems and an adaptive mode control (AMC) subsystem are established to obtain multi-channel denoised speech. The detailed process is as follows:

With each channel in turn taken as the main signal and the remaining channels as reference signals, one improved ACRANC is established per channel, giving N such subsystems.

In each improved ACRANC, the input of filter B_i is the outputs of all filters A_i (i = 1, 2, ..., N), rather than the output of a single filter A_i. The adaptive mode control AMC determines when the filters in these subsystems update their coefficients and when they keep them fixed;

In the silent period, that is, the NVP stage in which no speech is present, the error caused by changes in environmental factors can be compensated by adjusting the optimal coefficients of filter A_i. For this purpose a global silent period, the ONVP stage, is defined, and the first-stage filter A_i of each subsystem adjusts its optimal coefficients only during ONVP;

Let NVP(i) denote the silent period of the i-th noisy speech signal x_i(k) picked up by microphone M_i. NVP(i) consists of a series of discrete intervals, namely:

NVP(i) = \bigcup_j [k'_ij, k''_ij] (16)

where each discrete interval

[k'_ij, k''_ij] = {k'_ij, k'_ij+1, …, k''_ij}

is the j-th NVP of x_i(k). Clearly, NVP(i₁) need not equal NVP(i₂) for i₁ ≠ i₂, i₁, i₂ ∈ {1, 2, ..., N}, but NVP(i₁) is a translation of NVP(i₂) along the time axis;

ONVP is defined as:

ONVP = \bigcap_{i=1}^{N} NVP(i) (17)

It is then easy to show that:

ONVP = \bigcup_j [k'_j, k''_j] (18)

where:

k'_j = \max_{1≤i≤N} k'_ij, k''_j = \min_{1≤i≤N} k''_ij

If k''_j < k'_j, then [k'_j, k''_j] = ∅ in definition (18);

When the optimal coefficients of filter A_i are adjusted, no channel may contain any speech signal; otherwise the speech would be cancelled together with the noise. Therefore, the coefficients of filter A_i are adjusted only in the following L-ONVP stage:

L-ONVP = \bigcup_j [k'_j+L, k''_j] (19)

where L is the number of delay samples of the reference signals input to filter A_i, and:

[k'_j+L, k''_j] = {k'_j+L, k'_j+L+1, …, k''_j} (20)

If k''_j < k'_j+L, then [k'_j+L, k''_j] = ∅ is likewise defined in (20);

In the L-ONVP stage, all signals used, including their delayed samples, belong to the silent period and contain no speech signal, so the optimal coefficients of filter A_i can be adjusted in the L-ONVP stage. The NVP stage referred to above is precisely L-ONVP or a part of L-ONVP;
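The interval bookkeeping behind ONVP and L-ONVP can be illustrated with a short sketch. It assumes that each channel's silent periods are already available as sorted lists of closed integer intervals (for example from a voice activity detector); the function names and the sample intervals are hypothetical.

```python
def intersect(a, b):
    """Intersect two sorted lists of closed integer intervals [(start, end), ...]."""
    out, p, q = [], 0, 0
    while p < len(a) and q < len(b):
        lo, hi = max(a[p][0], b[q][0]), min(a[p][1], b[q][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[p][1] < b[q][1]:
            p += 1
        else:
            q += 1
    return out

def onvp(nvps):
    """Global silent period: samples that are silent in every channel."""
    result = nvps[0]
    for nvp in nvps[1:]:
        result = intersect(result, nvp)
    return result

def l_onvp(nvps, L):
    """Drop the first L samples of each ONVP interval; empty intervals vanish."""
    return [(lo + L, hi) for lo, hi in onvp(nvps) if lo + L <= hi]

# Hypothetical silent periods for N = 3 channels.
nvps = [
    [(0, 99), (300, 420)],
    [(2, 101), (302, 422)],
    [(1, 100), (301, 310)],
]
print(onvp(nvps))          # [(2, 99), (302, 310)]
print(l_onvp(nvps, L=24))  # [(26, 99)]
```

The second ONVP interval is shorter than L samples, so it contributes nothing to L-ONVP, which matches the empty-set convention stated for the interval [k'_j+L, k''_j].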

The optimal coefficients of filter A_i are adjusted in the (Δ, Δ')-ONVP stage:

where [k'_{i₀j}, k''_{i₀j}] are the discrete time intervals composing NVP(i₀) of the i₀-th channel; Δ' is a positive integer that can be chosen freely according to the accuracy of the VD decision, its purpose being to guarantee that the time interval used is purely silent (free of speech); Δ is also a freely chosen positive integer, but should satisfy

Δ ≥ L + δ + Δ' (22)

where δ is the time delay for noise to propagate from the other microphones of the microphone array to the i₀-th microphone, counted in delay samples; the maximum number of delay samples is:

where d_i is the distance between microphone M_{i₀} and microphone M_i, f is the sampling frequency of the array, and c is the propagation speed of audio signals in air;

Outside the (Δ, Δ')-ONVP stage, the optimal coefficients of filter A_i in each subsystem keep their existing values unchanged, and filter A_i is used only for filtering.
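A small numeric sketch may help when choosing Δ. It assumes that the maximum inter-microphone delay is the largest distance d_i times f/c, rounded up to whole samples, and that c is about 343 m/s; both the rounding and the value of c are assumptions of this example, as are the function names and the sample geometry.

```python
import math

def max_delay_samples(distances_m, fs_hz, c_mps=343.0):
    """Largest inter-microphone propagation delay, rounded up to whole samples."""
    return max(math.ceil(d * fs_hz / c_mps) for d in distances_m)

def min_front_guard(L, delta, delta_prime):
    """Smallest Delta that satisfies Delta >= L + delta + Delta' (formula (22))."""
    return L + delta + delta_prime

# Hypothetical geometry: other microphones 2, 4 and 5 cm from microphone i0.
delta = max_delay_samples([0.02, 0.04, 0.05], fs_hz=8000)
guard = min_front_guard(L=24, delta=delta, delta_prime=10)
print(delta, guard)  # 2 36
```

With these example values, adaptation of A_i would start no earlier than 36 samples after the detected start of a silent interval.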

In the remaining stages outside the global silent period, the optimal coefficients of all filters B_i are adjusted adaptively. For simplicity, B_i may also be adapted continuously from beginning to end;

(2) The final denoised speech is obtained through delay-and-sum (DAS) beamforming. The detailed process is as follows:

The output of each subsystem is one channel of denoised speech. All N outputs can be fed to a beamformer to obtain a better speech denoising effect. If the common DAS beamformer is used, it can be described by the following input-output relation:

where τ_i is the delay with which the speech reaches microphone M_i relative to the reference microphone M_{i₀} selected in the array. Any microphone in the array may be chosen as the reference microphone M_{i₀}; usually the microphone at or near the center of the array is selected;

The delay τ_i can be computed by the cross-correlation method, the generalized cross-correlation method, or the following method:

1) Choose a (δ, T)_OVP discrete time interval [k', k''] satisfying k ≥ k''+δ with k-(k''+δ) as small as possible;

2) Find τ_i satisfying:

If the aperture of the microphone array is very small and the sampling frequency of the array signals is not very high, all τ_i can be treated as 0.
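As an illustration of the cross-correlation option and of delay-and-sum itself, the following sketch estimates integer delays and then aligns and averages the channels. It is a plain time-domain version under assumptions made here: integer delays only, a 1/N normalisation, and the sign convention that channel i lags the reference by τ_i samples. The function names are hypothetical.

```python
import random

def estimate_delay(ref, sig, max_lag):
    """Integer lag in [-max_lag, max_lag] maximising sum_k ref[k] * sig[k + lag]."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(ref[k] * sig[k + lag]
                  for k in range(len(ref)) if 0 <= k + lag < len(sig))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def delay_and_sum(channels, delays):
    """Average the channels after advancing channel i by delays[i] samples."""
    K = len(channels[0])
    out = []
    for k in range(K):
        acc = 0.0
        for ch, tau in zip(channels, delays):
            if 0 <= k + tau < K:
                acc += ch[k + tau]
        out.append(acc / len(channels))
    return out

rng = random.Random(3)
ref = [rng.uniform(-1.0, 1.0) for _ in range(100)]
late = [0.0, 0.0] + ref[:-2]          # same signal arriving 2 samples later
tau = estimate_delay(ref, late, max_lag=5)
out = delay_and_sum([ref, late], [0, tau])
```

For a miniature array at a low sampling rate the estimated delays are typically 0, in which case `delay_and_sum` reduces to a plain average of the N subsystem outputs.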

Compared with the prior art, the beneficial effects of the present invention are as follows:

Feeding multi-channel distorted speech into the restoration filter works better than feeding it only a single channel of distorted speech as in the original method. The improved ACRANC method therefore achieves a better speech denoising effect than the ordinary ACRANC method, and combining the improved ACRANC method with beamforming further improves the noise reduction.

Brief description of the drawings

Fig. 1 is a flow diagram of the present invention;

Fig. 2 is a schematic diagram of speech and noise propagation and crosstalk;

Fig. 3 is a schematic diagram of the improved ACRANC system;

Fig. 4 is a schematic diagram of the combination of the improved ACRANC and beamforming.

Specific embodiments

Specific embodiments of the invention are further described below with reference to the accompanying drawings, but they do not limit the invention.

Fig. 1 shows a miniature array speech denoising method with improved ACRANC and beamforming, in which multi-channel distorted speech is input to the ACRANC restoration filter and the result is combined with beamforming to denoise speech, comprising the following steps:

(1) Improving the ACRANC method, with the following sub-steps:

Assume the speech signal is s(k) and the noise signal is n(k). As shown in Fig. 2, they reach microphone M_i over multiple paths and become the signals s_i(k) and n_i(k). The propagation impulse responses from the speech source and from the noise source to microphone M_i are assumed to be h_si(k) and h_ni(k). The signal actually picked up by microphone M_i is expressed as x_i(k) = s_i(k) + n_i(k), where i = 1, 2, ..., N and k = 0, 1, 2, ...; N is the number of microphones in the array and k is the discrete time index. This gives:

x_i(k)=s_i(k)+n_i(k) (1)

s_i(k)=h_si(k)*s(k) (2)

n_i(k)=h_ni(k) * n (k) i=1,2 ..., N (3)

where * denotes convolution;

In this sub-step, for each microphone M_i, the signal x_i(k) acquired by M_i serves as the main channel signal, and the signals x_j(k) (j = 1, ..., i-1, i+1, ..., N) acquired by the other N-1 microphones serve as reference signals. In the global silent period, that is, when every channel is in its silent period, filter A_i, as shown in Fig. 3, uses the noise in the multi-channel reference signals to adaptively cancel the noise in the main channel. Outside the global silent period, the coefficients of filter A_i are kept fixed and the filter only produces its filtered output. Multi-channel distorted speech signals are thereby obtained. The reason is as follows:

Because the speech signals satisfy s_i(k) = 0, i = 1, 2, ..., N, in the global silent period, we have

x_i(k) = y_i1(k) + e_i1(k) (6)

n_i(k) = w_i \mathbf{n}_i(k) + err_i(k) (7)

w_i = (w_i1, …, w_i(i-1), w_i(i+1), …, w_iN) (8)

\mathbf{n}_i(k) = [\mathbf{n}_i1(k)^T, …, \mathbf{n}_i(i-1)(k)^T, \mathbf{n}_i(i+1)(k)^T, …, \mathbf{n}_iN(k)^T]^T (9)

To obtain the optimal coefficient vector w_i^0 and the minimal error power P[err_i^0(k)], it suffices to adjust the coefficients of filter A_i so that the sum of squares of e_i1 is minimized;

The error output e_i2(k) of the second-stage filter B_i satisfies:

||e_i2(k)||² = ||x_i(k) - y_i2(k)||²

= ||s_i(k) + n_i(k) - y_i2(k)||²

= ||n_i(k)||² + ||s_i(k) - y_i2(k)||² + 2n_i(k)[s_i(k) - y_i2(k)] (14)

Assuming the noise n_i(k) is uncorrelated with s_i(k) - y_i2(k), the cross term vanishes in expectation:

E[e_i2²(k)] = E[n_i²(k)] + E[(s_i(k) - y_i2(k))²] (15)

Formula (15) shows that minimizing E[e_i2²(k)] is equivalent to minimizing E[(s_i(k) - y_i2(k))²], that is, to minimizing the error between y_i2(k) and the speech s_i(k). The output y_i2(k) of filter B_i can therefore approach the clean speech signal s_i(k).

Because the input of filter B_i is the N channels of signals e_j1(k) (j = 1, 2, ..., N), all of which are distorted speech signals formed from the N channels of speech according to formula (13), the output approximation produced by this multi-channel input is better than that produced by the single input e_i1(k) alone. In theory, when all coefficients of filter B_i applied to the other input channels e_j1(k) (j = 1, ..., i-1, i+1, ..., N) are set to 0, the N-channel input degenerates to the case of the single input e_i1(k). The improved ACRANC method above therefore performs no worse, and in general better, than the existing ACRANC method, and its output serves as the improved denoised speech signal.

In each improved ACRANC, the input of filter B_i is the outputs of all filters A_i (i = 1, 2, ..., N), rather than the output of a single filter A_i. As shown in Fig. 4, the adaptive mode control AMC determines when the filters in these subsystems update their coefficients and when they keep them fixed;

where each discrete interval is:

[k'_ij, k''_ij] = {k'_ij, k'_ij+1, …, k''_ij}

ONVP is defined as:

ONVP = \bigcap_{i=1}^{N} NVP(i) (17)

It is then easy to show that:

ONVP = \bigcup_j [k'_j, k''_j] (18)

where:

k'_j = \max_{1≤i≤N} k'_ij, k''_j = \min_{1≤i≤N} k''_ij

If k''_j < k'_j, then [k'_j, k''_j] = ∅ in definition (18);

[k'_j+L, k''_j] = {k'_j+L, k'_j+L+1, …, k''_j} (20)

If k''_j < k'_j+L, then [k'_j+L, k''_j] = ∅ is likewise defined in (20);

where [k'_{i₀j}, k''_{i₀j}] are the discrete time intervals composing NVP(i₀) of the i₀-th channel; Δ' is a positive integer that can be chosen freely according to the accuracy of the VD decision, its purpose being to guarantee that the time interval used is purely silent (free of speech); Δ is also a freely chosen positive integer, but should satisfy:

Δ ≥ L + δ + Δ' (22)

The delay τ_i can be computed by the cross-correlation method, the generalized cross-correlation method, or the following method:

2) Find τ_i satisfying:

For example, if the distance from every microphone M_i in the array to the reference microphone M_{i₀} is less than 2 centimeters and the snapshot sampling frequency of the array is 8000 Hz, then the maximum delay is less than half a sampling interval, so in this case all τ_i may be taken as 0.
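The half-sample claim in this example can be checked with one line of arithmetic. The speed of sound is not stated in the source; the value c ≈ 343 m/s used below is an assumption.

```latex
\tau_{\max} = \frac{d\,f}{c}
            = \frac{0.02\ \text{m} \times 8000\ \text{Hz}}{343\ \text{m/s}}
            \approx 0.47\ \text{samples} < 0.5\ \text{samples}
```

Since the largest possible delay stays below half a sampling interval, rounding every τ_i to the nearest integer number of samples gives 0.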

(3) Computational complexity

Fig. 4 shows the speech denoising process that combines the improved ACRANC with DAS beamforming. The computational loads of the AMC and of the DAS beamformer are both small; in fact the AMC can be implemented with a VAD (Voice Activity Detector). The computational complexity of the method therefore depends mainly on the improved ACRANC algorithms of the N subsystems, and for each improved ACRANC it depends in turn on the adaptive algorithm used by filters A_i and B_i. If the LMS adaptive algorithm is used, it is not hard to show that the computational load of the improved ACRANC algorithms of the N subsystems does not exceed

(2_A+3_M)[(L+1)(N-1)+(L_B+1)N]Nf (26)

where 2_A denotes 2 additions and 3_M denotes 3 multiplications; L is the number of delay samples of the reference signals in formula (10), which determines the order of filter A_i; N is the number of microphones in the array; L_B is the order of filter B_i; and f is the sampling rate of the microphone array. Because many chips can complete one addition and one multiplication in a single operation, the actual computation time is much smaller than that implied by formula (26).

For example, with L = 24 determining the length of filter A_i, L_B = 20 determining the length of filter B_i, sampling frequency f = 8000 Hz, and an array of N = 5 microphones, formula (26) gives a computational load of no more than 41 MFLOPS.
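The 41 MFLOPS figure can be reproduced by evaluating the bound of formula (26) directly. The function name below is hypothetical, and counting 2 additions plus 3 multiplications as 5 operations per coefficient per sample is the reading of (2_A+3_M) assumed here.

```python
def lms_ops_per_second(L, L_B, N, f, adds=2, mults=3):
    """Upper bound on operations per second for N improved-ACRANC subsystems with LMS."""
    coeffs_per_subsystem = (L + 1) * (N - 1) + (L_B + 1) * N
    return (adds + mults) * coeffs_per_subsystem * N * f

ops = lms_ops_per_second(L=24, L_B=20, N=5, f=8000)
print(ops / 1e6)  # 41.0
```

Each subsystem has (L+1)(N-1) = 100 coefficients in A_i and (L_B+1)N = 105 in B_i, so the bound is 5 × 205 × 5 × 8000 = 41,000,000 operations per second.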

The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. For those skilled in the art, changes, modifications, replacements and variations made to these embodiments without departing from the principle and spirit of the present invention still fall within the protection scope of the present invention.

Claims

1. A miniature array speech denoising method with improved ACRANC and beamforming, characterized in that multi-channel distorted speech is input to the restoration filter and the result is combined with beamforming to denoise speech, comprising the following steps:

(1) improving the ACRANC method, with the following sub-steps:

(1) obtaining multi-channel distorted speech signals after noise reduction through multi-channel adaptive noise cancellation;

(2) using the multi-channel distorted speech signals as the input of the restoration filter in the ACRANC system to obtain denoised speech;

(2) beamforming: combining the improved ACRANC with beamforming to further improve the speech denoising effect, with the following sub-steps:

(1) establishing multiple improved ACRANC subsystems and an adaptive mode control AMC subsystem to obtain multi-channel denoised speech;

(2) obtaining the final denoised speech through delay-and-sum DAS beamforming.

2. The miniature array speech denoising method with improved ACRANC and beamforming according to claim 1, characterized in that the detailed process of sub-step (1) in step (1) is as follows:

Assume the speech signal is s(k) and the noise signal is n(k). They reach microphone M_i over multiple paths and become the signals s_i(k) and n_i(k). The propagation impulse responses from the speech source and from the noise source to microphone M_i are assumed to be h_si(k) and h_ni(k). The signal actually picked up by microphone M_i is expressed as x_i(k) = s_i(k) + n_i(k), where i = 1, 2, ..., N and k = 0, 1, 2, ...; N is the number of microphones in the array and k is the discrete time index. This gives:

x_i(k)=s_i(k)+n_i(k) (1)

s_i(k)=h_si(k)*s(k) (2)

n_i(k)=h_ni(k) * n (k) i=1,2 ..., N (3)

where * denotes convolution;

Let the intermediate propagation impulse response from speech signal s_i to speech signal s_j be h_{s,ij}(k), and that from noise signal n_i to noise signal n_j be h_{n,ij}(k). Then:

s_j(k) = h_{s,ij}(k) * s_i(k) (4)

n_j(k) = h_{n,ij}(k) * n_i(k) (5)

In this sub-step, for each microphone M_i, the signal x_i(k) acquired by M_i serves as the main channel signal, and the signals x_j(k) (j = 1, ..., i-1, i+1, ..., N) acquired by the other N-1 microphones serve as reference signals. In the global silent period, that is, when every channel is in its silent period, filter A_i uses the noise in the multi-channel reference signals to adaptively cancel the noise in the main channel. Outside the global silent period, the coefficients of filter A_i are kept fixed and the filter only produces its filtered output. Multi-channel distorted speech signals are thereby obtained. The reason is as follows:

Because the speech signals satisfy s_i(k) = 0, i = 1, 2, ..., N, in the global silent period, we have

x_i(k) = y_i1(k) + e_i1(k) (6)

n_i(k) = w_i \mathbf{n}_i(k) + err_i(k) (7)

where x_i(k) = n_i(k), e_i1(k) = err_i(k) is the prediction error, y_i1(k) = w_i \mathbf{n}_i(k) is the output of filter A_i, and w_i is the 1 × (N-1)(L+1) coefficient row vector of filter A_i, namely:

w_i = (w_i1, …, w_i(i-1), w_i(i+1), …, w_iN) (8)

\mathbf{n}_i(k) = [\mathbf{n}_i1(k)^T, …, \mathbf{n}_i(i-1)(k)^T, \mathbf{n}_i(i+1)(k)^T, …, \mathbf{n}_iN(k)^T]^T (9)

where \mathbf{n}_ij(k) = [n_ij(k), n_ij(k-1), …, n_ij(k-L)]^T and L is the number of delay samples of the reference channel noise signals;

Let the minimal error power be P[err_i^0(k)], with the corresponding optimal coefficient vector:

w_i^0 = \arg\min_{w_i} P[err_i(k)] (10)

To obtain w_i^0 and P[err_i^0(k)], it suffices to adjust the coefficients of filter A_i so that the sum of squares of e_i1 is minimized;

In the stage immediately following the global silent period, assuming that the noise environment is constant or slowly varying, the optimal coefficients of filter A_i are kept fixed and the filter only produces its filtered output. Then:

y_i1(k) = w_i^0 \mathbf{x}_i(k) = w_i^0 [\mathbf{s}_i(k) + \mathbf{n}_i(k)] (11)

where \mathbf{x}_i(k) and \mathbf{s}_i(k) denote the picked-up noisy speech vector and the clean speech vector, respectively. From formulas (6) and (11):

e_i1(k) = x_i(k) - y_i1(k) = p_i(k) + err_i^0(k) (12)

where:

p_i(k) = s_i(k) - w_i^0 \mathbf{s}_i(k) (13)

e_i1(k) is obtained by taking the i-th channel as the main signal and the other channels as reference signals. Letting i run from 1 to N, that is, taking each channel in turn as the main signal and the remaining channels as reference signals, yields N channels of distorted speech signals containing residual noise, e_j1(k) (j = 1, 2, ..., N).

3. The miniature array speech denoising method with improved ACRANC and beamforming according to claim 1, characterized in that the detailed process of sub-step (2) in step (1) is as follows:

The multi-channel distorted speech signals e_j1(k) (j = 1, 2, ..., N) are input to the second-stage filter B_i of the ACRANC system. In the stages outside the global silent period, the coefficients of filter B_i are adjusted so that the sum of squares of its error output e_i2(k) is minimized, where:

||e_i2(k)||² = ||x_i(k) - y_i2(k)||²

= ||s_i(k) + n_i(k) - y_i2(k)||²

= ||n_i(k)||² + ||s_i(k) - y_i2(k)||² + 2n_i(k)[s_i(k) - y_i2(k)] (14)

Assuming the noise n_i(k) is uncorrelated with s_i(k) - y_i2(k), the cross term vanishes in expectation:

E[e_i2²(k)] = E[n_i²(k)] + E[(s_i(k) - y_i2(k))²] (15)

Formula (15) shows that minimizing E[e_i2²(k)] is equivalent to minimizing E[(s_i(k) - y_i2(k))²], that is, to minimizing the error between y_i2(k) and the speech s_i(k), so the output y_i2(k) of filter B_i can approach the clean speech signal s_i(k); because the input of filter B_i is not a single channel but multiple channels of distorted speech signals, a better speech denoising effect than that of ACRANC is obtained, and this output serves as the improved denoised speech signal.

4. The miniature array speech denoising method with improved ACRANC and beamforming according to claim 1, characterized in that the detailed process of sub-step (1) in step (2) is as follows:

With each channel in turn taken as the main signal and the remaining channels as reference signals, one improved ACRANC is established per channel, giving N such subsystems;

In each improved ACRANC, the input of filter B_i is the outputs of all filters A_i (i = 1, 2, ..., N), rather than the output of a single filter A_i; the adaptive mode control AMC determines when the filters in these subsystems update their coefficients and when they keep them fixed;

In the silent period, that is, the NVP stage in which no speech is present, the error caused by changes in environmental factors can be compensated by adjusting the optimal coefficients of filter A_i; for this purpose a global silent period, the ONVP stage, is defined, and the first-stage filter A_i of each subsystem adjusts its optimal coefficients only during ONVP;

Let NVP(i) denote the silent period of the i-th noisy speech signal x_i(k) picked up by microphone M_i. NVP(i) consists of a series of discrete intervals, namely:

NVP(i) = \bigcup_j [k'_ij, k''_ij] (16)

where each discrete interval

[k'_ij, k''_ij] = {k'_ij, k'_ij+1, …, k''_ij}

is the j-th NVP of x_i(k). Clearly, NVP(i₁) need not equal NVP(i₂) for i₁ ≠ i₂, i₁, i₂ ∈ {1, 2, ..., N}, but NVP(i₁) is a translation of NVP(i₂) along the time axis;

ONVP is defined as:

ONVP = \bigcap_{i=1}^{N} NVP(i) (17)

It is then easy to show that:

ONVP = \bigcup_j [k'_j, k''_j] (18)

where:

k'_j = \max_{1≤i≤N} k'_ij, k''_j = \min_{1≤i≤N} k''_ij

If k''_j < k'_j, then [k'_j, k''_j] = ∅ in definition (18);

When the optimal coefficients of filter A_i are adjusted, no channel may contain any speech signal; otherwise the speech would be cancelled together with the noise. Therefore, the coefficients of filter A_i are adjusted only in the following L-ONVP stage:

L-ONVP = \bigcup_j [k'_j+L, k''_j] (19)

[k'_j+L, k''_j] = {k'_j+L, k'_j+L+1, …, k''_j} (20)

If k''_j < k'_j+L, then [k'_j+L, k''_j] = ∅ is likewise defined in (20);

In the L-ONVP stage, all signals used, including their delayed samples, belong to the silent period and contain no speech signal, so the optimal coefficients of filter A_i can be adjusted in the L-ONVP stage; the NVP stage referred to above is precisely L-ONVP or a part of L-ONVP;

where [k'_{i₀j}, k''_{i₀j}] are the discrete time intervals composing NVP(i₀) of the i₀-th channel; Δ' is a positive integer that can be chosen freely according to the accuracy of the VD decision, its purpose being to guarantee that the time interval used is purely silent (free of speech); Δ is also a freely chosen positive integer, but should satisfy:

Δ ≥ L + δ + Δ' (22)

where δ is the time delay for noise to propagate from the other microphones of the microphone array to the i₀-th microphone, counted in delay samples; the maximum number of delay samples is:

where d_i is the distance between microphone M_{i₀} and microphone M_i, f is the sampling frequency of the array, and c is the propagation speed of audio signals in air;

Outside the (Δ, Δ')-ONVP stage, the optimal coefficients of filter A_i in each subsystem keep their existing values unchanged, and filter A_i is used only for filtering;

In the remaining stages outside the global silent period, the optimal coefficients of all filters B_i are adjusted adaptively.

5. The miniature array speech denoising method with improved ACRANC and beamforming according to claim 1, characterized in that the detailed process of sub-step (2) in step (2) is as follows:

The output of each subsystem is one channel of denoised speech. All N outputs can be fed to a beamformer to obtain a better speech denoising effect. If the common DAS beamformer is used, it can be described by the following input-output relation:

where τ_i is the delay with which the speech reaches microphone M_i relative to the reference microphone M_{i₀} selected in the array; any microphone in the array may be chosen as the reference microphone M_{i₀}, and usually the microphone at or near the center of the array is selected.

6. The miniature array speech denoising method with improved ACRANC and beamforming according to claim 5, characterized in that the delay τ_i can be computed by the cross-correlation method, the generalized cross-correlation method, or the following method:

1) Choose a (δ, T)_OVP discrete time interval [k', k''] satisfying k ≥ k''+δ with k-(k''+δ) as small as possible;

2) Find τ_i satisfying:

If the aperture of the microphone array is very small and the sampling frequency of the array signals is not very high, all τ_i can be treated as 0.