CN105206281B - Sound enhancement method based on distributed microphone array network - Google Patents
Sound enhancement method based on distributed microphone array network
- Publication number
- CN105206281B CN105206281B CN201510582363.5A CN201510582363A CN105206281B CN 105206281 B CN105206281 B CN 105206281B CN 201510582363 A CN201510582363 A CN 201510582363A CN 105206281 B CN105206281 B CN 105206281B
- Authority
- CN
- China
- Prior art keywords
- node
- network
- microphone array
- multichannel
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses a speech enhancement method based on a distributed microphone array network, comprising the following steps: establish a distributed microphone array network based on an Ad-hoc network; synchronize the sampling rates of the network nodes; divide the signal of each node into frames; perform speech enhancement at each node using a multichannel Wiener filter; transmit the enhanced speech signal to all other nodes of the network; at each node, using both the multichannel microphone array observation signal of the current node and the enhanced single-channel speech signals of all other nodes, perform speech enhancement again with a multichannel Wiener filter to obtain the updated enhanced single-channel speech signal of the current node. The invention interconnects isolated microphone arrays through a wireless communication network to form a microphone array network, which helps improve the speech enhancement performance of individual nodes.
Description
Technical field
The present invention relates to speech enhancement methods, and in particular to a speech enhancement method based on a distributed microphone array network.
Background art
Our surroundings are frequently filled with various noises, for example, television and fan sounds in a room, engine noise inside a car, traffic noise on the road, and babble noise in a coffee shop. Noise negatively affects many speech processing systems. For example, in voice communication, noise can interfere with or even mask the other party's voice and reduce speech quality; in speech recognition systems, noise lowers the recognition rate and can even render the recognition system entirely ineffective. Therefore, estimating the clean speech from the observed noisy speech signal, a task we refer to as speech enhancement, is of great importance.
Traditional speech enhancement algorithms process the observation signal of a single microphone, and include the single-channel Wiener filter, spectral subtraction, and statistical-model-based maximum likelihood and maximum a posteriori methods. Although such methods can eliminate noise to a certain extent, they suffer from two problems. First, removing noise also removes speech components, i.e., it causes speech distortion. Second, after noise removal the spectrum often contains randomly scattered extreme points, which the listener perceives as "musical noise". Both factors make it difficult for the enhanced speech to reach the expected intelligibility, and prevent speech recognition performance from improving effectively.
To address these problems, researchers turned to two or more microphones, forming a "microphone array", in pursuit of better multichannel speech enhancement methods. The microphones of an array are at different spatial positions but share a common clock and sampling rate. Multiple microphones therefore provide temporal redundancy and spatial diversity of the speech and noise, and this additional information makes improved speech enhancement performance possible. To enhance speech, a spatial filter known as a "beamformer" can be designed to extract the signal from the target source direction and suppress noise from other directions. The simplest beamformer is the "delay-and-sum" beamformer, while the MVDR and LCMV beamformers can, in theory, reduce noise while avoiding speech distortion. Beyond simple beamformers, the generalized sidelobe canceller (GSC) framework is also widely used. Although GSC can be proven theoretically equivalent to the LCMV beamformer, its implementation is simpler and its computational complexity lower. All of the above beamformers require the speech direction (and sometimes the noise direction) to be known, but in practice the speech direction is rarely fixed, and under noise and reverberation it is difficult to estimate. To avoid source localization, the single-channel Wiener filter has been generalized to multiple channels, so that the optimal multichannel Wiener filter can be designed using only the spatio-temporal statistics of the noise, which can be estimated and updated with the help of speech presence probability or voice activity detection algorithms. Compared with single-channel algorithms, even a two-channel speech enhancement method yields a clear performance improvement.
Speech enhancement with microphone arrays is increasingly becoming the mainstream. However, once the microphone array hardware is finalized, parameters such as microphone spacing and microphone count are difficult to change. Due to space constraints in handheld and similar devices, a microphone array cannot use many microphones or large spacings. When a microphone array occupies only a small spatial region, it is difficult to acquire an accurate and comprehensive picture of the ambient noise and reverberation, whereas in theory more microphones and larger microphone spacings can effectively improve the performance of multichannel speech enhancement algorithms. Traditional microphone-array-based speech enhancement algorithms are therefore limited by the scalability and physical size of the array itself.
Summary of the invention
In view of the deficiencies of the prior art, the invention discloses a speech enhancement method based on a distributed microphone array network.
The technical scheme of the invention is as follows:
A speech enhancement method based on a distributed microphone array network, comprising the following steps:
Step a: establish a distributed microphone array network based on an Ad-hoc network, composed of multiple microphone arrays; any two network nodes can communicate with each other.
Step b: initialize the distributed microphone array network, i.e., synchronize the sampling rates of the network nodes.
Step c: divide the signal of each node into frames, obtaining the framed multi-node multichannel microphone array observation signals.
Step d: at each node, for the multichannel microphone array observation signal of each frame, perform speech enhancement with a multichannel Wiener filter according to the multichannel microphone array observation signal of the current node, obtaining the enhanced single-channel speech signal.
Step e: at each node, transmit the enhanced single-channel speech signal obtained in step d to all other nodes of the network.
Step f: at each node, using both the multichannel microphone array observation signal of the current node and the enhanced single-channel speech signals of all other nodes, perform speech enhancement again with a multichannel Wiener filter, obtaining the updated enhanced single-channel speech signal of the current node.
Step g: iterate steps e–f; when the enhanced single-channel speech signal obtained by a node has converged, that node's enhanced single-channel speech signal is no longer updated; when no node's enhanced single-channel speech signal is updated any more, processing of the current frame terminates; finally, each node holds the enhanced speech signal of the current node.
In a further technical solution, the microphone array includes an audio collection module and a communication module.
In a further technical solution, the structure of the Ad-hoc network in step a is a flat structure or a hierarchical structure; the Ad-hoc network uses a proactive, reactive, or hybrid routing protocol to realize communication between any two node devices in the network.
In a further technical solution, step b further includes performing time synchronization of the network nodes.
The distributed microphone array includes a network device clock; the time synchronization is achieved by synchronizing the network device clocks based on the NTP network time protocol.
In a further technical solution, step b specifically includes the following steps:
Step b1: initialize the network sampling rate by setting K = 1, i.e., the network sampling rate f_0 equals the device sampling rate f_1 of node 1.
Step b2: let f_K be the device sampling rate of node K; transmit f_K to node K+1.
Step b3: if the device sampling rate f_{K+1} of node K+1 satisfies f_{K+1} > f_K, then f_0 = f_K; otherwise f_0 = f_{K+1}.
Step b4: set K = K + 1.
Step b5: repeat steps b2–b4 until all nodes have been traversed, so that the network sampling rate f_0 is the minimum device sampling rate over all nodes of the whole network.
Step b6: the final node transmits the current network sampling rate f_0 to every other node, so that the device sampling rate of every node becomes f_0.
In a further technical solution, the framing in step c applies a Hamming or Hanning window to suppress spectral leakage; step c uses an overlapping framing strategy in which adjacent frames overlap in time.
In a further technical solution, step d filters the multichannel microphone array observation signal with a time-domain or a frequency-domain multichannel Wiener filter to achieve speech enhancement:
At node K, the time-domain multichannel Wiener filter is:
h_{W,K}(t) = [R_{xx,K}(t) + λ R_{nn,K}(t)]^{-1} R_{xx,K}(t) u;
where R_{xx,K}(t) = R_{yy,K}(t) − R_{nn,K}(t);
R_{xx,K}(t) is the time-domain autocorrelation matrix of the clean speech vector x_K(t) = [x_{1,K}(t), x_{2,K}(t), …, x_{M,K}(t)]^T of the current node;
R_{nn,K}(t) is the time-domain autocorrelation matrix of the noise vector n_K(t) = [n_{1,K}(t), n_{2,K}(t), …, n_{M,K}(t)]^T of the current node;
R_{yy,K}(t) is the time-domain autocorrelation matrix of the multichannel microphone array observation signal vector y_K(t) = [y_{1,K}(t), y_{2,K}(t), …, y_{M,K}(t)]^T of the current node;
u = [1, 0, …, 0]^T, of length M;
M is the number of microphones of the current node;
λ > 0 controls the trade-off between noise removal and speech distortion: the larger λ is, the more strongly the noise is suppressed, at the cost of more speech distortion;
The time-domain filter output of node K is z_K(t) = h_{W,K}^T(t) y_K(t).
At node K, the frequency-domain multichannel Wiener filter is:
H_{W,K}(ω) = [R_{XX,K}(ω) + λ R_{NN,K}(ω)]^{-1} R_{XX,K}(ω) u;
where R_{XX,K}(ω) = R_{YY,K}(ω) − R_{NN,K}(ω);
R_{XX,K}(ω) is the frequency-domain autocorrelation matrix of the clean speech vector X_K(ω) = [X_{1,K}(ω), X_{2,K}(ω), …, X_{M,K}(ω)]^H of the current node;
R_{NN,K}(ω) is the frequency-domain autocorrelation matrix of the noise vector N_K(ω) = [N_{1,K}(ω), N_{2,K}(ω), …, N_{M,K}(ω)]^H of the current node;
R_{YY,K}(ω) is the frequency-domain autocorrelation matrix of the multichannel microphone array observation signal vector Y_K(ω) = [Y_{1,K}(ω), Y_{2,K}(ω), …, Y_{M,K}(ω)]^H of the current node;
u = [1, 0, …, 0]^T, of length M;
M is the number of microphones of the current node;
λ > 0 controls the trade-off between noise removal and speech distortion, as above;
The frequency-domain filter output of node K is Z_K(ω) = H_{W,K}^H(ω) Y_K(ω).
In a further technical solution, step e includes adding to the data packet of the signal transmission the transmitting node number, the receiving node number, and the number of multichannel Wiener filter passes.
In a further technical solution, step f filters the multichannel observation signal of the current node together with the enhanced signals of the other nodes, using a time-domain or a frequency-domain multichannel Wiener filter.
In the time-domain multichannel Wiener filter:
the joint vector formed by the multichannel microphone array observation signal of the current node K and the enhanced signals of all other nodes is:
ỹ_K(t) = [y_K^T(t), z_{−K}^T(t)]^T;
where z_{−K}(t) = [z_1^{(N_i)}(t), …, z_{K−1}^{(N_i)}(t), z_{K+1}^{(N_i)}(t), …, z_P^{(N_i)}(t)]^T is the vector formed by the enhanced time-domain single-channel speech of the nodes other than node K;
N_i is the iteration count of step g;
x̃_K(t) is the clean speech component of ỹ_K(t);
ñ_K(t) is the noise component of ỹ_K(t);
R_{x̃x̃,K}(t) is the time-domain autocorrelation matrix of the clean speech component x̃_K(t) of the current node;
R_{ññ,K}(t) is the time-domain autocorrelation matrix of the noise vector ñ_K(t) of the current node;
R_{ỹỹ,K}(t) is the time-domain autocorrelation matrix of the joint vector ỹ_K(t) of the current node;
u = [1, 0, …, 0]^T, of length M + P − 1, where P is the total number of nodes in the network;
then the time-domain multichannel Wiener filter of node K at iteration N_i + 1 is:
h̃_{W,K}(t) = [R_{x̃x̃,K}(t) + λ R_{ññ,K}(t)]^{-1} R_{x̃x̃,K}(t) u.
In the frequency-domain multichannel Wiener filter:
the joint vector formed by the multichannel observation signal of the current node K and the enhanced signals of all other nodes is:
Ỹ_K(ω) = [Y_K^T(ω), Z_{−K}^T(ω)]^T;
where Z_{−K}(ω) = [Z_1^{(N_i)}(ω), …, Z_{K−1}^{(N_i)}(ω), Z_{K+1}^{(N_i)}(ω), …, Z_P^{(N_i)}(ω)]^T is the vector formed by the enhanced frequency-domain single-channel speech of the nodes other than node K;
N_i is the iteration count of step g;
X̃_K(ω) is the clean speech component of Ỹ_K(ω);
Ñ_K(ω) is the noise component of Ỹ_K(ω);
R_{X̃X̃,K}(ω) is the frequency-domain autocorrelation matrix of the clean speech component X̃_K(ω);
R_{ÑÑ,K}(ω) is the frequency-domain autocorrelation matrix of the noise vector Ñ_K(ω);
R_{ỸỸ,K}(ω) is the frequency-domain autocorrelation matrix of the joint vector Ỹ_K(ω);
u = [1, 0, …, 0]^T, of length M + P − 1, where P is the total number of nodes in the network; then the frequency-domain multichannel Wiener filter of node K at iteration N_i + 1 is:
H̃_{W,K}(ω) = [R_{X̃X̃,K}(ω) + λ R_{ÑÑ,K}(ω)]^{-1} R_{X̃X̃,K}(ω) u.
In a further technical solution, step g includes judging whether the enhanced single-channel speech signal obtained by a node has converged, according to the norm of the difference between the signal vectors before and after filtering and the signal energy, as follows:
at node K, the single-channel time-domain signal vector obtained by the previous filtering is z_K^{(N_i)}(t);
the single-channel time-domain signal vector obtained by the current filtering is z_K^{(N_i+1)}(t);
when ||z_K^{(N_i+1)}(t) − z_K^{(N_i)}(t)||_p / ||z_K^{(N_i)}(t)||_p < η, the current filter output has converged;
where ||·||_p denotes the p-norm and η is a threshold.
The beneficial effects of the invention are as follows:
First, the invention proposes a completely new framework for speech enhancement based on microphone arrays. Unlike traditional methods, the invention interconnects isolated microphone arrays through a wireless communication network, forming a microphone array network.
Second, every node of the microphone network can directly or indirectly use all microphones in the network, breaking through the spatial limits of each device and greatly extending the spatial observation range of a single node, which helps improve the speech enhancement performance of individual nodes. Even a single-channel device, once connected to the microphone array network, can achieve multichannel speech enhancement performance.
Third, the microphone array network makes no assumptions about the number of network nodes, the relative positions of the nodes, or the number and spatial placement of the microphones within each node, and therefore has great scalability and freedom.
Fourth, through Ad-hoc networking, the network does not rely on a central node and can perform distributed computation, improving the fault tolerance of the network.
Fifth, every node of the microphone array network simultaneously obtains its locally optimal filter output, providing a differentiated user experience for each node in the network.
Detailed description of the invention
Fig. 1 is the flow chart of the invention.
Fig. 2 is a diagram of the distributed microphone array network based on an Ad-hoc network.
Fig. 3 is the flow chart of the sampling rate synchronization of the distributed microphone array network.
Fig. 4 is the flow chart of single-node speech enhancement based on the multichannel Wiener filter.
Fig. 5 is the flow chart of iterative multi-node speech enhancement based on the multichannel Wiener filter.
Specific embodiment
Fig. 1 is the flow chart of the invention.
The core content of the invention comprises three parts: (1) the establishment of the Ad-hoc network in step a and the initialization of the audio collection module in step b; (2) the single-node speech enhancement based on the multichannel Wiener filter in step d; (3) the iterative multi-node speech enhancement based on the multichannel Wiener filter in step f.
As shown in Figure 1, the present invention specifically includes the following steps:
(1) Establishment and initialization of the Ad-hoc network
Step a: set up multiple microphone arrays and establish the distributed microphone array network based on an Ad-hoc network, composed of the multiple microphone arrays; any two network nodes can communicate with each other.
An Ad-hoc network is also known as a temporary self-organizing network. Since such a network requires no additional network infrastructure and is easy to construct and extend, the present invention uses it to construct the distributed microphone array network.
Fig. 2 is a diagram of the distributed microphone array network based on an Ad-hoc network. In the microphone array network, each network node is a microphone array. The microphone array equipment of each node includes at least one microphone, and further includes an audio collection module, a communication module, and a computing module, which are interconnected. The audio collection module collects the sound of the environment around the current node; the communication module handles data transmission with the communication modules of the other nodes; the computing module performs the speech enhancement computation of the node.
The Ad-hoc network can use a hierarchical or a flat network structure. In the hierarchical structure, the network nodes are divided into different "clusters"; the nodes in each cluster select a cluster head through an election algorithm, and the cluster heads maintain the routing information within their cluster and between cluster heads. Communication between any two nodes of the network is realized jointly through communication between cluster heads and communication between a cluster head and the nodes in its cluster. In the flat network structure, all nodes have equal status, and each node independently maintains routing information to all other nodes. In general, the hierarchical structure is used when there are many network nodes, and the flat network structure when there are few.
As shown in Fig. 2, the present embodiment includes only three network nodes, and therefore uses the flat network structure.
The present embodiment uses a standardized Ad-hoc network communication mode, in which each node of the Ad-hoc network communicates through the IEEE 802.11 protocol. When networking, the user designates a certain node as the start node via software, and this node sends a wireless signal requesting networking. A network node to be added searches for this signal and joins the network after confirming with the start node. After all nodes have joined the network, the start node turns off the networking-request signal, completing the establishment of the network. Each node is assigned a node number in the order in which it joined the network.
Step b: initialize the distributed microphone array network, i.e., synchronize the sampling rates of the network nodes.
Fig. 3 is the flow chart of the sampling rate synchronization of the distributed microphone array network.
The synchronization specifically includes the following steps:
Step b1: initialize the network sampling rate by setting K = 1, i.e., the network sampling rate f_0 equals the device sampling rate f_1 of node 1.
Step b2: let f_K be the device sampling rate of node K; transmit f_K to node K+1.
Step b3: if the device sampling rate f_{K+1} of node K+1 satisfies f_{K+1} > f_K, then f_0 = f_K; otherwise f_0 = f_{K+1}.
Step b4: set K = K + 1.
Step b5: repeat steps b2–b4 until all nodes have been traversed, so that the network sampling rate f_0 is the minimum device sampling rate over all nodes of the whole network.
Step b6: the final node, i.e., the last node in the traversal, transmits the current network sampling rate f_0 to every other node, so that the device sampling rate of every node becomes f_0.
The network sampling rate in step b is the software sampling rate of the whole network; the device sampling rate of a node is the rate at which the node's hardware acquires the speech signal.
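The sampling-rate negotiation of steps b1–b6 amounts to passing a running minimum from node to node and broadcasting the result back. A minimal sketch, assuming the device rates are collected in traversal order (the function name and the example rates are illustrative, not from the patent):

```python
def synchronize_sample_rate(device_rates):
    """Return the network sampling rate f0, i.e. the minimum device sampling
    rate over all nodes, found by passing the running minimum from node 1 to
    node P (steps b2-b5); the final node then broadcasts f0 (step b6)."""
    f0 = device_rates[0]                      # step b1: f0 = f1
    for f_next in device_rates[1:]:           # step b2: pass f0 to node K+1
        if f_next <= f0:                      # step b3: keep the smaller rate
            f0 = f_next
    return f0                                 # step b6: broadcast f0

rates = [48000, 44100, 16000]                 # hypothetical device rates
print(synchronize_sample_rate(rates))         # → 16000
```

The loop visits each node once, so the negotiation costs one message per hop plus the final broadcast.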
Step b also includes clock synchronization.
The microphone array further includes a network device clock, located on the communication module. Time synchronization is achieved by synchronizing the network device clocks based on the NTP network time protocol. Each node takes the order in which it joined the network as its number, starting from 1. The communication module of every node keeps its clock synchronized with the start node, numbered 1, using the high-precision network time protocol NTP. The audio collection module of each node reads the network device clock of its communication module and aligns the start of audio collection with a specified time instant T_s of the communication module. The value of T_s is specified by the user and is sent to the whole network by the start node.
(2) Single-node speech enhancement based on the multichannel Wiener filter.
Step c: divide the signal of each node into frames, obtaining the framed multi-node multichannel microphone array observation signals. The framing in step c applies a Hamming or Hanning window to suppress spectral leakage, and adopts an overlapping framing strategy in which adjacent frames overlap in time.
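As a rough sketch of such a framing stage, overlapping frames with a Hamming window might look like the following (the frame length, hop size, and 50% overlap here are illustrative choices; the patent does not fix these values):

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Split signal x into overlapping frames (50% overlap at these defaults)
    and apply a Hamming window to each frame to suppress spectral leakage."""
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hamming(frame_len)
    frames = np.stack([x[i * hop: i * hop + frame_len] * win
                       for i in range(n_frames)])
    return frames  # shape (n_frames, frame_len)

x = np.random.randn(16000)          # one second of audio at 16 kHz, say
print(frame_signal(x).shape)        # → (61, 512)
```

Each windowed frame is then what steps d–g operate on, one frame at a time.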
Step d: at each node, for the multichannel microphone array observation signal of each frame, perform speech enhancement with a multichannel Wiener filter according to the multichannel microphone array observation signal of the current node, obtaining the enhanced single-channel speech signal.
Compared with beamformers and the generalized sidelobe canceller, a clear advantage of the multichannel Wiener filter is that it can achieve effective speech enhancement without estimating the speech direction. Since the direction of the target speech often changes in practice, and tracking a changing direction in a noisy environment is particularly difficult, the present invention uses the multichannel Wiener filter for speech enhancement.
The Wiener filter can be computed in the time domain or the frequency domain. In theory the time-domain and frequency-domain algorithms are equivalent, but in practice, differences between the time-domain and frequency-domain noise estimates make the algorithm outputs not fully consistent. The two transform-domain algorithms also differ in computational complexity.
Fig. 4 is the flow chart of single-node speech enhancement based on the multichannel Wiener filter. As shown in Fig. 4: first, detect speech activity or estimate the speech presence probability; second, estimate the noise autocorrelation matrix; third, compute the clean speech autocorrelation matrix; finally, compute the multichannel Wiener filter.
The method of filtering the original multichannel signals with the time-domain or frequency-domain multichannel Wiener filter is as follows:
At node K, the time-domain multichannel Wiener filter is:
h_{W,K}(t) = [R_{xx,K}(t) + λ R_{nn,K}(t)]^{-1} R_{xx,K}(t) u;
where R_{xx,K}(t) = R_{yy,K}(t) − R_{nn,K}(t);
R_{xx,K}(t) is the time-domain autocorrelation matrix of the clean speech vector x_K(t) = [x_{1,K}(t), x_{2,K}(t), …, x_{M,K}(t)]^T of the current node;
R_{nn,K}(t) is the time-domain autocorrelation matrix of the noise vector n_K(t) = [n_{1,K}(t), n_{2,K}(t), …, n_{M,K}(t)]^T of the current node;
R_{yy,K}(t) is the time-domain autocorrelation matrix of the multichannel microphone array observation signal vector y_K(t) = [y_{1,K}(t), y_{2,K}(t), …, y_{M,K}(t)]^T of the current node;
u = [1, 0, …, 0]^T, of length M;
M is the number of microphones of the current node;
λ > 0 controls the trade-off between noise removal and speech distortion: the larger λ is, the more strongly the noise is suppressed, at the cost of more speech distortion;
The time-domain filter output of node K is z_K(t) = h_{W,K}^T(t) y_K(t).
At node K, the frequency-domain multichannel Wiener filter is:
H_{W,K}(ω) = [R_{XX,K}(ω) + λ R_{NN,K}(ω)]^{-1} R_{XX,K}(ω) u;
where R_{XX,K}(ω) = R_{YY,K}(ω) − R_{NN,K}(ω);
R_{XX,K}(ω) is the frequency-domain autocorrelation matrix of the clean speech vector X_K(ω) = [X_{1,K}(ω), X_{2,K}(ω), …, X_{M,K}(ω)]^H of the current node;
R_{NN,K}(ω) is the frequency-domain autocorrelation matrix of the noise vector N_K(ω) = [N_{1,K}(ω), N_{2,K}(ω), …, N_{M,K}(ω)]^H of the current node;
R_{YY,K}(ω) is the frequency-domain autocorrelation matrix of the multichannel microphone array observation signal vector Y_K(ω) = [Y_{1,K}(ω), Y_{2,K}(ω), …, Y_{M,K}(ω)]^H of the current node;
u = [1, 0, …, 0]^T, of length M;
M is the number of microphones of the current node;
λ > 0 controls the trade-off between noise removal and speech distortion, as above;
The frequency-domain filter output of node K is Z_K(ω) = H_{W,K}^H(ω) Y_K(ω).
When a node contains only one microphone, the enhanced single-channel speech signal output by the node is its original observation signal.
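The single-node filter can be sketched in a few lines of NumPy. This is a minimal illustration of h_{W,K} = [R_xx + λR_nn]^{-1} R_xx u, with R_xx = R_yy − R_nn; the toy matrices are invented for the example:

```python
import numpy as np

def mwf(R_yy, R_nn, lam=1.0):
    """Multichannel Wiener filter h = (R_xx + lam*R_nn)^{-1} R_xx u,
    where R_xx = R_yy - R_nn and u selects the first (reference) microphone."""
    M = R_yy.shape[0]
    R_xx = R_yy - R_nn
    u = np.zeros(M)
    u[0] = 1.0
    # solve the linear system instead of forming an explicit inverse
    return np.linalg.solve(R_xx + lam * R_nn, R_xx @ u)

# sanity check: with zero noise the filter reduces to u (pass-through)
R_yy = np.array([[2.0, 0.5], [0.5, 1.0]])
print(mwf(R_yy, np.zeros((2, 2))))  # → [1. 0.]
```

With nonzero R_nn, the gain on the reference channel drops below one, which is the noise-suppression/speech-distortion trade-off that λ controls.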
The critical issue of the multichannel Wiener filter is the estimation of the noise autocorrelation matrix. In the time domain, this matrix can be estimated with the help of voice activity detection. If the current frame is judged to be noise, then
R_{nn,K}(t) ← α R_{nn,K}(t) + (1 − α) y_K(t) y_K^T(t);
where 0 < α < 1 is the updating factor; otherwise the matrix is kept unchanged. Similarly, in the frequency domain the estimation can be combined with the speech presence probability. If the speech presence probability of frequency band ω in the current frame is p(ω), then R_{NN,K}(ω) is updated as:
R_{NN,K}(ω) ← α_p R_{NN,K}(ω) + (1 − α_p) Y_K(ω) Y_K^H(ω);
where α_p = α + p(ω)(1 − α), and likewise 0 < α < 1 is the updating factor. The time-domain or frequency-domain noise autocorrelation matrix is initialized as the average of the matrix over the first several frames.
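The recursive update above is easy to state in code. A sketch of the SPP-weighted frequency-domain form (the value of α and the toy vectors are illustrative):

```python
import numpy as np

def update_noise_cov(R_nn, y, alpha=0.95, p=0.0):
    """SPP-weighted recursive noise-covariance update:
    alpha_p = alpha + p*(1 - alpha);  R_nn <- alpha_p*R_nn + (1-alpha_p)*y y^H.
    With p = 0 (frame judged pure noise) this is the plain VAD-based update;
    with p = 1 (certain speech) R_nn is left unchanged."""
    alpha_p = alpha + p * (1.0 - alpha)
    return alpha_p * R_nn + (1.0 - alpha_p) * np.outer(y, y.conj())

R = np.eye(2, dtype=complex)
y = np.array([1.0 + 0j, 0.0 + 0j])
print(update_noise_cov(R, y, alpha=0.95, p=1.0))  # unchanged: still the identity
```

The same function with real inputs and `y y^T` in place of `y y^H` gives the time-domain VAD-driven update.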
When a node contains only one microphone, in order to avoid distorting the node's enhanced speech signal, the enhanced single-channel speech signal output by the node is its original microphone observation signal.
(3) Iterative multi-node speech enhancement based on the multichannel Wiener filter.
Step e: at each node, transmit the enhanced single-channel speech signal obtained in step d to all other nodes of the network. In step e, the transmitting node number, the receiving node number, and the number of multichannel Wiener filter passes may also be added to the data packet of the signal transmission, so that the packets can be distinguished from one another.
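A packet carrying this metadata might be laid out as below; the field names (and the extra frame index) are hypothetical, since the patent only lists which information the packet must carry:

```python
from dataclasses import dataclass

@dataclass
class EnhancedSpeechPacket:
    """Hypothetical packet layout for step e."""
    src_node: int      # transmitting node number
    dst_node: int      # receiving node number
    mwf_pass: int      # number of multichannel Wiener filter passes so far
    frame_index: int   # which signal frame the payload belongs to (assumed)
    samples: bytes     # enhanced single-channel speech payload

pkt = EnhancedSpeechPacket(src_node=1, dst_node=3, mwf_pass=2,
                           frame_index=40, samples=b"\x00\x01")
print(pkt.src_node, pkt.mwf_pass)  # 1 2
```

Tagging each packet with the pass count lets a receiver discard stale broadcasts from earlier iterations of steps e–f.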
Step f: at each node, using both the multichannel observation signal of the microphone array of the current node and the enhanced single-channel speech signals of all other nodes, perform speech enhancement again with a multichannel Wiener filter, obtaining the updated enhanced single-channel speech signal of the current node.
Each node obtains an enhanced single-channel speech signal from the observation signals of its own microphone array. The enhanced signals of the different nodes on the one hand suppress the noise around each node, and on the other hand provide redundancy of the clean speech, so they can be used by the other nodes to further improve speech enhancement. From the viewpoint of network communication, transmitting the enhanced single-channel speech signal rather than the multichannel original signals observed by a node greatly saves bandwidth, and keeps the data transmission format between nodes consistent.
In this step, the enhanced single-channel speech signals of the other nodes and the multichannel observation signal of this node together form a new observation vector. The enhanced single-channel signal of each other node can be regarded as a new observation channel of the local node; similarly, the multichannel Wiener filter can then be applied to this new observation vector to obtain the updated enhanced single-channel speech signal of the node.
Step g: iterate steps e–f; when the enhanced single-channel speech signal obtained by a node has converged, that node's enhanced single-channel speech signal is no longer updated; when no node's enhanced single-channel speech signal is updated any more, processing of the current frame terminates; finally, each node holds the enhanced speech signal of the current node.
Fig. 5 is the flow chart of iterative multi-node speech enhancement based on the multichannel Wiener filter. First, construct the joint vector formed by the multichannel observation signal of the current node and the enhanced signals of all other nodes; second, detect speech activity or estimate the speech presence probability; third, update the noise autocorrelation matrix; then compute the noisy-signal autocorrelation matrix; finally, compute the multichannel Wiener filter.
The specific calculation is as follows:
At node K, in the time-domain multichannel Wiener filter,
the joint vector formed by the multichannel observation signal of the current node K and the enhanced signals of all other nodes is:
ỹ_K(t) = [y_K^T(t), z_{−K}^T(t)]^T;
where z_{−K}(t) = [z_1^{(N_i)}(t), …, z_{K−1}^{(N_i)}(t), z_{K+1}^{(N_i)}(t), …, z_P^{(N_i)}(t)]^T is the vector formed by the enhanced time-domain single-channel speech of the nodes other than node K;
N_i is the iteration count of step g;
x̃_K(t) is the clean speech component of ỹ_K(t);
ñ_K(t) is the noise component of ỹ_K(t);
R_{x̃x̃,K}(t) is the time-domain autocorrelation matrix of the clean speech component x̃_K(t) of the current node;
R_{ññ,K}(t) is the time-domain autocorrelation matrix of the noise component ñ_K(t) of the current node;
R_{ỹỹ,K}(t) is the time-domain autocorrelation matrix of the joint vector ỹ_K(t) of the current node;
u = [1, 0, …, 0]^T, of length M + P − 1, where P is the total number of nodes in the network;
then the time-domain multichannel Wiener filter of node K at iteration N_i + 1 is:
h̃_{W,K}(t) = [R_{x̃x̃,K}(t) + λ R_{ññ,K}(t)]^{-1} R_{x̃x̃,K}(t) u.
At node K, in the frequency-domain multichannel Wiener filter,
the joint vector formed by the multichannel observation signal of the current node K and the enhanced signals of all other nodes is:
Ỹ_K(ω) = [Y_K^T(ω), Z_{−K}^T(ω)]^T;
where Z_{−K}(ω) = [Z_1^{(N_i)}(ω), …, Z_{K−1}^{(N_i)}(ω), Z_{K+1}^{(N_i)}(ω), …, Z_P^{(N_i)}(ω)]^T is the vector formed by the enhanced frequency-domain single-channel speech of the nodes other than node K;
N_i is the iteration count of step g;
X̃_K(ω) is the clean speech component of the joint vector Ỹ_K(ω);
Ñ_K(ω) is the noise component of the joint vector Ỹ_K(ω);
R_{X̃X̃,K}(ω) is the frequency-domain autocorrelation matrix of the clean speech component X̃_K(ω);
R_{ÑÑ,K}(ω) is the frequency-domain autocorrelation matrix of the noise component Ñ_K(ω);
R_{ỸỸ,K}(ω) is the frequency-domain autocorrelation matrix of the joint vector Ỹ_K(ω);
u = [1, 0, …, 0]^T, of length M + P − 1, where P is the total number of nodes in the network; then the frequency-domain multichannel Wiener filter of node K at iteration N_i + 1 is:
H̃_{W,K}(ω) = [R_{X̃X̃,K}(ω) + λ R_{ÑÑ,K}(ω)]^{-1} R_{X̃X̃,K}(ω) u.
After every node has obtained its updated single-channel enhanced signal, this updated signal can likewise be transmitted to the other nodes, so that the other nodes can update their single-channel enhanced signals again. The above steps can therefore be repeated across the distributed microphone array network; once the single-channel enhanced signal obtained at a node has converged, the enhanced speech signal of that node is no longer updated. When the single-channel speech signals of all nodes no longer change, the processing of the current frame is finished, and each node finally obtains the enhanced speech signal of the present node.
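The per-node update of step f can be sketched as follows for a single node and a single frame. This is a minimal illustration, not the patent's implementation: the function name, the use of a separate noise-only stretch to estimate the noise statistics, and the assumption that the received enhanced signals are approximately noise-free are all ours; the patent obtains the noise autocorrelation via voice activity detection and the speech presence probability, which is omitted here for brevity.

```python
import numpy as np

def node_update(y_frames, z_other, noise_frames, lam=1.0):
    """One node's update: stack the node's own M microphone channels with
    the single-channel enhanced signals received from the other P-1 nodes
    into an (M+P-1)-dimensional joint vector, estimate the autocorrelation
    matrices from the samples, and apply the multichannel Wiener filter.

    y_frames:     (T, M)   observation samples of the present node
    z_other:      (T, P-1) enhanced signals from the other nodes
    noise_frames: (T, M)   noise-only samples (an assumed stand-in for the
                  patent's speech-presence-probability noise estimation)
    """
    joint = np.hstack([y_frames, z_other])                 # joint vector ỹ_K
    # Assume the received enhanced signals carry negligible residual noise.
    noise = np.hstack([noise_frames, np.zeros_like(z_other)])
    T = joint.shape[0]
    R_yy = joint.T @ joint / T                             # R̃yy,K
    R_nn = noise.T @ noise / T                             # R̃nn,K
    R_xx = R_yy - R_nn                                     # R̃xx,K
    u = np.zeros(joint.shape[1]); u[0] = 1.0               # reference channel
    h = np.linalg.solve(R_xx + lam * R_nn, R_xx @ u)       # h̃W,K
    return joint @ h                                       # updated z_K
```

With a reasonably clean signal from the other node, the filter weights that channel heavily, so the updated output is less noisy than any raw microphone of the present node.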
Step g may also include a step of judging whether the speech signal has converged. Whether the single-channel enhanced speech signal obtained at a node has converged can be decided jointly from the norm of the difference between the signal vectors before and after filtering and from the signal energy, as follows:
in node K, let z̃K^(Ni)(t) be the single-channel time-domain signal vector obtained by the previous filtering, and z̃K^(Ni+1)(t) the single-channel time-domain signal vector obtained by the current filtering;
when ||z̃K^(Ni+1)(t) − z̃K^(Ni)(t)||p ≤ η||z̃K^(Ni+1)(t)||p, the current filter output is considered to have converged;
||·||p denotes the p-norm, and η is a threshold.
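The convergence test can be sketched as below. Note that the exact way the difference norm and the signal energy are combined is not fully specified here, so the relative-change criterion ||z_curr − z_prev||_p ≤ η·||z_curr||_p used in this snippet is an assumed reading; the function name and default values are ours.

```python
import numpy as np

def has_converged(z_prev, z_curr, eta=1e-3, p=2):
    """Stop updating a node when the p-norm of the change between two
    successive filter outputs is small relative to the signal's p-norm
    (a proxy for its energy when p = 2)."""
    num = np.linalg.norm(z_curr - z_prev, ord=p)   # ||z_curr - z_prev||_p
    den = np.linalg.norm(z_curr, ord=p)            # ||z_curr||_p
    return den > 0 and num / den <= eta
```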
What has been described above is only a preferred embodiment of the present invention, and the present invention is not limited to the above embodiment. It is to be understood that other improvements and changes directly derived or readily conceived by those skilled in the art without departing from the spirit and concept of the present invention are deemed to be included within the protection scope of the present invention.
Claims (9)
1. A speech enhancement method based on a distributed microphone array network, characterized in that it comprises the following steps:
step a, establishing a distributed microphone array network based on an Ad-hoc network and composed of multiple microphone arrays, wherein any two network nodes can communicate with each other;
step b, initializing the distributed microphone array network, i.e., performing sample rate synchronization on the network nodes;
step c, framing the signal of each node to obtain the framed multi-node multichannel microphone array observation signals;
step d, in each node, for the multichannel microphone array observation signal of each frame, performing speech enhancement with a multichannel Wiener filter according to the multichannel microphone array observation signal of the present node, to obtain a single-channel enhanced speech signal;
step e, in each node, transmitting the single-channel enhanced speech signal obtained by the node in step d to all other nodes of the network;
step f, in each node, performing speech enhancement with a multichannel Wiener filter again according to both the multichannel microphone array observation signal of the present node and the single-channel enhanced speech signals of all other nodes, to obtain the updated single-channel enhanced speech signal of the present node;
step g, iterating step e to step f; when the single-channel enhanced speech signal obtained by a node has converged, the single-channel enhanced speech signal of that node is no longer updated; when the single-channel enhanced speech signals of all nodes are no longer updated, the processing of the current frame is finished; finally, each node obtains the enhanced speech signal of the present node;
wherein step b specifically includes the following steps:
step b1, initializing the network sample rate by setting K = 1, i.e., the network sample rate f0 equals the device sample rate f1 of node 1;
step b2, the device sample rate of node K being fK, transmitting the device sample rate fK of node K to node K+1;
step b3, if the device sample rate fK+1 of node K+1 satisfies fK+1 > fK, then f0 = fK; otherwise f0 = fK+1;
step b4, setting K = K+1;
step b5, repeating step b2 to step b4 until all nodes have been traversed, so that the network sample rate f0 is the minimum of the device sample rates of all nodes of the whole network;
step b6, transmitting the current network sample rate f0 from the final node to each of the other nodes, so that the device sample rates of all nodes equal f0.
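Steps b1 to b6 amount to passing the running minimum of the device sample rates around the network so that every node ends up at the lowest rate any device supports. A minimal sketch (the function name is ours; real nodes would exchange the rate over the Ad-hoc links rather than in a local list):

```python
def negotiate_sample_rate(device_rates):
    """Pairwise minimum-rate negotiation of steps b1-b5: each node compares
    the rate received so far with its own device rate and forwards the
    smaller one, so f0 converges to the network-wide minimum (step b6 then
    broadcasts f0 back to every node)."""
    f0 = device_rates[0]            # step b1: start from node 1's rate
    for fk in device_rates[1:]:     # steps b2/b4: pass f0 along the chain
        if fk < f0:                 # step b3: keep the smaller rate
            f0 = fk
    return f0                       # step b5: the global minimum
```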
2. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that: the microphone array includes an audio collection module and a communication module.
3. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that: the structure of the Ad-hoc network in step a is a planar structure or a hierarchical structure; the Ad-hoc network uses a proactive, reactive, or hybrid routing protocol to realize communication between any two node devices in the network.
4. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that: step b further includes performing time synchronization on the network nodes; the distributed microphone array includes a network device clock; the time synchronization is performed through the network device clock based on the NTP network time protocol.
5. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that: the signal framing in step c uses a Hamming window or a Hanning window to suppress spectral leakage; step c uses a framing strategy with overlap in time.
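The windowed, overlapping framing of step c can be sketched as follows. The frame length of 512 samples and 50% overlap are illustrative choices, not values fixed by the claim, and the function name is ours:

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256, window="hamming"):
    """Split a 1-D signal into overlapping frames and apply a Hamming or
    Hanning window to each frame to suppress spectral leakage.
    Returns an array of shape (n_frames, frame_len)."""
    win = np.hamming(frame_len) if window == "hamming" else np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop     # trailing samples dropped
    return np.stack([x[i * hop : i * hop + frame_len] * win
                     for i in range(n_frames)])
```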
6. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that: in step d, the multichannel microphone array observation signal is filtered with a time-domain multichannel Wiener filter or a frequency-domain multichannel Wiener filter to achieve the effect of speech enhancement:
in node K, the expression of the time-domain multichannel Wiener filter is:
hW,K(t) = [Rxx,K(t) + λRnn,K(t)]^(-1) Rxx,K(t) u;
in the above formula, Rxx,K(t) = Ryy,K(t) − Rnn,K(t);
Rxx,K(t) is the time-domain autocorrelation matrix of the clean speech vector of the present node, xK(t) = [x1,K(t), x2,K(t), …, xM,K(t)]^T;
Rnn,K(t) is the time-domain autocorrelation matrix of the noise vector of the present node, nK(t) = [n1,K(t), n2,K(t), …, nM,K(t)]^T;
Ryy,K(t) is the time-domain autocorrelation matrix of the multichannel microphone array observation signal vector of the present node, yK(t) = [y1,K(t), y2,K(t), …, yM,K(t)]^T;
u = [1, 0, …, 0]^T, of length M;
M is the number of microphones of the present node;
λ controls the trade-off between noise elimination and speech distortion, with λ > 0; the larger λ is, the more strongly the noise is suppressed, while more speech distortion is introduced;
the time-domain filtering output of node K is zK(t) = hW,K(t)^T yK(t);
in node K, the expression of the frequency-domain multichannel Wiener filter is:
HW,K(ω) = [RXX,K(ω) + λRNN,K(ω)]^(-1) RXX,K(ω) u;
in the above formula, RXX,K(ω) = RYY,K(ω) − RNN,K(ω);
RXX,K(ω) is the frequency-domain autocorrelation matrix of the clean speech vector of the present node, XK(ω) = [X1,K(ω), X2,K(ω), …, XM,K(ω)]^H;
RNN,K(ω) is the frequency-domain autocorrelation matrix of the noise vector of the present node, NK(ω) = [N1,K(ω), N2,K(ω), …, NM,K(ω)]^H;
RYY,K(ω) is the frequency-domain autocorrelation matrix of the multichannel microphone array observation signal vector of the present node, YK(ω) = [Y1,K(ω), Y2,K(ω), …, YM,K(ω)]^H;
u = [1, 0, …, 0]^T, of length M;
M is the number of microphones of the present node;
λ controls the trade-off between noise elimination and speech distortion, with λ > 0; the larger λ is, the more strongly the noise is suppressed, while more speech distortion is introduced;
the frequency-domain filtering output of node K is ZK(ω) = HW,K(ω)^H YK(ω).
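The filter expression of claim 6, h = [Rxx + λRnn]^(-1) Rxx u with Rxx = Ryy − Rnn, can be computed directly once the autocorrelation matrices are available. A sketch (the function name is ours; estimating Ryy and Rnn from data is outside this snippet):

```python
import numpy as np

def multichannel_wiener(R_yy, R_nn, lam=1.0):
    """Speech-distortion-weighted multichannel Wiener filter:
    h = (R_xx + lam * R_nn)^(-1) R_xx u, with R_xx = R_yy - R_nn and
    u = [1, 0, ..., 0]^T selecting the first (reference) microphone.
    Larger lam suppresses more noise at the cost of more speech distortion."""
    M = R_yy.shape[0]
    R_xx = R_yy - R_nn                 # clean-speech autocorrelation estimate
    u = np.zeros(M); u[0] = 1.0
    # Solve the linear system instead of forming the explicit inverse.
    return np.linalg.solve(R_xx + lam * R_nn, R_xx @ u)
```

The returned h is applied as z = h^T y in the time domain, or per frequency bin as Z = H^H Y in the frequency domain.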
7. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that: step e includes adding, to the data packet of the signal transmission, information on the transmitting node number, the receiving node number, and the number of multichannel Wiener filtering passes.
8. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that: step f includes filtering the multichannel observation signal of the present node and the enhanced signals of the other nodes with the time-domain or frequency-domain multichannel Wiener filter;
in the time-domain multichannel Wiener filter, the joint vector formed by the multichannel microphone array observation signal of the present node K and the enhanced signals of all other nodes is:
ỹK(t) = [y1,K(t), …, yM,K(t), z̃K^(Ni)(t)^T]^T;
in the above formula, z̃K^(Ni)(t) is the vector composed of the enhanced time-domain single-channel speech signals of the other nodes excluding node K;
Ni is the iteration number of step g;
x̃K(t) is the clean speech component in ỹK(t);
ñK(t) is the noise component in ỹK(t);
R̃xx,K(t) is the time-domain autocorrelation matrix of the clean speech component x̃K(t) in the present node;
R̃nn,K(t) is the time-domain autocorrelation matrix of the noise component ñK(t) in the present node;
R̃yy,K(t) is the time-domain autocorrelation matrix of the joint vector ỹK(t) in the present node;
u = [1, 0, …, 0]^T, of length M+P−1, P being the total number of nodes in the network;
the time-domain multichannel Wiener filter of the (Ni+1)-th iteration at node K is then:
h̃W,K(t) = [R̃xx,K(t) + λR̃nn,K(t)]^(-1) R̃xx,K(t) u;
in the frequency-domain multichannel Wiener filter, the joint vector formed by the multichannel observation signal of the present node K and the enhanced signals of all other nodes is:
ỸK(ω) = [Y1,K(ω), …, YM,K(ω), Z̃K^(Ni)(ω)^T]^T;
in the above formula, Z̃K^(Ni)(ω) is the vector composed of the enhanced frequency-domain single-channel speech signals of the other nodes excluding node K;
Ni is the iteration number of step g;
X̃K(ω) is the clean speech component in ỸK(ω);
ÑK(ω) is the noise component in ỸK(ω);
R̃XX,K(ω) is the frequency-domain autocorrelation matrix of the clean speech component X̃K(ω);
R̃NN,K(ω) is the frequency-domain autocorrelation matrix of the noise component ÑK(ω);
R̃YY,K(ω) is the frequency-domain autocorrelation matrix of the joint observation vector ỸK(ω);
u = [1, 0, …, 0]^T, of length M+P−1, P being the total number of nodes in the network; the frequency-domain multichannel Wiener filter of the (Ni+1)-th iteration at node K is then:
H̃W,K(ω) = [R̃XX,K(ω) + λR̃NN,K(ω)]^(-1) R̃XX,K(ω) u.
9. The speech enhancement method based on a distributed microphone array network according to claim 1, characterized in that: step g includes the step of judging, according to the norm of the difference between the signal vectors before and after filtering and the norm of the filtered signal vector, whether the single-channel enhanced speech signal obtained by a node has converged, the method being as follows:
in node K, the single-channel time-domain signal vector obtained by the previous filtering is z̃K^(Ni)(t);
the single-channel time-domain signal vector obtained by the current filtering is z̃K^(Ni+1)(t);
when ||z̃K^(Ni+1)(t) − z̃K^(Ni)(t)||p ≤ η||z̃K^(Ni+1)(t)||p, the current filter output has converged;
in the above formula, ||·||p denotes the p-norm, and η is a threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510582363.5A CN105206281B (en) | 2015-09-14 | 2015-09-14 | Sound enhancement method based on distributed microphone array network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105206281A CN105206281A (en) | 2015-12-30 |
CN105206281B true CN105206281B (en) | 2019-02-15 |
Family
ID=54953910
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510582363.5A Active CN105206281B (en) | 2015-09-14 | 2015-09-14 | Sound enhancement method based on distributed microphone array network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105206281B (en) |
Families Citing this family (74)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10509626B2 (en) | 2016-02-22 | 2019-12-17 | Sonos, Inc | Handling of loss of pairing between networked devices |
US10743101B2 (en) | 2016-02-22 | 2020-08-11 | Sonos, Inc. | Content mixing |
US9965247B2 (en) | 2016-02-22 | 2018-05-08 | Sonos, Inc. | Voice controlled media playback system based on user profile |
US10095470B2 (en) | 2016-02-22 | 2018-10-09 | Sonos, Inc. | Audio response playback |
US9947316B2 (en) | 2016-02-22 | 2018-04-17 | Sonos, Inc. | Voice control of a media playback system |
US10264030B2 (en) | 2016-02-22 | 2019-04-16 | Sonos, Inc. | Networked microphone device control |
CN105957536B (en) * | 2016-04-25 | 2019-11-12 | 深圳永顺智信息科技有限公司 | Based on channel degree of polymerization frequency domain echo cancel method |
US9978390B2 (en) | 2016-06-09 | 2018-05-22 | Sonos, Inc. | Dynamic player selection for audio signal processing |
CN106028227B (en) * | 2016-07-08 | 2019-05-24 | 乐鑫信息科技(上海)股份有限公司 | Distributed microphone array and its applicable sonic location system |
US10152969B2 (en) | 2016-07-15 | 2018-12-11 | Sonos, Inc. | Voice detection by multiple devices |
US10115400B2 (en) | 2016-08-05 | 2018-10-30 | Sonos, Inc. | Multiple voice services |
GB201615538D0 (en) * | 2016-09-13 | 2016-10-26 | Nokia Technologies Oy | A method , apparatus and computer program for processing audio signals |
US9942678B1 (en) | 2016-09-27 | 2018-04-10 | Sonos, Inc. | Audio playback settings for voice interaction |
US9743204B1 (en) | 2016-09-30 | 2017-08-22 | Sonos, Inc. | Multi-orientation playback device microphones |
US10181323B2 (en) | 2016-10-19 | 2019-01-15 | Sonos, Inc. | Arbitration-based voice recognition |
CN106782590B (en) * | 2016-12-14 | 2020-10-09 | 南京信息工程大学 | Microphone array beam forming method based on reverberation environment |
CN106846803B (en) * | 2017-02-08 | 2023-06-23 | 广西交通科学研究院有限公司 | Traffic event detection device and method based on audio frequency |
CN110169082B (en) | 2017-03-08 | 2021-03-23 | 惠普发展公司,有限责任合伙企业 | Method and apparatus for combining audio signal outputs, and computer readable medium |
CN106992010B (en) * | 2017-06-02 | 2020-02-21 | 厦门大学 | Microphone array speech enhancement device under condition of no direct sound |
US10475449B2 (en) | 2017-08-07 | 2019-11-12 | Sonos, Inc. | Wake-word detection suppression |
US10048930B1 (en) | 2017-09-08 | 2018-08-14 | Sonos, Inc. | Dynamic computation of system response volume |
US10446165B2 (en) | 2017-09-27 | 2019-10-15 | Sonos, Inc. | Robust short-time fourier transform acoustic echo cancellation during audio playback |
US10482868B2 (en) | 2017-09-28 | 2019-11-19 | Sonos, Inc. | Multi-channel acoustic echo cancellation |
US10621981B2 (en) | 2017-09-28 | 2020-04-14 | Sonos, Inc. | Tone interference cancellation |
US10466962B2 (en) | 2017-09-29 | 2019-11-05 | Sonos, Inc. | Media playback system with voice assistance |
CN107993670B (en) * | 2017-11-23 | 2021-01-19 | 华南理工大学 | Microphone array speech enhancement method based on statistical model |
US10880650B2 (en) | 2017-12-10 | 2020-12-29 | Sonos, Inc. | Network microphone devices with automatic do not disturb actuation capabilities |
US10818290B2 (en) | 2017-12-11 | 2020-10-27 | Sonos, Inc. | Home graph |
US11343614B2 (en) | 2018-01-31 | 2022-05-24 | Sonos, Inc. | Device designation of playback and network microphone device arrangements |
US11175880B2 (en) | 2018-05-10 | 2021-11-16 | Sonos, Inc. | Systems and methods for voice-assisted media content selection |
US10847178B2 (en) | 2018-05-18 | 2020-11-24 | Sonos, Inc. | Linear filtering for noise-suppressed speech detection |
US10959029B2 (en) | 2018-05-25 | 2021-03-23 | Sonos, Inc. | Determining and adapting to changes in microphone performance of playback devices |
US10681460B2 (en) | 2018-06-28 | 2020-06-09 | Sonos, Inc. | Systems and methods for associating playback devices with voice assistant services |
CN110364166B (en) * | 2018-06-28 | 2022-10-28 | 腾讯科技(深圳)有限公司 | Electronic equipment for realizing speech signal recognition |
CN109192196A (en) * | 2018-08-22 | 2019-01-11 | 昆明理工大学 | A kind of audio frequency characteristics selection method of the SVM classifier of anti-noise |
US10461710B1 (en) | 2018-08-28 | 2019-10-29 | Sonos, Inc. | Media playback system with maximum volume setting |
US11076035B2 (en) | 2018-08-28 | 2021-07-27 | Sonos, Inc. | Do not disturb feature for audio notifications |
US10587430B1 (en) | 2018-09-14 | 2020-03-10 | Sonos, Inc. | Networked devices, systems, and methods for associating playback devices based on sound codes |
US11024331B2 (en) | 2018-09-21 | 2021-06-01 | Sonos, Inc. | Voice detection optimization using sound metadata |
US10811015B2 (en) | 2018-09-25 | 2020-10-20 | Sonos, Inc. | Voice detection optimization based on selected voice assistant service |
US11100923B2 (en) | 2018-09-28 | 2021-08-24 | Sonos, Inc. | Systems and methods for selective wake word detection using neural network models |
US10692518B2 (en) | 2018-09-29 | 2020-06-23 | Sonos, Inc. | Linear filtering for noise-suppressed speech detection via multiple network microphone devices |
US11899519B2 (en) | 2018-10-23 | 2024-02-13 | Sonos, Inc. | Multiple stage network microphone device with reduced power consumption and processing load |
EP3654249A1 (en) | 2018-11-15 | 2020-05-20 | Snips | Dilated convolutions and gating for efficient keyword spotting |
US11183183B2 (en) | 2018-12-07 | 2021-11-23 | Sonos, Inc. | Systems and methods of operating media playback systems having multiple voice assistant services |
US11132989B2 (en) | 2018-12-13 | 2021-09-28 | Sonos, Inc. | Networked microphone devices, systems, and methods of localized arbitration |
US10602268B1 (en) | 2018-12-20 | 2020-03-24 | Sonos, Inc. | Optimization of network microphone devices using noise classification |
US11315556B2 (en) | 2019-02-08 | 2022-04-26 | Sonos, Inc. | Devices, systems, and methods for distributed voice processing by transmitting sound data associated with a wake word to an appropriate device for identification |
US10867604B2 (en) | 2019-02-08 | 2020-12-15 | Sonos, Inc. | Devices, systems, and methods for distributed voice processing |
US11120794B2 (en) | 2019-05-03 | 2021-09-14 | Sonos, Inc. | Voice assistant persistence across multiple network microphone devices |
US11361756B2 (en) | 2019-06-12 | 2022-06-14 | Sonos, Inc. | Conditional wake word eventing based on environment |
US10586540B1 (en) | 2019-06-12 | 2020-03-10 | Sonos, Inc. | Network microphone device with command keyword conditioning |
US11200894B2 (en) | 2019-06-12 | 2021-12-14 | Sonos, Inc. | Network microphone device with command keyword eventing |
US11138969B2 (en) | 2019-07-31 | 2021-10-05 | Sonos, Inc. | Locally distributed keyword detection |
US11138975B2 (en) | 2019-07-31 | 2021-10-05 | Sonos, Inc. | Locally distributed keyword detection |
US10871943B1 (en) | 2019-07-31 | 2020-12-22 | Sonos, Inc. | Noise classification for event detection |
US11189286B2 (en) | 2019-10-22 | 2021-11-30 | Sonos, Inc. | VAS toggle based on device orientation |
US11200900B2 (en) | 2019-12-20 | 2021-12-14 | Sonos, Inc. | Offline voice control |
US11562740B2 (en) | 2020-01-07 | 2023-01-24 | Sonos, Inc. | Voice verification for media playback |
US11556307B2 (en) | 2020-01-31 | 2023-01-17 | Sonos, Inc. | Local voice data processing |
US11308958B2 (en) | 2020-02-07 | 2022-04-19 | Sonos, Inc. | Localized wakeword verification |
CN111048106B (en) * | 2020-03-12 | 2020-06-16 | 深圳市友杰智新科技有限公司 | Pickup method and apparatus based on double microphones and computer device |
US11727919B2 (en) | 2020-05-20 | 2023-08-15 | Sonos, Inc. | Memory allocation for keyword spotting engines |
US11482224B2 (en) | 2020-05-20 | 2022-10-25 | Sonos, Inc. | Command keywords with input detection windowing |
US11308962B2 (en) | 2020-05-20 | 2022-04-19 | Sonos, Inc. | Input detection windowing |
US11698771B2 (en) | 2020-08-25 | 2023-07-11 | Sonos, Inc. | Vocal guidance engines for playback devices |
CN112735462A (en) * | 2020-12-30 | 2021-04-30 | 科大讯飞股份有限公司 | Noise reduction method and voice interaction method of distributed microphone array |
CN112652310A (en) * | 2020-12-31 | 2021-04-13 | 乐鑫信息科技(上海)股份有限公司 | Distributed speech processing system and method |
CN112820287A (en) * | 2020-12-31 | 2021-05-18 | 乐鑫信息科技(上海)股份有限公司 | Distributed speech processing system and method |
CN112954122B (en) * | 2021-01-22 | 2022-10-11 | 成都天奥信息科技有限公司 | Voice selecting method for very high frequency voice communication system |
US11551700B2 (en) | 2021-01-25 | 2023-01-10 | Sonos, Inc. | Systems and methods for power-efficient keyword detection |
CN113257270B (en) * | 2021-05-10 | 2022-07-15 | 中国科学技术大学 | Multi-channel voice enhancement method based on reference microphone optimization |
CN113744751A (en) * | 2021-08-16 | 2021-12-03 | 清华大学苏州汽车研究院(相城) | Multi-channel speech signal enhancement method applied to microphone array |
CN114283832A (en) * | 2021-09-09 | 2022-04-05 | 腾讯科技(深圳)有限公司 | Processing method and device for multi-channel audio signal |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101292547A (en) * | 2005-10-13 | 2008-10-22 | 摩托罗拉公司 | Method and apparatus for synchronizing a node within an ad-hoc communication system |
CN101587712A (en) * | 2008-05-21 | 2009-11-25 | 中国科学院声学研究所 | A kind of directional speech enhancement method based on minitype microphone array |
CN101772983A (en) * | 2007-07-31 | 2010-07-07 | 摩托罗拉公司 | System and method of resource allocation within a communication system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2012236649A1 (en) * | 2011-03-28 | 2013-10-31 | Ambientz | Methods and systems for searching utilizing acoustical context |
Non-Patent Citations (3)
Title |
---|
"Distributed Adaptive Node-Specific Signal Estimation in Fully Connected Sensor Networks—Part I: Sequential Node Updating"; A. Bertrand et al.; IEEE Transactions on Signal Processing; Oct. 2010; vol. 58, no. 10; p. 5 col. 1 para. 2 to p. 6 col. 1 para. 4, fig. 3 |
"Distributed GSC Beamforming Using the Relative Transfer Function"; M. G. Shmulik et al.; 20th European Signal Processing Conference; Aug. 2012; p. 1 abstract, sect. 1 para. 4 to p. 4 sect. 5 |
"On Optimal Frequency-Domain Multichannel Linear Filtering for Noise Reduction"; S. Mehrez et al.; IEEE Transactions on Audio, Speech, and Language Processing; Feb. 2010; vol. 18, no. 2; p. 3 para. 3 to p. 5, p. 7 sect. IV |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105206281B (en) | Sound enhancement method based on distributed microphone array network | |
CN106251877B (en) | Voice Sounnd source direction estimation method and device | |
Zeng et al. | Distributed delay and sum beamformer for speech enhancement via randomized gossip | |
CN105388459B (en) | The robust sound source space-location method of distributed microphone array network | |
CN102204281B (en) | A system and method for producing a directional output signal | |
CN107211225A (en) | Hearing assistant system | |
CN106710601A (en) | Voice signal de-noising and pickup processing method and apparatus, and refrigerator | |
CN110931031A (en) | Deep learning voice extraction and noise reduction method fusing bone vibration sensor and microphone signals | |
CN105741849A (en) | Voice enhancement method for fusing phase estimation and human ear hearing characteristics in digital hearing aid | |
CN105872275B (en) | A kind of speech signal time delay estimation method and system for echo cancellor | |
CN107621625B (en) | Sound source positioning method based on double micro microphones | |
CN107316648A (en) | A kind of sound enhancement method based on coloured noise | |
JP5337072B2 (en) | Model estimation apparatus, sound source separation apparatus, method and program thereof | |
CN104995679A (en) | Signal source separation | |
CN106057210B (en) | Quick speech blind source separation method based on frequency point selection under binaural distance | |
CN103026738A (en) | Method of signal processing in a hearing aid system and a hearing aid system | |
CN108109617A (en) | A kind of remote pickup method | |
CN108389586A (en) | A kind of long-range audio collecting device, monitoring device and long-range collection sound method | |
CN106297817B (en) | A kind of sound enhancement method based on binaural information | |
CN108986832A (en) | Ears speech dereverberation method and device based on voice probability of occurrence and consistency | |
Liu | Sound source seperation with distributed microphone arrays in the presence of clocks synchronization errors | |
CN109874096A (en) | A kind of ears microphone hearing aid noise reduction algorithm based on intelligent terminal selection output | |
Cherkassky et al. | Blind synchronization in wireless sensor networks with application to speech enhancement | |
CN112201276B (en) | TC-ResNet network-based microphone array voice separation method | |
CN110739004B (en) | Distributed voice noise elimination system for WASN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||