CN110390945A - Dual-sensor speech enhancement method and implementation device - Google Patents
- Publication number: CN110390945A
- Application number: CN201910678398.7A
- Authority
- CN
- China
- Prior art keywords
- conductance
- speech
- frame
- voice
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02165—Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/06—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Abstract
The invention discloses a dual-sensor speech enhancement method and an implementation device based on dual-channel Wiener filtering. Exploiting the complementarity between air-conducted and non-air-conducted speech, the method first builds a dual-channel joint speech classification model that classifies, frame by frame, the two-channel input from an air-conduction sensor and a non-air-conduction sensor. The model is then used to classify the speech frames captured by the two channels, and a dual-channel Wiener filter is finally constructed from the classification result to filter and enhance the captured signals. Compared with the prior art, the invention fuses the information carried by the air-conducted and non-air-conducted speech more fully and introduces prior knowledge of the speech signal through a statistical model, effectively improving the performance of a speech enhancement system in noisy environments. The invention is applicable to video calls, car phones, multimedia classrooms, military communications, and many other scenarios.
Description
Technical field
The present invention relates to the field of speech signal processing, and in particular to a dual-sensor speech enhancement method and implementation device based on dual-channel Wiener filtering.
Background technique
In real-world speech communication, the speech signal is often corrupted by environmental noise, which degrades the quality of the received speech. Speech enhancement is an important branch of speech signal processing whose goal is to extract the clean original speech from the noisy speech as faithfully as possible; it is widely used in voice communication, speech compression coding, and speech recognition in noisy environments.
Because the human ear perceives sound through vibrations of the air, most current speech enhancement algorithms target air-conducted speech, i.e., speech captured by an air-conduction sensor such as a microphone. Their performance is affected by the various acoustic noises in the environment and is usually poor under noisy conditions. To reduce the impact of ambient noise on speech quality, non-air-conduction sensors such as throat microphones and bone-conduction microphones are often used for speech acquisition in noisy environments. Unlike an air-conduction sensor, a non-air-conduction speech sensor exploits the vibrations of the speaker's vocal folds, jawbone, and similar body parts to deform a reed or carbon film inside the sensor, changing its resistance and hence the voltage across it, thereby converting the vibration signal into an electrical speech signal. Since sound waves propagating through the air cannot deform the reed or carbon film of a non-air-conduction sensor, such a sensor is unaffected by airborne sound and is therefore highly robust to acoustic noise. However, because the speech it captures propagates through the jawbone, muscle, skin, and other tissue, its high-frequency components are severely attenuated: the sound is muffled and indistinct, and its intelligibility is poor.
Since both air-conduction and non-air-conduction sensors have shortcomings when used alone, speech enhancement methods combining the advantages of the two have appeared in recent years. These methods exploit the complementarity of air-conducted and non-air-conducted speech and use multi-sensor fusion to achieve enhancement, usually with better results than single-sensor speech enhancement systems. Existing dual-sensor speech enhancement follows two main approaches: one first reconstructs air-conducted speech from the non-air-conducted speech and then fuses it with the noisy air-conducted speech; the other reconstructs air-conducted speech from the non-air-conducted speech, enhances the noisy air-conducted speech using both sensor signals, and then fuses the two. These techniques have the following shortcomings: (1) when air-conducted speech is reconstructed from non-air-conducted speech, extra noise is introduced at high frequencies or during silence, degrading the enhancement; (2) the reconstruction fails to exploit the information in the current air-conducted speech; (3) when the speech reconstructed from the non-air-conducted signal is fused with the air-conducted speech, the correlation between the two and the available prior knowledge are not fully exploited; (4) the fusion usually assumes that the non-air-conducted and air-conducted speech are mutually independent, an assumption that does not hold in practice.
Chinese invention patent 201610025390.7 discloses a dual-sensor speech enhancement method and device based on statistical models. That invention first combines the non-air-conducted and air-conducted speech to build a joint statistical model for classification and performs endpoint detection; the joint statistical model is used to compute the current optimal air-conducted speech filter, which filters and enhances the air-conducted speech. The non-air-conducted speech is then converted into air-conducted speech through a mapping model from non-air-conducted to air-conducted speech and combined with the filtered, enhanced speech by weighted fusion. This partially remedies the failure to exploit the correlation and prior knowledge of the two signals when fusing the reconstructed and the air-conducted speech. However, the second fusion stage still uses speech reconstructed from the non-air-conducted signal, so the high-frequency and silence noise remains, and the reconstruction still fails to use the information in the air-conducted speech.
Summary of the invention
The purpose of the present invention is to overcome the above drawbacks of the prior art by providing a dual-sensor speech enhancement method and implementation device based on dual-channel Wiener filtering. Exploiting the complementarity between air-conducted and non-air-conducted speech, the method first builds a dual-channel joint speech classification model that classifies, frame by frame, the two-channel input from the air-conduction and non-air-conduction sensors; it then uses the model to classify the speech frames captured by the two channels, and finally constructs a dual-channel Wiener filter from the classification result to filter and enhance the captured signals. Compared with the prior art, the invention fuses the information carried by the air-conducted and non-air-conducted speech more fully and introduces prior knowledge of the speech signal through a statistical model, effectively improving the performance of a speech enhancement system in noisy environments. The invention is applicable to video calls, car phones, multimedia classrooms, military communications, and many other scenarios.
The first object of the invention is achieved through the following technical solution:
A dual-sensor speech enhancement method based on dual-channel Wiener filtering, comprising the following steps:
S1. Synchronously record clean air-conducted and non-air-conducted training speech; build a dual-channel joint classification model of air-conducted and non-air-conducted speech frames; and, for each class of the model, compute the air-conducted speech power spectrum mean Φss(ω,l), the non-air-conducted speech power spectrum mean Φbb(ω,l), and the cross-spectrum mean Φbs(ω,l) between the air-conducted and non-air-conducted speech, where ω is frequency and l is the class index.
S2. Synchronously record air-conducted and non-air-conducted test speech; build a statistical model of the air-conduction noise from the noise-only segments of the air-conducted test speech and compute the noise power spectrum mean Φvv(ω).
S3. Using the statistical model of the air-conduction noise together with the dual-channel joint classification model of step S1, classify the synchronously input air-conducted and non-air-conducted test speech frames.
S4. From the classification result of step S3 and the power spectrum mean Φvv(ω), construct a dual-channel Wiener filter and apply it to the air-conducted and non-air-conducted test speech frames to obtain the enhanced air-conducted speech.
Further, step S1 proceeds as follows:
S1.1. Divide the synchronously recorded clean air-conducted and non-air-conducted training speech into frames, pre-process them, and extract the feature parameters of each frame, namely the mel-frequency cepstral coefficients.
S1.2. Train the dual-channel joint speech classification model on the air-conducted and non-air-conducted speech features obtained in step S1.1.
S1.3. Classify all air-conducted and non-air-conducted training speech frames with the trained dual-channel joint classification model, then compute, over the air-conducted and non-air-conducted training frames assigned to each class, the air-conducted speech power spectrum mean Φss(ω,l), the non-air-conducted speech power spectrum mean Φbb(ω,l), and the cross-spectrum mean Φbs(ω,l) between the two.
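The class-conditional statistics of step S1.3 can be sketched as follows. This is a minimal illustration assuming the per-frame spectra are stored as frames-by-bins complex arrays and that frame labels are already available; the function and variable names are illustrative, not from the patent:

```python
import numpy as np

def class_conditional_spectra(labels, S_ac, S_nac, n_classes):
    """Per-class power-spectrum means Phi_ss, Phi_bb and cross-spectrum
    mean Phi_bs (step S1.3). S_ac / S_nac: complex spectra of the
    air-conducted / non-air-conducted frames, shape (frames, bins)."""
    n_bins = S_ac.shape[1]
    Phi_ss = np.zeros((n_classes, n_bins))
    Phi_bb = np.zeros((n_classes, n_bins))
    Phi_bs = np.zeros((n_classes, n_bins), dtype=complex)
    for l in range(n_classes):
        idx = labels == l                     # frames assigned to class l
        Phi_ss[l] = np.mean(np.abs(S_ac[idx]) ** 2, axis=0)
        Phi_bb[l] = np.mean(np.abs(S_nac[idx]) ** 2, axis=0)
        Phi_bs[l] = np.mean(S_nac[idx] * np.conj(S_ac[idx]), axis=0)
    return Phi_ss, Phi_bb, Phi_bs
```

In a full system the labels would come from the trained joint classification model of step S1.2.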
Further, in step S1.2 the dual-channel joint speech classification model is a multiple-data-stream Gaussian mixture model (Gaussian Mixture Model, GMM), whose parameters are as follows: N(o, μ, σ) denotes a Gaussian density; ox(k) and ob(k) are the feature vectors extracted from the k-th air-conducted and non-air-conducted test speech frames; the l-th Gaussian component of the air-conducted and of the non-air-conducted speech data stream has its own mean and variance; cl is the weight of the l-th Gaussian component; wx and wb are the stream weights of the air-conducted and non-air-conducted speech data streams; and L is the number of Gaussian components.
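The patent's defining formula for the multi-stream GMM is rendered as an image and is not recoverable here; one common parameterization weights each stream's Gaussian log-density by its stream weight. The sketch below scores a frame pair against every component under that assumption, with diagonal covariances (all names are illustrative):

```python
import numpy as np

def log_gauss_diag(o, mu, var):
    """Log-density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (o - mu) ** 2 / var)

def stream_gmm_scores(o_x, o_b, c, mu_x, var_x, mu_b, var_b, w_x=0.5, w_b=0.5):
    """Per-component scores q(k, l) of a two-stream GMM: each component l
    is scored as c_l * N(o_x)^{w_x} * N(o_b)^{w_b}, evaluated in the log
    domain (one plausible multi-stream form, not the patent's verbatim one)."""
    L = len(c)
    scores = np.empty(L)
    for l in range(L):
        scores[l] = (np.log(c[l])
                     + w_x * log_gauss_diag(o_x, mu_x[l], var_x[l])
                     + w_b * log_gauss_diag(o_b, mu_b[l], var_b[l]))
    return scores
```

The highest-scoring component gives the frame pair's class, as used in step S1.3.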
Further, in step S1.3 each Gaussian component of the dual-channel joint classification model represents one class. For every pair of synchronous air-conducted and non-air-conducted training speech frames, a score with respect to each class is computed from the model, and the current frame pair is assigned to the class with the highest score. Once the class of every air-conducted and non-air-conducted training frame has been determined, the air-conducted speech power spectrum mean Φss(ω,l), the non-air-conducted speech power spectrum mean Φbb(ω,l), and the cross-spectrum mean Φbs(ω,l) between the two are computed over the training frames belonging to each class.
Further, the statistical model of the air-conduction noise, namely the noise power spectrum mean Φvv(ω), is computed as follows:
S2.1. Synchronously record air-conducted and non-air-conducted test speech and divide it into frames.
S2.2. From the short-time autocorrelation function Rb(m) and the short-time energy Eb of each non-air-conducted test speech frame, compute its short-time average threshold-crossing rate Cb, where sgn[·] denotes the sign operation, T is the initial threshold, M is the frame length, and a regulatory factor adjusts the threshold. When Cb exceeds a preset threshold the frame is judged to be speech, otherwise noise; the endpoints of the non-air-conducted test speech signal are obtained from the per-frame decisions.
S2.3. Take the instants corresponding to the detected endpoints of the non-air-conducted test speech as the endpoints of the air-conducted test speech, and extract the noise-only segments of the air-conducted test speech.
S2.4. Compute the power spectrum mean Φvv(ω) of the noise-only segments of the air-conducted test speech.
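Steps S2.1-S2.4 can be sketched as below. The patent's exact Cb statistic (with sgn, the regulatory factor, and the autocorrelation term) is given only as an image, so this sketch substitutes a simplified clipped threshold-crossing rate; frame length, FFT size, and thresholds are illustrative defaults:

```python
import numpy as np

def modified_crossing_rate(frame, T):
    """Fraction of sample pairs whose clipped signs differ: values in
    [-T, T] are clipped to zero, so low-level sensor noise does not
    register as crossings (a common VAD statistic)."""
    clipped = np.where(frame > T, 1, np.where(frame < -T, -1, 0))
    return np.mean(np.abs(np.diff(clipped)) > 0)

def noise_psd_from_nac_vad(ac, nac, frame_len=240, n_fft=256, T=0.01,
                           rate_thresh=0.1):
    """Estimate the air-conduction noise power spectrum Phi_vv(w) by
    averaging periodograms of the AC frames whose synchronous NAC frames
    show no activity (steps S2.2-S2.4)."""
    n_frames = len(nac) // frame_len
    psds = []
    for k in range(n_frames):
        seg_b = nac[k * frame_len:(k + 1) * frame_len]
        if modified_crossing_rate(seg_b, T) <= rate_thresh:  # NAC: no speech
            seg_a = ac[k * frame_len:(k + 1) * frame_len]
            spec = np.fft.rfft(seg_a, n_fft)
            psds.append(np.abs(spec) ** 2 / frame_len)
    return np.mean(psds, axis=0) if psds else None
```

Because the non-air-conduction sensor barely picks up airborne noise, any activity on that channel is a reliable speech indicator, which is why its endpoints can be transferred to the air-conducted channel.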
Further, in step S3 a vector Taylor series (Vector Taylor Series, VTS) model compensation technique is first applied: the statistical model of the air-conduction noise is used to correct the parameters of the air-conducted speech data stream in the dual-channel joint classification model, after which the input air-conducted and non-air-conducted test speech frames are classified again. The mean of each Gaussian component of the air-conducted speech data stream is corrected by a formula in which the two mean terms are, respectively, the power spectra of the clean air-conducted training speech of the l-th class and of the noise after passing through a 24-channel mel filterbank and taking the logarithm, and C is the DCT matrix. All other parameters of the dual-channel joint classification model remain unchanged. The corrected model is then used to classify the synchronously input air-conducted and non-air-conducted test speech frames, yielding for the current frame pair a classification score q(k,l) for each class.
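The mean-correction formula itself appears only as an image in the original patent. The standard zeroth-order VTS compensation in the mel-cepstral domain, consistent with the quantities named above, has the form

\[
\hat{\mu}_x^{\,l} \;=\; C \,\log\!\Big( \exp\big(C^{-1}\mu_s^{\,l}\big) \;+\; \exp\big(C^{-1}\mu_v\big) \Big),
\]

where \(C\) is the DCT matrix, \(C^{-1}\) its (pseudo-)inverse, \(\mu_s^{\,l}\) is the log-mel mean of the clean air-conducted training speech of class \(l\), \(\mu_v\) is the log-mel noise mean, and \(\exp\) and \(\log\) act elementwise. This is a reconstruction of the standard VTS relation, not the patent's verbatim formula.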
Further, in step S4, for the k-th pair of synchronously recorded air-conducted and non-air-conducted test speech frames, the enhanced air-conducted speech spectrum is computed from Y(ω,k), X(ω,k), and B(ω,k), which are the spectra of the k-th frame of the enhanced speech, the air-conducted test speech, and the non-air-conducted test speech, respectively, together with the frequency responses of the Wiener filters applied to the two test signals. Here q(k,l) is the classification score of the k-th frame pair for class l of the dual-channel joint classification model, Ha(ω,k,l) is the Wiener filter frequency response of the k-th air-conducted test speech frame for class l, and Hna(ω,k,l) is the Wiener filter frequency response of the k-th non-air-conducted test speech frame for class l.
Further, the class-dependent quantities in the above responses are taken at the best-scoring class L = arg max q(k,l).
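The patent's formulas for Y, Ha, and Hna are images and cannot be reproduced exactly; the sketch below shows one plausible realization of the dual-channel Wiener step consistent with the quantities defined above (hard class decision via arg max, a classical Wiener gain on the AC branch, a cross-spectral mapping on the NAC branch, and an equal-weight combination — the combination weights are an assumption):

```python
import numpy as np

def enhance_frame(X, B, q, Phi_ss, Phi_bb, Phi_bs, Phi_vv):
    """One dual-channel Wiener combination for a single frame.
    X, B: AC / NAC spectra; q: per-class scores; Phi_*: class-conditional
    spectra (rows indexed by class); Phi_vv: AC noise power spectrum."""
    L = int(np.argmax(q))                   # hard class decision
    H_a = Phi_ss[L] / (Phi_ss[L] + Phi_vv)  # AC-branch Wiener gain
    H_na = Phi_bs[L] / Phi_bb[L]            # NAC -> AC cross-spectral filter
    return 0.5 * (H_a * X + H_na * B)       # assumed equal-weight fusion
```

With zero noise and matched cross-spectra the filter reduces to an average of the two channel estimates, which is a useful sanity check on the structure.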
Another object of the present invention is achieved through the following technical solution:
An implementation device for the dual-sensor speech enhancement method based on dual-channel Wiener filtering, comprising an air-conduction speech sensor, a non-air-conduction speech sensor, a noise model estimation module, a dual-channel joint speech classification model, a model compensation module, a frame classification module, a filter coefficient generation module, and a dual-channel filter, wherein:
the air-conduction and non-air-conduction speech sensors are each connected to the noise model estimation module, the frame classification module, and the dual-channel filter; the dual-channel joint speech classification model, the model compensation module, the frame classification module, the filter coefficient generation module, and the dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel joint speech classification model is connected to the filter coefficient generation module.
The air-conduction and non-air-conduction speech sensors capture the air-conducted and non-air-conducted speech signals, respectively. The noise model estimation module estimates the model and power spectrum of the current air-conduction noise. The dual-channel joint speech classification model is built from synchronously recorded clean air-conducted and non-air-conducted training speech frames; for each class of the model, the air-conducted speech power spectrum mean is Φss(ω,l), the non-air-conducted speech power spectrum mean is Φbb(ω,l), and the cross-spectrum mean between the two is Φbs(ω,l). The model compensation module corrects the parameters of the dual-channel joint classification model using the statistical model of the air-conduction noise. The frame classification module classifies the synchronously input air-conducted and non-air-conducted test speech frames. The filter coefficient generation module constructs the dual-channel Wiener filter from the classification result and the power spectrum of the air-conduction noise. The dual-channel filter filters the air-conducted and non-air-conducted test speech frames to obtain the enhanced air-conducted speech.
Further, the air-conduction speech sensor is a microphone, and the non-air-conduction speech sensor is a throat microphone.
Compared with the prior art, the present invention has the following advantages and effects:
(1) Compared with enhancement techniques based only on the air-conducted or only on the non-air-conducted test speech, the invention exploits the information of both signals simultaneously during enhancement and can therefore achieve better results.
(2) The invention fuses the information of the air-conducted and non-air-conducted test speech through the dual-channel joint classification model, which makes the frame classification more accurate and makes full use of the correlation and prior knowledge of the two signals.
(3) The invention recovers the air-conducted speech with a dual-channel Wiener filter, which is computationally simpler than Chinese invention patent 201610025390.7 while avoiding both the high-frequency and silence noise introduced when air-conducted speech is reconstructed from non-air-conducted speech and the failure to use the air-conducted speech information during that reconstruction; it therefore performs better.
(4) By using a dual-channel Wiener filter to recover the air-conducted speech, the invention avoids the assumption that the non-air-conducted and air-conducted speech are mutually independent.
Brief description of the drawings
Fig. 1 is a structural block diagram of the implementation device of the dual-sensor speech enhancement method based on dual-channel Wiener filtering disclosed in an embodiment of the present invention;
Fig. 2 is a flowchart of the dual-sensor speech enhancement method based on dual-channel Wiener filtering disclosed in an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only a part, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art without creative work, based on the embodiments of the present invention, fall within the protection scope of the invention.
Embodiment One
This embodiment discloses the structure of an implementation device of the dual-sensor speech enhancement method based on dual-channel Wiener filtering. As shown in Fig. 1, the device consists of an air-conduction speech sensor, a non-air-conduction speech sensor, a noise model estimation module, a dual-channel joint speech classification model, a model compensation module, a frame classification module, a filter coefficient generation module, and a dual-channel filter, where the two sensors are each connected to the noise model estimation module, the frame classification module, and the dual-channel filter; the dual-channel joint speech classification model, the model compensation module, the frame classification module, the filter coefficient generation module, and the dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel joint speech classification model is connected to the filter coefficient generation module.
In this embodiment the air-conduction speech sensor is a microphone and the non-air-conduction speech sensor is a throat microphone; the two capture the air-conducted and non-air-conducted speech signals. The noise model estimation module estimates the model and power spectrum of the current air-conduction noise. The dual-channel joint speech classification model is built from synchronously recorded clean air-conducted and non-air-conducted training speech frames; for each class of the model, the air-conducted speech power spectrum mean is Φss(ω,l), the non-air-conducted speech power spectrum mean is Φbb(ω,l), and the cross-spectrum mean between the two is Φbs(ω,l). The model compensation module corrects the parameters of the dual-channel joint classification model using the statistical model of the air-conduction noise. The frame classification module classifies the synchronously input air-conducted and non-air-conducted test speech frames. The filter coefficient generation module constructs the dual-channel Wiener filter from the classification result and the power spectrum of the air-conduction noise. The dual-channel filter filters the air-conducted and non-air-conducted test speech frames to obtain the enhanced air-conducted speech.
Embodiment Two
This embodiment discloses a dual-sensor speech enhancement method based on dual-channel Wiener filtering. Using the implementation device disclosed in the above embodiment, the enhanced air-conducted speech is computed from the input air-conducted and non-air-conducted test speech through the following steps, whose flow is shown in Fig. 2:
Step S1. Synchronously record clean air-conducted and non-air-conducted training speech; build the dual-channel joint classification model of air-conducted and non-air-conducted speech frames; and, for each class of the model, compute the air-conducted speech power spectrum mean Φss(ω,l), the non-air-conducted speech power spectrum mean Φbb(ω,l), and the cross-spectrum mean Φbs(ω,l) between the two, where ω is frequency and l is the class index. In this embodiment this is done through the following steps:
S1.1. Divide the synchronously recorded clean air-conducted and non-air-conducted training speech into frames and pre-process them, extracting the feature parameters of each frame.
In this embodiment the clean air-conducted and non-air-conducted training speech is divided into frames of 30 ms with a 10 ms shift; each frame of both channels is windowed with a Hamming window and pre-emphasized, and its power spectrum is computed. The power spectra of the air-conducted and non-air-conducted training speech are each passed through a 24-channel mel filterbank, the logarithm of the filterbank output is taken, and a DCT is applied, yielding two groups of 12 mel-frequency cepstral coefficients that serve as the training features of the dual-channel joint classification model.
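The feature extraction just described can be sketched with NumPy as follows. The 30 ms frame, Hamming window, pre-emphasis, 24 mel filters, and 12 cepstral coefficients follow the text; the 8 kHz sampling rate and 512-point FFT are assumptions, since the patent does not state them:

```python
import numpy as np

def mel_filterbank(n_filt=24, n_fft=512, sr=8000):
    """Triangular mel filterbank: n_filt filters over n_fft//2 + 1 bins."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_filt + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(1, n_filt + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for j in range(l, c):
            fb[i - 1, j] = (j - l) / max(c - l, 1)   # rising slope
        for j in range(c, r):
            fb[i - 1, j] = (r - j) / max(r - c, 1)   # falling slope
    return fb

def mfcc_frame(frame, n_fft=512, sr=8000, n_filt=24, n_ceps=12, pre=0.97):
    """12 MFCCs of one frame: pre-emphasis, Hamming window, power
    spectrum, 24-channel mel filterbank, log, DCT-II (coeffs 1..12)."""
    x = np.append(frame[0], frame[1:] - pre * frame[:-1])  # pre-emphasis
    x = x * np.hamming(len(x))
    p = np.abs(np.fft.rfft(x, n_fft)) ** 2
    logmel = np.log(mel_filterbank(n_filt, n_fft, sr) @ p + 1e-10)
    n = np.arange(n_filt)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), n + 0.5) / n_filt)
    return dct @ logmel
```

At 8 kHz a 30 ms frame is 240 samples and a 10 ms shift is 80 samples; the same routine is applied to both the air-conducted and the non-air-conducted channel.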
S1.2: Train the dual-channel speech joint classification model with the air-conduction and non-air-conduction speech features obtained in step S1.1. In the present embodiment, the dual-channel speech joint classification model is a multi-stream GMM, in which N(o, μ, σ) is a Gaussian function, o_x(k) and o_b(k) are the feature vectors extracted from the k-th frame of air-conduction and non-air-conduction speech, μ_x^l and μ_b^l are the means of the l-th Gaussian component of the air-conduction and non-air-conduction speech data streams, σ_x^l and σ_b^l are the variances of the l-th Gaussian component of the two streams, c_l is the weight of the l-th Gaussian component, w_x and w_b are the weights of the air-conduction and non-air-conduction speech data streams, and L is the number of Gaussian components.
The parameters c_l, w_x, w_b, μ_x^l, μ_b^l, σ_x^l and σ_b^l of the dual-channel speech joint classification model are estimated with the Expectation-Maximization (EM) algorithm.
S1.3: Classify all air-conduction training speech frames and non-air-conduction speech frames with the trained dual-channel speech joint classification model, then compute, for the frames assigned to each class, the air-conduction speech power-spectrum mean Φss(ω, l), the non-air-conduction speech power-spectrum mean Φbb(ω, l), and the cross-spectrum mean between air-conduction and non-air-conduction speech Φbs(ω, l).
In the present embodiment, each Gaussian component of the dual-channel speech joint classification model represents one class. For every pair of synchronous air-conduction and non-air-conduction training speech frames, a score against each class is computed, and the pair is assigned to the class with the highest score. After the class of every air-conduction and non-air-conduction training speech frame has been determined, Φss(ω, l), Φbb(ω, l) and Φbs(ω, l) are computed over the air-conduction and non-air-conduction training frames belonging to the same class.
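A compact sketch of the per-class averaging in S1.3, assuming hard (highest-score) class assignments are already available; the conjugation convention for the cross-spectrum is an assumption, since the patent's formula image is not reproduced here:

```python
import numpy as np

def per_class_spectra(labels, X_spec, B_spec, n_classes):
    """Average power spectra and cross-spectrum over the frames of each class.

    labels         : (K,) hard class index per synchronized frame pair
    X_spec, B_spec : (K, F) complex spectra of air- / non-air-conducted frames
    Returns Phi_ss, Phi_bb (real, n_classes x F) and Phi_bs (complex).
    """
    F = X_spec.shape[1]
    Phi_ss = np.zeros((n_classes, F))
    Phi_bb = np.zeros((n_classes, F))
    Phi_bs = np.zeros((n_classes, F), dtype=complex)
    for l in range(n_classes):
        idx = labels == l
        if not idx.any():
            continue                                  # class never observed
        Phi_ss[l] = np.mean(np.abs(X_spec[idx]) ** 2, axis=0)
        Phi_bb[l] = np.mean(np.abs(B_spec[idx]) ** 2, axis=0)
        # Cross-spectrum E[B X*]; whether the patent conjugates X or B
        # is an assumption.
        Phi_bs[l] = np.mean(B_spec[idx] * np.conj(X_spec[idx]), axis=0)
    return Phi_ss, Phi_bb, Phi_bs
```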
Step S2: synchronously acquire air-conduction test speech and non-air-conduction test speech, build a statistical model of the air-conduction noise from the noise-only segments of the air-conduction test speech, and compute the power-spectrum mean Φvv(ω) of the air-conduction noise.
In the present embodiment, the statistical model of the air-conduction noise is its power-spectrum mean Φvv(ω), computed as follows:
S2.1: Synchronously acquire and frame the air-conduction test speech and the non-air-conduction test speech.
S2.2: From the short-time autocorrelation function R_b(m) and the short-time energy E_b of each non-air-conduction test speech frame, compute the frame's short-time average threshold-crossing rate C_b, where sgn[·] is the sign operation, a regulatory factor scales the threshold, T is the threshold initial value, and M is the frame length. When C_b exceeds a preset threshold, the frame is judged to be a speech signal; otherwise it is noise. The endpoint positions of the non-air-conduction test speech signal are obtained from the per-frame decisions.
S2.3: Take the time instants corresponding to the endpoints of the non-air-conduction test speech signal detected in step S2.2 as the endpoints of the air-conduction test speech, and extract the noise-only segments of the air-conduction test speech.
S2.4: Compute the power-spectrum mean Φvv(ω) of the noise-only segments of the air-conduction test speech.
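A minimal sketch of S2.1–S2.4, using the short-time energy of the non-air channel as a stand-in for the patent's autocorrelation threshold-crossing detector (a simplification; the `thresh` constant and the fallback rule are assumptions):

```python
import numpy as np

def noise_psd_from_body_vad(x_air, x_body, sr=16000, frame_ms=30, shift_ms=10,
                            n_fft=512, thresh=0.1):
    """Estimate the air-channel noise power spectrum Phi_vv(omega).

    Frames whose non-air (body-conducted) channel shows low normalized
    short-time energy are taken as noise-only -- the body-conducted sensor
    barely picks up ambient noise -- and the air-channel power spectra of
    those frames are averaged.
    """
    flen, fshift = sr * frame_ms // 1000, sr * shift_ms // 1000
    n_frames = 1 + max(0, (min(len(x_air), len(x_body)) - flen) // fshift)
    win = np.hamming(flen)
    energies = np.empty(n_frames)
    specs = np.empty((n_frames, n_fft // 2 + 1))
    for t in range(n_frames):
        s = t * fshift
        energies[t] = np.mean(x_body[s:s + flen] ** 2)   # short-time energy E_b
        specs[t] = np.abs(np.fft.rfft(x_air[s:s + flen] * win, n_fft)) ** 2
    noise_frames = energies < thresh * (energies.max() + 1e-12)
    if not noise_frames.any():                           # fall back: quietest frame
        noise_frames = energies == energies.min()
    return specs[noise_frames].mean(axis=0)              # Phi_vv(omega)
```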
Alternatively, the statistical model of the air-conduction noise may be a Gaussian function, a GMM, or an HMM.
Step S3: classify the synchronously input air-conduction test speech frames and non-air-conduction test speech frames using the statistical model of the air-conduction noise together with the dual-channel speech joint classification model of step S1.
In the present embodiment, VTS model compensation is applied first: the parameters of the air-conduction speech data stream in the dual-channel speech joint classification model are corrected with the statistical model of the air-conduction noise, and the input air-conduction and non-air-conduction test speech frames are then classified. Specifically, the mean of each Gaussian component of the air-conduction speech data stream in the dual-channel speech joint classification model is corrected, where μ̃_x^l and μ_v are the means obtained by passing the power spectra of the clean air-conduction training speech belonging to the l-th class and of the noise through the 24 mel filters and taking the logarithm, and C is the discrete cosine transform (DCT) matrix. The other parameters of the dual-channel speech joint classification model remain unchanged. The corrected dual-channel speech joint classification model then classifies the synchronously input air-conduction and non-air-conduction test speech frames, yielding the class score q(k, l) of the current air-conduction and non-air-conduction test speech frames for each class.
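The mean-correction formula omitted above plausibly takes the standard zeroth-order VTS form in the cepstral domain (a reconstruction under the stated definitions, with the exponential and logarithm acting element-wise on the 24 log-mel dimensions):

```latex
\hat{\mu}_x^{\,l} \;=\; C\,\log\!\big(e^{\tilde{\mu}_x^{\,l}} + e^{\mu_v}\big)
\;=\; \mu_x^{\,l} + C\,\log\!\big(1 + e^{\,\mu_v - \tilde{\mu}_x^{\,l}}\big),
\qquad \mu_x^{\,l} = C\,\tilde{\mu}_x^{\,l}
```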
Step S4: construct a dual-channel Wiener filter from the classification results of step S3 and Φvv(ω), filter the air-conduction test speech frames and non-air-conduction test speech frames, and obtain the enhanced air-conduction speech.
In the present embodiment, for the air-conduction and non-air-conduction test speech of the k-th synchronously acquired frame, the enhanced air-conduction speech spectrum is computed as follows, where Y(ω, k), X(ω, k) and B(ω, k) are the spectra of the k-th frame of the enhanced air-conduction speech, the air-conduction test speech and the non-air-conduction test speech, respectively, and H_a(ω, k) and H_na(ω, k) are the frequency responses of the Wiener filters applied to the k-th frame of the air-conduction and non-air-conduction test speech. Here q(k, l) is the class score of the k-th frame of air-conduction and non-air-conduction test speech for class l of the dual-channel speech joint classification model, H_a(ω, k, l) is the Wiener filter frequency response of the k-th air-conduction test speech frame for class l, and H_na(ω, k, l) is the Wiener filter frequency response of the k-th non-air-conduction test speech frame for class l.
In another embodiment, the above H_a(ω, k) and H_na(ω, k) are computed using only the class with the highest score, l = arg max_l q(k, l).
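The filter-combination formulas omitted above can be sketched as follows. The specific gain expressions H_a = Φss/(Φss + Φvv) and H_na = Φbs/Φbb, the equal averaging of the two channel estimates, the score normalization, and the cross-spectrum conjugation convention are all plausible reconstructions rather than the patent's exact formulas:

```python
import numpy as np

def dual_channel_wiener(X, B, q, Phi_ss, Phi_bb, Phi_bs, Phi_vv, hard=False):
    """Combine per-class Wiener estimates of the clean air-conducted spectrum.

    X, B : (F,) spectra of the current air / non-air test frame
    q    : (L,) class scores for this frame
    Phi_ss, Phi_bb, Phi_bs : (L, F) per-class spectra; Phi_vv : (F,) noise PSD
    """
    H_a = Phi_ss / (Phi_ss + Phi_vv + 1e-12)     # (L, F) air-channel Wiener gain
    H_na = Phi_bs / (Phi_bb + 1e-12)             # (L, F) cross-channel mapping
    per_class = 0.5 * (H_a * X + H_na * B)       # (L, F) per-class estimates
    if hard:                                     # alternative embodiment:
        return per_class[np.argmax(q)]           # use only the best class
    w = q / (q.sum() + 1e-12)
    return w @ per_class                         # score-weighted combination
```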
The above embodiment is a preferred embodiment of the present invention, but embodiments of the present invention are not limited to it. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.
Claims (10)
1. A dual-sensor speech enhancement method based on dual-channel Wiener filtering, characterized in that the dual-sensor speech enhancement method comprises the following steps:
S1: synchronously acquiring clean air-conduction training speech and non-air-conduction training speech, building a dual-channel speech joint classification model of air-conduction and non-air-conduction speech frames, and computing, for each class of the dual-channel speech joint classification model, the air-conduction speech power-spectrum mean Φss(ω, l), the non-air-conduction speech power-spectrum mean Φbb(ω, l), and the cross-spectrum mean between air-conduction and non-air-conduction speech Φbs(ω, l), where ω is frequency and l is the class index;
S2: synchronously acquiring air-conduction test speech and non-air-conduction test speech, building a statistical model of the air-conduction noise from the noise-only segments of the air-conduction test speech, and computing the power-spectrum mean Φvv(ω) of the air-conduction noise;
S3: classifying the synchronously input air-conduction and non-air-conduction test speech frames with the statistical model of the air-conduction noise and the dual-channel speech joint classification model of step S1;
S4: constructing a dual-channel Wiener filter from the classification results of step S3 and the power-spectrum mean Φvv(ω), filtering the air-conduction and non-air-conduction test speech frames, and obtaining the enhanced air-conduction speech.
2. The dual-sensor speech enhancement method according to claim 1, characterized in that step S1 proceeds as follows:
S1.1: framing and pre-processing the synchronously acquired clean air-conduction training speech and non-air-conduction training speech, and extracting the feature parameters of every frame, the feature parameters being mel-frequency cepstral coefficients;
S1.2: training the dual-channel speech joint classification model with the air-conduction and non-air-conduction speech features obtained in step S1.1;
S1.3: classifying all air-conduction training speech frames and non-air-conduction speech frames with the trained dual-channel speech joint classification model, then computing, for the frames of each class, the air-conduction speech power-spectrum mean Φss(ω, l), the non-air-conduction speech power-spectrum mean Φbb(ω, l), and the cross-spectrum mean between air-conduction and non-air-conduction speech Φbs(ω, l).
3. The dual-sensor speech enhancement method according to claim 2, characterized in that in step S1.2 the dual-channel speech joint classification model is a multi-stream GMM, where GMM denotes a Gaussian mixture model, in which N(o, μ, σ) is a Gaussian function, o_x(k) and o_b(k) are the feature vectors extracted from the k-th frame of air-conduction and non-air-conduction speech, μ_x^l and μ_b^l are the means of the l-th Gaussian component of the air-conduction and non-air-conduction speech data streams, σ_x^l and σ_b^l are the variances of the l-th Gaussian component of the two streams, c_l is the weight of the l-th Gaussian component, w_x and w_b are the weights of the air-conduction and non-air-conduction speech data streams, and L is the number of Gaussian components.
4. The dual-sensor speech enhancement method according to claim 3, characterized in that in step S1.3 each Gaussian component of the dual-channel speech joint classification model represents one class; for every pair of synchronous air-conduction and non-air-conduction training speech frames, a score against each class is computed and the pair is assigned to the class with the highest score; after the class of every air-conduction and non-air-conduction training speech frame has been determined, the air-conduction speech power-spectrum mean Φss(ω, l), the non-air-conduction speech power-spectrum mean Φbb(ω, l), and the cross-spectrum mean Φbs(ω, l) between air-conduction and non-air-conduction speech are computed over the training frames belonging to the same class.
5. The dual-sensor speech enhancement method according to claim 1, characterized in that the statistical model of the air-conduction noise is its power-spectrum mean Φvv(ω), computed as follows:
S2.1: synchronously acquiring and framing the air-conduction test speech and non-air-conduction test speech;
S2.2: computing, from the short-time autocorrelation function R_b(m) and the short-time energy E_b of each non-air-conduction test speech frame, the frame's short-time average threshold-crossing rate C_b, where sgn[·] is the sign operation, a regulatory factor scales the threshold, T is the threshold initial value, and M is the frame length; when C_b exceeds a preset threshold the frame is judged to be a speech signal, otherwise noise, and the endpoint positions of the non-air-conduction test speech signal are obtained from the per-frame decisions;
S2.3: taking the time instants corresponding to the endpoints of the non-air-conduction test speech signal detected in step S2.2 as the endpoints of the air-conduction test speech, and extracting the noise-only segments of the air-conduction test speech;
S2.4: computing the power-spectrum mean Φvv(ω) of the noise-only segments of the air-conduction test speech.
6. The dual-sensor speech enhancement method according to claim 1, characterized in that in step S3 vector Taylor series model compensation is applied first: the parameters of the air-conduction speech data stream in the dual-channel speech joint classification model are corrected with the statistical model of the air-conduction noise, and the input air-conduction and non-air-conduction test speech frames are then classified, wherein the mean of each Gaussian component of the air-conduction speech data stream in the dual-channel speech joint classification model is corrected, μ̃_x^l and μ_v being the means obtained by passing the power spectra of the clean air-conduction training speech belonging to the l-th class and of the noise through the 24 mel filters and taking the logarithm, and C being the discrete cosine transform matrix; the other parameters of the dual-channel speech joint classification model remain unchanged; the corrected dual-channel speech joint classification model classifies the synchronously input air-conduction and non-air-conduction test speech frames, yielding the class score q(k, l) of the current air-conduction and non-air-conduction test speech frames for each class.
7. The dual-sensor speech enhancement method according to claim 2, characterized in that in step S4, for the air-conduction and non-air-conduction test speech of the k-th synchronously acquired frame, the enhanced air-conduction speech spectrum is computed with Y(ω, k), X(ω, k) and B(ω, k) being the spectra of the k-th frame of the enhanced air-conduction speech, the air-conduction test speech and the non-air-conduction test speech, respectively, and H_a(ω, k) and H_na(ω, k) being the frequency responses of the Wiener filters applied to the k-th frame of the air-conduction and non-air-conduction test speech; q(k, l) is the class score of the k-th frame of air-conduction and non-air-conduction test speech for class l of the dual-channel speech joint classification model, H_a(ω, k, l) is the Wiener filter frequency response of the k-th air-conduction test speech frame for class l, and H_na(ω, k, l) is the Wiener filter frequency response of the k-th non-air-conduction test speech frame for class l.
8. The dual-sensor speech enhancement method according to claim 7, characterized in that the above H_a(ω, k) and H_na(ω, k) are computed using only the class with the highest score, l = arg max_l q(k, l).
9. An implementation device for a dual-sensor speech enhancement method based on dual-channel Wiener filtering, characterized in that the implementation device comprises an air-conduction speech sensor, a non-air-conduction speech sensor, a noise model estimation module, a dual-channel speech joint classification model, a model compensation module, a frame classification module, a filter coefficient generation module and a dual-channel filter, wherein
the air-conduction speech sensor and the non-air-conduction speech sensor are each connected to the noise model estimation module, the frame classification module and the dual-channel filter; the dual-channel speech joint classification model, the model compensation module, the frame classification module, the filter coefficient generation module and the dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel speech joint classification model is connected to the filter coefficient generation module;
the air-conduction speech sensor and the non-air-conduction speech sensor respectively acquire the air-conduction and non-air-conduction speech signals; the noise model estimation module estimates the model and power spectrum of the current air-conduction noise; the dual-channel speech joint classification model is built over air-conduction and non-air-conduction speech frames from synchronously acquired clean air-conduction training speech and non-air-conduction training speech, each class of the dual-channel speech joint classification model having an air-conduction speech power-spectrum mean Φss(ω, l), a non-air-conduction speech power-spectrum mean Φbb(ω, l), and a cross-spectrum mean between air-conduction and non-air-conduction speech Φbs(ω, l); the model compensation module corrects the parameters of the dual-channel speech joint classification model with the statistical model of the air-conduction noise; the frame classification module classifies the currently synchronously input air-conduction and non-air-conduction test speech frames; the filter coefficient generation module constructs the dual-channel Wiener filter from the classification results and the power spectrum of the air-conduction noise; and the dual-channel filter filters the air-conduction and non-air-conduction test speech frames to obtain the enhanced air-conduction speech.
10. The implementation device of the dual-sensor speech enhancement method according to claim 9, characterized in that the air-conduction speech sensor is a microphone and the non-air-conduction speech sensor is a throat microphone.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910678398.7A CN110390945B (en) | 2019-07-25 | 2019-07-25 | Dual-sensor voice enhancement method and implementation device |
PCT/CN2019/110290 WO2021012403A1 (en) | 2019-07-25 | 2019-10-10 | Dual sensor speech enhancement method and implementation device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110390945A true CN110390945A (en) | 2019-10-29 |
CN110390945B CN110390945B (en) | 2021-09-21 |
Family
ID=68287587
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910678398.7A Active CN110390945B (en) | 2019-07-25 | 2019-07-25 | Dual-sensor voice enhancement method and implementation device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110390945B (en) |
WO (1) | WO2021012403A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111009253A (en) * | 2019-11-29 | 2020-04-14 | 联想(北京)有限公司 | Data processing method and device |
CN111524531A (en) * | 2020-04-23 | 2020-08-11 | 广州清音智能科技有限公司 | Method for real-time noise reduction of high-quality two-channel video voice |
WO2024012095A1 (en) * | 2022-07-12 | 2024-01-18 | 苏州旭创科技有限公司 | Filter implementation method and apparatus, noise suppression method and apparatus, and computer device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004279768A (en) * | 2003-03-17 | 2004-10-07 | Mitsubishi Heavy Ind Ltd | Device and method for estimating air-conducted sound |
CN203165457U (en) * | 2013-03-08 | 2013-08-28 | 华南理工大学 | Voice acquisition device used for noisy environment |
CN106328156A (en) * | 2016-08-22 | 2017-01-11 | 华南理工大学 | Microphone array voice reinforcing system and microphone array voice reinforcing method with combination of audio information and video information |
WO2018229503A1 (en) * | 2017-06-16 | 2018-12-20 | Cirrus Logic International Semiconductor Limited | Earbud speech estimation |
CN110010143A (en) * | 2019-04-19 | 2019-07-12 | 出门问问信息科技有限公司 | A kind of voice signals enhancement system, method and storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9711127B2 (en) * | 2011-09-19 | 2017-07-18 | Bitwave Pte Ltd. | Multi-sensor signal optimization for speech communication |
CN103208291A (en) * | 2013-03-08 | 2013-07-17 | 华南理工大学 | Speech enhancement method and device applicable to strong noise environments |
CN105513605B (en) * | 2015-12-01 | 2019-07-02 | 南京师范大学 | The speech-enhancement system and sound enhancement method of mobile microphone |
CN110010149B (en) * | 2016-01-14 | 2023-07-28 | 深圳市韶音科技有限公司 | Dual-sensor voice enhancement method based on statistical model |
JP2018063400A (en) * | 2016-10-14 | 2018-04-19 | 富士通株式会社 | Audio processing apparatus and audio processing program |
CN107886967B (en) * | 2017-11-18 | 2018-11-13 | 中国人民解放军陆军工程大学 | A kind of bone conduction sound enhancement method of depth bidirectional gate recurrent neural network |
CN108986834B (en) * | 2018-08-22 | 2023-04-07 | 中国人民解放军陆军工程大学 | Bone conduction voice blind enhancement method based on codec framework and recurrent neural network |
- 2019-07-25 CN CN201910678398.7A patent/CN110390945B/en active Active
- 2019-10-10 WO PCT/CN2019/110290 patent/WO2021012403A1/en active Application Filing
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105632512B (en) | A kind of dual sensor sound enhancement method and device based on statistical model | |
CN100573663C | Silence detection method based on speech feature discrimination | |
JP6954680B2 (en) | Speaker confirmation method and speaker confirmation device | |
CN105513605B (en) | The speech-enhancement system and sound enhancement method of mobile microphone | |
CN110390945A (en) | A kind of dual sensor sound enhancement method and realization device | |
CN107886967B (en) | A kind of bone conduction sound enhancement method of depth bidirectional gate recurrent neural network | |
CN110246510B (en) | End-to-end voice enhancement method based on RefineNet | |
CN110728989B | Binaural speech separation method based on a long short-term memory (LSTM) network | |
CN103325381B (en) | A kind of speech separating method based on fuzzy membership functions | |
CN110197665B (en) | Voice separation and tracking method for public security criminal investigation monitoring | |
KR102429152B1 (en) | Deep learning voice extraction and noise reduction method by fusion of bone vibration sensor and microphone signal | |
CN103208291A (en) | Speech enhancement method and device applicable to strong noise environments | |
CN103985381A (en) | Voice frequency indexing method based on parameter fusion optimized decision | |
CN110349588A (en) | A kind of LSTM network method for recognizing sound-groove of word-based insertion | |
KR20080064557A (en) | Apparatus and method for improving speech intelligibility | |
CN106328141A (en) | Ultrasonic lip reading recognition device and method for mobile terminal | |
CN103021405A (en) | Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter | |
Al-Kaltakchi et al. | Study of statistical robust closed set speaker identification with feature and score-based fusion | |
CN203165457U (en) | Voice acquisition device used for noisy environment | |
CN104240717B (en) | Voice enhancement method based on combination of sparse code and ideal binary system mask | |
Zheng et al. | Spectra restoration of bone-conducted speech via attention-based contextual information and spectro-temporal structure constraint | |
CN111341351A (en) | Voice activity detection method and device based on self-attention mechanism and storage medium | |
CN114495909B (en) | End-to-end bone-qi guiding voice joint recognition method | |
CN114613384B (en) | Deep learning-based multi-input voice signal beam forming information complementation method | |
CN113327589B (en) | Voice activity detection method based on attitude sensor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |