CN110390945A - Dual-sensor speech enhancement method and implementation device - Google Patents
- Publication number: CN110390945A
- Application number: CN201910678398.7A
- Authority
- CN
- China
- Prior art keywords
- conductance
- speech
- frame
- voice
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02165—Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/06—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Abstract
The invention discloses a dual-sensor speech enhancement method and an implementation device based on dual-channel Wiener filtering. Exploiting the complementarity between air-conducted and non-air-conducted speech, the method first builds a dual-channel joint speech classification model that classifies, frame by frame, the two-channel input from an air-conduction sensor and a non-air-conduction sensor. The model is then used to classify the speech frames captured by the two channels, and a dual-channel Wiener filter is finally constructed from the classification result to filter and enhance the captured signals. Compared with the prior art, the invention fuses the information carried by the air-conducted and non-air-conducted speech more fully and introduces prior knowledge of the speech signal through a statistical model, effectively improving the performance of a speech enhancement system in noisy environments. The invention is applicable to video calls, car phones, multimedia classrooms, military communications, and many other scenarios.
Description
Technical field
The present invention relates to the field of speech signal processing, and in particular to a dual-sensor speech enhancement method and implementation device based on dual-channel Wiener filtering.
Background technique
In real-world speech communication, the speech signal is often corrupted by environmental noise, which degrades the quality of the received speech. Speech enhancement is an important branch of speech signal processing whose goal is to extract the clean original speech from the noisy speech as faithfully as possible; it is widely used in voice communication, speech compression coding, and speech recognition in noisy environments.
Because the human ear perceives sound through vibrations of the air, most current speech enhancement algorithms target air-conducted speech, i.e., speech captured by an air-conduction sensor such as a microphone. Their performance is affected by the various acoustic noises in the environment and is usually poor under noisy conditions. To reduce the impact of ambient noise on speech quality, non-air-conduction sensors such as throat microphones and bone-conduction microphones are often used for speech acquisition in noisy environments. Unlike an air-conduction sensor, a non-air-conduction speech sensor exploits the vibrations of the speaker's vocal folds, jawbone, and similar body parts to deform a reed or carbon film inside the sensor, changing its resistance and hence the voltage across it, thereby converting the vibration signal into an electrical speech signal. Since sound waves propagating through the air cannot deform the reed or carbon film of a non-air-conduction sensor, such a sensor is unaffected by airborne sound and is therefore highly robust to acoustic noise. However, because the speech it captures propagates through the jawbone, muscle, skin, and other tissue, its high-frequency components are severely attenuated: the sound is muffled and indistinct, and its intelligibility is poor.
Since both air-conduction and non-air-conduction sensors have shortcomings when used alone, speech enhancement methods combining the advantages of the two have appeared in recent years. These methods exploit the complementarity of air-conducted and non-air-conducted speech and use multi-sensor fusion to achieve enhancement, usually with better results than single-sensor speech enhancement systems. Existing dual-sensor speech enhancement follows two main approaches: one first reconstructs air-conducted speech from the non-air-conducted speech and then fuses it with the noisy air-conducted speech; the other reconstructs air-conducted speech from the non-air-conducted speech, enhances the noisy air-conducted speech using both sensor signals, and then fuses the two. These techniques have the following shortcomings: (1) when air-conducted speech is reconstructed from non-air-conducted speech, extra noise is introduced at high frequencies or during silence, degrading the enhancement; (2) the reconstruction fails to exploit the information in the current air-conducted speech; (3) when the speech reconstructed from the non-air-conducted signal is fused with the air-conducted speech, the correlation between the two and the available prior knowledge are not fully exploited; (4) the fusion usually assumes that the non-air-conducted and air-conducted speech are mutually independent, an assumption that does not hold in practice.
Chinese invention patent 201610025390.7 discloses a dual-sensor speech enhancement method and device based on statistical models. That invention first combines the non-air-conducted and air-conducted speech to build a joint statistical model for classification and performs endpoint detection; the joint statistical model is used to compute the current optimal air-conducted speech filter, which filters and enhances the air-conducted speech. The non-air-conducted speech is then converted into air-conducted speech through a mapping model from non-air-conducted to air-conducted speech and combined with the filtered, enhanced speech by weighted fusion. This partially remedies the failure to exploit the correlation and prior knowledge of the two signals when fusing the reconstructed and the air-conducted speech. However, the second fusion stage still uses speech reconstructed from the non-air-conducted signal, so the high-frequency and silence noise remains, and the reconstruction still fails to use the information in the air-conducted speech.
Summary of the invention
The purpose of the present invention is to overcome the above drawbacks of the prior art by providing a dual-sensor speech enhancement method and implementation device based on dual-channel Wiener filtering. Exploiting the complementarity between air-conducted and non-air-conducted speech, the method first builds a dual-channel joint speech classification model that classifies, frame by frame, the two-channel input from the air-conduction and non-air-conduction sensors; it then uses the model to classify the speech frames captured by the two channels, and finally constructs a dual-channel Wiener filter from the classification result to filter and enhance the captured signals. Compared with the prior art, the invention fuses the information carried by the air-conducted and non-air-conducted speech more fully and introduces prior knowledge of the speech signal through a statistical model, effectively improving the performance of a speech enhancement system in noisy environments. The invention is applicable to video calls, car phones, multimedia classrooms, military communications, and many other scenarios.
The first object of the invention is achieved through the following technical solution:
A dual-sensor speech enhancement method based on dual-channel Wiener filtering, comprising the following steps:
S1. Synchronously record clean air-conducted and non-air-conducted training speech; build a dual-channel joint classification model of air-conducted and non-air-conducted speech frames; and, for each class of the model, compute the air-conducted speech power spectrum mean Φss(ω,l), the non-air-conducted speech power spectrum mean Φbb(ω,l), and the cross-spectrum mean Φbs(ω,l) between the air-conducted and non-air-conducted speech, where ω is frequency and l is the class index.
S2. Synchronously record air-conducted and non-air-conducted test speech; build a statistical model of the air-conduction noise from the noise-only segments of the air-conducted test speech and compute the noise power spectrum mean Φvv(ω).
S3. Using the statistical model of the air-conduction noise together with the dual-channel joint classification model of step S1, classify the synchronously input air-conducted and non-air-conducted test speech frames.
S4. From the classification result of step S3 and the power spectrum mean Φvv(ω), construct a dual-channel Wiener filter and apply it to the air-conducted and non-air-conducted test speech frames to obtain the enhanced air-conducted speech.
Further, step S1 proceeds as follows:
S1.1. Divide the synchronously recorded clean air-conducted and non-air-conducted training speech into frames, pre-process them, and extract the feature parameters of each frame, namely the mel-frequency cepstral coefficients.
S1.2. Train the dual-channel joint speech classification model on the air-conducted and non-air-conducted speech features obtained in step S1.1.
S1.3. Classify all air-conducted and non-air-conducted training speech frames with the trained dual-channel joint classification model, then compute, over the air-conducted and non-air-conducted training frames assigned to each class, the air-conducted speech power spectrum mean Φss(ω,l), the non-air-conducted speech power spectrum mean Φbb(ω,l), and the cross-spectrum mean Φbs(ω,l) between the two.
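The class-conditional statistics of step S1.3 can be sketched as follows. This is a minimal illustration assuming the per-frame spectra are stored as frames-by-bins complex arrays and that frame labels are already available; the function and variable names are illustrative, not from the patent:

```python
import numpy as np

def class_conditional_spectra(labels, S_ac, S_nac, n_classes):
    """Per-class power-spectrum means Phi_ss, Phi_bb and cross-spectrum
    mean Phi_bs (step S1.3). S_ac / S_nac: complex spectra of the
    air-conducted / non-air-conducted frames, shape (frames, bins)."""
    n_bins = S_ac.shape[1]
    Phi_ss = np.zeros((n_classes, n_bins))
    Phi_bb = np.zeros((n_classes, n_bins))
    Phi_bs = np.zeros((n_classes, n_bins), dtype=complex)
    for l in range(n_classes):
        idx = labels == l                     # frames assigned to class l
        Phi_ss[l] = np.mean(np.abs(S_ac[idx]) ** 2, axis=0)
        Phi_bb[l] = np.mean(np.abs(S_nac[idx]) ** 2, axis=0)
        Phi_bs[l] = np.mean(S_nac[idx] * np.conj(S_ac[idx]), axis=0)
    return Phi_ss, Phi_bb, Phi_bs
```

In a full system the labels would come from the trained joint classification model of step S1.2.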
Further, in step S1.2 the dual-channel joint speech classification model is a multiple-data-stream Gaussian mixture model (Gaussian Mixture Model, GMM), whose parameters are as follows: N(o, μ, σ) denotes a Gaussian density; ox(k) and ob(k) are the feature vectors extracted from the k-th air-conducted and non-air-conducted test speech frames; the l-th Gaussian component of the air-conducted and of the non-air-conducted speech data stream has its own mean and variance; cl is the weight of the l-th Gaussian component; wx and wb are the stream weights of the air-conducted and non-air-conducted speech data streams; and L is the number of Gaussian components.
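The patent's defining formula for the multi-stream GMM is rendered as an image and is not recoverable here; one common parameterization weights each stream's Gaussian log-density by its stream weight. The sketch below scores a frame pair against every component under that assumption, with diagonal covariances (all names are illustrative):

```python
import numpy as np

def log_gauss_diag(o, mu, var):
    """Log-density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (o - mu) ** 2 / var)

def stream_gmm_scores(o_x, o_b, c, mu_x, var_x, mu_b, var_b, w_x=0.5, w_b=0.5):
    """Per-component scores q(k, l) of a two-stream GMM: each component l
    is scored as c_l * N(o_x)^{w_x} * N(o_b)^{w_b}, evaluated in the log
    domain (one plausible multi-stream form, not the patent's verbatim one)."""
    L = len(c)
    scores = np.empty(L)
    for l in range(L):
        scores[l] = (np.log(c[l])
                     + w_x * log_gauss_diag(o_x, mu_x[l], var_x[l])
                     + w_b * log_gauss_diag(o_b, mu_b[l], var_b[l]))
    return scores
```

The highest-scoring component gives the frame pair's class, as used in step S1.3.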
Further, in step S1.3 each Gaussian component of the dual-channel joint classification model represents one class. For every pair of synchronous air-conducted and non-air-conducted training speech frames, a score with respect to each class is computed from the model, and the current frame pair is assigned to the class with the highest score. Once the class of every air-conducted and non-air-conducted training frame has been determined, the air-conducted speech power spectrum mean Φss(ω,l), the non-air-conducted speech power spectrum mean Φbb(ω,l), and the cross-spectrum mean Φbs(ω,l) between the two are computed over the training frames belonging to each class.
Further, the statistical model of the air-conduction noise, namely the noise power spectrum mean Φvv(ω), is computed as follows:
S2.1. Synchronously record air-conducted and non-air-conducted test speech and divide it into frames.
S2.2. From the short-time autocorrelation function Rb(m) and the short-time energy Eb of each non-air-conducted test speech frame, compute its short-time average threshold-crossing rate Cb, where sgn[·] denotes the sign operation, T is the initial threshold, M is the frame length, and a regulatory factor adjusts the threshold. When Cb exceeds a preset threshold the frame is judged to be speech, otherwise noise; the endpoints of the non-air-conducted test speech signal are obtained from the per-frame decisions.
S2.3. Take the instants corresponding to the detected endpoints of the non-air-conducted test speech as the endpoints of the air-conducted test speech, and extract the noise-only segments of the air-conducted test speech.
S2.4. Compute the power spectrum mean Φvv(ω) of the noise-only segments of the air-conducted test speech.
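Steps S2.1-S2.4 can be sketched as below. The patent's exact Cb statistic (with sgn, the regulatory factor, and the autocorrelation term) is given only as an image, so this sketch substitutes a simplified clipped threshold-crossing rate; frame length, FFT size, and thresholds are illustrative defaults:

```python
import numpy as np

def modified_crossing_rate(frame, T):
    """Fraction of sample pairs whose clipped signs differ: values in
    [-T, T] are clipped to zero, so low-level sensor noise does not
    register as crossings (a common VAD statistic)."""
    clipped = np.where(frame > T, 1, np.where(frame < -T, -1, 0))
    return np.mean(np.abs(np.diff(clipped)) > 0)

def noise_psd_from_nac_vad(ac, nac, frame_len=240, n_fft=256, T=0.01,
                           rate_thresh=0.1):
    """Estimate the air-conduction noise power spectrum Phi_vv(w) by
    averaging periodograms of the AC frames whose synchronous NAC frames
    show no activity (steps S2.2-S2.4)."""
    n_frames = len(nac) // frame_len
    psds = []
    for k in range(n_frames):
        seg_b = nac[k * frame_len:(k + 1) * frame_len]
        if modified_crossing_rate(seg_b, T) <= rate_thresh:  # NAC: no speech
            seg_a = ac[k * frame_len:(k + 1) * frame_len]
            spec = np.fft.rfft(seg_a, n_fft)
            psds.append(np.abs(spec) ** 2 / frame_len)
    return np.mean(psds, axis=0) if psds else None
```

Because the non-air-conduction sensor barely picks up airborne noise, any activity on that channel is a reliable speech indicator, which is why its endpoints can be transferred to the air-conducted channel.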
Further, in step S3 a vector Taylor series (Vector Taylor Series, VTS) model compensation technique is first applied: the statistical model of the air-conduction noise is used to correct the parameters of the air-conducted speech data stream in the dual-channel joint classification model, after which the input air-conducted and non-air-conducted test speech frames are classified again. The mean of each Gaussian component of the air-conducted speech data stream is corrected by a formula in which the two mean terms are, respectively, the power spectra of the clean air-conducted training speech of the l-th class and of the noise after passing through a 24-channel mel filterbank and taking the logarithm, and C is the DCT matrix. All other parameters of the dual-channel joint classification model remain unchanged. The corrected model is then used to classify the synchronously input air-conducted and non-air-conducted test speech frames, yielding for the current frame pair a classification score q(k,l) for each class.
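The mean-correction formula itself appears only as an image in the original patent. The standard zeroth-order VTS compensation in the mel-cepstral domain, consistent with the quantities named above, has the form

\[
\hat{\mu}_x^{\,l} \;=\; C \,\log\!\Big( \exp\big(C^{-1}\mu_s^{\,l}\big) \;+\; \exp\big(C^{-1}\mu_v\big) \Big),
\]

where \(C\) is the DCT matrix, \(C^{-1}\) its (pseudo-)inverse, \(\mu_s^{\,l}\) is the log-mel mean of the clean air-conducted training speech of class \(l\), \(\mu_v\) is the log-mel noise mean, and \(\exp\) and \(\log\) act elementwise. This is a reconstruction of the standard VTS relation, not the patent's verbatim formula.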
Further, in step S4, for the k-th pair of synchronously recorded air-conducted and non-air-conducted test speech frames, the enhanced air-conducted speech spectrum is computed from Y(ω,k), X(ω,k), and B(ω,k), which are the spectra of the k-th frame of the enhanced speech, the air-conducted test speech, and the non-air-conducted test speech, respectively, together with the frequency responses of the Wiener filters applied to the two test signals. Here q(k,l) is the classification score of the k-th frame pair for class l of the dual-channel joint classification model, Ha(ω,k,l) is the Wiener filter frequency response of the k-th air-conducted test speech frame for class l, and Hna(ω,k,l) is the Wiener filter frequency response of the k-th non-air-conducted test speech frame for class l.
Further, the class-dependent quantities in the above responses are taken at the best-scoring class L = arg max q(k,l).
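The patent's formulas for Y, Ha, and Hna are images and cannot be reproduced exactly; the sketch below shows one plausible realization of the dual-channel Wiener step consistent with the quantities defined above (hard class decision via arg max, a classical Wiener gain on the AC branch, a cross-spectral mapping on the NAC branch, and an equal-weight combination — the combination weights are an assumption):

```python
import numpy as np

def enhance_frame(X, B, q, Phi_ss, Phi_bb, Phi_bs, Phi_vv):
    """One dual-channel Wiener combination for a single frame.
    X, B: AC / NAC spectra; q: per-class scores; Phi_*: class-conditional
    spectra (rows indexed by class); Phi_vv: AC noise power spectrum."""
    L = int(np.argmax(q))                   # hard class decision
    H_a = Phi_ss[L] / (Phi_ss[L] + Phi_vv)  # AC-branch Wiener gain
    H_na = Phi_bs[L] / Phi_bb[L]            # NAC -> AC cross-spectral filter
    return 0.5 * (H_a * X + H_na * B)       # assumed equal-weight fusion
```

With zero noise and matched cross-spectra the filter reduces to an average of the two channel estimates, which is a useful sanity check on the structure.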
Another object of the present invention is achieved through the following technical solution:
An implementation device for the dual-sensor speech enhancement method based on dual-channel Wiener filtering, comprising an air-conduction speech sensor, a non-air-conduction speech sensor, a noise model estimation module, a dual-channel joint speech classification model, a model compensation module, a frame classification module, a filter coefficient generation module, and a dual-channel filter, wherein:
the air-conduction and non-air-conduction speech sensors are each connected to the noise model estimation module, the frame classification module, and the dual-channel filter; the dual-channel joint speech classification model, the model compensation module, the frame classification module, the filter coefficient generation module, and the dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel joint speech classification model is connected to the filter coefficient generation module.
The air-conduction and non-air-conduction speech sensors capture the air-conducted and non-air-conducted speech signals, respectively. The noise model estimation module estimates the model and power spectrum of the current air-conduction noise. The dual-channel joint speech classification model is built from synchronously recorded clean air-conducted and non-air-conducted training speech frames; for each class of the model, the air-conducted speech power spectrum mean is Φss(ω,l), the non-air-conducted speech power spectrum mean is Φbb(ω,l), and the cross-spectrum mean between the two is Φbs(ω,l). The model compensation module corrects the parameters of the dual-channel joint classification model using the statistical model of the air-conduction noise. The frame classification module classifies the synchronously input air-conducted and non-air-conducted test speech frames. The filter coefficient generation module constructs the dual-channel Wiener filter from the classification result and the power spectrum of the air-conduction noise. The dual-channel filter filters the air-conducted and non-air-conducted test speech frames to obtain the enhanced air-conducted speech.
Further, the air-conduction speech sensor is a microphone, and the non-air-conduction speech sensor is a throat microphone.
Compared with the prior art, the present invention has the following advantages and effects:
(1) Compared with enhancement techniques based only on the air-conducted or only on the non-air-conducted test speech, the invention exploits the information of both signals simultaneously during enhancement and can therefore achieve better results.
(2) The invention fuses the information of the air-conducted and non-air-conducted test speech through the dual-channel joint classification model, which makes the frame classification more accurate and makes full use of the correlation and prior knowledge of the two signals.
(3) The invention recovers the air-conducted speech with a dual-channel Wiener filter, which is computationally simpler than Chinese invention patent 201610025390.7 while avoiding both the high-frequency and silence noise introduced when air-conducted speech is reconstructed from non-air-conducted speech and the failure to use the air-conducted speech information during that reconstruction; it therefore performs better.
(4) By using a dual-channel Wiener filter to recover the air-conducted speech, the invention avoids the assumption that the non-air-conducted and air-conducted speech are mutually independent.
Brief description of the drawings
Fig. 1 is a structural block diagram of the implementation device of the dual-sensor speech enhancement method based on dual-channel Wiener filtering disclosed in an embodiment of the present invention;
Fig. 2 is a flowchart of the dual-sensor speech enhancement method based on dual-channel Wiener filtering disclosed in an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only a part, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art without creative work, based on the embodiments of the present invention, fall within the protection scope of the invention.
Embodiment One
This embodiment discloses the structure of an implementation device of the dual-sensor speech enhancement method based on dual-channel Wiener filtering. As shown in Fig. 1, the device consists of an air-conduction speech sensor, a non-air-conduction speech sensor, a noise model estimation module, a dual-channel joint speech classification model, a model compensation module, a frame classification module, a filter coefficient generation module, and a dual-channel filter, where the two sensors are each connected to the noise model estimation module, the frame classification module, and the dual-channel filter; the dual-channel joint speech classification model, the model compensation module, the frame classification module, the filter coefficient generation module, and the dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel joint speech classification model is connected to the filter coefficient generation module.
In this embodiment the air-conduction speech sensor is a microphone and the non-air-conduction speech sensor is a throat microphone; the two capture the air-conducted and non-air-conducted speech signals. The noise model estimation module estimates the model and power spectrum of the current air-conduction noise. The dual-channel joint speech classification model is built from synchronously recorded clean air-conducted and non-air-conducted training speech frames; for each class of the model, the air-conducted speech power spectrum mean is Φss(ω,l), the non-air-conducted speech power spectrum mean is Φbb(ω,l), and the cross-spectrum mean between the two is Φbs(ω,l). The model compensation module corrects the parameters of the dual-channel joint classification model using the statistical model of the air-conduction noise. The frame classification module classifies the synchronously input air-conducted and non-air-conducted test speech frames. The filter coefficient generation module constructs the dual-channel Wiener filter from the classification result and the power spectrum of the air-conduction noise. The dual-channel filter filters the air-conducted and non-air-conducted test speech frames to obtain the enhanced air-conducted speech.
Embodiment Two
This embodiment discloses a dual-sensor speech enhancement method based on dual-channel Wiener filtering. Using the implementation device disclosed in the above embodiment, the enhanced air-conducted speech is computed from the input air-conducted and non-air-conducted test speech through the following steps, whose flow is shown in Fig. 2:
Step S1. Synchronously record clean air-conducted and non-air-conducted training speech; build the dual-channel joint classification model of air-conducted and non-air-conducted speech frames; and, for each class of the model, compute the air-conducted speech power spectrum mean Φss(ω,l), the non-air-conducted speech power spectrum mean Φbb(ω,l), and the cross-spectrum mean Φbs(ω,l) between the two, where ω is frequency and l is the class index. In this embodiment this is done through the following steps:
S1.1. Divide the synchronously recorded clean air-conducted and non-air-conducted training speech into frames and pre-process them, extracting the feature parameters of each frame.
In this embodiment the clean air-conducted and non-air-conducted training speech is divided into frames of 30 ms with a 10 ms shift; each frame of both channels is windowed with a Hamming window and pre-emphasized, and its power spectrum is computed. The power spectra of the air-conducted and non-air-conducted training speech are each passed through a 24-channel mel filterbank, the logarithm of the filterbank output is taken, and a DCT is applied, yielding two groups of 12 mel-frequency cepstral coefficients that serve as the training features of the dual-channel joint classification model.
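The feature extraction just described can be sketched with NumPy as follows. The 30 ms frame, Hamming window, pre-emphasis, 24 mel filters, and 12 cepstral coefficients follow the text; the 8 kHz sampling rate and 512-point FFT are assumptions, since the patent does not state them:

```python
import numpy as np

def mel_filterbank(n_filt=24, n_fft=512, sr=8000):
    """Triangular mel filterbank: n_filt filters over n_fft//2 + 1 bins."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_filt + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(1, n_filt + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for j in range(l, c):
            fb[i - 1, j] = (j - l) / max(c - l, 1)   # rising slope
        for j in range(c, r):
            fb[i - 1, j] = (r - j) / max(r - c, 1)   # falling slope
    return fb

def mfcc_frame(frame, n_fft=512, sr=8000, n_filt=24, n_ceps=12, pre=0.97):
    """12 MFCCs of one frame: pre-emphasis, Hamming window, power
    spectrum, 24-channel mel filterbank, log, DCT-II (coeffs 1..12)."""
    x = np.append(frame[0], frame[1:] - pre * frame[:-1])  # pre-emphasis
    x = x * np.hamming(len(x))
    p = np.abs(np.fft.rfft(x, n_fft)) ** 2
    logmel = np.log(mel_filterbank(n_filt, n_fft, sr) @ p + 1e-10)
    n = np.arange(n_filt)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), n + 0.5) / n_filt)
    return dct @ logmel
```

At 8 kHz a 30 ms frame is 240 samples and a 10 ms shift is 80 samples; the same routine is applied to both the air-conducted and the non-air-conducted channel.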
S1.2: Train the dual-channel speech joint classification model with the air-conduction and non-air-conduction speech features obtained in step S1.1. In the present embodiment, the dual-channel speech joint classification model is a multi-stream GMM, in which N(o, μ, σ) is a Gaussian function, o_x(k) and o_b(k) are the feature vectors extracted from the k-th frame of air-conduction and non-air-conduction speech, μ_x^l and μ_b^l are the means of the l-th Gaussian component of the air-conduction and non-air-conduction speech data streams, σ_x^l and σ_b^l are the variances of the l-th Gaussian component of the two streams, c_l is the weight of the l-th Gaussian component, w_x and w_b are the weights of the air-conduction and non-air-conduction speech data streams, and L is the number of Gaussian components.
The parameters c_l, w_x, w_b, μ_x^l, μ_b^l, σ_x^l and σ_b^l of the dual-channel speech joint classification model are estimated with the Expectation-Maximization (EM) algorithm.
S1.3: Classify all air-conduction training speech frames and non-air-conduction speech frames with the trained dual-channel speech joint classification model, then compute, for the frames assigned to each class, the air-conduction speech power-spectrum mean Φss(ω, l), the non-air-conduction speech power-spectrum mean Φbb(ω, l), and the cross-spectrum mean between air-conduction and non-air-conduction speech Φbs(ω, l).
In the present embodiment, each Gaussian component of the dual-channel speech joint classification model represents one class. For every pair of synchronous air-conduction and non-air-conduction training speech frames, a score against each class is computed, and the pair is assigned to the class with the highest score. After the class of every air-conduction and non-air-conduction training speech frame has been determined, Φss(ω, l), Φbb(ω, l) and Φbs(ω, l) are computed over the air-conduction and non-air-conduction training frames belonging to the same class.
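A compact sketch of the per-class averaging in S1.3, assuming hard (highest-score) class assignments are already available; the conjugation convention for the cross-spectrum is an assumption, since the patent's formula image is not reproduced here:

```python
import numpy as np

def per_class_spectra(labels, X_spec, B_spec, n_classes):
    """Average power spectra and cross-spectrum over the frames of each class.

    labels         : (K,) hard class index per synchronized frame pair
    X_spec, B_spec : (K, F) complex spectra of air- / non-air-conducted frames
    Returns Phi_ss, Phi_bb (real, n_classes x F) and Phi_bs (complex).
    """
    F = X_spec.shape[1]
    Phi_ss = np.zeros((n_classes, F))
    Phi_bb = np.zeros((n_classes, F))
    Phi_bs = np.zeros((n_classes, F), dtype=complex)
    for l in range(n_classes):
        idx = labels == l
        if not idx.any():
            continue                                  # class never observed
        Phi_ss[l] = np.mean(np.abs(X_spec[idx]) ** 2, axis=0)
        Phi_bb[l] = np.mean(np.abs(B_spec[idx]) ** 2, axis=0)
        # Cross-spectrum E[B X*]; whether the patent conjugates X or B
        # is an assumption.
        Phi_bs[l] = np.mean(B_spec[idx] * np.conj(X_spec[idx]), axis=0)
    return Phi_ss, Phi_bb, Phi_bs
```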
Step S2: synchronously acquire air-conduction test speech and non-air-conduction test speech, build a statistical model of the air-conduction noise from the noise-only segments of the air-conduction test speech, and compute the power-spectrum mean Φvv(ω) of the air-conduction noise.
In the present embodiment, the statistical model of the air-conduction noise is its power-spectrum mean Φvv(ω), computed as follows:
S2.1: Synchronously acquire and frame the air-conduction test speech and the non-air-conduction test speech.
S2.2: From the short-time autocorrelation function R_b(m) and the short-time energy E_b of each non-air-conduction test speech frame, compute the frame's short-time average threshold-crossing rate C_b, where sgn[·] is the sign operation, a regulatory factor scales the threshold, T is the threshold initial value, and M is the frame length. When C_b exceeds a preset threshold, the frame is judged to be a speech signal; otherwise it is noise. The endpoint positions of the non-air-conduction test speech signal are obtained from the per-frame decisions.
S2.3: Take the time instants corresponding to the endpoints of the non-air-conduction test speech signal detected in step S2.2 as the endpoints of the air-conduction test speech, and extract the noise-only segments of the air-conduction test speech.
S2.4: Compute the power-spectrum mean Φvv(ω) of the noise-only segments of the air-conduction test speech.
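A minimal sketch of S2.1–S2.4, using the short-time energy of the non-air channel as a stand-in for the patent's autocorrelation threshold-crossing detector (a simplification; the `thresh` constant and the fallback rule are assumptions):

```python
import numpy as np

def noise_psd_from_body_vad(x_air, x_body, sr=16000, frame_ms=30, shift_ms=10,
                            n_fft=512, thresh=0.1):
    """Estimate the air-channel noise power spectrum Phi_vv(omega).

    Frames whose non-air (body-conducted) channel shows low normalized
    short-time energy are taken as noise-only -- the body-conducted sensor
    barely picks up ambient noise -- and the air-channel power spectra of
    those frames are averaged.
    """
    flen, fshift = sr * frame_ms // 1000, sr * shift_ms // 1000
    n_frames = 1 + max(0, (min(len(x_air), len(x_body)) - flen) // fshift)
    win = np.hamming(flen)
    energies = np.empty(n_frames)
    specs = np.empty((n_frames, n_fft // 2 + 1))
    for t in range(n_frames):
        s = t * fshift
        energies[t] = np.mean(x_body[s:s + flen] ** 2)   # short-time energy E_b
        specs[t] = np.abs(np.fft.rfft(x_air[s:s + flen] * win, n_fft)) ** 2
    noise_frames = energies < thresh * (energies.max() + 1e-12)
    if not noise_frames.any():                           # fall back: quietest frame
        noise_frames = energies == energies.min()
    return specs[noise_frames].mean(axis=0)              # Phi_vv(omega)
```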
Alternatively, the statistical model of the air-conduction noise may be a Gaussian function, a GMM, or an HMM.
Step S3: classify the synchronously input air-conduction test speech frames and non-air-conduction test speech frames using the statistical model of the air-conduction noise together with the dual-channel speech joint classification model of step S1.
In the present embodiment, VTS model compensation is applied first: the parameters of the air-conduction speech data stream in the dual-channel speech joint classification model are corrected with the statistical model of the air-conduction noise, and the input air-conduction and non-air-conduction test speech frames are then classified. Specifically, the mean of each Gaussian component of the air-conduction speech data stream in the dual-channel speech joint classification model is corrected, where μ̃_x^l and μ_v are the means obtained by passing the power spectra of the clean air-conduction training speech belonging to the l-th class and of the noise through the 24 mel filters and taking the logarithm, and C is the discrete cosine transform (DCT) matrix. The other parameters of the dual-channel speech joint classification model remain unchanged. The corrected dual-channel speech joint classification model then classifies the synchronously input air-conduction and non-air-conduction test speech frames, yielding the class score q(k, l) of the current air-conduction and non-air-conduction test speech frames for each class.
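The mean-correction formula omitted above plausibly takes the standard zeroth-order VTS form in the cepstral domain (a reconstruction under the stated definitions, with the exponential and logarithm acting element-wise on the 24 log-mel dimensions):

```latex
\hat{\mu}_x^{\,l} \;=\; C\,\log\!\big(e^{\tilde{\mu}_x^{\,l}} + e^{\mu_v}\big)
\;=\; \mu_x^{\,l} + C\,\log\!\big(1 + e^{\,\mu_v - \tilde{\mu}_x^{\,l}}\big),
\qquad \mu_x^{\,l} = C\,\tilde{\mu}_x^{\,l}
```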
Step S4: construct a dual-channel Wiener filter from the classification results of step S3 and Φvv(ω), filter the air-conduction test speech frames and non-air-conduction test speech frames, and obtain the enhanced air-conduction speech.
In the present embodiment, for the air-conduction and non-air-conduction test speech of the k-th synchronously acquired frame, the enhanced air-conduction speech spectrum is computed as follows, where Y(ω, k), X(ω, k) and B(ω, k) are the spectra of the k-th frame of the enhanced air-conduction speech, the air-conduction test speech and the non-air-conduction test speech, respectively, and H_a(ω, k) and H_na(ω, k) are the frequency responses of the Wiener filters applied to the k-th frame of the air-conduction and non-air-conduction test speech. Here q(k, l) is the class score of the k-th frame of air-conduction and non-air-conduction test speech for class l of the dual-channel speech joint classification model, H_a(ω, k, l) is the Wiener filter frequency response of the k-th air-conduction test speech frame for class l, and H_na(ω, k, l) is the Wiener filter frequency response of the k-th non-air-conduction test speech frame for class l.
In another embodiment, the above H_a(ω, k) and H_na(ω, k) are computed using only the class with the highest score, l = arg max_l q(k, l).
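The filter-combination formulas omitted above can be sketched as follows. The specific gain expressions H_a = Φss/(Φss + Φvv) and H_na = Φbs/Φbb, the equal averaging of the two channel estimates, the score normalization, and the cross-spectrum conjugation convention are all plausible reconstructions rather than the patent's exact formulas:

```python
import numpy as np

def dual_channel_wiener(X, B, q, Phi_ss, Phi_bb, Phi_bs, Phi_vv, hard=False):
    """Combine per-class Wiener estimates of the clean air-conducted spectrum.

    X, B : (F,) spectra of the current air / non-air test frame
    q    : (L,) class scores for this frame
    Phi_ss, Phi_bb, Phi_bs : (L, F) per-class spectra; Phi_vv : (F,) noise PSD
    """
    H_a = Phi_ss / (Phi_ss + Phi_vv + 1e-12)     # (L, F) air-channel Wiener gain
    H_na = Phi_bs / (Phi_bb + 1e-12)             # (L, F) cross-channel mapping
    per_class = 0.5 * (H_a * X + H_na * B)       # (L, F) per-class estimates
    if hard:                                     # alternative embodiment:
        return per_class[np.argmax(q)]           # use only the best class
    w = q / (q.sum() + 1e-12)
    return w @ per_class                         # score-weighted combination
```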
The above embodiment is a preferred embodiment of the present invention, but embodiments of the present invention are not limited to it. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.
Claims (10)
1. A dual-sensor speech enhancement method based on dual-channel Wiener filtering, characterized in that the dual-sensor speech enhancement method comprises the following steps:
S1: synchronously acquiring clean air-conduction training speech and non-air-conduction training speech, building a dual-channel speech joint classification model of air-conduction and non-air-conduction speech frames, and computing, for each class of the dual-channel speech joint classification model, the air-conduction speech power-spectrum mean Φss(ω, l), the non-air-conduction speech power-spectrum mean Φbb(ω, l), and the cross-spectrum mean between air-conduction and non-air-conduction speech Φbs(ω, l), where ω is frequency and l is the class index;
S2: synchronously acquiring air-conduction test speech and non-air-conduction test speech, building a statistical model of the air-conduction noise from the noise-only segments of the air-conduction test speech, and computing the power-spectrum mean Φvv(ω) of the air-conduction noise;
S3: classifying the synchronously input air-conduction and non-air-conduction test speech frames with the statistical model of the air-conduction noise and the dual-channel speech joint classification model of step S1;
S4: constructing a dual-channel Wiener filter from the classification results of step S3 and the power-spectrum mean Φvv(ω), filtering the air-conduction and non-air-conduction test speech frames, and obtaining the enhanced air-conduction speech.
2. The dual-sensor speech enhancement method according to claim 1, characterized in that step S1 proceeds as follows:
S1.1: framing and pre-processing the synchronously acquired clean air-conduction training speech and non-air-conduction training speech, and extracting the feature parameters of every frame, the feature parameters being mel-frequency cepstral coefficients;
S1.2: training the dual-channel speech joint classification model with the air-conduction and non-air-conduction speech features obtained in step S1.1;
S1.3: classifying all air-conduction training speech frames and non-air-conduction speech frames with the trained dual-channel speech joint classification model, then computing, for the frames of each class, the air-conduction speech power-spectrum mean Φss(ω, l), the non-air-conduction speech power-spectrum mean Φbb(ω, l), and the cross-spectrum mean between air-conduction and non-air-conduction speech Φbs(ω, l).
3. The dual-sensor speech enhancement method according to claim 2, characterized in that in step S1.2 the dual-channel speech joint classification model is a multi-stream GMM, where GMM denotes a Gaussian mixture model, in which N(o, μ, σ) is a Gaussian function, o_x(k) and o_b(k) are the feature vectors extracted from the k-th frame of air-conduction and non-air-conduction speech, μ_x^l and μ_b^l are the means of the l-th Gaussian component of the air-conduction and non-air-conduction speech data streams, σ_x^l and σ_b^l are the variances of the l-th Gaussian component of the two streams, c_l is the weight of the l-th Gaussian component, w_x and w_b are the weights of the air-conduction and non-air-conduction speech data streams, and L is the number of Gaussian components.
4. The dual-sensor speech enhancement method according to claim 3, characterized in that in step S1.3 each Gaussian component of the dual-channel speech joint classification model represents one class; for every pair of synchronous air-conduction and non-air-conduction training speech frames, a score against each class is computed and the pair is assigned to the class with the highest score; after the class of every air-conduction and non-air-conduction training speech frame has been determined, the air-conduction speech power-spectrum mean Φss(ω, l), the non-air-conduction speech power-spectrum mean Φbb(ω, l), and the cross-spectrum mean Φbs(ω, l) between air-conduction and non-air-conduction speech are computed over the training frames belonging to the same class.
5. The dual-sensor speech enhancement method according to claim 1, characterized in that the statistical model of the air-conduction noise is its power-spectrum mean Φvv(ω), computed as follows:
S2.1: synchronously acquiring and framing the air-conduction test speech and non-air-conduction test speech;
S2.2: computing, from the short-time autocorrelation function R_b(m) and the short-time energy E_b of each non-air-conduction test speech frame, the frame's short-time average threshold-crossing rate C_b, where sgn[·] is the sign operation, a regulatory factor scales the threshold, T is the threshold initial value, and M is the frame length; when C_b exceeds a preset threshold the frame is judged to be a speech signal, otherwise noise, and the endpoint positions of the non-air-conduction test speech signal are obtained from the per-frame decisions;
S2.3: taking the time instants corresponding to the endpoints of the non-air-conduction test speech signal detected in step S2.2 as the endpoints of the air-conduction test speech, and extracting the noise-only segments of the air-conduction test speech;
S2.4: computing the power-spectrum mean Φvv(ω) of the noise-only segments of the air-conduction test speech.
6. The dual-sensor speech enhancement method according to claim 1, characterized in that in step S3 vector Taylor series model compensation is applied first: the parameters of the air-conduction speech data stream in the dual-channel speech joint classification model are corrected with the statistical model of the air-conduction noise, and the input air-conduction and non-air-conduction test speech frames are then classified, wherein the mean of each Gaussian component of the air-conduction speech data stream in the dual-channel speech joint classification model is corrected, μ̃_x^l and μ_v being the means obtained by passing the power spectra of the clean air-conduction training speech belonging to the l-th class and of the noise through the 24 mel filters and taking the logarithm, and C being the discrete cosine transform matrix; the other parameters of the dual-channel speech joint classification model remain unchanged; the corrected dual-channel speech joint classification model classifies the synchronously input air-conduction and non-air-conduction test speech frames, yielding the class score q(k, l) of the current air-conduction and non-air-conduction test speech frames for each class.
7. The dual-sensor speech enhancement method according to claim 2, characterized in that in step S4, for the air-conduction and non-air-conduction test speech of the k-th synchronously acquired frame, the enhanced air-conduction speech spectrum is computed with Y(ω, k), X(ω, k) and B(ω, k) being the spectra of the k-th frame of the enhanced air-conduction speech, the air-conduction test speech and the non-air-conduction test speech, respectively, and H_a(ω, k) and H_na(ω, k) being the frequency responses of the Wiener filters applied to the k-th frame of the air-conduction and non-air-conduction test speech; q(k, l) is the class score of the k-th frame of air-conduction and non-air-conduction test speech for class l of the dual-channel speech joint classification model, H_a(ω, k, l) is the Wiener filter frequency response of the k-th air-conduction test speech frame for class l, and H_na(ω, k, l) is the Wiener filter frequency response of the k-th non-air-conduction test speech frame for class l.
8. The dual-sensor speech enhancement method according to claim 7, characterized in that the above H_a(ω, k) and H_na(ω, k) are computed using only the class with the highest score, l = arg max_l q(k, l).
9. An implementation device for a dual-sensor speech enhancement method based on dual-channel Wiener filtering, characterized in that the implementation device comprises an air-conduction speech sensor, a non-air-conduction speech sensor, a noise model estimation module, a dual-channel speech joint classification model, a model compensation module, a frame classification module, a filter coefficient generation module and a dual-channel filter, wherein
the air-conduction speech sensor and the non-air-conduction speech sensor are each connected to the noise model estimation module, the frame classification module and the dual-channel filter; the dual-channel speech joint classification model, the model compensation module, the frame classification module, the filter coefficient generation module and the dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel speech joint classification model is connected to the filter coefficient generation module;
the air-conduction speech sensor and the non-air-conduction speech sensor respectively acquire the air-conduction and non-air-conduction speech signals; the noise model estimation module estimates the model and power spectrum of the current air-conduction noise; the dual-channel speech joint classification model is built over air-conduction and non-air-conduction speech frames from synchronously acquired clean air-conduction training speech and non-air-conduction training speech, each class of the dual-channel speech joint classification model having an air-conduction speech power-spectrum mean Φss(ω, l), a non-air-conduction speech power-spectrum mean Φbb(ω, l), and a cross-spectrum mean between air-conduction and non-air-conduction speech Φbs(ω, l); the model compensation module corrects the parameters of the dual-channel speech joint classification model with the statistical model of the air-conduction noise; the frame classification module classifies the currently synchronously input air-conduction and non-air-conduction test speech frames; the filter coefficient generation module constructs the dual-channel Wiener filter from the classification results and the power spectrum of the air-conduction noise; and the dual-channel filter filters the air-conduction and non-air-conduction test speech frames to obtain the enhanced air-conduction speech.
10. The implementation device of the dual-sensor speech enhancement method according to claim 9, characterized in that the air-conduction speech sensor is a microphone and the non-air-conduction speech sensor is a throat microphone.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910678398.7A CN110390945B (en) | 2019-07-25 | 2019-07-25 | Dual-sensor voice enhancement method and implementation device |
PCT/CN2019/110290 WO2021012403A1 (en) | 2019-07-25 | 2019-10-10 | Dual sensor speech enhancement method and implementation device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110390945A true CN110390945A (en) | 2019-10-29 |
CN110390945B CN110390945B (en) | 2021-09-21 |
Family
ID=68287587
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910678398.7A Active CN110390945B (en) | 2019-07-25 | 2019-07-25 | Dual-sensor voice enhancement method and implementation device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110390945B (en) |
WO (1) | WO2021012403A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111009253A (en) * | 2019-11-29 | 2020-04-14 | 联想(北京)有限公司 | Data processing method and device |
CN111524531A (en) * | 2020-04-23 | 2020-08-11 | 广州清音智能科技有限公司 | Method for real-time noise reduction of high-quality two-channel video voice |
WO2024012095A1 (en) * | 2022-07-12 | 2024-01-18 | 苏州旭创科技有限公司 | Filter implementation method and apparatus, noise suppression method and apparatus, and computer device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004279768A (en) * | 2003-03-17 | 2004-10-07 | Mitsubishi Heavy Ind Ltd | Device and method for estimating air-conducted sound |
CN203165457U (en) * | 2013-03-08 | 2013-08-28 | 华南理工大学 | Voice acquisition device used for noisy environment |
CN106328156A (en) * | 2016-08-22 | 2017-01-11 | 华南理工大学 | Microphone array voice reinforcing system and microphone array voice reinforcing method with combination of audio information and video information |
WO2018229503A1 (en) * | 2017-06-16 | 2018-12-20 | Cirrus Logic International Semiconductor Limited | Earbud speech estimation |
CN110010143A (en) * | 2019-04-19 | 2019-07-12 | 出门问问信息科技有限公司 | A kind of voice signals enhancement system, method and storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9711127B2 (en) * | 2011-09-19 | 2017-07-18 | Bitwave Pte Ltd. | Multi-sensor signal optimization for speech communication |
CN103208291A (en) * | 2013-03-08 | 2013-07-17 | 华南理工大学 | Speech enhancement method and device applicable to strong noise environments |
CN105513605B (en) * | 2015-12-01 | 2019-07-02 | 南京师范大学 | The speech-enhancement system and sound enhancement method of mobile microphone |
CN110010149B (en) * | 2016-01-14 | 2023-07-28 | 深圳市韶音科技有限公司 | Dual-sensor voice enhancement method based on statistical model |
JP2018063400A (en) * | 2016-10-14 | 2018-04-19 | 富士通株式会社 | Audio processing apparatus and audio processing program |
CN107886967B (en) * | 2017-11-18 | 2018-11-13 | 中国人民解放军陆军工程大学 | A kind of bone conduction sound enhancement method of depth bidirectional gate recurrent neural network |
CN108986834B (en) * | 2018-08-22 | 2023-04-07 | 中国人民解放军陆军工程大学 | Bone conduction voice blind enhancement method based on codec framework and recurrent neural network |
- 2019-07-25 CN CN201910678398.7A patent/CN110390945B/en active Active
- 2019-10-10 WO PCT/CN2019/110290 patent/WO2021012403A1/en active Application Filing
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105632512B (en) | A kind of dual sensor sound enhancement method and device based on statistical model | |
CN100573663C | Silence detection method based on speech feature discrimination | |
JP6954680B2 (en) | Speaker confirmation method and speaker confirmation device | |
CN105513605B (en) | The speech-enhancement system and sound enhancement method of mobile microphone | |
CN110390945A (en) | A kind of dual sensor sound enhancement method and realization device | |
CN107886967B (en) | A kind of bone conduction sound enhancement method of depth bidirectional gate recurrent neural network | |
CN110246510B (en) | End-to-end voice enhancement method based on RefineNet | |
CN110728989B | Binaural speech separation method based on a long short-term memory (LSTM) network | |
CN103325381B (en) | A kind of speech separating method based on fuzzy membership functions | |
CN110197665B (en) | Voice separation and tracking method for public security criminal investigation monitoring | |
KR102429152B1 (en) | Deep learning voice extraction and noise reduction method by fusion of bone vibration sensor and microphone signal | |
CN103208291A (en) | Speech enhancement method and device applicable to strong noise environments | |
CN103985381A (en) | Voice frequency indexing method based on parameter fusion optimized decision | |
CN110349588A (en) | A kind of LSTM network method for recognizing sound-groove of word-based insertion | |
KR20080064557A (en) | Apparatus and method for improving speech intelligibility | |
CN106328141A (en) | Ultrasonic lip reading recognition device and method for mobile terminal | |
CN103021405A (en) | Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter | |
Al-Kaltakchi et al. | Study of statistical robust closed set speaker identification with feature and score-based fusion | |
CN203165457U (en) | Voice acquisition device used for noisy environment | |
CN104240717B (en) | Voice enhancement method based on combination of sparse code and ideal binary system mask | |
Zheng et al. | Spectra restoration of bone-conducted speech via attention-based contextual information and spectro-temporal structure constraint | |
CN111341351A (en) | Voice activity detection method and device based on self-attention mechanism and storage medium | |
CN114495909B (en) | End-to-end bone-qi guiding voice joint recognition method | |
CN114613384B (en) | Deep learning-based multi-input voice signal beam forming information complementation method | |
CN113327589B (en) | Voice activity detection method based on attitude sensor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |