CN110390945A - Dual-sensor speech enhancement method and implementation device - Google Patents

Dual-sensor speech enhancement method and implementation device Download PDF

Info

Publication number
CN110390945A
Authority
CN
China
Prior art keywords
conductance
speech
frame
voice
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910678398.7A
Other languages
Chinese (zh)
Other versions
CN110390945B (en)
Inventor
张军
李�学
宁更新
冯义志
余华
季飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910678398.7A priority Critical patent/CN110390945B/en
Priority to PCT/CN2019/110290 priority patent/WO2021012403A1/en
Publication of CN110390945A publication Critical patent/CN110390945A/en
Application granted granted Critical
Publication of CN110390945B publication Critical patent/CN110390945B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/06 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a dual-sensor speech enhancement method and implementation device based on dual-channel Wiener filtering. Exploiting the complementarity between air-conducted (AC) and non-air-conducted (NAC) speech, the method first builds a dual-channel joint speech classification model that performs frame classification on the two-channel input signals of an air-conduction sensor and a non-air-conduction sensor. The speech frames acquired on the two channels are classified with this model, and a dual-channel Wiener filter is then constructed from the classification result to filter and enhance the two-channel speech signal. Compared with the prior art, the invention fuses the information contained in AC and NAC speech more fully, and introduces prior knowledge of the speech signal through a statistical model, which can effectively improve the performance of a speech enhancement system in noisy environments. The invention can be widely applied in video calls, car phones, multimedia classrooms, military communications, and many other scenarios.

Description

Dual-sensor speech enhancement method and implementation device
Technical field
The present invention relates to speech signal processing, and in particular to a dual-sensor speech enhancement method based on dual-channel Wiener filtering and a device implementing it.
Background art
In real speech communication, the speech signal is often disturbed by environmental noise, which degrades the quality of the received speech. Speech enhancement is an important branch of speech signal processing; its goal is to extract clean speech from noisy speech as far as possible. It is widely used in voice communication, speech compression coding, and speech recognition in noisy environments.
Because the human ear perceives sound through vibrations of the air, most current speech enhancement algorithms target air-conducted (AC) speech, i.e., speech captured by an air-conduction sensor such as a microphone. Their performance is degraded by the various acoustic noises in the environment and is usually poor under noisy conditions. To reduce the influence of environmental noise on speech quality, non-air-conduction (NAC) sensors such as throat microphones and bone-conduction microphones are often used for speech acquisition in noisy environments. Unlike an air-conduction sensor, a NAC speech sensor uses the vibration of the speaker's vocal cords, jawbone, and similar body parts to drive a reed or carbon film inside the sensor, changing its resistance and hence the voltage across it, thereby converting the vibration signal into an electrical signal, i.e., a speech signal. Since sound waves conducted through the air cannot deform the reed or carbon film of a NAC sensor, NAC sensors are unaffected by air-conducted sound and are therefore highly resistant to acoustic noise. However, because what a NAC sensor captures is speech propagated as vibration through the jawbone, muscle, skin, and similar tissue, its high-frequency components are severely attenuated: the sound is muffled and indistinct, and intelligibility is poor.
Since both AC and NAC sensors have shortcomings when used alone, speech enhancement methods combining the advantages of the two have appeared in recent years. These methods exploit the complementarity of AC and NAC speech and use multi-sensor fusion to achieve enhancement, usually performing better than single-sensor enhancement systems. Existing dual-sensor speech enhancement mainly follows two approaches: one first recovers AC speech from the NAC speech and then fuses it with the noisy AC speech; the other recovers AC speech from the NAC speech, enhances the noisy AC speech using both sensor signals, and then fuses the two. These techniques have the following deficiencies: (1) when recovering AC speech from NAC speech, extra noise may be introduced at high frequencies or during silence, degrading the enhancement; (2) when recovering AC speech from NAC speech, the information in the current AC speech is not used; (3) when fusing the AC speech recovered from NAC speech with the AC speech, the correlation and prior knowledge of the two are not fully exploited; (4) fusion usually assumes that NAC and AC speech are mutually independent, an assumption that does not hold in practice.
Chinese invention patent 201610025390.7 discloses a dual-sensor speech enhancement method and device based on statistical models. That invention first combines NAC and AC speech to build a joint statistical model for classification and performs endpoint detection; it computes the currently optimal AC speech filter from the joint statistical model and applies it to enhance the AC speech; it then converts the NAC speech into AC speech using a NAC-to-AC mapping model and fuses the result with the filtered AC speech by weighting. This partially remedies the failure to fully exploit the correlation and prior knowledge between the AC speech recovered from the NAC sensor and the AC speech during fusion, but the second fusion stage still uses AC speech recovered from NAC speech, so the high-frequency and silence noise remains, as does the failure to use the AC speech information when recovering AC speech from NAC speech.
Summary of the invention
The purpose of the present invention is to overcome the above drawbacks of the prior art by providing a dual-sensor speech enhancement method based on dual-channel Wiener filtering and a device implementing it. The method first exploits the complementarity between AC and NAC speech to build a dual-channel joint speech classification model that performs frame classification on the two-channel input signals of the air-conduction sensor and the non-air-conduction sensor; the speech frames acquired on the two channels are classified with this model, and a dual-channel Wiener filter is then constructed from the classification result to filter and enhance the two-channel speech signal. Compared with the prior art, the invention fuses the information contained in AC and NAC speech more fully, and introduces prior knowledge of the speech signal through a statistical model, which can effectively improve the performance of a speech enhancement system in noisy environments. The invention can be widely applied in video calls, car phones, multimedia classrooms, military communications, and many other scenarios.
The first object of the invention is achieved through the following technical solution:
A dual-sensor speech enhancement method based on dual-channel Wiener filtering, comprising the following steps:
S1. Synchronously record clean AC training speech and NAC training speech, build a dual-channel joint speech classification model over the AC speech frames and NAC speech frames, and for each class of the model compute the AC speech power spectrum mean Φ_ss(ω,l), the NAC speech power spectrum mean Φ_bb(ω,l), and the cross-spectrum mean Φ_bs(ω,l) between AC and NAC speech, where ω is frequency and l is the class index;
S2. Synchronously record AC test speech and NAC test speech, build a statistical model of the AC noise from the noise-only segments of the AC test speech, and compute the AC noise power spectrum mean Φ_vv(ω);
S3. Using the statistical model of the AC noise together with the dual-channel joint speech classification model of step S1, classify the synchronously input AC test speech frames and NAC test speech frames;
S4. From the classification result of step S3 and the power spectrum mean Φ_vv(ω), construct a dual-channel Wiener filter, filter the AC test speech frames and NAC test speech frames, and obtain the enhanced AC speech.
Further, step S1 proceeds as follows:
S1.1. Frame and preprocess the synchronously recorded clean AC training speech and NAC training speech, and extract the characteristic parameters of each frame: the mel-frequency cepstral coefficients;
S1.2. Train the dual-channel joint speech classification model on the AC and NAC speech features obtained in step S1.1;
S1.3. Classify all AC training speech frames and NAC training speech frames with the trained dual-channel joint speech classification model; then, over the AC and NAC training frames belonging to each class, compute the AC speech power spectrum mean Φ_ss(ω,l), the NAC speech power spectrum mean Φ_bb(ω,l), and the cross-spectrum mean Φ_bs(ω,l) between AC and NAC speech.
Further, in step S1.2 the dual-channel joint speech classification model is a multi-stream Gaussian mixture model (Gaussian Mixture Model, GMM), i.e. a weighted sum over classes of per-stream Gaussian densities, where N(o, μ, σ) is a Gaussian density; o_x(k) and o_b(k) are the feature vectors extracted from the k-th frame of AC test speech and NAC test speech; μ_x^l and μ_b^l are the means of the l-th Gaussian component of the AC and NAC speech data streams of the multi-stream GMM; σ_x^l and σ_b^l are the variances of the l-th Gaussian component of the AC and NAC speech data streams; c_l is the weight of the l-th Gaussian component of the multi-stream GMM; w_x and w_b are the stream weights of the AC and NAC speech data streams; and L is the number of Gaussian components.
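The multi-stream GMM itself appears only as an image in the source text. From the symbols defined above, the standard multi-stream form it would take is the following; this is a reconstruction under that assumption, not a verbatim reproduction of the patent's equation:

```latex
p\bigl(o_x(k),\,o_b(k)\bigr)
  \;=\; \sum_{l=1}^{L} c_l\,
        \mathcal{N}\!\bigl(o_x(k);\,\mu_x^{\,l},\,\sigma_x^{\,l}\bigr)^{w_x}\,
        \mathcal{N}\!\bigl(o_b(k);\,\mu_b^{\,l},\,\sigma_b^{\,l}\bigr)^{w_b}
```

Each class l contributes one AC-stream Gaussian and one NAC-stream Gaussian, raised to the stream weights w_x and w_b and mixed with weight c_l.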
Further, in step S1.3 each Gaussian component of the dual-channel joint speech classification model represents one class. For every pair of synchronous AC and NAC training speech frames, a score against each class is computed from the model, and the current AC and NAC training frame pair is assigned to the class with the highest score. After the class of every AC and NAC training frame pair has been determined, the AC speech power spectrum mean Φ_ss(ω,l), the NAC speech power spectrum mean Φ_bb(ω,l), and the cross-spectrum mean Φ_bs(ω,l) between AC and NAC speech are computed over the AC and NAC training frames belonging to each class.
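The per-class scoring step above can be sketched as follows. The patent's score formula is an image in the source, so the class score here is taken to be the weighted per-class term of the multi-stream mixture (log c_l plus stream-weighted Gaussian log likelihoods) with diagonal covariances; both choices are assumptions:

```python
import numpy as np

def log_gauss_diag(o, mu, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (o - mu) ** 2 / var)

def class_log_scores(o_x, o_b, model, w_x=0.5, w_b=0.5):
    """Per-class log scores of a synchronous (AC, NAC) feature pair:
    log c_l + w_x * log N_x + w_b * log N_b for each class l."""
    scores = []
    for c, mu_x, var_x, mu_b, var_b in model:
        scores.append(np.log(c)
                      + w_x * log_gauss_diag(o_x, mu_x, var_x)
                      + w_b * log_gauss_diag(o_b, mu_b, var_b))
    return np.array(scores)

# Two toy classes in a 2-D feature space, centred at 0 and at 5.
model = [
    (0.5, np.zeros(2), np.ones(2), np.zeros(2), np.ones(2)),
    (0.5, 5.0 * np.ones(2), np.ones(2), 5.0 * np.ones(2), np.ones(2)),
]
scores = class_log_scores(np.zeros(2), np.zeros(2), model)
best = int(np.argmax(scores))   # frame pair near the origin -> class 0
```

A frame pair is then assigned to `best`, the highest-scoring class, as described in S1.3.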
Further, the statistical model of the AC noise, namely the AC noise power spectrum mean Φ_vv(ω), is computed as follows:
S2.1. Synchronously record AC test speech and NAC test speech, and divide them into frames;
S2.2. From the short-time autocorrelation function R_b(m) and short-time energy E_b of each NAC test speech frame, compute the frame's short-time average threshold-crossing rate C_b, where sgn[·] denotes the sign operation, T is the initial threshold value, M is the frame length, and a regulatory factor scales the threshold. When C_b exceeds a preset threshold, the frame is judged to be speech; otherwise it is noise. The endpoint locations of the NAC test speech signal are obtained from the per-frame decisions;
S2.3. Take the time instants corresponding to the NAC test speech endpoints detected in step S2.2 as the endpoints of the AC test speech, and extract the noise-only segments of the AC test speech;
S2.4. Compute the power spectrum mean Φ_vv(ω) of the noise-only segments of the AC test speech.
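Steps S2.1 to S2.4 can be sketched as below. The patent's C_b threshold-crossing formula is an image in the source, so a plain short-time-energy decision on the NAC channel stands in for it here; the frame length, hop, and threshold factor are illustrative assumptions:

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def detect_speech_frames(nac, frame_len=240, hop=80, factor=3.0):
    """Crude stand-in for the patent's C_b decision: a NAC frame counts
    as speech when its short-time energy exceeds `factor` times the
    median frame energy."""
    frames = frame_signal(nac, frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)
    return energy > factor * np.median(energy)

def noise_psd_mean(ac, speech_flags, frame_len=240, hop=80):
    """Step S2.4: average the periodograms of the AC frames whose
    synchronous NAC frame was judged noise-only, giving Phi_vv(omega)."""
    frames = frame_signal(ac, frame_len, hop)
    noise_frames = frames[~speech_flags[:len(frames)]]
    spectra = np.abs(np.fft.rfft(noise_frames * np.hanning(frame_len),
                                 axis=1)) ** 2
    return spectra.mean(axis=0)

# Synthetic check: NAC channel nearly silent except for a burst in the middle.
rng = np.random.default_rng(0)
ac = rng.normal(0.0, 0.1, 8000)                    # AC channel: noisy throughout
nac = rng.normal(0.0, 0.01, 8000)                  # NAC channel: nearly silent...
nac[3000:5000] += np.sin(np.arange(2000) * 0.3)    # ...plus a "speech" burst
flags = detect_speech_frames(nac)
phi_vv = noise_psd_mean(ac, flags)
```

The NAC channel drives the voice-activity decision exactly because, per the background section, it is immune to acoustic noise; the AC channel is then used only where the NAC channel says no speech is present.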
Further, in step S3 a vector Taylor series (VTS) model compensation technique is first applied: the statistical model of the AC noise is used to correct the parameters of the AC speech data stream in the dual-channel joint speech classification model, and the input AC test speech frames and NAC test speech frames are then classified with the corrected model. The mean of each Gaussian component of the AC speech data stream is corrected by a formula whose two inputs are, respectively, the power spectra of the clean AC training speech belonging to the l-th class and of the noise, each passed through a 24-channel mel filter bank and averaged after taking the logarithm, with C the DCT transform matrix. All other parameters of the dual-channel joint speech classification model remain unchanged. The corrected model is then used to classify the synchronously input AC test speech frames and NAC test speech frames, yielding for the current AC and NAC test frame the classification score q(k,l) of each class.
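The correction formula itself is an image in the source. The standard zeroth-order VTS "log-add" consistent with the symbols described (log-mel filter-bank means, DCT matrix C) would be mu_hat = C·log(exp(C⁻¹·mu_speech) + exp(mu_noise_logmel)); a sketch under that assumption, with C⁻¹ taken as the pseudoinverse:

```python
import numpy as np

def dct_matrix(n_cep, n_mel):
    """Type-II DCT matrix mapping n_mel log filter-bank energies
    to n_cep cepstral coefficients."""
    k = np.arange(n_cep)[:, None]
    m = np.arange(n_mel)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n_mel)) * np.sqrt(2.0 / n_mel)

def vts_log_add_mean(mu_cep_speech, mu_log_noise, C, C_pinv):
    """Assumed VTS log-add compensation: map the clean cepstral mean back
    to the log-mel domain, add the noise power there, return to cepstra."""
    log_s = C_pinv @ mu_cep_speech             # cepstrum -> log-mel
    noisy = np.log(np.exp(log_s) + np.exp(mu_log_noise))
    return C @ noisy                           # log-mel -> cepstrum

n_cep, n_mel = 12, 24                          # 12 MFCCs, 24 mel filters (per S1.1)
C = dct_matrix(n_cep, n_mel)
C_pinv = np.linalg.pinv(C)

rng = np.random.default_rng(1)
mu_speech = C @ rng.normal(0.0, 1.0, n_mel)    # a clean-speech cepstral mean
mu_noise = np.full(n_mel, -10.0)               # very weak noise in log-mel
mu_hat = vts_log_add_mean(mu_speech, mu_noise, C, C_pinv)
```

With vanishingly weak noise the corrected mean collapses back to the clean mean, which is the expected limiting behaviour of the compensation.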
Further, in step S4, for the k-th synchronously acquired frame of AC test speech and NAC test speech, the enhanced AC speech spectrum is computed as a combination of the two filtered channels, where Y(ω,k), X(ω,k), and B(ω,k) are the spectra of the k-th frame of enhanced AC speech, AC test speech, and NAC test speech respectively, and the frequency responses of the Wiener filters applied to the k-th frame of AC test speech and NAC test speech are each computed from the per-class responses below.
In these formulas q(k,l) is the classification score of the k-th frame of AC and NAC test speech for class l of the dual-channel joint speech classification model, and H_a(ω,k,l) is the Wiener filter frequency response of the k-th AC test speech frame for class l of the model.
H_na(ω,k,l) is the Wiener filter frequency response of the k-th NAC test speech frame for class l of the dual-channel joint speech classification model.
Further, the per-class spectral quantities referred to above are those of the best-matching class, selected as l = arg max_l q(k,l).
Another object of the present invention is achieved through the following technical solution:
An implementation device for the dual-sensor speech enhancement method based on dual-channel Wiener filtering, the device comprising an AC speech sensor, a NAC speech sensor, a noise model estimation module, a dual-channel joint speech classification model, a model compensation module, a frame classification module, a filter coefficient generation module, and a dual-channel filter, wherein
the AC speech sensor and NAC speech sensor are each connected to the noise model estimation module, the frame classification module, and the dual-channel filter; the dual-channel joint speech classification model, model compensation module, frame classification module, filter coefficient generation module, and dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel joint speech classification model is connected to the filter coefficient generation module;
the AC speech sensor and NAC speech sensor acquire the AC and NAC speech signals respectively; the noise model estimation module estimates the model and power spectrum of the current AC noise; the dual-channel joint speech classification model is built over AC speech frames and NAC speech frames from synchronously recorded clean AC training speech and NAC training speech, each class of the model having AC speech power spectrum mean Φ_ss(ω,l), NAC speech power spectrum mean Φ_bb(ω,l), and AC/NAC cross-spectrum mean Φ_bs(ω,l); the model compensation module corrects the parameters of the dual-channel joint speech classification model using the statistical model of the AC noise; the frame classification module classifies the synchronously input AC test speech frames and NAC test speech frames; the filter coefficient generation module constructs the dual-channel Wiener filter from the classification result and the power spectrum of the AC noise; and the dual-channel filter filters the AC test speech frames and NAC test speech frames to obtain the enhanced AC speech.
Further, the AC speech sensor is a microphone, and the NAC speech sensor is a throat microphone.
Compared with the prior art, the present invention has the following advantages and effects:
(1) Compared with enhancement techniques based only on AC test speech or only on NAC test speech, the present invention uses the information of both simultaneously during enhancement and can achieve better results.
(2) The present invention fuses the information of AC and NAC test speech through the dual-channel joint speech classification model, which makes frame classification more accurate and makes full use of the correlation and prior knowledge of the two.
(3) The present invention recovers AC speech with a dual-channel Wiener filter, which is computationally simpler than Chinese invention patent 201610025390.7 and avoids both the high-frequency or silence noise introduced when recovering AC speech from NAC speech and the failure to use the AC speech information, giving better performance.
(4) The present invention restores AC speech with a dual-channel Wiener filter, avoiding the assumption that NAC speech and AC speech are mutually independent.
Brief description of the drawings
Fig. 1 is a structural block diagram of the implementation device of the dual-sensor speech enhancement method based on dual-channel Wiener filtering disclosed in the embodiment of the present invention;
Fig. 2 is a flowchart of the dual-sensor speech enhancement method based on dual-channel Wiener filtering disclosed in the embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work fall within the protection scope of the present invention.
Embodiment one
This embodiment discloses the structure of an implementation device for the dual-sensor speech enhancement method based on dual-channel Wiener filtering. As shown in Fig. 1, the device consists of an AC speech sensor, a NAC speech sensor, a noise model estimation module, a dual-channel joint speech classification model, a model compensation module, a frame classification module, a filter coefficient generation module, and a dual-channel filter, where the AC and NAC speech sensors are each connected to the noise model estimation module, the frame classification module, and the dual-channel filter; the dual-channel joint speech classification model, model compensation module, frame classification module, filter coefficient generation module, and dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel joint speech classification model is connected to the filter coefficient generation module.
In this embodiment the AC speech sensor is a microphone and the NAC speech sensor is a throat microphone; the two acquire the AC and NAC speech signals. The noise model estimation module estimates the model and power spectrum of the current AC noise. The dual-channel joint speech classification model is built over AC speech frames and NAC speech frames from synchronously recorded clean AC training speech and NAC training speech; each class of the model has AC speech power spectrum mean Φ_ss(ω,l), NAC speech power spectrum mean Φ_bb(ω,l), and AC/NAC cross-spectrum mean Φ_bs(ω,l). The model compensation module corrects the parameters of the dual-channel joint speech classification model using the statistical model of the AC noise. The frame classification module classifies the synchronously input AC test speech frames and NAC test speech frames. The filter coefficient generation module constructs the dual-channel Wiener filter from the classification result and the power spectrum of the AC noise. The dual-channel filter filters the AC and NAC test speech frames to obtain the enhanced AC speech.
Embodiment two
This embodiment discloses a dual-sensor speech enhancement method based on dual-channel Wiener filtering. Using the implementation device disclosed in the embodiment above, the enhanced AC speech is computed from the input AC test speech and NAC test speech through the following steps, whose flow is shown in Fig. 2:
Step S1. Synchronously record clean AC training speech and NAC training speech, build the dual-channel joint speech classification model over the AC speech frames and NAC speech frames, and compute for each class of the model the AC speech power spectrum mean Φ_ss(ω,l), the NAC speech power spectrum mean Φ_bb(ω,l), and the AC/NAC cross-spectrum mean Φ_bs(ω,l), where ω is frequency and l is the class index.
In this embodiment this is accomplished through the following steps:
S1.1: frame and pre-process the synchronously acquired clean air-conducted and non-air-conducted training speech, and extract the feature parameters of each frame.
In the present embodiment, the synchronously acquired clean air-conducted and non-air-conducted training speech is framed with a frame length of 30 ms and a frame shift of 10 ms. Each frame of clean air-conducted and non-air-conducted training speech is windowed with a Hamming window and pre-emphasized, after which its power spectrum is computed. The power spectra of the air-conducted and non-air-conducted training speech are each passed through a 24-band Mel filter bank; the logarithm of the filter-bank outputs is taken and a DCT is applied, yielding two sets of 12-dimensional Mel-frequency cepstral coefficients (MFCCs) that serve as the training features of the dual-channel speech joint classification model.
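The front end above can be sketched as follows: pre-emphasis, 30 ms frames with a 10 ms shift, Hamming windowing, power spectrum, a 24-band Mel filter bank, a logarithm, and a DCT keeping 12 coefficients. The 16 kHz sampling rate, 512-point FFT, 0.97 pre-emphasis coefficient, and the filter-bank edge frequencies are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def mel_filterbank(n_filters=24, n_fft=512, sr=16000):
    """Triangular Mel filter bank (assumed edges: 0 to sr/2)."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, ce, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, ce):
            fb[i, k] = (k - lo) / max(ce - lo, 1)   # rising edge
        for k in range(ce, hi):
            fb[i, k] = (hi - k) / max(hi - ce, 1)   # falling edge
    return fb

def mfcc_frames(x, sr=16000, frame_ms=30, shift_ms=10, n_ceps=12):
    """Pre-emphasis -> framing -> Hamming -> power spectrum -> 24 log-Mel bands -> DCT."""
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])            # pre-emphasis
    flen, fshift = sr * frame_ms // 1000, sr * shift_ms // 1000
    n_fft = 512
    fb = mel_filterbank(24, n_fft, sr)
    # DCT-II basis keeping the first n_ceps cepstral coefficients
    n = np.arange(24)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * 24))
    feats = []
    for start in range(0, len(x) - flen + 1, fshift):
        frame = x[start:start + flen] * np.hamming(flen)
        pspec = np.abs(np.fft.rfft(frame, n_fft)) ** 2    # power spectrum
        logmel = np.log(fb @ pspec + 1e-10)               # 24 log-Mel energies
        feats.append(dct @ logmel)
    return np.array(feats)
```

Applied independently to the air-conducted and the non-air-conducted channel, this yields the two 12-dimensional feature streams of the joint classification model.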
S1.2: train the dual-channel speech joint classification model with the air-conducted and non-air-conducted speech features obtained in step S1.1. In the present embodiment, the dual-channel speech joint classification model is a multi-data-stream GMM, i.e.

p(ox(k), ob(k)) = Σ_{l=1}^{L} c_l · [N(ox(k); μ_l^x, σ_l^x)]^{wx} · [N(ob(k); μ_l^b, σ_l^b)]^{wb}

where N(o; μ, σ) is a Gaussian function, ox(k) and ob(k) are the feature vectors extracted from the k-th frame of air-conducted and non-air-conducted speech, μ_l^x and μ_l^b are the means of the l-th Gaussian component of the air-conducted and non-air-conducted speech data streams in the multi-data-stream GMM, σ_l^x and σ_l^b are the variances of the l-th Gaussian component of the two streams, c_l is the weight of the l-th Gaussian component, wx and wb are the weights of the air-conducted and non-air-conducted speech data streams respectively, and L is the number of Gaussian components.
The parameters c_l, wx, wb, μ_l^x, μ_l^b, σ_l^x and σ_l^b of the dual-channel speech joint classification model are estimated with the Expectation-Maximization (EM) algorithm.
S1.3: classify all air-conducted and non-air-conducted training speech frames with the trained dual-channel speech joint classification model, then compute, over the air-conducted training speech frames and non-air-conducted speech frames contained in each class, the air-conducted speech power spectrum mean Φss(ω, l), the non-air-conducted speech power spectrum mean Φbb(ω, l), and the cross-spectrum mean between the air-conducted and non-air-conducted speech Φbs(ω, l).
In the present embodiment, each Gaussian component of the dual-channel speech joint classification model represents one class. For every pair of synchronous air-conducted and non-air-conducted training speech frames, the score for each class is computed as

q(k, l) = c_l · [N(ox(k); μ_l^x, σ_l^x)]^{wx} · [N(ob(k); μ_l^b, σ_l^b)]^{wb}

The current air-conducted and non-air-conducted training frame pair belongs to the class with the highest score. After the class of every air-conducted and non-air-conducted training speech frame has been computed, the air-conducted speech power spectrum mean Φss(ω, l), the non-air-conducted speech power spectrum mean Φbb(ω, l), and the cross-spectrum mean between the air-conducted and non-air-conducted speech Φbs(ω, l) are computed over the training frames contained in each class.
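The per-class scoring and hard assignment above can be sketched as follows. Diagonal covariances and equal stream weights wx = wb = 0.5 are illustrative assumptions, and the scores are computed in the log domain for numerical stability.

```python
import numpy as np

def stream_loglik(o, mu, var):
    """Log-density of a diagonal-covariance Gaussian N(o; mu, var), per component."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (o - mu) ** 2 / var, axis=-1)

def joint_scores(ox, ob, c, mu_x, var_x, mu_b, var_b, wx=0.5, wb=0.5):
    """log q(k, l) = log c_l + wx*log N(ox; ...) + wb*log N(ob; ...) for all L classes."""
    return (np.log(c)
            + wx * stream_loglik(ox, mu_x, var_x)
            + wb * stream_loglik(ob, mu_b, var_b))

def classify(ox, ob, params):
    """Assign a synchronous frame pair to the class with the highest score."""
    return int(np.argmax(joint_scores(ox, ob, **params)))
```

Class-conditional spectra Φss(ω, l), Φbb(ω, l), Φbs(ω, l) then follow by averaging the (cross-)spectra of the frames assigned to each class l.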
Step S2: synchronously acquire air-conducted and non-air-conducted test speech, establish the statistical model of the air-conduction noise from the pure-noise segments of the air-conducted test speech, and compute the power spectrum mean Φvv(ω) of the air-conduction noise.
In the present embodiment, the statistical model of the air-conduction noise is its power spectrum mean Φvv(ω), computed as follows:
S2.1: synchronously acquire and frame the air-conducted and non-air-conducted test speech;
S2.2: from the short-time autocorrelation function Rb(m) and short-time energy Eb of each non-air-conducted test speech frame, compute the frame's short-time average threshold-crossing rate Cb:

Cb = (1/2M) Σ_{m=1}^{M} { |sgn[Rb(m) − αT] − sgn[Rb(m−1) − αT]| + |sgn[Rb(m) + αT] − sgn[Rb(m−1) + αT]| }

where sgn[·] is the sign operation, α is a regulation factor, T is the initial threshold, and M is the frame length. When Cb exceeds a preset threshold, the frame is judged to be a speech signal; otherwise it is noise. The endpoint locations of the non-air-conducted test speech signal are obtained from the per-frame decisions;
S2.3: take the instants corresponding to the endpoints of the non-air-conducted test speech signal detected in step S2.2 as the endpoints of the air-conducted test speech, and extract the pure-noise segments of the air-conducted test speech;
S2.4: compute the power spectrum mean Φvv(ω) of the pure-noise segments of the air-conducted test speech.
The statistical model of the air-conduction noise may also be a Gaussian function, a GMM, or an HMM.
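Steps S2.2 to S2.4 can be sketched as below. The translated text does not reproduce the exact form of the patent's crossing-rate formula, so `crossing_rate` uses a common definition (sign changes about the levels ±αT) as an assumption; a real implementation would apply it to the short-time autocorrelation sequence Rb(m) of each frame. The 512-point FFT is likewise an assumption.

```python
import numpy as np

def crossing_rate(r, alpha=1.0, T=0.1):
    """Short-time average threshold-crossing rate of a sequence r(m):
    normalized count of sign changes about the levels +/- alpha*T."""
    up = np.sign(r - alpha * T)
    dn = np.sign(r + alpha * T)
    M = len(r)
    return (np.abs(np.diff(up)).sum() + np.abs(np.diff(dn)).sum()) / (4.0 * M)

def noise_psd_mean(frames, speech_flags, n_fft=512):
    """Phi_vv(omega): mean power spectrum over the frames flagged as pure noise."""
    noise = frames[~speech_flags]                       # keep non-speech frames only
    win = np.hamming(frames.shape[1])
    spec = np.abs(np.fft.rfft(noise * win, n_fft, axis=1)) ** 2
    return spec.mean(axis=0)
```

A frame is declared speech when its crossing rate exceeds a preset threshold; the remaining air-conducted frames are treated as pure noise and averaged into Φvv(ω).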
Step S3: classify the synchronously input air-conducted and non-air-conducted test speech frames using the statistical model of the air-conduction noise together with the dual-channel speech joint classification model of step S1.
In the present embodiment, VTS (vector Taylor series) model compensation is first applied: the parameters of the air-conducted speech data stream in the dual-channel speech joint classification model are corrected with the statistical model of the air-conduction noise, and the input air-conducted and non-air-conducted test speech frames are then classified. Specifically, the mean of each Gaussian component of the air-conducted speech data stream is corrected as

μ̂_l^x = C · log( exp(m_l^s) + exp(m_v) )

where m_l^s and m_v are the means of the power spectra of the clean air-conducted training speech belonging to the l-th class and of the noise, respectively, after passing through the 24-band Mel filter bank and taking the logarithm, and C is the discrete cosine transform (DCT) matrix. The other parameters of the dual-channel speech joint classification model remain unchanged. The corrected dual-channel speech joint classification model then classifies the synchronously input air-conducted and non-air-conducted test speech frames, yielding the class score q(k, l) of the current air-conducted and non-air-conducted test speech frames for each class.
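The mean correction can be sketched as follows, matching the 24-band/12-coefficient front end of step S1.1. Mapping the cepstral mean back to the log-Mel domain with a pseudo-inverse of the DCT matrix is an illustrative assumption.

```python
import numpy as np

def dct_matrix(n_ceps=12, n_mel=24):
    """DCT-II basis mapping 24 log-Mel energies to 12 cepstral coefficients."""
    n = np.arange(n_mel)
    return np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mel))

def vts_mean(mu_ceps, m_noise, C):
    """0th-order VTS compensation of a clean cepstral mean:
    mu_hat = C log(exp(C^+ mu) + exp(m_noise)), m_noise being a log-Mel noise mean."""
    C_pinv = np.linalg.pinv(C)
    m_clean = C_pinv @ mu_ceps                        # back to the log-Mel domain
    return C @ np.log(np.exp(m_clean) + np.exp(m_noise))
```

Two sanity properties follow from the log-add form: with near-silent noise the mean is unchanged, and with dominant noise it converges to the cepstral image of the noise.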
Step S4: construct the dual-channel Wiener filter from the classification results of step S3 and Φvv(ω), filter the air-conducted and non-air-conducted test speech frames, and obtain the enhanced air-conducted speech.
In the present embodiment, for the air-conducted and non-air-conducted test speech of the k-th synchronously acquired frame, the enhanced air-conducted speech spectrum is computed as

Y(ω, k) = H̄_a(ω, k) X(ω, k) + H̄_na(ω, k) B(ω, k)

where Y(ω, k), X(ω, k) and B(ω, k) are respectively the spectra of the k-th frame of enhanced air-conducted speech, air-conducted test speech and non-air-conducted test speech, and H̄_a(ω, k) and H̄_na(ω, k) are the Wiener filter frequency responses applied to the k-th frame of air-conducted and non-air-conducted test speech, computed respectively as

H̄_a(ω, k) = Σ_l q(k, l) H_a(ω, k, l) / Σ_l q(k, l),  H̄_na(ω, k) = Σ_l q(k, l) H_na(ω, k, l) / Σ_l q(k, l)

Here q(k, l) is the class score of the k-th frame of air-conducted and non-air-conducted test speech for class l of the dual-channel speech joint classification model. H_a(ω, k, l) is the Wiener filter frequency response of the k-th air-conducted test speech frame for class l:

H_a(ω, k, l) = [Φss(ω, l) Φbb(ω, l) − |Φbs(ω, l)|²] / [(Φss(ω, l) + Φvv(ω)) Φbb(ω, l) − |Φbs(ω, l)|²]

and H_na(ω, k, l) is the Wiener filter frequency response of the k-th non-air-conducted test speech frame for class l:

H_na(ω, k, l) = Φvv(ω) Φbs(ω, l) / [(Φss(ω, l) + Φvv(ω)) Φbb(ω, l) − |Φbs(ω, l)|²]
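Assuming, as the dual-sensor setup implies, that the additive noise contaminates only the air-conduction channel, the per-class responses and a score-weighted combination can be sketched as:

```python
import numpy as np

def wiener_pair(phi_ss, phi_bb, phi_bs, phi_vv):
    """Per-class dual-channel Wiener responses (H_a, H_na): joint MMSE estimate
    of the clean air-conducted spectrum from noisy air and clean bone channels."""
    det = (phi_ss + phi_vv) * phi_bb - np.abs(phi_bs) ** 2
    Ha = (phi_ss * phi_bb - np.abs(phi_bs) ** 2) / det
    Hna = phi_vv * np.conj(phi_bs) / det
    return Ha, Hna

def enhance_frame(X, B, q, Ha_l, Hna_l):
    """Y = H_a_bar * X + H_na_bar * B with the per-class responses
    Ha_l, Hna_l (shape L x F) averaged under the class scores q (shape L)."""
    w = q / q.sum()
    Ha_bar = np.tensordot(w, Ha_l, axes=1)
    Hna_bar = np.tensordot(w, Hna_l, axes=1)
    return Ha_bar * X + Hna_bar * B
```

Two limiting cases make the structure visible: with no noise (Φvv = 0) the filter passes the air channel unchanged, and with no cross-correlation (Φbs = 0) it reduces to the classical single-channel Wiener gain Φss/(Φss + Φvv).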
In another embodiment, the above H̄_a(ω, k) and H̄_na(ω, k) are computed as

H̄_a(ω, k) = H_a(ω, k, l*),  H̄_na(ω, k) = H_na(ω, k, l*),  with l* = arg max_l q(k, l).
The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited thereto; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the scope of protection of the present invention.

Claims (10)

1. A dual-sensor speech enhancement method based on dual-channel Wiener filtering, characterized in that the dual-sensor speech enhancement method comprises the following steps:
S1, synchronously acquiring clean air-conducted and non-air-conducted training speech, establishing a dual-channel speech joint classification model over air-conducted and non-air-conducted speech frames, and computing, for each class of the model, the air-conducted speech power spectrum mean Φss(ω, l), the non-air-conducted speech power spectrum mean Φbb(ω, l), and the cross-spectrum mean between the air-conducted and non-air-conducted speech Φbs(ω, l), where ω is frequency and l is the class index;
S2, synchronously acquiring air-conducted and non-air-conducted test speech, establishing a statistical model of the air-conduction noise from the pure-noise segments of the air-conducted test speech, and computing the power spectrum mean Φvv(ω) of the air-conduction noise;
S3, classifying the synchronously input air-conducted and non-air-conducted test speech frames using the statistical model of the air-conduction noise together with the dual-channel speech joint classification model of step S1;
S4, constructing a dual-channel Wiener filter according to the classification results of step S3 and the power spectrum mean Φvv(ω), and filtering the air-conducted and non-air-conducted test speech frames to obtain the enhanced air-conducted speech.
2. The dual-sensor speech enhancement method according to claim 1, characterized in that step S1 proceeds as follows:
S1.1, framing and pre-processing the synchronously acquired clean air-conducted and non-air-conducted training speech, and extracting the feature parameters of each frame, wherein the feature parameters are Mel-frequency cepstral coefficients;
S1.2, training the dual-channel speech joint classification model with the air-conducted and non-air-conducted speech features obtained in step S1.1;
S1.3, classifying all air-conducted and non-air-conducted training speech frames with the trained dual-channel speech joint classification model, and then computing, over the air-conducted and non-air-conducted training speech frames contained in each class, the air-conducted speech power spectrum mean Φss(ω, l), the non-air-conducted speech power spectrum mean Φbb(ω, l), and the cross-spectrum mean between the air-conducted and non-air-conducted speech Φbs(ω, l).
3. The dual-sensor speech enhancement method according to claim 2, characterized in that in step S1.2 the dual-channel speech joint classification model is a multi-data-stream GMM, where GMM denotes Gaussian mixture model, i.e.

p(ox(k), ob(k)) = Σ_{l=1}^{L} c_l · [N(ox(k); μ_l^x, σ_l^x)]^{wx} · [N(ob(k); μ_l^b, σ_l^b)]^{wb}

where N(o; μ, σ) is a Gaussian function, ox(k) and ob(k) are the feature vectors extracted from the k-th frame of air-conducted and non-air-conducted test speech, μ_l^x and μ_l^b are the means of the l-th Gaussian component of the air-conducted and non-air-conducted speech data streams in the multi-data-stream GMM, σ_l^x and σ_l^b are the variances of the l-th Gaussian component of the two streams, c_l is the weight of the l-th Gaussian component, wx and wb are the weights of the air-conducted and non-air-conducted speech data streams respectively, and L is the number of Gaussian components.
4. The dual-sensor speech enhancement method according to claim 3, characterized in that in step S1.3 each Gaussian component of the dual-channel speech joint classification model represents one class; for every pair of synchronous air-conducted and non-air-conducted training speech frames, the score for each class is computed as

q(k, l) = c_l · [N(ox(k); μ_l^x, σ_l^x)]^{wx} · [N(ob(k); μ_l^b, σ_l^b)]^{wb}

the current air-conducted and non-air-conducted training speech frames belong to the class with the highest score; after the class of every air-conducted and non-air-conducted training speech frame has been computed, the air-conducted speech power spectrum mean Φss(ω, l), the non-air-conducted speech power spectrum mean Φbb(ω, l), and the cross-spectrum mean between the air-conducted and non-air-conducted speech Φbs(ω, l) are computed over the training speech frames contained in each class.
5. The dual-sensor speech enhancement method according to claim 1, characterized in that the statistical model of the air-conduction noise is the power spectrum mean Φvv(ω) of the air-conduction noise, computed as follows:
S2.1, synchronously acquiring and framing the air-conducted and non-air-conducted test speech;
S2.2, from the short-time autocorrelation function Rb(m) and short-time energy Eb of each non-air-conducted test speech frame, computing the frame's short-time average threshold-crossing rate Cb:

Cb = (1/2M) Σ_{m=1}^{M} { |sgn[Rb(m) − αT] − sgn[Rb(m−1) − αT]| + |sgn[Rb(m) + αT] − sgn[Rb(m−1) + αT]| }

where sgn[·] is the sign operation, α is a regulation factor, T is the initial threshold, and M is the frame length; when Cb exceeds a preset threshold the frame is judged to be a speech signal, otherwise noise, and the endpoint locations of the non-air-conducted test speech signal are obtained from the per-frame decisions;
S2.3, taking the instants corresponding to the endpoints of the non-air-conducted test speech signal detected in step S2.2 as the endpoints of the air-conducted test speech, and extracting the pure-noise segments of the air-conducted test speech;
S2.4, computing the power spectrum mean Φvv(ω) of the pure-noise segments of the air-conducted test speech.
6. The dual-sensor speech enhancement method according to claim 1, characterized in that in step S3, vector Taylor series model compensation is first applied: the parameters of the air-conducted speech data stream in the dual-channel speech joint classification model are corrected with the statistical model of the air-conduction noise, and the input air-conducted and non-air-conducted test speech frames are then classified, wherein the mean of each Gaussian component of the air-conducted speech data stream is corrected as

μ̂_l^x = C · log( exp(m_l^s) + exp(m_v) )

where m_l^s and m_v are the means of the power spectra of the clean air-conducted training speech belonging to the l-th class and of the noise, respectively, after passing through the 24-band Mel filter bank and taking the logarithm, and C is the discrete cosine transform matrix; the other parameters of the dual-channel speech joint classification model remain unchanged; the corrected dual-channel speech joint classification model classifies the synchronously input air-conducted and non-air-conducted test speech frames, yielding the class score q(k, l) of the current air-conducted and non-air-conducted test speech frames for each class.
7. The dual-sensor speech enhancement method according to claim 2, characterized in that in step S4, for the air-conducted and non-air-conducted test speech of the k-th synchronously acquired frame, the enhanced air-conducted speech spectrum is computed as

Y(ω, k) = H̄_a(ω, k) X(ω, k) + H̄_na(ω, k) B(ω, k)

where Y(ω, k), X(ω, k) and B(ω, k) are respectively the spectra of the k-th frame of enhanced air-conducted speech, air-conducted test speech and non-air-conducted test speech, and H̄_a(ω, k) and H̄_na(ω, k) are the Wiener filter frequency responses applied to the k-th frame of air-conducted and non-air-conducted test speech, computed respectively as

H̄_a(ω, k) = Σ_l q(k, l) H_a(ω, k, l) / Σ_l q(k, l),  H̄_na(ω, k) = Σ_l q(k, l) H_na(ω, k, l) / Σ_l q(k, l)

where q(k, l) is the class score of the k-th frame of air-conducted and non-air-conducted test speech for class l of the dual-channel speech joint classification model, H_a(ω, k, l) is the Wiener filter frequency response of the k-th air-conducted test speech frame for class l:

H_a(ω, k, l) = [Φss(ω, l) Φbb(ω, l) − |Φbs(ω, l)|²] / [(Φss(ω, l) + Φvv(ω)) Φbb(ω, l) − |Φbs(ω, l)|²]

and H_na(ω, k, l) is the Wiener filter frequency response of the k-th non-air-conducted test speech frame for class l:

H_na(ω, k, l) = Φvv(ω) Φbs(ω, l) / [(Φss(ω, l) + Φvv(ω)) Φbb(ω, l) − |Φbs(ω, l)|²]
8. The dual-sensor speech enhancement method according to claim 7, characterized in that the said H̄_a(ω, k) and H̄_na(ω, k) are computed as H̄_a(ω, k) = H_a(ω, k, l*) and H̄_na(ω, k) = H_na(ω, k, l*), with l* = arg max_l q(k, l).
9. An implementation device for the dual-sensor speech enhancement method based on dual-channel Wiener filtering, characterized in that the implementation device comprises an air-conduction speech sensor, a non-air-conduction speech sensor, a noise model estimation module, a dual-channel speech joint classification model, a model compensation module, a frame classification module, a filter coefficient generation module and a dual-channel filter, wherein:
the air-conduction speech sensor and the non-air-conduction speech sensor are each connected to the noise model estimation module, the frame classification module and the dual-channel filter; the dual-channel speech joint classification model, the model compensation module, the frame classification module, the filter coefficient generation module and the dual-channel filter are connected in sequence; the noise model estimation module is connected to the model compensation module and the filter coefficient generation module; and the dual-channel speech joint classification model is connected to the filter coefficient generation module;
the air-conduction speech sensor and the non-air-conduction speech sensor are respectively used to acquire the air-conducted and non-air-conducted speech signals; the noise model estimation module is used to estimate the model and power spectrum of the current air-conduction noise; the dual-channel speech joint classification model is built for air-conducted and non-air-conducted speech frames from the synchronously acquired clean air-conducted and non-air-conducted training speech, the air-conducted speech power spectrum mean of each class of the model being Φss(ω, l), the non-air-conducted speech power spectrum mean being Φbb(ω, l), and the cross-spectrum mean between the air-conducted and non-air-conducted speech being Φbs(ω, l); the model compensation module corrects the parameters of the dual-channel speech joint classification model using the statistical model of the air-conduction noise; the frame classification module classifies the synchronously input air-conducted and non-air-conducted test speech frames; the filter coefficient generation module constructs the dual-channel Wiener filter according to the classification results and the power spectrum of the air-conduction noise; and the dual-channel filter filters the air-conducted and non-air-conducted test speech frames to obtain the enhanced air-conducted speech.
10. The implementation device of the dual-sensor speech enhancement method according to claim 9, characterized in that the air-conduction speech sensor is a microphone and the non-air-conduction speech sensor is a throat microphone.
CN201910678398.7A 2019-07-25 2019-07-25 Dual-sensor voice enhancement method and implementation device Active CN110390945B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910678398.7A CN110390945B (en) 2019-07-25 2019-07-25 Dual-sensor voice enhancement method and implementation device
PCT/CN2019/110290 WO2021012403A1 (en) 2019-07-25 2019-10-10 Dual sensor speech enhancement method and implementation device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910678398.7A CN110390945B (en) 2019-07-25 2019-07-25 Dual-sensor voice enhancement method and implementation device

Publications (2)

Publication Number Publication Date
CN110390945A true CN110390945A (en) 2019-10-29
CN110390945B CN110390945B (en) 2021-09-21

Family

ID=68287587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910678398.7A Active CN110390945B (en) 2019-07-25 2019-07-25 Dual-sensor voice enhancement method and implementation device

Country Status (2)

Country Link
CN (1) CN110390945B (en)
WO (1) WO2021012403A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004279768A (en) * 2003-03-17 2004-10-07 Mitsubishi Heavy Ind Ltd Device and method for estimating air-conducted sound
CN203165457U (en) * 2013-03-08 2013-08-28 华南理工大学 Voice acquisition device used for noisy environment
CN106328156A (en) * 2016-08-22 2017-01-11 华南理工大学 Microphone array voice reinforcing system and microphone array voice reinforcing method with combination of audio information and video information
WO2018229503A1 (en) * 2017-06-16 2018-12-20 Cirrus Logic International Semiconductor Limited Earbud speech estimation
CN110010143A (en) * 2019-04-19 2019-07-12 出门问问信息科技有限公司 A kind of voice signals enhancement system, method and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9711127B2 (en) * 2011-09-19 2017-07-18 Bitwave Pte Ltd. Multi-sensor signal optimization for speech communication
CN103208291A (en) * 2013-03-08 2013-07-17 华南理工大学 Speech enhancement method and device applicable to strong noise environments
CN105513605B (en) * 2015-12-01 2019-07-02 南京师范大学 The speech-enhancement system and sound enhancement method of mobile microphone
CN110010149B (en) * 2016-01-14 2023-07-28 深圳市韶音科技有限公司 Dual-sensor voice enhancement method based on statistical model
JP2018063400A (en) * 2016-10-14 2018-04-19 富士通株式会社 Audio processing apparatus and audio processing program
CN107886967B (en) * 2017-11-18 2018-11-13 中国人民解放军陆军工程大学 A kind of bone conduction sound enhancement method of depth bidirectional gate recurrent neural network
CN108986834B (en) * 2018-08-22 2023-04-07 中国人民解放军陆军工程大学 Bone conduction voice blind enhancement method based on codec framework and recurrent neural network


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111009253A (en) * 2019-11-29 2020-04-14 联想(北京)有限公司 Data processing method and device
CN111009253B (en) * 2019-11-29 2022-10-21 联想(北京)有限公司 Data processing method and device
CN111524531A (en) * 2020-04-23 2020-08-11 广州清音智能科技有限公司 Method for real-time noise reduction of high-quality two-channel video voice
WO2024012095A1 (en) * 2022-07-12 2024-01-18 苏州旭创科技有限公司 Filter implementation method and apparatus, noise suppression method and apparatus, and computer device

Also Published As

Publication number Publication date
CN110390945B (en) 2021-09-21
WO2021012403A1 (en) 2021-01-28

Similar Documents

Publication Publication Date Title
CN105632512B (en) A kind of dual sensor sound enhancement method and device based on statistical model
CN100573663C (en) Mute detection method based on speech characteristic to jude
JP6954680B2 (en) Speaker confirmation method and speaker confirmation device
CN105513605B (en) The speech-enhancement system and sound enhancement method of mobile microphone
CN110390945A (en) A kind of dual sensor sound enhancement method and realization device
CN107886967B (en) A kind of bone conduction sound enhancement method of depth bidirectional gate recurrent neural network
CN110246510B (en) End-to-end voice enhancement method based on RefineNet
CN110728989B (en) Binaural speech separation method based on long-time and short-time memory network L STM
CN103325381B (en) A kind of speech separating method based on fuzzy membership functions
CN110197665B (en) Voice separation and tracking method for public security criminal investigation monitoring
KR102429152B1 (en) Deep learning voice extraction and noise reduction method by fusion of bone vibration sensor and microphone signal
CN103208291A (en) Speech enhancement method and device applicable to strong noise environments
CN103985381A (en) Voice frequency indexing method based on parameter fusion optimized decision
CN110349588A (en) A kind of LSTM network method for recognizing sound-groove of word-based insertion
KR20080064557A (en) Apparatus and method for improving speech intelligibility
CN106328141A (en) Ultrasonic lip reading recognition device and method for mobile terminal
CN103021405A (en) Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter
Al-Kaltakchi et al. Study of statistical robust closed set speaker identification with feature and score-based fusion
CN203165457U (en) Voice acquisition device used for noisy environment
CN104240717B (en) Voice enhancement method based on combination of sparse code and ideal binary system mask
Zheng et al. Spectra restoration of bone-conducted speech via attention-based contextual information and spectro-temporal structure constraint
CN111341351A (en) Voice activity detection method and device based on self-attention mechanism and storage medium
CN114495909B (en) End-to-end bone-qi guiding voice joint recognition method
CN114613384B (en) Deep learning-based multi-input voice signal beam forming information complementation method
CN113327589B (en) Voice activity detection method based on attitude sensor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant