CN203165457U - Voice acquisition device used for noisy environment - Google Patents

Voice acquisition device used for noisy environment

Info

Publication number
CN203165457U
CN203165457U · Application CN201320107350U
Authority
CN
China
Prior art keywords
voice
module
model
air
speech transducer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201320107350
Other languages
Chinese (zh)
Inventor
张军
朱颖莉
宁更新
冯义志
余华
韦岗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN 201320107350 priority Critical patent/CN203165457U/en
Application granted
Publication of CN203165457U publication Critical patent/CN203165457U/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The utility model discloses a voice acquisition device for noisy environments. The device comprises an air-conduction voice sensor, a non-air-conduction voice sensor, a multi-channel data acquisition module, a noise model estimation module, a joint model correction module, a voice enhancement module, and a joint model training and adaptation module. The air-conduction voice sensor, the non-air-conduction voice sensor, the noise model estimation module, and the voice enhancement module are each connected with the multi-channel data acquisition module; the noise model estimation module, the joint model correction module, and the voice enhancement module are connected in sequence; and the joint model training and adaptation module is connected with the multi-channel data acquisition module and the joint model correction module. Compared with the prior art, the voice acquisition device is smaller, more convenient to use, more resistant to noise, and delivers better voice quality.

Description

Voice acquisition device for use in high-noise environments
Technical field
The utility model relates to the field of signal processing, and in particular to a voice acquisition device that can be used in high-noise environments.
Background art
Speech is the most natural means of human communication, but practical applications such as voice communication and speech recognition must often cope with environmental noise of various kinds. When this noise is strong it severely degrades the quality of voice communication and the accuracy of recognition. In noisy settings such as factories or public gatherings, for example, not only do the clarity and intelligibility of voice communication deteriorate markedly, but the recognition rate of speech recognizers also drops sharply.
Speech enhancement is a common way to reduce the influence of environmental noise and improve voice communication quality; it can also serve as pre-processing before recognition to raise the recognition rate of a speech recognizer. Current speech enhancement methods fall into two broad classes. The first class is based on a single microphone and includes spectral subtraction, Wiener filtering, MMSE estimation, Kalman filtering, wavelet transforms, and so on; these methods receive the speech signal with a single microphone and suppress noise by filtering and processing in the time, frequency, or wavelet domain, thereby improving speech quality. The second class is based on microphone arrays and applies array signal processing to speech enhancement: using the spatial phase information contained in the signals received by multiple microphones, the input speech is spatially filtered to form a directional beam that enhances the speech arriving from a designated direction while suppressing interference from other directions, which can provide better noise suppression than traditional single-channel enhancement. Existing speech enhancement techniques can improve the quality of noisy speech to some extent, but because they all rely on air-conduction speech sensors such as microphones, the environmental noise is superimposed directly on the speech signal they receive. Their performance therefore inevitably degrades as the noise grows stronger, and under strong noise it remains difficult for existing techniques to achieve a satisfactory result.
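For reference, the single-channel methods named above can be illustrated with a minimal spectral-subtraction sketch in Python (NumPy/SciPy). It is not part of the utility model; the sampling rate, frame sizes, and the assumption that the first half second of the recording is noise-only are illustrative choices.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs=16000, noise_seconds=0.5, beta=0.02):
    """Basic magnitude spectral subtraction; the leading noise_seconds of the
    recording are assumed to contain noise only (an assumption of this sketch)."""
    f, t, X = stft(noisy, fs=fs, nperseg=512, noverlap=384)
    mag, phase = np.abs(X), np.angle(X)
    # Average noise magnitude spectrum over the leading noise-only frames.
    hop = 512 - 384
    n_noise_frames = max(1, int(noise_seconds * fs / hop))
    noise_mag = mag[:, :n_noise_frames].mean(axis=1, keepdims=True)
    # Subtract and apply a spectral floor to limit musical noise.
    clean_mag = np.maximum(mag - noise_mag, beta * noise_mag)
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs,
                        nperseg=512, noverlap=384)
    return enhanced
```

Because the noise estimate is taken from the microphone signal itself, such a method degrades quickly when the noise is strong or non-stationary, which is precisely the limitation the utility model addresses.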
To support voice communication in high-noise environments, some voice communication systems have adopted non-air-conduction speech sensors such as throat microphones and bone-conduction sensors. In use these sensors are pressed against the user's throat, jawbone, or a similar location; when the user speaks, vocal-cord vibration deforms a reed inside the sensor, and the vibration of the reed is converted into an electrical signal to obtain the speech signal. Because sound waves travelling through the air cannot deform the reed, sensors of this kind are unaffected by acoustic noise and have very strong interference immunity, so they are commonly used for voice communication and speech recognition in high-noise environments such as tanks and factories. However, the propagation channel of the signal picked up by a non-air-conduction sensor differs considerably from the vocal tract that shapes normal speech, so the resulting speech is markedly less natural than the speech received by an air-conduction sensor such as a microphone and sounds unpleasant.
Content of the utility model
To address the shortcomings of existing speech enhancement techniques based on air-conduction sensors, whose performance is poor in high-noise environments, and of non-air-conduction sensors, whose sound quality is poor, the utility model provides a voice acquisition device that can be used in high-noise environments. The utility model is small, strongly noise-resistant, delivers good voice quality, and is convenient to use; it can be widely applied to voice communication, recording, and recognition in a variety of high-noise environments. The specific technical scheme of the utility model is as follows.
A voice acquisition device usable in high-noise environments comprises an air-conduction speech sensor, a non-air-conduction speech sensor, a multi-channel data acquisition module, and a data processing device. The air-conduction speech sensor and the non-air-conduction speech sensor are each connected to the multi-channel data acquisition module, and the multi-channel data acquisition module is connected to the data processing device. The air-conduction and non-air-conduction speech sensors pick up the air-conducted and non-air-conducted speech signals respectively; the multi-channel data acquisition module records the signals from both sensors; and the data processing device processes the data collected by the acquisition module, enhances the speech detected by the air-conduction sensor, and outputs the result.
Further, in the above voice acquisition device, the data processing device comprises a noise model estimation module, a joint model correction module, a speech enhancement module, and a joint model training and adaptation module. The air-conduction speech sensor, the non-air-conduction speech sensor, the noise model estimation module, and the speech enhancement module are each connected to the multi-channel data acquisition module; the noise model estimation module, the joint model correction module, and the speech enhancement module are connected in sequence; and the joint model training and adaptation module is connected to the multi-channel data acquisition module and the joint model correction module. The noise model estimation module estimates the noise model of the speech currently detected by the air-conduction sensor; the joint model correction module corrects the parameters of the joint model according to the current noise model; the speech enhancement module enhances the speech detected by the air-conduction sensor using the joint models before and after correction; and the joint model training and adaptation module trains the joint model and adjusts its parameters adaptively online.
Further, the data processing device comprises a DSP processing chip.
Further, the multi-channel data acquisition module adopts a multi-channel data acquisition chip.
Further, the air-conduction speech sensor is a microphone and the non-air-conduction speech sensor is a throat microphone.
The utility model combines an air-conduction speech sensor with a non-air-conduction speech sensor. A joint model of the speech detected by the two sensors is first established; during enhancement, the speech detected by the non-air-conduction sensor is used to estimate the acoustic noise model accurately, the parameters of the joint model are corrected accordingly, and the corrected joint model is then used to enhance the incoming speech detected by the air-conduction sensor. Because the speech detected by both sensors is used to recover the speech signal, the device can output speech of better quality in high-noise environments than the prior art.
The speech enhancement method used by the above voice acquisition device comprises the following steps (a schematic outline in code follows step 4):
Step 1: establish, under clean conditions, a joint model of the speech detected by the air-conduction sensor and the speech detected by the non-air-conduction sensor;
Step 2: estimate the noise model of the speech signal currently received by the air-conduction sensor, based on the speech signal detected by the non-air-conduction sensor;
Step 3: correct the parameters of the joint model using the noise model obtained in step 2;
Step 4: enhance the speech signal detected by the air-conduction sensor using the joint models before and after correction, and output the enhanced speech signal.
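The schematic outline below summarizes steps 2–4 as code. The helper names are hypothetical placeholders for the procedures detailed later in the description, not an API defined by the utility model.

```python
def enhance(air_speech, throat_speech, joint_model):
    """Schematic outline of steps 2-4 (step 1, joint-model training, is done
    offline). estimate_noise_model, compensate_joint_model and
    synthesize_enhanced_speech are hypothetical placeholders."""
    # Step 2: the throat-sensor signal marks the speech endpoints, so the
    # non-speech frames of the microphone signal give a noise model.
    noise_model = estimate_noise_model(air_speech, throat_speech)
    # Step 3: compensate the clean joint channel-parameter model for that noise.
    compensated_model = compensate_joint_model(joint_model, noise_model)
    # Step 4: estimate clean channel parameters and excitation, then re-synthesize.
    return synthesize_enhanced_speech(air_speech, throat_speech,
                                      joint_model, compensated_model)
```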
Further, the joint model of step 1 above, relating the speech detected by the air-conduction sensor to the speech detected by the non-air-conduction sensor, is either a joint probability distribution of the two or a mapping between them.
Further, establishing the joint model in step 1 comprises the following steps:
Step 1.1: synchronously acquire clean speech data from the air-conduction sensor and the non-air-conduction sensor as training data;
Step 1.2: divide the air-conducted and non-air-conducted speech data acquired in step 1.1 into frames, and extract the channel (vocal-tract) parameters and excitation parameters of each frame;
Step 1.3: use the channel parameters and excitation parameters extracted from the two sensors' speech to train a joint model of the channel parameters and a joint model of the excitation parameters, respectively.
Estimating the noise model of the speech signal currently received by the air-conduction sensor in step 2 comprises the following steps:
Step 2.1: synchronously acquire the speech detected by the air-conduction sensor and the speech detected by the non-air-conduction sensor;
Step 2.2: perform speech endpoint detection on the speech data detected by the non-air-conduction sensor;
Step 2.3: according to the speech endpoints found in step 2.2, extract the noise-only segments of the speech detected by the air-conduction sensor;
Step 2.4: estimate the statistical model of the noise from the noise-only segments obtained in step 2.3.
In step 3 above, the parameters of the channel-parameter joint model trained in step 1 are corrected with a model compensation technique, according to the noise model of the speech signal detected by the air-conduction sensor.
Enhancing the speech signal detected by the air-conduction sensor in step 4 comprises the following steps:
Step 4.1: using the channel-parameter joint models before and after correction, together with the channel parameters extracted from the speech currently detected by the two sensors, estimate the channel parameters of the clean air-conducted speech under a chosen optimization criterion;
Step 4.2: obtain the excitation parameters of the speech currently detected by the non-air-conduction sensor;
Step 4.3: using the joint model of the excitation parameters of the two sensors' speech, map the excitation parameters of the non-air-conducted speech to the excitation parameters of the air-conducted speech, and reconstruct the excitation of the air-conducted speech;
Step 4.4: synthesize the enhanced speech from the excitation obtained in step 4.3 and the clean channel parameters obtained in step 4.1.
Further preferably, the optimization criterion in step 4.1 is the minimum mean square error criterion.
In the above method, the parameters of the joint model of the two sensors' speech are adjusted with a model adaptation technique whenever the acoustic noise is below a preset threshold.
Compared with the prior art, the utility model has the following main advantages:
(1) Small size and convenient use. Compared with enhancement based on a microphone array, the utility model uses only one air-conduction speech sensor and one non-air-conduction speech sensor, and the whole device can be built into a compact headset, so it is smaller and more convenient to use.
(2) Stronger noise immunity than enhancement based on an air-conduction sensor alone. The utility model combines an air-conduction speech sensor with a non-air-conduction speech sensor; since airborne sound waves cannot affect the non-air-conduction sensor, the device has very strong noise resistance and can still obtain relatively clear speech in high-noise environments.
(3) Better voice quality than enhancement based on a non-air-conduction sensor alone. During enhancement the utility model uses the mapping between non-air-conducted and air-conducted speech to reconstruct clean speech, so the result is more natural than that of enhancement methods based only on a non-air-conduction sensor.
Description of drawings
Fig. 1 is a structural diagram of the speech enhancement device provided by the embodiment of the utility model;
Fig. 2 is a flow chart of the speech enhancement method provided by the embodiment;
Fig. 3 is a flow chart of establishing the speech joint model in the speech enhancement method of the embodiment;
Fig. 4 is a flow chart of establishing the noise model in the speech enhancement method of the embodiment;
Fig. 5 is a flow chart of enhancing the speech detected by the air-conduction sensor in the speech enhancement method of the embodiment.
Embodiment
The concrete implementation of the utility model is described further below with reference to the drawings and an embodiment, but the implementation and the scope of protection of the utility model are not limited thereto.
As shown in Fig. 1, the speech enhancement device provided by this embodiment comprises an air-conduction speech sensor, a non-air-conduction speech sensor, a multi-channel data acquisition module, and a data processing device. The data processing device comprises a noise model estimation module, a joint model correction module, a speech enhancement module, and a joint model training and adaptation module. The air-conduction speech sensor, the non-air-conduction speech sensor, the noise model estimation module, and the speech enhancement module are each connected to the multi-channel data acquisition module; the noise model estimation module, the joint model correction module, and the speech enhancement module are connected in sequence; and the joint model training and adaptation module is connected to the multi-channel data acquisition module and the joint model correction module. The air-conduction and non-air-conduction sensors pick up the air-conducted and non-air-conducted speech signals respectively; in this embodiment the air-conduction sensor is a microphone and the non-air-conduction sensor is a throat microphone. The multi-channel data acquisition module records the signals from both sensors and is implemented with a multi-channel data acquisition chip. The noise model estimation module estimates the noise model of the speech currently detected by the air-conduction sensor; the joint model correction module corrects the joint-model parameters corresponding to the air-conducted speech according to the current noise model; the speech enhancement module enhances the speech detected by the air-conduction sensor using the joint models before and after correction; and the joint model training and adaptation module trains the joint model and adjusts its parameters adaptively online. In this embodiment the noise model estimation module, the joint model correction module, the speech enhancement module, and the joint model training and adaptation module are all implemented on a DSP chip.
In this embodiment the speech enhancement method is carried out in the following steps, as shown in Fig. 2:
Step 1: establish, under clean conditions, the joint model of the speech detected by the air-conduction sensor and the speech detected by the non-air-conduction sensor. As shown in Fig. 3, this can be divided into the following steps:
Step 1.1: synchronously acquire clean air-conducted and non-air-conducted speech data as training data. In this embodiment, the speech picked up synchronously by the microphone and the throat microphone in a quiet environment is recorded through the multi-channel data acquisition chip and used as the training data for the joint model.
Step 1.2: divide the air-conducted and non-air-conducted speech data acquired in step 1.1 into frames, and extract the channel parameters and excitation parameters of each frame. In this embodiment the two speech signals are divided into frames at 10 ms intervals; for the channel parameters, linear prediction analysis is used to extract the linear prediction coefficients (LPC coefficients) of each frame of air-conducted and non-air-conducted speech. Passing the original speech through the LPC analysis filter yields the prediction residual, whose magnitude spectrum is the required excitation parameter.
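A minimal sketch of step 1.2 in Python, assuming 16 kHz input and 10 ms frames; the LPC order and the windowing are illustrative choices, and the LPC coefficients are obtained with the autocorrelation method via SciPy's Toeplitz solver.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order=12):
    """Autocorrelation-method LPC: returns a_1..a_p of A(z) = 1 + sum a_k z^-k."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    r[0] += 1e-8                                   # avoid a singular system on silence
    return solve_toeplitz(r[:order], -r[1:order + 1])   # solve R a = -r

def frame_params(signal, fs=16000, frame_ms=10, order=12):
    """Per-frame channel (LPC) and excitation (residual magnitude spectrum) parameters."""
    n = int(fs * frame_ms / 1000)
    lpcs, excitations = [], []
    for start in range(0, len(signal) - n + 1, n):
        frame = signal[start:start + n] * np.hamming(n)
        a = lpc(frame, order)
        # Prediction residual = output of the analysis filter A(z).
        residual = np.convolve(np.concatenate(([1.0], a)), frame)[:n]
        lpcs.append(a)
        excitations.append(np.abs(np.fft.rfft(residual)))
    return np.array(lpcs), np.array(excitations)
```

The same routine is applied frame by frame to both the microphone signal and the throat-microphone signal.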
Step 1.3: use the channel parameters and excitation parameters extracted from the air-conducted and non-air-conducted speech to train the channel-parameter joint model and the excitation-parameter joint model, respectively.
The joint model of the parameters of the two sensors' speech can be represented either by a joint probability distribution or by a mapping between them. In this embodiment Gaussian models are used to model the channel parameters and excitation parameters extracted from the air-conducted and non-air-conducted speech, as follows:
To train the joint model of the channel parameters, the LPC parameters extracted at the same instant from the air-conducted and non-air-conducted speech are first converted into linear prediction cepstral coefficients (LPCC coefficients), and the two are merged into one joint vector c = [c_1^T, c_2^T]^T, where c_1 is the LPCC vector of the air-conducted speech and c_2 is the LPCC vector of the non-air-conducted speech. J Gaussian models are then used to fit the probability distribution of this joint vector. Let λ_j denote the j-th Gaussian model; its parameters are the mean and variance of the Gaussian function and the prior probability of the model. Many mature methods exist for training Gaussian model parameters; in this embodiment the parameters of the J Gaussian models are trained with the following steps (a code sketch follows step 1.3.3):
Step 1.3.1: divide all training joint vectors into J groups and fit the probability distribution of each group with one Gaussian model: the mean and variance of all joint vectors in a group become the mean and variance of its Gaussian function, and the ratio of the number of joint vectors in the group to the total number of training joint vectors becomes the prior probability of that Gaussian model.
Step 1.3.2: reassign every training joint vector to a group according to the Gaussian parameters obtained in the previous step; the rule is that a joint vector c belongs to group j if P(c|λ_j) > P(c|λ_i) for all i ≠ j.
Step 1.3.3: if the number of iterations has reached a preset value, the current Gaussian parameters are the trained parameters; otherwise recompute the mean, variance, and prior probability of every Gaussian model from the grouping of step 1.3.2 and return to step 1.3.2.
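The sketch below implements the hard-assignment training loop of steps 1.3.1–1.3.3 (essentially a k-means-style fit of J Gaussians). The number of Gaussians, the iteration count, and the random initial grouping are assumptions; the text does not fix them.

```python
import numpy as np
from scipy.stats import multivariate_normal

def train_joint_gaussians(C, J=16, iters=10, seed=0):
    """Hard-assignment training of J Gaussians on the joint vectors C (N x D),
    following steps 1.3.1-1.3.3. J, iters and the initialisation are illustrative."""
    rng = np.random.default_rng(seed)
    groups = rng.integers(J, size=len(C))        # initial grouping (step 1.3.1)
    for _ in range(iters):
        means, covs, priors = [], [], []
        for j in range(J):
            Cj = C[groups == j]
            if len(Cj) < 2:                      # guard against an emptied group
                Cj = C[rng.choice(len(C), size=2, replace=False)]
            means.append(Cj.mean(axis=0))
            covs.append(np.cov(Cj.T) + 1e-6 * np.eye(C.shape[1]))
            priors.append((groups == j).mean())
        # Step 1.3.2: reassign each vector to the Gaussian of highest likelihood
        # P(c | lambda_j); the priors are not used in the assignment rule.
        logp = np.column_stack([multivariate_normal(means[j], covs[j]).logpdf(C)
                                for j in range(J)])
        groups = logp.argmax(axis=1)
    return np.array(means), np.array(covs), np.array(priors)
```

The same routine, applied to the joint excitation vectors, yields the K excitation-parameter Gaussians described next.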
To train the joint model of the excitation parameters, the magnitude spectra of the excitations extracted at the same instant from the air-conducted and non-air-conducted speech are merged into one joint vector s = [s_1^T, s_2^T]^T, where s_1 is the excitation magnitude spectrum of the air-conducted speech and s_2 is that of the non-air-conducted speech. K Gaussian models are used to fit the probability distribution of this joint vector; with the same training method as for the channel parameters, the parameters of the K excitation Gaussian models are obtained.
Step 2: estimate the noise model of the speech signal currently received by the air-conduction sensor from the speech signal detected by the non-air-conduction sensor. The flow is shown in Fig. 4 (and sketched in code after step 2.4); the concrete steps are as follows:
Step 2.1: synchronously acquire the speech detected by the air-conduction sensor and the speech detected by the non-air-conduction sensor. In this embodiment the microphone signal and the throat-microphone signal are acquired simultaneously through the data acquisition chip and sent to the noise model estimation module for noise-model estimation.
Step 2.2: perform speech endpoint detection on the speech data detected by the non-air-conduction sensor. Because the signal detected by the non-air-conduction sensor is unaffected by acoustic environmental noise, the speech endpoints can be detected accurately even in the presence of acoustic noise. Many endpoint-detection methods exist; in this embodiment the classical method based on energy and zero-crossing rate is applied to the speech detected by the throat microphone.
Step 2.3: according to the speech endpoints found in step 2.2, extract the noise-only segments of the speech detected by the air-conduction sensor. Because the two speech signals are acquired synchronously, their endpoints coincide in time, so the endpoints found in step 2.2 identify the non-speech segments of the air-conducted signal, i.e. the pure noise signal.
Step 2.4: estimate the statistical model of the noise from the noise-only data obtained in step 2.3. In this embodiment only the channel parameters of the noise are modelled, using a single Gaussian function: the channel parameters of a number of noise-only frames are extracted and their mean and variance computed, giving the Gaussian model of the noise channel parameters.
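A sketch of steps 2.2–2.4, assuming both signals have already been split into synchronized frames (for example with the framing routine above). The energy and zero-crossing thresholds are illustrative; the text only specifies that a classical energy/zero-crossing-rate detector is used.

```python
import numpy as np

def throat_vad(throat_frames, energy_factor=3.0, zcr_max=0.25):
    """Step 2.2: mark speech frames on the throat-microphone signal with a simple
    energy / zero-crossing-rate rule (thresholds are illustrative)."""
    energy = (throat_frames ** 2).sum(axis=1)
    zcr = (np.diff(np.sign(throat_frames), axis=1) != 0).mean(axis=1)
    noise_floor = np.percentile(energy, 10) + 1e-12
    return (energy > energy_factor * noise_floor) & (zcr < zcr_max)

def noise_channel_gaussian(air_channel_params, is_speech):
    """Steps 2.3-2.4: the air-sensor frames that the throat-sensor VAD marks as
    non-speech are pure noise; fit a single Gaussian to their channel parameters."""
    noise_params = air_channel_params[~is_speech]
    return noise_params.mean(axis=0), np.cov(noise_params.T)
```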
Step 3: correct the parameters of the joint model using the noise model obtained in step 2, so that the model matches the current operating environment.
This step is performed in the joint model correction module: according to the noise model of the air-conducted speech and the channel-parameter joint model trained in step 1, a model compensation technique corrects the parameters of the channel-parameter joint model so that it matches the current environment. In this embodiment the speech detected by the non-air-conduction sensor is regarded as unaffected by acoustic noise, so the noise on the non-air-conducted part is set to 0 and the noise parameters used in model compensation are taken on this assumption. The prior probabilities of the Gaussian models in the channel-parameter joint model remain unchanged, and the excitation-parameter joint model is not corrected.
Model compensation techniques are widely used in speech recognition. For the channel parameters, this embodiment corrects the Gaussian parameters of the GMM with a model compensation technique suited to linear prediction cepstral coefficients (LPCC) (see Ivandro Sanches, "Noise-Compensated Hidden Markov Models", IEEE Transactions on Speech and Audio Processing, 2000, 8(5): 533-540). The concrete method is as follows:
(1) Compensation of the mean
Let c_s denote the mean of a Gaussian model in the LPCC (cepstral) domain. Its correction proceeds as follows (a conversion sketch in code follows step 3.5A):
Step 3.1A: transform c_s from the LPCC domain to the LPC domain with formula (1):

$$a_1 = -c_1,\qquad a_k = -c_k - \sum_{j=1}^{k-1}\Bigl(1-\frac{j}{k}\Bigr)a_j\,c_{k-j},\quad 2 \le k \le p \tag{1}$$

which yields the LPC-domain mean a_s = [a_1, a_2, ..., a_p]^T.
Step 3.2A: transform the LPC-domain mean to the autocorrelation domain with formula (2):

$$\mathbf{A}\,r_s = -a_s \tag{2}$$

where

$$\mathbf{A} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ a_1 & 1 & 0 & \cdots & 0 \\ a_2 & a_1 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{p-1} & a_{p-2} & a_{p-3} & \cdots & 1 \end{bmatrix} + \begin{bmatrix} a_2 & a_3 & \cdots & a_p & 0 \\ a_3 & a_4 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ a_p & 0 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix}$$

and r_s = [r_1, r_2, ..., r_p]^T is the autocorrelation-domain mean.
Step 3.3A: estimate the signal-to-noise ratio of the speech signal and set α = E_n/E_s, where E_s and E_n are the estimated energies of the clean speech signal and of the noise respectively; then compensate the autocorrelation-domain mean with formula (3):

$$r_{s+n} = \frac{1}{1+\alpha}\,\bigl(r_s + \alpha\, r_n\bigr) \tag{3}$$

Step 3.4A: transform r_{s+n} back to the LPC domain to obtain the corrected LPC coefficient mean a_{s+n}.
Step 3.5A: transform a_{s+n} to the LPCC domain to obtain the corrected LPCC coefficient mean c_{s+n}.
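Formula (1) and its inverse are standard recursions between LPC and LPC-cepstral coefficients; the following sketch implements both (NumPy only; indexing starts at a_1, c_1 as in the text). Steps 3.2A–3.4A, which pass through the autocorrelation domain via formula (2) and mix clean and noise statistics with formula (3), are not shown.

```python
import numpy as np

def lpcc_to_lpc(c):
    """Formula (1): LPC-cepstrum c_1..c_p -> LPC coefficients a_1..a_p."""
    p = len(c)
    a = np.zeros(p)
    a[0] = -c[0]
    for k in range(2, p + 1):
        acc = sum((1 - j / k) * a[j - 1] * c[k - j - 1] for j in range(1, k))
        a[k - 1] = -c[k - 1] - acc
    return a

def lpc_to_lpcc(a):
    """Inverse of formula (1): LPC coefficients -> LPC-cepstrum of the same order."""
    p = len(a)
    c = np.zeros(p)
    c[0] = -a[0]
    for n in range(2, p + 1):
        acc = sum((1 - j / n) * a[j - 1] * c[n - j - 1] for j in range(1, n))
        c[n - 1] = -a[n - 1] - acc
    return c
```

The two routines are exact inverses of each other, which is what steps 3.1A and 3.5A rely on.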
(2) Compensation of the variance
The variance of a Gaussian model is corrected in the following steps (a sketch in code follows step 3.3B):
Step 3.1B: transform the mean and variance of the Gaussian model from the cepstral domain to the log energy-spectral domain with formula (4):

$$l_s = p\,\mathbf{C}\,c_s,\qquad \sigma_{l_s}^2 = p^2\,\mathbf{C}\,\sigma^2(c_s)\,\mathbf{C}^T$$
$$l_n = p\,\mathbf{C}\,c_n,\qquad \sigma_{l_n}^2 = p^2\,\mathbf{C}\,\sigma^2(c_n)\,\mathbf{C}^T \tag{4}$$

where c_s, σ²(c_s) and c_n, σ²(c_n) are the means and variances of the cepstral-domain Gaussian models of the clean speech and of the noise, l_s, σ²_{l_s} and l_n, σ²_{l_n} are the corresponding means and variances of the log energy-spectrum Gaussian models, and C is the DCT matrix.
Step 3.2B: compute the variance of the noisy speech signal in the log energy-spectral domain with formula (5):

$$\sigma_{s+n}^2(i,j) = \Delta_i \Delta_j\,\sigma_s^2(i,j) + (1-\Delta_i)(1-\Delta_j)\,\sigma_n^2(i,j) \tag{5}$$

where

$$\Delta_i = \begin{cases} 0, & S_i/N_i < 1 \\ 1, & S_i/N_i \ge 1 \end{cases},\qquad i = 1,2,\ldots,p$$

and S_i, N_i are the i-th components of the energy spectra of the clean speech signal and of the noise signal.
Step 3.3B: transform the log energy-spectral variance back to the cepstral domain with formula (6) to obtain the variance matrix of the cepstral-domain Gaussian model of the noisy speech:

$$\sigma^2(c_{s+n}) = p^{-2}\,\mathbf{C}^{-1}\,\sigma_{s+n}^2\,\mathbf{C}^{-T} \tag{6}$$
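A sketch of steps 3.1B–3.3B with NumPy. The exact definition of the DCT matrix C is not given in the text, so the cosine matrix below is one plausible convention and should be treated as an assumption.

```python
import numpy as np

def dct_matrix(p):
    """A DCT-like cosine matrix mapping cepstra to the log energy spectrum
    (the exact convention of C in formula (4) is an assumption here)."""
    n = np.arange(p)
    return np.cos(np.pi * np.outer(n, n + 0.5) / p)

def compensate_variance(var_c_s, var_c_n, S, N):
    """Formulas (4)-(6): combine clean-speech and noise cepstral covariances
    through the log energy-spectral domain and return the noisy-speech
    cepstral covariance."""
    p = len(S)
    C = dct_matrix(p)
    Cinv = np.linalg.inv(C)
    # (4): cepstral domain -> log energy-spectral domain (the means would
    # transform as l = p * C @ c, but only the covariances are needed below).
    var_l_s = p ** 2 * C @ var_c_s @ C.T
    var_l_n = p ** 2 * C @ var_c_n @ C.T
    # (5): binary dominance mask; speech dominates bin i when S_i / N_i >= 1.
    d = (S / N >= 1.0).astype(float)
    var_l_sn = np.outer(d, d) * var_l_s + np.outer(1 - d, 1 - d) * var_l_n
    # (6): back to the cepstral domain.
    return p ** -2 * Cinv @ var_l_sn @ Cinv.T
```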
Step 4: enhance the speech signal detected by the air-conduction sensor using the joint models before and after correction, and output the enhanced speech signal. The flow is shown in Fig. 5; the concrete method is as follows:
Step 4.1: using the channel-parameter joint models before and after correction, together with the channel parameters extracted from the speech currently detected by the air-conduction and non-air-conduction sensors, estimate the channel parameters of the clean air-conducted speech under a chosen optimization criterion.
Let P(c|λ_i) be the probability density function of the i-th Gaussian model of the channel parameters before correction, and P(c′|λ′_i) the probability density function of the same Gaussian model after correction, where c and c′ are the channel parameters of the clean speech and of the noisy speech, and λ_i and λ′_i are the i-th Gaussian model before and after correction. Given the channel parameters extracted from the speech currently detected by the two sensors, and choosing the minimum mean square error criterion, the clean air-conducted channel parameters are estimated as

$$\tilde{c} = E(c\,|\,c') = \int c \sum_{j=1}^{J}\Bigl[P(c\,|\,\lambda_j)\sum_{k=1}^{J}\bigl(P(\lambda_j\,|\,\lambda_k')\,P(\lambda_k'\,|\,c')\bigr)\Bigr]dc \tag{7}$$

where

$$P(\lambda_k'\,|\,c') = \frac{P(\lambda_k')\,P(c'\,|\,\lambda_k')}{\sum_{k=1}^{J}P(\lambda_k')\,P(c'\,|\,\lambda_k')},\qquad P(\lambda_j\,|\,\lambda_k') = \begin{cases}1, & j = k\\ 0, & j \ne k\end{cases}$$
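Because P(λ_j|λ′_k) = δ_jk and ∫ c P(c|λ_j) dc = μ_j, the mean of the j-th clean Gaussian, formula (7) reduces to a posterior-weighted sum of the clean component means. A sketch under that observation (variable names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mmse_clean_channel(c_noisy, priors, means_clean, means_noisy, covs_noisy):
    """Formula (7) for a Gaussian joint model: the MMSE estimate of the clean
    joint channel vector is sum_k P(lambda'_k | c') * mu_k, where mu_k is the
    mean of the k-th clean (uncorrected) Gaussian."""
    J = len(priors)
    # Posterior P(lambda'_k | c') under the noise-compensated Gaussians.
    log_w = np.array([np.log(priors[k]) +
                      multivariate_normal(means_noisy[k], covs_noisy[k]).logpdf(c_noisy)
                      for k in range(J)])
    post = np.exp(log_w - log_w.max())
    post /= post.sum()
    # Weighted sum of the clean component means; the air-conducted part of the
    # resulting joint vector is the estimated clean channel parameter vector.
    return post @ np.asarray(means_clean)
```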
Step 4.2: obtain the excitation parameters of the speech currently detected by the non-air-conduction sensor. In this embodiment the non-air-conducted speech is regarded as unaffected by acoustic noise, so the part of the current channel parameters corresponding to the non-air-conducted speech is used directly to build its linear prediction analysis filter; passing the non-air-conducted speech through this filter gives its excitation signal, whose magnitude spectrum is the required excitation parameter.
Step 4.3: using the joint model of the excitation parameters of the air-conducted and non-air-conducted speech, map the excitation parameters of the non-air-conducted speech to the excitation parameters of the air-conducted speech, and reconstruct the excitation of the air-conducted speech.
Let P(s|γ_i) be the probability density function of the i-th Gaussian model of the excitation parameters, where s = [s_M^T, s_T^T]^T, s_M and s_T are the excitation parameters of the air-conducted speech and of the non-air-conducted speech respectively, and γ_i is the i-th Gaussian model. The excitation parameters of the non-air-conducted speech can then be mapped to the excitation parameters of the air-conducted speech by formula (8):

$$\tilde{s}_M = E(s_M\,|\,s_T) = \int s_M \sum_{j=1}^{K}\bigl[P(s_M\,|\,\gamma_j)\,P(\gamma_j\,|\,s_T)\bigr]\,d s_M \tag{8}$$

where P(γ_j|s_T) is the posterior probability of the j-th Gaussian model given s_T, computed from the model priors and the likelihoods P(s_T|γ_j) in the same way as P(λ′_k|c′) in formula (7). This excitation parameter can be regarded as the estimate of the excitation parameters of the current clean air-conducted speech.
Having estimated the excitation parameters of the clean air-conducted speech, i.e. the magnitude spectrum of its excitation signal, this magnitude spectrum is combined with the phase spectrum of the excitation of the current air-conducted speech to construct the spectrum of the excitation signal, which is transformed back to the time domain to obtain the reconstructed excitation of the air-conducted speech.
Step 4.4: synthesize the enhanced speech from the excitation of the air-conducted speech obtained in step 4.3 and the clean air-conducted channel parameters obtained in step 4.1: passing the estimated clean excitation signal through the synthesis filter constructed from the estimated clean channel parameters yields the enhanced speech.
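A sketch of the last part of step 4.3 and of step 4.4 for one frame, assuming the mapped excitation magnitude spectrum, the phase of the current air-sensor residual, and the estimated clean LPC coefficients a_1..a_p are already available (SciPy's lfilter implements the all-pole synthesis filter 1/A(z)).

```python
import numpy as np
from scipy.signal import lfilter

def reconstruct_excitation(clean_magnitude, noisy_phase):
    """End of step 4.3: combine the mapped clean magnitude spectrum with the
    phase of the current air-sensor excitation and return to the time domain."""
    return np.fft.irfft(clean_magnitude * np.exp(1j * noisy_phase))

def synthesize_frame(excitation, a_clean):
    """Step 4.4: pass the reconstructed excitation through the synthesis filter
    1/A(z) built from the estimated clean LPC coefficients a_1..a_p."""
    return lfilter([1.0], np.concatenate(([1.0], a_clean)), excitation)
```

In practice consecutive frames would be overlap-added or cross-faded; that detail is omitted here.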
In this embodiment, to shorten the training time of the joint model, synchronously recorded air-conducted and non-air-conducted speech data from multiple speakers are collected in advance and used to train speaker-independent channel-parameter and excitation-parameter joint models. During use, whenever the acoustic noise is below a preset threshold, the traditional MLLR model adaptation technique is used to adjust the joint-model parameters so that they better match a particular speaker.

Claims (5)

1. A voice acquisition device usable in high-noise environments, characterized in that it comprises an air-conduction speech sensor, a non-air-conduction speech sensor, a multi-channel data acquisition module, and a data processing device; the air-conduction speech sensor and the non-air-conduction speech sensor are each connected to the multi-channel data acquisition module, and the multi-channel data acquisition module is connected to the data processing device; wherein the air-conduction and non-air-conduction speech sensors pick up the air-conducted and non-air-conducted speech signals respectively, the multi-channel data acquisition module records the signals from both sensors, and the data processing device processes the data collected by the acquisition module, enhances the speech detected by the air-conduction sensor, and outputs the result.
2. The voice acquisition device usable in high-noise environments according to claim 1, characterized in that the data processing device comprises a noise model estimation module, a joint model correction module, a speech enhancement module, and a joint model training and adaptation module; the air-conduction speech sensor, the non-air-conduction speech sensor, the noise model estimation module, and the speech enhancement module are each connected to the multi-channel data acquisition module; the noise model estimation module, the joint model correction module, and the speech enhancement module are connected in sequence; and the joint model training and adaptation module is connected to the multi-channel data acquisition module and the joint model correction module; wherein the noise model estimation module estimates the noise model of the speech currently detected by the air-conduction sensor, the joint model correction module corrects the parameters of the joint model according to the current noise model, the speech enhancement module enhances the speech detected by the air-conduction sensor using the joint models before and after correction, and the joint model training and adaptation module trains the joint model and adjusts its parameters adaptively online.
3. The voice acquisition device usable in high-noise environments according to claim 1, characterized in that the data processing device comprises a DSP processing chip.
4. The voice acquisition device usable in high-noise environments according to claim 1, characterized in that the multi-channel data acquisition module adopts a multi-channel data acquisition chip.
5. The voice acquisition device usable in high-noise environments according to any one of claims 1 to 4, characterized in that the air-conduction speech sensor is a microphone and the non-air-conduction speech sensor is a throat microphone.
CN 201320107350 2013-03-08 2013-03-08 Voice acquisition device used for noisy environment Expired - Fee Related CN203165457U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201320107350 CN203165457U (en) 2013-03-08 2013-03-08 Voice acquisition device used for noisy environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201320107350 CN203165457U (en) 2013-03-08 2013-03-08 Voice acquisition device used for noisy environment

Publications (1)

Publication Number Publication Date
CN203165457U true CN203165457U (en) 2013-08-28

Family

ID=49026622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201320107350 Expired - Fee Related CN203165457U (en) 2013-03-08 2013-03-08 Voice acquisition device used for noisy environment

Country Status (1)

Country Link
CN (1) CN203165457U (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106024018A (en) * 2015-03-27 2016-10-12 大陆汽车系统公司 Real-time wind buffet noise detection
CN106024018B (en) * 2015-03-27 2022-06-03 大陆汽车系统公司 Real-time wind buffet noise detection
CN109643477A (en) * 2016-08-12 2019-04-16 因滕迪梅公司 Equipment for notification voice alarm etc.
WO2019128140A1 (en) * 2017-12-28 2019-07-04 科大讯飞股份有限公司 Voice denoising method and apparatus, server and storage medium
US11064296B2 (en) 2017-12-28 2021-07-13 Iflytek Co., Ltd. Voice denoising method and apparatus, server and storage medium
CN110390945A (en) * 2019-07-25 2019-10-29 华南理工大学 A kind of dual sensor sound enhancement method and realization device
CN110390945B (en) * 2019-07-25 2021-09-21 华南理工大学 Dual-sensor voice enhancement method and implementation device

Similar Documents

Publication Publication Date Title
CN103208291A (en) Speech enhancement method and device applicable to strong noise environments
CN103229238B (en) System and method for producing an audio signal
CN105513605B (en) The speech-enhancement system and sound enhancement method of mobile microphone
US11024324B2 (en) Methods and devices for RNN-based noise reduction in real-time conferences
US8880396B1 (en) Spectrum reconstruction for automatic speech recognition
CN106373589B (en) A kind of ears mixing voice separation method based on iteration structure
CN102565759B (en) Binaural sound source localization method based on sub-band signal to noise ratio estimation
CN105489227A (en) Hearing device comprising a low-latency sound source separation unit
CN105632512B (en) A kind of dual sensor sound enhancement method and device based on statistical model
CN101625869B (en) Non-air conduction speech enhancement method based on wavelet-packet energy
CN109584903A (en) A kind of multi-person speech separation method based on deep learning
CN106710603A (en) Speech recognition method and system based on linear microphone array
CN105741849A (en) Voice enhancement method for fusing phase estimation and human ear hearing characteristics in digital hearing aid
CN105390142B (en) A kind of digital deaf-aid voice noise removing method
CN203165457U (en) Voice acquisition device used for noisy environment
CN108109617A (en) A kind of remote pickup method
CN103021405A (en) Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter
CN108922541A (en) Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model
CN111916101A (en) Deep learning noise reduction method and system fusing bone vibration sensor and double-microphone signals
CN107346664A (en) A kind of ears speech separating method based on critical band
WO2016042295A1 (en) Speech synthesis from detected speech articulator movement
CN111583936A (en) Intelligent voice elevator control method and device
CN104064196B (en) A kind of method of the raising speech recognition accuracy eliminated based on speech front-end noise
CN111312275B (en) On-line sound source separation enhancement system based on sub-band decomposition
WO2021012403A1 (en) Dual sensor speech enhancement method and implementation device

Legal Events

Date Code Title Description
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130828

Termination date: 20160308