CN101089952A - Method and apparatus for noise suppression, speech spectrum smoothing, speech feature extraction, speech recognition, and speech model training - Google Patents

Publication number
CN101089952A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2006100922461A
Other languages
Chinese (zh)
Other versions
CN101089952B (en)
Inventor
丁沛
何磊
郝杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Application filed by Toshiba Corp
Priority to CN2006100922461A (patent CN101089952B)
Priority to US11/758,855 (publication US20080059163A1)
Publication of CN101089952A
Application granted
Publication of CN101089952B
Legal status: Expired - Fee Related (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit

Abstract

A noise suppression method approximates the confluent hypergeometric function with a piecewise linear function, smooths the speech spectrum components along the time axis and the frequency axis with geometric sequence weights after minimum mean-square error (MMSE) estimation, and adjusts the a priori SNR to control the balance between noise suppression and speech distortion.

Description

Method and apparatus for noise suppression, speech spectrum smoothing, speech feature extraction, speech recognition, and speech model training
Technical field
The present invention relates to speech recognition technology, and to techniques for reducing noise in a speech spectrum and for smoothing a speech spectrum.
Background
Current mainstream speech recognition systems achieve very high recognition accuracy on clean speech. In noisy environments, however, their performance drops sharply, because noise introduces a mismatch between the acoustic model and the acoustic features.
Work on noise robustness has concentrated mainly on front-end design, with the aim of reducing the mismatch that noise introduces in the speech feature space. Minimum mean-square error (MMSE) estimation is a speech enhancement algorithm that effectively suppresses background noise and thereby raises the signal-to-noise ratio (SNR) of the input signal. MMSE estimation is described in detail in Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoustics, Speech, and Signal Processing, Vol. ASSP-32, pp. 1109-1121, 1984. That paper uses MMSE estimation to estimate the short-time spectral amplitude (STSA), proposes a system based on the MMSE STSA estimator, and compares it with widely used systems based on Wiener filtering and on the spectral subtraction algorithm. The entire content of that paper is incorporated herein by reference.
Using MMSE enhancement at the front end helps improve robustness, but three problems remain to be solved within this framework:
1. Computing the confluent hypergeometric function (by summing a Taylor series) is computationally very expensive.
2. Over-suppression of noise leaves extremely low energy in some frequency bands, which degrades recognition performance.
3. The strategy used in MMSE estimation is not optimal for speech recognition.
Summary of the invention
To solve the above problems of the prior art, the present invention provides a noise suppression method, a method of smoothing a speech spectrum, a method of extracting speech features, a speech recognition method, and a method of training a speech model, as well as a noise suppression apparatus, an apparatus for smoothing a speech spectrum, an apparatus for extracting speech features, a speech recognition apparatus, and an apparatus for training a speech model.
According to one aspect of the present invention, there is provided a noise suppression method for a noisy speech spectrum, comprising: performing minimum mean-square error estimation on the noisy speech spectrum according to an estimated noise spectrum, to reduce the noise in the noisy speech spectrum; wherein the minimum mean-square error estimation is performed with a piecewise linear function in place of the confluent hypergeometric function.
According to another aspect of the present invention, there is provided a noise suppression method for a noisy speech spectrum, comprising: performing minimum mean-square error estimation on the noisy speech spectrum according to an a priori signal-to-noise ratio, to reduce the noise in the noisy speech spectrum; and adjusting the a priori signal-to-noise ratio to obtain suitable noise suppression.
According to another aspect of the present invention, there is provided a method for smoothing a speech spectrum, comprising: calculating, using geometric sequence weights, a weighted average of the energy of each spectral component in the speech spectrum and of its adjacent spectral components; and correcting the energy of that spectral component with the calculated weighted average.
According to another aspect of the present invention, there is provided a method for extracting speech features, comprising: transforming noisy speech into a noisy speech spectrum; reducing the noise in the noisy speech spectrum using the noise suppression method described above; and extracting speech features from the noise-reduced speech spectrum.
According to another aspect of the present invention, there is provided a method for extracting speech features, comprising: transforming speech into a speech spectrum; smoothing the speech spectrum using the method of smoothing a speech spectrum described above; and extracting speech features from the smoothed speech spectrum.
According to another aspect of the present invention, there is provided a speech recognition method, comprising: extracting speech features using the method of extracting speech features described above; and recognizing speech according to the extracted speech features.
According to another aspect of the present invention, there is provided a method of training a speech model, comprising: extracting speech features using the method of extracting speech features described above; and training the speech model according to the extracted speech features.
According to another aspect of the present invention, there is provided a speech recognition method, comprising: transforming noisy speech into a noisy speech spectrum; reducing the noise in the noisy speech spectrum using the noise suppression method described above; extracting speech features from the noise-reduced speech spectrum; recognizing the noisy speech according to the extracted speech features; and determining the optimal value of the a priori signal-to-noise ratio according to the speech recognition result.
According to another aspect of the present invention, there is provided a noise suppression apparatus for a noisy speech spectrum, comprising: an estimation unit which performs minimum mean-square error estimation on the noisy speech spectrum according to an estimated noise spectrum, to reduce the noise in the noisy speech spectrum; wherein the estimation unit performs the minimum mean-square error estimation with a piecewise linear function in place of the confluent hypergeometric function.
According to another aspect of the present invention, there is provided a noise suppression apparatus for a noisy speech spectrum, comprising: an estimation unit which performs minimum mean-square error estimation on the noisy speech spectrum according to an a priori signal-to-noise ratio, to reduce the noise in the noisy speech spectrum; and an adjusting unit which adjusts the a priori signal-to-noise ratio to obtain suitable noise suppression.
According to another aspect of the present invention, there is provided an apparatus for smoothing a speech spectrum, comprising: a weight-averaging unit which calculates, using geometric sequence weights, a weighted average of the energies of a plurality of adjacent spectral components in the speech spectrum; and a smooth-correcting unit which corrects the energy of each spectral component in the speech spectrum using the weighted average, calculated by the weight-averaging unit, of the energy of that spectral component and of its adjacent spectral components.
According to another aspect of the present invention, there is provided an apparatus for extracting speech features, comprising: a transforming unit which transforms noisy speech into a noisy speech spectrum; the noise suppression apparatus described above, which reduces the noise in the noisy speech spectrum; and an extracting unit which extracts speech features from the noise-reduced speech spectrum.
According to another aspect of the present invention, there is provided an apparatus for extracting speech features, comprising: a transforming unit which transforms speech into a speech spectrum; the apparatus for smoothing a speech spectrum described above, which smooths the speech spectrum; and an extracting unit which extracts speech features from the smoothed speech spectrum.
According to another aspect of the present invention, there is provided a speech recognition apparatus, comprising: the apparatus for extracting speech features described above, which extracts speech features; and a speech recognition unit which recognizes speech according to the extracted speech features.
According to another aspect of the present invention, there is provided an apparatus for training a speech model, comprising: the apparatus for extracting speech features described above, which extracts speech features; and a model-training unit which trains the speech model according to the extracted speech features.
According to another aspect of the present invention, there is provided a speech recognition apparatus, comprising: a transforming unit which transforms noisy speech into a noisy speech spectrum; the noise suppression apparatus described above, which reduces the noise in the noisy speech spectrum; an extracting unit which extracts speech features from the noise-reduced speech spectrum; a speech recognition unit which recognizes the noisy speech according to the extracted speech features; and a determination unit which determines the optimal value of the a priori signal-to-noise ratio according to the speech recognition result.
Brief description of the drawings
The above features, advantages, and objects of the present invention will be better understood from the following description of specific embodiments in conjunction with the accompanying drawings.
Fig. 1 is a flowchart of a noise suppression method according to one embodiment of the present invention;
Figs. 2A-2D show an example of the process of setting the breakpoints of the piecewise linear function, where Fig. 2A shows the curve of a confluent hypergeometric function, Fig. 2B shows the curve of the derivative of the confluent hypergeometric function, Fig. 2C shows the curve of the difference between the confluent hypergeometric function and the piecewise linear function, and Fig. 2D shows the curve of the piecewise linear function after segmentation;
Fig. 3 is a flowchart of a noise suppression method according to another embodiment of the present invention;
Figs. 4A-4C show an example of controlling the balance between noise suppression and speech distortion, where Fig. 4A shows the initial MMSE-enhanced spectrum with the a priori SNR unadjusted, Fig. 4B shows the speech spectrum obtained by reducing the a priori SNR, and Fig. 4C shows the speech spectrum obtained by increasing the a priori SNR;
Fig. 5 is a flowchart of a method of smoothing a speech spectrum according to another embodiment of the present invention;
Figs. 6A-6B show an example of smoothing a speech spectrum, where Fig. 6A shows the speech spectrum before smoothing and Fig. 6B shows the speech spectrum after smoothing;
Fig. 7 is a flowchart of a method of extracting speech features according to another embodiment of the present invention;
Fig. 8 is a flowchart of a method of extracting speech features according to another embodiment of the present invention;
Fig. 9 is a flowchart of a speech recognition method according to another embodiment of the present invention;
Fig. 10 is a flowchart of a method of training a speech model according to another embodiment of the present invention;
Fig. 11 is a flowchart of a speech recognition method according to another embodiment of the present invention;
Fig. 12 is a block diagram of a noise suppression apparatus according to one embodiment of the present invention;
Fig. 13 is a block diagram of a noise suppression apparatus according to another embodiment of the present invention;
Fig. 14 is a block diagram of an apparatus for smoothing a speech spectrum according to another embodiment of the present invention;
Fig. 15 is a block diagram of an apparatus for extracting speech features according to another embodiment of the present invention;
Fig. 16 is a block diagram of an apparatus for extracting speech features according to another embodiment of the present invention;
Fig. 17 is a block diagram of a speech recognition apparatus according to another embodiment of the present invention;
Fig. 18 is a block diagram of an apparatus for training a speech model according to another embodiment of the present invention; and
Fig. 19 is a block diagram of a speech recognition apparatus according to another embodiment of the present invention.
Embodiments
To make the following embodiments easier to understand, the principle of minimum mean-square error estimation is first briefly introduced.
Minimum mean-square error estimation is a speech enhancement algorithm that uses an estimated spectrum of the background noise to suppress the noise in a noisy speech spectrum. Specifically, minimum mean-square error estimation is performed according to formula (1):
$$\hat{A}_k = C \, \frac{\sqrt{\upsilon_k}}{\gamma_k} \, M(\upsilon_k) \, R_k \qquad (1)$$
where
$$\upsilon_k = \frac{\xi_k}{1 + \xi_k} \, \gamma_k \qquad (2)$$
Here $\hat{A}_k$ denotes the noise-suppressed speech spectrum, $R_k$ denotes the noisy speech spectrum, $C$ is a constant, $\xi_k$ is the a priori signal-to-noise ratio obtained from the estimated noise spectrum, $\gamma_k$ is the a posteriori signal-to-noise ratio obtained from the estimated noise spectrum and the noisy speech spectrum, $M(\upsilon_k)$ is the confluent hypergeometric function, and $k$ denotes the k-th spectral component. For details, see the paper by Y. Ephraim and D. Malah cited above.
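As an illustration of formulas (1) and (2), the following is a minimal Python sketch of the gain that multiplies $R_k$. It rests on assumptions that are not stated in this document: following the cited Ephraim and Malah paper, $M(\upsilon)$ is taken to be Kummer's function $M(-0.5; 1; -\upsilon)$ and the constant $C$ is taken to be $\Gamma(1.5) = \sqrt{\pi}/2$. The function names `kummer_m` and `mmse_gain` are hypothetical. The series summation shows why the direct computation is costly: it needs on the order of a hundred terms per spectral component for moderately large $\upsilon$.

```python
import math

def kummer_m(v, tol=1e-15, max_terms=2000):
    """Assumed form of M(v): Kummer's function M(-0.5; 1; -v) for v >= 0.

    Kummer's transformation M(a; b; -v) = exp(-v) * M(b - a; b; v) is used so
    that every term of the Taylor series is positive (no cancellation).
    Intended for moderate v (roughly v < 700, where exp(v) would overflow).
    """
    total = term = 1.0
    for n in range(max_terms):
        # Ratio of consecutive terms of M(1.5; 1; v).
        term *= (1.5 + n) / (1.0 + n) * v / (n + 1.0)
        total += term
        if term < tol * total:
            break
    return math.exp(-v) * total

def mmse_gain(xi, gamma, c=math.sqrt(math.pi) / 2.0):
    """Gain applied to R_k in formula (1); c = Gamma(1.5) is an assumption."""
    v = xi / (1.0 + xi) * gamma            # formula (2)
    return c * math.sqrt(v) / gamma * kummer_m(v)

# The enhanced amplitude is the gain times the noisy amplitude R_k.
noisy_amplitude = 2.0
enhanced = mmse_gain(xi=1.0, gamma=2.0) * noisy_amplitude
```

At high SNR this gain approaches the Wiener gain $\xi_k / (1 + \xi_k)$, which gives a quick sanity check on the sketch.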
Embodiments of the present invention are described in detail below with reference to the drawings.
Fig. 1 is a flowchart of a noise suppression method according to one embodiment of the present invention. As shown in Fig. 1, first, in step 101, a noisy speech spectrum is input. The noisy speech spectrum is obtained, for example by fast Fourier transform, from audio data containing both background noise and speech, and is therefore a spectrum in which background noise and speech are superimposed.
Then, in step 105, minimum mean-square error estimation is performed on the noisy speech spectrum according to a previously estimated noise spectrum. The estimated noise spectrum is obtained in advance from background noise that contains no speech. There are many ways to obtain it, for example by averaging background noise spectra collected several times, and the present invention places no particular restriction on this. Specifically, minimum mean-square error estimation is performed according to formulas (1) and (2) above, with a piecewise linear function replacing the confluent hypergeometric function $M(\upsilon_k)$ in formula (1). The resulting formula is:
$$\hat{A}_k = C \, \frac{\sqrt{\upsilon_k}}{\gamma_k} \, L(\upsilon_k) \, R_k \qquad (3)$$
where $\hat{A}_k$ denotes the noise-suppressed speech spectrum, $R_k$ denotes the noisy speech spectrum, $C$ is a constant, $\upsilon_k$ is as defined in formula (2), $\xi_k$ is the a priori signal-to-noise ratio obtained from the estimated noise spectrum, $\gamma_k$ is the a posteriori signal-to-noise ratio obtained from the estimated noise spectrum and the noisy speech spectrum, $L(\upsilon_k)$ is the piecewise linear function, and $k$ denotes the k-th spectral component.
In this embodiment, a piecewise linear function $L(\upsilon_k)$ with preset breakpoints can be used to approximate the confluent hypergeometric function $M(\upsilon_k)$. For example, the approximation can be constructed by the following steps.
Specifically, Figs. 2A-2D show an example of the process of setting the breakpoints of the piecewise linear function, where Fig. 2A shows the curve of a confluent hypergeometric function h(v), Fig. 2B shows the curve of the derivative of the confluent hypergeometric function, Fig. 2C shows the curve of the difference between the confluent hypergeometric function and the piecewise linear function, and Fig. 2D shows the curve of the piecewise linear function pwlf(v) after segmentation. The segmentation process is as follows.
First, the derivative of the confluent hypergeometric function h(v) is calculated, as shown in Fig. 2B. For convenience, this example considers the portion of the curve where the derivative value lies in the range 0.05-0.50.
Then, the initial breakpoints of the piecewise linear function pwlf(v) are set, as shown in Fig. 2B. In this example, the initial breakpoints are placed where the derivative value is 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, and 0.45.
Then, between each pair of consecutive initial breakpoints, the difference between the piecewise linear function pwlf(v) and the confluent hypergeometric function h(v) is calculated, as shown in Fig. 2C.
Then, the difference calculated between each pair of consecutive breakpoints is compared with a preset threshold, which in this example is 0.037. If the difference between two consecutive breakpoints, for example between breakpoints 0.10 and 0.15, exceeds 0.037, a new breakpoint is inserted between them, for example at their midpoint.
The difference calculation and the subsequent steps are repeated until no difference exceeds the threshold. This yields the piecewise linear function shown in Fig. 2D.
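The refinement loop described above can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the procedure of the patent figures: h(v) is assumed to be Kummer's function $M(-0.5; 1; -v)$ as in the cited Ephraim and Malah paper, the initial breakpoints are given directly as values of v rather than being chosen from derivative values, and the difference within a segment is measured on a small grid of sample points. All function names are hypothetical.

```python
import bisect
import math

def h(v, tol=1e-15, max_terms=2000):
    """Assumed target function: Kummer's M(-0.5; 1; -v), v >= 0 (Taylor series)."""
    total = term = 1.0
    for n in range(max_terms):
        term *= (1.5 + n) / (1.0 + n) * v / (n + 1.0)
        total += term
        if term < tol * total:
            break
    return math.exp(-v) * total

def refine_breakpoints(f, points, threshold, samples=16):
    """Insert midpoints until the chord-to-curve difference is <= threshold."""
    points = sorted(points)
    changed = True
    while changed:
        changed = False
        refined = [points[0]]
        for a, b in zip(points, points[1:]):
            fa, fb = f(a), f(b)
            # Largest sampled difference between the chord and f on (a, b).
            diff = max(
                abs(fa + (fb - fa) * (x - a) / (b - a) - f(x))
                for x in (a + (b - a) * i / samples for i in range(1, samples))
            )
            if diff > threshold:
                refined.append(0.5 * (a + b))   # new breakpoint at the midpoint
                changed = True
            refined.append(b)
        points = refined
    return points

def make_pwlf(f, points):
    """Return pwlf(v): linear interpolation of f between the breakpoints."""
    values = [f(p) for p in points]             # computed once, offline
    def pwlf(v):
        i = min(max(bisect.bisect_right(points, v) - 1, 0), len(points) - 2)
        a, b = points[i], points[i + 1]
        return values[i] + (values[i + 1] - values[i]) * (v - a) / (b - a)
    return pwlf

breakpoints = refine_breakpoints(h, [0.0, 10.0, 20.0, 40.0], threshold=0.037)
pwlf = make_pwlf(h, breakpoints)
```

Once the breakpoints and their function values are tabulated, each evaluation of `pwlf` costs one binary search and one linear interpolation instead of a series summation, which is the source of the savings.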
Returning to Fig. 1, after minimum mean-square error estimation has been performed with the piecewise linear function $L(\upsilon_k)$ in place of the confluent hypergeometric function $M(\upsilon_k)$, in step 110 the speech spectrum whose noise has been reduced by MMSE estimation is output.
With the noise suppression method of this embodiment, replacing the confluent hypergeometric function with a piecewise linear function greatly reduces the computational cost of MMSE estimation while maintaining noise suppression performance.
Under the same inventive concept, Fig. 3 is a flowchart of a noise suppression method according to another embodiment of the present invention. This embodiment is described below with reference to that figure. Descriptions of parts identical to the preceding embodiment are omitted where appropriate.
As shown in Fig. 3, first, in step 301, a noisy speech spectrum is input. The noisy speech spectrum contains background noise and speech.
Then, in step 305, minimum mean-square error estimation is performed on the noisy speech spectrum. Specifically, in this embodiment the a priori signal-to-noise ratio $\xi$ in formula (2) above is replaced by $a\xi$; that is, minimum mean-square error estimation is performed according to formula (1) and formula (4):
$$\upsilon_k = \frac{a \xi_k}{1 + a \xi_k} \, \gamma_k \qquad (4)$$
Similarly, in this embodiment the piecewise linear function $L(\upsilon_k)$ can also be used in place of the confluent hypergeometric function $M(\upsilon_k)$; that is, minimum mean-square error estimation can be performed according to formula (3) and formula (4).
Then, in step 310, the speech spectrum whose noise has been reduced by MMSE estimation is output.
Then, in step 315, it is judged whether the speech spectrum is optimal, that is, whether noise reduction and speech distortion have reached the best balance. If the speech spectrum is optimal, the process ends in step 320. If it is not, the coefficient a is adjusted and the process returns to step 305 to perform MMSE estimation again, until a satisfactory result is reached.
Specifically, Figs. 4A-4C show an example of controlling the balance between noise suppression and speech distortion, where Fig. 4A shows the initial MMSE-enhanced spectrum with the a priori SNR unadjusted, Fig. 4B shows the speech spectrum obtained by reducing the a priori SNR, and Fig. 4C shows the speech spectrum obtained by increasing the a priori SNR.
As the figures show, reducing the coefficient a, that is, reducing the a priori signal-to-noise ratio $\xi$, increases noise suppression but also increases speech distortion, as shown in Fig. 4B. Conversely, increasing the coefficient a, that is, increasing the a priori signal-to-noise ratio $\xi$, reduces noise suppression and also reduces speech distortion, as shown in Fig. 4C. The criterion for judging whether the adjustment is suitable is the recognition accuracy: if the recognition accuracy exceeds a preset threshold, the adjustment is complete.
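The effect of the coefficient a in formula (4), and the adjustment loop of steps 305-315, can be sketched in Python as follows. The sketch makes the same assumptions as the cited Ephraim and Malah paper for the form of $M(\upsilon)$ and for the constant $C$, which this document does not spell out. The function `tune_scale` and its `accuracy_fn` callback are hypothetical: the callback stands in for running the recognizer on spectra enhanced with a given a and returning the recognition accuracy.

```python
import math

def kummer_m(v, tol=1e-15, max_terms=2000):
    """Assumed form of M(v): Kummer's M(-0.5; 1; -v), v >= 0 (Taylor series)."""
    total = term = 1.0
    for n in range(max_terms):
        term *= (1.5 + n) / (1.0 + n) * v / (n + 1.0)
        total += term
        if term < tol * total:
            break
    return math.exp(-v) * total

def mmse_gain(xi, gamma, a=1.0):
    """Gain of formula (1) with v_k from formula (4); a scales the a priori SNR."""
    xi_adj = a * xi
    v = xi_adj / (1.0 + xi_adj) * gamma
    return (math.sqrt(math.pi) / 2.0) * math.sqrt(v) / gamma * kummer_m(v)

def tune_scale(candidates, accuracy_fn, target):
    """Try each candidate a; stop at the first whose accuracy exceeds target.

    accuracy_fn is a hypothetical callback returning recognition accuracy
    for a given a. Returns the best (a, accuracy) pair seen.
    """
    best_a, best_acc = None, -1.0
    for a in candidates:
        acc = accuracy_fn(a)
        if acc > best_acc:
            best_a, best_acc = a, acc
        if acc > target:
            break
    return best_a, best_acc

# A smaller a gives a smaller gain, i.e. stronger suppression.
gains = [mmse_gain(xi=1.0, gamma=2.0, a=a) for a in (0.5, 1.0, 2.0)]
```

In this sketch the candidate values of a are simply scanned in order; any other search strategy over a would fit the flow of Fig. 3 equally well.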
As the above description shows, the noise suppression method of this embodiment adjusts the a priori signal-to-noise ratio by using $a\xi$ in place of $\xi$, and can therefore control the balance between noise reduction and speech distortion to reach a satisfactory result.
In addition, the noise suppression method of this embodiment can also replace the confluent hypergeometric function with the piecewise linear function of the noise suppression method described earlier, which greatly reduces the computational cost of MMSE estimation while maintaining noise suppression performance.
Under the same inventive concept, Fig. 5 is a flowchart of a method of smoothing a speech spectrum according to another embodiment of the present invention. This embodiment is described below with reference to that figure. Descriptions of parts identical to the preceding embodiments are omitted where appropriate.
As shown in Fig. 5, first, in step 501, a speech spectrum is input, for example a clean speech spectrum, the noisy speech spectrum of the above embodiments, or a speech spectrum on which noise suppression has been performed by the above embodiments. This embodiment places no particular restriction on the speech spectrum.
Then, in step 505, the input speech spectrum is smoothed using geometric sequence weights. For each spectral component of the speech spectrum, the weighted average of its energy and the energy of its adjacent spectral components is taken as its energy, where the weights form a geometric sequence.
Specifically, Figs. 6A-6B show an example of smoothing a speech spectrum, where Fig. 6A shows the speech spectrum before smoothing and Fig. 6B shows the speech spectrum after smoothing. For example, in Fig. 6A, the spectral component at time t=10 and frequency k=30 is smoothed, where E(10,30) denotes the energy of that spectral component. Any of the following three smoothing methods can be used:
(1) Along the time axis: for each frequency, the weighted average of the energy of each frame and of its adjacent frames is taken as the energy at that frequency and frame. For example, for frequency k=30, the energy of the spectral component at frame t=10 is smoothed to:
$$E(10,30) = \frac{d_1 E(10,30) + d_2 E(9,30) + d_2 E(11,30) + d_3 E(8,30) + d_3 E(12,30) + \dots}{d_1 + 2 d_2 + 2 d_3 + \dots}$$
where $d_1, d_2, d_3, \dots$ are decreasing geometric sequence weights. The spectral components of the other frames are smoothed in the same way.
(2) Along the frequency axis: for each frame, the weighted average of the energy of each frequency and of its adjacent frequencies is taken as the energy at that frame and frequency. For example, for frame t=10, the energy of the spectral component at frequency k=30 is smoothed to:
$$E(10,30) = \frac{d_1 E(10,30) + d_2 E(10,29) + d_2 E(10,31) + d_3 E(10,28) + d_3 E(10,32) + \dots}{d_1 + 2 d_2 + 2 d_3 + \dots}$$
where $d_1, d_2, d_3, \dots$ are decreasing geometric sequence weights. The spectral components of the other frequencies are smoothed in the same way.
(3) Along the time axis and the frequency axis simultaneously: the weighted average of the energy of each frequency and frame and of its adjacent frequencies and adjacent frames is taken as the energy at that frequency and frame. For example, the energy of the spectral component at frame t=10 and frequency k=30 is smoothed to:
$$E(10,30) = \frac{d_1 E(10,30) + d_2 E(9,30) + d_2 E(11,30) + d_2 E(10,29) + d_2 E(10,31) + d_3 E(8,30) + d_3 E(12,30) + d_3 E(10,28) + d_3 E(10,32) + \dots}{d_1 + 4 d_2 + 4 d_3 + \dots}$$
where $d_1, d_2, d_3, \dots$ are decreasing geometric sequence weights. The spectral components of the other frequencies and frames are smoothed in the same way. In addition, different geometric sequence weights can be used for the frequency domain and the time domain.
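The three smoothing variants can be sketched in Python as a single function. This is an illustrative sketch, with several assumptions that the text leaves open: the weights are taken as $d_n = r^{n-1}$ with a hypothetical ratio r = 0.5, the sum is truncated after a hypothetical `reach` of two neighbors on each side, and neighbors that fall outside the spectrogram are skipped, with the denominator reduced accordingly. The spectrogram `E` is indexed as `E[t][k]`.

```python
def smooth_spectrogram(E, ratio=0.5, reach=2, axes=("time", "freq")):
    """Smooth energies E[t][k] with decreasing geometric sequence weights.

    axes=("time",) gives variant (1), axes=("freq",) gives variant (2), and
    axes=("time", "freq") gives variant (3). ratio, reach, and the handling
    of the spectrogram edges are assumptions made for this sketch.
    """
    n_frames, n_bins = len(E), len(E[0])
    d = [ratio ** n for n in range(reach + 1)]      # d[0]=d_1, d[1]=d_2, ...
    out = [[0.0] * n_bins for _ in range(n_frames)]
    for t in range(n_frames):
        for k in range(n_bins):
            num, den = d[0] * E[t][k], d[0]
            for n in range(1, reach + 1):
                neighbours = []
                if "time" in axes:
                    neighbours += [(t - n, k), (t + n, k)]
                if "freq" in axes:
                    neighbours += [(t, k - n), (t, k + n)]
                for tt, kk in neighbours:
                    if 0 <= tt < n_frames and 0 <= kk < n_bins:
                        num += d[n] * E[tt][kk]
                        den += d[n]
            out[t][k] = num / den                   # reads E, writes out
    return out

# A single high-energy component surrounded by empty components.
spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 1.0
both = smooth_spectrogram(spike)
time_only = smooth_spectrogram(spike, axes=("time",))
```

The sketch writes into a separate output array so that every average is computed from the unsmoothed energies; smoothing in place would make the result depend on the order in which components are visited.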
Fig. 6 B shows the speech manual after level and smooth, and the spectrum energy after level and smooth as can be seen can play the effect of increase for the energy of the low spectral component of original energy level.
Turn back to Fig. 5, after the speech manual that utilizes geometry ordered series of numbers weights to input carries out smoothly, in step 510, the speech manual behind the output smoothing.
By above explanation as can be known, because the method for the smoothing speech manual of present embodiment utilizes the energy weighted mean of adjacent spectral component to come level and smooth each spectral component, for the low spectral component of original energy level, the energy of adjacent spectral component is inserted like this, thereby can be improved the quality of speech manual.
Under same inventive concept, Fig. 7 is the process flow diagram of the method for extraction phonetic feature according to another embodiment of the invention.Below just in conjunction with this figure, present embodiment is described.For those parts identical, suitably omit its explanation with front embodiment.
As shown in Figure 7, at first, in step 701, input contains the noise voice, and this contains the noise voice packet and draws together voice and the ground unrest that the speaker says.
Then, in step 705, the described noise phonetic modification that contains is become to contain the noise speech manual, for example (Fast Fourier Transform FFT) becomes the phonetic modification on the time domain speech manual on the frequency domain by fast fourier transform.
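The transform of step 705 can be illustrated with a short Python sketch that computes the magnitude spectrum of one frame. For brevity it uses a naive O(N²) discrete Fourier transform rather than an FFT; an FFT computes the same values faster. Framing and windowing of the speech signal are omitted, and the function name `magnitude_spectrum` is hypothetical.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of a real-valued time-domain frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

# A 32-sample frame holding exactly four cycles of a sine wave.
frame = [math.sin(2.0 * math.pi * 4 * i / 32) for i in range(32)]
spectrum = magnitude_spectrum(frame)
peak_bin = spectrum.index(max(spectrum))
```

For this test frame the energy is concentrated in bin 4, the bin that matches the four cycles in the frame.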
Then, in step 710, according to the described noise suppressing method of the embodiment of Fig. 1 and Fig. 2, reduce the described noise that contains the noise speech manual above utilizing.Described noise suppressing method is to carry out the least mean-square error estimation according to above-mentioned formula (3) and formula (2), wherein, utilizes piecewise linear function to replace confluent hypergeometric function.Identical in concrete noise reduction process and the foregoing description do not repeat them here.
In addition, according to the described noise suppressing method of the embodiment of Fig. 3 and Fig. 4, reduce the described noise that contains the noise speech manual above also can utilizing.Described noise suppressing method is to carry out least mean-square error according to above-mentioned formula (1) and formula (4) or formula (3) and formula (4) to estimate, wherein, utilizes a ξ to replace priori signal to noise ratio (S/N ratio) ξ.Identical in concrete noise reduction process and the foregoing description do not repeat them here.
At last, in step 715, from the speech manual that noise reduces, extract phonetic feature.Particularly, can pass through Mel frequency cepstral coefficient (Mel Frequency ceptral Coefficient, MFCC) or linear prediction cepstrum coefficient (Linear Predictive Cepstral Coefficient, LPCC) etc. conventional method is extracted phonetic feature, and the present invention is not particularly limited this.
By above explanation as can be known, because the method for the extraction phonetic feature of present embodiment can be before extracting phonetic feature from contain the noise speech manual, carry out least mean-square error by above-mentioned formula (3) and formula (2) and estimate to reduce noise, wherein utilize piecewise linear function to replace confluent hypergeometric function, greatly reduced the calculated amount that MMSE estimates, keep the squelch performance simultaneously, thereby can improve the quality of phonetic feature.
In addition, the method of the extraction phonetic feature of present embodiment also can be before extracting phonetic feature from contain the noise speech manual, carry out least mean-square error by above-mentioned formula (1) and formula (4) and estimate to reduce noise, wherein utilize a ξ to replace priori signal to noise ratio (S/N ratio) ξ that priori signal to noise ratio (S/N ratio) ξ is adjusted the balance of controlling between noise reduction and the voice distortion, thereby can improve the quality of phonetic feature.
In addition, present embodiment also can carry out least mean-square error by formula (3) and formula (4) and estimate to reduce noise, thereby not only can reduce the calculated amount that MMSE estimates, can control the balance between noise reduction and the voice distortion simultaneously.Therefore, can improve the quality of phonetic feature.
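The three stages of Fig. 7 (transform, suppress, extract) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, a naive DFT stands in for the FFT, `wiener_like_gain` is a simple placeholder for the MMSE estimator of the earlier embodiments, and band log-energies stand in for MFCC or LPCC features.

```python
import cmath
import math

def dft_magnitude(frame):
    """Magnitude spectrum of one frame via a naive DFT (an FFT would be used in practice)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def wiener_like_gain(r, noise):
    """Placeholder per-bin gain (NOT the patent's MMSE estimator): 1 - noise^2 / r^2, floored at 0."""
    power = r * r
    if power == 0.0:
        return 0.0
    return max(0.0, 1.0 - (noise * noise) / power)

def suppress_noise(noisy_spec, noise_spec, gain_fn):
    """Scale each spectral component by the gain computed from it and the noise estimate."""
    return [gain_fn(r, n) * r for r, n in zip(noisy_spec, noise_spec)]

def log_energy_features(spec, n_bands=4):
    """Toy feature vector: log energy in equal-width bands (stand-in for MFCC/LPCC)."""
    width = max(1, len(spec) // n_bands)
    bands = [spec[i:i + width] for i in range(0, width * n_bands, width)]
    return [math.log(sum(x * x for x in band) + 1e-12) for band in bands]

def extract_features(frame, noise_spec, gain_fn):
    spec = dft_magnitude(frame)                          # step 705: time domain -> spectrum
    reduced = suppress_noise(spec, noise_spec, gain_fn)  # step 710: noise reduction
    return log_energy_features(reduced)                  # step 715: feature extraction
```

Keeping the gain function as a parameter mirrors the text: the same pipeline can run with either of the noise suppression variants described above.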
Under the same inventive concept, Fig. 8 is a flowchart of a method for extracting speech features according to another embodiment of the invention. The embodiment is described below with reference to this figure. Description of parts identical to the preceding embodiments is omitted where appropriate.
As shown in Fig. 8, first, in step 801, speech is input, for example clean speech or noisy speech; this embodiment places no particular limitation on the speech.
Then, in step 805, the speech is transformed into a speech spectrum, for example by using a fast Fourier transform (FFT) to convert the time-domain speech into a frequency-domain speech spectrum. If the speech contains noise, the noise suppression methods of the embodiments above can be applied to the transformed speech spectrum.
Then, in step 810, the speech spectrum is smoothed using the spectrum smoothing method described above. Specifically, any one of the three smoothing methods described above, or a combination of them, can be used. The smoothing procedure is the same as in the embodiments above and is not repeated here.
Finally, in step 815, speech features are extracted from the smoothed speech spectrum. Specifically, the features can be extracted by a conventional method such as Mel-frequency cepstral coefficients (MFCC) or linear predictive cepstral coefficients (LPCC); the invention places no particular limitation on this.
As is clear from the above, before extracting speech features from the speech spectrum, the method of this embodiment uses the spectrum smoothing method of the embodiments above to fold a weighted average of the energy of adjacent spectral components into each spectral component. Spectral components whose original energy level is low are thus filled in with energy from their neighbors, which improves the quality of the speech spectrum and therefore the quality of the speech features.
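As a rough illustration of the smoothing idea, the sketch below replaces each component's energy by a weighted mean of itself and its neighbors along the frequency axis. The weight form `ratio ** |offset|` (a geometric sequence), the neighborhood `radius`, and the edge handling are assumptions made for the example, not details taken from the earlier embodiments.

```python
def smooth_spectrum(energy, ratio=0.5, radius=2):
    """Weighted mean of each bin and its neighbours, with geometric weights ratio**|offset|.

    Weights are normalised over the bins that actually exist, so edge bins
    are averaged over a shorter neighbourhood.
    """
    n = len(energy)
    smoothed = []
    for k in range(n):
        num = 0.0
        den = 0.0
        for offset in range(-radius, radius + 1):
            j = k + offset
            if 0 <= j < n:
                w = ratio ** abs(offset)
                num += w * energy[j]
                den += w
        smoothed.append(num / den)
    return smoothed
```

With `ratio=0.5` and `radius=1`, an isolated peak `[0, 0, 4, 0, 0]` becomes `[0, 1, 2, 1, 0]`: the low-energy neighbors receive part of the peak's energy.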
In addition, in this embodiment, if the speech contains noise, the noise suppression method of the embodiment described above with reference to Figs. 1 and 2 can be used. Noise is reduced by MMSE estimation according to formula (3) and formula (2), with a piecewise linear function replacing the confluent hypergeometric function. This greatly reduces the computational cost of MMSE estimation while maintaining noise suppression performance, and thereby improves the quality of the speech features.
In addition, in this embodiment, if the speech contains noise, the noise suppression method of the embodiment described above with reference to Figs. 3 and 4 can also be used. Noise is reduced by MMSE estimation according to formula (1) and formula (4), in which aξ replaces the a priori SNR ξ. Adjusting ξ in this way controls the trade-off between noise reduction and speech distortion, which improves the quality of the speech features.
In addition, this embodiment can also reduce noise by MMSE estimation according to formula (3) and formula (4), which both reduces the computational cost of MMSE estimation and controls the trade-off between noise reduction and speech distortion. The quality of the speech features is thereby improved.
Under the same inventive concept, Fig. 9 is a flowchart of a speech recognition method according to another embodiment of the invention. The embodiment is described below with reference to this figure. Description of parts identical to the preceding embodiments is omitted where appropriate.
As shown in Fig. 9, first, in step 901, speech features are extracted using the method for extracting speech features described above with reference to the embodiment of Fig. 7 or Fig. 8. The extraction procedure is the same as in those embodiments and is not repeated here.
Then, in step 905, speech recognition is performed on the extracted speech features. Specifically, for example, the extracted features are compared with templates trained in advance to identify the content of the speech; the invention places no particular limitation on this.
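One very simple way to compare extracted features with pre-trained templates is a nearest-template search. The sketch below is only an illustration: the Euclidean distance, the fixed-length feature vectors, and the `templates` dictionary are assumptions, since the text leaves the recognition method open.

```python
import math

def recognize(features, templates):
    """Return the label of the template closest to `features` (Euclidean distance)."""
    best_label = None
    best_dist = math.inf
    for label, reference in templates.items():
        dist = math.sqrt(sum((f - r) ** 2 for f, r in zip(features, reference)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

For example, with templates `{"yes": [1.0, 0.0], "no": [0.0, 1.0]}`, the feature vector `[0.9, 0.2]` is labeled `"yes"`.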
As is clear from the above, in the speech recognition method of this embodiment, before speech features are extracted from the speech spectrum, the spectrum smoothing method of the embodiments above can be used to fold a weighted average of the energy of adjacent spectral components into each spectral component. Spectral components whose original energy level is low are thus filled in with energy from their neighbors, which improves the quality of the speech spectrum and therefore the performance of speech recognition.
In addition, in this embodiment, if the speech contains noise, noise can be reduced before speech features are extracted from the noisy speech spectrum by MMSE estimation according to formula (3) and formula (2), with a piecewise linear function replacing the confluent hypergeometric function. This greatly reduces the computational cost of MMSE estimation while maintaining noise suppression performance, and thereby improves the performance of speech recognition.
Alternatively, before speech features are extracted from the noisy speech spectrum, the speech recognition method of this embodiment can reduce noise by MMSE estimation according to formula (1) and formula (4), in which aξ replaces the a priori SNR ξ. Adjusting ξ in this way controls the trade-off between noise reduction and speech distortion, which improves the performance of speech recognition.
In addition, the speech recognition method of this embodiment can also reduce noise by MMSE estimation according to formula (3) and formula (4), which both reduces the computational cost of MMSE estimation and controls the trade-off between noise reduction and speech distortion. The performance of speech recognition is thereby improved.
Under the same inventive concept, Fig. 10 is a flowchart of a method for training a speech model according to another embodiment of the invention. The embodiment is described below with reference to this figure. Description of parts identical to the preceding embodiments is omitted where appropriate.
As shown in Fig. 10, first, in step 1001, speech features are extracted using the method for extracting speech features described above with reference to the embodiment of Fig. 7 or Fig. 8. The extraction procedure is the same as in those embodiments and is not repeated here.
Then, in step 1005, the speech model is trained on the extracted speech features.
As is clear from the above, in the method for training a speech model of this embodiment, before speech features are extracted from the speech spectrum, the spectrum smoothing method of the embodiments above can be used to fold a weighted average of the energy of adjacent spectral components into each spectral component. Spectral components whose original energy level is low are thus filled in with energy from their neighbors, which improves the quality of the speech spectrum and therefore the quality of the trained model.
In addition, in this embodiment, if the speech contains noise, noise can be reduced before speech features are extracted from the noisy speech spectrum by MMSE estimation according to formula (3) and formula (2), with a piecewise linear function replacing the confluent hypergeometric function. This greatly reduces the computational cost of MMSE estimation while maintaining noise suppression performance, and thereby improves the quality of the trained model.
Alternatively, before speech features are extracted from the noisy speech spectrum, the method for training a speech model of this embodiment can reduce noise by MMSE estimation according to formula (1) and formula (4), in which aξ replaces the a priori SNR ξ. Adjusting ξ in this way controls the trade-off between noise reduction and speech distortion, which improves the quality of the trained model.
In addition, the method for training a speech model of this embodiment can also reduce noise by MMSE estimation according to formula (3) and formula (4), which both reduces the computational cost of MMSE estimation and controls the trade-off between noise reduction and speech distortion. The quality of the trained model is thereby improved.
Under the same inventive concept, Fig. 11 is a flowchart of a speech recognition method according to another embodiment of the invention. The embodiment is described below with reference to this figure. Description of parts identical to the preceding embodiments is omitted where appropriate.
As shown in Fig. 11, first, in step 1101, noisy speech is input. The noisy speech comprises the speech uttered by the speaker and background noise.
Then, in step 1105, the noisy speech is transformed into a noisy speech spectrum, for example by using a fast Fourier transform (FFT) to convert the time-domain speech into a frequency-domain speech spectrum.
Then, in step 1110, the noise in the noisy speech spectrum is reduced using the noise suppression method described above with reference to the embodiment of Figs. 3 and 4. That method performs MMSE estimation according to formula (1) and formula (4), or formula (3) and formula (4). The noise reduction procedure is the same as in that embodiment and is not repeated here.
Then, in step 1115, speech features are extracted from the noise-reduced speech spectrum. Specifically, the features can be extracted by a conventional method such as Mel-frequency cepstral coefficients (MFCC) or linear predictive cepstral coefficients (LPCC); the invention places no particular limitation on this.
Then, in step 1120, speech recognition is performed on the extracted speech features. Specifically, for example, the extracted features are compared with templates trained in advance to obtain the content of the speech; the invention places no particular limitation on this.
Then, in step 1125, whether the speech recognition result is optimal is judged from the recognition accuracy, that is, whether the recognition accuracy exceeds a preset threshold. If the result is judged optimal, the process ends in step 1130. If not, the coefficient a is adjusted according to the speech recognition result, and the process returns to step 1110 to continue MMSE estimation until a satisfactory result is reached. The adjustment procedure is as described above with reference to the embodiment of Figs. 3 and 4 and is not repeated here.
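The feedback loop of steps 1110 to 1130 can be sketched as a search over the coefficient a. The following is a minimal sketch under assumptions: `evaluate` is a hypothetical callback that runs noise suppression, feature extraction and recognition for a given a and returns the recognition accuracy, and the candidate list is a simple stand-in for the adjustment rule of Figs. 3 and 4, which is not reproduced here.

```python
def tune_coefficient(evaluate, candidates, threshold):
    """Try values of a in order; stop at the first whose accuracy exceeds `threshold`.

    Returns (a, accuracy). If no candidate passes, the best one seen is returned.
    """
    best_a = None
    best_acc = float("-inf")
    for a in candidates:
        acc = evaluate(a)        # steps 1110-1120: suppress, extract, recognise
        if acc > best_acc:
            best_a, best_acc = a, acc
        if acc > threshold:      # step 1125: result judged good enough
            break                # step 1130: finish
    return best_a, best_acc
```

Returning the best value seen, rather than failing, is a design choice of this sketch so that the caller always gets a usable coefficient.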
As is clear from the above, the speech recognition method of this embodiment adjusts the MMSE estimation according to the speech recognition result, and thereby improves the performance of speech recognition.
Under the same inventive concept, Fig. 12 is a block diagram of a noise suppression device according to an embodiment of the invention. The embodiment is described below with reference to this figure. Description of parts identical to the preceding embodiments is omitted where appropriate.
As shown in Fig. 12, the noise suppression device 1200 for a noisy speech spectrum of this embodiment comprises a minimum mean-square error (MMSE) estimation unit 1201, which performs MMSE estimation on the noisy speech spectrum according to an estimated noise spectrum, to reduce the noise in the noisy speech spectrum. The MMSE estimation unit 1201 uses a piecewise linear function in place of the confluent hypergeometric function and performs MMSE estimation according to formula (3) and formula (2) above. The details are the same as in the description of the noise suppression method in the embodiment of Figs. 1 and 2 and are not repeated here.
The noise suppression device 1200 of this embodiment may further comprise a segmentation point storing unit 1205 for storing the segmentation points of the piecewise linear function, and a noise estimate storing unit 1210 for storing a noise estimate obtained by estimating the background noise in advance. Alternatively, the noise estimate can be supplied to the MMSE estimation unit 1201 from outside.
As is clear from the above, because the noise suppression device 1200 of this embodiment uses a piecewise linear function in place of the confluent hypergeometric function, it greatly reduces the computational cost of MMSE estimation while maintaining noise suppression performance.
Under the same inventive concept, Fig. 13 is a block diagram of a noise suppression device according to another embodiment of the invention. The embodiment is described below with reference to this figure. Description of parts identical to the preceding embodiments is omitted where appropriate.
As shown in Fig. 13, the noise suppression device 1300 for a noisy speech spectrum of this embodiment comprises: a minimum mean-square error (MMSE) estimation unit 1301, which performs MMSE estimation on the noisy speech spectrum according to the a priori SNR, to reduce the noise in the noisy speech spectrum; and an adjusting unit 1305, which adjusts the a priori SNR to obtain suitable noise suppression. The details are the same as in the description of the noise suppression method in the embodiment of Figs. 3 and 4 and are not repeated here.
As is clear from the above, because the noise suppression device 1300 of this embodiment can adjust the a priori SNR, it can control the trade-off between noise reduction and speech distortion and thereby achieve satisfactory noise suppression.
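The effect of the adjustment can be illustrated numerically. The sketch below assumes the standard MMSE short-time spectral amplitude gain, Γ(1.5)·√υ/γ·M(−0.5; 1; −υ), and assumes that the adjustment simply substitutes aξ for ξ inside υ = aξ/(1 + aξ)·γ; formulas (1) and (4) themselves are not reproduced in this part of the text, so both forms are assumptions. The helper names are hypothetical.

```python
import math

def kummer_m(v, terms=200):
    """M(-0.5; 1; -v) for moderate v >= 0.

    Uses Kummer's transformation M(a; b; -v) = exp(-v) * M(b - a; b; v),
    so the series has only positive terms and no cancellation.
    """
    term = 1.0
    total = 1.0
    for n in range(terms):
        term *= (1.5 + n) * v / ((1.0 + n) * (n + 1))
        total += term
        if term < 1e-16 * total:
            break
    return math.exp(-v) * total

def mmse_gain(xi, gamma, a=1.0):
    """Spectral gain with the a priori SNR xi scaled by the coefficient a (assumed form)."""
    xi_adj = a * xi
    v = xi_adj / (1.0 + xi_adj) * gamma
    return math.gamma(1.5) * math.sqrt(v) / gamma * kummer_m(v)
```

Under these assumptions the gain grows with a: a smaller a attenuates each component more strongly (more noise reduction, more risk of speech distortion), and a larger a attenuates less.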
In addition, the noise suppression device 1300 of this embodiment can also perform MMSE estimation with the piecewise linear function of the noise suppression method above in place of the confluent hypergeometric function, which greatly reduces the computational cost of MMSE estimation while maintaining noise suppression performance.
Under the same inventive concept, Fig. 14 is a block diagram of a device for smoothing a speech spectrum according to another embodiment of the invention. The embodiment is described below with reference to this figure. Description of parts identical to the preceding embodiments is omitted where appropriate.
As shown in Fig. 14, the device 1400 for smoothing a speech spectrum of this embodiment comprises: a weight-averaging unit 1401, which computes, using geometric-sequence weights, a weighted average of the energy of a plurality of adjacent spectral components in the speech spectrum; and a smooth-correcting unit 1405, which corrects the energy of each spectral component in the speech spectrum using the weighted average, computed by the weight-averaging unit, of the energy of that spectral component and its adjacent spectral components. The details are the same as in the description of the method for smoothing a speech spectrum in the embodiment of Figs. 5 and 6 and are not repeated here.
As is clear from the above, the device 1400 for smoothing a speech spectrum of this embodiment folds the energy of adjacent spectral components into each spectral component. Spectral components whose original energy level is low are thus filled in with energy from their neighbors, which improves the quality of the speech spectrum.
Under the same inventive concept, Fig. 15 is a block diagram of a device for extracting speech features according to another embodiment of the invention. The embodiment is described below with reference to this figure. Description of parts identical to the preceding embodiments is omitted where appropriate.
As shown in Fig. 15, the device 1500 for extracting speech features of this embodiment comprises: an inputting unit 1501, which inputs noisy speech; a transforming unit 1505, which transforms the noisy speech into a noisy speech spectrum; the noise suppression device 1200 or the noise suppression device 1300 described above, which reduces the noise in the noisy speech spectrum; and an extracting unit 1510, which extracts speech features from the noise-reduced speech spectrum. The details are the same as in the description of the method for extracting speech features in the embodiment of Fig. 7 and are not repeated here.
As is clear from the above, the device 1500 for extracting speech features of this embodiment can reduce noise by MMSE estimation according to formula (3) and formula (2), with a piecewise linear function replacing the confluent hypergeometric function. This greatly reduces the computational cost of MMSE estimation while maintaining noise suppression performance, and thereby improves the quality of the speech features.
Alternatively, the noise suppression device 1300 of the device 1500 for extracting speech features can reduce noise by MMSE estimation according to formula (1) and formula (4), in which aξ replaces the a priori SNR ξ. Adjusting ξ in this way controls the trade-off between noise reduction and speech distortion, which improves the quality of the speech features.
In addition, the noise suppression device 1300 of the device 1500 for extracting speech features can also reduce noise by MMSE estimation according to formula (3) and formula (4), which both reduces the computational cost of MMSE estimation and controls the trade-off between noise reduction and speech distortion. The quality of the speech features is thereby improved.
Under the same inventive concept, Fig. 16 is a block diagram of a device for extracting speech features according to another embodiment of the invention. The embodiment is described below with reference to this figure. Description of parts identical to the preceding embodiments is omitted where appropriate.
As shown in Fig. 16, the device 1600 for extracting speech features of this embodiment comprises: an inputting unit 1601, which inputs speech; a transforming unit 1605, which transforms the speech into a speech spectrum; the device 1400 for smoothing a speech spectrum described above, which smooths the speech spectrum; and an extracting unit 1610, which extracts speech features from the smoothed speech spectrum. The details are the same as in the description of the method for extracting speech features in the embodiment of Fig. 8 and are not repeated here.
As is clear from the above, the device 1600 for extracting speech features of this embodiment can use the spectrum smoothing method of the embodiments above to fold a weighted average of the energy of adjacent spectral components into each spectral component. Spectral components whose original energy level is low are thus filled in with energy from their neighbors, which improves the quality of the speech spectrum and therefore the quality of the speech features.
In addition, in this embodiment, if the speech contains noise, the noise suppression method of the embodiment described above with reference to Figs. 1 and 2 can be used. Noise is reduced by MMSE estimation according to formula (3) and formula (2), with a piecewise linear function replacing the confluent hypergeometric function. This greatly reduces the computational cost of MMSE estimation while maintaining noise suppression performance, and thereby improves the quality of the speech features.
In addition, in this embodiment, if the speech contains noise, the noise suppression method of the embodiment described above with reference to Figs. 3 and 4 can also be used. Noise is reduced by MMSE estimation according to formula (1) and formula (4), in which aξ replaces the a priori SNR ξ. Adjusting ξ in this way controls the trade-off between noise reduction and speech distortion, which improves the quality of the speech features.
In addition, this embodiment can also reduce noise by MMSE estimation according to formula (3) and formula (4), which both reduces the computational cost of MMSE estimation and controls the trade-off between noise reduction and speech distortion. The quality of the speech features is thereby improved.
Under the same inventive concept, Fig. 17 is a block diagram of a speech recognition device according to another embodiment of the invention. The embodiment is described below with reference to this figure. Description of parts identical to the preceding embodiments is omitted where appropriate.
As shown in Fig. 17, the speech recognition device 1700 of this embodiment comprises: the device 1500 or the device 1600 for extracting speech features described above, which extracts speech features; and a speech recognition unit 1701, which performs speech recognition on the extracted speech features. The details are the same as in the description of the speech recognition method in the embodiment of Fig. 9 and are not repeated here.
As is clear from the above, before extracting speech features from the speech spectrum, the speech recognition device 1700 of this embodiment can use the spectrum smoothing method of the embodiments above to fold a weighted average of the energy of adjacent spectral components into each spectral component. Spectral components whose original energy level is low are thus filled in with energy from their neighbors, which improves the quality of the speech spectrum and therefore the performance of speech recognition.
In addition, in this embodiment, if the speech contains noise, noise can be reduced before speech features are extracted from the noisy speech spectrum by MMSE estimation according to formula (3) and formula (2), with a piecewise linear function replacing the confluent hypergeometric function. This greatly reduces the computational cost of MMSE estimation while maintaining noise suppression performance, and thereby improves the performance of speech recognition.
Alternatively, before speech features are extracted from the noisy speech spectrum, the speech recognition device 1700 of this embodiment can reduce noise by MMSE estimation according to formula (1) and formula (4), in which aξ replaces the a priori SNR ξ. Adjusting ξ in this way controls the trade-off between noise reduction and speech distortion, which improves the performance of speech recognition.
In addition, the speech recognition device 1700 of this embodiment can also reduce noise by MMSE estimation according to formula (3) and formula (4), which both reduces the computational cost of MMSE estimation and controls the trade-off between noise reduction and speech distortion. The performance of speech recognition is thereby improved.
Under the same inventive concept, Fig. 18 is a block diagram of a device for training a speech model according to another embodiment of the invention. The embodiment is described below with reference to this figure. Description of parts identical to the preceding embodiments is omitted where appropriate.
As shown in Fig. 18, the device 1800 for training a speech model of this embodiment comprises: the device 1500 or the device 1600 for extracting speech features described above, which extracts speech features; and a model-training unit 1801, which trains the speech model on the extracted speech features. The details are the same as in the description of the method for training a speech model in the embodiment of Fig. 10 and are not repeated here.
As is clear from the above, before extracting speech features from the speech spectrum, the device 1800 for training a speech model of this embodiment can use the spectrum smoothing method of the embodiments above to fold a weighted average of the energy of adjacent spectral components into each spectral component. Spectral components whose original energy level is low are thus filled in with energy from their neighbors, which improves the quality of the speech spectrum and therefore the quality of the trained model.
In addition, in this embodiment, if the speech contains noise, noise can be reduced before speech features are extracted from the noisy speech spectrum by MMSE estimation according to formula (3) and formula (2), with a piecewise linear function replacing the confluent hypergeometric function. This greatly reduces the computational cost of MMSE estimation while maintaining noise suppression performance, and thereby improves the quality of the trained model.
Alternatively, before speech features are extracted from the noisy speech spectrum, the device 1800 for training a speech model of this embodiment can reduce noise by MMSE estimation according to formula (1) and formula (4), in which aξ replaces the a priori SNR ξ. Adjusting ξ in this way controls the trade-off between noise reduction and speech distortion, which improves the quality of the trained model.
In addition, the device 1800 for training a speech model of this embodiment can also reduce noise by MMSE estimation according to formula (3) and formula (4), which both reduces the computational cost of MMSE estimation and controls the trade-off between noise reduction and speech distortion. The quality of the trained model is thereby improved.
Under the same inventive concept, Fig. 19 is a block diagram of a speech recognition device according to another embodiment of the invention. The embodiment is described below with reference to this figure. Description of parts identical to the preceding embodiments is omitted where appropriate.
As shown in Fig. 19, the speech recognition device 1900 of this embodiment comprises: an inputting unit 1901, which inputs noisy speech; a transforming unit 1905, which transforms the noisy speech into a noisy speech spectrum; the noise suppression device 1300 described above, which reduces the noise in the noisy speech spectrum; an extracting unit 1910, which extracts speech features from the noise-reduced speech spectrum; and a speech recognition unit 1915, which performs speech recognition on the extracted speech features; wherein the optimal value of the a priori SNR is determined according to the result of speech recognition. The details are the same as in the description of the speech recognition method in the embodiment of Fig. 11 and are not repeated here.
As is clear from the above, the speech recognition device 1900 of this embodiment adjusts the MMSE estimation according to the speech recognition result, and thereby improves the performance of speech recognition.
The noise suppression method, the method for smoothing a speech spectrum, the method for extracting speech features, the speech recognition method and the method for training a speech model of the invention, together with the noise suppression device, the device for smoothing a speech spectrum, the device for extracting speech features, the speech recognition device and the device for training a speech model, have been described in detail above through some exemplary embodiments. These embodiments are not exhaustive, and those skilled in the art can make various changes and modifications within the spirit and scope of the invention. The invention is therefore not limited to these embodiments, and its scope is defined only by the appended claims.

Claims (46)

1. A noise suppression method for a noisy speech spectrum, comprising:
Performing minimum mean-square error (MMSE) estimation on the noisy speech spectrum according to an estimated noise spectrum, so as to reduce the noise in the noisy speech spectrum;
Wherein the MMSE estimation is performed with a piecewise linear function in place of the confluent hypergeometric function.
2. The noise suppression method according to claim 1, wherein the confluent hypergeometric function is converted into the piecewise linear function using predefined segmentation points, and the minimum mean-square error (MMSE) estimation is performed with the result.
3. The noise suppression method according to claim 2, wherein the predefined segmentation points of the piecewise linear function are obtained by the following steps:
Calculating the derivative of the confluent hypergeometric function;
Setting initial segmentation points of the piecewise linear function;
Calculating, between every two consecutive segmentation points among the initial segmentation points, the difference between the piecewise linear function and the confluent hypergeometric function;
If the difference is greater than a threshold, inserting a new segmentation point between the two consecutive segmentation points; and
Repeating the step of calculating the difference and the subsequent step until no difference is greater than the threshold.
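The refinement loop of claim 3 can be sketched as follows. This is only an illustration under stated assumptions: the target is taken to be M(-0.5; 1; -υ), the confluent hypergeometric function of the classical MMSE amplitude estimator; each segment is a plain chord between its end points; the new point is placed at the midpoint; and the difference is probed at a few interior locations. The claim's derivative step is not used here, and the names `kummer_m`, `target`, `refine_breakpoints`, `make_piecewise_linear`, the range [0, 20] and the threshold 0.01 are all hypothetical.

```python
import bisect
import math

def kummer_m(a, b, z, tol=1e-15, max_terms=500):
    """Power series of the confluent hypergeometric function M(a; b; z)."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) / (b + n) * z / (n + 1)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def target(v):
    """M(-0.5; 1; -v), evaluated stably through Kummer's transformation
    M(a; b; z) = exp(z) * M(b - a; b; -z), so all series terms are positive."""
    return math.exp(-v) * kummer_m(1.5, 1.0, v)

def refine_breakpoints(f, points, threshold, probes=8):
    """Insert midpoints until every chord is within `threshold` of f
    at the probed interior locations (assumed refinement rule)."""
    points = sorted(points)
    changed = True
    while changed:
        changed = False
        refined = [points[0]]
        for lo, hi in zip(points, points[1:]):
            worst = 0.0
            for i in range(1, probes):
                x = lo + (hi - lo) * i / probes
                chord = f(lo) + (f(hi) - f(lo)) * (x - lo) / (hi - lo)
                worst = max(worst, abs(chord - f(x)))
            if worst > threshold:
                refined.append((lo + hi) / 2)  # new segmentation point
                changed = True
            refined.append(hi)
        points = refined
    return points

def make_piecewise_linear(f, points):
    """Return L(x): linear interpolation of f over the segmentation points."""
    values = [f(p) for p in points]
    def L(x):
        x = min(max(x, points[0]), points[-1])  # clamp to the table range
        i = min(bisect.bisect_right(points, x), len(points) - 1)
        x0, x1 = points[i - 1], points[i]
        y0, y1 = values[i - 1], values[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return L

points = refine_breakpoints(target, [0.0, 20.0], threshold=0.01)
L = make_piecewise_linear(target, points)
print(len(points), "segmentation points")
```

Once the table is built offline, evaluating `L` costs one binary search and one multiply-add, which is the practical motivation for replacing the series evaluation at run time.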
4. The noise suppression method according to any one of claims 1-3, wherein the minimum mean-square error (MMSE) estimation is performed by the following formula:
$$\hat{A}_k = C\,\frac{\sqrt{\upsilon_k}}{\gamma_k}\,L(\upsilon_k)\,R_k,$$
where
$$\upsilon_k = \frac{\xi_k}{1+\xi_k}\,\gamma_k,$$
and where $\hat{A}_k$ denotes the noise-reduced speech spectrum, $R_k$ denotes the noisy speech spectrum, $C$ is a constant, $\xi_k$ is the a priori signal-to-noise ratio (SNR) obtained from the estimated noise spectrum, $\gamma_k$ is the a posteriori SNR obtained from the estimated noise spectrum and the noisy speech spectrum, $L(\upsilon_k)$ is the piecewise linear function, and $k$ denotes the k-th spectral component.
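A minimal sketch of how the formula of claim 4 could be evaluated for one spectral component. It assumes the square-root form of the classical MMSE short-time spectral amplitude estimator and takes C = Γ(1.5) from that estimator; the claim itself only states that C is a constant. The function name `mmse_amplitude` and the placeholder `L` used in the demo are hypothetical, and the placeholder is not the patent's piecewise linear table.

```python
import math

def mmse_amplitude(R, xi, gamma, L, C=math.gamma(1.5)):
    """Noise-reduced amplitude for one spectral component k.

    R     : noisy spectral amplitude R_k
    xi    : a priori SNR xi_k
    gamma : a posteriori SNR gamma_k
    L     : callable standing in for the piecewise linear function L(v)
    C     : constant; Gamma(1.5) is an assumed value
    """
    v = xi / (1.0 + xi) * gamma              # v_k = xi_k / (1 + xi_k) * gamma_k
    return C * math.sqrt(v) / gamma * L(v) * R

# Demo with a crude placeholder for L (an assumption, for illustration only).
placeholder_L = lambda v: 1.0 + 0.5 * v
print(mmse_amplitude(R=3.0, xi=1.0, gamma=2.0, L=placeholder_L))
```

Passing `L` as a callable keeps the gain computation independent of how the piecewise linear table was built.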
5. A noise suppression method for a noisy speech spectrum, comprising:
Performing minimum mean-square error (MMSE) estimation on the noisy speech spectrum according to an a priori signal-to-noise ratio (SNR), so as to reduce the noise in the noisy speech spectrum; and
Adjusting the a priori SNR to obtain a suitable degree of noise suppression.
6. The noise suppression method according to claim 5, wherein the a priori SNR is obtained from an estimated noise spectrum.
7. The noise suppression method according to claim 5 or 6, wherein the adjusting step increases the a priori SNR to reduce the noise suppression, or decreases the a priori SNR to increase the noise suppression.
8. The noise suppression method according to any one of claims 5-7, wherein the minimum mean-square error (MMSE) estimation is performed with a piecewise linear function in place of the confluent hypergeometric function.
9. The noise suppression method according to claim 8, wherein the confluent hypergeometric function is converted into the piecewise linear function using predefined segmentation points, and the MMSE estimation is performed with the result.
10. The noise suppression method according to claim 9, wherein the predefined segmentation points of the piecewise linear function are obtained by the following steps:
Calculating the derivative of the confluent hypergeometric function;
Setting initial segmentation points of the piecewise linear function;
Calculating, between every two consecutive segmentation points among the initial segmentation points, the difference between the piecewise linear function and the confluent hypergeometric function;
If the difference is greater than a threshold, inserting a new segmentation point between the two consecutive segmentation points; and
Repeating the step of calculating the difference and the subsequent step until no difference is greater than the threshold.
11. The noise suppression method according to any one of claims 8-10, wherein the minimum mean-square error (MMSE) estimation is performed by the following formula:
$$\hat{A}_k = C\,\frac{\sqrt{\upsilon_k}}{\gamma_k}\,L(\upsilon_k)\,R_k,$$
where
$$\upsilon_k = \frac{\xi_k}{1+\xi_k}\,\gamma_k,$$
and where $\hat{A}_k$ denotes the noise-reduced speech spectrum, $R_k$ denotes the noisy speech spectrum, $C$ is a constant, $\xi_k$ is the a priori signal-to-noise ratio (SNR) obtained from the estimated noise spectrum, $\gamma_k$ is the a posteriori SNR obtained from the estimated noise spectrum and the noisy speech spectrum, $L(\upsilon_k)$ is the piecewise linear function, and $k$ denotes the k-th spectral component.
12. A method for smoothing a speech spectrum, comprising:
Calculating, using geometric-sequence weights, the weighted average of the energy of each spectral component in the speech spectrum and of its adjacent spectral components; and
Correcting the energy of that spectral component with the calculated weighted average.
13. The method for smoothing a speech spectrum according to claim 12, wherein the geometric-sequence weights are largest at the spectral component and decrease as a geometric sequence in directions away from the spectral component.
14. The method for smoothing a speech spectrum according to claim 12 or 13, wherein the step of calculating the weighted average comprises: calculating the weighted average of the energy of the spectral component and of the temporally adjacent spectral components at the same frequency.
15. The method for smoothing a speech spectrum according to claim 12 or 13, wherein the step of calculating the weighted average comprises: calculating the weighted average of the energy of the spectral component and of the frequency-adjacent spectral components in the same frame.
16. The method for smoothing a speech spectrum according to claim 12 or 13, wherein the step of calculating the weighted average comprises: calculating the weighted average of the energy of the spectral component, of the temporally adjacent spectral components at the same frequency, and of the frequency-adjacent spectral components in the same frame.
17. The method for smoothing a speech spectrum according to any one of claims 12-16, further comprising, before the step of calculating the weighted average, performing noise suppression on the speech spectrum using the noise suppression method according to any one of claims 1-11.
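The smoothing of claims 12-16 can be illustrated with the following sketch. It assumes the spectrum is stored as a list of frames, each a list of per-bin energies; that the weight of a neighbour at distance d is `ratio ** d`; that the weights are normalised by their sum; and that neighbours falling outside the spectrum are simply skipped. The function name and the parameters `ratio`, `radius`, `use_time` and `use_freq` are hypothetical.

```python
def smooth_spectrum(energy, ratio=0.5, radius=1, use_time=True, use_freq=True):
    """Replace each energy[t][k] by a geometric-weighted average of itself
    and its neighbours in time (same bin) and/or frequency (same frame)."""
    n_frames, n_bins = len(energy), len(energy[0])
    out = [[0.0] * n_bins for _ in range(n_frames)]
    for t in range(n_frames):
        for k in range(n_bins):
            num, den = energy[t][k], 1.0      # weight 1 at the component itself
            for d in range(1, radius + 1):
                w = ratio ** d                # geometric decay with distance
                neighbours = []
                if use_time:
                    neighbours += [(t - d, k), (t + d, k)]
                if use_freq:
                    neighbours += [(t, k - d), (t, k + d)]
                for tt, kk in neighbours:
                    if 0 <= tt < n_frames and 0 <= kk < n_bins:
                        num += w * energy[tt][kk]
                        den += w
            out[t][k] = num / den
    return out

spike = [[0.0, 0.0, 0.0],
         [0.0, 4.0, 0.0],
         [0.0, 0.0, 0.0]]
for row in smooth_spectrum(spike):
    print(row)
```

Setting `use_freq=False` corresponds to the time-only variant of claim 14, `use_time=False` to the frequency-only variant of claim 15, and both enabled to claim 16.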
18. A method for extracting speech features, comprising:
Transforming noisy speech into a noisy speech spectrum;
Reducing the noise in the noisy speech spectrum using the noise suppression method according to any one of claims 1-11; and
Extracting speech features from the noise-reduced speech spectrum.
19. The method for extracting speech features according to claim 18, wherein the transforming step comprises a fast Fourier transform.
20. A method for extracting speech features, comprising:
Transforming speech into a speech spectrum;
Smoothing the speech spectrum using the method for smoothing a speech spectrum according to any one of claims 12-17; and
Extracting speech features from the smoothed speech spectrum.
21. The method for extracting speech features according to claim 20, wherein the transforming step comprises a fast Fourier transform.
22. A speech recognition method, comprising:
Extracting speech features using the method for extracting speech features according to any one of claims 18-21; and
Recognizing speech based on the extracted speech features.
23. A method for training a speech model, comprising:
Extracting speech features using the method for extracting speech features according to any one of claims 18-21; and
Training the speech model based on the extracted speech features.
24. A speech recognition method, comprising:
Transforming noisy speech into a noisy speech spectrum;
Reducing the noise in the noisy speech spectrum using the noise suppression method according to any one of claims 5-11;
Extracting speech features from the noise-reduced speech spectrum;
Recognizing the noisy speech based on the extracted speech features; and
Determining the optimal value of the a priori signal-to-noise ratio (SNR) according to the speech recognition result.
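One way to read the last step of claim 24 is as a search over candidate adjustments of the a priori SNR, scored by recognition accuracy. The sketch below assumes the adjustment is a multiplicative scale (the claims only say the a priori SNR is increased or decreased) and that a callable returns the recognition accuracy obtained with a given scale. `adjust_prior_snr`, `pick_best_scale` and the toy accuracy curve are hypothetical; the toy curve is a made-up placeholder, not measured data.

```python
def adjust_prior_snr(prior_snr, scale):
    """Scale every a priori SNR value; scale > 1 weakens the suppression,
    scale < 1 strengthens it (multiplicative adjustment is an assumption)."""
    return [scale * xi for xi in prior_snr]

def pick_best_scale(candidate_scales, recognition_accuracy):
    """Return (scale, accuracy) for the candidate with the highest accuracy."""
    best_scale, best_acc = None, float("-inf")
    for scale in candidate_scales:
        acc = recognition_accuracy(scale)   # e.g. run the recogniser on a dev set
        if acc > best_acc:
            best_scale, best_acc = scale, acc
    return best_scale, best_acc

def toy_accuracy(scale):
    """Placeholder accuracy curve peaking at scale 2.0 (illustration only)."""
    return 1.0 - (scale - 2.0) ** 2 / 10

print(adjust_prior_snr([1.0, 3.0], 2.0))
print(pick_best_scale([0.5, 1.0, 2.0, 4.0], toy_accuracy))
```

In practice `recognition_accuracy` would wrap the full chain of transform, noise suppression, feature extraction and recognition on held-out noisy speech.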
25. A noise suppression apparatus for a noisy speech spectrum, comprising:
An estimation unit, which performs minimum mean-square error (MMSE) estimation on the noisy speech spectrum according to an estimated noise spectrum, so as to reduce the noise in the noisy speech spectrum;
Wherein the estimation unit performs the MMSE estimation with a piecewise linear function in place of the confluent hypergeometric function.
26. The noise suppression apparatus according to claim 25, wherein the confluent hypergeometric function is converted into the piecewise linear function using predefined segmentation points, and the MMSE estimation is performed with the result.
27. The noise suppression apparatus according to claim 25 or 26, wherein the estimation unit performs the minimum mean-square error (MMSE) estimation by the following formula:
$$\hat{A}_k = C\,\frac{\sqrt{\upsilon_k}}{\gamma_k}\,L(\upsilon_k)\,R_k,$$
where
$$\upsilon_k = \frac{\xi_k}{1+\xi_k}\,\gamma_k,$$
and where $\hat{A}_k$ denotes the noise-reduced speech spectrum, $R_k$ denotes the noisy speech spectrum, $C$ is a constant, $\xi_k$ is the a priori signal-to-noise ratio (SNR) obtained from the estimated noise spectrum, $\gamma_k$ is the a posteriori SNR obtained from the estimated noise spectrum and the noisy speech spectrum, $L(\upsilon_k)$ is the piecewise linear function, and $k$ denotes the k-th spectral component.
28. A noise suppression apparatus for a noisy speech spectrum, comprising:
An estimation unit, which performs minimum mean-square error (MMSE) estimation on the noisy speech spectrum according to an a priori signal-to-noise ratio (SNR), so as to reduce the noise in the noisy speech spectrum; and
An adjusting unit, which adjusts the a priori SNR to obtain a suitable degree of noise suppression.
29. The noise suppression apparatus according to claim 28, wherein the initial value of the a priori SNR is obtained from an estimated noise spectrum.
30. The noise suppression apparatus according to claim 28 or 29, wherein the adjusting unit is configured to increase the a priori SNR to reduce the noise suppression, or to decrease the a priori SNR to increase the noise suppression.
31. The noise suppression apparatus according to any one of claims 28-30, wherein the estimation unit performs the MMSE estimation with a piecewise linear function in place of the confluent hypergeometric function.
32. The noise suppression apparatus according to claim 31, wherein the estimation unit converts the confluent hypergeometric function into the piecewise linear function using predefined segmentation points and performs the MMSE estimation with the result.
33. The noise suppression apparatus according to claim 31 or 32, wherein the estimation unit performs the minimum mean-square error (MMSE) estimation by the following formula:
$$\hat{A}_k = C\,\frac{\sqrt{\upsilon_k}}{\gamma_k}\,L(\upsilon_k)\,R_k,$$
where
$$\upsilon_k = \frac{\xi_k}{1+\xi_k}\,\gamma_k,$$
and where $\hat{A}_k$ denotes the noise-reduced speech spectrum, $R_k$ denotes the noisy speech spectrum, $C$ is a constant, $\xi_k$ is the a priori signal-to-noise ratio (SNR) obtained from the estimated noise spectrum, $\gamma_k$ is the a posteriori SNR obtained from the estimated noise spectrum and the noisy speech spectrum, $L(\upsilon_k)$ is the piecewise linear function, and $k$ denotes the k-th spectral component.
34. An apparatus for smoothing a speech spectrum, comprising:
A weight-averaging unit, which calculates, using geometric-sequence weights, the weighted average of the energy of a plurality of adjacent spectral components in the speech spectrum; and
A smooth-correcting unit, which corrects the energy of each spectral component in the speech spectrum with the weighted average, calculated by the weight-averaging unit, of the energy of that spectral component and of its adjacent spectral components.
35. The apparatus for smoothing a speech spectrum according to claim 34, wherein the geometric-sequence weights are largest at the spectral component and decrease as a geometric sequence in directions away from the spectral component.
36. The apparatus for smoothing a speech spectrum according to claim 34 or 35, wherein the weight-averaging unit calculates the weighted average of the energy of the spectral component and of the temporally adjacent spectral components at the same frequency.
37. The apparatus for smoothing a speech spectrum according to claim 34 or 35, wherein the weight-averaging unit calculates the weighted average of the energy of the spectral component and of the frequency-adjacent spectral components in the same frame.
38. The apparatus for smoothing a speech spectrum according to claim 34 or 35, wherein the weight-averaging unit calculates the weighted average of the energy of the spectral component, of the temporally adjacent spectral components at the same frequency, and of the frequency-adjacent spectral components in the same frame.
39. The apparatus for smoothing a speech spectrum according to any one of claims 34-38, further comprising the noise suppression apparatus according to any one of claims 25-33, which performs noise suppression on the speech spectrum before the weight-averaging unit performs its calculation.
40. An apparatus for extracting speech features, comprising:
A transforming unit, which transforms noisy speech into a noisy speech spectrum;
The noise suppression apparatus according to any one of claims 25-33, which reduces the noise in the noisy speech spectrum; and
An extracting unit, which extracts speech features from the noise-reduced speech spectrum.
41. The apparatus for extracting speech features according to claim 40, wherein the transforming unit is configured to perform the transformation by a fast Fourier transform.
42. An apparatus for extracting speech features, comprising:
A transforming unit, which transforms speech into a speech spectrum;
The apparatus for smoothing a speech spectrum according to any one of claims 34-39, which smooths the speech spectrum; and
An extracting unit, which extracts speech features from the smoothed speech spectrum.
43. The apparatus for extracting speech features according to claim 42, wherein the transforming unit is configured to perform the transformation by a fast Fourier transform.
44. A speech recognition apparatus, comprising:
The apparatus for extracting speech features according to any one of claims 40-43, which extracts speech features; and
A speech recognition unit, which recognizes speech based on the extracted speech features.
45. An apparatus for training a speech model, comprising:
The apparatus for extracting speech features according to any one of claims 40-43, which extracts speech features; and
A model-training unit, which trains the speech model based on the extracted speech features.
46. A speech recognition apparatus, comprising:
A transforming unit, which transforms noisy speech into a noisy speech spectrum;
The noise suppression apparatus according to any one of claims 28-33, which reduces the noise in the noisy speech spectrum;
An extracting unit, which extracts speech features from the noise-reduced speech spectrum;
A speech recognition unit, which recognizes the noisy speech based on the extracted speech features; and
A determination unit, which determines the optimal value of the a priori signal-to-noise ratio (SNR) according to the speech recognition result.
CN2006100922461A 2006-06-15 2006-06-15 Method and device for controlling noise, smoothing speech manual, extracting speech characteristic, phonetic recognition and training phonetic mould Expired - Fee Related CN101089952B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN2006100922461A CN101089952B (en) 2006-06-15 2006-06-15 Method and device for controlling noise, smoothing speech manual, extracting speech characteristic, phonetic recognition and training phonetic mould
US11/758,855 US20080059163A1 (en) 2006-06-15 2007-06-06 Method and apparatus for noise suppression, smoothing a speech spectrum, extracting speech features, speech recognition and training a speech model

Publications (2)

Publication Number Publication Date
CN101089952A true CN101089952A (en) 2007-12-19
CN101089952B CN101089952B (en) 2010-10-06

Family

ID=38943281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2006100922461A Expired - Fee Related CN101089952B (en) 2006-06-15 2006-06-15 Method and device for controlling noise, smoothing speech manual, extracting speech characteristic, phonetic recognition and training phonetic mould

Country Status (2)

Country Link
US (1) US20080059163A1 (en)
CN (1) CN101089952B (en)

Also Published As

Publication number Publication date
US20080059163A1 (en) 2008-03-06
CN101089952B (en) 2010-10-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20101006

Termination date: 20140615

EXPY Termination of patent right or utility model