EP1944754B1 - Speech fundamental frequency estimator and method for estimating a speech fundamental frequency - Google Patents
- Publication number
- EP1944754B1 (application EP07000568.1A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- values
- fundamental frequency
- correlation function
- speech fundamental
- power density
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Not-in-force
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02168—Noise filtering characterised by the method used for estimating noise the estimation exclusively taking place during speech pauses
Definitions
- This invention relates to speech analysis systems and especially to a speech fundamental frequency estimator and a method for estimating a speech fundamental frequency.
- DFT discrete Fourier transform
- the corresponding spectrum shows distinct amplitude peaks which are located equidistantly in frequency (see for example Fig. 1 ).
- the distance between two adjacent amplitude peaks represents the speech fundamental frequency, which depends on the speaker.
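The relation between peak spacing and fundamental frequency can be checked on a synthetic signal. The following is a minimal sketch, not taken from the patent: the sampling rate, frame length, fundamental frequency and number of harmonics are all assumed values, and a naive DFT stands in for whatever transform a real system would use.

```python
import cmath
import math

def magnitude_spectrum(x):
    """Naive DFT magnitude for bins 0..N/2 (adequate for short demo frames)."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
            for b in range(n // 2 + 1)]

fs = 8000   # assumed sampling rate in Hz
f0 = 100    # assumed fundamental frequency in Hz
n = 400     # assumed frame length: 50 ms, bin spacing fs / n = 20 Hz

# Synthetic "voiced" frame: five harmonics of f0 with decaying amplitude.
x = [sum(math.cos(2 * math.pi * h * f0 * k / fs) / h for h in range(1, 6))
     for k in range(n)]

mag = magnitude_spectrum(x)
# Keep local maxima that clearly stand out of the spectrum.
peaks = [b for b in range(1, len(mag) - 1)
         if mag[b] > mag[b - 1] and mag[b] > mag[b + 1] and mag[b] > 0.1 * max(mag)]
peak_freqs = [b * fs / n for b in peaks]
spacings = [hi - lo for lo, hi in zip(peak_freqs, peak_freqs[1:])]
```

In this idealized case every entry of `spacings` equals the chosen `f0`. Real speech frames are noisy and only approximately periodic, which is why the estimation described in this document works on correlation values instead of picking spectral peaks directly.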
- For men, this frequency varies between 80 Hz and 150 Hz.
- Women and children, in contrast, have a higher speech fundamental frequency, which varies between 150 Hz and 300 Hz for women and between 200 Hz and 600 Hz for children.
- a good, sure and reliable estimation of the speech fundamental frequency is often not easy to obtain.
- Difficulties arise mainly in detecting low speech fundamental frequencies, which in most cases occur with male speakers.
- Fig. 2 shows a block diagram of a multi-rate system for speech reconstruction with an analysis and a synthesis filter bank for the signal processing.
- the speech fundamental frequency estimation is shown as a separate functional block.
- the aim of such an application is to extract parameters from a distorted speech signal $y(n)$, for example the spectral envelope, the type of stimulation (voiced/unvoiced) and the speech fundamental frequency $f_p(n)$. Subsequently an undistorted speech signal $x(n)$ is resynthesized from these parameters. For this purpose a very precise and reliable estimation of the speech fundamental frequency is necessary.
- the output signal $x(n)$ after the synthesis filter bank should be nearly without error; the following condition is therefore very desirable: $x(n) \approx s(n)$, where $s(n)$ denotes the undisturbed speech signal.
- Figure 3 shows a block diagram of a signal analysis system with subsequent feature extraction and speech fundamental frequency estimation, in order to perform a speech recognition.
- An adequate estimation of the speech fundamental frequency can, for example, contribute to significantly improving the recognition rates of the speech recognizer.
- the speech fundamental frequency estimator is configured for receiving a first set of values and a second set of values, the first set of values being a frequency domain representation of a first set of time domain signal values within a first time interval and the second set of values being a frequency domain representation of a second set of time domain signal values within a second time interval, the second time interval being later than and offset from the first time interval, the speech fundamental frequency estimator comprising:
- the analyzer is further configured for performing a first frequency-time-transform of the first power density spectrum in order to obtain a first set of correlation function values, for performing a second frequency-time-transform of the second power density spectrum in order to obtain a second set of correlation function values, and for determining the speech fundamental frequency estimate on the basis of the first and second sets of correlation function values.
- a method for estimating a speech fundamental frequency using a first set of values and a second set of values, the first set of values being a received frequency domain representation of a first set of time domain signal values within a first time interval and the second set of values being a received frequency domain representation of a second set of time domain signal values within a second time interval, the second time interval being later than and offset from the first time interval, the method for estimating the speech fundamental frequency comprising the steps of:
- the step of determining the speech fundamental frequency estimate comprises performing a first frequency-time-transform of the first power density spectrum in order to obtain a first set of correlation function values, performing a second frequency-time-transform of the second power density spectrum in order to obtain a second set of correlation function values, and determining the speech fundamental frequency estimate on the basis of the first and second sets of correlation function values.
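The two frequency-time-transforms described above can be illustrated with a small sketch. It assumes that the first power density spectrum is the product of the first set of values with the complex conjugate of the second set, and that the second power density spectrum is the second set multiplied by its own conjugate. The function names, the naive DFT and the demo signal are illustrative assumptions, and the normalization and weighting mentioned elsewhere in this document are left out.

```python
import cmath
import math

def dft(x):
    """Naive forward DFT (stand-in for the existing analysis transform)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
            for b in range(n)]

def idft_real(spec):
    """Naive inverse DFT, keeping the real part (the frequency-time-transform)."""
    n = len(spec)
    return [(sum(spec[b] * cmath.exp(2j * math.pi * b * m / n) for b in range(n)) / n).real
            for m in range(n)]

def correlation_sets(first_spec, second_spec):
    """Return (cross-correlation values, autocorrelation values)."""
    cross_psd = [a * b.conjugate() for a, b in zip(first_spec, second_spec)]
    auto_psd = [b * b.conjugate() for b in second_spec]
    return idft_real(cross_psd), idft_real(auto_psd)

# Demo with assumed values: a periodic signal, two frames offset by `shift`.
period, n, shift = 40, 64, 16
signal = [math.sin(2 * math.pi * k / period) for k in range(n + shift)]
y1, y2 = signal[:n], signal[shift:shift + n]
r_cross, r_auto = correlation_sets(dft(y1), dft(y2))
```

Because the sketch uses a plain DFT without zero padding, both results are circular correlations. `r_auto[0]` equals the energy of the second frame, and `r_cross[0]` equals the inner product of the two frames.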
- This first aspect of the invention is based on the finding that utilizing the first and second sets of values, which originate from sets of time domain signal values in time intervals that are offset from each other, results in a total analyzed signal portion which is larger than just one single signal portion, for example the first or the second time interval.
- a temporally longer signal portion can thus be analyzed by means of existing (short) time-frequency-transformed signals without the need to provide a new time-frequency-transform just for the estimation of the speech fundamental frequency.
- the first spectrum represents the spectrum over the longer time interval whereas the second spectrum serves the purpose to determine the characteristics of the second set of values in order to compensate errors in the first spectrum. Therefore it is necessary not only to calculate the first spectrum but also to calculate the second spectrum.
- the approach according to the first aspect of the invention provides the advantage that a signal given in a time-frequency-transformed version (provided for other applications than speech fundamental frequency estimation) can still be used also for speech fundamental frequency estimation (even in the case the time-frequency-transformed version of the signal would normally be not appropriate for providing a precise speech fundamental frequency estimation).
- a speech fundamental frequency estimator which is configured for receiving a set of values, the set of values being a frequency domain representation of a set of time domain signal values within a time interval, the speech fundamental frequency estimator comprising:
- a method for estimating a speech fundamental frequency is provided, the method being configured for receiving a set of values, the set of values being a frequency domain representation of a set of time domain signal values within a time interval, the method comprising the steps of:
- the second aspect is based on the finding that a significant improvement in the preciseness of speech fundamental frequency estimation can be realized when background noise is adequately compensated. This is especially the case in a scenario where in speech pauses erroneous detections of speech occur which then falsify the detected result and, in consequence, decrease the reliability of the detected speech fundamental frequency.
- the second aspect thus provides the advantage that by simple means, for example a pause detector or just a further analysis of the already existing signal frames a significant improvement in preciseness and reliability of the estimated speech fundamental frequency can be obtained.
- the speech fundamental frequency estimator is characterized in that the first power density spectrum calculator is configured for multiplying versions of the sets of values which represent sets of time domain signal values having overlapping time intervals.
- the speech fundamental frequency estimator is characterized in that the first power density spectrum calculator is configured for multiplying versions of the sets of values which represent time domain signal values having time intervals overlapping by at least 25 percent. This allows the speech fundamental frequency estimate to be determined reliably, as the first and second sets of values belong to time domain signal values whose time intervals overlap sufficiently. Therefore, due to the sufficient overlap of both time intervals, such an estimation can be considered to be an estimation over the "longer" time interval.
- the speech fundamental frequency estimator is characterized in that the second power density spectrum calculator is configured for providing a conjugate complex version of the second set of values to the first power density spectrum calculator and wherein the first power density spectrum calculator is configured for using the provided conjugate complex version of the second set of values as the version with which the stored version of the first set of values is to be multiplied.
- the speech fundamental frequency estimator is characterized in that the analyzer is configured for performing a first frequency-time-transform of the first power density spectrum in order to obtain a first set of correlation function values and for performing a second frequency-time-transform of the second power density spectrum in order to obtain a second set of correlation function values, wherein the analyzer is furthermore configured for determining a set of normalization values and a set of weighting values from the second power density spectrum and for using the set of normalization values and the set of weighting values in the first and second frequency-time-transform and wherein the analyzer is furthermore configured for determining the speech fundamental frequency estimate on the basis of the first and second sets of correlation function values.
- the speech fundamental frequency estimator according to a further embodiment can be characterized in that the analyzer further comprises a compensator being configured for adaptively compensating the values of the first set of correlation function values by a correction factor being based on a value of the second set of correlation function values and wherein the analyzer is furthermore configured for determining the speech fundamental frequency estimate on the basis of the compensated first set of correlation function values and the second set of correlation function values.
- the speech fundamental frequency estimator can be characterized in that the compensator is configured for multiplying the second set of correlation function values by a lower bounded quotient between a value of the first set of correlation function values and a value of the second set of correlation function values in order to obtain said compensated first set of correlation function values.
- the speech fundamental frequency estimator is characterized in that the analyzer is configured for combining the compensated first set of correlation function values and the second set of correlation function values in order to obtain an extended set of correlation function values, wherein the values of the extended set of correlation function values assume corresponding values from the compensated first set of correlation function values, the second set of correlation function values or values between the compensated first set of correlation function values and the second set of correlation function values and wherein the analyzer is furthermore configured for determining the speech fundamental frequency estimate on the basis of said extended set of correlation function values.
- the extended set of correlation function values comprises now information from the first as well as the second set of correlation function values such that an estimation of the speech fundamental frequency can be based on the information comprised in the first and second time interval as well as a correction of possible errors is also possible by the information of the second time interval. Furthermore, it is also possible to perform a weighting of the values of the first set of correlation function values in contrast to the values of the second set of correlation function values in order to take into account the influence of an offset between the first set of correlation function values (respectively the compensated set of correlation function values) and the second set of correlation function values.
- the speech fundamental frequency estimator is characterized in that the analyzer is configured for determining the speech fundamental frequency estimate by searching the index of a maximum value from the extended set of correlation function values within a predetermined number of indices of the values of the extended set of correlation values, from the first or second set of correlation function values within a predetermined number of indices of values of the first respectively second set of correlation function values or from the compensated first set of correlation function values within the predetermined number of indices of values of the compensated first set of correlation function values and wherein the analyzer is furthermore configured for determining the speech fundamental frequency estimate as the product of a sampling frequency and a reciprocal value of said searched index.
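The index search and the conversion from index to frequency can be sketched as follows. The function name, the default index range and the sampling rate are assumptions made for illustration; the correlation values could come from any of the sets named above.

```python
import math

def estimate_f0(corr, fs, m_min=30, m_max=100):
    """Return (f0 in Hz, lag) for the largest correlation value in m_min..m_max."""
    m_max = min(m_max, len(corr) - 1)
    lag = max(range(m_min, m_max + 1), key=lambda m: corr[m])
    # Estimate = sampling frequency times the reciprocal of the found index.
    return fs / lag, lag

# Demo: an idealized correlation sequence with a period of 80 samples.
fs = 8000  # assumed sampling rate in Hz
corr = [math.cos(2 * math.pi * m / 80) for m in range(128)]
f0, lag = estimate_f0(corr, fs)
```

With the assumed sampling rate of 8000 Hz, the index range 30 to 100 corresponds to frequencies between 80 Hz and roughly 267 Hz.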
- the speech fundamental frequency estimator is characterized in that the analyzer is furthermore configured for determining a reliability factor for the determined speech fundamental frequency estimate and for blocking an output of the determined speech fundamental frequency estimate in the case the determined reliability factor for the determined speech fundamental frequency estimate is below a predetermined reliability factor.
- the speech fundamental frequency estimator can be characterized in that the analyzer is furthermore configured for determining said reliability factor by dividing the maximum value at said searched index by the first value of the extended set of correlation function values or, respectively the first, the compensated first or second set of correlation function values.
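A minimal sketch of this reliability check: the maximum value at the searched index is divided by the first correlation value, and the estimate is withheld when the ratio is too small. The function name and the default threshold are assumptions; returning `None` is just one possible way to represent a blocked output.

```python
def gated_estimate(corr, lag, fs, p0=0.25):
    """Return (estimate or None, reliability) for the peak at index `lag`.

    p0 is an assumed threshold; the estimate is blocked (None) when the
    normalized peak corr[lag] / corr[0] falls below it.
    """
    if corr[0] <= 0.0:
        return None, 0.0
    reliability = corr[lag] / corr[0]
    estimate = fs / lag if reliability >= p0 else None
    return estimate, reliability

strong = [1.0] + [0.0] * 79 + [0.5]   # clear peak at index 80
weak = [1.0] + [0.0] * 79 + [0.1]     # peak too small relative to corr[0]
```

For the `strong` sequence the estimate passes with a reliability of 0.5, whereas the `weak` sequence is blocked.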
- the speech fundamental frequency estimator can be characterized in that the second power density spectrum calculator is configured for determining an estimate of the power density spectrum of background noise and for determining a noise suppression factor on the basis of said power density spectrum of background noise, and wherein the analyzer is configured for multiplying the first and second power density spectrum with said noise suppression factor prior to the frequency-time-transform of the first respectively second power density spectrum.
- the speech fundamental frequency estimator can be characterized in that the second power density spectrum calculator is configured for determining the noise suppression factor as the maximum of a predetermined maximum suppression coefficient and a term being dependent on a quotient of the estimate of the power density spectrum of background noise and the second power density spectrum. This ensures that a minimum suppression factor is used and thus an effective suppression of background noise is accomplished.
- the speech fundamental frequency estimator can be characterized in that the second power density spectrum calculator is configured for determining the estimate of the power density spectrum of background noise in speech pauses or for determining the estimate of the power density spectrum of background noise from a segment-wise estimation of the minima of the power of a differential signal. This provides an efficient and numerically simple way of determining the estimate of the power density spectrum of background noise.
- the speech fundamental frequency estimator can be characterized in that the analyzer is furthermore configured for reestimating the speech fundamental frequency estimate in the case the determined speech fundamental frequency estimate is below a predefined frequency value, wherein the analyzer is configured for performing the reestimation by searching a further index of a further maximum value of the extended set of correlation function values, the first or second set of correlation function values or the compensated first set of correlation function values within a further number of values of said sets of correlation function values and for outputting a product of a sampling frequency and a reciprocal value of said further index as the determined speech fundamental frequency estimate.
- This provides a further improvement of the speech fundamental frequency estimate, especially in the case when the determined estimate is below said predefined frequency (which means that the estimate may not be as reliable as desired).
- Such a use of the doubled speech fundamental frequency estimate from a previous estimation broadens the region to be searched and thus strengthens the reliability and preciseness of the outputted estimate.
- the speech fundamental frequency estimator can be characterized in that the analyzer is configured for outputting said product as the predetermined speech fundamental frequency estimate only in the case the value of the autocorrelation function at the further index is larger than 60 percent of the value of the autocorrelation function at the previously searched maximal index as well as a value of the extended set of correlation function values at said further index is larger than a previously defined amplitude value. This further strengthens the validity of the outputted speech fundamental frequency estimate as before outputting the result two separate conditions have to be fulfilled.
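The re-estimation with its two acceptance conditions might look like the sketch below. Several details are assumptions, not statements of the patent: the further search region is taken to be a small window around half of the first index (that is, around the doubled frequency), and the values of `f_limit`, `window` and `min_amp` are placeholders. Only the 60 percent ratio is taken from the text.

```python
import math

def reestimate(corr, lag, fs, f_limit=150.0, window=2, ratio=0.6, min_amp=0.0):
    """Re-check a low estimate around half the lag; return the final f0 in Hz."""
    f0 = fs / lag
    if f0 >= f_limit:
        return f0                      # estimate is not "low": keep it
    center = lag // 2                  # assumed region: doubled frequency
    lo = max(1, center - window)
    hi = min(len(corr) - 1, center + window)
    cand = max(range(lo, hi + 1), key=lambda m: corr[m])
    # Both conditions must hold before the new index replaces the old one.
    if corr[cand] > ratio * corr[lag] and corr[cand] > min_amp:
        return fs / cand
    return f0

fs = 8000  # assumed sampling rate in Hz
period40 = [math.cos(2 * math.pi * m / 40) for m in range(128)]  # true f0: 200 Hz
period80 = [math.cos(2 * math.pi * m / 80) for m in range(128)]  # true f0: 100 Hz
```

For `period40`, a first estimate at index 80 (100 Hz) is corrected to 200 Hz because the value at index 40 is just as large. For `period80`, the value around index 40 is negative, so the 100 Hz estimate is kept.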
- the speech fundamental frequency estimator in a further embodiment can be characterized in that the analyzer is configured for modifying a speech fundamental period corresponding to said determined speech fundamental frequency estimate by an interpolation correction term prior to outputting a modified speech fundamental frequency estimate, wherein said interpolation correction term is dependent on values of said first or second set of correlation function values, of said extended set of correlation function values or said compensated first set of correlation function values, respectively.
- an interpolation approach provides the advantage that the error terms resulting from the use of a discrete time-frequency-transform respectively a frequency-time-transform can be reduced by a processing of the signals after the inverse transform has been performed.
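The excerpt does not state the form of the interpolation correction term. As an illustration only, the sketch below uses ordinary parabolic interpolation through the peak and its two neighbours, a common choice for refining a correlation peak to a fractional index. It should not be read as the correction term of the patent.

```python
def refine_lag(corr, lag):
    """Fractional lag from a parabola through corr[lag-1], corr[lag], corr[lag+1]."""
    if lag <= 0 or lag >= len(corr) - 1:
        return float(lag)              # no neighbours available
    left, mid, right = corr[lag - 1], corr[lag], corr[lag + 1]
    denom = left - 2.0 * mid + right
    if denom >= 0.0:                   # not a strict local maximum
        return float(lag)
    # Vertex offset of the parabola relative to the integer index.
    return lag + 0.5 * (left - right) / denom

# Demo: samples of a parabola whose true maximum lies at index 40.3.
corr = [-(m - 40.3) ** 2 for m in range(64)]
refined = refine_lag(corr, 40)
```

The refined speech fundamental frequency would then be the sampling frequency divided by `refined`, which avoids the coarse steps caused by integer indices at high frequencies.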
- the speech fundamental frequency estimator can be characterized by a frequency domain filtering unit being configured for receiving the frequency domain versions of the first and second set of time domain signal values, for frequency domain filtering said frequency domain versions in order to obtain said first and second sets of values, respectively, and for providing said first and second sets of values to the first and second power density spectrum calculator respectively.
- the speech fundamental frequency estimator can be characterized in that the frequency domain filtering unit is configured for filtering only frequencies below a predefined limiting frequency. This relaxes a computational burden as only the parts of the spectrum are filtered which are of the most importance for a reliable estimation of very low speech fundamental frequencies.
- the speech fundamental frequency estimator can be characterized in that the frequency domain filtering unit is configured for delaying values of said frequency domain versions being above said predefined limiting frequency. This compensates a delay which might be introduced in a signal flow path for filtering signals having a frequency below said limiting frequency.
- the invention can also be implemented as a computer program having a program code for performing the inventive method, when the computer program runs on a computer.
- the speech fundamental frequency estimator can be characterized in that the power density spectrum calculator is configured for determining the noise suppression factor as the maximum of a predetermined maximum suppression coefficient and a term being dependent on a quotient of the estimate of the power density spectrum of background noise and the second power density spectrum.
- the second aspect may comprise a speech fundamental frequency estimator being characterized in that the power density spectrum calculator is configured for determining the estimate of the power density spectrum of background noise in speech pauses or for determining the estimate of the power density spectrum of background noise from a segment-wise estimation of the minima of the power of a differential signal. This ensures that a minimum suppression factor is used and thus an effective suppression of background noise is accomplished.
- the present invention relies mainly on estimation methods based on the autocorrelation function, which are described herein in advance for a better understanding. However, some aspects of the present invention are also implemented in the conventional autocorrelation methods, such that the description in this section is not to be considered as state of the art.
- the speech signal s(n) will be recorded by a microphone.
- the weighting function $W(e^{j\Omega_\mu}, n)$ has been chosen such that the attenuation rises with rising frequency. This choice results from the fact that speech has a speech fundamental frequency structure mainly at low frequencies, which in turn results in an improved estimation of the speech fundamental frequency.
- Fig. 4 shows the functional principle of a method for speech fundamental frequency estimation.
- the autocorrelation function $\hat{r}_{yy}(m,n)$ is used in order to estimate the speech fundamental frequency $f_p(n)$.
- the index m describes herein the autocorrelation offset and the index n describes the present frame (under analysis).
- the preliminary speech fundamental frequency $f'_p(n)$ can be determined by a search of the maximum in a selected range of indices, for example $30 \le m \le 100$.
- a threshold value of $p_0 \in [0.2, 0.3]$ has turned out to be favourable.
- the value of the normalized autocorrelation at the location ⁇ p ( n ) can be of large significance as reliability information, for example for a speech signal reconstruction.
- the desired value of the speech fundamental frequency can be tracked either slowly or quickly, depending on how reliably a speech fundamental frequency can be estimated.
- the time-frequency analysis was considered only in the interesting frequency range up to 1000 Hz.
- the spectral refinement can be used without using the post-processing or the interpolation or the approach having the additional delay correction structure can be used without using the spectral refinement approach.
- all the individual aspects commonly contribute to a much improved estimation of the speech fundamental frequency and shall be described herein as an embodiment.
- the newly proposed method uses an additional spectral refinement of the input spectrum $Y(e^{j\Omega_\mu}, n)$.
- the functional principle of this approach is disclosed in Fig. 6 .
- FIR finite impulse response
- the parameter ⁇ denotes herein the ⁇ -th frequency sampling point of a short-time spectrum ⁇ ( e j ⁇ ⁇ ,n ) having a higher resolution and the parameter M denotes the order of the used FIR-filters.
- a memory length M of the short FIR-filter is chosen between 3 and 5.
- a spectral refinement in the whole frequency range is not necessary for speech signals.
- the speech fundamental frequency structure is only present in the lower frequency range, which means that it is sufficient to perform the refinement up to, for example, 1000 Hz. Above this threshold it is possible to only introduce a delay of (M-1)/2 samples (down-sampled). The numerical effort necessary for such a refinement can thus be kept low.
- Fig. 7 shows the analysis-synthesis system with additional calculation of the spectral refinement in a low frequency range.
- Fig. 8 shows the analysis of the autocorrelation as well as the time-frequency analysis with spectral refinement.
- as test signal, the same combination of sinusoidal signals has been used, with a frequency distance varying from 300 Hz to 60 Hz.
- the black graph in the upper diagram of Fig. 8 and the white graph in the lower diagram of Fig. 8 show the estimated pitch period duration and the estimate of the speech fundamental frequency, respectively, when using the spectral refinement approach.
- Fig. 9A shows a block diagram of an embodiment of a speech fundamental frequency estimator 900.
- the speech fundamental frequency estimator 900 comprises a power density spectrum calculator 902 and an analyzer 904.
- the power density spectrum calculator 902 has 2 inputs, one for receiving a set of values and one for receiving background noise information.
- the set of values ⁇ 1 is a frequency-domain representation of a set of a time domain signal values y 1 in a time interval t 1 .
- the background noise information can for example be determined in speech pauses in which only a noise signal and no speech signal is provided to the power density spectrum calculator 902.
- the power density spectrum calculator 902 has 2 outputs, one for outputting a noise suppression factor $V(e^{j\Omega}, n)$ and one for outputting values of a power density spectrum.
- the analyzer 904 has 2 inputs for receiving both of the outputs of the power density spectrum calculator 902.
- the analyzer 904 furthermore has one output for outputting the determined speech fundamental frequency $f_p$.
- the function of the speech fundamental frequency estimator 900 shall be described in more detail with reference to Fig. 9B .
- Fig. 9B shows a flow diagram of a method for estimating the speech fundamental frequency.
- the method 940 comprises a first step 950 in which a power density spectrum is provided by multiplying a version of the set of values ⁇ 2 with a complex conjugate version of the second set of values.
- a second step 952 an estimate of a power density spectrum of background noise is determined.
- the background noise information is used which may originate for example from a speech pause detector or other means which provide only information about the background noise in the absence of speech.
- a noise suppression factor is determined which is explained in more detail below.
- a multiplication of the power density spectrum with the noise suppression factor V(e j ⁇ ,n) is performed before in a fifth step 958 a frequency-time-transform is accomplished.
- a sixth step 960 speech fundamental frequency is determined from the frequency-time-transformed signal resulting in step 958.
- Ŝnn(Ωµ, n) denotes an estimate of the auto power density spectrum of a disturbance (background noise), V0 describes a maximal attenuation, and the parameter β is used for overestimating the power density spectrum of the disturbance. Because the disturbance can be considered to be non-stationary, a short-time estimate has to be used for this disturbance value. However, signal and disturbance are available only as a sum in the microphone signal y(n).
- The estimate of the power density spectrum of the background noise can be obtained in two different ways: first, the power of the microphone signal can be estimated in speech pauses, which requires a speech pause detector; or, second, an estimate of the power of the disturbance can be determined from the segment-wise estimated minima of the power of the microphone signal.
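The second option, estimating the disturbance power from segment-wise minima, can be illustrated with a minimal Python sketch. The function name `noise_power_from_minima` and the segment length are assumptions made for illustration only; the source does not specify how the segments are chosen or whether the power values are smoothed beforehand.

```python
def noise_power_from_minima(frame_powers, segment_len=4):
    """Estimate the noise power of one subband from segment-wise minima.

    frame_powers: short-time power values of the microphone signal over
    successive frames (one subband). segment_len is an assumed parameter.
    """
    if segment_len < 1:
        raise ValueError("segment_len must be at least 1")
    estimates = []
    for start in range(0, len(frame_powers), segment_len):
        segment = frame_powers[start:start + segment_len]
        # Speech raises the power only temporarily, so the minimum within
        # a segment is taken as an estimate of the disturbance power.
        estimates.append(min(segment))
    return estimates


# Usage: two segments of four frames each.
noise = noise_power_from_minima([5.0, 1.0, 3.0, 4.0, 9.0, 2.0, 8.0, 7.0])
```

Unlike the speech-pause approach, this variant needs no speech pause detector, at the cost of a delay of up to one segment before a change in the noise level is tracked.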
- As noise estimation is not the main focus of this patent application, further details shall not be explained here; however, reference is made to P. Vary, R. Martin: Digital Speech Transmission, John Wiley & Sons, Chichester, England, 2006.
- Noise reduction is used as a pre-processing stage for the speech fundamental frequency estimation; that is, instead of the input subband signals Y(e^jΩµ, n), the noise-reduced signals Y(e^jΩµ, n) · V(e^jΩµ, n) are processed.
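The noise suppression factor can be sketched as follows. The exact equation is not reproduced in this text, so the form V = max(V0, 1 - β · Ŝnn / Ŝỹỹ) used here is an assumption: it is a common spectral-subtraction style rule that matches the description of V0 as maximal attenuation and β as overestimation parameter. The function name and the default values are hypothetical.

```python
def noise_suppression_factor(s_yy, s_nn, v0=0.1, beta=2.0, eps=1e-12):
    """Per-subband factor V = max(V0, 1 - beta * S_nn / S_yy) (assumed form).

    s_yy: power density spectrum values of the noisy input, per subband.
    s_nn: estimated power density spectrum values of the background noise.
    """
    factors = []
    for p_signal, p_noise in zip(s_yy, s_nn):
        # beta > 1 overestimates the noise; eps guards against division by zero.
        gain = 1.0 - beta * p_noise / max(p_signal, eps)
        # V0 limits the attenuation, so the factor never drops below it.
        factors.append(max(v0, gain))
    return factors


def apply_factor(spectrum, factors):
    """Weight each subband value with its noise suppression factor."""
    return [value * v for value, v in zip(spectrum, factors)]


# Usage: one subband dominated by speech, two dominated by noise.
v = noise_suppression_factor([10.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

Subbands in which the noise estimate exceeds the input power are attenuated down to V0 rather than set to zero, which avoids negative or vanishing weights.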
- Fig. 10 shows results of the speech fundamental frequency estimation with spectral refinement in terms of a time-frequency analysis with and without noise reduction. All parameters of the methods were identical to the previously described parameters. As can be seen clearly, erroneous detections (marked by black ellipses in the upper diagram of Fig. 10) are suppressed when the above-mentioned noise reduction is active. In passages with speech activity, nearly nothing changes.
- Speech fundamental frequency estimation on the basis of a plurality of subband vectors
- Fig. 11A shows a block diagram of an embodiment of the inventive speech fundamental frequency estimator 1100.
- The speech fundamental frequency estimator 1100 comprises a first power density spectrum calculator 1102, a second power density spectrum calculator 1104 and an analyzer 1106.
- The first power density spectrum calculator 1102 and the second power density spectrum calculator 1104 are both fed by a common input of width N, on which a first set of values Ỹ1 and a second set of values Ỹ2 are provided in succession.
- The first set of values Ỹ1 is a frequency-domain representation of a first set of time-domain signal values y1 within a first time interval t1.
- The second set of values Ỹ2 is a frequency-domain representation of a second set of time-domain signal values y2 within a second time interval t2.
- The first power density spectrum calculator 1102 is configured for storing a version of the first set of values and for providing values of a first power density spectrum Ŝỹỹd(Ωµ, n) by multiplying the stored version of the first set of values Ỹ1 with a complex conjugate version of the second set of values Ỹ2.
- The second power density spectrum calculator 1104 is configured for providing values of a second power density spectrum Ŝỹỹ(Ωµ, n) by multiplying a version of the second set of values with a complex conjugate version of the second set of values.
- The analyzer 1106 is configured for receiving the first and second power density spectra from the first and second power density spectrum calculators 1102 and 1104, respectively, and for determining the speech fundamental frequency estimate fp(n) on the basis of the values of the first power density spectrum Ŝỹỹd(Ωµ, n) and the values of the second power density spectrum Ŝỹỹ(Ωµ, n).
- Fig. 11B shows the functionality of the speech fundamental frequency estimator of Fig. 11A in more detail.
- Fig. 11B discloses a method 1140 for estimating the speech fundamental frequency fp(n).
- First and second sets of values Ỹ1 and Ỹ2 are provided, each of which has N individual values (that is, a width of N).
- In a first step 1150, a version of the first set of values Ỹ1 is stored.
- In a multiplication step 1152, the stored version of the first set of values Ỹ1 is multiplied with a version of the second set of values Ỹ2, which is fed directly to the multiplication step without a storing step.
- The result of the multiplication step 1152 is said first power density spectrum Ŝỹỹd(Ωµ, n).
- A further multiplication step 1154 is performed in which versions of the second set of values Ỹ2 are multiplied with each other, which results in the second power density spectrum.
- Finally, the speech fundamental frequency estimate fp(n) is determined.
- The inventive approach shown in Figs. 11A and 11B has the advantage that it is now possible to estimate lower speech fundamental frequencies than would be possible according to the state of the art. This is mainly because (conventionally available) short sets of frequency-domain values can be used for a precise speech fundamental frequency estimation: the multiplication in step 1152 with a stored, that is delayed, version of a previous set of frequency-domain values results in a kind of elongated analysis time interval for estimating a low speech fundamental frequency.
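The storing and multiplication steps, followed by a frequency-time transform, can be sketched in plain Python as below. This is an illustration under assumptions, not the patented implementation: the function names are hypothetical, a direct O(N²) DFT stands in for the filter bank, the optional `weights` argument stands in for a per-subband weighting such as a noise suppression factor, and windowing and normalization are omitted.

```python
import cmath
import math


def dft(frame):
    """Plain O(N^2) discrete Fourier transform of one signal frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * mu * t / n) for t in range(n))
        for mu in range(n)
    ]


def inverse_dft_real(spectrum):
    """Real part of the inverse DFT, used as the frequency-time transform."""
    n = len(spectrum)
    return [
        (sum(spectrum[mu] * cmath.exp(2j * math.pi * mu * m / n)
             for mu in range(n)) / n).real
        for m in range(n)
    ]


def correlation_sets(stored_spec, current_spec, weights=None):
    """Return (cross-correlation values, autocorrelation values)."""
    if weights is None:
        weights = [1.0] * len(current_spec)
    # First power density spectrum: stored (earlier) set of values times the
    # complex conjugate of the current set of values.
    s_cross = [v * a * b.conjugate()
               for v, a, b in zip(weights, stored_spec, current_spec)]
    # Second power density spectrum: current set times its own conjugate.
    s_auto = [v * b * b.conjugate() for v, b in zip(weights, current_spec)]
    return inverse_dft_real(s_cross), inverse_dft_real(s_auto)


# Usage: a cosine with a period of 8 samples; the frames are offset by 4 samples.
signal = [math.cos(2 * math.pi * t / 8) for t in range(36)]
r_cross, r_auto = correlation_sets(dft(signal[0:32]), dft(signal[4:36]))
```

Because the products are formed in the DFT domain, the resulting correlation values are circular. The cross term relates samples of two different frames, which is what extends the range of lags that can be analyzed beyond a single frame.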
- A further inventive idea can be seen in the fact that not only the present signal frame y(n) is used for the estimation of the speech fundamental frequency but also a signal frame y(n-d), that is, a signal frame delayed by d clock cycles.
- The present short-time spectrum Ỹ(e^jΩµ, n) and the delayed short-time spectrum Ỹ*(e^jΩµ, n-d) are used.
- The cross-correlation function r̂ỹỹd,g(m, n) is determined according to equation 13.
- The aim is to determine an extended autocorrelation function r̂ỹỹ,erw(k, n) of order N/2 + r from the autocorrelation function r̂ỹỹ,g(m, n) and the cross-correlation function r̂ỹỹd,g(m, n), each of which has the order N/2.
- The index k of the term r̂ỹỹ,erw(k, n) describes the offset of the autocorrelation, wherein the following holds: $k \in \{0, \ldots, N/2 + r - 1\}$.
- The linear function a(m) was chosen such that the weight of the coefficients decreases with increasing offset m.
- The extended autocorrelation function r̂ỹỹ,erw(k, n) thus obtained is finally used for the estimation of the speech fundamental frequency.
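One possible way to compose an extended autocorrelation function from the two correlation functions is sketched below. The combining equation itself is not reproduced in this text, so everything here is an assumption for illustration: the function name, the idea that the cross-correlation values cover lags shifted by `shift` samples, and the linear cross-fade in the overlapping lags, which merely mimics a weight that decreases with increasing offset.

```python
def extend_autocorrelation(r_auto, r_cross, shift):
    """Combine two correlation functions into one over shift + len(r_auto) lags.

    r_auto covers lags 0 .. L-1, r_cross is assumed to cover lags
    shift .. shift+L-1, with L = len(r_auto) and 0 < shift <= L.
    """
    length = len(r_auto)
    if len(r_cross) != length or not 0 < shift <= length:
        raise ValueError("need equal lengths and 0 < shift <= length")
    overlap = length - shift
    extended = []
    for k in range(shift + length):
        if k < shift:
            # Only the autocorrelation is available at small lags.
            extended.append(r_auto[k])
        elif k < length:
            # Overlap: linear weight on r_auto that decreases with the lag.
            m = k - shift
            a = 1.0 - (m + 1) / (overlap + 1)
            extended.append(a * r_auto[k] + (1.0 - a) * r_cross[m])
        else:
            # Only the cross-correlation is available at large lags.
            extended.append(r_cross[k - shift])
    return extended


# Usage: four lags each, cross-correlation shifted by two lags.
r_ext = extend_autocorrelation([1.0] * 4, [3.0] * 4, shift=2)
```

With L = N/2 and shift = r, the result has N/2 + r values, matching the index range stated for the extended autocorrelation function.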
- The speech fundamental frequency is determined by a search for the maximum for each single frame in an elongated area, for example in the range $30 \le k \le 180$.
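The maximum search can be sketched as follows. The conversion fp = fs / τ and the reliability check (maximum divided by the value at lag 0, with blocking below a threshold) follow the wording of the claims; the function name, the threshold value and the sampling frequency in the usage example are assumptions.

```python
import math


def estimate_pitch(r_ext, fs, k_min=30, k_max=180, p0=0.5):
    """Return the pitch estimate in Hz, or None if it is deemed unreliable.

    r_ext: extended autocorrelation values, indexed by lag k.
    fs:    sampling frequency in Hz.
    p0:    assumed reliability threshold.
    """
    k_max = min(k_max, len(r_ext) - 1)
    if k_min < 1 or k_min > k_max or r_ext[0] <= 0.0:
        return None
    # Lag with the largest correlation value inside the search range.
    tau = max(range(k_min, k_max + 1), key=lambda k: r_ext[k])
    # Reliability: peak value relative to the value at lag 0.
    if r_ext[tau] / r_ext[0] < p0:
        return None
    # Fundamental frequency = sampling frequency times reciprocal of the lag.
    return fs / tau


# Usage: a correlation function with a period of 100 lags at fs = 11025 Hz.
r_demo = [math.cos(2 * math.pi * k / 100) for k in range(200)]
f_demo = estimate_pitch(r_demo, fs=11025.0)
```

At a sampling frequency of 11025 Hz, which is an assumed value, the range 30 ≤ k ≤ 180 would correspond to fundamental frequencies from about 61 Hz to about 368 Hz.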
- In Fig. 13, two examples for the analysis of the speech fundamental frequency are shown.
- The left section of Fig. 13 shows the analysis of a speech fundamental frequency at about 270 Hz, whereas the right section of Fig. 13 shows the analysis of a speech fundamental frequency at about 60 Hz.
- In each of the left and right sections of Fig. 13, the correlation of the present signal frame with itself (left) and with a preceding signal frame (right) is shown.
- The lower graph in each of the two sections of Fig. 13 shows the extended autocorrelation function r̂ỹỹ,erw(k, n) across an elongated autocorrelation offset, which is generated by composing the two correlation functions r̂ỹỹ,g(m, n) and r̂ỹỹd,g,mod(m, n) using equation 30.
- The corresponding speech fundamental period can be determined and detected quite well using the autocorrelation function r̂ỹỹ,g(m, n) (left section of Fig. 13).
- Fig. 13 shows in the lower part that, by combining the correlation of the signal frame with itself and the correlation with a preceding signal frame, the speech fundamental period can still be determined and detected.
- In Fig. 14, the analysis of the extended autocorrelation function r̂ỹỹ,erw(k, n) is shown when a preceding spectral refinement in the low-frequency region as well as a time-frequency analysis of the input signal is used.
- A comparison of the analyses in Figs. 5 and 14 indicates that significant improvements can be achieved with the previously described approach.
- No erroneous detections occur at low speech fundamental frequencies.
- After estimation of the speech fundamental frequency fp(n), a test can be made whether this estimate is below a threshold fk.
- For the determination of this area, the previously determined speech fundamental frequency fp(n) is first doubled.
- The parameter fp,max in equation 33 is a predefined value of a maximal possible speech fundamental frequency.
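The post-processing idea can be sketched as below: if the estimate lies below the threshold, the lag belonging to the doubled frequency is re-examined. Equation 33 and the exact acceptance conditions are not reproduced in this text, so the search window `half_width`, the acceptance rule and all default values are assumptions, and the function name is hypothetical.

```python
import math


def recheck_octave(r_ext, tau, fs, f_k=100.0, f_max=400.0, p0=0.5, half_width=2):
    """Re-estimate a suspiciously low pitch near twice the frequency.

    r_ext: extended autocorrelation values indexed by lag.
    tau:   lag of the originally found maximum (tau >= 1).
    Returns the (possibly corrected) pitch estimate in Hz.
    """
    f_p = fs / tau
    if f_p >= f_k:
        return f_p  # estimate is above the threshold, nothing to re-check
    # Double the estimate, but never beyond the maximal possible frequency.
    f_target = min(2.0 * f_p, f_max)
    center = int(round(fs / f_target))
    lo = max(1, center - half_width)
    hi = min(len(r_ext) - 1, center + half_width)
    if lo > hi or r_ext[0] <= 0.0:
        return f_p
    new_tau = max(range(lo, hi + 1), key=lambda k: r_ext[k])
    # Accept the new lag only if its correlation value is high enough.
    if r_ext[new_tau] / r_ext[0] >= p0:
        return fs / new_tau
    return f_p


# Usage: the true period is 50 lags, but the first search returned lag 100.
r_true_50 = [math.cos(2 * math.pi * k / 50) for k in range(200)]
corrected = recheck_octave(r_true_50, tau=100, fs=8000.0)
```

When the correlation at the shorter lag is weak, the original estimate is kept, so genuinely low fundamental frequencies are not doubled by mistake.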
- Fig. 15 shows a time-frequency analysis of an input signal together with the detection results of the speech fundamental frequency estimation.
- With the post-processing deactivated, erroneous detections (halvings of the frequency) can be observed at two locations (at 0.7 and at 0.75 seconds).
- Such erroneous detections can be corrected by the post-processing, as can be seen from the lower part of Fig. 15.
- For the interpolation, the autocorrelation coefficient at which the extended autocorrelation function r̂ỹỹ,erw(k, n) has its maximum is used, and the adjacent autocorrelation coefficients are also considered, that is, those at the autocorrelation offsets to the left and right of the maximum.
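An interpolation that uses the maximum and its two neighbours can be illustrated with parabolic interpolation. This is a common technique swapped in for illustration; the interpolation correction term actually used in the patent is not reproduced in this text, and the function name is hypothetical.

```python
def refine_period(r_ext, tau):
    """Refine an integer pitch lag by fitting a parabola through three points.

    r_ext: extended autocorrelation values indexed by lag.
    tau:   integer lag of the maximum.
    Returns a fractional lag; falls back to tau if no refinement is possible.
    """
    if tau <= 0 or tau >= len(r_ext) - 1:
        return float(tau)  # a neighbour is missing at the border
    left, mid, right = r_ext[tau - 1], r_ext[tau], r_ext[tau + 1]
    curvature = left - 2.0 * mid + right
    if curvature >= 0.0:
        return float(tau)  # not a strict local maximum
    # Vertex of the parabola through the three points, relative to tau.
    return tau + 0.5 * (left - right) / curvature


# Usage: a peak that actually lies between lags 10 and 11.
r_peak = [-(k - 10.3) ** 2 for k in range(20)]
tau_fine = refine_period(r_peak, 10)
```

The refined lag is then converted with fp = fs / τ, which removes the quantization that integer lags impose on the frequency estimate.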
- In the upper part of Fig. 16, the time-frequency analysis of a portion of several sinusoidal signals of equal amplitude is shown. In contrast, a portion of a speech signal of a female voice is shown in the lower part of Fig. 16.
- The white graph denotes the estimated quantized speech fundamental frequency in both the upper and the lower part of Fig. 16.
- The grey graph in the upper part and the black graph in the lower part show the estimated speech fundamental frequency after the interpolation. It can be seen from the upper part of Fig. 16 that, owing to the interpolation, nearly the desired straight graph of the estimated speech fundamental frequency is obtained. In the lower part it can be seen that the estimated speech fundamental frequency closely follows the speech fundamental frequency structure of the speech signal when the interpolation is used.
- This invention describes a method for estimating the fundamental frequency (pitch frequency) of speech signals. This is achieved in the DFT domain by analyzing the current input spectrum as well as past input spectra. To achieve an estimation performance improved over standard methods, a four-stage algorithm is proposed, whose steps can also be used independently. First, pre-processing (called spectral refinement) is applied to the input spectrum at low frequencies. Second, a noise reduction is applied when computing normalization values. Third, estimates of the autocorrelation of the current frame and of the cross-correlation of the current frame with the previous frame are adaptively combined in order to obtain an extended range. Fourth, post-processing is applied to reduce estimation errors and to achieve an improved pitch accuracy.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Claims (42)
- Speech fundamental frequency estimator (1100) configured to receive a first set of values (Ỹ1) and a second set of values (Ỹ2), the first set of values (Ỹ1) being a frequency-domain representation of a first set of time-domain signal values (y1) in a first time interval (t1) and the second set of values (Ỹ2) being a frequency-domain representation of a second set of time-domain signal values (y2) in a second time interval (t2), the second time interval (t2) being later than and shifted with respect to the first time interval (t1), the speech fundamental frequency estimator (1100) comprising:
  - a first power density spectrum calculator (1102) configured to store a version of the first set of values (Ỹ1) and configured to provide values of a first power density spectrum (Ŝỹỹd(Ωµ, n)) by multiplying the stored version of the first set of values (Ỹ1) with a complex conjugate version of the second set of values (Ỹ2);
  - a second power density spectrum calculator (1104) configured to provide values of a second power density spectrum (Ŝỹỹ(Ωµ, n)) by multiplying a version of the second set of values (Ỹ2) with a complex conjugate version of the second set of values (Ỹ2);
  - an analyzer (1106) configured to determine the speech fundamental frequency estimate (fp(n)) on the basis of the values of the first power density spectrum (Ŝỹỹd(Ωµ, n)) and the values of the second power density spectrum (Ŝỹỹ(Ωµ, n)),
  wherein the analyzer is further configured
  to perform a first frequency-time transform of the first power density spectrum (Ŝỹỹd(Ωµ, n)) in order to obtain a first set of correlation function values (r̂ỹỹd,g(m, n)),
  to perform a second frequency-time transform of the second power density spectrum (Ŝỹỹ(Ωµ, n)) in order to obtain a second set of correlation function values (r̂ỹỹ,g(m, n)), and
  to determine the speech fundamental frequency estimate (fp(n)) on the basis of the first and second sets of correlation function values (r̂ỹỹd,g(m, n), r̂ỹỹ,g(m, n)).
- Speech fundamental frequency estimator (1100) according to claim 1, characterized in that the first power density spectrum calculator (1102) is configured to multiply versions of the sets of values (Ỹ1, Ỹ2) which represent sets of time-domain signal values (y1, y2) having overlapping time intervals (t1, t2).
- Speech fundamental frequency estimator (1100) according to claim 2, characterized in that the first power density spectrum calculator (1102) is configured to multiply versions of the sets of values (Ỹ1, Ỹ2) which represent sets of time-domain signal values (y1, y2) having time intervals (t1, t2) that overlap by at least 25 percent.
- Speech fundamental frequency estimator (1100) according to one of claims 1 to 3, characterized in that the second power density spectrum calculator (1104) is configured to provide a complex conjugate version of the second set of values (Ỹ2) to the first power density spectrum calculator (1102), and wherein the first power density spectrum calculator (1102) is configured to use the provided complex conjugate version of the second set of values (Ỹ2) as the version with which the stored version of the first set of values (Ỹ1) is to be multiplied.
- Speech fundamental frequency estimator (1100) according to any one of the preceding claims, characterized in that the analyzer (1106) is configured to perform a first frequency-time transform of the first power density spectrum (Ŝỹỹd(Ωµ, n)) in order to obtain a first set of correlation function values (r̂ỹỹd,g(m, n)) and to perform a second frequency-time transform of the second power density spectrum (Ŝỹỹ(Ωµ, n)) in order to obtain a second set of correlation function values (r̂ỹỹ,g(m, n)), wherein the analyzer (1106) is further configured to determine a set of normalization values (S̃ỹỹ(Ωµ, n)) and a set of weighting values (V(e^jΩµ, n)) from the second power density spectrum (Ŝỹỹ(Ωµ, n)) and to use the set of normalization values (S̃ỹỹ(Ωµ, n)) and the set of weighting values (V(e^jΩµ, n)) in the first and second frequency-time transforms, and wherein the analyzer (1106) is further configured to determine the speech fundamental frequency estimate (fp(n)) on the basis of the first and second sets of correlation function values (r̂ỹỹd,g(m, n), r̂ỹỹ,g(m, n)).
- Speech fundamental frequency estimator (1100) according to claim 5, characterized in that the analyzer (1106) further comprises a compensator configured to adaptively compensate the values of the first set of correlation function values (r̂ỹỹd,g(m, n)) by a correction factor (Δ(m, n)) based on a value of the second set of correlation function values (r̂ỹỹ,g(m, n)), and wherein the analyzer (1106) is further configured to determine the speech fundamental frequency estimate (fp(n)) on the basis of the compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)) and the second set of correlation function values (r̂ỹỹ,g(m, n)).
- Speech fundamental frequency estimator (1100) according to claim 6, characterized in that the compensator is configured to multiply the second set of correlation function values (r̂ỹỹ,g(m, n)) by a downwardly limited quotient between a value of the first set of correlation function values (r̂ỹỹd,g(m, n)) and a value of the second set of correlation function values (r̂ỹỹ,g(m, n)) in order to obtain said compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)).
- Speech fundamental frequency estimator (1100) according to claim 7, characterized in that the analyzer (1106) is configured to combine the compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)) and the second set of correlation function values (r̂ỹỹ,g(m, n)) in order to obtain an extended set of correlation function values (r̂ỹỹ,erw(k, n)), wherein the values of the extended set of correlation function values (r̂ỹỹ,erw(k, n)) take the corresponding values of the compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)), of the second set of correlation function values (r̂ỹỹ,g(m, n)), or values between the compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)) and the second set of correlation function values (r̂ỹỹ,g(m, n)), and wherein the analyzer (1106) is further configured to determine the speech fundamental frequency estimate (fp(n)) on the basis of said extended set of correlation function values (r̂ỹỹ,erw(k, n)).
- Speech fundamental frequency estimator (1100) according to one of claims 5 to 8, characterized in that the analyzer (1106) is configured to determine the speech fundamental frequency estimate (fp(n)) by searching the index of a maximum value (τp(n)) from the extended set of correlation function values (r̂ỹỹ,erw(k, n)) among a predetermined number of indices (k) of the values of the extended set of correlation function values (r̂ỹỹ,erw(k, n)), from the first or the second set of correlation function values (r̂ỹỹd,g(m, n), r̂ỹỹ,g(m, n)) among a predetermined number of indices (m) of values of the first or second set of correlation function values (r̂ỹỹd,g(m, n), r̂ỹỹ,g(m, n)), respectively, or from the compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)) among the predetermined number of indices (m) of values of the compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)), and wherein the analyzer (1106) is further configured to determine the speech fundamental frequency estimate (fp(n)) as the product of a sampling frequency (fa) and a reciprocal value of said searched index (τp(n)).
- Speech fundamental frequency estimator (1100) according to claim 9, characterized in that the analyzer (1106) is further configured to determine a reliability factor (pfp(n)) for the determined speech fundamental frequency estimate and to block an output of the determined speech fundamental frequency estimate (fp(n)) in the case that the reliability factor (pfp(n)) determined for the speech fundamental frequency estimate is lower than a predetermined reliability factor (p0).
- Speech fundamental frequency estimator (1100) according to claim 10, characterized in that the analyzer (1106) is further configured to determine said reliability factor (pfp(n)) by dividing the maximum value (τ̃p(n)) at said searched index by the first value of the extended set of correlation function values (r̂ỹỹ,erw(k, n)) or, respectively, of the first set, the compensated first set or the second set of correlation function values (r̂ỹỹd,g(m, n), r̂ỹỹd,g,mod(m, n), r̂ỹỹ,g(m, n)).
- Speech fundamental frequency estimator (1100) according to one of claims 5 to 11, characterized in that the second power density spectrum calculator (1104) is configured to determine an estimate of the power density spectrum of a background noise (Ŝnn(Ωµ, n)) and to determine a noise suppression factor (V(e^jΩµ, n)) on the basis of said power density spectrum of the background noise (Ŝnn(Ωµ, n)), and wherein the analyzer (1106) is configured to multiply the first and second power density spectra with said noise suppression factor (V(e^jΩµ, n)) before the frequency-time transform of the first and second power density spectrum (Ŝỹỹd(Ωµ, n), Ŝỹỹ(Ωµ, n)), respectively.
- Speech fundamental frequency estimator (1100) according to claim 12, characterized in that the second power density spectrum calculator (1104) is configured to determine the noise suppression factor as the maximum of a predetermined maximal suppression coefficient (V0) and a term dependent on the quotient of the estimate of the power density spectrum of a background noise (Ŝnn(Ωµ, n)) and the second power density spectrum (Ŝỹỹ(Ωµ, n)).
- Speech fundamental frequency estimator (1100) according to one of claims 12 to 13, characterized in that the second power density spectrum calculator (1104) is configured to determine the estimate of the power density spectrum of the background noise (Ŝnn(Ωµ, n)) in speech pauses, or to determine the estimate of the power density spectrum of the background noise (Ŝnn(Ωµ, n)) from a segment-wise estimation of the minima of a power of a microphone signal.
- Speech fundamental frequency estimator (1100) according to claim 13 or claims 13 and 14, characterized in that the noise suppression factor is defined by [equation not reproduced], wherein Ŝỹỹ(Ωµ, n) denotes the second power density spectrum, V0 denotes a predefined maximal attenuation factor and β denotes a value for overestimating the power density spectrum of the background noise (Ŝnn(Ωµ, n)).
- Speech fundamental frequency estimator (1100) according to one of claims 5 to 15, characterized in that the analyzer (1106) is further configured to re-estimate the speech fundamental frequency estimate in the case that the determined speech fundamental frequency estimate is below the predetermined frequency value (fk), wherein the analyzer (1106) is configured to perform the re-estimation by searching a new index (k, m) of a new maximum value (τ̃p(n)) of the extended set of correlation function values (r̂ỹỹ,erw(k, n)), of the first or the second set of correlation function values (r̂ỹỹd,g(m, n), r̂ỹỹ,g(m, n)) or of the compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)) among a new number of values of said sets of correlation function values, and to output a product of a sampling frequency (fs) and a reciprocal value of said new index (τ̃p(n)) as the determined speech fundamental frequency estimate.
- Speech fundamental frequency estimator (1100) according to claim 16, characterized in that the analyzer (1106) is configured to search said index (k, m) of said new maximum value (τ̃p(n)) using a number k of values of said sets of correlation function values which is defined by [equation not reproduced].
- Speech fundamental frequency estimator (1100) according to claim 16 or 17, characterized in that the analyzer (1106) is configured to output said product as the determined speech fundamental frequency estimate only in the case that the new index (τ̃p(n)) is larger than 60 percent of the previously searched maximum index (τp(n)) and that a value (r̂ỹỹ,erw(τ̃p(n), n)) of the extended set of correlation function values (r̂ỹỹ,erw(k, n)) at said new index (τ̃p(n)) is greater than a previously defined amplitude value (p̃0).
- Speech fundamental frequency estimator (1100) according to one of claims 5 to 18, characterized in that the analyzer (1106) is configured to modify a speech fundamental period (τ̃p(n)) corresponding to said determined speech fundamental frequency estimate by an interpolation correction term (Δp(n)) before outputting a modified speech fundamental frequency estimate (fp(n)), wherein said interpolation correction term (Δp) depends on values of said first or second set of correlation function values (r̂ỹỹd,g(m, n), r̂ỹỹ,g(m, n)), of said extended set of correlation function values (r̂ỹỹ,erw(k, n)) or of said compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)), respectively.
- Speech fundamental frequency estimator (1100) according to one of claims 1 to 19, characterized by a frequency-domain filtering unit configured to receive the frequency-domain versions (Y1, Y2) of the first and second sets of time-domain signal values (y1, y2), to filter said frequency-domain versions in the frequency domain in order to obtain said first and second sets of values (Ỹ1, Ỹ2), respectively, and to provide said first and second sets of values (Ỹ1, Ỹ2) to the first and second power density spectrum calculators, respectively.
- Speech fundamental frequency estimator (1100) according to claim 20, characterized in that the frequency-domain filtering unit is configured to filter only the frequencies below a predefined limit frequency.
- Speech fundamental frequency estimator (1100) according to claim 21, characterized in that the frequency-domain filtering unit is configured to delay the values of said frequency-domain versions which are above the predefined limit frequency.
- Method (1140) for estimating a speech fundamental frequency (fp(n)), the method using a first set of values (Ỹ1) and a second set of values (Ỹ2), the first set of values (Ỹ1) being a received frequency-domain representation of a first set of time-domain signal values (y1) in a first time interval (t1) and the second set of values (Ỹ2) being a received frequency-domain representation of a second set of time-domain signal values (y2) in a second time interval (t2), the second time interval (t2) being later than and shifted with respect to the first time interval (t1), the method for estimating the speech fundamental frequency (fp(n)) comprising the steps of:
  - storing (1150) a version of the first set of values (Ỹ1) and providing values of a first power density spectrum (Ŝỹỹd(Ωµ, n)) by multiplying (1152) the stored version of the first set of values (Ỹ1) with a complex conjugate version of the second set of values (Ỹ2);
  - providing values of a second power density spectrum (Ŝỹỹ(Ωµ, n)) by multiplying (1153) a version of the second set of values (Ỹ2) with a complex conjugate version of the second set of values (Ỹ2);
  - determining (1156) the speech fundamental frequency estimate (fp) on the basis of the values of the first power density spectrum (Ŝỹỹd(Ωµ, n)) and the values of the second power density spectrum (Ŝỹỹ(Ωµ, n)),
  wherein the step of determining the speech fundamental frequency estimate (fp(n)) comprises the steps of:
  performing a first frequency-time transform of the first power density spectrum (Ŝỹỹd(Ωµ, n)) in order to obtain a first set of correlation function values (r̂ỹỹd,g(m, n)),
  performing a second frequency-time transform of the second power density spectrum (Ŝỹỹ(Ωµ, n)) in order to obtain a second set of correlation function values (r̂ỹỹ,g(m, n)), and
  determining the speech fundamental frequency estimate (fp(n)) on the basis of the first and second sets of correlation function values (r̂ỹỹd,g(m, n), r̂ỹỹ,g(m, n)).
- Method (1140) according to claim 23, characterized in that the step of determining (1156) the speech fundamental frequency estimate (fp(n)) comprises the steps of:
  - performing a first frequency-time transform of the first power density spectrum (Ŝỹỹd(Ωµ, n)) in order to obtain a first set of correlation function values (r̂ỹỹd,g(m, n));
  - performing a second frequency-time transform of the second power density spectrum (Ŝỹỹ(Ωµ, n)) in order to obtain a second set of correlation function values (r̂ỹỹ,g(m, n)),
  wherein the step of determining (1156) further comprises determining a set of normalization values (S̃ỹỹ(Ωµ, n)) and a set of weighting values (V(e^jΩµ, n)) from the second power density spectrum (Ŝỹỹ(Ωµ, n)) and using the set of normalization values (S̃ỹỹ(Ωµ, n)) and the set of weighting values (V(e^jΩµ, n)) in the first and second frequency-time transforms, and wherein the determination of the speech fundamental frequency estimate (fp(n)) is performed on the basis of said first and second sets of correlation function values (r̂ỹỹd,g(m, n), r̂ỹỹ,g(m, n)).
- Method (1140) according to claim 24, characterized in that the step of determining (1156) the speech fundamental frequency estimate (fp(n)) comprises the step of adaptively compensating the values of the first set of correlation function values (r̂ỹỹd,g(m, n)) by a correction factor (Δ(m, n)) based on a value of the second set of correlation function values (r̂ỹỹ,g(m, n)) in order to obtain a compensated first set of values, and of determining the speech fundamental frequency estimate (fp(n)) on the basis of the compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)) and the second set of correlation function values (r̂ỹỹ,g(m, n)).
- Method (1140) according to claim 25, characterized in that the step of compensating comprises multiplying the second set of correlation function values (r̂ỹỹ,g(m, n)) by a downwardly limited quotient between a value of the first set of correlation function values (r̂ỹỹd,g(m, n)) and a value of the second set of correlation function values (r̂ỹỹ,g(m, n)) in order to obtain said compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)).
- Method (1140) according to claim 26, characterized in that the step of determining (1156) the speech fundamental frequency estimate (fp(n)) comprises the step of combining the compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)) and the second set of correlation function values (r̂ỹỹ,g(m, n)) in order to obtain an extended set of correlation function values (r̂ỹỹ,erw(k, n)), wherein the values of the extended set of correlation function values (r̂ỹỹ,erw(k, n)) take corresponding values of the compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)), of the second set of correlation function values (r̂ỹỹ,g(m, n)), or values between the compensated first set of correlation function values (r̂ỹỹd,g,mod(m, n)) and the second set of correlation function values (r̂ỹỹ,g(m, n)), and wherein the step of determining (1156) the speech fundamental frequency estimate (fp(n)) further comprises determining the speech fundamental frequency estimate (fp(n)) on the basis of said extended set of correlation function values (r̂ỹỹ,erw(k, n)).
- Method (1140) according to one of claims 23 to 27, characterized in that the step of determining (1156) the speech fundamental frequency estimate (fp(n)) comprises determining the speech fundamental frequency estimate (fp(n)) by searching for the index (τp(n)) of a maximum value in the extended set of correlation function values (r̂ŷŷ,erw(k,n)) among a predetermined number of indices (k) of the values of the extended set of correlation function values (r̂ŷŷ,erw(k,n)), in the first or the second set of correlation function values (r̂ŷŷd,g(m,n), r̂ŷŷ,g(m,n)) among a predetermined number of indices (m) of values of the first or second set of correlation function values (r̂ŷŷd,g(m,n), r̂ŷŷ,g(m,n)), respectively, or in the compensated first set of correlation function values (r̂ŷŷd,g,mod(m,n)) among the predetermined number of indices (m) of values of the compensated first set of correlation function values (r̂ŷŷd,g,mod(m,n)), and wherein the step of determining (1156) the speech fundamental frequency estimate (fp(n)) further comprises determining the speech fundamental frequency estimate (fp(n)) as the product of a sampling frequency (fs) and the reciprocal of said searched index (τp(n)).
- Method (1140) according to claim 28, characterized in that the step of determining (1156) the speech fundamental frequency estimate (fp(n)) comprises determining a reliability factor (pfp(n)) for the determined speech fundamental frequency estimate (fp(n)) and blocking an output of the determined speech fundamental frequency estimate (fp(n)) in the case where the reliability factor (pfp(n)) determined for the determined speech fundamental frequency estimate (fp(n)) is lower than a predetermined reliability factor (p0).
- Method (1140) according to claim 29, characterized in that the step of determining (1156) the speech fundamental frequency estimate (fp(n)) comprises determining said reliability factor (pfp(n)) by dividing the maximum value at said searched index (τ̃p(n)) by the first value of the extended set of correlation function values (r̂ŷŷ,erw(k,n)) or, respectively, of the first set, the compensated first set or the second set of correlation function values (r̂ŷŷd,g(m,n), r̂ŷŷd,g,mod(m,n), r̂ŷŷ,g(m,n)).
- Method (1140) according to one of claims 23 to 30 and according to claim 24, characterized in that the step of providing values of a second power density spectrum (Ŝỹỹ(Ωµ,n)) comprises determining an estimate of the background noise power density spectrum (S̃nn(Ωµ,n)) and determining a noise suppression factor (V(e^(jΩµ),n)) on the basis of said background noise power density spectrum (S̃nn(Ωµ,n)), and the step of determining (1156) the speech fundamental frequency estimate (fp(n)) comprises multiplying the first and second power density spectra by said noise suppression factor (V(e^(jΩµ),n)) before the frequency-time transformation of the first or second power density spectrum (Ŝỹỹd(Ωµ,n), Ŝỹỹ(Ωµ,n)), respectively.
- Method (1140) according to claim 31, characterized in that the step of providing values of a second power density spectrum (Ŝỹỹ(Ωµ,n)) comprises determining the noise suppression factor as the maximum of a predetermined maximum suppression coefficient (V0) and a term that is a function of a quotient of the estimate of the background noise power density spectrum (Ŝnn(Ωµ,n)) and the second power density spectrum (Ŝỹỹ(Ωµ,n)).
- Method (1140) according to claim 32, characterized in that the step of providing values of a second power density spectrum (Ŝỹỹ(Ωµ,n)) comprises determining the estimate of the background noise power density spectrum (Ŝnn(Ωµ,n)) in speech pauses, or determining the estimate of the background noise power density spectrum (Ŝnn(Ωµ,n)) from a segment-wise estimate of the minima of the power of a microphone signal.
- Method (1140) according to one of claims 31 to 33, characterized in that the noise suppression factor is defined by [equation missing in the source]
- Method (1140) according to one of claims 24 to 34, characterized in that the step of determining (1156) the speech fundamental frequency estimate (fp(n)) comprises re-estimating the speech fundamental frequency estimate (fp(n)) in the case where the determined speech fundamental frequency estimate is lower than a predefined frequency value (fk), wherein the step of determining (1156) the speech fundamental frequency estimate (fp(n)) comprises re-estimating by searching for a new index (k, m) of a new maximum value (τ̃p(n)) of the extended set of correlation function values (r̂ŷŷ,erw(k,n)), of the first or second set of correlation function values (r̂ŷŷd,g(m,n), r̂ŷŷ,g(m,n)), or of the compensated first set of correlation function values (r̂ŷŷd,g,mod(m,n)) among a new number of values of said sets of correlation function values, and outputting the product of a sampling frequency (fs) and the reciprocal of said new index (τ̃p(n)) as the determined speech fundamental frequency estimate.
- Method (1140) according to claim 35, characterized in that the step of determining (1156) the speech fundamental frequency estimate (fp(n)) comprises searching for said index (k, m) of said new maximum value (τ̃p(n)) using a number k of values of said sets of correlation function values that is defined by [equation missing in the source]
- Method (1140) according to one of claims 35 or 36, characterized in that the step of determining (1156) the speech fundamental frequency estimate (fp(n)) comprises outputting said product as the determined speech fundamental frequency estimate (fp(n)) only in the case where the new index (τ̃p(n)) is greater than 60 percent of the previously searched maximum index (τp(n)) and where the value (r̂ŷŷ,erw(τ̃p(n),n)) of the extended set of correlation function values (r̂ŷŷ,erw(k,n)) at said new index (τ̃p(n)) is greater than a previously defined amplitude value (p̃0).
- Method (1140) according to one of claims 24 to 37, characterized in that the step of determining the speech fundamental frequency estimate (fp(n)) comprises modifying a speech fundamental period (τ̃p(n)) corresponding to said determined speech fundamental frequency estimate (fp(n)) by an interpolation correction term (Δp(n)) before outputting said fundamental frequency estimate (fp(n)), wherein said interpolation correction term (Δp(n)) depends on values of said first or second set of correlation function values (r̂ŷŷd,g(m,n), r̂ŷŷ,g(m,n)), of said extended set of correlation function values (r̂ŷŷ,erw(k,n)), or of said compensated first set of correlation function values (r̂ŷŷd,g,mod(m,n)), respectively.
- Method (1140) according to one of the preceding claims, characterized in that the method further comprises a step of receiving frequency-domain versions (Y1, Y2) of the first and second sets of time-domain signal values (y1, y2), frequency-domain filtering said frequency-domain versions in order to obtain said first and second sets of values (Ỹ1, Ỹ2), respectively, and providing said first and second sets of values (Ỹ1, Ỹ2) to the first and second power density spectrum calculators, respectively.
- Method (1140) according to claim 39, characterized in that the frequency-domain filtering step is carried out only for frequencies below a predefined limit frequency.
- Method (1140) according to claim 40, characterized in that the frequency-domain filtering step comprises delaying the values of the frequency-domain versions above said predefined limit frequency.
- Computer program product having program code for carrying out the method according to one of claims 23 to 41 when the computer program runs on a computer.
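The core chain of the method claims (frequency-time transform of a power density spectrum, peak search over correlation lags, fp = fs/τp, and blocking by a reliability factor) can be sketched as follows. This is a minimal single-spectrum illustration, not the patented two-spectrum method: the function name `estimate_pitch`, the search bounds `f_min` and `f_max`, and the threshold value `p0=0.3` are assumptions chosen for the example.

```python
import numpy as np

def estimate_pitch(psd, fs, f_min=50.0, f_max=400.0, p0=0.3):
    """Estimate a fundamental frequency from a one-sided power density spectrum.

    psd: PSD values of one frame for bins 0..N/2 (hypothetical input layout).
    Returns (f_p, reliability); f_p is None when the output is blocked.
    """
    # Frequency-time transform: the inverse FFT of a PSD yields
    # (circular) autocorrelation values r(m).
    r = np.fft.irfft(psd)
    # Limit the lag search to an assumed plausible pitch range.
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(r) // 2)
    # Index of the maximum correlation value within the searched lags.
    tau = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    # Reliability: value at the peak divided by the first (lag-0) value.
    reliability = r[tau] / r[0] if r[0] > 0 else 0.0
    if reliability < p0:
        return None, reliability  # block the output
    # Product of the sampling frequency and the reciprocal of the index.
    return fs / tau, reliability

if __name__ == "__main__":
    # Impulse train with a period of 80 samples at fs = 8000 Hz.
    x = np.zeros(1024)
    x[::80] = 1.0
    psd = np.abs(np.fft.rfft(x)) ** 2
    f_p, rel = estimate_pitch(psd, fs=8000)
    print(f_p)  # expected: 100.0
```

A flat spectrum has no periodic structure, so its correlation peak is negligible relative to lag 0 and the sketch returns `None` instead of a frequency.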
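The noise-handling claims describe a suppression factor formed as the maximum of a coefficient V0 and a term depending on the quotient of the noise spectrum and the second power density spectrum, with the noise spectrum obtained, for example, from segment-wise minima. The exact term is not reproduced in this text, so the sketch below substitutes a spectral-subtraction-like term `1 - beta * s_nn / s_yy`; that term, the parameter `beta`, the default values, and all function names are assumptions.

```python
import numpy as np

def noise_psd_from_minima(psd_frames, segment_len=50):
    """Per-bin minimum over the most recent frames (assumed segment length)."""
    frames = np.asarray(psd_frames, dtype=float)
    return frames[-segment_len:].min(axis=0)

def noise_suppression_factor(s_yy, s_nn, v0=0.1, beta=1.0):
    """V = max(v0, 1 - beta * s_nn / s_yy), evaluated per frequency bin.

    The form of the quotient term is an assumption, not the claimed formula.
    """
    s_yy = np.asarray(s_yy, dtype=float)
    s_nn = np.asarray(s_nn, dtype=float)
    quotient = s_nn / np.maximum(s_yy, 1e-12)  # guard against division by zero
    return np.maximum(v0, 1.0 - beta * quotient)

def weight_spectrum(psd, v):
    """Multiply a power density spectrum by the factor before the transform."""
    return np.asarray(v, dtype=float) * np.asarray(psd, dtype=float)

if __name__ == "__main__":
    s_nn = noise_psd_from_minima([[3.0, 2.0], [1.0, 5.0], [2.0, 4.0]], segment_len=2)
    v = noise_suppression_factor([1.0, 8.0], s_nn)
    print(s_nn, v, weight_spectrum([1.0, 8.0], v))
```

Bins dominated by noise are attenuated down to the floor `v0` rather than to zero, which keeps the weighted spectrum from collapsing entirely in noisy bins.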
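The claim on the interpolation correction term states only that the pitch period is modified by a term that depends on correlation values. One common way to obtain such a sub-sample correction is parabolic interpolation through the three correlation values around the integer peak; the sketch below uses that technique as a stand-in, and the function name `refine_period` is hypothetical.

```python
import numpy as np

def refine_period(r, tau):
    """Refine an integer peak lag tau by parabolic interpolation.

    Fits a parabola through r[tau-1], r[tau], r[tau+1] and returns the
    lag of its vertex. This is an assumed form of the correction term.
    """
    if tau <= 0 or tau >= len(r) - 1:
        return float(tau)  # no neighbours available
    a, b, c = r[tau - 1], r[tau], r[tau + 1]
    denom = a - 2.0 * b + c
    if denom >= 0:
        return float(tau)  # not a strict local maximum
    delta = 0.5 * (a - c) / denom  # vertex offset in samples
    return tau + delta

if __name__ == "__main__":
    # Synthetic correlation values with a true peak at lag 40.25.
    r = np.array([-(k - 40.25) ** 2 for k in range(100)])
    tau = int(np.argmax(r))
    refined = refine_period(r, tau)
    fs = 8000
    print(refined, fs / refined)
```

Because the frequency is the sampling frequency divided by the period, a fractional lag matters most at high pitch, where one sample of lag corresponds to a large frequency step.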
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP07000568.1A EP1944754B1 (en) | 2007-01-12 | 2007-01-12 | Speech fundamental frequency estimator and method for estimating a speech fundamental frequency |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP07000568.1A EP1944754B1 (en) | 2007-01-12 | 2007-01-12 | Speech fundamental frequency estimator and method for estimating a speech fundamental frequency |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1944754A1 (en) | 2008-07-16 |
EP1944754B1 (en) | 2016-08-31 |
Family
ID=37898474
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP07000568.1A Not-in-force EP1944754B1 (en) | 2007-01-12 | 2007-01-12 | Speech fundamental frequency estimator and method for estimating a speech fundamental frequency |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP1944754B1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021138201A1 (en) * | 2019-12-30 | 2021-07-08 | Texas Instruments Incorporated | Background noise estimation and voice activity detection system |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2249333B1 (en) | 2009-05-06 | 2014-08-27 | Nuance Communications, Inc. | Method and apparatus for estimating a fundamental frequency of a speech signal |
RU2587652C2 (en) * | 2010-11-10 | 2016-06-20 | Конинклейке Филипс Электроникс Н.В. | Method and device for estimating a structure in a signal |
CN111400883B (en) * | 2020-03-10 | 2023-05-09 | 南昌航空大学 | Feature extraction method for magnetoacoustic emission signals based on spectrum compression |
CN117688371B (en) * | 2024-02-04 | 2024-04-19 | 安徽至博光电科技股份有限公司 | A secondary joint generalized cross-correlation time delay estimation method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6587816B1 (en) * | 2000-07-14 | 2003-07-01 | International Business Machines Corporation | Fast frequency-domain pitch estimation |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021138201A1 (en) * | 2019-12-30 | 2021-07-08 | Texas Instruments Incorporated | Background noise estimation and voice activity detection system |
US11270720B2 (en) | 2019-12-30 | 2022-03-08 | Texas Instruments Incorporated | Background noise estimation and voice activity detection system |
Also Published As
Publication number | Publication date |
---|---|
EP1944754A1 (en) | 2008-07-16 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11031029B2 (en) | Pitch detection algorithm based on multiband PWVT of teager energy operator | |
EP2031583B1 (en) | Fast estimation of noise power spectral density for speech signal enhancement | |
EP1918910B1 (en) | Model-based enhancement of speech signals | |
US7286980B2 (en) | Speech processing apparatus and method for enhancing speech information and suppressing noise in spectral divisions of a speech signal | |
Martin | Bias compensation methods for minimum statistics noise power spectral density estimation | |
KR100330230B1 (en) | Noise suppression method and apparatus | |
KR100770839B1 (en) | Method and apparatus for estimating harmonic information, spectral envelope information, and voicing ratio of a speech signal | |
US7957964B2 (en) | Apparatus and methods for noise suppression in sound signals | |
JP5791092B2 (en) | Noise suppression method, apparatus, and program | |
EP1806739A1 (en) | Noise suppression system | |
KR19980701735A (en) | Spectral subtraction noise suppression method | |
Yuo et al. | Robust features for noisy speech recognition based on temporal trajectory filtering of short-time autocorrelation sequences | |
EP2249333B1 (en) | Method and apparatus for estimating a fundamental frequency of a speech signal | |
CN102612711A (en) | Signal processing method, information processing apparatus, and storage medium for storing a signal processing program | |
EP1944754B1 (en) | Speech fundamental frequency estimator and method for estimating a speech fundamental frequency | |
CN114005457A (en) | A single-channel speech enhancement method based on amplitude estimation and phase reconstruction | |
US10083705B2 (en) | Discrimination and attenuation of pre echoes in a digital audio signal | |
US7003452B1 (en) | Method and device for detecting voice activity | |
EP1635331A1 (en) | Method for estimating a signal-to-noise ratio | |
US20230095174A1 (en) | Noise supression for speech enhancement | |
JP4125322B2 (en) | Fundamental frequency extraction device, method therefor, program therefor, and recording medium on which the program is recorded | |
Chang et al. | Pitch estimation of speech signal based on adaptive lattice notch filter | |
Funaki | Speech enhancement based on iterative wiener filter using complex speech analysis | |
Lim et al. | Acoustic blur kernel with sliding window for blind estimation of reverberation time | |
Patil et al. | Use of baseband phase structure to improve the performance of current speech enhancement algorithms | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20080527 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
|
AX | Request for extension of the european patent |
Extension state: AL BA HR MK RS |
|
AKX | Designation fees paid |
Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
|
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: NUANCE COMMUNICATIONS, INC. |
|
17Q | First examination report despatched |
Effective date: 20120118 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R079 Ref document number: 602007047685 Country of ref document: DE Free format text: PREVIOUS MAIN CLASS: G10L0011040000 Ipc: G10L0025900000 |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G10L 25/90 20130101AFI20160229BHEP Ipc: G10L 21/0216 20130101ALN20160229BHEP |
|
INTG | Intention to grant announced |
Effective date: 20160315 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP Ref country code: GB Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602007047685 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: REF Ref document number: 825594 Country of ref document: AT Kind code of ref document: T Effective date: 20161015 |
|
REG | Reference to a national code |
Ref country code: LT Ref legal event code: MG4D |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: MP Effective date: 20160831 |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: MK05 Ref document number: 825594 Country of ref document: AT Kind code of ref document: T Effective date: 20160831 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161201 Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161130 Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20170102 Ref country code: BE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602007047685 Country of ref document: DE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R119 Ref document number: 602007047685 Country of ref document: DE |
|
26N | No opposition filed |
Effective date: 20170601 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20170112 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: ST Effective date: 20170929 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170131 Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170131 Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170131 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: MM4A |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170801 Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170112 Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170112 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170112 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: HU Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO Effective date: 20070112 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: CY Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20160831 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: TR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160831 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161231 |