EP0821345A1 - Method for extracting the fundamental frequency of a speech signal - Google Patents
Method for extracting the fundamental frequency of a speech signal
- Publication number
- EP0821345A1 (EP97401752A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech signal
- values
- fundamental frequency
- frames
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
Definitions
- The invention relates to a method for extracting the fundamental frequency of a speech signal.
- The fundamental frequency is one of the parameters that best characterize the voice of a given speaker, and it can therefore contribute to reliable authentication of that speaker.
- Another method, known as the spectral compression method, was published by NOLL (1970); see pages 414-417 of the previously cited work by W. HESS.
- This method, based on an analysis of the harmonic structure of the energy spectrum of the speech signal as a function of frequency, consists in compressing the energy spectrum along the frequency axis by successive integer factors, then adding the compressed spectra to the initial spectrum.
- The disadvantage of this method is that the amplitude of the harmonic peaks decreases with frequency, with a slope of the order of -12 dB/octave. Although a pre-emphasis process raises the level of the high-frequency harmonics, some harmonic peaks have a lower energy level than others because of the contribution of the formants, which causes frequent errors in estimating the fundamental frequency.
- The object of the present invention is to provide a method for extracting the fundamental frequency of a speech signal in which the fundamental frequency is extracted with increased reliability.
- Another object of the present invention is to provide a method for extracting the fundamental frequency of a speech signal in which the actual extraction of the fundamental frequency can be made conditional on detecting whether the sounds making up the speech signal are voiced or unvoiced.
- A final object of the present invention is to provide a method for extracting the fundamental frequency of a speech signal in which the extracted fundamental frequency value is further subjected to a learning-type post-processing step, in order to eliminate any improbable or outlying values.
- The method for extracting the fundamental frequency of a speech signal consisting of a succession of digital samples, which is the subject of the present invention, is remarkable in that it comprises at least the steps consisting in subjecting this speech signal to a pre-emphasis process (step a) and then carrying out steps b) to e) described below.
- The method which is the subject of the present invention finds application in particular in speech recognition and in the identification of speakers from their sound signatures.
- The speech signal from which the fundamental frequency is to be extracted is, for example, an analog signal representing distinct words and syllables. This analog signal is converted into a succession of digital samples; the speech signal in its digital form is designated sp in Figure 1a.
- The speech signal sp is then subjected to a pre-emphasis process that generates a pre-emphasized speech signal, denoted spp.
- The pre-emphasis process is of a conventional type and, as such, will not be described in detail.
- It consists of an overall pre-emphasis, that is, applying a gain that increases with frequency to compensate for the attenuation of the higher-order harmonics.
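Since the text leaves the pre-emphasis filter unspecified, the following is only a minimal sketch of one conventional choice, a first-order difference filter. The function name `pre_emphasize` and the coefficient `alpha = 0.95` are assumptions made for illustration, not values taken from the patent.

```python
def pre_emphasize(sp, alpha=0.95):
    """First-order pre-emphasis: spp[n] = sp[n] - alpha * sp[n - 1].

    `alpha` is an assumed, typical value; the patent does not give one.
    The gain of this filter grows with frequency, which is the property
    the description asks for.
    """
    if not sp:
        return []
    spp = [sp[0]]  # the first sample has no predecessor and is passed through
    for n in range(1, len(sp)):
        spp.append(sp[n] - alpha * sp[n - 1])
    return spp
```

A constant (zero-frequency) input is strongly attenuated after the first sample, while an input alternating in sign at every sample is amplified, which illustrates the rising gain.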
- This formatting operation consists in arranging the pre-emphasized speech signal spp into successive frames, each containing N samples and lasting the duration of those N samples; two consecutive frames overlap by a number of common consecutive samples at most equal to 50% of the number N of samples in each frame.
- Step b) also consists in calculating, for each current frame, designated Tq, a first set of values, denoted X1(k), of the logarithm of the energy spectrum of that frame by applying a Fourier transform over a number M1 of points.
- The number M1 of points over which the Fourier transform is applied is chosen so that the Shannon sampling theorem is satisfied.
- The number M1 of points can be taken equal to 128.
- Step b), shown in Figure 1a, thus yields the first set of values, denoted {X1(k)}.
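The framing and log-spectrum calculation of step b) can be sketched as follows. This is an illustration under stated assumptions: N = 256 samples per frame (inferred from the four 64-sample segments mentioned later in the text), a 50% overlap, a direct discrete Fourier transform without windowing, and the natural logarithm. The function names and the small floor `1e-12` that avoids log(0) are hypothetical.

```python
import cmath
import math

def split_frames(signal, n=256, overlap=0.5):
    """Cut `signal` into frames of n samples; consecutive frames share
    overlap * n samples (50% by default, the maximum the text allows)."""
    hop = int(n * (1 - overlap))
    return [signal[s:s + n] for s in range(0, len(signal) - n + 1, hop)]

def log_energy_spectrum(frame, m):
    """Return X(k) = log(|DFT(frame)[k]|^2) for k = 0 .. m-1.

    A direct DFT is used for clarity; a real implementation would use an FFT.
    """
    n = len(frame)
    values = []
    for k in range(m):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        values.append(math.log(abs(acc) ** 2 + 1e-12))  # floor avoids log(0)
    return values
```

With N = 256 and M1 = 128, `log_energy_spectrum(frame, 128)` keeps the non-redundant half of the spectrum of a real-valued frame.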
- The method which is the subject of the present invention then consists in calculating, in a step c), from the first set of values {X1(k)}, a determined number p of first cepstral coefficients, denoted C(m), of the logarithm of the energy spectrum defined by that set.
- The cepstral coefficients satisfy the relation:
- where p denotes the number of first cepstral coefficients calculated and retained for implementing the method which is the subject of the present invention.
- The number p can be limited to 16.
- At the end of step c), the cepstral coefficients are thus available for the subsequent steps of the method, as shown in Figure 1a.
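The relation defining C(m) is not reproduced in this text, so the sketch below only illustrates the general idea of step c): a discrete cosine transform of the log energy spectrum, keeping the first p coefficients. The DCT-II form and the 2/M scaling are assumptions, as is the function name `cepstral_coefficients`; the patent's own relation may differ.

```python
import math

def cepstral_coefficients(x, p=16):
    """Return [C(1), ..., C(p)] from the log energy spectrum x.

    Assumed form (DCT-II with 2/M scaling):
        C(m) = (2 / M) * sum_{k=0}^{M-1} x[k] * cos(pi * m * (k + 0.5) / M)
    """
    m_pts = len(x)
    return [
        (2.0 / m_pts) * sum(
            x[k] * math.cos(math.pi * m * (k + 0.5) / m_pts)
            for k in range(m_pts)
        )
        for m in range(1, p + 1)
    ]
```

With this scaling, a log spectrum that is itself a pure cosine of order m and amplitude A gives C(m) = A and zero for the other orders, so the low-order coefficients capture the slowly varying envelope.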
- In a step d), the method which is the subject of the present invention consists in subjecting the pre-emphasized speech signal spp to low-pass filtering and downsampling to generate a downsampled filtered speech signal, denoted spf.
- The low-pass filtering can be performed with a finite impulse response (FIR) filter with 47 coefficients and a cut-off frequency of 2 kHz.
- The filtered signal can then be downsampled, for example by decimation, to deliver the downsampled filtered speech signal spf.
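Step d) can be sketched as below. Only the 47 coefficients and the 2 kHz cut-off come from the text; the windowed-sinc design with a Hamming window, the 8 kHz sampling rate (inferred from 64 samples lasting 8 ms) and the decimation factor of 2 are assumptions, and the function names are hypothetical.

```python
import math

def lowpass_fir(num_taps=47, cutoff_hz=2000.0, fs_hz=8000.0):
    """Windowed-sinc low-pass FIR design (Hamming window), unit gain at DC."""
    fc = cutoff_hz / fs_hz           # cut-off in cycles per sample
    mid = (num_taps - 1) / 2         # centre of the symmetric response
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]  # normalise so that sum(taps) == 1

def filter_and_decimate(spp, taps, factor=2):
    """Convolve spp with taps, computing only every `factor`-th output sample."""
    out = []
    for n in range(0, len(spp), factor):
        acc = 0.0
        for j, h in enumerate(taps):
            if 0 <= n - j < len(spp):
                acc += h * spp[n - j]
        out.append(acc)
    return out
```

Computing only the retained output samples, rather than filtering everything and then discarding samples, halves the arithmetic for a factor of 2.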
- Step d) is then followed, as shown in Figure 1a, by a step e) consisting in calculating the fundamental frequency by spectral compression, as the rank k of the maximum of a function P(k) representative of the difference between a second set of values X2(k) of the logarithm of the energy spectrum of the downsampled filtered speech signal spf and the set of values H(k) of the smoothed frequency spectrum obtained from the cepstral coefficients available at the end of step c).
- The function P(k) satisfies the relation:
- The framing carried out in step e) is followed by the actual calculation of the second set of values {X2(k)} of the logarithm of the energy spectrum, by applying a Fourier transform over a number M2 of points to each current frame obtained from the framing.
- The calculation of the second set of values {X2(k)} is followed by the calculation of the smoothed frequency spectrum H(k) from the cepstral coefficients C(m) available at the end of step c); for this reason, the link between step c) and step e) is shown in phantom lines in Figure 1a.
- The smoothed spectrum H(k) is calculated by applying a cosine transform to the p available cepstral coefficients.
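As a hedged illustration of how a smoothed spectrum can be rebuilt from a few cepstral coefficients, the sketch below evaluates a truncated cosine series. The exact relation used in the patent is not reproduced in this text; the series form, the function name `smoothed_spectrum` and the `band_fraction` parameter (the share of the original analysis band covered by the M points, for instance 0.5 if coefficients computed over 0 to 4 kHz are evaluated over 0 to 2 kHz) are all assumptions.

```python
import math

def smoothed_spectrum(c, m_pts, band_fraction=1.0):
    """Evaluate H(k) for k = 0 .. m_pts-1 from c = [C(1), ..., C(p)].

    Assumed form:
        H(k) = sum_{m=1}^{p} C(m) * cos(pi * m * band_fraction * (k + 0.5) / m_pts)
    """
    return [
        sum(cm * math.cos(math.pi * m * band_fraction * (k + 0.5) / m_pts)
            for m, cm in enumerate(c, start=1))
        for k in range(m_pts)
    ]
```

Because only low-order terms are kept, H(k) follows the broad formant envelope and cannot reproduce the fine harmonic ripple, which is why subtracting it from the log spectrum leaves the harmonic structure.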
- The calculation of the smoothed frequency spectrum is followed by the calculation of the function P(k) satisfying the relation cited above.
- In that relation, L is equal to M2/k, with k varying between a first and a second value representative of a low-frequency band between 70 and 450 Hz.
- By spectral compression, that is, by calculating the harmonic product of the difference between the energy spectrum of the speech signal and the smoothed spectrum, the method eliminates the contribution of the formants and extracts the harmonic structure of the fundamental frequency of the speech signal.
- Steps a) to e) can be executed successively.
- In that case, the pre-emphasized speech signal spp can be stored at the end of step a), and the cepstral coefficients, in particular the p coefficients used, can be stored at the end of step c), to allow steps b) to e) to be carried out sequentially.
- The method which is the subject of the present invention can also be implemented in a variant, shown in Figure 1b, in which steps b) and c), carried out sequentially, run in parallel with steps d) and e), both branches starting from the pre-emphasized speech signal spp.
- This embodiment, shown in Figure 1b, is possible because steps b) and c) are qualitatively independent of steps d) and e) and can be carried out in parallel on the pre-emphasized speech signal spp.
- This other embodiment, although equivalent to the one in which the first set of values X1(k) is calculated before the second set of values X2(k), has the drawback of requiring all the values X(k) to be kept in memory throughout the calculation for each current frame, which congests memory and hampers the management of the computing resources.
- As mentioned above, step e) of calculation by spectral compression consists in performing a sub-step e1) comprising the framing of the downsampled filtered speech signal spf into frames of N2 samples and the calculation of the second set of values X2(k) of the logarithm of the energy spectrum by applying a Fourier transform over a number M2 of points on a frequency band between 0 and 2 kHz.
- Sub-step e1) is followed by sub-step e2), consisting in calculating the spectral envelope H(k), or smoothed frequency spectrum, of the current frame on the frequency band between 0 and 2 kHz over the same number M2 of points, by applying to the first p-1 cepstral coefficients a cosine transform satisfying the relation:
- Sub-step e3), in which the difference D(k) = X2(k) - H(k) is calculated, is itself followed by a sub-step e4) consisting in calculating the function P(k) by spectral compression of the difference D(k) on the low-frequency band between 70 and 450 Hz.
- Sub-step e4) is itself followed by a sub-step e5) that extracts the maximum of the function P(k); the value of k at this maximum represents F0, the fundamental frequency of the speech signal.
- Sub-step e5) can be carried out by a program that sorts the successive values of the function P(k) in the aforementioned low-frequency band.
- The sorting program is of a conventional type, searching for a maximum value among several values.
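Sub-steps e3) to e5) can be sketched as follows. The relation for P(k) is not reproduced in this text; the sketch assumes the usual log-domain form of the harmonic product, P(k) = D(k) + D(2k) + ... + D(Lk), with L limited so that L·k stays inside the spectrum. The function name `extract_f0` and the `hz_per_bin` parameter (frequency spacing of the spectrum points) are hypothetical.

```python
import math

def extract_f0(x2, h, hz_per_bin, f_min=70.0, f_max=450.0):
    """Return the frequency (Hz) whose bin k maximises the assumed P(k)."""
    d = [a - b for a, b in zip(x2, h)]           # e3) D(k) = X2(k) - H(k)
    m = len(d)
    k_lo = max(1, math.ceil(f_min / hz_per_bin))
    k_hi = min(m - 1, math.floor(f_max / hz_per_bin))
    best_k, best_p = 0, -math.inf
    for k in range(k_lo, k_hi + 1):              # e4) compress over the band
        l_max = (m - 1) // k                     # keeps l * k a valid index
        p_k = sum(d[l * k] for l in range(1, l_max + 1))
        if p_k > best_p:                         # e5) plain maximum search
            best_k, best_p = k, p_k
    return best_k * hz_per_bin
```

Because D(k) has the formant envelope removed, its valleys between harmonics are negative, so a sub-multiple of the true fundamental accumulates those valleys and scores lower than the fundamental itself.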
- Figure 2b shows successive diagrams, in an energy (W) versus frequency space, relating to: the short-term spectrum between 0 and 2 kHz of a frame of a speech signal, the frame having a duration of 32 ms over 2048 points, this diagram possibly corresponding to a frame obtained from the framing carried out in sub-step e1) of Figure 2a; and the spectral envelope obtained by a cosine transform applied to the first 16 cepstral coefficients, this envelope representing only the contribution of the formants, that is to say the smoothed spectrum H(k) obtained at the end of sub-step e2) of FIG.
- The method which is the subject of the present invention can normally be applied to a continuous or pseudo-continuous stream of words or syllables making up a speech signal.
- The method may advantageously further consist in discriminating, among all the successive frames, between voiced and unvoiced frames, and then eliminating each unvoiced frame.
- The unvoiced frames are not physically removed from the sequence of current frames.
- Instead, they are marked by assigning them an arbitrary fundamental frequency value, zero, as described later.
- These signals can be arranged into successive frames of N or N2 samples, respectively, in a conventional manner: the samples are received and stored at specific addresses of a random access memory, for example, and the successive frames are then read sequentially, as shown in Figure 3a. For example, the frame of rank q-1 is read by simultaneously reading its N samples; then, at the end of the frame duration, i.e. 32 ms, the frame of rank q is read, its N samples overlapping the previous frame of rank q-1 by N/2 samples; and so on for the frame of rank q+1 and the following frames.
- This reading process can advantageously be carried out by simple read addressing of the memory containing the samples of the speech signal. As shown in FIG.
- Starting from the current frame Tq in a step 100, the process of discriminating between voiced and unvoiced frames can consist in applying a criterion 101 that classifies the current frame as voiced or unvoiced.
- On a negative response to criterion 101, the current frame Tq is assigned an arbitrary fundamental frequency value, zero for example, in a step 102; on a positive response to criterion 101, the current frame is kept in a step 103 for processing by the fundamental frequency extraction process.
- Depending on the signal considered, spp or spf, the succession of current frames kept in step 103 is then subjected to the calculation of the first set of values X1(k) or of the second set of values X2(k), respectively, as part of step b) or of step e), or sub-step e1), of Figures 1a, 1b or 2a.
- As shown in Figure 3c, this criterion can consist in subdividing each current frame Tq into a number ST of successive contiguous frame segments, then establishing a voicing discrimination criterion for each frame segment.
- Figure 3c shows four contiguous frame segments, denoted S1 to S4, each comprising 64 samples and lasting 8 ms.
- The voicing discrimination criterion may consist in assigning to each frame segment a voicing index whose value lies between 0 and 1.
- The voicing indices are denoted Vs(1) to Vs(4), and each is representative of the low-frequency energy level of the corresponding frame segment S1 to S4, according to a substantially linear law.
- Each current frame Tq is classified as unvoiced by comparing a linear combination of the voicing indices of its segments with a determined threshold value.
- This linear combination can be the arithmetic mean of the indices, which is compared with the aforementioned threshold value; the comparison criterion is written:
- Each voicing index can be assigned as a function of the low-frequency energy of its segment, according to the chart shown in the above figure.
- The assigned voicing index varies linearly between 0 and 1 for segment low-frequency energies between -35 and -15 dB. These values can of course be changed.
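The voicing criterion described above can be sketched as follows. The -35 dB and -15 dB break points and the use of the arithmetic mean come from the text; the threshold value 0.5, the direction of the comparison (mean below threshold means unvoiced) and the function names are assumptions, since the criterion itself is not reproduced here.

```python
def voicing_index(energy_db, lo_db=-35.0, hi_db=-15.0):
    """Map a segment's low-frequency energy (dB) to an index in [0, 1]:
    0 at or below lo_db, 1 at or above hi_db, linear in between."""
    if energy_db <= lo_db:
        return 0.0
    if energy_db >= hi_db:
        return 1.0
    return (energy_db - lo_db) / (hi_db - lo_db)

def is_voiced(segment_energies_db, threshold=0.5):
    """Classify a frame from its segments' energies.

    `threshold` is an assumed value; the frame is taken as voiced when the
    mean voicing index reaches it, and as unvoiced otherwise.
    """
    indices = [voicing_index(e) for e in segment_energies_db]
    return sum(indices) / len(indices) >= threshold
```

For a frame split into four segments, `is_voiced([-20.0, -18.0, -22.0, -30.0])` averages four indices; a frame found unvoiced would then be given the conventional fundamental frequency value of zero instead of being processed.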
- Errors can occur in estimating the fundamental frequency of the speech signal. They may be due to the presence, in the same frame, of voiced segments together with unvoiced segments or silences; such errors are called transition errors. They can also occur in low-energy voiced or mixed frames. Under certain conditions these errors can be corrected; when correction is not possible, the fundamental frequency of the speech signal is arbitrarily set to a fictitious value, zero by convention for example, the same value as is assigned to unvoiced frames or frames of silence.
- The method which is the subject of the present invention can then further consist in post-processing the extracted fundamental frequency value of the speech signal.
- This post-processing step can consist, for example, in establishing a histogram of the fundamental frequencies in order to determine the most probable range of frequency values and the lower and upper bounds of these values.
- The post-processing can then subject each extracted fundamental frequency value to a sorting criterion relative to the lower and upper bounds, to obtain sorted values representative of the evolution of the extracted fundamental frequency values.
- > ⁇ F 0 (i) 0.
- The index i assigned to the fundamental frequency values designates the order of the successive extracted values.
- The threshold in this criterion is an arbitrary value against which the difference between two successive fundamental frequency values of rank i and i-1 is compared.
- Isolated zero values are then recalculated by linear interpolation, while isolated non-zero values in the middle of a series of zeros are set to 0 by convention. Finally, statistical parameters such as the maximum and minimum F0 values and the mean value can be calculated.
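The last clean-up rules can be sketched as below. The handling of isolated zeros and isolated non-zero values follows the text; treating values outside the bounds as unreliable and setting them to zero is an assumption, because the sorting criterion is only partly legible here. The function name `post_process` is hypothetical, and the `lower` and `upper` bounds are assumed to have been derived beforehand from the histogram.

```python
def post_process(f0, lower, upper):
    """Clean a sequence of per-frame F0 values (0 marks 'no value')."""
    # Assumed rule: values outside [lower, upper] are discarded (set to 0).
    vals = [v if lower <= v <= upper else 0.0 for v in f0]
    out = list(vals)
    for i in range(1, len(vals) - 1):
        prev, cur, nxt = vals[i - 1], vals[i], vals[i + 1]
        if cur == 0.0 and prev > 0.0 and nxt > 0.0:
            out[i] = (prev + nxt) / 2.0   # isolated zero: linear interpolation
        elif cur > 0.0 and prev == 0.0 and nxt == 0.0:
            out[i] = 0.0                  # isolated value among zeros
    return out
```

The decisions are taken on the bounded sequence `vals` rather than on the partially corrected output, so that a correction made at one frame does not influence the classification of its neighbour.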
- The device shown in the above figure allows the method which is the subject of the present invention, described above, to be implemented.
- The architecture of this device is adapted to the implementation of the method.
- A circuit 1 for sampling and analog-to-digital conversion converts the analog input speech signal into a series of digital samples.
- A host computer 2 drives the succession of steps a) to e) of the method and manages and controls the peripheral devices, in particular the sampling and analog-to-digital conversion circuit 1, as described later.
- the acquisition of the constituent samples of the speech signal sp is carried out by
- The dedicated digital signal processor 3 can be a MOTOROLA DSP56001 signal processor clocked at 33 MHz.
- The host microcomputer 2 can advantageously be a PENTIUM PC clocked at 90 MHz and running a multitasking operating system such as MS-WINDOWS.
- The dedicated digital signal processor 3 is a 24-bit fixed-point processor, a type well suited to performing the calculations required by steps a) to e) of the method.
- This signal processor 3 consists of a central processing unit 30, denoted DSP-CPU, associated with a program memory space, denoted P and referenced 31, and two data memory spaces, denoted X and Y and referenced 32, with a capacity of 512 words each.
- The memory spaces P, X and Y are each accessible through one of three independent 24-bit buses; addressing uses three 16-bit buses, one per memory space, so each space can be extended to 64 k-words.
- The programs and calculation subroutines are executed within the 512 words of the internal P memory, having first been loaded into the 8 k-words of the external P memory.
- A program or subroutine is transferred from the external memory to the internal memory in order to be executed.
- The data to be processed (data relating to the speech signal), the calculation tables needed, for example, to compute the cepstral coefficients, and the intermediate results are stored in the X and Y spaces 32, extended to 2 x 64 k-words.
- The host microcomputer 2 has programs and subprograms for communicating with the dedicated digital signal processor 3: loading code and data, reading data, transferring code, executing one or more programs, and initializing the analog-to-digital conversion module 1 for acquiring and reproducing the speech signal.
- The assembly formed by the analog-to-digital conversion circuit 1 and the dedicated digital signal processor 3 is installed on an add-in card, such as the card marketed by DIGIMETRIE under the reference PC-DSP56k/AD/MEM.
- In addition to the DSP56001 digital signal processor, this card carries an analog-to-digital/digital-to-analog converter marketed by TEXAS INSTRUMENTS under the reference TCL32040CN, which acquires the speech signals; this converter bears the reference 10 in Figure 4.
- The calculation time of the fundamental frequency for 100 speech frames of 32 ms each is approximately 2.7 seconds, i.e. 27 ms per 32 ms frame.
- The host microcomputer can be configured under the MS-Windows operating system to operate in multitasking mode, which allows operations to be run in parallel.
- Such a procedure is not essential, but it optimizes the use of computing resources.
- The method and the device which are the subjects of the present invention can advantageously be used to build a speaker authentication system with a high probability of success.
- The frequency histogram can be built either globally for a determined number of speakers or, on the contrary, for a particular speaker, in which case it is effectively representative of that speaker. The same applies to the lower and upper bounds and, where appropriate, to statistical parameters such as the values F0max and F0min and the mean value of the fundamental frequency of that speaker's speech signal.
- For a given speaker, this frequency histogram can then be updated over time as the speaker's voice evolves.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
- Time-domain methods, such as those using an autocorrelation process with center clipping and comparison of the peaks with a threshold value, or those known as AMDF, the latter having been described by R. BOITE and M. KUNT in the work entitled "Traitement de la parole", pages 193-195, Presses polytechniques romandes, Lausanne 1987, are relatively inexpensive in computation time because they do not require arithmetic multiplication operations. However, they lack precision, and the speech signal must therefore be oversampled to obtain adequate precision, which of course significantly increases the effective computation time.
- Frequency-domain methods, by contrast, are based on analyzing the harmonic structure of the energy spectrum of the speech signal as a function of frequency. Among these, the so-called comb method, described by P. MARTIN in the article entitled "Extraction de la fréquence fondamentale par intercorrélation avec une fonction peigne", published in Journées d'Etude Parole 12, pp. 221-232, 1981, consists in calculating the cross-correlation function between the spectrum of the digital speech signal and a comb function, for different values of the spacing between the teeth of the comb. The maximum of the cross-correlation function is obtained when the spacing between two consecutive teeth of the comb equals the fundamental frequency of the signal under analysis. This method is reliable but relatively complex, since it requires a frequency-domain selection that retains only the maxima of the spectrum and their adjacent values. In addition, interpolation is needed to increase the precision of the result.
where L = M/k, M denotes the number of points of the spectrum, and X(l) denotes the logarithm of the energy spectrum.
The disadvantage of this method is that the amplitude of the harmonic peaks decreases with frequency, with a slope of the order of -12 dB/octave. Although a pre-emphasis process raises the level of the high-frequency harmonics, some harmonic peaks have a lower energy level than others because of the contribution of the formants, which causes frequent errors in estimating the fundamental frequency.
- Figure 1a is a flowchart illustrating all the steps for implementing the method which is the subject of the present invention;
- Figure 1b is a flowchart illustrating an advantageous variant of the method which is the subject of the present invention, in which certain steps are carried out in parallel or, where appropriate, under a multitasking operating system, so as to allow real-time operation without requiring very high computing power;
- Figure 2a shows in detail a succession of elementary steps allowing optimal implementation of the final step of calculating, by spectral compression, the fundamental frequency of the speech signal in the method which is the subject of the present invention, as illustrated in Figure 1a or 1b;
- Figure 2b shows a series of signals obtained in the frequency domain after carrying out the elementary steps illustrated in Figure 2a;
- Figures 3a, 3b, 3c and 3d show, respectively, a procedure for framing the samples making up the speech signal, a process for discriminating current frames according to a criterion relating to the voiced or unvoiced character of each current frame, a way of establishing this criterion, and a chart for assigning a voicing index to the time segments making up each frame;
- Figure 4 is a block diagram of the architecture of a device for implementing the method which is the subject of the present invention, based on a host microcomputer and a specialized or dedicated digital signal processor connected by a bus-type link.
The function P(k) satisfies the relation:
if F0(i) > B.Sup
Claims (9)
- Method for extracting the fundamental frequency of a speech signal consisting of a succession of digital samples, characterized in that it comprises at least the steps consisting in: a) subjecting said speech signal to a pre-emphasis process, to generate a pre-emphasized speech signal; b) calculating, from the pre-emphasized speech signal, for each current frame of a succession of frames each corresponding in duration to a determined number N of samples, two consecutive frames overlapping by a number of common consecutive samples at most equal to 50% of the number N of samples, a first set of values X1(k) of the logarithm of the energy spectrum by Fourier transform over a number M1 of points; c) calculating, from said set of values, a determined number p of first cepstral coefficients C(m), by applying a discrete cosine transform to said values X1(k) over a number of these values at least equal to half the number N of samples making up said current frame, said transform satisfying the relation: with m = [1,2,...,p], C(m) denoting said cepstral coefficients; d) subjecting said pre-emphasized speech signal to low-pass filtering and downsampling, to generate a downsampled filtered speech signal; e) calculating, by spectral compression, from said downsampled filtered speech signal and from said cepstral coefficients, for each current frame of a succession of frames with the same duration overlap, the fundamental frequency as the rank k of the maximum of a function P(k) representative of the difference between a second set of values X2(k) of the logarithm of the energy spectrum and the set of values H(k) of the smoothed frequency spectrum, said function satisfying the relation: with L = M2/k, k varying between a first and a second value representative of a low-frequency band between 70 and 450 Hz, said function P(k) having a maximum for k=F0, the extracted value of the fundamental frequency of the speech signal.
- Procédé selon la revendication 1, caractérisé en ce que ladite étape de calcul par compression spectrale consiste successivement à :calculer sur ledit signal de parole filtré sous-échantillonné, pour chaque trame courante, ledit deuxième ensemble de valeurs X2(k) du logarithme du spectre d'énergie par transformée de Fourier sur un nombre M2 de points sur une bande de fréquences comprises entre 0 et 2 kHz ;calculer l'enveloppe spectrale H(k), spectre de fréquences lissé de ladite trame courante sur ladite bande de fréquences comprises entre 0 et 2 kHz sur un même nombre M2 de points, par application sur lesdits p-1 premiers coefficients cepstraux d'une transformée en cosinus vérifiant la relation : avec k = [0,1,2,...M2] et M2 = Q/4 ;calculer la différence D(k) = X2(k) - H(k) ;calculer le produit harmonique représentatif de la fonction P(k) par compression spectrale de ladite différence D(k) sur ladite bande de fréquences basses comprises entre 70 et 450 Hz ;déterminer par un processus de tri le maximum de la fonction P(k) et le rang k=F0 correspondant, valeur extraite de la fréquence fondamentale.
- A method according to claim 1 or 2, characterized in that, following the step of formatting the pre-emphasized, respectively filtered and downsampled, speech signal into frames, the method further consists in discriminating, among all the frames, the voiced frames from the unvoiced frames, the fundamental-frequency extraction process being carried out on the voiced frames.
- A method according to claim 3, characterized in that the step of discriminating the voiced frames from the unvoiced frames consists in:
  - subdividing each frame into a number ST of successive contiguous frame segments;
  - establishing, for each of said frame segments, a voicing discrimination criterion from a voicing index between 0 and 1 that represents the low-frequency energy level of the frame segment considered according to a substantially linear law;
  - classifying each frame as an unvoiced frame by comparing a linear combination of the voicing indices of each segment with a given threshold value.
- A method according to one of claims 1 to 4, characterized in that, following the step of determining the maximum at rank k of said function P(k), k = F0 representing the value of the fundamental frequency of the speech signal, and in order to eliminate any aberrant fundamental-frequency value and to remove the risk of errors due to transition errors caused by the presence, within a single frame, of voiced segments, unvoiced segments or silences, as well as by the presence of voiced or mixed frames of low energy level, said method further consists in post-processing said extracted fundamental-frequency value of said speech signal, this post-processing step consisting in:
  - building a histogram of the fundamental frequencies, in order to determine the range of the most probable frequency values and the lower and upper bounds of these values;
  - subjecting each extracted fundamental-frequency value to a sorting criterion with respect to said lower and upper bounds, to obtain sorted values representing the evolution of the extracted fundamental-frequency values;
  - subjecting these sorted values to non-linear filtering to remove the aberrant values.
- A method according to claim 1, characterized in that steps a) to e) are carried out sequentially.
- A method according to claim 6, characterized in that steps b) and c), and respectively d) and e), are carried out under a multitasking operating system, which allows the fundamental frequency to be extracted in real time.
- A device for extracting the fundamental frequency of a speech signal, in accordance with the method according to one of claims 1 to 7, characterized in that the device comprises:
  - means for sampling a speech signal and converting it from analog to digital form, yielding a sequence of digital samples;
  - a host microcomputer that conducts the succession of steps a) to e) of the method and manages and controls peripheral units, in particular said sampling and analog-to-digital conversion means;
  - a digital signal processor interconnected with said host microcomputer by a bus link and performing the computation of the first set of values X1(k) of the logarithm of the energy spectrum by Fourier transform over a number M1 of points, of the p first cepstral coefficients, of the low-pass filtering and downsampling, of the second set of values X2(k) of the logarithm of the energy spectrum, of the set of values H(k) of the smoothed frequency spectrum, and of the function extracting the maximum of P(k) for k = F0, the extracted value of the fundamental frequency of the speech signal.
- Use of the method and of the device for extracting the fundamental frequency of a speech signal according to one of claims 1 to 8, for the authentication of one or more speakers.
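
The processing chain of claims 1 and 2 can be illustrated by a minimal Python sketch. It is not the patented implementation: the relations for C(m), H(k) and P(k) are not reproduced in the claims text, so the sketch assumes a DCT-II cepstrum, its truncated inverse as the smoothed spectrum, and a sum of the residual D(k) at integer multiples of k as the harmonic compression. The function names, the pre-emphasis coefficient 0.95, the Hamming window, the 4 kHz sampling rate, the 512-point transform and the 12 cepstral coefficients are all illustrative assumptions. For brevity the input is treated as already low-pass filtered and downsampled, and the cepstral coefficients are computed from the same band-limited frame rather than from a separate full-band spectrum.

```python
import cmath
import math
import random


def pre_emphasize(samples, alpha=0.95):
    """Step a): y[n] = x[n] - alpha * x[n-1]; alpha = 0.95 is an assumed value."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1] for n in range(1, len(samples))]


def split_frames(samples, frame_len, overlap=0.5):
    """Frames of N samples; consecutive frames share at most 50% of their samples."""
    if not 0.0 <= overlap <= 0.5:
        raise ValueError("overlap must be between 0 and 0.5")
    hop = frame_len - int(frame_len * overlap)
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, hop)]


def log_energy_spectrum(frame, n_points):
    """Log energy spectrum on bins 0..n_points/2 (Hamming window, naive zero-padded DFT)."""
    n = len(frame)
    win = [frame[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i in range(n)]
    out = []
    for k in range(n_points // 2 + 1):
        acc = sum(win[i] * cmath.exp(-2j * math.pi * k * i / n_points) for i in range(n))
        out.append(math.log(abs(acc) ** 2 + 1e-12))  # small offset avoids log(0)
    return out


def cepstral_envelope(log_spec, n_cepstra):
    """Smoothed spectrum H(k): DCT-II of the log spectrum, truncated, then inverted."""
    size = len(log_spec)
    cep = [sum(log_spec[k] * math.cos(math.pi * m * (k + 0.5) / size) for k in range(size)) / size
           for m in range(n_cepstra + 1)]
    return [cep[0] + 2.0 * sum(cep[m] * math.cos(math.pi * m * (k + 0.5) / size)
                               for m in range(1, n_cepstra + 1))
            for k in range(size)]


def harmonic_compression(residual, k):
    """Assumed form of P(k): sum of D(n*k) for n = 1..L, with L = (highest bin) // k."""
    top = len(residual) - 1
    return sum(residual[n * k] for n in range(1, top // k + 1))


def estimate_f0(frame, sample_rate, n_points=512, n_cepstra=12, f_min=70.0, f_max=450.0):
    """Return the candidate frequency in [f_min, f_max] that maximizes P(k)."""
    log_spec = log_energy_spectrum(frame, n_points)
    envelope = cepstral_envelope(log_spec, n_cepstra)
    residual = [x - h for x, h in zip(log_spec, envelope)]  # D(k) = X2(k) - H(k)
    k_lo = math.ceil(f_min * n_points / sample_rate)
    k_hi = math.floor(f_max * n_points / sample_rate)
    best_k = max(range(k_lo, k_hi + 1), key=lambda k: harmonic_compression(residual, k))
    return best_k * sample_rate / n_points


if __name__ == "__main__":
    random.seed(0)
    fs = 4000  # assumed rate after downsampling, giving a 0-2 kHz band
    # Synthetic voiced sound: ten harmonics of 125 Hz plus a little noise.
    tone = [sum(math.sin(2 * math.pi * 125 * h * n / fs) / h for h in range(1, 11))
            + random.gauss(0.0, 0.01) for n in range(1024)]
    frames = split_frames(pre_emphasize(tone), frame_len=256)
    print([estimate_f0(f, fs) for f in frames])
```

Subtracting the envelope before summing harmonics is what keeps formant peaks from being mistaken for pitch harmonics: only the fine comb structure of the spectrum contributes to P(k). The frequency resolution of this sketch is one transform bin, here 4000/512 ≈ 7.8 Hz.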
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR9609313A FR2751776B1 (fr) | 1996-07-24 | 1996-07-24 | Procede d'extraction de la frequence fondamentale d'un signal de parole |
FR9609313 | 1996-07-24 |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0821345A1 (fr) | 1998-01-28 |
EP0821345B1 (fr) | 2001-09-05 |
Family
ID=9494427
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19970401752 (EP0821345B1, Expired - Lifetime) | 1996-07-24 | 1997-07-21 | Procédé d'extraction de la fréquence fondamentale d'un signal de parole |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0821345B1 (fr) |
DE (1) | DE69706488T2 (fr) |
FR (1) | FR2751776B1 (fr) |
- 1996-07-24: FR application FR9609313A (patent FR2751776B1), not active, Expired - Fee Related
- 1997-07-21: DE application DE1997606488 (patent DE69706488T2), not active, Expired - Fee Related
- 1997-07-21: EP application EP19970401752 (patent EP0821345B1), not active, Expired - Lifetime
Non-Patent Citations (3)
Title |
---|
HERMES D J: "Measurement of pitch by subharmonic summation", JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, JAN. 1988, USA, vol. 83, no. 1, January 1988 (1988-01-01), ISSN 0001-4966, pages 257 - 264, XP002027082 * |
KEIKICHI HIROSE ET AL: "A SCHEME FOR PITCH EXTRACTION OF SPEECH USING AUTOCORRELATION FUNCTION WITH FRAME LENGTH PROPORTIONAL TO THE TIME LAG", SPEECH PROCESSING 1, SAN FRANCISCO, MAR. 23 - 26, 1992, vol. 1, 23 March 1992 (1992-03-23), INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, pages 149 - 152, XP000341105 * |
W. HESS: "Pitch Determination of Speech Signals: Algorithm and Methods", 1983, SPRINGER VERLAG, BERLIN HEIDELBERG NY TOKYO, XP002027083 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002097793A1 (fr) * | 2001-06-01 | 2002-12-05 | France Telecom | Procede d'extraction de la frequence fondamentale d'un signal sonore |
FR2825505A1 (fr) * | 2001-06-01 | 2002-12-06 | France Telecom | Procede d'extraction de la frequence fondamentale d'un signal sonore au moyen d'un dispositif mettant en oeuvre un algorithme d'autocorrelation |
EP2081405A1 (fr) | 2008-01-21 | 2009-07-22 | Bernafon AG | Appareil d'aide auditive adapté à un type de voix spécifique dans un environnement acoustique, procédé et utilisation |
Also Published As
Publication number | Publication date |
---|---|
DE69706488D1 (de) | 2001-10-11 |
EP0821345B1 (fr) | 2001-09-05 |
FR2751776A1 (fr) | 1998-01-30 |
DE69706488T2 (de) | 2002-05-23 |
FR2751776B1 (fr) | 1998-10-09 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP0594480B1 (fr) | Procédé de détection de la parole | |
EP0867856B1 (fr) | "Méthode et dispositif de detection d'activité vocale" | |
EP2415047B1 (fr) | Classification du bruit de fond contenu dans un signal sonore | |
EP2419900B1 (fr) | Procede et dispositif d'evaluation objective de la qualite vocale d'un signal de parole prenant en compte la classification du bruit de fond contenu dans le signal | |
FR2522179A1 (fr) | Procede et appareil de reconnaissance de paroles permettant de reconnaitre des phonemes particuliers du signal vocal quelle que soit la personne qui parle | |
EP2603862B1 (fr) | Procédé d'analyse de signaux fournissant des fréquences instantanées et des transformées de fourier glissantes et dispositif d'analyse de signaux | |
EP0511095B1 (fr) | Procédé et dispositif de codage-décodage d'un signal numérique | |
EP0620546B1 (fr) | Procédé de détection énergétique de signaux noyés dans du bruit | |
EP0481895B1 (fr) | Procédé de transmission, à bas débit, par codage CELP d'un signal de parole et système correspondant | |
EP2795618B1 (fr) | Procédé de détection d'une bande de fréquence prédéterminée dans un signal de données audio, dispositif de détection et programme d'ordinateur correspondant | |
EP0234993B1 (fr) | Procédé et dispositif de reconnaissance automatique de cibles à partir d'échos "Doppler" | |
EP1131813A1 (fr) | Procede de reconnaissance vocale dans un signal acoustique bruite et systeme mettant en oeuvre ce procede | |
EP0714088B1 (fr) | Détection d'activité vocale | |
EP0821345B1 (fr) | Procédé d'extraction de la fréquence fondamentale d'un signal de parole | |
EP0574288B1 (fr) | Procédé et dispositif de dissimulation d'erreurs de transmission de signaux audio-numériques codés par transformée fréquentielle | |
CA2108663C (fr) | Procede et dispositif de filtrage pour la reduction des preechos d'un signal audio-numerique | |
EP1605440B1 (fr) | Procédé de séparation de signaux sources à partir d'un signal issu du mélange | |
EP1459214B1 (fr) | Procede de caracterisation d un signal sonore | |
EP0015363B1 (fr) | Détecteur de parole à niveau de seuil variable | |
WO2002082424A1 (fr) | Procede et dispositif d'extraction de parametres acoustiques d'un signal vocal | |
EP1373908A1 (fr) | Procede et dispositif d'analyse d'un signal audio numerique | |
FR3032553A1 (fr) | Procede de generation d'une empreinte audio reduite a partir d'un signal sonore et procede d'identification d'un signal sonore en utilisant une telle empreinte audio reduite | |
FR2689292A1 (fr) | Procédé et système de reconnaissance vocale à réseau neuronal. | |
FR2622727A1 (fr) | Procede de reconnaissance de la parole ou de toute autre onde sonore et son procede de mise en oeuvre | |
FR2856506A1 (fr) | Procede et dispositif de detection de parole dans un signal audio |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): BE CH DE GB IT LI |
17P | Request for examination filed | Effective date: 19971206 |
RBV | Designated contracting states (corrected) | Designated state(s): BE CH DE GB IT LI |
17Q | First examination report despatched | Effective date: 20000105 |
RIC1 | Information provided on ipc code assigned before grant | Free format text: 7G 10L 11/04 A |
GRAG | Despatch of communication of intention to grant | Free format text: ORIGINAL CODE: EPIDOS AGRA |
GRAG | Despatch of communication of intention to grant | Free format text: ORIGINAL CODE: EPIDOS AGRA |
GRAH | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOS IGRA |
GRAH | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOS IGRA |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): BE CH DE GB IT LI |
REG | Reference to a national code | Ref country code: CH; Ref legal event code: EP |
REF | Corresponds to: | Ref document number: 69706488; Country of ref document: DE; Date of ref document: 20011011 |
REG | Reference to a national code | Ref country code: CH; Ref legal event code: NV; Representative's name: KELLER & PARTNER PATENTANWAELTE AG |
REG | Reference to a national code | Ref country code: GB; Ref legal event code: IF02 |
GBT | Gb: translation of ep patent filed (gb section 77(6)(a)/1977) | Effective date: 20011206 |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB; Payment date: 20040628; Year of fee payment: 8 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: DE; Payment date: 20040629; Year of fee payment: 8 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: BE; Payment date: 20050622; Year of fee payment: 9 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: CH; Payment date: 20050624; Year of fee payment: 9 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: IT; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20050721. Ref country code: GB; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20050721 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20060201 |
GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20050721 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: LI; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20060731. Ref country code: CH; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20060731. Ref country code: BE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20060731 |
REG | Reference to a national code | Ref country code: CH; Ref legal event code: PL |
BERE | Be: lapsed | Owner name: *LA POSTE; Effective date: 20060731. Owner name: *FRANCE TELECOM; Effective date: 20060731 |