EP0821345B1 - Method to determine the fundamental frequency of a speech signal - Google Patents


Publication number
EP0821345B1
EP0821345B1 EP19970401752 EP97401752A EP0821345B1 EP 0821345 B1 EP0821345 B1 EP 0821345B1 EP 19970401752 EP19970401752 EP 19970401752 EP 97401752 A EP97401752 A EP 97401752A EP 0821345 B1 EP0821345 B1 EP 0821345B1
Authority
EP
European Patent Office
Prior art keywords
values
speech signal
frames
frame
pitch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP19970401752
Other languages
German (de)
French (fr)
Other versions
EP0821345A1 (en)
Inventor
Jean-Jacques Schwartzmann
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ministere des PTT
Orange SA
Original Assignee
Ministere des PTT
France Telecom SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ministere des PTT, France Telecom SA
Publication of EP0821345A1
Application granted
Publication of EP0821345B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90: Pitch determination of speech signals

Definitions

  • The invention relates to a method for extracting the fundamental frequency of a speech signal.
  • The fundamental frequency is one of the parameters that best characterize the voice of a given speaker, and it therefore contributes to reliably authenticating that speaker.
  • The subject of the present invention is the implementation of a method for extracting the fundamental frequency of a speech signal in which the fundamental frequency is extracted with increased reliability.
  • Another object of the present invention is the implementation of a method for extracting the fundamental frequency of a speech signal in which the extraction proper can be made conditional on detecting the voicing, or the absence of voicing, of the sounds making up the speech signal.
  • A further object of the present invention is the implementation of a method for extracting the fundamental frequency of a speech signal in which the extracted fundamental frequency value is additionally subjected to a post-processing step, of the learning type, in order to eliminate any improbable or outlying value.
  • The method for extracting the fundamental frequency of a speech signal consisting of a succession of digital samples, which is the subject of the present invention, is notable in that it comprises at least the steps consisting in subjecting this speech signal to a pre-emphasis process.
  • The method which is the subject of the present invention applies in particular to speech recognition and to the identification of speakers from their sound signatures.
  • The speech signal from which the fundamental frequency is to be extracted is, for example, an analog signal representing distinct words and syllables; this analog signal is converted into a succession of digital samples, and the speech signal in its digital form is designated sp in Figure 1a.
  • The speech signal sp is then subjected to a pre-emphasis process that generates a pre-emphasized speech signal, denoted spp.
  • The pre-emphasis process is of a conventional type and, as such, will not be described in detail.
  • This process consists of an overall pre-emphasis, which applies a gain that increases with frequency in order to compensate for the attenuation of the higher-rank harmonics.
  • This formatting operation consists in arranging the pre-emphasized speech signal spp into successive frames, each comprising N samples and lasting the duration of those N samples, two consecutive frames overlapping by a number of common consecutive samples at most equal to 50% of the number N of samples in each frame.
  • The aforementioned step b) also consists in calculating, for each current frame designated Tq, a first set of values, denoted X1(k), of the logarithm of the energy spectrum of that frame by applying a Fourier transform over a number M1 of points.
  • The number M1 of points over which the Fourier transform is applied is chosen so that Shannon's sampling theorem is satisfied.
  • The number M1 of points can be taken equal to 128.
  • Step b), represented in FIG. 1a, thus makes the first set of values, denoted {X1(k)}, available.
  • The method which is the subject of the present invention then consists in calculating, in a step c), from the first set of values {X1(k)}, a determined number p of first cepstral coefficients, denoted C(m), of the logarithm of the energy spectrum defined by the first set of values {X1(k)}.
  • The cepstral coefficients satisfy the relation: [equation not reproduced in this text]
  • where p designates the number of first cepstral coefficients calculated and retained for implementing the method which is the subject of the present invention.
  • p can be limited to 16.
  • At the end of step c), the above-mentioned cepstral coefficients are thus available for the following steps of the method, as shown in Figure 1a.
  • The method which is the subject of the present invention consists, in a step d), in subjecting the pre-emphasized speech signal spp to low-pass filtering and downsampling to generate a filtered, downsampled speech signal, denoted spf.
  • In FIG. 1a, a dash-dot line links step c) and step d); it indicates an operation performed on the pre-emphasized speech signal spp available after the global pre-emphasis of step a).
  • Since the digital speech signal sp consists, for example, of a burst of successive words, the pre-emphasized speech signal spp can be stored after the pre-emphasis performed in step a), and step d) can of course be carried out from this stored signal.
  • The low-pass filtering can be performed with a cut-off frequency of 2 kHz by means of a finite impulse response (FIR) filter with 47 coefficients.
  • The filtered signal can then be downsampled, for example by decimation, to deliver the filtered, downsampled speech signal denoted spf.
  • Step d) is then followed, as shown in FIG. 1a, by a step e) consisting in calculating the fundamental frequency by spectral compression, as the rank k that maximizes a function P(k) representative of the difference between a second set of values X2(k) of the logarithm of the energy spectrum of the filtered, downsampled speech signal spf and the set of values H(k) of the smoothed frequency spectrum obtained from the cepstral coefficients available at the end of step c).
  • The function P(k) satisfies the relation: [equation not reproduced in this text]
  • The frame formatting carried out in step e) is then followed by the actual calculation of the second set of values {X2(k)} of the logarithm of the energy spectrum, by applying a Fourier transform over a number M2 of points to each current frame obtained from that formatting.
  • The calculation of the second set of values {X2(k)} is then followed by the calculation of the smoothed frequency spectrum H(k) from the cepstral coefficients C(m) available at the end of step c); the link between step c) and step e) is drawn as a dash-dot line in FIG. 1a for this reason.
  • The smoothed spectrum H(k) is calculated by applying a cosine transform to the p available cepstral coefficients.
  • The calculation of the smoothed frequency spectrum is then followed by the calculation of the function P(k) satisfying the relation cited above.
  • L is equal to M2/k, for k varying between a first and a second value that bound a low-frequency band between 70 and 450 Hz.
  • By spectral compression, that is, by calculating the harmonic product of the difference between the energy spectrum of the speech signal and the smoothed spectrum of that signal, the method eliminates the contribution of the formants and extracts the harmonic structure of the fundamental frequency of the speech signal.
  • Steps a) to e) can be executed successively.
  • On the one hand the pre-emphasized speech signal spp, and on the other hand the cepstral coefficients, in particular the p cepstral coefficients used, can be stored after step a) and at the end of step c) respectively, to allow steps b) to e) to be carried out sequentially.
  • In an alternative embodiment, shown in FIG. 1b, the method can be implemented in parallel: steps b) and c) are carried out sequentially, in parallel with steps d) and e), starting from the pre-emphasized speech signal spp.
  • This embodiment is made possible by the fact that steps b) and c) are qualitatively independent of steps d) and e) and can therefore be carried out in parallel on the pre-emphasized speech signal spp.
  • This other embodiment, although equivalent to the one described with calculation of the first set of values X1(k) and then of the second set of values X2(k), has the drawback of requiring all the values X(k) to be kept in memory for the entire duration of the calculation for each current frame, which congests memory and hampers the management of the computing resources as a whole.
  • Step e) of calculation by spectral compression consists, as mentioned above, in performing a sub-step e1) comprising the arrangement of the filtered, downsampled speech signal spf into frames of N2 samples and the calculation of the second set of values X2(k) of the logarithm of the energy spectrum by applying a Fourier transform over a number M2 of points on a frequency band between 0 and 2 kHz.
  • Sub-step e1) is followed by a sub-step e2) consisting in calculating the spectral envelope H(k), or smoothed frequency spectrum, of the current frame on the frequency band between 0 and 2 kHz over the same number M2 of points, by applying to the first p-1 cepstral coefficients a cosine transform satisfying the relation: [equation not reproduced in this text]
  • Sub-step e3) is itself followed by a sub-step e4) consisting in calculating the function P(k) by spectral compression of the difference D(k) over the low-frequency band between 70 and 450 Hz.
  • Sub-step e4) is itself followed by a sub-step e5) that extracts the maximum of the function P(k); the value of k at that maximum represents F0, the fundamental frequency of the speech signal.
  • Sub-step e5) can be carried out by a program that sorts the successive values of the function P(k) in the aforementioned low-frequency band.
  • The sorting program is a conventional program for finding the maximum among several values.
  • FIG. 2b shows successive diagrams, in an energy (W) versus frequency space: the short-term spectrum between 0 and 2 kHz of a frame of a speech signal, the frame having a duration of 32 ms over 2048 points, this diagram possibly corresponding to a frame obtained after the formatting carried out in sub-step e1) of FIG. 2a; and the spectral envelope obtained by a cosine transform applied to the first 16 cepstral coefficients, this envelope representing only the contribution of the formants, that is to say the smoothed spectrum H(k) obtained at the end of sub-step e2) of FIG.
  • The method which is the subject of the present invention can normally be applied to a continuous or pseudo-continuous stream of words or syllables making up a speech signal.
  • The method may advantageously also consist in discriminating, among all the successive frames, between voiced frames and unvoiced frames, and then in eliminating each unvoiced frame.
  • The unvoiced frames are not physically removed from the sequence of current frames.
  • These unvoiced frames are marked by assigning them an arbitrary fundamental frequency value, namely zero, as will be described later in the description.
  • The arrangement of these signals into successive frames of N or N2 samples respectively can be carried out conventionally by receiving and storing the samples at specific addresses of a random access memory, for example, and then reading the successive frames sequentially, as shown in FIG. 3a: the frame of rank q-1 is read by simultaneously reading the N corresponding samples; then, at the end of the frame duration, i.e. 32 ms, the frame of subsequent rank q is read, corresponding to N samples overlapping the previous frame of rank q-1 by N/2 samples; and so on for the frame of rank q+1 and the following frames.
  • This reading process can advantageously be carried out simply by read-addressing the memory containing the samples of the speech signal. As shown in FIG.
  • Starting from the current frame Tq, in a step 100, the process of discriminating between voiced and unvoiced frames can consist in applying a discrimination criterion 101 that classifies the current frame as voiced or unvoiced.
  • On a negative response to criterion 101, the current frame Tq is assigned an arbitrary fundamental frequency value, zero for example, in a step 102; on a positive response to criterion 101, by contrast, the current frame is kept in step 103 for processing by the calculation process that extracts the fundamental frequency of the speech signal.
  • The succession of current frames kept in step 103 is then subjected, depending on the signal considered, spp or spf, to the calculation of the first set of values X1(k) or of the second set of values X2(k) respectively, as part of step b) or step e), or sub-step e1), of FIGS. 1a, 1b or 2a.
  • This criterion can consist, as shown in FIG. 3c, in subdividing each current frame Tq into a number ST of successive contiguous frame segments and then establishing a voicing discrimination criterion for each frame segment.
  • In FIG. 3c, four contiguous frame segments, denoted S1 to S4, are shown, each frame segment therefore comprising 64 samples and lasting 8 ms.
  • The voicing discrimination criterion may consist in assigning to each frame segment a voicing index whose value lies between 0 and 1.
  • The voicing indices are denoted Vs(1) to Vs(4); each is representative of the low-frequency energy level of the corresponding frame segment S1 to S4, according to a substantially linear law.
  • Each current frame Tq is classified as an unvoiced frame by comparing a linear combination of the voicing indices of its segments with a determined threshold value.
  • The aforementioned linear combination of the voicing indices can consist of their arithmetic mean, which is compared with the aforementioned threshold value; the comparison criterion is written: [equation not reproduced in this text]
  • Each voicing index can be assigned on the basis of the low-frequency energy of each segment, according to the chart shown in the above figure.
  • The assigned voicing index varies linearly between 0 and 1 for low-frequency energy values of each segment between -35 and -15 dB. These values may of course be changed.
  • Errors can occur in estimating the fundamental frequency of the speech signal; they may be due to the presence, within the same frame, of voiced segments together with unvoiced segments or silences. Errors of this type are referred to as transition errors. Such errors can also occur in voiced or mixed low-energy frames. Under certain conditions these errors can be corrected; when correction is not possible, the fundamental frequency of the speech signal is arbitrarily set to a fictitious value, zero by convention for example, like the value assigned to unvoiced frames or silence frames.
  • The method which is the subject of the present invention can then additionally consist in post-processing the extracted value of the fundamental frequency of the speech signal.
  • This post-processing step can consist, for example, in building a histogram of the fundamental frequencies in order to determine the most probable range of frequency values together with the lower and upper bounds of these values.
  • The post-processing can consist in submitting each extracted fundamental frequency value to a sorting criterion with respect to the lower and upper bounds, to obtain sorted values representative of the evolution of the extracted fundamental frequency.
  • Isolated zero values are then recalculated by linear interpolation, while non-zero values isolated in the middle of a sequence of zeros are set to 0 by convention. Finally, statistical parameters such as the maximum and minimum F0 values and the mean value can be calculated.
  • The device shown in the above figure allows the method which is the subject of the present invention, as described above, to be implemented.
  • This device has an architecture suited to implementing this method.
  • A circuit 1 for sampling and analog-to-digital conversion converts the input analog speech signal into a series of digital samples.
  • A host computer 2 is provided to drive the succession of steps a) to e) of the method which is the subject of the present invention, and to manage and control peripheral units such as, in particular, the sampling and analog-to-digital conversion circuit 1, as will be described later in the description.
  • the acquisition of the constituent samples of the speech signal sp is carried out by
  • The dedicated digital signal processor 3 can be a MOTOROLA signal processor, reference DSP56001, clocked at 33 MHz.
  • The host microcomputer 2 can advantageously be a PC-PENTIUM microcomputer, clocked at 90 MHz and equipped with a multitasking operating system such as MS-WINDOWS.
  • The dedicated digital signal processor 3 is a 24-bit fixed-point processor; this type of processor can perform the calculations cited above, for implementing steps a) to e) of the method, in an optimal manner.
  • This signal processor 3 consists of a central processing unit 30, denoted DSP-CPU, with which are associated a program memory space denoted P, referenced 31, and two data memory spaces, denoted X and Y, with a capacity of 512 words each and referenced 32.
  • The memory spaces P, X and Y are each accessible through three independent 24-bit buses, addressing being done by three 16-bit buses that address each memory space separately, so that each can be extended to 64 k-words.
  • The programs and calculation subroutines are executed within the 512 words of the internal memory P, having first been loaded into the 8 k-words of the external memory P.
  • A program or a subroutine can be transferred from the external memory to the internal memory to be executed.
  • The data to be processed, i.e. the speech signal data, as well as the calculation tables needed to compute the cepstral coefficients, for example, and the intermediate results are stored in the X and Y spaces 32, extended to 2 x 64 k-words.
  • The host microcomputer 2 has programs and subprograms that handle the dialogue with the dedicated digital signal processor 3: loading code and data, reading data, transferring code, executing one or more programs, and initializing the analog-to-digital conversion module 1 to ensure acquisition and reproduction of the speech signal.
  • The assembly formed by the analog-to-digital conversion circuit 1 and the dedicated digital signal processor 3 is installed on an add-on card, such as the card marketed by the company DIGIMETRIE under the reference PC-DSP56k/AD/MEM.
  • In addition to the DSP56001 digital signal processor, this card has an analog-to-digital/digital-to-analog converter marketed by TEXAS INSTRUMENTS under the reference TCL32040CN, which acquires the speech signals; this converter bears the reference 10 in figure 4.
  • The calculation time of the fundamental frequency for 100 speech frames of 32 ms is approximately 2.7 seconds, i.e. 27 ms per 32 ms frame.
  • The host microcomputer can be configured through the MS-Windows operating system to operate in multitasking mode, which allows operations to be run in parallel.
  • Such a procedure is not essential, but it optimizes the use of computing resources.
  • The method and the device which are the subjects of the present invention can advantageously be used to build a speaker authentication system with a high probability of success.
  • The frequency histogram can be built either globally for a determined number of speakers or, on the contrary, for a particular speaker, in which case it is effectively representative of that speaker. The same naturally applies to the lower and upper bounds and, where appropriate, to statistical parameters such as F0max, F0min and the mean value of the fundamental frequency of that speaker's speech signal.
  • The aforementioned frequency histogram for a given speaker can then be updated over time as the speaker's voice evolves.
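
The voicing criterion described in the definitions above (four contiguous 64-sample segments per frame, a voicing index rising linearly from 0 to 1 between -35 and -15 dB, and the arithmetic mean of the indices compared with a threshold) can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names, the threshold value of 0.5, the dB reference (full scale = 1.0) and the direction of the comparison are assumptions, and the input frame is assumed to be already low-pass filtered so that its energy is a low-frequency energy.

```python
import numpy as np

def voicing_index(energy_db, lo_db=-35.0, hi_db=-15.0):
    """Map a segment energy in dB linearly onto [0, 1] (0 below lo_db, 1 above hi_db)."""
    return float(np.clip((energy_db - lo_db) / (hi_db - lo_db), 0.0, 1.0))

def segment_energy_db(segment, eps=1e-12):
    """Mean-square energy in dB relative to a full scale of 1.0 (assumed reference)."""
    seg = np.asarray(segment, dtype=float)
    return 10.0 * np.log10(np.mean(seg ** 2) + eps)

def is_voiced(frame, n_segments=4, threshold=0.5):
    """Return (voiced flag, per-segment indices); threshold is a hypothetical value."""
    segments = np.array_split(np.asarray(frame, dtype=float), n_segments)
    indices = [voicing_index(segment_energy_db(s)) for s in segments]
    # Arithmetic mean of the voicing indices compared with the threshold.
    return float(np.mean(indices)) >= threshold, indices

# Synthetic 32 ms frames at an assumed 8 kHz sampling rate (256 samples).
fs = 8000
t = np.arange(256) / fs
loud = 0.5 * np.sin(2 * np.pi * 100 * t)     # strong low-frequency content
quiet = 0.001 * np.sin(2 * np.pi * 100 * t)  # near-silent frame

voiced_loud, idx_loud = is_voiced(loud)
voiced_quiet, idx_quiet = is_voiced(quiet)

# Unvoiced frames are not dropped: they would simply receive F0 = 0 by convention.
f0_quiet = 0.0 if not voiced_quiet else None
```

With these assumed settings the loud frame is kept for pitch extraction and the quiet one is marked with the conventional zero value; other thresholds or energy bounds would shift that boundary.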

Description

The invention relates to a method for extracting the fundamental frequency of a speech signal.

Current techniques for processing digital speech signals are essentially aimed at extracting their fundamental parameters, in order to improve their quality by improving the signal-to-noise ratio and, where appropriate, to determine the origin of the speaker, for example with a view to authenticating the latter.

Among the aforementioned fundamental parameters, the fundamental frequency is one of those that best characterize the voice of a given speaker, and it therefore contributes to reliably authenticating that speaker.

Many processes for extracting the fundamental frequency of a speech signal have been proposed. For a general overview of the proposed techniques, the reader may usefully refer to the work by W. HESS entitled "Pitch determination of speech signals: algorithms and methods", Springer-Verlag, New York, 1983.

The aforementioned techniques or methods can be classified into two main families.

  • Time-domain methods, such as those implementing an autocorrelation process with center clipping and comparison of the peaks with a threshold value, or those designated AMDF, the latter having been described by R. BOITE and M. KUNT in the work entitled "Traitement de la parole", pages 193-195, Presses polytechniques romandes, Lausanne 1987, are relatively inexpensive in computation time because they do not require arithmetic multiplication operations. However, they lack precision, and it is therefore necessary to oversample the speech signal in order to obtain suitable precision, which of course results in a significant increase in the effective calculation time.
    Among these methods, the document KEIKICHI HIROSE ET AL: "A SCHEME FOR PITCH EXTRACTION OF SPEECH USING AUTOCORRELATION FUNCTION WITH FRAME LENGTH PROPORTIONAL TO THE TIME LAG", ICASSP-92, SPEECH PROCESSING 1, SAN FRANCISCO, MAR 23-26, 1992, Vol.1, IEEE pages 149-152, XP 000341105, describes a process for extracting the fundamental frequency of a speech signal by autocorrelation.
  • Frequency-domain methods are, on the contrary, based on the analysis of the harmonic structure of the energy spectrum as a function of the frequency of the speech signal. Among these, the so-called comb method, described by P. MARTIN in the article entitled "Extraction de la fréquence fondamentale par intercorrélation avec une fonction peigne", published in the Journées d'Etude Parole 12, pp. 221-232, 1981, consists in calculating the cross-correlation function between the spectrum of the digital speech signal and a comb function, for different values of the distance between the teeth of the comb. The maximum of the cross-correlation function is obtained for a distance between two consecutive teeth of the comb equal to the fundamental frequency of the signal to be analyzed. This method is reliable but relatively complex, insofar as it requires a frequency selection that retains only the spectrum maxima and the adjacent values. In addition, an interpolation is necessary to increase the accuracy of the result.

Another method, known as the spectral compression method, was published by NOLL (1970); see the work by W. HESS cited above, pages 414-417. This method, based on an analysis of the harmonic structure of the energy spectrum as a function of the frequency of the speech signal, consists in compressing the energy spectrum of the speech signal along the frequency axis by successive integer factors, and then adding the compressed spectra obtained to the initial spectrum. In principle, these operations yield a significant maximum, which results from the coherent contribution of the harmonics of the fundamental frequency after compression. Extracting the fundamental frequency then consists in finding the maximum of the logarithm of the harmonic product defined by:

Figure 00040001

where

  • L = M/k, M denoting the number of points of the spectrum,
  • X(l) denotes the logarithm of the energy spectrum.

The disadvantage of this method is that the amplitude of the harmonic peaks decreases with frequency, with a slope of the order of -12 dB/octave. Although a pre-emphasis process raises the level of the high-frequency harmonics, some harmonic peaks have a lower energy level than others because of the contribution of the formants, which causes frequent errors in estimating the value of the fundamental frequency.
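
The equation referenced as Figure 00040001 is an image that is not reproduced in this text. Assuming the usual form of the logarithm of Noll's harmonic product, and using only the symbols defined alongside it (L = M/k, X(l) the logarithm of the energy spectrum), it would plausibly read as below. This is a hedged reconstruction, not the patent's verified formula; the function name P(k) is borrowed from the definitions section.

```latex
% Assumed form: sum of the log energy spectrum sampled at the
% harmonics l*k of the candidate rank k, with L = M/k terms.
P(k) = \sum_{l=1}^{L} X(l \cdot k), \qquad L = \frac{M}{k}
```

Under this reading, a candidate rank k scores highly when the log spectrum has peaks at k, 2k, 3k and so on, which is why a harmonic weakened by a formant valley can pull the maximum away from the true fundamental.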

    The object of the present invention is to implement a method for extracting the fundamental frequency of a speech signal in which the fundamental frequency is extracted with increased reliability.

    Another object of the present invention is to implement a method for extracting the fundamental frequency of a speech signal in which the extraction of the fundamental frequency proper can be made conditional on detecting whether the sounds making up the speech signal are voiced or unvoiced.

    A final object of the present invention is to implement a method for extracting the fundamental frequency of a speech signal in which the extracted fundamental frequency value is additionally subjected to a learning-based post-processing step in order to eliminate any improbable or aberrant value.

    The method for extracting the fundamental frequency of a speech signal, a succession of digital samples, which is the subject of the present invention, is notable in that it comprises at least the steps of: subjecting the speech signal to a pre-emphasis process to generate a pre-emphasized speech signal; calculating, from the pre-emphasized speech signal, for each current frame of a succession of frames each corresponding in duration to a determined number N of samples, two consecutive frames overlapping by a number of common consecutive samples at most equal to 50% of the number N of samples, a first set of values X1(k) of the logarithm of the energy spectrum by Fourier transform over a number M1 of points; calculating, from this first set of values, a determined number p of first cepstral coefficients C(m), by applying a discrete cosine transform to said values X1(k) over a number of these values at least equal to half the number N of samples making up the current frame, this transform satisfying the relation:
    Figure 00050001
    with m = [1, 2, ..., p]; subjecting the pre-emphasized speech signal to low-pass filtering and subsampling to generate a filtered, subsampled speech signal; and calculating, by spectral compression, from the filtered subsampled speech signal and from the cepstral coefficients, for each current frame of a succession of frames having the same overlap, the fundamental frequency, that is, the rank k at which a function P(k) is maximum, P(k) representing the difference between a second set of values X2(k) of the logarithm of the energy spectrum and the set of values H(k) of the smoothed frequency spectrum, said function satisfying the relation:
    Figure 00050002
    with L = M2/k, k varying between a first and a second value representative of a low-frequency band between 70 and 450 Hz, said function P(k) having a maximum for k = F0, the extracted value of the fundamental frequency of the speech signal.

    The method of the present invention applies in particular to speech recognition and to speaker identification from voice signatures.

    It will be better understood on reading the description and examining the drawings below, in which:
    • FIG. 1a is an illustrative flowchart of all the steps for implementing the method of the present invention;
    • FIG. 1b is an illustrative flowchart of an advantageous variant of the method of the present invention, in which certain steps are carried out in parallel or, where appropriate, under a multitasking operating system, in order to allow real-time operation without requiring very large computing power;
    • FIG. 2a shows in detail a succession of elementary steps allowing an optimal implementation of the final step of the method of the present invention, the calculation by spectral compression of the fundamental frequency of the speech signal, as illustrated in FIG. 1a or 1b;
    • FIG. 2b shows a series of signals obtained in the frequency domain as a result of the elementary steps illustrated in FIG. 2a;
    • FIGS. 3a, 3b, 3c and 3d show, respectively, a procedure for formatting the samples making up the speech signal into frames, a process for discriminating current frames according to a criterion relating to the voiced or unvoiced character of each current frame, a way of establishing this criterion, and a chart for assigning a voicing index to the time segments making up each frame;
    • FIG. 4 is a block diagram of the architecture of a device for implementing the method of the present invention, based on a host microcomputer and a specialized or dedicated digital signal processor connected by a bus link.

    A more detailed description of the method for extracting the fundamental frequency of a speech signal, which is the subject of the present invention, will now be given with reference to FIGS. 1a and 1b.

    As can be seen in FIG. 1a, the speech signal from which the fundamental frequency is to be extracted in accordance with the method of the present invention is, for example, an analog signal representing distinct words and syllables. This analog signal is converted into a succession of digital samples; the speech signal in its digital form is designated sp in FIG. 1a.

    As also shown in that figure, the speech signal sp is then subjected to a pre-emphasis process that generates a pre-emphasized speech signal, denoted spp. The pre-emphasis process is conventional and, as such, will not be described in detail. It consists of a global pre-emphasis, which amounts to applying a gain that increases with frequency in order to compensate for the attenuation of the higher-order harmonics. By way of non-limiting example, the global pre-emphasis process may consist in applying to the speech signal sp a transfer function of the type: G(z) = 1 - z^-1.

    In the above relation, z = e^(jω), where ω = 2πf, f denoting the instantaneous frequency of the speech signal.
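In the time domain, the transfer function G(z) = 1 - z^-1 amounts to a first difference of the samples. A minimal sketch, assuming the sample before the start of the signal is zero (the function name is illustrative):

```python
def pre_emphasize(sp):
    """Apply G(z) = 1 - z^-1, i.e. spp[n] = sp[n] - sp[n-1].

    The sample preceding the first one is assumed to be 0.
    """
    spp = []
    previous = 0.0
    for sample in sp:
        spp.append(sample - previous)
        previous = sample
    return spp


# A constant (zero-frequency) input is suppressed after the first sample,
# while a rapidly alternating input is amplified.
constant_out = pre_emphasize([1.0, 1.0, 1.0])
alternating_out = pre_emphasize([1.0, -1.0, 1.0, -1.0])
```

This shows the gain growing with frequency: the constant input is cancelled, whereas the alternating input at the highest representable frequency is doubled.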

    The method of the present invention, as shown in FIG. 1a, then consists, in a step b), in formatting the pre-emphasized speech signal spp. This formatting operation divides the pre-emphasized speech signal spp into successive frames, each comprising N samples and corresponding in duration to those N samples, two consecutive frames overlapping by a number of common consecutive samples at most equal to 50% of the number N of samples making up each frame.
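The framing operation can be sketched as below. The defaults N = 256 and a 50% overlap are illustrative choices within the stated limit, and trailing samples that do not fill a complete frame are simply dropped, which is an assumption of this sketch.

```python
def split_into_frames(signal, N=256, overlap=128):
    """Cut `signal` into frames of N samples sharing `overlap` samples.

    The overlap is limited to at most 50% of N, as stated in the text.
    """
    if not 0 <= overlap <= N // 2:
        raise ValueError("overlap must be between 0 and N/2 samples")
    hop = N - overlap  # distance between the starts of consecutive frames
    return [signal[start:start + N]
            for start in range(0, len(signal) - N + 1, hop)]


frames = split_into_frames(list(range(512)))
```

Because frames are plain slices of the stored signal, the formatting reduces to index arithmetic on the sample buffer.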

    Step b) also consists in calculating, for each current frame, designated Tq, a first set of values, denoted X1(k), of the logarithm of the energy spectrum of the frame in question, by applying a Fourier transform over a number M1 of points.

    In practice, the number M1 of points over which the Fourier transform is applied is chosen so that Shannon's theorem is satisfied. By way of non-limiting example, for frames made up of 256 successive samples and a duration of each current frame equal to 32 ms (that is, a sampling rate of 8 kHz), the number M1 of points can be taken equal to 128.

    Step b), shown in FIG. 1a, thus provides the first set of values, denoted {X1(k)}.
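A minimal sketch of the calculation of {X1(k)} for one frame is given below. It uses a direct discrete Fourier transform for clarity, keeps the first M1 = N/2 bins, and takes the natural logarithm with a small floor to avoid log(0); the logarithm base, the floor and the absence of a window are assumptions not stated in the text.

```python
import cmath
import math


def log_energy_spectrum(frame, n_points):
    """Return log(|DFT(frame)[k]|^2) for k = 0 .. n_points - 1."""
    N = len(frame)
    values = []
    for k in range(n_points):
        coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N))
        values.append(math.log(abs(coeff) ** 2 + 1e-12))  # floor is an assumption
    return values


# Test frame: 256 samples of a cosine completing 16 cycles per frame
# (500 Hz at the 8 kHz sampling rate implied by 256 samples in 32 ms).
frame = [math.cos(2 * math.pi * 16 * n / 256) for n in range(256)]
X1 = log_energy_spectrum(frame, n_points=128)
peak_bin = max(range(len(X1)), key=X1.__getitem__)
```

A fast Fourier transform would normally replace the direct sum; the direct form is used here only to keep the sketch short.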

    As shown in FIG. 1a, the method of the present invention then consists in calculating, in a step c), from the first set of values {X1(k)}, a determined number p of first cepstral coefficients, denoted C(m), of the logarithm of the energy spectrum defined by the first set of values {X1(k)}.

    These cepstral coefficients satisfy the relation:
    Figure 00080001
    In this relation, m is an integer taking the values [1, 2, ..., p], p designating the number of first cepstral coefficients calculated and retained for the implementation of the method of the present invention. By way of non-limiting example, p can be limited to 16.
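The relation itself is available only as an image, so the sketch below assumes a common unnormalized form of the discrete cosine transform, C(m) = sum over k of X1(k) cos(m (k + 1/2) π / M1) with k counted from 0. The exact indexing and scaling used in the patent may differ.

```python
import math


def cepstral_coefficients(X1, p=16):
    """Return [C(1), ..., C(p)] using an assumed DCT-II style kernel."""
    M1 = len(X1)
    return [
        sum(X1[k] * math.cos(m * (k + 0.5) * math.pi / M1) for k in range(M1))
        for m in range(1, p + 1)
    ]


# A flat log spectrum carries no spectral shape: every C(m), m >= 1, vanishes.
flat = cepstral_coefficients([2.0] * 128)

# A log spectrum shaped like the m = 3 kernel excites only C(3).
shaped = cepstral_coefficients(
    [math.cos(3 * (k + 0.5) * math.pi / 128) for k in range(128)]
)
```

Keeping only the first p coefficients retains the slowly varying part of the log spectrum, which is what later allows a smoothed spectrum to be rebuilt from them.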

    At the end of step c), the cepstral coefficients are thus available and allow the subsequent steps of the method of the invention, as shown in FIG. 1a, to be carried out.

    Following step c), the method of the present invention consists, in a step d), in subjecting the pre-emphasized speech signal spp to low-pass filtering and subsampling to generate a filtered, subsampled speech signal, denoted spf.

    In FIG. 1a, a dash-dot line links step c) to step d); it indicates an operation carried out on the pre-emphasized speech signal spp available after the global pre-emphasis of step a). In particular, since the digital speech signal sp consists in practice of a burst of successive words, for example, the pre-emphasized speech signal spp can be stored after the pre-emphasis carried out in step a), and step d) can of course be carried out on that stored pre-emphasized speech signal spp.

    In general, the low-pass filtering can be performed by a low-pass filter with a cutoff frequency of 2 kHz, implemented as a finite impulse response (FIR) filter with 47 coefficients. The filtered signal can then be subsampled, for example by decimation, to deliver the filtered, subsampled speech signal denoted spf.
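One possible realization of step d) is sketched below. The text fixes only the filter type (FIR), its length (47 coefficients) and its cutoff (2 kHz); the windowed-sinc design with a Hamming window, the 8 kHz input rate and the decimation factor of 2 are assumptions of this sketch.

```python
import math


def design_lowpass_fir(num_taps=47, cutoff_hz=2000.0, fs_hz=8000.0):
    """Windowed-sinc low-pass FIR (Hamming window), normalized to unity DC gain."""
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]


def filter_and_decimate(spp, taps, factor=2):
    """Convolve spp with taps, computing only every `factor`-th output sample."""
    spf = []
    for n in range(0, len(spp), factor):
        acc = 0.0
        for i, h in enumerate(taps):
            if 0 <= n - i < len(spp):
                acc += h * spp[n - i]
        spf.append(acc)
    return spf


taps = design_lowpass_fir()
spf = filter_and_decimate([1.0] * 200, taps)
```

Computing only the retained output samples, rather than filtering everything and discarding half, is a common way to halve the cost of the convolution.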

    Step d) is then followed, as shown in FIG. 1a, by a step e) consisting in calculating, by spectral compression, the fundamental frequency, that is, the rank k at which a function P(k) is maximum, P(k) representing the difference between a second set of values X2(k) of the logarithm of the energy spectrum of the filtered subsampled speech signal spf and the set of values H(k) of the smoothed frequency spectrum obtained from the cepstral coefficients available at the end of step c) described above.
    The function P(k) satisfies the relation:
    Figure 00100001

    In general, step e) shown in FIG. 1a also includes a step of formatting the filtered subsampled speech signal spf into frames of N2 samples, with N2 = N/2, two consecutive frames overlapping by N2/2 samples; this formatting is of course similar to the formatting applied at the start of step b) to the pre-emphasized speech signal spp.

    The formatting carried out in step e) is followed by the actual calculation of the second set of values {X2(k)} of the logarithm of the energy spectrum, by applying a Fourier transform over a number M2 of points to each current frame obtained from the formatting. The second set of values {X2(k)} is advantageously calculated by means of a fast Fourier transform (FFT) applied over M2 = 2048 points using zero-padding.
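Zero-padding can be sketched as follows: a short frame is extended with zeros up to the transform length before the FFT, which samples the same spectrum on a finer frequency grid. The radix-2 FFT below is a textbook implementation included only to keep the example self-contained; the frame length of 128 samples and the log floor are assumptions, and which of the output bins are finally retained is left to the caller.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def zero_padded_log_spectrum(frame, fft_size=2048):
    """Pad `frame` with zeros to fft_size and return log|FFT|^2 for every bin."""
    padded = list(frame) + [0.0] * (fft_size - len(frame))
    return [math.log(abs(c) ** 2 + 1e-12) for c in fft(padded)]


# 128-sample frame holding 8 cycles of a cosine: without padding the peak
# sits at bin 8; with padding to 2048 points it moves near bin 8 * 16 = 128.
frame = [math.cos(2 * math.pi * 8 * n / 128) for n in range(128)]
X2 = zero_padded_log_spectrum(frame)
peak_bin = max(range(1024), key=X2.__getitem__)
```

Zero-padding does not add information to the frame, but the finer grid allows the position of the maximum of P(k) to be read with a smaller frequency step.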

    The calculation of the second set of values {X2(k)} is followed by the calculation of the smoothed frequency spectrum H(k) from the cepstral coefficients C(m), which are available from the end of step c); for this reason the link between step c) and step e) is drawn as a dash-dot line in FIG. 1a. The smoothed spectrum H(k) is calculated by applying a cosine transform to the p available cepstral coefficients.

    The calculation of the smoothed frequency spectrum is followed by the calculation of the function P(k) satisfying the relation given above. In this relation, L is equal to M2/k, with k varying between a first and a second value representative of a low-frequency band between 70 and 450 Hz. The function P(k) then has a maximum for k = F0, the extracted value of the fundamental frequency of the speech signal.
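The search for the maximum of P(k) can be sketched as below. Since the relation is available only as an image, the form P(k) = sum for l = 1..L of [X2(l k) - H(l k)] is an assumed reading of the text, and the bin width `bin_hz` that converts ranks to hertz is a hypothetical parameter that depends on the sampling rate and transform length actually used.

```python
import math


def extract_f0(X2, H, bin_hz, f_min=70.0, f_max=450.0):
    """Return (F0 in Hz, rank k) maximizing the assumed P(k) over the band."""
    D = [x - h for x, h in zip(X2, H)]  # log spectrum minus smoothed envelope
    last = len(D) - 1
    k_lo = max(1, math.ceil(f_min / bin_hz))
    k_hi = min(last, math.floor(f_max / bin_hz))
    best_k, best_P = k_lo, -math.inf
    for k in range(k_lo, k_hi + 1):
        L = last // k  # number of harmonics of rank k that fit in the spectrum
        P = sum(D[l * k] for l in range(1, L + 1))
        if P > best_P:
            best_k, best_P = k, P
    return best_k * bin_hz, best_k


# Toy data: a sloping envelope H plus unit peaks at multiples of bin 120.
H = [5.0 - 0.001 * k for k in range(2048)]
X2 = list(H)
for b in range(120, 2048, 120):
    X2[b] += 1.0

f0, k0 = extract_f0(X2, H, bin_hz=1.0)
```

Subtracting H before summing removes the sloping envelope, so every harmonic contributes equally to P(k) regardless of where it sits relative to the formants.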

    By spectral compression, that is, by computing the harmonic product of the difference between the energy spectrum of the speech signal and the smoothed spectrum, the method for extracting the fundamental frequency of a speech signal which is the subject of the present invention makes it possible to eliminate the contribution of the formants and to extract the harmonic structure of the fundamental frequency of the speech signal.

    The embodiment of FIG. 1a shows, by way of non-limiting example, a sequential implementation in which steps a) to e) are executed successively. In particular, the pre-emphasized speech signal spp can be stored after step a), and the cepstral coefficients, in particular the p cepstral coefficients used, can be stored at the end of step c), so that steps b) to e) can be carried out sequentially.

    However, in order to avoid overloading the processor used to carry out steps a) to e) while still allowing those steps to be executed in real time, the method of the present invention can be carried out in parallel, in the variant shown in FIG. 1b: steps b) and c) are performed sequentially, in parallel with steps d) and e), starting from the pre-emphasized speech signal spp. This embodiment is possible because steps b) and c) are qualitatively independent of steps d) and e) and can be performed in parallel on the pre-emphasized speech signal spp.
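The dependency structure of this parallel variant can be sketched with two concurrent tasks that join before the final calculation. The three callables are hypothetical placeholders for the branches described in the text, and the use of threads is an illustrative choice, not something the patent specifies.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(spp, branch_b_c, branch_d_e1, finish_e):
    """Run steps b)-c) and steps d)-e1) concurrently, then finish step e).

    branch_b_c:  spp -> cepstral coefficients C(m)
    branch_d_e1: spp -> second log spectrum X2(k)
    finish_e:    (X2, C) -> extracted fundamental frequency
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        cepstra_future = pool.submit(branch_b_c, spp)
        x2_future = pool.submit(branch_d_e1, spp)
        # The smoothed spectrum needs C(m), so the end of step e) waits
        # for both branches to complete.
        return finish_e(x2_future.result(), cepstra_future.result())


# Stand-in branches that only show how the results are routed.
result = run_parallel(
    [1, 2, 3],
    branch_b_c=lambda s: sum(s),
    branch_d_e1=lambda s: len(s),
    finish_e=lambda x2, c: (x2, c),
)
```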

    As regards the formatting sub-steps carried out in steps b) and e) on the pre-emphasized speech signal spp and on the filtered subsampled speech signal spf respectively, this formatting can be performed by appropriate addressing of the signal concerned. Of course, the sub-step of step e) that calculates the smoothed frequency spectrum H(k) can only be carried out once the p cepstral coefficients C(m) are available at the end of step c).

    The variant shown in FIG. 1b in no way prejudges whether the device implementing the method of the present invention has a single-processor or multiprocessor structure; a single-processor structure with a multitasking operating system can of course be envisaged, as will be described later in the description.

    Furthermore, in another variant, the method of the present invention may consist in calculating a single set of values, denoted X(k), of the energy spectrum of the speech signal in step c) over a number M of points equal, for example, to 2048, that is, to the larger value M = M2 given above, and in storing this set of values. The M1 = 128 values used to calculate the cepstral coefficients in step c) can then be obtained by decimation from the set of values X(k). However, although this variant is equivalent to the embodiment described above, in which the first set of values X1(k) and then the second set of values X2(k) are calculated, it has the drawback of requiring the whole set of values X(k) to be kept in memory for the entire execution time of the calculation for each current frame, which takes up memory to the detriment of the management of the computing resources as a whole.

    A more detailed description of the implementation of step e) of the method which is the subject of the present invention, as represented in Figures 1a and 1b, will now be given in connection with Figure 2a.

    According to the aforementioned figure, step e) of calculation by spectral compression consists, as mentioned previously in the description, in carrying out a step e1) comprising the formatting into frames of N2 samples of the sub-sampled filtered speech signal spf and the calculation of the second set of values X2(k) of the logarithm of the energy spectrum, by applying a Fourier transform over a number M2 of points on a frequency band between 0 and 2 kHz.

    Sub-step e1) is followed by a sub-step e2) consisting in calculating the spectral envelope H(k), or smoothed frequency spectrum, of the current frame on the frequency band between 0 and 2 kHz over the same number M2 of points, by applying to the first p-1 cepstral coefficients a cosine transform satisfying the relation:

    Figure 00130001
    In this relation, k takes the values [0,1,2,...M2] and M2 is equal to Q/4 with Q = 8192.

    Sub-step e2) is followed by a sub-step e3) consisting in calculating the difference, denoted D(k) = X2(k) - H(k).

    Sub-step e3) is itself followed by a sub-step e4) consisting in calculating the function P(k) by spectral compression of the difference D(k) on the low-frequency band between 70 and 450 Hz. The function P(k) is none other than the harmonic product of the difference D(k). This calculation is carried out for L = M2/k, with k varying over the values representative of 70 to 450 Hz, that is to say within the aforementioned low-frequency band.

    Finally, sub-step e4) is itself followed by a sub-step e5) which extracts the maximum of the function P(k), reached for the value of k representative of the value F0, the fundamental frequency of the speech signal.

    Sub-step e5) can be carried out by means of a program for sorting the successive values of the function P(k) in the aforementioned low-frequency band. The sorting program is a conventional program for finding the maximum value among several values.
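    Sub-steps e4) and e5) can be illustrated by the following minimal sketch. It is not the implementation of the invention. It assumes that D(k) is the log-domain difference defined above, so that the harmonic product becomes a sum of D at the integer multiples of k, and it uses (M2 - 1) // k harmonics rather than exactly M2/k so that every index stays inside the list. The function name and the parameter bin_hz (frequency spacing between two consecutive points) are hypothetical.

```python
import math


def harmonic_compression(D, bin_hz, f_lo=70.0, f_hi=450.0):
    """Return (f0_hz, P) for a log-domain residual D(k) sampled every bin_hz.

    P maps each candidate index k in the 70-450 Hz band to the sum of
    D over the harmonics k, 2k, 3k, ... that fall inside the list
    (assumed log-domain form of the harmonic product).
    """
    m2 = len(D)
    k_lo = max(1, math.ceil(f_lo / bin_hz))
    k_hi = min(m2 - 1, math.floor(f_hi / bin_hz))
    P = {}
    for k in range(k_lo, k_hi + 1):
        n_harm = (m2 - 1) // k            # harmonics of k available in D
        P[k] = sum(D[n * k] for n in range(1, n_harm + 1))
    k_best = max(P, key=P.get)            # sub-step e5): plain maximum search
    return k_best * bin_hz, P
```

    With a synthetic residual that is positive only at the multiples of one index, the maximum of P falls on that index, and the sub-harmonic candidate scores lower because it also sums the troughs between the peaks.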

    Figure 2b shows, in succession, diagrams in an energy (W) versus frequency space relating to: the short-term spectrum between 0 and 2 kHz of a frame of a speech signal, the frame having a duration of 32 ms over 2048 points, this diagram possibly corresponding to a frame obtained following the formatting carried out in sub-step e1) of Figure 2a; the spectral envelope obtained by a cosine transform applied to the first 16 cepstral coefficients, this envelope representing only the contribution of the formants, that is to say the smoothed spectrum H(k) obtained at the end of sub-step e2) of Figure 2a for example; the difference D(k) between the two preceding spectra, in which only the fundamental frequency structure of the speech signal remains, the contribution of the formants having been eliminated, this diagram corresponding to the values D(k) obtained at the end of sub-step e3) of Figure 2a; and, finally, the curve obtained by spectral compression of the fundamental frequency structure of the speech signal between 70 and 450 Hz, this function exhibiting a significant maximum value, or peak, at the frequency F0, this last diagram corresponding to the implementation of sub-steps e4) and e5) of Figure 2a.

    The method which is the subject of the present invention can normally be implemented on a continuous or pseudo-continuous stream of words or syllables making up a speech signal.

    However, extensive investigations have shown the value of implementing a process of discrimination between voiced frames and unvoiced frames, because the sampling of unvoiced frames is liable to cause errors in the evaluation of the fundamental frequency of the speech signal: for unvoiced frames, the sounds do not result from a periodic vibration of the vocal cords, so these frames carry no meaningful information about the fundamental frequency of the speech signal.

    To this end, and following the sub-step consisting in subjecting the pre-emphasized speech signal spp, respectively the sub-sampled filtered speech signal spf, to the sub-step of formatting into frames, the method which is the subject of the present invention may advantageously consist, in addition, in discriminating, among the set of successive frames, between voiced frames and unvoiced frames, and then in eliminating each unvoiced frame. In fact, the unvoiced frames are not physically removed from the succession of current frames. These unvoiced frames are marked by assigning to them an arbitrary fundamental frequency value, the value zero, as will be described later in the description.

    Thus, as represented in Figure 3a, these signals can be formed into successive frames of N, respectively N2, samples in a conventional manner by receiving and storing these samples at specific addresses of a random access memory for example, then by sequentially reading the successive frames, as represented in Figure 3a: for example, the frame of rank q-1 is read by simultaneously reading the N corresponding samples, then, at the end of the frame duration, i.e. 32 ms, the subsequent frame of rank q is read, corresponding to N samples overlapping the previous frame of rank q-1 by N/2 samples, and so on for the frame of rank q+1 and the following frames. This reading process can advantageously be carried out by simple read addressing of the memory containing the samples of the speech signal. As represented in Figure 3b, the formatting into frames having been carried out on one or the other signal as described in relation to Figure 3a, the process of discrimination between voiced frames and unvoiced frames can consist, starting from the current frame Tq, in a step 100, in applying a criterion 101 of discrimination between voiced and unvoiced current frames. On a negative response to the aforementioned criterion 101, the current frame Tq is assigned an arbitrary fundamental frequency value, the value zero for example, in a step 102, whereas, on a positive response to criterion 101, the current frame is kept in step 103 for processing according to the calculation process that extracts the fundamental frequency of the speech signal. The succession of current frames kept in step 103 is then subjected, depending on the signal considered, spp or spf respectively, to the calculation of the first set of values X1(k) or of the second set of values X2(k) respectively, as part of the implementation of step b) or of step e), or sub-step e1), of Figures 1a, 1b or 2a.
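    The frame formatting of Figure 3a and the gate of Figure 3b can be sketched as follows. This is only an illustration under stated assumptions: the samples are held in a Python list, frames are taken every N/2 samples, and the two callables is_voiced and estimate_f0 are hypothetical placeholders standing for criterion 101 and for the fundamental frequency extraction applied to the frames kept at step 103.

```python
def split_into_frames(samples, n):
    """Cut samples into frames of n samples, each overlapping the previous one by n // 2."""
    hop = n // 2
    return [samples[start:start + n]
            for start in range(0, len(samples) - n + 1, hop)]


def f0_track(samples, n, is_voiced, estimate_f0):
    """Apply the voiced/unvoiced gate to every frame.

    is_voiced and estimate_f0 are caller-supplied placeholders (hypothetical names).
    """
    track = []
    for frame in split_into_frames(samples, n):
        if is_voiced(frame):
            track.append(estimate_f0(frame))   # step 103: frame kept for F0 extraction
        else:
            track.append(0.0)                  # step 102: arbitrary value zero
    return track
```

    Unvoiced frames stay in the sequence with the value zero, so the rank q of every frame is preserved for the later post-processing.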

    As regards the actual discrimination between voiced and unvoiced frames, this can consist, as represented in connection with Figure 3c, in subdividing each current frame Tq into a number ST of successive contiguous frame segments, then in establishing, for each of the frame segments, a voicing discrimination criterion. Figure 3c shows four contiguous frame segments, denoted S1 to S4, each frame segment therefore comprising 64 samples and occupying a duration of 8 ms.

    According to a particularly advantageous, non-limiting embodiment, the voicing discrimination criterion may consist in assigning to each frame segment considered a voicing index whose value lies between 0 and 1. Each voicing index is denoted Vs(1) to Vs(4) and is representative of the low-frequency energy level of the frame segment S1 to S4 considered, according to a substantially linear law. Finally, each current frame Tq is classified as an unvoiced frame by comparing a linear combination of the voicing indices of each segment with a determined threshold value. By way of non-limiting example, the aforementioned linear combination of the voicing indices may consist in calculating the arithmetic mean of these indices and in comparing this arithmetic mean with the aforementioned threshold value ε, the comparison criterion for the linear combination being written:

    Figure 00170001

    Finally, as represented in Figure 3d, the value of each voicing index can be assigned as a function of the low-frequency energy of each segment according to the chart shown in that figure. In the embodiment studied for the implementation of the method which is the subject of the present invention, the assigned voicing index value is linear between the values 0 and 1 for low-frequency energy values of each segment between -35 and -15 dB. These values can of course be modified.
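    The voicing index and the frame decision can be sketched as below. The linear ramp between -35 and -15 dB comes from the description; everything else is an assumption made for illustration: the index is clamped to 0 below -35 dB and to 1 above -15 dB, a frame is taken as voiced when the arithmetic mean of its indices is at least ε, and the default value 0.5 for ε is hypothetical, since neither the comparison formula nor the threshold value appears in the text.

```python
def voicing_index(energy_db, lo_db=-35.0, hi_db=-15.0):
    """Map a segment's low-frequency energy (dB) to an index in [0, 1]."""
    t = (energy_db - lo_db) / (hi_db - lo_db)
    return min(1.0, max(0.0, t))          # clamping outside the ramp is assumed


def is_voiced(segment_energies_db, epsilon=0.5):
    """Compare the mean voicing index of the segments with epsilon.

    The direction of the comparison and the default epsilon are assumptions.
    """
    indices = [voicing_index(e) for e in segment_energies_db]
    return sum(indices) / len(indices) >= epsilon
```

    For a frame of four segments S1 to S4, the function receives the four low-frequency energies and returns a single decision for the frame Tq.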

    Finally, errors may occur in the estimation of the value of the fundamental frequency of the speech signal. These errors may be due to the presence, in the same frame, of voiced segments together with unvoiced segments or silences; errors of this type are called transition errors. Such errors may also occur in voiced or mixed frames of low energy. Under certain conditions these errors can be corrected; when correction is not possible, the value of the fundamental frequency of the speech signal is arbitrarily set to a dummy value, the value zero by convention for example, in the same way as the value assigned to unvoiced frames or to silence frames.

    The method which is the subject of the present invention may then consist, in addition, in post-processing the extracted value of the fundamental frequency of the speech signal.

    This post-processing step may consist, for example, in establishing a histogram of the fundamental frequencies in order to determine the range of the most probable frequency values, together with the lower and upper bounds of these values. Once the histogram of the fundamental frequencies has been established, the post-processing may consist in subjecting each extracted fundamental frequency value to a sorting criterion with respect to the lower and upper bounds, so as to obtain sorted values representative of the evolution of the extracted fundamental frequency values.

    These sorted values can then be subjected to non-linear filtering in order to remove outliers.

    Thus, for a most probable frequency band lying between values denoted B.Sup and B.Inf respectively, the upper value and lower value of the frequency band, and for successive fundamental frequency values denoted F0(i), the correction process can be carried out according to the following calculation steps:
    if F0(i) > B.Sup, then F0(i) = F0(i)/2
      • if F0(i) > B.Sup or F0(i) < B.Inf, then F0(i) = 0
      • otherwise, if |F0(i) - F0(i-1)| > γ, then F0(i) = 0
    otherwise, if F0(i) < B.Inf, then F0(i) = F0(i)*2
      • if F0(i) > B.Sup or F0(i) < B.Inf, then F0(i) = 0
      • otherwise, if |F0(i) - F0(i-1)| > γ, then F0(i) = 0.
    In the above calculation process, the index i assigned to the fundamental frequency values designates the successive order of the extracted values, and γ represents an arbitrary threshold value with which the difference between two successive fundamental frequency values of rank i and i-1 is compared.
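    The correction steps listed above can be sketched as follows. The sketch assumes the nested reading of the list (the two bulleted tests are applied only after a value has been halved or doubled), that the values are updated in place so that F0(i-1) is the already corrected neighbour, and that the jump test is skipped for the first value, which has no predecessor. The function and parameter names are hypothetical.

```python
def correct_f0(track, b_inf, b_sup, gamma):
    """Octave correction of an F0 track against the band [b_inf, b_sup]."""
    out = list(track)
    for i in range(len(out)):
        if out[i] > b_sup:
            out[i] = out[i] / 2            # try one octave down
        elif out[i] < b_inf:
            out[i] = out[i] * 2            # try one octave up
        else:
            continue                       # in-band values are left untouched
        out_of_band = out[i] > b_sup or out[i] < b_inf
        # Assumption: no jump test on the very first value.
        jump = i > 0 and abs(out[i] - out[i - 1]) > gamma
        if out_of_band or jump:
            out[i] = 0.0                   # correction impossible: dummy value
    return out
```

    Values already set to zero for unvoiced frames remain zero under this reading, because doubling zero leaves it below B.Inf.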

    Following the non-linear filtering, isolated zero values are recalculated by linear interpolation, whereas isolated non-zero values in the middle of a run of zeros are set to the value 0 by convention. Finally, statistical parameters such as the maximum and minimum F0 values and the mean value can be calculated.
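    A possible reading of this last clean-up step is sketched below. The text does not define "isolated", so the sketch assumes that it means a single value whose two immediate neighbours are of the other kind, that both decisions are taken on the filtered track before any repair, and that the statistics are computed over non-zero values only. All names are hypothetical.

```python
def repair_isolated(track):
    """Interpolate single zeros and zero out single non-zero values."""
    out = list(track)
    for i in range(1, len(track) - 1):
        left, mid, right = track[i - 1], track[i], track[i + 1]
        if mid == 0 and left != 0 and right != 0:
            out[i] = (left + right) / 2    # linear interpolation between neighbours
        elif mid != 0 and left == 0 and right == 0:
            out[i] = 0                     # lone value inside a run of zeros
    return out


def f0_statistics(track):
    """Return (minimum, maximum, mean) over the non-zero values, or None if there are none."""
    voiced = [f for f in track if f != 0]
    if not voiced:
        return None
    return min(voiced), max(voiced), sum(voiced) / len(voiced)
```

    The first and last values of the track are left unchanged, since they have only one neighbour.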

    A description of a device allowing the implementation of the method which is the subject of the present invention will now be given in connection with Figure 4.

    The device represented in the aforementioned figure allows the implementation of the method which is the subject of the present invention, as previously described. This device has an architecture suited to the implementation of this method.

    As represented in the aforementioned figure, it comprises a circuit 1 for sampling and analog-to-digital conversion of an analog input speech signal into a sequence of digital samples. In addition, a host computer 2 is provided in order to control the succession of steps a) to e) of the method which is the subject of the present invention, and to manage and control peripheral units such as, in particular, the sampling and analog-to-digital conversion circuit 1, as will be described later in the description.

    The device represented in Figure 4 further comprises a dedicated digital signal processor 3 interconnected, on the one hand, by a bus link to the host microcomputer 2 and, on the other hand, by a specific link to the analog-to-digital conversion circuit 1. This digital signal processor 3 performs the calculation of the first set of values X1(k) of the logarithm of the energy spectrum of the speech signal by Fourier transform over a number M1 of points, the calculation of the first cepstral coefficients, the low-pass filtering and sub-sampling of the speech signal sp, the calculation of the second set of values X2(k) of the logarithm of the energy spectrum, the calculation of the set of values H(k) of the smoothed frequency spectrum, the calculation of the function P(k), and the extraction of the maximum of the function P(k) for k = F0, the extracted value of the fundamental frequency of the speech signal. The acquisition of the samples making up the speech signal sp is controlled by the host computer 2, via the signal processor 3.

    In a non-limiting embodiment, the dedicated digital signal processor 3 may consist of a MOTOROLA signal processor, reference DSP56001, clocked at a frequency of 33 MHz. The host microcomputer 2 may advantageously consist of a PC-PENTIUM type microcomputer, clocked at a frequency of 90 MHz and provided with an operating system such as the MS-WINDOWS multitasking operating system. The dedicated digital signal processor 3 is a 24-bit fixed-point processor, this type of processor making it possible to carry out the previously mentioned calculations, for the implementation of steps a) to e) of the method which is the subject of the present invention, in an optimal manner. This signal processor 3 in fact consists of a central processing unit 30, denoted DSP-CPU, with which are associated a program memory space denoted P, referenced 31, and two data memory spaces, denoted X and Y, with a capacity of 512 words each and referenced 32. The memory spaces P, X and Y are each accessible via three independent 24-bit buses, the addressing being carried out via three 16-bit buses which allow each memory space to be addressed separately, so that each can be extended to 64 k-words.

    For reasons of speed, the calculation programs and subroutines are executed in the 512 words of the internal memory P, these programs or subroutines having previously been loaded into the 8 k-words of the external memory P. On instruction from the host microcomputer 2, a program or subroutine can be transferred from the external memory to the internal memory in order to be executed there. The data to be processed, that is the data relating to the speech signal, as well as the calculation tables needed for the calculation of the cepstral coefficients for example, and the intermediate results, are stored in the X and Y spaces 32 extended to 2 x 64 k-words.

    The host microcomputer 2 has programs and subroutines providing a dialogue with the dedicated digital signal processor 3 in order to load code and data, read data, transfer code, execute one or more programs, and initialize the analog-to-digital conversion module 1 so as to ensure the acquisition and reproduction of the speech signal.

    The assembly formed by the analog-to-digital conversion circuit 1 and the dedicated digital signal processor 3 is installed on an add-on card, such as a card marketed by the company DIGIMETRIE under the reference PC-DSP56k/AD/MEM. In addition to the DSP56001 digital signal processor, this card comprises an analog-to-digital / digital-to-analog converter marketed by the company TEXAS INSTRUMENTS under the reference TCL32040CN, which handles the acquisition of the speech signals, this converter bearing the reference 10 in Figure 4.

    Given such an architecture, the computation time of the fundamental frequency for 100 speech frames of 32 ms duration is approximately 2.7 seconds, i.e. 27 ms per 32 ms frame. Computing the logarithm of the energy spectrum, i.e. the second set of values {X2(k)} over M2 = 2048 points, takes 14 ms. Given the complexity of the calculations, these computation times are remarkably short. The calculations can moreover be performed in real time, since the effective computation time of 27 ms per frame is less than the duration of each frame.
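
    By way of illustration only, the real-time condition stated above can be checked with a few lines of Python. The function name `real_time_budget` and its `overlap` parameter are assumptions introduced for this sketch and do not appear in the description; the figures 2.7 s, 100 frames and 32 ms are those given above.

```python
def real_time_budget(total_compute_s, n_frames, frame_ms, overlap=0.0):
    """Return (compute time per frame in ms, fraction of the frame period used)."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    per_frame_ms = total_compute_s * 1000.0 / n_frames
    # Duration of new signal arriving per frame; equals frame_ms when overlap is 0.
    hop_ms = frame_ms * (1.0 - overlap)
    return per_frame_ms, per_frame_ms / hop_ms


per_frame_ms, load = real_time_budget(2.7, 100, 32.0)
print(f"{per_frame_ms:.1f} ms per frame, load {load:.2f}")  # 27.0 ms per frame, load 0.84
```

    A load below 1 corresponds to the condition given in the text, namely 27 ms of computation per 32 ms frame. The hypothetical `overlap` argument is provided only to explore how the budget would change if consecutive frames shared samples, since less new signal then arrives per frame.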

    In order to improve system performance and to process steps b), c) and d), e) of the method according to the present invention in parallel, as shown for example in Figure 1b, the host microcomputer can be configured from the MS-Windows operating system to operate in multitasking mode, which allows the operations to be conducted in parallel. Such an operating mode is not essential, but it optimizes the use of the computing resources.
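
    The task structure described above, two groups of operations run side by side on the same frame, might be sketched as follows. This is a minimal illustration under stated assumptions: the functions `spectral_branch` and `decimation_branch` are hypothetical placeholders, not the computations of the method, and Python threads are used only to show how the two tasks are launched and joined.

```python
from concurrent.futures import ThreadPoolExecutor


def spectral_branch(frame):
    # Placeholder standing in for steps b) and c): here, simply the frame energy.
    return sum(x * x for x in frame)


def decimation_branch(frame, factor=4):
    # Placeholder standing in for step d): block averaging as a crude lowpass
    # filter, keeping one value per block (the factor of 4 is an assumption).
    return [sum(frame[i:i + factor]) / factor
            for i in range(0, len(frame) - factor + 1, factor)]


def process_frame(frame):
    # Launch both tasks, then wait for both results before continuing.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(spectral_branch, frame)
        second = pool.submit(decimation_branch, frame)
        return first.result(), second.result()
```

    Any operation that needs the results of both groups would run after `process_frame` returns, which mirrors the point at which the two branches meet.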

    Finally, the post-processing operations can be carried out on the host microcomputer 2, since the post-processing process described earlier in the description can be implemented by a program written in a language such as C, which is fast enough to correct the successive extracted fundamental frequency values F0(i).
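
    A correction of the successive values F0(i) of the kind referred to above might look like the following sketch, written here in Python for brevity. It is not the algorithm of the description: the replacement rule for out-of-range values (reuse the last accepted value), the use of a median filter as the non-linear filter, and the names `correct_pitch_track`, `low`, `high` and `width` are all assumptions made for the example.

```python
from statistics import median


def correct_pitch_track(f0, low, high, width=3):
    """Replace out-of-range pitch values, then median-filter the track."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    if not f0:
        return []
    # Assumed sorting criterion: keep values inside [low, high]; otherwise
    # reuse the last accepted value, or clamp if none has been accepted yet.
    kept, last = [], None
    for value in f0:
        if low <= value <= high:
            last = value
            kept.append(value)
        elif last is not None:
            kept.append(last)
        else:
            kept.append(min(max(value, low), high))
    # Median filter with edge replication so the output keeps the same length.
    half = width // 2
    padded = [kept[0]] * half + kept + [kept[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(kept))]
```

    For example, `correct_pitch_track([100.0, 101.0, 150.0, 102.0, 103.0], 80.0, 200.0)` removes the isolated jump to 150 Hz even though it lies inside the bounds, because the median of each three-value window ignores a single outlier.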

    Given the aforementioned architecture, the method and the device according to the present invention can advantageously be used to build a speaker authentication system with a high probability of success. In particular, the frequency histogram can be constructed either generally, for a given number of speakers, or for one particular speaker, in which case the histogram is actually representative of that speaker. The same naturally applies to the lower and upper bound values and, where appropriate, to statistical parameters such as the values F0max and F0min and the mean value of the fundamental frequency of that speaker's speech signal. The frequency histogram for a given speaker can then be updated over time as the speaker's voice evolves.
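
    As a non-limiting sketch, a per-speaker frequency histogram and the associated parameters could be derived as shown below. The bin width of 10 Hz, the rule that retains bins holding at least 5% of the values, and the names `speaker_profile`, `bin_hz` and `min_share` are assumptions chosen for illustration; the description does not specify them.

```python
from collections import Counter


def speaker_profile(f0_values, bin_hz=10.0, min_share=0.05):
    """Summarise a speaker's pitch values: histogram, bounds, min, max, mean."""
    if not f0_values:
        raise ValueError("at least one pitch value is required")
    hist = Counter(int(v // bin_hz) for v in f0_values)
    total = len(f0_values)
    # Assumed rule: the most probable range is spanned by the bins that each
    # hold at least min_share of all values.
    likely = sorted(b for b, count in hist.items() if count / total >= min_share)
    if not likely:
        # Fall back to the single most populated bin.
        likely = [hist.most_common(1)[0][0]]
    return {
        "histogram": hist,
        "lower": likely[0] * bin_hz,
        "upper": (likely[-1] + 1) * bin_hz,
        "f0_min": min(f0_values),
        "f0_max": max(f0_values),
        "f0_mean": sum(f0_values) / total,
    }
```

    Updating the profile as the speaker's voice evolves can be done, in this sketch, by calling `speaker_profile` again on the enlarged list of pitch values.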

    Claims (9)

    1. A method of extracting the pitch of a speech signal, being a series of digital samples, this method comprising at least the following steps:
      a) subjecting said speech signal to a pre-accentuation process in order to generate a pre-accentuated speech signal,
      b) from the pre-accentuated speech signal and for each current frame of a series of frames, each corresponding in duration to a given number N of samples, two consecutive frames having a time overlap, in number of common consecutive samples, at most equal to 50/100 of the number N of samples, computing a first set of values X1(k) for the log-power spectrum by Fourier transform over a number M1 of points;
      c) from said set of values, computing a predetermined number p of first cepstral coefficients C(m) by applying a discrete cosine transform to said values X1(k) over a number of these values at least equal to half the number N of consecutive samples of said current frame, said transform verifying the equation:
      Figure 00340001
      where m = [1, 2, ..., p] and C(m) denotes said cepstral coefficients;
      d) subjecting said pre-accentuated speech signal to lowpass filtering and sub-sampling in order to generate a sub-sampled, filtered speech signal;
      e) from said sub-sampled, filtered speech signal and from said cepstral coefficients, computing, by spectral compression, for each current frame in a series of frames having a same time overlap, the maximum pitch of rank k of a function P(k) representative of the difference between a second set of values X2(k) for the log-power spectrum and the set of values H(k) of the smoothed frequency spectrum, said function verifying the equation:
      Figure 00350001
      where L = M2/k, k varying between a first and a second value representative of a band of low frequencies ranging between 70 and 450 Hz, said function P(k) having a maximum for k=F0, a pitch value extracted from the speech signal.
    2. Method as claimed in claim 1, characterised in that said step of computing by spectral compression consists in successively:
      computing from said sub-sampled, filtered speech signal, for each current frame, said second set of values X2(k) of the log-power spectrum by Fourier transform over a number M2 of points on a band of frequencies ranging between 0 and 2 kHz;
      computing the spectral envelope H(k), a smoothed spectrum of frequencies of said current frame on said band of frequencies ranging between 0 and 2 kHz over a same number M2 of points, by applying to said p-1 first cepstral coefficients a cosine transform verifying the equation:
      Figure 00350002
      where k = [0, 1, 2, ... , M2] and M2 = Q/4;
      computing the difference D(k) = X2(k) - H(k);
      computing the harmonic product representative of the function P(k) by spectrally compressing said difference D(k) on said band of low frequencies ranging between 70 and 450 Hz;
      determining, by a sorting process, the maximum of the function P(k) and the corresponding rank k=F0, an extracted pitch value.
    3. Method as claimed in claim 1 or 2, characterised in that, after the step which consists in framing the respectively filtered and sub-sampled pre-accentuated speech signal, it additionally consists in distinguishing between the voiced frames and the unvoiced frames in all the frames, the pitch extraction process being applied to the voiced frames.
    4. Method as claimed in claim 3, characterised in that the step which consists in differentiating between the voiced frames and the unvoiced frames consists in:
      sub-dividing each frame into a number ST of successive contiguous frame segments;
      for each of said frame segments, establishing a voice discrimination criterion on the basis of a voice index ranging between 0 and 1 representing the low frequency power level of the frame segment considered in accordance with a substantially linear law;
      categorising each frame as an unvoiced frame by comparing a linear combination of the voice indices of each segment with a given threshold value.
    5. Method as claimed in one of claims 1 to 4, characterised in that, following the step of determining the maximum of rank k of said function P(k), where k=F0 represents the pitch value of the speech signal, and with a view to eliminating any aberrant pitch value and eliminating the risk of error due to the presence of transition errors generated by the existence of voiced segments, unvoiced segments or silences in a same frame and by the existence of low-power voiced or mixed frames, said process additionally consists in applying post-processing to said pitch value extracted from said speech signal, this post-processing step consisting in:
      establishing a histogram of pitches in order to determine the range of the most probable frequency values and the upper and lower boundary values of these values;
      applying a sorting criterion to each extracted pitch value relative to said lower and upper boundary values in order to obtain sorted values representative of the change in the extracted pitch values;
      subjecting these sorted values to non-linear filtering in order to suppress the aberrant values.
    6. Method as claimed in claim 1, characterised in that steps a) to e) are performed in sequence.
    7. Method as claimed in claim 6, characterised in that steps b) and c), respectively d) and e) are performed under a multi-tasking operating system, which enables pitch extraction to be performed in real time.
    8. Device for extracting the pitch of a speech signal using the method as claimed in one of claims 1 to 7, this device comprising:
      means for sampling and converting, analogue to digital, a speech signal into a series of digital samples;
      a host micro-computer enabling steps a) to e) of the method to be performed and to manage and control peripheral units, in particular said sampling and analogue-digital conversion means;
      a digital signal processor inter-connected by a BUS link with said host micro-computer and enabling the operations to be run in order to compute the first set of values X1(k) of the log-power spectrum by Fourier transform over a number M1 of points, the first p cepstral coefficients, lowpass filtering and sub-sampling, the second set of values X2(k) of the log-power spectrum, the set of values H(k) of the smoothed spectrum of frequencies, the function
      Figure 00380001
      extraction of the maximum P(k) for k=F0, being a pitch value extracted from the speech signal.
    9. Use of the method and the device for extracting the pitch of a speech signal as claimed in one of claims 1 to 8 as a means of authenticating one or more speakers.
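
    Steps a) and b) of claim 1, up to the division of the signal into overlapping frames, can be illustrated by the following sketch. The first-order difference form of the pre-accentuation filter and its coefficient 0.95 are assumptions, as are the names `pre_emphasize` and `split_frames`; the claims specify only that the overlap is at most 50/100 of the N samples of a frame.

```python
def pre_emphasize(samples, coeff=0.95):
    """Assumed pre-accentuation filter: y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    out = [samples[0]]  # the first sample has no predecessor and is kept as is
    out.extend(samples[n] - coeff * samples[n - 1] for n in range(1, len(samples)))
    return out


def split_frames(samples, n, overlap=0.5):
    """Cut the signal into frames of n samples sharing at most half their samples."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0.0 <= overlap <= 0.5:
        raise ValueError("overlap must be between 0 and 0.5")
    hop = n - int(n * overlap)  # number of new samples per frame
    return [samples[start:start + n]
            for start in range(0, len(samples) - n + 1, hop)]
```

    With `n = 4` and the maximum overlap of 0.5, a ten-sample signal yields four frames starting at samples 0, 2, 4 and 6, each sharing two samples with its neighbour.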
    EP19970401752 1996-07-24 1997-07-21 Method to determine the fundamental frequency of a speech signal Expired - Lifetime EP0821345B1 (en)

    Applications Claiming Priority (2)

    Application Number Priority Date Filing Date Title
    FR9609313 1996-07-24
    FR9609313A FR2751776B1 (en) 1996-07-24 1996-07-24 METHOD FOR EXTRACTING THE BASIC FREQUENCY OF A SPEAKING SIGNAL

    Publications (2)

    Publication Number    Publication Date
    EP0821345A1 (en)      1998-01-28
    EP0821345B1 (en)      2001-09-05

    Family

    ID=9494427

    Family Applications (1)

    Application Number Title Priority Date Filing Date
    EP19970401752 Expired - Lifetime EP0821345B1 (en) 1996-07-24 1997-07-21 Method to determine the fundamental frequency of a speech signal

    Country Status (3)

    Country Link
    EP (1) EP0821345B1 (en)
    DE (1) DE69706488T2 (en)
    FR (1) FR2751776B1 (en)

    Cited By (1)

    * Cited by examiner, † Cited by third party
    Publication number Priority date Publication date Assignee Title
    US8259972B2 (en) 2008-01-21 2012-09-04 Bernafon Ag Hearing aid adapted to a specific type of voice in an acoustical environment, a method and use

    Families Citing this family (1)

    * Cited by examiner, † Cited by third party
    Publication number Priority date Publication date Assignee Title
    FR2825505B1 (en) * 2001-06-01 2003-09-05 France Telecom METHOD FOR EXTRACTING THE BASIC FREQUENCY OF A SOUND SIGNAL BY MEANS OF A DEVICE IMPLEMENTING A SELF-CORRELATION ALGORITHM

    Also Published As

    Publication number Publication date
    FR2751776A1 (en) 1998-01-30
    DE69706488T2 (en) 2002-05-23
    DE69706488D1 (en) 2001-10-11
    FR2751776B1 (en) 1998-10-09
    EP0821345A1 (en) 1998-01-28

    Similar Documents

    Publication Publication Date Title
    EP0594480B1 (en) Speech detection method
    EP2415047B1 (en) Classifying background noise contained in an audio signal
    EP2419900B1 (en) Method and device for the objective evaluation of the voice quality of a speech signal taking into account the classification of the background noise contained in the signal
    FR2522179A1 (en) METHOD AND APPARATUS FOR RECOGNIZING WORDS FOR RECOGNIZING PARTICULAR PHONEMES OF THE VOICE SIGNAL WHATEVER THE PERSON WHO SPEAKS
    EP2603862B1 (en) Method for analyzing signals providing instantaneous frequencies and sliding fourier transforms, and device for analyzing signals
    EP0511095B1 (en) Coding and decoding method and apparatus for a digital signal
    EP0620546B1 (en) Energy detection procedure for noisy signals
    EP2795618B1 (en) Method of detecting a predetermined frequency band in an audio data signal, detection device and computer program corresponding thereto
    EP0234993B1 (en) Method and device for automatic target recognition starting from doppler echos
    EP0685833B1 (en) Method for speech coding using linear prediction
    EP0821345B1 (en) Method to determine the fundamental frequency of a speech signal
    EP0714088B1 (en) Voice activity detection
    EP0574288B1 (en) Method and apparatus for transmission error concealment of frequency transform coded digital audio signals
    EP0097754B1 (en) Tone detector and multifrequency receiver using this detector
    KR100766170B1 (en) Music summarization apparatus and method using multi-level vector quantization
    EP0621582B1 (en) Method of speech recognition with training phase
    EP0585434B1 (en) Filtering method and device for reducing digital audio signal pre-echoes
    EP1605440B1 (en) Method for signal source separation from a mixture signal
    EP1459214B1 (en) Method for characterizing a sound signal
    EP0015363A1 (en) Speech detector with a variable threshold level
    WO2002082424A1 (en) Method and device for extracting acoustic parameters of a voice signal
    WO2002082106A1 (en) Method and device for analysing a digital audio signal
    FR3032553A1 (en) METHOD FOR GENERATING A REDUCED AUDIO IMPRINT FROM A SOUND SIGNAL AND METHOD FOR IDENTIFYING A SOUND SIGNAL USING SUCH A REDUCED AUDIO IMPRINT
    FR2622727A1 (en) Method of recognising speech or any other sound wave and its method of implementation
    WO2006005815A1 (en) Deciding upon restoration of partials of a sound signal

    Legal Events

    Date Code Title Description
    PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

    Free format text: ORIGINAL CODE: 0009012

    AK Designated contracting states

    Kind code of ref document: A1

    Designated state(s): BE CH DE GB IT LI

    17P Request for examination filed

    Effective date: 19971206

    RBV Designated contracting states (corrected)

    Designated state(s): BE CH DE GB IT LI

    17Q First examination report despatched

    Effective date: 20000105

    RIC1 Information provided on ipc code assigned before grant

    Free format text: 7G 10L 11/04 A

    GRAG Despatch of communication of intention to grant

    Free format text: ORIGINAL CODE: EPIDOS AGRA

    GRAG Despatch of communication of intention to grant

    Free format text: ORIGINAL CODE: EPIDOS AGRA

    GRAH Despatch of communication of intention to grant a patent

    Free format text: ORIGINAL CODE: EPIDOS IGRA

    GRAH Despatch of communication of intention to grant a patent

    Free format text: ORIGINAL CODE: EPIDOS IGRA

    GRAA (expected) grant

    Free format text: ORIGINAL CODE: 0009210

    AK Designated contracting states

    Kind code of ref document: B1

    Designated state(s): BE CH DE GB IT LI

    REG Reference to a national code

    Ref country code: CH

    Ref legal event code: EP

    REF Corresponds to:

    Ref document number: 69706488

    Country of ref document: DE

    Date of ref document: 20011011

    REG Reference to a national code

    Ref country code: CH

    Ref legal event code: NV

    Representative=s name: KELLER & PARTNER PATENTANWAELTE AG

    REG Reference to a national code

    Ref country code: GB

    Ref legal event code: IF02

    GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

    Effective date: 20011206

    PLBE No opposition filed within time limit

    Free format text: ORIGINAL CODE: 0009261

    STAA Information on the status of an ep patent application or granted ep patent

    Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

    26N No opposition filed
    PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

    Ref country code: GB

    Payment date: 20040628

    Year of fee payment: 8

    PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

    Ref country code: DE

    Payment date: 20040629

    Year of fee payment: 8

    PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

    Ref country code: BE

    Payment date: 20050622

    Year of fee payment: 9

    PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

    Ref country code: CH

    Payment date: 20050624

    Year of fee payment: 9

    PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

    Ref country code: IT

    Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

    Effective date: 20050721

    Ref country code: GB

    Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

    Effective date: 20050721

    PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

    Ref country code: DE

    Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

    Effective date: 20060201

    GBPC Gb: european patent ceased through non-payment of renewal fee

    Effective date: 20050721

    PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

    Ref country code: LI

    Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

    Effective date: 20060731

    Ref country code: CH

    Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

    Effective date: 20060731

    Ref country code: BE

    Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

    Effective date: 20060731

    REG Reference to a national code

    Ref country code: CH

    Ref legal event code: PL

    BERE Be: lapsed

    Owner name: *LA POSTE

    Effective date: 20060731

    Owner name: *FRANCE TELECOM

    Effective date: 20060731