EP1006511B1

EP1006511B1 - Sound processing method and device for adapting a hearing aid for hearing impaired

Info

Publication number: EP1006511B1
Application number: EP99403027A
Authority: EP
Inventors: Gilles Quagliaro; Philippe Gournay; Frédéric Chartier; Gwénael Guilmin
Original assignee: Thales SA
Current assignee: Thales SA
Priority date: 1998-12-04
Filing date: 1999-12-03
Publication date: 2004-04-28
Anticipated expiration: 2019-12-03
Also published as: DE69916756T2; EP1006511A1; FR2786908A1; FR2786908B1; DE69916756D1; US6408273B1; ATE265733T1

Abstract

A parameter characterizing the pitch is modified (61) by applying a multiplying factor to the pitch value. The parameter characterizing the voicing is modified (62) by a multiplying factor. A parameter characterizing the energy is modified (62) by a compression function. The parameter characterizing the spectral envelope is modified (64) by compressing the frequency scale, and the duration of the time interval taken into account for the synthesis phase (7) is modified by multiplying that time interval by a time factor. An independent claim is also included for a device for implementing the method.

Description

The present invention relates to a method and a device for correcting sounds for the hearing impaired. It applies equally to hearing aids, to software running on personal computers or answering machines, and in general to any device intended to improve the listening comfort and speech understanding of people with hearing loss.

The problem faced by the hard of hearing stems essentially from the specific, degraded nature of their auditory perception.

In order to communicate, humans have since the dawn of time built an oral mode of communication, speech, which relies on the average characteristics of sound production (the voice) and sound perception (the ear). Everyday language is therefore that of the majority. Conversely, the hearing of a hard-of-hearing person is far from the average, and everyday language is difficult or even impossible for them to access.

Understanding everyday language is essential for integrating hearing-impaired people into their community. In what can be seen as a social survival reflex, every hearing-impaired person is naturally led to build a language of their own and to develop procedures, techniques and a communication strategy that let them transpose the common language into this personal language. A well-known and striking example is lip reading, which gives access to normal speech through a visual alphabet of lip positions.

The twentieth century saw a continuous effort in the design of machines to relieve and assist the hearing impaired.

Two classes of machines have been developed.

The first class targets "mild" deafness and aims to correct hearing and bring it as close to normal as possible. This is what conventional hearing aids, widely available on the market, do.

The second class targets more severe deafness and aims to transform speech into synthetic speech accessible to the hearing-impaired person. In this category, most devices are aimed at the "profoundly deaf". A remarkable example is the cochlear implant, which uses electrodes to stimulate the auditory nerve directly.

The present invention aims to provide a solution for people with so-called "intermediate" deafness. These people currently have no suitable technical aid: their hearing loss is too severe to be served by conventional hearing aids, but their residual hearing is sufficient for them to do without devices for the profoundly deaf.

Conventional hearing aids generally apply frequency-selective amplification to speech. An automatic sound-level control acts on the amplification gain, the aim being to provide the best listening comfort and protection against instantaneous power peaks.

For reasons of commercial strategy and in response to patient demand, these aids are miniaturized to be worn behind the ear or in the ear canal, which leads to relatively poor performance that allows only very coarse hearing corrections. Typically, only three frequency bands are defined for frequency correction. These aids clearly address the most common, "mild" forms of deafness. More severe deafness can be relieved, but at the cost of troublesome drawbacks caused in particular by amplification of background noise and by the Larsen effect (acoustic feedback). Moreover, no correction is possible in frequency regions where there is no hearing at all.

For the history of prostheses for the deaf, one can usefully refer to the work of J.M. TATO, professor of otorhinolaryngology, and of VIGNERON and LAMOTTE, cited in the article by J.C. LAFON entitled "Transposition et modulation", published in the Bulletin d'audiophonologie, Annales scientifiques de Franche-Comté, Volume XII, Nos. 3 & 4, monograph 164, 1996. These prostheses exploit the fact that deaf people are rarely completely deaf: a very small residue of perception usually persists, often in the low frequencies, and many attempts have been made to take advantage of it.

In this way it is possible, in a very crude manner, to restore some perception of sound to the deaf by so-called "transposition" processes that shift high frequencies down to low frequencies. Unfortunately, understanding language requires more than mere perception, and transmitting intelligibility turns out to be inseparable from a certain "richness" of the sound. Restoring this "richness" has become one of the main concerns. This led to the idea of creating synthetic speech in order to restore the structural elements that underpin the intelligibility of everyday language.

The technique implemented in 1952 by J.M. TATO consists of recording speech spoken very rapidly and playing it back at half speed. This transposes the speech one octave down while preserving the structure of the original speech. Trials showed some benefit for the deaf.

However, the method has the drawback that it can only be used offline, not in real time. The technique developed in 1971 by C. VIGNERON and M. LAMOTTE allows "real-time" operation by dividing time into 1/100 s intervals, discarding every other interval, and applying TATO's process to the remaining intervals. Unfortunately, this system produces substantial background noise.
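Both transpositions can be sketched on a sampled signal. This is a minimal illustration under stated assumptions, not the historical apparatus (which relied on recording and replay): `half_speed` replays a signal at half speed using linear interpolation, `drop_and_stretch` applies the 10 ms drop-every-other-interval scheme, and `zero_crossing_freq` is a rough frequency check. All three names are hypothetical.

```python
import math
from typing import List


def half_speed(x: List[float]) -> List[float]:
    """Replay x at half speed: twice as many samples at the same sampling rate,
    so the signal lasts twice as long and every frequency is halved (one octave
    down). Intermediate samples are linearly interpolated (an assumption)."""
    y = []
    for n in range(2 * len(x)):
        i, odd = divmod(n, 2)
        if odd and i + 1 < len(x):
            y.append(0.5 * (x[i] + x[i + 1]))  # halfway between two input samples
        else:
            y.append(x[i])                     # original sample (or last one repeated)
    return y


def drop_and_stretch(x: List[float], fs: int, interval_s: float = 0.01) -> List[float]:
    """Real-time variant: cut into intervals of interval_s seconds, discard every
    other interval and replay each kept interval at half speed. Each stretched
    interval fills the slot of two input intervals, so the overall duration is
    kept (exactly when len(x) is a multiple of two intervals)."""
    n = int(round(fs * interval_s))
    out: List[float] = []
    for start in range(0, len(x), 2 * n):
        out.extend(half_speed(x[start:start + n]))
    return out


def zero_crossing_freq(x: List[float], fs: int) -> float:
    """Rough frequency estimate from the number of upward zero crossings."""
    ups = sum(1 for a, b in zip(x, x[1:]) if a < 0.0 <= b)
    return ups * fs / len(x)
```

For a 1000 Hz sine sampled at 8 kHz, both functions give a dominant frequency close to 500 Hz. Only `drop_and_stretch` keeps the original duration, at the cost of a discontinuity at every interval boundary.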

The idea of building "natural" sounds is also present in a prosthesis called GALAXIE, also cited in J.C. LAFON's article. This prosthesis uses a bank of filters and mixers distributed over six sub-bands and performs a downward transposition usable by the profoundly deaf.

Unfortunately, these methods, which operate at the signal level, introduce too much distortion and too much listening discomfort to be usable by people with intermediate deafness.

Three guidelines emerge from Jean Claude LAFON's article that can be adopted for effective prosthetic processing:

1- It seems important to be able to transpose the acoustic structure as a whole, that is, to bring the structural elements of speech that carry intelligibility back into the hearing-impaired person's perception range.
2- It also seems important to produce "natural" sounds, that is, to reproduce synthetic speech that carries the information and has a structure in harmony with the hearing-impaired person's auditory experience.
3- Finally, care must be taken to preserve the temporal structure of the speech signal, because rhythm carries information accessible to the hearing impaired.

US Pat. No. 4,051,331 discloses a hearing-correction method for the hearing impaired in which speech is reconstructed from oscillating sources, each centered on the center frequency of a formant.

The idea underlying the invention is to overcome the above drawbacks by using a parametric model of the speech signal capable of performing the transformations relevant to hearing correction for the hearing impaired, by means of a method that satisfies the three constraints listed above.

To this end, the invention provides a method for hearing correction for the hearing impaired, of the type that analyzes a speech signal to extract parameters characterizing its pitch, voicing, energy and spectrum, modifies these parameters to make the speech intelligible to a hearing-impaired person, and synthesizes from the modified parameters a speech signal perceptible by that person, the parameters being modified as follows:

The parameter characterizing the pitch is modified by applying a multiplying factor to the extracted pitch value,
The parameter characterizing the energy is modified by a compression function,
The parameter characterizing the spectral envelope is modified by a homothetic compression of the frequency scale,

characterized in that:

The parameter characterizing the voicing is given as a transition frequency between a voiced band and an unvoiced high band; this frequency is modified by a multiplying factor to create a modified high band, on which pseudo-random white noise is generated during the synthesis phase; and in that
The duration of the time interval taken into account for the synthesis phase is modified by multiplying that time interval by a time factor.
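The modifications in the claim can be sketched as operations on one frame of parameters. This is a minimal sketch under stated assumptions: the spectral envelope is represented as a function of frequency. The energy compression curve (shown in Figure 4 but not specified in the text) is replaced by a hypothetical knee-and-ratio rule. `FrameParams`, `EarSettings`, `compress_energy`, `modify` and all numeric values are illustrative, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FrameParams:
    pitch_hz: float                      # extracted pitch (fundamental frequency)
    voicing_hz: float                    # voiced / unvoiced transition frequency
    energy_db: List[float]               # energy values of the frame, dB per sample
    envelope: Callable[[float], float]   # spectral envelope S as a function of frequency (Hz)
    duration_s: float                    # time interval used by the synthesis phase


@dataclass
class EarSettings:
    """Hypothetical per-ear settings; the defaults are illustrative only."""
    pitch_factor: float = 0.7
    voicing_factor: float = 0.7
    freq_compression: float = 0.7   # alpha < 1 moves spectral features down in frequency
    time_factor: float = 1.0
    knee_db: float = 60.0           # assumed compression knee
    ratio: float = 3.0              # assumed compression ratio above the knee


def compress_energy(e_db: float, knee_db: float, ratio: float) -> float:
    """Hypothetical compression curve: identity below the knee, slope 1/ratio above."""
    if e_db <= knee_db:
        return e_db
    return knee_db + (e_db - knee_db) / ratio


def modify(p: FrameParams, s: EarSettings) -> FrameParams:
    """Apply each modification independently of the others."""
    alpha = s.freq_compression
    env = p.envelope
    return FrameParams(
        pitch_hz=p.pitch_hz * s.pitch_factor,          # multiplying factor on the pitch
        voicing_hz=p.voicing_hz * s.voicing_factor,    # multiplying factor on the transition frequency
        energy_db=[compress_energy(e, s.knee_db, s.ratio) for e in p.energy_db],
        envelope=lambda f: env(f / alpha),             # homothetic compression: value at f moves to alpha * f
        duration_s=p.duration_s * s.time_factor,       # time factor on the synthesis interval
    )
```

Because each field is computed from its own input only, changing one setting never affects another parameter. This mirrors the independence of adjustments that the description claims as an advantage.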

The invention also provides a device for implementing the above method.

The method and the device according to the invention have the advantage of using the parametric models commonly employed in vocoders and adapting them to the hearing of the hearing impaired. This makes it possible to work not at the level of the sound signal, as prior techniques do, but at the level of the symbolic structure of the speech signal, so as to preserve its intelligibility. Vocoders indeed use an alphabet built on the notions of "pitch", "spectrum", "voicing" and "energy", which are very close to the physiological model of the mouth and the ear. According to SHANNON's theory, the transmitted information then carries the intelligibility of the speech. Representing speech intelligibility in computational form thus opens up a new perspective: intelligibility is captured during the analysis operation and restored during synthesis.

Thanks to the invention, the synthesis operation of a parametric vocoder can be adapted to the hearing characteristics of hearing-impaired listeners. Combined with more conventional methods, this technique makes it possible to envisage a very general prosthetic method that can serve a broad population, in particular people with intermediate deafness.

As a further advantage, the method and the device according to the invention offer great freedom of adjustment: each parameter can be modified independently of the others, with no mutual interaction, and each ear can have its own settings.

Other features and advantages of the invention will become apparent from the following description, given with reference to the appended drawings, in which:

Figure 1 shows the parameters used to model the speech signal in implementing the invention.

Figure 2 shows a parametric model of speech signal production.

Figure 3 is a flowchart of the steps needed to implement the method according to the invention.

Figure 4 shows the curve used, during synthesis of the speech signal, to transform the speech-signal energy measured during analysis.

Figure 5 shows an embodiment of a device for implementing the method according to the invention.

The speech signal processing method according to the invention is based on a parametric model of the speech signal of the type commonly used in HSX digital vocoders. A description can be found in the article by P. Gournay and F. Chartier, "A 1200 bits/s HSX speech coder for very low bit rate communications", Proceedings of the IEEE Workshop on Signal Processing Systems (SiPS'98), Boston, 8-10 October 1998.

This model is defined mainly by four parameters:

a voicing parameter, which describes the more or less periodic character of voiced sounds, or the random character of unvoiced sounds, in the speech signal,
a parameter giving the fundamental frequency, or "PITCH", of voiced sounds,
a parameter representing the evolution of the energy over time,
and a parameter representing the spectral envelope of the speech signal.

The spectral envelope of the signal, or "spectrum", can be obtained either by autoregressive modeling with a linear prediction filter or by pitch-synchronous short-term Fourier analysis. The four parameters are estimated periodically on the speech signal, from one to several times per frame depending on the parameter, with a frame duration typically between 10 and 30 ms.
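The autoregressive route can be sketched with the classical autocorrelation method and the Levinson-Durbin recursion. This is a generic textbook LPC sketch, not the HSX coder's exact analysis. The Hamming window, the sign convention $A(z) = 1 + \sum_k a_k z^{-k}$, the 8 kHz default rate and the function names are assumptions.

```python
import cmath
import math
from typing import List, Tuple


def autocorrelation(x: List[float], order: int) -> List[float]:
    """Biased autocorrelation r[0..order]."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations for A(z) = 1 + sum_{k=1..p} a_k z^-k.
    Returns ([1, a_1, ..., a_p], final prediction-error energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break                               # degenerate input (e.g. all zeros)
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                          # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc(x: List[float], order: int = 16, window: bool = True) -> Tuple[List[float], float]:
    """LPC coefficients of one analysis segment, optionally Hamming-windowed."""
    n = len(x)
    if window:
        x = [v * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, v in enumerate(x)]
    return levinson_durbin(autocorrelation(x, order), order)


def envelope(a: List[float], f_hz: float, fs: float = 8000.0, gain: float = 1.0) -> float:
    """Spectral envelope S = gain / |A(e^{jw})|^2 with w = 2*pi*f/fs."""
    w = 2 * math.pi * f_hz / fs
    A = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
    return gain / abs(A) ** 2
```

For a sinusoid, the order-2 envelope peaks at the sinusoid's frequency. For speech, the order-16 envelope follows the formant structure.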

The speech signal is reconstructed as shown in Figure 2. A digital synthesis filter 1, whose transfer function models the vocal tract, is excited either by the pitch (a periodic excitation) or by stochastic noise, depending on whether the sound is voiced or unvoiced.

A switch 2 routes either the pitch excitation or the noise to the input of synthesis filter 1.

An amplifier 3, whose gain varies with the energy of the speech signal, is placed at the output of synthesis filter 1.
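The Figure 2 chain (excitation switch 2, synthesis filter 1, amplifier 3) can be sketched frame by frame. The sketch rests on several assumptions:

* the all-pole convention $A(z) = 1 + \sum_k a_k z^{-k}$;
* a pulse train scaled to unit average power;
* a gain passed in directly rather than derived from the measured energy;
* no filter memory carried between frames.

Function names are hypothetical.

```python
import math
import random
from typing import List, Optional


def all_pole(a: List[float], e: List[float]) -> List[float]:
    """Filter e by 1/A(z) with a[0] == 1: y[n] = e[n] - sum_{k>=1} a[k] * y[n-k].
    Filter memory starts at zero (no carry-over between frames)."""
    y: List[float] = []
    for n, v in enumerate(e):
        acc = v
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


def synthesize_frame(voiced: bool, pitch_hz: float, a: List[float], gain: float,
                     n: int, fs: int = 8000, rng: Optional[random.Random] = None) -> List[float]:
    """Binary voiced/unvoiced synthesis of n samples, as in Figure 2."""
    rng = rng or random.Random(0)
    if voiced:                                   # switch 2 set to the pitch excitation
        period = max(1, round(fs / pitch_hz))
        amp = math.sqrt(period)                  # one pulse per period -> unit average power
        e = [amp if i % period == 0 else 0.0 for i in range(n)]
    else:                                        # switch 2 set to the noise source
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = all_pole(a, e)                           # synthesis filter 1 (vocal tract model)
    return [gain * v for v in y]                 # amplifier 3
```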

For a simple parametric model with a binary voiced/unvoiced decision, the synthesis procedure reduces to the one shown in Figure 2. The method according to the invention, shown as a flowchart in Figure 3, is more elaborate and proceeds in four steps:

* a preprocessing step 4;
* a step 5 that analyzes the signal from step 4 to extract the parameters characterizing the pitch, voicing, energy and spectrum of the speech signal;
* a step 6 in which the parameters obtained in step 5 are modified;
* a step 7 that synthesizes a speech signal from the modified parameters of step 6.

Step 4 is the preprocessing conventionally implemented in vocoders. After the speech signal has been converted into a digital signal, it consists in particular of:

* reducing the background noise, for example with the method described by D. Malah in "Speech enhancement using a minimum square error short time spectral amplitude estimator", IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 12, No. 6, pp. 1109-1121, 1984;
* cancelling acoustic echoes, for example with the method described by K. Murano, S. Unjani and F. Amano in "Echo cancellation and applications", IEEE Communications Magazine, 28(1), pp. 49-55, January 1990;
* performing automatic gain control;
* or pre-emphasizing the signal.
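Of the preprocessing operations listed, pre-emphasis is the simplest to illustrate. The sketch uses a first-order filter with coefficient 0.9375, a typical value; the patent gives none. It also includes the inverse (de-emphasis) filter that would normally be applied after synthesis. Function names are hypothetical.

```python
from typing import List


def pre_emphasis(x: List[float], mu: float = 0.9375) -> List[float]:
    """First-order pre-emphasis y[n] = x[n] - mu * x[n-1] (boosts high frequencies)."""
    prev = 0.0
    y = []
    for v in x:
        y.append(v - mu * prev)
        prev = v
    return y


def de_emphasis(y: List[float], mu: float = 0.9375) -> List[float]:
    """Inverse filter x[n] = y[n] + mu * x[n-1]; undoes pre_emphasis exactly."""
    prev = 0.0
    x = []
    for v in y:
        prev = v + mu * prev
        x.append(prev)
    return x
```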

The parametric processing of the speech signal obtained at the end of step 4 is carried out in step 5. The speech signal is cut into segments of constant duration $T_{analyse}$ (typically 5 to 30 milliseconds), and the parameters of the speech signal model are estimated on each of them. With the HSX analysis model described in the article by P. Gournay and F. Chartier cited above, the pitch and the voicing are estimated every 22.5 milliseconds. The voicing information is given as a transition frequency between a voiced low band and an unvoiced high band.

The signal energy is estimated every 5.625 milliseconds and expressed in dB per sample. During unvoiced periods of the signal, it is estimated over 45 samples (5.625 ms). During voiced periods, it is estimated over a whole number of fundamental periods spanning at least 45 samples. The spectral envelope $S(\omega)$ is estimated every 11.25 milliseconds. It is obtained by linear prediction (LPC), that is, by autoregressive modeling with a filter of order $O_{LPC} = 16$:

$$S(\omega) = \frac{1}{|A(z)|^{2}}, \qquad z = e^{j\omega}, \qquad \omega = 2\pi f,$$

where $f$ is the frequency normalized to the sampling rate and $A(z)$ is the order-16 prediction polynomial $A(z) = 1 + \sum_{k=1}^{16} a_k z^{-k}$.
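The timing figures above imply an 8 kHz sampling rate (45 samples in 5.625 ms). The sketch below derives the update intervals from that rate and computes the energy in dB per sample over the window just described. It makes two assumptions: the window starts at the sub-frame start, and "at least 45" is read as a sample count. All names are hypothetical.

```python
import math
from typing import List, Optional

FS = 8000  # implied by "45 samples (5.625 ms)": 45 / 0.005625 = 8000 Hz


def samples(ms: float) -> int:
    """Convert a duration in milliseconds to a sample count at FS."""
    return round(FS * ms / 1000.0)


FRAME = samples(22.5)         # pitch and voicing update: 180 samples
LPC_STEP = samples(11.25)     # spectral envelope update: 90 samples
ENERGY_STEP = samples(5.625)  # energy update: 45 samples


def energy_window_length(voiced: bool, pitch_period: Optional[int] = None,
                         min_len: int = 45) -> int:
    """Unvoiced: min_len samples. Voiced: the smallest whole number of pitch
    periods (given in samples) that covers at least min_len samples."""
    if not voiced:
        return min_len
    return math.ceil(min_len / pitch_period) * pitch_period


def energy_db_per_sample(x: List[float], start: int, voiced: bool,
                         pitch_period: Optional[int] = None) -> float:
    """Mean power over the analysis window, in dB per sample."""
    n = energy_window_length(voiced, pitch_period)
    seg = x[start:start + n]
    mean_power = sum(v * v for v in seg) / len(seg)
    return 10.0 * math.log10(max(mean_power, 1e-12))  # floor avoids log10(0) on silence
```

With these sizes, one 180-sample frame holds four 45-sample energy updates. That count matches the four indices of `EnergieAnalyse[i]`.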

In what follows, the parameters resulting from the analysis are denoted:

PitchAnalyse;
VoisementAnalyse;
EnergieAnalyse[i], i = 0 to 3;
LpcAnalyse[k], k = 1 to 16.
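The analysis quantities above can be illustrated with a small sketch. It assumes FECH = 8000 Hz (consistent with 45 samples lasting 5.625 ms), reads "dB per sample" as ten times the base-10 logarithm of the mean squared amplitude, and assumes the sign convention $A(z) = 1 + \sum_k a_k z^{-k}$ for the LPC polynomial. The function names are hypothetical and this is not the HSX analyzer itself.

```python
import cmath
import math

FECH = 8000      # sampling frequency in Hz (assumption: 45 samples = 5.625 ms)
OLPC = 16        # LPC order from the text

# Frame bookkeeping implied by the text: each 22.5 ms pitch/voicing frame
# holds four 5.625 ms energy sub-frames, EnergieAnalyse[0..3].
FRAME_SAMPLES = round(0.0225 * FECH)       # 180
SUBFRAME_SAMPLES = round(0.005625 * FECH)  # 45


def energy_db_per_sample(samples):
    """Mean energy per sample in dB (assumed definition: 10*log10(mean(x**2)))."""
    mean_sq = sum(x * x for x in samples) / len(samples)
    return -math.inf if mean_sq == 0.0 else 10.0 * math.log10(mean_sq)


def lpc_envelope(lpc, f_hz, fech=FECH):
    """S(w) = 1/|A(e^{jw})|^2 with A(z) = 1 + sum_k lpc[k-1] z^-k (assumed sign convention)."""
    w = 2.0 * math.pi * f_hz / fech          # normalized angular frequency
    z = cmath.exp(1j * w)
    a = 1.0 + sum(c * z ** -(k + 1) for k, c in enumerate(lpc))
    return 1.0 / abs(a) ** 2
```

For example, a single coefficient of -0.9 models one real pole at z = 0.9, so the envelope evaluated at 0 Hz is $1/0.1^2 = 100$.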

The synthesis procedure consists, for each interval of duration Tanalyse, in exciting the synthesis filter whose frequency response gives S(ω) with a frequency-weighted sum (low band and high band separated by the voicing frequency) of a pseudo-random white noise for the high band and a periodic Dirac comb, whose fundamental frequency equals the pitch, for the low band.
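A minimal sketch of one synthesis interval follows. It assumes FECH = 8000 Hz, performs the low-band/high-band split with a plain DFT over the whole interval (a simplification chosen for brevity, not the HSX synthesizer), ignores gain and energy scaling, and uses the assumed sign convention $A(z) = 1 + \sum_k a_k z^{-k}$. All names are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def mixed_excitation(n, pitch_hz, voicing_hz, fech=8000, seed=0):
    """Dirac comb below voicing_hz, pseudo-random white noise above it."""
    period = max(1, round(fech / pitch_hz))
    comb = [1.0 if t % period == 0 else 0.0 for t in range(n)]
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    comb_spec, noise_spec = dft(comb), dft(noise)
    mixed = []
    for k in range(n):
        f = min(k, n - k) * fech / n     # bin frequency, folding negative frequencies
        mixed.append(comb_spec[k] if f < voicing_hz else noise_spec[k])
    return idft(mixed)                   # real, since the bin selection is symmetric


def all_pole_filter(excitation, lpc):
    """1/A(z): y[t] = e[t] - sum_k lpc[k-1] * y[t-k] (assumed sign convention)."""
    y = []
    for t, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if t >= k:
                acc -= a * y[t - k]
        y.append(acc)
    return y
```

A 22.5 ms interval would then be produced with something like `all_pole_filter(mixed_excitation(180, 120.0, 2000.0), lpc)`. A voicing frequency above FECH/2 yields a purely periodic excitation and a voicing frequency of 0 Hz yields pure noise.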

According to the invention, numerous transformations can be applied to the parameters resulting from the analysis of step 5. Each parameter can be modified independently of the others, without mutual interaction. Moreover, these transformations can be applied constantly or activated only under specific conditions (for example, triggering the modification of the spectral envelope only for certain distributions of energy over frequency).

These modifications are carried out in steps 6₁ to 6₄ and mainly concern the pitch value, which characterizes the fundamental frequency, the voicing, the energy and the spectral envelope.

In step 6₁, any transformation that defines a new pitch value from the analysis pitch obtained in step 5 can be applied.

The elementary transformation is a homothety (scaling), defined by the relation $\text{PitchSynthèse} = \text{PitchAnalyse} \times \text{FacteurPitch}$, with the following limits: $0.25 < \text{FacteurPitch} < 4.0$ and $50\ \text{Hz} < \text{PitchSynthèse} < 400\ \text{Hz}$.

The factor FacteurPitch is adjusted to the type of hearing loss considered.
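The pitch homothety can be sketched as below. Saturating the result at the 50 Hz and 400 Hz limits, and rejecting an out-of-range FacteurPitch, are assumptions about how the stated limits are enforced. The function name is hypothetical.

```python
def pitch_synthese(pitch_analyse_hz, facteur_pitch):
    """PitchSynthèse = PitchAnalyse * FacteurPitch, kept within 50-400 Hz."""
    if not 0.25 < facteur_pitch < 4.0:
        raise ValueError("FacteurPitch must lie strictly between 0.25 and 4.0")
    pitch = pitch_analyse_hz * facteur_pitch
    return min(max(pitch, 50.0), 400.0)   # saturate at the limits (assumed policy)
```

For instance, a 120 Hz voice with FacteurPitch = 1.5 is resynthesized at 180 Hz, while a 300 Hz voice doubled would be held at 400 Hz.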

As for the pitch, the voicing frequency can be modified by any transformation that defines a synthesis voicing frequency for each value of the voicing frequency analysed in step 5.

In the example embodiment of the invention, the chosen transformation is a homothety, defined by the relation $\text{VoisementSynthèse} = \text{VoisementAnalyse} \times \text{FacteurVoisement}$, with the following limits: $0.25 < \text{FacteurVoisement} < 4$ and $0\ \text{Hz} < \text{VoisementSynthèse} < 4000\ \text{Hz}$.

When the voicing transition frequency obtained from the analysis is at its maximum (fully voiced signal, VoisementAnalyse = VoisementMaximum), the voicing frequency used in synthesis is left unchanged (VoisementSynthèse = VoisementMaximum). Applying a multiplier to it would be entirely arbitrary, since VoisementAnalyse = VoisementMaximum does not mean that there is no voicing above VoisementMaximum. For example, VoisementMaximum can be set to 3625 Hz.

The factor FacteurVoisement is adjusted to the type of hearing loss considered.
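The voicing rule, including the fully voiced exception, might look like the following sketch. The 3625 Hz default comes from the example in the text. Saturating at 0 Hz and 4000 Hz and the function name are assumptions.

```python
VOISEMENT_MAXIMUM = 3625.0  # Hz, example value given in the text


def voisement_synthese(voisement_analyse_hz, facteur_voisement,
                       voisement_maximum=VOISEMENT_MAXIMUM):
    """VoisementSynthèse = VoisementAnalyse * FacteurVoisement, except for fully voiced frames."""
    if not 0.25 < facteur_voisement < 4.0:
        raise ValueError("FacteurVoisement must lie strictly between 0.25 and 4")
    if voisement_analyse_hz >= voisement_maximum:
        return voisement_maximum            # fully voiced: leave unchanged
    v = voisement_analyse_hz * facteur_voisement
    return min(max(v, 0.0), 4000.0)         # saturate at the 0-4000 Hz limits (assumed policy)
```

With FacteurVoisement = 0.5, a frame whose transition is at 800 Hz is resynthesized with a 400 Hz transition, so the noise-excited high band is widened, whereas a fully voiced frame keeps its 3625 Hz transition.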

The energy is processed in step 6₃. As before, any transformation that defines a synthesis energy from the energy of the speech signal analysed in step 5 can be applied. In the example described below, the method according to the invention applies to the energy a compression function made of four linear segments, as shown in the graph of FIG. 4.

The energy used in synthesis is given by the relation $\text{EnergieSynthèse}[i] = \text{Pente} \times \text{EnergieAnalyse}[i] + \text{EnergieSynthèseSeuil} - \text{Pente} \times \text{EnergieAnalyseSeuil}$, for i = 0 to 3, with $\text{Pente} = \text{PenteBasse}$ if $\text{EnergieAnalyse} < \text{EnergieAnalyseSeuil}$ and $\text{Pente} = \text{PenteHaute}$ if $\text{EnergieAnalyse} \geq \text{EnergieAnalyseSeuil}$, and with the following limits: $\text{EnergieSynthèse} \leq \text{EnergieSynthèseMax}$, and $\text{EnergieSynthèse} = -\infty$ for $\text{EnergieAnalyse} < \text{EnergieAnalyseMin}$. Both sloped segments pass through the point (EnergieAnalyseSeuil, EnergieSynthèseSeuil), so the function is continuous at the threshold; together with the muted region below EnergieAnalyseMin and the saturation at EnergieSynthèseMax, they form the four segments. The processing parameters EnergieAnalyseMin, EnergieSynthèseMax, PenteBasse, PenteHaute and EnergieSynthèseSeuil are adjusted to the type of hearing loss considered.
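The four-segment law can be sketched as a single function applied to each of the four sub-frame energies. The keyword names abbreviate the parameters of the text (seuil_analyse = EnergieAnalyseSeuil, seuil_synthese = EnergieSynthèseSeuil, synthese_max = EnergieSynthèseMax, analyse_min = EnergieAnalyseMin). The numeric settings in the usage example are purely hypothetical.

```python
import math


def energie_synthese(energie_analyse, *, seuil_analyse, seuil_synthese,
                     pente_basse, pente_haute, synthese_max, analyse_min):
    """Four-segment compression of one energy value in dB (parameter names abbreviated)."""
    if energie_analyse < analyse_min:
        return -math.inf                                    # muted segment
    pente = pente_basse if energie_analyse < seuil_analyse else pente_haute
    e = pente * energie_analyse + seuil_synthese - pente * seuil_analyse
    return min(e, synthese_max)                             # saturation segment


params = dict(seuil_analyse=60.0, seuil_synthese=70.0, pente_basse=1.5,
              pente_haute=0.5, synthese_max=90.0, analyse_min=20.0)
energies = [energie_synthese(e, **params) for e in [10.0, 40.0, 60.0, 80.0]]
# → [-inf, 40.0, 70.0, 80.0]
```

With PenteHaute below 1, loud passages are compressed toward the patient's comfortable range. The cap at EnergieSynthèseMax acts as a hard ceiling.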

The spectral envelope is processed in step 6₄. In this step, any transformation that defines a spectrum S'(ω) from the spectrum S(ω) analysed in step 5 can be applied.

In the embodiment described below, the elementary transformation applied to the spectrum is a homothetic compression of the frequency scale.

The frequency scale is compressed by a factor FacteurSpectre, so that the useful bands before and after processing are respectively [0, FECH/2] and [0, FECH/(2 × FacteurSpectre)], where FECH is the sampling frequency of the system.

This homothetic compression is very simple to implement when the compression factor is an integer. It then suffices to replace z by $z^{\text{FacteurSpectre}}$ in the expression of the all-pole synthesis filter, and then to apply to the synthesized signal a low-pass filter with cutoff frequency FECH/(2 × FacteurSpectre).

A first theoretical justification of this method is that the operation is equivalent to upsampling the impulse response of the vocal tract by a factor FacteurSpectre, by inserting FacteurSpectre - 1 zero samples between consecutive samples of the original vocal-tract impulse response, followed by low-pass filtering of the synthesized signal with a cutoff frequency equal to FECH/(2 × FacteurSpectre).
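The first justification can be checked numerically: the impulse response of the all-pole filter $1/A(z^{\text{FacteurSpectre}})$ equals the impulse response of $1/A(z)$ with FacteurSpectre - 1 zeros inserted between samples. The sketch assumes the sign convention $A(z) = 1 + \sum_k a_k z^{-k}$ and uses hypothetical coefficient values. The final low-pass filter at FECH/(2 × FacteurSpectre) is not shown.

```python
def impulse_response(lpc, n):
    """First n samples of the impulse response of 1/A(z), A(z) = 1 + sum_k lpc[k-1] z^-k."""
    h = []
    for t in range(n):
        acc = 1.0 if t == 0 else 0.0
        for k, a in enumerate(lpc, start=1):
            if t >= k:
                acc -= a * h[t - k]
        h.append(acc)
    return h


def substitute_z_power(lpc, m):
    """Coefficients of A(z^m): coefficient k moves to lag m*k, other lags are zero."""
    out = [0.0] * (m * len(lpc))
    for k, a in enumerate(lpc, start=1):
        out[m * k - 1] = a
    return out


def insert_zeros(x, m):
    """Upsample by m: keep each sample and follow it with m - 1 zeros."""
    out = []
    for v in x:
        out.append(v)
        out.extend([0.0] * (m - 1))
    return out


lpc = [-1.2, 0.6]          # hypothetical stable 2nd-order example
m = 3                      # FacteurSpectre
h = impulse_response(lpc, 10)
h_m = impulse_response(substitute_z_power(lpc, m), 10 * m)
```

Because the recursion of the substituted filter only ever reads samples m lags apart, the in-between outputs stay at zero, which is exactly the zero-insertion described above.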

A second theoretical justification is that the operation is equivalent to duplicating and moving the poles of the transfer function.

Indeed, if the OLPC single poles of the transfer function 1/A(z) are denoted $z_i = p_i\, e^{j 2\pi F_i}$, the FacteurSpectre × OLPC poles of $1/A(z^{\text{FacteurSpectre}})$ are the solutions of $z^{\text{FacteurSpectre}} = z_i$, that is, the FacteurSpectre complex roots of order FacteurSpectre of each $z_i$. The poles retained by the low-pass filtering are of the form $z'_i = p_i^{1/\text{FacteurSpectre}}\, e^{j 2\pi F_i/\text{FacteurSpectre}}$, which shows that their resonance frequencies have indeed undergone a homothetic compression by a factor FacteurSpectre.
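The pole argument can be illustrated numerically. The sketch takes one pole $z_i = p_i e^{j2\pi F_i}$ with $F_i$ a normalized frequency in cycles per sample (an assumption consistent with the exponent in the text), computes its FacteurSpectre roots, and keeps the one whose frequency lies in the band retained by the low-pass filter. The values and names are hypothetical.

```python
import cmath
import math


def pole_roots(radius, freq, m):
    """All solutions of z**m = radius * exp(j*2*pi*freq), freq in cycles per sample."""
    return [radius ** (1.0 / m) * cmath.exp(2j * math.pi * (freq + q) / m)
            for q in range(m)]


def retained_pole(radius, freq, m):
    """Root whose frequency lies in [-1/(2m), 1/(2m)], i.e. below FECH/(2m)."""
    return min(pole_roots(radius, freq, m), key=lambda r: abs(cmath.phase(r)))


radius, freq, m = 0.95, 0.1, 2     # hypothetical pole at 800 Hz for FECH = 8000 Hz
kept = retained_pole(radius, freq, m)
```

Here the retained pole sits at normalized frequency 0.05 (400 Hz at 8 kHz) with radius $\sqrt{0.95}$, matching $z'_i$ above. The other root lies above FECH/4 and is removed by the low-pass filter.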

The LPC filter used in synthesis can therefore be expressed as follows, with OLPC2 = FacteurSpectre × OLPC:

LpcSynthèse[k] = 0 for k = 1 to OLPC2, k not a multiple of FacteurSpectre;
LpcSynthèse[FacteurSpectre × k] = LpcAnalyse[k] for k = 1 to OLPC.

The compression factor of the spectral envelope can be restricted to an integer between 1 and 4, such that $1 < \text{FacteurSpectre} < 4$.
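The coefficient mapping can be sketched with 1-based indices that mirror the notation above. Storing LpcSynthèse as a dictionary and returning the useful bandwidth FECH/(2 × FacteurSpectre) alongside it are illustrative choices. The function only checks that FacteurSpectre is a positive integer; enforcing the range stated above is left to the caller, and the names are hypothetical.

```python
def lpc_synthese(lpc_analyse, facteur_spectre, fech=8000):
    """Build LpcSynthèse[1..OLPC2] from LpcAnalyse[1..OLPC] (input given as a 0-based list)."""
    if not isinstance(facteur_spectre, int) or facteur_spectre < 1:
        raise ValueError("FacteurSpectre must be a positive integer")
    olpc = len(lpc_analyse)
    olpc2 = facteur_spectre * olpc
    coeffs = {k: 0.0 for k in range(1, olpc2 + 1)}        # zero when k is not a multiple
    for k in range(1, olpc + 1):
        coeffs[facteur_spectre * k] = lpc_analyse[k - 1]  # LpcSynthèse[F*k] = LpcAnalyse[k]
    useful_band_hz = fech / (2 * facteur_spectre)         # low-pass cutoff after synthesis
    return coeffs, useful_band_hz
```

With the order-16 analysis filter and FacteurSpectre = 4, the synthesis filter has order 64 and the useful band shrinks from 4000 Hz to 1000 Hz.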

The speech reproduced in step 7 can also be accelerated or slowed down simply by changing the duration of the time interval used for the synthesis phase.

In practice, this can be done with a homothetic transformation defined by the relation $\text{Tsynthèse} = \text{Tanalyse} \times \text{FacteurTemps}$.

If FacteurTemps > 1, speech is slowed down; if FacteurTemps < 1, speech is accelerated.
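The time homothety amounts to choosing how many output samples each analysis interval produces. The sketch below assumes FECH = 8000 Hz and rounding to a whole number of samples. The function name is hypothetical.

```python
def synthesis_interval_samples(t_analyse_s, facteur_temps, fech=8000):
    """Tsynthèse = Tanalyse * FacteurTemps, rounded to a whole number of samples (assumed)."""
    if facteur_temps <= 0:
        raise ValueError("FacteurTemps must be positive")
    return round(t_analyse_s * facteur_temps * fech)
```

With Tanalyse = 22.5 ms, FacteurTemps = 1.5 stretches each 180-sample interval to 270 samples (slower speech) and FacteurTemps = 0.5 shortens it to 90 samples (faster speech). Because the pitch is carried by its own parameter in a parametric synthesizer of this kind, changing the interval length does not by itself change the pitch.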

In addition to the processing described above, a number of post-processing operations can be envisaged, for example band-pass filtering and linear equalization of the synthesized signal, or multiplexing of the sound across the two ears.

The purpose of linear equalization is to compensate for the patient's audiogram by amplifying or attenuating certain frequency bands. In the prototype, the gain at 7 frequencies (0, 125, 250, 500, 1000, 2000 and 4000 Hz) can be adjusted over time between -80 and +10 dB according to the patient's needs or the specific features of his or her audiogram. This operation can be carried out, for example, by fast Fourier transform (FFT) filtering, as described in the book by D. Elliott, "Handbook of Digital Signal Processing", published in 1987 by Academic Press.
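One way to turn the seven audiogram gains into per-bin gains for FFT filtering is sketched below. Interpolating linearly in hertz between the given points and FECH = 8000 Hz are assumptions; interpolation on a logarithmic frequency axis would be an equally plausible choice. The function names are hypothetical.

```python
import bisect

EQ_FREQS_HZ = [0, 125, 250, 500, 1000, 2000, 4000]   # adjustment points from the text


def eq_gain_db(f_hz, gains_db):
    """Piecewise-linear interpolation of the seven gains (dB) at frequency f_hz."""
    if len(gains_db) != len(EQ_FREQS_HZ):
        raise ValueError("expected one gain per adjustment frequency")
    if any(not -80.0 <= g <= 10.0 for g in gains_db):
        raise ValueError("gains must lie between -80 and +10 dB")
    if f_hz <= EQ_FREQS_HZ[0]:
        return gains_db[0]
    if f_hz >= EQ_FREQS_HZ[-1]:
        return gains_db[-1]
    j = bisect.bisect_right(EQ_FREQS_HZ, f_hz)
    f0, f1 = EQ_FREQS_HZ[j - 1], EQ_FREQS_HZ[j]
    g0, g1 = gains_db[j - 1], gains_db[j]
    return g0 + (g1 - g0) * (f_hz - f0) / (f1 - f0)


def fft_bin_gains(n_fft, gains_db, fech=8000):
    """Linear amplitude gain for each bin 0..n_fft//2 of a real FFT of length n_fft."""
    return [10.0 ** (eq_gain_db(k * fech / n_fft, gains_db) / 20.0)
            for k in range(n_fft // 2 + 1)]
```

Each FFT frame of the synthesized signal would be multiplied bin by bin by these gains before the inverse FFT. The windowing and overlap-add details are outside this sketch.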

The multiplexing operation allows monophonic reproduction (for example, the processed signal alone) or stereophonic reproduction (for example, the processed signal on one channel and the unprocessed signal on the other). Stereophonic reproduction allows the hearing-impaired person to adapt the processing for each ear (for example, two linear equalizers compensating two different audiograms) and, optionally, to keep unaltered in one ear a familiar form of the signal on which he or she can rely, for example, to synchronize.
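The two reproduction modes can be sketched as follows. Which ear receives the processed signal, and the requirement that both signals have the same length (which would not hold when FacteurTemps differs from 1), are assumptions of this illustration. The function name is hypothetical.

```python
def multiplex(processed, unprocessed=None, stereo=True):
    """Return (left, right) sample pairs for the two earphones."""
    if not stereo:
        return [(s, s) for s in processed]          # monophonic: processed signal on both ears
    if unprocessed is None or len(unprocessed) != len(processed):
        raise ValueError("stereo mode needs an unprocessed signal of the same length")
    return list(zip(processed, unprocessed))        # processed left, unprocessed right
```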

The device for implementing the method according to the invention, shown in FIG. 5, comprises a first channel composed of an analysis device 8, a synthesis device 9 and a first equalizer 10, and a second channel comprising a second equalizer 11, the two channels being coupled between a sound pickup device 13 and a pair of earphones 12a, 12b. The analysis device 8 and the synthesis device 9 can be implemented using known vocoder techniques, in particular that of the HSX vocoder mentioned above. The outputs of the equalizers of the two channels are multiplexed by a multiplexer 14 to allow monophonic or stereophonic sound reproduction. A processing device 15, formed by a microprocessor or any equivalent device, is coupled to the synthesis device 9 to modify the parameters supplied by the analysis device 8.

A preprocessing device 16, interposed between the sound pickup device 13 and each of the two channels, denoises the speech signal and converts it into digital samples. The denoised digital samples are applied respectively to the input of equalizer 11 and to the input of analysis device 8.
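The signal flow of FIG. 5 can be summarized in a short architectural sketch. Every stage is a caller-supplied function standing in for a numbered block; the class and field names are hypothetical. Truncating both channels to a common length before multiplexing is a simplification, since time scaling changes the length of the processed channel.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Signal = List[float]


@dataclass
class HearingAidChain:
    """Signal flow of FIG. 5; every stage is a caller-supplied (hypothetical) function."""
    preprocess: Callable[[Signal], Signal]   # device 16: denoising + digitization
    analyse: Callable[[Signal], list]        # device 8: signal -> per-frame parameters
    modify: Callable[[dict], dict]           # device 15: steps 6_1 to 6_4
    synthesise: Callable[[list], Signal]     # device 9: parameters -> signal
    equalizer_1: Callable[[Signal], Signal]  # equalizer 10 (processed channel)
    equalizer_2: Callable[[Signal], Signal]  # equalizer 11 (direct channel)

    def run(self, mic: Signal, stereo: bool = True) -> List[Tuple[float, float]]:
        x = self.preprocess(mic)
        frames = [self.modify(p) for p in self.analyse(x)]
        processed = self.equalizer_1(self.synthesise(frames))
        other = self.equalizer_2(x) if stereo else processed
        n = min(len(processed), len(other))         # simplification: common length
        return list(zip(processed[:n], other[:n]))  # multiplexer 14
```

Keeping the processing device as a separate stage mirrors the independence of the parameter modifications. Merging it into the synthesis stage, as in the alternative embodiment, would only change how the callables are grouped.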

According to other embodiments of the device according to the invention, the processing device 15 can be integrated into the synthesis device 9. It is also possible to integrate all the analysis and synthesis processing in a single piece of software executable, for example, on a personal computer or on a telephone answering machine.

Claims

Method for the auditory correction of hearing impaired individuals, of the type consisting in analysing a speech signal (5) so as to extract therefrom parameters characterizing the pitch, the voicing, the energy and the spectrum of the speech signal, in modifying (6) the parameters so as to render the speech intelligible to a hearing impaired individual, and in synthesizing (7) a speech signal perceptible to the hearing impaired individual on the basis of the parameters modified in the following manner:
• the parameter characterizing the pitch is modified (6₁) by applying a multiplier factor to the value of the pitch extracted,

• the parameter characterizing the energy is modified (6₃) by a compression function,

• the parameter characterizing the spectral envelope is modified (6₄) by a homothetic compression of the frequency scale
characterized in that
• the parameter characterizing the voicing is given in the form of a frequency of transition between a voiced low band and an unvoiced high band, which is modified (6₂) by a multiplier factor so as to create a modified high band on which pseudo-random white noise is generated during the synthesis phase; and in that

• the duration of the time interval taken into account for the synthesis phase is modified (7) by multiplying the time interval by a time factor.
Device for the implementation of the method according to Claim 1, characterized in that it comprises an analysis device (8) for extracting the parameters of the speech signal coupled to a synthesis device (9), as well as a processing device (15) coupled to the synthesis device (9) and designed to modify the parameters provided by the analysis device (8) according to the characteristics of Claim 1 and to apply the modified parameters to the synthesis device (9) so as to reconstruct a speech signal with the modified parameters.
Device according to Claim 2, characterized in that the analysis device (8) and the synthesis device (9) comprise a linear prediction vocoder analysis and synthesis device.