EP4287648A1

EP4287648A1 - Electronic device and associated processing method, acoustic apparatus and computer program

Info

Publication number: EP4287648A1
Application number: EP23175647.9A
Authority: EP
Inventors: Arthur Henri LACROIX; Clément Jean-Baptiste Albert; Mathieu Clément Nicolas DEXHEIMER; Thierry Pierre François GAIFFE
Original assignee: Elno SAS
Current assignee: Elno SAS
Priority date: 2022-05-30
Filing date: 2023-05-26
Publication date: 2023-12-06
Also published as: US20230388704A1; FR3136096A1; KR20230166920A

Abstract

This electronic processing device (20) for an acoustic apparatus (10) comprising a first, air microphone (12) and a second, osteophonic (bone-conduction) microphone (14) is configured to be connected to the first and second microphones (12, 14), to receive as input first and second analog signals coming respectively from the first and second microphones (12, 14), and to deliver a corrected signal as output. The processing device (20) comprises:
- a hybridization module (30) configured to calculate a hybrid signal from the first and second analog signals;
- an estimation module (32) configured to estimate noise in the hybrid signal;
- a noise reduction module (34) configured to calculate the corrected signal by applying a generalized spectral subtraction algorithm to the hybrid signal, as a function of the estimated noise.

Description

The present invention relates to an electronic processing device for an acoustic apparatus.

The invention also relates to an acoustic apparatus comprising: a first microphone comprising an electroacoustic transducer capable of receiving the acoustic sound waves of a sound signal produced by a user's vocal cords and of transforming said acoustic waves into a first analog signal; a second microphone comprising a mechanical bone excitation transducer capable of receiving, by bone conduction, the vibrational oscillations of said sound signal and of transforming said vibrational oscillations into a second analog signal; and such an electronic processing device connected to the first and second microphones, the processing device being configured to receive the first and second analog signals as input, then to output a corrected signal.

The electronic processing device includes a hybridization module configured to calculate a hybrid signal from the first and second analog signals.

The invention also relates to a processing method implemented by such an electronic processing device, and to a computer program comprising software instructions which, when executed by a computer, implement such a processing method.

Document FR 3 019 422 B1 discloses an acoustic apparatus of the aforementioned type. This acoustic apparatus includes the first microphone with such an electroacoustic transducer, also called an air transducer; the second microphone with such a mechanical bone excitation transducer, also called a solid-borne transducer; means for calculating a corrected electrical signal as a function of the first electrical signal and the second electrical signal, the corrected electrical signal being suitable for delivery at the output of the acoustic apparatus; and a noise reduction device connected to the output of the electroacoustic transducer to reduce noise in the first electrical signal. The calculation means are connected, on the one hand, to the output of the noise reduction device and, on the other hand, to the output of the mechanical bone excitation transducer.

However, with such an acoustic apparatus, noise reduction is not always optimal, and relatively high background noise sometimes remains in the signal delivered at the output of the acoustic apparatus.

The aim of the invention is therefore to propose an electronic processing device, and an associated processing method, that further improve noise reduction in the signal delivered at the output of the acoustic apparatus, that is to say that reduce the presence of noise in said signal.

To this end, the subject of the invention is an electronic processing device for an acoustic apparatus,

the acoustic apparatus comprising a first microphone comprising an electroacoustic transducer capable of receiving the acoustic sound waves of a sound signal produced by a user's vocal cords and of transforming said acoustic waves into a first analog signal; and a second microphone comprising a mechanical bone excitation transducer capable of receiving, by bone conduction, the vibrational oscillations of said sound signal and of transforming said vibrational oscillations into a second analog signal,
the electronic processing device being configured to be connected to the first and second microphones, to receive the first and second analog signals as input and to output a corrected signal,
the electronic processing device comprising:
- a hybridization module configured to calculate a hybrid signal from the first and second analog signals;
- an estimation module connected to the hybridization module and configured to estimate noise in the hybrid signal; and
- a noise reduction module connected to the hybridization module and to the estimation module, the noise reduction module being configured to calculate the corrected signal by applying a generalized spectral subtraction algorithm to the hybrid signal, as a function of the estimated noise.
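The text does not detail the generalized spectral subtraction algorithm at this point. As a rough sketch, assuming the common textbook form with an exponent γ, an over-subtraction factor α and a spectral floor β, one section of the hybrid signal could be denoised as below. The function name, parameter names and default values are illustrative assumptions, not the patented implementation.

```python
import numpy as np


def generalized_spectral_subtraction(frame, noise_mag, alpha=2.0, beta=0.01, gamma=2.0):
    """Denoise one section of the hybrid signal (illustrative sketch).

    frame:     1-D real array, one (windowed) section of the hybrid signal
    noise_mag: estimated noise magnitude spectrum |N(k)|, len(frame)//2 + 1 bins
    alpha:     over-subtraction factor (assumed default)
    beta:      spectral floor (assumed default)
    gamma:     exponent, 2 for power subtraction, 1 for magnitude subtraction
    """
    frame = np.asarray(frame, dtype=float)
    noise_mag = np.asarray(noise_mag, dtype=float)
    spectrum = np.fft.rfft(frame)
    if noise_mag.shape != spectrum.shape:
        raise ValueError("noise_mag must have len(frame)//2 + 1 bins")
    noisy_pow = np.abs(spectrum) ** gamma
    noise_pow = noise_mag ** gamma
    # Subtract alpha times the noise estimate, but never drop below beta times the noise.
    clean_pow = np.maximum(noisy_pow - alpha * noise_pow, beta * noise_pow)
    clean_mag = clean_pow ** (1.0 / gamma)
    # Reuse the phase of the noisy spectrum, the usual simplification.
    clean_spectrum = clean_mag * np.exp(1j * np.angle(spectrum))
    return np.fft.irfft(clean_spectrum, n=len(frame))
```

With γ = 2 this is power subtraction and with γ = 1 magnitude subtraction; the floor β limits the "musical noise" that plain subtraction tends to leave. In a streaming setting, each section would typically be windowed and the outputs recombined by overlap-add.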

With the electronic processing device according to the invention, estimating the noise in the hybrid signal calculated from the first and second analog signals yields a more accurate noise estimate. Here the hybrid signal is obtained from the signals of, on the one hand, the electroacoustic (air) transducer and, on the other hand, the mechanical bone excitation transducer, also called an osteophonic or solid-borne transducer. The noise reduction module can then obtain a better corrected signal by applying the generalized spectral subtraction algorithm to the hybrid signal, as a function of the noise thus estimated.

Preferably, the hybrid signal comprises several successive sections, each section corresponding to the hybrid signal over a time period, and the processing device further comprises a voice activity detection module capable of determining whether or not each section of the hybrid signal contains voice, the estimation module then being configured to estimate the noise in the hybrid signal only from the voiceless sections.
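Estimating the noise only from voiceless sections can be sketched as a running average of the magnitude spectrum that is frozen whenever voice is detected. The exponential smoothing, the class name and the smoothing factor are assumptions chosen for illustration; the text only states that voiceless sections are used.

```python
import numpy as np


class NoiseEstimator:
    """Running noise magnitude spectrum of the hybrid signal (illustrative sketch).

    Only sections flagged as voiceless update the estimate; `smoothing` is an
    assumed exponential-averaging factor, not a value taken from the patent.
    """

    def __init__(self, n_bins, smoothing=0.9):
        self.noise_mag = np.zeros(n_bins)
        self.smoothing = smoothing
        self.initialised = False

    def update(self, section, voice_present):
        """Return the current estimate after (possibly) using this section."""
        if voice_present:
            return self.noise_mag  # sections containing voice are ignored
        mag = np.abs(np.fft.rfft(np.asarray(section, dtype=float)))
        if mag.shape != self.noise_mag.shape:
            raise ValueError("section length does not match n_bins")
        if not self.initialised:
            self.noise_mag = mag  # the first voiceless section seeds the estimate
            self.initialised = True
        else:
            a = self.smoothing
            self.noise_mag = a * self.noise_mag + (1.0 - a) * mag
        return self.noise_mag
```

A larger smoothing factor tracks slowly varying background noise more stably but adapts more slowly to sudden changes in the noise.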

More preferably, the presence or absence of voice is determined from the second signal, coming from the osteophonic transducer, since the presence or absence of voice is easier to detect in a signal from an osteophonic microphone than in a signal from an air microphone.
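The advantageous features listed further on describe an RMS-based criterion for this detection on sections of the second (bone-conduction) signal: voice is declared when the mean of the last M RMS values reaches an average threshold, or when the RMS variation reaches a variation threshold. A minimal sketch follows; the class name, M and both threshold values are hypothetical, and treating the variation as the signed increase from the previous RMS value is an assumption, since the text does not say whether it is signed.

```python
import math
from collections import deque


class RmsVoiceActivityDetector:
    """RMS-based voice activity detection on bone-conduction sections (sketch).

    m, mean_threshold and variation_threshold are hypothetical tuning values.
    """

    def __init__(self, m=4, mean_threshold=0.05, variation_threshold=0.02):
        if m < 1:
            raise ValueError("M must be an integer >= 1")
        self.recent_rms = deque(maxlen=m)  # keeps only the last M RMS values
        self.previous_rms = None
        self.mean_threshold = mean_threshold
        self.variation_threshold = variation_threshold

    @staticmethod
    def rms(section):
        return math.sqrt(sum(x * x for x in section) / len(section))

    def voice_present(self, section):
        current = self.rms(section)
        self.recent_rms.append(current)
        mean_rms = sum(self.recent_rms) / len(self.recent_rms)
        # Assumed: variation = signed increase since the previous section.
        variation = 0.0 if self.previous_rms is None else current - self.previous_rms
        self.previous_rms = current
        # Voice if either the mean RMS or the RMS increase reaches its threshold.
        return mean_rms >= self.mean_threshold or variation >= self.variation_threshold
```

The variation test reacts quickly to speech onsets, while the averaged RMS keeps the decision stable during sustained speech.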

According to other advantageous aspects of the invention, the electronic processing device comprises one or more of the following features, taken in isolation or in any technically possible combination:

the hybrid signal comprises several successive sections, and the device further comprises a voice activity detection module connected to the hybridization module and configured to determine a presence or an absence of voice in each section of the hybrid signal; the estimation module then being configured to estimate the noise in the hybrid signal as a function of each section in which an absence of voice has been determined;
the voice activity detection module is configured to determine the presence or absence of voice from the second signal, coming from the mechanical bone excitation transducer;
the voice activity detection module preferably being configured to determine the presence or absence of voice solely from the second signal, without taking the first signal into account;
the second signal comprises several successive sections, and the voice activity detection module is configured to calculate an RMS value for each section of the second signal, then to determine the presence or absence of voice as a function of the respective RMS value(s);
the voice activity detection module is configured to determine the presence or absence of voice as a function of an average of the M last calculated RMS value(s) and/or of a variation in RMS value between a current RMS value and a previous RMS value, M being an integer greater than or equal to 1;
the voice activity detection module preferably being configured to determine the presence of voice if said average is greater than or equal to a predefined average threshold or if said RMS value variation is greater than or equal to a predefined variation threshold;
the hybridization module is configured to convert the first analog signal into a first digital signal as the first analog signal is received, and to generate successive first sections from the first digital signal, each newly generated first section comprising samples of the previous first section and new samples of the first digital signal; and
- the hybridization module is configured to convert the second analog signal into a second digital signal as the second analog signal is received, and to generate successive second sections from the second digital signal, each newly generated second section comprising samples of the previous second section and new samples of the second digital signal;
- hybrid sections of the hybrid signal then being calculated progressively from the generated first and second sections, and the corrected signal then being calculated from said hybrid sections;
the hybridization module is configured to obtain a first filtered signal by applying to the first signal a first filter associated with a first frequency range, to obtain a second filtered signal by applying to the second signal a second filter associated with a second frequency range distinct from the first frequency range, then to calculate the hybrid signal by summing the first filtered signal and the second filtered signal;
- the first frequency range preferably comprising frequencies higher than those of the second frequency range;
- the first and second frequency ranges more preferably being disjoint.
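The overlapping-section feature in the list above, where each new section keeps samples of the previous section and appends newly digitized samples, can be sketched as a fixed-length buffer. The section length, the hop size (number of new samples per section), the zero-filled start-up and the class name are hypothetical; the text does not specify them.

```python
from collections import deque


class OverlappingSectionBuilder:
    """Builds overlapping sections from a stream of digital samples (sketch).

    Each section reuses the last (section_len - hop) samples of the previous
    section and appends `hop` new samples; both sizes are assumptions.
    """

    def __init__(self, section_len=8, hop=4):
        if not 0 < hop <= section_len:
            raise ValueError("need 0 < hop <= section_len")
        self.hop = hop
        # Zero-filled start-up buffer; the oldest samples drop out automatically.
        self.buffer = deque([0.0] * section_len, maxlen=section_len)
        self.pending = []

    def push(self, samples):
        """Feed newly digitized samples; return the sections completed by them."""
        sections = []
        for s in samples:
            self.pending.append(s)
            if len(self.pending) == self.hop:
                self.buffer.extend(self.pending)
                self.pending.clear()
                sections.append(list(self.buffer))
        return sections
```

One builder per microphone would yield time-aligned first and second sections, from which hybrid sections can be computed as samples arrive.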

The invention also relates to an acoustic apparatus comprising:

a first microphone comprising an electroacoustic transducer capable of receiving the acoustic sound waves of a sound signal produced by a user's vocal cords and of transforming said acoustic waves into a first analog signal;
a second microphone comprising a mechanical bone excitation transducer capable of receiving, by bone conduction, the vibrational oscillations of said sound signal and of transforming said vibrational oscillations into a second analog signal;
an electronic processing device connected to the first and second microphones, the electronic processing device being configured to receive the first and second analog signals as input, then to output a corrected signal, the electronic processing device being as defined above.

According to another advantageous aspect of the invention, the acoustic apparatus further comprises two lateral acoustic modules that rest against the lateral sides of the skull and are capable of transmitting a sound signal to the auditory nerve.

The invention also relates to head equipment for an operator, comprising a protective helmet and an acoustic apparatus as defined above.

The invention also relates to a processing method, the method being implemented by an electronic processing device connected to first and second microphones, the first microphone comprising an electroacoustic transducer capable of receiving the acoustic sound waves of a sound signal produced by a user's vocal cords and of transforming said acoustic waves into a first analog signal, and the second microphone comprising a mechanical bone excitation transducer capable of receiving, by bone conduction, the vibrational oscillations of said sound signal and of transforming said vibrational oscillations into a second analog signal, the electronic processing device being configured to receive the first and second analog signals as input and to output a corrected signal,
the processing method comprising:

a hybridization step comprising the calculation of a hybrid signal from the first and second analog signals;
a step of estimating noise in the hybrid signal; and
a noise reduction step comprising the calculation of the corrected signal by applying a generalized spectral subtraction algorithm to the hybrid signal, as a function of the estimated noise.
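The hybridization step can be illustrated with the filter-and-sum option listed among the advantageous features: the air signal keeps a higher frequency range, the bone-conduction signal keeps a lower, disjoint range, and the two filtered sections are summed. This sketch assumes ideal (brick-wall) FFT masks and a hypothetical 1000 Hz crossover; the text specifies neither the filter type nor the frequency ranges.

```python
import numpy as np


def hybridize(air_section, bone_section, sample_rate, crossover_hz=1000.0):
    """Combine one air section and one bone-conduction section (sketch).

    First filter (air): keeps frequencies >= crossover_hz.
    Second filter (bone): keeps frequencies < crossover_hz.
    The two ranges are disjoint; crossover_hz is an assumed value.
    """
    air = np.asarray(air_section, dtype=float)
    bone = np.asarray(bone_section, dtype=float)
    if air.shape != bone.shape:
        raise ValueError("sections must have the same length")
    n = len(air)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    low_band = freqs < crossover_hz  # second frequency range (bone)
    high_band = ~low_band            # first frequency range (air)
    air_filtered = np.fft.irfft(np.fft.rfft(air) * high_band, n=n)
    bone_filtered = np.fft.irfft(np.fft.rfft(bone) * low_band, n=n)
    return air_filtered + bone_filtered
```

Brick-wall masks keep the example short; a practical device would more likely use smooth complementary filters to avoid ringing at section boundaries.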

The invention also relates to a computer program comprising software instructions which, when executed by a computer, implement a processing method as defined above.

These features and advantages of the invention will become clearer on reading the description that follows, given solely by way of non-limiting example and with reference to the appended drawings, in which:

Figure 1 is an overall perspective view of an acoustic apparatus according to the invention, the acoustic apparatus comprising a first, air microphone, a second, osteophonic microphone, and an electronic processing device for delivering a corrected electrical signal from the electrical signals coming from the first and second microphones;
Figure 2 is a block diagram of the processing device of Figure 1, connected to the first, air microphone and to the second, osteophonic microphone;
Figure 3 is a schematic representation of the generation of overlapping sections, carried out by the processing device of Figure 1;
Figure 4 is a flowchart of a processing method according to the invention, the method being implemented by the processing device of Figure 1;
Figure 5 shows, in its upper part, a noisy voice signal recorded by a prior-art air microphone and, in its lower part, a hybrid signal obtained with the first and second microphones, after noise reduction by the processing device of Figure 1;
Figure 6 is a view with several curves illustrating prior-art voice activity detection using an air microphone with a low detection threshold;
Figure 7 is a view similar to Figure 6, for a higher detection threshold; and
Figure 8 is a view similar to Figures 6 and 7, illustrating voice activity detection according to the invention, using an osteophonic microphone.

In the remainder of the description, the expression “substantially equal to” denotes equality to within plus or minus 20%, more preferably within plus or minus 10%, and even more preferably within plus or minus 5%.

In Figure 1, an acoustic apparatus 10 comprises a first microphone 12, also called an air microphone, capable of receiving acoustic sound waves and transforming them into a first electrical signal, such as a first analog signal, and a second microphone 14, also called an osteophonic or solid-borne microphone, capable of receiving vibrational oscillations by bone conduction and transforming them into a second electrical signal, such as a second analog signal.

The acoustic apparatus 10 comprises a protective housing 18 and a processing device 20 arranged inside the protective housing 18. The processing device 20 is connected to the first microphone 12 and to the second microphone 14, and is configured to receive the first and second analog signals as input and to output a corrected signal in which the noise has been reduced.

In addition, the acoustic apparatus 10 comprises two lateral acoustic modules 22, an upper headband 24, a rear headband 26 connecting the acoustic modules, and a connection cable 27 fitted at its end with a connector, not shown. The lateral acoustic modules 22, the upper headband 24, the rear headband 26 and the connection cable 27 are known per se, for example from document FR 3 019 422 B1.

The first microphone 12 is known, for example from document FR 3 019 422 B1. It comprises an electroacoustic transducer, not shown, that receives the acoustic waves of a sound signal produced by the vocal cords and transforms them into the first electrical signal. The first microphone 12 is connected to an input of the processing device 20.

The second microphone 14 is also known, for example from document FR 3 019 422 B1. It comprises a bone mechanical excitation transducer, not shown. This transducer receives the vibration waves of the sound signal produced by the user's vocal cords by bone conduction, in particular through a corresponding bone of the skull, and transforms them into the second electrical signal. The bone mechanical excitation transducer is also called an osteophonic transducer or solid-body transducer. The second microphone 14 is also connected to an input of the processing device 20.

In the example of Figure 1, the first microphone 12 and the second microphone 14 are not arranged in the protective housing 18. Instead, they are arranged in an additional housing 28, which is connected to one of the two acoustic modules 22 by two connecting arms 29. The electroacoustic transducer and the bone mechanical excitation transducer are therefore both arranged in the additional housing 28. This additional housing 28 is preferably intended to be applied against the right side of the user's skull. It is then preferably connected to the right acoustic module 22.

Alternatively, as illustrated in the example of Figure 13 of document FR 3 019 422 B1, the second microphone 14 is not arranged in the protective housing 18. Instead, it is arranged in another additional housing, connected to one of the two acoustic modules 22 by two connecting arms. The bone mechanical excitation transducer of the second microphone is then arranged in this other additional housing. This housing is preferably intended to be applied against the right side of the user's skull, and is then preferably connected to the right acoustic module 22.

As a further variant, as illustrated in the example of Figure 1 of document FR 3 019 422 B1, the first microphone 12 comprises a protuberance, for example formed integrally with the protective housing 18. According to this variant, the second microphone 14, in particular its bone mechanical excitation transducer, is arranged inside the protective housing 18.

The electronic processing device 20 comprises the following modules, as shown in Figure 2:
- a hybridization module 30 connected to the first microphone 12 and to the second microphone 14;
- an estimation module 32 connected to the hybridization module 30;
- a noise reduction module 34 connected to the hybridization module 30 and to the estimation module 32.

As an optional addition, the electronic processing device 20 further comprises a voice activity detection module 36 connected to the hybridization module 30.

In the example of Figure 1, the electronic processing device 20 comprises an information processing unit 40. This unit is formed, for example, by a memory 42 and a processor 44 associated with the memory 42.

In the example of Figure 1, the hybridization module 30, the estimation module 32, the noise reduction module 34 and, optionally, the voice activity detection module 36 are each implemented as software, or as a software component, executable by the processor 44. The memory 42 of the processing device 20 can then store:
- software for hybridizing the first and second analog signals into a hybrid signal;
- software for estimating the noise in the hybrid signal;
- software for reducing the noise in the hybrid signal;
- optionally, software for detecting voice activity in the hybrid signal.

The processor 44 can then execute each of these programs.

As a variant not shown, the hybridization module 30, the estimation module 32, the noise reduction module 34 and, optionally, the voice activity detection module 36 are each implemented as a programmable logic component, such as an FPGA (Field Programmable Gate Array), or as an integrated circuit, such as an ASIC (Application-Specific Integrated Circuit).

When the electronic processing device 20 is implemented as one or more software programs, it takes the form of a computer program, also called a computer program product. It can then be recorded on a computer-readable medium, not shown. The computer-readable medium is, for example, a medium that can store electronic instructions and be coupled to a bus of a computer system. For example, the readable medium is an optical disk, a magneto-optical disk, a ROM, a RAM, any type of non-volatile memory (for example EPROM, EEPROM, FLASH, NVRAM), a magnetic card or an optical card. A computer program comprising software instructions is then stored on the readable medium.

The hybridization module 30 is configured to calculate the hybrid signal from the first and second analog signals.

The hybridization module 30 is, for example, configured to perform the following steps:
- obtain a first filtered signal by applying to the first signal a first filter associated with a first frequency range;
- obtain a second filtered signal by applying to the second signal a second filter associated with a second frequency range, distinct from the first frequency range;
- calculate the hybrid signal by summing the first filtered signal and the second filtered signal.

The first frequency range typically contains higher frequencies than the second frequency range. The two ranges are, for example, disjoint.

The first filter is typically a high-pass filter with a cutoff frequency $f_c$ substantially equal to 1000 Hz, for example a Gaussian high-pass filter. The second filter is typically a low-pass filter whose cutoff frequency is also substantially equal to 1000 Hz, for example a Gaussian low-pass filter. In other words, the first frequency range is then the range of frequencies above 1000 Hz, and the second frequency range is that of frequencies below 1000 Hz.
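The description names Gaussian filters with a cutoff of about 1000 Hz but does not give their formula. The sketch below makes two assumptions. First, it uses the textbook frequency-domain Gaussian low-pass gain $H_{LP}(f) = \exp(-f^2 / (2 f_c^2))$. Second, it takes the high-pass as the complement $H_{HP} = 1 - H_{LP}$. Both gains are evaluated on the bins $f[m] = m f_e / N$. The helper name `gaussian_filters` is hypothetical.

```python
import numpy as np


def gaussian_filters(freqs_hz, fc_hz=1000.0):
    """Return (low-pass, high-pass) gains for the given frequencies.

    Assumed form: H_LP(f) = exp(-f^2 / (2 fc^2)) and H_HP(f) = 1 - H_LP(f).
    """
    h_lp = np.exp(-(freqs_hz ** 2) / (2.0 * fc_hz ** 2))  # Gaussian low-pass (assumed profile)
    h_hp = 1.0 - h_lp                                     # complementary Gaussian high-pass (assumed)
    return h_lp, h_hp


fe = 22000.0                             # sampling frequency f_e in Hz (example value)
N = 512                                  # samples per section
freqs = np.arange(N // 2 + 1) * fe / N   # f[m] = m * f_e / N, m = 0 .. N/2
h_lp, h_hp = gaussian_filters(freqs)     # gains for the bone (low) and aerial (high) branches
```

With this assumed form, the two gains sum to one at every bin. Note that the gain at $f_c$ is about 0.61 rather than the -3 dB point. A different definition of the cutoff would therefore only rescale the exponent.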

In addition, the hybridization module 30 is configured to convert the first analog signal into a first digital signal as the first analog signal is received. It then generates successive first sections from the first digital signal.

According to this addition, the hybridization module 30 is also configured to convert the second analog signal into a second digital signal as the second analog signal is received. It then generates successive second sections from the second digital signal.

According to this optional addition, the hybridization module 30 is then configured to calculate hybrid sections of the hybrid signal progressively, from the first and second sections as they are generated. The corrected signal is then calculated from these hybrid sections.

In the example of Figure 2, the hybridization module 30 comprises a first analog-to-digital converter 50 connected to the first aerial microphone 12. This converter turns the first analog signal from the first microphone 12 into a first digital signal $x_k^{aer}$, with a sampling frequency $f_e$ substantially equal to 22 kHz, for example. The first analog-to-digital converter 50 also divides the converted and sampled first digital signal $x_k^{aer}$ into successive first sections. Each first section comprises, for example, N samples, with N substantially equal to 512. With $f_e$ substantially equal to 22 kHz and N substantially equal to 512, each first section therefore lasts about 20 ms, typically substantially 23 ms.
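The section duration quoted above is simply $N / f_e$. The sketch below checks that figure and also shows the simple division of a signal into successive non-overlapping sections. It takes $f_e$ as exactly 22 kHz and drops any trailing samples that do not fill a complete section. The helper `split_sections` is hypothetical.

```python
import numpy as np


def split_sections(x, n_per_section=512):
    """Divide a digitized signal into successive non-overlapping sections of N samples.

    Trailing samples that do not fill a complete section are dropped (assumption).
    """
    n_sections = len(x) // n_per_section
    return x[: n_sections * n_per_section].reshape(n_sections, n_per_section)


fe = 22000.0                        # f_e taken as exactly 22 kHz (example value)
N = 512                             # samples per section
duration_ms = 1000.0 * N / fe       # section duration N / f_e, in milliseconds
print(f"{duration_ms:.2f} ms")      # 23.27 ms

sections = split_sections(np.arange(2000.0), N)   # 3 complete sections of 512 samples
```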

In the example of Figure 2, the hybridization module 30 further comprises a first time-frequency converter 52, connected to the output of the first analog-to-digital converter 50. This converter calculates a first spectrum $\tilde{X}_{k}^{aer}$ of the first digital signal $x_k^{aer}$, typically via a Fourier transform such as the fast Fourier transform (FFT). The hybridization module 30 then comprises a first filtering unit 54, connected to the output of the first time-frequency converter 52. This unit applies the first filter, typically the Gaussian high-pass filter with a cutoff frequency $f_c$ substantially equal to 1000 Hz, to obtain the first filtered signal $\tilde{X}_{k}^{aer_{HF}}$.

In the example of Figure 2, the hybridization module 30 comprises a second analog-to-digital converter 60 connected to the second osteophonic microphone 14. This converter turns the second analog signal from the second microphone 14 into a second digital signal $x_k^{ost}$, with the same sampling frequency $f_e$. The second analog-to-digital converter 60 also divides the converted and sampled second digital signal $x_k^{ost}$ into successive second sections, each comprising, for example, N samples. With $f_e$ substantially equal to 22 kHz and N substantially equal to 512, each second section therefore lasts about 20 ms, typically substantially 23 ms.

In the example of Figure 2, the hybridization module 30 further comprises a second time-frequency converter 62, connected to the output of the second analog-to-digital converter 60. This converter calculates a second spectrum $\tilde{X}_{k}^{ost}$ of the second digital signal $x_k^{ost}$, typically via a Fourier transform such as the FFT. The hybridization module 30 then comprises a second filtering unit 64, connected to the output of the second time-frequency converter 62. This unit applies the second filter, typically the Gaussian low-pass filter with a cutoff frequency $f_c$ substantially equal to 1000 Hz, to obtain the second filtered signal $\tilde{X}_{k}^{ost_{BF}}$.

By convention, in the present description, a signal denoted x has a continuous-time form x(t) and a discretized form x[n]. Here n is a natural integer that represents discretized time. In the frequency domain, m represents the discrete frequency variable, between 0 and N/2, where N is the number of samples per section, for example 512.

The discretized form of each signal then satisfies the following equation:

$x[n] = x(n \times T_{e})$

where n is the integer variable representing discretized time, and $T_e$ is the time discretization step, which satisfies:

$T_{e} = \frac{1}{f_{e}}$

where $f_e$ is the sampling frequency, for example substantially equal to 22 kHz.
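The discretization relation $x[n] = x(n T_e)$ can be illustrated by sampling a continuous-time function at the instants $n T_e$. In the sketch below, the 1 kHz test sine `x_continuous` is a hypothetical stand-in for a microphone signal.

```python
import numpy as np

fe = 22000.0          # sampling frequency f_e in Hz (example value)
Te = 1.0 / fe         # time discretization step T_e = 1 / f_e


def x_continuous(t):
    """Hypothetical continuous-time signal x(t): a 1 kHz sine."""
    return np.sin(2.0 * np.pi * 1000.0 * t)


n = np.arange(512)                    # discrete time index n
x_discrete = x_continuous(n * Te)     # x[n] = x(n * T_e)
```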

The discrete frequency variable m is typically associated with a frequency vector f[m] satisfying the following equation:

$f[m] = m \times \frac{f_{e}}{N}$

where N is the number of samples in a section, m is the discrete frequency variable, and $f_e$ is the sampling frequency.

The frequency then typically ranges from 0 Hz to $f_e/2$ Hz, with a frequency step equal to $f_e/N$.
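The frequency vector $f[m]$ can be computed directly. NumPy's `np.fft.rfftfreq` returns the same grid for a real-input FFT of length N. The values below assume $f_e$ is exactly 22 kHz.

```python
import numpy as np

fe = 22000.0                 # sampling frequency f_e in Hz (example value)
N = 512                      # samples per section
m = np.arange(N // 2 + 1)    # discrete frequency variable, 0 <= m <= N/2
f = m * fe / N               # f[m] = m * f_e / N
step = fe / N                # frequency step f_e / N
```

With these assumed values, the step $f_e/N$ is about 42.97 Hz, and the last bin, m = N/2, sits at 11 kHz.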

By convention, the k-th section of signal x is denoted $x_k$ or $x_k[n]$, and $\tilde{X}_{k}[m]$ in the frequency domain, with:

$\tilde{X}_{k}[m] = \mathrm{FFT}(x_{k}[n])$

where FFT denotes the numerical operator that estimates the discrete Fourier transform of a signal. It is implemented, for example, by the respective time-frequency converter 52, 62.
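The sketch below computes $\tilde{X}_k[m] = \mathrm{FFT}(x_k[n])$ for one section using NumPy's real-input FFT `np.fft.rfft`. This function returns exactly the N/2 + 1 bins m = 0 ... N/2 used in this description. No analysis window is applied, because the text above does not specify one. The test tone is hypothetical.

```python
import numpy as np

N = 512                         # samples per section


def section_spectrum(x_k):
    """Complex spectrum X~_k[m] of one real section, for m = 0 .. N/2."""
    return np.fft.rfft(x_k)     # real-input FFT: N/2 + 1 non-redundant bins


n = np.arange(N)
m0 = 24                                      # hypothetical test tone placed on bin 24
x_k = np.cos(2.0 * np.pi * m0 * n / N)       # about 1031 Hz when f_e = 22 kHz
X_k = section_spectrum(x_k)
peak_bin = int(np.argmax(np.abs(X_k)))       # bin of maximum amplitude, here 24
```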

The spectral subtraction described below only needs to work on the amplitude spectrum of the signal. The phase is kept unchanged throughout the process. Here $|\tilde{X}_{k}[m]|$ is the amplitude spectrum and $\phi(\tilde{X}_{k}[m])$ is the phase spectrum of $x_k[n]$. By convention, the term "spectrum" without further qualification refers hereafter to the amplitude spectrum.
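Only the amplitude spectrum is modified and the phase is kept unchanged. A processed section can therefore be rebuilt from the modified amplitude and the original phase. The sketch below uses NumPy. The uniform gain of 0.5 is only a placeholder for the spectral subtraction described later, and the helper names are hypothetical.

```python
import numpy as np


def split_amplitude_phase(X):
    """Return the amplitude spectrum |X~_k[m]| and phase spectrum phi(X~_k[m])."""
    return np.abs(X), np.angle(X)


def recombine(amplitude, phase):
    """Rebuild a complex spectrum from a modified amplitude and the unchanged phase."""
    return amplitude * np.exp(1j * phase)


N = 512
rng = np.random.default_rng(0)
x_k = rng.standard_normal(N)                   # hypothetical section
amplitude, phase = split_amplitude_phase(np.fft.rfft(x_k))
Y_k = recombine(0.5 * amplitude, phase)        # placeholder gain on the amplitude only
y_k = np.fft.irfft(Y_k, n=N)                   # back to the time domain
```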

In the example of Figure 2, the hybridization module 30 also comprises a summer 70, also called an adder, connected to the outputs of the first filtering unit 54 and of the second filtering unit 64. The summer 70 adds the first filtered signal $\tilde{X}_{k}^{aer_{HF}}$ and the second filtered signal $\tilde{X}_{k}^{ost_{BF}}$ to obtain the hybrid signal $\tilde{X}_{k}^{hyb}$.

The hybridization module 30 is then, for example, configured to calculate the hybrid signal $\tilde{X}_{k}^{hyb}$ by summing the first filtered signal $\tilde{X}_{k}^{aer_{HF}}$ and the second filtered signal $\tilde{X}_{k}^{ost_{BF}}$ according to the following equation:

$\tilde{X}_{k}^{hyb} = \alpha \, \tilde{X}_{k}^{aer_{HF}} + \beta \, \tilde{X}_{k}^{ost_{BF}}$

where α and β are constants.
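The sketch below combines the two filtering branches and the summer 70 to compute one hybrid section. It works on a single pair of aerial and bone-conduction sections of equal length. The Gaussian gain profiles, the defaults α = β = 1 and the function name `hybrid_spectrum` are assumptions for illustration only.

```python
import numpy as np


def hybrid_spectrum(x_aer_k, x_ost_k, alpha=1.0, beta=1.0, fe=22000.0, fc=1000.0):
    """Hybrid section spectrum alpha * X~_k^{aer_HF} + beta * X~_k^{ost_BF}."""
    n = len(x_aer_k)
    f = np.arange(n // 2 + 1) * fe / n              # f[m] = m * f_e / N
    h_lp = np.exp(-(f ** 2) / (2.0 * fc ** 2))      # assumed Gaussian low-pass gain
    h_hp = 1.0 - h_lp                               # assumed complementary high-pass gain
    X_aer_hf = h_hp * np.fft.rfft(x_aer_k)          # first filtered signal (aerial, high band)
    X_ost_bf = h_lp * np.fft.rfft(x_ost_k)          # second filtered signal (bone, low band)
    return alpha * X_aer_hf + beta * X_ost_bf       # weighted sum performed by the summer 70
```

The assumed low-pass and high-pass gains are complementary. Feeding the same section to both inputs with α = β = 1 therefore returns its unfiltered spectrum, which is a convenient sanity check.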

The values of the constants α and β are preferably adjustable. This allows the output signal to be brought to a level equivalent to the input level of the first aerial microphone 12. It also makes it possible to give, if desired, more weight to the aerial signal or, respectively, to the osteophonic signal.

As an optional addition, the hybridization module 30 builds each new first section, when generating the successive first sections, from samples of the previous first section and new samples of the first digital signal.

According to this optional addition, the hybridization module 30 proceeds in a similar manner when generating the successive second sections. It builds each new second section from samples of the previous second section and new samples of the second digital signal.

Successive first sections generated in this way therefore overlap from one section to the next. The same holds for the successive second sections.

The overlap rate is defined as follows:
- for the first sections, the ratio, within each new first section, between the number of samples reused from the previous first section and the total number of samples in the new first section;
- for the second sections, the corresponding ratio within each new second section.

The overlap rate is, for example, between 50% and 75%, that is to say between 0.5 and 0.75. In other words, each new first section reuses between one half and three quarters of the last samples of the previous first section. Similarly, each new second section reuses between one half and three quarters of the last samples of the previous second section. This overlap between sections is illustrated in Figure 3.
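For an overlap rate r, each new section contains $N(1 - r)$ new samples, which defines the hop between section starts. The sketch below assumes the digitized signal is already buffered and contains at least N samples. The function name `overlapped_sections` is hypothetical.

```python
import numpy as np


def overlapped_sections(x, n_per_section=512, overlap=0.5):
    """Successive sections in which each new section reuses the last
    `overlap` fraction of the previous one (0.5 to 0.75 in the description).

    Assumes len(x) >= n_per_section.
    """
    reused = int(round(overlap * n_per_section))   # samples carried over from the previous section
    hop = n_per_section - reused                   # new samples per section
    starts = range(0, len(x) - n_per_section + 1, hop)
    return np.stack([x[s:s + n_per_section] for s in starts])
```

With r = 0.5, the sections produced alternately coincide with the physical sections $x_i$ and with the overlapped sections $x'_i$ of Figure 3.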

In Figure 3, the sections $x_i$ are those that would be obtained by simply dividing, without overlap, the signal from the first analog-to-digital converter 50 or, respectively, from the second analog-to-digital converter 60. The same notation applies to first and second sections, and the index i takes the successive values k-2, k-1 and k in this example. These sections $x_i$ are also called physical sections. The other sections shown in Figure 3 illustrate the overlap. They are called overlapped sections and are denoted $x'_i$, with i equal to k-1 or k in this example.

In the example of Figure 3, those skilled in the art will observe that the overlap rate is substantially equal to 50%, and that section x'_{k-1} then comprises 50% of samples from the previous section, corresponding to the last half of section x_{k-2} in this example, and 50% of new samples, corresponding to the first half of section x_{k-1} in this example.

In Figure 3, the sections obtained after noise reduction by the noise reduction module 34 are denoted y_i when they result from physical sections x_i, and y'_i when they result from overlapped sections x'_i, with i equal to k-1 or k in this example.

In the case of a 50% overlap, the output section y_k^out then typically satisfies the following equation:

$y_{k}^{out} = \frac{1}{2} y'_{k-1}\left[\frac{N}{2} : N\right] + \frac{1}{2} y_{k-1}[0 : N] + \frac{1}{2} y'_{k}\left[0 : \frac{N}{2}\right]$

where N represents the number of samples per section, for example equal to 512,
y_i represents a section obtained after noise reduction from a physical section x_i, and
y'_i represents a section obtained after noise reduction from an overlapped section x'_i.
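The 50% recombination above can be sketched as follows. This sketch assumes one particular reading of the equation, namely that the half-length slices y'_{k-1}[N/2 : N] and y'_k[0 : N/2] are added to the first and second halves of the full-length section y_{k-1}, respectively; the helper name `recombine_50` is hypothetical.

```python
import numpy as np


def recombine_50(y_ov_prev, y_phys, y_ov_next):
    """Average a denoised physical section with the two overlapped sections
    that straddle it (50% overlap, even N assumed)."""
    y_ov_prev = np.asarray(y_ov_prev, dtype=float)  # y'_{k-1}
    y_phys = np.asarray(y_phys, dtype=float)        # y_{k-1}
    y_ov_next = np.asarray(y_ov_next, dtype=float)  # y'_k
    n = y_phys.size
    if n % 2:
        raise ValueError("N must be even for a 50% overlap")
    h = n // 2
    out = 0.5 * y_phys
    out[:h] += 0.5 * y_ov_prev[h:]   # second half of y'_{k-1} overlaps the first half
    out[h:] += 0.5 * y_ov_next[:h]   # first half of y'_k overlaps the second half
    return out
```

When no denoising is applied, i.e. when the three sections are plain slices of the same signal, this returns the physical section unchanged, which is a quick sanity check of the 1/2 weights.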

The estimation module 32 is configured to estimate noise in the hybrid signal.

As an optional addition, when the voice activity detection module 36 is configured to determine the presence or absence of voice in each section of the hybrid signal, the estimation module 32 is then configured to estimate the noise in the hybrid signal from each section in which an absence of voice has been determined.

In other words, when the voice activity detection module 36 determines the presence of voice in a given section, the noise spectrum is not updated. Conversely, when the voice activity detection module 36 determines an absence of voice in a given section, the background noise spectrum is updated. The background noise spectrum is thus updated only when the section is not voice and the probability that it is noise is high. The more robust the voice activity detection module 36, the more accurately the noise can be estimated and tracked.

According to this optional addition, the estimation module 32 is typically configured to update the background noise spectrum |Ñ_k| according to the following equation:

$|\tilde{N}_{k}| = \begin{cases} p \times |\tilde{N}_{k-1}| + (1 - p) \times |\tilde{X}_{k}^{hyb}| & \text{if } DAV = 0 \\ |\tilde{N}_{k-1}| & \text{if } DAV = 1 \end{cases}$

where p is a forgetting factor, for example equal to 0.95;
DAV is a voice activity indicator from the voice activity detection module 36, equal to 1 if the presence of voice is determined and to 0 otherwise, i.e. if an absence of voice is determined;
$|\tilde{X}_{k}^{hyb}|$ represents the spectrum of the hybrid signal $\tilde{X}_{k}^{hyb}$; and
$|\tilde{N}_{k-1}|$ and $|\tilde{N}_{k}|$ represent the background noise spectra for the sections of index k-1 and k, respectively.
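The gated recursive update can be sketched per section as below, with magnitude spectra held as NumPy arrays. The function name is hypothetical, and the VAD flag is assumed to be the integer 0 or 1 defined above.

```python
import numpy as np


def update_noise_spectrum(noise_prev, x_hyb_mag, dav, p=0.95):
    """Return |N_k| from |N_{k-1}|, |X_k^hyb| and the voice activity flag DAV."""
    noise_prev = np.asarray(noise_prev, dtype=float)
    if dav == 1:
        # voice present: the background noise estimate is frozen
        return noise_prev.copy()
    # no voice: first-order recursive smoothing with forgetting factor p
    return p * noise_prev + (1.0 - p) * np.asarray(x_hyb_mag, dtype=float)
```

With p = 0.95, each voice-free section contributes only 5% to the estimate, so the noise spectrum tracks slow changes without reacting to isolated loud sections.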

The noise reduction module 34 is configured to calculate the corrected signal by applying a generalized spectral subtraction algorithm to the hybrid signal, as a function of the estimated noise.

In the example of Figure 2, the noise reduction module 34 comprises a generalized spectral subtraction unit 80, also called SSG unit 80, capable of implementing the generalized spectral subtraction algorithm.

The generalized spectral subtraction algorithm satisfies for example the following equation:

$|\tilde{Y}_{k}[m]|^{\gamma} = \begin{cases} |\tilde{X}_{k}[m]|^{\gamma} - \alpha_{k}[m] \times \delta[m] \times |\tilde{N}_{k}[m]|^{\gamma} & \text{if } |\tilde{X}_{k}[m]|^{\gamma} - \alpha_{k}[m] \times \delta[m] \times |\tilde{N}_{k}[m]|^{\gamma} \geq \beta |\tilde{N}_{k}[m]|^{\gamma} \\ \beta |\tilde{N}_{k}[m]|^{\gamma} & \text{otherwise} \end{cases}$

where $|\tilde{Y}_{k}[m]|$ represents the spectrum of the denoised signal for the section of index k;
$|\tilde{X}_{k}[m]|$ represents the spectrum of the hybrid signal for the section of index k;
$|\tilde{N}_{k}[m]|$ represents the background noise spectrum for the section of index k;
α_k represents a noise overestimation coefficient for the section of index k;
δ represents a correction coefficient;
β represents a noise reintroduction coefficient; and
γ represents a power coefficient, typically equal to 1 or 2.

The generalized spectral subtraction algorithm is calculated, for example, either in amplitude, in which case the power coefficient γ is equal to 1, or in power, in which case γ is equal to 2.

When the generalized spectral subtraction is calculated in amplitude, with γ = 1, little musical noise is produced, but the estimated voice signal may be more or less distorted depending on the signal-to-noise ratio. Musical noise is a set of artifacts produced by spectral subtraction, consisting of tones that are short in time and produce a relatively unpleasant sound.

When the generalized spectral subtraction is calculated in power, with γ = 2, little distortion is created, but a significant amount of musical noise may be generated.
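A per-bin sketch of the generalized spectral subtraction rule, including the floor β|Ñ_k[m]|^γ, is given below. It assumes that α_k[m] and δ[m] are already available as arrays or scalars and, as an assumption consistent with the later inverse-FFT step that uses the amplitude |Ỹ_k[m]|, takes the 1/γ root at the end. The function name is hypothetical.

```python
import numpy as np


def generalized_spectral_subtraction(x_mag, n_mag, alpha, delta, beta=0.05, gamma=1.0):
    """Apply |Y|^g = |X|^g - alpha*delta*|N|^g, floored at beta*|N|^g, per bin."""
    x_pow = np.asarray(x_mag, dtype=float) ** gamma
    n_pow = np.asarray(n_mag, dtype=float) ** gamma
    y_pow = x_pow - np.asarray(alpha) * np.asarray(delta) * n_pow  # over-subtraction
    floor = beta * n_pow                                           # reintroduced noise
    y_pow = np.where(y_pow >= floor, y_pow, floor)                 # never below the floor
    return y_pow ** (1.0 / gamma)                                  # amplitude |Y_k[m]|
```

Passing `gamma=1` subtracts amplitudes and `gamma=2` subtracts powers, which corresponds to the distortion versus musical-noise trade-off described above.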

The noise overestimation coefficient α is preferably recalculated for each section of index k, and is then denoted α_k. This coefficient helps prevent the generation of too much musical noise. To maximize its effectiveness, it is calculated per frequency band and depends on the signal-to-noise ratio in each of these bands.

The spectra $|\tilde{X}_{k}[m]|$ and $|\tilde{N}_{k}[m]|$ are first divided into sub-spectra denoted $|\tilde{X}_{k}^{j}[m]|$ and $|\tilde{N}_{k}^{j}[m]|$, where j represents the frequency band number. One signal-to-noise ratio value, denoted RSB_k^j (the French abbreviation of signal-to-noise ratio), is thus computed for each frequency band of index j, typically according to the following equation:

$RSB_{k}^{j} = 10 \times \log_{10}\left(\frac{\sum_{m = 0}^{N_{j} - 1} |\tilde{X}_{k}^{j}[m]|^{2}}{\sum_{m = 0}^{N_{j} - 1} |\tilde{N}_{k}^{j}[m]|^{2}}\right)$

where RSB_k^j represents the signal-to-noise ratio for the section of index k and the frequency band of index j;
N_j represents the number of frequency samples contained in the band of index j;
$|\tilde{X}_{k}^{j}[m]|$ represents the spectrum of the hybrid signal for the section of index k, restricted to band j; and
$|\tilde{N}_{k}^{j}[m]|$ represents the background noise spectrum for the section of index k, restricted to band j.
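The per-band signal-to-noise ratio can be sketched as follows. For illustration, the bands are assumed to be given as a list of FFT-bin edges [e_0, e_1, ..., e_J], so that band j covers bins e_j to e_{j+1} − 1, and a small constant `eps` guards against an all-zero noise band. Both choices, and the function name, are assumptions not taken from the text.

```python
import numpy as np


def band_snr_db(x_mag, n_mag, band_edges, eps=1e-12):
    """RSB_k^j = 10 log10(sum |X_k^j|^2 / sum |N_k^j|^2) for each band j."""
    x_pow = np.asarray(x_mag, dtype=float) ** 2
    n_pow = np.asarray(n_mag, dtype=float) ** 2
    snr = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        signal_energy = x_pow[lo:hi].sum()            # the N_j bins of band j
        noise_energy = max(n_pow[lo:hi].sum(), eps)   # avoid division by zero
        snr.append(10.0 * np.log10(signal_energy / noise_energy))
    return np.array(snr)
```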

Then, for each signal-to-noise ratio value, the noise overestimation coefficient α_k^j satisfies, for example, the following equation:

$\alpha_{k}^{j} = \begin{cases} 4.75 & \text{if } RSB_{k}^{j} < -5\text{ dB} \\ 4 - \frac{3}{20} \times RSB_{k}^{j} & \text{if } -5\text{ dB} \leq RSB_{k}^{j} \leq 20\text{ dB} \\ 1 & \text{if } RSB_{k}^{j} > 20\text{ dB} \end{cases}$
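The mapping from band SNR to α_k^j can be written as a small vectorized function. This is a sketch using the example values above; the function name is hypothetical.

```python
import numpy as np


def overestimation_coefficient(snr_db):
    """alpha_k^j as a function of the band SNR in dB (example values above)."""
    snr_db = np.asarray(snr_db, dtype=float)
    alpha = 4.0 - 3.0 / 20.0 * snr_db            # linear segment between -5 and 20 dB
    alpha = np.where(snr_db < -5.0, 4.75, alpha)
    return np.where(snr_db > 20.0, 1.0, alpha)
```

Because the linear segment equals 4.75 at -5 dB and 1 at 20 dB, the rule is continuous and is equivalent to clipping 4 − 0.15 × RSB to the range [1, 4.75].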

Overall, this calculation of the noise overestimation coefficient α makes it possible to overestimate the noise when the signal-to-noise ratio is low, and thus to reduce the introduction of musical-noise artifacts.

The noise overestimation coefficient α_k^j is then converted so that it can be reintroduced into equation (8), the generalized spectral subtraction equation, for example according to the following equation:

$\alpha_{k}[m] = \alpha_{k}^{j} \quad \forall m \in [f_{j}, f_{j+1}]$

where the interval [f_j ; f_{j+1}] corresponds to all frequencies of the j-th frequency band. Typically, for each section, the function α_k[m] is a piecewise-constant function in which each piece corresponds to a frequency band determined by the user.
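Expanding the per-band values into the piecewise-constant α_k[m] can be sketched with `numpy.repeat`. This sketch assumes that the bands are given as FFT-bin edges [e_0, ..., e_J], with e_0 = 0 and e_J equal to the number of bins, and that band j covers bins e_j to e_{j+1} − 1. That half-open convention, which assigns a shared boundary bin to the upper band, is an assumption, as is the function name.

```python
import numpy as np


def expand_to_bins(band_values, band_edges):
    """Piecewise-constant per-bin array: every bin of band j gets band_values[j]."""
    widths = np.diff(band_edges)               # number of bins N_j in each band
    return np.repeat(np.asarray(band_values, dtype=float), widths)
```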

The correction coefficient δ is a frequency correction coefficient calculated only once, typically at the start of the algorithm, and does not change over time.

This coefficient is a simple frequency-dependent pre-factor that weights certain frequency bands more heavily, in a manner suited to voice capture.

The correction coefficient δ is for example a piecewise-constant function satisfying the following equation:

$\delta[m] = \begin{cases} 1 & \text{if } f[m] < 1000\text{ Hz} \\ 2.5 & \text{if } 1000 \leq f[m] < 2000\text{ Hz} \\ 1.5 & \text{if } 2000 \leq f[m] < 4000\text{ Hz} \\ 1 & \text{if } f[m] \geq 4000\text{ Hz} \end{cases}$
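Since δ is computed only once, it can be tabulated at start-up for the bins of the FFT in use. The sketch below assumes a one-sided real FFT of size `n_fft` (the same bin grid as `numpy.fft.rfftfreq`) and takes the sampling rate as a parameter. The 16 kHz rate in the example and the function name are hypothetical.

```python
import numpy as np


def correction_coefficient(n_fft, fs):
    """delta[m] for the bins of a one-sided FFT of size n_fft at sampling rate fs."""
    f = np.arange(n_fft // 2 + 1) * fs / n_fft   # bin frequencies f[m] in Hz
    delta = np.ones_like(f)                      # 1 below 1 kHz and from 4 kHz up
    delta[(f >= 1000.0) & (f < 2000.0)] = 2.5
    delta[(f >= 2000.0) & (f < 4000.0)] = 1.5
    return delta


delta = correction_coefficient(512, 16000)       # bin spacing 31.25 Hz
```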

Since the calculations are carried out on amplitude spectra, the estimate |Ỹ_k[m]|^γ must not be negative, as a negative value would be mathematically meaningless. This is why equation (8), the generalized spectral subtraction equation, includes a condition that avoids negative values.

The noise reintroduction coefficient β then makes it possible to choose whether or not to reintroduce noise when the subtraction would yield potentially negative values. When β is set to 0, any subtraction leading to a negative value is replaced by zero. For any value of β greater than 0, on the other hand, noise is reintroduced. This preserves part of the noise, which can be perceived as comfort noise masking some of the musical noise, if any is created.

The noise reintroduction coefficient β is generally a few percent. It is for example substantially equal to 0.05, i.e. 5% of the background noise is reintroduced into the output signal. This value is a predefined parameter.

It should be noted that the lower the signal-to-noise ratio, the less effective the estimation of the denoised signal and the more the voice is altered. It is therefore useful to set a higher value of the noise reintroduction coefficient β when the signal-to-noise ratio is poor, in order to recover some voice harmonics in the background noise that would otherwise be lost in the spectral subtraction.

In the example of Figure 2, the noise reduction module 34 further comprises a frequency-to-time converter 82, connected to the output of the generalized spectral subtraction unit 80 and configured to compute a time-domain signal from the frequency-domain signal output by the SSG unit 80, typically via an inverse Fourier transform such as an inverse fast Fourier transform (IFFT).

As stated previously, the frequency-domain calculations were carried out on the amplitude of the spectrum of the section's signal. Its phase, which remains unmodified, is then reinstated before the inverse Fourier transform that returns the signal to the time domain, for example according to the following equation:

$y_{k}[n] = IFFT\left(|\tilde{Y}_{k}[m]| \times e^{j\varphi(\tilde{X}_{k}[m])}\right)$

where y_k[n] represents the denoised output signal for the section of index k;
IFFT represents the digital inverse Fourier transform operator;
$|\tilde{Y}_{k}[m]|$ represents the amplitude spectrum of the denoised signal for the section of index k, and $\varphi(\tilde{X}_{k}[m])$ its phase, taken unmodified from the spectrum $\tilde{X}_{k}[m]$ of the hybrid signal.
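Reinstating the original phase before the inverse transform can be sketched as below. The sketch assumes a real-valued section processed with NumPy's one-sided `rfft`/`irfft` pair, which is one common way of implementing the IFFT step for real signals; the function name is hypothetical.

```python
import numpy as np


def resynthesize(y_mag, x_spectrum, n):
    """y_k[n] = IFFT(|Y_k[m]| * exp(j * phi(X_k[m]))) for a one-sided spectrum."""
    phase = np.angle(x_spectrum)                       # unmodified phase of X_k[m]
    y_spectrum = np.asarray(y_mag, dtype=float) * np.exp(1j * phase)
    return np.fft.irfft(y_spectrum, n=n)               # back to n time samples
```

If the magnitude is left untouched (|Ỹ_k| = |X̃_k|), the section is reconstructed exactly, up to rounding.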

In the example of Figure 2, the noise reduction module 34 then comprises a digital-to-analog converter 84, connected to the output of the frequency-to-time converter 82 and configured to supply the corrected signal y(t) in analog form. The denoised signal y_k^hyb output by the frequency-to-time converter 82 is then resynthesized into the corrected signal y(t) via the digital-to-analog converter 84, with synthesis of the overlapped sections where applicable, and then delivered at the output of the processing device 20.

The voice activity detection module 36 is configured to determine the presence or absence of voice in each section of the hybrid signal.

The voice activity detection module 36 is for example configured to determine the presence or absence of voice from the second signal, output by the bone-conduction transducer, and preferably only from said second signal, without taking the first signal into account.

The second microphone 14, which is osteophonic (structure-borne), is capable of measuring the vibrations of the skin and face caused by the activity of the vocal cords. It makes it possible to capture the voiced part of a speech signal while being very insensitive to background noise, which a priori does not make the user's skin vibrate enough to be picked up.

The advantage of using the osteophonic second microphone 14 lies in its immunity to background noise. This immunity is even greater in the low-frequency part of the acquired signal.

Advantageously, voice activity detection is then carried out after filtering the structure-borne signal in the frequency domain (filtering in the time domain also works). The voice activity detection module 36 is then preferably configured to determine the presence or absence of voice from the second filtered signal $\tilde{X}_{k}^{ost_{BF}}$ output by the second filtering unit 64.

As an optional addition, the voice activity detection module 36 is configured to calculate an RMS value for each section of the second signal, i.e. for each second section, and then to determine the presence or absence of voice based on the respective RMS values.

The processing is based on computing the signal energy section by section. Here, however, thanks to the noise immunity of the filtered structure-borne microphone signal, the voice energy always emerges above the noise-floor energy. Computing the RMS level then gives the energy of the signal.

As is well known, the effective value, also called the RMS (Root Mean Square) value, of a periodic signal is the square root of the mean of the square of this quantity over a given time interval, or equivalently the square root of the signal's second-order moment (which equals its variance when the signal has zero mean).

For a time section x_k[n] of N samples, the RMS value is then typically calculated via the following equation:

$RMS_{k} = \sqrt{\frac{\sum_{n = 0}^{N - 1} x_{k}[n]^{2}}{N}}$

where RMS_k represents the RMS value for the section of index k;
x_k[n] represents the signal for the section of index k;
N represents the number of samples in said section.
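In plain Python, the RMS of one section is a direct transcription of the formula. The function name is hypothetical.

```python
import math


def section_rms(section):
    """RMS_k = sqrt((1/N) * sum_{n=0}^{N-1} x_k[n]^2) for one time-domain section."""
    n = len(section)
    if n == 0:
        raise ValueError("empty section")
    return math.sqrt(sum(v * v for v in section) / n)
```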

Now, in the frequency domain, Parseval's identity, according to which the energy is the same in the frequency and time domains, gives the following equation:

$RMS_{k} = \frac{1}{2N} \sqrt{\sum_{m = -\frac{N}{2}}^{\frac{N}{2}} |\tilde{X}_{k}[m]|^{2}}$

where RMS_k represents the RMS value for the section of index k;
$|\tilde{X}_{k}[m]|$ represents the spectrum of the hybrid signal for the section of index k; and
N represents the number of samples in said section.
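The constant in front of the square root depends on how the DFT is normalized. As a hedged check, with NumPy's unnormalized `numpy.fft.fft` and a sum over all N bins, Parseval's identity reads Σ|X̃_k[m]|² = N Σ x_k[n]², hence RMS_k = √(Σ|X̃_k[m]|²) / N. The sketch below uses that convention rather than the 1/(2N) factor above, which presumably reflects a different normalization or spectrum layout. The function name is hypothetical.

```python
import numpy as np


def rms_from_spectrum(spectrum):
    """RMS of a section from its full two-sided, unnormalized DFT (numpy.fft.fft)."""
    spectrum = np.asarray(spectrum)
    n = spectrum.size
    # Parseval with numpy's convention: sum |X[m]|^2 = N * sum x[n]^2
    return np.sqrt(np.sum(np.abs(spectrum) ** 2)) / n
```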

This RMS level value is optionally converted into a dBFS value using the following equation:

$RMS_{k}^{dB} = 20 \times \log_{10}(RMS_{k})$

where log₁₀ represents the decimal (base-10) logarithm operator.
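A dBFS conversion, assuming full scale is 1.0, can be sketched as below. The `floor` guard against log10(0) for silent sections is a hypothetical addition that the text does not specify.

```python
import math


def to_dbfs(rms_value, floor=1e-12):
    """RMS_k^dB = 20 log10(RMS_k) with full scale = 1.0; `floor` avoids log10(0)."""
    return 20.0 * math.log10(max(rms_value, floor))
```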

This dBFS value typically lies between a minimum of -94 dBFS (for a 16-bit dynamic resolution) and a maximum of 0 dBFS (for a constant signal equal to 1).

As a further optional addition, the voice activity detection module 36 is configured to determine the presence or absence of voice based on an average of the last M calculated RMS values, also called the smoothed RMS, and/or on the variation between a current RMS value and a previous RMS value, also called the rate of variation of the RMS level, M being an integer greater than or equal to 1.

According to this further optional addition, the voice activity detection module 36 is for example configured to determine the presence of voice if said average value is greater than or equal to a predefined average threshold A, or if said RMS variation is greater than or equal to a predefined variation threshold B.

The RMS level is likely to vary over time and to undergo sudden variations when the microphone concerned, in particular the second microphone 14, picks up a significant vibration. This optional addition therefore improves the precision of the algorithm and reduces its errors by averaging the last M calculated values of the RMS level (i.e. over the last M sections). This is implemented, for example, via a circular buffer which, for each new section, adds the newly calculated RMS value, discards the oldest stored value (the one calculated M sections earlier), and then averages the stored values. The smoothed RMS level at the k-th section, denoted $\overline{RMS_{k}^{dB}}$, satisfies, for example, the following equation:

\overline{RMS_{k}^{dB}} = \frac{1}{M} \sum_{j=0}^{M-1} RMS_{k-j}^{dB}
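A minimal Python sketch of the circular-buffer smoothing, assuming per-section RMS levels expressed in dBFS for floating-point samples normalized to [-1, 1]. The names `rms_db` and `SmoothedRms`, the -120 dB floor, and averaging over fewer than M values during start-up are all assumptions, since the text does not specify them.

```python
import math
from collections import deque
from typing import Sequence


def rms_db(section: Sequence[float], floor_db: float = -120.0) -> float:
    """RMS level of one section in dBFS (assumes samples normalized to [-1, 1])."""
    if len(section) == 0:
        return floor_db
    rms = math.sqrt(sum(x * x for x in section) / len(section))
    # Clamp very quiet or silent sections to a floor (also avoids log10(0)).
    return max(20.0 * math.log10(rms), floor_db) if rms > 0.0 else floor_db


class SmoothedRms:
    """Hypothetical helper: mean of the last M per-section RMS levels."""

    def __init__(self, m: int) -> None:
        if m < 1:
            raise ValueError("M must be an integer >= 1")
        # A deque with maxlen=M acts as the circular buffer: appending the
        # newest value automatically discards the value from M sections ago.
        self._buffer = deque(maxlen=m)

    def update(self, rms_level_db: float) -> float:
        """Store RMS_k^dB and return the smoothed level for section k."""
        self._buffer.append(rms_level_db)
        # Start-up assumption: average over the values available so far.
        return sum(self._buffer) / len(self._buffer)
```

For example, with M = 3, feeding the levels -60, -30, -30, -30 dB returns -60, -45, -40 and -30 dB: the jump is spread over M sections.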

Monitoring $\overline{RMS_{k}^{dB}}$ over time makes it possible to identify speech regions as those in which it exceeds a given threshold. However, because of the smoothing, this level may cross the chosen threshold slightly late. Advantageously, a second metric linked to the RMS level, namely the rate of variation of the RMS level, denoted $\Delta RMS_{k}^{dB}$, is then calculated to better detect the onset of voice, for example via the following equation:

\Delta RMS_{k}^{dB} = \frac{\overline{RMS_{k}^{dB}} - \overline{RMS_{k-1}^{dB}}}{dt}

where:
$\Delta RMS_{k}^{dB}$ is the rate of variation of the RMS level for the section of index k;
$\overline{RMS_{k-1}^{dB}}$ and $\overline{RMS_{k}^{dB}}$ are the smoothed RMS levels for the sections of index k-1 and k, respectively;
dt is the time delta between two successive sections.

The value dt may correspond exactly to the time delta between two successive sections, in which case the rate of variation of the RMS level is expressed in dB/s; however, it can then take very large values.

Alternatively, and for convenience, dt is chosen equal to 1. In that case, $\Delta RMS_{k}^{dB}$ is a rate of variation expressed in dB per section. This quantity is relevant because, when a speaker starts to speak, the RMS level rises abruptly, resulting in a positive $\Delta RMS_{k}^{dB}$ greater than 1 dB per section. Because this quantity varies quickly, it allows voice to be detected very rapidly, thus avoiding missing the start of a sentence.
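The following Python arithmetic sketch illustrates why dt = 1 is convenient. It assumes, as a hypothetical configuration built from figures given later in this description, 512-sample sections at 22,050 Hz with 75% overlap, so that successive sections start 128 samples (about 5.8 ms) apart. With dt in seconds, a modest rise of 1 dB per section already becomes about 172 dB/s.

```python
FS_HZ = 22_050        # sampling frequency (upper bound given in the text)
SECTION_LEN = 512     # samples per section
OVERLAP = 0.75        # assumed overlap rate between successive sections

HOP = round(SECTION_LEN * (1.0 - OVERLAP))  # 128 samples between section starts
DT_SECONDS = HOP / FS_HZ                    # about 5.8 ms between sections


def delta_rms(smoothed_k: float, smoothed_k_minus_1: float, dt: float = 1.0) -> float:
    """Rate of variation of the smoothed RMS level.

    dt = 1.0 gives dB per section; dt = DT_SECONDS gives dB per second.
    """
    return (smoothed_k - smoothed_k_minus_1) / dt
```

A 4 dB rise between two sections is thus 4 dB per section with dt = 1, but about 689 dB/s with dt = DT_SECONDS, which makes thresholds expressed per section much easier to read.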

Decision making for instantaneous voice activity detection is then defined, for example, by the following equation:

DAV_{k} = \begin{cases} 1 & \text{if } \overline{RMS_{k}^{dB}} \geq A \text{ or } \Delta RMS_{k}^{dB} \geq B \\ 0 & \text{otherwise} \end{cases}

where:
$\overline{RMS_{k}^{dB}}$ is the smoothed RMS level for the section of index k;
$\Delta RMS_{k}^{dB}$ is the rate of variation of the RMS level for the section of index k;
$DAV_{k}$ is a voice activity indicator for the section of index k, equal to 1 if the presence of voice is determined and to 0 otherwise;
A is the predefined average threshold and B is the predefined variation threshold, i.e. the level threshold and the rate-of-variation threshold, respectively, that must be reached for the section to be considered as containing speech.
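A minimal Python sketch of the instantaneous decision, taking the sequence of smoothed RMS levels as input. The function name and the choice of a zero rate of variation for the very first section (which has no predecessor) are assumptions.

```python
from typing import List, Sequence


def instantaneous_vad(
    smoothed_db: Sequence[float],
    threshold_a: float,
    threshold_b: float,
    dt: float = 1.0,
) -> List[int]:
    """Return DAV_k (1 = voice, 0 = no voice) for each section k."""
    decisions = []
    for k, level in enumerate(smoothed_db):
        # Rate of variation of the smoothed RMS level; assumed 0 for k = 0.
        delta = 0.0 if k == 0 else (level - smoothed_db[k - 1]) / dt
        # Voice if the smoothed level reaches A OR its rate of variation reaches B.
        decisions.append(1 if level >= threshold_a or delta >= threshold_b else 0)
    return decisions
```

With A = -40 dB and B = 3 dB per section, the smoothed levels [-60, -60, -50, -45, -37.5, -37.5, -45, -52.5] give [0, 0, 1, 1, 1, 1, 0, 0]: the third and fourth sections are flagged only through B, while the smoothed level is still below A, which is the delay the second metric is meant to compensate.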

These threshold values A and B are predefined according to the dynamic range of the acoustic device 10, for example according to the gain of the microphone concerned, in particular the second microphone 14.

The voice activity detection calculation described above gives an instantaneous value for each successive section (whether or not the sections overlap). Relying solely on an instantaneous value can lead to errors: for example, a micro-silence in speech could cause an unwanted switch of the voice activity indicator DAV to 0. Conversely, a very short impulse noise can set the indicator DAV to 1 for a single section before it returns to 0. Depending on how the voice activity detection module 36 is used (for example, in a mode where the channel is open only if DAV = 1), this behavior can cause unpleasant artifacts. This is why the voice activity detection is advantageously smoothed.

This smoothing is carried out, for example, using an attack time and a release time. When the instantaneous voice activity indicator $DAV_{inst}^{k}$ has been equal to 1 for at least the attack time (or the equivalent number of sections), the smoothed voice activity indicator $DAV_{smooth}^{k}$ becomes equal to 1. Conversely, when $DAV_{inst}^{k}$ has been equal to 0 for at least the release time, $DAV_{smooth}^{k}$ returns to 0. In all other cases, $DAV_{smooth}^{k}$ keeps the value it had at the previous section. To implement this smoothing, a counter $C_{k}$ is used, for example. For each current section of index k, the update of this counter is typically governed by Table 1 below, as a function of the instantaneous indicator $DAV_{inst}^{k}$ and of the value $C_{k-1}$ of the counter at the previous section of index k-1:

[Table 1]

| | $C_{k-1} \geq 0$ | $C_{k-1} < 0$ |
|---|---|---|
| $DAV_{inst}^{k} = 0$ | Counter reset: $C_{k} = 0$ | $C_{k} = C_{k-1} - 1$ |
| $DAV_{inst}^{k} = 1$ | $C_{k} = C_{k-1} + 1$ | Counter reset: $C_{k} = 0$ |

Decision making for smoothed voice activity detection is then defined, for example, by the following equation:

DAV_{smooth}^{k} = \begin{cases} 1 & \text{if } C_{k} > t_{atk} \\ 0 & \text{if } C_{k} < -t_{rel} \\ DAV_{smooth}^{k-1} & \text{otherwise} \end{cases}

where:
$DAV_{smooth}^{k}$ is the smoothed voice activity indicator for the section of index k, equal to 1 if the presence of voice is determined and to 0 otherwise;
$C_{k}$ is the counter for the section of index k;
$t_{atk}$ is the attack time; and
$t_{rel}$ is the release time, both expressed here as the equivalent number of sections, since they are compared with the counter.
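A minimal Python sketch of the counter-based smoothing, with attack and release times expressed as numbers of sections. It follows Table 1 with one stated assumption: taken literally, the reset rule for $DAV_{inst}^{k} = 0$ with $C_{k-1} \geq 0$ keeps a counter at 0 indefinitely, so it could never fall below $-t_{rel}$. The sketch therefore lets $C_{k-1} = 0$ count down. The initial values $C_{-1} = 0$ and $DAV_{smooth} = 0$, and the function name, are also assumptions.

```python
from typing import List, Sequence


def smooth_vad(
    instantaneous: Sequence[int],
    t_atk: int,
    t_rel: int,
    initial_state: int = 0,
) -> List[int]:
    """Return DAV_smooth^k for each section from the instantaneous DAV_inst^k."""
    counter = 0              # C_{k-1}, assumed 0 before the first section
    state = initial_state    # DAV_smooth^{k-1}
    smoothed = []
    for dav_inst in instantaneous:
        if dav_inst == 1:
            # Table 1: count up while C_{k-1} >= 0, otherwise reset to 0.
            counter = counter + 1 if counter >= 0 else 0
        else:
            # Table 1: count down while C_{k-1} < 0, otherwise reset to 0.
            # Assumption: C_{k-1} == 0 also counts down, so release can occur.
            counter = counter - 1 if counter <= 0 else 0
        if counter > t_atk:
            state = 1
        elif counter < -t_rel:
            state = 0
        # Otherwise the previous smoothed value is kept.
        smoothed.append(state)
    return smoothed
```

With t_atk = t_rel = 2, the sequence [1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0] gives [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]: the isolated 0 (micro-silence) does not interrupt the smoothed indicator, and an isolated 1 between zeros, as in [0, 0, 1, 0, 0], never triggers it.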

The operation of the acoustic device 10, and in particular of the processing device 20, according to the invention will now be described with reference to Figure 4, which is a flowchart of the processing method according to the invention.

The processing applied to the signal to reduce noise is performed digitally and in real time. When the operator uses the acoustic device 10, the signal must be denoised and sent to the interlocutor as quickly as possible, keeping latency as low as possible, with a target of 20 to 30 ms. Effective denoising nevertheless requires a minimum amount of signal to analyze before the noise can be reduced. The processing is therefore block processing, applied section by section to the input signal. As indicated previously, the sections are typically each about 20 ms long: over this duration, speech is quasi-stationary, whereas noise remains stationary over much longer durations.

To optimize power consumption, the sampling frequency is preferably at most 22,050 Hz, giving a bandwidth within the interval [0; 11,025 Hz]. Consequently, to obtain signal sections of about 20 ms at this sampling frequency, each section typically contains 512 samples (512 / 22,050 Hz ≈ 23 ms).

The noise-reduction processing is largely performed in the frequency domain, which is better suited to denoising since the aim is to reduce the level in the frequency bands that contain the most noise. However, because the frequency-domain processing operates section by section, discontinuities and inaccuracies can appear from one section to the next; overlapping the sections, with an overlap rate preferably greater than 50% and ideally equal to 75%, as described above, is then advantageously used to attenuate them.
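The section arithmetic and the overlapped segmentation can be sketched in Python as follows. The periodic Hann analysis window is an assumption (this passage does not name a window); it is used because, with 75% overlap, shifted copies of it sum to a constant (2.0), so overlap-add resynthesis introduces no periodic amplitude modulation. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def section_timing(fs_hz: float, section_len: int, overlap: float) -> Tuple[float, int, float]:
    """Return (section duration in s, hop in samples, hop in s)."""
    hop = round(section_len * (1.0 - overlap))
    return section_len / fs_hz, hop, hop / fs_hz


def split_sections(signal: Sequence[float], section_len: int, hop: int) -> List[List[float]]:
    """Cut the signal into overlapping sections starting every `hop` samples."""
    return [
        list(signal[start:start + section_len])
        for start in range(0, len(signal) - section_len + 1, hop)
    ]


def periodic_hann(n: int) -> List[float]:
    """Periodic Hann window of length n (assumed analysis window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add_sum(window: Sequence[float], hop: int) -> List[float]:
    """Steady-state sum of the window shifted by multiples of `hop` (len divisible by hop)."""
    n = len(window)
    return [sum(window[i + m * hop] for m in range(n // hop)) for i in range(hop)]
```

For 512 samples at 22,050 Hz with 75% overlap, `section_timing` gives a section duration of about 23.2 ms and a hop of 128 samples (about 5.8 ms).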

In an initial step 100, the processing device 20 calculates, via its hybridization module 30, the hybrid signal from the first and second analog signals coming from the first and second microphones 12, 14, as described previously.

In an optional following step 110, the processing device 20 determines, via its voice activity detection module 36, the presence or absence of voice in each section of the hybrid signal, as described previously.

In the next step 120, the processing device 20 then estimates, via its estimation module 32, the noise in the hybrid signal obtained in the hybridization step 100, as described previously.

When, optionally, the presence or absence of voice in each section of the hybrid signal has been determined in the voice activity detection step 110, the noise is estimated in the estimation step 120 from the sections in which an absence of voice has been determined, as described previously.
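A minimal Python sketch of noise estimation gated by the voice activity decision. Recursive (exponential) averaging of the per-bin noise power, the smoothing factor of 0.9 and the function name are assumptions; the text only states that the noise is estimated from the sections with no voice.

```python
from typing import List, Optional, Sequence


def update_noise_estimate(
    noise_power: Optional[List[float]],
    section_power: Sequence[float],
    dav: int,
    smoothing: float = 0.9,
) -> Optional[List[float]]:
    """Update a per-frequency-bin noise power estimate from one section.

    noise_power: current estimate (None before any noise-only section).
    section_power: power spectrum |Y(f)|^2 of the current hybrid section.
    dav: voice activity decision for this section (1 = voice present).
    """
    if dav == 1:
        # Voice present: freeze the estimate so speech does not leak into it.
        return noise_power
    if noise_power is None:
        # The first noise-only section initializes the estimate.
        return list(section_power)
    # Noise-only section: exponential averaging per frequency bin.
    return [
        smoothing * n + (1.0 - smoothing) * p
        for n, p in zip(noise_power, section_power)
    ]
```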

Finally, in the next step 130, the processing device 20 applies, via its noise reduction module 34, the generalized spectral subtraction algorithm to the hybrid signal, as a function of the estimated noise, to calculate the corrected signal.
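The subtraction step can be sketched in Python using one common textbook form of generalized spectral subtraction, $|\hat{S}(f)|^{\gamma} = \max(|Y(f)|^{\gamma} - \alpha |\hat{N}(f)|^{\gamma}, \beta |Y(f)|^{\gamma})$, with exponent γ, over-subtraction factor α, spectral floor β, and the noisy phase kept. This specific gain rule, its default parameter values and the function name are assumptions for illustration, not necessarily the exact parameterization used by the noise reduction module 34.

```python
from typing import List, Sequence


def generalized_spectral_subtraction(
    spectrum: Sequence[complex],
    noise_mag: Sequence[float],
    alpha: float = 2.0,
    beta: float = 0.01,
    gamma: float = 2.0,
) -> List[complex]:
    """Apply a generalized spectral subtraction gain to one section's spectrum.

    spectrum: complex spectrum Y(f) of the hybrid section.
    noise_mag: estimated noise magnitude |N(f)| per bin.
    alpha: over-subtraction factor; beta: spectral floor; gamma: exponent
    (gamma = 2 is power subtraction, gamma = 1 magnitude subtraction).
    """
    corrected = []
    for y, n in zip(spectrum, noise_mag):
        y_mag = abs(y)
        if y_mag == 0.0:
            corrected.append(0j)
            continue
        # |S|^g / |Y|^g = 1 - alpha * (|N| / |Y|)^g, floored at beta.
        ratio = max(1.0 - alpha * (n / y_mag) ** gamma, beta)
        gain = ratio ** (1.0 / gamma)
        # Real gain: the magnitude is reduced, the noisy phase is kept.
        corrected.append(gain * y)
    return corrected
```

The floor β limits how deeply a bin can be attenuated, which is the usual way of reducing the "musical noise" left by plain subtraction.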

As indicated previously, the processing method operates in real time or near real time, with a latency of approximately 20 to 30 ms, using block processing applied section by section to the input signal.

Thus, at the end of step 130, the processing method returns to the initial step 100; more generally, each of steps 100, 110 (optional), 120 and 130 is repeated for each successive signal section.
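The per-section loop of Figure 4 can be outlined in Python as below. Each step is passed in as a callable because the hybridization, estimation and subtraction details are described elsewhere; the function and parameter names are hypothetical, and the optional step 110 is skipped when no detector is supplied.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple


def process_sections(
    sections: Iterable[Tuple[Any, Any]],
    hybridize: Callable[[Any, Any], Any],
    estimate_noise: Callable[[Any, Any, Optional[int]], Any],
    reduce_noise: Callable[[Any, Any], Any],
    detect_voice: Optional[Callable[[Any], int]] = None,
) -> Iterator[Any]:
    """Yield one corrected section per (aerial, osteophonic) input section."""
    noise_state = None
    for aerial_section, bone_section in sections:
        hybrid = hybridize(aerial_section, bone_section)                # step 100
        dav = detect_voice(hybrid) if detect_voice is not None else None  # step 110
        noise_state = estimate_noise(noise_state, hybrid, dav)           # step 120
        yield reduce_noise(hybrid, noise_state)                          # step 130
```

Using a generator keeps the loop streaming: each corrected section is produced as soon as its input section is available, in line with the block-processing latency constraint.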

In Figure 5, curve 200 shows an example signal from an aerial recording of a speaker talking in a very noisy environment (vehicle noise above 90 dB(A)). Curve 250 in Figure 5 shows the same signal after processing by the processing device 20 according to the invention. The noise is strongly attenuated by the processing device 20, while the portions corresponding to speech remain clearly visible and have good intelligibility.

Figure 6 shows an example of voice activity detection applied to a voice signal recorded by a conventional aerial microphone over successive noise phases, from no noise to loud noise. Curve 300 is the time-domain representation of this signal, on which the voice activity detection decision is superimposed: the gray areas 310 correspond to areas in which the presence of voice has been determined, i.e. DAV = 1; the other (white) areas correspond to areas in which an absence of voice has been determined, i.e. DAV = 0. In Figure 6, curve 320 represents the RMS level of this aerial-microphone signal over time, together with the threshold level to be exceeded for the decision, shown as the dotted horizontal line 330. Curve 340 corresponds to the algorithm's estimate of the RMS level of the background noise during the phases in which voice activity detection has determined an absence of voice.

In this example of Figure 6, the threshold was deliberately chosen low, approximately -40 dBFS, to allow good voice detection in the absence of noise. Indeed, in the noise-free phase, between 0 s and 15 s, the voice clearly emerges from the background and the averaged RMS level exceeds the threshold each time the user speaks. Conventional voice activity detection is therefore correct in the silent part. However, as soon as the noise reaches a moderate level, the averaged RMS level is systematically above the threshold, which is too low. This results in an erroneous determination of voice presence for the whole rest of the signal: voice activity detection becomes ineffective because it cannot separate the contribution of the noise from that of the voice. Since voice activity detection always gives a positive response, the estimate of the noise RMS level is also completely distorted and remains at the value obtained when there was no noise.

Figure 7 is similar to Figure 6, except that the detection threshold has been raised to approximately -20 dBFS. Curve 400 is the time-domain representation of this signal, on which the voice activity detection decision is superimposed: the gray areas 410 correspond to areas in which the presence of voice has been determined, i.e. DAV = 1; the other (white) areas correspond to areas in which an absence of voice has been determined, i.e. DAV = 0. In Figure 7, curve 420 represents the RMS level of this aerial-microphone signal over time, together with the threshold level to be exceeded for the decision, shown as the dotted horizontal line 430. Curve 440 corresponds to the algorithm's estimate of the RMS level of the background noise during the phases in which voice activity detection has determined an absence of voice.

In Figure 7, the person skilled in the art will note that voice detection in the moderate-noise part, between approximately 15 s and 30 s, is fairly correct: when voice is present, the RMS level allows it to be discriminated from the noise. However, as soon as the noise level increases further, this threshold no longer clearly distinguishes voice from noise, and many regions are classified as entirely speech, for example between 34 s and 42 s, whereas they actually contain moments without voice. Worse still, because the threshold is too high, in the noise-free part the state-of-the-art voice activity detection repeatedly mistakes voice for noise, missing some detections or cutting them off too early. This seriously degrades the voice signal. It also completely distorts the noise level estimate (curve 440), which is artificially raised when the person speaks.

Finally, these two examples of Figures 6 and 7, which illustrate the state of the art, show the person skilled in the art that the threshold would have to vary automatically (low during silent phases, higher during noisy phases) for state-of-the-art voice activity detection to give good results with an aerial microphone. With conventional voice activity detection, a fixed threshold cannot suit both a noisy and a quiet environment, in particular because aerial microphones are highly sensitive to their environment.

Figure 8 illustrates the implementation of the processing device 20 according to the invention, in particular voice activity detection according to the invention based on the second signal, coming from the bone-conduction transducer (mechanical bone excitation). It uses the same recording as the examples of Figures 6 and 7, but captured with the second, osteophonic microphone 14, followed by the generalized spectral subtraction algorithm.

Curve 500 is the time-domain representation of this signal, on which the voice activity detection decision is superimposed: the gray areas 510 correspond to areas in which the presence of voice has been determined, i.e. DAV = 1; the other (white) areas correspond to areas in which an absence of voice has been determined, i.e. DAV = 0. In Figure 8, curve 520 represents the RMS level of this signal from the second, osteophonic microphone 14 over time, together with the threshold level to be exceeded for the decision, shown as the dotted horizontal line 530. Curve 540 corresponds to the algorithm's estimate of the RMS level of the background noise during the phases in which voice activity detection has determined an absence of voice.

With the processing device 20 according to the invention, a first striking feature is that the waveform of this filtered (low-pass) osteophonic recording is much less affected by noise. Whatever the noise level, the voice emerges very easily from it. This effect is even more visible in the RMS level of the filtered signal over time: there is a difference of almost 40 dB between the speech peaks and the background noise. Consequently, choosing the threshold becomes easier and offers greater latitude than with the state-of-the-art processing device. Here, for example, the threshold was set arbitrarily at -35 dBFS, and a threshold of -25 dBFS or -45 dBFS would have given similar results. Thanks to this natural emergence, the generalized spectral subtraction algorithm is particularly effective and identifies the voice equally well in all three noise phases.

Finally, thanks to its performance, the processing device 20 according to the invention can precisely detect the time periods in which only noise is present. Averaging the RMS level of the air-conduction microphone only at the times when DAV = 0 therefore gives a good estimate of the background noise level, represented by curve 540.
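A minimal sketch of this noise-level estimate, assuming per-frame DAV decisions aligned with frames of the air-conduction microphone signal. The description does not say how the RMS values are averaged; this sketch assumes a plain arithmetic mean of linear RMS values over all noise-only frames, and the function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def background_noise_rms(air_frames: Sequence[Sequence[float]],
                         dav: Sequence[int]) -> Optional[float]:
    """Average the air-microphone RMS over the frames where DAV == 0 (noise only)."""
    levels = [
        math.sqrt(sum(x * x for x in frame) / len(frame))
        for frame, decision in zip(air_frames, dav)
        if decision == 0 and frame
    ]
    if not levels:
        return None  # no noise-only frame has been observed yet
    return sum(levels) / len(levels)
```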

These results clearly show the benefit of the processing device 20 according to the invention, owing to its significant gains in performance and computational cost compared with the prior-art processing device.

Thus, when the user is in a noisy environment and uses the acoustic apparatus 10, for example with a radio, to communicate with a remote interlocutor, the signal sent to the interlocutor would, without the invention, be degraded by the unwanted capture of part of the background noise. The electronic processing device 20 according to the invention reduces the presence of this background noise in the signal sent to the interlocutor, and in particular separates the voice from this noise, so as to send, as far as possible, only the useful signal to the interlocutor via the radio.

The results obtained with the electronic processing device 20 according to the invention, in particular those presented above with reference to Figures 5 and 8, further show the synergy between voice activity detection based on the signal captured by the second, osteophonic microphone 14 and noise reduction by the generalized spectral subtraction algorithm. This synergy gives very good precision on voice activity, which allows the noise spectrum to be updated effectively. The results of the generalized spectral subtraction algorithm are therefore improved, while using a limited number of computational operations.
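The VAD-gated noise-spectrum update mentioned above can be sketched as follows. The recursive (exponential) averaging and the smoothing factor alpha are assumptions chosen for illustration, not details taken from the patent; the spectra are per-bin power values of the hybrid signal, and the function name is hypothetical.

```python
from typing import List, Sequence


def update_noise_spectrum(noise_psd: Sequence[float], frame_psd: Sequence[float],
                          dav: int, alpha: float = 0.9) -> List[float]:
    """Recursive average of the noise power spectrum, frozen while voice is present."""
    if dav == 1:
        return list(noise_psd)  # voice present: keep the previous noise estimate
    # Noise only: blend the current frame's power spectrum into the estimate.
    return [alpha * n + (1.0 - alpha) * p for n, p in zip(noise_psd, frame_psd)]
```

Freezing the estimate while DAV = 1 is what keeps speech energy out of the noise spectrum, so the quality of the update depends directly on the precision of the voice activity decision.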

It is thus clear that the electronic processing device 20, and the associated processing method, further improve noise reduction in the signal output by the acoustic apparatus 10.

Claims

Electronic processing device (20) for an acoustic apparatus (10), the acoustic apparatus (10) comprising a first microphone (12) comprising an electroacoustic transducer capable of receiving acoustic sound waves of a sound signal coming from a user's vocal cords and of transforming said acoustic waves into a first analog signal; and a second microphone (14) comprising a transducer with mechanical bone excitation capable of receiving, by bone conduction, vibrational oscillations of said sound signal and of transforming said vibrational oscillations into a second analog signal,

the electronic processing device (20) being configured to be connected to the first and second microphones (12,14), to receive the first and second analog signals as input and to output a corrected signal,

the electronic processing device (20) comprising: - a hybridization module (30) configured to calculate a hybrid signal from the first and second analog signals;

characterized in that it further comprises: - an estimation module (32) connected to the hybridization module (30) and configured to estimate noise in the hybrid signal;

- a noise reduction module (34) connected to the hybridization module (30) and to the estimation module (32), the noise reduction module (34) being configured to calculate the corrected signal by applying a generalized spectral subtraction algorithm to the hybrid signal, as a function of the estimated noise.
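For orientation only: one common textbook form of generalized spectral subtraction (in the style of Berouti et al.) subtracts an over-weighted noise estimate from the noisy spectrum raised to an exponent gamma, and clamps the result to a spectral floor. The sketch below applies that rule per frequency bin to magnitude spectra. The parameter names and default values are assumptions, since the patent does not disclose them, and resynthesis (reusing the noisy phase, inverse FFT, overlap-add) is omitted.

```python
from typing import List, Sequence


def generalized_spectral_subtraction(
    noisy_mag: Sequence[float],
    noise_mag: Sequence[float],
    alpha: float = 2.0,   # over-subtraction factor (hypothetical default)
    beta: float = 0.01,   # spectral floor factor (hypothetical default)
    gamma: float = 2.0,   # exponent: 2 = power subtraction, 1 = magnitude subtraction
) -> List[float]:
    """Return enhanced magnitudes, one per frequency bin."""
    out = []
    for y, n in zip(noisy_mag, noise_mag):
        y_g, n_g = y ** gamma, n ** gamma
        s_g = y_g - alpha * n_g      # over-weighted subtraction
        floor = beta * n_g           # spectral floor limits "musical noise"
        out.append(max(s_g, floor) ** (1.0 / gamma))
    return out
```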

Device (20) according to claim 1, in which the hybrid signal comprises several successive sections, and the device (20) further comprises a voice activity detection module (36) connected to the hybridization module (30) and configured to determine a presence or an absence of voice in each section of the hybrid signal; the estimation module (32) then being configured to estimate the noise in the hybrid signal as a function of each section in which an absence of voice has been determined.

Device (20) according to claim 2, in which the voice activity detection module (36) is configured to determine the presence of voice or the absence of voice from the second signal from the bone mechanical excitation transducer;
the voice activity detection module (36) preferably being configured to determine the presence of voice or the absence of voice solely from the second signal, without taking into account the first signal.

Device (20) according to claim 3, in which the second signal comprises several successive sections, and the voice activity detection module (36) is configured to calculate an RMS value for each section of the second signal, then to determine the presence or absence of voice based on the respective RMS value(s).

Device (20) according to claim 4, wherein the voice activity detection module (36) is configured to determine the presence or absence of voice based on an average of the M most recently calculated RMS value(s) and/or on a variation in RMS value between a current RMS value and a previous RMS value, M being an integer greater than or equal to 1;
the voice activity detection module (36) preferably being configured to determine the presence of voice if said average value is greater than or equal to a predefined average threshold (A) or if said RMS value variation is greater than or equal to a predefined variation threshold (B).
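A minimal sketch of the decision rule of this claim, assuming linear RMS values as input and reading the "variation" as the signed increase from the previous RMS value (so that voice onsets trigger it). The class name and the default values of M, A and B are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class RmsVad:
    """Voice if mean of the last M RMS values >= A, or RMS increase >= B."""

    def __init__(self, m: int = 3, threshold_a: float = 0.02, threshold_b: float = 0.01):
        if m < 1:
            raise ValueError("M must be an integer >= 1")
        self.history: Deque[float] = deque(maxlen=m)  # keeps only the M latest values
        self.previous: Optional[float] = None
        self.threshold_a = threshold_a
        self.threshold_b = threshold_b

    def update(self, rms: float) -> int:
        """Feed the RMS value of the current section; return DAV (1 = voice)."""
        self.history.append(rms)
        mean_rms = sum(self.history) / len(self.history)
        variation = 0.0 if self.previous is None else rms - self.previous
        self.previous = rms
        return 1 if mean_rms >= self.threshold_a or variation >= self.threshold_b else 0
```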

Device (20) according to any one of the preceding claims, in which the hybridization module (30) is configured to convert the first analog signal into a first digital signal, as the first analog signal is received, and to generate first successive sections from the first digital signal, each new first section generated comprising samples of a previous first section and new samples of the first digital signal; and the hybridization module (30) is configured to convert the second analog signal into a second digital signal, as the second analog signal is received, and to generate second successive sections from the second digital signal, each new second section generated comprising samples of a previous second section and new samples of the second digital signal;

hybrid sections of the hybrid signal then being calculated progressively from the first and second sections generated; the corrected signal then being calculated from said hybrid sections.
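The overlapping sections of this claim can be sketched as a sliding buffer over already-digitised samples (the analog-to-digital conversion itself is not modelled). The function name, section length and hop size are hypothetical; each yielded section keeps the last section_len - hop samples of the previous section and adds hop new samples.

```python
from collections import deque
from typing import Iterable, Iterator, List


def sliding_sections(samples: Iterable[float], section_len: int,
                     hop: int) -> Iterator[List[float]]:
    """Yield overlapping sections as samples arrive."""
    if not 0 < hop <= section_len:
        raise ValueError("need 0 < hop <= section_len")
    buffer: deque = deque(maxlen=section_len)  # oldest samples drop out automatically
    new_count = 0
    for x in samples:
        buffer.append(x)
        new_count += 1
        # Emit once the buffer is full and `hop` fresh samples have arrived.
        if len(buffer) == section_len and new_count >= hop:
            yield list(buffer)
            new_count = 0
```

With hop equal to half the section length this gives the 50 % overlap often used with short-time spectral processing; hop equal to section_len gives non-overlapping sections.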

Device (20) according to any one of the preceding claims, in which the hybridization module (30) is configured to obtain a first filtered signal by applying to the first signal a first filter associated with a first frequency range; to obtain a second filtered signal by applying to the second signal a second filter associated with a second frequency range; then to calculate the hybrid signal by summing the first filtered signal and the second filtered signal, the second frequency range being distinct from the first frequency range; the first frequency range preferably comprising frequencies higher than those of the second frequency range;

the first and second frequency ranges further preferably being disjoint.
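A minimal sketch of this hybridization, assuming digitised signals and a first-order complementary filter pair: a one-pole low-pass on the osteophonic signal and its complement (input minus low-pass) on the air-conduction signal. The filter choice, function names and coefficient a are assumptions. First-order bands overlap around the crossover, so this illustrates "distinct" ranges rather than the preferred disjoint ones, which would need steeper filters.

```python
from typing import List, Sequence


def one_pole_lowpass(x: Sequence[float], a: float) -> List[float]:
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1]), 0 < a <= 1."""
    y, prev = [], 0.0
    for sample in x:
        prev = prev + a * (sample - prev)
        y.append(prev)
    return y


def hybridize(air: Sequence[float], bone: Sequence[float], a: float = 0.1) -> List[float]:
    """Hybrid = high-pass(air) + low-pass(bone), using a complementary filter pair."""
    bone_low = one_pole_lowpass(bone, a)
    air_low = one_pole_lowpass(air, a)
    air_high = [s - lo for s, lo in zip(air, air_low)]  # complementary high-pass
    return [h + b for h, b in zip(air_high, bone_low)]
```

One property of this complementary pair is that feeding the same signal to both inputs returns that signal unchanged, so the sum itself adds no distortion at the crossover.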

Acoustic apparatus (10) comprising: - a first microphone (12) comprising an electroacoustic transducer capable of receiving acoustic sound waves of a sound signal coming from the vocal cords of a user and of transforming said acoustic waves into a first analog signal;

- a second microphone (14) comprising a transducer with mechanical bone excitation capable of receiving by bone conduction vibrational oscillations of said sound signal and of transforming said vibrational oscillations into a second analog signal;

- an electronic processing device (20) connected to the first and second microphones (12,14), the electronic processing device (20) being configured to receive the first and second analog signals as input, then to output a corrected signal;

characterized in that the electronic processing device (20) is according to any one of the preceding claims.

Processing method, the method being implemented by an electronic processing device (20) connected to first and second microphones (12,14), the first microphone (12) comprising an electroacoustic transducer capable of receiving acoustic sound waves of a sound signal coming from a user's vocal cords and of transforming said acoustic waves into a first analog signal; and the second microphone (14) comprising a transducer with mechanical bone excitation capable of receiving, by bone conduction, vibrational oscillations of said sound signal and of transforming said vibrational oscillations into a second analog signal, the electronic processing device (20) being configured to receive as input the first and second analog signals and to output a corrected signal, the processing method comprising: - a hybridization step (100) comprising the calculation of a hybrid signal from the first and second analog signals;

characterized in that it further comprises: - a step (120) of estimating noise in the hybrid signal; and

- a noise reduction step (130) comprising the calculation of the corrected signal by applying a generalized spectral subtraction algorithm to the hybrid signal and as a function of the estimated noise.

Computer program comprising software instructions which, when executed by a computer, implement a method according to the preceding claim.