EP2795618B1

EP2795618B1 - Method of detecting a predetermined frequency band in an audio data signal, detection device and computer program corresponding thereto

Info

Publication number: EP2795618B1
Application number: EP12816709.5A
Authority: EP
Inventors: Arnault Nagle; Claude Lamblin
Original assignee: Orange SA
Current assignee: Orange SA
Priority date: 2011-12-20
Filing date: 2012-12-11
Publication date: 2017-11-01
Anticipated expiration: 2032-12-11
Also published as: EP2795618A1; CN104137179A; US9431030B2; WO2013093291A1; FR2984580A1; US20160171986A1; US9928852B2; CN104137179B; US20150179190A1

Description

Field of the invention

The present invention relates generally to the field of sound data processing.

This processing is suited in particular to the transmission and/or storage of multimedia signals such as audio signals (speech and/or sounds).

More specifically, the present invention concerns the analysis of an audio signal resulting from such processing.

More precisely, such processing comprises a coding phase of the linear prediction type, LPC (Linear Predictive Coding).

Background of the invention

In the field of compression, coders use properties of the signal such as its harmonic structure, exploited by long-term prediction filters, and its local stationarity, exploited by short-term prediction filters. Typically, a speech signal can be considered stationary over time intervals of, for example, 10 to 20 ms. It is therefore possible to analyze this signal in blocks of samples called frames, after appropriate windowing. Short-term correlations can be modeled by time-varying linear filters whose coefficients are obtained by linear prediction analysis on short frames (10 to 20 ms in the above example). Document US 2008/0059166 describes a scalable coder for an audio signal.
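The framing step described above can be sketched as follows. This is a minimal illustration only: the text speaks of "appropriate windowing" without naming a window, so the Hamming window, the non-overlapping 20 ms frames and the function name `frame_signal` are all assumptions made for the example.

```python
import math


def frame_signal(samples, sample_rate_hz, frame_ms=20):
    """Split a signal into consecutive windowed frames.

    Assumptions (not taken from the source): frames do not overlap,
    a Hamming window is applied, and trailing samples that do not
    fill a whole frame are dropped.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    # Hamming window of the same length as one frame.
    window = [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frames.append(
            [samples[start + n] * window[n] for n in range(frame_len)]
        )
    return frames
```

At a 16 kHz sampling frequency, a 20 ms frame holds 320 samples, so a 50 ms signal (800 samples) yields two complete frames. Practical coders usually overlap successive analysis windows, which this sketch does not do.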

LPC linear prediction coding is one of the most widely used digital coding techniques, particularly in mobile telephony, notably in the 3GPP AMR-WB coder as described in the document "3GPP TS 26.190 V10.0.0 (2011-03) 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions (Release 10)". LPC coding consists in performing an LPC analysis of the signal to be coded in order to determine an LPC filter, then quantizing this filter on the one hand, and modeling and coding the excitation signal on the other. This LPC analysis is performed by minimizing the prediction error on the signal to be modeled or on a modified version of this signal. The autoregressive linear prediction model of order P determines a signal sample at an instant n as a linear combination of the P past samples (principle of prediction). The short-term prediction filter, denoted A(z), models the spectral envelope of the signal:

$A (z) = \sum_{i = 0}^{P} - a_{i} \times z^{- i}$

The difference between the signal S(n) at instant n and its predicted value S̃(n) is the prediction error:

$e (n) = S (n) - \tilde{S} (n) = S (n) + \sum_{i = 1}^{P} a_{i} S (n - i)$

The prediction coefficients are computed by minimizing the energy E of the prediction error, given by:

$E = \sum_{n} e {(n)}^{2} = \sum_{n} {(S (n) + \sum_{i = 1}^{P} a_{i} S (n - i))}^{2}$

Solving this system is well known, in particular by means of the Levinson-Durbin algorithm or the Schur algorithm.
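By way of illustration, the Levinson-Durbin recursion can be sketched as below. The sketch assumes the autocorrelation method (the frame is treated as zero outside its bounds) and uses the sign convention of the prediction error above, e(n) = S(n) + Σ a_i S(n - i). The function names `autocorrelation` and `levinson_durbin` are chosen for this example and do not come from the text.

```python
def autocorrelation(frame, order):
    """Return r[0..order], the autocorrelation of one (windowed) frame."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t - lag] for t in range(lag, n))
        for lag in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations for a_1..a_P.

    Returns (coefficients, reflection_coefficients, residual_energy),
    with e(n) = S(n) + sum_i a_i * S(n - i).
    """
    a = [1.0] + [0.0] * order  # a[0] is fixed to 1
    err = r[0]                 # prediction error energy at order 0
    reflection = []
    for m in range(1, order + 1):
        if err <= 0.0:
            # Silent or perfectly predictable frame: stop early and
            # leave the remaining coefficients at zero.
            break
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        reflection.append(k)
        updated = a[:]
        for i in range(1, m):
            updated[i] = a[i] + k * a[m - i]
        updated[m] = k
        a = updated
        err *= 1.0 - k * k
    return a[1:], reflection, err
```

For an autocorrelation sequence r = [1, 0.9, 0.81], which corresponds to a first-order autoregressive signal, an order-2 analysis gives a_1 close to -0.9 and a_2 close to 0, with a residual energy close to 0.19. The intermediate values k are the reflection (PARCOR) coefficients.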

The coefficients a_i of the filter must be transmitted to the receiver. However, since these coefficients do not have good quantization properties, transformed representations are preferably used. Among the most common are:

PARCOR coefficients (abbreviation of "PARtial CORrelation"), which are reflection coefficients or partial correlation coefficients,
Log Area Ratios (LAR) of the PARCOR coefficients,
Line Spectral Pairs (LSP).
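As a small illustration of one of these transformations, the sketch below converts reflection (PARCOR) coefficients to log area ratios and back. It assumes the convention LAR = ln((1 + k) / (1 - k)); the opposite sign convention is also found in the literature, and the function names are illustrative.

```python
import math


def parcor_to_lar(parcor):
    """Map reflection coefficients k (|k| < 1) to log area ratios.

    Assumed convention: LAR = ln((1 + k) / (1 - k)).
    """
    return [math.log((1.0 + k) / (1.0 - k)) for k in parcor]


def lar_to_parcor(lar):
    """Inverse mapping: k = tanh(LAR / 2) under the same convention."""
    return [math.tanh(g / 2.0) for g in lar]
```

The mapping stretches the region near |k| = 1, where the filter response is most sensitive to quantization error, which is one reason such transformed coefficients quantize better than the a_i themselves.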

LSP coefficients are now the most widely used representation of the LPC filter because they lend themselves well to vector quantization.

Other equivalent representations of the LSP coefficients exist:

LSF coefficients (Line Spectral Frequencies),
ISP coefficients (Immittance Spectral Pairs),
or ISF coefficients (Immittance Spectral Frequencies).

The LPC linear prediction coding technique allows a substantial reduction in bit rate while delivering high audio reproduction quality. However, linear prediction coding lends itself poorly to certain applications that process coded audio signals, such as detecting a predetermined frequency band in such coded signals.

It should be recalled that such detection may prove useful, or even necessary, given the currently growing multiplicity of audio compression formats.

Indeed, to offer mobility and continuity, modern and innovative multimedia communication services must be able to operate in a wide variety of conditions. The dynamism of the multimedia communication sector and the heterogeneity of networks, access technologies and terminals have led to a proliferation of compression formats whose presence in communication chains requires several codings, either in cascade (transcoding) or in parallel (multi-format or multi-mode coding).

In addition to the linear prediction coding technique mentioned above, there are other audio compression techniques that reduce the bit rate while maintaining good quality, such as:

PCM (Pulse Code Modulation) techniques,
and frequency transform techniques such as those of the MDCT (Modified Discrete Cosine Transform) or FFT (Fast Fourier Transform) type.

Some coders combine different coding techniques. Thus, in the document Combescure P., Schnitzler J., Fischer K., Kircherr R., Lamblin C., Le Guyader A., Massaloux D., Quinquis C., Stegmann J., Vary P., A 16, 24, 32 kbit/s wideband speech codec based on ATCELP, in IEEE International Conference on Acoustics, Speech, and Signal Processing, 1999 (ICASSP99), Page(s): 5 - 8 vol.1, it is proposed to combine an MDCT-type frequency transform technique with a CELP (Code Excited Linear Prediction) linear prediction coding technique in order to code wideband signals, the switching between the two technologies being controlled by a classification of the signal.

Transcoding is necessary when, in a transmission chain, a compressed signal frame emitted by a coder can no longer continue on its way in that format. Transcoding converts this frame into another format compatible with the rest of the transmission chain. The most elementary solution (and the most common at present) is to place a decoder and a coder end to end. The compressed frame arrives in a first format and is decompressed. The decompressed signal is then compressed again in a second format accepted by the rest of the communication chain. This cascading of a decoder and a coder is called a tandem.

In the particular case of a tandem, coders that respectively code different frequency bands may be cascaded. Thus, a coder operating in a wide frequency band [50 Hz-7 kHz], also called the WB (WideBand) band, may have to code audio content that occupies a narrower frequency band than the wideband. For example, the content to be coded by a 3GPP AMR-WB coder as mentioned above, although sampled at 16 kHz, may in fact be limited to the telephone band if that content was previously coded by a coder operating in a narrow frequency band [300 Hz, 3400 Hz], also called the NB (NarrowBand) band. It is also possible that the limited acoustic quality of the sending terminal does not allow the whole wideband to be covered.

It thus appears that the audio band of a stream coded by a coder operating on signals sampled at a given sampling frequency may be much narrower than the band actually supported by the coder.

Among the audio signal processing applications that advantageously exploit knowledge of the audio frequency band of the content to be processed are:

the classification of audio signals,
automatic speech recognition,
speech-to-text (STT) conversion of radio or television broadcasts containing narrowband passages,
digital watermarking,
non-intrusive analysis of streams by probes placed on the media plane in networks, which makes it possible in particular to detect a change of band in the transported content and possibly the duration of said content in a given band, within the network following this change of band,
the display on a mobile terminal of an "HD Voice" (High-Definition Voice) logo, as approved by the GSMA in August 2011 for mobile networks and terminals and as described in the document available at the Internet address: http://www.gsm.org/membership/industry_logos.htm,
an indicator of the number of calls left in wideband on a mobile voicemail service.

Among the known methods for detecting the frequency band of a digital audio signal, some operate in the signal domain (original or decoded) and others operate in the coded domain.

Detection of the frequency band in the signal domain relies on a spectral analysis of the digital audio signal. By way of example, such detection is implemented in the 3GPP2 VMR-WB codec as described in the document 3GPP2 C.S0052-0 (June 11, 2004) "Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB) Service Option 62 for Spread Spectrum Systems", in order to detect narrowband audio content that has been upsampled to the 16 kHz sampling frequency specific to this codec.

The aforementioned codec performs a spectral analysis of the time-domain signal (after downsampling to 12.8 kHz, high-pass filtering and pre-emphasis) by carrying out two FFT frequency transforms over 256 samples per frame, to obtain two sets of spectral parameters per frame. The spectrum obtained by the FFT analysis is divided into 20 critical bands, the number of frequency bins in these 20 bands being M_CB = {2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 8, 9, 11, 14, 18, 21}. The energy in each critical band is then computed according to the formula:

$E_{CB} (i) = \frac{1}{(L_{FFT} / 2)^{2} M_{CB} (i)} \sum_{k = 0}^{M_{CB} (i) - 1} (X_{R}^{2} (k + j_{i}) + X_{I}^{2} (k + j_{i})), i = 0, \dots, 19$

where the index j_i is the index of the first bin of band i ($j_{i} = \sum_{k = 0}^{i - 1} M_{CB} (k) + 1$), and X_R(k) and X_I(k) are the real and imaginary parts of the FFT spectrum.
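The band energy formula can be sketched in code as follows. The sketch assumes that L_FFT denotes the FFT length (256 samples, as stated for the analysis above) and that the spectrum is supplied as two lists holding the real and imaginary parts of at least the first 128 bins. The function name is illustrative.

```python
# Number of FFT bins in each of the 20 critical bands (from the text).
M_CB = [2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 8, 9, 11, 14, 18, 21]
L_FFT = 256  # assumption: L_FFT is the FFT length


def critical_band_energies(x_re, x_im):
    """Return the 20 energies E_CB(i) from an FFT spectrum.

    x_re[k] and x_im[k] are the real and imaginary parts of bin k.
    """
    norm = (L_FFT / 2) ** 2
    energies = []
    first_bin = 1  # j_0 = 1, so bin 0 (DC) belongs to no band
    for width in M_CB:
        total = sum(
            x_re[k] ** 2 + x_im[k] ** 2
            for k in range(first_bin, first_bin + width)
        )
        energies.append(total / (norm * width))
        first_bin += width  # j_{i+1} = j_i + M_CB(i)
    return energies
```

The band widths sum to 127, so the 20 bands cover bins 1 to 127 of the 256-point transform. With the normalization by (L_FFT / 2)^2, a spectrum whose bins all have magnitude 128 gives an energy of 1 in every band.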

In order to process upsampled narrowband signals correctly, a detection algorithm is applied to detect such signals. It consists in testing the smoothed energy level in the last two bands.
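A test of this kind might be sketched as below. The text does not give the smoothing rule or the threshold, so everything here is hypothetical: the first-order recursive smoothing, the default values of `alpha` and `threshold`, and the class name `NarrowbandDetector` are assumptions for illustration and are not the rule specified for VMR-WB.

```python
class NarrowbandDetector:
    """Flag narrowband content from per-frame critical band energies.

    Hypothetical sketch: averages the last two band energies, smooths
    the result over frames, and compares it with a fixed threshold.
    """

    def __init__(self, alpha=0.99, threshold=1.0):
        self.alpha = alpha          # smoothing factor (assumed value)
        self.threshold = threshold  # decision threshold (assumed value)
        self.smoothed = None

    def update(self, band_energies):
        """Process one frame; return True if the signal looks narrowband."""
        top = 0.5 * (band_energies[-1] + band_energies[-2])
        if self.smoothed is None:
            self.smoothed = top  # initialize on the first frame
        else:
            self.smoothed = (
                self.alpha * self.smoothed + (1.0 - self.alpha) * top
            )
        return self.smoothed < self.threshold
```

Smoothing over frames keeps the decision from flipping on isolated frames, at the price of a slower reaction when the band of the content actually changes.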

As an alternative to the aforementioned FFT, other frequency transforms can be used, such as the MDCT (Modified Discrete Cosine Transform).

Detection of the frequency band in the coded domain can, for its part, rely on prior decoding of the coded signal followed by application of the above spectral analysis techniques, as used in the signal domain to analyze original audio content (uncoded or before coding). However, decoding increases the complexity and delay of the processing. In many applications it is therefore desirable, in order to avoid these problems of complexity and/or delay, to extract the characteristics of the signal without fully decoding it.

Several analysis techniques in the coded domain have been proposed. They concern transform or sub-band coders such as MPEG coders (e.g. MP3, AAC, etc.).

In such coders, the coded stream does indeed contain coded spectral coefficients, such as the MDCT coefficients in the MP3 coder. Thus, in the document Liaoyu Chang, Xiaoqing Yu, Haiying Tan, Wanggen Wan, Research and Application of Audio Feature in Compressed Domain, IET Conference on Wireless, Mobile and Sensor Networks, 2007. (CCWMSN07), Page(s): 390 - 393, 2007, it is proposed, rather than decoding the whole coded audio signal, to decode only the MDCT coefficients, which alone make it possible to determine the spectral characteristics of the coded signal. The bandwidth BW of the coded audio content is thus determined from these MDCT coefficients using the following expression:

$BW = Max \{i | {SMRS}_{i} \geq T_{SRMS}\} - Min \{i | {SMRS}_{i} \leq T_{SRMS}\}$

where SMRS_i is the square root of the energy of the i-th band (${SMRS}_{i} = \sqrt{\frac{1}{N_{i}} \sum_{j} S_{i, j}^{2}}$, where S_i,j denotes the j-th coefficient of the i-th band and N_i the number of coefficients in the i-th band) and T_SRMS is a threshold.
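The expression above can be sketched in code as follows, taking the formula literally. The input layout (one list of decoded MDCT coefficients per band, each band non-empty), the return of `None` when either index set is empty, and the function names are assumptions made for the example.

```python
import math


def band_rms(bands):
    """SMRS_i: root of the mean squared coefficient of each band."""
    return [
        math.sqrt(sum(c * c for c in band) / len(band))
        for band in bands
    ]


def bandwidth_in_bands(bands, threshold):
    """BW = max{i : SMRS_i >= T} - min{i : SMRS_i <= T}.

    The result is expressed in band indices, not in hertz. Returns
    None when no band is above, or no band is below, the threshold.
    """
    rms = band_rms(bands)
    above = [i for i, v in enumerate(rms) if v >= threshold]
    below = [i for i, v in enumerate(rms) if v <= threshold]
    if not above or not below:
        return None
    return max(above) - min(below)
```

For four bands with coefficients [[0, 0], [3, 4], [3, 4], [0, 0]] and a threshold of 1, the band values are about 0, 3.54, 3.54 and 0, so the highest band at or above the threshold is band 2, the lowest band at or below it is band 0, and the function returns 2.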

The methods just described for detecting the frequency band of a digital audio signal rely mainly on a frequency analysis of the signal spectrum. When the audio content has been coded by a frequency transform, detection of the audio frequency band in the coded content advantageously exploits the spectral information contained in the coded bitstream without fully decoding the signal. This significantly reduces the complexity of detection by eliminating the costly operations required for complete decoding and spectral analysis (FFT- or MDCT-based) of the coded audio signal.

However, while transform compression technologies are very widespread in audio coding (high bit rates, high sampling frequency), this is not the case in speech coding, where coding methods mostly use linear prediction compression technologies as described above, which nevertheless rely on modeling the spectral envelope of the signal by the linear prediction coefficients of the short-term LPC filter and the various transformations (e.g. LSP) used for quantization.

One solution for determining the audio frequency band of a signal coded by a linear prediction coder consists in decoding the signal and then applying to it a signal-domain frequency band detection method such as the one described above. However, such a solution proves very costly in computational complexity, thereby causing undesired consumption of the resources of the central processing unit (CPU). The computational complexity stems from the FFT or MDCT frequency transforms, which remain complex operations.

Moreover, although the decoded signal is available in some of the aforementioned audio signal processing applications that benefit from knowledge of the audio frequency band, such as the application that displays an "HD Voice" logo on a mobile terminal, this is not the case for all applications. Thus, for example, in the application indicating the number of calls left in wideband on a mobile voicemail service, the complexity of decoding must be added to the complexity of the time-frequency transform and of the detection of the audio band from the per-band energies. Yet in a coder such as, in particular, the aforementioned AMR-WB coder, decoding represents 20% of the total complexity of the coder, itself estimated at around 40 WMOPS (Weighted Millions of Operations Per Second), that is, roughly 8 WMOPS for decoding alone.

As indicated above, some coders combine linear prediction coding techniques with other compression techniques such as MDCT-type frequency transform coding techniques. One could then settle for performing the detection only on the audio signal blocks coded by a frequency transform technique, using a prior-art method for these blocks. However, this solution would harm the responsiveness of the detection because, depending on the type of content and/or the bit rate, linear prediction coding may be used most of the time.

Object and summary of the invention

One of the aims of the invention is to remedy drawbacks of the aforementioned prior art.

To this end, one object of the present invention relates to a detection method according to claim 1. Such an arrangement makes it possible to identify, at low computational cost, whether or not the audio frequency band of content previously coded by a linear prediction coder is narrower than the audio frequency band in which that coder operates.

For example, in the case of the AMR-WB coder, for which the signal is sampled at 16 kHz and then downsampled to 12.8 kHz for LPC analysis, the invention makes it possible to determine, for instance, the presence of audio content at frequencies above 4 kHz.

Such an arrangement is particularly advantageous in that it does not necessarily require complete decoding of the audio signal. The invention can thus be advantageously implemented in frequency band detection applications that do not need to decode the coded audio signal, such as the indicator of the number of wideband calls left on a mobile voicemail.

Owing to the simplicity of such a detection, which is based mainly on analyzing differences in the distributions of only some of the decoded linear prediction spectral parameters, the performance of the detection is optimized. In addition, the computational complexity of such a detection is markedly lower than that incurred by prior-art frequency band detection methods, which apply FFT or MDCT frequency transforms to decoded signals.

In a particular embodiment, all the spectral parameters of the aforementioned set of spectral parameters are decoded beforehand.

Such an arrangement makes it possible to detect simply the frequency band of decoded audio content, through direct access to the decoded linear prediction parameters associated with that content, and without adding further complexity (complete decoding, time-frequency transform).

Thus, for example, the invention is particularly suited to implementation in a fixed or mobile communication terminal, which by nature comprises an audio coder and decoder, and more specifically to the application in that terminal which displays an "HD Voice" logo on its screen.

In yet another embodiment, where the succession of data blocks includes some blocks that each contain a set of spectral parameters representing a linear prediction filter and other blocks that each contain a set of spectral parameters obtained by frequency transformation, only the blocks that each contain a set of spectral parameters representing a linear prediction filter are considered for the detection according to the invention.

For the blocks that each contain a set of spectral parameters obtained by frequency transformation, a prior-art frequency band detection method may, for example, be applied.

In another particular embodiment, when the predetermined frequency band to be detected is the high-frequency band, the determining step preferably consists in searching for the index of the first spectral parameter greater than a threshold frequency.

According to the invention, the high-frequency band means the band of frequencies above a certain threshold. For example, in wideband, the high-frequency band may be taken to correspond to frequencies above 4 kHz (or 3.4 kHz). More generally, for a signal sampled at a sampling frequency Fe and with a bandwidth less than or equal to 0.5 Fe, the high-frequency band is the band of frequencies above α'·0.5·Fe (0<α'<1), α' being adjustable.

Similarly, the low-frequency band means the band of frequencies below a certain threshold. When the predetermined frequency band to be detected is the low-frequency band, said determining step preferably consists in searching for the index of the last spectral parameter lower than a threshold frequency.

Such an arrangement thus makes it possible to implement the invention, for example, in HD-quality voice processing applications, in particular in a mobile communication terminal capable of operating in the aforementioned frequency range, in a voicemail server capable of processing HD audio content, or even in a probe inserted in the path of an audio stream of a communication network.

In yet another particular embodiment, the current block contains data representative of voice activity.

Such an optional arrangement makes it possible, in the particular case where a band located in the high frequencies is to be detected in the coded audio signal, to further reduce the complexity of the detection method by performing the detection not on all frames containing at least one set of spectral parameters representing a linear prediction filter, but only on relevant frames likely to contain high frequencies, that is, those likely to contain voice and/or music data.

In yet another particular embodiment, the criterion is calculated by comparison between:

the maximum value of the distance between two neighboring decoded spectral parameters, estimated with respect to the value of the index of the first decoded spectral parameter obtained at the end of the determining step,
the minimum value of the distance between two neighboring decoded spectral parameters, estimated with respect to the value of the index of the first decoded spectral parameter obtained at the end of the determining step.
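By way of illustration only, the comparison between the maximum and minimum distances can be sketched in Python as a ratio of the largest gap to the smallest gap between neighboring parameters, taken from the index returned by the determining step onward. The ratio form, the function name `spacing_ratio`, the choice of which gaps are included and the sample values are assumptions made for this sketch; the description does not fix them.

```python
from typing import Sequence


def spacing_ratio(params: Sequence[float], i_start: int) -> float:
    """Ratio of the largest to the smallest gap between neighboring
    parameters, restricted to params[i_start:] (assumed convention)."""
    tail = params[i_start:]
    if len(tail) < 2:
        raise ValueError("need at least two parameters from i_start onward")
    # Distances between consecutive parameters of the ordered subset.
    gaps = [b - a for a, b in zip(tail, tail[1:])]
    smallest = min(gaps)
    if smallest <= 0:
        raise ValueError("parameters must be strictly increasing")
    return max(gaps) / smallest


# Hypothetical decoded spectral parameters in Hz (not taken from the patent).
example = [300.0, 900.0, 1500.0, 2300.0, 4100.0, 4700.0, 5300.0, 6200.0]
ratio = spacing_ratio(example, i_start=4)
```

A decision could then be taken by comparing `ratio` with a tuned threshold; the threshold value and the direction of the comparison would have to be chosen from measurements and are not assumed here.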

Such an arrangement makes it possible to establish, from a simple calculation, whether the predetermined frequency band is detected, while respecting a trade-off between complexity, reliability and responsiveness of the detection.

As a variant, the aforementioned criterion is calculated using a mathematical function that takes as a parameter at least the index of the first decoded spectral parameter obtained at the end of the aforementioned determining step.

In yet another particular embodiment, following the decision step implemented for the current block, a global decision step is implemented by smoothing the result of this decision step together with K earlier decision results, relating respectively to the K blocks preceding the current block. Smoothing the local, per-block detections over several blocks increases the reliability of the detection and, for example, guards against audio content that is genuinely narrowband for a few frames (e.g. noise).
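The smoothing rule itself is not specified at this point. As a minimal sketch, assuming a simple majority vote over the current local decision and the K previous ones, the global decision step might look as follows in Python. The class name `SmoothedDetector` and the majority rule are hypothetical choices for illustration; other smoothing rules would fit the description equally well.

```python
from collections import deque


class SmoothedDetector:
    """Global decision by majority vote over the current local decision
    and up to K previous ones (assumed smoothing rule)."""

    def __init__(self, k: int) -> None:
        # Keep the current decision plus the K preceding ones.
        self.history = deque(maxlen=k + 1)

    def update(self, local_decision: bool) -> bool:
        self.history.append(local_decision)
        # Strict majority of the stored local decisions.
        return 2 * sum(self.history) > len(self.history)


detector = SmoothedDetector(k=4)
local = [True, True, False, True, True, False, False, False]
global_decisions = [detector.update(d) for d in local]
```

With this rule, an isolated block whose local decision differs from its neighbors does not flip the global decision, which is the behavior the smoothing is meant to provide.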

Correspondingly, the invention relates to a detection device according to claim 9. In particular, such a detection device is intended to implement all the embodiments of the detection method mentioned above. In other particular embodiments, the detection device is suitable for being contained in a communication terminal, in a voicemail server or in a probe.

The invention also relates to a computer program according to claim 11. Such a program may use any programming language and be in the form of source code, object code, or code intermediate between source code and object code, such as a partially compiled form, or in any other desirable form.

Yet another object of the invention is a computer-readable recording medium according to claim 12. The recording medium may be any entity or device capable of storing the program. For example, such a medium may comprise a storage means, such as a ROM, for example a CD-ROM or a microelectronic circuit ROM, or a magnetic recording means, for example a floppy disk or a hard disk.

Furthermore, such a recording medium may be a transmissible medium such as an electrical or optical signal, which may be conveyed via an electrical or optical cable, by radio or by other means. The program according to the invention may in particular be downloaded over an Internet-type network.

Alternatively, such a recording medium may be an integrated circuit in which the program is incorporated, the circuit being adapted to execute the method in question or to be used in its execution.

The aforementioned detection device and computer program offer at least the same advantages as those conferred by the detection method according to the present invention.

Brief description of the drawings

Other features and advantages will become apparent on reading the preferred embodiments described with reference to the figures, in which:

figure 1 shows the main steps of the detection method according to the invention,
figure 2 shows an embodiment of a detection device according to the invention,
figure 3 shows various examples of threshold frequency values used in the detection method and device according to the invention,
figure 4A shows a histogram of the index of the first spectral parameter greater than 4 kHz, for the blocks coded by the AMR-WB coder that contain data representative of voice activity (flagVAD=1),
figure 4B shows a histogram of the index of the first spectral parameter greater than 4 kHz, for all the blocks coded by the AMR-WB coder, regardless of the voice activity indication,
figure 5A shows a cumulative histogram of the ratio between the maximum difference and the minimum difference between two successive spectral parameters, starting from the index of the first spectral parameter greater than 4 kHz, for the blocks coded by the AMR-WB coder that contain data representative of voice activity (flagVAD=1),
figure 5B shows a cumulative histogram of the ratio between the maximum difference and the minimum difference between two successive spectral parameters, starting from the index of the first spectral parameter greater than 4 kHz, for all the blocks coded by the AMR-WB coder, regardless of the voice activity indication,
figure 6A shows a mobile communication terminal able to implement the detection method shown in figure 1,
figure 6B shows a voicemail server able to implement the detection method shown in figure 1.

General principle of the detection process

The general principle of the invention will now be described with reference to figures 1 and 2.

In figure 1, the frequency band detection method according to the invention is represented as an algorithm comprising steps S0 to S4.

The aforementioned detection method is implemented in software or hardware in a detection device DET shown in figure 2, which comprises for this purpose a processing module TR dedicated to the detection.

With a view to detecting a predetermined frequency band in a given audio signal, such a detection device DET is intended to be arranged:

either in association with an audio decoder, so as to retrieve certain decoded parameters associated with said decoded audio signal, which are described later in the description,
or independently of the decoder, so as to read the coded audio signal and then partially decode certain coded parameters associated with said coded audio signal, which are described later in the description,
or in the path of a coded audio signal, so as to read said signal and then partially decode certain coded parameters associated with said coded audio signal, which are described later in the description.

Where the detection device DET is arranged in an audio decoder, it is contained, for example, in a fixed or mobile communication terminal.

Where the detection device DET is arranged independently of the decoder or in the path of a coded audio signal, it is contained, for example, in an element of the audio signal transmission chain (e.g. a messaging server in which audio messages are stored without decoding).

Before the method for detecting a predetermined frequency band in an audio signal is implemented, this signal is coded, having first been sampled at a predetermined sampling frequency Fe.

According to the invention, the coding of said signal is performed, for example, in a linear prediction coder using short-term LPC spectral parameters, such as ISP coefficients or an associated representation, covering at least part of the frequency spectrum (normalized or not).

Said coder is, for example, the 3GPP AMR-WB coder mentioned earlier in the description.

As an alternative, said signal could be coded by a coder such as the one mentioned earlier in the description, which combines an MDCT-type frequency transform technique with a CELP-type linear prediction coding technique.

In the example shown, the sampling frequency is 16 kHz, which is the nominal sampling frequency of the AMR-WB coder operating in the useful band from 50 Hz to 7 kHz.

At the end of the linear prediction coding step performed in the AMR-WB coder, a plurality Z of consecutive data blocks B₁, B₂, ..., B_Z is obtained, as shown in figures 1 and 2. Each block contains at least one set of spectral parameters representing a linear prediction filter.

In the case of the aforementioned alternative, the coding step yields a plurality of consecutive data blocks, some of which contain at least one set of spectral parameters representing a linear prediction filter and others of which contain at least one set of spectral parameters obtained by frequency transform.

The method for detecting a predetermined frequency band of the audio signal that has just been coded is then implemented, based on an analysis of each of the aforementioned blocks.

The detection method according to the invention applies only to the blocks that contain at least one set of spectral parameters representing a linear prediction filter, a plurality of these parameters having been decoded beforehand.

In the case of the aforementioned alternative, for the blocks that each contain a set of spectral parameters obtained by frequency transform, a prior-art frequency band detection method may, for example, be applied.

In accordance with the embodiment, the predetermined frequency band is the HF band of wideband content.

During a step S1 shown in figure 1, a current block B_n is processed (n being an integer such that 1≤n≤Z). The current block B_n contains M previously decoded spectral parameters p(i_k), including an ordered subset of M' (M'≤M) spectral parameters that extends, for example, between the indices i_min and i_max, such that p(i_min)<...<p(i_k)<...<p(i_max), where i_min is the index of the smallest spectral parameter of said subset and i_max is the index of the largest spectral parameter of said subset.

For the sake of brevity, the following describes the case where the spectral parameters of the ordered subset satisfy the relation p(i)<p(j) if i<j, with i, j ∈ {i_min,...,i_max}. It is obvious to a person skilled in the art that the invention also applies to other cases, for example the case where the spectral parameters of the ordered subset satisfy the relation p(i)>p(j) if i<j, with i, j ∈ {i_min,...,i_max}.

The aforementioned step S1 is implemented by a first software calculation sub-module CAL1 of the detection device DET, as shown in figure 2.

For this purpose, the calculation sub-module CAL1 determines, among said M' spectral parameters, the index i_F of the first spectral parameter that is closest to a threshold frequency F_th, said threshold frequency being determined from the sampling frequency F_e of said audio signal:

$i_{F} = \underset{i \in \{i_{\min}, \dots, i_{\max}\}}{\arg\min} |p(i) - F_{th}|$

In the example shown, F_th = αF_e (α<0.5), where α is an adjustable parameter. Figure 3 shows different possible values of F_th depending on the sampling frequency F_e used and the value of the parameter α.

More particularly, during step S1, the calculation sub-module CAL1 searches for the index i_HF of the first spectral parameter p(i_k) greater than or equal to F_th, according to the following operation:

$i_{HF} = \min (\underset{i \in \{i_{\min}, \dots, i_{\max}\}}{\arg} (p(i) \geq F_{th}))$

Or conversely, during step S1, the calculation sub-module CAL1 searches for the index i_BF of the last spectral parameter p(i) less than or equal to F_th, according to the following operation:

$i_{BF} = \max (\underset{i \in \{i_{\min}, \dots, i_{\max}\}}{\arg} (p(i) \leq F_{th}))$
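The three index searches of step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `threshold_indices`, the 0-based list indexing and the sample values are assumptions, and the parameters are assumed to be increasing over the ordered subset.

```python
def threshold_indices(p, f_th, i_min, i_max):
    """Return (i_F, i_HF, i_BF) for the ordered subset p[i_min..i_max].

    Hypothetical helper: p is assumed to be increasing on the subset.
    i_HF (or i_BF) is None when no parameter satisfies the test.
    """
    idx = range(i_min, i_max + 1)
    # i_F: index of the parameter closest to the threshold
    # (min() keeps the first index when two parameters are equally close).
    i_f = min(idx, key=lambda i: abs(p[i] - f_th))
    # i_HF: smallest index whose parameter is >= F_th.
    i_hf = next((i for i in idx if p[i] >= f_th), None)
    # i_BF: largest index whose parameter is <= F_th.
    i_bf = next((i for i in reversed(idx) if p[i] <= f_th), None)
    return i_f, i_hf, i_bf


# Example values chosen for illustration only.
alpha, f_e = 0.25, 16000.0          # F_th = alpha * F_e, with alpha < 0.5
f_th = alpha * f_e                  # 4000.0 Hz
p = [100.0, 900.0, 2100.0, 3900.0, 4300.0, 5600.0]
print(threshold_indices(p, f_th, 0, len(p) - 1))
```

Returning `None` for an empty search is a design choice of this sketch; the text does not say how a block with no parameter on one side of the threshold should be handled.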

Preferably, step S1 is preceded by a preselection step S0, during which only those blocks that contain data representative of voice activity are preselected from among the blocks B₁, B₂, ..., B_Z.

The voice activity of such blocks is conventionally detected when they are coded, by a voice activity detection module VAD (Voice Activity Detection), which:

either uses the information available in the block (e.g. VAD indicator = 1 in the coded block, "DTX on" mode of the discontinuous transmission module DTX (Discontinuous Transmission), classification of the coded block as containing voice activity when the block has been coded by an EVRC (Enhanced Variable Rate CODEC) coder),
or computes a voice activity criterion from the coded audio signal.

The preselection step S0 is implemented by a preselection software module PRES shown in Figure 2.

Since step S0 is optional, it is shown in dotted lines in Figure 1. Correspondingly, the module PRES of Figure 2 is also shown in dotted lines.
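As a rough sketch of the optional preselection S0, assuming each decoded block is represented as a dictionary with a `"vad"` flag and a list `"p"` of spectral parameters (a hypothetical layout, not one defined by the text):

```python
def preselect_active(blocks):
    """Step S0 sketch: keep only blocks flagged as containing voice activity.

    Each block is assumed to be a dict with a "vad" entry (1 = active).
    """
    return [block for block in blocks if block.get("vad") == 1]


blocks = [
    {"vad": 1, "p": [300.0, 1200.0, 4100.0]},
    {"vad": 0, "p": [250.0, 1100.0, 3900.0]},
    {"vad": 1, "p": [280.0, 1500.0, 4500.0]},
]
active = preselect_active(blocks)
print(len(active))
```

Only the first of the two VAD options described above (reading a flag carried by the block) is covered here; computing an activity criterion from the coded signal is outside this sketch.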

Next, during a step S2 shown in Figure 1, at least one criterion is calculated from said determined index i_F. Such a step is implemented by a second calculation software sub-module CAL2 of the detection device DET, as shown in Figure 2.

According to a first variant embodiment, such a criterion is based on comparing the "distance" between two successive spectral parameters, evaluated relative to the determined index i_F.

Such a distance is evaluated according to the relation below:

$d(i) = dist(p(i), p(i - 1))$

Preferably, such a distance is the simple difference between two successive spectral parameters:

$d(i) = dist(p(i), p(i - 1)) = p(i) - p(i - 1)$

More precisely, the software sub-module CAL2 first calculates, respectively:

the maximum value d_max of the distance between two neighboring spectral parameters, estimated relative to the determined index i_F, and
the minimum value d_min of the distance between two neighboring spectral parameters, estimated relative to the determined index i_F.

This calculation is performed according to the following relations:

$d_{\max} = \max_{i_{k} \in [i_{HF}, i_{\max}]} (d(i_{k})) = \max_{i_{k} \in [i_{HF}, i_{\max}]} (p(i_{k}) - p(i_{k} - 1))$

and

$d_{\min} = \min_{i_{k} \in [i_{HF}, i_{\max}]} (d(i_{k})) = \min_{i_{k} \in [i_{HF}, i_{\max}]} (p(i_{k}) - p(i_{k} - 1))$

or

$d_{\max} = \max_{i_{k} \in\, ]i_{\min}, i_{BF}]} (d(i_{k})) = \max_{i_{k} \in\, ]i_{\min}, i_{BF}]} (p(i_{k}) - p(i_{k} - 1))$

and

$d_{\min} = \min_{i_{k} \in\, ]i_{\min}, i_{BF}]} (d(i_{k})) = \min_{i_{k} \in\, ]i_{\min}, i_{BF}]} (p(i_{k}) - p(i_{k} - 1))$

The calculation software sub-module CAL2 then calculates a criterion as a function of the two calculated distances d_max and d_min in order to detect the presence of HF (or LF) audio content. This criterion is denoted, for example, crit(d_min, d_max).

Preferably, this criterion is the ratio ρ between the two distances calculated above, such that:

$\rho = crit(d_{\min}, d_{\max}) = d_{\max} / d_{\min}$ (or $crit(d_{\min}, d_{\max}) = d_{\min} / d_{\max}$)
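The first variant of step S2 can be sketched as below, using the simple difference as the distance. The function name `distance_ratio` and the sample values are illustrative assumptions. For the HF case the range is [i_HF, i_max]; for the LF case it would be ]i_min, i_BF], that is, `i_start = i_min + 1` and `i_end = i_BF`.

```python
def distance_ratio(p, i_start, i_end):
    """Return (d_min, d_max, rho) with d(i) = p[i] - p[i - 1], i in [i_start, i_end].

    Hypothetical helper; p is assumed strictly increasing so that d_min > 0.
    """
    if i_start < 1 or i_end < i_start or i_end >= len(p):
        raise ValueError("index range must satisfy 1 <= i_start <= i_end < len(p)")
    d = [p[i] - p[i - 1] for i in range(i_start, i_end + 1)]
    d_min, d_max = min(d), max(d)
    if d_min <= 0:
        raise ValueError("parameters are not strictly increasing on this range")
    return d_min, d_max, d_max / d_min


# Illustration: i_HF = 4 and i_max = 5 for these sample values.
p = [100.0, 900.0, 2100.0, 3900.0, 4300.0, 5600.0]
d_min, d_max, rho = distance_ratio(p, 4, 5)
print(d_min, d_max, rho)
```

The alternative criterion d_min/d_max is simply `1 / rho` under the same assumptions.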

According to a second variant embodiment, such a criterion is based on a mathematical function F(i_F) that takes the index i_F as its parameter.

Said mathematical function F(i_F) is, for example, a piecewise affine function such that:

$F(i_{F}) = a_{0} i_{F} + b_{0}$ if $i_{\min} \leq i_{F} < l_{0}$

$F(i_{F}) = a_{1} i_{F} + b_{1}$ if $l_{0} \leq i_{F} < l_{1}$

...

$F(i_{F}) = a_{N - 1} i_{F} + b_{N - 1}$ if $l_{N - 2} \leq i_{F} \leq i_{\max}$

In particular, said function may have four pieces, such that:

if $i_{\min} \leq i_{F} < 8$, $F(i_{F}) = 4 * i_{F} - 36$

if $8 \leq i_{F} < 10$, $F(i_{F}) = 3 * i_{F} - 30$

if $10 \leq i_{F} < 13$, $F(i_{F}) = 2 * i_{F} - 21$

if $13 \leq i_{F} \leq i_{\max}$, $F(i_{F}) = 3 * i_{F} - 30$
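The four-piece example can be written directly as a small function. This is only a transcription of the coefficients given above; the name `affine_criterion` is hypothetical, and the bounds i_min and i_max are not checked here.

```python
def affine_criterion(i_f):
    """Four-piece affine criterion F(i_F), coefficients as given in the text."""
    if i_f < 8:
        return 4 * i_f - 36
    if i_f < 10:
        return 3 * i_f - 30
    if i_f < 13:
        return 2 * i_f - 21
    return 3 * i_f - 30


for i_f in (7, 8, 10, 13):
    print(i_f, affine_criterion(i_f))
```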

Thus, according to this variant, the criterion depends on the value of the affine function.

Other functions can of course be used, for example the following function:

$F(i_{F}) = sign(i_{F} - c) * (i_{F} - c)^{2}$, where $sign(x) = -1$ if $x < 0$ and $sign(x) = 1$ otherwise,
and where c is a variable or a constant equal to about 10.5.

Following the aforementioned step S2, a step S3 shown in Figure 1 consists in deciding whether the predetermined frequency band is detected in the current block B_n, as a function of one of the criteria calculated in step S2. Such a step is implemented by a third calculation software sub-module CAL3 of the detection device DET, as shown in Figure 2.

As an alternative, the decision is a function of one or the other of the two criteria mentioned above, or of a combination of the two.

Where the calculated criterion conforms to the aforementioned first variant, namely ρ = d_max/d_min, the decision can be soft or hard.

For the sake of brevity, the following description covers the case in which the decision step relates to the detection of a high-frequency band. It is obvious to those skilled in the art how to apply this decision step similarly to the detection of another frequency band, such as a low-frequency band.

The hard decision consists in comparing the criterion ρ with a predetermined threshold, adaptive or not, denoted crit_th. The comparison is performed, for example, as follows:

If ρ > crit_th, flag_HF = 1

Otherwise flag_HF = 0
where flag_HF is a bit that is either set to 1 to indicate that HF content has been detected, or set to 0 to indicate that HF content has not been detected.

A soft decision consists, for example, in using the value of ρ bounded to the interval [1,3]. The closer this value is to the lower bound "1" of this interval, the more strongly HF content is considered not detected in the block of the audio signal. The closer this value is to the upper bound "3" of the interval, the more strongly HF content is considered detected in the audio signal.
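The hard and soft decisions on ρ can be sketched together as follows. The function name `decide_rho` and the threshold value 2.0 used in the example are arbitrary assumptions; the text does not fix a value for crit_th.

```python
def decide_rho(rho, crit_th):
    """Return (flag_hf, soft) for the criterion rho = d_max / d_min.

    flag_hf is the hard decision (1 = HF content detected).
    soft is rho clipped to [1, 3]: near 3 suggests HF content, near 1 suggests none.
    """
    flag_hf = 1 if rho > crit_th else 0
    soft = min(max(rho, 1.0), 3.0)
    return flag_hf, soft


print(decide_rho(3.25, crit_th=2.0))
print(decide_rho(1.2, crit_th=2.0))
```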

Consider now the case where the criterion is ρ' = d_min/d_max.

The hard decision consists in comparing the criterion ρ' with a predetermined threshold, adaptive or not, denoted crit'_th. The comparison is then:

If ρ' > crit'_th, flag_HF = 0

Otherwise flag_HF = 1
where flag_HF equal to 1 (respectively 0) indicates that HF content has been detected (respectively has not been detected).

The soft decision consists, for example, in using the value of ρ' in the interval [0,1]. The closer this value is to the lower bound "0" of this interval, the more strongly HF content is considered detected in the block of the audio signal. The closer this value is to the upper bound "1" of the interval, the more strongly HF content is considered not detected in the audio signal. The closer the value of the criterion is to the bounds of the interval, the more reliable the decision for the block (detection or not of HF content) appears, whereas a value of ρ' close to the threshold crit'_th indicates low reliability of the decision.

Where the calculated criterion conforms to the aforementioned second variant, namely a mathematical function F(i_F), the decision can likewise be soft or hard.

Take, for example, the case where the mathematical function F(i_F) = sign(i_F-c)*(i_F-c)² is used to detect whether HF content is present.

A hard decision consists, for example, in comparing the criterion F(i_HF) with 0, as follows:

If F(i_HF) < 0, flag_HF = 1

Otherwise flag_HF = 0
where flag_HF is a bit that is either set to 1 to indicate that HF content has been detected, or set to 0 to indicate that HF content has not been detected.

In this case, the soft decision can consist in taking the value of the mathematical function. The more negative (respectively positive) this value is, the higher the reliability of the detection of the presence (respectively absence) of HF content. Conversely, a value of the mathematical function close to zero indicates that the reliability of the detection is low.
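For the second variant, the signed-square function and the two decisions derived from it can be sketched as follows. The function name `decide_from_index` is hypothetical, and c = 10.5 is the approximate value quoted in the text, used here as a default.

```python
def decide_from_index(i_hf, c=10.5):
    """Return (flag_hf, f) with f = sign(i_hf - c) * (i_hf - c)**2.

    Hard decision: flag_hf = 1 when f < 0 (HF content detected).
    Soft decision: f itself; the further from zero, the more reliable.
    """
    d = i_hf - c
    f = -(d * d) if d < 0 else d * d   # sign(x) = -1 if x < 0, 1 otherwise
    flag_hf = 1 if f < 0 else 0
    return flag_hf, f


print(decide_from_index(10))
print(decide_from_index(13))
```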

Where the detection device DET already holds K decision results relating respectively to K blocks preceding the current block B_n, it is advantageous, in order to increase the reliability of the detection, to smooth, during a following step S4 shown in Figure 1, these K results together with the decision result just obtained for the current block B_n in the aforementioned step S3, over a window that may be sliding. Here again, the detection over the window can be a soft or hard decision, whether the local detections relating to each block were obtained by soft or hard decision. Such a smoothing step S4 is implemented by a fourth calculation software sub-module CAL4 shown in Figure 2.

Since step S4 is optional, it is shown in dotted lines in Figure 1. Correspondingly, the sub-module CAL4 of Figure 2 is also shown in dotted lines.
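One possible way to smooth local decisions over a sliding window is sketched below. The passage above does not specify the smoothing rule, so the mean over the window (soft output) and the majority vote (hard output) are assumptions made for illustration, as is the class name `DecisionSmoother`.

```python
from collections import deque


class DecisionSmoother:
    """Sliding-window smoothing of local hard decisions (illustrative rule only)."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        # Holds at most `window` decisions: the K previous ones plus the current one.
        self.decisions = deque(maxlen=window)

    def update(self, flag_hf):
        """Add the current block's decision and return (hard, soft) for the window."""
        self.decisions.append(flag_hf)
        soft = sum(self.decisions) / len(self.decisions)  # fraction of HF decisions
        hard = 1 if soft > 0.5 else 0                      # strict majority vote
        return hard, soft


smoother = DecisionSmoother(window=4)
for flag in (1, 1, 0, 1, 0, 0):
    print(smoother.update(flag))
```

A `deque` with `maxlen` drops the oldest decision automatically, which gives the sliding behavior without explicit index bookkeeping.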

In the embodiment shown, where the audio coder is the 3GPP AMR-WB coder, each coded data block contains 16 parameters, the first 15 of which are ordered spectral parameters covering the (normalized) spectrum between 0 and 6.4 kHz, the sixteenth parameter being the voice activity indicator (VAD) coded on one bit.

Figures 4A and 4B each show a histogram of the index i_HF of the spectral parameter p(i) greater than F_th = 4 kHz for the AMR-WB codec. The indices are plotted on the abscissa and the percentage distribution of these indices on the ordinate. In Figure 4A, the detection method that was implemented includes step S0 of preselecting blocks containing voice activity. In Figure 4B, the detection method that was implemented does not include step S0. Four different configurations are shown by way of example in Figures 4A and 4B: the one shown as a bold solid line, which corresponds to the AMR-WB codec alone; the one shown as a dotted line, which corresponds to the AMR-WB coder arranged in tandem after another WB coder, such as the fixed HD coder G.722 at 64 kbit/s; the one shown as a thin line, which corresponds to the AMR-WB coder arranged in tandem after an NB coder such as the G.711 pivot coder; and the one shown as a dash-dot line, which corresponds to the AMR-WB coder arranged in tandem after an NB coder such as the mobile FR (Full Rate) coder.

The histograms were obtained on long speech files with different background noises (road traffic, cafeteria, babble), taking into account three different signal-to-noise ratios (SNR = 5, 10, 20 dB).

As Figures 4A and 4B show, the distribution of the index of the first spectral parameter greater than 4 kHz differs markedly depending on whether the first coder is of WB or NB type. In particular, for the WB coders, a peak is obtained at index i_HF = 10.

Correspondingly, Figures 5A and 5B each show a cumulative histogram of the ratio ρ between the maximum difference and the minimum difference between two successive spectral parameters, starting from the index i_HF of the spectral parameter greater than F_th = 4 kHz for the AMR-WB codec. The values of the ratio ρ are plotted on the abscissa and the percentage distribution of these ratios on the ordinate. In Figure 5A, the detection method that was implemented includes step S0 of preselecting blocks containing voice activity. In Figure 5B, the detection method that was implemented does not include step S0. Four configurations, corresponding respectively to those of Figures 4A and 4B, are shown in Figures 5A and 5B, using the same line styles as in Figures 4A and 4B.

As Figures 5A and 5B show, the distribution of the ratio ρ differs markedly depending on whether the coder is of WB or NB type. In particular, the distributions of ρ for the WB coders and those for the NB coders diverge from one another from ρ = 1.9 onward.

Such example distributions are thus advantageously exploited by the invention to detect whether an audio signal coded by a linear prediction coder such as the AMR-WB coder contains high frequencies, such detection advantageously being carried out:

with low algorithmic complexity,
without complete decoding of the audio signal, for certain audio applications that do not provide audio decoding,
without applying a costly frequency transform.

A first application of the detection method described above will now be described, for displaying an HD logo on an HD mobile communication terminal.

Such a terminal is designated by the reference TER in Figure 6A.

In a manner known per se, the terminal TER comprises:

a user interface INT conventionally comprising a keyboard, a screen, a microphone and a loudspeaker,
a communication module COM1, for example of the 3G type,
a read-only memory MEM1 comprising an audio coding module CO1 and an audio decoding module DO1.

In the example shown, the coding module CO1 and the decoding module DO1 are of the AMR-WB type.

In accordance with the invention, the read-only memory MEM1 or another memory of the mobile terminal TER further contains a device DET1 for detecting a predetermined frequency band, similar to the detection device DET shown in Figure 2.

In this application, in the conventional manner, a coded audio stream is received by the communication module COM1 and then fully decoded by the decoding module DO1, so that the mobile terminal TER plays back the speech through the loudspeaker of its user interface INT. The decoded parameters delivered by the decoder DO1 to the detection device DET1 include the first 15 ISF coefficients, which are ordered spectral parameters covering the (normalized) spectrum between 0 and 6.4 kHz, and optionally the VAD indicator, whose value is set to 1 if the encoder of the terminal that sent the coded audio stream to the terminal TER judged the signal of the frame to be active (tone, speech, music), and to zero otherwise.

On the basis of said first 15 ISF coefficients and optionally the VAD indicator, the detection device DET1 of the terminal TER then directly implements the predetermined frequency band detection method described with reference to Figure 1, with a complexity far lower than, for example, that of applying a time-frequency transform to the previously decoded signal.
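To illustrate how little computation a per-block decision needs, the sketch below chains steps S0 to S3 for one block of 15 ISF values expressed in Hz. Everything named here is an assumption for illustration: the function `detect_hf_block`, the 0-based indexing, the threshold crit_th = 2.0 and the sample ISF values, which are not taken from real AMR-WB data.

```python
def detect_hf_block(isf, vad=1, f_th=4000.0, crit_th=2.0):
    """Return 1 (HF detected), 0 (not detected) or None (block skipped).

    isf: increasing ISF values in Hz; vad: voice activity flag of the block.
    """
    if vad != 1:
        return None                      # step S0: ignore inactive blocks
    # Step S1: index of the first ISF at or above the threshold frequency.
    i_hf = next((i for i, f in enumerate(isf) if f >= f_th), None)
    if i_hf is None or i_hf == 0:
        return None                      # no usable range [i_HF, i_max] in this sketch
    # Step S2: ratio of the largest to the smallest gap between successive ISFs.
    gaps = [isf[i] - isf[i - 1] for i in range(i_hf, len(isf))]
    if min(gaps) <= 0:
        return None                      # ordering assumption violated
    rho = max(gaps) / min(gaps)
    # Step S3: hard decision against the (assumed) threshold.
    return 1 if rho > crit_th else 0


isf = [400.0, 800.0, 1200.0, 1600.0, 2000.0, 2400.0, 2800.0, 3200.0,
       3600.0, 4100.0, 4300.0, 5200.0, 5500.0, 5900.0, 6300.0]
print(detect_hf_block(isf, vad=1))
```

The whole decision costs a handful of subtractions and comparisons per block, which is consistent with the low complexity claimed above relative to a time-frequency transform.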

For this purpose, before the aforementioned step S0 is implemented, and in the case where the optional smoothing step S4 is implemented, the following four values are initialized to zero:

a global criterion critGlob,
an index ind, for indexing a table of local criteria,
a counter nbFrm of the frames for which a decision has been taken,
a table tabDec of local decisions.

At the end of the initialization step, the following values are obtained:

 critGlob = 0;
 ind = 0;
 nbFrm = 0;
 tabDec[i] = 0; with i = 0, ..., nbCount,
 where nbCount (0 < nbCount) is the number of local decisions from which a global decision is taken.

During step S1 shown in figure 1, a current block B_n is processed (n being an integer such that 1≤n≤Z). The current block B_n contains the fifteen or sixteen aforementioned parameters (15 spectral coefficients and optionally the VAD indicator), which have been decoded by the decoding module DO1.

Preferably, step S1 is preceded by the preselection step S0, during which only those blocks among B₁, B₂, ..., B_Z that contain data representative of voice activity, that is, those for which the VAD indicator is 1, are preselected.
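
The preselection of step S0 can be sketched in plain C as follows. The `Block` structure, its field names and the function `preselect_active` are hypothetical and do not appear in the description; only the rule itself (keep the blocks whose VAD indicator is 1) comes from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define NB_ISF 15  /* number of decoded ISF coefficients per block */

/* Hypothetical container for the decoded parameters of one block. */
typedef struct {
    int16_t isf[NB_ISF]; /* ordered spectral parameters (normalized, Q15) */
    int     vad;         /* 1 if the encoder flagged the frame as active */
} Block;

/* Step S0 (sketch): store the indices of the active blocks in `kept`
 * and return how many were kept. `kept` must hold at least `z` entries. */
size_t preselect_active(const Block *blocks, size_t z, size_t *kept)
{
    size_t n = 0;
    for (size_t k = 0; k < z; k++) {
        if (blocks[k].vad == 1) {
            kept[n++] = k;
        }
    }
    return n;
}
```

Only the blocks listed in `kept` would then be passed on to step S1.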

During the processing of said current block B_n, a search is made for the index i_HF of the first spectral parameter p(i_k) greater than or equal to F_th, in accordance with the following operation:

$i_{HF} = \min (\underset{i_{k} \in [i_{0}, i_{1}]}{\arg} (p (i_{k}) \geq F_{th}))$

One can of course choose i₀=0 and i₁=15 as the search interval. Advantageously, this search interval is reduced, which makes the detection faster and less complex, for example by choosing i₀=8 instead of i₀=0.

Similarly, the search interval could be narrowed a little further by choosing i₁=12 instead of i₁=15.

In the example shown, the threshold frequency F_th is equal to 4 kHz. Expressed as a frequency normalized with respect to 0.5 (which corresponds to 6.4 kHz), this value is 0.3125 (i.e. 10240 = 0.3125*32768 in Q15 fixed point).
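
The arithmetic behind the Q15 threshold can be checked with a short C sketch. The function name `hz_to_isf_q15` and the constant `ISF_MAX_HZ` are illustrative assumptions; the scaling (0.5 corresponds to 6.4 kHz, Q15 scale factor 32768) is the one stated in the text.

```c
#include <stdint.h>

#define ISF_MAX_HZ 6400  /* top of the spectrum covered by the ISF parameters */

/* Convert a frequency in Hz to the normalized Q15 representation in which
 * 0.5 (16384 in Q15) corresponds to 6.4 kHz.
 * Intended for 0 <= f_hz <= ISF_MAX_HZ. */
int16_t hz_to_isf_q15(int32_t f_hz)
{
    /* 0.5 * f_hz / 6400, scaled by 32768, i.e. f_hz * 16384 / 6400 */
    return (int16_t)((f_hz * 16384) / ISF_MAX_HZ);
}
```

With this scaling, a threshold of 4000 Hz maps to 10240, the value quoted above.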

An example of pseudo-code in the C language for this step is given below.

 iHF = i1; move16();
 FOR (i = i1-1; i >= i0; i--)
 {
   if (sub(p(i), Fth) >= 0)
   {
       iHF = i; move16();
   }
 }
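
For readers without the fixed-point basic operators (`sub`, `move16`), the same search can be sketched in standard C. The function name `find_first_above` and the array-based interface are assumptions made for illustration. The sketch scans upward and stops at the first hit, whereas the pseudo-code scans downward without an early exit; both return the smallest index in the interval whose parameter reaches the threshold, or i1 if there is none.

```c
#include <stdint.h>

/* Return the smallest index i in [i0, i1) such that p[i] >= f_th,
 * or i1 if no parameter in the interval reaches the threshold. */
int find_first_above(const int16_t *p, int i0, int i1, int16_t f_th)
{
    int i_hf = i1;                 /* default: nothing found */
    for (int i = i0; i < i1; i++) {
        if (p[i] >= f_th) {
            i_hf = i;
            break;                 /* first hit is the minimum index */
        }
    }
    return i_hf;
}
```

Calling it with i0 = 8 instead of i0 = 0 gives the same index whenever the crossing lies at or above index 8, which is the reduced search interval suggested in the text.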

Then, during a step S2 shown in figure 1, at least one local criterion is calculated on the current block B_n from said spectral parameter of index i_HF.

The criterion chosen in this embodiment is:

$F (i_{HF}) = sign (i_{HF} - c) * {(2 i_{HF} - c)}^{2},$

where sign(x) = -1 if x<0, and sign(x) = 1 otherwise, with c = 21.

An example of C pseudo-code for this step is given below:

 diff = shl(iHF, 1);
 diff = sub(diff, c);
 critLoc = L_mult0(diff, diff);
 if (diff < 0) {
   critLoc = L_negate(critLoc);
 }
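
The same computation can be sketched in standard C. The function name `local_criterion` is an assumption. The sketch follows the pseudo-code, in which the sign applied to the square is that of diff = 2*iHF - c; with c = 21 the result is therefore negative for iHF ≤ 10 and positive for iHF ≥ 11.

```c
#include <stdint.h>

/* Local criterion as computed by the pseudo-code:
 * the signed square of (2 * i_hf - c). */
int32_t local_criterion(int i_hf, int c)
{
    int32_t diff = 2 * (int32_t)i_hf - c;
    int32_t crit = diff * diff;
    return (diff < 0) ? -crit : crit;
}
```

Squaring makes the magnitude grow quickly as the index moves away from c/2, so a soft decision built on this value weights clear-cut blocks more heavily than borderline ones.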

Following the above-mentioned step S2, a step S3 shown in figure 1 consists in deciding whether the predetermined frequency band is detected in the current block B_n, as a function of one of the criteria calculated in step S2.

Preferably, the decision is a soft decision given by the local criterion calculated in the previous step.

An example of C pseudo-code for this step is given below:

 decLoc = critLoc; move16();

In practice, at the end of this step, the HD logo is displayed on the screen of the terminal TER with a higher or lower contrast, corresponding respectively to a higher or lower value of the calculated criterion.

Alternatively, the decision is a hard decision determined by the local criterion calculated in the previous step.

An example of C pseudo-code for this alternative step is given below:

   decLoc = 0; move16(); /* NB */
   if (critLoc < 0)
   {
       decLoc = 1; move16(); /* WB */
   }

In practice, at the end of this alternative step, the HD logo is displayed on the screen of the terminal TER if the calculated criterion is less than 0, and is not displayed otherwise.

Advantageously, during the optional step S4 shown in figure 1, to increase the reliability of the detection, the local detections are smoothed over several blocks (nbCount > 1) by a window that may be sliding. Here again, as in the previous step, the detection over the window may be a soft or hard decision decGlob, whether the local detections were obtained by soft or hard decision.

For this, the local decisions (soft or hard) are stored in the table of local decisions and are used to update the global criterion critGlob.

An example of C pseudo-code for this step is given below for the case where the local decisions are soft (decLoc = critLoc) and the global decision is hard.

After an initialization step that sets the variables critGlob and ind and the table tabDec[nbCount] to zero, for each data block for which a local decision decLoc has been determined:

       critGlob = L_sub(critGlob, tabDec[ind]);
       critGlob = L_add(critGlob, decLoc);
       tabDec[ind] = decLoc; move32();
       ind = add(ind, 1);
       if (sub(ind, nbCount) == 0)
       {
           ind = 0; move16();
       }
       flagWB = 1; /* assume WB */
       if (critGlob > 0) {
           flagWB = 0; /* NB detected */
       }
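
The sliding-window update can be sketched in standard C as a running sum over a circular buffer. The `Smoother` type, the function names and the window length `NB_COUNT` = 4 are assumptions chosen for illustration. Unlike `L_add` and `L_sub`, plain C integer arithmetic does not saturate, so the sketch assumes the sum stays within 32 bits.

```c
#include <stdint.h>
#include <string.h>

#define NB_COUNT 4  /* hypothetical window length (nbCount) */

typedef struct {
    int32_t crit_glob;          /* running sum of the stored local decisions */
    int32_t tab_dec[NB_COUNT];  /* circular buffer of local decisions */
    int     ind;                /* next slot to overwrite */
} Smoother;

/* Initialization step: everything is set to zero. */
void smoother_init(Smoother *s)
{
    memset(s, 0, sizeof *s);
}

/* Push one soft local decision and return the hard global decision:
 * 1 for wideband, 0 for narrowband. */
int smoother_update(Smoother *s, int32_t dec_loc)
{
    s->crit_glob -= s->tab_dec[s->ind];  /* drop the oldest decision */
    s->crit_glob += dec_loc;             /* add the newest one */
    s->tab_dec[s->ind] = dec_loc;
    s->ind = (s->ind + 1) % NB_COUNT;
    return (s->crit_glob > 0) ? 0 : 1;
}
```

Because the oldest entry is subtracted before the newest is added, each update costs the same whatever the window length, which matches the structure of the pseudo-code.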

Here the global decision is taken over a sliding window.

In an alternative embodiment, the global decision is taken over non-overlapping windows. In this case, there is no need to store a table of local decisions: it suffices to add the local decisions to the global criterion, which is reset to zero at the start of each window processed. An example of C pseudo-code for this variant is given below for the case where the local decisions are soft (decLoc = critLoc) and the global decision is hard.

After an initialization step that sets the variables critGlob and ind to zero, for each data block for which a local decision decLoc has been determined:

       critGlob = L_add(critGlob, decLoc);
       ind = add(ind, 1);
       IF (sub(ind, nbCount) == 0)
       {
           ind = 0; move16();
           flagWB = 1; move16();
           /* assume WB */
           if (critGlob > 0) {
               flagWB = 0; move16(); /* NB detected */
           }
           critGlob = 0; move32();
       }
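
A standard C sketch of the non-overlapping variant is given below. The `BlockSmoother` type, the function name and the window length `WIN_LEN` = 3 are assumptions. The return value, which signals that a window has just been completed, is an addition of the sketch and is not part of the pseudo-code.

```c
#include <stdint.h>

#define WIN_LEN 3  /* hypothetical window length (nbCount) */

typedef struct {
    int32_t crit_glob;  /* sum of the local decisions of the current window */
    int     ind;        /* number of decisions accumulated so far */
    int     flag_wb;    /* last global decision: 1 = wideband, 0 = narrowband */
} BlockSmoother;

/* Accumulate one local decision. Returns 1 when a window has just been
 * completed (flag_wb was refreshed), 0 otherwise. */
int block_smoother_update(BlockSmoother *s, int32_t dec_loc)
{
    s->crit_glob += dec_loc;
    s->ind++;
    if (s->ind == WIN_LEN) {
        s->flag_wb = (s->crit_glob > 0) ? 0 : 1;
        s->ind = 0;
        s->crit_glob = 0;   /* start the next window from zero */
        return 1;
    }
    return 0;
}
```

Compared with the sliding window, this variant needs no decision table, but the global decision is refreshed only once every `WIN_LEN` blocks.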

The application just described thus strikes a compromise between the reaction time for displaying or hiding the HD logo and the reliability of the detection.

In addition, the complexity of the calculations is relatively low, as shown by the table below, which gives the weight of some of the instructions mentioned above:

| Instruction | Complexity weight | Instruction label |
| Memory access (write or read), 16-bit word | 1 | move16() |
| Memory access (write or read), 32-bit word | 2 | move32() |
| Addition/subtraction of two 16-bit words | 1 | add()/sub() |
| Addition/subtraction of two 32-bit words | 1 | L_add()/L_sub() |
| Left binary shift (multiplication by a power of 2) | 1 | shl() |
| Multiplication of two 16-bit words | 1 | L_mult0() |
| "Simple" test (followed by a single simple basic operator) | 0 | if |
| Loop executed a constant number of times N | 4 | FOR |
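
As a rough illustration of how these weights might be combined, the sketch below tallies a worst-case cost for the index search pseudo-code of step S1. It rests on assumptions that the description does not state: the weights are simply added, the FOR weight is counted once for the whole loop, and the test succeeds (so the store is executed) at every iteration. The function name `search_cost_worst` is hypothetical.

```c
/* Weights taken from the table above. */
enum { W_MOVE16 = 1, W_SUB = 1, W_IF = 0, W_FOR = 4 };

/* Worst-case weighted cost of the index search over i = i1-1 down to i0,
 * assuming the store inside the test is executed at every iteration. */
int search_cost_worst(int i0, int i1)
{
    int iterations = i1 - i0;
    /* initial store + loop overhead + per-iteration (sub, if, store) */
    return W_MOVE16 + W_FOR + iterations * (W_SUB + W_IF + W_MOVE16);
}
```

Under these assumptions, reducing the interval from (i0, i1) = (0, 15) to (8, 15) lowers the tally from 35 to 19, which illustrates why a narrower search interval is described as less complex.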

A second application of the detection method described above with reference to figure 1 will now be described, for indicating the number of calls deposited in wideband on a mobile voicemail server.

Such a server is designated by the reference SER in figure 6B.

In particular, such a server conventionally comprises:

a set EBR of message inboxes,
a communication module COM2, for example of the IP type,
a read-only memory MEM2 that contains a module GES for managing the voice messages recorded in the inboxes of the aforementioned set EBR.

The memory MEM2 further contains a decoding module DO2 and an encoding module CO2 which, if need be, respectively decode and then re-encode the audio content of the deposited voice message.

Such an operation proves necessary, for example, when the audio content of the deposited voice message was initially coded by a coder different from the coder contained in the terminal used to consult said voice message, or from the coder offered by the network when said message is consulted.

Such an operation may also prove necessary in order to store a deposited voice message in a different coding format, which may be the operator's choice, for example for a webmail-type application that makes the message available in the mailbox of the owner of the voicemail account.

In accordance with the invention, the read-only memory MEM2, or another memory of the server SER, further contains:

a device DET2 for detecting a predetermined frequency band, similar to the detection device DET shown in figure 2,
a partial decoding module DP.

In the case where the voice messages deposited on the server SER are coded streams that do not need to be immediately decoded and then re-encoded by the decoding module DO2 and the encoding module CO2 respectively, for example because the webmail application is not available from the operator, the partial decoding module DP is able, prior to the detection of HF content, to decode only some of the first 15 ISF coefficients and optionally the VAD indicator. Such an arrangement is possible because of the vector quantization of the ISF coefficients into two sub-vectors, as implemented in a coder of the AMR-WB type. It should be recalled that such quantization is implemented using a combination, well known to those skilled in the art, of a split (product-code) vector quantization method SVQ ("Split Vector Quantization") and a multi-stage vector quantization method MSVQ ("Multi Stage Vector Quantization").

Thus, in accordance with the invention, the decoding module DP decodes only the second sub-vector of the ISF coefficients, that is to say the one containing the last eight ISF coefficients, those with the highest indices, whose distribution is more likely to reveal the presence of HF content. Optionally, the decoding module DP also decodes the VAD indicator.

Such an arrangement advantageously reduces the computational complexity of detecting the frequency band of the coded audio stream. It also saves resources in the memory MEM2, since the instructions for decoding the first sub-vector of the ISF coefficients and the storage of its vector quantization dictionaries are no longer needed.

On the basis of the part of the decoded spectral coefficients thus obtained, the detection device DET2 of the server SER then directly implements the predetermined frequency band detection method described in figure 1.

Steps S0 to S4 of this method are similar to those just described in connection with the terminal TER of figure 6A. They will therefore not be described again.

In this second application more particularly, limiting the decoding to only some of the spectral parameters advantageously makes it possible, at a low processing cost, to identify from the frames coded by a linear predictive coder such as AMR-WB whether the coded content really has high-frequency components, and therefore whether it is genuinely HD, and thus to obtain relevant information about the audio band of the content in a system that does not decode the bit streams (such as a voicemail server).

According to an alternative, corresponding to the case where the voice messages deposited on the server SER are coded streams that need to be decoded and then re-encoded by the decoding module DO2 and the encoding module CO2 respectively (e.g. a webmail application), the decoding module DP then operates in the same way as the decoding module DO1 described with reference to figure 6A.

It goes without saying that the embodiments described above have been given purely by way of indication and are in no way limiting, and that numerous modifications can easily be made by those skilled in the art without departing from the scope of the invention as defined by the appended claims. For example, instead of being used in a voicemail server in partial decoding mode, the method of detecting a predetermined frequency band could be used in a similar manner in a probe inserted in-line in an audio stream.

In addition, the method of detecting a predetermined frequency band is not necessarily limited to content coded by a wideband coder. The bandwidth may also be variable.

Similarly, the detection method could be implemented to detect content in the low-frequency band instead of content in the high-frequency band. In that case, as mentioned previously, the above-mentioned determination step S2 would naturally consist in searching, among at least a plurality of previously decoded spectral parameters of the set of spectral parameters, for the index of the largest spectral parameter below a threshold frequency.

The threshold frequency F_th could moreover vary during one of the aforementioned applications.

The detection method can also be implemented in several variants, as regards the choice of criteria, the way in which several criteria may be combined, or the use of soft or hard decisions, both locally and globally. Depending on the variant selected, it is then possible to optimize the trade-off between complexity, reliability and reactivity of the detection.

Finally, although the invention has been described in connection with a mobile communication network, it can of course be implemented in connection with other types of communication networks (fixed network of the PSTN type, mobile VoIP, etc.) in which a linear predictive coder may be used.

Claims

Method of detection of a predetermined frequency band in an audio data signal which has been previously coded according to a succession of data blocks (B₁, B₂, ..., B_Z), among which at least certain blocks contain respectively at least one set of spectral parameters representing a linear predictive filter, the predetermined frequency band to be detected being the band of the low frequencies or the band of the high frequencies, said detection method implementing, for a current block (B_n) among said at least certain blocks and of which at least one plurality of spectral parameters of said set have been previously decoded, said decoded spectral parameters having an ordered subset of spectral parameters which extends over a predetermined frequency spectrum, the steps consisting in:
- determining (S1), among said subset of previously decoded and ordered spectral parameters, the index of the first spectral parameter closest to a threshold frequency,

- calculating (S2) a criterion for detecting a predetermined frequency band on the basis of said index determined, the criterion being based on the comparison of the distance between two successive parameters among said subset of previously decoded and ordered spectral parameters with respect to said index determined and/or on a mathematical function using as parameter said index determined,

- deciding (S3) whether said predetermined frequency band is detected in said current block, as a function of the criterion calculated.
Method of detection according to Claim 1, in the course of which all the spectral parameters of said set are decoded beforehand.
Method according to Claim 1 or Claim 2, in the course of which, in the case where among said succession of data blocks, certain blocks each contain a set of spectral parameters representing a linear predictive filter and certain other blocks each contain a set of spectral parameters obtained by frequency transformation, only the blocks each containing a set of spectral parameters representing a linear predictive filter are considered with a view to said detection.
Method of detection according to any one of Claims 1 to 3, in the course of which, when said predetermined frequency band to be detected is the band of the high frequencies, said determining step consists in searching for the index of the first spectral parameter above a threshold frequency.
Method of detection according to any one of Claims 1 to 3, in the course of which, when said predetermined frequency band to be detected is the band of the low frequencies, said determining step consists in searching for the index of the last spectral parameter below a threshold frequency.
Method of detection according to any one of Claims 1 to 4, in the course of which the current block contains data representative of voice activity.
Method of detection according to any one of Claims 1 to 6, in the course of which said criterion is calculated by comparison between:
- the maximum value (d_max ) of the distance between two neighboring decoded spectral parameters, said value being estimated with respect to the value of the index of the first decoded spectral parameter which has been obtained on completion of said determining step,

- the minimum value (d_min ) of the distance between two neighboring decoded spectral parameters, said value being estimated with respect to the value of the index of the first decoded spectral parameter which has been obtained on completion of said determining step.
Method of detection according to any one of Claims 1 to 7, in the course of which, subsequent to said decision step implemented for said current block, a global decision step (S4) is implemented by smoothing of the result of said decision step and of K earlier decision results, relating respectively to K blocks preceding said current block.
Detection device adapted to implement the method of detection according to any one of Claims 1 to 8.
Detection device according to Claim 9, said device being able to be contained in a communication terminal (TER) or else in a voice messaging server (SER).
Computer program comprising instructions adapted for implementing the method of detection according to any one of Claims 1 to 8, when said method of detection is executed on a computer.
Recording medium readable by a computer on which is recorded a computer program comprising instructions adapted for the execution of the steps of the method of detection according to any one of Claims 1 to 8, when said program is executed by a computer.