FR2857552A1

FR2857552A1 - Signal decoding process for sound scene reconstruction, involving generating two frequency spectra from the signal and canceling the imaginary part of the DC component and of the component at half the sampling frequency of each spectrum

Info

Publication number: FR2857552A1
Application number: FR0308579A
Authority: FR
Inventors: Jean Bernard Rault; Pierrick Philippe
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2003-07-11
Filing date: 2003-07-11
Publication date: 2005-01-14
Anticipated expiration: 2023-07-11
Also published as: FR2857552B1

Abstract

The process involves generating two frequency spectra from a signal and canceling the imaginary part of the DC component and of the component at Fe/2 of each spectrum, where Fe is the sampling frequency of the spectra. The spectra, grouped in pairs, are pre-matrixed to form a combined spectrum from each pair. The combined spectrum undergoes a frequency-time transformation to obtain a combined time signal. Independent claims are also included for the following: (a) a signal decoding device; (b) a computer program comprising instructions for executing the signal decoding method.

Description

Method for decoding a signal for reconstructing a sound scene using a low-complexity time-frequency transformation, and corresponding device.

The field of the invention is that of signal decoding, in particular the decoding of signals representative of a sound scene. The invention falls notably, but not exclusively, within the scope of the MPEG-4 Audio standard (more precisely MPEG-4 Extension 2) for high-quality, low-bit-rate digital audio coding.

A sound scene conventionally consists of a set of sound objects of different intensities, characterized by their position within the scene. One can thus imagine a sound scene representing an orchestra, in which the violins, the clarinets and the piano are each associated with a precise location in the scene. In addition, the sounds from each instrument are more or less powerful, depending on the instrument and on the score being played.

To reproduce such a sound scene faithfully, for example after recording and transmission, it is therefore necessary to reconstruct the stereophonic effects associated with the scene.

Current techniques for the parametric representation of stereophonic effects rely on extracting, from a complex signal, the dominant sound objects and their localization cues within the sound scene.

The localization cues thus extracted are most often given in the form of intensity differences and temporal phase shifts between the different sound objects of the scene, also called interaural intensity differences and interaural temporal phase shifts.

Instead of constructing a complex signal corresponding to the raw sum of all the sounds of the sound scene, these cues can be used to combine the objects making up the scene into a signal that is less complex than the analyzed signal. The number of channels can thus be reduced, going from a two-channel stereophonic signal to a monophonic signal.

Conversely, when the sound scene is to be reproduced from the signal thus constructed, the sound objects can be separated again from the combined signal to reconstruct a sound scene close to the original.

By reducing the number of channels to be processed, typically from two to one, at a small additional cost compared with a purely monophonic approach, these techniques are particularly advantageous for digital audio compression.

This additional cost, of the order of 2 to 4 kbit/s, is mainly due to encoding and transmitting the extracted localization cues that were used to build the combined signal. These techniques thus make it possible to obtain a stereo-coded signal at a very low bit rate (that is, below 24 kbit/s).

These techniques are the subject of the so-called MPEG-4 Extension 2 phase for high-quality, low-bit-rate digital audio coding (24 kbit/s per full-band channel) of the MPEG (Moving Picture Experts Group) committee of the ISO (International Organization for Standardization).

In particular, a technique called Parametric Stereo (PS) relies on parametric coding of the sinusoidal type (SSC, for SinuSoidal Coding) to encode the combined signal, which is monophonic.

Compared with traditional techniques, PS also takes into account the correlation between sound objects, in addition to their localization cues. The generic operating scheme of this technique is illustrated in Figure 1.

From the sound scene, the signals l(n) and r(n) are captured; they correspond respectively to the left and right time samples of the overall sound signal associated with the scene.

These signals l(n) and r(n) are analyzed by an analysis block 10 in order to identify the dominant sound objects, the correlations between the different objects of the scene, and their localization cues.

At the output of the analysis block 10, the parameters ild, itd and rho are thus obtained; they are respectively the interaural level differences, temporal phase shifts and correlations. These parameters are given per frequency band (b) and per frame (nT).

Together with the signals l(n) and r(n), they feed a matrix 11, which outputs a simple signal, for example a monophonic signal m(n), and a decorrelated signal d(n) obtained by filtering the signal m(n).

The signal m(n) and the parameters ild, itd and rho are then encoded by the SSC encoder 12 and transmitted over a transmission channel, which is not shown in Figure 1.

The transmitted signal is then received and decoded by the SSC decoder 13, which extracts estimates ild', itd' and rho' of the parameters ild, itd and rho, as well as an estimated signal m'(n).

By filtering the signal m'(n) in the decorrelator 14, a decorrelated signal d'(n) is obtained which, together with the signal m'(n) and the parameters ild', itd' and rho', feeds a matrix 15 that is the inverse of the matrix 11 used during coding. This inverse matrix 15 outputs the estimated right and left signals r'(n) and l'(n), from which the sound scene can be reconstructed.

Figure 2 illustrates in more detail the principle implemented when decoding the received signal in order to reproduce the sound scene.

The signal d(n), which was present at the encoder, is reconstructed at the decoder by temporal decorrelation 14 of the decoded version of m(n), i.e. m'(n). The two signals m'(n) and d'(n) are then each processed by a Fourier transform (FFT) 20, 21 in order to compute their spectra M'(k) and D'(k).

The spectra M'(k) and D'(k) are supplied, together with the parameters ild', itd' and rho', to the input of the inverse matrix M⁻¹ 15, which delivers the spectra L'(k) and R'(k) of the left and right signals. These spectra then undergo an inverse Fourier transform (IFFT) 22, 23, which recovers the left and right time samples l'(n) and r'(n).

The matrixing operation implemented at the decoder (M⁻¹) 15 uses complex coefficients computed from the decoded localization and decorrelation coefficients ild', itd' and rho'. In particular, these complex coefficients are not purely real at indices 0 and N/2, where N is the size of the inverse FFT used. As a consequence, the frequency-domain signals obtained at the output of the matrixing 15 are not purely real at indices 0 and N/2.

The time signals l'(n) and r'(n) delivered at the output of the inverse Fourier operators 22, 23 are therefore not purely real.

According to the prior-art technique proposed by the MPEG-4 Audio standard, one frequency-time transformation (of the inverse Fourier transform type) is therefore used per channel (right and left), and the real part of the output is taken to obtain the right and left time signals r'(n) and l'(n).
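The prior-art decoding path described above can be sketched as follows. This is a minimal illustration, not the standard's reference decoder: the helper names (`idft`, `hermitian_extend`, `decode_prior_art`), the block size N = 8 and the spectrum values are assumptions chosen for the example, and the sketch assumes that bins above N/2 are obtained by conjugate symmetry.

```python
import cmath

def idft(spectrum):
    """Naive O(N^2) inverse DFT, sufficient for a small illustration."""
    size = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / size)
            for k in range(size)) / size
        for n in range(size)
    ]

def hermitian_extend(half):
    """Extend bins 0..N/2 to N bins using X(N-k) = conj(X(k))."""
    half_size = len(half) - 1
    mirrored = [half[half_size - i].conjugate() for i in range(1, half_size)]
    return list(half) + mirrored

def decode_prior_art(left_half, right_half):
    """One inverse transform per channel, keeping only the real part."""
    left = [x.real for x in idft(hermitian_extend(left_half))]
    right = [x.real for x in idft(hermitian_extend(right_half))]
    return left, right

# Hypothetical matrixed spectra for N = 8 (bins 0..4).
# Bins 0 and 4 (index N/2) are deliberately not purely real.
left_half = [1 + 0.5j, 2 - 1j, 0.5 + 0.25j, -1 + 2j, 0.75 - 0.5j]
right_half = [0.5 - 0.25j, 1 + 1j, -0.5 + 0.5j, 2 - 1j, 1 + 0.25j]

# The raw inverse transform of one channel is not purely real...
raw_left = idft(hermitian_extend(left_half))
max_imag = max(abs(x.imag) for x in raw_left)

# ...so the prior art runs two transforms and discards the imaginary parts.
left, right = decode_prior_art(left_half, right_half)
```

In this sketch `max_imag` is non-zero, which is why the real part has to be taken explicitly after each of the two inverse transforms.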

A drawback of this prior-art technique is that it is complex to implement. In particular, it is costly in operations, since it requires two frequency-time transforms, one for each of the right and left channels.

The invention aims in particular to overcome these drawbacks of the prior art.

More specifically, one objective of the invention is to provide a technique for decoding a signal for reconstructing a sound scene that is less complex than the prior-art techniques. In particular, the invention aims to simplify the frequency-time transformation operation implemented in such a decoding technique.

Another objective of the invention is to implement such a technique at a lower cost in terms of resources (memory, computing capacity, etc.) than the prior-art techniques.

A further objective of the invention is to provide such a technique that does not noticeably degrade the quality of the reproduced sound scene compared with the prior-art techniques.

These objectives, as well as others that will appear below, are achieved by means of a method for decoding a signal for reconstructing a sound scene, comprising a step of generating, from said signal, at least two frequency spectra each corresponding to a part of said sound scene.

According to the invention, the imaginary part of at least some components of said spectra is canceled, so as to simplify the subsequent transformation of said spectra into the time domain.

Thus, the invention is based on an entirely new and inventive approach to decoding a sound signal. The invention greatly simplifies the frequency-time transformation of the spectra processed during this decoding, by setting certain frequency components of these spectra to zero. One thus chooses to degrade certain lines of the spectrum, which goes against the preconceptions of the person skilled in the art, who regards this operation as detrimental to the quality of the subsequent reproduction of the sound scene.

Canceling some of the components in this way advantageously guarantees that the processed spectra have Hermitian symmetry, and therefore facilitates their transformation into the time domain.

Preferably, the imaginary part of the DC component and of the component at Fe/2 of said spectra is canceled, where Fe is a sampling frequency of said spectra.

This acts on the very-low-frequency and very-high-frequency components of said spectra, which are inaudible or barely audible to the human ear.
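The cancellation step can be illustrated with a short sketch. The function names, the block size N = 8 and the spectrum values are assumptions made for the example; the spectrum is represented by its bins 0 to N/2, with the remaining bins assumed to follow by conjugate symmetry.

```python
import cmath

def idft(spectrum):
    """Naive O(N^2) inverse DFT, sufficient for a small illustration."""
    size = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / size)
            for k in range(size)) / size
        for n in range(size)
    ]

def hermitian_extend(half):
    """Extend bins 0..N/2 to N bins using X(N-k) = conj(X(k))."""
    half_size = len(half) - 1
    mirrored = [half[half_size - i].conjugate() for i in range(1, half_size)]
    return list(half) + mirrored

def cancel_dc_and_nyquist_imag(half):
    """Copy bins 0..N/2, zeroing Im at bin 0 (DC) and bin N/2 (Fe/2)."""
    cleaned = list(half)
    cleaned[0] = complex(cleaned[0].real, 0.0)
    cleaned[-1] = complex(cleaned[-1].real, 0.0)
    return cleaned

# Hypothetical spectrum for N = 8; bins 0 and 4 are not purely real.
half = [1 + 0.5j, 2 - 1j, 0.5 + 0.25j, -1 + 2j, 0.75 - 0.5j]

cleaned = cancel_dc_and_nyquist_imag(half)
time_signal = idft(hermitian_extend(cleaned))

# With Hermitian symmetry restored, the time signal is real up to rounding.
residual_imag = max(abs(x.imag) for x in time_signal)
```

Only two bins are touched, and the resulting time signal no longer carries an imaginary part, which is the property the later steps rely on.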

Advantageously, such a decoding method also comprises a step of pre-matrixing at least some of said spectra, grouped in pairs, which produces a combined spectrum from each of said pairs of spectra.

A complex spectrum is thus created whose real and imaginary parts are a clever combination of the starting spectra. This combination is performed as follows:
- for the frequency lines representing the first part of the spectrum, from line 0 to line N/2-1, N being the number of lines of the Fourier representation of the spectra, the real part of the combined spectrum is obtained by adding the real part of the first of said spectra to the opposite of the imaginary part of the second of said spectra, and the imaginary part of the combined spectrum is obtained by adding the imaginary part of the first of said spectra to the real part of the second of said spectra;
- for the frequency lines representing the second part of the spectrum, from line N/2 to line N-1, the real part of the combined spectrum is obtained by adding the real part of the first of said spectra to the imaginary part of the second of said spectra, reading the lines of these two spectra in reverse order, and the imaginary part of the combined spectrum is obtained by adding the opposite of the imaginary part of the first of said spectra to the real part of the second of said spectra, again reading the lines of these two spectra in reverse order.
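The two combination rules can be sketched as below. This is an illustrative reading of the text, under the assumptions that each input spectrum is given by its bins 0 to N/2, that the imaginary parts at bins 0 and N/2 have already been canceled, and that "reverse order" means reading line N-k for output line k. The names `pre_matrix` and `hermitian_extend` and the numeric values are hypothetical.

```python
def hermitian_extend(half):
    """Extend bins 0..N/2 to N bins using X(N-k) = conj(X(k))."""
    half_size = len(half) - 1
    mirrored = [half[half_size - i].conjugate() for i in range(1, half_size)]
    return list(half) + mirrored

def pre_matrix(first_half, second_half):
    """Combine two spectra (bins 0..N/2 each) into one N-bin complex spectrum."""
    half_size = len(first_half) - 1
    size = 2 * half_size
    combined = []
    # First part, lines 0 .. N/2-1, read in the forward direction.
    for k in range(half_size):
        a, b = first_half[k], second_half[k]
        combined.append(complex(a.real - b.imag, a.imag + b.real))
    # Second part, lines N/2 .. N-1, read in the reverse direction (line N-k).
    for k in range(half_size, size):
        a, b = first_half[size - k], second_half[size - k]
        combined.append(complex(a.real + b.imag, -a.imag + b.real))
    return combined

# Hypothetical spectra for N = 8, with real DC and N/2 bins.
left_half = [1 + 0j, 2 - 1j, 0.5 + 0.25j, -1 + 2j, 0.75 + 0j]
right_half = [0.5 + 0j, 1 + 1j, -0.5 + 0.5j, 2 - 1j, 1 + 0j]

combined = pre_matrix(left_half, right_half)

# Cross-check: the rules amount to C(k) = L(k) + j R(k) on the full spectra.
reference = [
    l + 1j * r
    for l, r in zip(hermitian_extend(left_half), hermitian_extend(right_half))
]
max_diff = max(abs(c - ref) for c, ref in zip(combined, reference))
```

Under these assumptions the combined spectrum equals the first spectrum plus j times the second one, so the pre-matrixing only needs additions and sign changes.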

Preferably, such a decoding method also comprises a step of frequency-time transformation of said combined spectrum, so as to obtain a combined time signal.

Advantageously, such a decoding method comprises a step of post-matrixing said combined time signal, so as to obtain two time signals associated respectively with each of said spectra.

These two time signals correspond respectively to the real and imaginary parts of the combined time signal.
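A round-trip sketch of the single transform followed by post-matrixing is given below. It assumes that the combined spectrum equals the first spectrum plus j times the second, which is what the pre-matrixing rules amount to for Hermitian-symmetric inputs. The function names and the sample values are hypothetical, and a naive DFT stands in for the FFT.

```python
import cmath

def dft(signal):
    """Naive O(N^2) forward DFT."""
    size = len(signal)
    return [
        sum(signal[n] * cmath.exp(-2j * cmath.pi * k * n / size)
            for n in range(size))
        for k in range(size)
    ]

def idft(spectrum):
    """Naive O(N^2) inverse DFT."""
    size = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / size)
            for k in range(size)) / size
        for n in range(size)
    ]

def post_matrix(combined_time):
    """Split the combined time signal into its real and imaginary parts."""
    first = [x.real for x in combined_time]
    second = [x.imag for x in combined_time]
    return first, second

# Hypothetical real-valued left and right blocks of N = 8 samples.
left = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.25, -0.5]
right = [1.0, -0.5, 0.0, 0.25, 0.5, -1.0, 0.75, 0.1]

# Spectra of real signals are Hermitian-symmetric, so C = L + jR is valid here.
combined_spectrum = [a + 1j * b for a, b in zip(dft(left), dft(right))]

combined_time = idft(combined_spectrum)   # the only frequency-time transform
left_out, right_out = post_matrix(combined_time)

max_error = max(
    max(abs(a - b) for a, b in zip(left, left_out)),
    max(abs(a - b) for a, b in zip(right, right_out)),
)
```

Both channels are recovered from a single inverse transform, where the prior-art scheme would have used two.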

According to an advantageous variant of the invention, said time signals are the right and left signals of a stereophonic representation of said sound scene.

Advantageously, said signal from which said frequency spectra are generated is a monophonic signal.

According to an advantageous characteristic of the invention, when N pairs of frequency spectra are generated from said signal, said frequency-time transformation and post-matrixing steps are carried out on each of said N pairs of spectra.

Indeed, one may work not on a stereophonic representation of the sound scene, with a right and a left channel, but on a multi-channel representation, of the 5.1 or 6.1 type for example. All of the pre-matrixing, frequency-time transformation and post-matrixing operations are then performed on pairs of spectra: for example, the rear-right and rear-left signals can be grouped into a first combined spectrum, and the front-right and front-left signals into a second combined spectrum. Only the center channel is then processed independently.

At the end of the post-matrixing operation, two pairs of time signals, (rear-right, rear-left) and (front-right, front-left), are obtained, representative of the different parts of the sound scene.
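The pairing described for the multi-channel case can be sketched as a simple grouping step. The channel names and the `pair_channels` helper are hypothetical, and the sketch only covers the five channels named in the text.

```python
def pair_channels(channels):
    """Group channel names two by two; a leftover channel stays alone."""
    pairs = [tuple(channels[i:i + 2]) for i in range(0, len(channels) - 1, 2)]
    leftover = channels[-1] if len(channels) % 2 else None
    return pairs, leftover

# Hypothetical channel layout following the example in the text.
channels = ["rear_right", "rear_left", "front_right", "front_left", "center"]

pairs, leftover = pair_channels(channels)

# One frequency-time transform per pair, plus one for the unpaired channel.
transforms_with_pairing = len(pairs) + (1 if leftover is not None else 0)
transforms_without_pairing = len(channels)
```

With this layout, three frequency-time transforms are needed instead of five.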

The invention also relates to a device for decoding a signal for reconstructing a sound scene, comprising means for generating at least two frequency spectra from said signal.

According to the invention, such a device comprises means for canceling the imaginary part of at least some components of said spectra, so as to simplify the subsequent transformation of said spectra into the time domain.

Preferably, such a device comprises means for:
- pre-matrixing said spectra into a combined spectrum;
- frequency-time transformation of said combined spectrum, so as to obtain a combined time signal;
- post-matrixing said combined time signal, so as to obtain two time signals associated respectively with each of said spectra.

The invention further relates to a computer program comprising program code instructions for executing the steps of the decoding method described above when said program is executed on a computer.

Other features and advantages of the invention will become more apparent on reading the following description of a preferred embodiment, given purely as an illustrative and non-limiting example, and the appended drawings, in which:
- Figure 1, already discussed in relation to the prior art, is a block diagram of the coding-decoding scheme implemented according to the so-called PS technique, proposed in the context of the MPEG standard;
- Figure 2, also described above, illustrates in more detail the decoding principle implemented in the scheme of Figure 1;
- Figure 3 is a block diagram of the frequency-time transformation means of a decoder according to the invention;
- Figure 4 describes in more detail the processing applied to the signals in the decoder of Figure 3.

The general principle of the invention, in the context of decoding a sound signal, relies on setting to zero the imaginary part of certain components of frequency spectra, which simplifies the subsequent frequency-time transformation of these spectra.

An embodiment of a decoder according to the invention is now presented in relation to Figure 3.

As illustrated previously in relation to Figure 2, the inverse matrix M⁻¹ 15 is fed with the spectra of the monophonic signal M'(k) and of the decorrelated signal D'(k), as well as with the parameters ild', itd' and rho' for the intensity difference, temporal phase shift and correlation between the sound objects of the scene. This inverse matrix M⁻¹ 15 outputs two frequency spectra L'(k) and R'(k), corresponding respectively to the left and right channels of the sound scene, from which its stereophonic effects can be reproduced.

A pre-matrix 30, described in greater detail with reference to FIG. 4, processes the spectra L'(k) and R'(k) so that they can be combined into a single combined spectrum C(k), to which a single frequency-time transform 31 is applied (for example an IFFT, for Inverse Fast Fourier Transform).

After the transformation 31, the resulting combined time signal c(n) is fed to a post-matrix 32, which separates, within this combined signal c(n), the right r'(n) and left l'(n) time-domain channels used to restore the stereophonic effects of the sound scene.

FIG. 4 describes in greater detail the processing applied in the pre-matrix 30 and the post-matrix 32.

The spectra L'(k) and R'(k) are, for example, organized in blocks of 4096 frequency lines. Note that FIG. 4 considers only the lines of index 0 to 2048 of the spectra L'(k) and R'(k), because of the symmetry properties of the spectra M'(k) and D'(k).

During the pre-matrixing 30, the imaginary parts of the components L'(0), R'(0), L'(2048) and R'(2048) are forced to zero, so that the spectra L'(k) and R'(k) can exhibit Hermitian symmetry. A combined spectrum C(k) can then be constructed from these two Hermitian-symmetric spectra, according to the algorithm shown in the block referenced 30 in FIG. 4.

For k = 0, ..., 2047:

$$\mathrm{Re}(C(k)) = \mathrm{Re}(L'(k)) - \mathrm{Im}(R'(k))$$
$$\mathrm{Im}(C(k)) = \mathrm{Im}(L'(k)) + \mathrm{Re}(R'(k))$$

and for k = 1, ..., 2048:

$$\mathrm{Re}(C(4096-k)) = \mathrm{Re}(L'(k)) + \mathrm{Im}(R'(k))$$
$$\mathrm{Im}(C(4096-k)) = -\mathrm{Im}(L'(k)) + \mathrm{Re}(R'(k))$$

Only the lines at very high (k = 2048) or very low (k = 0) frequency are therefore modified, so that, the human ear being relatively insensitive to these frequencies, the modification made to the signals is inaudible.
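The pre-matrixing rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pre_matrix`, the list-of-complex layout and the small block size used in the usage lines are assumptions made for the example, not elements of the patent.

```python
def pre_matrix(l_half, r_half, n):
    """Combine lines 0..n/2 of L'(k) and R'(k) into one n-line spectrum C(k).

    l_half and r_half are lists of n/2 + 1 complex values (hypothetical layout).
    """
    half = n // 2
    if len(l_half) != half + 1 or len(r_half) != half + 1:
        raise ValueError("expected lines 0..n/2 of each spectrum")

    # Force the DC line (k = 0) and the Fe/2 line (k = n/2) to be purely real.
    l = list(l_half)
    r = list(r_half)
    for k in (0, half):
        l[k] = complex(l[k].real, 0.0)
        r[k] = complex(r[k].real, 0.0)

    c = [0j] * n
    # Lines 0..n/2-1: Re C = Re L' - Im R', Im C = Im L' + Re R'
    for k in range(half):
        c[k] = complex(l[k].real - r[k].imag, l[k].imag + r[k].real)
    # Lines n/2..n-1, filled by reading L' and R' in reverse:
    # Re C(n-k) = Re L' + Im R', Im C(n-k) = -Im L' + Re R'
    for k in range(1, half + 1):
        c[n - k] = complex(l[k].real + r[k].imag, -l[k].imag + r[k].real)
    return c


# Usage with a toy block of n = 8 lines (the description uses n = 4096).
l_half = [1 + 2j, 2 + 1j, 0 - 1j, 3 + 0j, 4 + 5j]
r_half = [0.5 + 1j, 1 - 1j, 2 + 2j, -1 + 0.5j, 2 - 3j]
c = pre_matrix(l_half, r_half, 8)
```

The two loops are equivalent to writing C(k) = L'(k) + jR'(k) over all n lines once L' and R' have been extended by Hermitian symmetry, which is why only lines 0 to n/2 of each input are needed.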

The frequency-time transformation 31 applied to the combined spectrum C(k) is preferably an inverse FFT of order 12 (that is, of size 2^12 = 4096), which delivers a combined time signal c(n) organized in blocks of 4096 samples.

From the combined time signal c(n), the post-matrixing 32 then recovers the right and left signals r'(n) and l'(n) of the sound scene. The left signal l'(n) is the real part of the combined signal c(n), and the right signal r'(n) is its imaginary part.
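The chain of pre-matrixing, single inverse transform and post-matrixing can be illustrated end to end with a small pure-Python sketch. It is not the patent's implementation: the helper names `dft`, `idft` and `decode_pair` are hypothetical, a naive O(n^2) transform stands in for the FFT, the block size is 8 rather than 4096, and the test spectra are derived from real signals so that they are Hermitian-symmetric by construction.

```python
import cmath


def dft(x):
    """Naive forward DFT, used here only to build test spectra."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT with a 1/n normalization (assumed convention)."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def decode_pair(l_spec, r_spec):
    """Recover two real signals with a single inverse transform."""
    # Pre-matrixing written over all n lines: C(k) = L'(k) + j R'(k)
    c_spec = [lk + 1j * rk for lk, rk in zip(l_spec, r_spec)]
    # Single frequency-time transformation
    c_time = idft(c_spec)
    # Post-matrixing: left = real part, right = imaginary part
    left = [v.real for v in c_time]
    right = [v.imag for v in c_time]
    return left, right


left_in = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0, 2.0, 4.0]
right_in = [0.0, 1.0, -1.5, 2.0, 2.5, -3.0, 1.0, 0.5]
left_out, right_out = decode_pair(dft(left_in), dft(right_in))
```

Up to floating-point rounding, `left_out` and `right_out` match `left_in` and `right_in`, which shows that one complex inverse transform carries both channels.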

The invention thus simplifies the transition from the frequency domain to the time domain by forcing the spectra L'(k) and R'(k) to be purely real at index 0 and at fe/2 (k = N/2), where fe is the sampling frequency considered. Thanks to this cancellation of the imaginary part of the components of order 0 and N/2, the outputs of the inverse Fourier operators are purely real and can advantageously be computed using a single operator 31, a pre-matrixing 30 and a post-matrixing 32.
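One way to see why a single operator suffices is the short derivation below. It assumes the usual inverse DFT convention with a 1/N factor, which the description does not state explicitly, and writes the combined spectrum as C(k) = L'(k) + jR'(k) over all N lines.

```latex
\begin{aligned}
c(n) &= \frac{1}{N}\sum_{k=0}^{N-1} C(k)\, e^{j 2\pi k n / N}
      = \frac{1}{N}\sum_{k=0}^{N-1} L'(k)\, e^{j 2\pi k n / N}
      + j\,\frac{1}{N}\sum_{k=0}^{N-1} R'(k)\, e^{j 2\pi k n / N}
      = l'(n) + j\, r'(n), \\[4pt]
l'(n) \in \mathbb{R} \ \text{for all } n
  &\iff L'(N-k) = \overline{L'(k)} \ \text{for all } k
  \;\Longrightarrow\; \mathrm{Im}\, L'(0) = 0 \ \text{and} \ \mathrm{Im}\, L'(N/2) = 0, \\[4pt]
\text{hence}\quad l'(n) &= \mathrm{Re}\, c(n), \qquad r'(n) = \mathrm{Im}\, c(n).
\end{aligned}
```

The same condition applies to R'(k). The lines k = 0 and k = N/2 are their own mirror images under k -> N-k, so they are the only ones for which Hermitian symmetry constrains a single line, which is why only their imaginary parts need to be forced to zero.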

The left and right channels l'(n) and r'(n) thus obtained differ from those obtained with the prior-art techniques. However, the proposed method is perfectly suited to re-synthesizing the stereophonic effects from the decoded localization and decorrelation coefficients ild', itd' and rho', as shown by subjective comparative listening tests of the method of the invention (FIG. 3) and of the prior art (FIG. 2).

Thus, whereas the prior-art techniques computed the signals l'(n) and r'(n) by taking the real part of the signals output by the two inverse Fourier operators 22, 23 applied to the spectra L'(k) and R'(k), the present invention makes it possible to use a single inverse Fourier operator 31 together with the pre- and post-matrixings 30 and 32 of FIG. 4.

Claims

1. A method for decoding a signal, for reconstructing a sound scene, comprising a step of generating, from said signal, at least two frequency spectra each corresponding to a part of said sound scene, characterized in that the imaginary part of at least some components of said spectra is canceled, so as to simplify the subsequent transformation of said spectra into the time domain.

2. Decoding method according to claim 1, characterized in that the imaginary part of the DC and Fe/2 components of said spectra is canceled, where Fe is a sampling frequency of said spectra.

3. Decoding method according to any one of claims 1 and 2, characterized in that it also comprises a step of pre-matrixing at least some of said spectra grouped in pairs, so as to produce a combined spectrum from each of said pairs of spectra.

4. Decoding method according to claim 3, characterized in that said pre-matrixing step comprises sub-steps of:
- for frequency lines 0 to N/2-1 of said combined spectrum:
  - obtaining the real part of said combined spectrum by adding the real part of the first spectrum of said pair and the opposite of the imaginary part of the second spectrum of said pair;
  - obtaining the imaginary part of said combined spectrum by adding the imaginary part of the first spectrum of said pair and the real part of the second spectrum of said pair;
- for frequency lines N/2 to N-1 of said combined spectrum:
  - obtaining the real part of said combined spectrum by adding the real part of the first spectrum of said pair and the imaginary part of the second spectrum of said pair, the lines of said first and second spectra of said pair being read in reverse order;
  - obtaining the imaginary part of said combined spectrum by adding the opposite of the imaginary part of the first spectrum of said pair and the real part of the second spectrum of said pair, the lines of said first and second spectra of said pair being read in reverse order;
where N is the number of lines of a Fourier representation of said spectra.

5. Decoding method according to any one of claims 3 and 4, characterized in that it also comprises a frequency-time transformation step of said combined spectrum, so as to obtain a combined time signal.

6. Decoding method according to claim 5, characterized in that it comprises a step of post-matrixing said combined time signal, so as to obtain two time signals respectively associated with each of said spectra.

7. Decoding method according to claim 6, characterized in that said time signals are the right and left signals of a stereophonic representation of said sound scene.

8. Decoding method according to any one of claims 1 to 7, characterized in that said signal from which said frequency spectra are generated is a monophonic signal.

9. Decoding method according to any one of claims 3 to 6, characterized in that, when N pairs of frequency spectra are generated from said signal, said frequency-time transformation and post-matrixing steps are implemented on each of said N pairs of spectra.

10. Device for decoding a signal, for reconstructing a sound scene, comprising means for generating at least two frequency spectra from said signal, characterized in that it comprises means for canceling the imaginary part of at least some components of said spectra, so as to simplify the subsequent transformation of said spectra into the time domain.

11. Decoding device according to claim 10, characterized in that it comprises means for:
- pre-matrixing said spectra into a combined spectrum;
- frequency-time transformation of said combined spectrum, so as to obtain a combined time signal;
- post-matrixing said combined time signal, so as to obtain two time signals respectively associated with each of said spectra.

12. A computer program comprising program code instructions for performing the steps of the decoding method according to any one of claims 1 to 9 when said program is executed on a computer.