EP1994526B1

EP1994526B1 - Joint sound synthesis and spatialization

Info

Publication number: EP1994526B1
Application number: EP07731685A
Authority: EP
Inventors: Grégory PALLONE; Marc Emerit; David Virette
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 2006-03-13
Filing date: 2007-03-01
Publication date: 2009-10-28
Anticipated expiration: 2027-03-01
Also published as: PL1994526T3; US8059824B2; DE602007002993D1; EP1994526A1; ES2335246T3; US20090097663A1; ATE447224T1; JP5051782B2; JP2009530883A; WO2007104877A1

Abstract

The invention concerns a process for joint synthesis and spatialization of multiple sound sources in associated spatial positions, including: a) a step of assigning to each source at least one parameter (pi) representing an amplitude; b) a step of spatialization consisting in implementing an encoding into a plurality of channels, wherein each amplitude (pi) is duplicated to be multiplied by a spatialization gain (gim), each spatialization gain being determined for one encoding channel (pgm) and for a source to be spatialized (Si); c) a step of grouping (R) the parameters multiplied by the gains (pim), in respective channels (pg1, ..., pgM), by applying a sum of said multiplied parameters (pim) over all the sources (Si) for each channel (pgm); and d) a step of parametric synthesis (SYNTH(1), ..., SYNTH(M)) applied to each of the channels (pgm).

Description

The present invention relates to audio processing and, more particularly, to three-dimensional spatialization of synthetic sound sources.

Currently, the spatialization of a synthetic sound source is often performed without taking into account how the sound is produced, that is, the very way in which the sound is synthesized. Many models, notably parametric ones, have been proposed for synthesis. In parallel, many spatialization techniques have also been proposed, but without relating them to the technique chosen for synthesis.

Among synthesis techniques, the so-called "non-parametric" methods are known. No particular parameter is used a priori to modify samples previously stored in memory. The best-known representative of these methods is classical wavetable synthesis.

In contrast to this type of technique are the "parametric" synthesis methods, which rely on a model that makes it possible to manipulate a small number of parameters compared with the number of signal samples produced by non-parametric methods. Parametric synthesis techniques typically rely on additive, subtractive, source/filter or non-linear models.

Among these parametric methods, those that make it possible to manipulate jointly the parameters corresponding to different sound sources, so that a single synthesis process serves all of the sources, are termed "mutual". In the so-called "sinusoidal" methods, typically, a frequency spectrum is constructed from parameters such as the amplitude and frequency of each partial component of the overall sound spectrum of the sources. Indeed, an implementation by inverse Fourier transform, followed by overlap-add, provides an extremely efficient synthesis of several sound sources simultaneously.

As regards the spatialization of sound sources, various techniques are currently known. Some techniques (such as "transaural" or "binaural") are based on HRTFs ("Head Related Transfer Functions"), which represent the disturbance of acoustic waves by the morphology of an individual, these HRTFs being specific to that individual. Sound reproduction is adapted to the listener's HRTFs, typically over two remote loudspeakers ("transaural") or over the two earpieces of a headset ("binaural"). Other techniques (for example "ambiophonic" or "multichannel" (5.1 to 10.1 or more)) instead provide for reproduction over more than two loudspeakers.

More specifically, some HRTF-based techniques separate the "frequency" and "position" variables of the HRTFs, thus giving a set of p basis filters (corresponding to the first p eigenvalues of the covariance matrix of the HRTFs, whose statistical variables are the frequencies), these filters being weighted by spatial functions (obtained by projecting the HRTFs onto the basis filters). The spatial functions can then be interpolated, as described in document US-5,500,900.

The spatialization of many sound sources can be achieved by a multichannel implementation applied to the signal of each of the sound sources. The gains of the spatialization channels are applied directly to the sound samples of the signal, often described in the time domain (but possibly also in the frequency domain). These sound samples are processed by a spatialization algorithm (with application of gains that depend on the desired position), regardless of the origin of the samples. Thus, the proposed spatialization could apply equally well to natural sounds and to synthetic sounds.

On the one hand, each sound source must be synthesized independently (yielding a time-domain or frequency-domain signal), so that independent spatialization gains can then be applied. For N sound sources, it is therefore necessary to perform N synthesis calculations.
On the other hand, applying the gains to sound samples, whether they come from the time domain or the frequency domain, requires at least as many multiplications as there are samples. For a block of Q samples, it is therefore necessary to apply at least N·M·Q gains, M being the number of intermediate channels (ambiophonic channels for example) and N being the number of sources.
Thus, this technique entails a high computation cost when many sound sources are spatialized.

Among the ambiophonic techniques, the so-called "virtual loudspeakers" method makes it possible to encode the signals to be spatialized, in particular by applying gains to them, the decoding being performed by convolving the encoded signals with pre-calculated filters (Jérôme Daniel, "Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia", PhD thesis, 2000).

A very promising technique, combining synthesis and spatialization, was presented in document WO-05/069272.

It consists in determining amplitudes to be assigned to signals representing sound sources so as to define, at the same time, the sound intensity (for example a "volume") of a source to be synthesized and a spatialization gain for that source. This document discloses in particular a binaural spatialization taking delays and gains (or "spatial functions") into account and, in particular, a mixing of the synthesized sources in the encoding part of the spatialization.

More particularly still, one embodiment described in this document WO-05/069272, in which the sources are synthesized by associating amplitudes with the frequencies constituting a "sound timbre" (for example a fundamental frequency and its harmonics), provides for grouping the synthesis signals by identical frequencies, with a view to a subsequent spatialization operating on the frequencies.

This embodiment is illustrated in Figure 1. In a synthesis block SYNTH (shown in dashed lines), the frequencies f₀, f₁, f₂, ..., f_p of each source to be synthesized S₁, ..., S_N are assigned respective amplitudes a₀¹, a₁¹, ..., a_p¹, ..., a_i^j, ..., a₀^N, a₁^N, ..., a_p^N, where, in the general notation a_i^j, j is a source index between 1 and N and i is a frequency index between 0 and p. Of course, some amplitudes of a set a₀^j, a₁^j, ..., a_p^j to be assigned to the same source j may be zero if the corresponding frequencies are not present in the sound timbre of that source j.
The amplitudes a_i¹, ..., a_i^N relating to each frequency f_i are grouped ("mixed") to be applied, frequency by frequency, to the spatialization block SPAT for an encoding operating on the frequencies (in binaural for example, an interaural delay then being applied to each source). The signals of the channels c₁, ..., c_k output by the spatialization block SPAT are then intended to be transmitted over one or more networks, or stored, or otherwise handled, with a view to subsequent reproduction (preceded, where appropriate, by a suitable spatialization decoding).

This technique, although very promising, still calls for some optimization.

In general, current methods require significant computing power to spatialize many synthesized sound sources.

The present invention improves the situation.

To this end, it proposes a method for jointly synthesizing and spatializing a plurality of sound sources in associated positions in space, the method comprising:

a) a step of assigning to each source at least one synthesis parameter p_i, representative of an amplitude of at least one frequency component of the source,
b) a spatialization step implementing an encoding into a plurality of channels, in which each amplitude parameter is duplicated in order to be multiplied by a spatialization gain, each spatialization gain being determined, on the one hand, for an encoding channel and, on the other hand, for a source to be spatialized,
c) a step of grouping the parameters multiplied by the gains, in respective channels, by summing said multiplied parameters over all the sources for each channel, and
d) a parametric synthesis step applied to each of the channels.

A computer program according to claim 6 and a module according to claim 7 are also proposed.

Thus, the present invention proposes to apply first a spatialization encoding, then a "pseudo-synthesis", the term "pseudo" referring to the fact that the synthesis is applied to the encoded parameters resulting from the spatialization, and not to the usual synthetic sound signals.

Indeed, a particular feature of the invention is the spatial encoding of a few synthesis parameters, rather than a spatial encoding of the signals corresponding directly to the sources. This spatial encoding applies more particularly to synthesis parameters that are representative of an amplitude, and it advantageously consists in applying to these few synthesis parameters spatialization gains calculated as a function of the respective desired positions of the sources. It will thus be understood that the parameters multiplied by the gains in step b) and grouped in step c) are not actually sound signals in the sense of the general prior art described above.

The present invention thus uses a mutual parametric synthesis in which one of the parameters has the dimension of an amplitude. Unlike the techniques of the prior art, it exploits the advantages of such a synthesis to perform the spatialization. Combining the sets of synthesis parameters obtained for each of the sources advantageously makes it possible to control globally the encoded blocks of mutual parametric synthesis.

The present invention thus makes it possible to spatialize, simultaneously and independently, many sound sources synthesized from a parametric synthesis model, the spatialization gains being applied to the synthesis parameters rather than to time-domain or frequency-domain samples. This substantially reduces the computing power required, since the computation cost involved is low.

According to one of the advantages provided by the invention, since the number of synthesis steps is made independent of the number of sources, a single synthesis per intermediate channel can be applied. Whatever the number of sound sources, only a constant number M of synthesis calculations is required. Typically, as soon as the number N of sources exceeds the number M of intermediate channels, the technique according to the invention requires fewer calculations than the usual prior-art techniques. For example, at ambiophonic order 1 in two dimensions (that is, three intermediate channels), the invention already saves computation with only four sources to spatialize.

The present invention also makes it possible to reduce the number of gains to be applied. Indeed, the gains are applied to the synthesis parameters and not to the sound samples. Since parameters such as the volume are generally updated less often than the signal is sampled, computation is saved. For example, for a parameter update rate (for the volume in particular) of 200 Hz and a signal sampling frequency of 44100 Hz, the number of multiplications is reduced substantially (by a ratio of about 200).
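The two savings described above can be checked with a little arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes a single amplitude parameter per source, with N = 4 and M = 3 taken from the first-order, two-dimensional example in the text.

```python
def gain_multiplications_per_second(n_sources, n_channels, rate_hz):
    """Gain multiplications per second when N*M gains are applied at rate_hz."""
    return n_sources * n_channels * rate_hz

N, M = 4, 3               # four sources, three intermediate channels
sample_rate_hz = 44100    # gains applied to every sound sample (prior art)
param_rate_hz = 200       # gains applied at each parameter update (invention)

per_sample = gain_multiplications_per_second(N, M, sample_rate_hz)
per_param = gain_multiplications_per_second(N, M, param_rate_hz)
ratio = per_sample / per_param   # 44100 / 200, independent of N and M

# Synthesis runs: one per source in the prior art, one per channel here.
synthesis_runs_prior = N
synthesis_runs_joint = M
```

With these values the ratio is 220.5, consistent with the "about 200" quoted above, and three syntheses replace four.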

The fields of application of the present invention include music (in particular polyphonic ringtones for mobile phones), multimedia (in particular sound for video games), virtual reality (rendering of sound scenes), simulators (synthesis of engine noise), and others.

Other features and advantages of the invention will become apparent on examining the detailed description below and the accompanying drawings, in which, in addition to Figure 1 relating to the prior art and described above:

Figure 2 illustrates the general spatialization and synthesis processing provided in a method according to the invention,
Figure 3 illustrates a processing of the spatialized and synthesized signals, for spatial decoding with a view to reproduction,
Figure 4 illustrates a particular embodiment in which several amplitude parameters are assigned to each source, each parameter being associated with a frequency component,
Figure 5 illustrates the steps of a method according to the invention, and may correspond to a flowchart of a computer program for implementing the invention.

With reference to Figure 2, at least one parameter p_i, representative of an amplitude, is assigned to a source S_i among a plurality of sources S₁, ..., S_N to be synthesized and spatialized (i being between 1 and N). Each parameter p_i is duplicated as many times as there are spatialization channels provided in the spatialization block SPAT. In the example shown, where M encoding channels are provided for spatialization, each parameter p_i is duplicated M times in order to apply respective spatialization gains g_i¹, ..., g_i^M (i being, as a reminder, the index of source S_i).
N·M parameters, each multiplied by a gain, are then obtained: p₁g₁¹, ..., p₁g₁^M, ..., p_i g_i¹, ..., p_i g_i^M, ..., p_N g_N¹, ..., p_N g_N^M.
These multiplied parameters are then grouped (reference R in Figure 2) by spatialization channel (M channels in all), that is:

p₁g₁¹, ..., p_i g_i¹, ..., p_N g_N¹ grouped in a first spatialization channel p_g¹,
and so on, up to:

p₁g₁^M, ..., p_i g_i^M, ..., p_N g_N^M grouped in an M-th spatialization channel p_g^M,

the letter g in the subscript standing for "global".

Thus, new parameters p_i^m (i varying from 1 to N and m varying from 1 to M) are calculated by multiplying the parameters p_i by the encoding gains g_i^m, obtained from the position of each of the sources. The parameters p_i^m are combined (by summation in the example described) to provide the parameters p_g^m, which feed M mutual parametric synthesis blocks. These M blocks (referenced SYNTH(1) to SYNTH(M) in Figure 2) make up the synthesis module SYNTH, which delivers M time-domain or frequency-domain signals ss^m (m varying from 1 to M), obtained by synthesis from the parameters p_g^m. These signals ss^m can then feed a conventional spatial decoding block, as will be seen below with reference to Figure 3.
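As a minimal sketch of the duplication, gain multiplication and grouping just described (not the patented implementation), the function below computes p_g^m as the sum over sources of p_i · g_i^m. The function name, the list-based data layout and the numeric values are assumptions chosen for illustration; channels are indexed from 0 in the code.

```python
def encode_and_group(params, gains):
    """Return the grouped parameters p_g^m for every channel m.

    params[i]   -- amplitude parameter p_i of source i
    gains[i][m] -- spatialization gain g_i^m of source i for channel m
    """
    n_channels = len(gains[0])
    grouped = [0.0] * n_channels
    for p_i, g_i in zip(params, gains):
        for m in range(n_channels):
            # Duplicate p_i for channel m, multiply by its gain, accumulate.
            grouped[m] += p_i * g_i[m]
    return grouped

# Two sources (N = 2) encoded into three channels (M = 3); made-up values.
params = [1.0, 0.5]
gains = [[1.0, 0.0, 1.0],
         [1.0, 1.0, 0.0]]
grouped = encode_and_group(params, gains)
```

Each of the M values in `grouped` would then drive one synthesis block, so the number of synthesis runs does not grow with the number of sources.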

In a particular embodiment, the synthesis used is an additive synthesis with application of an inverse Fourier transform (IFFT).

For this purpose, a set of N sources is characterized by a plurality of parameters p_{i,k}, each representing the amplitude, in the frequency domain, of the k-th frequency component of the i-th source S_i.
The time signal s_i(n) that would correspond to this source S_i, if it were synthesized independently of the other sources, would be given by:

$s_{i}(n) = \sum_{k=1}^{K} c_{i,k}(n), \quad \text{with } c_{i,k}(n) = p_{i,k}(n) \cos[2 \pi f_{i,k}(n)\, n / F_{e} + \varphi_{i,k}(n)]$

where p_{i,k} is the amplitude of the component of frequency f_{i,k}, whose phase is given by φ_{i,k}, for the source S_i at time n.
The additive synthesis can be carried out in the frequency domain from the given parameters p_{i,k}, f_{i,k} and φ_{i,k} alone, using for example the technique described in document FR-2 679 689.
The parameter p_{i,k} represents the amplitude of a given frequency component k for a given source S_i. The parameters p^m_{i,k} for each source and each of the M channels are therefore deduced from the relation:

$p^{m}_{i,k} = g^{m}_{i} \cdot p_{i,k},$

with m ranging from 1 to M. The gains g^m_i are predetermined for a desired position of the source S_i and according to the chosen spatialization encoding.
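
As an illustration of the relation above, the duplication and weighting of the amplitude parameters of one source can be sketched as follows. This is a minimal sketch, not the patented implementation: the function name `spatialize_amplitudes` and the numerical values are hypothetical, and the gains are taken as already known.

```python
def spatialize_amplitudes(amplitudes, gains):
    """Return p[m][k] = gains[m] * amplitudes[k] for one source S_i.

    amplitudes: the K amplitude parameters p_{i,k} of the source.
    gains: the M spatialization gains g_i^m for the source position.
    """
    # Each amplitude is duplicated M times, once per encoding channel.
    return [[g * p for p in amplitudes] for g in gains]


# Hypothetical values: K = 3 frequency components, M = 2 channels.
p_i = [1.0, 0.5, 0.25]
g_i = [1.0, -0.5]
p_im = spatialize_amplitudes(p_i, g_i)
```

Only K·M multiplications per source are needed at this stage, since the operation acts on amplitude parameters rather than on synthesized signals.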

In the case of an ambiophonic encoding, for example, these gains correspond to spherical harmonics and can be written g^m_i = Y_m(θ_i, δ_i), where:

Y_m is a spherical harmonic of order m,
θ_i and δ_i are respectively the azimuth and the elevation desired for the source S_i.
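
By way of example, the gains for a first-order encoding (M = 4 channels) might be computed as below. The channel order (W, X, Y, Z), the absence of normalization and the angle conventions are assumptions made for this sketch; the text above does not fix any of them, and the function name `first_order_gains` is hypothetical.

```python
import math


def first_order_gains(azimuth, elevation):
    """Gains g_i^m for M = 4 channels (W, X, Y, Z), angles in radians.

    Assumed convention: unnormalized first-order components, azimuth
    measured counter-clockwise from the front, elevation from the
    horizontal plane.
    """
    cos_el = math.cos(elevation)
    return [
        1.0,                         # W: omnidirectional component
        math.cos(azimuth) * cos_el,  # X: front-back component
        math.sin(azimuth) * cos_el,  # Y: left-right component
        math.sin(elevation),         # Z: up-down component
    ]


gains_front = first_order_gains(0.0, 0.0)
gains_left = first_order_gains(math.pi / 2, 0.0)
```

Other normalizations or higher orders would change only the values of the gains, not the way they are applied to the amplitude parameters.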

The parameters p^m_{i,k} are then combined frequency by frequency, so as to obtain a single global parameter:

$p^{m}_{g,k'} = \sum_{i=1}^{N} p^{m}_{i,k},$

where k' runs over all the frequencies f_{i,k} present in all the sources S_i.

In practice, the value of k' is less than k·i, because common frequencies may characterize several sources at once. In one embodiment, the same global set of frequencies can be associated with all the sources, even if this means that some amplitude parameters are zero for certain frequencies of certain sources.
In this case, the values of k and k' are equal and the preceding relation is written simply:

$p^{m}_{g,k} = \sum_{i=1}^{N} p^{m}_{i,k}.$

The synthesis step consists in using these parameters p^m_{g,k} (m ranging from 1 to M) to synthesize each of the M frequency spectra ss^m(ω) delivered by the synthesis module SYNTH. For this purpose, the technique described in FR-2 679 689 can be applied, by iteratively adding spectral envelopes corresponding to the Fourier transform of a time window (for example a Hanning window), these spectral envelopes being sampled and tabulated beforehand, centered at the frequencies f_k and then weighted by p^m_{g,k}, which is written:

$ss^{m}(\omega) = \sum_{k=1}^{K} p^{m}_{g,k} \cdot \mathrm{env}_{k}(\omega),$

where env_k(ω) is the spectral envelope centered at the frequency f_k.
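
The frequency-by-frequency grouping over the sources can be sketched as follows. The representation of each source as a dictionary mapping a frequency to its amplitude, the function name `group_sources` and the numerical values are assumptions chosen for illustration only.

```python
def group_sources(sources, gains):
    """Sum g_i^m * p_{i,k} over the sources, frequency by frequency.

    sources: N dicts, each mapping a frequency (Hz) to its amplitude p_{i,k}.
    gains: N lists, gains[i][m] being the gain g_i^m.
    Returns M dicts, each mapping a frequency k' to the global p^m_{g,k'}.
    """
    num_channels = len(gains[0])
    grouped = [{} for _ in range(num_channels)]
    for source, source_gains in zip(sources, gains):
        for m, gain in enumerate(source_gains):
            for freq, amp in source.items():
                # A frequency shared by several sources is accumulated once.
                grouped[m][freq] = grouped[m].get(freq, 0.0) + gain * amp
    return grouped


# Hypothetical data: N = 2 sources sharing the 440 Hz component, M = 2.
sources = [{220.0: 1.0, 440.0: 0.5}, {440.0: 0.25, 660.0: 1.0}]
gains = [[1.0, 0.5], [1.0, -1.0]]
p_g = group_sources(sources, gains)
```

In this example the two sources carry four components in total but only three distinct frequencies, so each channel holds three global parameters.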

This embodiment is illustrated in Figure 4. K amplitude parameters p_{i,k} are assigned to each source S_i. The source index i lies between 1 and N. The frequency index k lies between 1 and K. For each source S_i, these K parameters are duplicated M times, each copy being multiplied by a spatialization gain g_i^m. The index m of the spatialization encoding channel lies between 1 and M.

In each channel m, the K results of the products g_i^m · p_{i,k} are grouped together, frequency by frequency, according to the expression given above:

$p^{m}_{g,k} = \sum_{i=1}^{N} p^{m}_{i,k}, \quad \text{with } p^{m}_{i,k} = g^{m}_{i} \cdot p_{i,k},$

where k ranges from 1 to K in each channel m, and m ranges overall from 1 to M.
It will thus be understood that each channel m contains sub-channels p^m_{g,k}, each associated with a frequency component k, the index g standing, as a reminder, for the term "global".
The processing then continues by multiplying the global parameter p^m_{g,k} of each sub-channel, associated with a frequency f_k, by a spectral envelope env_k(ω) centered at this frequency f_k, for all K sub-channels (k between 1 and K) and, overall, for all M channels (m between 1 and M). The K sub-channels are then summed in each channel m, according to the following relation:

$ss^{m}(\omega) = \sum_{k=1}^{K} p^{m}_{g,k} \cdot \mathrm{env}_{k}(\omega),$

for m ranging from 1 to M channels in total. This yields the signals ss^m(ω), encoded for their spatialization and synthesized within the meaning of the invention. They are expressed in the frequency domain.
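
The weighting and summation of the spectral envelopes in one channel can be sketched as below. Several simplifying assumptions are made and are not taken from the text: each frequency f_k falls exactly on a bin, phases are ignored, and the tabulated envelope is the three-bin transform of a periodic Hann window normalized by its length. The names `HANN_ENVELOPE` and `synthesize_spectrum` are hypothetical.

```python
# Tabulated envelope (assumption): DFT of a periodic Hann window divided
# by the window length. Keys are bin offsets from the centre frequency.
HANN_ENVELOPE = {-1: -0.25, 0: 0.5, 1: -0.25}


def synthesize_spectrum(global_amps, center_bins, num_bins):
    """Return the spectrum ss^m of one channel as a list of num_bins values.

    global_amps: the K global parameters p^m_{g,k} of the channel.
    center_bins: the K bin indices on which the envelopes env_k are centred.
    """
    spectrum = [0.0] * num_bins
    for amp, center in zip(global_amps, center_bins):
        # Add the envelope centred on this component, weighted by p^m_{g,k}.
        for offset, weight in HANN_ENVELOPE.items():
            b = center + offset
            if 0 <= b < num_bins:
                spectrum[b] += amp * weight
    return spectrum


# Hypothetical channel with K = 2 components on adjacent bins.
ss_m = synthesize_spectrum([1.0, 2.0], [2, 3], 8)
```

Because the envelopes overlap, neighbouring components interact in the shared bins, which is why the contributions are accumulated rather than assigned.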

To bring these M signals back into the time domain (where they are denoted SS^m(n)), an inverse Fourier transform (IFFT) can then be applied to them:

$SS^{m}(n) = \mathrm{IFFT}(ss^{m}(\omega))$

Processing by successive frames can be carried out using a conventional overlap-add technique.
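
The return to the time domain and the frame-by-frame reconstruction can be sketched as follows. A naive inverse DFT stands in for the IFFT to keep the example free of dependencies, the spectrum is assumed to describe a real signal so that only the real part is kept, and the hop size is an arbitrary choice; the function names are hypothetical.

```python
import cmath


def inverse_dft(spectrum):
    """Naive O(L^2) inverse DFT; an IFFT gives the same result faster.

    Only the real part is returned, which assumes that the spectrum is
    that of a real-valued signal.
    """
    size = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / size)
            for k in range(size)).real / size
        for n in range(size)
    ]


def overlap_add(frames, hop):
    """Sum time-domain frames whose starts are spaced hop samples apart."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for index, frame in enumerate(frames):
        start = index * hop
        for n, sample in enumerate(frame):
            out[start + n] += sample
    return out


# A spectrum with only a DC bin gives a constant frame of ones.
frame = inverse_dft([4.0, 0.0, 0.0, 0.0])
signal = overlap_add([frame, frame], hop=2)
```

With a hop of half the frame length, the two frames overlap over two samples, where their contributions add.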

Each of the M time signals SS^m(n) can then be supplied to a spatialization decoding block.

For this purpose, a pair of suitable filters Fg^m(n), Fd^m(n), for the left and right channels respectively, can for example be applied by convolution to each signal SS^m(n), as shown in Figure 3, in order to adapt an ambiophonic encoding to a binaural playback on two channels, left and right. The filters for such an ambiophonic-to-binaural transition can be obtained by applying the virtual loudspeaker technique mentioned above.

The processing carried out by the spatial decoding block DECOD of Figure 3 can be of the type:

$SS^{m}_{g}(n) = (SS^{m} * Fg^{m})(n)$

$SS^{m}_{d}(n) = (SS^{m} * Fd^{m})(n)$

After filtering, all the signals intended for the left ear and for the right ear are summed respectively, which yields a pair of binaural signals:

$S_{g}(n) = \sum_{m=1}^{M} SS^{m}_{g}(n)$

$S_{d}(n) = \sum_{m=1}^{M} SS^{m}_{d}(n)$

which then feed the two earpieces of a headset.
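
The time-domain decoding described above can be sketched as follows, with a direct convolution and short made-up filters. The function names and the filter coefficients are hypothetical, and all channels (and all filters) are assumed to have the same length.

```python
def convolve(signal, kernel):
    """Direct linear convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(kernel):
            out[i + j] += s * h
    return out


def decode_binaural(channels, left_filters, right_filters):
    """Filter each channel SS^m by Fg^m and Fd^m, then sum per ear."""
    def mix(filters):
        filtered = [convolve(ch, f) for ch, f in zip(channels, filters)]
        # Sample-by-sample sum over the M filtered channels.
        return [sum(samples) for samples in zip(*filtered)]
    return mix(left_filters), mix(right_filters)


# Hypothetical data: M = 2 channels of two samples, two-tap filters.
channels = [[1.0, 0.0], [0.0, 1.0]]
fg = [[1.0, 0.5], [0.5, 0.0]]
fd = [[0.5, 0.0], [1.0, 0.5]]
S_g, S_d = decode_binaural(channels, fg, fd)
```

This form requires 2M convolutions per frame, which motivates carrying out the filtering in the frequency domain instead.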

A more advantageous variant is nevertheless described below. The filters adapting the ambiophonic format to the binaural format can be applied directly in the frequency domain, thus avoiding a convolution in the time domain and the corresponding computational cost.

For this purpose, each of the M frequency spectra ss^m(ω) is directly multiplied by the respective Fourier transforms of the time-domain filters, denoted Fg^m(ω) and Fd^m(ω) (adapted where necessary to have a consistent number of points), which is written:

$ss^{m}_{g}(\omega) = ss^{m}(\omega) \cdot Fg^{m}(\omega)$

$ss^{m}_{d}(\omega) = ss^{m}(\omega) \cdot Fd^{m}(\omega)$

The spectra are then summed for each ear before the inverse Fourier transform and the overlap-add operation are carried out, that is:

$s_{g}(\omega) = \sum_{m=1}^{M} ss^{m}_{g}(\omega)$

$s_{d}(\omega) = \sum_{m=1}^{M} ss^{m}_{d}(\omega)$

Then, to express in the time domain the signals feeding the playback device, the inverse Fourier transform is applied:

$S_{g}(n) = \mathrm{IFFT}(s_{g}(\omega))$

$S_{d}(n) = \mathrm{IFFT}(s_{d}(\omega))$
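
The frequency-domain variant can be sketched as below: each spectrum is multiplied bin by bin by the filter spectra and the products are summed per ear, so that a single inverse transform per ear remains to be applied. The function name and the complex values are hypothetical, and the filter spectra are assumed to have the same number of bins as the signal spectra.

```python
def decode_binaural_spectra(spectra, left_filters, right_filters):
    """Return (s_g, s_d), the left-ear and right-ear spectra.

    spectra: M lists of complex bins ss^m.
    left_filters, right_filters: M lists of complex bins Fg^m and Fd^m,
    each with as many bins as the spectra.
    """
    num_bins = len(spectra[0])

    def mix(filters):
        # Bin-by-bin product, summed over the M channels.
        return [
            sum(ss[b] * f[b] for ss, f in zip(spectra, filters))
            for b in range(num_bins)
        ]
    return mix(left_filters), mix(right_filters)


# Hypothetical data: M = 2 channels, two bins each.
spectra = [[1 + 0j, 2 + 0j], [0 + 1j, 1 + 0j]]
fg = [[1 + 0j, 0.5 + 0j], [1 + 0j, 1 + 0j]]
fd = [[0 + 0j, 1 + 0j], [0 - 1j, 2 + 0j]]
s_g, s_d = decode_binaural_spectra(spectra, fg, fd)
```

A bin-by-bin product corresponds to a circular convolution in the time domain, so in practice the transform length would need to accommodate the filter length, which is one reading of the remark about a consistent number of points.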

The present invention also relates to a computer program product, whether stored in a memory of a central unit or of a terminal, or on a removable medium suitable for cooperating with a drive of this central unit (CD-ROM, diskette or other), or downloadable via a telecommunications network. This program comprises in particular instructions for implementing the method described above, a flow chart of which is illustrated by way of example in Figure 5, summarizing the steps of such a method.

Step a) assigns to each source S_i the parameters representing an amplitude. In the example shown, one parameter p_{i,k} is assigned per frequency component f_k, as described above.
Step b) duplicates these parameters and multiplies them by the gains g_i^m of the encoding channels.
Step c) groups together the products obtained in step b), in particular by calculating their sum over all the sources S_i.
Step d) performs the parametric synthesis with multiplication by a spectral envelope env_k as described above, followed by a grouping of the sub-channels by applying, in each channel, a sum over all the frequency components (with index k ranging from 1 to K).
Step e) performs a spatialization decoding of the signals ss^m derived from the respective channels, which are synthesized, spatialized and represented in the frequency domain, for playback over two loudspeakers, for example in binaural format.

The present invention also relates to a device for generating spatialized synthetic sounds, comprising in particular a processor and, in particular, a working memory suitable for storing instructions of the computer program product defined above.

Of course, the present invention is not limited to the embodiment described above by way of example; it extends to other variants.

Thus, a spatialization encoding in ambiophonic format, carried out by the module SPAT of Figure 2 and followed by an adaptation of the ambiophonic format to the binaural format, has been described above by way of example. As a variant, an encoding directly to the binaural format can for example be applied.

Moreover, the multiplication by spectral envelopes in the parametric synthesis is described above by way of example; other models can be provided as variants.

Claims

Method for jointly synthesizing and spatializing a plurality of sound sources in associated spatial positions, comprising:
a) a step of assigning to each source at least one parameter (p_i) representing an amplitude of at least one frequency component of the source,

b) a spatialization step implementing an encoding into a plurality of channels, wherein each amplitude parameter (p_i) is duplicated to be multiplied with a spatialization gain (g_i ^m), each spatialization gain being determined, on the one hand, for an encoding channel (p_g ^m) and, on the other hand, for a source to be spatialized (S_i),

c) a step of grouping together (R) the parameters (p_i ^m) multiplied by the gains, in respective channels (p_g ^1, ..., p_g ^M), by applying a sum of said multiplied parameters (p_i ^m) to all the sources (S_i) for each channel (p_g ^m), and

d) a parametric synthesis step (SYNTH(1), ..., SYNTH(M)) applied to each of the channels (p_g ^m).
Method according to Claim 1, wherein:
a) each source (S_i) is assigned a plurality of parameters (p_{i,k}), each representing an amplitude of a frequency component (f_k),

b) each amplitude parameter (p_{i,k}) representing a frequency component (f_k) is duplicated to be multiplied with a spatialization gain (g_i ^m), each spatialization gain being determined, on the one hand, for an encoding channel (p_g ^m) and, on the other hand, for a source to be spatialized (S_i),

c) in each channel, there are grouped together, frequency component by frequency component, the products of the parameters (p_{i,k}) by the gains (g_i ^m), into sub-channels (p_{g,k} ^m) each associated with a frequency component (f_k).
Method according to Claim 2, wherein the synthesis is conducted, in each channel, by:
d1) multiplying the output of each sub-channel associated with a frequency component (f_k) by a spectral envelope (env_k) centered on a frequency corresponding to said frequency component (f_k),

d2) and grouping together, by a sum over the frequency components (f_k), the products resulting from the operation d1),
to obtain, following the operation d2), a signal (ss^m) derived from each channel, spatially encoded and synthesized.
Method according to one of the preceding claims, wherein the spatialization is conducted by ambiophonic encoding and the parameters representing an amplitude that are assigned to the sources correspond to spherical harmonic amplitudes (Y_m).
Method according to Claim 4, taken in combination with Claim 3, wherein, to switch from an ambiophonic encoding to a decoding with a view to playback in binaural spatialization mode, a processing is applied in the frequency domain directly to the results of the products derived from the respective channels after the operation d2).
Computer program product, stored in a memory of a central unit or of a terminal, and/or on a removable medium specifically for cooperating with a drive of said central unit, and/or downloadable via a telecommunication network, characterized in that it comprises instructions for the implementation of the method according to one of Claims 1 to 5.
Module for generating spatialized synthetic sounds, notably comprising a processor, characterized in that it also comprises a working memory storing instructions of the computer program product according to Claim 6.