EP1992198B1 - Optimization of binaural sound spatialization based on multichannel encoding - Google Patents
- Publication number
- EP1992198B1 (application number EP07731684.2A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- encoding
- functions
- filters
- decoding
- hrir
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
- H04S1/002—Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
- H04S5/00—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
Definitions
- The present invention relates to the processing of sound signals for their spatialization.
- Spatialized sound reproduction allows a listener to perceive sound sources as coming from any direction or position in space.
- HRTF Head Related Transfer Functions
- HRIR Head Related Impulse Response
- the term "binaural" refers to reproduction over stereo headphones that nevertheless conveys spatialization effects.
- the present invention is not limited to this technique and also applies, in particular, to techniques derived from binaural rendering, such as so-called "transaural" rendering, that is to say rendering over distant loudspeakers.
- Such techniques can then use what is called "crosstalk cancellation" (or "cross-talk cancellation"), which consists in canceling the acoustic cross paths so that a sound, thus processed and then emitted by the loudspeakers, is perceived by only one of the listener's two ears.
- this decomposition makes it possible to do encoding and decoding called "binaural multichannel".
- the decoding functions which are actually filters
- a set of spatial encoding functions which are in fact encoding gains
- when they are optimal for rendering, they give the listener a feeling of perfect immersion inside a sound scene, although only two loudspeakers are actually available for binaural reproduction (the earpieces of a headset or distant loudspeakers).
- the advantages of a multi-channel approach for binaural techniques are multiple since the encoding step is independent of the decoding step.
- the encoding is generally inexpensive in memory and/or computation since the spatial functions are gains that depend solely on the directions of incidence of the sources to be encoded and not on the number of sources themselves.
- Decoding also has a cost independent of the number of sources to be spatialised. In the case, moreover, of a real sound field measured by a network of microphones and encoded according to known spatial functions, it is possible today to find decoding functions that allow satisfactory binaural listening.
- the decoding functions can be individualized for each of the listeners.
- the present invention aims in particular at an improved way of obtaining decoding filters and/or encoding gains in the binaural multichannel technique.
- the context is as follows: sources are spatialised by multichannel encoding and the restitution of the spatially encoded content is done by applying appropriate decoding filters.
- The delays of each HRTF are extracted. Indeed, the shape of a head is usually such that, for a given position, a sound arrives at one ear a certain time before reaching the other ear (a sound on the left naturally reaches the left ear before the right ear).
- the delay difference t between the two ears is an interaural localization cue called ITD (for "Interaural Time Difference").
- ITD Interaural Time Difference
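- decoding filters $L_i(f)$ and $R_i(f)$ of channel $i$ are obtained which satisfy the equations:
- $L(\theta_p, \varphi_p, f) = \sum_i g_i(\theta_p, \varphi_p)\, L_i(f)$ and $R(\theta_p, \varphi_p, f) = \sum_i g_i(\theta_p, \varphi_p)\, R_i(f)$, for $p = 1, 2, \ldots, P$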
- a second approach, proposed in document US 5500900, jointly computes the decoding filters and the spatial encoding functions by decomposing the HRIR sets through a principal component analysis (PCA) and then selecting a reduced number of components (which corresponds to the number of channels).
- PCA principal component analysis
- An equivalent approach proposed in US 5596644 instead uses a singular value decomposition (SVD). If the delays are extracted from the HRIR before the decomposition and then used at the time of the encoding, the reconstruction of the HRIR is very good with a reduced number of components.
- SVD singular value decomposition
- multichannel binaural rendering can also be seen as the binaural simulation of multichannel rendering on a plurality of loudspeakers (more than two).
- the loudspeakers are termed "virtual" since, in this approach, binaural reproduction actually takes place only over the two earpieces of a headset or two distant loudspeakers.
- the principle of such a rendering consists in considering a configuration of loudspeakers distributed around the listener.
- intensity panning laws (or "pan pot") are then used to give the listener the feeling that sources are actually positioned in space, using only two loudspeakers.
- pan pot intensity panning laws
- Similar rules are used to define virtual speaker positions, which amounts to defining spatial encoding functions.
- the decoding filters correspond directly to the HRIR functions calculated at the positions of the virtual loudspeakers.
- the techniques of the prior art require the extraction of the HRIR delays.
- the techniques of sound recording or multi-channel encoding at a point in space are widely used since it is then possible to make transformations to the encoded signals (for example rotations).
- the delay information is not extractable from the signal alone.
- the decoding filters must then be able to reproduce the delays for an optimal sound reproduction.
- the number of channels can be small, and the techniques of the prior art do not allow good decoding with few channels without extracting the delays.
- the acquired multichannel signal may typically consist of only four channels.
- ambisonic microphones are microphones composed of coincident directional sensors. The interaural delays must then be reproduced at decoding.
- the present invention improves the situation.
- a sound spatialization method with multichannel encoding and binaural reproduction on two loudspeakers, comprising a spatial encoding defined by encoding functions associated with a plurality of encoding channels, and a decoding by application of filters for binaural reproduction on the two loudspeakers.
- acoustic transfer functions specific to an individual morphology can relate to the HRIR functions expressed in the time domain. However, it is not excluded to consider in the first step a) the HRTF functions expressed in the frequency domain and, in fact, usually corresponding to the Fourier transforms of the HRIR functions.
- the invention proposes computing, by optimization, either the filters associated with a set of chosen encoding gains, or the encoding gains associated with a set of chosen decoding filters, or a joint optimization of the decoding filters and encoding gains.
- These filters and/or gains have, for example, been fixed or initially calculated by the pseudo-inverse technique or the virtual loudspeaker technique, described in particular in document WO-00/19415. Then, these filters and/or the associated gains are improved, within the meaning of the invention, by an iterative optimization which aims to reduce a predetermined error function.
- the invention thus proposes the determination of decoding filters and encoding gains which allow both a good reconstruction of the delay and a good reconstruction of the HRTF amplitude (modulus of the HRTF), and this for a small number of channels, as will be seen with reference to the detailed description below.
- the HRTF of the second ear can be deduced by symmetry from the measurement made at the first ear.
- the set of HRIR functions can for example be measured on a subject by positioning microphones at the entrance of the subject's ear canal.
- this HRIR set can also be calculated by numerical simulation methods (modeling of the subject's morphology or calculation by artificial neural network), or may have undergone a chosen processing (reduction of the number of samples, phase correction, or other). It is possible in this step a) to extract the delays from the HRIRs, to store them and then to add them at the time of the spatial encoding, steps b) and c) remaining unchanged. This embodiment will be described in detail with particular reference to FIG. 8.
- This first step a) is referenced E0 in FIG. 1.
- in step b), if one seeks to obtain optimized filters, the spatial encoding functions g(θ, φ, n) (or g(θ, φ, n, f)) must be fixed and, to obtain optimized spatial functions, the decoding filters denoted F(t, n) must be fixed. Nevertheless, the filters and the spatial functions can also be optimized jointly, as indicated above.
- the choice between optimizing the spatial functions and optimizing the decoding filters can depend on the application context. If the spatial encoding functions are fixed, they are then reproducible and universal, and the individualization of the filters takes place simply at decoding. Moreover, the spatial encoding functions, when they comprise a large number of zeros among the n encoding channels, as in the second embodiment described below, make it possible to limit the number of operations during encoding. Intensity panning laws ("pan pot") between virtual loudspeakers in two dimensions, and their extensions to three dimensions, can be represented by encoding functions with at most two non-zero gains in two dimensions and three non-zero gains in three dimensions, for a single given source. The number of non-zero gains is, of course, independent of the number of channels and, most importantly, the zero gains lighten the encoding calculations.
- Spherical harmonic spatial functions, in an ambisonic context, have mathematical properties that make it possible to transform the encoded signals (for example, rotations of the sound field).
- such functions provide compatibility between binaural decoding and surround sound recordings based on a decomposition of the sound field into spherical harmonics.
- the encoding functions may be real or simulated directivity functions of microphones to allow listening of binaural multichannel recordings.
- the encoding functions can be arbitrary (non-universal) and determined by any method, the rendering then having to be optimized during subsequent steps of the method within the meaning of the invention.
- Spatial functions may also be a function of time or frequency. The optimization will be done taking into account this dependence (for example by optimizing independently each time sample or frequency).
- Decoding filters can be fixed so that the decoding can be universal.
- Decoding filters can also be chosen to reduce the resource cost of filtering. For example, the use of filters called “infinite impulse response" or "IIR" is advantageous.
- the decoding filters can also be chosen according to a psychoacoustic criterion, for example constructed from standardized Bark bands. More generally, the decoding filters can be determined by any method. The rendering, in particular for an individual listener, can then be optimized during the next steps of the method relating to the encoding functions.
- This second step b), relating to the calculation of an initial solution S0, is referenced E1 in FIG. 1.
- it consists in choosing the decoding filters (referenced "F") and/or the spatial encoding functions (referenced "g") and in determining an initial solution S0 for the encoding functions or the decoding filters, also by a chosen method.
- the filters of the starting solution S0 at step E1 can directly be the HRIR functions given at the corresponding positions of the virtual loudspeakers.
- the starting solution S0 is further determined by the functions defining the intensity panning laws ("pan pot"), used as encoding functions, and by the HRIR functions themselves, given at the positions of the virtual loudspeakers, used as decoding filters.
- the decoding filters in step E1 are calculated from the pseudo-inverse to determine the starting solution S0.
- the elements F, HRIR and g are matrices.
- the starting solution S0 may be arbitrary (random or fixed), the essential point being that it leads to a converged solution SC at step E6 of FIG. 1.
- FIG. 1 also illustrates the operations E2, E3, T4, E5, E6 of the general step c) of optimization within the meaning of the invention.
- this optimization is conducted by iterations.
- the so-called "gradient" optimization method searches, by finite differences, for zeros of the first derivative of a multi-variable error function.
- variant methods that make it possible to optimize functions according to an established criterion can also be considered.
- in step E3, the calculation of an error function is an important point of the optimization method within the meaning of the invention.
- a proposed error function, to be minimized, is simply the difference in modulus between the Fourier transform HRTF* of the reconstructed HRIR function set and the Fourier transform HRTF of the original HRIR function set (given in step E0).
- any error function calculated entirely or in part from the HRIR functions can be provided (module, phase, delay or estimated ITD, interaural differences, or other).
- the error criterion relates to the frequency samples of the HRTF functions, taken independently of one another, contrary to what was proposed above (sum over all the frequencies for the calculation of the error function c).
- the optimization iterations can be successively applied to each frequency sample, with the advantage of then reducing the number of simultaneous variables, to have an error function specific to each frequency f and to meet a stopping criterion as a function of the convergence specific to each frequency.
- the filters F(n, t) or the gains g(θ, φ, n), or the calculated filter/gain pairs, make it possible to obtain an optimal spatial rendering, as will be seen in particular with reference to FIG. 4C or FIG. 6 below.
- the treatment then stops by obtaining a converged solution (step E6).
- the starting solution S0 for step E1 is given by calculating the pseudo-inverse (with linear resolution).
- This starting solution constitutes the decoding solution which was proposed as such in the document WO-00/19415 of the prior art described above.
- the optimization technique used in the sense of the invention is preferably that of the gradient described above.
- the modulus of the HRTF functions is relatively poorly reconstructed, with most reconstruction errors being greater than 8 dB. Nevertheless, it appears that the phase error is practically unchanged over the iterations. This error is, however, minimal at low frequencies and on the ipsilateral portion of the HRTF functions (region at 0-180° azimuth). On the other hand, the modulus error decreases strongly over the optimization iterations, especially in this ipsilateral region. Optimization within the meaning of the invention therefore makes it possible to improve the modulus of the HRTF functions without modifying the phase, and therefore the group delay and, above all, the interaural delay ITD, so that the rendering is particularly faithful thanks to the implementation of this first embodiment.
- Panning laws are commonly used by sound engineers to produce audio content, including multichannel content in so-called "surround" formats used in 5.1, 6.1, or other sound reproduction.
- encoding by panning laws is achieved by mixing a sound environment in a "surround" format (5.1 tracks of a digital recording, for example). Filters optimized from the same panning laws then allow optimal binaural decoding for the desired rendering with this "surround" effect.
- the present invention is advantageously applicable in the case where the positions of the virtual speakers correspond to positions of a multichannel rendering system for the general public, with "surround” effect.
- the optimized decoding filters then allow decoding of multimedia consumer content (typically multi-channel content with "surround” effect) for playback on two speakers, for example on binaural headphones.
- This binaural reproduction of a content that is for example initially in 5.1 format is optimized thanks to the implementation of the invention.
- the HRIR functions are obtained at 64 positions around the listener, as described with reference to the first embodiment above.
- the optimization method used in the second embodiment is still that of the gradient.
- the starting solution S0 in step E1 is given by the ten decoding filters which correspond to the ten HRIR functions given at the positions of the virtual loudspeakers.
- the fixed spatial functions are the encoding functions representing the panning laws.
- FIG. 7 illustrates the variations of the interaural delay ITD as a function of the azimuth position of the HRIR functions.
- the optimized solution makes it possible to reconstruct an ITD (dash-dot lines) relatively close to the original ITD (solid lines), and just as close as that reconstructed from the initial solution, here obtained by the virtual loudspeaker technique (long dashed lines).
- the ITD reconstructed from the filters obtained by linear (pseudo-inverse) resolution, represented by dashed lines in FIG. 7, is quite irregular and far from the original ITD.
- the optimization within the meaning of the invention therefore makes it possible to reconstruct both the modulus of the HRTF functions and the ITD group delay between the two ears.
- FIG. 9 corresponds to the case where the encoding gains are obtained by applying the virtual loudspeaker method according to the second embodiment described above.
- FIG. 8 presents an implementation of multichannel encoding and decoding in which the delays are not included in the decoding filters but must be taken into account at the encoding stage. It may correspond to that of the prior art WO-00/19415 described above if the decoding filters (and/or the encoding functions) have not been optimized within the meaning of the invention.
- the embodiment of FIG. 8 consists, in generic terms, in extracting interaural delay information from the transfer functions obtained in step a), while the optimization, within the meaning of the invention, of the encoding functions and/or decoding filters is conducted here on the transfer functions from which this delay information has been extracted. These interaural delays can then be stored and applied later, in particular at encoding.
- F j, L denotes the decoding filters for channel j, which are paired with the symmetrical filters of the filters F j, L. Note that, in the case of virtual loudspeakers, the symmetrical filter of a given virtual loudspeaker (a given channel) is the filter of the symmetrical virtual loudspeaker (with respect to the left/right plane of symmetry of the head). Finally, L and R are the left and right binaural channels.
- the multichannel signals for the left channel are different from those for the right channel.
- the consequences of introducing delays at encoding are therefore the doubling of the number of encoding operations and the doubling of the number of channels, compared to the second implementation illustrated in FIG. 9, which takes advantage of the second embodiment of the invention.
- each signal from a source S i in the encoding block ENCOD is split in two, and one of the split signals is subjected to a delay (positive or negative) ITD 1, ITD 2, and so on; each split signal is multiplied by each gain, and the results of the multiplications are then grouped by channel index j (n channels) and according to whether or not the interaural delay has been applied (2 times n channels in total).
- the 2n signals obtained are conveyed through a network, stored, or otherwise handled for reproduction and, for this purpose, are applied to a decoding block DECOD having n filters F j, L for a left channel L and n symmetrical filters for a right channel R.
- the symmetry of the filters results from the fact that we consider a symmetry of the HRTF functions.
- the filtered signals are grouped together in each channel, and the signal resulting from this grouping is intended to feed either one of the two loudspeakers, for reproduction on two distant loudspeakers (in which case a cross-path cancellation operation must be added), or directly one of the two earpieces of a headset, for binaural reproduction.
- FIG. 9 presents, in turn, an implementation of multichannel encoding and decoding in which the delays are, on the contrary, included in the decoding filters, in the sense of the second embodiment using the virtual loudspeaker method and exploiting the observation resulting from FIGS. 6 and 7 above.
- each sum or each difference of filters is to be considered as a filter in itself. What is indicated here as a sum or a difference of filters is to be considered in relation to the expressions of the filters F j, L and of their symmetrical filters described above with reference to FIG. 8.
- the decoding processing of FIG. 9 continues with a grouping of the sums SS and a grouping of the differences SD, which feed the L channel by their sum (module SL delivering the signal SS + SD) and the R channel by their difference (module DR delivering the signal SS - SD).
- the working memory (buffer) needed for the implementation of FIG. 8 is more than twice that needed for the implementation of FIG. 9, since 2n channels pass between encoding and decoding and a delay line must be implemented per source in the implementation of FIG. 8.
- the present invention thus also relates to a sound spatialization system with multichannel encoding, for two-channel reproduction, comprising a spatial encoding block ENCOD defined by encoding functions associated with a plurality of encoding channels and a decoding block DECOD applying filters for binaural reproduction.
- the spatial encoding functions and / or the decoding filters are determined by the implementation of the method described above.
- Such a system may correspond to that illustrated in FIG. 8, in an embodiment in which the delays are integrated at the time of encoding, which corresponds to the state of the art in the sense of document WO-00/19415.
- Another advantageous embodiment consists in implementing the method according to the second embodiment in order to build a spatialization system with a direct encoding block, without applying a delay, so as to reduce the number of encoding channels and the corresponding number of decoding filters, which directly include the interaural delays ITD, according to an advantage offered by the implementation of the invention, as illustrated in FIG. 9.
- this embodiment of FIG. 9 achieves a quality of spatial rendering at least as good as, if not better than, the techniques of the prior art, with half the number of filters and a lower computational cost.
- this embodiment allows a better quality of reconstruction of the HRTF modulus and of the interaural delay than the techniques of the prior art, with a reduced number of channels.
- the present invention also relates to a computer program comprising instructions for implementing the method described above, whose algorithm can be illustrated by a general flowchart of the type shown in FIG. 1.
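The iterative optimization summarized in the items above (fixed encoding gains, a starting solution S0, a modulus-based error function, gradient iterations by finite differences, a stopping test) can be illustrated by the following minimal sketch for a single frequency sample. It is not the patented implementation: the function names, the step size, the tolerance, the step-halving rule and the toy data are all assumptions made for illustration only.

```python
# Illustrative sketch only: gradient descent by finite differences on a
# modulus error, for ONE frequency sample. All names and values are hypothetical.

def reconstruct(gains, filters):
    """Reconstructed HRTF* at each position p: sum over channels n of g[p][n] * F[n]."""
    return [sum(g * f for g, f in zip(row, filters)) for row in gains]

def modulus_error(gains, filters, target):
    """Sum over positions of the squared difference between |HRTF*| and |HRTF|."""
    return sum((abs(r) - abs(t)) ** 2
               for r, t in zip(reconstruct(gains, filters), target))

def optimize_bin(gains, start, target, step=0.1, eps=1e-6, tol=1e-12, max_iter=5000):
    """Refine the complex filter values of one frequency sample, gains being fixed."""
    filters = list(start)
    err = modulus_error(gains, filters, target)
    for _ in range(max_iter):
        # Forward finite differences on the real and imaginary part of each filter.
        grad = []
        for n in range(len(filters)):
            partials = []
            for delta in (eps, 1j * eps):
                shifted = list(filters)
                shifted[n] += delta
                partials.append((modulus_error(gains, shifted, target) - err) / eps)
            grad.append(complex(partials[0], partials[1]))
        candidate = [f - step * g for f, g in zip(filters, grad)]
        new_err = modulus_error(gains, candidate, target)
        if new_err > err:
            step *= 0.5          # assumed safeguard: shrink the step and retry
            continue
        converged = (err - new_err) < tol
        filters, err = candidate, new_err
        if converged:            # stopping test on the decrease of the error
            break
    return filters, err

# Toy data (assumed): 3 measured positions, 2 virtual loudspeakers, pan-pot-like gains.
gains = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
target = [1.0 + 0.0j, 0.2 + 0.9j, 0.0 + 1.0j]   # "original" HRTF values at this frequency
start = [target[0], target[2]]                   # S0: HRTFs at the virtual loudspeaker positions

initial_err = modulus_error(gains, start, target)
optimized, final_err = optimize_bin(gains, start, target)
print(initial_err, final_err)
```

Running one such optimization per frequency sample corresponds to the variant in which each frequency has its own error function and its own stopping criterion; summing the error over all frequencies instead would give a single joint optimization.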
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Stereophonic System (AREA)
Description
The present invention relates to the processing of sound signals for their spatialization.
Spatialized sound reproduction allows a listener to perceive sound sources as coming from any direction or position in space.
The particular spatialized sound reproduction techniques to which the present invention relates are based on the acoustic transfer functions of the head between positions in space and the ear canal. These transfer functions, called "HRTF" (for "Head Related Transfer Functions"), refer to the frequency-domain form of the transfer functions. Their time-domain form is referred to hereafter as "HRIR" (for "Head Related Impulse Response").
Moreover, the term "binaural" refers to reproduction over stereo headphones that nevertheless conveys spatialization effects. The present invention is not limited to this technique and also applies, in particular, to techniques derived from binaural rendering, such as so-called "transaural" rendering, that is to say rendering over distant loudspeakers. Such techniques can then use what is called "crosstalk cancellation" (or "cross-talk cancellation"), which consists in canceling the acoustic cross paths so that a sound, thus processed and then emitted by the loudspeakers, is perceived by only one of the listener's two ears.
The term "multichannel", in processing for spatialized sound reproduction, refers to producing a representation of the acoustic field in the form of N signals (called spatial components). These signals contain all the sounds that make up the sound field, but with weightings that depend on their direction (or "incidence") and that are described by N associated spatial encoding functions. The reconstruction of the sound field, for reproduction at a chosen point, is then provided by N' spatial decoding functions (most often with N = N').
In the particular case of binaural rendering, this decomposition makes it possible to perform so-called "binaural multichannel" encoding and decoding. The decoding functions (which are actually filters), associated with a given set of spatial encoding functions (which are actually encoding gains), when they are optimal for rendering, give the listener a feeling of perfect immersion inside a sound scene, although only two loudspeakers are actually available for binaural reproduction (the earpieces of a headset or distant loudspeakers).
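As a hedged illustration of the decomposition just described, the sketch below encodes sources onto N channels with per-source gains and then decodes the N channels to two ear signals with one filter per channel and per ear. The function names, the toy gains and the two-tap filters are assumptions chosen for readability; real decoding filters would be derived from HRIR measurements.

```python
# Illustrative sketch only: "binaural multichannel" encoding (gains) and
# decoding (filters). All names and numeric values are hypothetical.

def encode(sources, gains):
    """Mix sources onto channels: channel n = sum over sources s of gains[s][n] * source s."""
    n_channels = len(gains[0])
    length = len(sources[0])
    channels = [[0.0] * length for _ in range(n_channels)]
    for signal, source_gains in zip(sources, gains):
        for n, g in enumerate(source_gains):
            if g == 0.0:
                continue  # zero gains cost nothing at encoding
            for t, x in enumerate(signal):
                channels[n][t] += g * x
    return channels

def convolve(x, h):
    """Direct-form FIR convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def decode(channels, filters_left, filters_right):
    """Filter each channel with its per-ear decoding filter and sum the results per ear.
    Assumes all filters have the same length."""
    def one_ear(filters):
        out = [0.0] * (len(channels[0]) + len(filters[0]) - 1)
        for channel, h in zip(channels, filters):
            for t, v in enumerate(convolve(channel, h)):
                out[t] += v
        return out
    return one_ear(filters_left), one_ear(filters_right)

# Toy scene (assumed): two sources, two encoding channels.
sources = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
gains = [[1.0, 0.0], [0.5, 0.5]]           # gains depend on each source's direction
filters_left = [[1.0, 0.0], [0.0, 1.0]]    # stand-ins for HRIR-derived filters
filters_right = [[0.0, 1.0], [1.0, 0.0]]

channels = encode(sources, gains)
left, right = decode(channels, filters_left, filters_right)
print(channels, left, right)
```

In this sketch the decoding stage runs 2N convolutions whatever the number of sources, and adding a source only adds gain multiplications at encoding.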
The advantages of a multichannel approach for binaural techniques are numerous, since the encoding step is independent of the decoding step.
Thus, when a virtual sound scene is composed from synthesized or recorded signals, encoding is generally inexpensive in memory and/or computation, since the spatial functions are gains that depend solely on the incidences of the sources to be encoded and not on the number of sources themselves. Decoding also has a cost that is independent of the number of sources to be spatialized.
Furthermore, in the case of a real sound field measured by a microphone array and encoded according to known spatial functions, it is now possible to find decoding functions that allow satisfactory binaural listening. Finally, the decoding functions can be individualized for each listener.
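The separation between encoding and decoding described above can be sketched as follows. This is a minimal NumPy illustration under assumed array shapes; the names `encode` and `decode` are hypothetical and do not come from the source. It shows that, once the sources are mixed into N channels, decoding for one ear takes N convolutions whatever the number of sources.

```python
import numpy as np

def encode(sources, gains):
    """Mix S mono sources into N spatial components.

    sources: (S, T) array of source signals (assumed layout).
    gains:   (S, N) array, gains[s, n] = encoding gain of source s on channel n.
    Returns an (N, T) array of encoded channels.
    """
    return gains.T @ sources

def decode(channels, filters):
    """Filter each encoded channel with its decoding filter and sum the results.

    channels: (N, T) encoded signals.
    filters:  (N, K) decoding impulse responses for one ear (assumed layout).
    Returns a signal of length T + K - 1 for that ear.
    """
    return sum(np.convolve(ch, f) for ch, f in zip(channels, filters))

# Three sources, four channels: decoding still uses four convolutions per ear.
sources = np.ones((3, 8))
gains = np.full((3, 4), 0.5)
left_ear = decode(encode(sources, gains), np.ones((4, 2)))
```

Individualizing the rendering for a listener then only means swapping the `filters` array; the encoded channels are unchanged.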
The present invention aims in particular at improved derivation of the decoding filters and/or encoding gains in the multichannel binaural technique. The context is as follows: sources are spatialized by multichannel encoding, and the spatially encoded content is rendered by applying appropriate decoding filters.
The reference WO-00/19415 is known. With:
- $g_i(\theta_p, \varphi_p)$, fixed spatial encoding functions, where g is the gain corresponding to channel $i \in 1, \dots, N$ and to position $p \in 1, \dots, P$ defined by its angles of incidence θ (azimuth) and φ (elevation),
- $L(\theta_p, \varphi_p, f)$ and $R(\theta_p, \varphi_p, f)$, bases of HRTF functions obtained by measuring the acoustic transfer functions of each ear, L and R, of an individual for a number P of positions in space ($p \in 1, \dots, P$) and for a given frequency f,
that document, WO-00/19415, provides as follows.
The delays are extracted from each HRTF. The shape of a head is usually such that, for a given position, a sound reaches one ear some time before reaching the other (a sound located on the left naturally reaching the left ear before the right ear). The delay difference t between the two ears is an interaural localization cue called the ITD ("Interaural Time Difference"). New HRTF bases, denoted L and R, are then defined by:
In the second step, decoding filters $L_i(f)$ and $R_i(f)$ are obtained for channel i, satisfying the equations:
To obtain these filters, that document proposes a so-called "pseudo-inverse calculation" method, which aims to satisfy the preceding equations in the least-squares sense, namely:
Implementing such a technique therefore requires reintroducing a delay corresponding to the ITD when each sound source is encoded. Each source is thus encoded twice (once for each ear). The document
Furthermore, a second approach, proposed in the document
An equivalent approach, proposed in
When the delays are left in the original filters, the number of channels must be increased to obtain a good-quality reconstruction.
In addition, these prior-art techniques do not provide universal spatial encoding functions: the decomposition yields different spatial functions for each individual.
Multichannel binaural can also be viewed as the binaural simulation of multichannel rendering over a plurality of loudspeakers (more than two). This is known as the "virtual loudspeaker" method, although, in this approach, the binaural reproduction itself takes place only on the two earpieces of a headset or on two remote loudspeakers. The principle of such reproduction is to consider a configuration of loudspeakers distributed around the listener. When rendering on two real loudspeakers, intensity panning laws ("pan pot") are used to give the listener the sensation that sources are actually positioned in space, using only two loudspeakers. These are referred to as "phantom sources". Similar rules are used to define virtual loudspeaker positions, which amounts to defining spatial encoding functions. The decoding filters correspond directly to the HRIR functions calculated at the positions of the virtual loudspeakers.
For effective spatial rendering with a small number of channels, prior-art techniques require the delays to be extracted from the HRIRs. Techniques for sound capture or multichannel encoding at a single point in space are widely used, since the encoded signals can then be transformed (for example rotated). However, when the signal to be decoded is a multichannel signal measured (or encoded) at a point, the delay information cannot be extracted from the signal alone. The decoding filters must then reproduce the delays for optimal sound rendering. Moreover, in the case of recordings, the number of channels may be small, and prior-art techniques do not allow good decoding with few channels unless the delays are extracted. For example, when acquiring with ambisonic microphones, the acquired multichannel signal may typically consist of only four channels. "Ambisonic microphones" here means microphones composed of coincident directional sensors. The interaural delays must then be reproduced at decoding.
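To make concrete what "extracting the delays from the HRIRs" can involve, here is a minimal sketch that estimates the interaural delay of one HRIR pair. The source does not say how the delays are extracted; the cross-correlation estimator and the name `estimate_itd` are assumptions chosen for illustration.

```python
import numpy as np

def estimate_itd(hrir_left, hrir_right, fs):
    """Estimate the interaural time difference of one HRIR pair, in seconds.

    Assumption: the delay is taken as the lag that maximises the absolute
    cross-correlation. The result is positive when the right ear lags.
    """
    xcorr = np.correlate(hrir_right, hrir_left, mode="full")
    # In "full" mode, index 0 corresponds to lag -(len(hrir_left) - 1).
    lag = int(np.argmax(np.abs(xcorr))) - (len(hrir_left) - 1)
    return lag / fs

# Toy pair: the right-ear impulse arrives 3 samples after the left-ear one.
left = np.zeros(32)
left[5] = 1.0
right = np.zeros(32)
right[8] = 1.0
itd = estimate_itd(left, right, fs=48000)
```

Such an estimate needs both ear responses for a known position, which is why it cannot be recovered from a multichannel signal captured at a single point.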
More generally, extracting the delays has at least two other major drawbacks:
- the delays must be taken into account at encoding time (an added step), which increases the computing resources required,
- since the delays are taken into account at encoding time, the signals must be encoded for each ear and the number of filtering operations needed for decoding is doubled.
The present invention improves the situation.
To this end, it proposes a sound spatialization method with multichannel encoding and binaural reproduction on two loudspeakers, comprising spatial encoding defined by encoding functions associated with a plurality of encoding channels, and decoding by applying filters for reproduction in a binaural context on the two loudspeakers.
The method within the meaning of the invention comprises the steps of:
- a) obtaining an original set of acoustic transfer functions specific to an individual's morphology (HRIR; HRTF),
- b) choosing spatial encoding functions and/or decoding filters, and
- c) by successive iterations, optimizing the filters associated with the chosen encoding functions, or the encoding functions associated with the chosen filters, or jointly the chosen filters and encoding functions, by minimizing an error calculated from a comparison between:
  - the original set of transfer functions, and
  - a set of transfer functions reconstructed from the encoding functions and the decoding filters, optimized and/or chosen.
The "acoustic transfer functions specific to an individual's morphology" may be the HRIR functions expressed in the time domain. However, step a) may equally consider the HRTF functions expressed in the frequency domain, which usually correspond to the Fourier transforms of the HRIR functions.
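The relation between the two representations can be sketched in a few lines. The helper name `hrir_to_hrtf` and the array layout (positions along the first axis, time samples along the last) are assumptions made for this illustration.

```python
import numpy as np

def hrir_to_hrtf(hrir, n_fft=None):
    """Return the HRTFs as the Fourier transforms of real-valued HRIRs.

    hrir: (P, T) array, one impulse response per position (assumed layout).
    Returns a complex (P, n_bins) array holding the non-negative frequencies.
    """
    return np.fft.rfft(hrir, n=n_fft, axis=-1)

# A unit impulse has a flat spectrum of modulus 1.
hrtf = hrir_to_hrtf(np.array([[1.0, 0.0, 0.0, 0.0]]))
```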
Thus, in general terms, the invention proposes calculating, by optimization, the filters associated with a chosen set of encoding gains, or the encoding gains associated with a chosen set of decoding filters, or jointly optimizing the decoding filters and the encoding gains. These filters and/or gains have, for example, been initially fixed or calculated by the pseudo-inverse or virtual loudspeaker techniques, described in particular in the document
The invention thus proposes determining decoding filters and encoding gains that allow both a good reconstruction of the delay and a good reconstruction of the HRTF amplitude (modulus of the HRTFs), for a small number of channels, as will be seen from the detailed description below.
Other features and advantages of the invention will become apparent on examining the detailed description below and the appended drawings, in which:
- Figure 1 illustrates the general steps of a method within the meaning of the invention,
- Figure 2 illustrates the amplitude (gray levels) of the HRIR time functions (over several successive samples Ech) chosen for implementing step E0 of Figure 1, as a function of azimuth (in degrees, denoted deg°),
- Figure 3 illustrates the shape of the first few spherical harmonics in an ambisonic context, as spatial encoding functions in a first embodiment,
- Figures 4A, 4B, 4C compare the performance of the processing according to the first embodiment, for a non-optimized solution (Figure 4A), for a solution partially optimized by a few processing iterations (Figure 4B) and for a solution fully optimized by the processing within the meaning of the invention (Figure 4C),
- Figure 5 illustrates the encoding functions in the virtual loudspeaker technique used in a second embodiment,
- Figure 6 compares an actual mean HRTF function (solid line) with the mean HRTF functions reconstructed using the prior-art pseudo-inverse solution (dotted line), the starting solution given by the virtual loudspeaker method (long-dashed line) and the converged optimized solution according to the second embodiment of the invention (dash-dot line),
- Figure 7 compares the variations of the original interaural delay ITD (solid line) with that obtained by the optimized solution according to the second embodiment of the invention (dash-dot line), that reconstructed using the virtual loudspeaker technique (long-dashed line) and that reconstructed from the filters obtained by the prior-art pseudo-inverse solution (dotted line),
- Figure 8 schematically represents a spatialization system obtainable by implementing the first embodiment, taking the interaural delays into account at encoding,
- Figure 9 schematically represents a spatialization system obtainable by implementing the second embodiment, without taking the interaural delays into account at encoding but including these delays in the decoding filters.
In an exemplary embodiment, the method within the meaning of the invention can be broken down into three steps:
- a) obtaining a set of HRIRs (left ear and/or right ear) at P positions around the listener, denoted hereafter $H(\theta_p, \varphi_p, t)$,
- b) fixing spatial encoding functions and/or base filters, the encoding functions being denoted $g(\theta, \varphi, n)$ (or $g(\theta, \varphi, n, f)$), where:
  - θ, φ are the angles of incidence in azimuth and elevation,
  - n is the index of the encoding channel considered,
  - and f is the frequency,
- c) and finding the filters associated with the fixed spatial functions, or the spatial functions associated with the fixed filters, or a combination of associated filters and spatial functions, by an optimization technique described in detail below.
It is simply noted here that, for the implementation of the aforementioned first step a), the HRTFs of the second ear can be deduced from the measurement of the first ear by symmetry. The set of HRIR functions can, for example, be measured on a subject by positioning microphones at the entrance of the ear canal. Alternatively, this set of HRIRs can be calculated by numerical simulation methods (modeling of the subject's morphology or calculation by artificial neural network), or may have undergone a chosen processing (reduction of the number of samples, phase correction, or other).
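The symmetry argument can be illustrated as follows. This sketch assumes a head that is symmetric about the median plane, a horizontal-plane measurement grid given in whole degrees, and a grid that contains the mirror image of each azimuth; the name `mirror_ear` is hypothetical.

```python
import numpy as np

def mirror_ear(hrir_left, azimuths_deg):
    """Deduce right-ear HRIRs from left-ear HRIRs by left/right symmetry.

    hrir_left:    (P, T) left-ear responses, one row per azimuth.
    azimuths_deg: (P,) azimuths in whole degrees; the grid is assumed to
                  contain -a (modulo 360) for every azimuth a it contains.
    The right-ear response at azimuth a is taken as the left-ear response
    measured at the mirrored azimuth -a.
    """
    az = [int(round(float(a))) % 360 for a in azimuths_deg]
    index_of = {a: p for p, a in enumerate(az)}
    mirrored = [index_of[(360 - a) % 360] for a in az]
    return np.asarray(hrir_left)[mirrored]

# Four positions; each row is filled with its own position index.
left = np.arange(4)[:, None] * np.ones((1, 3))
right = mirror_ear(left, [0, 90, 180, 270])
```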
In this step a), it is possible to extract the delays from the HRIRs, store them and then add them back at the time of spatial encoding, steps b) and c) remaining unchanged. This embodiment will be described in detail with particular reference to the
This first step a) bears the reference E0 in the
For the implementation of step b), obtaining optimized filters requires fixing the spatial encoding functions $g(\theta, \varphi, n)$ (or $g(\theta, \varphi, n, f)$); conversely, obtaining optimized spatial functions requires fixing the decoding filters, denoted $F(t, n)$.
Nevertheless, both the filters and the spatial functions may be optimized jointly, as indicated above.
The choice between optimizing the spatial functions and optimizing the decoding filters may depend on the application context.
If the spatial encoding functions are fixed, they are reproducible and universal, and the filters are individualized simply at decoding.
Moreover, spatial encoding functions that contain a large number of zeros among the n encoding channels, as in the second embodiment described below, limit the number of operations during encoding. Intensity panning laws ("pan pot") between virtual loudspeakers in two dimensions, and their extensions to three dimensions, can be represented by encoding functions having, for a single given source, at most two non-zero gains in two dimensions and three non-zero gains in three dimensions. The number of non-zero gains is, of course, independent of the number of channels and, above all, the zero gains lighten the encoding calculations.
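A two-dimensional panning law with at most two non-zero gains can be sketched as below. The source does not specify which panning law is used; the constant-power sine/cosine law, the ring of virtual loudspeakers and the name `pan_pot_gains` are assumptions for illustration only.

```python
import numpy as np

def pan_pot_gains(azimuth_deg, speaker_az_deg):
    """Encoding gains of one source over a ring of virtual loudspeakers.

    speaker_az_deg: at least two azimuths in [0, 360), sorted ascending.
    Only the two loudspeakers that bracket the source get a non-zero gain
    (assumed constant-power law: cos/sin of the normalised position).
    """
    spk = np.asarray(speaker_az_deg, dtype=float)
    n = len(spk)
    az = azimuth_deg % 360.0
    gains = np.zeros(n)
    # Loudspeaker at or just below the source azimuth, with wrap-around.
    lo = (int(np.searchsorted(spk, az, side="right")) - 1) % n
    hi = (lo + 1) % n
    span = (spk[hi] - spk[lo]) % 360.0
    frac = ((az - spk[lo]) % 360.0) / span
    gains[lo] = np.cos(frac * np.pi / 2)
    gains[hi] = np.sin(frac * np.pi / 2)
    return gains

# Source halfway between the loudspeakers at 0 and 90 degrees.
gains = pan_pot_gains(45.0, [0, 90, 180, 270])
```

Whatever the number of loudspeakers in the ring, encoding one source then costs two multiplications per sample instead of one per channel.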
As for the encoding functions themselves, several choices are again available.
Spatial functions of the spherical-harmonic type, in an ambisonic context, have mathematical properties that allow the encoded signals to be transformed (for example by rotating the sound field). In addition, such functions ensure compatibility between binaural decoding and ambisonic recordings based on a decomposition of the sound field into spherical harmonics.
The encoding functions may be real or simulated microphone directivity functions, so as to allow recordings to be listened to in multichannel binaural. The encoding functions may be arbitrary (non-universal) and determined by any method, the rendering then being optimized in subsequent steps of the method within the meaning of the invention.
The spatial functions may equally depend on time or frequency. The optimization is then carried out taking this dependence into account (for example by optimizing each time or frequency sample independently).
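The rotation property of harmonic encoding functions can be shown on the simplest case. The sketch below uses first-order horizontal components [1, cos θ, sin θ]; this normalization and the function names are assumptions, not the encoding functions actually used in the embodiments.

```python
import numpy as np

def encode_first_order(azimuth_rad):
    """First-order horizontal encoding gains [1, cos(az), sin(az)] (assumed convention)."""
    return np.array([1.0, np.cos(azimuth_rad), np.sin(azimuth_rad)])

def rotate_first_order(components, angle_rad):
    """Rotate an encoded sound field about the vertical axis by angle_rad."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[1.0, 0.0, 0.0],
                         [0.0, c, -s],
                         [0.0, s, c]])
    return rotation @ components

# Rotating the encoded field is equivalent to encoding at the rotated azimuth.
rotated = rotate_first_order(encode_first_order(0.3), 0.5)
direct = encode_first_order(0.8)
```

The rotation acts on the encoded channels only, so it can be applied without knowing the individual sources or their positions.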
As regards the decoding filters, they may be fixed so that decoding can be universal.
The decoding filters may also be chosen so as to reduce the resource cost of filtering. For example, the use of "infinite impulse response" (IIR) filters is advantageous.
The decoding filters may also be chosen according to a psychoacoustic criterion, for example one built from normalized Bark bands.
More generally, the decoding filters may be determined by any method. The rendering, in particular for an individual listener, can then be optimized in subsequent steps of the method, acting on the encoding functions.
This second step b), relating to the calculation of an initial solution S0, bears the reference E1 in the
For example, where the fixed spatial functions are functions defining intensity panning laws ("pan pot") between virtual loudspeakers, the filters of the starting solution S0 in step E1 can directly be the HRIR functions given at the corresponding positions of the virtual loudspeakers.
In this example, the decoding filters and the encoding gains may also be optimized jointly, the starting solution S0 again being determined by the functions defining the intensity panning laws ("pan pot"), as encoding functions, and by the HRIR functions themselves, given at the positions of the virtual loudspeakers, as decoding filters.
In another example, where the spatial encoding functions are fixed as spherical harmonics, the decoding filters are calculated in step E1 from the pseudo-inverse, to determine the starting solution S0.
More generally, the starting solution S0 in step E1 can be calculated from the least-squares solution:
It should be noted here that the elements F, HRIR and g are matrices. In addition, the notation $g^{-1}$ denotes the pseudo-inverse of the gain matrix g, according to the expression $g^{-1} = \operatorname{pinv}(g) = g^{T}\,(g\,g^{T})^{-1}$, where $g^{T}$ denotes the transpose of the matrix g.
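A least-squares starting solution of this kind can be sketched with NumPy. The matrix layout is an assumption made for this example: g has one row per measured position and one column per channel, and HRIR has one row per position, so that the filters F satisfy HRIR ≈ gF. With this layout the pseudo-inverse is obtained from `np.linalg.pinv`, which computes the Moore-Penrose pseudo-inverse for any shape; the name `initial_filters` is hypothetical.

```python
import numpy as np

def initial_filters(g, hrir):
    """Least-squares decoding filters for fixed encoding gains.

    g:    (P, N) encoding gains, P positions and N channels (assumed layout).
    hrir: (P, T) original impulse responses for one ear.
    Returns F of shape (N, T) minimising ||hrir - g @ F|| in the least-squares sense.
    """
    return np.linalg.pinv(g) @ hrir

# If the HRIR set is exactly representable, the filters are recovered.
rng = np.random.default_rng(0)
g = rng.standard_normal((12, 3))
f_true = rng.standard_normal((3, 16))
f0 = initial_filters(g, g @ f_true)
```

With measured HRIRs and few channels the fit is not exact, which is why this solution serves only as a starting point for the iterations.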
More generally still, the starting solution S0 may be arbitrary (random or fixed), the essential point being that it leads to a converged solution SC at step E6 of the
In step E2, reconstructing the set of HRIR functions then gives, at the first iteration, a reconstructed set HRIR* = gF that differs from the original set.
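The reconstruction HRIR* = gF can be written directly as a matrix product. The shapes below (positions by channels for g, channels by time samples for F) and the name `reconstruct_hrir` are assumptions for this sketch.

```python
import numpy as np

def reconstruct_hrir(g, filters):
    """Step E2 sketch: HRIR* = g F.

    g:       (P, N) encoding gains for the P measured positions (assumed layout).
    filters: (N, T) decoding filters for one ear.
    Returns the (P, T) reconstructed impulse responses.
    """
    return g @ filters

# Two channels; the middle position receives equal gains from both.
filters = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
g = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
hrir_rec = reconstruct_hrir(g, filters)
```

At positions where a single gain equals one, the reconstruction returns the corresponding filter unchanged; elsewhere it is a weighted sum of filters, which is where the difference from the original set arises.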
In step E3, the calculation of an error function is an important point of the optimization method within the meaning of the invention. One proposed error function consists simply of minimizing the difference in modulus between the Fourier transform HRTF* of the reconstructed set of HRIR functions and the Fourier transform HRTF of the original set of HRIR functions (given in step E0). This error function, denoted c, is written:
Other error functions also allow optimal spatial rendering. For example, the HRIR functions can be weighted by a gain that depends on their position, in order to better reconstruct certain preferred positions in space, which is written:
Likewise, the HRIR functions can also be weighted as a function of time or frequency.
The error function can also minimize the energy difference between the moduli, namely:
In general, any error function calculated entirely or in part from the HRIR functions can be used (modulus, phase, estimated delay or ITD, interaural differences, or other).
Moreover, if the error criterion applies to the frequency samples of the HRTF functions independently of one another, contrary to what was proposed above (a sum over all frequencies when calculating the error function c), the optimization iterations can be applied successively to each frequency sample. This has the advantage of reducing the number of simultaneous variables, of giving an error function specific to each frequency f, and of reaching a stopping criterion that depends on the convergence specific to each frequency.
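The modulus-based error functions of step E3 (plain, position-weighted, and energy-based) can be sketched as follows. The exact expressions are not reproduced in the text above, so the form used here, a sum over positions and frequencies of the absolute difference of moduli, is an assumption; the names `dft_modulus` and `modulus_error` and the `weights` and `energy` parameters are hypothetical.

```python
import cmath


def dft_modulus(h):
    """Moduli |H(f)| of the discrete Fourier transform of an impulse response h."""
    n = len(h)
    return [
        abs(sum(h[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)))
        for f in range(n)
    ]


def modulus_error(hrir, hrir_star, weights=None, energy=False):
    """Assumed error c: sum over positions and frequencies of | |HRTF| - |HRTF*| |.

    weights: optional per-position gains (position-weighted variant).
    energy:  if True, compare squared moduli instead (energy variant).
    """
    if weights is None:
        weights = [1.0] * len(hrir)
    power = 2 if energy else 1
    c = 0.0
    for w, h, h_star in zip(weights, hrir, hrir_star):
        for m, m_star in zip(dft_modulus(h), dft_modulus(h_star)):
            c += w * abs(m ** power - m_star ** power)
    return c


# One position, two-tap responses (hypothetical values).
c_plain = modulus_error([[1.0, 0.0]], [[2.0, 0.0]])
c_energy = modulus_error([[1.0, 0.0]], [[2.0, 0.0]], energy=True)
```

A naive DFT is used only to keep the sketch free of dependencies; a real implementation would normally use an FFT.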
Step T4 is a test that decides whether or not to stop iterating the optimization, according to a chosen stopping criterion. The criterion may characterize the fact that:
- the variable c has reached a minimum value ε, and/or
- the variable c no longer decreases sufficiently, and/or
- a maximum number of iterations has been reached, and/or
- the changes to the filters are no longer significant, or other.
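The stopping test T4 can be sketched as a single predicate that combines the criteria listed above. The function name `should_stop`, its parameters and all threshold values are hypothetical choices made for this illustration.

```python
def should_stop(c, c_prev, iteration, filter_change,
                eps=1e-6, min_decrease=1e-9, max_iter=500, min_change=1e-9):
    """Return True when at least one stopping criterion of test T4 is met.

    c:             current value of the error function
    c_prev:        value at the previous iteration (None at the first one)
    iteration:     number of iterations already performed
    filter_change: size of the last filter update (None if unknown)
    """
    reached_floor = c <= eps
    stalled = c_prev is not None and (c_prev - c) < min_decrease
    out_of_budget = iteration >= max_iter
    filters_frozen = filter_change is not None and filter_change < min_change
    return reached_floor or stalled or out_of_budget or filters_frozen
```

The criteria are combined with a logical "or" here; requiring several of them at once ("and") is an equally valid reading of the text.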
If the criterion is met (arrow O at the output of test T4), the calculated filters F(n,t), gains g(θ,ϕ,n), or filter/gain pairs make it possible to obtain optimal spatial rendering, as will be seen in particular with reference to the
If the criterion is not met (arrow N at the output of test T4), then, depending on the error function used, it is difficult to know analytically how the filters F or the gains g should evolve in order to minimize the error c. A gradient calculation is advantageously used to adjust the filters and/or the gains so that they lead to a reduction of the error function c (iterative steps E5).
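The iterative loop (evaluate the error, test it, take a gradient step) can be sketched as below. This is a generic fixed-step gradient descent with a finite-difference gradient, used as a stand-in for the optimizer; it is not the patent's implementation. The names `numerical_gradient` and `optimize`, the step size, and the toy error function are all assumptions.

```python
def numerical_gradient(error_fn, params, h=1e-6):
    """Central finite-difference gradient of error_fn at params (a list of floats)."""
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + h] + params[i + 1:]
        down = params[:i] + [params[i] - h] + params[i + 1:]
        grad.append((error_fn(up) - error_fn(down)) / (2 * h))
    return grad


def optimize(error_fn, start, step=0.1, eps=1e-10, max_iter=1000):
    """Iterate: evaluate the error (E2/E3), test it (T4), take a gradient step (E5)."""
    params = list(start)
    for _ in range(max_iter):
        c = error_fn(params)
        if c <= eps:  # simplest stopping criterion: error below a floor
            break
        grad = numerical_gradient(error_fn, params)
        params = [p - step * g for p, g in zip(params, grad)]
    return params, error_fn(params)  # converged solution (E6) and its error


def toy_error(x):
    """Hypothetical error with a known minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


solution, c_final = optimize(toy_error, [0.0, 0.0])
```

In practice the parameter vector would hold the filter taps and/or the gains, and `error_fn` would reconstruct HRIR* and compare it with the original set.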
This processing is advantageously computer-assisted. A function named "fminunc" from the "Optimization Toolbox" module of the Matlab® software, suitably programmed, makes it possible to carry out steps E2, E3, T4, E5 and E6 described above with reference to the
Of course, this embodiment illustrated in the
An example of optimizing the decoding filters for content resulting from spatial encoding by spherical harmonic functions in a high-order ambisonic context is described below, for binaural reproduction. This is a sensitive case because, if sources have been recorded or encoded in an ambisonic context, the interaural delays must be respected in the decoding processing, through application of the decoding filters.
In the implementation of the invention described hereinafter by way of example, the case is limited to two dimensions, and the aim is to provide optimized filters for decoding ambisonic content at order 2 (five ambisonic channels) for binaural listening on headphones.
To carry out the first step a) of the general method described above (reference E0 of the
As an alternative to measurements carried out on an individual, the HRIR functions can be obtained from standard databases ("Kemar head"), by modeling the individual's morphology, or otherwise.
The spatial encoding functions chosen here are the spherical harmonics calculated from the functions cos(mθ) and sin(mθ), with increasing angular frequencies m = 0, 1, 2, ..., N, to characterize the azimuth dependence (as illustrated in the
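The two-dimensional encoding gains built from cos(mθ) and sin(mθ) can be sketched as below. At order N this gives 2N + 1 channels, that is, five channels at order 2, consistent with the example. The channel ordering, the absence of any normalization factor, and the function name `encode_2d` are assumptions made for the illustration.

```python
import math


def encode_2d(theta, order):
    """Gains [1, cos θ, sin θ, ..., cos Nθ, sin Nθ] for an azimuth theta in radians."""
    gains = [1.0]  # m = 0: cos(0) = 1, and sin(0) = 0 carries no information
    for m in range(1, order + 1):
        gains.append(math.cos(m * theta))
        gains.append(math.sin(m * theta))
    return gains


# A source at 90 degrees of azimuth, encoded at order 2.
gains = encode_2d(math.radians(90), order=2)
```

Encoding a source then amounts to multiplying its signal by each of these gains to feed the corresponding channel.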
The starting solution S0 for step E1 is given by calculating the pseudo-inverse (with linear resolution). This starting solution constitutes the decoding solution that was proposed as such in the document
The
- at the end of the first step E1 (starting solution S0 obtained by linear resolution through calculation of the pseudo-inverse),
- after a few iterations E5 (intermediate solution SI),
- at the end of the last processing step E6 (converged solution SC).
For the starting solution, which nevertheless constituted the decoding solution within the meaning of the document
An example of optimizing the decoding filters for spatial functions derived from intensity panning laws ("pan pot"), which in simple terms are mixing rules, is described below.
Panning laws are commonly used by sound engineers to produce audio content, in particular multichannel content in so-called "surround" formats used in 5.1, 6.1 or other sound reproduction. In this second embodiment, the aim is to calculate the filters that make it possible to reproduce "surround" content on headphones. In this case, encoding by panning laws is performed by mixing a sound environment in a "surround" format (the 5.1 tracks of a digital recording, for example). Filters optimized from the same panning laws then give optimal binaural decoding for the desired rendering with this "surround" effect.
The present invention applies advantageously to the case where the positions of the virtual loudspeakers correspond to positions of a consumer multichannel reproduction system with "surround" effect. The optimized decoding filters then allow consumer multimedia content (typically multichannel content with "surround" effect) to be decoded for reproduction on two loudspeakers, for example on headphones in binaural mode. This binaural reproduction of content that is, for example, initially in 5.1 format is optimized through the implementation of the invention.
The case of an example with ten virtual loudspeakers "arranged" around the listener is described below.
First, the HRIR functions are obtained at 64 positions around the listener, as described with reference to the first embodiment above.
In this second embodiment, the spatial functions are determined from the intensity panning laws (here the tangent law) between each pair of adjacent loudspeakers, by a relation of the type:
- L is the gain of the left loudspeaker,
- R is the gain of the right loudspeaker,
- u is the angle between the loudspeakers (360/10 = 36° in this example, as illustrated in figure 5),
- θv is the angle for which the gains are to be calculated (typically the angle between the plane of symmetry of the two loudspeakers and the desired direction).
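A tangent panning law between two adjacent loudspeakers can be sketched as follows. Because the relation itself is not reproduced in the text above, the sketch assumes the commonly used form tan(θv) / tan(u/2) = (L − R) / (L + R), together with the normalization L² + R² = 1. Both are assumptions, as are the function name `tangent_pan` and the convention that positive angles point toward the left loudspeaker.

```python
import math


def tangent_pan(theta_v_deg, u_deg=36.0):
    """Gains (L, R) for a direction theta_v measured from the symmetry plane.

    Assumed relation: tan(theta_v) / tan(u / 2) = (L - R) / (L + R),
    with the assumed normalization L**2 + R**2 = 1.
    Valid for |theta_v| <= u / 2, i.e. between the two loudspeakers.
    """
    ratio = math.tan(math.radians(theta_v_deg)) / math.tan(math.radians(u_deg / 2))
    norm = math.sqrt(2.0 * (1.0 + ratio ** 2))
    return (1.0 + ratio) / norm, (1.0 - ratio) / norm


centre = tangent_pan(0.0)   # source midway between the two loudspeakers
left = tangent_pan(18.0)    # source exactly on the left loudspeaker
```

Under these assumptions, a source midway between the loudspeakers gets equal gains, and a source located on one loudspeaker gets a zero gain on the other, which is what makes the resulting encoding functions sparse.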
The shapes of the ten spatial functions adopted, as a function of azimuth, are given in the
The optimization method used in the second embodiment is again the gradient method. The starting solution S0 in step E1 is given by the ten decoding filters that correspond to the ten HRIR functions given at the positions of the virtual loudspeakers. The fixed spatial functions are the encoding functions representing the panning laws. The error function c is based on the modulus of the Fourier transform of the HRIR functions, that is:
Reference is now made to the
- the pseudo-inverse starting solution, without optimization (shown in dotted lines),
- the starting solution given by the more suitable virtual-loudspeaker method (shown in long dashed lines),
- and the optimized solution, converged after a few iterations, within the meaning of the invention (shown in dash-dot lines).
The optimized solution within the meaning of the invention matches the original function perfectly, which is explained by the fact that the error function c proposed here aims to minimize the error on the modulus of the function.
The
The optimization of the method within the meaning of the invention therefore makes it possible to reconstruct both the modulus of the HRTF functions and the ITD group delay between the two ears.
In addition, it appeared in this second embodiment that the quality of the reconstructed filters is not affected by the choice of encoding functions. It is therefore possible to use arbitrary spatial encoding functions, for example ones advantageously containing many zeros, as in this exemplary embodiment, which correspondingly reduces the resources needed to compute the encoding.
The purpose of this part of the description is to assess the gain in terms of the number of operations and memory resources required to implement the multichannel binaural encoding and decoding within the meaning of the invention, with decoding filters that include the delay.
The case treated in the example described here is that of two spatially distinct sources to be encoded in multichannel and reproduced in binaural. The two example implementations of the
The example given in the
The embodiment of the
In the example of the
In the
On the
On the
For the decoding part of the
Finally, L and R denote the left and right binaural channels.
In the implementation of the
The
Thus, not having to take the interaural delays into account at encoding reduces the number of channels to n (instead of 2n). Using the symmetry of the decoding filters also makes it possible, in the implementation of the
It is noted that this implementation of the
The decoding processing of the
Thus, whereas the solution illustrated in the
- at encoding, handling two delays, multiplications by 4n gains, and 2n sums, and
- at decoding, 2n filterings and 2n sums,
- 2n gains and n sums at encoding, and
- n filterings, n sums, and simply one overall sum and one overall difference, at decoding.
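The operation counts listed above can be tabulated for a given number of channels n. The sketch assumes that the first two list items describe the solution with interaural delays applied at encoding, and that the last two describe the solution whose decoding filters include the delays; this mapping is inferred from the surrounding text. The function names are hypothetical, and adding up operations of different kinds is only a crude indicator of cost.

```python
def ops_with_encoding_delays(n):
    """Counts when the interaural delays are applied at encoding (two sources)."""
    return {
        "encoding": {"delays": 2, "gains": 4 * n, "sums": 2 * n},
        "decoding": {"filterings": 2 * n, "sums": 2 * n},
    }


def ops_with_delays_in_filters(n):
    """Counts when the decoding filters themselves include the delays."""
    return {
        "encoding": {"gains": 2 * n, "sums": n},
        "decoding": {"filterings": n, "sums": n, "global_sum_and_difference": 2},
    }


def total(ops):
    """Crude total: every operation counted once, whatever its kind."""
    return sum(count for stage in ops.values() for count in stage.values())


n = 5
before = ops_with_encoding_delays(n)
after = ops_with_delays_in_filters(n)
```

The number of filterings, usually the most expensive operation, is halved (n instead of 2n) whatever the value of n.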
Moreover, even though memory storage requires the same capacity for both solutions (storage of n filters, with the delays and gains computed on the fly), the working memory (buffer) needed to implement the
The present invention thus relates to a sound spatialization system with multichannel encoding, for reproduction on two channels, comprising a spatial encoding block ENCOD defined by encoding functions associated with a plurality of encoding channels, and a decoding block DECOD that applies filters for reproduction in a binaural context. In particular, the spatial encoding functions and/or the decoding filters are determined by implementing the method described above. Such a system may correspond to the one illustrated in the
Another advantageous embodiment consists in implementing the method according to the second embodiment so as to build a spatialization system with a direct encoding block, without delay application, in order to reduce the number of encoding channels and the corresponding number of decoding filters, which directly include the interaural delays ITD, according to an advantage offered by the implementation of the invention, as illustrated in the
This embodiment of the
The present invention also relates to a computer program comprising instructions for implementing the method described above, the algorithm of which can be illustrated by a general flowchart of the type shown in the
Claims (14)
- Method of sound spatialization with a multichannel encoding and for reproduction on two loudspeakers, comprising a spatial encoding defined by encoding functions associated with a plurality of encoding channels and a decoding by applying filters for reproduction in a binaural context on the two loudspeakers, characterized in that it comprises the steps: a) obtaining an original suite of acoustic transfer functions specific to an individual's morphology (HRIR; HRTF), b) fixing spatial encoding functions (g(θ,ϕ,n,f)) and/or decoding filters (F(t,n)), and c) through successive iterations, optimizing the filters associated with the chosen encoding functions or the encoding functions associated with the chosen filters, or jointly the chosen filters and encoding functions, by minimizing an error (c(HRIR, HRIR*)) calculated as a function of a comparison between: the original suite of transfer functions (HRIR), and a suite of transfer functions reconstructed (HRIR*) on the basis of the encoding functions and the decoding filters, optimized and/or chosen, characterized in that the comparison of step c) is calculated by differences between respective moduli of the original (HRTF(p,f)) and reconstructed (HRTF*(p,f)) transfer functions, expressed in the frequency domain, for each position in space associated with a transfer function.
- Method according to Claim 1, characterized in that the reconstructed suite of transfer functions (HRIR*) is calculated by multiplying the filters by the encoding functions (g(θ,ϕ,n), g(θ,ϕ,n,f)) at each iteration.
- Method according to Claim 2, characterized in that in step b) spatial encoding functions are chosen which represent intensity panning laws based on virtual loudspeaker positions.
- Method according to Claim 3, characterized in that the positions of the virtual loudspeakers correspond to positions of a multichannel reproduction system with "surround" effect, the optimized decoding filters allowing a decoding of multichannel multimedia contents with "surround" effect for reproduction on two loudspeakers.
- Method according to one of Claims 3 and 4, characterized in that the encoding functions comprise a plurality of zero gains to be associated with encoding channels.
- Method according to one of the preceding claims, characterized in that interaural delay information is extracted, on the basis of the transfer functions (HRIR, HRTF) obtained in step a), while the optimization of the encoding functions (g(θ,ϕ,n), g(θ,ϕ,n,f)) and/or of the decoding filters (F(t,n)) is conducted on the basis of transfer functions from which said delay information has been extracted, said delay information being applied subsequently, on encoding.
- Method according to one of Claims 1 to 5, characterized in that interaural delay information is taken into account in the optimization of the decoding filters (F(t,n)), and in that the spatial encoding is conducted without delay application (ITD).
- Method according to one of the preceding claims, characterized in that, in step b), some at least of the transfer functions obtained (HRTF) are chosen as decoding filters.
- Method according to Claim 2, characterized in that in step b) spatial encoding functions (g(θ,ϕ,n), g(θ,ϕ,n,f)) of the spherical harmonic type in an ambiophonic context are chosen.
- Method according to one of Claims 1, 2, 3, 4, 5, 6, 7 and 9, characterized in that, for the first optimization iteration, the decoding filters (F(t,n)) are calculated by a solution of the pseudo-inverse type.
- Method according to Claim 1, characterized in that each difference is weighted as a function of a given direction in space so as to favor certain of said directions.
- Computer program for determining encoding functions (g(θ,ϕ,n)) and/or decoding filters (F(t,n)), for a sound spatialization processing with a multichannel spatial encoding and a decoding for binaural reproduction on two loudspeakers, characterized in that it comprises instructions for executing the method according to one of the preceding claims.
- Sound spatialization system with a multichannel encoding and for reproduction on two loudspeakers, comprising a spatial encoding block (ENCOD) defined by encoding functions associated with a plurality of encoding channels and a block for decoding (DECOD) by applying filters for reproduction in a binaural context on two loudspeakers, characterized in that the system is adapted for implementing the method according to one of Claims 1 to 11.
- System according to Claim 13, characterized in that the spatial encoding functions and/or the decoding filters are determined by implementing the method according to Claim 7, and in that it comprises a direct encoding block without delay application so as to reduce a number of encoding channels and a corresponding number of decoding filters.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0602098 | 2006-03-09 | ||
PCT/FR2007/050867 WO2007101958A2 (en) | 2006-03-09 | 2007-03-01 | Optimization of binaural sound spatialization based on multichannel encoding |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1992198A2 | 2008-11-19 |
EP1992198B1 | 2016-07-20 |
Family
ID=37452726
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP07731684.2A Active EP1992198B1 (en) | 2006-03-09 | 2007-03-01 | Optimization of binaural sound spatialization based on multichannel encoding |
Country Status (3)
Country | Link |
---|---|
US (1) | US9215544B2 (en) |
EP (1) | EP1992198B1 (en) |
WO (1) | WO2007101958A2 (en) |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ES2343723B2 (en) * | 2009-02-05 | 2011-05-18 | Universidad De Vigo | SYSTEM FOR THE EXPLORATION OF VIRTUAL AND REAL ENVIRONMENTS THROUGH VECTOR ACOUSTIC SPACES. |
KR20120004909A (en) * | 2010-07-07 | 2012-01-13 | 삼성전자주식회사 | Method and apparatus for 3d sound reproducing |
EP2645748A1 (en) | 2012-03-28 | 2013-10-02 | Thomson Licensing | Method and apparatus for decoding stereo loudspeaker signals from a higher-order Ambisonics audio signal |
GB201211512D0 (en) * | 2012-06-28 | 2012-08-08 | Provost Fellows Foundation Scholars And The Other Members Of Board Of The | Method and apparatus for generating an audio output comprising spartial information |
US20140081627A1 (en) * | 2012-09-14 | 2014-03-20 | Quickfilter Technologies, Llc | Method for optimization of multiple psychoacoustic effects |
US9736609B2 (en) * | 2013-02-07 | 2017-08-15 | Qualcomm Incorporated | Determining renderers for spherical harmonic coefficients |
CN104982042B (en) | 2013-04-19 | 2018-06-08 | 韩国电子通信研究院 | Multi channel audio signal processing unit and method |
CN108806704B (en) | 2013-04-19 | 2023-06-06 | 韩国电子通信研究院 | Multi-channel audio signal processing device and method |
US9319819B2 (en) | 2013-07-25 | 2016-04-19 | Etri | Binaural rendering method and apparatus for decoding multi channel audio |
US10382880B2 (en) | 2014-01-03 | 2019-08-13 | Dolby Laboratories Licensing Corporation | Methods and systems for designing and applying numerically optimized binaural room impulse responses |
EA202090186A3 (en) | 2015-10-09 | 2020-12-30 | Долби Интернешнл Аб | AUDIO ENCODING AND DECODING USING REPRESENTATION CONVERSION PARAMETERS |
US10142755B2 (en) * | 2016-02-18 | 2018-11-27 | Google Llc | Signal processing methods and systems for rendering audio on virtual loudspeaker arrays |
US10325610B2 (en) | 2016-03-30 | 2019-06-18 | Microsoft Technology Licensing, Llc | Adaptive audio rendering |
US10492018B1 (en) * | 2016-10-11 | 2019-11-26 | Google Llc | Symmetric binaural rendering for high-order ambisonics |
US9992602B1 (en) | 2017-01-12 | 2018-06-05 | Google Llc | Decoupled binaural rendering |
US10009704B1 (en) | 2017-01-30 | 2018-06-26 | Google Llc | Symmetric spherical harmonic HRTF rendering |
US10158963B2 (en) | 2017-01-30 | 2018-12-18 | Google Llc | Ambisonic audio with non-head tracked stereo based on head position and time |
US10764684B1 (en) | 2017-09-29 | 2020-09-01 | Katherine A. Franco | Binaural audio using an arbitrarily shaped microphone array |
DK180449B1 (en) * | 2019-10-05 | 2021-04-29 | Idun Aps | A method and system for real-time implementation of head-related transfer functions |
US11363402B2 (en) | 2019-12-30 | 2022-06-14 | Comhear Inc. | Method for providing a spatialized soundfield |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006075077A2 (en) * | 2005-01-10 | 2006-07-20 | France Telecom | Method and device for individualizing hrtfs by modeling |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3264489B2 (en) * | 1988-07-08 | 2002-03-11 | アダプティブ オーディオ リミテッド | Sound reproduction device |
JPH08502867A (en) | 1992-10-29 | 1996-03-26 | ウィスコンシン アラムニ リサーチ ファンデーション | Method and device for producing directional sound |
GB9417185D0 (en) * | 1994-08-25 | 1994-10-12 | Adaptive Audio Ltd | Sounds recording and reproduction systems |
US5596644A (en) * | 1994-10-27 | 1997-01-21 | Aureal Semiconductor Inc. | Method and apparatus for efficient presentation of high-quality three-dimensional audio |
US6181800B1 (en) | 1997-03-10 | 2001-01-30 | Advanced Micro Devices, Inc. | System and method for interactive approximation of a head transfer function |
AU6400699A (en) * | 1998-09-25 | 2000-04-17 | Creative Technology Ltd | Method and apparatus for three-dimensional audio display |
US7231054B1 (en) * | 1999-09-24 | 2007-06-12 | Creative Technology Ltd | Method and apparatus for three-dimensional audio display |
WO2007048900A1 (en) * | 2005-10-27 | 2007-05-03 | France Telecom | Hrtfs individualisation by a finite element modelling coupled with a revise model |
-
2007
- 2007-03-01 EP EP07731684.2A patent/EP1992198B1/en active Active
- 2007-03-01 WO PCT/FR2007/050867 patent/WO2007101958A2/en active Application Filing
- 2007-03-01 US US12/224,840 patent/US9215544B2/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006075077A2 (en) * | 2005-01-10 | 2006-07-20 | France Telecom | Method and device for individualizing hrtfs by modeling |
Also Published As
Publication number | Publication date |
---|---|
WO2007101958A2 (en) | 2007-09-13 |
US20090067636A1 (en) | 2009-03-12 |
EP1992198A2 (en) | 2008-11-19 |
WO2007101958A3 (en) | 2007-11-01 |
US9215544B2 (en) | 2015-12-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1992198B1 (en) | Optimization of binaural sound spatialization based on multichannel encoding | |
EP1999998B1 (en) | Method for binaural synthesis taking into account a spatial effect | |
EP2000002B1 (en) | Method and device for efficient binaural sound spatialization in the transformed domain | |
EP2898707B1 (en) | Optimized calibration of a multi-loudspeaker sound restitution system | |
EP1836876B1 (en) | Method and device for individualizing hrtfs by modeling | |
EP1600042B1 (en) | Method for the treatment of compressed sound data for spatialization | |
EP2374124B1 (en) | Advanced encoding of multi-channel digital audio signals | |
EP2901718B1 (en) | Method and system for playing back an audio signal | |
EP2042001B1 (en) | Binaural spatialization of compression-encoded sound data | |
EP2304721B1 (en) | Spatial synthesis of multichannel audio signals | |
EP2920979B1 (en) | Acquisition of spatialised sound data | |
EP1586220B1 (en) | Method and device for controlling a reproduction unit using a multi-channel signal | |
FR2776461A1 (en) | METHOD FOR IMPROVING THREE-DIMENSIONAL SOUND REPRODUCTION | |
EP3559947B1 (en) | Processing in sub-bands of an actual ambisonic content for improved decoding | |
EP3025514B1 (en) | Sound spatialization with room effect | |
FR3065137A1 (en) | SOUND SPATIALIZATION METHOD | |
EP1994526B1 (en) | Joint sound synthesis and spatialization | |
EP3058564B1 (en) | Sound spatialisation with reverberation, optimised in terms of complexity | |
WO2005015954A2 (en) | Method and device for processing audio data in an ambisonic context | |
EP3384688B1 (en) | Successive decompositions of audio filters | |
WO2024213554A1 (en) | Optimized processing for reducing channels of a stereophonic audio signal | |
EP3934282A1 (en) | Method for converting a first set of signals representing a sound field into a second set of signals and associated electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20080828 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR |
|
DAX | Request for extension of the european patent (deleted) | ||
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: ORANGE |
|
17Q | First examination report despatched |
Effective date: 20150608 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R079 Ref document number: 602007047069 Country of ref document: DE Free format text: PREVIOUS MAIN CLASS: H04S0001000000 Ipc: H04S0005000000 |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: H04S 5/00 20060101AFI20160121BHEP Ipc: H04S 1/00 20060101ALI20160121BHEP |
|
INTG | Intention to grant announced |
Effective date: 20160212 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D Free format text: NOT ENGLISH |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D Free format text: LANGUAGE OF EP DOCUMENT: FRENCH |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: REF Ref document number: 814926 Country of ref document: AT Kind code of ref document: T Effective date: 20160815 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602007047069 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: LT Ref legal event code: MG4D |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: MP Effective date: 20160720 |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: MK05 Ref document number: 814926 Country of ref document: AT Kind code of ref document: T Effective date: 20160720 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720; Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720; Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720; Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720; Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161120 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 11 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161121; Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720; Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720; Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720; Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161021; Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720; Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602007047069 Country of ref document: DE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720; Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720; Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720; Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720; Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161020 |
|
26N | No opposition filed |
Effective date: 20170421 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: MM4A |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170301 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 12 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170331; Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170331; Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170301 |
|
REG | Reference to a national code |
Ref country code: BE Ref legal event code: MM Effective date: 20170331 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170331 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: HU Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO Effective date: 20070301 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: CY Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20160720 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: TR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20160720 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20240220 Year of fee payment: 18; Ref country code: GB Payment date: 20240220 Year of fee payment: 18 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20240220 Year of fee payment: 18 |