EP3475943B1 - Method for conversion and stereophonic encoding of a three-dimensional audio signal

Info

Publication number: EP3475943B1
Application number: EP17787331.2A
Authority: EP
Inventors: François BECKER; Benjamin Bernard
Original assignee: Coronal Encoding Sas
Current assignee: Coronal Encoding Sas
Priority date: 2016-09-30
Filing date: 2017-09-28
Publication date: 2021-12-01
Anticipated expiration: 2037-09-28
Also published as: CN109791768B; MC200186B1; US20200168235A1; WO2018059742A1; CN109791768A; EP3475943A1; US11232802B2

Description

TECHNICAL FIELD

The present invention relates to a method and process for processing an audio signal, and more particularly to a method for converting and stereophonically encoding a three-dimensional audio signal.

BACKGROUND AND PRIOR ART

The production, transmission and reproduction of a three-dimensional audio signal is an important part of any immersive audiovisual experience, for example when presenting virtual reality content, but also when viewing cinematographic content or in gaming applications. Any three-dimensional audio content thus passes through a production or capture phase, a transmission or storage phase, and a reproduction phase.

The phase of producing or obtaining the content can be carried out with many widely used techniques: stereophonic, multichannel or periphonic capture, or synthesis of content from separate elements. The content is then represented either by a number of separate channels, or as a periphonic sound field (for example in Ambisonics format of order 1 or higher), or as sound objects with separate spatial information.

The reproduction phase is also well known and widespread in both professional and consumer settings: stereo headphones or headphones with binaural rendering, stereophonic loudspeaker systems (optionally with transaural processing), multichannel systems, or systems with a three-dimensional loudspeaker arrangement.

The transmission phase can consist of a simple channel-by-channel transmission, or of a transmission of the separate elements and of the spatial information needed to reconstruct the content, or of an encoding that describes, most often with losses, the spatial content of the original signal. Many audio encoding methods exist that preserve all or part of the spatial information present in the original three-dimensional signal.

Peter Scheiber, from the 1960s onward, was one of the first to describe a stereophonic matrixing process for a planar surround field, and proposed using what has since been called the "Scheiber sphere" as a tool for directly mapping the magnitude (used here as a synonym for amplitude) and phase relationship between two channels to a three-dimensional spatial position.

For example, in "Analyzing Phase-Amplitude Matrices" (JAES, 1971), Scheiber introduces the concept of linear matrixing using phase and amplitude differences to encode and decode spatial positions in two or three dimensions, defines what is now known as the "inter-channel domain" (that is, the two-dimensional domain formed by the amplitude differences between the two channels on the one hand, and their phase differences on the other), and discloses an implementation in US 3632886. However, because the encoding and decoding operations are linear, channel separation performance is limited in this implementation.
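
As an illustration of the inter-channel domain described above, the following minimal sketch maps one pair of frequency-domain coefficients (left and right channel) to a point made of a level difference and a phase difference. The function name `interchannel_point`, the use of decibels for the amplitude axis, and the sample values are assumptions chosen for illustration; they are not taken from Scheiber's paper or from the present document.

```python
import cmath
import math


def interchannel_point(left: complex, right: complex):
    """Return (level difference in dB, phase difference in radians).

    Illustrative only: both coefficients are assumed to be non-zero,
    otherwise the level difference is undefined.
    """
    # Amplitude axis: ratio of the magnitudes, expressed in decibels.
    level_db = 20.0 * math.log10(abs(right) / abs(left))
    # Phase axis: phase of right relative to left, wrapped to [-pi, pi].
    phase = cmath.phase(right * left.conjugate())
    return level_db, phase


# Right channel at half the amplitude of the left, leading by a quarter cycle.
level_db, phase = interchannel_point(1 + 0j, 0.5j)
print(f"level difference: {level_db:.2f} dB, phase difference: {phase:.4f} rad")
```

In this sketch every pair of coefficients gives one point in a two-dimensional plane, which is the kind of quantity that the matrixing schemes discussed here associate with a spatial position.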

A critical analysis of 4-2-4 stereophonic matrixing systems (that is, 4 original channels, matrixed and carried on 2 channels, then decoded and reproduced on 4 channels) is provided by Gerzon in "Whither Four Channels" (Audio Annual, 1971). In "A Geometric Model for Two-Channel Four-Speaker Matrix Stereo System" (JAES, 1975), Gerzon studies and proposes several 4-2-4 matrixing options, and again describes how a three-dimensional field can be described on the energy sphere (whose principle is identical to that of the "Scheiber sphere"), and therefore how three-dimensional content can be encoded on two channels. This latter capability is recalled by Sommerwerck and Scheiber in "The Threat of Dolby Surround" (MultiChannelSound, Vol. 1, Nos. 4/5, 1986).

In "A High-Performance Surround Sound Process for Home Video" and the corresponding implementation disclosed in US 4696036, Julstrom uses the concepts developed by Scheiber and Gerzon to improve the separation of the original signals in privileged directions corresponding to a placement of seven loudspeakers in the horizontal plane. Techniques with a similar aim of improving separation are presented in later publications such as US 4862502, US 5136650, or WO 2002007481.

In 1996, in US 5136650, Scheiber presents a two-channel hemispherical encoding system that applies this principle in the time domain, in a matrixed manner analogous to matrix surround techniques, and adds a decorrelation variable as an extra dimension describing the distance of the sound source from the origin of the hemisphere. This decoder is intended, among other things, to feed the matrix decoders then available on the market; the decorrelation prevents those decoders from determining a single position for the source, which leads to spatial spreading on decoding. The same patent presents decoders matched to the encoder, allowing playback over transducers arranged on a hemisphere.

It has been known since the 1970s and 1980s that the short-time Fourier transform, presented for example in Papoulis, "Signal Analysis" (McGraw Hill, 1977, pp. 174-178), is a useful tool for processing a signal in separate frequency bands. The advantages of this frequency-domain transformation are also known in the context of source separation (which requires a spatial analysis of the signal), for example in Maher, "Evaluation of a Method for Separating Digitized Duet Signals" (JAES Volume 38 Issue 12 pp. 956-979; December 1990) and then in Balan et al., "Statistical properties of STFT ratios for two channel systems and applications to blind source separation" (Proc. ICA-BSS, 2000). It is also known that other types of transforms, such as the complex wavelet transform (CWT), the modified discrete cosine transform (MDCT, used in the MP3 and Vorbis codecs), or the modulated complex lapped transform (MCLT), can advantageously be used in digital audio signal processing methods. A direct application of the principle set out by Peter Scheiber was thus made possible in the frequency domain but, as explained below, only up to knowledge of the phase.

In US 8712061, Jot et al. again describe techniques for mapping between the Scheiber sphere (amplitude-phase) and physical-space coordinates, optionally via a surround or periphonic panning law that is then matrixed in the traditional way, and present a frequency-domain implementation based, among other things, on the need for both a directional signal and a non-directional "ambient" signal as input. In addition to this constraint of decomposing the incoming signal, this approach suffers, in both the encoding and decoding phases, from a major problem of discontinuity in the phase representation: there is a spatial discontinuity of the phase, with a temporally static phase mapping introduced by a generic "panning law", which introduces artifacts when a sound source is placed in certain directions on the sphere or moves on the sphere along certain trajectories. As will become apparent in the remainder of this document, the present invention solves this discontinuity problem and does not require the incoming signal to be separated into an ambient part and a direct part.

The matrix decoder presented in US 20080205676 by Merimaa et al. applies the methods disclosed in US 5136650 in the frequency domain. As in the preceding patents, the problem of phase discontinuity is not addressed.

In WO 2009046223, Goodwin et al. describe a device for format conversion and binaural rendering from a stereophonic signal, which relies on a primary source / ambient source decomposition similar to that disclosed in US 8712061, and on a direction-of-arrival analysis using the methods disclosed by Scheiber in US 5136650. As in the preceding patents, the problem of phase discontinuity is not addressed.

In "A Spatial Extrapolation Method to Derive High-Order Ambisonics Data from Stereo Sources" (J. Inf. Hiding and Multimedia Sig. Proc, 2015), Trevino et al. propose a two-dimensional (planar) decoding system for an HOA field previously encoded on a stereophonic stream, again according to Scheiber's principles. The main problems encountered by the authors are, on the one hand, the presence of a phase discontinuity (for values close to π) and, on the other hand, instabilities at the extreme stereo panning positions, for which the metrics used are undefined. In "Enhancing Stereo Signals with High-Order Ambisonics Spatial Information" (IEICE, 2016), an encoding method for obtaining said signal is specified, still with the same problems of phase and amplitude discontinuity. In both cases, the authors attempt to mitigate these discontinuity problems by applying an empirical correction to the level and phase difference metrics, followed by a deformation of the inter-channel domain, at the cost of a compromise between stability and localization accuracy. The method disclosed in the present document solves both problems without compromising stability or localization accuracy.
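
The two difficulties mentioned above can be reproduced with a few lines of arithmetic. The sketch below is an illustration under stated assumptions (a wrapped phase difference computed per frequency-domain coefficient, with hypothetical sample values); it is neither the method of Trevino et al. nor the method of the present document. It shows that two nearly identical phase relationships on either side of π yield wrapped values almost 2π apart, and that when one channel is almost silent the measured phase difference is dictated by negligible residue in that channel.

```python
import cmath
import math


def phase_difference(left: complex, right: complex) -> float:
    """Phase of right relative to left, wrapped to [-pi, pi]."""
    return cmath.phase(right * left.conjugate())


left = 1 + 0j

# 1) Discontinuity near pi: two almost identical right-channel phases.
d_below = phase_difference(left, cmath.rect(1.0, math.pi - 0.01))
d_above = phase_difference(left, cmath.rect(1.0, math.pi + 0.01))
jump = abs(d_below - d_above)  # close to 2*pi although the inputs differ by 0.02 rad

# 2) Instability at an extreme panning position: the right channel is
#    nearly silent, yet its residual phase fully determines the metric.
d_tiny_a = phase_difference(left, cmath.rect(1e-12, 0.3))
d_tiny_b = phase_difference(left, cmath.rect(1e-12, -2.5))

print(f"below pi: {d_below:+.4f}, above pi: {d_above:+.4f}, jump: {jump:.4f}")
print(f"hard-panned, residue A: {d_tiny_a:+.4f}, residue B: {d_tiny_b:+.4f}")
```

Any mapping that uses such a wrapped phase difference directly as a coordinate inherits both behaviours, which is why corrections of the metrics or deformations of the inter-channel domain are needed in the approaches cited above.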

In "Spatial Sound Reproduction with Directional Audio Coding" (J. Audio Eng. Soc., 2007), Pulkki presents Directional Audio Coding (DirAC), a spatial sound representation method applicable to different sound reproduction systems. In the analysis part, the diffuseness and direction of arrival of sound are estimated at a single location as a function of time and frequency. In the synthesis part, the microphone signals are first divided into non-diffuse and diffuse parts, and then reproduced using different strategies. DirAC is developed from an existing technology for impulse response reproduction, spatial impulse response rendering (SIRR), and implementations of DirAC for different applications are described.

In "Analysis, synthesis, and perception of spatial sound: Binaural localization modeling and multichannel loudspeaker reproduction" (Report / Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, ISBN: 951-22-8291-7, 2006), Merimaa proposes a modeling mechanism with specific desired properties (namely, for localization, the auditory system must determine the direction of each source independently, while ignoring reflections and the superposition effects of any concurrently arriving sound). Interaural time difference (ITD) and interaural level difference (ILD) cues are considered only at instants when only the direct sound of a single source has non-negligible energy within a critical band and, thus, when the evoked ITD and ILD represent the direction of that source. It is shown how such instants can be identified as a function of interaural coherence (IC). The source directions suggested by the selected ITD and ILD cues are also shown to imply the results of a number of published psychophysical studies. The physical analysis techniques reviewed and psychoacoustic knowledge of spatial hearing are applied to the development of the spatial impulse response rendering (SIRR) method. SIRR aims to recreate ITD, ILD, IC and monaural localization cues using a perceptually motivated analysis-synthesis method. The method is described in the context of multichannel loudspeaker reproduction of room responses with convolving reverberators. The analyzed quantities consist of the time- and frequency-dependent direction of arrival and diffuseness of sound. Based on the analysis data and a recorded omnidirectional signal, multichannel responses suitable for reproduction with any chosen surround loudspeaker setup are synthesized.

US 2010/329466 A1 describes an audio processor for converting a multichannel audio input signal, such as a B-format sound field signal, into a set of audio output signals, such as a set of two or more audio output signals arranged for headphone reproduction or for playback over a loudspeaker array. A filter bank divides each of the input channels into frequency bands. The input signal is decomposed into plane waves to determine one or two dominant sound source directions. These are used to determine a set of virtual loudspeaker positions chosen such that the dominant direction(s) coincide with virtual loudspeaker positions. The input signal is decoded into virtual loudspeaker signals corresponding to each of the virtual loudspeaker positions, and the virtual loudspeaker signals are processed with suitable transfer functions to create the illusion of sound emanating from the directions of the virtual loudspeakers. High spatial fidelity is achieved through the coincidence of the virtual loudspeaker positions and the determined dominant sound source direction(s). Head-related transfer functions (HRTFs) can be used by differentiating the phase of a high-frequency part of the HRTFs with respect to frequency, followed by a corresponding integration of that part with respect to frequency after combining the HRTF components from different directions.

WO 2009/046460 A2 describes a two-channel phase-amplitude stereo encoding and decoding scheme enabling interactive 3D audio reproduction over standard two-channel audio-only transmission. The encoding scheme associates a 2D or 3D position with each of a plurality of sound sources by using frequency-independent inter-channel phase and amplitude differences. The decoder is based on a frequency-domain spatial analysis of 2D or 3D directional cues in a two-channel stereo signal and re-synthesis of these cues by any preferred spatialization technique, thus allowing positional audio cues and reverberation or ambient cues to be reproduced on arbitrary multichannel loudspeaker formats or on headphones, while aiming to preserve source separation despite the intermediate encoding on only two audio channels.

In "Binaural Reproduction of Higher Order Ambisonics a Real-Time Implementation and Perceptual Improvements" (2014), Vennerød examines and develops the theory of spherical harmonics and higher-order Ambisonics, which serves as the foundation for a proposed real-time system. This system can record signals from a commercial spherical microphone array, convert them to the higher-order Ambisonics format, and reproduce the sound field over headphones. A head tracker is used to compensate for head movement. The real-time system operates with a latency of approximately 95 milliseconds between a head movement and the resulting sound field rotation. In addition, two new methods for improving headphone reproduction were evaluated.

One of the objectives of the present invention is to disclose a method which, when encoding to a stereophonic stream, ensures continuity of the signal, including its phase, whatever the position of the source and whatever trajectory it follows, without requiring a non-directional component in the input signal, matrix encoding of the signal, or a compromise between stability and localization accuracy for extreme positions in the inter-channel domain.

An example that decodes and transcodes a stereophonic signal, optionally encoded with one of the implementations of the invention or with existing matrix encoding systems, and renders it on any playback system and in any audio format, without requiring a compromise between stability and localization accuracy, does not fall within the scope of the invention and is given for information only.

Another example, which does not fall within the scope of the invention and is given for information only, provides a complete chain for transporting or storing a three-dimensional acoustic field in a compact format accepted by standard means of transport or storage, while retaining the relevant three-dimensional spatial information of the original field.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 represents the Scheiber sphere (also called the Stokes-Poincaré sphere or the energy sphere) as defined, for example, in "Analyzing Phase-Amplitude Matrices", Journal of the Audio Engineering Society, Vol. 19, No. 10, p. 835 (November 1971).
Figure 2 illustrates, in the form of a panorama-phase map, an example of an arbitrary choice of phase correspondence.
Figure 3 gives an example of a partial phase correspondence map ensuring continuity between the edges of the panorama-phase definition domain.
Figure 4 illustrates the principle of folding the correspondence map of figure 2 onto the Scheiber sphere of figure 1.
Figure 5 illustrates the folding of figure 4 once it has been fully completed.
Figure 6 represents the Scheiber sphere carrying a vector field corresponding to the local complex frequency coefficient c_L. By construction of the phase correspondence map, the sum of the indices at the permitted singularity, at L, and at the point where the vector field vanishes, at R, differs from 2, the value that would be expected if it were possible to have no other singularity on the sphere. The left and right insets show the possible local structures of the vector field in the vicinity of the singularities at points L and R, with their respective indices.
Figure 7 illustrates the phase correspondence map for a singularity positioned at Ψ = Ψ_0. The phase correspondence described by this map is continuous at every point except at Ψ.
Figure 8 represents the map of figure 7 after it has been folded onto the Scheiber sphere.
Figure 9 illustrates the phase correspondence map for a singularity positioned at Ψ, with panorama and phase-difference coordinates (-1/4, -3π/4).
Figure 10 represents the map of figure 9 after it has been folded onto the Scheiber sphere.
Figure 11 shows the diagram of the encoding process, converting a signal from the spherical domain to the inter-channel domain.
Figure 12 shows the diagram of the decoding process, converting a signal from the inter-channel domain to the spherical domain.
Figure 13 illustrates the process of deforming the spherical space according to azimuth values.

DETAILED DESCRIPTION

The techniques described below operate on data in the form of complex frequency coefficients. These coefficients represent a frequency band over a short time window. They are obtained using a technique called the short-term Fourier transform (STFT), and can also be obtained using analogous transforms, such as those of the family of complex wavelet transforms (CWT), complex wavelet packet transforms (CWPT), the modified discrete cosine transform (MDCT) or the modulated complex lapped transform (MCLT), etc. Each of these transforms, applied to successive, overlapping windows of the signal, has an inverse transform which, from the complex frequency coefficients representing all the frequency bands of the signal, yields a signal in temporal form.
In this document, we define:

the operator $\mathrm{Norm}\langle \cdot \mid \cdot \rangle$ such that $\mathrm{Norm}\langle \vec{v} \mid \vec{d} \rangle = \begin{cases} \frac{\vec{v}}{\| \vec{v} \|} & \text{if } \vec{v} \neq \vec{0} \\ \vec{d} & \text{if } \vec{v} = \vec{0} \end{cases}$
the operator Re[v], which denotes the real part of the vector v, i.e. the vector of the real parts of the components of the vector v;
the operator v*, which is the conjugation operator applied to the complex components of the vector v;
the operator atan2(y, x), which gives the oriented angle between a vector (1,0)^T and a vector (x,y)^T; this operator is available as the function std::atan2 of the C++ standard library.

Using one of the time-to-frequency transforms described above, two channels in temporal form, for example forming a stereophonic signal, can be transformed to the frequency domain into two arrays of complex coefficients. The complex frequency coefficients of the two channels can be paired, so as to have one pair for each frequency or frequency band among a plurality of frequencies, and for each time window of the signal.
Each pair of complex frequency coefficients can be analyzed using two metrics that combine information from the two stereophonic channels, introduced below: the panorama and the phase difference, which together form what is called the "inter-channel domain" in the remainder of this document.

The panorama of two complex frequency coefficients c_1 and c_2 is defined as the ratio between the difference of their powers and the sum of their powers:

$\forall (c_{1}, c_{2}) \in \mathbb{C}^{2} \mid (c_{1}, c_{2}) \neq (0,0), \quad \mathrm{panorama}(c_{1}, c_{2}) = \frac{|c_{1}|^{2} - |c_{2}|^{2}}{|c_{1}|^{2} + |c_{2}|^{2}}$

The panorama thus takes values in the interval [-1,1]. If both coefficients simultaneously have zero magnitude, there is no signal in the frequency band they represent, and the panorama is not meaningful.
The panorama applied to a stereophonic signal composed of a left channel (L) and a right channel (R) is thus, for the respective coefficients c_L and c_R of the two channels, not simultaneously zero:

$\mathrm{panorama}(c_{L}, c_{R}) = \frac{|c_{L}|^{2} - |c_{R}|^{2}}{|c_{L}|^{2} + |c_{R}|^{2}}$

The panorama thus takes, among others, the following values:

1 for a signal entirely contained in the left channel, i.e. c_R = 0,
-1 for a signal entirely contained in the right channel, i.e. c_L = 0,
0 for a signal of the same magnitude on both channels.
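As an illustration only, the energy-ratio panorama defined above can be sketched in a few lines of Python. The function name `panorama` follows the text; raising an exception when both coefficients are zero is an assumption made for this sketch, since the text only states that the panorama is not meaningful in that case.

```python
def panorama(c_left: complex, c_right: complex) -> float:
    """Energy-ratio panorama of a pair of complex frequency coefficients.

    Returns a value in [-1, 1]: 1 when the signal is entirely in the left
    channel, -1 when it is entirely in the right channel.
    """
    power_left = abs(c_left) ** 2
    power_right = abs(c_right) ** 2
    total = power_left + power_right
    if total == 0.0:
        # Assumption of this sketch: reject the undefined (0, 0) case.
        raise ValueError("panorama is undefined when both coefficients are zero")
    return (power_left - power_right) / total
```

The phase of each coefficient plays no role here: only the magnitudes enter the ratio, so `panorama(1, 1j)` and `panorama(1, 1)` give the same value.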

Knowing a panorama and a total power p makes it possible to determine the magnitudes of the two complex frequency coefficients:

$\begin{cases} |c_{L}| = \sqrt{p} \sqrt{\frac{1}{2} (1 + \mathrm{panorama})} \\ |c_{R}| = \sqrt{p} \sqrt{\frac{1}{2} (1 - \mathrm{panorama})} \end{cases}$

A variant formulation of the panorama is as follows:

$\mathrm{panorama}(c_{L}, c_{R}) = \frac{4}{\pi} \operatorname{atan2}(|c_{L}|, |c_{R}|) - 1$

With this formulation, knowing a panorama and a total power p makes it possible to determine the magnitudes of the two complex frequency coefficients:

$\begin{cases} |c_{L}| = \sqrt{p} \sin \left( \frac{\pi}{4} (\mathrm{panorama} + 1) \right) \\ |c_{R}| = \sqrt{p} \cos \left( \frac{\pi}{4} (\mathrm{panorama} + 1) \right) \end{cases}$

The phase difference between two complex frequency coefficients c_1 and c_2, both non-zero, is also defined as follows:

$\mathrm{phasediff}(c_{1}, c_{2}) = \arg (c_{2}) - \arg (c_{1}) + 2 k \pi$

where k ∈ ℤ is such that phasediff(c_1, c_2) ∈ ]-π, π].
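The relations above can be checked numerically with a short Python sketch. The function names are chosen for this illustration and do not come from the text; the wrapping expression used in `phasediff` is one possible way of selecting the integer k so that the result lies in ]-π, π].

```python
import cmath
import math


def magnitudes_energy(pan: float, power: float) -> tuple:
    """(|c_L|, |c_R|) from the energy-ratio panorama and the total power p."""
    root_p = math.sqrt(power)
    return (root_p * math.sqrt(0.5 * (1.0 + pan)),
            root_p * math.sqrt(0.5 * (1.0 - pan)))


def panorama_angular(c_left: complex, c_right: complex) -> float:
    """Variant panorama: (4 / pi) * atan2(|c_L|, |c_R|) - 1."""
    return 4.0 / math.pi * math.atan2(abs(c_left), abs(c_right)) - 1.0


def magnitudes_angular(pan: float, power: float) -> tuple:
    """(|c_L|, |c_R|) from the variant panorama and the total power p."""
    root_p = math.sqrt(power)
    angle = math.pi / 4.0 * (pan + 1.0)
    return root_p * math.sin(angle), root_p * math.cos(angle)


def phasediff(c1: complex, c2: complex) -> float:
    """arg(c2) - arg(c1), wrapped into the half-open interval ]-pi, pi]."""
    raw = cmath.phase(c2) - cmath.phase(c1)
    # pi - ((pi - raw) mod 2*pi) maps any real angle into ]-pi, pi].
    return math.pi - ((math.pi - raw) % (2.0 * math.pi))
```

In both formulations the squared magnitudes sum to p, so either pair of functions can rebuild the channel magnitudes from a (panorama, power) pair; the two panorama definitions are not interchangeable, however, and each must be paired with its own inverse.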

In the remainder of this document, we work in the three-dimensional Cartesian coordinate system with axes (X, Y, Z) and coordinates (x, y, z). The azimuth is the angle in the plane (z = 0), measured from the X axis towards the Y axis (counterclockwise), in radians. A vector v has azimuth a when the half-plane (y = 0, x ≥ 0), rotated about the Z axis by an angle a, contains the vector v. A vector v has elevation e when, within that rotated half-plane, it makes an angle e with a non-zero vector of the half-line defined by the intersection of the half-plane with the horizontal plane (z = 0), counted positive upwards.
A unit vector of azimuth a and elevation e has the Cartesian coordinates:

$\begin{cases} x = \cos (a) \cos (e) \\ y = \sin (a) \cos (e) \\ z = \sin (e) \end{cases}$

In this Cartesian coordinate system, a signal expressed as a "First Order Ambisonics" (FOA) field, i.e. in first-order spherical harmonics, is composed of four channels W, X, Y, Z, corresponding to the pressure and to the pressure gradient at a point in space along each of the directions:

the W channel is the pressure signal
the X channel is the pressure gradient signal at the point along the X axis
the Y channel is the pressure gradient signal at the point along the Y axis
the Z channel is the pressure gradient signal at the point along the Z axis

A normalization standard for spherical harmonics can be defined as follows: a monochromatic progressive plane wave (OPPM) with complex frequency component c and direction of origin given by the unit vector v, of Cartesian coordinates (v_x, v_y, v_z) or of azimuth and elevation coordinates (a, e), generates for each channel a coefficient of the same phase but of altered magnitude:

in Cartesian coordinates $\begin{cases} c_{w} = \frac{c}{\sqrt{2}} & \text{for } W \\ c_{x} = c \, v_{x} & \text{for } X \\ c_{y} = c \, v_{y} & \text{for } Y \\ c_{z} = c \, v_{z} & \text{for } Z \end{cases}$

or respectively

in azimuth and elevation coordinates $\begin{cases} c_{w} = \frac{c}{\sqrt{2}} & \text{for } W \\ c_{x} = c \cos (a) \cos (e) & \text{for } X \\ c_{y} = c \sin (a) \cos (e) & \text{for } Y \\ c_{z} = c \sin (e) & \text{for } Z \end{cases}$

the whole being expressed up to a normalization factor. By linearity of the time-frequency transforms, the expression of the equivalents in the time domain is trivial. Other normalization standards exist, which are presented for example by Daniel in « Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia » (doctoral thesis, Université Paris 6, 31 July 2001).
The concept of "divergence" makes it possible to simulate, in the FOA field, a source moving inside the unit sphere of directions: the divergence is a real parameter with values in [0,1]; a divergence div = 1 positions the source on the surface of the sphere as in the preceding equations, and a divergence div = 0 positions the source at the center of the sphere. The coefficients of the FOA field are then as follows:

in Cartesian coordinates $\begin{cases} c_{w} = \frac{c}{\sqrt{2}} & \text{for } W \\ c_{x} = c \cdot \mathrm{div} \cdot v_{x} & \text{for } X \\ c_{y} = c \cdot \mathrm{div} \cdot v_{y} & \text{for } Y \\ c_{z} = c \cdot \mathrm{div} \cdot v_{z} & \text{for } Z \end{cases}$

or respectively

in azimuth and elevation coordinates $\begin{cases} c_{w} = \frac{c}{\sqrt{2}} & \text{for } W \\ c_{x} = c \cdot \mathrm{div} \cdot \cos (a) \cos (e) & \text{for } X \\ c_{y} = c \cdot \mathrm{div} \cdot \sin (a) \cos (e) & \text{for } Y \\ c_{z} = c \cdot \mathrm{div} \cdot \sin (e) & \text{for } Z \end{cases}$

the whole being expressed up to a normalization factor. By linearity of the time-frequency transforms, the expression of the equivalents in the time domain is trivial.
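The encoding equations above translate directly into code. The following Python sketch is illustrative only: the names `direction` and `encode_foa` are assumptions of this example, and the normalization is the one stated in the text (W scaled by 1/√2), without any additional factor.

```python
import math


def direction(azimuth: float, elevation: float) -> tuple:
    """Unit vector (x, y, z) for an azimuth and an elevation in radians."""
    return (math.cos(azimuth) * math.cos(elevation),
            math.sin(azimuth) * math.cos(elevation),
            math.sin(elevation))


def encode_foa(c: complex, azimuth: float, elevation: float,
               divergence: float = 1.0) -> tuple:
    """FOA coefficients (c_w, c_x, c_y, c_z) of one plane-wave component c.

    divergence = 1 places the source on the unit sphere, divergence = 0
    places it at the center (only the W channel is then non-zero).
    """
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must lie in [0, 1]")
    v_x, v_y, v_z = direction(azimuth, elevation)
    return (c / math.sqrt(2.0),
            c * divergence * v_x,
            c * divergence * v_y,
            c * divergence * v_z)
```

For example, `encode_foa(1 + 0j, 0.0, 0.0)` describes a source straight ahead on the X axis: only the W and X coefficients are non-zero, and all four share the phase of c.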

A preferred implementation of the invention comprises a first method for converting such an FOA field into complex coefficients and spherical coordinates. This first method performs a lossy, perceptually based conversion of the FOA field to a format composed of complex frequency coefficients and their spatial correspondence in azimuth and elevation coordinates (or a unit-norm Cartesian vector). The method relies on a frequency representation of the FOA signals obtained after time windowing and a time-to-frequency transform, for example the short-term Fourier transform (STFT).
The following process is applied to each group of four complex coefficients corresponding to a frequency "bin", i.e. the complex coefficients of the frequency representation of each of the channels W, X, Y, Z that correspond to the same frequency band, for every frequency or frequency band among a plurality of frequencies. An exception is made for the frequency bin or bins corresponding to the DC component (because of the padding applied to the signal before the time-to-frequency transform, the next few frequency bins may also be affected).
Let c_W, c_X, c_Y, c_Z denote the complex coefficients corresponding to the frequency bin under consideration. An analysis is performed to separate the content of this frequency band into three parts:

a part A corresponding to a directional monochromatic progressive plane wave (OPPM),
a part B corresponding to a diffuse pressure wave,
a part C corresponding to a standing wave.

To help understand this separation, the following examples are given:

An analysis leading to a separation in which only part A is non-zero can be obtained with a signal originating from an OPPM as described in Equation 8 or Equation 9.
An analysis leading to a separation in which only part B is non-zero can be obtained with two OPPMs (of the same frequency), in phase, and with opposite directions of origin (only c_W then being non-zero).
An analysis leading to a separation in which only part C is non-zero can be obtained with two OPPMs (of the same frequency), in phase opposition, and with opposite directions of origin (only c_X, c_Y, c_Z then being non-zero).

The three parts are subsequently combined to obtain a total signal.
Concerning part A defined above, we consider the mean intensity vector of the FOA field signal. In "Instantaneous intensity" (AES Convention 81, Nov 1986), Heyser gives a frequency-domain formulation of the active part of the acoustic intensity, which can then be expressed along the three dimensions:

$\vec{I}_{x, y, z} = \frac{1}{2} \mathrm{Re} \left[ p \, \vec{u}_{x, y, z}^{\,*} \right]$

where:

I_{x,y,z} is the three-dimensional mean intensity vector, pointing towards the origin of the OPPM, with magnitude proportional to the square of the magnitude of the OPPM,
the operator Re[v] denotes the real part of the vector v, i.e. the vector of the real parts of the components of the vector v,
p is the complex coefficient corresponding to the pressure component, i.e. $p = \sqrt{2} \, c_{w}$,
u_{x,y,z} is the three-dimensional vector composed of the complex coefficients corresponding to the pressure gradients along the X, Y and Z axes respectively, i.e. u_{x,y,z} = (c_X, c_Y, c_Z)^T,
the operator v* is the conjugation operator applied to the complex components of the vector.

The following is thus obtained for part A, for each frequency "bin" except the one or ones corresponding to the DC component:

$\begin{cases} \vec{a} = \mathrm{Norm} \langle \vec{I}_{x, y, z} \mid \vec{0} \rangle & \text{the direction-of-origin vector} \\ c_{a} = \| \vec{I}_{x, y, z} \|^{1 / 2} \, e^{i \arg (c_{w})} & \text{the associated complex coefficient} \end{cases}$
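The extraction of part A for a single frequency bin can be sketched as follows. This is a minimal Python illustration of the intensity-vector formulas above, not a reference implementation: the names `norm_or_default` and `extract_part_a` are assumptions of this sketch, and the magnitude of c_a is taken exactly as written in the text, without any additional normalization factor.

```python
import cmath
import math


def norm_or_default(v, default):
    """Norm<v | d>: v scaled to unit length, or d when v is the zero vector."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0.0:
        return tuple(default)
    return tuple(x / length for x in v)


def extract_part_a(c_w: complex, c_x: complex, c_y: complex, c_z: complex):
    """Direction vector a and complex coefficient c_a of the directional part."""
    p = math.sqrt(2.0) * c_w  # pressure coefficient, p = sqrt(2) * c_w
    # Active intensity I = 1/2 * Re[p * conj(u)], one component per axis.
    intensity = tuple(0.5 * (p * u.conjugate()).real for u in (c_x, c_y, c_z))
    a = norm_or_default(intensity, (0.0, 0.0, 0.0))
    intensity_norm = math.sqrt(sum(i * i for i in intensity))
    # c_a has magnitude ||I||^(1/2) and the phase of c_w.
    c_a = math.sqrt(intensity_norm) * cmath.exp(1j * cmath.phase(c_w))
    return a, c_a
```

For a single plane wave encoded with the equations given earlier, the intensity vector is parallel to the direction of origin, so `a` recovers that direction. When c_w is zero (the standing-wave example), the intensity vanishes and the sketch returns the zero vector together with a zero coefficient.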

Par ailleurs, concernant la partie B définie ci-dessus, soit le coefficient complexe c_w ' le résultat de la soustraction du coefficient complexe correspondant au signal extrait dans la partie A (c'est-à-dire via l'équation 8) au coefficient original c_w : $c_{w}' = c_{w} - \frac{c_{a}}{\sqrt{2}}$

Il est possible de définir plusieurs modes de comportement pour la détermination de la partie B :

Dans un premier mode sphérique de conversion conservant l'ensemble des directions de provenance à élévations négatives, et donc notamment adapté à la réalité virtuelle, la partie B s'exprime comme ${\begin{cases} \vec{b} = {\vec{r}}_{w} le vecteur de direction de provenance \\ c_{b} = c_{w}' \sqrt{2} le coefficient complexe associ é \end{cases}$
où r _w est un vecteur dépendant de la bande de fréquence, décrit plus bas dans le présent document.
Dans un second mode hémisphérique, adapté notamment à la musique, dans lequel les élévations négatives ne sont pas pertinentes, l'information contenue dans l'hémisphère des élévations négatives est utilisée comme divergence dans le plan horizontal lors du décodage, ainsi par exemple une source positionnée au milieu de la sphère sera abaissée à une élévation de -90° afin d'obtenir une divergence de 0 et donc un étalement sur l'ensemble des haut-parleurs planaires après décodage sur un système d'écoute circulaire ou hémisphérique. La partie B s'exprime comme : ${\begin{cases} \vec{b} = {[\cos (e_{w}), 0, \sin (e_{w})]}^{T} le vecteur de direction de provenance \\ c_{b} = c_{w}' \sqrt{2} le coefficient complexe associ é \end{cases}$
où e_w est l'élévation de réintroduction de w, dans [-π/2,0], choisie par l'utilisateur, et par défaut réglée à -π/2.
Moreover, concerning part B defined above, let the complex coefficient c_w' be the result of subtracting the complex coefficient corresponding to the signal extracted in part A (i.e. via equation 8) from the original coefficient c_w:

$c_{w}' = c_{w} - \frac{c_{a}}{\sqrt{2}}$

It is possible to define several modes of behavior for the determination of part B:

In a first, spherical conversion mode, which retains all directions of origin at negative elevations and is therefore particularly suited to virtual reality, part B is expressed as: $\begin{cases} \vec{b} = \vec{r_{w}} & \text{the direction-of-origin vector} \\ c_{b} = c_{w}' \sqrt{2} & \text{the associated complex coefficient} \end{cases}$
where r_w is a frequency-band-dependent vector, described later in this document.
In a second, hemispherical mode, suited in particular to music, in which negative elevations are not relevant, the information contained in the hemisphere of negative elevations is used as divergence in the horizontal plane during decoding. For example, a source positioned in the middle of the sphere is lowered to an elevation of -90° in order to obtain a divergence of 0, and is therefore spread over all the planar loudspeakers after decoding on a circular or hemispherical listening system. Part B is expressed as: $\begin{cases} \vec{b} = [\cos (e_{w}), 0, \sin (e_{w})]^{T} & \text{the direction-of-origin vector} \\ c_{b} = c_{w}' \sqrt{2} & \text{the associated complex coefficient} \end{cases}$
where e_w is the reintroduction elevation of w, in [-π/2, 0], chosen by the user and set by default to -π/2.
Other modes, intermediate between the first (spherical) mode and the second (hemispherical) mode, can also be constructed. They are indexed by the coefficient s ∈ [0,1], equal to 0 for the spherical mode and 1 for the hemispherical mode. Let the sum vector be: $\vec{v_{s}} = (1 - s) \times \vec{r_{w}} + s \times [\cos (e_{w}), 0, \sin (e_{w})]^{T}$
This gives: $\begin{matrix} \vec{b} = \operatorname{Norm} \langle \vec{v_{s}} \mid [\cos (e_{w}), 0, \sin (e_{w})]^{T} \rangle & \text{the direction-of-origin vector} \\ c_{b} = c_{w}' \sqrt{2} & \text{the associated complex coefficient} \end{matrix}$

Finally, concerning part C, let the complex coefficients c_x', c_y', and c_z' be the results of subtracting the complex coefficients corresponding to the signal extracted in part A (i.e. the coefficients obtained with the equation) from the original coefficients c_x, c_y, and c_z:

$\begin{matrix} c_{x}' = c_{x} - c_{a} a_{x} \\ c_{y}' = c_{y} - c_{a} a_{y} \\ c_{z}' = c_{z} - c_{a} a_{z} \end{matrix}$

where a_x, a_y, a_z are the Cartesian components of the vector a.
This gives:

$\begin{matrix} \vec{c_{x}} = \vec{r_{x}} & \text{the direction-of-origin vector along the X axis} \\ c_{c, x} = c_{x}' & \text{the associated complex coefficient} \\ \vec{c_{y}} = \vec{r_{y}} & \text{the direction-of-origin vector along the Y axis} \\ c_{c, y} = c_{y}' & \text{the associated complex coefficient} \\ \vec{c_{z}} = \vec{r_{z}} & \text{the direction-of-origin vector along the Z axis} \\ c_{c, z} = c_{z}' & \text{the associated complex coefficient} \end{matrix}$

where r_x, r_y, and r_z are vectors depending on the frequency or the frequency band, described below.
The separate parts A, B, and C are grouped into a direction-of-origin vector v_total and a complex coefficient c_total:

$\begin{matrix} \vec{v}_{total} = \operatorname{Norm} \langle |c_{a}| \vec{a} + |c_{b}| \vec{b} + |c_{c, x}| \vec{c_{x}} + |c_{c, y}| \vec{c_{y}} + |c_{c, z}| \vec{c_{z}} \mid (1, 0, 0)^{T} \rangle \\ c_{total} = c_{a} e^{i \arg (c_{w})} + c_{b} e^{i \arg (c_{w})} + c_{c, x} e^{i (\arg (c_{x}) + ϕ_{x})} + c_{c, y} e^{i (\arg (c_{y}) + ϕ_{y})} + c_{c, z} e^{i (\arg (c_{z}) + ϕ_{z})} \end{matrix}$

where Φ_x, Φ_y and Φ_z are phases which will be defined later in this document.
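The grouping of parts A, B and C can be sketched in a few lines of Python. This is only an illustration that transcribes the two formulas as printed above. The names `norm_or_default` and `combine_parts`, the dictionary layout of the inputs, and the reading of Norm⟨v | d⟩ as "v scaled to unit length, or d when v is the zero vector" are assumptions made for the example.

```python
import cmath
import math


def norm_or_default(v, default):
    """Assumed reading of Norm<v|d>: unit-length v, or `default` if v is zero."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0.0:
        return tuple(default)
    return tuple(x / length for x in v)


def combine_parts(a, c_a, b, c_b, r, c_res, c_foa, phi):
    """Group parts A, B and C into (v_total, c_total).

    a, b    : direction vectors of parts A and B (3-tuples)
    c_a, c_b: complex coefficients of parts A and B
    r       : {"x", "y", "z"} -> diffuse direction vectors r_x, r_y, r_z
    c_res   : {"x", "y", "z"} -> residual coefficients c'_x, c'_y, c'_z
    c_foa   : {"w", "x", "y", "z"} -> original coefficients c_w, c_x, c_y, c_z
    phi     : {"x", "y", "z"} -> diffuse phases
    """
    # Direction: magnitude-weighted sum of all part directions, then normalized.
    weighted = [(abs(c_a), a), (abs(c_b), b)]
    weighted += [(abs(c_res[k]), r[k]) for k in "xyz"]
    summed = tuple(sum(w * v[i] for w, v in weighted) for i in range(3))
    v_total = norm_or_default(summed, (1.0, 0.0, 0.0))

    # Coefficient: each term is rotated by the phase stated in the formula.
    w_rotation = cmath.exp(1j * cmath.phase(c_foa["w"]))
    c_total = (c_a + c_b) * w_rotation
    for k in "xyz":
        c_total += c_res[k] * cmath.exp(1j * (cmath.phase(c_foa[k]) + phi[k]))
    return v_total, c_total


axes = {"x": (1.0, 0.0, 0.0), "y": (0.0, 1.0, 0.0), "z": (0.0, 0.0, 1.0)}
zero = {"x": 0j, "y": 0j, "z": 0j}
v_total, c_total = combine_parts(
    a=(0.0, 1.0, 0.0), c_a=2 + 0j, b=(0.0, 0.0, 1.0), c_b=0j,
    r=axes, c_res=zero, c_foa={"w": 1 + 0j, **zero},
    phi={"x": 0.0, "y": 0.0, "z": 0.0},
)
```

In this toy call only part A carries energy, so the grouped direction is simply the direction of part A.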

The first conversion method presented above does not take into account the divergence that can be introduced during FOA panning. A second preferred implementation takes this divergence into account.
For part A, consider I_{x,y,z} obtained by equation 12. The divergence div is calculated as follows:

$\begin{array}{l} \text{div} = \min \left( 1, \frac{\| \vec{I}_{x, y, z} \|}{|c_{w} \sqrt{2}|^{2}} \right) \text{ if } c_{w} \neq 0 \\ \text{div} = 1 \text{ if } c_{w} = 0 \end{array}$

The vector a and the coefficient c_a are then calculated from div, c_w and I_{x,y,z}:

$\vec{a_{0}} = \operatorname{Norm} \langle \vec{I}_{x, y, z} \mid (0, 0, 0)^{T} \rangle$

In a first, spherical mode, the unit direction vector a_spherical is calculated as follows:

$\vec{a_{spherical}} = \operatorname{Norm} \langle \text{div} \, \vec{a_{0}} + (1 - \text{div}) \vec{r_{w}} \mid (1, 0, 0)^{T} \rangle$

In a second, hemispherical mode, the unit direction vector a_hemispherical is calculated as follows:

$\vec{a_{1}} = \text{div} \, \vec{a_{0}}$

Let p be the vector a_1 projected onto the horizontal plane:

$\vec{p} = \vec{a_{1}} - (\vec{a_{1}}) \cdot (0, 0, 1)^{T}$

where • is the scalar product, and let p be its norm:

$p = \| \vec{p} \|$

We also define h:

$h = \sqrt{1 - p^{2}}$

$\vec{a_{2}} = \vec{a_{1}} - (1 - p) (h - \vec{a_{1}} \cdot (0, 0, 1)^{T}) (0, 0, 1)^{T}$

Then, if the Z coordinate of a_2 is less than -h, it is set to -h. We define hdiv:

$\text{hdiv} = \| \vec{a_{2}} \|$

Then finally a_hemispherical:

$\vec{a_{hemispherical}} = \operatorname{Norm} \langle \vec{a_{2}} + (1 - \text{hdiv}) \vec{r_{w}} \mid (1, 0, 0)^{T} \rangle$

Intermediate modes between the spherical mode and the hemispherical mode can be constructed, indexed by a coefficient s ∈ [0,1], equal to 0 for the spherical mode and 1 for the hemispherical mode:

$\vec{a} = (1 - s) \vec{a}_{spherical} + s \, \vec{a}_{hemispherical}$

The complex frequency coefficient is:

$c_{a} = c_{w} \sqrt{2}$
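The divergence-aware computation of the direction of part A can be followed step by step in the Python sketch below. It is an illustration under stated assumptions, not a reference implementation. The function names are hypothetical, Norm⟨v | d⟩ is read as "unit-length v, or d when v is zero", the intensity vector I_{x,y,z} is taken as an input, and the projection p is taken as a_1 with its Z component removed, which is how the text describes it.

```python
import math


def _length(v):
    return math.sqrt(sum(x * x for x in v))


def norm_or_default(v, default):
    """Assumed reading of Norm<v|d>: unit-length v, or `default` if v is zero."""
    length = _length(v)
    if length == 0.0:
        return tuple(default)
    return tuple(x / length for x in v)


def part_a_direction(intensity, c_w, r_w, s=0.0):
    """Return (div, a_spherical, a_hemispherical, a) for one frequency bin."""
    front = (1.0, 0.0, 0.0)
    if c_w == 0:
        div = 1.0
    else:
        div = min(1.0, _length(intensity) / abs(c_w * math.sqrt(2)) ** 2)
    a0 = norm_or_default(intensity, (0.0, 0.0, 0.0))

    # Spherical mode: blend the analysed direction with the diffuse vector r_w.
    a_sph = norm_or_default(
        tuple(div * a0[i] + (1 - div) * r_w[i] for i in range(3)), front)

    # Hemispherical mode: low divergence pushes the direction downwards.
    a1 = tuple(div * x for x in a0)
    p = math.hypot(a1[0], a1[1])            # norm of the horizontal projection
    h = math.sqrt(max(0.0, 1.0 - p * p))    # guard against rounding below 0
    z2 = max(a1[2] - (1.0 - p) * (h - a1[2]), -h)  # Z is not allowed below -h
    a2 = (a1[0], a1[1], z2)
    hdiv = _length(a2)
    a_hemi = norm_or_default(
        tuple(a2[i] + (1 - hdiv) * r_w[i] for i in range(3)), front)

    # Intermediate modes: linear blend indexed by s in [0, 1].
    a = tuple((1 - s) * a_sph[i] + s * a_hemi[i] for i in range(3))
    return div, a_sph, a_hemi, a


# A bin with pressure but no intensity (fully divergent source), s = 1.
div0, sph0, hemi0, a_dir0 = part_a_direction((0.0, 0.0, 0.0), 1 + 0j, (0.0, 1.0, 0.0), s=1.0)
# A bin whose intensity matches the pressure energy (no divergence).
div1, sph1, hemi1, a_dir1 = part_a_direction((0.0, 2.0, 0.0), 1 + 0j, (1.0, 0.0, 0.0), s=0.0)
```

With zero intensity the hemispherical direction points straight down, consistent with the earlier statement that a source in the middle of the sphere is lowered to an elevation of -90°.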

In addition, it should be noted that there is no part B since it is fully taken into account by the divergence in part A.
Finally, concerning part C, let the complex coefficients c_x', c_y', and c_z' be the results of subtracting the complex coefficients corresponding to the signal extracted in part A (i.e. the coefficients obtained with the equation), in its direction without divergence, from the original coefficients c_x, c_y, and c_z:

$\begin{matrix} c_{x}' = c_{x} - c_{a} \, \text{div} \, a_{0_{x}} \\ c_{y}' = c_{y} - c_{a} \, \text{div} \, a_{0_{y}} \\ c_{z}' = c_{z} - c_{a} \, \text{div} \, a_{0_{z}} \end{matrix}$

where a_{0_x}, a_{0_y}, a_{0_z} are the Cartesian components of the vector a_0. This gives:

$\begin{matrix} \vec{c_{x}} = \vec{r_{x}} & \text{the direction-of-origin vector along the X axis} \\ c_{c, x} = c_{x}' & \text{the associated complex coefficient} \\ \vec{c_{y}} = \vec{r_{y}} & \text{the direction-of-origin vector along the Y axis} \\ c_{c, y} = c_{y}' & \text{the associated complex coefficient} \\ \vec{c_{z}} = \vec{r_{z}} & \text{the direction-of-origin vector along the Z axis} \\ c_{c, z} = c_{z}' & \text{the associated complex coefficient} \end{matrix}$

where r_x, r_y, and r_z are vectors depending on the frequency band, described below.
The separate parts A and C are finally grouped into a direction-of-origin vector v_total and a complex coefficient c_total:

$\begin{matrix} \vec{v}_{total} = \operatorname{Norm} \langle |c_{a}| \vec{a} + |c_{c, x}| \vec{c_{x}} + |c_{c, y}| \vec{c_{y}} + |c_{c, z}| \vec{c_{z}} \mid (1, 0, 0)^{T} \rangle \\ c_{total} = c_{a} e^{i \arg (c_{w})} + c_{c, x} e^{i (\arg (c_{x}) + ϕ_{x})} + c_{c, y} e^{i (\arg (c_{y}) + ϕ_{y})} + c_{c, z} e^{i (\arg (c_{z}) + ϕ_{z})} \end{matrix}$

where Φ_x, Φ_y and Φ_z are phases which will be defined later in this document.

Concerning the direction vectors for the diffuse parts, reference was made above to:

vectors r_w, r_x, r_y, r_z, and
phases Φ_x, Φ_y and Φ_z.

These vectors and phases are responsible for giving a diffuse character to the signal whose direction they set and whose phase they modify. They depend on the frequency band processed, that is to say there is one set of vectors and phases for each frequency "bin". In order to establish this diffuse character, they result from a random process, which makes it possible to smooth them spectrally, and also temporally if they are meant to be dynamic.
The process for obtaining these vectors is as follows:

For each frequency or frequency band, a set of unit vectors r_{0_w}, r_{0_x}, r_{0_y}, r_{0_z} and of phases Φ_{0_x}, Φ_{0_y} and Φ_{0_z} is generated from a pseudo-random process:
- the unit vectors are generated from an azimuth drawn from a uniform pseudo-random generator of reals in ]-π, π] and from an elevation given by the arcsine of a real drawn from a uniform pseudo-random generator in [-1, 1];
- the phases are obtained using a uniform pseudo-random generator of reals in ]-π, π].
The frequencies or frequency bands are swept from those corresponding to low frequencies to those corresponding to high frequencies, in order to smooth the vectors and phases spectrally using the following procedure:
- For the vectors r_w(b), where b is the index of the frequency or frequency band, $\begin{matrix} \vec{r_{w}} (b = 0) = \vec{r_{0_{w}}} (0) \\ \vec{r_{w}} (b > 0) = \operatorname{Norm} \langle \vec{r_{w}} (b - 1) + τ \, \vec{r_{0_{w}}} (b) \mid (1, 0, 0)^{T} \rangle \end{matrix}$
  where τ is the frequency equivalent of a characteristic time, allowing the user to choose the spectral smoothing of the diffuse character; a possible value for a sampling rate of 48 kHz, a window size of 2048 and a padding of 100% is 0.65.
- The vectors r_x, r_y, r_z follow the same procedure, starting from r_{0_x}, r_{0_y}, r_{0_z} respectively.
- For the phases Φ_x(b), where b is the index of the frequency or frequency band, $\begin{cases} ϕ_{x} (b = 0) = ϕ_{0_{x}} (0) \\ ϕ_{x} (b > 0) = \arg \left( e^{i ϕ_{x} (b - 1)} + τ e^{i ϕ_{0_{x}} (b)} \right) \end{cases}$
  where τ results from the same considerations as for the vectors.
- The phases Φ_y and Φ_z follow the same procedure, starting from Φ_{0_y} and Φ_{0_z} respectively.
If a dynamic process is desired, then when new vectors r_{0_w}, r_{0_x}, r_{0_y}, r_{0_z} and new phases Φ_{0_x}, Φ_{0_y} are generated, the old vector and the old phase are retained in a manner analogous to the processes stated above, using a characteristic time parameter.
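The generation and spectral smoothing steps above can be sketched as follows in Python. This is a minimal illustration: the function names are hypothetical, the standard-library generator stands in for whatever pseudo-random source is actually used, both exponentials in the phase recursion are read as unit phasors (with the imaginary unit in each exponent), and the axis convention x = cos(a)cos(e), y = sin(a)cos(e), z = sin(e) is assumed from the azimuth/elevation formulas used elsewhere in the document.

```python
import cmath
import math
import random


def norm_or_default(v, default):
    """Assumed reading of Norm<v|d>: unit-length v, or `default` if v is zero."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0.0:
        return tuple(default)
    return tuple(x / length for x in v)


def random_unit_vector(rng):
    """Uniform azimuth plus arcsine of a uniform real: uniform on the sphere."""
    azimuth = rng.uniform(-math.pi, math.pi)
    elevation = math.asin(rng.uniform(-1.0, 1.0))
    return (math.cos(azimuth) * math.cos(elevation),
            math.sin(azimuth) * math.cos(elevation),
            math.sin(elevation))


def smooth_vectors(raw, tau):
    """r(0) = r0(0); r(b) = Norm<r(b-1) + tau * r0(b) | (1,0,0)>."""
    out = [raw[0]]
    for b in range(1, len(raw)):
        prev = out[-1]
        mixed = tuple(prev[i] + tau * raw[b][i] for i in range(3))
        out.append(norm_or_default(mixed, (1.0, 0.0, 0.0)))
    return out


def smooth_phases(raw, tau):
    """phi(0) = phi0(0); phi(b) = arg(exp(i*phi(b-1)) + tau * exp(i*phi0(b)))."""
    out = [raw[0]]
    for b in range(1, len(raw)):
        out.append(cmath.phase(cmath.exp(1j * out[-1]) + tau * cmath.exp(1j * raw[b])))
    return out


rng = random.Random(0)           # fixed seed so the sketch is repeatable
n_bins = 16                      # illustrative number of frequency bins
raw_vectors = [random_unit_vector(rng) for _ in range(n_bins)]
raw_phases = [rng.uniform(-math.pi, math.pi) for _ in range(n_bins)]
r_w = smooth_vectors(raw_vectors, tau=0.65)
phi_x = smooth_phases(raw_phases, tau=0.65)
```

Each smoothed value depends on its lower-frequency neighbour, so a smaller τ gives a slower variation of the diffuse direction and phase across bins.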

The vectors of the lowest frequencies, for example those corresponding to frequencies below 150 Hz, are modified so as to point toward a preferred direction, for example and preferably (1,0,0)^T. To do this, the generation of the random vectors r_{0_w}, r_{0_x}, r_{0_y}, r_{0_z} is modified; it then consists in:

generating a random unit vector,
determining a vector (m n^b, 0, 0)^T, where m is a factor greater than 1, for example 8, and n is a factor less than 1, for example 0.9, so that the preponderance of this vector over the random unit vector decreases as the index b of the frequency bin increases,
summing these two vectors and normalizing the result.

The spectral smoothing used to obtain the vectors r_w, r_x, r_y, r_z is unchanged.
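The low-frequency bias can be illustrated with the short Python sketch below. The function names are hypothetical, the default values m = 8 and n = 0.9 are the examples quoted in the text, and the choice of which bins receive the bias is left to the caller.

```python
import math
import random


def random_unit_vector(rng):
    """Uniform azimuth plus arcsine of a uniform real: uniform on the sphere."""
    azimuth = rng.uniform(-math.pi, math.pi)
    elevation = math.asin(rng.uniform(-1.0, 1.0))
    return (math.cos(azimuth) * math.cos(elevation),
            math.sin(azimuth) * math.cos(elevation),
            math.sin(elevation))


def biased_unit_vector(b, rng, m=8.0, n=0.9):
    """Random unit vector pulled toward (1,0,0), less and less as bin index b grows."""
    u = random_unit_vector(rng)
    bias = m * n ** b                       # the X component of (m n^b, 0, 0)
    summed = (u[0] + bias, u[1], u[2])
    length = math.sqrt(sum(x * x for x in summed))
    if length == 0.0:                       # cannot occur while bias > 1
        return (1.0, 0.0, 0.0)
    return tuple(x / length for x in summed)


rng = random.Random(1)                      # fixed seed so the sketch is repeatable
low_bin = biased_unit_vector(0, rng)        # bias = 8: almost frontal
high_bin = biased_unit_vector(200, rng)     # bias is negligible: nearly unbiased
```

At b = 0 the bias vector has length 8, so the normalized sum always lies within a few degrees of the frontal direction, whatever random vector was drawn.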
As an alternative to the random vector generation procedure, the vectors r_w, r_x, r_y, r_z and the phases Φ_x, Φ_y and Φ_z can be determined by impulse response measurements: they can be obtained by analyzing the complex frequency coefficients resulting from multiple sound captures of the first-order spherical field, using signals emitted by loudspeakers, in phase all around the measurement point for r_w, and on either side and out of phase along the X, Y and Z axes for r_x, r_y and r_z and for Φ_x, Φ_y and Φ_z respectively.

For the frequency or frequencies (or frequency band or bands) corresponding to the DC component, the processing is distinct. Note that, because of the padding, the DC component corresponds to one or more frequencies or frequency bands:

if there is no padding, only the first frequency or frequency band undergoes the processing defined below;
if there is a padding of 100% (which therefore doubles the length of the signal before the time-to-frequency transform), the first two frequencies or frequency bands undergo the processing defined below (as does the "negative" frequency or frequency band that is conjugate-symmetric to the second frequency or frequency band);
if there is a padding of 300% (which therefore quadruples the length of the signal before the time-to-frequency transform), the first four frequencies or frequency bands undergo the processing defined below (as do the "negative" frequencies or frequency bands that are conjugate-symmetric to the second, third and fourth frequencies or frequency bands);
the other padding cases follow the same logic.
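The three cases listed above fit a simple rule, sketched here in Python. The rule for paddings other than 0%, 100% and 300% is an extrapolation of "the same logic" and is an assumption of this example, as are the function names. The second function applies the treatment stated for these bins in this section (frontal direction, coefficient c_w √2).

```python
import math


def dc_bin_count(padding_percent):
    """Number of leading bins treated as DC: 1, 2 and 4 for 0%, 100% and 300%."""
    return 1 + padding_percent // 100


def dc_bin_representation(c_w):
    """Spherical-domain representation of a DC bin: (v_total, c_total)."""
    return (1.0, 0.0, 0.0), c_w * math.sqrt(2)


counts = [dc_bin_count(p) for p in (0, 100, 300)]
v_dc, c_dc = dc_bin_representation(0.5)
```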

These frequencies or frequency bands are real-valued rather than complex, which does not make it possible to know the phase of the signal at the corresponding frequencies; direction analysis is therefore not possible. However, as the psychoacoustic literature shows, a human being cannot perceive a direction of origin for the low frequencies concerned (in this case, those below 80 to 100 Hz). It is thus possible to analyze only the pressure wave, hence the coefficient c_w, and to choose an arbitrary, frontal direction of origin: (1,0,0)^T. The representation in the spherical domain of the first frequency bin or bins is thus:

$\begin{matrix} \vec{v_{total}} = (1, 0, 0)^{T} \\ c_{total} = c_{w} \sqrt{2} \end{matrix}$

In order to ensure the correspondence between spherical coordinates and the inter-channel domain, the Scheiber sphere, which corresponds in the field of optics to the Stokes-Poincaré sphere, is used in what follows.
Scheiber's sphere symbolically represents the magnitude and phase relationships of two monochromatic waves, that is to say of the two complex frequency coefficients representing these waves. It consists of semicircles joining the opposite points L and R, each semicircle resulting from a rotation of the frontal arc (in bold) around the LR axis by an angle β and representing a phase difference value β ∈ ]-π, π]. The frontal semicircle represents a zero phase difference. Each point of a semicircle represents a distinct panorama value, with a value close to 1 for points close to L, and a value close to -1 for points close to R.
Figure 1 illustrates the principle of Scheiber's sphere. Scheiber's sphere (100) symbolically represents, by means of points on a sphere, the magnitude and phase relationships of two monochromatic waves, that is to say of the two complex frequency coefficients representing these waves, in the form of semicircles of equal phase difference indexed by the panorama. Peter Scheiber established in "Analyzing Phase-Amplitude Matrices" (JAES, 1971) that this symbolically constructed sphere can be matched with the sphere of the physical positions of the sound sources, allowing a spherical encoding of the sound sources. This correspondence is followed here, preferably by assigning the meridians of positive phase difference to the negative elevations, which provides some compatibility with classic matrixed surround signals; a simple change of sign gives the inverse convention, swapping positive and negative elevations. Thus the LR axis (101, 102) becomes the Y axis (103), the X axis (105) pointing in the direction of the semicircle (104) of zero phase difference.
Regarding the conversion from the inter-channel domain to spherical coordinates, the coordinate system of the Scheiber sphere is spherical with polar axis Y, and the X, Y, Z coordinates can be expressed as a function of the panorama and the phase difference:

$\begin{cases} x = \cos (\frac{π}{2} \text{panorama}) \cos (\text{phasediff}) \\ y = \sin (\frac{π}{2} \text{panorama}) \\ z = - \cos (\frac{π}{2} \text{panorama}) \sin (\text{phasediff}) \end{cases}$

The spherical coordinates in azimuth and elevation for such Cartesian coordinates are obtained as follows:

$\begin{cases} a = \operatorname{atan2} (y, x) = \operatorname{atan2} (\sin (\frac{π}{2} \text{panorama}), \cos (\frac{π}{2} \text{panorama}) \cos (\text{phasediff})) \\ e = \arcsin (z) = \arcsin (- \cos (\frac{π}{2} \text{panorama}) \sin (\text{phasediff})) \end{cases}$

Thus, given a pair of complex frequency coefficients, their relation establishing a panorama and a phase difference, it is possible to determine a direction of origin of a sound signal on a sphere. This conversion also makes it possible to determine the magnitude of the complex frequency coefficient of the monophonic signal, but the determination of its phase is not established by the above method and will be specified later.
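As an illustration, the conversion from (panorama, phase difference) to a direction on the sphere can be written directly from the formulas above. The Python function names are hypothetical; angles are in radians and the panorama lies in [-1, 1].

```python
import math


def interchannel_to_cartesian(panorama, phasediff):
    """Point of the Scheiber sphere for a given panorama and phase difference."""
    half = 0.5 * math.pi * panorama
    x = math.cos(half) * math.cos(phasediff)
    y = math.sin(half)
    z = -math.cos(half) * math.sin(phasediff)
    return x, y, z


def cartesian_to_spherical(x, y, z):
    """Azimuth and elevation (radians) of a unit vector."""
    azimuth = math.atan2(y, x)
    elevation = math.asin(max(-1.0, min(1.0, z)))  # clamp against rounding
    return azimuth, elevation


front = cartesian_to_spherical(*interchannel_to_cartesian(0.0, 0.0))
left = cartesian_to_spherical(*interchannel_to_cartesian(1.0, 0.0))
below = cartesian_to_spherical(*interchannel_to_cartesian(0.0, 0.5 * math.pi))
```

A centred, in-phase pair maps to the front, a fully left pair maps to azimuth π/2, and a centred pair with a phase difference of +π/2 maps to elevation -π/2, in line with the convention that positive phase differences correspond to negative elevations.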
It is possible to obtain the reciprocal of the conversion presented previously, that is to say the conversion from the spherical coordinates to the inter-channel domain:

$\begin{matrix} \text{panorama} = \frac{2}{π} \arcsin (y) \\ \text{phasediff} = - \operatorname{atan2} (z, x) \end{matrix}$

or, in spherical coordinates:

$\begin{cases} \text{panorama} = \frac{2}{π} \arcsin (\sin (a) \cos (e)) \\ \text{phasediff} = - \operatorname{atan2} (\sin (e), \cos (a) \cos (e)) \end{cases}$

Thus, given the complex coefficient of a monophonic signal and its direction of origin, it is possible to determine the magnitudes of two complex coefficients as well as their phase difference but, as seen above, the determination of their absolute phase is not established by the above method.
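The reciprocal conversion, and a check that it undoes the forward one, can be sketched as follows. The function names are hypothetical and angles are in radians. The forward mapping is repeated so that the block runs on its own.

```python
import math


def interchannel_to_cartesian(panorama, phasediff):
    """Forward mapping: (panorama, phasediff) to a point on the Scheiber sphere."""
    half = 0.5 * math.pi * panorama
    return (math.cos(half) * math.cos(phasediff),
            math.sin(half),
            -math.cos(half) * math.sin(phasediff))


def spherical_to_interchannel(azimuth, elevation):
    """Inverse mapping: (azimuth, elevation) to (panorama, phasediff)."""
    x = math.cos(azimuth) * math.cos(elevation)
    y = math.sin(azimuth) * math.cos(elevation)
    z = math.sin(elevation)
    panorama = (2.0 / math.pi) * math.asin(max(-1.0, min(1.0, y)))
    phasediff = -math.atan2(z, x)
    return panorama, phasediff


# Round trip: inter-channel -> Cartesian -> spherical -> inter-channel.
x, y, z = interchannel_to_cartesian(0.3, -1.0)
azimuth = math.atan2(y, x)
elevation = math.asin(z)
panorama_back, phasediff_back = spherical_to_interchannel(azimuth, elevation)
```

The round trip recovers the starting panorama and phase difference up to rounding error, for phase differences strictly inside ]-π, π[ and panoramas strictly inside ]-1, 1[.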

In accordance with Peter Scheiber's presentation in "Analyzing Phase-Amplitude Matrices" (JAES, 1971), the azimuths 90° and -90° correspond to the left (L) and right (R) loudspeakers, which are usually located at azimuths 30° and -30° respectively, on either side in front of the listener. Thus, to respect this spatial correspondence, which naturally provides compatibility with stereo and matrixed surround formats, a conversion to the spherical domain can be followed by a piecewise affine modification of the azimuth coordinates:

every azimuth a ∈ [-90°, 90°] is mapped affinely onto the interval [-30°, 30°],
every azimuth a ∈ [90°, 180°] is mapped affinely onto the interval [30°, 180°],
every azimuth a ∈ ]-180°, -90°] is mapped affinely onto the interval ]-180°, -30°].

Following the same principle, a conversion from the spherical domain can then naturally be preceded by the inverse conversion:

every azimuth a ∈ [-30°, 30°] is mapped affinely onto the interval [-90°, 90°],
every azimuth a ∈ [30°, 180°] is mapped affinely onto the interval [90°, 180°],
every azimuth a ∈ ]-180°, -30°] is mapped affinely onto the interval ]-180°, -90°].

Dans « Understanding the Scheiber Sphere » (MCS Review, Vol.4, No.3, Winter 1983), Sommerwerck illustre ce principe de correspondance entre espace physique et sphère de Schieber, le dit principe sera donc évident à toute personne au fait de l'état de l'art. Ces conversions d'azimut sont illustrées dans la figure 13, qui donne le principe les opérations (1301) et (1302) assurant les dites modifications affines.
Dans le cadre de la détermination de la correspondance de phase, l'objectif est de réaliser une correspondance entièrement déterminée entre une paire de coefficients fréquentiels complexes (domaine intercanal) d'une part et un coefficient fréquentiel complexe et des coordonnées sphériques d'autre part (domaine sphérique).
Comme on l'a vu plus haut, la correspondance établie précédemment ne permet pas de déterminer la phase des coefficients fréquentiels complexes, mais seulement la différence de phase dans la paire de coefficients fréquentiels complexes du domaine intercanal.
Il s'agit alors de déterminer la correspondance adéquate pour les phases, c'est-à-dire comment définir la phase d'un coefficient dans le domaine sphérique en fonction de la position dans le domaine intercanal (panorama, phasediff), ainsi que la phase absolue des dits coefficients (laquelle sera représentée par un valeur de phase intermédiaire, comme on le verra par la suite).
On établit une représentation d'une correspondance de phases sous forme de carte bidimensionnelle des phases dans le domaine intercanal, avec le panorama en abscisse sur le domaine de valeurs [-1,1], et de la différence de phase en ordonnée dans le domaine de valeurs ]-π, π]. On représente sur cette carte les paires de coefficients complexes du domaine intercanal obtenus depuis une conversion depuis un coefficient du domaine sphérique :

possédant une phase φ = 0, les autres phases l'entrée et de sortie étant obtenues à une rotation identique près,
possédant des coordonnées sphériques, qui sont bijectives avec un panorama et une différence de phase, choisies par la suite comme coordonnées de la carte.

Les paires de coefficients sont représentées localement, la carte représente donc un champ de paires de coefficients complexes. Le choix d'une correspondance de phase correspond à la rotation locale du plan complexe contenant la paire de coefficients fréquentiels complexes. On peut observer que la carte est une représentation bidimensionnelle de la sphère de Scheiber, à laquelle l'information de phase est ajoutée.
La figure 2 illustre un exemple de carte (200) de correspondance des phases entre le domaine sphérique et le domaine intercanal, représentant, pour différentes mesures de panorama en abscisse (201) et de différence de phase en ordonnée (202), un choix de correspondance de phase arbitraire qui est simplement la soustraction de la moitié de différence de phase pour le canal L et l'ajout de la moitié de la différence de phase pour le canal R. L'axe des abscisses (201) est inversé pour que les positions latérales gauche correspondent à un signal de puissance prépondérante dans le canal L et respectivement pour le côté droit et le canal R. L'axe des ordonnées (201) est également inversé pour l'hémisphère à élévation positive soit la moitié haute de la figure. Le champ de paires de coefficients complexes est représenté dans des sections de plans complexes autour de l'origine ; dans chaque repère, le coefficient fréquentiel complexe c_L est représenté par un vecteur dont le sommet est un cercle, le coefficient fréquentiel complexe c_R est représenté par un vecteur dont le sommet est une croix. Cette carte de correspondance de phase n'est pas utilisable car elle contrevient aux principes exposés par la suite.
Le critère choisi pour la conception d'une correspondance est celui de la continuité spatiale de la phase du signal, c'est-à-dire qu'un changement infime de position d'une source sonore doit aboutir à un changement infime de la phase. Le critère de continuité de phase impose des contraintes pour une correspondance de phases aux bords du domaine :

le haut et le bas du domaine sont, par le bouclage de la phase à 2π près, voisins. Ainsi les valeurs doivent être identiques en haut et en bas du domaine.
l'ensemble des valeurs à gauche du domaine (respectivement l'ensemble des valeurs à droite du domaine) correspond au voisinage du point L (respectivement du point R) de la sphère des localisations. Pour assurer la continuité autour de ces points sur la sphère, la phase du coefficient fréquentiel complexe possédant la plus grande magnitude doit être constante. La phase du coefficient fréquentiel complexe possédant la plus petite magnitude est alors imposée par la différence de phase ; elle effectue une rotation de 2π lorsqu'une courbe est parcourue autour des points L ou R de la sphère mais ce n'est pas problématique car la magnitude s'annule au point de discontinuité de phase, découlant sur une continuité du coefficient fréquentiel complexe.

In accordance with Peter Scheiber's presentation in "Analyzing Phase-Amplitude Matrices" (JAES, 1971), the 90° and -90° azimuths correspond to the left (L) and right (R) loudspeakers, which are usually located at the 30° and -30° azimuths respectively, on either side in front of the listener. Thus, to respect this spatial correspondence, which naturally provides compatibility with stereo and matrixed surround formats, a conversion to the spherical domain can be followed by a piecewise affine modification of the azimuth coordinates:

any azimuth a ∈ [-90°, 90°] is mapped affinely onto the interval [-30°, 30°],
any azimuth a ∈ [90°, 180°] is mapped affinely onto the interval [30°, 180°],
any azimuth a ∈ ]-180°, -90°] is mapped affinely onto the interval ]-180°, -30°].

Following the same principle, a conversion from the spherical domain can then naturally be preceded by the inverse conversion:

any azimuth a ∈ [-30°, 30°] is mapped affinely onto the interval [-90°, 90°],
any azimuth a ∈ [30°, 180°] is mapped affinely onto the interval [90°, 180°],
any azimuth a ∈ ]-180°, -30°] is mapped affinely onto the interval ]-180°, -90°].
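
As a rough sketch of the two piecewise affine azimuth modifications listed above, the following Python functions work in degrees. The function names are hypothetical and chosen only for illustration, and no claim is made as to which of them corresponds to which numbered operation of the figures.

```python
def _affine(value, src_lo, src_hi, dst_lo, dst_hi):
    """Affine map sending [src_lo, src_hi] onto [dst_lo, dst_hi]."""
    return dst_lo + (value - src_lo) * (dst_hi - dst_lo) / (src_hi - src_lo)


def _warp(azimuth, src_side, dst_side):
    """Three-segment affine map moving the lateral points +/-src_side to +/-dst_side."""
    if -src_side <= azimuth <= src_side:
        return _affine(azimuth, -src_side, src_side, -dst_side, dst_side)
    if azimuth > src_side:
        return _affine(azimuth, src_side, 180.0, dst_side, 180.0)
    return _affine(azimuth, -180.0, -src_side, -180.0, -dst_side)


def scheiber_to_physical_azimuth(azimuth):
    """Applied after a conversion to the spherical domain: +/-90 deg becomes +/-30 deg."""
    return _warp(azimuth, 90.0, 30.0)


def physical_to_scheiber_azimuth(azimuth):
    """Applied before a conversion from the spherical domain: +/-30 deg becomes +/-90 deg."""
    return _warp(azimuth, 30.0, 90.0)
```

For example, an azimuth of 135° on the Scheiber sphere maps to 105° in physical space, and the second function maps 105° back to 135°.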

In “Understanding the Scheiber Sphere” (MCS Review, Vol. 4, No. 3, Winter 1983), Sommerwerck illustrates this principle of correspondence between physical space and the Scheiber sphere; the principle will therefore be obvious to anyone familiar with the state of the art. These azimuth conversions are illustrated in figure 13, which shows the principle of operations (1301) and (1302) performing said affine modifications.
For the determination of the phase correspondence, the objective is to establish a fully determined correspondence between a pair of complex frequency coefficients (inter-channel domain) on the one hand, and a complex frequency coefficient together with spherical coordinates (spherical domain) on the other hand.
As seen above, the correspondence established previously does not determine the phase of the complex frequency coefficients, but only the phase difference within the pair of complex frequency coefficients of the inter-channel domain.
The task is then to determine a suitable correspondence for the phases, i.e. how to define the phase of a coefficient in the spherical domain as a function of the position in the inter-channel domain (panorama, phasediff), as well as the absolute phase of said coefficients (which will be represented by an intermediate phase value, as will be seen below).
A phase correspondence is represented as a two-dimensional map of the phases in the inter-channel domain, with the panorama on the abscissa over the range [-1, 1] and the phase difference on the ordinate over the range ]-π, π]. The map shows the pairs of complex coefficients of the inter-channel domain obtained by conversion from a coefficient of the spherical domain:

having a phase φ = 0, the other input and output phases being obtained up to an identical rotation,
having spherical coordinates, which are in bijection with a panorama and a phase difference, the latter being used as the coordinates of the map.

The pairs of coefficients are represented locally, so the map represents a field of pairs of complex coefficients. Choosing a phase correspondence amounts to choosing the local rotation of the complex plane containing the pair of complex frequency coefficients. It can be observed that the map is a two-dimensional representation of the Scheiber sphere, to which the phase information is added.
Figure 2 illustrates an example of a map (200) of phase correspondence between the spherical domain and the inter-channel domain. For different values of panorama on the abscissa (201) and of phase difference on the ordinate (202), it represents an arbitrary choice of phase correspondence, which simply subtracts half the phase difference for the L channel and adds half the phase difference for the R channel. The abscissa axis (201) is inverted so that the left lateral positions correspond to a signal whose power is predominant in the L channel, and likewise for the right side and the R channel. The ordinate axis (202) is also inverted, so that the hemisphere with positive elevation is the upper half of the figure. The field of pairs of complex coefficients is represented in sections of complex planes around the origin; in each frame, the complex frequency coefficient c_L is represented by a vector whose tip is a circle, and the complex frequency coefficient c_R is represented by a vector whose tip is a cross. This phase correspondence map cannot be used because it contravenes the principles explained below.
The criterion chosen for the design of a correspondence is the spatial continuity of the phase of the signal, that is to say that a tiny change in the position of a sound source must result in a tiny change of the phase. The phase continuity criterion imposes constraints on a phase correspondence at the edges of the domain:

the top and the bottom of the domain are neighbors, because the phase wraps around modulo 2π. The values must therefore be identical at the top and at the bottom of the domain.
the set of values on the left of the domain (respectively on the right of the domain) corresponds to the neighborhood of the point L (respectively of the point R) of the sphere of locations. To ensure continuity around these points on the sphere, the phase of the complex frequency coefficient having the greatest magnitude must be constant. The phase of the complex frequency coefficient having the smallest magnitude is then imposed by the phase difference; it rotates by 2π when a curve is traversed around the point L or R of the sphere, but this is not problematic because the magnitude vanishes at the point of phase discontinuity, so that the complex frequency coefficient remains continuous.
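
To make these two constraints concrete, the short Python sketch below evaluates the arbitrary choice of figure 2 (half the phase difference subtracted for L, half added for R) and measures how far it is from satisfying them. The helper names are hypothetical, and the reference phase is taken as 0 as in the map description.

```python
import cmath
import math


def naive_pair_phases(phasediff, phase_s=0.0):
    """Figure-2 style choice: arg(c_L) = phase_s - phasediff/2, arg(c_R) = phase_s + phasediff/2."""
    return phase_s - phasediff / 2, phase_s + phasediff / 2


def phase_gap(p, q):
    """Smallest absolute angular distance between two phases, in [0, pi]."""
    return abs(cmath.phase(cmath.exp(1j * (p - q))))


# Constraint 1: the top (phasediff = pi) and the bottom (phasediff -> -pi) must agree.
eps = 1e-9
top_left_phase, _ = naive_pair_phases(math.pi)
bottom_left_phase, _ = naive_pair_phases(-math.pi + eps)
top_bottom_gap = phase_gap(top_left_phase, bottom_left_phase)  # close to pi, not 0

# Constraint 2: where L dominates, arg(c_L) must not depend on phasediff.
lateral_gap = phase_gap(naive_pair_phases(0.0)[0], naive_pair_phases(math.pi / 2)[0])
```

With this choice, `top_bottom_gap` is close to π and `lateral_gap` equals π/4, so neither constraint is met.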

Figure 3 gives an example of a phase correspondence that can be constructed according to these constraints, ensuring phase continuity at the edges of the map (300). The phase value is constant along each lateral edge, and the values at the top and at the bottom of the domain are equal. As this solution is not unique, other correspondence maps are possible.
Let us establish whether it is possible to define a continuous phase map. The phase correspondence map can be "folded" onto the Scheiber sphere, which is also the sphere of spatial positions:

by gluing the top and bottom edges together along the semicircle opposite the frontal semicircle,
by pinching the left and right sides, each around its corresponding point L or R.

Figure 4 illustrates how the two-dimensional map (200) of figure 2 is folded onto the Scheiber sphere (100) of figure 1. The directions of the local frames are preserved by the folding; the local frames thus have a continuous direction on the sphere, except at points L and R, but this is not a problem because phase continuity is already ensured at these points. A correspondence map thus yields two fields of complex coefficients. These complex coefficients correspond to vectors tangent to the sphere, except at points L and R. Note that the map (200), once completely folded as illustrated in figure 5, exhibits a phase discontinuity on the rear arc (500) (drawn as a thin continuous line), a discontinuity which is resolved by the method illustrated in figure 3.
We then consider the field of tangent vectors generated by the left-channel coefficient c_L; the considerations are identical for the field of tangent vectors generated by the right-channel coefficient c_R. For the purposes of the proof, the vector field in the immediate neighborhood of L is multiplied by a real factor that makes it vanish at L, in order to ensure the continuity of the vector field; this does not modify the phases in any way, and therefore does not modify the phase correspondence.
According to the Poincaré-Hopf theorem, the sum of the indices of the isolated zeros of a vector field is equal to the Euler-Poincaré characteristic of the surface. Here, the sphere has an Euler-Poincaré characteristic of 2. However, by construction, the vector field derived from c_L vanishes at R with an index of 0 or 2, and vanishes at L, owing to the modification, with an index of 1, as can be seen in figure 6. The sum of the indices is therefore odd, which requires at least one other zero in the vector field, of suitable index, so that the sum of the indices equals the Euler-Poincaré characteristic. Since such a zero is not possible by construction of the Scheiber sphere, the magnitudes of the complex coefficients not being alterable, at least one additional discontinuity is imposed on the field of complex coefficients c_L. In conclusion, it is not possible to establish a phase correspondence that is continuous over the whole Scheiber sphere.
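
The index count behind this argument can be written out explicitly. Writing ind_L and ind_R for the indices of the zeros at L and R (a notation introduced here only for illustration), with ind_L = 1, ind_R ∈ {0, 2} and χ(S²) = 2 as stated above:

\operatorname{ind}_L + \operatorname{ind}_R \in \{1 + 0,\ 1 + 2\} = \{1, 3\}, \qquad χ(S^{2}) = 2 \notin \{1, 3\}

so the two prescribed zeros alone cannot satisfy the theorem, whichever index is taken at R.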
The method disclosed in the present invention solves this problem of phase continuity. It is based on the observation that in real cases the whole of the sphere is not fully and simultaneously traversed by signals. A phase correspondence discontinuity located at a point of the sphere traversed by signals (fixed signals or spatial trajectories of signals) will cause a phase discontinuity. A phase correspondence discontinuity located at a point of the sphere not traversed by signals (fixed signals or spatial trajectories of signals) does not cause a phase discontinuity. Without a priori knowledge of the signals, a discontinuity at a fixed point cannot guarantee that no signal will pass through this point. A discontinuity at a moving point may on the other hand "avoid" being traversed by a signal, if its location is a function of the signal. This moving point of discontinuity can be part of a dynamic phase correspondence that is continuous at any other point on the sphere. The principle of dynamic phase correspondence based on the avoidance of the spatial localization of the signal by the discontinuity is thus established. We will establish such a phase correspondence based on this principle, other phase correspondences being possible.
We define a phase correspondence function Φ (panorama, phasediff) which is used in both directions of conversion, from the inter-channel domain to the spherical domain as well as in the reverse direction; the panorama and the phase difference are obtained in the original domain or in the arrival domain of these two conversions as indicated previously. This function describes the phase difference between the spherical domain and the inter-channel domain:

Φ (panorama, phasediff) = ϕ_{s} - ϕ_{i}

where ϕ_s is the phase of the complex frequency coefficient of the spherical domain, and ϕ_i is the intermediate phase of the inter-channel domain:

ϕ_{i} = \arg (c_{L}) + \frac{1}{2} phasediff = \arg (c_{R}) - \frac{1}{2} phasediff

where c_L and c_R are the complex frequency coefficients of the inter-channel domain. The phase correspondence function is dynamic, i.e. it varies from one time window to the next. This function is built with a dynamic singularity, located at a point Ψ = (panorama_singularity, phasediff_singularity) of the inter-channel domain, defined by a panorama value panorama_singularity in [-1/2, 1/2] and a phase difference value phasediff_singularity in ]-π, -π/2]. This corresponds to a zone located behind the listener, slightly elevated. Other zones may be chosen arbitrarily. The singularity is initially located at the center of this zone, at a position Ψ₀ called the "anchor" hereafter. Other initial locations of the anchor within the zone may be chosen arbitrarily. The choice of panorama and phase difference corresponding to the singularity is denoted as a subscript of the phase correspondence function. One formulation of a phase correspondence function creating only one singularity is as follows:

If phasediff ≥ -π/2: $Φ_{Ψ} (panorama, phasediff) = - \frac{1}{2} panorama phasediff$
If phasediff < -π/2 and panorama ≤ -1/2: $Φ_{Ψ} (panorama, phasediff) = - \frac{1}{2} panorama phasediff + (panorama + 1) (2 phasediff + π)$
If phasediff < -π/2 and panorama ≥ 1/2: $Φ_{Ψ} (panorama, phasediff) = - \frac{1}{2} panorama phasediff + (panorama - 1) (2 phasediff + π)$
If phasediff < -π/2 and panorama ∈ ]-1/2, 1/2[, i.e. if the point lies inside the zone of the singularity, then its coordinates are projected from the point Ψ onto the edge of the zone, and the previous formulas are used with the coordinates of the projected point. If the point lies exactly on Ψ despite precautions, any point on the edge of the zone can be used.
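
The four cases above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the singularity is assumed to lie strictly inside the zone, and the projection is taken along the ray from Ψ through the point. One further assumption is flagged in the code: the text gives no closed form for a projection landing on the bottom edge (phasediff = -π) with |panorama| < 1/2, so the sketch evaluates the first formula at the equivalent top-edge point (phasediff = π) and subtracts π, which agrees modulo 2π with the second and third formulas where they meet that edge.

```python
import math

HALF_PI = math.pi / 2


def _phi_outside_zone(pan, pd):
    """The three closed-form branches, valid outside the open singularity zone."""
    base = -0.5 * pan * pd
    if pd >= -HALF_PI:
        return base
    if pan <= -0.5:
        return base + (pan + 1.0) * (2.0 * pd + math.pi)
    if pan >= 0.5:
        return base + (pan - 1.0) * (2.0 * pd + math.pi)
    raise ValueError("point lies inside the singularity zone")


def _project_to_zone_edge(pan, pd, singularity):
    """Project (pan, pd) from the singularity onto the edge of the zone."""
    sp, sd = singularity
    dp, dd = pan - sp, pd - sd
    if dp == 0.0 and dd == 0.0:
        return sp, -HALF_PI  # any edge point may be used in this case
    hits = []  # (ray parameter, axis that is hit, exact edge value)
    if dp > 0:
        hits.append(((0.5 - sp) / dp, "pan", 0.5))
    if dp < 0:
        hits.append(((-0.5 - sp) / dp, "pan", -0.5))
    if dd > 0:
        hits.append(((-HALF_PI - sd) / dd, "pd", -HALF_PI))
    if dd < 0:
        hits.append(((-math.pi - sd) / dd, "pd", -math.pi))
    t, axis, value = min(hits)
    proj_pan, proj_pd = sp + t * dp, sd + t * dd
    # Snap the hit coordinate exactly onto the edge to avoid rounding issues.
    if axis == "pan":
        proj_pan = value
    else:
        proj_pd = value
    return proj_pan, proj_pd


def phase_correspondence(pan, pd, singularity):
    """Phase correspondence for a singularity located inside the zone."""
    if pd >= -HALF_PI or abs(pan) >= 0.5:
        return _phi_outside_zone(pan, pd)
    edge_pan, edge_pd = _project_to_zone_edge(pan, pd, singularity)
    if edge_pd <= -math.pi and abs(edge_pan) < 0.5:
        # ASSUMPTION (not in the source): bottom edge handled through the
        # equivalent top-edge point, shifted by pi.
        return -0.5 * edge_pan * math.pi - math.pi
    return _phi_outside_zone(edge_pan, edge_pd)
```

With the singularity at (0, -3π/4), every point on the segment between Ψ and (1/2, -3π/4) receives the value computed at (1/2, -3π/4), namely 7π/16; the value depends only on the direction from Ψ, which is why Ψ itself is singular.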

In order to prevent the point of the singularity Ψ from being located, spatially speaking, near a signal, it is moved in the area in order to "escape" the location of the signal, processing window after processing window. To do this, preferably before the calculation of the phase correspondence, all the frequency bands are analyzed in order to determine their respective location of panorama and phase difference in the inter-channel domain, and for each one a modification vector is calculated, intended to move the point of the singularity. For example, in a preferred implementation of the present invention, the change resulting from a frequency band can be calculated as follows:

f_{Ψ} (panorama, phasediff) = \frac{1}{N} \min (\frac{1}{4}, \frac{1}{100 d^{2}})

as the norm of the modification vector, where N is the number of frequency bands and d is the distance between the point Ψ and the point of coordinates (panorama, phasediff), if d ≠ 0, and 0 otherwise, and

\vec{u_{Ψ}} (panorama, phasediff) = \frac{Ψ - {(panorama, phasediff)}^{T}}{d}

as the direction of the modification vector, if d ≠ 0, and 0 otherwise. Preferably, for better avoidance of trajectories, a slight rotation in the plane can be applied to u_Ψ (panorama, phasediff), for example of π/16 for a sampling frequency of 48000 Hz, sliding windows of 2048 samples and a padding of 100% (the rotation angle being adapted according to these factors). This is useful, for example, when a source has a linear trajectory passing through the point Ψ₀, so that the singularity goes around the source on one side. The modification vector is then:

\vec{F_{Ψ}} (panorama, phasediff) = f_{Ψ} (panorama, phasediff) \vec{u_{Ψ}} (panorama, phasediff) \qquad (51)

The modification vectors resulting from all the frequency bands are then summed, and to this sum is added a vector returning the singularity toward the anchor Ψ₀, formulated for example as follows:

{\vec{F}}_{Ψ_{0}} = \frac{1}{10} (Ψ_{0} - Ψ)

where the factor $\frac{1}{10}$ is adapted to the sampling frequency, the window size and the padding rate, as for the rotation. The resulting modification vector Σ F is applied to the singularity as a simple addition of a vector to a point:

Ψ \leftarrow Ψ + Σ \vec{F}
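
One processing window of this singularity update can be sketched in Python as below. The function and parameter names are hypothetical. The sketch assumes a Euclidean distance in the (panorama, phasediff) plane and a counterclockwise rotation, neither of which is specified in the text, and it does not constrain the result to stay inside the zone.

```python
import math


def update_singularity(singularity, anchor, band_points,
                       rotation=math.pi / 16, return_gain=0.1):
    """Return the new singularity position after one processing window.

    band_points holds one (panorama, phasediff) pair per frequency band.
    """
    sp, sd = singularity
    n = len(band_points)
    cos_r, sin_r = math.cos(rotation), math.sin(rotation)
    total_p = total_d = 0.0
    for pan, pd in band_points:
        # Vector from the band location toward the singularity.
        dp, dd = sp - pan, sd - pd
        d = math.hypot(dp, dd)  # ASSUMPTION: Euclidean distance
        if d == 0.0:
            continue  # norm and direction are both 0 when d = 0
        norm = min(0.25, 1.0 / (100.0 * d * d)) / n
        up, ud = dp / d, dd / d
        # ASSUMPTION: the slight rotation is applied counterclockwise.
        rp = cos_r * up - sin_r * ud
        rd = sin_r * up + cos_r * ud
        total_p += norm * rp
        total_d += norm * rd
    # Vector returning the singularity toward the anchor.
    total_p += return_gain * (anchor[0] - sp)
    total_d += return_gain * (anchor[1] - sd)
    return sp + total_p, sd + total_d
```

With no band present and the singularity already on the anchor, the position is unchanged. A single band close to the singularity pushes it away by at most 1/4, the cap set by the minimum in the norm.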

Thus, at rest, we obtain the phase correspondence map (700) of figure 7, for which the singularity is fixed at the coordinates

Ψ_{0} = (0, - \frac{3 π}{4})

Figure 8 represents the phase correspondence map of figure 7 once folded onto the Scheiber sphere.

Figure 9 represents the phase correspondence map when Ψ has the panorama and phase difference coordinates (-1/4, -3π/4). The phase correspondence described by this map is continuous everywhere except at Ψ. Figure 10 represents the phase correspondence map of figure 9 once folded onto the Scheiber sphere.

As described earlier in this document, a signal expressed in the spherical domain is characterized, for any frequency or frequency band, by an azimuth and an elevation, a magnitude and a phase.

Implementations that do not fall within the scope of the invention, and which are presented for information only, include a means of transcoding from the spherical domain to a given audio format chosen by the user. A few techniques are presented by way of example, but their adaptation to other audio formats will be trivial for a person familiar with the state of the art of sound rendering or sound signal encoding.

Transcoding to first-order spherical harmonics (First-Order Ambisonics, FOA) can be performed in the frequency domain. For each complex coefficient c corresponding to a frequency band, knowing the corresponding azimuth a and elevation e, four complex coefficients w, x, y, z corresponding to the same frequency band can be generated using the following formulas:

\begin{cases} w = \frac{c}{\sqrt{2}} \\ x = c \cdot \cos (a) \cos (e) \\ y = c \cdot \sin (a) \cos (e) \\ z = c \cdot \sin (e) \end{cases}

The coefficients w, x, y, z obtained for each frequency band are assembled to generate the frequency representations W, X, Y and Z of four channels. Applying the frequency-to-time transform (the inverse of the one used for the time-to-frequency transform), any windowing, and then the overlap of the successive time windows obtained yields four channels that are a time-domain representation of the three-dimensional audio signal in first-order spatial harmonics. A similar approach can be used for transcoding to a higher-order (HOA) format of order 2 or above, by completing equation (54) with the encoding formulas for the order considered.
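
The first-order encoding formulas above can be sketched for a single frequency band as follows. The function name `encode_foa` and the choice of radians for the angles are assumptions of this example; the frequency-to-time transform and the overlap of windows are left out.

```python
import math


def encode_foa(c, azimuth, elevation):
    """Return (w, x, y, z) for one complex coefficient c; angles in radians."""
    w = c / math.sqrt(2.0)
    x = c * math.cos(azimuth) * math.cos(elevation)
    y = c * math.sin(azimuth) * math.cos(elevation)
    z = c * math.sin(elevation)
    return w, x, y, z


# Example: a source at azimuth 90 degrees on the horizontal plane
# ends up entirely in the y component (besides w).
c = 1.0 + 1.0j
w, x, y, z = encode_foa(c, math.pi / 2.0, 0.0)
```

Since x, y and z are the components of a unit direction vector scaled by c, the sum of their squared magnitudes equals the squared magnitude of c for any direction.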

Transcoding to a 5.0 surround format comprising five channels (left, center, right, rear left and rear right) can be performed as follows.

For each frequency or frequency band, the coefficients c_L, c_C, c_R, c_Ls, c_Rs, corresponding respectively to the loudspeakers usually named L, C, R, Ls, Rs, are calculated as follows from the azimuth and elevation coordinates a and e of the direction-of-origin vector and from the complex frequency coefficient c_S. The gains g_L, g_C, g_R, g_Ls, g_Rs are defined as the gains to be applied to the coefficient c_S to obtain the complex frequency coefficients of the output coefficient tables. Two further gains g_B and g_T correspond to virtual loudspeakers that allow the signals at the bottom ("Bottom"), i.e. at negative elevation, and at the top ("Top"), i.e. at positive elevation, to be redistributed to the other loudspeakers.

g_{B} = \max (\sin (- e), 0)

g_{T} = \max (\sin (e), 0)

If a ∈ [0°, 30°], $\begin{cases} g_{C} = \cos (e) \operatorname{pan1} (a, 0°, 30°) \\ g_{L} = \cos (e) \operatorname{pan2} (a, 0°, 30°) \\ g_{R} = g_{Ls} = g_{Rs} = 0 \end{cases}$
If a ∈ [30°, 105°], $\begin{cases} g_{L} = \cos (e) \operatorname{pan1} (a, 30°, 105°) \\ g_{Ls} = \cos (e) \operatorname{pan2} (a, 30°, 105°) \\ g_{Rs} = g_{C} = g_{R} = 0 \end{cases}$
If a + k × 360° ∈ [105°, 360° - 105°], $k \in ℤ$, $\begin{cases} g_{Ls} = \cos (e) \operatorname{pan1} (a, 105°, 360° - 105°) \\ g_{Rs} = \cos (e) \operatorname{pan2} (a, 105°, 360° - 105°) \\ g_{L} = g_{C} = g_{R} = 0 \end{cases}$
If a ∈ [-105°, -30°], $\begin{cases} g_{Rs} = \cos (e) \operatorname{pan1} (a, - 105°, - 30°) \\ g_{R} = \cos (e) \operatorname{pan2} (a, - 105°, - 30°) \\ g_{L} = g_{C} = g_{Ls} = 0 \end{cases}$
If a ∈ [-30°, 0°], $\begin{cases} g_{R} = \cos (e) \operatorname{pan1} (a, - 30°, 0°) \\ g_{C} = \cos (e) \operatorname{pan2} (a, - 30°, 0°) \\ g_{L} = g_{Ls} = g_{Rs} = 0 \end{cases}$
where $\begin{cases} \operatorname{pan1} (a, a_{1}, a_{2}) = \cos (\frac{π}{2} \frac{a - a_{1}}{a_{2} - a_{1}}) \\ \operatorname{pan2} (a, a_{1}, a_{2}) = \sin (\frac{π}{2} \frac{a - a_{1}}{a_{2} - a_{1}}) \end{cases}$
then the gains g_B and g_T are redistributed among the other coefficients: $\begin{cases} g_{L} = \sqrt{g_{L}^{2} + \frac{1}{6} {(g_{T} + g_{B})}^{2}} \\ g_{C} = \sqrt{g_{C}^{2} + \frac{1}{6} {(g_{T} + g_{B})}^{2}} \\ g_{R} = \sqrt{g_{R}^{2} + \frac{1}{6} {(g_{T} + g_{B})}^{2}} \\ g_{Ls} = \sqrt{g_{Ls}^{2} + \frac{1}{4} {(g_{T} + g_{B})}^{2}} \\ g_{Rs} = \sqrt{g_{Rs}^{2} + \frac{1}{4} {(g_{T} + g_{B})}^{2}} \end{cases}$
finally, the frequency coefficients of the different channels are obtained by: $\begin{cases} c_{L} = g_{L} c_{S} \\ c_{C} = g_{C} c_{S} \\ c_{R} = g_{R} c_{S} \\ c_{Ls} = g_{Ls} c_{S} \\ c_{Rs} = g_{Rs} c_{S} \end{cases}$
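
The 5.0 gain computation above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the name `surround_5_0_gains` is hypothetical, the azimuth is wrapped to [-180°, 180°) before the sector is selected, the sector [-30°, 0°] is taken to pan between R and C by symmetry with the sector [0°, 30°], and every gain under the square root is taken to be squared so that the total energy stays equal to 1.

```python
import math


def pan1(a, a1, a2):
    return math.cos(0.5 * math.pi * (a - a1) / (a2 - a1))


def pan2(a, a1, a2):
    return math.sin(0.5 * math.pi * (a - a1) / (a2 - a1))


def surround_5_0_gains(azimuth_deg, elevation_deg):
    """Return the gains of L, C, R, Ls, Rs for one direction (degrees)."""
    e = math.radians(elevation_deg)
    ce = math.cos(e)
    # Assumption: wrap the azimuth to [-180, 180) before choosing a sector.
    a = (azimuth_deg + 180.0) % 360.0 - 180.0
    g = {"L": 0.0, "C": 0.0, "R": 0.0, "Ls": 0.0, "Rs": 0.0}
    if 0.0 <= a <= 30.0:
        g["C"], g["L"] = ce * pan1(a, 0.0, 30.0), ce * pan2(a, 0.0, 30.0)
    elif 30.0 < a <= 105.0:
        g["L"], g["Ls"] = ce * pan1(a, 30.0, 105.0), ce * pan2(a, 30.0, 105.0)
    elif -30.0 <= a < 0.0:
        # Assumption: R and C are the active pair here, mirroring [0, 30].
        g["R"], g["C"] = ce * pan1(a, -30.0, 0.0), ce * pan2(a, -30.0, 0.0)
    elif -105.0 <= a < -30.0:
        g["Rs"], g["R"] = ce * pan1(a, -105.0, -30.0), ce * pan2(a, -105.0, -30.0)
    else:
        # Rear sector: express the azimuth in [105, 255] degrees.
        rear = a % 360.0
        g["Ls"], g["Rs"] = ce * pan1(rear, 105.0, 255.0), ce * pan2(rear, 105.0, 255.0)
    # Virtual bottom and top loudspeakers, redistributed to the real ones.
    g_b = max(math.sin(-e), 0.0)
    g_t = max(math.sin(e), 0.0)
    vertical = (g_t + g_b) ** 2
    shares = {"L": 1 / 6, "C": 1 / 6, "R": 1 / 6, "Ls": 1 / 4, "Rs": 1 / 4}
    return {k: math.sqrt(g[k] ** 2 + shares[k] * vertical) for k in g}


# Example: a source straight above is spread over the five loudspeakers.
overhead = surround_5_0_gains(0.0, 90.0)
```

Each output coefficient is then the product of the returned gain and c_S. Under the assumptions above, the squared gains sum to 1 for every direction, because the three shares of 1/6 and the two shares of 1/4 add up to 1.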

Transcoding to a 5.0 L-C-R-Ls-Rs multichannel audio format to which a zenithal channel T ("top" or "voice of god" channel) is added can also be performed in the frequency domain. When the gains of the virtual channels are redistributed, only the "bottom" gain g_B is then redistributed:

\begin{cases} g_{L} = \sqrt{g_{L}^{2} + \frac{1}{6} g_{B}^{2}} \\ g_{C} = \sqrt{g_{C}^{2} + \frac{1}{6} g_{B}^{2}} \\ g_{R} = \sqrt{g_{R}^{2} + \frac{1}{6} g_{B}^{2}} \\ g_{Ls} = \sqrt{g_{Ls}^{2} + \frac{1}{4} g_{B}^{2}} \\ g_{Rs} = \sqrt{g_{Rs}^{2} + \frac{1}{4} g_{B}^{2}} \end{cases}

and the frequency coefficients of the different channels are obtained by:

\begin{cases} c_{L} = g_{L} c_{S} \\ c_{C} = g_{C} c_{S} \\ c_{R} = g_{R} c_{S} \\ c_{Ls} = g_{Ls} c_{S} \\ c_{Rs} = g_{Rs} c_{S} \\ c_{T} = g_{T} c_{S} \end{cases}
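
The variant with a zenithal channel can be sketched as follows. The function name `gains_with_top_channel` is hypothetical, and the sketch assumes that the horizontal gains passed in are those obtained from the sector formulas before any redistribution, and that every gain under the square root is squared.

```python
import math


def gains_with_top_channel(planar_gains, elevation_deg):
    """Redistribute only the bottom gain and keep g_T as a real channel T.

    planar_gains maps "L", "C", "R", "Ls", "Rs" to the gains obtained from
    the sector formulas, before redistribution (assumed input).
    """
    e = math.radians(elevation_deg)
    g_b = max(math.sin(-e), 0.0)
    g_t = max(math.sin(e), 0.0)
    shares = {"L": 1 / 6, "C": 1 / 6, "R": 1 / 6, "Ls": 1 / 4, "Rs": 1 / 4}
    out = {
        name: math.sqrt(planar_gains[name] ** 2 + share * g_b ** 2)
        for name, share in shares.items()
    }
    out["T"] = g_t  # the top gain feeds its own channel
    return out


# Example: source at azimuth 30 degrees, elevation 45 degrees.
planar = {"L": math.cos(math.radians(45.0)), "C": 0.0, "R": 0.0, "Ls": 0.0, "Rs": 0.0}
six = gains_with_top_channel(planar, 45.0)
```

In this example the energy is split equally between L and T, and the other four loudspeakers stay silent because the elevation is positive.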

The six complex coefficients thus obtained for each frequency band are assembled to generate the frequency representations of six channels L, C, R, Ls, Rs and T. Applying the frequency-to-time transform (the inverse of the one used for the time-to-frequency transform), any windowing, and then the overlap of the successive time windows obtained yields six channels in the time domain.
Moreover, for a format with an arbitrary arrangement of channels in space, a three-dimensional VBAP algorithm can advantageously be applied to obtain the desired channels, ensuring if necessary a good triangulation of the sphere by adding virtual channels that are redistributed to the final channels.

A signal expressed in the spherical domain can also be transcoded to a binaural format. The transcoding can, for example, be based on the following elements:

a database including, for a plurality of frequencies, for a plurality of directions in space, and for each ear, the expression as complex coefficients (magnitude and phase) of the Head-Related Transfer Function (HRTF) filters in the frequency domain;
a projection of said database onto the spherical domain to obtain, for a plurality of directions and for each ear, a complex coefficient for each frequency among a plurality of frequencies;
a spatial interpolation of said complex coefficients, for any frequency among a plurality of frequencies, so as to obtain a plurality of complex spatial functions continuously defined on the unit sphere, for each frequency among a plurality of frequencies. This interpolation can be bilinear or spline-based, or can be carried out by means of spherical harmonic functions.

A plurality of functions on the unit sphere is thus obtained, for any frequency, describing the frequency behavior of said HRTF database for any point of the spherical space. Since, for any frequency among a plurality of frequencies, said spherical signal is described by a direction of origin (azimuth, elevation) and a complex coefficient (magnitude, phase), said interpolation-projection then makes it possible to binauralize the spherical signal, as follows:

for each frequency and for each ear, given the direction of origin of said spherical signal, the value of said complex spatial function previously established by projection and interpolation is evaluated, resulting in a complex HRTF coefficient;
for each frequency and for each ear, said complex HRTF coefficient is then multiplied by the complex coefficient corresponding to the spherical signal, resulting in a left-ear frequency signal and a right-ear frequency signal;
a frequency-to-time transform is then performed, giving a two-channel binaural signal.
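
The per-frequency multiplication in the binauralization steps above can be sketched as follows. The names `binauralize` and `toy_hrtf` are hypothetical. The function `toy_hrtf` is a crude stand-in for the interpolated HRTF functions and does not come from any measured database; it only assumes that positive azimuths point to the left. The final frequency-to-time transform is omitted.

```python
import cmath
import math


def toy_hrtf(ear, band, azimuth, elevation):
    """Placeholder for an interpolated HRTF function (not measured data)."""
    side = math.sin(azimuth) * math.cos(elevation)  # +1 means fully left
    sign = 1.0 if ear == "left" else -1.0
    gain = math.sqrt(max(0.0, 0.5 * (1.0 + sign * side)))
    # Arbitrary phase lag that grows with the band index on the far ear.
    return gain * cmath.exp(-1j * 0.1 * band * (1.0 - sign * side))


def binauralize(spherical_bins, hrtf):
    """spherical_bins: one (c_s, azimuth, elevation) tuple per frequency band.

    hrtf(ear, band, azimuth, elevation) returns the complex HRTF coefficient.
    Returns the left-ear and right-ear frequency coefficients.
    """
    left, right = [], []
    for band, (c_s, azimuth, elevation) in enumerate(spherical_bins):
        left.append(hrtf("left", band, azimuth, elevation) * c_s)
        right.append(hrtf("right", band, azimuth, elevation) * c_s)
    return left, right


# Example: one band whose source sits at azimuth 90 degrees (fully left).
left, right = binauralize([(1.0 + 0.0j, math.pi / 2.0, 0.0)], toy_hrtf)
```

Passing the HRTF as a callable keeps the sketch independent of how the database is projected and interpolated (bilinear, spline or spherical harmonics).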

In addition, spherical harmonic formats are often used as intermediate formats before decoding to loudspeaker constellations or decoding by binauralization. Multichannel formats obtained via VBAP rendering can also be binauralized. Other types of transcoding can be obtained with usual spatialization techniques such as pair-wise panning with or without horizontal layers, SPCAP, VBIP, or even WFS. Finally, the orientation of the spherical field can be modified by altering the direction vectors with simple geometric operations (rotations around an axis, etc.). Using this capability, the rotation of the listener's head, if captured by a head-tracking device, can be acoustically compensated just before a rendering technique is applied. This method gives a perceptual gain in the precision with which sound sources are localized in space; this is a known phenomenon in psychoacoustics: small head movements allow the human auditory system to localize sound sources better.
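
A rotation of the direction vectors around the vertical axis, as used for head-tracking compensation, can be sketched as follows. The function name `compensate_head_yaw` is hypothetical, angles are in radians, and the sketch assumes that positive azimuths and positive yaw both point to the listener's left, so that a head turn is compensated by rotating the sources the opposite way.

```python
import math


def compensate_head_yaw(azimuth, elevation, head_yaw):
    """Rotate one direction of origin to compensate a head rotation (yaw)."""
    # Direction vector on the unit sphere.
    x = math.cos(azimuth) * math.cos(elevation)
    y = math.sin(azimuth) * math.cos(elevation)
    z = math.sin(elevation)
    # Rotate around the vertical axis by the opposite of the head yaw.
    angle = -head_yaw
    xr = x * math.cos(angle) - y * math.sin(angle)
    yr = x * math.sin(angle) + y * math.cos(angle)
    # Back to azimuth and elevation; z is unchanged by this rotation.
    new_azimuth = math.atan2(yr, xr)
    new_elevation = math.asin(max(-1.0, min(1.0, z)))
    return new_azimuth, new_elevation


# Example: the listener turns 30 degrees toward a source located at 30 degrees.
az, el = compensate_head_yaw(math.radians(30.0), math.radians(10.0), math.radians(30.0))
```

After compensation the source of the example lies straight ahead at an unchanged elevation. Rotations around other axes follow the same pattern with a different rotation matrix.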

Applying the conversion techniques between the two domains presented above, a spherical signal can be encoded as follows. The spherical signal consists of temporally successive tables, each corresponding to a representation of the signal over a time window, these windows overlapping. Each table consists of pairs (complex frequency coefficient, coordinates on the sphere in azimuth and elevation), each pair corresponding to a frequency band. The original spherical signal is obtained from spatial analysis techniques such as the one presented, which transforms an FOA signal into a spherical signal. Encoding yields temporally successive pairs of tables of complex frequency coefficients, each table corresponding to a channel, for example left (L) and right (R).
Figure 11 shows the diagram of the encoding process, which converts from the spherical domain to the inter-channel domain. The sequence of the encoding technique for each successively processed time window is illustrated as follows:

A first step (1100) consists in determining, for each element of the input table, the panorama and the phase difference corresponding to each spherical coordinate, as indicated in equations 43. Optionally, the azimuth can be widened from the interval [-30°, 30°] to the interval [-90°, 90°] according to the method indicated previously, before the panorama and the phase difference are determined, this widening corresponding to operation (1302) of figure 13.
A second step (1101) consists in determining the new position of the singularity in the inter-channel domain, by analyzing the panorama and phase difference coordinates determined in the first step.
A third step (1102) consists in determining the phase correspondence Φ_Ψ (panorama, phasediff) for each complex coefficient of the input table.
A fourth step (1103) consists in constructing a table of pairs of complex coefficients c_L and c_R from the complex frequency coefficients of the spherical domain c_S, the calculated panorama and phase difference values, and the phase correspondence function: $\begin{cases} |c_{L}| = |c_{S}| \sqrt{\frac{1}{2} (1 + panorama)} \\ \arg (c_{L}) = \arg (c_{S}) - Φ_{Ψ} (panorama, phasediff) - \frac{1}{2} phasediff \\ |c_{R}| = |c_{S}| \sqrt{\frac{1}{2} (1 - panorama)} \\ \arg (c_{R}) = \arg (c_{S}) - Φ_{Ψ} (panorama, phasediff) + \frac{1}{2} phasediff \end{cases}$
An alternative technique for determining the magnitude of the complex frequency coefficients is presented in equation 5.

The representation in the form of temporally successive pairs of tables of complex frequency coefficients is generally not kept as is; applying the appropriate frequency-to-time inverse transform (the inverse of the forward transform used upstream), such as the frequency-to-time part of the short-term Fourier transform, yields a pair of channels in the form of time samples.
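
The fourth encoding step (1103) can be sketched for a single frequency band as follows. The function name `encode_stereo_bin` is hypothetical; the sketch assumes that the panorama lies in [-1, 1] and that the phase correspondence value Φ_Ψ(panorama, phasediff) has already been computed in the third step and is passed in as `phi`.

```python
import cmath
import math


def encode_stereo_bin(c_s, panorama, phasediff, phi):
    """Build the pair (c_L, c_R) from one spherical-domain coefficient c_S.

    phi is the phase correspondence value for this band (assumed given).
    """
    magnitude = abs(c_s)
    phase = cmath.phase(c_s)
    mag_l = magnitude * math.sqrt(0.5 * (1.0 + panorama))
    mag_r = magnitude * math.sqrt(0.5 * (1.0 - panorama))
    c_l = cmath.rect(mag_l, phase - phi - 0.5 * phasediff)
    c_r = cmath.rect(mag_r, phase - phi + 0.5 * phasediff)
    return c_l, c_r


# Example: magnitude 2, phase 0.3 rad, panorama 0.5, phase difference 0.4 rad.
c_l, c_r = encode_stereo_bin(cmath.rect(2.0, 0.3), 0.5, 0.4, 0.1)
```

The two magnitudes share the energy of c_S according to the panorama, and the phases of the two channels differ by exactly phasediff.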

Applying the domain conversion techniques presented above, a stereo signal encoded with the technique presented above can be decoded as follows. Since the input signal is a pair of channels, generally in the time domain, a transformation such as the short-term Fourier transform is used to obtain temporally successive pairs of tables of complex frequency coefficients, each coefficient of each table corresponding to a frequency band. In each pair of tables corresponding to a time window, the coefficients corresponding to the same frequency band are paired. Decoding yields, for each time window, a spherical representation of the signal in the form of a table of pairs (complex frequency coefficient, coordinates on the sphere in azimuth and elevation). The sequence of the decoding technique for each successively processed time window, illustrated in figure 12, is as follows:

A first step (1200) consists in determining the panorama and the phase difference for each pair, as indicated in equations 2 or 4, and 6.
A second step (1201) consists in determining the new position of the singularity Ψ in the inter-channel domain, by analyzing the panorama and phase difference coordinates determined in the first step.
A third step (1202) consists in determining the phase correspondence Φ_Ψ (panorama, phasediff) for each complex coefficient of the input table, from the results of the first and second steps.
A fourth step (1203) consists in determining, from the results of the first (1200) and third (1202) steps, the complex frequency coefficient c_S in the spherical domain: $\begin{cases} |c_{S}| = \sqrt{{|c_{L}|}^{2} + {|c_{R}|}^{2}} \\ \arg (c_{S}) = ϕ_{i} + Φ_{Ψ} (panorama, phasediff) \end{cases}$
where ϕ_i is the intermediate phase, obtained for example with: $ϕ_{i} = \arg (c_{L}) + \frac{1}{2} phasediff$.
A fifth step (1204) consists in determining, from the results of the first step (1200), the azimuth and elevation coordinates as indicated in equations 41. Optionally, the azimuth can be narrowed from the interval [-90°, 90°] to the interval [-30°, 30°] according to the method indicated previously, this step corresponding to operation (1301) of figure 13.

A table of pairs (complex frequency coefficient, coordinates on the sphere in azimuth and elevation) is obtained, each pair corresponding to a frequency band. This spherical representation of the signal is generally not kept as is, but is transcoded according to the playback requirements: as seen above, it can be transcoded (or "rendered") to a given audio format, for example binaural, VBAP, planar or three-dimensional multichannel, first-order Ambisonics (FOA) or higher-order Ambisonics (HOA), or by any other known spatialization method, provided that the method can use spherical coordinates to drive the desired position of a sound source.
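
The fourth decoding step (1203) can be sketched for a single frequency band as follows. The names `analyse_pair` and `decode_stereo_bin` are hypothetical. The panorama and phase difference are recovered here by inverting the encoding formulas of step (1103), which is an assumption standing in for equations 2, 4 and 6; the phase correspondence is passed in as a callable `phase_map(panorama, phasediff)`, and the fifth step (azimuth and elevation) is not covered.

```python
import cmath
import math


def analyse_pair(c_l, c_r):
    """Panorama and phase difference of one (c_L, c_R) pair (assumed form)."""
    power_l, power_r = abs(c_l) ** 2, abs(c_r) ** 2
    total = power_l + power_r
    panorama = (power_l - power_r) / total if total > 0.0 else 0.0
    phasediff = cmath.phase(c_r * c_l.conjugate())
    return panorama, phasediff


def decode_stereo_bin(c_l, c_r, phase_map):
    """Return (c_S, panorama, phasediff) for one frequency band."""
    panorama, phasediff = analyse_pair(c_l, c_r)
    phi = phase_map(panorama, phasediff)
    magnitude = math.hypot(abs(c_l), abs(c_r))
    # Intermediate phase, then the phase correspondence is added back.
    phase_i = cmath.phase(c_l) + 0.5 * phasediff
    return cmath.rect(magnitude, phase_i + phi), panorama, phasediff


# Example: a pair whose spherical coefficient has magnitude 2 and phase 0.3 rad,
# decoded with a constant phase correspondence of 0.1 rad (illustrative only).
c_s, pan, diff = decode_stereo_bin(
    cmath.rect(math.sqrt(3.0), 0.0), cmath.rect(1.0, 0.4), lambda p, d: 0.1
)
```

A real phase correspondence depends on the position of the singularity Ψ; the constant used in the example only serves to check that decoding undoes the encoding formulas.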

Since much stereo content is encoded in surround form with a matrixing technique, and since the coordinates of the matrixing points are generally placed at coherent positions in the inter-channel domain, decoding such surround content works, with some errors in the absolute positioning of the sources. More generally, stereo content not intended to be played on anything other than a pair of loudspeakers benefits from being processed by the decoding method, resulting in a 2D or 3D "upmix" of the content. The term "upmix" refers to processing a signal so that it can be played on devices with more loudspeakers than there are original channels, each loudspeaker receiving its own signal, or on their virtualized equivalent over headphones.

INDUSTRIAL APPLICATIONS OF THE INVENTION

The stereophonic signal resulting from the encoding of a three-dimensional audio field can be suitably reproduced without decoding on a standard stereophonic listening device, for example headphones, a sound bar or a stereo system. Said signal can moreover be processed by commercially available multichannel decoding systems for matrixed surround content without audible artifacts appearing.

The decoder is versatile: it can decode content specially encoded for it, decode in a relatively satisfactory manner pre-existing content in matrixed surround format (for example cinematographic sound content), and upmix stereo content. It is therefore immediately useful, embedded in software or hardware (for example in the form of a chip), in any system dedicated to sound reproduction: a television, a stereo high-fidelity system, a living-room or home-cinema amplifier, or an in-vehicle audio system, equipped with a multichannel playback system, or even any system playing back for headphone listening via binaural rendering, possibly with tracking of head orientation ("head-tracking"), such as a computer, a mobile phone or a digital audio player. A listening device with "crosstalk" cancellation also allows binaural listening without headphones from at least two loudspeakers, and allows surround or 3D listening of sound content that has been decoded and rendered binaurally. The decoding algorithm presented makes it possible to rotate the sound space by acting on the direction-of-origin vectors of the spherical field obtained, the direction of origin being the one that would be perceived by a listener located at the center of said sphere; this capability makes it possible to implement tracking of the listener's head orientation (or "head-tracking") in the processing chain as close as possible to the rendering, which is important for reducing the latency between head movements and their compensation in the audible signal.
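The rotation of the sound space mentioned above can be sketched, under simplifying assumptions, as an operation on the directions only. The sketch below compensates head yaw alone (rotation around the vertical axis); the function name `compensate_head_yaw` and the (coefficient, azimuth, elevation) tuple layout are hypothetical, and a complete head-tracking implementation would also handle pitch and roll, for example with a rotation matrix or a quaternion.

```python
import math

def compensate_head_yaw(field, head_yaw):
    """Rotate every direction of origin by -head_yaw around the vertical axis.

    field: list of (complex coefficient, azimuth, elevation), angles in radians.
    head_yaw: listener's head yaw in radians, same sign convention as azimuth
              (an assumption of this sketch).
    """
    rotated = []
    for coeff, az, el in field:
        # Subtract the head yaw, then wrap the azimuth to [-pi, pi).
        new_az = (az - head_yaw + math.pi) % (2 * math.pi) - math.pi
        # The complex coefficient is untouched: only the direction changes.
        rotated.append((coeff, new_az, el))
    return rotated
```

Because only the direction vectors are modified, the rotation can be applied just before rendering, which is consistent with the low-latency argument made in the text.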

A pair of headphones can itself embed the decoding system, optionally adding head-tracking and binaural rendering functions.

The infrastructure required for processing and distributing content is already in place for the application of the present invention, for example stereo audio connectors, stereophonic digital codecs such as MPEG-2 layer 3 or AAC, stereo FM or DAB radio broadcasting techniques, or the stereophonic television broadcasting standards for terrestrial, cable or IP delivery.

Encoding in the format presented in this invention is carried out at the end of multichannel or 3D "mastering" (finalization), from an FOA field via a conversion to a spherical field such as one of those presented in this document, or via another technique. The encoding can also be performed on each source added to the sound mix, independently of the others, using spatialization or panning tools embedding the described method, which makes it possible to perform 3D mixing on digital audio workstations supporting only 2 channels. This encoded format can also be stored or archived on any medium having only two channels, or used for the purpose of size compression.

The decoding algorithm yields a spherical field, which can be altered by discarding the spherical coordinates and keeping only the complex frequency coefficients, in order to obtain a mono "downmix". This method can be implemented in software, or in hardware so as to embed it in an electronic chip, for example in monophonic FM listening devices.
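A minimal sketch of this mono downmix, assuming the decoded spherical field is held as a list of (complex coefficient, azimuth, elevation) tuples, one per frequency band; the function name `mono_downmix_spectrum` and this data layout are hypothetical and chosen for illustration.

```python
def mono_downmix_spectrum(field):
    """Return the mono spectrum of a decoded spherical field.

    field: list of (complex coefficient, azimuth, elevation), one per band.
    The spatial coordinates are discarded; only the coefficients are kept.
    """
    return [coeff for coeff, _azimuth, _elevation in field]
```

The resulting list of complex coefficients would then go through the inverse time/frequency transform and windowing used elsewhere in the processing chain to produce the monophonic time-domain signal; that step is not shown here.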

Furthermore, the content of video games and of virtual reality or augmented reality systems can be stored in stereo-encoded form, then decoded in order to be spatialized again by transcoding, for example in the form of an FOA field. The availability of the direction-of-origin vectors also makes it possible to manipulate the sound field using geometric operations, allowing for example zooms, or distortions that follow the sound environment, such as projecting the sphere of directions onto the interior of a room in a video game and then deforming the direction-of-origin vectors by parallax. A video game or other virtual reality or augmented reality system whose internal sound format is a surround or 3D audio format can also encode its content before playback; consequently, if the listener's final listening device implements the decoding method, it provides three-dimensional spatialization, and if the device is a pair of headphones implementing head-tracking (tracking of the listener's head orientation), binaural personalization and head-tracking allow dynamic immersive listening.

The implementations of the present invention can be realized in the form of one or more computer programs, said computer programs running on at least one computer or on at least one embedded signal processing circuit, in a local, remote or distributed manner (for example as part of a "cloud" type infrastructure).

Claims

Method for converting an ambisonic signal of the first order into a spherical field which is constituted by a plurality of monochromatic progressive planar waves and which is based on a frequential representation of the ambisonic signal obtained after temporal windowing and time/frequency transformation, including, for each frequency among a plurality of frequencies:
• the separation of the ambisonic signal into three components comprising:
∘ a first complex vector component A which corresponds to the mean acoustic intensity vector of the ambisonic signal,

∘ a second complex vector component B whose complex coefficient is equal to subtracting the pressure wave generated by the component A from the pressure component of the ambisonic signal and whose direction is modified in accordance with a random process,

∘ a third complex vector component C which corresponds to subtracting the pressure gradient generated by the component A from the pressure gradient of the ambisonic signal and whose phases are modified in accordance with a random process and each of the three axial components of which takes, as a direction, a vector which is derived from a random process;

• the grouping of the first, second and third vector components A, B and C as a total vector and a total complex coefficient which describes the spherical field, characterised in that:
∘ the total complex coefficient is equal to the sum of the complex coefficients corresponding to the three components,

∘ the total vector is equal to the sum of the directions of the three components, weighted by the amplitude of the complex coefficients corresponding to the three components.
Method for converting an ambisonic signal of the first order into a spherical field according to claim 1, characterised in that the second component B has attributed to it an arbitrary and predefined origin direction with negative elevations.
Method for converting an ambisonic signal of the first order into a spherical field which is constituted by a plurality of monochromatic progressive planar waves and which is based on a frequential representation of the ambisonic signal obtained after temporal windowing and time/frequency transformation, including, for each frequency among a plurality of frequencies:
• the separation of the ambisonic signal into:
∘ a first complex vector component A which is determined by the complex coefficient and the direction thereof, the first complex vector component being obtained by:

• a first step (a1) of determining the divergence value, calculated as a ratio between the mean acoustic intensity and the square of the amplitude of the pressure component of the ambisonic signal, the ratio being saturated at a maximum value of 1,

• a second step (a2) of determining a complex coefficient which corresponds to the pressure component of the ambisonic signal and providing the complex coefficient of the first vector component A,

• a third step (a3) of determining the direction of the first vector component A, which direction is calculated by weighting, as a function of the divergence value, between the direction of the mean acoustic intensity vector and the direction of a vector generated by a random process in order to obtain the direction of the first vector component A; and
∘ a second complex vector component C, determined by the complex coefficient and the direction thereof, the second complex vector component being obtained by:

• a first step (c1) of determining three complex axial components of the pressure gradient of the ambisonic signal,

• a second step (c2) of determining three complex axial components of the pressure gradient which would be generated by a monochromatic progressive planar wave, whose complex coefficient would be that of the pressure of the ambisonic signal multiplied by the divergence value and whose direction would be that of the mean acoustic intensity vector,

• a third step (c3) of subtracting the result of the second step from the result of the first step, and

• a fourth step (c4) of changing the phases and direction vectors of the three axial components of the result of the third step, as a function of a random process, in order to obtain the complex coefficients and the directions of the second vector component C;

• the grouping of the first and second vector components A and C as a total vector and a total complex coefficient describing the spherical field, characterised in that:
∘ the total complex coefficient is equal to the sum of the complex coefficients corresponding to the first and second components, and

∘ the total vector is equal to the sum of the directions of the two components, weighted by the amplitude of the complex coefficients corresponding to the two components.
Method according to any one of claims 1 to 3, further comprising the encoding of the spherical field in order to obtain a stereophonic signal by means of a first step of determining panorama and phase difference values from spherical spatial coordinates which describe the spherical field for any frequency among a plurality of frequencies, a second step of determining the position of the discontinuity ψ in the inter-channel range, carried out by analysing the panorama and phase difference coordinates obtained by the first step and by moving the discontinuity relative to its previous position so that the discontinuity is not positioned on a useful signal, a third step of determining the phase correspondence Φ_ψ(panorama,phasediff) corresponding to each pair of complex coefficients derived from the spherical field, and a fourth step of determining a table of pairs of complex coefficients c_L and c_R for any frequency among a plurality of frequencies, from complex coefficients which are derived from the spherical field c_s, the phase correspondence values derived from the third step and the phase difference values, the complex coefficients c_L and c_R being combined in order to obtain the encoded stereophonic signal.
Data-processing programme comprising instructions which, when the data-processing programme is carried out by at least one computer or at least one processing circuit, cause it to implement the method according to any one of claims 1 to 4.
Computer or processing circuit configured to carry out the instructions of the data-processing programme according to claim 5.
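By way of non-limiting illustration, the grouping step recited in claims 1 and 3 (total complex coefficient equal to the sum of the component coefficients, total vector equal to the sum of the component directions weighted by the amplitude of their coefficients) can be sketched as follows for one frequency. The function name `group_components` and the representation of each component as a (complex coefficient, Cartesian direction) pair are assumptions of this sketch; the total vector is left unnormalized, since the claims do not state a normalization.

```python
def group_components(components):
    """Group vector components into a total coefficient and a total vector.

    components: list of (complex coefficient, (x, y, z) direction), for
                example the components A, B and C of one frequency.
    Returns (total complex coefficient, total direction vector as a tuple).
    """
    # Total complex coefficient: plain sum of the component coefficients.
    total_coeff = sum((coeff for coeff, _ in components), 0j)
    # Total vector: directions weighted by the amplitude of each coefficient.
    total_vec = [0.0, 0.0, 0.0]
    for coeff, direction in components:
        weight = abs(coeff)
        for axis in range(3):
            total_vec[axis] += weight * direction[axis]
    return total_coeff, tuple(total_vec)
```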