EP3595337A1 - Audio apparatus and method of audio processing
- Publication number
- EP3595337A1 (application EP18182376.6A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- audio
- binaural transfer
- transfer functions
- binaural
- reverberation
- Prior art date
- Legal status: Withdrawn
Classifications
- H04S7/30: Control circuits for electronic adaptation of the sound field
- H04S7/302: Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303: Tracking of listener position or orientation
- H04S7/304: Tracking of listener position or orientation, for headphones
- H04S7/305: Electronic adaptation of stereophonic audio signals to reverberation of the listening space
- H04S7/306: Electronic adaptation to reverberation of the listening space, for headphones
- H04R27/00: Public address systems
- H04R2227/003: Digital PA systems using, e.g. LAN or internet
- H04R2227/007: Electronic adaptation of audio signals to reverberation of the listening space for PA
- H04S1/00: Two-channel systems
- H04S2400/11: Positioning of individual sound objects, e.g. moving airplane, within a sound field
- H04S2400/15: Aspects of sound capture and related signal processing for recording or reproduction
- H04S2420/01: Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTFs] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
- H04S2420/03: Application of parametric coding in stereophonic audio systems
- H04S2420/11: Application of ambisonics in stereophonic audio systems
Description
- the invention relates to an audio apparatus and a method of audio processing, and in particular, but not exclusively, to the use of these to support an Augmented/Virtual Reality conference application.
- VR Virtual Reality
- AR Augmented Reality
- a number of standards are also under development by various standardization bodies, covering the different aspects of VR/AR systems including e.g. streaming, broadcasting, and rendering.
- VR applications tend to provide user experiences corresponding to the user being in a different world/environment/scene, whereas AR (including Mixed Reality, MR) applications tend to provide user experiences corresponding to the user being in the current environment but with additional virtual objects or information added.
- VR applications tend to provide a fully immersive, synthetically generated world/scene, whereas AR applications tend to provide a partially synthetic world/scene which is overlaid on the real scene in which the user is physically present.
- the terms are often used interchangeably and have a high degree of overlap.
- the term Virtual Reality/ VR will be used to denote both Virtual Reality and Augmented Reality.
- an increasingly popular service is the provision of images and audio in such a way that a user is able to actively and dynamically interact with the system to change parameters of the rendering, so that the rendering adapts to movement and changes in the user's position and orientation.
- a very appealing feature in many applications is the ability to change the effective viewing position and viewing direction of the viewer, such as for example allowing the viewer to move and "look around" in the scene being presented.
- Such a feature can specifically allow a virtual reality experience to be provided to a user. This may allow the user to (relatively) freely move about in a virtual environment and dynamically change his position and where he is looking.
- virtual reality applications are based on a three-dimensional model of the scene with the model being dynamically evaluated to provide the specific requested view. This approach is well known from e.g. game applications, such as in the category of first person shooters, for computers and consoles.
- the image being presented is a three-dimensional image. Indeed, in order to optimize immersion of the viewer, it is typically preferred for the user to experience the presented scene as a three-dimensional scene. Further, a virtual reality experience should preferably allow a user to select his/her own position, camera viewpoint, and moment in time relative to a virtual world.
- virtual reality applications are inherently limited in being based on a predetermined model of the scene, and typically on an artificial model of a virtual world.
- a virtual reality experience may be provided based on real-world capture. In many cases such an approach tends to be based on a virtual model of the real-world being built from the real-world captures. The virtual reality experience is then generated by evaluating this model.
- virtual reality glasses have entered the market which allow viewers to experience captured 360° (panoramic) or 180° video. These 360° videos are often pre-captured using camera rigs where individual images are stitched together into a single spherical mapping. Common stereo formats for 180° or 360° video are top/bottom and left/right. Similar to non-panoramic stereo video, the left-eye and right-eye pictures are compressed, e.g. as part of a single H.264 video stream.
- the audio preferably provides a spatial audio experience where audio sources are perceived to arrive from positions that correspond to the positions of the corresponding objects in the visual scene.
- the audio and video scenes are preferably perceived to be consistent and with both providing a full spatial experience.
- headphone reproduction enables a highly immersive, personalized experience to the user.
- the rendering can be made responsive to the user's head movements, which highly increases the sense of immersion.
- MPEG attempts to standardize a bit stream and decoder for realistic, immersive AR/VR experiences with six degrees of freedom.
- Social VR is an important feature and allows users to interact in a shared environment (gaming, conference calls, online shopping, etc.).
- the concept of social VR also facilitates making a VR experience a more social activity for users physically in the same location but where e.g. a head mounted display or other VR headset provides a perceptional isolation from the physical surroundings.
- Audio rendering in VR applications is a complex problem. It is typically desired to provide an audio experience which is as natural as possible but this is particularly difficult in a dynamic VR application with a high degree of freedom.
- the desired perceived audio does not merely depend on the virtual sound sources or their positions; in order to get a realistic experience, it is also desired that the perceived audio reflects the audio characteristics of the virtual environment. For example, the audio should sound different when the audio source and virtual user are in a tiled bathroom than when in a sitting room with furniture, carpet, curtains, etc. attenuating reflections.
- it is desired for the audio experience to be adapted to the individual user. Indeed, as different people have different physiognomic characteristics, the same audio source will tend to be perceived differently. For example, the pinnae are unique and the resulting impact on incoming sound will vary for different people. Further, the effect depends on the directional incidence of the incoming soundwave, and accordingly localization of sources is subject dependent, with the specific features used to localize sources being learned by each person from early childhood. Therefore, any mismatch between a person's actual pinnae and those of a reference used to generate virtual audio will result in potentially degraded audio perception, and specifically degraded spatial perception. Providing optimized audio perception is accordingly a challenging problem.
- the current trend is towards providing a high degree of flexibility and choice to the end user and VR client.
- the audio description data provided by a VR server must be sufficiently generic and rendering agnostic to allow for different rendering algorithms at the VR client.
- an improved approach to audio processing, in particular for a virtual/augmented/mixed reality experience/application, would be advantageous.
- an approach that allows improved operation, increased flexibility, reduced complexity, facilitated implementation, an improved audio experience, a more consistent perception of an audio and visual scene, improved customization, improved personalization, an improved virtual reality experience, and/or improved performance and/or operation would be advantageous.
- the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above-mentioned disadvantages, singly or in any combination.
- an audio apparatus comprising: a first receiver for receiving a set of input audio signals from a first source; a second receiver for receiving acoustic environment data from a second source; a third receiver for receiving binaural transfer function data from a third source, the binaural transfer function data being indicative of a set of binaural transfer functions; a renderer for generating output audio signals from the set of input audio signals; the renderer comprising a reverberator arranged to generate a reverberation component of the output audio signals by applying reverberation processing to the set of input audio signals in response to the acoustic environment data; and an adapter for adapting a first property of the reverberation processing in response to a second property of the set of binaural transfer functions.
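- purely as an illustrative sketch (all names below are hypothetical and not taken from the claims), the structure described above might be organized as follows:

```python
from typing import Callable


class AudioApparatus:
    """Hypothetical skeleton: three receivers feed a renderer whose
    reverberation processing is adapted to the binaural transfer functions."""

    def __init__(self, binaural_processor: Callable, reverberator: Callable,
                 adapter: Callable):
        self.binaural_processor = binaural_processor  # direct/early rendering
        self.reverberator = reverberator              # reverberation processing
        self.adapter = adapter                        # adapts reverb to HRTF set

    def receive_audio(self, signals):                 # first receiver
        self.signals = signals

    def receive_environment(self, env_data):          # second receiver
        self.env_data = env_data

    def receive_hrtfs(self, hrtf_set):                # third receiver
        self.hrtf_set = hrtf_set
        # adapt a first property of the reverberation processing in response
        # to a second property of the set of binaural transfer functions
        self.adapter(self.reverberator, hrtf_set)

    def render(self):
        direct = self.binaural_processor(self.signals, self.hrtf_set)
        reverb = self.reverberator(self.signals, self.env_data)
        return direct + reverb                        # combiner
```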
- the invention may provide an improved user experience in many embodiments and may specifically provide improved audio rendering in many applications, such as specifically virtual/ augmented/ mixed reality applications.
- the approach may provide improved audio perception and may in many embodiments provide an improved and more natural perception of an audio scene.
- the approach may in many embodiments provide improved performance while maintaining low complexity and resource usage.
- the approach may provide improved flexibility and may for example allow, facilitate, or improve audio processing where the acoustic environment data and the binaural transfer function data are received from different sources and e.g. generated by different and independent sources.
- audio data describing the set of input audio signals and the acoustic environment data may be provided by one source, such as specifically a remote server, whereas the binaural transfer function data may be provided by a different source, such as a local source.
- the approach may for example allow a central server to provide data representing a virtual experience without having to consider specific aspects of the individual user, while allowing a local client comprising the audio apparatus to efficiently adapt and customize the rendered reverberation to the individual user, e.g. such that the reverberation reflects anthropometric properties of the individual user, or at least anthropometric properties corresponding to those reflected by the binaural transfer function data.
- the approach may for example provide efficient support for systems in which a central server does not specifically know the audio processing that will be performed at the individual client. It may support a flexible choice of rendering algorithm at the client.
- the input audio signals may be encoded audio data and may correspond to e.g. audio channel signals, audio components, audio objects etc.
- the first receiver may receive position data for the input audio signals and the renderer may generate the output audio signals in response to the position data.
- the output audio signals may comprise audio components corresponding to the input audio signals with positions determined in response to the position data.
- An audio component of the output audio signals corresponding to a given input audio signal may be rendered with positional cues corresponding to a position indicated for the input audio signal by the position data.
- the output audio signals may specifically be a binaural stereo signal, or may e.g. be a set of signals derived from a binaural stereo signal.
- the acoustic environment data may comprise room acoustic characteristics.
- the acoustic environment data may for example describe acoustic or physical properties of the environment (e.g. reverberation time or dimensions) or may e.g. directly describe rendering parameters indicative of how the reverberation processing should be performed (e.g. filter coefficients).
- the adapter is arranged to adapt the reverberation processing such that a characteristic of the reverberation component matches a corresponding characteristic of the set of binaural transfer functions.
- This may provide a rendering of the input audio signals which is particularly advantageous in many embodiments, and which may provide an improved perceived audio quality. Often an improved spatial perception and/or a more naturally sounding sound stage/ audio scene can be achieved.
- the adapter may be arranged to adapt the reverberation processing such that characteristics of the reverberation component match an anthropometrically dependent characteristic of the set of binaural transfer functions.
- the second property is a frequency response characteristic for the set of binaural transfer functions, and the adapter is arranged to adapt a frequency response of the reverberation processing in response to the frequency response characteristic.
- the approach may adapt the coloration of the output audio signals to match the coloration provided by the binaural transfer functions.
- the reverberator comprises an adaptable reverberator, adaptive to the acoustic environment data, and a filter having a frequency response dependent on the frequency response characteristic.
- the adaptive reverberator and the filter may be cascade coupled.
- the order of the adaptive reverberator and the filter may be different in different embodiments and other functional blocks may be coupled in-between.
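- as a minimal sketch of such a cascade (assuming, for illustration, that the frequency response characteristic is obtained by power-averaging the magnitude responses over the whole HRTF set, and imposed by a linear-phase FIR filter cascaded with the reverberator):

```python
import numpy as np


def diffuse_field_fir(irs: np.ndarray, n_fft: int = 1024) -> np.ndarray:
    """irs: (n_positions, taps) impulse responses for one ear of the HRTF set.
    Returns FIR taps whose magnitude matches the set's average response."""
    spectra = np.fft.rfft(irs, n_fft, axis=-1)
    avg_mag = np.sqrt(np.mean(np.abs(spectra) ** 2, axis=0))  # power average
    fir = np.fft.irfft(avg_mag, n_fft)                    # zero-phase prototype
    return np.roll(fir, n_fft // 2) * np.hanning(n_fft)   # causal, windowed

# The reverberation path can then be coloured to match the HRTF set, e.g.:
# reverb_left = np.convolve(reverb_left, diffuse_field_fir(left_irs))
```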
- the reverberator comprises a synthetic reverberator and the adapter is arranged to adapt a processing parameter of the synthetic reverberator in response to the frequency response characteristic.
- This may provide a particularly efficient and high-performance implementation in many embodiments.
- the second property comprises an inter-ear correlation property for the set of binaural transfer functions and the first property comprises an inter-ear correlation property for the reverberation processing.
- An inter-ear correlation property may be indicative of a correlation between a left ear signal and a right ear signal.
- the inter-ear correlation property may for example include a coherence property or correlation coefficient for two signals corresponding to the two ears of the user.
- the inter-ear correlation property may be a correlation measure between the two signals of a binaural output audio signal generated by the reverberation processing.
- the inter-ear correlation property may be a correlation measure for right ear and left ear transfer functions of the binaural transfer functions.
- the inter-ear correlation property for the set of binaural transfer functions and the inter-ear correlation property for the reverberation processing are frequency dependent.
- the reverberator is arranged to generate a pair of partially correlated signals from a pair of substantially uncorrelated signals generated from the set of input audio signals, and to generate the output audio signals from the partially correlated signals; the adapter is arranged to adapt a correlation between the output audio signals in response to the inter-ear correlation property for the set of binaural transfer functions.
- the correlation between the substantially uncorrelated signals is very low.
- the correlation coefficient may be no more than e.g. 0.05 or 0.10.
- the reverberator comprises a decorrelator for generating a substantially decorrelated signal from a first signal derived from the set of input audio signals; and the reverberator is arranged to generate a pair of partially correlated signals from the decorrelated signal and the first signal and to generate the output audio signals from the partially correlated signals, the adapter being arranged to adapt a correlation between the output audio signals in response to the inter-ear correlation property for the set of binaural transfer functions.
- the substantially decorrelated signal may have a very low correlation relative to the first signal.
- the correlation coefficient may be no more than e.g. 0.05, 0.08, or 0.10.
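- one common mixing rule that achieves this (an assumption for illustration, not quoted from the text): given substantially uncorrelated, equal-power signals s1 and s2 and a target inter-ear correlation coefficient rho, the pair below has a correlation coefficient of approximately rho:

```python
import numpy as np


def mix_to_target_correlation(s1: np.ndarray, s2: np.ndarray, rho: float):
    """s1, s2: substantially uncorrelated, equal-power reverberation signals.
    Returns (left, right) with correlation coefficient approximately rho."""
    left = s1
    right = rho * s1 + np.sqrt(1.0 - rho ** 2) * s2  # preserves signal power
    return left, right

# Since the inter-ear correlation of an HRTF set is frequency dependent, the
# same mixing may be applied per band, e.g. in an STFT or filter-bank domain.
```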
- the adapter is arranged to determine the second property of the set of binaural transfer functions in response to a combination of properties for a plurality of binaural transfer functions of the set of binaural transfer functions for different positions.
- the second receiver is arranged to receive dynamically changing acoustic environment data and the third receiver is arranged to receive static binaural transfer function data.
- the second source is different from the third source.
- the renderer further comprises a binaural processor for generating an early reflection component of the output audio signals in response to at least one of the set of binaural transfer functions.
- a method of audio processing comprising: receiving a set of input audio signals from a first source; receiving acoustic environment data from a second source; receiving binaural transfer function data from a third source, the binaural transfer function data being indicative of a set of binaural transfer functions; generating output audio signals from the set of input audio signals, the generating comprising generating a reverberation component of the output audio signals by applying reverberation processing to the set of input audio signals in response to the acoustic environment data; and adapting a first property of the reverberation processing in response to a second property of the set of binaural transfer functions.
- Virtual (including augmented) experiences allowing a user to move around in a virtual or augmented world are becoming increasingly popular and services are being developed to satisfy such demands.
- visual and audio data may dynamically be generated to reflect a user's (or viewer's) current pose.
- the terms placement and pose are used as common terms for position and/or direction/orientation.
- the combination of the position and direction/ orientation of e.g. an object, a camera, a head, or a view may be referred to as a pose or placement.
- a placement or pose indication may comprise up to six values/ components/ degrees of freedom with each value/ component typically describing an individual property of the position/ location or the orientation/ direction of the corresponding object.
- a placement or pose may be represented by fewer components, for example if one or more components is considered fixed or irrelevant (e.g. if all objects are considered to be at the same height and have a horizontal orientation, four components may provide a full representation of the pose of an object).
- the term pose is used to refer to a position and/or orientation which may be represented by one to six values (corresponding to the maximum possible degrees of freedom).
- a pose having the maximum degrees of freedom, i.e. three degrees of freedom for each of the position and the orientation, results in a total of six degrees of freedom.
- a pose may thus be represented by a set or vector of six values representing the six degrees of freedom and thus a pose vector may provide a three-dimensional position and/or a three-dimensional direction indication.
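- purely for illustration, such a six-value pose vector might be represented as:

```python
import numpy as np

# Six degrees of freedom: three position values plus three orientation values.
pose = np.array([1.0, 0.0, 1.7,     # x, y, z position in metres
                 0.25, 0.0, 0.0])   # yaw, pitch, roll orientation in radians
```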
- the pose may be represented by fewer values.
- a system or entity based on providing the maximum degree of freedom for the viewer is typically referred to as having 6 Degrees of Freedom (6DoF).
- 6DoF 6 Degrees of Freedom
- 3DoF 3 Degrees of Freedom
- the virtual reality application generates a three-dimensional output in the form of separate view images for the left and the right eyes. These may then be presented to the user by suitable means, such as typically individual left and right eye displays of a VR headset.
- one or more view images may e.g. be presented on an autostereoscopic display, or indeed in some embodiments only a single two-dimensional image may be generated (e.g. using a conventional two-dimensional display).
- an audio representation of the scene may be provided.
- the audio scene is typically rendered to provide a spatial experience where audio sources are perceived to originate from desired positions.
- while audio sources may be static in the scene, changes in the user pose will result in a change in the relative position of the audio source with respect to the user's pose. Accordingly, the spatial perception of the audio source should change to reflect the new position relative to the user.
- the audio rendering may accordingly be adapted depending on the user pose.
- the audio rendering is a binaural rendering using Head Related Transfer Functions (HRTFs) or Binaural Room Impulse Responses (BRIRs) (or similar) to provide the desired spatial effect for a user wearing a headphone.
- HRTFs Head Related Transfer Functions
- BRIRs Binaural Room Impulse Responses
- the audio may instead be rendered using a loudspeaker system and the signals for each loudspeaker may be rendered such that the overall effect at the user corresponds to the desired spatial experience.
- the viewer or user pose input may be determined in different ways in different applications.
- the physical movement of a user may be tracked directly.
- a camera surveying a user area may detect and track the user's head (or even eyes (eye-tracking)).
- the user may wear a VR headset which can be tracked by external and/or internal means.
- the headset may comprise accelerometers and gyroscopes providing information on the movement and rotation of the headset and thus the head.
- the VR headset may transmit signals or comprise (e.g. visual) identifiers that enable an external sensor to determine the position of the VR headset.
- the viewer pose may be provided by manual means, e.g. by the user manually controlling a joystick or similar manual input.
- the user may manually move the virtual viewer around in the virtual scene by controlling a first analog joystick with one hand and manually controlling the direction in which the virtual viewer is looking by manually moving a second analog joystick with the other hand.
- a headset may track the orientation of the head and the movement/ position of the viewer in the scene may be controlled by the user using a joystick.
- the VR application may be provided locally to a viewer by e.g. a standalone device that does not use, or even have any access to, any remote VR data or processing.
- a device such as a games console may comprise a store for storing the scene data, input for receiving/ generating the viewer pose, and a processor for generating the corresponding images from the scene data.
- the VR application may be implemented and performed remote from the viewer.
- a device local to the user may detect/ receive movement/ pose data which is transmitted to a remote device that processes the data to generate the viewer pose.
- the remote device may then generate suitable view images for the viewer pose based on scene data describing the scene.
- the view images are then transmitted to the device local to the viewer where they are presented.
- the remote device may directly generate a video stream (typically a stereo/ 3D video stream) which is directly presented by the local device.
- the remote device may generate an audio scene reflecting the virtual audio environment. This may in many embodiments be done by generating audio signals that correspond to the relative position of different audio sources in the virtual audio environment, e.g. by applying binaural processing to the individual audio components corresponding to the current position of these relative to the head pose.
- the local device may not perform any VR processing except for transmitting movement data and presenting received video and audio data.
- the remote VR device may generate audio data representing an audio scene and may transmit audio components/ objects corresponding to different audio sources in the audio scene together with position information indicative of the position of these (which may e.g. dynamically change for moving objects).
- the local VR device may then render such signals appropriately, e.g. by applying appropriate binaural processing reflecting the relative position of the audio sources for the audio components.
- a central server may accordingly in some embodiments generate a spatial audio mix that can be rendered directly by the remote client device.
- the central server may generate spatial audio as a number of audio channels for direct rendering by a surround sound loudspeaker setup.
- the central server may generate a mix by binaurally processing all audio signals in the scene to be rendered and then combining these into a binaural stereo signal which can be rendered directly at the client side using a set of headphones.
- the central server may instead provide a number of audio objects or components with each of these corresponding typically to a single audio source.
- the client can then process such objects/ components to generate the desired audio scene. Specifically, it may binaurally process each audio object based on the desired position and combine the results.
- audio data transmitted to a remote client may include data for a plurality of audio components or objects.
- the audio may for example be represented as encoded audio for a given audio component which is to be rendered.
- the audio data may further comprise position data which indicates a position of the source of the audio component.
- the positional data may for example include absolute position data defining a position of the audio source in the scene.
- the local apparatus may in such an embodiment determine a relative position of the audio source relative to the current user pose.
- the received position data may be independent of the user's movements and a relative position for audio sources may be determined locally to reflect the position of the audio source with respect to the user.
- Such a relative position may indicate the relative position of where the user should perceive the audio source to originate from, and it will accordingly vary depending on the user's head movements.
- the audio data may comprise position data which directly describes the relative position.
- the VR server may further provide data describing the acoustic environment of the user and/or the audio sources.
- the VR server may provide data to reflect whether the virtual user is e.g. in a small room, a large concert hall, outside etc.
- the acoustic environment data may include information about the reflectiveness of the boundaries (walls, ceiling, floor) and/or objects in the environment. The perceived audio in such environments varies substantially, and in order to perceive more naturally sounding audio it is therefore highly desirable to adapt the rendered audio to reflect such characteristics.
- FIG. 1 illustrates an example of a VR system in which a central server 101 liaises with a number of remote clients 103 e.g. via a network 105, such as the Internet.
- the central server 101 may be arranged to simultaneously support a potentially large number of remote clients 103.
- Such an approach may in many scenarios provide an improved trade-off e.g. between complexity and resource demands for different devices, communication requirements etc.
- the viewer pose and corresponding scene data may be transmitted at larger intervals, with the local device processing the viewer pose and received scene data locally to provide a real-time, low-lag experience. This may for example substantially reduce the required communication bandwidth while providing a low latency experience, and while allowing the scene data to be centrally stored, generated, and maintained. It may for example be suitable for applications where a VR experience is provided to a plurality of remote devices.
- FIG. 2 illustrates elements of an audio apparatus which may provide an improved audio rendering in many applications and scenarios.
- the audio apparatus may provide improved rendering for many VR applications, and the audio apparatus may specifically be arranged to perform the audio processing and rendering for a VR client of FIG. 1 .
- the audio apparatus of FIG. 2 generates output audio signals corresponding to an audio scene described by audio data received from the central server 101. Accordingly, the audio apparatus comprises a first receiver 201 which is arranged to receive a set of input audio signals from a first source which in the specific example is the central server 101.
- the audio signals may be encoded audio signals and thus be represented by encoded audio data.
- the audio input signals may be different types of audio signals and components and indeed in many embodiments the first receiver 201 may receive audio data which defines a combination of different types of audio signals.
- the audio data may include audio represented by audio channel signals, individual audio objects, higher order ambisonics etc.
- the audio apparatus further comprises a second receiver 203 which is arranged to receive acoustic environment data from a second source.
- the acoustic environment data may describe one or more acoustic properties or characteristics of an acoustic environment that is desired to be reproduced by the audio presented to the listener.
- the acoustic environment data may describe an acoustic environment for the listener and/or the audio sources/ signals/ components.
- the acoustic environment data will be a description of an intended acoustic environment in which both the audio sources and the listener are present.
- the acoustic environment data may describe the virtual acoustic environment of the listener/user and/or one or more objects in the virtual scene and/or one or more of the audio sources in the virtual scene.
- the acoustic environment data will reflect acoustic characteristics of a virtual room in which the virtual user is present. It is desired that the audio apparatus renders the input audio signals with acoustic characteristics that match those of the virtual room and the acoustic environment data may correspond to room acoustics data.
- the following description will focus on this description and use terms such as room data or room acoustics data, but it will be appreciated that this is not intended as a limitation and that the acoustic environment data may equally apply to e.g. virtual outdoor environments.
- the second source may typically be the same source as the first source but could be a different source in some embodiments. In the following, the second source will be considered to be the same as the first source, namely specifically the central server 101.
- the audio apparatus may specifically be arranged to receive a single data stream comprising visual data, audio data, and acoustic environment data from the central server 101.
- the first receiver 201 and the second receiver 203 may be considered to correspond to elements of a common receiver or network interface extracting the different parts of data.
- the audio apparatus further comprises a third receiver 205 which is arranged to receive binaural transfer function data from a third source.
- the binaural transfer function data specifically comprises data describing a set of binaural transfer functions.
- the third source may in some embodiments correspond to the first and/or second source but will in many embodiments be a different source.
- the third source may be a local source.
- the audio apparatus may comprise a local store which stores the set of binaural transfer functions.
- Binaural rendering is a technology typically (but not exclusively) aimed at consumption over headphones. Binaural processing seeks to create the perception that there are sound sources surrounding the listener that are not physically present. As a result, the sound will not only be heard 'inside' one's head, as is the case with listening over headphones without binaural rendering, but can be brought outside one's head, as is the case for natural listening. Apart from a more realistic experience, another upside is that virtual surround has a positive effect on listener fatigue.
- Binaural processing is known to be used to provide a spatial experience by virtual positioning of sound sources using individual signals for the listener's ears.
- Virtual surround is a method of rendering the sound such that audio sources are perceived as originating from a specific direction, thereby creating the illusion of listening to a physical surround sound setup (e.g. 5.1 speakers) or immersive environment (concert) or e.g. by directly positioning audio sources at their appropriate position in the sound stage.
- the signals required at the eardrums in order for the listener to perceive sound from any desired direction can be calculated, and the signals can be rendered such that they provide the desired effect. These signals are then recreated at the eardrum using either headphones or a crosstalk cancelation method (suitable for rendering over closely spaced speakers).
- Binaural rendering can be considered to be an approach for generating signals for the ears of a listener resulting in tricking the human auditory system into thinking that a sound is coming from the desired positions.
- the binaural rendering includes a compensation for such headphone or speaker playback.
- a compensation for the frequency response of the speakers, or for the transfer function from the speaker feed signals to the ears or eardrums, may be included in the rendering, or applied as a pre- or post-processing step. It may also, or alternatively, be combined with any filtering related to the invention.
- the binaural rendering is based on binaural transfer functions such as head related binaural transfer functions which vary from person to person due to the acoustic properties of the head, ears and reflective surfaces, such as the shoulders.
- binaural filters can be used to create a binaural recording simulating multiple sources at various locations. This can be realized by convolving each sound source with the pair of Head Related Impulse Responses (HRIRs) that correspond to the position of the sound source.
- HRIRs Head Related Impulse Responses
- the appropriate binaural filters can be determined by measurement. Typically, such measurements are made e.g. using models of human heads or MRI scans, or indeed in some cases by attaching microphones close to the eardrums of a person.
- in order to create the illusion that a sound source is moved around the listener, a large number of binaural filters is typically required, with adequate spatial resolution, e.g. 10 degrees.
- the head related binaural transfer functions may be represented e.g. as Head Related Impulse Responses (HRIRs), or equivalently as Head Related Transfer Functions (HRTFs), Binaural Room Impulse Responses (BRIRs), or Binaural Room Transfer Functions (BRTFs).
- HRIR Head Related Impulse Response
- HRTFs Head Related Transfer Functions
- BRIRs Binaural Room Impulse Responses
- BRTFs Binaural Room Transfer Functions
- the (e.g. estimated or assumed) transfer function from a given position to the listener's ears (or eardrums) may for example be given in the frequency domain, in which case it is typically referred to as an HRTF or BRTF, or in the time domain, in which case it is typically referred to as an HRIR or BRIR.
- the head related binaural transfer functions are determined to include aspects or properties of the acoustic environment and specifically of the room in which the measurements are made, whereas in other examples only the user characteristics are considered.
- Examples of the first type of functions are the BRIRs and BRTFs.
- a well-known method to determine binaural transfer functions is binaural recording. It is a method of recording sound that uses a dedicated microphone arrangement and is intended for replay using headphones. The recording is made by either placing microphones in the ear canal of a subject or using a dummy head with built-in microphones, a bust that includes pinnae (outer ears). The use of such a dummy head including pinnae provides a very similar spatial impression as if the person listening to the recordings had been present during the recording.
- the set of binaural transfer functions may accordingly comprise binaural transfer functions for a, typically high, number of different positions with each binaural transfer function providing information of how an audio signal should be processed/ filtered in order to be perceived to originate from that position. Individually applying binaural processing to a plurality of audio signals/ sources and combining the result may be used to generate an audio scene with a number of audio sources positioned at appropriate positions in the sound stage.
- the audio apparatus further comprises a renderer 207 which is coupled to the first receiver 201, the second receiver 203, and the third receiver 205, and which receives the input audio signals/data, the acoustic environment data (room data), and the set of binaural transfer functions.
- the renderer 207 is arranged to generate output audio signals from the set of input audio signals in response to the acoustic environment data and the set of binaural transfer functions.
- the renderer 207 generates output audio signals that represent the desired audio scene when presented to a user.
- the output audio signals are specifically a binaural stereo output signal which can be provided to a user via headphones resulting in the perception of the audio stage.
- the renderer 207 of FIG. 2 includes a reverberator 209 which generates a reverberation component for the output audio signals by applying reverberation processing to the set of input audio signals.
- the renderer 207 has a specific processing which specifically generates a reverberation component for the audio being presented to the user.
- the reverberation component may be combined with other audio components, such as typically audio components that do not reflect reverberation (although possibly these audio components may also include some reverberation characteristics).
- a second audio component may be generated to represent the direct audio component as well as possibly early reflections of the audio.
- An example of such a renderer 207 is illustrated in FIG. 3 .
- the renderer 207 comprises a binaural processor 301 which is arranged to apply binaural processing to the received input audio signals.
- the binaural processor 301 processes the input audio signals based on the set of binaural transfer functions to generate binaural signals that correspond to the desired position of the audio sources of the input audio signals.
- the appropriate binaural transfer function to use is dependent on the desired position for the input audio signal being processed and may accordingly be selected based on position data received with the input audio signals.
- the binaural processor 301 may retrieve the binaural transfer function closest to the source position (or e.g. interpolate between the closest binaural transfer functions). The input audio signal may then be convolved with the retrieved binaural transfer function to provide a binaural audio component (it will be appreciated that a binaural transfer function for a given position will typically include a transfer function for the right ear and a transfer function for the left ear).
- the convolution may also be a frequency dependent filtering with a gain and phase offset to generate a desired signal level (relative to other frequencies), a level difference and a phase difference between the two signals for each frequency.
- the phase difference may be omitted to reduce computational complexity.
- the binaural processor 301 may proceed to perform binaural processing for all input audio signals and add these together to generate a binaural output signal.
- the binaural processor 301 thus provides a binaural audio signal corresponding to the sound stage with the individual audio sources (perceived to be) positioned at the desired positions.
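- a simplified sketch of this per-source convolve-and-sum processing (assuming nearest-neighbour HRIR selection, without interpolation or distance attenuation):

```python
import numpy as np
from scipy.signal import fftconvolve


def render_binaural(sources, positions, hrtf_positions, left_irs, right_irs):
    """sources: list of mono signals; positions: (az, el) per source;
    hrtf_positions: (N, 2); left_irs/right_irs: (N, taps) HRIR pairs."""
    n_out = max(len(s) for s in sources) + left_irs.shape[1] - 1
    out = np.zeros((2, n_out))
    for sig, pos in zip(sources, positions):
        # select the HRIR pair measured closest to the desired source position
        idx = int(np.argmin(np.linalg.norm(hrtf_positions - np.asarray(pos),
                                           axis=1)))
        l = fftconvolve(sig, left_irs[idx])
        r = fftconvolve(sig, right_irs[idx])
        out[0, :len(l)] += l                     # sum contributions per ear
        out[1, :len(r)] += r
    return out
```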
- the binaural transfer functions will reflect the direct sound and possibly some early reflections. However, in most embodiments, the binaural transfer functions will not include a reverberation component or room specific information. In particular, the binaural transfer functions may be anechoic binaural transfer functions. The binaural transfer functions may specifically be HRIRs or HRTFs which reflect user anthropometric properties but not e.g. reverberation or room/ acoustics environment dependent properties. The following description will accordingly often specifically refer to HRTFs but it will be appreciated that this does not imply that the invention, or indeed even the embodiments, are limited to HRTFs.
- the reverberator 209 may process the input audio signals to generate the reverberation component. This processing is dependent on the acoustic environment data and the input audio signals will be processed to generate an output stereo signal that corresponds to reverberant audio for an acoustic environment described by the acoustic environment data.
- the reverberator 209 may for example include reverberation filters or synthetic reverberators, such as a Jot reverberator.
- the binaural processor 301 may accordingly generate audio components corresponding to the direct sound propagation such that this is perceived to arrive from the audio source position.
- the binaural processor 301 may also be arranged to generate audio components corresponding to paths from the source to the user that includes one or a few reflections. Such audio will be perceived to arrive from a position that does not correspond to the position of the audio source but rather to the position of the last reflection before reaching the user.
- a position of the last reflection may be determined, and an audio component corresponding to this early reflection will be generated using e.g. binaural transfer functions/ HRTFs corresponding to this position.
- the audio apparatus may use the positions of the audio sources and user/listener in the room/environment and determine reflections on surfaces such as walls, ceiling, floor to the user. Typically, this may be done by mirroring audio source positions with respect to the reflecting surfaces to easily find the distance and direction of incidence of the reflection of that source on the corresponding surface. Second order reflections can be determined by also mirroring a second surface with respect to the first surface and mirroring the mirrored audio source with respect to the mirrored second surface. These reflections can then be rendered as additional audio sources with the HRTFs corresponding to their direction of incidence.
- a (frequency dependent) attenuation may be used to model the reflectivity of the one or more surfaces related to the reflection and the distance from the source via the reflections to the user/listener.
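- a sketch of the mirroring construction for first-order reflections, assuming an axis-aligned ('shoebox') room with walls at 0 and at the room dimension on each axis:

```python
import numpy as np


def first_order_images(src: np.ndarray, room_dims: np.ndarray) -> np.ndarray:
    """src: (3,) source position; room_dims: (Lx, Ly, Lz), walls at 0 and L.
    Returns the six first-order mirror-image source positions."""
    images = []
    for axis in range(3):
        lo = src.copy()
        lo[axis] = -src[axis]                          # mirror in wall at 0
        hi = src.copy()
        hi[axis] = 2.0 * room_dims[axis] - src[axis]   # mirror in wall at L
        images += [lo, hi]
    return np.array(images)

# Each image is rendered as an additional source: its direction of incidence
# at the listener selects the HRTF, while a (frequency dependent) attenuation
# models the wall reflectivity and the longer propagation path.
```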
- the output signals of the binaural processor 301 and the reverberator 209 are fed to a combiner 303 which combines the stereo output signals to generate an output binaural audio signal.
- the combiner 303 may for example perform a weighted summation with predetermined weights.
- the renderer 207 of FIG. 3 generates an output stereo signal which comprises a binaural component generated by binaural processing of the input audio signals based on the binaural transfer functions and a reverberation component generated by processing the input audio signals based on the room characteristics.
- the positional information is provided by the binaural processor 301 and the binaural component whereas the reverberation properties of the acoustic environment data are represented by the reverberation component.
- the binaural processor 301 is using the binaural transfer functions and the reverberator 209 is using the acoustic environment data.
- the reverberation processing is further adapted based on a property (or properties) of the set of binaural transfer functions.
- the set of binaural transfer functions is not merely used for the binaural processing and the positioning of audio sources, but is also used to adapt the reverberation processing, despite the binaural transfer functions typically not comprising any reverberation information or specific acoustic environment data information, and indeed even if anechoic binaural transfer functions/HRTFs are used.
- the set of binaural transfer functions is permanent or semi-permanent and may specifically be static for the entire VR session.
- the binaural transfer functions may be fixed binaural transfer functions stored in local memory and not being dependent on the specific application.
- the acoustic environment data may be dynamically varying and indeed the central server 101 may continuously transmit acoustic environment data indicative of the current acoustic properties. For example, as a user moves, the central server 101 may continuously transmit acoustic environment data that characterizes the (virtual) acoustic environment, e.g. reflecting when the user moves from a small room to a large room etc.
- the binaural transfer function may not be dependent on the current acoustic environment.
- the binaural transfer functions may specifically not reflect any reverberation properties or acoustic environment specific information.
- the binaural transfer functions may for example be HRTFs/HRIRs rather than BRTF/BRIR.
- the acoustic environment data may be independent of positions of audio sources of the input audio signals.
- the acoustic environment data may comprise no data which is used to position individual audio sources.
- the acoustic environment data may describe the common environment for the user and the audio sources of the input audio signals.
- the set of binaural transfer functions may be static for a given session/ user experience. Any update rate for changes in the set of binaural transfer functions may be substantially lower than an update rate for changes in the acoustic environment data. E.g. it may typically be at least 10 times or 100 times lower. In many embodiments, the set of binaural transfer functions may be static whereas the acoustic environment data is dynamically varying for a user session.
- the audio apparatus of FIGs. 2 and 3 further comprises an adapter 211 that is arranged to adapt the reverberation processing based on the set of binaural transfer functions.
- the adapter 211 may adapt a property of the reverberation processing in response to a property of the set of binaural transfer functions.
- the Inventor has realized that such an adaptation may be advantageous in many scenarios. Specifically, the Inventor has realized that the approach may provide a personalization or adaptation of the reverberation to the individual user or adaptation to the binaural transfer functions used by the user and that this improves audio perception despite the acoustic environment data referring to the acoustic environment and not individual properties of the user. The Inventor has further realized that the personalization can be performed efficiently and with high quality by considering properties of the set of binaural transfer functions and matching the reverberation performance accordingly. The interworking between the seemingly separate data sets and processing has been found to provide substantially improved audio perception with a perceived improved audio quality and feeling of naturalness.
- the approach allows for an efficient system and distribution of functionality. It provides a high degree of flexibility and interworking. For example, it allows a central server to simply provide audio data and acoustic environment data without having to consider how this is processed at the client. Specifically, it can distribute this information without considering or having any knowledge of the binaural transfer functions used at the individual client and without considering or having any knowledge of the audio processing and algorithms used. It thus supports the acoustic environment data, audio data and set of binaural transfer function data to be received from completely different and independent sources.
- unlike anechoic binaural transfer functions such as HRIRs and HRTFs, binaural room impulse responses consist of an anechoic portion (or direct sound) that only depends on the subject's anthropometric attributes (such as head size, ear shape, etc.), followed by a reverberant portion that characterizes the combination of the room and the anthropometric properties.
- the reverberant portion contains two temporal regions, usually overlapping.
- the first region contains so-called early reflections, which are isolated reflections of the sound source on walls or obstacles inside the room before reaching the ear-drum (or measurement microphone).
- the second region in the reverberant portion is the part where these reflections are not isolated anymore. This region is called the diffuse- or late reverberation tail.
- the reverberant portion contains cues that give the auditory system information about distance of the source and size and acoustical properties of the room.
- the energy of the reverberant portion in relation to that of the anechoic portion largely determines the perceived distance of the sound source.
- the density of the (early-) reflections contributes to the perceived size of the room.
- reverberation time is the time that it takes for reflections to drop 60 dB in energy level.
- the reverberation time gives information on the acoustical properties of the room; whether its walls are very reflective (e.g. bathroom) or there is much absorption of sound (e.g. bed-room with furniture, carpet and curtains).
- EDC Energy Decay Curve
- EDR Energy Decay Relief
- from the EDC or EDR, the T60 time can be derived. But since the decay may not be linear, the Early Decay Time (EDT) is also used in direct comparison with T60. EDT corresponds to the time of the drop from 0 to -10 dB, multiplied by 6 (to compare with T60). Early decay development is a cue to distinguishing spaces, and is highly correlated to the dissimilarity between outside and inside reverberation [5].
- EDT Early Decay Time
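- a standard way to obtain these measures from an impulse response (assumed here as one possible method) is Schroeder backward integration of the squared response:

```python
import numpy as np


def edc_db(rir: np.ndarray) -> np.ndarray:
    """Schroeder backward-integrated Energy Decay Curve in dB."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])


def decay_time_60(edc: np.ndarray, fs: float,
                  top_db: float, bottom_db: float) -> float:
    """Time to decay 60 dB, extrapolated from the EDC slope between levels."""
    idx = np.where((edc <= top_db) & (edc >= bottom_db))[0]
    slope = np.polyfit(idx / fs, edc[idx], 1)[0]   # dB per second, negative
    return -60.0 / slope

# t60 = decay_time_60(edc_db(rir), fs, -5.0, -35.0)   # classic T30 estimate
# edt = decay_time_60(edc_db(rir), fs,  0.0, -10.0)   # EDT ("times 6" rule)
```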
- Further properties of the reverberation may include e.g. the reflection density and the modal density.
- acoustic environment data may be provided which is indicative of these reverberation characteristics for the room/ environment.
- acoustic environment data comprising values for the T60, reflection density and/or modal density may be provided. These properties may, as previously mentioned, be described as independent of the individual user, e.g. they may be based on a nominal or reference measurement.
- the acoustic environment data may be provided from the central server 101 together with the audio data describing the audio signals.
- the personalization and customization including the rendering algorithm selection and the retrieval of personalized binaural transfer functions, is performed at the individual client.
- MPEG-I MPEG Immersive
- AR/VR augmented reality, virtual reality, also considered to include 'mixed reality'
- in MPEG-I, there is a high-level architecture with an audio renderer that takes care of rendering an immersive audio scene corresponding to the virtual environment.
- in the audio renderer, virtual sound elements are typically rendered as if they were present in the physical environment of the user.
- MPEG will standardize a default renderer that can model an acoustic environment based on metadata (either directly from the bitstream, analyzed from indirect data and metadata, or obtained from analyzing the physical environment of the user), and there will be an API to replace the default renderer.
- HRTFs are highly personal and are therefore also not fixed and may be personalized or optimized by means of selecting from a large default set or by providing a personal set.
- the renderer can consist of several components, two of which will typically be a binaural renderer and reverberation renderer module.
- the binaural renderer takes in audio objects, channels and/or HOA signal sets together with a set of HRIRs/HRTFs and renders these. Thus, this may correspond to the binaural processor 301 of FIG. 3.
- these HRTFs are anechoic or at least non-reverberant (i.e. do not include late reverberation, or even early reflections).
- the HRTFs should preferably be anechoic, with the early reflections modelled according to room properties and possibly other reflecting surfaces.
- the reverberation characteristics can then be modelled separately based on acoustic environment data, such as by the reverberator 209 of FIG. 3 .
- This setup may use a generic late reverberation simulation/rendering that allows manipulation of acoustic properties. Since the HRTF set and reverb rendering may each be replaced by external versions, the acoustic environment data can only rely on generic representations.
- the virtual room's acoustical properties will typically be described by means of room properties (e.g. dimensions and/or reverberation properties like T60) or a generic RIR, rather than configuration parameters for a specific synthetic reverberator.
- RIR Room Impulse Response
- the entire rendering is under control of the application and the reverb is provided together with the HRTFs, representing specific acoustic properties.
- the reverberation processing tool generally determines in what representation the late reverb metadata must be provided (e.g. FIR coefficients, reverb model parameters), and the reverberation and binaural processing are provided as an integrated and consistent audio representation.
- the approach of FIGS. 2 and 3 allows for a combination of the binaural transfer functions/HRTFs with the reverb model so that the reverb can be matched to the specific set of HRTFs. This may provide a personalization of the reverberation effect to match characteristics of the user as reflected by the HRTFs.
- the reverb characteristics are personalized to fit the HRTF set. It has been found that this provides a substantially improved perceived audio quality.
- the approach may be highly advantageous for systems such as those of MPEG-I by providing high performance while allowing acoustic environment data and binaural transfer function data to be retrieved from and generated by separate and independent sources.
- the reverberation may in this way be made to depend on both scene-dependent acoustic properties and on personal, anthropometric-acoustic properties, where the scene-dependent acoustic properties are provided from a different source than the HRTFs. It supports a scenario where the scene-dependent acoustic properties are time-varying, as the user moves through a scene or when a scene change occurs.
- the acoustic environment data may in different embodiments comprise different parameters and values, and it will be appreciated that the specific choice of acoustic environment data will depend on the preferences and requirements of the individual embodiment.
- the acoustic environment data may comprise metadata that can directly or indirectly be used to configure the reverberator 209. For example, T60 time, frequency dependent T60 time, room modes, modal density, Energy Decay Curve (EDC), Energy Decay Relief (EDR), Early Decay Time (EDT), frequency dependent inter-aural coherence/correlation, room dimensions, room shape descriptions, frequency-dependent reflective properties of walls, Room Impulse Response (RIR), etc. may be provided.
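- As an illustration, such metadata could be carried in a simple container like the sketch below; the field names and types are illustrative assumptions, not a standardized bitstream syntax:

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import numpy as np

@dataclass
class AcousticEnvironmentData:
    t60: Optional[float] = None                  # broadband T60 in seconds
    t60_per_band: Optional[np.ndarray] = None    # frequency dependent T60
    modal_density: Optional[float] = None        # room modes per Hz
    reflection_density: Optional[float] = None   # reflections per second
    room_dimensions: Optional[Tuple[float, float, float]] = None  # metres
    rir: Optional[np.ndarray] = None             # measured room impulse response
```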
- the scene description may not provide room acoustic parameters, but rather a RIR or BRIRs.
- the RIR or BRIRs may, for example, be provided as FIR or IIR filter descriptions.
- the reverberator 209 may then simply apply the filters.
- the filters or the processing is customized to the set of binaural transfer functions that are used for the direct part and early reflections.
- the reverberation processing can be done in multiple ways, and this to some extent depends on which data is provided.
- the specific reverberation processing will depend on the preferences and requirements of the individual embodiment.
- a room impulse response (RIR) is provided as a description of the room acoustics. It may be analyzed to derive T 60 , EDR, modal density, etc. to (indirectly) configure a synthetic reverberator, or used as a basis for two or more uncorrelated filters.
- RIR room impulse response
- de-personalized BRIRs may be provided and used, directly or after personalization processing, as filters generating the late reverberation signals.
- the metadata may provide parameters descriptive of acoustic properties that can be used to indirectly configure a synthetic reverberator.
- Synthetic reverberation algorithms are often employed, because of the ability to modify certain properties of the acoustic simulation, and because of their relatively low computational complexity in generating a binaural rendering.
- reverberation can be simulated by models incorporating a number of aspects that appear important for realistic reverberation. By changing parameters of these models, it is relatively easy to represent a wide range of reverberation effects. By providing means for manipulating reflection density, reverberation time and overall energy, it is possible to model different rooms. Since the models can be used to reproduce the perception of a measured BRIR, they include sufficient configuration possibilities to capture personal aspects of the late reverb.
- the adaptation and personalization of the reverberation processing may be performed in many different ways.
- the approach may for example depend on the specific reverberation processing that has been implemented.
- the adapter 211 may be arranged to adapt the reverberation processing such that characteristics of the reverberation component match a corresponding characteristic of the set of binaural transfer functions.
- the adaptation may be such that the reverberation processing will have the same characteristic as the binaural transfer function processing.
- the adapter 211 may adapt the reverberation processing such that a property matches the corresponding property of the binaural transfer functions, and specifically an average or combined property of a plurality of the set of binaural transfer functions.
- the adapter 211 may specifically be arranged to adapt a property of the reverberation processing such that an identifying characteristic of the reverberation component corresponds to (or matches) an identifying characteristic of the set of binaural transfer functions, where the identifying characteristic may be a property of the binaural transfer function that depends on a given anthropometric property or a combination of anthropometric properties.
- the identifying characteristic may e.g. be a property of the binaural transfer function having a dependency on the widths, lengths, protrusions, etc. of the ears or head.
- the adapter 211 may be arranged to adapt the reverberation processing such that characteristics of the reverberation component match an anthropometrically dependent characteristic of the set of binaural transfer functions.
- a particularly advantageous parameter to adapt is a frequency response of the reverberation processing.
- the property of the set of binaural transfer functions being considered is a frequency response characteristic and the adapter 211 is arranged to adapt a frequency response of the reverberation processing in response to the frequency response characteristic of the set of binaural transfer functions.
- the frequency response of a processing stage is often also referred to as the coloration of the signal being processed.
- the ear-dependent coloration may be a main contributor to a personalized late reverberation. This may be considered a simplification that can be made because the human auditory system cannot distinguish between the many individual reflections that occur in very short time intervals in the case of late reverberation. Therefore, inter-aural phase or level differences are not traceable to specific sound sources.
- the late reverberation causes a diffuse sound-field.
- coloration may be a major component to personalize a late reverberation signal.
- the room acoustics also affect the coloration of the reverb, but this is completely independent of the coloration due to the human anthropometric properties.
- Any coloration by the acoustics can be represented in the acoustics metadata as a result from frequency dependent T 60 or EDR, and reflectivity properties of walls and other object surfaces.
- the personal part of the coloration can be applied in addition to this acoustics-imposed coloration.
- the reverberator 209 may comprise an adaptable reverberator which is adaptive to the acoustic environment data but not to the binaural transfer functions.
- the adaptable reverberator may only consider the acoustic environment data but not the personalization data derived from the binaural transfer functions.
- the adaptable reverberator may not include any personalization but simply generate a reverberation based on the acoustic environment data. This may e.g. allow a low complexity implementation of the adaptable reverberator with potential for reuse of existing reverberator circuitry and algorithms.
- the adaptable reverberator may be supplemented by a filter which has a frequency response that is dependent on the set of binaural transfer functions and specifically on the frequency response characteristic determined from these (e.g. the average frequency response for a plurality of binaural transfer functions).
- the filter may be coupled in series with the adaptable reverberator and may adapt the frequency response of the reverberation component.
- the HRTF-based coloration could be performed after the reverb processing by a filtering of the generated reverberation signals as illustrated in FIG. 5 .
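- A minimal sketch of this post-filtering (assumed signal and filter arrays; not the specific implementation of FIG. 5) is:

```python
import numpy as np
from scipy.signal import lfilter

def colorize_reverb(reverb_l, reverb_r, c_p_l, c_p_r):
    """Apply per-ear HRTF-derived coloration FIR filters to generated reverb signals."""
    return (lfilter(c_p_l, [1.0], reverb_l),
            lfilter(c_p_r, [1.0], reverb_r))
```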
- moving the coloration before the reverberation generation is typically beneficial in terms of computational complexity when the number of audio signals going into the adaptable reverberator is lower than the number of audio signals coming out. This approach is shown in FIG. 6 and can typically be applied when the late reverberation generation is a predominantly linear process.
- the coloration processing can also be moved to be the first processing step of the reverberator 209, as shown in FIG. 7.
- the reverberator 209 may apply a filter internally and this may allow the reverberation generation to be combined with the HRTF-based coloration.
- An example is shown in FIG. 8 where the late reverberation generation comprises filtering with FIR filters.
- the acoustics metadata provides (de-personalized) BRIRs.
- the FIR modification may be a modification of the magnitude spectrum of the filter.
- let A L and A R be the two sets of FIR filter coefficients corresponding to the left-ear BRIR filter and the right-ear BRIR filter respectively.
- let C p,L and C p,R be coloration FIR filters (for the left and right ear respectively) representing the HRTF metadata.
- the HRTF metadata may provide one coloration filter for coloration of both ears' BRIR filters. This may be beneficial for symmetric HRTFs.
- the FIR modification may exclude the IFFT operation and provide the FFT coefficients to the FIR filtering block for filtering in the FFT domain.
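- A minimal sketch of this modification (the array inputs and the full spectral multiplication are assumptions; an actual block may modify only the magnitude spectrum) multiplies the filter spectra and optionally skips the inverse transform:

```python
import numpy as np

def personalize_brir(a, c_p, return_time_domain=True):
    """Combine a BRIR FIR filter with a coloration FIR filter in the FFT domain."""
    n = len(a) + len(c_p) - 1                       # linear convolution length
    spec = np.fft.rfft(a, n) * np.fft.rfft(c_p, n)  # spectral multiplication
    return np.fft.irfft(spec, n) if return_time_domain else spec
```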
- the reverberator 209 may comprise a synthetic reverberator and the adapter 211 may be arranged to adapt a processing parameter of the synthetic reverberator in response to the frequency response characteristic.
- a synthetic reverberator is specifically a reverberator that models the many reflections of an audio source in a room by a combination of simple processing blocks, such as gains, low order filters, delays and feedback loops.
- the feedback loops typically include delays that cause repetitions of an input signal with those delays.
- Gains (0 < g < 1) or filters in the feedback loops cause the repetitions to dissipate as the signal component spends more time in the loop, similar to the decaying energy of an audio wave component as it progresses through a room and reflects off walls.
- These gains and filters control the simulation of a (frequency dependent) T60.
- the periodicity due to the fixed delays is disrupted, creating a chaotic, diffuse reverb tail as would occur in a real environment.
- Further filtering steps, typically after or before the feedback loops, may impose coloration due to a combination of reflectivity properties.
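- As a toy illustration (a single comb loop rather than a full synthetic reverberator; the delay value is an arbitrary assumption), the loop gain can be set from the T60 in the acoustic environment data, since a delay of D samples with feedback gain g decays by 60 dB after T60 seconds when g = 10^(-3·D/(fs·T60)):

```python
import numpy as np

def comb_reverb(x, fs, t60, delay_s=0.037):
    """Single feedback comb loop with its gain derived from a target T60."""
    d = int(delay_s * fs)
    g = 10.0 ** (-3.0 * d / (fs * t60))  # yields a 60 dB decay after t60 seconds
    y = x.astype(float).copy()
    for n in range(d, len(y)):
        y[n] += g * y[n - d]             # repetitions dissipate with gain g
    return y
```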
- the coloration may be integrated into an operation of the synthetic reverberator.
- the Jot reverberator as shown in FIG. 9 already has filtering to control the reverberation signals' coloration per ear (filters t L and t R in FIG. 9 ).
- let C p,L (z) and C p,R (z) be coloration filters representing the HRTF metadata; the synthetic reverberator's coloration filters can then be updated to impose the personalized coloration.
- a single C p (z) may be provided for coloration of both ears to represent symmetry in an HRTF set.
- a particularly advantageous parameter to adapt is an inter-ear correlation property of the reverberation processing.
- the property of the set of binaural transfer functions being considered is an inter-ear correlation property and the adapter 211 is arranged to adapt an inter-ear correlation property of the reverberation processing in response to the inter-ear correlation property of the set of binaural transfer functions.
- the inter-ear correlation property may reflect a coherency or cross-correlation between signals/ processing corresponding to the right and left ears of the listener.
- the inter-ear correlation property for a plurality of binaural transfer functions may be determined by comparing the binaural transfer functions for the left and right ears for a plurality of positions, and the reverberation processing may then be set to provide the same inter-ear correlation property.
- the determination and setting of the inter-ear coherency may typically be frequency dependent and the inter-ear coherency may typically be independently determined and set in different frequency bands/ bins.
- the term correlation property comprises all measures indicative of correlation between signals, including correlation coefficients and coherence, and specifically all properties that can be derived from cross-correlation measures.
- the correlation coefficient (usually denoted simply as the correlation) is the real part of the complex cross-correlation (at zero lag), and the coherence is the magnitude of the complex cross-correlation.
- the term correlation property includes both the correlation coefficient and coherence. Typical embodiments may use one or both of these, and the following references to coherence may be considered to apply mutatis mutandis to correlation coefficients or indeed any other correlation property.
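- A minimal sketch of the distinction (a hypothetical helper, zero lag only):

```python
import numpy as np

def correlation_properties(x_l, x_r):
    """Zero-lag normalized complex cross-correlation:
    real part -> correlation coefficient, magnitude -> coherence."""
    cross = np.vdot(x_l, x_r)  # vdot conjugates the first argument
    norm = np.sqrt(np.vdot(x_l, x_l).real * np.vdot(x_r, x_r).real)
    rho = cross / norm
    return rho.real, np.abs(rho)
```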
- Inter-aural coherence is an important contributor to reverb personalization. Further, it is advantageous to control inter-aural coherence in a frequency dependent manner.
- Synthetic reverberators like the Jot reverberator shown in FIG. 9 have means to control the Frequency Dependent Inter-aural Coherence (FDIC). More details on how to configure the filters c 1 (z) and c 2 (z) can be found in reference [3] indicated above.
- FDIC Frequency Dependent Inter-aural Coherence
- c 1 (z) and c 2 (z) would be fully defined by the HRTF metadata and not by the acoustic environment data.
- the approach used to control the FDIC by the Jot reverberator can be applied to any source of two (at least partially) decorrelated reverb signals, where decorrelated means the signals have a very low cross-correlation yet highly similar temporal and spectral profiles.
- late reverberation processing blocks may have one or more pairs of decorrelated late reverberation impulse responses, or the acoustics metadata may provide such decorrelated late reverberation impulse responses.
- the FDIC is controlled by the filters
$$c_1(\omega) = \sqrt{\frac{1 + \Phi(\omega)}{2}}, \qquad c_2(\omega) = \sqrt{\frac{1 - \Phi(\omega)}{2}}$$
where $\Phi(\omega)$ is the desired frequency dependent coherence between the output channels.
- FIG. 10 illustrates an example of how two decorrelated signals can be combined to control the coherence of the output signals.
- the reverberator 209 may comprise a combiner which, e.g. in this way, generates a pair of partially correlated signals from a pair of uncorrelated signals generated from the set of input audio signals. The resulting partially correlated signals may then be used to generate the output audio signals, or may indeed be used directly as the reverberation components (e.g. if the combiner is the last processing block of the reverberator 209).
- the combiner may accordingly adapt the correlation as shown above to provide the desired inter-ear correlation property.
- a single reverb signal may be produced by e.g. a synthetic reverberator considering only the acoustic environment data.
- a decorrelated signal may then be generated from this reverberation signal, resulting in a stereo reverberation signal whose two channels are decorrelated.
- a pair of partially correlated signals may then be generated from these two signals, e.g. by a mixer performing a weighted summation for each of the partially correlated signals.
- the weights may be determined to provide the desired correlation as determined from the binaural transfer functions, i.e. such that the correlation of the resulting reverberation component matches that of the set of binaural transfer functions.
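- A minimal sketch of such a combiner (per-bin mixing of two decorrelated reverb signals; applying the weights in the rfft domain is an implementation assumption) using the c1/c2 weights above:

```python
import numpy as np

def mix_to_coherence(x1, x2, phi):
    """Mix two decorrelated signals so the output pair has coherence phi per rfft bin."""
    n = len(x1)
    s1, s2 = np.fft.rfft(x1), np.fft.rfft(x2)
    c1 = np.sqrt((1.0 + phi) / 2.0)  # phi: array of length n//2 + 1, values in [-1, 1]
    c2 = np.sqrt((1.0 - phi) / 2.0)
    left = np.fft.irfft(c1 * s1 + c2 * s2, n)
    right = np.fft.irfft(c1 * s1 - c2 * s2, n)
    return left, right
```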
- the binaural transfer functions may be analyzed directly in order to determine a suitable property (or properties) for adapting and personalizing the reverberator 209.
- the property may be determined in response to e.g. metadata describing parameters of the binaural transfer functions.
- HRTF metadata input may comprise metadata that can directly or indirectly be used for matching the late reverberation to the HRTFs used for the rendering of the direct path and/or early reflections.
- any subset of the HRTF set, parameters or metadata provided along with the HRTFs, or parameters or metadata extracted from analyzing the HRTFs may be used.
- the HRTFs may be directly analyzed to derive personalization information, which could be provided as the HRTF metadata.
- the HRTF set (or a subset) is provided as HRTF metadata and the analysis is performed inside the late reverberation processing block.
- the audio apparatus may be arranged to determine a property for adapting the reverberation processing in response to a combination of properties for a plurality of binaural transfer functions of the set of binaural transfer functions for different positions.
- M represents the set of HRTFs involved in the analysis. This may be a subset of a larger set of HRTFs, for example to reduce computational complexity, or a subset that is equally distributed over a sphere's surface. In some embodiments the subset M may be different for the left and right ear, for example to include only the ipsilateral responses. In some embodiments the subset may be influenced by the location in a room, or by room properties: for example, when the user is very close to a wall, or when a wall is missing, HRTFs corresponding to that direction may be excluded.
- coloration may be derived from a single HRTF pair. This will typically be suboptimal, but may be beneficial in scenarios where the analysis must have very low computational complexity. In such cases it is typically best to choose the HRTF pair from the median plane.
- the coherence may be estimated as
$$\Phi \approx \frac{\sum_{i \in M} w_i\, H_{i,L}\, H_{i,R}^{*}}{\sqrt{\sum_{i \in M} w_i\, H_{i,L}\, H_{i,L}^{*}\;\sum_{i \in M} w_i\, H_{i,R}\, H_{i,R}^{*}}}$$
with $w_i$ a weighting factor to compensate for the non-equal spacing and/or other aspects, for example when certain HRTF pairs were measured at different distances than others.
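- A minimal sketch of this estimate (H_L and H_R are assumed to be (num_positions, num_bins) complex arrays for the subset M, and w the per-pair weights):

```python
import numpy as np

def hrtf_coherence(H_L, H_R, w):
    """Weighted inter-aural coherence estimate per frequency bin from an HRTF set."""
    num = np.sum(w[:, None] * H_L * np.conj(H_R), axis=0)
    den_l = np.sum(w[:, None] * (np.abs(H_L) ** 2), axis=0)
    den_r = np.sum(w[:, None] * (np.abs(H_R) ** 2), axis=0)
    rho = num / np.sqrt(den_l * den_r)
    return np.abs(rho)  # magnitude -> coherence; rho.real gives the correlation coefficient
```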
- for coloration and coherence, further processing may be applied, such as smoothing, dynamic range modification of the (coloration) spectrum, biasing of the inter-aural coherence towards lower or higher values, etc.
- such embodiments may analyze a BRIR or the late reverberation directly, rather than through the analysis of a set of HRTFs. In such cases it may be necessary to eliminate, or compensate for, any influence of room acoustics; this may be necessary especially for coloration analysis.
- b is a frequency band index (for example following the ERB scale) whose start bin is given by the function (or vector) s(b).
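- A minimal sketch of band-wise averaging using such a start-bin vector (the band edges s are an assumed input):

```python
import numpy as np

def band_average(spectrum, s):
    """Average a per-bin spectrum over bands; s[b] is the start bin of band b."""
    edges = np.append(s, len(spectrum))
    return np.array([spectrum[edges[b]:edges[b + 1]].mean()
                     for b in range(len(s))])
```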
- the invention can be implemented in any suitable form including hardware, software, firmware or any combination of these.
- the invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors.
- the elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units, circuits and processors.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Stereophonic System (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18182376.6A EP3595337A1 (fr) | 2018-07-09 | 2018-07-09 | Appareil audio et procédé de traitement audio |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18182376.6A EP3595337A1 (fr) | 2018-07-09 | 2018-07-09 | Appareil audio et procédé de traitement audio |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3595337A1 true EP3595337A1 (fr) | 2020-01-15 |
Family
ID=63077668
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP18182376.6A Withdrawn EP3595337A1 (fr) | 2018-07-09 | 2018-07-09 | Appareil audio et procédé de traitement audio |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP3595337A1 (fr) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100246832A1 (en) * | 2007-10-09 | 2010-09-30 | Koninklijke Philips Electronics N.V. | Method and apparatus for generating a binaural audio signal |
US20110286614A1 (en) * | 2010-05-18 | 2011-11-24 | Harman Becker Automotive Systems Gmbh | Individualization of sound signals |
US20130236040A1 (en) * | 2012-03-08 | 2013-09-12 | Disney Enterprises, Inc. | Augmented reality (ar) audio with position and action triggered virtual sound effects |
US20180124539A1 (en) * | 2013-01-15 | 2018-05-03 | Koninklijke Philips N.V. | Binaural audio processing |
WO2017178309A1 (fr) * | 2016-04-12 | 2017-10-19 | Koninklijke Philips N.V. | Traitement audio spatial mettant en évidence des sources sonores proches d'une distance focale |
WO2017203011A1 (fr) * | 2016-05-24 | 2017-11-30 | Stephen Malcolm Frederick Smyth | Systèmes et procédés pour améliorer des systèmes de virtualisation d'audio |
Non-Patent Citations (3)
Title |
---|
JOT, J. ET AL.: "Digital delay networks for designing artificial reverberations", 90TH AES CONVENTION, 1991 |
JOT, J.: "An analysis/synthesis approach to real-time artificial reverberation", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), vol. 2, 1992, pages 221-224, XP000356977, DOI: 10.1109/ICASSP.1992.226080 |
MENZER, F. ET AL.: "Binaural reverberation using a modified Jot reverberator with frequency dependent interaural coherence matching", 126TH AES CONVENTION, 2009 |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111641899A (zh) * | 2020-06-09 | 2020-09-08 | 京东方科技集团股份有限公司 | 虚拟环绕声发声电路、平面音源装置及平面显示设备 |
WO2021249172A1 (fr) * | 2020-06-09 | 2021-12-16 | 京东方科技集团股份有限公司 | Circuit de production de son d'ambiance virtuelle, appareil de source sonore plane et dispositif d'affichage à écran plat |
US12081954B2 (en) | 2020-06-09 | 2024-09-03 | Beijing Boe Optoelectronics Technology Co., Ltd. | Virtual surround sound production circuit, planar sound source apparatus, and flat panel display device |
EP3930349A1 (fr) * | 2020-06-22 | 2021-12-29 | Koninklijke Philips N.V. | Appareil et procédé pour générer un signal de réverbération diffus |
WO2021259829A1 (fr) * | 2020-06-22 | 2021-12-30 | Koninklijke Philips N.V. | Appareil et procédé pour générer un signal de réverbération diffus |
WO2022108494A1 (fr) * | 2020-11-17 | 2022-05-27 | Dirac Research Ab | Modélisation et/ou détermination améliorée(s) de réponses impulsionnelles de pièce binaurales pour des applications audio |
EP4152770A1 (fr) * | 2021-09-17 | 2023-03-22 | Nokia Technologies Oy | Procédé et appareil de gestion audio de communication dans un rendu de scène audio immersive |
WO2023238637A1 (fr) * | 2022-06-10 | 2023-12-14 | ソニーグループ株式会社 | Dispositif, procédé et programme de traitement d'informations |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11877135B2 (en) | Audio apparatus and method of audio processing for rendering audio elements of an audio scene | |
EP3095254B1 (fr) | Impression spatiale améliorée pour audio domestique | |
EP3595337A1 (fr) | Appareil audio et procédé de traitement audio | |
US11656839B2 (en) | Audio apparatus, audio distribution system and method of operation therefor | |
JP2022500917A (ja) | オーディオビジュアルデータを処理するための装置及び方法 | |
EP4229601A1 (fr) | Appareil de restitution audiovisuelle et son procédé de fonctionnement | |
RU2823573C1 (ru) | Аудиоустройство и способ обработки аудио | |
RU2815366C2 (ru) | Аудиоустройство и способ обработки аудио | |
RU2815621C1 (ru) | Аудиоустройство и способ обработки аудио | |
RU2798414C2 (ru) | Аудиоустройство и способ обработки аудио | |
EP4210353A1 (fr) | Appareil audio et son procédé de fonctionnement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: KONINKLIJKE PHILIPS N.V. |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20200716 |