GB2594942A - Capturing and enabling rendering of spatial audio signals - Google Patents


Info

Publication number
GB2594942A
Authority
GB
United Kingdom
Prior art keywords
spatial
sets
audio signals
metadata
capture devices
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB2006944.9A
Other versions
GB202006944D0 (en)
Inventor
Anssi Sakari Rämö
Lasse Juhani Laaksonen
Adriana Vasilache
Arto Juhani Lehtiniemi
Current Assignee
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to GB2006944.9A priority Critical patent/GB2594942A/en
Publication of GB202006944D0 publication Critical patent/GB202006944D0/en
Publication of GB2594942A publication Critical patent/GB2594942A/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008 Systems employing more than two channels, e.g. quadraphonic, in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/09 Electronic reduction of distortion of stereophonic sound systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15 Aspects of sound capture and related signal processing for recording or reproduction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Stereophonic System (AREA)

Abstract

Several different spatial audio capture devices, all aligned to reference coordinates, are used to obtain corresponding audio signals and sets of metadata 601. The sets of metadata are processed into a high-quality set of metadata 613 that may be used for rendering the spatial audio signals captured by the devices. The devices may capture audio signals from the same sound space (e.g. the devices may be in the same environment). The processing may involve determining a reliable set of metadata (possibly one of the captured audio signal sets, 603) and using that set to filter 611 the other sets of spatial metadata. The sets of metadata may have variable reliability, and the high-quality set of metadata may have higher reliability than the original sets of metadata. During processing the metadata may be aligned 609 to a common direction of reference, possibly by correlating 605 direction of arrival (DOA) parameters.

Description

TITLE
Capturing and Enabling Rendering of Spatial Audio Signals
TECHNOLOGICAL FIELD
Examples of the disclosure relate to capturing and enabling rendering of spatial audio signals. In particular, they relate to capturing and enabling rendering of spatial audio signals using spatial metadata.
BACKGROUND
Spatial audio capture devices can be used to capture spatial audio signals. The spatial audio signal can comprise a representation of a sound space. The spatial audio signal can then be rendered by an audio rendering device such as headphones or loudspeakers. It can be difficult to obtain high quality spatial audio, as factors such as low signal-to-noise ratio, relatively large distances to an audio source, and shadowing of microphones can reduce the sound quality.
BRIEF SUMMARY
According to various, but not necessarily all, examples of the disclosure there is provided an apparatus comprising means for: obtaining a plurality of sets of audio signals where different sets of audio signals are captured by a plurality of different spatial audio capture devices wherein the plurality of different spatial audio capture devices are aligned to a set of reference coordinates; obtaining a plurality of sets of spatial metadata corresponding to the different sets of audio signals; processing the plurality of sets of spatial metadata into a high-quality set of spatial metadata; and enabling the high-quality set of spatial metadata to be used for rendering the spatial audio signals captured by the plurality of different spatial audio capture devices.
The plurality of different spatial audio capture devices may be aligned to a timeline.
The plurality of sets of audio signals captured by the plurality of different spatial audio capture devices may represent the same sound space.
The plurality of sets of spatial metadata may be obtained by processing the plurality of sets of audio signals captured by the plurality of different spatial audio capture devices.
Processing the plurality of sets of spatial metadata may comprise determining a reliable set of spatial metadata and filtering the other sets of spatial metadata with the reliable set of spatial metadata.
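This filtering step might be pictured as a per-band gate, where a device's direction estimate is kept only when it broadly agrees with the reliable set. The following is a minimal sketch; the function name, degree units and 30-degree threshold are illustrative assumptions, not the patent's method:

```python
def filter_with_reliable(reliable_dirs, other_dirs, max_dev_deg=30.0):
    """Keep a device's per-band direction estimate only if it lies
    within max_dev_deg of the reliable set's estimate; otherwise
    fall back to the reliable value. Angles are in degrees."""
    filtered = []
    for ref, est in zip(reliable_dirs, other_dirs):
        # smallest circular difference between the two angles
        diff = abs((est - ref + 180.0) % 360.0 - 180.0)
        filtered.append(est if diff <= max_dev_deg else ref)
    return filtered

reliable = [10.0, 95.0, 200.0]   # per-band directions from the reliable set
noisy = [14.0, 170.0, 198.0]     # another device's estimates
print(filter_with_reliable(reliable, noisy))  # [14.0, 95.0, 198.0]
```

Other filtering strategies, such as reliability-weighted averaging or temporal smoothing, would fit the same interface.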
Determining a reliable set of spatial metadata may comprise selecting a master set of audio signals from one of the plurality of spatial audio capture devices and using the set of spatial metadata from the spatial audio capture device from which the master set of audio signals is obtained as the reliable set of spatial metadata.
The reliability of the sets of spatial metadata may change over time.
Processing the plurality of sets of spatial metadata may comprise aligning them to a common direction frame of reference.
Aligning the plurality of sets of spatial metadata to a common direction frame of reference may comprise correlating the directions of arrival of sound parameters within the plurality of sets of spatial metadata.
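One way such a correlation-based alignment could work, sketched under assumptions (the brute-force search and all names below are illustrative, not the patent's algorithm), is to find the rotation offset that best matches two direction-of-arrival tracks:

```python
def estimate_offset(doa_ref, doa_other, step=1.0):
    """Brute-force the rotation offset (in degrees) that minimises the
    total circular difference between a reference device's DOA track
    and another device's track over matching time/frequency tiles."""
    def circ_diff(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)
    best_offset, best_cost = 0.0, float("inf")
    offset = 0.0
    while offset < 360.0:
        cost = sum(circ_diff(r, o + offset) for r, o in zip(doa_ref, doa_other))
        if cost < best_cost:
            best_offset, best_cost = offset, cost
        offset += step
    return best_offset

ref = [0.0, 45.0, 90.0, 135.0]       # reference DOA track
other = [330.0, 15.0, 60.0, 105.0]   # same scene seen by a device rotated 30 degrees
print(estimate_offset(ref, other))   # 30.0
```

Once the offset is known, adding it to the second device's direction parameters places both sets in the common direction frame of reference.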
The high-quality set of spatial metadata may have a higher reliability than the sets of spatial metadata corresponding to the different spatial audio signals.
The spatial audio signals that are used for rendering may be selected from the plurality of sets of spatial audio signals captured by the plurality of different spatial audio capture devices.
The audio signals that are used for rendering may be obtained by processing one or more of the captured audio signals from the plurality of audio devices.
The apparatus may comprise means for enabling the high-quality spatial metadata and the audio signals to be transmitted to another apparatus.
The apparatus may comprise means for obtaining one or more audio signals from one or more microphones in addition to the audio signals captured by the plurality of spatial audio capture devices.
The apparatus may be comprised within at least one of the spatial audio capture devices.
The apparatus may be comprised within a processing device separate to the plurality of spatial audio capture devices.
The spatial metadata may comprise, for one or more frequency sub-bands: a sound direction parameter and an energy ratio parameter.
According to various, but not necessarily all, examples of the disclosure there is provided an apparatus comprising: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: obtaining a plurality of sets of audio signals where different sets of audio signals are captured by a plurality of different spatial audio capture devices wherein the plurality of different spatial audio capture devices are aligned to a set of reference coordinates; obtaining a plurality of sets of spatial metadata corresponding to the different sets of audio signals; processing the plurality of sets of spatial metadata into a high-quality set of spatial metadata; and enabling the high-quality set of spatial metadata to be used for rendering the spatial audio signals captured by the plurality of different spatial audio capture devices.
According to various, but not necessarily all, examples of the disclosure there is provided a method comprising: obtaining a plurality of sets of audio signals where different sets of audio signals are captured by a plurality of different spatial audio capture devices wherein the plurality of different spatial audio capture devices are aligned to a set of reference coordinates; obtaining a plurality of sets of spatial metadata corresponding to the different sets of audio signals; processing the plurality of sets of spatial metadata into a high-quality set of spatial metadata; and enabling the high-quality set of spatial metadata to be used for rendering the spatial audio signals captured by the plurality of different spatial audio capture devices.
In some methods the plurality of different spatial audio capture devices may be aligned to a timeline.
According to various, but not necessarily all, examples of the disclosure there is provided a computer program comprising computer program instructions that, when executed by processing circuitry, cause: obtaining a plurality of sets of audio signals where different sets of audio signals are captured by a plurality of different spatial audio capture devices wherein the plurality of different spatial audio capture devices are aligned to a set of reference coordinates; obtaining a plurality of sets of spatial metadata corresponding to the different sets of audio signals; processing the plurality of sets of spatial metadata into a high-quality set of spatial metadata; and enabling the high-quality set of spatial metadata to be used for rendering the spatial audio signals captured by the plurality of different spatial audio capture devices.
BRIEF DESCRIPTION
Some examples will now be described with reference to the accompanying drawings, in which:
Fig. 1 shows an apparatus;
Fig. 2 shows a system;
Fig. 3 shows an example arrangement of spatial audio capture devices;
Fig. 4 shows an example method;
Fig. 5 shows an example of the disclosure; and
Fig. 6 shows an example method.
DEFINITIONS
"sound space" refers to an arrangement of one or more sound sources in a three-dimensional space. A sound space may be defined in relation to recording sounds (a recorded sound space) and in relation to rendering sounds (a rendered sound space).
"sound scene" refers to a representation of the sound space listened to from a particular point of view (position) within the sound space.
DETAILED DESCRIPTION
The apparatus 101 and systems 201 according to examples of the disclosure enable the accuracy and the stability of the spatial metadata to be improved by processing a plurality of sets of spatial metadata from a plurality of spatial audio capture devices 203. This provides for improved reproduction of the sound space by playback devices that receive the spatial audio signals and the high-quality spatial metadata.
The figures illustrate an apparatus 101 comprising means for obtaining 401 a plurality of sets of audio signals. Different sets of the audio signals are captured by a plurality of different spatial audio capture devices 203. The plurality of different spatial audio capture devices 203 are aligned to a set of reference coordinates. The apparatus 101 also comprises means for obtaining 403 a plurality of sets of spatial metadata corresponding to the different spatial audio signals. A set of spatial metadata can be obtained for each of the spatial audio capture devices 203 that capture a set of audio signals. The apparatus 101 also comprises means for processing 405 the plurality of sets of spatial metadata into a high-quality set of spatial metadata and enabling 407 the high-quality set of spatial metadata to be used for rendering the spatial audio signals captured by the plurality of different spatial audio capture devices 203.
Enabling 407 the high-quality set of spatial metadata to be used for rendering spatial audio signals can comprise sending the high-quality set of spatial metadata, with the spatial audio signals, to another device so that the spatial audio can be rendered by that device. The high-quality spatial metadata is high-quality as compared to the spatial metadata obtained from the plurality of spatial audio capture devices.
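The obtain/process/enable flow of blocks 401 to 407 might be outlined as follows. This is a hypothetical sketch: the dictionary field names and the reliability-weighted averaging scheme are assumptions for illustration, not the patent's defined processing:

```python
def process_metadata_sets(metadata_sets, reliabilities):
    """Pick the most reliable device's set as the reference for the
    direction parameters, and form the high-quality energy ratios as a
    reliability-weighted average across all devices."""
    master = max(range(len(metadata_sets)), key=lambda i: reliabilities[i])
    total_w = sum(reliabilities)
    high_quality = []
    for band in range(len(metadata_sets[master])):
        ratio = sum(w * s[band]["energy_ratio"]
                    for w, s in zip(reliabilities, metadata_sets)) / total_w
        high_quality.append({
            "direction": metadata_sets[master][band]["direction"],
            "energy_ratio": ratio,
        })
    return high_quality

sets = [
    [{"direction": 30.0, "energy_ratio": 0.8}],   # device A, one sub-band
    [{"direction": 35.0, "energy_ratio": 0.4}],   # device B
]
print(process_metadata_sets(sets, [0.9, 0.1]))
```

The resulting high-quality set would then be packaged with the selected audio signals for transmission to a playback device.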
Fig. 1 schematically illustrates an apparatus 101 according to examples of the disclosure. The apparatus 101 illustrated in Fig. 1 may be a chip or a chip-set. In some examples the apparatus 101 may be provided within devices such as a processing device. In some examples the apparatus 101 may be provided within an audio capture device or an audio rendering device.
In the example of Fig. 1 the apparatus 101 comprises a controller 103. In the example of Fig. 1 the controller 103 may be implemented as controller circuitry. In some examples the controller 103 may be implemented in hardware alone, in software (including firmware) alone, or as a combination of hardware and software (including firmware).
As illustrated in Fig. 1 the controller 103 may be implemented using instructions that enable hardware functionality, for example, by using executable instructions of a computer program 109 in a general-purpose or special-purpose processor 105 that may be stored on a computer readable storage medium (disk, memory etc) to be executed by such a processor 105.
The processor 105 is configured to read from and write to the memory 107. The processor 105 may also comprise an output interface via which data and/or commands are output by the processor 105 and an input interface via which data and/or commands are input to the processor 105.
The memory 107 is configured to store a computer program 109 comprising computer program instructions (computer program code 111) that controls the operation of the apparatus 101 when loaded into the processor 105. The computer program instructions, of the computer program 109, provide the logic and routines that enable the apparatus 101 to perform the methods illustrated in Figs. 4 and 6. By reading the memory 107 the processor 105 is able to load and execute the computer program 109.
The apparatus 101 therefore comprises: at least one processor 105; and at least one memory 107 including computer program code 111, the at least one memory 107 and the computer program code 111 configured to, with the at least one processor 105, cause the apparatus 101 at least to perform: obtaining 401 a plurality of sets of audio signals where different sets of audio signals are captured by a plurality of different spatial audio capture devices 203 wherein the plurality of different spatial audio capture devices 203 are aligned to a set of reference coordinates; obtaining 403 a plurality of sets of spatial metadata corresponding to the different spatial audio signals; processing 405 the plurality of sets of spatial metadata into a high-quality set of spatial metadata; and enabling 407 the high-quality set of spatial metadata to be used for rendering the spatial audio signals captured by the plurality of different spatial audio capture devices 203.
As illustrated in Fig. 1 the computer program 109 may arrive at the apparatus 101 via any suitable delivery mechanism 113. The delivery mechanism 113 may be, for example, a machine readable medium, a computer-readable medium, a non-transitory computer-readable storage medium, a computer program product, a memory device, a record medium such as a Compact Disc Read-Only Memory (CD-ROM) or a Digital Versatile Disc (DVD) or a solid state memory, or an article of manufacture that comprises or tangibly embodies the computer program 109. The delivery mechanism may be a signal configured to reliably transfer the computer program 109. The apparatus 101 may propagate or transmit the computer program 109 as a computer data signal. In some examples the computer program 109 may be transmitted to the apparatus 101 using a wireless protocol such as Bluetooth, Bluetooth Low Energy, Bluetooth Smart, 6LoWPan (IPv6 over low power personal area networks), ZigBee, ANT+, near field communication (NFC), radio frequency identification (RFID), wireless local area network (wireless LAN) or any other suitable protocol.
The computer program 109 comprises computer program instructions for causing an apparatus 101 to perform at least the following: obtaining 401 a plurality of sets of audio signals where different sets of audio signals are captured by a plurality of different spatial audio capture devices 203 wherein the plurality of different spatial audio capture devices 203 are aligned to a set of reference coordinates; obtaining 403 a plurality of sets of spatial metadata corresponding to the different spatial audio signals; processing 405 the plurality of sets of spatial metadata into a high-quality set of spatial metadata; and enabling 407 the high-quality set of spatial metadata to be used for rendering the spatial audio signals captured by the plurality of different spatial audio capture devices 203.
The computer program instructions may be comprised in a computer program 109, a non-transitory computer readable medium, a computer program product, a machine readable medium. In some but not necessarily all examples, the computer program instructions may be distributed over more than one computer program 109.
Although the memory 107 is illustrated as a single component/circuitry it may be implemented as one or more separate components/circuitry some or all of which may be integrated/removable and/or may provide permanent/semi-permanent/ dynamic/cached storage.
Although the processor 105 is illustrated as a single component/circuitry it may be implemented as one or more separate components/circuitry some or all of which may be integrated/removable. The processor 105 may be a single core or multi-core processor.
References to "computer-readable storage medium", "computer program product", "tangibly embodied computer program" etc. or a "controller", "computer", "processor" etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential (Von Neumann)/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other processing circuitry. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device, whether instructions for a processor or configuration settings for a fixed-function device, gate array or programmable logic device etc.
As used in this application, the term "circuitry" may refer to one or more or all of the following:
(a) hardware-only circuitry implementations (such as implementations in only analog and/or digital circuitry);
(b) combinations of hardware circuits and software, such as (as applicable):
(i) a combination of analog and/or digital hardware circuit(s) with software/firmware, and
(ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions; and
(c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g. firmware) for operation, but the software may not be present when it is not needed for operation.
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.
The blocks illustrated in Figs. 4 and 6 can represent steps in a method and/or sections of code in the computer program 109. The illustration of a particular order to the blocks does not necessarily imply that there is a required or preferred order for the blocks, and the order and arrangement of the blocks may be varied. Furthermore, it can be possible for some blocks to be omitted.
Fig. 2 schematically shows a system 201 that can be used to implement examples of the disclosure.
The system 201 comprises a plurality of spatial audio capture devices 203 that are configured to capture a sound space 211, a central processing device 213 and a playback device 209. The devices 203, 213, 209 can be provided at different locations. For example, the plurality of spatial audio capture devices 203 can be provided at different locations to the playback device 209 and/or the central processing device 213.
It is to be appreciated that only features necessary for the understanding of the following description have been included in Fig. 2 and that the system 201 could comprise other components not shown in Fig. 2.
The spatial audio capture devices 203 can comprise any suitable devices that can be configured to capture spatial audio signals. Each of the spatial audio capture devices 203 can comprise two or more microphones that can be configured to capture spatial audio. The microphones can be spatially distributed within the spatial audio capture devices 203 so as to enable the spatial audio to be captured.
The spatial audio capture devices 203 can comprise portable user devices such as mobile phones, laptops or other similar devices. In some examples the spatial audio capture devices 203 could comprise teleconferencing devices or other similar devices. In some examples the spatial audio capture devices 203 can comprise dedicated spatial capture devices such as one or more microphone arrays. The microphone arrays can be configured with direct network connectivity or can be connected to the network 207 via a host device such as a laptop or mobile phone. In some examples the spatial audio capture microphones can be provided as an accessory for a device such as a mobile phone or a computer.
The spatial audio capture devices 203 can operate independently of each other. The spatial audio capture devices 203 can capture their own audio signals independently of the other spatial audio capture devices 203. The independently captured audio signals can then be transmitted to a central processing device 213 by any suitable means. In the example shown in Fig. 2 the system 201 comprises three spatial audio capture devices 203. It is to be appreciated that any number of spatial audio capture devices 203 can be provided in other examples of the disclosure.
The spatial audio capture devices 203 are configured to capture a sound space 211.
The sound space 211 comprises one or more sound sources 205. In the example shown in Fig. 2 the sound source 205 comprises a person who can be talking or making any other sounds. In the example shown in Fig. 2 only one sound source 205 is provided in the sound space 211. It is to be appreciated that any number of sound sources 205 can be provided in a sound space 211.
The sound sources 205 can be fixed in position relative to the spatial audio capture devices 203 or can be moved relative to the spatial audio capture devices 203. For example some devices such as teleconferencing devices can be in a fixed position whereas devices such as mobile phones can be picked up and handled by a user while simultaneously capturing the spatial audio signals.
As the spatial audio capture devices 203 are located at different positions relative to the sound sources 205 they capture different sound scenes. That is, the different spatial audio capture devices 203 capture the sound space 211 from slightly different points of view.
Each of the spatial audio capture devices 203 can also be configured to enable spatial metadata to be obtained. The spatial metadata corresponds to the captured spatial audio signals in that it comprises information that indicates how to spatially reproduce the audio signals captured by the spatial audio capture devices 203. The spatial metadata can comprise information such as the direction of arrival of audio, distances to an audio source, direct-to-total energy ratios, diffuse-to-total energy ratio or any other suitable information. The spatial metadata can be provided in frequency bands.
In some examples the spatial metadata can comprise, for one or more frequency sub-bands; a sound direction parameter, and an energy ratio parameter.
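Such per-band metadata could be pictured with a simple record type. The field names below are illustrative assumptions, not terms defined in the patent:

```python
from dataclasses import dataclass

@dataclass
class BandMetadata:
    """Spatial metadata for one frequency sub-band (field names are
    illustrative assumptions, not taken from the patent)."""
    azimuth_deg: float        # sound direction parameter
    elevation_deg: float      # optional second direction component
    direct_to_total: float    # energy ratio parameter, in [0, 1]

# one frame of metadata: a list with one entry per sub-band
frame = [BandMetadata(30.0, 0.0, 0.8), BandMetadata(-45.0, 10.0, 0.3)]
print(frame[0].direct_to_total)  # 0.8
```

A full metadata set would then be a sequence of such frames, one per time interval, for each capture device.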
In some examples a set of spatial metadata can be obtained for the audio signals captured by each of the spatial audio capture devices 203. In some examples each of the spatial audio capture devices 203 can be configured to process the captured audio signals to obtain a corresponding set of spatial metadata. In other examples the spatial audio capture devices 203 can be configured to transmit the audio signals to a central processing device 213 and the central processing device 213 can be configured to determine a set of spatial metadata for each of the spatial audio capture devices 203.
The spatial audio capture devices 203 within the sound space can be configured to communicate with each other. In some examples the spatial audio capture devices 203 can communicate via a wireless protocol such as Bluetooth, Bluetooth Low Energy, Bluetooth Smart, 6LoWPan (IPv6 over low power personal area networks), ZigBee, ANT+, near field communication (NFC), radio frequency identification (RFID), wireless local area network (wireless LAN), a 5G network or any other suitable protocol. In some examples one or more wired connections can be provided between two or more of the spatial audio capture devices 203.
The spatial audio capture devices 203 can be configured to exchange the spatial audio signals and corresponding sets of spatial metadata. The spatial audio capture devices 203 can be configured to provide the spatial audio signals and the corresponding sets of spatial metadata to a central processing device 213.
In the example shown in Fig. 2 the central processing device 213 is provided within the network 207. The central processing device 213 can be a server or any other suitable processing devices. The central processing device 213 can be located in a different location to the plurality of spatial audio capture devices 203. The spatial audio capture devices 203 can send the spatial audio signals and corresponding sets of spatial metadata to the central processing device 213 via a communications network 207. The communications network 207 can be a long-range communication network such as a cellular communication network or any suitable network or combinations of networks.
The central processing device 213 can be configured to perform methods such as the methods shown in Figs. 4 and 6 to enable a high-quality set of spatial metadata to be obtained. The spatial audio signals and the high-quality set of spatial metadata can then be transmitted over a communications network 207 to one or more playback devices 209.
The audio signals and high-quality set of spatial metadata are sent to one or more playback devices 209. The playback devices 209 comprise any means for rendering the received audio signals and using the high-quality spatial metadata to enable the spatial properties of the sound scene to be reproduced for the users of the playback devices 209. The playback devices 209 can be portable user devices such as laptops, mobile phones or conferencing devices, virtual reality headsets or any other suitable type of devices. In the example shown in Fig. 2 only one playback device 209 is shown; however, it is to be appreciated that the audio signals and high-quality set of spatial metadata can be sent to any number of playback devices 209. Where the system 201 comprises a plurality of playback devices 209, they can comprise different types of playback devices 209 and they can be in different locations to each other, so that users of the playback devices do not need to be in the same room.
In some examples the playback devices 209 can be configured to provide mediated reality content to a user. The mediated reality content can comprise augmented or virtual reality content. The received audio signals and the high-quality spatial metadata can be used by the playback devices 209 to provide the augmented or virtual reality content.
In the example system shown in Fig. 2 all of the audio capture devices 203 are configured to capture spatial audio signals. In some examples one or more audio capture devices can be provided that are not configured to capture spatial audio. For example, a high-quality microphone could be provided to capture a high-quality audio signal that can then be used to improve qualities such as signal-to-noise ratio in the played-back audio signals.
In the example system shown in Fig. 2 the central processing device 213 is provided in a different location to the sound space 211. In other examples the central processing device 213 can be one of the spatial audio capture devices 203 that is located within the sound space 211. In such examples one of the spatial audio capture devices 203 can be configured to act as a master audio capture device and the other spatial audio capture devices 203 can be configured to transmit the captured audio signals and corresponding spatial metadata to the master audio capture device.
In the example shown in Fig. 2 the system is used for conferencing or other chat applications. In this example the audio signals can be transmitted to the playback device 209 and rendered in real time, without permanent storage of the audio signals or spatial metadata. In other examples the audio signals and spatial metadata can be captured and stored in one of the spatial audio capture devices 203 or in a memory of a separate storage device. The audio signals and sets of spatial metadata can then be retrieved by one or more playback devices 209 for rendering and playback at a later time as needed.
Fig. 3 shows an example arrangement of a plurality of spatial audio capture devices 203 that are being used to capture a sound space 211 according to examples of the disclosure. In this example the spatial audio capture devices 203 are being used to capture a teleconference.
In the example of Fig. 3 five spatial audio capture devices 203 are located within the sound space 211. In the example of Fig. 3 the spatial audio capture devices 203 comprise two laptops 301 and three mobile phones 303. Each of the mobile phones 303 and laptops 301 comprises a plurality of spatially distributed microphones that enables it to be used to capture spatial audio signals.
In the example of Fig. 3 the spatial audio capture devices 203 are located near to each other. This can enable the spatial audio capture devices 203 to transfer data between each other using low power wireless networks or any other suitable means. This also enables the spatial audio capture devices 203 to capture the same sound space 211.
In the example of Fig. 3 each of the spatial audio capture devices 203 is provided on the surface of a table 305; however, it is to be appreciated that each of the spatial audio capture devices 203 is portable and so can be picked up and otherwise moved by the users within the sound space 211.
It is to be appreciated that in other examples of the disclosure the arrangement could comprise other devices. For instance, one or more dedicated teleconferencing devices could be provided. Alternatively, in some examples one or more of the mobile phones could be positioned in a stand instead of on the surface of the table 305 so as to enable the mobile phone to be used to capture higher quality audio signals.
In the example of Fig. 3 three people 307 are within the sound space. Each of the people 307 provides a sound source 205 within the sound space 211. For example, the people can talk or generate other sounds that can be captured by the plurality of spatial audio capture devices 203.
Fig. 4 shows an example method that can be implemented using an apparatus 101 as shown in Fig. 1. In some examples the apparatus 101 that performs the method can be provided within a central processing device 213. The central processing device 213 could be provided within a network 207 that enables communication between the spatial audio capture devices 203 and one or more playback devices 209. In other examples the apparatus 101 that performs the method can be provided within one of the spatial audio capture devices 203. In such examples one of the spatial audio capture devices 203 can be configured to act as the central processing device 213.
The method of Fig. 4 comprises, at block 401, obtaining a plurality of sets of audio signals where different sets of audio signals are captured by a plurality of different spatial audio capture devices 203. The plurality of different spatial audio capture devices 203 are aligned to a set of reference coordinates and a timeline.
In examples of the disclosure the plurality of spatial audio capture devices 203 are located within the same sound space 211 so that the plurality of sets of audio signals captured by the plurality of different spatial audio capture devices 203 represent the same sound space 211. However, as the different spatial audio capture devices 203 are at different positions and/or orientations within the sound space 211 they capture slightly different sound scenes within the sound space 211. This means that the directions between the sound sources 205 and the spatial audio capture devices 203 will be different for each of the spatial audio capture devices 203.
In some examples the apparatus 101 can be configured to obtain the plurality of sets of audio signals from the plurality of different spatial audio capture devices 203 via a low power radio frequency network or any other suitable wired or wireless connections. For instance, where the apparatus 101 is provided within one of the spatial audio capture devices 203 then the apparatus 101 can control the spatial audio capture device 203 to act as a master device and obtain the audio signals from the other spatial audio capture devices 203 in the sound space 211. In other examples the apparatus 101 can be provided within a central processing device 213 that is not located within the sound space 211. In such examples, the audio signals can be provided to the apparatus 101 via a communications network 207 or any other suitable means.
The apparatus 101 can be configured to receive the plurality of sets of audio signals with a high bit rate and accuracy. In some examples the apparatus 101 can receive the plurality of sets of audio signals using an IVAS (Immersive Voice and Audio Services) codec at high bit rate.
The plurality of different spatial audio capture devices 203 are aligned to the same set of reference coordinates and timeline so as to ensure that the plurality of sets of audio signals, and the corresponding sets of spatial metadata, are in the same domain. This enables spatial parameter filtering.
At block 403 the method comprises obtaining a plurality of sets of spatial metadata corresponding to the different spatial audio capture devices. A set of spatial metadata corresponds to a set of spatial audio signals in that the spatial metadata provides information that indicates how to spatially reproduce the audio signals. A set of spatial metadata can be obtained by each of the plurality of spatial audio capture devices 203 that captures a set of audio signals.
The plurality of sets of spatial metadata can be obtained by processing the plurality of sets of audio signals captured by the plurality of different spatial audio capture devices 203 to determine parameters such as a sound direction parameter and an energy ratio parameter for respective frequency bands.
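As an illustration, such per-band parameters can be represented by a simple data structure; the field names, value ranges and tile-grid dimensions below are illustrative assumptions loosely modelled on a MASA-style layout, not any exact format:

```python
from dataclasses import dataclass

@dataclass
class SpatialMetadataTile:
    """Spatial metadata for one time-frequency tile (illustrative)."""
    azimuth_deg: float    # direction of arrival, -180..180 degrees
    elevation_deg: float  # direction of arrival, -90..90 degrees
    energy_ratio: float   # direct-to-total energy ratio, 0..1

def empty_metadata_set(n_subframes: int = 4, n_bands: int = 24):
    """One device's spatial metadata for a frame: a grid of tiles."""
    return [[SpatialMetadataTile(0.0, 0.0, 0.0) for _ in range(n_bands)]
            for _ in range(n_subframes)]
```

A set of spatial metadata for one capture device is then one such grid per time frame, and the plurality of sets discussed in the method is a collection of these grids, one per device.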
In some examples the spatial audio capture device 203 can process the audio signals to determine the set of spatial metadata before it is provided to the central processing device 213. In such examples the spatial metadata can be provided to the central processing device 213 with the set of audio signals. The spatial metadata can be sent using the IVAS MASA (Metadata Assisted Spatial Audio) spatial metadata standard interface format or any other suitable protocol.
In other examples the central processing device 213 can perform the processing to determine the sets of spatial metadata from the obtained sets of audio signals. In such examples the central processing device can receive the sets of audio signals from the spatial audio capture devices 203 and can then process the audio signals to obtain the spatial metadata.
At block 405 the method comprises processing the plurality of sets of spatial metadata into a high-quality set of spatial metadata. The processing can combine the plurality of sets of spatial metadata into a single set that has a higher quality than the original obtained sets of spatial metadata. For example, the single set of spatial metadata can have a higher reliability.
The high-quality set of spatial metadata has a quality of data that is sufficient to provide good and stable spatial audio in which the sound sources remain in the correct position instead of fluctuating between positions as can be the case with low-quality spatial metadata or the original sets of spatial metadata. The high-quality spatial metadata also ensures that the sound sources appear as point sources and do not spread unnecessarily wide.
In some examples the high-quality spatial metadata can also have less noise than the original sets of spatial metadata. Having a reduced noise level can make the parameterization of the spatial metadata easier and provide more stable sound scenes.
Any suitable processes can be used to process the plurality of sets of spatial metadata into a high-quality set of spatial metadata. In some examples processing the plurality of sets of spatial metadata can comprise determining a reliable set of spatial metadata from the obtained sets of spatial metadata and then filtering the other sets of spatial metadata with the reliable set of spatial metadata. This enables a more stable and reliable set of spatial metadata to be obtained.
Any suitable process can be used to determine the reliable set of spatial metadata. In some examples the process can determine the most reliable set of spatial metadata from the available sets of spatial metadata. In some examples the most reliable set of spatial metadata can be determined by determining a master set of audio signals and using the set of spatial metadata from the spatial audio capture device from which the master set of audio signals is obtained as the reliable set of spatial metadata. The master set of audio signals can be the audio signals that are determined to have the highest quality, the audio signals captured by the spatial audio capture device 203 closest to the sound source 205 or any other suitable set of audio signals.
The most reliable set of spatial metadata can change over time. For instance, the spatial audio capture devices 203 could be moved relative to the sound sources 205 which can affect the relative qualities of the spatial audio signals and the corresponding spatial metadata. In some examples the users of the spatial audio capture devices 203 could pick up and handle one or more of the spatial audio capture devices 203.
For instance, a user could pick up their mobile phone 303 which is being used as a spatial audio capture device 203. In such examples different sets of spatial metadata can be used as the most reliable set of spatial metadata for different periods of time.
In some examples processing the plurality of sets of spatial metadata comprises aligning them to a common direction frame of reference. For instance, if the spatial audio capture devices 203 are aligned in different orientations relative to the sound sources 205 then they can be aligned to correct for this. Aligning the plurality of sets of spatial metadata to a common direction frame of reference can comprise correlating the directions of arrival of sound parameters within the plurality of sets of spatial metadata.
In some examples positioning information from one or more of the spatial audio capture devices 203 can be used to assist with the alignment of the common frames of reference. For example, devices such as mobile phones 303 can comprise orientation sensors such as accelerometers that can be used to determine the orientation of the spatial audio capture devices 203. Sensors such as accelerometers could also be used to determine if a spatial audio capture device 203 is moving; for example, they can detect a change in motion caused by a user picking a mobile phone 303 or a laptop 301 up from a table.
At block 407 the method comprises enabling the high-quality set of spatial metadata to be used for rendering the spatial audio signals captured by the plurality of different spatial audio capture devices 203. Enabling the high-quality set of spatial metadata to be used for rendering the spatial audio signals can comprise sending the high-quality set of spatial metadata and the spatial audio signals to one or more playback devices 209. The high-quality set of spatial metadata can be quantized and encoded with the spatial audio signals and transmitted via any suitable communication network 207. In some examples the encoded high-quality set of spatial metadata and the spatial audio signals can be stored in one or more memories and then retrieved for playback at a later time. The encoded bitstream can have a higher quality and/or a reduced bit rate because the high-quality spatial metadata has improved quality compared to the plurality of sets of spatial metadata.
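As a rough illustration of the quantization step mentioned above, a uniform scalar quantizer for azimuth values could look as follows; this is a hedged sketch for illustration only, not the quantization scheme actually used by IVAS:

```python
def quantize_azimuth(az_deg: float, bits: int = 8):
    """Uniformly quantize an azimuth value to `bits` bits.

    Returns the codebook index and the reconstructed azimuth in
    degrees. An illustrative uniform scalar quantizer; real codecs
    typically use non-uniform or jointly optimized quantizers.
    """
    levels = 1 << bits                 # number of codebook entries
    step = 360.0 / levels              # quantization step in degrees
    # wrap azimuth into [0, 360) before rounding to the nearest level
    idx = int(round(((az_deg + 180.0) % 360.0) / step)) % levels
    return idx, idx * step - 180.0
```

With 8 bits the step is about 1.4 degrees, so the reconstruction error per tile is at most about 0.7 degrees.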
In some examples the spatial audio signal that is used for rendering can be obtained by processing or combining the plurality of sets of spatial audio signals obtained by the plurality of spatial audio capture devices 203. For example, beam forming or other suitable processing can be applied to the sets of spatial audio signals. In some examples the spatial audio signal that is used for rendering could be a selected one of the sets of spatial audio signals. For instance, the set of spatial audio signals with the highest signal to noise ratio could be selected and used as the spatial audio signals.
Fig. 5 shows modules that can be provided within apparatus 101 implementing examples of the disclosure. In this example the apparatus 101 can be implemented within a central processing device 213. The central processing device 213 can be a dedicated processing device such as a teleconferencing server or other suitable device. In some examples the central processing device 213 can be located in the same location as the plurality of spatial audio capture devices 203. In some examples the central processing device 213 can be in a different location and can be configured to communicate with the plurality of spatial audio capture devices and the playback devices 209 via a communications network 207. It is to be appreciated that the apparatus 101 and the corresponding modules could be implemented in other devices in other examples of the disclosure.
In the example of Fig. 5 a plurality of spatial audio capture devices 203 are provided.
The spatial audio capture devices 203 are located close to each other so that they capture the same sound space 211. For example, each of the spatial audio capture devices 203 can be located within the same room.
Each of the spatial audio capture devices 203 captures a set of spatial audio signals and obtains a corresponding set of spatial metadata.
The set of audio signals and corresponding set of spatial metadata are then encoded into an IVAS bitstream 501 and transmitted to the central processing device 213.
The plurality of IVAS bitstreams 501 are received by an IVAS decoder module 503. The IVAS decoder module 503 decodes the received plurality of IVAS bitstreams 501 into the separate sets of audio signals 505 and sets of spatial metadata 507.
The separate sets of audio signals 505 are provided as an input to an audio processing module 509. The audio processing module 509 can be configured to process the separate sets of audio signals 505 to provide an improved or optimised set of audio signals 511. In some examples the audio processing module 509 processes the separate sets of audio signals 505 by determining the set of audio signals with the highest quality and then selecting this as the optimised set of audio signals. The set of audio signals with the highest quality could be the set with the highest signal to noise ratio, or any other suitable parameter.
In some examples the audio processing module 509 can be configured to process the plurality of sets of audio signals 505 into a combined set of audio signals. For example, the audio processing module 509 could perform beam forming or any other suitable process to form a single improved or optimal set of audio signals 511.
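One way the highest-quality selection described above could be sketched is with a crude frame-energy SNR heuristic; the percentile-based noise-floor estimate here is an illustrative assumption, not the audio processing module's actual quality measure:

```python
import numpy as np

def estimate_snr_db(x, frame: int = 1024) -> float:
    """Crude SNR estimate for one set of audio signals.

    Mean frame energy over a noise floor taken as the 10th percentile
    of frame energies. An illustrative heuristic only.
    """
    x = np.asarray(x, dtype=float).ravel()
    frame = max(1, min(frame, len(x)))
    n = (len(x) // frame) * frame
    e = (x[:n].reshape(-1, frame) ** 2).mean(axis=1)  # per-frame energy
    noise = np.percentile(e, 10) + 1e-12              # assumed noise floor
    return 10.0 * np.log10(e.mean() / noise + 1e-12)

def select_best_set(audio_sets) -> int:
    """Index of the set of audio signals with the highest estimated SNR."""
    return int(np.argmax([estimate_snr_db(a) for a in audio_sets]))
```

A signal with quiet passages and distinct speech bursts scores higher than one dominated by steady background noise, which matches the intuition of selecting the capture device with the cleanest pickup.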
The improved or optimal set of audio signals 511 is then provided to an encoder module 521. The encoder module 521 can be an IVAS encoder or any other suitable type of encoder.
The plurality of sets of spatial metadata 507 are provided to an alignment module 513.
The alignment module 513 is configured to align the plurality of sets of spatial metadata 507 to the same direction frame of reference. In some examples the alignment module 513 is configured to select a master set of spatial metadata from the plurality of sets of spatial metadata 507 and then align the remaining sets of spatial metadata to the master set. In some examples the master set of spatial metadata could be selected as the set of spatial metadata that has the highest reliability or based on any other suitable factor.
The alignment module 513 provides a plurality of aligned sets of spatial metadata 515 as an output. The aligned sets of spatial metadata 515 have a common frame of reference.
The plurality of aligned sets of spatial metadata 515 are provided to a filtering module 517. The filtering module 517 is configured to determine the most reliable set of spatial metadata 515 and then use the most reliable set of spatial metadata 515 to filter the other sets of spatial metadata.
The filtering module 517 can be configured to perform any suitable type of filtering to provide a single set of spatial metadata 519 as an output. The filtering could comprise mean or median filtering or any other suitable type of filtering.
The filtering module 517 provides a single set of spatial metadata 519 as an output. The single set of spatial metadata 519 has a higher quality compared to the sets of spatial metadata 507 provided by the decoder module 503.
The single set of spatial metadata 519 is provided to a quantization module 523 that is configured to quantize the parameters within the single set of spatial metadata 519. The quantized spatial metadata is then encoded with the improved or optimal set of audio signals 511 so that an encoded bitstream 525 is provided as an output. The encoder module 521 can be an IVAS encoder or any other suitable type of encoder.
Fig. 6 shows another example method that can be implemented using the apparatus 101 and systems of Figs. 1 to 3. The method of Fig. 6 can be implemented by an apparatus 101 in a central processing device 213 or any other suitable device.
At block 601 the method comprises obtaining a plurality of sets of spatial metadata. The plurality of sets of spatial metadata can be received from each of the plurality of spatial audio capture devices 203 that capture a set of spatial audio signals. That is, each of the spatial audio capture devices 203 can transmit a bitstream comprising the spatial audio signals and the spatial metadata to the central processing device 213. In other examples the spatial audio capture devices 203 can transmit a bitstream comprising just the spatial audio signals and the central processing device 213 can obtain the sets of spatial metadata by processing the received sets of spatial audio signals.
At block 603 one of the sets of spatial metadata is selected as the master set of spatial metadata. In this example the master set of spatial metadata can be selected as the set of spatial metadata corresponding to a master set of audio signals. In other examples the master set of spatial metadata can be set as the set of spatial metadata that has the highest reliability or based on any other suitable factors or parameters.
In some examples the selection of the master set of spatial metadata can be performed in an angular parameter space. The angular parameter space can comprise the elevation and the azimuth. In such examples, for each time frame, the angular values for each set of spatial metadata are combined within a vector. The vector can be configured such that the azimuth values for each time-frequency tile follow the elevation values of the corresponding time-frequency tile. A matrix can then be formed by the juxtaposition of these vectors as distinct columns.
The normalized correlation value is calculated for each pair of vectors, resulting in the correlation matrix

        | 1    c12  ...  c1n |
    C = | c21  1    ...  c2n |
        | ...  ...  ...  ... |
        | cn1  cn2  ...  1   |

where cij is the normalized correlation between vector i and vector j.
Once the correlation matrix C has been obtained the master set of spatial metadata can be selected by selecting, for each column of the correlation matrix, the non-identical pair (i,j), i ≠ j, such that cij is maximum. Pairs that have the same indices are reduced to one pair so that they are not used twice. The master set of spatial metadata can then be selected as the set with the index that appears the most times in the pairs.
For instance, if four sets of spatial metadata are provided and the correlation matrix has the values:

         1        2        3        4
    1  1.0000   0.9978   0.9728   0.9982
    2  0.9978   1.0000   0.9852   0.9974
    3  0.9728   0.9852   1.0000   0.9779
    4  0.9982   0.9974   0.9779   1.0000

then the resulting pairs are: (1,4), (2,1), (3,2), (4,1). The pairs (1,4) and (4,1) have the same indices and so are reduced to one pair (1,4). This leaves only the first three pairs to be used for selecting the master set of spatial parameters. In these pairs the index 2 appears the most times. Therefore, the set of spatial metadata corresponding to index 2 is selected as the master set.
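The pair-selection procedure described above can be sketched as follows; 'normalized correlation' is read here as cosine similarity between the angle vectors, which is an assumption about the exact normalization used:

```python
import numpy as np

def select_master_index(vectors) -> int:
    """Select the master metadata set from per-device angle vectors.

    vectors: one 1-D array per device, concatenating the elevation
    and azimuth values of its spatial metadata for one time frame.
    """
    X = np.stack([np.asarray(v, dtype=float) for v in vectors])
    # Cosine-similarity correlation matrix (assumed normalization)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = Xn @ Xn.T
    n = len(vectors)
    pairs = set()
    for j in range(n):
        col = C[:, j].copy()
        col[j] = -np.inf                  # exclude the identical pair (j, j)
        # frozenset collapses (i, j) and (j, i) into one pair
        pairs.add(frozenset((int(np.argmax(col)), j)))
    counts = np.zeros(n, dtype=int)       # most frequent index wins
    for p in pairs:
        for idx in p:
            counts[idx] += 1
    return int(np.argmax(counts))
```

In the degenerate case where all devices report identical angles, every column's maximum points at the first other device, and index 0 is selected.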
At block 605 all of the other sets of spatial metadata are correlated against the master set of spatial metadata.
At block 607 the optimal rotation for the plurality of sets of spatial metadata is determined. The optimal rotation can comprise aligning the sets of spatial metadata with the master set of spatial metadata.
In the above example, the rotation angle for each of the secondary sets of spatial metadata 1, 3 and 4 is calculated by averaging the elevation difference and the azimuth difference between the secondary set of spatial metadata and the master set of spatial metadata.
In some examples the rotation angles are calculated only for the sets of spatial metadata that have a correlation with the master set of spatial metadata that is above a given threshold. These sets of spatial metadata can be considered to be viable secondary sets of spatial metadata. The sets of spatial metadata that have a correlation below a given threshold can be discarded.
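A sketch of this rotation estimation with threshold gating might look as follows; the correlation is again read as cosine similarity and the threshold value is an illustrative assumption:

```python
import numpy as np

def rotation_offsets(master, secondary, corr_threshold=0.9):
    """Average azimuth/elevation offset of a secondary metadata set
    relative to the master, or None if the sets correlate too weakly.

    master, secondary: dicts with 'azimuth' and 'elevation' arrays
    (degrees) over the time-frequency tiles of one frame.
    """
    vec = lambda m: np.concatenate([m['elevation'], m['azimuth']])
    a, b = vec(master), vec(secondary)
    corr = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    if corr < corr_threshold:
        return None   # discard: not a viable secondary set
    # wrap azimuth differences into [-180, 180) before averaging
    d_az = (secondary['azimuth'] - master['azimuth'] + 180.0) % 360.0 - 180.0
    d_el = secondary['elevation'] - master['elevation']
    return float(np.mean(d_az)), float(np.mean(d_el))
```

The returned offsets can then be subtracted from the secondary set's direction-of-arrival parameters to bring them into the master's frame of reference, as in block 609.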
At block 609 the direction of arrival parameters within the sets of spatial metadata are rotated according to the optimal rotation. It is to be appreciated that the optimal rotation can vary over time. For instance, in some examples some of the plurality of spatial audio capture devices 203 can be moved while the audio signals are being captured.
In such examples, the rotation of the sets of spatial metadata can be updated to a new optimal rotation.
At block 611 the sets of spatial metadata are filtered to obtain a high-quality set of spatial metadata. The sets of spatial metadata can be filtered using the most reliable set of spatial metadata.
Any suitable type of filtering can be used for filtering the sets of spatial metadata. In some examples the filtering could comprise a smoothing filter such as median filtering or outlier detection or any other suitable process. In examples of the disclosure there is more spatial metadata available to be filtered and so such filtering processes provide a high-quality set of spatial metadata.
In some examples the energy ratios of the sets of spatial metadata can be taken into consideration during the filtering process. For instance, a weighted average of the energy ratio values can be calculated after the outliers have been eliminated through filtering.
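The outlier rejection and energy-ratio-weighted averaging described above can be sketched per time-frequency tile as follows; the 30-degree outlier window is an illustrative assumption:

```python
import numpy as np

def fuse_tiles(azimuths, energy_ratios, outlier_deg=30.0):
    """Fuse one tile's direction estimates from several devices into a
    single high-quality estimate.

    azimuths: per-device azimuth estimates (degrees, already rotated
    into the common frame); energy_ratios: per-device direct-to-total
    energy ratios, used both to reject outliers and as weights.
    """
    az = np.asarray(azimuths, dtype=float)
    er = np.asarray(energy_ratios, dtype=float)
    med = np.median(az)
    keep = np.abs(az - med) <= outlier_deg        # outlier rejection
    w = er[keep]
    if w.sum() > 0:
        fused_az = float(np.average(az[keep], weights=w))
        fused_er = float(np.average(er[keep], weights=w))
    else:                                         # all rejected: fall back
        fused_az, fused_er = float(med), float(np.mean(er))
    return fused_az, fused_er
```

A device whose estimate deviates far from the median (for instance because it is being handled) is excluded, and the remaining estimates are weighted towards devices reporting a high direct-to-total energy ratio.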
Blocks 605 to 611 can be performed for each time frame. Once blocks 605 to 611 have been completed for a time frame the method can return to block 605 and repeat the process for another time frame.
At block 613 the high-quality set of spatial metadata and the selected set of audio signals are encoded so as to enable the high-quality set of spatial metadata and the selected set of audio signals to be transmitted via a communication network 207. The high-quality set of spatial metadata and the selected set of audio signals can be encoded using an IVAS encoder or any other suitable process.
In this description the term coupled means operationally coupled. Any number or combination of intervening elements can be provided between components, including no intervening elements.
The recording of data may comprise only temporary recording, or it may comprise permanent recording, or it may comprise both temporary recording and permanent recording. Temporary recording implies the recording of data temporarily. This may, for example, occur during sensing or image capture, occur at a dynamic memory, occur at a buffer such as a circular buffer, a register, a cache or similar. Permanent recording implies that the data is in the form of an addressable data structure that is retrievable from an addressable memory space and can therefore be stored and retrieved until deleted or over-written, although long-term storage may or may not occur. The use of the term 'capture' in relation to an image relates to temporary recording of the data of the image. The use of the term 'store' in relation to an image relates to permanent recording of the data of the image.
The above described examples find application as enabling components of: automotive systems; telecommunication systems; electronic systems including consumer electronic products; distributed computing systems; media systems for generating or rendering media content including audio, visual and audio visual content and mixed, mediated, virtual and/or augmented reality; personal systems including personal health systems or personal fitness systems; navigation systems; user interfaces also known as human machine interfaces; networks including cellular, noncellular, and optical networks; ad-hoc networks; the internet; the internet of things; virtualized networks; and related software and services.
The term 'comprise' is used in this document with an inclusive not an exclusive meaning. That is, any reference to X comprising Y indicates that X may comprise only one Y or may comprise more than one Y. If it is intended to use 'comprise' with an exclusive meaning then it will be made clear in the context by referring to "comprising only one..." or by using "consisting".
In this description, reference has been made to various examples. The description of features or functions in relation to an example indicates that those features or functions are present in that example. The use of the term 'example' or 'for example' or 'can' or 'may' in the text denotes, whether explicitly stated or not, that such features or functions are present in at least the described example, whether described as an example or not, and that they can be, but are not necessarily, present in some of or all other examples. Thus 'example', 'for example', 'can' or 'may' refers to a particular instance in a class of examples. A property of the instance can be a property of only that instance or a property of the class or a property of a sub-class of the class that includes some but not all of the instances in the class. It is therefore implicitly disclosed that a feature described with reference to one example but not with reference to another example, can where possible be used in that other example as part of a working combination but does not necessarily have to be used in that other example.
Although examples have been described in the preceding paragraphs with reference to various examples, it should be appreciated that modifications to the examples given can be made without departing from the scope of the claims.
Features described in the preceding description may be used in combinations other than the combinations explicitly described above.
Although functions have been described with reference to certain features, those functions may be performable by other features whether described or not.
Although features have been described with reference to certain examples, those features may also be present in other examples whether described or not.
The term 'a' or 'the' is used in this document with an inclusive not an exclusive meaning. That is, any reference to X comprising a/the Y indicates that X may comprise only one Y or may comprise more than one Y unless the context clearly indicates the contrary. If it is intended to use 'a' or 'the' with an exclusive meaning then it will be made clear in the context. In some circumstances the use of 'at least one' or 'one or more' may be used to emphasize an inclusive meaning but the absence of these terms should not be taken to infer any exclusive meaning.
The presence of a feature (or combination of features) in a claim is a reference to that feature or (combination of features) itself and also to features that achieve substantially the same technical effect (equivalent features). The equivalent features include, for example, features that are variants and achieve substantially the same result in substantially the same way. The equivalent features include, for example, features that perform substantially the same function, in substantially the same way to achieve substantially the same result.
In this description, reference has been made to various examples using adjectives or adjectival phrases to describe characteristics of the examples. Such a description of a characteristic in relation to an example indicates that the characteristic is present in some examples exactly as described and is present in other examples substantially as described.
Whilst endeavoring in the foregoing specification to draw attention to those features believed to be of importance it should be understood that the Applicant may seek protection via the claims in respect of any patentable feature or combination of features hereinbefore referred to and/or shown in the drawings whether or not emphasis has been placed thereon.
I/we claim:

Claims (20)

1. An apparatus comprising means for: obtaining a plurality of sets of audio signals where different sets of audio signals are captured by a plurality of different spatial audio capture devices wherein the plurality of different spatial audio capture devices are aligned to a set of reference coordinates; obtaining a plurality of sets of spatial metadata corresponding to the different sets of audio signals; processing the plurality of sets of spatial metadata into a high-quality set of spatial metadata; and enabling the high-quality set of spatial metadata to be used for rendering the spatial audio signals captured by the plurality of different spatial audio capture devices.
2. An apparatus as claimed in claim 1 wherein the plurality of different spatial audio capture devices are aligned to a timeline.
3. An apparatus as claimed in any preceding claim wherein the plurality of sets of audio signals captured by the plurality of different spatial audio capture devices represent the same sound space.
4. An apparatus as claimed in any preceding claim wherein the plurality of sets of spatial metadata are obtained by processing the plurality of sets of audio signals captured by the plurality of different spatial audio capture devices.
5. An apparatus as claimed in any preceding claim wherein processing the plurality of sets of spatial metadata comprises determining a reliable set of spatial metadata and filtering the other sets of spatial metadata with the reliable set of spatial metadata.
6. An apparatus as claimed in claim 5 wherein determining a reliable set of spatial metadata comprises selecting a master set of audio signals from one of the plurality of spatial audio capture devices and using the set of spatial metadata from the spatial audio capture device from which the master set of audio signals is obtained as the reliable set of spatial metadata.
7. An apparatus as claimed in any of claims 5 to 6 wherein the reliability of the sets of spatial metadata changes over time.
8. An apparatus as claimed in any preceding claim wherein processing the plurality of sets of spatial metadata comprises aligning them to a common direction frame of reference.
9. An apparatus as claimed in claim 8 wherein aligning the plurality of sets of spatial metadata to a common direction frame of reference comprises correlating the directions of arrival of sound parameters within the plurality of sets of spatial metadata.
10. An apparatus as claimed in any preceding claim wherein the high-quality set of spatial metadata has a higher reliability than the sets of spatial metadata corresponding to the different spatial audio signals.
11. An apparatus as claimed in any preceding claim wherein the spatial audio signals that are used for rendering are selected from the plurality of sets of spatial audio signals captured by the plurality of different spatial audio capture devices.
12. An apparatus as claimed in any of claims 1 to 10 wherein the audio signals that are used for rendering are obtained by processing one or more of the captured audio signals from the plurality of audio devices.
  13. 13. An apparatus as claimed in any preceding claim comprising means for enabling the high-quality spatial metadata and the audio signals to be transmitted to another apparatus.
  14. 14. An apparatus as claimed in any preceding claim comprising means for obtaining one or more audio signals from one or more microphones in addition to the audio signals captured by the plurality of spatial audio capture devices.
  15. 15. An apparatus as claimed in any preceding claim wherein the apparatus is comprised within at least one of the spatial audio capture devices.
  16. 16. An apparatus as claimed in any preceding claim wherein the apparatus is comprised within a processing device separate to the plurality of spatial audio capture devices.
  17. An apparatus as claimed in any preceding claim wherein the spatial metadata comprises, for one or more frequency sub-bands: a sound direction parameter and an energy ratio parameter.
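The parameter set recited in claim 17 (a sound direction parameter and an energy ratio parameter per frequency sub-band) can be pictured as a simple per-band record. The field names, the azimuth/elevation split of the direction parameter, and the per-frame list layout below are illustrative assumptions; the claim does not prescribe a representation.

```python
from dataclasses import dataclass

@dataclass
class SubBandMetadata:
    """Spatial metadata for one frequency sub-band: a sound direction
    parameter and an energy ratio parameter (illustrative layout)."""
    azimuth_deg: float    # direction of arrival in the horizontal plane
    elevation_deg: float  # direction of arrival above/below that plane
    energy_ratio: float   # fraction of band energy that is directional, 0..1

# One time frame of spatial metadata: one parameter set per sub-band
frame = [
    SubBandMetadata(azimuth_deg=30.0, elevation_deg=0.0, energy_ratio=0.8),
    SubBandMetadata(azimuth_deg=-45.0, elevation_deg=10.0, energy_ratio=0.4),
]
```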
  18. A method comprising: obtaining a plurality of sets of audio signals where different sets of audio signals are captured by a plurality of different spatial audio capture devices wherein the plurality of different spatial audio capture devices are aligned to a set of reference coordinates; obtaining a plurality of sets of spatial metadata corresponding to the different sets of audio signals; processing the plurality of sets of spatial metadata into a high-quality set of spatial metadata; and enabling the high-quality set of spatial metadata to be used for rendering the spatial audio signals captured by the plurality of different spatial audio capture devices.
  19. A method as claimed in claim 18 wherein the plurality of different spatial audio capture devices are aligned to a timeline.
  20. A computer program comprising computer program instructions that, when executed by processing circuitry, cause: obtaining a plurality of sets of audio signals where different sets of audio signals are captured by a plurality of different spatial audio capture devices wherein the plurality of different spatial audio capture devices are aligned to a set of reference coordinates; obtaining a plurality of sets of spatial metadata corresponding to the different sets of audio signals; processing the plurality of sets of spatial metadata into a high-quality set of spatial metadata; and enabling the high-quality set of spatial metadata to be used for rendering the spatial audio signals captured by the plurality of different spatial audio capture devices.
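The processing recited in claims 5, 6 and 18 (selecting a reliable master set of spatial metadata and filtering the other sets with it) can be sketched as follows. Here each set is a list of per-band (azimuth, energy ratio) pairs already in a common direction frame; the reliability scores, the 20-degree agreement threshold, and the averaging rule are illustrative choices, not taken from the claims.

```python
def merge_metadata(sets, reliabilities):
    """Combine several sets of per-band (azimuth_deg, energy_ratio)
    estimates into one higher-quality set. The most reliable set acts
    as the master; for each band, energy ratios are averaged over the
    sets whose direction agrees with the master's within a threshold,
    so outlying estimates are filtered out. Illustrative sketch."""
    master_idx = max(range(len(sets)), key=lambda i: reliabilities[i])
    master = sets[master_idx]
    merged = []
    for band in range(len(master)):
        az_m, _ = master[band]
        # Wrap angular differences into [-180, 180) before thresholding
        agreeing = [
            s[band][1] for s in sets
            if abs(((s[band][0] - az_m + 180) % 360) - 180) <= 20.0
        ]
        merged.append((az_m, sum(agreeing) / len(agreeing)))
    return merged
```

For example, with a master set reporting (10, 0.8) and (90, 0.5) and a second set reporting (12, 0.6) and (200, 0.9), the first band's ratios are averaged (directions agree) while the second band keeps only the master's estimate (directions disagree).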
GB2006944.9A 2020-05-12 2020-05-12 Capturing and enabling rendering of spatial audio signals Withdrawn GB2594942A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2006944.9A GB2594942A (en) 2020-05-12 2020-05-12 Capturing and enabling rendering of spatial audio signals

Publications (2)

Publication Number Publication Date
GB202006944D0 GB202006944D0 (en) 2020-06-24
GB2594942A true GB2594942A (en) 2021-11-17

Family

ID=71134835

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2006944.9A Withdrawn GB2594942A (en) 2020-05-12 2020-05-12 Capturing and enabling rendering of spatial audio signals

Country Status (1)

Country Link
GB (1) GB2594942A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2423702A1 (en) * 2010-08-27 2012-02-29 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for resolving ambiguity from a direction of arrival estimate
WO2018234628A1 (en) * 2017-06-23 2018-12-27 Nokia Technologies Oy Audio distance estimation for spatial audio processing
US20200053457A1 (en) * 2016-04-22 2020-02-13 Nokia Technologies Oy Merging Audio Signals with Spatial Metadata

Similar Documents

Publication Publication Date Title
CA2784862C (en) An apparatus and a method for converting a first parametric spatial audio signal into a second parametric spatial audio signal
US20140233917A1 (en) Video analysis assisted generation of multi-channel audio data
WO2021186107A1 (en) Encoding reverberator parameters from virtual or physical scene geometry and desired reverberation characteristics and rendering using these
US11140507B2 (en) Rendering of spatial audio content
US20230096873A1 (en) Apparatus, methods and computer programs for enabling reproduction of spatial audio signals
JP2013093840A (en) Apparatus and method for generating stereoscopic data in portable terminal, and electronic device
US11575988B2 (en) Apparatus, method and computer program for obtaining audio signals
TWI819344B (en) Audio signal rendering method, apparatus, device and computer readable storage medium
CN112567763A (en) Apparatus, method and computer program for audio signal processing
GB2549922A (en) Apparatus, methods and computer programs for encoding and decoding audio signals
CN115335900A (en) Transforming panoramical acoustic coefficients using an adaptive network
GB2594942A (en) Capturing and enabling rendering of spatial audio signals
TWI773286B (en) Bit allocating method and apparatus for audio signal
CN114531425A (en) Processing method and processing device
WO2021214380A1 (en) Apparatus, methods and computer programs for enabling rendering of spatial audio signals
EP4148728A1 (en) Apparatus, methods and computer programs for repositioning spatial audio streams
EP4164256A1 (en) Apparatus, methods and computer programs for processing spatial audio
CN111508507B (en) Audio signal processing method and device
US20240107225A1 (en) Privacy protection in spatial audio capture
EP4379506A1 (en) Audio zooming
CN117636928A (en) Pickup device and related audio enhancement method
GB2607934A (en) Apparatus, methods and computer programs for obtaining spatial metadata
WO2024115062A1 (en) Apparatus, methods and computer programs for spatial audio processing
GB2605611A (en) Apparatus, methods and computer programs for providing spatial audio content

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)