WO2018197747A1 - Spatial audio signal processing - Google Patents

Spatial audio signal processing

Info

Publication number
WO2018197747A1
Authority
WO
WIPO (PCT)
Prior art keywords
channels
audio signal
allocation
multiple output
frequency sub
Application number
PCT/FI2018/050288
Other languages
English (en)
Inventor
Antti Eronen
Jussi LEPPÄNEN
Arto Lehtiniemi
Tapani PIHLAJAKUJA
Original Assignee
Nokia Technologies Oy
Application filed by Nokia Technologies Oy
Publication of WO2018197747A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H04S7/302: Electronic adaptation of stereophonic sound system to listener position or orientation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2400/00: Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11: Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2420/00: Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/01: Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S2420/00: Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/07: Synergistic effects of band splitting and sub-band processing

Definitions

  • Embodiments of the present invention relate to spatial audio processing.
  • embodiments relate to providing a sound object with spatial extent.
  • BACKGROUND: Audio content may or may not be a part of other content.
  • multimedia content comprises a visual content and an audio content.
  • the visual content and/or the audio content may be perceived live or they may be recorded and rendered.
  • part of the visual content is observed by a user through a see-through display while another part of the visual content is displayed on the see-through display.
  • the audio content may be live or it may be rendered to a user.
  • the visual content and the audio content are both rendered. It may in some circumstances be desirable to control how a user perceives audio content.
  • a method comprising: allocating frequency sub-channels of an input audio signal to multiple output audio channels, each output audio channel for rendering at a location within a sound space; and automatically changing an allocation of frequency sub-channels of the input audio signal to multiple output audio channels.
  • an apparatus comprising: means for allocating frequency sub-channels of an input audio signal to multiple output audio channels, each output audio channel for rendering at a location within a sound space; and means for automatically changing an allocation of frequency sub-channels of the input audio signal to multiple output audio channels.
  • a computer program that, when run on a processor, enables: allocating frequency sub-channels of an input audio signal to multiple output audio channels, each output audio channel for rendering at a location within a sound space; and automatically changing an allocation of frequency sub-channels of the input audio signal to multiple output audio channels.
  • an apparatus comprising: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: allocating frequency sub-channels of an input audio signal to multiple output audio channels, each output audio channel for rendering at a location within a sound space; and automatically changing an allocation of frequency sub-channels of the input audio signal to multiple output audio channels.
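  • By way of a non-authoritative illustration, the following Python sketch (numpy and scipy assumed; the names allocate and render_block are hypothetical, not taken from the patent) shows one way the claimed method could be realised: treat STFT frequency bins as sub-channels, route each bin to one output audio channel, and automatically change the allocation by drawing a new routing.
```python
import numpy as np
from scipy.signal import stft, istft

def allocate(num_bins, num_channels, seed):
    # Hypothetical allocation: map each frequency bin (sub-channel)
    # of the input audio signal to one of the output audio channels.
    rng = np.random.default_rng(seed)
    return rng.integers(0, num_channels, size=num_bins)

def render_block(x, fs, allocation, num_channels, nperseg=1024):
    # Route each STFT bin to the channel chosen by `allocation` and
    # resynthesise one spectrally-limited signal per output channel.
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)       # Z: (bins, frames)
    outs = []
    for ch in range(num_channels):
        mask = (allocation == ch)[:, None]          # bins owned by channel ch
        _, xc = istft(np.where(mask, Z, 0.0), fs=fs, nperseg=nperseg)
        outs.append(xc)
    return np.stack(outs)

# "Automatically changing" the allocation then amounts to re-drawing it,
# e.g. alloc = allocate(513, 4, seed=0) now, allocate(513, 4, seed=1) later.
```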
  • Figs 1A to 1D illustrate examples of a sound space comprising one or more sound objects;
  • Figs 2A to 2D illustrate examples of a recorded visual scene that respectively correspond with the sound space illustrated in Figs 1A to 1D;
  • Fig 3A illustrates an example of a controller and Fig 3B illustrates an example of a computer program;
  • Fig 4 illustrates an example of a spatial audio processing system comprising a spectral allocation module and a spatial allocation module;
  • Fig 5 illustrates an example of a method;
  • Fig 6 illustrates an example of a system;
  • Fig 7 illustrates an example of a method;
  • Fig 8A illustrates an example of a power spectral density function for an input audio signal;
  • each of Figs 8B to 8E illustrates an example of a power spectral density function for one of the allocated frequency sub-channels of the input audio signal;
  • Fig 9 illustrates an example of a method for controlling rendering of spatial audio and, in particular, controlling rendering of a sound object that has a changing spatial extent, for example width.
  • spatial audio rendering may be used to render sound sources as sound objects at particular positions within a sound space.
  • Automatic or user-controlled editing of a sound space may occur by, for example, repositioning one or more sound objects or by changing sound characteristics of the sound objects such as a perceived lateral and/or vertical extent of the sound source.
  • Fig 1A illustrates an example of a sound space 10 comprising a sound object 12 within the sound space 10.
  • the sound object 12 may be a sound object as recorded or it may be a sound object as rendered. It is possible, for example using spatial audio processing, to modify a sound object 12, for example to change its sound or positional characteristics. For example, a sound object can be modified to have a greater volume, to change its position within the sound space 10 (Figs 1B & 1C) and/or to change its spatial extent within the sound space 10 (Fig 1D).
  • Fig 1B illustrates the sound space 10 before movement of the sound object 12 in the sound space 10.
  • Fig 1C illustrates the same sound space 10 after movement of the sound object 12.
  • the sound object 12 may be a sound object as recorded and be positioned at the same position as a sound source of the sound object or it may be positioned independently of the sound source.
  • the position of a sound source may be tracked to render the sound object at the position of the sound source. This may be achieved, for example, when recording by placing a positioning tag on the sound source. The position and the position changes of the sound source can then be recorded. The positions of the sound source may then be used to control a position of the sound object 12. This may be particularly suitable where an up-close microphone such as a boom microphone or a Lavalier microphone is used to record the sound source.
  • the position of the sound source within the visual scene may be determined during recording of the sound source by using spatially diverse sound recording.
  • An example of spatially diverse sound recording is using a microphone array.
  • the phase differences between the sound recorded at the different, spatially diverse microphones provide information that may be used to position the sound source using a beamforming equation.
  • time-difference-of-arrival (TDOA) based methods for sound source localization may be used.
  • the positions of the sound source may also be determined by post-production annotation.
  • positions of sound sources may be determined using Bluetooth-based indoor positioning techniques, or visual analysis techniques, a radar, or any suitable automatic position tracking mechanism.
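  • As a hedged sketch of the TDOA option mentioned above (the function name and parameters are illustrative, not from the patent), the classic GCC-PHAT estimator below computes the time difference of arrival between two microphone signals; a source direction can then be derived from several such pairwise delays.
```python
import numpy as np

def gcc_phat_tdoa(sig_a, sig_b, fs):
    # Generalised cross-correlation with phase transform (PHAT):
    # whiten the cross-spectrum, then locate the peak lag.
    n = len(sig_a) + len(sig_b)
    A = np.fft.rfft(sig_a, n=n)
    B = np.fft.rfft(sig_b, n=n)
    R = A * np.conj(B)
    R /= np.abs(R) + 1e-12                      # PHAT weighting
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs   # delay in seconds
```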
  • Fig 1D illustrates a sound space 10 after extension of the sound object 12 in the sound space 10.
  • the sound space 10 of Fig 1D differs from the sound space 10 of Fig 1C in that the spatial extent of the sound object 12 has been increased so that the sound object has a greater breadth (greater width).
  • a visual scene 20 may be rendered to a user that corresponds with the rendered sound space 10.
  • the visual scene 20 may be the scene recorded at the same time the sound source that creates the sound object 12 is recorded.
  • Fig. 2A illustrates an example of a visual scene 20 that corresponds with the sound space 10.
  • Correspondence in this sense means that there is a one-to-one mapping between the sound space 10 and the visual scene 20 such that a position in the sound space 10 has a corresponding position in the visual scene 20 and a position in the visual scene 20 has a corresponding position in the sound space 10.
  • the coordinate system of the sound space 10 and the coordinate system of the visual scene 20 are in register such that an object is positioned as a sound object in the sound space and as a visual object in the visual scene at the same common position from the perspective of a user.
  • the sound space 10 and the visual scene 20 may be three-dimensional.
  • a portion of the visual scene 20 is associated with a position of visual content representing a sound source 22 within the visual scene 20.
  • the position of the sound source 22 in the visual scene 20 corresponds with a position of the sound object 12 within the sound space 10.
  • the sound source 22 is an active sound source producing sound that is or can be heard by a user, for example via rendering or live, while the user is viewing the visual scene via the display 200.
  • parts of the visual scene 20 are viewed through the display 200 (which would then need to be a see-through display).
  • the visual scene 20 is rendered by the display 200.
  • the display 200 is a see-through display and at least parts of the visual scene 20 are a real, live scene viewed through the see-through display 200.
  • the sound source 22 may be a live sound source or it may be a sound source that is rendered to the user.
  • This augmented reality implementation may, for example, be used for capturing an image or images of the visual scene 20 as a photograph or a video.
  • the visual scene 20 may be rendered to a user via the display 200, for example, at a location remote from where the visual scene 20 was recorded.
  • This situation is similar to the situation commonly experienced when reviewing images via a television screen, a computer screen or a mediated/virtual/augmented reality headset.
  • the visual scene 20 is a rendered visual scene.
  • the active sound source 22 produces rendered sound, unless it has been muted.
  • This implementation may be particularly useful for editing a sound space by, for example, modifying characteristics of sound sources and/or moving sound sources within the visual scene 20.
  • Fig 2B illustrates a visual scene 20 corresponding to the sound space 10 of Fig 1B, before movement of the sound source 22 in the visual scene 20.
  • Fig 2C illustrates the same visual scene 20 corresponding to the sound space 10 of Fig 1C, after movement of the sound source 22.
  • Fig 2D illustrates the visual scene 20 after extension of the sound object 12 in the corresponding sound space 10. While the sound space 10 of Fig 1D differs from the sound space 10 of Fig 1C in that the spatial extent of the sound object 12 has been increased so that the sound object has a greater breadth, the visual scene 20 is not necessarily changed.
  • the above described methods may be performed using a controller.
  • An example of a controller 300 is illustrated in Fig 3A.
  • Implementation of the controller 300 may be as controller circuitry.
  • the controller 300 may be implemented in hardware alone, may have certain aspects in software including firmware alone, or may be a combination of hardware and software (including firmware).
  • the controller 300 may be implemented using instructions that enable hardware functionality, for example, by using executable instructions of a computer program 306 in a general-purpose or special-purpose processor 302, which instructions may be stored on a computer readable storage medium (disk, memory etc.) to be executed by such a processor 302.
  • the processor 302 is configured to read from and write to the memory 304.
  • the processor 302 may also comprise an output interface via which data and/or commands are output by the processor 302 and an input interface via which data and/or commands are input to the processor 302.
  • the memory 304 stores a computer program 306 comprising computer program instructions (computer program code) that control the operation of the apparatus 300 when loaded into the processor 302.
  • the computer program instructions, of the computer program 306, provide the logic and routines that enable the apparatus to perform the methods illustrated in the figures.
  • the processor 302 by reading the memory 304 is able to load and execute the computer program 306.
  • the controller 300 may be part of an apparatus or system 320.
  • the apparatus or system 320 may comprise one or more peripheral components 312.
  • the display 200 is a peripheral component.
  • peripheral components may include: an audio output device or interface for rendering or enabling rendering of the sound space 10 to the user; a user input device for enabling a user to control one or more parameters of the method; a positioning system for positioning a sound source; an audio input device such as a microphone or microphone array for recording a sound source; an image input device such as a camera or plurality of cameras.
  • the apparatus or system 320 may be comprised in a headset for providing mediated reality.
  • the controller 300 may be configured as a sound rendering engine that is configured to control characteristics of a sound object 12 defined by sound content.
  • the rendering engine may be configured to control the volume of the sound content, a position of the sound object 12 for the sound content within the sound space 10, a spatial extent of the sound object 12 for the sound content within the sound space 10, and other characteristics of the sound content such as, for example, tone or pitch or spectrum or reverberation etc.
  • the sound object may, for example, be rendered via an audio output device or interface.
  • the sound content may be received by the controller 300.
  • the sound rendering engine may, for example comprise a spatial audio processing system 50 that is configured to control the position and/or extent of a sound object 12 within a sound space 10.
  • Fig 4 illustrates an example of a spatial audio processing system 50 comprising a spectral allocation module 70 and a spatial allocation module 72.
  • the spectral allocation module 70 takes frequency sub-channels 51 of a received input audio signal 113 and allocates them to multiple spatial audio channels 52 as allocated frequency sub-channels 53.
  • the input audio signal 113 comprises a monophonic source signal and comprises, is accompanied by or is associated with one or more spatial processing parameters defining a position and/or spatial extent of the sound source that will render the monophonic source signal.
  • Each spatial audio channel is for rendering at a different location within a sound space.
  • the spatial allocation module 72 achieves the correct spatial rendering of the spatial audio channels 52 by controlled mixing 74 of the different spatial audio channels 52 across different audio device channels 76 that are rendered by different audio output devices.
  • the sound space 10 may be considered to be a collection of spatial audio channels 52, where each spatial audio channel 52 corresponds to a different direction.
  • the collection of spatial audio channels may be globally defined for all sound objects 12. In other examples, the collection of spatial audio channels may be locally defined for each sound object 12.
  • the collection of spatial audio channels may be fixed or may vary dynamically with time.
  • each spatial audio channel may be rendered as a single rendered sound source using amplitude panning signals 54, for example, using Vector Base Amplitude Panning (VBAP).
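  • A minimal sketch of the cited VBAP technique (a Pulkki-style two-loudspeaker case in Python/numpy; not code from the patent): the panning gains g are solved from p = L^T g, where the rows of L are the loudspeaker unit vectors and p is the source direction.
```python
import numpy as np

def vbap_2d(source_deg, spk_degs):
    # Vector Base Amplitude Panning for one loudspeaker pair.
    p = np.array([np.cos(np.radians(source_deg)),
                  np.sin(np.radians(source_deg))])
    L = np.array([[np.cos(np.radians(d)), np.sin(np.radians(d))]
                  for d in spk_degs])        # rows: loudspeaker unit vectors
    g = np.linalg.solve(L.T, p)              # solve p = L^T g
    return g / np.linalg.norm(g)             # energy-normalised gains

gains = vbap_2d(10.0, [30.0, -30.0])         # source at 10° between ±30°
```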
  • the direction of the spatial audio channel S_nm may be represented by the couplet of polar angle θ_n and azimuthal angle φ_m.
  • a sound object 12 at position z may be associated with the spatial audio channel S_nm that is closest to Arg(z).
  • where a sound object 12 is associated with a single spatial audio channel S_nm, it is rendered as a point source.
  • a sound object 12 may however have spatial extent and be associated with a plurality of spatial audio channels.
  • a sound object 12 may be simultaneously rendered in a set of spatial audio channels {S} defined by Arg(z) and a spatial extent of the sound object 12.
  • that set of spatial audio channels {S} may, for example, include the spatial audio channels S_n'm' for each value of n' between n - δ_n and n + δ_n and of m' between m - δ_m and m + δ_m, where n and m define the spatial audio channel closest to Arg(z) and δ_n and δ_m define in combination a spatial extent of the sound object 12.
  • the value of δ_n defines a spatial extent in the polar direction and the value of δ_m defines a spatial extent in the azimuthal direction.
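  • For concreteness, a small hypothetical helper that enumerates the set {S} from the centre channel (n, m) and the extents (δ_n, δ_m); clamping the polar index and wrapping the azimuthal index are assumptions not stated in the text.
```python
def channel_set(n, m, delta_n, delta_m, N, M):
    # All (n', m') with n' in [n - delta_n, n + delta_n] and
    # m' in [m - delta_m, m + delta_m], on an N x M grid of
    # spatial audio channels S_n'm'.
    return [(np_, mp_ % M)                   # wrap azimuth modulo M
            for np_ in range(max(0, n - delta_n),
                             min(N - 1, n + delta_n) + 1)
            for mp_ in range(m - delta_m, m + delta_m + 1)]

# e.g. channel_set(4, 0, 1, 2, N=8, M=12) yields 3 x 5 = 15 channels
```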
  • the number of spatial audio channels, and their spatial relationship, in the set of spatial audio channels {S} allocated by the spatial allocation module 72 is dependent upon the desired spatial extent of the sound object 12.
  • a single sound object 12 may be simultaneously rendered in a set of spatial audio channels ⁇ S ⁇ by decomposing the audio signal representing the sound object 12 into multiple different frequency sub-channels 51 and allocating each frequency sub-channel 51 to one of multiple spectrally-limited audio signals 53.
  • Each of the multiple spectrally-limited audio signals 53 may have one or more frequency sub-channels 51 allocated to it (as an allocated frequency sub-channel). Each frequency sub-channel 51 may be allocated to only one spectrally-limited audio signal 53 (as an allocated frequency sub-channel).
  • Each spectrally-limited audio signal 53 is allocated into the set of spatial audio channels {S} 52.
  • in some examples, each spectrally-limited audio signal 53 is allocated to only one spatial audio channel 52 and each spatial audio channel 52 comprises only one spectrally-limited audio signal 53; that is, there is a one-to-one mapping between the spectrally-limited audio signals and the spatial audio channels at the interface between the spectral allocation module 70 and the spatial allocation module 72.
  • each spectrally-limited audio signal may be rendered as a single sound source using amplitude panning by the spatial allocation module 72. For example, if the set of spatial audio channels {S} comprised X channels, the audio signal 113 representing the sound object 12 would be separated into X different spectrally-limited audio signals 53 in different non-overlapping frequency bands, each frequency band comprising one or more different frequency sub-channels 51 that may be contiguous and/or non-contiguous.
  • This may be achieved using a filter bank comprising a selective band-pass filter for each spectrally-limited audio signal 53/spatial audio channel or, as illustrated in Fig 4, by using digital signal processing to distribute time-frequency bins to different spectrally-limited audio signals 53/spatial audio channels 52.
  • Each of the X different spectrally-limited audio signals 53 in different non-overlapping frequency bands would be provided to only one of the set of spatial audio channels ⁇ S ⁇ .
  • Each of the set of spatial audio channels ⁇ S ⁇ would comprise only one of the X different spectrally-limited audio signals in different non-overlapping frequency bands.
  • a short-term Fourier transform may be used to transform from the time domain to the frequency domain, where selective filtering occurs for each frequency band.
  • the different spectrally-limited audio signals 53 may be created using the same time period or different time periods for each STFT.
  • An inverse transform 78 will be required to convert from the frequency to the time domain. In some examples, this may occur in the spectral allocation module 70 or the spatial allocation module 72 before mixing. In the example illustrated in Fig 4, the inverse transform 78 occurs for each audio device channel 76, after mixing 74, in the spatial allocation module 72.
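  • One reading of the Fig 4 ordering, as a hedged Python sketch (the gain-matrix formulation and all names are assumptions): remain in the STFT domain throughout, mix the spectrally-limited signals into the audio device channels 76 with panning gains, and apply the inverse transform 78 once per device channel after the mixing 74.
```python
import numpy as np
from scipy.signal import istft

def mix_then_invert(Z, allocation, pan_gains, fs, nperseg=1024):
    # Z: STFT of the input (bins x frames); allocation[k]: spatial audio
    # channel of bin k; pan_gains[s, d]: gain of spatial channel s into
    # audio device channel d (e.g. VBAP gains).
    num_dev = pan_gains.shape[1]
    outs = []
    for d in range(num_dev):
        bin_gains = pan_gains[allocation, d]     # per-bin gain, device d
        _, xd = istft(Z * bin_gains[:, None], fs=fs, nperseg=nperseg)
        outs.append(xd)                          # inverse transform after mixing
    return np.stack(outs)
```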
  • Which frequency sub-channel 51 is allocated to which spectrally-limited audio signal 53/spatial audio channel 52 in the set of spatial audio channels {S} may be controlled by an allocation module 60.
  • the allocation may be a quasi-random allocation or may be determined based on a set of predefined rules.
  • the allocation module 60 is a programmable filter bank.
  • the predefined rules may, for example, constrain spatial-separation of spectrally-adjacent frequency sub-channels 51 to be above a threshold value.
  • frequency sub-channels 51 adjacent in frequency may be separated spatially so that they are not spatially adjacent.
  • effective spatial separation of the multiple frequency sub-channels 51 that are adjacent in frequency may be maximized.
  • the predefined rules may additionally or alternatively define how frequency sub-channels 51 are distributed amongst the spectrally-limited audio signals 53/set of spatial audio channels ⁇ S ⁇ 52.
  • a low-discrepancy sequence, such as a Halton sequence, may for example be used to quasi-randomly distribute the frequency sub-channels 51 amongst the spectrally-limited audio signals 53/spatial audio channels {S} 52.
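  • A sketch of a Halton-style allocator (function names hypothetical; the base and skip parametrisation anticipates the description of Halton sequences further below):
```python
def radical_inverse(index, base):
    # Digit-reversed fraction of `index` in `base`; successive indices
    # form a one-dimensional Halton (van der Corput) sequence in [0, 1).
    f, r = 1.0, 0.0
    while index > 0:
        f /= base
        r += f * (index % base)
        index //= base
    return r

def halton_allocation(num_subchannels, num_channels, base=2, skip=0):
    # Quasi-randomly spread frequency sub-channels over spatial audio
    # channels; a new base and/or skip value yields a new allocation.
    return [int(radical_inverse(skip + i + 1, base) * num_channels)
            for i in range(num_subchannels)]
```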
  • Which frequency sub-channel 51 is allocated to which spectrally-limited audio signal 53/spatial audio channel 52 in the set of spatial audio channels {S} may be dynamically controlled. For example, the allocation of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels 52, 53 may be automatically changed.
  • Fig 5 illustrates an example of a method 100 comprising: at block 102, allocating frequency sub-channels 51 of an input audio signal 113 to multiple output audio channels 52, 54, each output audio channel 52, 54 for rendering at a location within a sound space; and at block 104, automatically changing an allocation of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels 52, 54.
  • the method 100 may be performed by the allocation module 60 and/or the controller 300.
  • the method 100 may be used to improve the perceived spatial uniformity of a rendered spatially extended sound. Distinct audio components of the sound cannot, as a consequence, be heard at distinct spatial positions and the sound is heard as a uniform, spatially extended sound.
  • the apparatus or controller 300 may therefore comprise: at least one processor 302; and at least one memory 304 including computer program code, the at least one memory 304 and the computer program code configured to, with the at least one processor 302, cause the apparatus 300 at least to perform the method 100.
  • Fig 6 illustrates an example of a system 110 that is configured to perform an example of the method 100.
  • the allocation of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels is dependent upon one or more changes in the input audio signal 113.
  • the system 110 is configured to automatically detect a sub-optimal allocation of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels 52 and, in response to detecting a sub-optimal allocation, to automatically use a new allocation of frequency sub-channels 51 of the input audio signal to multiple output audio channels 52.
  • the system comprises a spatial extent synthesizer module 114 that changes an allocation of frequency sub-channels 51 of the input audio signal to multiple output audio channels to change a spatial extent of a sound object 12.
  • the allocation of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels is defined by a distribution 119 provided by the distribution generator 118.
  • the distribution generator 118 generates the new distribution 119 in response to a control signal 117 from the analyser module 116.
  • the analyser module 116 is configured to automatically detect a sub-optimal allocation of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels 52 and, in response to detecting a sub-optimal allocation, automatically controls the distribution generator 118 to define a new allocation 119 of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels 52; this new allocation is used by the spatial extent synthesizer module 114 to change the allocation of frequency sub-channels of the input audio signal 113 to multiple output audio channels.
  • Each output audio channel is for rendering at a different location within a sound space 10; changing the allocation thereby changes a spatial extent of a sound object 12 for rendering in the sound space 10.
  • the system 110 is configured to change the allocation of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels 52 in dependence upon one or more changes in a power spectrum of the input audio signal 113.
  • the system 110 is configured to change the allocation of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels 52 to reduce deviation from a power spectrum of the input audio signal 113. This may prevent the power spectrum of each of the allocated frequency sub-channels 53 from deviating significantly (e.g. by more than a threshold value) from a power spectrum of the input audio signal 113.
  • the system 110 automatically determines a first cost function value for the current allocated frequency sub-channels 53 (based on a current allocation of frequency sub-channels of the input audio signal 113 to multiple output audio channels 52) and automatically determines a second cost function value for putative allocated frequency sub-channels (based on a putative allocation of frequency sub-channels of the input audio signal 113 to multiple output audio channels 52); in response to determining that the first cost function value is sufficiently greater than the second cost function value, the system makes the putative allocation of frequency sub-channels of the input audio signal 113 to multiple output audio channels the current allocation. The putative allocated frequency sub-channels therefore become the current allocated frequency sub-channels 53.
  • the distribution generator module 118 may generate the putative allocation of frequency sub-channels.
  • the analyser module 116 may determine the cost function and compare the cost function values, making the decision whether to change the current allocation of frequency sub-channels of the input audio signal 113.
  • the cost function may compare one or more parameters of the current input signal 113 with one or more parameters of each of the different output audio channels 52.
  • the cost function may, for example, be based on different parameters, such as parameters p(f) that vary with frequency f (for example amplitude or power spectral density), or be based on cepstral analysis.
  • the cost function may for example be based on different combinations of parameters. It may, for example, comprise a function that averages a parameter over a range of frequencies, such as a moving mean calculation.
  • Determining whether or not to change the allocation of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels (spatial audio channels 52) in dependence upon one or more changes in the input audio signal 113 may occur automatically as a consequence of detection of a change in the input audio signal 113, or it may occur automatically and intermittently according to a schedule, for example periodically.
  • determining whether or not to change the allocation of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels (spatial audio channels 52) in dependence upon one or more changes in the input audio signal 113 may occur automatically as a background process.
  • the second cost function value is determined automatically, for example continuously, intermittently or periodically, for one or more putative allocations of frequency sub-channels of the input audio signal 113 to multiple output audio channels; in response to determining that the second cost function value for a particular putative allocation is sufficiently less than the first cost function value and is the lowest of the second cost function values of the putative allocations, that particular putative allocation of frequency sub-channels of the input audio signal 113 to multiple output audio channels is made the current allocation.
  • determining whether or not to change the allocation of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels (spatial audio channels 52) in dependence upon one or more changes in the input audio signal 113 may occur automatically as a reactive process.
  • the first cost function value for a current allocation of frequency sub-channels of the input audio signal 113 to multiple output audio channels is determined automatically, for example continuously, intermittently or periodically, and if the first cost function value exceeds a threshold, a new allocation of frequency sub-channels of the input audio signal 113 to multiple output audio channels is used.
  • a current allocation of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels 52, 54 may be automatically and dynamically adjusted to reduce a cost function value for the current allocation of frequency sub-channels of the input audio signal 113 to multiple output audio channels.
  • automatically changing an allocation of frequency sub-channels of the input audio signal 113 to multiple output audio channels comprises changing a definition of the frequency sub-channels and/or changing a distribution of frequency sub-channels across output channels.
  • the frequency sub-channels 51 may for example each be defined by a center frequency and a bandwidth.
  • the definition of a frequency sub-channel may be changed by changing its center frequency and/or by changing its bandwidth.
  • the redefinition of a frequency sub-channel 51 may for example occur subject to certain constraints, such as that the frequency sub-channels 51 do not overlap and/or that the frequency sub-channels 51 cover in combination certain frequency ranges.
  • a slowly varying part of the spectrum may be covered by fewer, wider frequency sub-channels 51 and a more quickly varying part of the spectrum may be covered by more, narrower frequency sub-channels 51.
  • the lower frequency part of the spectrum may be covered by narrower frequency sub-channels 51 and the higher frequency part of the spectrum may be covered by wider frequency sub-channels 51.
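  • For example (a sketch under the stated constraints; the warp exponent is an arbitrary assumed choice), the sub-channel band edges could be redefined by warping a uniform grid so that the bands do not overlap, jointly cover the range, and widen towards high frequencies:
```python
import numpy as np

def band_edges(f_lo, f_hi, num_bands, warp=2.0):
    # Non-overlapping, gap-free band edges; warp > 1 makes the
    # low-frequency sub-channels narrower and the high-frequency
    # sub-channels wider.
    u = np.linspace(0.0, 1.0, num_bands + 1) ** warp
    return f_lo + (f_hi - f_lo) * u

edges = band_edges(50.0, 16000.0, 8)   # 9 edges defining 8 sub-channels
```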
  • the distribution of frequency sub-channels 51 across output channels 52, 54 may be changed by changing the rules used to distribute frequency sub-channels 51 across output channels 52, 54.
  • the rules define how the spectrally-limited audio signals 53 are distributed amongst the set of spatial audio channels {S}. They may or may not include constraints concerning spatial separation of frequency sub-channels 51 that are adjacent in the frequency spectrum.
  • the distribution of frequency sub-channels across output channels may be changed by changing one or more low-discrepancy sequences used for distribution of frequency sub-channels 51 across spatial audio channels 52.
  • a position or direction in the sound space 10 may be represented by one or more values derived from one or more low-discrepancy sequences. For example, a point in two dimensions (x, y) (or a direction (θ, φ)) may be represented using a value from each of two low-discrepancy sequences.
  • a Halton sequence is defined by a base value and by a skip value.
  • a new Halton sequence is a Halton sequence with a new base value and/or a new skip value.
  • a new Halton sequence may additionally or alternatively be created by scrambling a Halton sequence or leaping or changing leaping in a Halton sequence. Scrambling changes the order of a Halton sequence. Leaping results in certain values in the Halton sequence not being used.
  • the distribution of frequency sub-channels 51 across output channels 52 may be changed by changing one or more Halton sequences used for distribution of frequency sub-channels across output channels.
  • the parameters used for sequence generation, for example base, skip, scrambling or leaping, may be changed randomly or in a preset manner.
  • Fig 7 illustrates an example of a method 400 that changes the allocation of frequency sub-channels 51 of the input audio signal 113 to multiple output audio channels 52 so that the power spectrum of the allocated frequency sub-channels deviates less from the power spectrum of the input audio signal 113, i.e. so that they have the same overall spectral shape.
  • Digital signal processing is used to distribute time-frequency bins to different spatial audio channels.
  • a short-term Fourier transform (STFT) is used to transform from the time domain to the frequency domain.
  • the method 400 comprises, at block 402, filtering the input audio signal 113 in the frequency domain; at block 404, automatically determining a first power spectral density function for the input audio signal 113 (Fig 8A); then, at block 406, performing a running-average smoothing over frequency bins using a sliding window to simplify the first power spectral density function.
  • the method 400 comprises, at block 412, filtering the input audio signal 113 in the frequency domain according to a putative allocation of frequency sub-channels; at block 414, automatically determining a second power spectral density function for each of the putative allocated frequency sub-channels (Figs 8B-8E); then, at block 416, performing a running-average smoothing over frequency bins using a sliding window to simplify the second power spectral density function for each putative allocated frequency sub-channel.
  • the method 400 then comprises, at block 420, comparing the simplified first power spectral density function and each simplified second power spectral density function using a mean square error function.
  • at block 422, the average mean square error value is compared to the previously stored mean square error value (if any).
  • if the average mean square error value is less than the previously stored value, the putative allocation of frequency sub-channels is better than the current allocation of sub-channels, and the putative allocation of frequency sub-channels is used as the current allocation of sub-channels (block 430).
  • the new mean square error value is stored in memory for subsequent comparison at block 422 in the next iteration of the method 400.
  • the method 400 may then start again immediately, after a delay or in response to an interrupt. If the average mean square error value is not less than the previous mean square error value, then the putative allocation of frequency sub-channels is not better than the current allocation of sub-channels, and the current allocation of sub-channels is unchanged.
  • a new putative allocation of frequency sub-channels is generated and the method 400 repeats (block 432). In this example, a new Halton sequence(s) is generated.
  • the cost function in this example is based on a mean square error comparison between a putative allocation of frequency sub-channels and the input audio signal 113.
  • if the cost function value for a particular putative allocation is lower than that for the current allocation, the putative allocation becomes the current allocation.
  • while this method 400 is performed for only a single putative allocation of frequency sub-channels at a time, it will be recognised that it may be performed in parallel simultaneously for multiple putative allocations of frequency sub-channels.
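  • Pulling blocks 402-432 together as a hedged Python sketch: scipy's Welch estimate stands in for the patent's frequency-domain filtering and power spectral density computation, and the smoothing-window length is an assumed parameter.
```python
import numpy as np
from scipy.signal import welch

def smoothed_psd(x, fs, win_bins=9, nperseg=1024):
    # Blocks 404/406 and 414/416: PSD estimate, then a running-average
    # smoothing over frequency bins with a sliding window.
    _, p = welch(x, fs=fs, nperseg=nperseg)
    return np.convolve(p, np.ones(win_bins) / win_bins, mode="same")

def maybe_adopt(x_in, putative_signals, fs, stored_mse=None):
    # Blocks 420/422/430/432: average the per-sub-channel mean square
    # errors against the input's smoothed PSD; adopt the putative
    # allocation if it beats the stored value, otherwise keep the
    # current allocation and generate a new putative one.
    ref = smoothed_psd(x_in, fs)
    mses = [np.mean((smoothed_psd(s, fs) - ref) ** 2)
            for s in putative_signals]
    avg = float(np.mean(mses))
    if stored_mse is None or avg < stored_mse:
        return True, avg             # putative becomes current; store MSE
    return False, stored_mse         # unchanged; try another allocation
```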
  • Fig 9 illustrates an example of a method 500 for controlling rendering of spatial audio and, in particular, controlling rendering of a sound object 12 that has a changing spatial extent, for example width.
  • a first allocation of frequency sub-channels 51 of an input audio signal 113 to multiple output audio channels 52, 54 is automatically changed to a second allocation of frequency sub-channels of the input audio signal 113 to multiple output audio channels 52, 54.
  • the second allocation of frequency sub-channels 51 may not be used immediately; there may be a gradual transition between the first allocation of frequency sub-channels 51 and the second allocation of frequency sub-channels 51.
  • before the change, the first allocation of frequency sub-channels 51 is used to render the sound object 12;
  • after the change, the second allocation of frequency sub-channels 51 is used to render the sound object 12;
  • during the change, a transitional allocation of frequency sub-channels 51 is used to render the sound object 12.
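  • One plausible transitional allocation, sketched below, is an equal-power crossfade of per-bin channel gains (the cosine law and all names are assumptions; the patent does not specify the transition):
```python
import numpy as np

def transitional_gains(alloc_a, alloc_b, num_channels, alpha):
    # Per-bin channel gains fading from allocation A to allocation B
    # as alpha runs from 0 to 1 over the transition.
    alloc_a, alloc_b = np.asarray(alloc_a), np.asarray(alloc_b)
    g = np.zeros((len(alloc_a), num_channels))
    rows = np.arange(len(alloc_a))
    g[rows, alloc_a] += np.cos(alpha * np.pi / 2)   # fade out old channel
    g[rows, alloc_b] += np.sin(alpha * np.pi / 2)   # fade in new channel
    same = alloc_a == alloc_b
    g[rows[same], alloc_a[same]] = 1.0              # unchanged bins stay put
    return g
```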
  • when the sound space 10 is rendered to a listener through a head-mounted audio output device, for example headphones or a headset using binaural audio coding, it may be desirable for the rendered sound space to remain fixed in space when the listener turns their head in space. This means that the rendered sound space needs to be rotated relative to the audio output device by the same amount in the opposite sense to the head rotation.
  • the orientation of the rendered sound space tracks with the rotation of the listener's head so that the orientation of the rendered sound space remains fixed in space and does not move with the listener's head.
  • the system uses a transfer function to perform a transformation T that rotates the sound objects 12 within the sound space.
  • a head related transfer function (HRTF) interpolator may be used for rendering binaural audio.
  • Vector Base Amplitude Panning (VBAP) may be used for rendering in loudspeaker format (e.g. 5.1) audio.
  • the distance of a sound object 12 from an origin at the user may be controlled by using a combination of direct and indirect processing of audio signals representing the sound object 12.
  • the audio signals are passed in parallel through a "direct" path and one or more "indirect" paths before the outputs from the paths are mixed together.
  • the direct path represents audio signals that appear, to a listener, to have been received directly from an audio source.
  • an indirect (decorrelated) path represents audio signals that appear, to a listener, to have been received from an audio source via an indirect path such as a multipath or a reflected path or a refracted path. Modifying the relative gain between the direct path and the indirect paths changes the perception of the distance D of the sound object 12 from the listener in the rendered sound space 10. Increasing the indirect path gain relative to the direct path gain increases the perception of distance.
  • the decorrelated path may, for example, introduce a pre-delay of at least 2 ms.
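  • A toy Python mix of a direct path and one decorrelated indirect path (only the at-least-2-ms pre-delay comes from the text; the 1/r gain law and the random-FIR decorrelator are assumptions):
```python
import numpy as np

def distance_mix(x, fs, distance, pre_delay_ms=2.0):
    # Increasing the indirect gain relative to the direct gain
    # increases the perceived distance of the sound object.
    g_direct = 1.0 / max(distance, 1.0)             # assumed 1/r attenuation
    g_indirect = 1.0 - g_direct
    delay = int(fs * pre_delay_ms / 1000.0)
    delayed = np.concatenate([np.zeros(delay), x])[:len(x)]   # pre-delay
    rng = np.random.default_rng(0)
    h = rng.standard_normal(256)
    h /= np.linalg.norm(h)                          # toy decorrelating FIR
    indirect = np.convolve(delayed, h)[:len(x)]
    return g_direct * x + g_indirect * indirect
```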
  • the spatial audio channels 52 are treated as spectrally distinct sound objects that are then positioned at suitable widths and/or heights and/or distances using known audio reproduction methods.
  • in loudspeaker sound reproduction, amplitude panning can be used for positioning a spectrally distinct sound object in the width and/or height dimension.
  • distance attenuation by gain control and optionally the direct-to-reverberant (indirect) ratio can be used to position spectrally distinct sound objects in the depth dimension.
  • in binaural reproduction, the width and/or height dimension is obtained by selecting a suitable head related transfer function (HRTF) filter pair (one for the left ear, one for the right ear) for each of the spectrally distinct sound objects depending on its position.
  • a pair of HRTF filters model the path from a point in space to the listener's ears.
  • the HRTF coefficient pairs are stored for all the possible directions of arrival for a sound.
  • the distance dimension of a spectrally distinct sound object is controlled by modelling distance attenuation with gain control and optionally the direct-to-reverberant (indirect) ratio.
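  • A bare-bones binaural rendering step (illustrative only: hrtf_db, mapping a direction in degrees to an (h_left, h_right) filter pair, stands in for a measured HRTF set; equal-length signals and filters are assumed):
```python
import numpy as np

def binaural_render(sub_signals, directions_deg, hrtf_db):
    # For each spectrally distinct sound object, pick the stored HRTF
    # pair nearest its direction of arrival, convolve, and sum.
    out_l, out_r = 0.0, 0.0
    for x, az in zip(sub_signals, directions_deg):
        nearest = min(hrtf_db, key=lambda d: abs(d - az))
        h_left, h_right = hrtf_db[nearest]
        out_l = out_l + np.convolve(x, h_left)
        out_r = out_r + np.convolve(x, h_right)
    return out_l, out_r
```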
  • the width of a sound object may be controlled by the spatial allocation module 72.
  • the spatial allocation module 72 achieves the correct spatial rendering of the spatial audio channels 52 by controlled mixing 74 of the different spatial audio channels 52 across different width-separated audio device channels 76 that are rendered by different audio output devices.
  • the height of a sound object may be controlled in the same manner as a width of a sound object.
  • the spatial allocation module 72 achieves the correct spatial rendering of the spatial audio channels 52 by controlled mixing 74 of the different spatial audio channels 52 across different height-separated audio device channels 76 that are rendered by different audio output devices.
  • the depth of a sound object may be controlled in the same manner as a width of a sound object.
  • the spatial allocation module 72 achieves the correct spatial rendering of the spatial audio channels 52 by controlled mixing 74 of the different spatial audio channels 52 across different depth-separated audio device channels 76 that are rendered by different audio output devices. However, if that is not possible, the spatial allocation module 72 may achieve the correct spatial rendering of the spatial audio channels 52 by controlled mixing 74 of the different spatial audio channels 52 across different depth-separated spectrally distinct sound objects at different perceived distances, by modelling distance attenuation using gain control and optionally the direct-to-reverberant (indirect) ratio.
  • the computer program 306 may arrive at the apparatus 300 via any suitable delivery mechanism 310.
  • the delivery mechanism 310 may be, for example, a non-transitory computer-readable storage medium, a computer program product, a memory device, a record medium such as a compact disc read-only memory (CD-ROM) or digital versatile disc (DVD), an article of manufacture that tangibly embodies the computer program 306.
  • the delivery mechanism may be a signal configured to reliably transfer the computer program 306.
  • the apparatus 300 may propagate or transmit the computer program 306 as a computer data signal.
  • although the memory 304 is illustrated as a single component/circuitry, it may be implemented as one or more separate components/circuitry, some or all of which may be integrated/removable and/or may provide permanent/semi-permanent/dynamic/cached storage.
  • although the processor 302 is illustrated as a single component/circuitry, it may be implemented as one or more separate components/circuitry, some or all of which may be integrated/removable.
  • the processor 302 may be a single core or multi-core processor.
  • references to 'computer-readable storage medium', 'computer program product', 'tangibly embodied computer program' etc. or a 'controller', 'computer', 'processor' etc. should be understood to encompass not only computers having different architectures such as single /multi- processor architectures and sequential (Von Neumann)/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other processing circuitry.
  • References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device whether instructions for a processor, or configuration settings for a fixed- function device, gate array or programmable logic device etc.
  • as used in this application, the term 'circuitry' refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as a combination of processor(s), or portions of processor(s)/software (including digital signal processor(s)), software and memory(ies), that work together to cause an apparatus to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
  • This definition of 'circuitry' applies to all uses of this term in this application, including in any claims.
  • As a further example, the term "circuitry" would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware.
  • The term "circuitry" would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or other network device.
  • the blocks illustrated in the enclosed figures may represent steps in a method and/or sections of code in the computer program 306. The illustration of a particular order to the blocks does not necessarily imply that there is a required or preferred order for the blocks, and the order and arrangement of the blocks may be varied. Furthermore, it may be possible for some blocks to be omitted.
  • the controller 300 may, for example be a module.
  • 'Module' refers to a unit or apparatus that excludes certain parts/components that would be added by an end manufacturer or a user.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

According to the invention, a method comprises: allocating frequency sub-channels of an input audio signal to multiple output audio channels, each output audio channel for rendering at a location within a sound space; and automatically changing an allocation of frequency sub-channels of the input audio signal to multiple output audio channels.
PCT/FI2018/050288 2017-04-24 2018-04-24 Spatial audio signal processing WO2018197747A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1706476.7 2017-04-24
GB1706476.7A GB2561844A (en) 2017-04-24 2017-04-24 Spatial audio processing

Publications (1)

Publication Number Publication Date
WO2018197747A1 (fr) 2018-11-01

Family

ID=58795609

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2018/050288 WO2018197747A1 (fr) Spatial audio signal processing

Country Status (2)

Country Link
GB (1) GB2561844A (fr)
WO (1) WO2018197747A1 (fr)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4005233A1 (fr) 2022-06-01 Adaptable spatial audio playback
EP4005234A1 (fr) 2022-06-01 Rendering audio over multiple speakers with multiple activation criteria
US11968268B2 (en) 2019-07-30 2024-04-23 Dolby Laboratories Licensing Corporation Coordination of audio devices
WO2021021857A1 (fr) 2019-07-30 2021-02-04 Acoustic echo cancellation control for distributed audio devices
CN114391262B (zh) 2019-07-30 2023-10-03 Dolby Laboratories Licensing Corporation Dynamics processing across devices with differing playback capabilities
US11659332B2 (en) 2019-07-30 2023-05-23 Dolby Laboratories Licensing Corporation Estimating user location in a system including smart audio devices

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090092259A1 (en) * 2006-05-17 2009-04-09 Creative Technology Ltd Phase-Amplitude 3-D Stereo Encoder and Decoder
US20120207309A1 (en) * 2011-02-16 2012-08-16 Eppolito Aaron M Panning Presets
US20130195276A1 (en) * 2009-12-16 2013-08-01 Pasi Ojala Multi-Channel Audio Processing
EP2830332A2 * 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method, signal processing unit and computer program for mapping a plurality of input channels of an input channel configuration to output channels of an output channel configuration

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030023429A1 (en) * 2000-12-20 2003-01-30 Octiv, Inc. Digital signal processing techniques for improving audio clarity and intelligibility
US8284957B2 (en) * 2010-07-12 2012-10-09 Creative Technology Ltd Method and apparatus for stereo enhancement of an audio system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090092259A1 (en) * 2006-05-17 2009-04-09 Creative Technology Ltd Phase-Amplitude 3-D Stereo Encoder and Decoder
US20130195276A1 (en) * 2009-12-16 2013-08-01 Pasi Ojala Multi-Channel Audio Processing
US20120207309A1 (en) * 2011-02-16 2012-08-16 Eppolito Aaron M Panning Presets
EP2830332A2 * 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method, signal processing unit and computer program for mapping a plurality of input channels of an input channel configuration to output channels of an output channel configuration

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GALLO E ET AL.: "3D-audio matting, postediting, and rerendering from field recordings", EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 26 February 2007 (2007-02-26), XP055527420, Retrieved from the Internet <URL:http://www-sop.inria.fr/reves/Nicolas.Tsingos/publis/eurasipJASP07.pdf> [retrieved on 20180702] *

Also Published As

Publication number Publication date
GB201706476D0 (en) 2017-06-07
GB2561844A (en) 2018-10-31

Similar Documents

Publication Publication Date Title
US11979733B2 (en) Methods and apparatus for rendering audio objects
WO2018197747A1 (fr) 2018-11-01 Spatial audio signal processing
US10136240B2 (en) Processing audio data to compensate for partial hearing loss or an adverse hearing environment
EP3028476B1 (fr) Panoramique des objets audio pour schémas de haut-parleur arbitraires
WO2018197748A1 (fr) 2018-11-01 Spatial audio processing
US11348288B2 (en) Multimedia content
US11337020B2 (en) Controlling rendering of a spatial audio scene
US10375472B2 (en) Determining azimuth and elevation angles from stereo recordings
US20170289724A1 (en) Rendering audio objects in a reproduction environment that includes surround and/or height speakers
US11627427B2 (en) Enabling rendering, for consumption by a user, of spatial audio content
WO2018193163A1 (fr) 2018-10-25 Enhancing loudspeaker playback using a spatial extent processed audio signal
US11032639B2 (en) Determining azimuth and elevation angles from stereo recordings
RU2803638C2 (ru) Обработка пространственно диффузных или больших звуковых объектов
EP3488623A1 (fr) 2019-05-29 Audio object clustering based on renderer-aware perceptual difference
WO2023131398A1 (fr) 2023-07-13 Apparatus and method for implementing versatile audio object rendering
Trevino Lopez et al. Evaluation of different spatial windows for a multi-channel audio interpolation system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18791246

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18791246

Country of ref document: EP

Kind code of ref document: A1