EP3706432A1 - Processing multiple spatial audio signals which have a spatial overlap

Info

Publication number: EP3706432A1
Application number: EP19160886.8A
Authority: EP (European Patent Office)
Prior art keywords: spatial, region, spatial audio, audio, audio signals
Legal status: Withdrawn
Other languages: German (de), French (fr)
Inventors: Antti Johannes Eronen, Jussi Artturi Leppänen, Arto Juhani Lehtiniemi, Lasse Juhani Laaksonen
Assignee (current and original): Nokia Technologies Oy
Application filed by Nokia Technologies Oy

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H04R1/406 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2499/00 Aspects covered by H04R or H04S not otherwise provided for in their subgroups
    • H04R2499/10 General applications
    • H04R2499/11 Transducers incorporated or for use in hand-held devices, e.g. mobile phones, PDA's, camera's
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15 Aspects of sound capture and related signal processing for recording or reproduction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/40 Visual indication of stereophonic sound image

Definitions

  • FIG. 1 is a plan view of a system, indicated generally by the reference numeral 10, in which embodiments described herein may be used.
  • the system 10 comprises a user device 12, such as a mobile phone.
  • the user device 12 may include one or more microphones for capturing spatial audio signals.
  • the user device 12 may also include one or more cameras for capturing images and/or any related data such as depth maps.
  • the spatial audio data captured by the user device 12 may include audio content that includes at least some indication of sound directions. This may, for example, be by means of a parametric representation, or the directions may be otherwise perceivable by a listener. For example, an omnidirectional microphone may be used that captures audio from all directions around the user device 12.
  • FIG. 2 is a block diagram of a system, indicated generally by the reference numeral 20, in accordance with an example embodiment.
  • the system 20 comprises the user device 12 described above and additionally includes a first audio source 22.
  • the first audio source 22 may, for example, be a person talking (although other audio sources are possible in example embodiments).
  • the user device 12 receives a first spatial audio signal from the first audio source 22.
  • the user device 12 may include a first audio focus arrangement 24, such as a beamforming arrangement, in the direction of the first audio source 22 in order to obtain, for example, a monophonic audio signal enhancing the specified direction.
  • Such audio focussing may be provided, for example, to extract audio data relating to the first audio source 22, or to focus on the source 22 in the spatial audio field.
  • the first spatial audio signal is focussed in a first direction (i.e. in the direction of the first audio source 22).
  • the beam can be narrow or wide.
  • the beamforming may amplify the beamed direction only slightly, or may substantially amplify the beamed direction and, optionally, attenuate or cancel audio signals from other directions.
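By way of illustration, the following is a minimal delay-and-sum beamformer sketch showing how an audio focus arrangement such as the first audio focus arrangement 24 might be realised. The specification does not prescribe a beamforming method; the linear-array geometry, the function name and the frequency-domain fractional delay are assumptions made for this example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second


def delay_and_sum(mic_signals, mic_positions, steer_deg, fs):
    """Steer a linear microphone array towards steer_deg (relative to
    broadside), returning a monophonic signal enhancing that direction.

    mic_signals: array of shape (n_mics, n_samples)
    mic_positions: array of shape (n_mics,), positions in metres along the array
    """
    theta = np.deg2rad(steer_deg)
    # Per-microphone delays (in samples) aligning a plane wave from theta.
    delays = mic_positions * np.sin(theta) / SPEED_OF_SOUND * fs
    n = mic_signals.shape[1]
    freqs = np.fft.rfftfreq(n)  # cycles per sample
    out = np.zeros(n)
    for signal, d in zip(mic_signals, delays):
        # Apply a fractional-sample delay as a phase shift in the frequency domain.
        spectrum = np.fft.rfft(signal) * np.exp(-2j * np.pi * freqs * d)
        out += np.fft.irfft(spectrum, n)
    return out / len(mic_signals)
```

More microphones, or per-microphone weighting, would narrow the beam and attenuate other directions more strongly, matching the narrow/wide beam behaviour described above.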
  • FIG. 3 is a block diagram of a system, indicated generally by the reference numeral 30, in accordance with an example embodiment.
  • the system 30 comprises the user device 12 and the first audio source 22 described above and additionally includes a second audio source 32.
  • the first and second audio sources 22 and 32 may be two people talking, although this is not essential to all embodiments.
  • the audio objects may include musical instruments (e.g. guitars and drums), stationary loudspeakers, movable loudspeakers, multimedia devices such as TVs, or may include any other sound sources.
  • the user device 12 receives a first spatial audio signal from the first audio source 22 and a second spatial audio signal from the second audio source 32.
  • the user device 12 includes a first audio focus arrangement 24, such as a first beamforming arrangement, in the direction of the first audio source 22 in order to obtain, for example, a monophonic audio signal enhancing the specified direction.
  • the user device 12 of the system 30 includes a second audio focus arrangement 34, such as a second beamforming arrangement, in the direction of the second audio source.
  • two beamforming arrangements may be operated in parallel, one implementing the first audio focus arrangement 24 and the other implementing the second audio focus arrangement 34.
  • Tracking may be employed to guide at least one audio focus or beamforming direction.
  • For example, visual tracking based on a camera input, or any other suitable tracking, may be used, whereby an audio source (such as the audio sources 22 and 32) that moves relative to the capture device is maintained under audio focus or beamforming.
  • Thus, a moving audio source may be maintained under the audio focus arrangement; a minimal sketch follows.
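A sketch of how such tracking might guide the focus direction, frame by frame, is given below; it reuses delay_and_sum from the previous sketch, and estimate_source_angle stands in for any tracker (e.g. a camera-based visual tracker). Neither name comes from the specification.

```python
def focus_on_moving_source(frames, mic_positions, fs, estimate_source_angle):
    """Keep a moving audio source under audio focus by re-steering per frame.

    frames: iterable of (n_mics, frame_len) sample blocks
    estimate_source_angle: callable returning the tracked source angle (deg)
    """
    focused = []
    for frame in frames:
        angle = estimate_source_angle(frame)  # tracker output for this frame
        focused.append(delay_and_sum(frame, mic_positions, angle, fs))
    return focused
```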
  • FIG. 4 is a flow chart, indicated generally by the reference numeral 40, showing an example algorithm.
  • the algorithm 40 starts at operation 41, where the first and second audio focus arrangements 24 and 34 are arranged to enable the user device 12 to receive a first spatial audio signal focussed in the direction of the first audio source 22 and a second spatial audio signal focussed in the direction of the second audio source 32.
  • the operation 41 may implement suitable beamforming arrangements.
  • the first and second audio focus arrangements 24 and 34 may have first and second spatial widths respectively.
  • a user selects a first audio effect to be applied to the first spatial audio signal and/or a second audio effect to be applied to the second spatial audio signal.
  • Example audio effects include reverberations, modulations, filtering etc.
  • the system 30 processes the spatial audio signals (e.g., audio objects) according to the effects selected in the operation 42.
  • first and/or second spatial audio signals captured by the user device 12 can be modified according to the effects selected in the operation 42.
  • the algorithm 40 can be applied to the system 30 in order to process the first and second audio sources separately. Indeed, entirely different effects may be selected in operation 42 and applied to the first and second audio sources in operation 43. This is relatively simple to implement in the event that the separation between the audio sources is much greater than the width of the audio focusing arrangements 24 and 34. However, if the audio sources are relatively close together (e.g. such that they overlap, at least to some degree), it may become difficult to separate the audio sources and to process the relevant audio signals separately.
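As a concrete illustration of operations 42 and 43, the sketch below applies a user-selected effect to each separately focused audio object. The toy feedback-delay "reverb" and the dictionary keying are assumptions made for this example, not effects defined by the specification.

```python
import numpy as np


def toy_reverb(signal, fs, decay=0.4, delay_s=0.05):
    """Toy feedback-delay 'reverb' used as a stand-in effect."""
    out = signal.astype(float).copy()
    d = int(delay_s * fs)
    for i in range(d, len(out)):
        out[i] += decay * out[i - d]
    return out


def process_audio_objects(focused_signals, selected_effects, fs):
    """Apply the effect selected in operation 42 (if any) to each focused
    audio object, as in operation 43. Both dicts are keyed by object name."""
    return {name: selected_effects.get(name, lambda s, fs: s)(sig, fs)
            for name, sig in focused_signals.items()}


# Example: a toy reverb on the first source only.
fs = 48_000
signals = {"source22": np.random.randn(fs), "source32": np.random.randn(fs)}
processed = process_audio_objects(signals, {"source22": toy_reverb}, fs)
```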
  • FIG. 5 is a block diagram of a system, indicated generally by the reference numeral 50, in accordance with an example embodiment.
  • the system 50 comprises the user device 12, the first audio source 22 and the second audio source 32 described above.
  • the user device 12 includes the first audio focus arrangement 24, in the direction of the first audio source 22, and the second audio focus arrangement 34, in the direction of the second audio source 32, described above.
  • the first and second audio sources 22 and 32 are much closer together in the system 50 than in the system 30. Accordingly, the first and second audio focus arrangements 24 and 34 are much closer together in the system 50 than in the system 30; indeed, in the system 50, the first and second audio focus arrangements 24 and 34 overlap to a significant degree. Of course, the degree of overlap is to some extent determined by the spatial width of the audio focus arrangements 24 and 34.
  • FIG. 6 is a flow chart showing an algorithm, indicated generally by the reference numeral 60, in accordance with an example embodiment.
  • the algorithm 60 may be implemented using the systems 30 or 50.
  • the algorithm 60 starts at operation 41, where, as discussed above, audio focus arrangements 24 and 34 are arranged to enable the user device 12 to receive a first spatial audio signal focussed in the direction of the first audio source 22 (a "first direction") and a second spatial audio signal focussed in the direction of the second audio source 32 (a "second direction").
  • the operation 41 may be implemented at the user device 12 by providing means for receiving the first spatial audio signal and the second spatial audio signal at the user device.
  • the first and second audio focus arrangements 24 and 34 may have first and second spatial widths respectively.
  • a user selects a first audio effect to be applied to the first spatial audio signal and/or a second audio effect to be applied to the second spatial audio signal.
  • example audio effects include reverberations, modulations, filtering etc.
  • the algorithm 60 then moves to operation 61, where it is determined whether a first region is identified, in which a spatial overlap exists between the first and second directions. For example, in the system 50, a substantial overlap exists, whereas in the system 30, no significant overlap exists. If an overlap is identified, the algorithm 60 moves to operation 62; otherwise, the algorithm 60 moves to operation 63. It should be noted that the extent of the overlap necessary for an overlap to be identified may vary in different embodiments (and may be user-definable). Moreover, as noted above, the extent to which an overlap exists may be dependent on the spatial widths of the respective audio focus arrangements.
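The sketch below illustrates one way the operation 61 might identify the first region from the two focus directions and their spatial widths. The purely angular (azimuth) representation and the user-definable minimum-overlap threshold are assumptions for this example.

```python
def wrap_deg(angle):
    """Wrap an angle difference into [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0


def identify_overlap_region(dir1_deg, width1_deg, dir2_deg, width2_deg,
                            min_overlap_deg=0.0):
    """Return (start_deg, end_deg) of the overlapping angular region between
    two focus beams, or None when no (sufficient) overlap exists."""
    # Unwrap the second direction so both beams lie on a common axis.
    d2 = dir1_deg + wrap_deg(dir2_deg - dir1_deg)
    lo = max(dir1_deg - width1_deg / 2.0, d2 - width2_deg / 2.0)
    hi = min(dir1_deg + width1_deg / 2.0, d2 + width2_deg / 2.0)
    if hi - lo <= min_overlap_deg:  # the extent needed may be user-definable
        return None
    return lo, hi


# Example: wide beams 30 degrees apart overlap; narrow beams do not.
assert identify_overlap_region(0.0, 40.0, 30.0, 40.0) == (10.0, 20.0)
assert identify_overlap_region(0.0, 10.0, 30.0, 10.0) is None
```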
  • At operation 62, overlap regions are processed (as discussed further below), before the algorithm 60 moves to operation 63.
  • At operation 63, the audio objects are processed.
  • the operation 63 may, for example, be similar to (or identical to) the operation 43 discussed above.
  • the first and/or second spatial audio signals within said first region may be processed to modify audio settings in said first region.
  • the operation 63 may disable the audio effects associated with said first and/or second spatial audio signals in the absence of instructions (e.g. user instructions) to the contrary when said first region is identified (i.e. when an overlap is detected).
  • an output (e.g. a preview output) including said processed first and/or second spatial audio signals within said first region may be provided to a user.
  • a user may preview how the overlap region may be rendered in a particular configuration. This may, for example, enable a user (such as an editor) to trial a number of different approaches to dealing with overlap conditions.
  • the algorithm 60 can therefore be used to identify areas of overlap between the spatial positions of audio sources (or between focus arrangements relating to said audio sources). This may be detected, for example, by calculating the absolute difference between directions-of-arrival (DOA) of audio signals, differences in azimuth and/or elevation angle, differences in the directions of audio focussing arrangements etc.
  • In some circumstances (for example, where the first and second audio effects are identical, or where one or both of the signals are below an audio threshold level), a problem due to spatial overlapping of the form described above may not arise; the mere presence of an overlap may not present a problem.
  • In such cases, the operation 61 may proceed to the operation 63 regardless of whether an overlap is detected; alternatively, the operation 61 may determine whether a relevant overlap exists.
  • FIG. 7 is a flow chart showing an algorithm, indicated generally by the reference numeral 70, in accordance with an example embodiment.
  • the algorithm 70 is an example implementation of the operation 62 of the algorithm 60 described above. Other example implementations of the operation 62 are possible.
  • instructions for processing audio signals within an overlap region may be pre-set.
  • a first effect may overtake a second effect. For example, if a first audio object having a chorus effect applied to it and a second audio object having a reverb effect applied to it overlap, then a rule may be applied to modify the reverb effect of the second audio object.
  • Different effects may be provided in a prioritized order in user settings, in device settings, or learnt by the system 50.
  • the system may learn, over the course of time, the changes a user makes in overlapping situations, and propose to apply the change the user has most often selected when an overlap occurs.
  • the system may also change the effect of the first and/or second audio object according to the learnt behaviour without asking for the user's confirmation.
  • Different ways to handle the overlapping situation exist, as explained earlier. For example, one or more of the effects may be removed, one or more of the effects may be changed, or one or more effects may be applied to one or more audio objects.
  • A surprise random or semi-random effect may also be applied: e.g. when a certain first effect and a certain second effect overlap, a certain third effect may be applied to both. A sketch of such rule-based handling follows.
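The following sketch shows such pre-set, rule-based handling. The priority values, and the choice to demote (rather than remove) the lower-priority effect, are invented for illustration; in practice the ordering might come from user settings, device settings or learnt behaviour, as described above.

```python
# Higher number = higher priority; the ordering itself is an assumption.
EFFECT_PRIORITY = {"chorus": 3, "reverb": 2, "flanger": 1}


def resolve_overlapping_effects(effect1, effect2):
    """Decide how two effects are handled inside an overlap region.

    Returns (kept_effect, demoted_effect); the demoted effect would be
    modified (or disabled) within the region, the kept effect overtakes it.
    """
    if effect1 == effect2:
        return effect1, None  # identical effects: no first region identified
    if EFFECT_PRIORITY.get(effect1, 0) >= EFFECT_PRIORITY.get(effect2, 0):
        return effect1, effect2
    return effect2, effect1


# Example: a chorus-effected object overtakes a reverb-effected object.
assert resolve_overlapping_effects("chorus", "reverb") == ("chorus", "reverb")
```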
  • the algorithm 70 starts at operation 71, where a user interface output is generated related to said first and/or second spatial audio signals within said overlap region.
  • the user interface may, for example, be an audio editor (or a video and audio editor). Example user interface outputs are described below.
  • at operation 72, an indication is given, for example by a user, of how the overlap (e.g. the overlap identified in the operation 61) is to be handled.
  • the operation 72 may involve receiving user instructions in response to said user interface output.
  • at operation 73, the identified overlap is processed in accordance with the user indication provided in the operation 72.
  • Thus, the first and/or second spatial audio signals within the overlap region may be processed in accordance with said user instructions.
  • FIG. 8 shows an output, indicated generally by the reference numeral 80, in accordance with an example embodiment.
  • the output 80 is an example implementation of the operation 72 described above and shows a time-domain visualisation of a first audio signal 81 and a second audio signal 82.
  • the time-domain visualisation may correspond to a video (or audio and video) editing software timeline such that it provides a linear view of the audio or audio-visual content across a device screen or a software window.
  • the visualisation may be provided to highlight the first region.
  • the output 80 is therefore an example user interface output.
  • the time-domain visualisation of the first audio signal 81 includes three parts, labelled 81a, 81b and 81c respectively.
  • the three parts of the first audio signal (which are provided at different time periods) are provided at different spatial positions.
  • the time-domain visualisation of the second audio signal 82 includes three parts, labelled 82a, 82b and 82c respectively. Again, the three parts of the second audio signal (which are provided at different time periods) are provided at different spatial positions.
  • the output 80 includes a visualisation of an overlap period 84, during which the first and second audio signals are at the same spatial position.
  • the output 80 is therefore an example of a time-domain user interface that can be used to highlight, to a user (such as an editor), a region in which a spatial overlap exists between the first and second audio signals.
  • Of course, many alternative time-domain visualisations are possible; a plotting sketch follows.
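A minimal plotting sketch in the spirit of FIG. 8 is given below. The trajectories and the five-degree coincidence tolerance are placeholder values, not data from the specification.

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0.0, 30.0, 300)  # timeline in seconds
# Placeholder spatial positions (azimuth): both sources drift to 0 degrees.
pos1 = np.interp(t, [0, 10, 20, 30], [-40, -40, 0, 0])  # first signal (81)
pos2 = np.interp(t, [0, 10, 20, 30], [40, 40, 0, 0])    # second signal (82)

fig, ax = plt.subplots()
ax.plot(t, pos1, label="first audio signal 81")
ax.plot(t, pos2, label="second audio signal 82")
# Highlight the overlap period 84, where the spatial positions coincide.
overlap = np.abs(pos1 - pos2) < 5.0
ax.fill_between(t, -60, 60, where=overlap, alpha=0.2,
                label="overlap period 84")
ax.set_xlabel("time (s)")
ax.set_ylabel("spatial position (degrees)")
ax.legend()
plt.show()
```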
  • FIG. 9 shows an output, indicated generally by the reference numeral 90, in accordance with an example embodiment.
  • the output 90 is an example implementation of the operation 72 described above and shows a frequency-domain visualisation of a first audio signal 91 and a second audio signal 92. The visualisation may be provided to highlight the first region.
  • the output 90 is therefore an example user interface output.
  • the first audio signal 91 has generally lower frequency components and the second audio signal 92 has generally higher frequency components.
  • a frequency 94 (which may be user defined) indicates a frequency at which a filter may be provided to separate the first and second audio signals. Such filtering is an example of processing that may be performed in the operations 62 and 73 described above. Of course, many alternative frequency-domain visualisations are possible; a filtering sketch follows.
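The sketch below shows the kind of filtering FIG. 9 suggests: a complementary low-pass/high-pass pair at a crossover (the frequency 94), separating a predominantly low-frequency signal from a predominantly high-frequency one. The filter order and the default crossover value are assumptions.

```python
from scipy.signal import butter, sosfiltfilt


def split_at_crossover(mixture, fs, crossover_hz=1000.0):
    """Split an overlap-region mixture at a (possibly user-defined) crossover.

    Returns (low_band, high_band): rough estimates of the first (lower-
    frequency) and second (higher-frequency) audio signals respectively.
    """
    sos_low = butter(4, crossover_hz, btype="lowpass", fs=fs, output="sos")
    sos_high = butter(4, crossover_hz, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos_low, mixture), sosfiltfilt(sos_high, mixture)
```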
  • FIG. 10 is a block diagram of a system, indicated generally by the reference numeral 100, in accordance with an example embodiment.
  • the system 100 comprises the user device 12, the first audio source 22 and the second audio source 32 described above.
  • the user device 12 receives a first spatial audio signal from the first audio source 22 and a second spatial audio signal from the second audio source 32, with the user device 12 including the first audio focus arrangement 24 and the second audio focus arrangement 34 described above.
  • the user device 12 and the first and second audio sources 22, 32 are provided within a space 102, such that echoes occur.
  • a first echo 104 may be provided between the first audio source 22 and the user device 12, and a second echo 105 may be provided between the second audio source 32 and the user device 12.
  • the system 100 is a simple example; more complicated systems, including multiple echoes for each of multiple audio sources, are possible.
  • FIG. 11 is a flow chart showing an algorithm, indicated generally by the reference numeral 110, in accordance with an example embodiment.
  • the algorithm 110 may be implemented using the system 100.
  • the algorithm 110 may be implemented as part of the operation 62 of the algorithm 60 described above, although this is not essential to all embodiments.
  • the algorithm 110 starts at operation 111, where echoes (as received at the user device 12) are determined.
  • At operation 112, a visualisation of potential echoes of at least some of the first and second spatial audio signals (originating at the first and second audio sources respectively) is presented to a user (such as an editor).
  • At operation 113, a user identifies echoes, for example by using a user interface to indicate which echo relates to which audio source.
  • Thus, a user indication of which of said potential echoes are related to the first and/or the second spatial audio signals is received.
  • the beamforming arrangement is enhanced accordingly at operation 114 of the algorithm 110.
  • FIG. 12 shows a user interface, indicated generally by the reference numeral 120, in accordance with an example embodiment.
  • the user interface 120 provides an example visualisation of echoes and is therefore an example implementation of the operation 112 of the algorithm 110.
  • the user interface 120 shows a representation of the first audio source 22 and the second audio source 32 described above.
  • the user interface 120 also shows first to sixth echoes (labelled 121 to 126 respectively) that could be related to either the first audio source 22 or the second audio source 32.
  • the user device 12 may not be able to determine which echoes relate to which sound source (particularly if the sound sources are spatially close, e.g. overlapping, and the separation of the original audio source signals is therefore difficult).
  • the user interface 120 may enable a user (such as an editor) to listen to individual echoes 121 to 126 and to indicate which sound source some or all of the echoes relate to (thereby implementing the operation 113 described above).
  • FIG. 13 shows a user interface, indicated generally by the reference numeral 130, in accordance with an example embodiment.
  • the user interface 130 shows a representation of the first audio source 22 and the second audio source 32 described above.
  • the user interface 130 also shows a first echo 122a, a second echo 122b and a third echo 122c that relate to the first audio source 22, a fourth echo 132a that relates to the second audio source 32, and a fifth echo 134 and a sixth echo 135 that have not (or, perhaps, have not yet) been assigned to either the first or the second audio sources.
  • the operation 114 may be implemented using an acoustic rake receiver beamformer.
  • a user (such as an editor) may be able to preview how the audio sources sound when the acoustic rake receiver is applied to enhance an output using the selected echoes.
  • the user may also be able to preview an audio output following rake-receiver beamforming, after the application of the selected effects (as discussed above).
  • the adding of rake-receiver beamformed echoes to a sound also affects the sound, so that an effect is modified; for example, a sound may become more reverberant as more echoes are added. A combination sketch follows.
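The sketch below shows the combining step of an acoustic-rake approach: beamformed echo signals that the user has assigned to a source are time-aligned and summed into the direct-path beam. The per-echo delays and gains are assumed inputs here; estimating them would be part of the echo determination of operation 111.

```python
import numpy as np


def rake_combine(direct, echo_beams, echo_delays_smp, echo_gains):
    """Add user-selected, beamformed echoes into the direct-path signal.

    direct: beamformed direct-path signal (1-D array)
    echo_beams: list of beamformed echo signals, same length as direct
    echo_delays_smp: arrival delay of each echo relative to the direct path,
        in samples; each echo is advanced by this amount before summing
    echo_gains: per-echo weights
    """
    out = direct.astype(float).copy()
    for beam, delay, gain in zip(echo_beams, echo_delays_smp, echo_gains):
        d = int(delay)
        aligned = np.roll(beam, -d)  # advance the later-arriving echo
        if d > 0:
            aligned[-d:] = 0.0  # discard samples wrapped around by the roll
        out += gain * aligned
    return out
```

Consistent with the note above, each added echo makes the combined output more reverberant, so previously applied effects may merit re-previewing.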
  • FIG. 14 is a flow chart showing an algorithm, indicated generally by the reference numeral 140, in accordance with an example embodiment.
  • the algorithm 140 is an example implementation of the operation 62 of the algorithm 60 described above.
  • the algorithm 140 starts at operation 141, where a determination is made (e.g. in the form of a user input) that the audio objects should be uploaded to a remote server (or elsewhere) for further processing. Then, at operation 142, the audio objects are uploaded to a server, which processes the data in some way. In this way, an overlap region can be uploaded for further audio processing.
  • the server may run a separation method to separate the audio signals within the overlap region.
  • the algorithm 140 may be useful, for example, in the event that the user device 12 has insufficient resources (e.g. insufficient processing power) to perform the required processing, such as an audio separation method.
  • a process in the server may process the audio signals to apply effects to the overlapping region or modify the effect of at least one of the audio signals within the overlapping region.
  • the process in the server may, for example, identify the audio objects and their audio signals, and identify the effects applied to them.
  • the process in the server may modify the effects of the audio signals as described, for example, in relation to FIG. 7.
  • the process in the server may then create a modified audio stream and store it at the server and/or provide it to user devices.
  • audio signals could also be recorded and uploaded to a server, which applies effects to the audio signals and sends modified audio signals to an editor process.
  • For example, a user operating audio software on a device could have a server inspect for overlapping audio effects and modify the recorded audio signals, with the user device arranged to receive the modified audio signals, which the user can then further edit if needed. An upload sketch follows.
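A client-side upload sketch for the algorithm 140 is given below, assuming the soundfile and requests packages. The endpoint URL, the form-field name and the WAV packaging are invented for illustration; the specification defines no transport protocol or server API.

```python
import io

import requests
import soundfile as sf


def upload_overlap_region(audio, fs, start_s, end_s,
                          url="https://example.com/separate"):  # hypothetical
    """Cut the overlap region out of `audio` and post it for further
    processing (e.g. sound separation) on a remote server."""
    region = audio[int(start_s * fs):int(end_s * fs)]
    buffer = io.BytesIO()
    sf.write(buffer, region, fs, format="WAV")  # package the region as WAV
    buffer.seek(0)
    # The server would run e.g. a separation method and return per-source
    # stems or a modified audio stream, as described above.
    return requests.post(url, files={"audio": ("overlap.wav", buffer,
                                               "audio/wav")})
```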
  • FIG. 15 is a block diagram of a system, indicated generally by the reference numeral 150, in accordance with an example embodiment.
  • the system 150 comprises an apparatus 151, which may, for example, form part of the user device 12 described above. In any particular implementation, some of the elements of the system 150 may be omitted and/or some other elements added.
  • the apparatus 151 comprises an input module 152, a controller 153, an effects module 154, an overlap identifier module 155, a user interface controller 156, and a processor 157.
  • the input module 152 receives one or more audio and/or visual inputs.
  • the user interface controller 156 receives user input.
  • the processor 157 is in two-way communication with one or more external modules 158.
  • the inputs received at the input module 152 may, for example, be the audio signals received from the first and second audio sources 22 and 32.
  • the audio signals may be focussed in first and second directions respectively, as discussed above.
  • audio effects may be applied to the audio sources (by the effects module 154), spatial overlaps between the first and second directions may be identified by the overlap identifier module 155, and a user interface related to the first and/or second audio signals may be provided by the user interface controller 156.
  • a user/editor may provide user instructions to the user interface controller 156.
  • the processor 157 may then process the first and/or second spatial audio signals within said first region in accordance with said user instructions.
  • the processor 157 may also communicate with the external module(s) 158, for example to implement the algorithm 140 described above.
  • The system 150 is highly schematic and is provided by way of example only; many alternatives to the configuration of the system 150 will be apparent to those skilled in the art. A schematic wiring sketch follows.
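The sketch below mirrors the module names of FIG. 15 to illustrate the data flow only; the class, method names and call order are assumptions, not structures defined by the specification.

```python
class Apparatus151:
    """Schematic wiring of the apparatus 151 (names mirror FIG. 15)."""

    def __init__(self, input_module, effects_module, overlap_identifier,
                 ui_controller, processor):
        self.input_module = input_module            # input module 152
        self.effects_module = effects_module        # effects module 154
        self.overlap_identifier = overlap_identifier  # overlap identifier 155
        self.ui_controller = ui_controller          # UI controller 156
        self.processor = processor                  # processor 157

    def run(self):
        signals = self.input_module.receive()        # audio/visual inputs
        effected = self.effects_module.apply(signals)  # per-object effects
        region = self.overlap_identifier.find(effected)  # first region, if any
        if region is not None:
            instructions = self.ui_controller.ask(region)  # user instructions
            effected = self.processor.process(effected, region, instructions)
        return effected
```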
  • FIG. 16 is a schematic diagram of components of one or more of the example embodiments described previously, which hereafter are referred to generically as processing systems 300.
  • a processing system 300 may have a processor 302, a memory 304 closely coupled to the processor and comprised of a RAM 314 and ROM 312, and, optionally, user input 310 and a display 318.
  • the processing system 300 may comprise one or more network/apparatus interfaces 308 for connection to a network/apparatus, e.g. a modem, which may be wired or wireless. The interface 308 may also operate as a connection to other apparatus, such as a device/apparatus which is not network-side apparatus. Thus, direct connection between devices/apparatus without network participation is possible.
  • the processor 302 is connected to each of the other components in order to control operation thereof.
  • the memory 304 may comprise a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD).
  • the ROM 312 of the memory 304 stores, amongst other things, an operating system 315 and may store software applications 316.
  • the RAM 314 of the memory 304 is used by the processor 302 for the temporary storage of data.
  • the operating system 315 may contain code which, when executed by the processor, implements aspects of the algorithms 40, 60, 70, 110 and 140 described above. Note that, in the case of a small device/apparatus, a memory most suitable for small-size usage may be used; a hard disk drive (HDD) or solid-state drive (SSD) is not always used.
  • the processor 302 may take any suitable form. For instance, it may be a microcontroller, a plurality of microcontrollers, a processor, or a plurality of processors.
  • the processing system 300 may be a standalone computer, a server, a console, or a network thereof.
  • the processing system 300 and any needed structural parts may all be inside a device/apparatus such as an IoT device/apparatus, i.e. embedded in a device of very small size.
  • the processing system 300 may also be associated with external software applications. These may be applications stored on a remote server device/apparatus and may run partly or exclusively on the remote server device/apparatus. These applications may be termed cloud-hosted applications.
  • the processing system 300 may be in communication with the remote server device/ apparatus in order to utilize the software application stored there.
  • FIGS. 17A and 17B show tangible media, respectively a removable memory unit 365 and a compact disc (CD) 368, storing computer-readable code which when run by a computer may perform methods according to example embodiments described above.
  • the removable memory unit 365 may be a memory stick, e.g. a USB memory stick, having internal memory 366 storing the computer-readable code.
  • the memory 366 may be accessed by a computer system via a connector 367.
  • the CD 368 may be a CD-ROM or a DVD or similar. Other forms of tangible storage media may be used.
  • Tangible media can be any device/apparatus capable of storing data/information which data/information can be exchanged between devices/apparatus/network.
  • Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic.
  • the software, application logic and/or hardware may reside on memory, or any computer media.
  • the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media.
  • a "memory" or “computer-readable medium” may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
  • references to, where relevant, "computer-readable storage medium", "computer program product", "tangibly embodied computer program" etc., or a "processor" or "processing circuitry" etc. should be understood to encompass not only computers having differing architectures, such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), signal processing devices/apparatus and other devices/apparatus.
  • References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor, or firmware such as the programmable content of a hardware device/apparatus, whether instructions for a processor or configuration settings for a fixed-function device/apparatus, gate array, programmable logic device/apparatus, etc.
  • As used in this application, the term "circuitry" refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s), or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Stereophonic System (AREA)

Abstract

An apparatus, method and computer program are described comprising: receiving a first spatial audio signal and a second spatial audio signal, wherein the first and second spatial audio signals are focussed in first and second directions respectively; applying a first audio effect to the first spatial audio signal and/or a second audio effect to the second spatial audio signal; identifying a first region, in which a spatial overlap exists between the first and second directions; obtaining instructions for processing said first and/or second spatial audio signals within said first region; and processing said first and/or second spatial audio signals within said first region in accordance with said instructions.

Description

    Field
  • This specification relates to spatial audio, for example to processing multiple spatial audio signals, wherein a spatial overlap may potentially occur between some of said audio signals.
  • Background
  • Systems exist in which multiple audio signals can be captured by user devices, such as mobile phones. However, there remains a need for further developments and improvements in this field.
  • Summary
  • In a first aspect, this specification describes an apparatus comprising: means for receiving a first spatial audio signal and a second spatial audio signal (and optionally first and/or second spatial video signals), wherein the first and second spatial audio signals are focussed (e.g. using beamforming) in first and second directions respectively; means for applying a first audio effect to the first spatial audio signal and/or a second audio effect to the second spatial audio signal; means for identifying a first region, in which a spatial overlap exists between the first and second directions; means for obtaining instructions (e.g. user instructions) for processing said first and/or second spatial audio signals within said first region; and means for processing said first and/or second spatial audio signals within said first region in accordance with said instructions. Example audio effects include reverberation, modulations, filtering, chorus, flanger etc. The said spatial overlap may be a partial spatial overlap.
  • The first and/or second spatial audio signals within said first region may be processed to provide a modified audio effect (e.g. a user-defined modified audio effect) to one or more of said audio signals when a spatial overlap of said audio signals is identified.
  • The first direction may have a first spatial width and/or the second direction may have a second spatial width. The first and second spatial widths may be the same or may be different.
  • The first and/or second direction may have a spectral width, such that the spatial overlap may be a partial overlap.
  • In some embodiments, the instructions may be pre-set instructions and said means for obtaining instructions may retrieve said pre-set instructions.
  • Some embodiments further comprise means (such as a video or audio editor) for generating a user interface output related to said first and/or second spatial audio signals within said first region, wherein said means for obtaining instructions receives instructions in response to said user interface output. The user interface output may comprise a time-domain visualisation and/or a frequency-domain visualisation of the first and/or second spatial audio signals within said first region. The said visualisation(s) may be provided to highlight the first region. In one embodiment, the said user interface output may depict potential echoes of at least some of said first and/or second spatial audio signals. This embodiment may further comprise means for receiving a user indication of which of said potential echoes are related to the first and/or the second spatial audio signals.
  • The means for identifying the first region may comprise one or more of: identifying if the first and second audio effects are identical and, if so, said first region is not identified; and identifying if the first and/or second spatial audio signals is/are below an audio threshold level (e.g. silent) and, if so, said first region is not identified.
  • The means for processing said first and/or second spatial audio signals may disable audio effects associated with said first and/or second spatial audio signals in the absence of user instructions to the contrary when said first region is identified.
  • Some embodiments further comprise means for providing a preview output providing an output including said processed first and/or second spatial audio signals within said first region. Thus, a user may preview how the overlap region may be rendered in a particular configuration.
  • The means for processing said first and/or second spatial audio signals within said first region may modify audio settings in said first region.
  • Some embodiments further comprise means for setting said first and/or second audio effects. For example, a user input may be provided to enable the user to set the first and/or second audio effects. This may, for example, be implemented using a user interface.
  • Some embodiments further comprise means (e.g. a mobile phone, such as a multi-microphone mobile phone, or a similar user device) for capturing the first and/or second spatial audio signals.
  • Some embodiments further comprise uploading at least some audio content of said first region for further audio processing. The said further audio processing may include sound separation.
  • The said means may comprise: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured, with the at least one processor, to cause the performance of the apparatus.
  • In a second aspect, this specification describes a method comprising: receiving a first spatial audio signal and a second spatial audio signal (and optionally first and/or second spatial video signals), wherein the first and second spatial audio signals are focussed (e.g. using beamforming) in first and second directions respectively; applying a first audio effect to the first spatial audio signal and/or a second audio effect to the second spatial audio signal; identifying a first region, in which a spatial overlap exists between the first and second directions; obtaining instructions (e.g. user instructions) for processing said first and/or second spatial audio signals within said first region; and processing said first and/or second spatial audio signals within said first region in accordance with said instructions. Example audio effects include reverberation, modulations, filtering, chorus, flanger etc. The said spatial overlap may be a partial spatial overlap.
  • The first direction may have a first spatial width and/or the second direction may have a second spatial width. The first and second spatial widths may be the same or may be different.
  • Some embodiments further comprise generating a user interface output related to said first and/or second spatial audio signals within said first region, wherein obtaining said instructions comprises receiving instructions in response to said user interface output. The user interface output may comprise a time-domain visualisation and/or a frequency-domain visualisation of the first and/or second spatial audio signals within said first region. The said visualisation(s) may be provided to highlight the first region.
  • Identifying the first region may comprise one or more of: identifying if the first and second audio effects are identical and, if so, said first region is not identified; and identifying if the first and/or second spatial audio signals is/are below an audio threshold level (e.g. silent) and, if so, said first region is not identified.
  • Some embodiments further comprise providing a preview output providing an output including said processed first and/or second spatial audio signals within said first region. Thus, a user may preview how the overlap region may be rendered in a particular configuration.
  • Some embodiments further comprise setting said first and/or second audio effects. For example, a user input may be provided to enable the user to set the first and/or second audio effects. This may, for example, be implemented using a user interface.
  • Some embodiments further comprise capturing the first and/or second spatial audio signals.
  • Some embodiments further comprise uploading at least some audio content of said first region for further audio processing (such as sound separation).
  • In a third aspect, this specification describes any apparatus configured to perform any method as described with reference to the second aspect.
  • In a fourth aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method as described with reference to the second aspect.
  • In a fifth aspect, this specification describes a computer program comprising instructions for causing an apparatus to perform at least the following: receive a first spatial audio signal and a second spatial audio signal (and optionally first and/or second spatial video signals), wherein the first and second spatial audio signals are focussed in first and second directions respectively; apply a first audio effect to the first spatial audio signal and/or a second audio effect to the second spatial audio signal; identify a first region, in which a spatial overlap exists between the first and second directions; obtain instructions for processing said first and/or second spatial audio signals within said first region; and process said first and/or second spatial audio signals within said first region in accordance with said instructions.
  • In a sixth aspect, this specification describes a computer-readable medium (such as a non-transitory computer readable medium) comprising program instructions stored thereon for performing at least the following: receiving a first spatial audio signal and a second spatial audio signal (and optionally first and/or second spatial video signals), wherein the first and second spatial audio signals are focussed in first and second directions respectively; applying a first audio effect to the first spatial audio signal and/or a second audio effect to the second spatial audio signal; identifying a first region, in which a spatial overlap exists between the first and second directions; obtaining instructions for processing said first and/or second spatial audio signals within said first region; and processing said first and/or second spatial audio signals within said first region in accordance with said instructions.
  • In a seventh aspect, this specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to: receive a first spatial audio signal and a second spatial audio signal (and optionally first and/or second spatial video signals), wherein the first and second spatial audio signals are focussed in first and second directions respectively; apply a first audio effect to the first spatial audio signal and/or a second audio effect to the second spatial audio signal; identify a first region, in which a spatial overlap exists between the first and second directions; obtain instructions for processing said first and/or second spatial audio signals within said first region; and process said first and/or second spatial audio signals within said first region in accordance with said instructions.
  • In an eighth aspect, this specification describes an apparatus comprising: a first input for receiving a first spatial audio signal and a second spatial audio signal (and optionally first and/or second spatial video signals), wherein the first and second spatial audio signals are focussed (e.g. using beamforming) in first and second directions respectively; a first audio processing module for applying a first audio effect to the first spatial audio signal and/or a second audio effect to the second spatial audio signal; a spatial processing module for identifying a first region, in which a spatial overlap exists between the first and second directions; a first control module for obtaining instructions (e.g. user instructions) for processing said first and/or second spatial audio signals within said first region; and a second audio processing module (which may be the same module as the first audio processing module) for processing said first and/or second spatial audio signals within said first region in accordance with said instructions.
  • Brief description of the drawings
  • Example embodiments will now be described, by way of non-limiting examples, with reference to the following schematic drawings:
    • FIG. 1 is a plan view of a system in which embodiments described herein may be used;
    • FIG. 2 is a block diagram of a system in accordance with an example embodiment;
    • FIG. 3 is a block diagram of a system in accordance with an example embodiment;
    • FIG. 4 is a flow chart showing an example algorithm in accordance with an example embodiment;
    • FIG. 5 is a block diagram of a system in accordance with an example embodiment;
    • FIG. 6 is a flow chart showing an algorithm in accordance with an example embodiment;
    • FIG. 7 is a flow chart showing an algorithm in accordance with an example embodiment;
    • FIG. 8 shows an output in accordance with an example embodiment;
    • FIG. 9 shows an output in accordance with an example embodiment;
    • FIG. 10 is a block diagram of a system in accordance with an example embodiment;
    • FIG. 11 is a flow chart showing an algorithm in accordance with an example embodiment;
    • FIG. 12 shows a user interface in accordance with an example embodiment;
    • FIG. 13 shows a user interface in accordance with an example embodiment;
    • FIG. 14 is a flow chart showing an algorithm in accordance with an example embodiment;
    • FIG. 15 is a block diagram of a system in accordance with an example embodiment;
    • FIG. 16 is a block diagram of components of a system in accordance with an example embodiment; and
    • FIGS. 17A and 17B show tangible media, respectively a removable memory unit and a compact disc (CD) storing computer-readable code which when run by a computer perform operations according to example embodiments.
    Detailed description
  • In the description, like reference numerals relate to like elements throughout.
  • FIG. 1 is a plan view of a system, indicated generally by the reference numeral 10, in which embodiments described herein may be used. The system 10 comprises a user device 12, such as a mobile phone. The user device 12 may include one or more microphones for capturing spatial audio signals. The user device 12 may also include one or more cameras for capturing images and/or any related data such as depth maps.
  • The spatial audio data captured by the user device 12 may include audio content that includes at least some indication of sound directions. The directions may, for example, be conveyed by means of a parametric representation, or may otherwise be perceivable by a listener. For example, an omnidirectional microphone may be used that captures audio from all directions around the user device 12.
  • FIG. 2 is a block diagram of a system, indicated generally by the reference numeral 20, in accordance with an example embodiment. The system 20 comprises the user device 12 described above and additionally includes a first audio source 22. The first audio source 22 may, for example, be a person talking (although other audio sources are possible in example embodiments).
  • In the system 20, the user device 12 receives a first spatial audio signal from the first audio source 22. The user device 12 may include a first audio focus arrangement 24, such as a beamforming arrangement, in the direction of the first audio source 22 in order to obtain, for example, a monophonic audio signal enhancing the specified direction. Such audio focussing may be provided, for example, to extract audio data relating to the first audio source 22, or to focus on the source 22 in the spatial audio field. The first spatial audio signal is focussed in a first direction (i.e. towards the first audio source 22, as seen from the user device 12). Depending on how the first audio focus arrangement 24 is implemented, the beam can be narrow or wide. Furthermore, the beamforming may amplify the beamed direction only slightly, or may substantially amplify the beamed direction and, optionally, attenuate or cancel audio signals from other directions.
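  • By way of illustration only, one conventional way such an audio focus arrangement could be realised is a frequency-domain delay-and-sum beamformer. The Python sketch below assumes a uniform linear microphone array and far-field sources; the array geometry, function signature and sign convention are assumptions introduced for this example and are not taken from this specification.

```python
import numpy as np

def delay_and_sum(frames, mic_positions, angle_deg, fs, c=343.0):
    """Steer a linear microphone array towards angle_deg and return a
    monophonic signal enhancing that direction (far-field assumption).

    frames:        (n_mics, n_samples) time-aligned microphone signals.
    mic_positions: (n_mics,) microphone x-coordinates in metres.
    """
    n_mics, n_samples = frames.shape
    angle = np.deg2rad(angle_deg)
    # Relative arrival delay of a plane wave from `angle` at each mic.
    delays = mic_positions * np.cos(angle) / c              # seconds
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)          # (n_bins,)
    spectra = np.fft.rfft(frames, axis=1)                   # (n_mics, n_bins)
    # Phase-align every channel to the steered direction, then average.
    phase = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft((spectra * phase).mean(axis=0), n=n_samples)
```

  Calling such a function twice, with two different steering angles, corresponds to operating two focus arrangements, such as the arrangements 24 and 34, in parallel; a narrower or more strongly attenuating beam would in practice call for more microphones or a more sophisticated (e.g. superdirective) design.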
  • FIG. 3 is a block diagram of a system, indicated generally by the reference numeral 30, in accordance with an example embodiment. The system 30 comprises the user device 12 and the first audio source 22 described above and additionally includes a second audio source 32. The first and second audio sources 22 and 32 may be two people talking, although this is not essential to all embodiments. For example, the audio objects may include musical instruments (e.g. guitars and drums), stationary loudspeakers, movable loudspeakers, multimedia devices such as TVs, or may include any other sound sources.
  • In the system 30, the user device 12 receives a first spatial audio signal from the first audio source 22 and a second spatial audio signal from the second audio source 32. As described above, the user device 12 includes a first audio focus arrangement 24, such as a first beamforming arrangement, in the direction of the first audio source 22 in order to obtain, for example, a monophonic audio signal enhancing the specified direction. Similarly, the user device 12 of the system 30 includes a second audio focus arrangement 34, such as a second beamforming arrangement, in the direction of the second audio source.
  • By way of example, two beamforming arrangements may be operated in parallel, one implementing the first audio focus arrangement 24 and the other implementing the second audio focus arrangement 34.
  • In various embodiments, tracking may be employed to guide at least one audio focus or beamforming direction. For example, visual tracking based on a camera input (or any other suitable tracking) may be utilised, whereby an audio source, such as the audio sources 22 and 32, that moves relative to the capture device is maintained under audio focus or beamforming. In other words, a moving audio source may be maintained under the audio focus arrangement.
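  • As a hedged sketch of such tracking-guided focussing, the fragment below re-steers the beam on every block of audio using whatever direction estimate a tracker provides. It reuses the delay_and_sum helper sketched earlier; the tracker callable is a placeholder for, e.g., camera-based visual tracking, and is not defined by this specification.

```python
def focus_tracked_source(frame_stream, mic_positions, tracker, fs):
    """Keep a moving audio source under the audio focus arrangement by
    re-steering the beam once per block of frames.

    frame_stream: iterable of (n_mics, n_samples) blocks.
    tracker:      callable returning the source's current azimuth (deg).
    """
    for frames in frame_stream:
        azimuth = tracker()                    # updated direction estimate
        yield delay_and_sum(frames, mic_positions, azimuth, fs)
```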
  • FIG. 4 is a flow chart, indicated generally by the reference numeral 40, showing an example algorithm.
  • The algorithm 40 starts at operation 41, where the first and second audio focus arrangements 24 and 34 are arranged to enable the user device 12 to receive a first spatial audio signal focussed in the direction of the first audio source 22 and a second spatial audio signal focussed in the direction of the second audio source 32. Thus, the operation 41 may implement suitable beamforming arrangements. The first and second audio focus arrangements 24 and 34 may have first and second spatial widths respectively.
  • At operation 42, a user selects a first audio effect to be applied to the first spatial audio signal and/or a second audio effect to be applied to the second spatial audio signal. Example audio effects include reverberation, modulation, filtering, etc.
  • At operation 43, the system 30 processes the spatial audio signals (e.g., audio objects) according to the effects selected in the operation 42. Thus, first and/or second spatial audio signals captured by the user device 12 can be modified according to the effects selected in the operation 42.
  • The algorithm 40 can be applied to the system 30 in order to process the first and second audio sources separately. Indeed, entirely different effects may be selected in operation 42 and applied to the first and second audio sources in operation 43. This is relatively simple to implement in the event that the separation between the audio sources is much greater than the width of the audio focussing arrangements 24 and 34. However, if the audio sources are relatively close together (e.g. such that they overlap, at least to some degree), it may become difficult to separate the audio sources and to process the relevant audio signals separately.
  • FIG. 5 is a block diagram of a system, indicated generally by the reference numeral 50, in accordance with an example embodiment. The system 50 comprises the user device 12, the first audio source 22 and the second audio source 32 described above. The user device 12 includes the first audio focus arrangement 24, in the direction of the first audio source 22, and the second audio focus arrangement 34, in the direction of the second audio source 32, described above.
  • The first and second audio sources 22 and 32 are much closer together in the system 50 than in the system 30. Accordingly, the first and second audio focus arrangements 24 and 34 are much closer together in the system 50 than in the system 30; indeed, in the system 50, the first and second audio focus arrangements 24 and 34 overlap to a significant degree. Of course, the degree of overlap is to some extent determined by the spatial width of the audio focus arrangements 24 and 34.
  • When applying the algorithm 40 to the system 50, it may be difficult to implement the operation 43, since it may be difficult for the user device 12 to distinguish between the first and second audio sources 22 and 32 in order to separately apply an effect to an audio source.
  • FIG. 6 is a flow chart showing an algorithm, indicated generally by the reference numeral 60, in accordance with an example embodiment. The algorithm 60 may be implemented using the system 30 or the system 50.
  • The algorithm 60 starts at operation 41, where, as discussed above, audio focus arrangements 24 and 34 are arranged to enable the user device 12 to receive a first spatial audio signal focussed in the direction of the first audio source 22 (a "first direction") and a second spatial audio signal focussed in the direction of the second audio source 32 (a "second direction"). The operation 41 may be implemented at the user device 12 by providing means for receiving the first spatial audio signal and the second spatial audio signal at the user device. As noted above, the first and second audio focus arrangements 24 and 34 may have first and second spatial widths respectively.
  • At operation 42, a user selects a first audio effect to be applied to the first spatial audio signal and/or a second audio effect to be applied to the second spatial audio signal. As indicated above, example audio effects include reverberation, modulation, filtering, etc.
  • The algorithm 60 then moves to operation 61, where it is determined whether a first region is identified, in which a spatial overlap exists between the first and second directions. For example, in the system 50, a substantial overlap exists, whereas in the system 30, no significant overlap exists. If an overlap is identified, the algorithm 60 moves to operation 62; otherwise, the algorithm 60 moves to operation 63. It should be noted that the extent of the overlap necessary for an overlap to be identified may vary in different embodiments (and may be user-definable). Moreover, as noted above, the extent to which an overlap exists may be dependent on the spatial widths of the respective audio focus arrangements.
  • At operation 62, overlap regions are processed (as discussed further below), before the algorithm 60 moves to operation 63.
  • At operation 63, the audio objects are processed. The operation 63 may, for example, be similar to (or identical to) the operation 43 discussed above. In the operation 63, the first and/or second spatial audio signals within said first region may be processed to modify audio settings in said first region. By way of example, the operation 63 may disable the audio effects associated with said first and/or second spatial audio signals in the absence of instructions (e.g. user instructions) to the contrary when said first region is identified (i.e. when an overlap is detected).
  • At operation 64, a preview output including said processed first and/or second spatial audio signals within said first region may be provided to a user. In this way, a user may preview how the overlap region may be rendered in a particular configuration. This may, for example, enable a user (such as an editor) to trial a number of different approaches to dealing with overlap conditions.
  • The algorithm 60 can therefore be used to identify areas of overlap between the spatial positions of audio sources (or between focus arrangements relating to said audio sources). This may be detected, for example, by calculating the absolute difference between directions-of-arrival (DOA) of audio signals, differences in azimuth and/or elevation angle, differences in the directions of audio focussing arrangements etc.
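  • A minimal sketch of such a geometric test is given below: each focus direction is characterised by a centre azimuth and a spatial width in degrees, and the first region is the angular interval where the two sectors intersect. The flat-angle treatment (no wrap-around at ±180°) and the names are simplifying assumptions made for this illustration.

```python
def overlap_region(azi1_deg, width1_deg, azi2_deg, width2_deg):
    """Return the angular interval (lo, hi) in which the two focus
    sectors overlap, or None if they do not. Each sector is described
    by its centre azimuth and its full spatial width, in degrees."""
    lo = max(azi1_deg - width1_deg / 2, azi2_deg - width2_deg / 2)
    hi = min(azi1_deg + width1_deg / 2, azi2_deg + width2_deg / 2)
    return (lo, hi) if lo < hi else None
```

  For a configuration such as the system 50, where the sources are close together and the sectors are wide, this returns a non-empty interval; for a configuration such as the system 30 it returns None.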
  • In some circumstances, even where an overlap exists, a problem due to spatial overlapping of the form described above may not arise. For example, in the event that the audio effects applied to potentially overlapping audio sources are identical, the presence of an overlap may not present a problem. Alternatively, or in addition, in the event that one or more of the potentially overlapping audio sources have audio levels below an audio threshold level (e.g. are silent), the presence of an overlap may not present a problem. In such circumstances, the operation 61 may proceed to the operation 63 regardless of whether an overlap is detected. Thus, the operation 61 may determine whether a relevant overlap exists.
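  • A sketch of this relevance check might look as follows; the -60 dB silence threshold is an arbitrary illustrative value, not one given in the specification, and the function names are likewise assumptions.

```python
import numpy as np

SILENCE_THRESHOLD_DB = -60.0   # illustrative value only

def rms_db(signal):
    """RMS level of a signal in dB (floored to avoid log of zero)."""
    rms = np.sqrt(np.mean(np.square(signal)))
    return 20.0 * np.log10(max(float(rms), 1e-12))

def overlap_is_relevant(sig1, sig2, effect1, effect2):
    """Ignore a geometric overlap when both effects are identical or
    when either signal is effectively silent, in the spirit of the
    relevance test described for operation 61."""
    if effect1 == effect2:
        return False
    if min(rms_db(sig1), rms_db(sig2)) < SILENCE_THRESHOLD_DB:
        return False
    return True
```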
  • FIG. 7 is a flow chart showing an algorithm, indicated generally by the reference numeral 70, in accordance with an example embodiment. The algorithm 70 is an example implementation of the operation 62 of the algorithm 60 described above. Other example implementations of the operation 62 are possible. For example, instructions for processing audio signals within an overlap region may be pre-set. In an example embodiment, a first effect may take precedence over a second effect: if a first audio object having a chorus effect applied to it overlaps a second audio object having a reverb effect applied to it, a rule may be applied to modify the reverb effect of the second audio object. Different effects may be provided in a prioritised order in user settings or device settings, or the order may be learnt by the system 50. For example, the system may learn, over the course of time, which changes a user makes in overlapping situations, and propose the change the user has most often selected when an overlap occurs. The system may also change the effect of the first and/or second audio object according to the learnt behaviour without asking for the user's confirmation. As explained earlier, the overlapping situation may be handled in different ways: one or more of the effects may be removed, one or more of the effects may be changed, or one or more further effects may be applied to one or more audio objects. In the last option, a surprise (random or semi-random) effect may be applied when audio objects with different effects overlap (e.g. when a certain first effect and a certain second effect overlap, a certain third effect may be applied to both). This could indicate to the user that the audio signals are overlapping, so that the user may decide to further edit the audio objects and their effects.
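  • As a concrete illustration of such pre-set, prioritised handling, the sketch below resolves which single effect to apply inside the overlap region from an illustrative priority table and an optional learnt user preference. The effect names, the ordering and the function names are all assumptions introduced for this example.

```python
# Illustrative priority order: effects earlier in the list "overtake"
# later ones. The actual ordering would come from user settings, device
# settings, or learnt behaviour, so this table is an assumption.
EFFECT_PRIORITY = ["chorus", "reverb", "modulation", "filter"]

def priority(effect):
    # Effects not in the table get the lowest priority.
    if effect in EFFECT_PRIORITY:
        return EFFECT_PRIORITY.index(effect)
    return len(EFFECT_PRIORITY)

def resolve_overlap_effect(effect1, effect2, learnt_choice=None):
    """Choose the effect applied to both objects in the overlap region.
    A learnt user preference, when one is available, wins outright and
    may be applied without asking for the user's confirmation."""
    if learnt_choice is not None:
        return learnt_choice
    return min(effect1, effect2, key=priority)
```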
  • The algorithm 70 starts at operation 71, where a user interface output is generated related to said first and/or second spatial audio signals within said overlap region. The user interface may, for example, be an audio editor (or a video and audio editor). Example user interface outputs are described below.
  • At operation 72, an indication is given, for example by a user, indicating how the overlap (e.g. the overlap identified in the operation 61) is to be handled. For example, the operation 72 may involve receiving user instructions in response to said user interface output.
  • Finally, at operation 73, the identified overlap is processed in accordance with the user indication provided in the operation 72. For example, the first and/or second spatial audio signals within the overlap region may be processed in accordance with said user instructions.
  • FIG. 8 shows an output, indicated generally by the reference numeral 80, in accordance with an example embodiment. The output 80 is an example implementation of the operation 72 described above and shows a time-domain visualisation of a first audio signal 81 and a second audio signal 82. For example, the time-domain visualisation may correspond to a video (or audio and video) editing software timeline such that it provides a linear view of the audio or audio-visual content across a device screen or a software window. The visualisation may be provided to highlight the first region. The output 80 is therefore an example user interface output.
  • The time-domain visualisation of the first audio signal 81 includes three parts: labelled 81a, 81b and 81c respectively. The three parts of the first audio signal (which are provided at different time periods) are provided at different spatial positions. Similarly, the time-domain visualisation of the second audio signal 82 includes three parts: labelled 82a, 82b and 82c respectively. Again, the three parts of the second audio signal (which are provided at different time periods) are provided at different spatial positions.
  • The output 80 includes a visualisation of an overlap period 84, during which the first and second audio signals are at the same spatial position. The output 80 is therefore an example of a time-domain user interface that can be used to highlight, to a user (such as an editor), a region in which a spatial overlap exists between the first and second audio signals. Of course, many alternative time-domain visualisations are possible.
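  • Purely for illustration, a timeline view in the spirit of the output 80 could be rendered as below; the styling, labels and function name are assumptions.

```python
import matplotlib.pyplot as plt

def plot_timeline(t, sig1, sig2, overlap_start, overlap_end):
    """Plot both signals over time and shade the overlap period 84."""
    fig, ax = plt.subplots()
    ax.plot(t, sig1, label="first audio signal 81")
    ax.plot(t, sig2, label="second audio signal 82")
    ax.axvspan(overlap_start, overlap_end, alpha=0.3,
               label="overlap period 84")
    ax.set_xlabel("time (s)")
    ax.legend()
    plt.show()
```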
  • FIG. 9 shows an output, indicated generally by the reference numeral 90, in accordance with an example embodiment. The output 90 is an example implementation of the operation 72 described above and shows a frequency-domain visualisation of a first audio signal 91 and a second audio signal 92. The visualisation may be provided to highlight the first region. The output 90 is therefore an example user interface output.
  • The first audio signal 91 has generally lower frequency components and the second audio signal 92 has generally higher frequency components. A frequency 94 (which may be user defined) indicates a frequency at which a filter may be provided to separate the first and second audio signals. Such filtering is an example of processing that may be performed in the operations 62 and 73 described above. Of course, many alternative frequency-domain visualisations are possible.
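  • A sketch of such a user-selected split follows, using SciPy Butterworth filters to divide a mixture at a cutoff corresponding to the frequency 94. The filter order and the use of zero-phase filtering are assumptions made for this example.

```python
from scipy.signal import butter, sosfiltfilt

def split_at(mixture, cutoff_hz, fs, order=6):
    """Split a mixture into a low band (the generally lower-frequency
    first signal 91) and a high band (the second signal 92) at a
    user-chosen cutoff frequency."""
    sos_lo = butter(order, cutoff_hz, btype="lowpass", fs=fs, output="sos")
    sos_hi = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos_lo, mixture), sosfiltfilt(sos_hi, mixture)
```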
  • FIG. 10 is a block diagram of a system, indicated generally by the reference numeral 100, in accordance with an example embodiment. The system 100 comprises the user device 12, the first audio source 22 and the second audio source 32 described above. As described above, the user device 12 receives a first spatial audio signal from the first audio source 22 and a second spatial audio signal from the second audio source 32, with the user device 12 including the first audio focus arrangement 24 and the second audio focus arrangement 34 described above.
  • In the system 100, the user device 12 and the first and second audio sources 22 and 32 are provided within a space 102, such that echoes occur. By way of example, a first echo 104 may be provided between the first audio source 22 and the user device 12, and a second echo 105 may be provided between the second audio source 32 and the user device 12. Of course, the system 100 is a simple example; more complicated systems, including multiple echoes for each of multiple audio sources, are possible.
  • FIG. 11 is a flow chart showing an algorithm, indicated generally by the reference numeral 110, in accordance with an example embodiment. The algorithm 110 may be implemented using the system 100. The algorithm 110 may be implemented as part of the operation 62 of the algorithm 60 described above, although this is not essential to all embodiments.
  • The algorithm 110 starts at operation 111, where echoes (as received at the user device 12) are determined. Next, at operation 112, a visualisation of potential echoes of at least some of the first and second spatial audio signals (originating at the first and second audio sources respectively) is presented to a user (such as an editor).
  • At operation 113, a user identifies echoes, for example by using a user interface to indicate which echo relates to which audio source. Thus, a user indication of which of said potential echoes are related to the first and/or the second spatial audio signals is received.
  • Finally, on the basis of the information provided in the operation 113, the beamforming arrangement is enhanced accordingly at operation 114 of the algorithm 110.
  • FIG. 12 shows a user interface, indicated generally by the reference numeral 120, in accordance with an example embodiment. The user interface 120 provides an example visualisation of echoes and is therefore an example implementation of the operation 112 of the algorithm 110.
  • The user interface 120 shows a representation of the first audio source 22 and the second audio source 32 described above. The user interface 120 also shows first to sixth echoes (labelled 121 to 126 respectively) that could be related to either the first audio source 22 or the second audio source 32. By way of example, the user device 12 may not be able to determine which echoes relate to which sound source (particularly if the sound sources are spatially close, e.g. overlapping, such that separation of the original audio source signals is difficult).
  • The user interface 120 may enable a user (such as an editor) to listen to individual echoes 121 to 126 and to indicate which sound source some or all of the echoes relate to (thereby implementing the operation 113 described above).
  • By way of example, FIG. 13 shows a user interface, indicated generally by the reference numeral 130, in accordance with an example embodiment. The user interface 130 shows a representation of the first audio source 22 and the second audio source 32 described above. The user interface 130 also shows a first echo 122a, a second echo 122b and a third echo 122c that relate to the first audio source 22, a fourth echo 132a that relates to the second audio source 32, and a fifth echo 134 and a sixth echo 135 that have not (or, perhaps, have not yet) been assigned to either the first or the second audio sources.
  • The operation 114 may be implemented using an acoustic rake receiver beamformer. A user (such as an editor) may be able to preview how the audio sources sound when the acoustic rake receiver is applied to enhance an output using the selected echoes. The user may also be able to preview an audio output after rake-receiver beamforming and after the application of the selected effects (as discussed above). In an example embodiment, adding rake-receiver beamformed echoes to a sound also affects the sound so that an effect is modified; for example, a sound may become more reverberant as more echoes are added to it.
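  • A deliberately crude sketch of such rake-style combination is shown below, reusing the delay_and_sum helper from earlier: one beam at the direct direction plus one beam per user-assigned echo direction, summed with a fixed echo gain. The gain value, names and signature are assumptions; as noted above, raising the gain or adding echo directions makes the result audibly more reverberant.

```python
def rake_output(frames, mic_positions, fs, direct_deg, echo_degs,
                echo_gain=0.5):
    """Combine the direct-path beam with beams steered at the echo
    directions the user assigned to this source (cf. operation 114)."""
    out = delay_and_sum(frames, mic_positions, direct_deg, fs)
    for deg in echo_degs:
        out = out + echo_gain * delay_and_sum(frames, mic_positions, deg, fs)
    return out
```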
  • FIG. 14 is a flow chart showing an algorithm, indicated generally by the reference numeral 140, in accordance with an example embodiment. The algorithm 140 is an example implementation of the operation 62 of the algorithm 60 described above.
  • The algorithm 140 starts at operation 141, where a determination is made (e.g. in the form of a user input) that the audio objects should be uploaded to a remote server (or elsewhere) for further processing. Then, at operation 142, the audio objects are uploaded to a server, which processes the data in some way. In this way, an overlap region can be uploaded for further audio processing. By way of example, the server may run a separation method to separate the audio signals within the overlap region. The algorithm 140 may be useful, for example, in the event that the user device 12 has insufficient resources (e.g. insufficient processing power) to perform the required processing, such as an audio separation method. By way of example, a process in the server may process the audio signals to apply effects to the overlapping region, or to modify the effect of at least one of the audio signals within the overlapping region. The process in the server may, for example, identify the audio objects and their audio signals and identify the effects applied to them. In the event that the audio signals overlap, the process in the server may modify the effects of the audio signals as described, for example, in relation to FIG. 7. The process in the server may then create a modified audio stream and store it at the server and/or provide it to user devices. By way of example, audio signals could be recorded and uploaded to a server, which applies effects to the audio signals and sends modified audio signals to an editor process. A user operating audio editing software on a device could have a server inspect for overlapping audio effects and modify the recorded audio signals, with the user device receiving the modified audio signals, which the user can then further edit if needed.
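  • Purely as an illustration of operations 141 and 142, the fragment below posts the audio content of the overlap region to a hypothetical separation endpoint. The URL, payload fields and response handling are invented for this sketch, since the specification defines no upload protocol.

```python
import requests

SEPARATION_URL = "https://separation.example.com/v1/separate"  # hypothetical

def upload_overlap_region(wav_bytes, region_start_s, region_end_s):
    """Upload the overlap region for server-side processing (e.g. a
    sound-separation method) and return the server's modified audio."""
    response = requests.post(
        SEPARATION_URL,
        files={"audio": ("overlap.wav", wav_bytes, "audio/wav")},
        data={"start_s": region_start_s, "end_s": region_end_s},
        timeout=60,
    )
    response.raise_for_status()
    return response.content   # modified/separated audio, per this sketch
```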
  • FIG. 15 is a block diagram of a system, indicated generally by the reference numeral 150, in accordance with an example embodiment. The system 150 comprises an apparatus 151, which may, for example, form part of the user device 12 described above. In any particular implementation, some of the elements of the system 150 may be omitted and/or some other elements added.
  • The apparatus 151 comprises an input module 152, a controller 153, an effects module 154, an overlap identifier module 155, a user interface controller 156, and a processor 157. As shown in FIG. 15, the input module 152 receives one or more audio and/or visual inputs, the user interface controller 156 receives user input, and the processor 157 is in two-way communication with one or more external modules 158.
  • The inputs received at the input module 152 may, for example, be the audio signals received from the first and second audio sources 22 and 32. The audio signals may be focussed in first and second directions respectively, as discussed above. Under the control of the controller 153, audio effects may be applied to the audio sources (by the effects module 154), spatial overlaps between the first and second directions may be identified by the overlap identifier module 155, and a user interface related to the first and/or second audio signals may be provided by the user interface controller 156. A user/editor may provide user instructions to the user interface controller 156. The processor 157 may then process the first and/or second spatial audio signals within said first region in accordance with said user instructions. The processor 157 may also communicate with the external module(s) 158, for example to implement the algorithm 140 described above.
  • Of course, the system 150 is highly schematic and is provided by way of example only. Many alternatives to the configuration of the system 150 will be apparent to those skilled in the art.
  • For completeness, FIG. 16 is a schematic diagram of components of one or more of the example embodiments described previously, which are hereafter referred to generically as processing systems 300. A processing system 300 may have a processor 302, a memory 304 closely coupled to the processor and comprised of a RAM 314 and a ROM 312, and, optionally, a user input 310 and a display 318. The processing system 300 may comprise one or more network/apparatus interfaces 308 for connection to a network/apparatus, e.g. a modem, which may be wired or wireless. The interface 308 may also operate as a connection to other apparatus, such as a device/apparatus that is not a network-side apparatus; thus, a direct connection between devices/apparatus without network participation is possible.
  • The processor 302 is connected to each of the other components in order to control operation thereof.
  • The memory 304 may comprise a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD). The ROM 312 of the memory 304 stores, amongst other things, an operating system 315 and may store software applications 316. The RAM 314 of the memory 304 is used by the processor 302 for the temporary storage of data. The operating system 315 may contain code which, when executed by the processor, implements aspects of the algorithms 40, 60, 70, 110 and 140 described above. Note that, in the case of a small device/apparatus, a memory of modest physical size may be most suitable; a hard disk drive (HDD) or solid-state drive (SSD) is not always used.
  • The processor 302 may take any suitable form. For instance, it may be a microcontroller, a plurality of microcontrollers, a processor, or a plurality of processors.
  • The processing system 300 may be a standalone computer, a server, a console, or a network thereof. The processing system 300 and the structural parts it needs may all be inside a device/apparatus, such as an IoT device/apparatus, i.e. embedded in a very small form factor.
  • In some example embodiments, the processing system 300 may also be associated with external software applications. These may be applications stored on a remote server device/apparatus and may run partly or exclusively on the remote server device/apparatus. These applications may be termed cloud-hosted applications. The processing system 300 may be in communication with the remote server device/apparatus in order to utilise the software application stored there.
  • FIGS. 17A and 17B show tangible media, respectively a removable memory unit 365 and a compact disc (CD) 368, storing computer-readable code which, when run by a computer, may perform methods according to the example embodiments described above. The removable memory unit 365 may be a memory stick, e.g. a USB memory stick, having internal memory 366 storing the computer-readable code. The memory 366 may be accessed by a computer system via a connector 367. The CD 368 may be a CD-ROM, a DVD or similar. Other forms of tangible storage media may be used. Tangible media can be any device/apparatus capable of storing data/information that can be exchanged between devices/apparatus/networks.
  • Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a "memory" or "computer-readable medium" may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
  • Reference to, where relevant, "computer-readable storage medium", "computer program product", "tangibly embodied computer program" etc., or a "processor" or "processing circuitry" etc. should be understood to encompass not only computers having differing architectures, such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), signal processing devices/apparatus and other devices/apparatus. References to computer program, instructions, code etc. should be understood to express software for a programmable processor, or firmware such as the programmable content of a hardware device/apparatus, whether instructions for a processor, or configured or configuration settings for a fixed-function device/apparatus, gate array, programmable logic device/apparatus, etc.
  • As used in this application, the term "circuitry" refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
  • If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow charts of FIGS. 4, 6, 7, 11 and 14 are examples only and that various operations depicted therein may be omitted, reordered and/or combined.
  • It will be appreciated that the above described example embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present specification.
  • Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein, or any generalisation thereof. During the prosecution of the present application, or of any application derived therefrom, new claims may be formulated to cover any such features and/or combinations of such features.

Claims (15)

  1. An apparatus comprising:
    means for receiving a first spatial audio signal and a second spatial audio signal, wherein the first and second spatial audio signals are focussed in first and second directions respectively;
    means for applying a first audio effect to the first spatial audio signal and/or a second audio effect to the second spatial audio signal;
    means for identifying a first region, in which a spatial overlap exists between the first and second directions;
    means for obtaining instructions for processing said first and/or second spatial audio signals within said first region; and
    means for processing said first and/or second spatial audio signals within said first region in accordance with said instructions.
  2. An apparatus as claimed in claim 1, wherein the first direction has a first spatial width and/or the second direction has a second spatial width.
  3. An apparatus as claimed in claim 1 or claim 2, wherein said instructions are pre-set instructions and wherein said means for obtaining instructions retrieves said pre-set instructions.
  4. An apparatus as claimed in any one of claims 1 to 3, further comprising means for generating a user interface output related to said first and/or second spatial audio signals within said first region, wherein said means for obtaining instructions receives instructions in response to said user interface output.
  5. An apparatus as claimed in claim 4, wherein the user interface output comprises a time-domain visualisation and/or a frequency-domain visualisation of the first and/or second spatial audio signals within said first region.
  6. An apparatus as claimed in claim 4 or claim 5, wherein said user interface output depicts potential echoes of at least some of said first and/or second spatial audio signals.
  7. An apparatus as claimed in claim 6, further comprising means for receiving a user indication of which of said potential echoes are related to the first and/or the second spatial audio signals.
  8. An apparatus as claimed in any one of the preceding claims, wherein the means for identifying the first region comprises one or more of:
    identifying if the first and second audio effects are identical and, if so, said first region is not identified; and
    identifying if the first and/or second spatial audio signals is/are below an audio threshold level and, if so, said first region is not identified.
  9. An apparatus as claimed in any one of the preceding claims, wherein the means for processing said first and/or second spatial audio signals disables audio effects associated with said first and/or second spatial audio signals in the absence of user instructions to the contrary when said first region is identified.
  10. An apparatus as claimed in any one of the preceding claims, further comprising means for providing a preview output providing an output including said processed first and/or second spatial audio signals within said first region.
  11. An apparatus as claimed in any one of the preceding claims, wherein said means for processing said first and/or second spatial audio signals within said first region modifies audio settings in said first region.
  12. An apparatus as claimed in any one of the preceding claims, further comprising means for uploading at least some audio content of said first region for further audio processing.
  13. An apparatus as claimed in any one of the preceding claims, wherein the means comprise:
    at least one processor; and
    at least one memory including computer program code, the at least one memory and the computer program code configured, with the at least one processor, to cause the performance of the apparatus.
  14. A method comprising:
    receiving a first spatial audio signal and a second spatial audio signal, wherein the first and second spatial audio signals are focussed in first and second directions respectively;
    applying a first audio effect to the first spatial audio signal and/or a second audio effect to the second spatial audio signal;
    identifying a first region, in which a spatial overlap exists between the first and second directions;
    obtaining instructions for processing said first and/or second spatial audio signals within said first region; and
    processing said first and/or second spatial audio signals within said first region in accordance with said instructions.
  15. A computer readable medium comprising program instructions stored thereon for performing at least the following:
    receiving a first spatial audio signal and a second spatial audio signal, wherein the first and second spatial audio signals are focussed in first and second directions respectively;
    applying a first audio effect to the first spatial audio signal and/or a second audio effect to the second spatial audio signal;
    identifying a first region, in which a spatial overlap exists between the first and second directions;
    obtaining instructions for processing said first and/or second spatial audio signals within said first region; and
    processing said first and/or second spatial audio signals within said first region in accordance with said instructions.