EP3409025A1 - System and apparatus for tracking moving audio sources - Google Patents
System and apparatus for tracking moving audio sources
- Publication number
- EP3409025A1 (application EP16704399.1A)
- Authority
- EP
- European Patent Office
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
Definitions
- This specification relates to processing audio signals and, more specifically, to processing audio signals for separating one or more moving audio sources.
- a stereo or multi-channel recording can be passed from the recording or capture apparatus to a listening apparatus and replayed using a suitable multi-channel output such as a multi-channel loudspeaker arrangement and, with virtual surround processing, a pair of stereo headphones or headset.
- In a first aspect, this specification describes a method comprising: receiving frequency domain representations of at least two audio signals, each recorded by a respective microphone of a microphone array comprising at least two microphones, the audio signals including at least one component resulting from one or more moving audio sources, wherein the frequency domain representations each comprise a plurality of time frames; determining, based on the frequency domain representations of the at least two audio signals, a spatial energy distribution over a set of directions around the microphone array for each time frame; converting the spatial energy distribution into multiple direction-of-arrival measurements; and based on at least a subset of the direction-of-arrival measurements, performing acoustical tracking to identify the one or more moving audio sources and their associated directions relative to the microphone array for each time frame.
- the method may further comprise determining for each time frame a spatial covariance matrix model for each identified source based on the identified one or more moving audio sources and their associated directions relative to the microphone array for each time frame.
- the method may further comprise estimating a spectrogram model for one or more time frames for each of the one or more moving audio sources using the estimated spatial covariance matrix models and the frequency domain representations of the at least two audio signals.
- the method may further comprise separating the frequency domain representations of the at least two audio signals based on the estimated spectrogram model using time-frequency filtering to produce separated frequency domain representations of each of the moving audio sources.
- the method may further comprise restoring spatial energies for each identified audio source based on the associated directions relative to the microphone array; and determining the spatial covariance matrix models for each identified source based on the restored spatial energies.
- the method may comprise performing the acoustical tracking using a particle filtering algorithm.
- the particle filtering algorithm may comprise a Rao-Blackwellized particle filter.
- the method may comprise determining the spatial energy distribution using a steered response power algorithm with a phase transform weighting.
- the method may comprise converting the spatial energy distribution into multiple direction-of-arrival measurements by estimating a wrapped Gaussian mixture model of the observed spatial energy in each time frame.
- the method may comprise decluttering the direction-of-arrival measurements by filtering out measurements having particular parameters which do not satisfy a particular criterion.
- the direction-of-arrival measurements may comprise plural mean angle-of-arrival values and a variance associated with each mean value, and the method may further comprise filtering out the direction-of-arrival measurements for which the associated variance is above a threshold.
- the direction-of-arrival measurements may comprise plural mean angle-of-arrival values and an associated weight, and the method may further comprise filtering out the direction-of-arrival measurements for which the associated weight is below a threshold.
- estimating the spectrogram model for each of the one or more moving audio sources may comprise performing iterative optimisation of parameters of the spectrogram model.
- the method may further comprise, prior to performing iterative optimisation of the parameters of the spectrogram model, initializing the parameters using spatial energy distribution values for each identified source which are scaled such that the spatial energy distribution values sum to unity when the source is active and which are set to zero when the source is inactive.
- the method may comprise transforming the at least two audio signals into their frequency domain representations using a short-time Fourier transform.
- the method may comprise synthesising audio signals for each identified moving audio source based on the separated frequency domain representations of each of the moving audio sources.
- In a second aspect, this specification describes apparatus configured to perform a method as described with reference to the first aspect.
- In a third aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform a method as described with reference to the first aspect.
- In a fourth aspect, this specification describes apparatus comprising at least one processor and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus: to receive frequency domain representations of at least two audio signals, each recorded by a respective microphone of a microphone array comprising at least two microphones, the audio signals including at least one component resulting from one or more moving audio sources, wherein the frequency domain representations each comprise a plurality of time frames; to determine, based on the frequency domain representations of the at least two audio signals, a spatial energy distribution over a set of directions around the microphone array for each time frame; to convert the spatial energy distribution into multiple direction-of-arrival measurements; and, based on at least a subset of the direction-of-arrival measurements, to perform acoustical tracking to identify the one or more moving audio sources and their associated directions relative to the microphone array for each time frame.
- the computer program code when executed by the at least one processor, may cause the apparatus to determine for each time frame a spatial covariance matrix model for each identified source based on the identified one or more moving audio sources and their associated directions relative to the microphone array for each time frame.
- the computer program code when executed by the at least one processor, may further cause the apparatus to estimate for each time frame a spectrogram model for each of the one or more moving audio sources using the estimated spatial covariance matrix models and the frequency domain representations of the at least two audio signals.
- the computer program code when executed by the at least one processor, may further cause the apparatus to separate the frequency domain representations of the at least two audio signals based on the estimated spectrogram models using time-frequency filtering to produce separated frequency domain representations of each of the moving audio sources.
- the computer program code when executed by the at least one processor, may cause the apparatus to restore spatial energies for each identified audio source based on the associated directions relative to the microphone array, and to determine the spatial covariance matrix models for each identified source based on the restored spatial energies.
- the computer program code when executed by the at least one processor, may cause the apparatus to perform the acoustical tracking using a particle filtering algorithm.
- the particle filtering algorithm may comprise a Rao-Blackwellized particle filter.
- the computer program code when executed by the at least one processor, may cause the apparatus to determine the spatial energy distribution using a steered response power algorithm with a phase transform weighting.
- the computer program code when executed by the at least one processor, may cause the apparatus to convert the spatial energy distribution into multiple direction-of- arrival measurements by estimating a wrapped Gaussian mixture model of the observed spatial energy in each time frame.
- the computer program code when executed by the at least one processor, may further cause the apparatus to declutter the direction-of-arrival measurements by filtering out measurements having particular parameters which do not satisfy a particular criterion.
- the direction-of-arrival measurements may comprise plural mean angle-of-arrival values and a variance associated with each mean angle-of-arrival value, and the computer program code, when executed by the at least one processor, may cause the apparatus to filter out the direction-of-arrival measurements for which the associated variance is above a threshold.
- the direction-of-arrival measurements may comprise plural mean angle-of-arrival values and a weight associated with each mean angle-of-arrival value, wherein the computer program code, when executed by the at least one processor, causes the apparatus to filter out the direction-of-arrival measurements for which the associated weight is below a threshold.
- the computer program code when executed by the at least one processor, may cause the apparatus to estimate the spectrogram model for each of the one or more moving audio sources by performing iterative optimisation of parameters of the spectrogram model.
- the computer program code when executed by the at least one processor, may cause the apparatus, prior to performing iterative optimisation of the parameters of the spectrogram model, to initialize the parameters using spatial energy distribution values for each identified source which are scaled such that the spatial energy distribution values sum to unity when the source is active and which are set to zero when the source is inactive.
- the computer program code when executed by the at least one processor, may cause the apparatus to transform the at least two audio signals into their frequency domain representations using a short-time Fourier transform.
- the computer program code when executed by the at least one processor, may cause the apparatus to synthesise audio signals for each identified moving audio source based on the separated frequency domain representations of each of the moving audio sources.
- In a fifth aspect, this specification describes a computer-readable medium having computer-readable code stored thereon, the computer-readable code, when executed by at least one processor, causing performance of at least: receiving frequency domain representations of at least two audio signals, each recorded by a respective microphone of a microphone array comprising at least two microphones, the audio signals including at least one component resulting from one or more moving audio sources, wherein the frequency domain representations each comprise a plurality of time frames; determining, based on the frequency domain representations of the at least two audio signals, a spatial energy distribution over a set of directions around the microphone array for each time frame; converting the spatial energy distribution into multiple direction-of-arrival measurements; and, based on at least a subset of the direction-of-arrival measurements, performing acoustical tracking to identify the one or more moving audio sources and their associated directions relative to the microphone array for each time frame.
- the computer-readable code stored on the medium of the fifth aspect may further cause performance of any of the operations described with reference to the method of the first aspect.
- In a sixth aspect, this specification describes apparatus comprising: means for receiving frequency domain representations of at least two audio signals, each recorded by a respective microphone of a microphone array comprising at least two microphones, the audio signals including at least one component resulting from one or more moving audio sources, wherein the frequency domain representations each comprise a plurality of time frames; means for determining, based on the frequency domain representations of the at least two audio signals, a spatial energy distribution over a set of directions around the microphone array for each time frame; means for converting the spatial energy distribution into multiple direction-of-arrival measurements; and means for performing acoustical tracking to identify the one or more moving audio sources and their associated directions relative to the microphone array for each time frame, based on at least a subset of the direction-of-arrival measurements.
- the apparatus of the sixth aspect may further comprise means for causing performance of any of the operations described with reference to the method of the first aspect.
- Figure 1 shows schematically an electronic apparatus including audio signal processing apparatus;
- Figure 2 shows in a functional manner an example of an audio signal processing apparatus which may form part of the apparatus of Figure 1;
- Figure 3 is a flow diagram illustrating various operations which may be performed by the audio signal processing apparatus of Figure 2;
- Figure 4A illustrates graphically direction-of-arrival measurements for a single time frame derived from a sample audio signal including components derived from two audio sources;
- Figure 4B illustrates graphically direction-of-arrival measurements for all time frames derived from the same sample audio signal;
- Figure 4C illustrates graphically the direction-of-arrival measurements for all time frames following de-cluttering;
- Figure 4D illustrates graphically the paths of the two moving audio sources which have been identified based on the decluttered direction-of-arrival measurements of Figure 4C;
- Figure 5 is a flow diagram illustrating various operations which may be performed by a spectrogram parameter estimator when estimating the spectrogram models for the detected sources; and
- Figure 6 shows schematically example microphone configurations in an apparatus for capturing audio signals derived from one or more moving audio sources.
- a multiple microphone configuration enables the recording of stereo or surround-sound signals and the known location and orientation of the microphones further enables the apparatus to process the captured or recorded audio signals from the microphones to perform spatial processing to emphasise or focus on the audio signals from a defined direction relative to other directions.
- One way to perform spatial processing is to initially extract and manipulate the direction or sound source dependent information and to use this information in subsequent applications.
- These applications can include, for example, spatial audio coding (SAC), 3D sound-field analysis and synthesis, sound source separation and speaker extraction for further processing such as speech recognition.
- blind source separation (BSS) refers to separating individual sound sources from a recorded mixture without prior knowledge of the sources.
- a classic example of such a case is known as the cocktail party problem enabling the separation of each individual speaker from the party recorded using a microphone array.
- the field of BSS has been intensively studied, but is still categorized as an unsolved problem.
- the capturing or recording apparatus or device usually consists of a small hand-held device having multiple microphones. The multiple channels and their information correlation and relationship can then be utilized for source separation and direction of arrival estimation.
- applications employing such analysis can employ the accurate and detailed directional information of the separated sources when rendering the captured field by positioning the source using either binaural synthesis by head related transfer function (HRTF) filtering or source positioning in multichannel and multidimensional loudspeaker arrays using source positioning techniques such as vector base amplitude panning (VBAP).
- Blind sound source separation (BSS) of audio captures recorded using a small and enclosed microphone array, such as conventionally found on a mobile device or apparatus, can involve the following problems and difficulties, which are addressed by the embodiments described herein.
- the number of microphones is typically small, approximately 2-5 capsules, because of design volume and cost constraints, making the source direction of arrival (DoA) estimation difficult and pure beamforming-based separation inefficient.
- beamforming for source direction of arrival detection, and more recently spherical array beamforming techniques, have been successfully used in sound field capture and analysis, and have also been developed into final products.
- the problem with spherical array processing is that the array structure and the sheer size of the actual arrays used prevent it from being incorporated into a single mobile device.
- pure beamforming does not address the problem of source separation but analyses the spatial space around the device with beams as narrow as possible.
- the side-lobe cancellation for decreasing the beam width generally requires increasing the microphone count of the array, which is costly in volume, device complexity and cost of manufacture.
- the small geometrical distance between capsules reduces the time delay between microphones, which requires capture at a high sampling rate in order to observe the small time differences.
- when a high sampling frequency is used, there are problems with frequency domain based BSS methods in the form of spatial aliasing.
- audio frequencies with a wavelength less than two times the microphone separation distance can cause ambiguity in resolving the time delays, in the form of a phase delay after a short-time Fourier transform (STFT).
- separation based on independent component analysis (ICA) is one of the methods which is sensitive to problems caused by spatial aliasing, in permutation alignment and in unifying the source independencies over frequency.
- non-negative matrix factorization (NMF) based methods for multichannel cases have been proposed. These include, for example, multichannel NMF for convolutive mixtures; however, the EM algorithm used for parameter estimation is inefficient without oracle initialization (in other words, without knowing source characteristics for initializing the algorithm).
- Complex multichannel NMF (CNMF) with multiplicative updates has been proposed with promising separation results.
- the proposed CNMF algorithms estimate the source spatial covariance properties and the magnitude model. However, the spatial covariance matrices are estimated and updated individually for each frequency bin, making the algorithm prone to estimation errors at high frequencies with spatial aliasing. Also, the estimated covariance properties are not connected to the spatial locations of the sources.
- a further problem includes solving and executing 3D sound synthesis of the separated sources.
- 3D synthesis of the separated sources, or parts of the sources, requires pairing the separation algorithm with DoA analysis, making the system potentially discontinuous and less efficient for the 3D sound scene analysis-synthesis loop.
- an enclosed microphone array with an unknown directivity pattern of each capsule requires a machine learning based algorithm for learning and compensating the unknown properties of the array.
- Moving sound sources add another layer of complexity to the sound source separation methods discussed above.
- the concept described herein in further detail is one in which the audio recording system provides apparatus and/or methods for separating moving audio sources using plural microphones in one device. More specifically, this specification describes a blind sound source separation method for a dynamic scenario captured using a spaced microphone array, which may in some examples be a compact array (for instance, in a mobile device). In other examples, the spaced microphone array may be made up of more than one physically separate device, each including at least one microphone.
- methods described herein may be based on online tracking by particle filtering and on estimating an NMF-based spectral model of the tracked sources in order to separate them by time-frequency filtering.
- observed spatial energy as a function of time and direction of arrival is calculated for the signal under analysis.
- the observed spatial energy may be in the form of a steered response power (SRP).
- the spatial energy distribution may then be modelled as a wrapped Gaussian mixture model (WGMM) in each time frame.
- the WGMM means and variances may then be used as direction of arrival measurements for acoustic tracking, for instance using a Rao-Blackwellised particle filter.
- the acoustic tracking detects or identifies the underlying sources, associates the means and variances with the detected sound sources and outputs the source state in each time frame.
- a DOA-based spatial covariance matrix (SCM) model may be defined for each tracked source for each timeframe.
- the SCM model denotes the spatial behavior of sources, and a spectrogram model of sources consisting of evidence originating from the tracked direction is estimated.
- the individual source signals may then be reconstructed using a separation mask formulated as a Wiener-filter based on the estimated spectrogram model of each source.
- Figure 1 shows a schematic block diagram of an exemplary electronic apparatus 1, which may be used to record (or capture) audio signals derived from one or more audio sources.
- the electronic apparatus 1 may, for example, be a mobile terminal or user equipment of a wireless communication system.
- the apparatus 1 may be an audio player or audio recorder, such as an MP3 player, a media recorder/player (also known as an MP4 player), or any suitable portable apparatus suitable for recording audio or audio/video.
- the electronic apparatus 1 may in some embodiments comprise an audio subsystem 10.
- the audio subsystem 10 may for example comprise an array of microphones 11 for audio signal capture.
- the array of microphones may be solid state microphones, in other words capable of capturing audio signals and outputting a suitable digital format signal, in other words not requiring an analogue-to-digital converter.
- the microphones 11 may alternatively comprise any suitable microphone or audio capture means, for example a condenser microphone, capacitor microphone, electrostatic microphone, Electret condenser microphone, dynamic microphone, ribbon microphone, carbon microphone, piezoelectric microphone, or micro electrical-mechanical system (MEMS) microphone.
- the microphones 11 of the array may in such embodiments output the captured audio signal to an analogue-to-digital converter (ADC) 14.
- the audio subsystem 10 may further comprise an analogue-to-digital converter (ADC) 14 configured to receive the analogue captured audio signal from the microphones and to output the audio captured signal in a suitable digital form.
- the ADC converter 14 may be any suitable analogue-to-digital conversion or processing means.
- the microphones may contain both audio signal generating and analogue-to-digital conversion capability.
- the audio subsystem 10 may, in some examples, further comprise a digital-to-analogue converter 32 for converting digital audio signals received from a processing apparatus 21 to a suitable analogue format.
- the digital-to-analogue converter (DAC) or signal processing means 32 may utilise any suitable DAC technology.
- the audio subsystem may comprise, in some embodiments, a speaker 33.
- the speaker 33 may receive the output from the DAC 32 and present the analogue audio signal to the user.
- the speaker 33 may be representative of a multi-speaker arrangement, a headset, for example a set of headphones, or cordless headphones.
- although the electronic apparatus 1 is shown having only audio capture and audio presentation components, it should be understood that, in some embodiments, the apparatus may additionally comprise video capture and video presentation components, such as a camera (for video capture) and/or a display (for video presentation).
- the audio subsystem 10 comprises a control apparatus 20 for controlling the other components of the audio subsystem 10.
- the control apparatus 20 may be coupled to the ADC 14 for receiving digital signals representing audio signals from the microphones 11, and/or to the DAC 32 to provide digital signals for presentation to the user via the speaker 33.
- the control apparatus may comprise processing apparatus 21 coupled with memory 22.
- the processing apparatus 21 may be configured to execute various program codes.
- the implemented program codes may comprise for example audio recording and audio presentation routines.
- the program codes may be configured to perform audio signal processing.
- the control apparatus 20 may, in some examples, be referred to as audio signal processing apparatus 20.
- the memory 22 may be any suitable storage means.
- the memory 22 may comprise a program code section 23 for storing program codes implementable using the processing apparatus 21.
- the memory 22 may further comprise a stored data section 24 for storing data, for example data that has been encoded in accordance with the application or data to be encoded via the application embodiments as described later.
- the implemented program code stored within the program code section 23, and the data stored within the stored data section 24 can be retrieved by the processing apparatus 21 whenever needed via the memory-processor coupling.
- the electronic apparatus 1 may comprise a user interface 15.
- the user interface 15 may be coupled in some embodiments to the processing apparatus 21.
- the processing apparatus 21 may control the operation of the user interface 15 and receive inputs from the user interface 15.
- the user interface 15 may enable a user to input commands to the electronic apparatus 1, for example via a keypad, and/or to obtain information from the apparatus 1, for example via a display which is part of the user interface 15.
- the user interface 15 may in some embodiments as described herein comprise a touch screen or touch interface capable of both enabling information to be entered into the electronic apparatus 1 and further displaying information to the user of the device.
- in some embodiments, the apparatus further comprises a transceiver 13. The transceiver in such embodiments may be coupled to the processing apparatus 21 and configured to enable communication with other electronic apparatuses, for example via a wireless communications network.
- the transceiver 13 or any suitable transceiver or transmitter and/or receiver means may be configured to communicate with other electronic apparatuses via a wire or wired coupling.
- the transceiver 13 may be configured to communicate with further apparatus by any suitable known communications protocol; for example, in some embodiments the transceiver 13 or transceiver means may use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol, or an infrared data communication pathway (IRDA).
- Figure 6 illustrates an example of an electronic apparatus 1, such as that described with reference to Figure 1, which comprises a front face 301 comprising a camera 51, a rear face 303 and a top edge or face 305.
- the audio subsystem 10 of the apparatus 1 comprises four microphones.
- the apparatus 1 comprises a first (front right) microphone 11₁ located at the front right side of the apparatus 1 (where right is towards the top edge of the front face 301 of the apparatus 1), a front left microphone 11₃ located at the front left side of the apparatus 1, a right high microphone 11₂ located at the top edge or face side of the apparatus 1, and a left rear microphone 11₄ located at the left rear side of the apparatus 1.
- the microphones 11 are shown as part of the apparatus 1, it should be understood that, in some embodiments, the microphone array may be physically separate from the apparatus 1.
- the microphone array can be located on a headset which wirelessly or otherwise passes the audio signals to the audio processing apparatus 20 for processing.
- the microphones may be located at a different location to the audio signal processing apparatus 20 and/or the processing of the captured signals may be carried out at an unspecified time after their capture.
- the processing may be performed by a web server while the signals may be captured by a user device such as a mobile phone.
- Figure 2 is a schematic functional illustration of the audio processing apparatus 20 according to some embodiments.
- Figure 3 is a flow diagram illustrating various operations which may be performed by the audio processing apparatus as shown in Figure 2.
- the audio signal processing apparatus 20 is configured to receive digital representations of audio signals captured by at least two microphones 11.
- Two microphones may be used, for instance, when the one or more sound source(s) are located approximately within an arc of 180 degrees relative to the array. In examples, in which the one or more sound source(s) are located at any location surrounding the microphone array (e.g. in a 360-degree arc), three or more microphones may be more suitable. If it is desired to determine a direction of the audio sources anywhere in three-dimensions surrounding the array, at least four microphones may be appropriate.
- the digital representations of audio signals captured by the at least two microphones 11 may be received directly from the microphones 11 (in the case of "integrated" microphones) or via the ADC 14 (in the case of microphones which output an analogue audio signal).
- the audio signal processing apparatus may in some examples comprise a short time Fourier transformer (STFT) 101 for transforming received time domain audio signals into the frequency domain.
- the digital representations $x_m(t)$ of the audio signals captured by the microphone array are received by the STFT 101.
- the operation of receiving the microphone input audio signals is shown in Figure 3 by step 201.
- the microphone array 11 can be considered to capture in the time domain the sound or audio sources which have been convolved with their spatial responses. This can be mathematically modelled or described as:
- $x_m(t) = \sum_{p=1}^{P} \sum_{\tau} h_{m,p}(\tau)\, s_p(t - \tau)$ (Equation 1), where $\tau$ is an index for the convolution with the spatial response $h$.
- the STFT 101 is configured to perform a short time Fourier transform on the captured audio signals.
- the STFT 101 may be configured to calculate the STFT of a time domain signal by dividing the captured audio signals into small overlapping windows, applying the window function and taking the discrete Fourier transform (DFT) of it.
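- purely by way of illustration, the windowing-and-DFT procedure described above might be sketched as follows in Python/NumPy; the Hann window, the 1024-sample window length and the 50% overlap are assumptions of this sketch and are not details taken from this specification:

```python
import numpy as np

def stft(x, win_len=1024, hop=512):
    # Divide the signal into small overlapping windows, apply the window
    # function and take the DFT of each windowed segment, as described above.
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # row n holds the bins of frame x_{f,n}
```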
- the "mixing" model can be approximated in the STFT domain as:
- $\mathbf{x}_{f,n} = \sum_{p=1}^{P} \mathbf{y}_{f,n,p} \approx \sum_{p=1}^{P} \mathbf{h}_{f,p}\, s_{f,n,p}$ (Equation 2), where:
- $n = 1 \ldots N$ is the frame index;
- $s_{f,n,p}$ is the denotation of the single-channel STFT of each source $p$; and
- $\mathbf{y}_{f,n,p}$ is the denotation of the STFT of the reverberated source signals.
- in other words, the STFT of the array capture $\mathbf{x}_{f,n}$ may be described as the sum of the STFTs of the reverberated individual source signals $\mathbf{y}_{f,n,p}$.
- the operation of transforming the time domain signals into the frequency domain is shown in Figure 3 by step 301.
- the output of the STFT 101, $\mathbf{x}_{f,n}$ (the frequency domain form of the audio signals), is provided to a discrete spatial distribution calculator 102.
- the audio signal processing apparatus comprises discrete spatial distribution calculator 102.
- the discrete spatial distribution calculator 102 is configured to calculate a discrete spatial distribution of the captured audio signals on the basis of the received frequency domain form of the audio signals.
- each individual spatial energy distribution value $z_{n,o}$ for a particular direction $o$ and time frame $n$ may be referred to as a spatial weight for that direction and time frame.
- the observed spatial energy $z_{n,o}$ may be calculated using a steered response power (SRP) algorithm with a phase transform (PHAT) weighting (which may be referred to as the steered response power - phase transform (SRP-PHAT) algorithm).
- the observed spatial energy $z_{n,o}$ may be mathematically modelled or described as:
- $z_{n,o} = \sum_{(m_1, m_2)} \sum_{f} \mathrm{Re}\!\left[ \frac{x_{f,n,m_1}\, x^{*}_{f,n,m_2}}{\left| x_{f,n,m_1}\, x^{*}_{f,n,m_2} \right|}\, e^{\,i 2\pi f \tau_o(m_1,m_2)} \right]$ (Equation 3), where $\tau_o(m_1,m_2)$ is the difference between the times of arrival of sound from direction $o$ at a pair of microphones $(m_1, m_2)$, and the outer sum runs over all microphone pairs.
- the discrete spatial distribution calculator 102 may utilise knowledge of the array geometry in order to calculate the time it takes sound to arrive from direction $o$ to microphone $m$, $\tau_o(m)$. This is illustrated in Figure 2 as an input to the discrete spatial distribution calculator.
- in Equation 3, all sound sources may be assumed to be in the far field. Alternatively, the sound sources may be assumed to be at a fixed distance from the center of the microphone array (which may be the center of the electronic apparatus 1 depicted in Figure 6), for instance 2 m.
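- a minimal sketch of the SRP-PHAT computation of Equation 3, assuming the STFTs are arranged per microphone and the per-direction propagation times have been precomputed from the known array geometry; all names and array layouts here are illustrative assumptions:

```python
import itertools
import numpy as np

def srp_phat(X, tau, freqs):
    # X: (M, F, N) complex STFTs, one per microphone; tau: (O, M) propagation
    # times from each look direction o to each microphone m; freqs: (F,) bin
    # frequencies in Hz. Returns z of shape (N, O).
    M, F, N = X.shape
    z = np.zeros((N, tau.shape[0]))
    for m1, m2 in itertools.combinations(range(M), 2):
        cross = X[m1] * np.conj(X[m2])     # (F, N) cross-spectrum of the pair
        cross /= np.abs(cross) + 1e-12     # PHAT weighting: keep phase only
        for o in range(tau.shape[0]):
            delta = tau[o, m1] - tau[o, m2]             # pairwise TDOA
            steer = np.exp(2j * np.pi * freqs * delta)  # align direction o
            z[:, o] += np.real(steer[None, :] @ cross).ravel()
    return z
```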
- the output $z_{n,o}$ of the discrete spatial distribution calculator 102 is provided to a direction-of-arrival estimator 103.
- the output $z_{n,o}$ of the discrete spatial distribution calculator 102 may be referred to as the signal energy for each direction $1 \ldots O$ and time instant $1 \ldots N$.
- the audio signal processing apparatus further comprises a direction-of-arrival estimator 103.
- the direction-of-arrival estimator 103 is configured to convert the discrete spatial distribution $z_{n,o}$ into multiple direction-of-arrival (DOA) measurements.
- the direction-of-arrival measurements may be made up of associated sets of mean angles, variances and weights.
- the conversion of the discrete spatial distribution $z_{n,o}$ into multiple direction-of-arrival (DOA) measurements may, in some examples, be performed by estimating parameters of a wrapped Gaussian mixture model (WGMM) for each time frame.
- the wrapped Gaussian mixture model may be defined as:
- $\lambda(\theta; \mu, \sigma^2) = \sum_{l=-K}^{K} \mathcal{N}(\theta + 2\pi l; \mu, \sigma^2)$ (Equation 4), where:
- $\mu$ is an estimated mean angle of the Gaussian distribution model;
- $\sigma^2$ is the variance associated with the mean angle;
- $\mathcal{N}(\theta; \mu, \sigma^2)$ is the probability density function of a regular Gaussian distribution with the same mean and variance;
- $K$ is a predefined constant that may typically be between three and five; and
- $\alpha$ is a weight associated with the mean angle.
- the direction-of-arrival estimator 103 may be configured to estimate the WGMM parameters for each time frame.
- the mean angle $\mu_{n,k}$, the variance in the angle $\sigma^2_{n,k}$ and the weight $\alpha_{n,k}$ may be considered as permutated measurements and may be obtained as results of the following optimization problem:
- $\{\mu_{n,k}, \sigma_{n,k}, \alpha_{n,k}\} = \arg\min \sum_{o} \left( z_{n,o} - \sum_{k} \alpha_{n,k}\, \lambda(\theta_o; \mu_{n,k}, \sigma^2_{n,k}) \right)^2$ (Equation 5), where $\theta_o$ corresponds to the azimuth angle of direction $o$.
- the problem of Equation 5 is a WGMM parameter estimation in one dimension with an observed discretized distribution.
- the minimization of Equation 5 is a conventional non-linear least squares problem, and numerous methods for solving its parameters exist in the literature. Such methods include that described in R. H. Byrd, R. B. Schnabel, and G. A. Shultz, "A trust region algorithm for nonlinearly constrained optimization," SIAM Journal on Numerical Analysis, vol. 24, no. 5, pp. 1152-1170, 1987.
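- as an illustrative sketch only, the Equation 5 fit may be solved with an off-the-shelf non-linear least squares routine such as scipy.optimize.least_squares; the component count, initial values and bounds below are assumptions:

```python
import numpy as np
from scipy.optimize import least_squares

def wrapped_gaussian(theta, mu, var, K=3):
    # lambda(theta; mu, sigma^2) of Equation 4: a sum of 2*pi-shifted Gaussians.
    l = np.arange(-K, K + 1)
    d = theta[:, None] + 2 * np.pi * l[None, :] - mu
    return np.sum(np.exp(-d**2 / (2 * var)) / np.sqrt(2 * np.pi * var), axis=1)

def fit_wgmm(theta_o, z_n, n_comp=3):
    # theta_o: (O,) azimuths of the sampled directions; z_n: (O,) observed
    # spatial energy for one time frame. Returns (means, variances, weights).
    def residual(p):
        mu, var, alpha = np.split(p, 3)
        model = sum(a * wrapped_gaussian(theta_o, m, v)
                    for m, v, a in zip(mu, var, alpha))
        return model - z_n
    p0 = np.concatenate([np.linspace(0.5, 5.5, n_comp),       # initial means
                         np.full(n_comp, 0.1),                # initial variances
                         np.full(n_comp, z_n.max() + 1e-9)])  # initial weights
    lo = np.concatenate([np.zeros(n_comp), np.full(n_comp, 1e-4), np.zeros(n_comp)])
    hi = np.concatenate([np.full(n_comp, 2 * np.pi), np.full(n_comp, 4.0),
                         np.full(n_comp, np.inf)])
    return np.split(least_squares(residual, p0, bounds=(lo, hi)).x, 3)
```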
- the WGMM parameters may be referred to as direction-of-arrival measurements.
- the direction of arrival measurements output by the direction of arrival estimator 103 may include the mean and variance of each tracked source for each time instant.
- the operation of estimating the direction-of-arrival measurements based on the discrete spatial distribution is shown in Figure 3 by step 303.
- Figures 4A to 4C are graphical illustrations of estimated direction-of-arrival measurements. Figure 4A shows direction-of-arrival measurements for a single time frame derived from a sample audio signal including components derived from two audio sources.
- Figure 4B is a graph of the time in seconds (on the x-axis) against the mean angle $\mu_{n,k}$ (on the y-axis) for all time frames.
- Figure 4C is derived from the graph of Figure 4B, but has had all the mean angles $\mu_{n,k}$ whose standard deviation $\sigma_{n,k}$ is greater than 1 radian (approximately 57 degrees) and whose weighting $\alpha_{n,k}$ is less than 0.005 filtered out. This serves to remove the "clutter" of false measurements (put another way, to "declutter" the measurements) before a tracking algorithm is applied to the results. This clutter may result from the modelling of the noise in the SRP algorithm.
- the direction-of-arrival estimator 103 may be configured to perform this decluttering by removing measurements having a standard deviation $\sigma_{n,k}$ above a particular threshold value and/or a weight below another threshold value.
- the thresholds may be determined experimentally.
- the weight threshold may be relatively universal, but may depend on the level of background noise, e.g. the SNR of the target to be tracked. Typical values for the weight threshold may be from 0.1 to 0.001.
- the threshold for the standard deviation may be determined in dependence on, for example, the array geometry (spatial resolution and how sharp the peaks in SRP-PHAT are). Typical values for the standard deviation threshold may be from 5 degrees to 60 degrees (approximately 0.1 to 1 radians), in other words, the spatial window of how "wide" peaks are to be considered as non-clutter.
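- a minimal sketch of this decluttering step, using the example threshold values quoted above and treating the two conditions independently:

```python
import numpy as np

def declutter(mu, std, alpha, std_max=1.0, weight_min=0.005):
    # Keep only measurements whose standard deviation (radians) is below
    # std_max and whose weight is above weight_min; the rest are clutter.
    keep = (std < std_max) & (alpha > weight_min)
    return mu[keep], std[keep], alpha[keep]
```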
- the decluttering of the direction-of-arrival measurements may be performed by the acoustical tracker 104, which also forms part of the audio signal processing apparatus 20.
- the acoustical tracker 104 receives the direction-of-arrival measurements from the direction-of-arrival estimator 103. As discussed above, the received direction-of-arrival measurements may be "cluttered" or may have been "decluttered" by the direction-of-arrival estimator 103.
- the acoustical tracker 104 is configured to track acoustical sources in each time frame based on the received direction-of-arrival measurements. The acoustical tracking is performed after decluttering of the direction-of-arrival measurements.
- the acoustical tracker 104 converts the wrapped one-dimensional angle measurements ($\mu_{n,k}$) to a two-dimensional point on a unit circle $\bar{\mathbf{y}}_n^{(k)}$, where superscript $(k)$ denotes multiple measurements within one time frame $n$.
- This may be performed using a rotating vector model such as described in H. Nies, O. Loffeld, and R. Wang, "Phase unwrapping using 2d-Kalman filter - potential and limitations," in Geoscience and Remote Sensing Symposium, 2008 (IGARSS 2008), IEEE International, vol. 4, IEEE, 2008, pp. IV-1213.
- the conversion may serve to linearize the measurement model matrix and the state transition.
- the conversion may be performed using the following expression:
- $\bar{\mathbf{y}}_n^{(k)} = \left[ \cos(\mu_{n,k}),\ \sin(\mu_{n,k}) \right]^{T}$ (Equation 6)
- a dynamic state-space model may be used by the acoustical tracker 104.
- the state of each source is considered as a 2-D point on a unit circle.
- a constant velocity model may be used.
- the underlying state of the dynamical system may be defined by:
- $\mathbf{x}_n = \left[ x,\ y,\ \dot{x},\ \dot{y} \right]^{T}$ (Equation 7), where:
- $\mathbf{x}_n$ is the state of each tracked source at a time frame $n$;
- $x$ and $y$ are the x and y coordinates of the tracked source; and
- $\dot{x}$ and $\dot{y}$ are the velocities along the x-axis and y-axis, respectively, of the tracked source.
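- an illustrative sketch of the Equation 6 conversion and of a constant-velocity transition matrix consistent with the state of Equation 7; the frame interval dt is an assumption:

```python
import numpy as np

def to_unit_circle(mu):
    # Equation 6: map a wrapped angle measurement to a 2-D point on the unit circle.
    return np.array([np.cos(mu), np.sin(mu)])

def cv_transition(dt):
    # Constant-velocity update for the state x = [x, y, xdot, ydot]^T of
    # Equation 7, as could be used in the Kalman part of the tracker.
    return np.array([[1.0, 0.0, dt, 0.0],
                     [0.0, 1.0, 0.0, dt],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])
```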
- the acoustical tracker 104 may be configured to particle filter the converted measured angles to detect or identify the sound sources present in the observed data. The acoustical tracker 104 then associates particular measured angles with a particular detected sound source. Alternatively, if none of the active source particle distributions (the spatial distribution of an active source) indicates a probability of a current measurement belonging to the active source that is higher than the clutter prior probability, then the measurement is labeled as clutter.
- the clutter prior probability is a fixed pre-set value intended to indicate, when the probability is above the clutter prior threshold, that the current measurement is linked to an existing source. A typical value for the clutter prior probability may be, for instance, 0.15.
- the acoustical tracker 104 may be configured to track the multiple targets (or sound sources) using Rao-Blackwellized particle filtering, for instance as described in S. Sarkka, A. Vehtari, and J. Lampinen, "Rao-Blackwellized particle filter for multiple target tracking" (Information Fusion, vol. 8, no. 1, pp. 2-15, 2007) or in "Rao-Blackwellized Monte Carlo data association for multiple target tracking".
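- purely as an illustration of the association rule described above (not of the Rao-Blackwellized filter itself), the gating of one converted measurement against the active sources might be sketched as below; the Gaussian predicted-measurement densities stand in for the active source particle distributions and are an assumption, while the 0.15 clutter prior follows the example value above:

```python
import numpy as np

def associate(y_meas, preds, clutter_prior=0.15):
    # y_meas: one 2-D unit-circle measurement; preds: list of (mean, cov)
    # predicted measurement distributions, one per active source. Returns the
    # index of the best-matching source, or None to label the measurement as clutter.
    def density(y, mean, cov):
        d = y - mean
        return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / (
            2 * np.pi * np.sqrt(np.linalg.det(cov)))
    probs = [density(y_meas, m, c) for m, c in preds]
    if not probs or max(probs) <= clutter_prior:
        return None  # no source explains the measurement better than clutter
    return int(np.argmax(probs))
```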
- the output of the particle filtering is the state of each detected audio source $p$ at each time frame, denoted by $\mathbf{x}_{n,p}$.
- the acoustical tracker 104 may be configured to model internally the estimated state $\mathbf{x}_{n,p}$ using Kalman filtering, thereby providing a reliability measure ($d_{n,p}$) for each state estimation $\mathbf{x}_{n,p}$.
- the acoustical tracker 104 may be further configured to extract the direction-of-arrival from the tracked source state $\mathbf{x}_{n,p}$. This may be achieved by calculating the angle of the vector (the azimuth angle) defined by the two-dimensional points. This may be obtained as:
- $\theta_{n,p} = \operatorname{atan2}\!\left( \mathbf{x}_n(2),\ \mathbf{x}_n(1) \right)$ (Equation 8), where:
- $\theta_{n,p}$ is the estimated direction of source $p$ at time frame $n$;
- $\mathbf{x}_n(2)$ is the y-coordinate of the object location; and
- $\mathbf{x}_n(1)$ is the x-coordinate of the object location.
- the estimated direction $\theta_{n,p}$ of source $p$ at time frame $n$ may be the output of the acoustical tracker 104.
- the output of the acoustical tracker 104 may be said to be the number and direction of tracked audio sources for each time instant $1 \ldots N$.
- the acoustical tracker 104 may also be configured to output the standard deviation $\sigma_{n,p}$ associated with the estimated direction $\theta_{n,p}$.
- the operation of the acoustical tracker 104 in tracking the audio sources in the captured audio signals is shown in Figure 3 by step 304.
- Figure 4D shows the output of the acoustical tracker 104 converted back into the azimuth angle using Equation 8. As can be seen, the acoustical tracker 104 has identified the paths of the two moving audio sources from the decluttered direction-of-arrival measurements.
- the audio signal processing apparatus 20 further comprises a spatial covariance processing module 105.
- the spatial covariance processing module 105 may be configured to receive the output from the acoustical tracker 104 and to restore the spatial weights (or spatial energy distribution values) $z_{n,o,p}$ for the detected sound sources and their directions in each time frame.
- alternatively, the spatial weights $z_{n,o,p}$ may be restored by another functional unit/module, for instance but not limited to the acoustical tracking module 104. Restoration of the spatial weights $z_{n,o,p}$ may be performed using the wrapped Gaussian distribution model defined in Equation 4.
- the weighting factor $\alpha$ is assumed to be 1.
- each source direction at each time frame is considered to consist of a single wrapped Gaussian.
- the spatial weights for the detected sound sources are given by:
- $z_{n,o,p} = \lambda(\theta_o; \theta_{n,p}, \sigma^2_{n,p})$ (Equation 10), where:
- $\theta_{n,p}$ is the direction-of-arrival of sound from each detected source at each time frame (extracted from the tracked source state calculated by the acoustical tracker 104); and
- $\sigma^2_{n,p}$ is the variance for the extracted direction of arrival for each detected source at each time frame.
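- an illustrative sketch of the Equation 10 restoration for one source and one time frame, reusing the wrapped Gaussian of Equation 4 with the weighting factor fixed to 1 (the wrap count K is an assumption):

```python
import numpy as np

def restore_spatial_weights(theta_o, theta_np, var_np, K=3):
    # z_{n,o,p}: a single wrapped Gaussian centred on the tracked direction
    # theta_np with variance var_np, evaluated at the sampled azimuths theta_o.
    l = np.arange(-K, K + 1)
    d = theta_o[:, None] + 2 * np.pi * l[None, :] - theta_np
    return np.sum(np.exp(-d**2 / (2 * var_np)) / np.sqrt(2 * np.pi * var_np), axis=1)
```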
- the spatial covariance processing module 105 is configured to compute, from the spatial weights $z_{n,o,p}$, time-varying source spatial covariance matrices (SCMs) for each time instant $1 \ldots N$.
- the time-varying source SCMs produced by the spatial covariance processing module 105 may be defined as:
- $\mathbf{H}_{f,n,p} = \sum_{o} z_{n,o,p}\, \mathbf{W}_{f,o}$ (Equation 11), where:
- $\mathbf{H}_{f,n,p}$ are SCMs of the frequency domain time-dependent room impulse responses (RIRs); and
- $\mathbf{W}_{f,o}$ is a direction of arrival kernel whose entries correspond to particular pairs of microphones $(m_1, m_2)$ which are utilised to capture the audio signals.
- the direction of arrival kernel $\mathbf{W}_{f,o}$ in the expression of Equation 11 may be defined as:
- $[\mathbf{W}_{f,o}]_{m_1,m_2} = e^{-i 2\pi f_j \tau_{k_o}(m_1,m_2)}$ (Equation 12), where:
- $f_j$ is the frequency of the $j$th discrete Fourier transform bin; and
- $\tau_{k_o}(m_1,m_2)$ denotes the time difference of arrival (time delay) between two microphones caused by a source at direction $k_o$.
- the direction $k_o$ in the above expression is defined as a vector pointing towards a direction parameterized by azimuth $\theta_o \in [0, 2\pi]$ and elevation $\varphi_o \in [0, \pi]$, originating from the geometric center of the microphone array.
- the direction vectors $k_o$ indexed by $o$ would sample the space around the array approximately uniformly.
- the sources of interest lie approximately on the xy-plane with elevation being zero, such that all the direction vectors $k_o$ differ only by their azimuth.
- the directional statistics used in the acoustical tracking of the sound sources simplifies to a univariate case when the sampling of the spatial space around the array is by azimuthal information only.
- the acoustical tracking described above may be performed along all three axes (x, y, z). In such examples, it is straightforward to define vectors $k_o$ which differ in terms of both their azimuth and elevation.
- $\tau_{k_o}(m_1,m_2)$ in Equation 12 can be obtained relatively easily, for instance as specified in J. Nikunen and T. Virtanen, "Direction of arrival based spatial covariance model for blind sound source separation" (IEEE Transactions on Audio, Speech, and Language Processing, vol. 22, no. 3, pp. 727-739, 2014).
- the direction of arrival kernel $\mathbf{W}_{f,o}$ of Equation 12 simply results from converting the time difference of arrival $\tau_{k_o}(m_1,m_2)$ to a phase difference.
- the direction of arrival kernel $\mathbf{W}_{f,o}$ may be pre-stored by the spatial covariance processing module 105, or alternatively may be determined as required based on received information defining the geometry of the microphone array which has been used to capture the audio signals.
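- an illustrative sketch of Equations 11 and 12 for one source and one time frame; the array layouts of the inputs are assumptions:

```python
import numpy as np

def doa_kernel(freqs, tau_o):
    # Equation 12 for one direction: entry (m1, m2) holds the expected phase
    # difference exp(-i 2 pi f (tau_o[m1] - tau_o[m2])) between the microphones.
    delta = tau_o[:, None] - tau_o[None, :]                    # (M, M) TDOAs
    return np.exp(-2j * np.pi * freqs[:, None, None] * delta)  # (F, M, M)

def source_scm(freqs, tau, z_nop):
    # Equation 11: H_{f,n,p} = sum_o z_{n,o,p} W_{f,o}. tau: (O, M) propagation
    # times per direction; z_nop: (O,) restored spatial weights of the source.
    M = tau.shape[1]
    H = np.zeros((len(freqs), M, M), dtype=complex)
    for o, w in enumerate(z_nop):
        if w > 0:
            H += w * doa_kernel(freqs, tau[o])
    return H
```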
- the operation of the spatial covariance processing module 105 to produce time-varying spatial covariance matrices (SCMs) for each source for each time instant 1...N is illustrated in Figure 3 by operation 305.
- the time-varying spatial covariance matrices of the detected sources generated by the spatial covariance processing module 105 may be passed to the parameter estimator 106, which may form part of the audio signal processing apparatus 20.
- the parameter estimator 106 is configured to utilize the received time-varying spatial covariance matrices of the detected sources to estimate spectrogram parameters for use by a source separator 107 in separating the microphone array capture into their constituent sources.
- the parameter estimation may include estimating the parameters of a non-negative matrix factorization (NMF) model of the source spectrograms.
- one NMF component q represents a single spectrally repetitive event from a mixture of audio sources which is observed by the signals captured by the microphone array 11.
- One audio source is modeled as a sum of multiple NMF components q.
- the parameter estimator 106 may be configured also to utilize the frequency domain form of the array capture $\mathbf{x}_{f,n}$ (as output by the STFT 101) when estimating the spectrogram parameters for each source.
- the parameter estimator 106 may utilize the square-rooted version of the frequency domain form of the array capture $\mathbf{x}_{f,n}$ to produce an observed spatial covariance matrix ($\mathbf{X}_{f,n} \in \mathbb{C}^{M \times M}$) for each time-frequency point, calculated as $\mathbf{X}_{f,n} = \tilde{\mathbf{x}}_{f,n}\, \tilde{\mathbf{x}}_{f,n}^{H}$, where $\tilde{\mathbf{x}}_{f,n}$ denotes the square-rooted magnitude version of $\mathbf{x}_{f,n}$ with the phases retained.
- the observed spatial covariance matrices may then be approximated by the model:
- $\mathbf{X}_{f,n} \approx \hat{\mathbf{X}}_{f,n} = \sum_{p} \left( \sum_{o} z_{n,o,p}\, \mathbf{W}_{f,o} \right) \left( \sum_{q} b_{q,p}\, t_{f,q}\, v_{q,n} \right)$ (Equation 16)
- the model of Equation 16 may be referred to as the complex non-negative matrix factorization (CNMF) model.
- the parameter estimator 106 may be configured to estimate the spectrogram parameters $b_{q,p}$, $t_{f,q}$ and $v_{q,n}$ using the derived CNMF model of Equation 16.
- the estimation may be an iterative process which serves to iteratively optimize the parameters.
- the optimization may be performed under the assumption that $\mathbf{H}_{f,n,p}$ is set externally (based on the $z_{n,o,p}$ calculated during the acoustical tracking) and remains constant during the parameter estimation process.
- the parameter estimator 106 may be configured to obtain multiplicative updates for estimating the optimal spectrogram parameters in an iterative manner by partial derivation of the total modelling criterion (or, put another way, the cost function).
- the multiplicative updates for finding the optimal parameters of the CNMF model of Equation 16 may be performed using an optimization criterion (cost-function) of squared Frobenius norm or Itakura-Saito divergence.
- the parameter estimator 106 may additionally use auxiliary variables, for instance as described in the expectation maximization algorithm of A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm" (Journal of the Royal Statistical Society, vol. 39, no. 1, pp. 1-38, 1977).
- the optimization criterion used when estimating the spectrogram parameters may in some examples be the squared Frobenius norm. Accordingly, multiplicative update equations for the non-negative parameters $b_{q,p}$, $t_{f,q}$ and $v_{q,n}$ (Equations 17 to 19, referred to below) may be derived and utilized by the parameter estimator.
- the estimated spectrogram parameters may be output by the parameter estimator 106 to the source separator 107, which may also form part of the audio signal processing apparatus 20.
- the estimation of the spectrogram parameters is illustrated in Figure 3 as operation 307.
- the estimation of the spectrogram parameters by the parameter estimator 106 is described in more detail below with respect to the flow chart of Figure 5.
- the source separator 107 may be configured to separate the array capture $x_m(t)$ (more specifically, the short-time Fourier transform (STFT) of the array capture, $\mathbf{x}_{f,n}$) into individual sources on the basis of the received spectrogram parameters $b_{q,p}$, $t_{f,q}$, $v_{q,n}$ (which define the real-valued magnitude spectrogram $\hat{s}_{f,n,p}$). This may be performed using a Wiener filter.
- STFT short-time Fourier transform
- the source separator 107 may, in some examples, also utilise the SCMs of the frequency domain time-dependent room impulse responses ($\mathbf{H}_{f,n,p}$), for instance, as estimated or otherwise defined by the spatial covariance processing module 105.
- the source separator 107 may be configured to obtain the reverberated source signals $\mathbf{y}_{f,n,p}$ in the frequency domain by using multichannel Wiener filtering, as follows:
- $\hat{\mathbf{y}}_{f,n,p} = \hat{s}_{f,n,p}\, \mathbf{H}_{f,n,p} \left( \sum_{p'} \hat{s}_{f,n,p'}\, \mathbf{H}_{f,n,p'} \right)^{-1} \mathbf{x}_{f,n}$ (Equation 20), where:
- $\hat{s}_{f,n,p}$ is the real-valued magnitude spectrogram defined by the spectrogram parameters;
- $\mathbf{H}_{f,n,p}$ represents the SCMs of the frequency domain time-dependent room impulse responses; and
- $\mathbf{x}_{f,n}$ is the frequency domain transform of the array capture.
- alternatively, the source separator 107 may be configured to obtain the reverberated source signals $\mathbf{y}_{f,n,p}$ by using plain magnitude-based Wiener filtering, as follows:
- $\hat{\mathbf{y}}_{f,n,p} = \frac{\hat{s}^{2}_{f,n,p}}{\sum_{p'} \hat{s}^{2}_{f,n,p'}}\, \mathbf{x}_{f,n}$ (Equation 21)
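- an illustrative sketch of the magnitude-based filtering; since the exact form of Equation 21 is not recoverable from the text here, the power-ratio (Wiener) mask below is an assumption:

```python
import numpy as np

def magnitude_wiener(X, S_hat):
    # X: (F, N) mixture STFT; S_hat: (P, F, N) estimated magnitude spectrograms
    # s_{f,n,p}. Returns (P, F, N) separated source STFTs y_{f,n,p}.
    power = S_hat**2
    mask = power / (power.sum(axis=0, keepdims=True) + 1e-12)
    return mask * X  # broadcasting applies each source's mask to the mixture
```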
- the source separator 107 may output the separated source signals in the frequency domain (i.e. $\mathbf{y}_{f,n,p}$).
- the audio signal processing apparatus 20 may comprise a spatial synthesiser 108.
- the spatial synthesiser 108 may be configured to receive the output of the source separator 107 and to regenerate the source signals. This may be performed using an inverse short-time Fourier transformer 108-1 (iSTFT) for applying a short-time Fourier transform to the frequency domain separated source signals, thereby to transform them back into the time domain.
- the iSTFT may be configured to perform the inverse operation to that performed by the STFT 101.
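- as an illustrative round trip (a sketch using SciPy's STFT pair; the Hann window, 1024-sample segments and 50% overlap are assumptions, since the parameters of the STFT 101 and iSTFT 108-1 are not specified in this passage):

```python
import numpy as np
from scipy.signal import stft, istft

fs = 48000
x = np.random.randn(fs)                    # stand-in for one separated channel
f, t, X = stft(x, fs=fs, nperseg=1024)     # forward transform (as in STFT 101)
# ... frequency-domain processing / separation would happen here ...
_, x_rec = istft(X, fs=fs, nperseg=1024)   # inverse transform (as in iSTFT 108-1)
```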
- the spatial synthesiser 108 may form part of a different device or apparatus to that which analyses and separates the array capture into its constituent sources.
- the operation of the spatial synthesiser 108 to regenerate the source signals is illustrated in Figure 3 by operation 308.
- the output of the spatial synthesiser 108 may be provided for user consumption via a loudspeaker array 33 or a pair of headphones/headset having binaural rendering capabilities.
- estimation of the spectrogram parameters which define the source spectrograms is an iterative process.
- An example of this iterative estimation process 306 is illustrated in the flow chart of Figure 5.
- the spectrogram parameter estimator 106 initializes the spectrogram parameters b_{q,p}, t_{f,q} and v_{q,n}.
- the spatial weights of the additional background source may be set to one when z_{n,o,p} < threshold and set to zero otherwise.
- the threshold may be experimentally determined to allow the detected and tracked sources to capture all spatial evidence within ±30 degrees from their estimated mean. With a background modeling strategy such as this, the detected tracked sources have exclusive priority to model all spatial evidence originating around the tracked mean, except where two direction-of-arrival trajectories intersect, in which case both sources are active at the same direction indices.
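- a minimal sketch of this gating, assuming the spatial weights are arranged as a (frames × direction indices × sources) array and using a placeholder threshold value:

```python
import numpy as np

def background_spatial_weights(z, threshold=0.1):
    """z : (N, O, P) spatial weights z_{n,o,p} of the P tracked sources
    over N frames and O direction-of-arrival indices.
    The threshold value here is a placeholder; as described above, it
    would be experimentally determined in practice.
    Returns an (N, O) weight map for the additional background source."""
    # The background is active only at direction indices where no tracked
    # source claims meaningful spatial evidence.
    return (z.max(axis=-1) < threshold).astype(float)
```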
- the spatial weights z_{n,o,p} calculated in operation 302 are subsequently utilized to optimize the spectrogram parameters.
- in operation 306-2, X_{f,n} is calculated using equation 16, based on the spectrogram parameters b_{q,p}, t_{f,q}, v_{q,n} as initialized in operation 306-1 and the spatial weights z_{n,o,p} calculated in operation 302.
- X_{f,n} may be referred to as the complex-valued NMF model of the observed spatial covariance matrices.
- in operation 306-3, a first of the spectrogram parameters (in this example, b_{q,p}, the soft decision of NMF component q belonging to source p) is updated using equation 17.
- After updating the first of the spectrogram parameters, the spectrogram parameter estimator 106 (in operation 306-4) once again calculates X_{f,n} using equation 16. This time, however, the calculation is performed with the first spectrogram parameter having its updated value.
- the second and third spectrogram parameters (e.g. t_{f,q} and v_{q,n}) still have the values as initialized in operation 306-1 and the spatial weights z_{n,o,p} remain as calculated in operation 302. Subsequently, in operation 306-5, a second of the spectrogram parameters (in this example, t_{f,q}, the magnitude spectrum of one NMF component q for each frequency bin f) is updated using equation 18.
- the spectrogram parameter estimator 106 (in operation 306-6) once again calculates X_{f,n} using equation 16. This time, however, the calculation is performed with the first and second spectrogram parameters having their updated values.
- the third spectrogram parameter (e.g. v_{q,n}) still has its value as initialized in operation 306-1 and the spatial weights z_{n,o,p} remain as calculated in operation 302.
- in operation 306-7, the third of the spectrogram parameters (in this example, v_{q,n}, the gain of the NMF component in each frame n) is updated using equation 19.
- the spectrogram parameter estimator 106 (in operation 306-8) repeats operations 306-2 to 306-7 for a predetermined number of iterations. In each pass, the spectrogram parameter estimator 106 once again calculates X_{f,n} using equation 16, this time with all three spectrogram parameters having their updated values; the spatial weights z_{n,o,p} remain as calculated in operation 302. After the predetermined number of iterations has been performed, the spectrogram parameter estimator 106 proceeds to operation 306-9 and outputs the estimated spectrogram parameters for use by the source separator 107.
- the number of iterations may be between, for instance, 50 and 1000 and may depend on the duration of the portion of audio signal that is currently being processed.
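- structurally, the schedule of operations 306-2 to 306-8 can be sketched as follows; cnmf_model, update_b, update_t and update_v are hypothetical stand-ins for equation 16 and equations 17 to 19 respectively, shown only to make the interleaving of model recomputation and parameter updates explicit:

```python
# Structural sketch of operations 306-2 to 306-8; the helper functions
# below are hypothetical placeholders, not the patent's equations.
def estimate_spectrogram_parameters(X_obs, b, t, v, z, n_iter=200):
    for _ in range(n_iter):                      # 306-8: repeat (e.g. 50-1000 times)
        X_hat = cnmf_model(b, t, v, z)           # 306-2: equation 16
        b = update_b(X_obs, X_hat, b, t, v, z)   # 306-3: equation 17
        X_hat = cnmf_model(b, t, v, z)           # 306-4: recompute with updated b
        t = update_t(X_obs, X_hat, b, t, v, z)   # 306-5: equation 18
        X_hat = cnmf_model(b, t, v, z)           # 306-6: recompute with updated t
        v = update_v(X_obs, X_hat, b, t, v, z)   # 306-7: equation 19
    return b, t, v                               # 306-9: output to source separator 107
```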
- the control (or audio signal processing) apparatus 20 comprises processing apparatus 21 communicatively coupled with memory 22.
- the memory 22 has computer readable instructions 23 stored thereon, which, when executed by the processing apparatus 21, cause the processing apparatus 21 to cause performance of various ones of the operations described with reference to Figures 1 to 6.
- the control apparatus 20 may in some instances be referred to, in general terms, as "apparatus".
- the processing apparatus 21 may be of any suitable composition and may include one or more processors 21A of any suitable type or suitable combination of types.
- the processing apparatus 21 may be a programmable processor that interprets computer program instructions 23 and processes data.
- the processing apparatus may include plural programmable processors.
- the processing apparatus 21 may be, for example, programmable hardware with embedded firmware.
- the processing apparatus 21 may be termed processing means.
- processing apparatus 21 may alternatively or additionally include one or more Application Specific Integrated Circuits (ASICs).
- processing apparatus 21 may be referred to as computing apparatus.
- the processing apparatus 21 is coupled to the memory 22 (or one or more storage devices).
- the memory 22 may comprise a single memory unit or a plurality of memory units, upon which the computer readable instructions (or code) 23 are stored.
- the memory 22 may comprise both volatile memory and non-volatile memory.
- the computer readable instructions/program code 23 may be stored in the non-volatile memory and may be executed by the processing apparatus 21 using the volatile memory for temporary storage of data 24 or data and instructions.
- Examples of volatile memory include RAM, DRAM, SDRAM, etc.
- Examples of non-volatile memory include ROM, PROM, EEPROM, flash memory, optical storage, magnetic storage, etc.
- the memories in general may be referred to as non-transitory computer readable memory media.
- the term 'memory' in addition to covering memory comprising both non-volatile memory and volatile memory, may also cover one or more volatile memories only, one or more non-volatile memories only, or one or more volatile memories and one or more non-volatile memories.
- the computer readable instructions/program code 23 may be pre-programmed into the control apparatus 20. Alternatively, the computer readable instructions 23 may arrive at the control apparatus 20 via an electromagnetic carrier signal or may be copied from a physical entity such as a computer program product, a memory device or a record medium such as a CD-ROM or DVD.
- the computer readable instructions 23 may provide the logic and routines that enable the devices/apparatuses 1, 10, 20 to perform the functionality described above.
- the combination of computer-readable instructions stored on memory (of any of the types described above) may be referred to as a computer program product.
- the apparatus/device 1 and/or subsystem 10 described herein may include various hardware components which may not have been shown in the Figures.
- the apparatuses 1, 20 may comprise further optional software components which are not described in this specification since they may not have direct interaction with embodiments of the invention.
- Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic.
- the software, application logic and/or hardware may reside on memory, or any computer media.
- the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media.
- a "memory” or “computer-readable medium” may be any media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
- Reference to, where relevant, "computer-readable storage medium", "computer program product", "tangibly embodied computer program" etc., or a "processor" or "processing apparatus" etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), signal processing devices and other devices.
- References to computer program, instructions, code etc. should be understood to express software for a programmable processor, or firmware such as the programmable content of a hardware device, whether as instructions for a processor or as configuration settings for a fixed-function device, gate array or programmable logic device.
Landscapes
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Otolaryngology (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Circuit For Audible Band Transducer (AREA)
- Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2016/051709 WO2017129239A1 (en) | 2016-01-27 | 2016-01-27 | System and apparatus for tracking moving audio sources |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3409025A1 true EP3409025A1 (en) | 2018-12-05 |
Family
ID=55357959
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP16704399.1A Withdrawn EP3409025A1 (en) | 2016-01-27 | 2016-01-27 | System and apparatus for tracking moving audio sources |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP3409025A1 (en) |
WO (1) | WO2017129239A1 (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10733276B2 (en) * | 2017-12-08 | 2020-08-04 | Cirrus Logic International Semiconductor Ltd. | Multi-microphone human talker detection |
EP3503592B1 (en) | 2017-12-19 | 2020-09-16 | Nokia Technologies Oy | Methods, apparatuses and computer programs relating to spatial audio |
GB2573537A (en) | 2018-05-09 | 2019-11-13 | Nokia Technologies Oy | An apparatus, method and computer program for audio signal processing |
CN109188362B (en) * | 2018-09-03 | 2020-09-08 | 中国科学院声学研究所 | Microphone array sound source positioning signal processing method |
CN111986692B (en) * | 2019-05-24 | 2024-07-02 | 腾讯科技(深圳)有限公司 | Sound source tracking and pickup method and device based on microphone array |
CN110459236B (en) * | 2019-08-15 | 2021-11-30 | 北京小米移动软件有限公司 | Noise estimation method, apparatus and storage medium for audio signal |
US11678111B1 (en) * | 2020-07-22 | 2023-06-13 | Apple Inc. | Deep-learning based beam forming synthesis for spatial audio |
CN111933182B (en) * | 2020-08-07 | 2024-04-19 | 抖音视界有限公司 | Sound source tracking method, device, equipment and storage medium |
WO2022042864A1 (en) | 2020-08-31 | 2022-03-03 | Proactivaudio Gmbh | Method and apparatus for measuring directions of arrival of multiple sound sources |
CN114333831A (en) * | 2020-09-30 | 2022-04-12 | 华为技术有限公司 | Signal processing method and electronic equipment |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7394907B2 (en) * | 2003-06-16 | 2008-07-01 | Microsoft Corporation | System and process for sound source localization using microphone array beamsteering |
GB2517690B (en) * | 2013-08-26 | 2017-02-08 | Canon Kk | Method and device for localizing sound sources placed within a sound environment comprising ambient noise |
- 2016
- 2016-01-27 WO PCT/EP2016/051709 patent/WO2017129239A1/en active Application Filing
- 2016-01-27 EP EP16704399.1A patent/EP3409025A1/en not_active Withdrawn
Also Published As
Publication number | Publication date |
---|---|
WO2017129239A1 (en) | 2017-08-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3409025A1 (en) | System and apparatus for tracking moving audio sources | |
US9788119B2 (en) | Spatial audio apparatus | |
EP3320692B1 (en) | Spatial audio processing apparatus | |
KR101724514B1 (en) | Sound signal processing method and apparatus | |
CN109804559B (en) | Gain control in spatial audio systems | |
Sun et al. | Localization of distinct reflections in rooms using spherical microphone array eigenbeam processing | |
JP5814476B2 (en) | Microphone positioning apparatus and method based on spatial power density | |
EP2628316B1 (en) | Apparatus and method for deriving a directional information and computer program product | |
TWI530201B (en) | Sound acquisition via the extraction of geometrical information from direction of arrival estimates | |
CN105981404B (en) | Use the extraction of the reverberation sound of microphone array | |
RU2685053C2 (en) | Estimating room impulse response for acoustic echo cancelling | |
WO2017064368A1 (en) | Distributed audio capture and mixing | |
CN110379439B (en) | Audio processing method and related device | |
US20100278357A1 (en) | Signal processing apparatus, signal processing method, and program | |
US20140169576A1 (en) | Spatial interference suppression using dual-microphone arrays | |
EP2984852A1 (en) | Audio apparatus | |
CN106872945B (en) | Sound source positioning method and device and electronic equipment | |
Epain et al. | Independent component analysis using spherical microphone arrays | |
Huleihel et al. | Spherical array processing for acoustic analysis using room impulse responses and time-domain smoothing | |
CN111819862B (en) | Audio encoding apparatus and method | |
Blochberger et al. | Particle-filter tracking of sounds for frequency-independent 3D audio rendering from distributed B-format recordings | |
Alexandridis et al. | Towards wireless acoustic sensor networks for location estimation and counting of multiple speakers in real-life conditions | |
US20240236595A9 (en) | Generating restored spatial audio signals for occluded microphones | |
Huang et al. | Time delay estimation and acoustic source localization | |
Chern et al. | Voice Direction-Of-Arrival Conversion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20180817 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: NOKIA TECHNOLOGIES OY |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
INTG | Intention to grant announced |
Effective date: 20200513 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20200924 |