GB2492162A - Cleaning audio signals in video data captured from a mobile phone

Info

Publication number: GB2492162A
Application number: GB201110740A
Authority: GB
Inventors: Christopher James Mitchell
Original assignee: Audio Analytic Ltd
Current assignee: Audio Analytic Ltd
Priority date: 2011-06-24
Filing date: 2011-06-24
Publication date: 2012-12-26
Anticipated expiration: 2031-06-24
Also published as: GB2492162B; GB201110740D0

Abstract

This invention relates to systems, methods, and computer program code for processing audio signals, in particular for cleaning audio signals in video data captured from a mobile device such as a mobile phone. We describe a method of cleaning a stereo audio track of video captured from a mobile device, the stereo audio track comprising left and right channels, the method comprising: inputting video data for said video; extracting left and right channel stereo audio data for said audio track from said video data; detecting wind noise in said audio data on one of said left and right channels; and processing said audio data for said one of said left and right channels using the other of said left and right channels to mitigate said wind noise.

Description

Audio Signal Processing Systems

FIELD OF THE INVENTION

This invention relates to systems, methods, and computer programme code for processing audio signals, in particular for cleaning audio signals in video data captured from a mobile device such as a mobile phone.

BACKGROUND TO THE INVENTION

It is commonplace to capture video data, including sound, from mobile devices such as mobile phones, for example for uploading to websites such as YouTube (trade mark).

However the quality of this material is often relatively low and it would be advantageous to be able to improve the quality of captured video either at a mobile device or, more particularly, on a computer such as a server.

With this aim in mind, the inventors have conducted research into the properties of captured video, and in particular the audio component of this video data, and have made an interesting observation: The audio data captured by a mobile phone is typically stereo audio data but, unlike for example digital cameras, the microphones in a mobile phone are often well separated, for example at opposite ends of the device.

This, in combination with the manner in which mobile phones are often held, has been observed to result in a particular surprising consequence, namely that wind noise tends to be observed on either one microphone or the other but not often both. It is speculated that this may be a result of the combination of microphone positioning, the manner in which the devices are held, and the way in which air eddies swirl around the device.

The inventors have been able to exploit these observations to provide improved techniques for processing video data captured from mobile phones.

SUMMARY OF THE INVENTION

According to the present invention there is therefore provided a method of cleaning a stereo audio track of video captured from a mobile device, the stereo audio track comprising left and right channels, the method comprising: inputting video data for said video; extracting left and right channel stereo audio data for said audio track from said video data; detecting wind noise in said audio data on one of said left and right channels; and processing said audio data for said one of said left and right channels using the other of said left and right channels to mitigate said wind noise.

In embodiments the stereo audio data is processed to label each channel with data representing a probability that the audio on the channel comprises greater than a threshold level of wind noise. This results in two time series of label data, one for each channel, which may in embodiments comprise a string of 1s and 0s representing the presence and absence of wind respectively. These may then be combined to determine regions where wind noise is present on one or other of the channels, on both, or on neither. Where wind noise is present on one channel but not the other, the channel on which the noise is present may be partially or wholly replaced by audio from the other channel; where noise is present on both channels, in embodiments no action is taken.
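By way of illustration only, the combination of the two label series may be sketched as follows. The function name `clean_by_labels`, the representation of audio as lists of per-frame sample lists, and the choice of whole-frame replacement are assumptions made for the example and are not taken from the specification.

```python
def clean_by_labels(left, right, wind_left, wind_right):
    """Replace a frame with the other channel's frame when only one channel is windy.

    left, right: lists of frames, each frame a list of samples (assumed layout).
    wind_left, wind_right: per-frame labels, 1 = wind present, 0 = wind absent.
    """
    out_left, out_right = [], []
    for l, r, wl, wr in zip(left, right, wind_left, wind_right):
        if wl and not wr:
            # Wind on the left only: use the right channel's audio on both.
            out_left.append(list(r))
            out_right.append(list(r))
        elif wr and not wl:
            # Wind on the right only: use the left channel's audio on both.
            out_left.append(list(l))
            out_right.append(list(l))
        else:
            # Wind on neither channel, or on both: no action is taken.
            out_left.append(list(l))
            out_right.append(list(r))
    return out_left, out_right
```

Whole-frame replacement is the simplest case; partial replacement would mix the two frames rather than copy one over the other.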

The replacing or partial replacing of audio in one channel with audio from the other preferably also comprises locally realigning sections of the audio to mitigate phasing effects when switching between the channels. This may be achieved, for example, by cross-correlation of the audio in the two channels and then time shifting the data to be in phase. One channel may then be replaced by another and/or the channels may be partially mixed.
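A minimal sketch of such a local realignment is given below, assuming a brute-force cross-correlation over a small range of lags. The names `best_lag` and `shift`, the `max_lag` search range, and the zero padding at the edges are illustrative assumptions only.

```python
def best_lag(ref, other, max_lag):
    """Return the lag (in samples) that maximizes sum(ref[n] * other[n + lag])."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(ref)):
            m = n + lag
            if 0 <= m < len(other):
                score += ref[n] * other[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def shift(signal, lag):
    """Return the signal advanced by lag samples (out[n] = signal[n + lag]), zero padded."""
    n = len(signal)
    return [signal[i + lag] if 0 <= i + lag < n else 0.0 for i in range(n)]
```

For example, `shift(other, best_lag(ref, other, max_lag))` brings a section of the replacing channel into phase with the section it replaces before the channels are switched or mixed.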

In some embodiments, rather than a simple wind present/wind absent detection, the audio is scored and labelled with probability data representing a probability that a segment (or each of successive segments) of the audio comprises wind noise. The audio channels may then be mixed dependent upon this probability data. For example the proportion of a channel in the mix may be inversely proportional to the probability that the audio in that channel comprises wind noise.
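One possible reading of this inverse-proportional mixing is sketched below. The normalization of the weights so that they sum to one, the small floor `eps` guarding against division by zero, and the function names are assumptions made for the example.

```python
def mix_weights(p_left, p_right, eps=1e-6):
    """Return (w_left, w_right), each inversely proportional to its wind probability."""
    inv_l = 1.0 / max(p_left, eps)
    inv_r = 1.0 / max(p_right, eps)
    total = inv_l + inv_r
    return inv_l / total, inv_r / total


def mix_frame(left, right, p_left, p_right):
    """Mix one frame of left and right audio using the weights above."""
    w_l, w_r = mix_weights(p_left, p_right)
    return [w_l * l + w_r * r for l, r in zip(left, right)]
```

With wind probabilities of 0.9 on the left and 0.1 on the right, the left channel contributes about 10% of the mix and the right channel about 90%.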

Additionally or alternatively the detection of wind noise in a channel may comprise identifying a shared or coherent portion of audio in the audio for the left and right channels, to separate the wind noise from the remainder of the audio. Broadly speaking wind noise has less coherency as compared with the target, desired audio, and thus by selecting audio with greater coherency, and/or rejecting incoherent audio, the wind noise can be separated from the desired sound. The audio may then be processed, for example as described above, to increase a proportion of the shared or coherent portion of audio on a channel in which the wind noise is detected.
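The specification does not say how coherency is measured. As a simple time-domain stand-in, the sketch below uses the normalized zero-lag correlation between the two channels over a frame, takes the mid signal (L+R)/2 as the shared portion, and blends that shared portion into the windy channel. All three choices, and the function names, are assumptions made for illustration.

```python
import math


def frame_correlation(left, right):
    """Normalized zero-lag correlation in [-1, 1]; 0.0 if either frame is silent."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den > 0.0 else 0.0


def shared_part(left, right):
    """Mid signal, used here as a crude estimate of the audio common to both channels."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def boost_shared(windy, shared, amount):
    """Increase the proportion of shared audio on the windy channel (0 <= amount <= 1)."""
    return [(1.0 - amount) * w + amount * s for w, s in zip(windy, shared)]
```

A frame whose correlation falls well below that of neighbouring frames would be a candidate for wind noise on one of the channels; any threshold would need to be chosen empirically.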

A further technique which may be applied, separately or in combination with the above described techniques, relies on determining the temporal envelope of the audio on a channel. Broadly speaking wind noise tends to have a time series envelope comprising an initial period of rapid attack (spike), followed by a longer decaying tail. Thus the envelope is asymmetric in time, over a period typically of order 0.05 to 5 seconds, for example of order 0.5 seconds. Thus embodiments of the method may further detect an asymmetrical temporal envelope of the audio on a channel to detect the wind noise, for example by calculating an envelope of the signal and then calculating a rate of change with time of this envelope data.
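The envelope calculation and its rate of change may be sketched as follows. The moving-average envelope, the ratio of fastest rise to fastest fall as the asymmetry measure, and the function names are assumptions; the specification states only that an envelope and its rate of change are calculated.

```python
def envelope(signal, window):
    """Trailing moving average of the rectified signal (zero history before the start)."""
    env = []
    running = 0.0
    for i, x in enumerate(signal):
        running += abs(x)
        if i >= window:
            running -= abs(signal[i - window])
        env.append(running / window)
    return env


def attack_decay_ratio(env):
    """Ratio of the fastest rise to the fastest fall of the envelope.

    Values well above 1 indicate a rapid attack followed by a slower decay.
    Returns 1.0 for a flat envelope and infinity if it only ever rises.
    """
    diffs = [b - a for a, b in zip(env, env[1:])]
    max_rise = max((d for d in diffs if d > 0), default=0.0)
    max_fall = max((-d for d in diffs if d < 0), default=0.0)
    if max_fall == 0.0:
        return float("inf") if max_rise > 0.0 else 1.0
    return max_rise / max_fall
```

A gust-like burst (a sudden spike with an exponentially decaying tail) gives a ratio well above 1, whereas a symmetric rise and fall gives a ratio of 1.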

Further additionally or alternatively, detection of the wind noise may employ spectral filtering/matching. It is observed that wind noise tends to be brown noise with a low frequency rumble component. Thus wind noise detection may employ low pass filtering and thresholding to identify the presence or absence of this low frequency noise. A further observed feature of wind noise is that over short time periods, for example of order 10 ms (say 1 ms to 1 second) the energy in some frequency bands, for example less than 200 Hz, tends to be approximately invariant from one audio frame or segment (for example of order 10 ms in length) to the next. Thus detection of wind noise may further comprise detecting continuity of an energy level in at least two frequency bands of audio in the channel.
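A sketch of the band-energy continuity test is given below, assuming a direct DFT over short frames, two illustrative bands below 200 Hz, and a relative tolerance between successive frames. The band edges, the tolerance, the minimum energy and the function names are all assumptions for the example. On its own this cue would also fire on any steady low-frequency sound, so in practice it would be combined with the other cues described above.

```python
import cmath
import math


def band_energy(frame, sample_rate, f_lo, f_hi):
    """Energy of a frame in the band f_lo <= f < f_hi, via a direct DFT."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_lo <= freq < f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(frame))
            energy += abs(coeff) ** 2
    return energy


def energy_is_continuous(energies, tolerance=0.25):
    """True if each energy is within a relative tolerance of the previous one."""
    for prev, cur in zip(energies, energies[1:]):
        ref = max(prev, cur)
        if ref > 0.0 and abs(cur - prev) / ref > tolerance:
            return False
    return True


def looks_like_wind(frames, sample_rate,
                    bands=((25.0, 125.0), (125.0, 200.0)),
                    tolerance=0.25, min_energy=1.0):
    """True if every band carries energy that is steady from frame to frame."""
    for f_lo, f_hi in bands:
        energies = [band_energy(f, sample_rate, f_lo, f_hi) for f in frames]
        if min(energies) < min_energy:
            return False
        if not energy_is_continuous(energies, tolerance):
            return False
    return True
```

The example bands assume frames long enough to resolve them; a 20 ms frame gives 50 Hz resolution, whereas a 10 ms frame resolves only 100 Hz steps.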

The skilled person will appreciate that, in embodiments, a number of the above described techniques may optionally be combined.

As previously mentioned, embodiments of the above described techniques are particularly advantageous when applied to a mobile phone or, more generally, a mobile device in which the left and right microphones are separated by more than 2 cm, 3 cm or 5 cm.

In embodiments the method includes decoding the stereo audio data from video data, then applying the above described technique to detect wind noise features in the left/right audio. This detection is then used to clean the stereo audio, and the audio data is then re-encoded into a combined audio/video data stream comprising wind-noise-mitigated audio.

One audio encoding/decoding technique which may be employed is AAC (Advanced Audio Coding) which, broadly speaking, involves time-to-frequency domain mapping.

With such an encoding technique one or both of the wind noise detection and the processing of the audio to mitigate the wind noise may be performed on the encoded data, more particularly without fully decoding the audio. Thus, for example, the frequency domain data may be employed to detect wind noise and/or wind noise may be mitigated by substituting encoded audio frames in the left/right channel data.

The invention further provides processor control code to implement the above-described systems and methods, for example on a general purpose computer system or on a digital signal processor (DSP). The code is provided on a physical data carrier such as a disk, CD- or DVD-ROM, programmed memory such as non-volatile memory (e.g. Flash) or read-only memory (Firmware). Code (and/or data) to implement embodiments of the invention may comprise source, object or executable code in a conventional programming language (interpreted or compiled) such as C, or assembly code. As the skilled person will appreciate such code and/or data may be distributed between a plurality of coupled components in communication with one another.

In a related aspect the invention provides an audio signal processing system, the system comprising: an audio input to receive encoded stereo audio data; an audio decoder to decode said stereo audio data to provide left and right audio channel data for left and right audio channels of said stereo audio; an audio feature detector coupled to said audio decoder to process said stereo audio data to detect wind noise in each of said left and right channels, and to output left and right wind noise data for each of said channels, wherein said left and right wind noise data comprises time series data defining times when wind noise is predicted to be present in said left and right audio channels respectively; an audio signal cleaning module, coupled to said audio decoder and to said audio feature detector, to process said left and right audio channel data responsive to left and right wind noise data identifying wind noise in a respective channel, to mitigate wind noise in one said channel using audio channel data from the other of said channels, to provide cleaned left and right audio channel data; and an audio encoder, coupled to said audio signal cleaning module, to re-encode said cleaned left and right audio channel data to provide a cleaned encoded stereo audio data output.

Embodiments of the above described audio signal processing system may be implemented either on a mobile device such as a mobile phone or on a server.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects of the invention will now be further described, by way of example only, with reference to the accompanying figures in which: Figure 1 shows a block diagram of a procedure for cleaning a stereo audio track of video captured from a mobile device, according to an embodiment of the invention; and Figure 2 shows a block diagram of an audio signal processing system according to an embodiment of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Referring to Figure 1, this shows a mobile device 100 comprising a camera 102 and a pair of stereo microphones 104a, b. The mobile device captures video data and corresponding stereo audio using the camera and microphones, compresses this data, and makes the compressed video and audio available to one or more mobile devices and/or a server via an RF communications link 106. A procedure 150, which may be implemented using computer program code, operates to clean a stereo audio track of the captured video and may be implemented, for example, on a server or in alternative embodiments on mobile device 100.

The procedure 150, in outline, separates a video stream 152 from a stereo audio stream 154, performs signal processing 156 to 'de-wind' the captured audio and provides a processed audio output stream 158 which may optionally then be recombined with video stream 152. Generally but not essentially, the procedure 150 will involve de-compression and re-compression of at least the audio stream.

Figure 2 illustrates the audio signal processing in more detail. Thus a combined video and stereo audio compressed data input stream 200 is first decoded in module 202 to separate the video stream 204 from the stereo audio stream 206. In embodiments the video is encoded using H.264 and the audio is encoded using AAC, and decoder 202 is configured accordingly.

A wind feature extraction module 208 operates on the stereo audio stream 206 to produce an annotated data output 210 comprising time series data labelling each audio channel according to whether or not wind is present. Example annotated data streams 210a, b illustrate example wind present/absent channel labelling data.

The wind feature extraction module 208 may perform wind detection based upon one or more of detected coherence/incoherence, signal envelope rate of change, signal spectrum, and spectral features in multiple frequency bands, as previously described.

In embodiments rather than the data streams 210a, b comprising binary wind present/absent data these data streams may comprise probability data indicating the probability that wind noise is present.

The annotated data 210 in combination with the audio data 206 is passed to an audio cleaning module 212 which operates with the time series wind labelling data on the stereo audio to mitigate the wind noise. In one embodiment cleaning module 212 replaces the audio data in one channel with that of the other channel where wind noise is detected on one channel but not the other; this is schematically illustrated by time series 214, which shows cleaned (stereo) audio data. Alternatively the stereo channels may be mixed in inverse proportion to the probability that each comprises wind noise.

Cleaning module 212 provides a cleaned stereo audio data output 216 to a re-encoder module 218 which combines this data with the original decoded video stream 204 to provide a cleaned output data stream 220 comprising video and stereo audio.

No doubt many other effective alternatives will occur to the skilled person. For example, although embodiments of the method are particularly suitable for application to mobile phones they may also be applied to other types of mobile computing devices with wireless connectivity such as wireless tablets.

It will be understood that the invention is not limited to the described embodiments and encompasses modifications apparent to those skilled in the art lying within the spirit and scope of the claims appended hereto.

Claims

1. A method of cleaning a stereo audio track of video captured from a mobile device, the stereo audio track comprising left and right channels, the method comprising: inputting video data for said video; extracting left and right channel stereo audio data for said audio track from said video data; detecting wind noise in said audio data on one of said left and right channels; and processing said audio data for said one of said left and right channels using the other of said left and right channels to mitigate said wind noise.
2. A method as claimed in claim 1 wherein said detecting of said wind noise comprises processing said stereo audio data to generate time series wind noise label data for each of said left and right channels, said time series wind noise label data labelling times at which each of said left and right channels is predicted to comprise greater than a threshold level of wind noise; and wherein said processing of said audio data comprises at least partially replacing audio on one said channel predicted to comprise wind noise with audio from the other said channel, responsive to said time series wind noise label data.
3. A method as claimed in claim 2 wherein said replacing includes phase aligning said audio of said replacing channel with said audio of said at least partially replaced channel.
4. A method as claimed in claim 2 or 3 wherein said partial replacing of audio in said one channel predicted to comprise wind noise comprises mixing audio in said one channel with audio in said other channel in a proportion dependent on a probability of said one channel comprising noise and replacing said audio on said channel predicted to comprise wind noise with said mixed audio.
5. A method as claimed in any preceding claim wherein said detecting of said wind noise comprises identifying a shared or coherent portion of audio in said audio for said left and right channels to separate said wind noise; and wherein said processing of said audio data comprises processing to increase a proportion of said shared or coherent portion of said audio on a said channel in which said wind noise is detected.
6. A method as claimed in any preceding claim wherein said detecting of said wind noise comprises detecting an asymmetrical temporal envelope of audio on a said channel.
7. A method as claimed in any preceding claim wherein said detecting of said wind noise comprises detecting continuity of an energy level in at least two frequency bands of audio in a said channel.
8. A method as claimed in any preceding claim further comprising decoding said audio data from said video data, and re-encoding said processed audio data into cleaned video data.
9. A method as claimed in claim 8 wherein said audio data is AAC coded, and wherein said detecting of said wind noise comprises operating on said AAC coded data without fully decoding said audio data.
10. A method as claimed in any preceding claim wherein said left and right channels of said stereo audio track are captured from microphones separated by greater than 3 cm.
11. A method as claimed in any preceding claim wherein said mobile device is a mobile phone.
12. A physical data carrier carrying processor code to, when running, implement the method of any preceding claim.
13. A mobile device or server comprising the data carrier of claim 12.
14. An audio signal processing system, the system comprising: an audio input to receive encoded stereo audio data; an audio decoder to decode said stereo audio data to provide left and right audio channel data for left and right audio channels of said stereo audio; an audio feature detector coupled to said audio decoder to process said stereo audio data to detect wind noise in each of said left and right channels, and to output left and right wind noise data for each of said channels, wherein said left and right wind noise data comprises time series data defining times when wind noise is predicted to be present in said left and right audio channels respectively; an audio signal cleaning module, coupled to said audio decoder and to said audio feature detector, to process said left and right audio channel data responsive to left and right wind noise data identifying wind noise in a respective channel, to mitigate wind noise in one said channel using audio channel data from the other of said channels, to provide cleaned left and right audio channel data; and an audio encoder, coupled to said audio signal cleaning module, to re-encode said cleaned left and right audio channel data to provide a cleaned encoded stereo audio data output.