GB2492162A - Cleaning audio signals in video data captured from a mobile phone - Google Patents

Info

Publication number
GB2492162A
Authority
GB
United Kingdom
Prior art keywords
audio
data
wind noise
channel
channels
Prior art date
Legal status
Granted
Application number
GB201110740A
Other versions
GB2492162B (en)
GB201110740D0 (en)
Inventor
Christopher James Mitchell
Current Assignee
Audio Analytic Ltd
Original Assignee
Audio Analytic Ltd
Priority date
Filing date
Publication date
Application filed by Audio Analytic Ltd
Priority to GB1110740.6A
Publication of GB201110740D0
Publication of GB2492162A
Application granted
Publication of GB2492162B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00 Stereophonic arrangements
    • H04R5/04 Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2410/00 Microphones
    • H04R2410/07 Mechanical or electrical reduction of wind noise generated by wind passing a microphone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2499/00 Aspects covered by H04R or H04S not otherwise provided for in their subgroups
    • H04R2499/10 General applications
    • H04R2499/11 Transducers incorporated or for use in hand-held devices, e.g. mobile phones, PDA's, camera's
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones

Abstract

This invention relates to systems, methods, and computer program code for processing audio signals, in particular for cleaning audio signals in video data captured from a mobile device such as a mobile phone. We describe a method of cleaning a stereo audio track of video captured from a mobile device, the stereo audio track comprising left and right channels, the method comprising: inputting video data for said video; extracting left and right channel stereo audio data for said audio track from said video data; detecting wind noise in said audio data on one of said left and right channels; and processing said audio data for said one of said left and right channels using the other of said left and right channels to mitigate said wind noise.

Description

Audio Signal Processing Systems
FIELD OF THE INVENTION
This invention relates to systems, methods, and computer programme code for processing audio signals, in particular for cleaning audio signals in video data captured from a mobile device such as a mobile phone.
BACKGROUND TO THE INVENTION
It is commonplace to capture video data, including sound, from mobile devices such as mobile phones, for example for uploading to websites such as YouTube (trade mark).
However the quality of this material is often relatively low and it would be advantageous to be able to improve the quality of captured video either at a mobile device or, more particularly, on a computer such as a server.
With this aim in mind, the inventors have conducted research into the properties of captured video, and in particular the audio component of this video data, and have made an interesting observation: The audio data captured by a mobile phone is typically stereo audio data but, unlike for example digital cameras, the microphones in a mobile phone are often well separated, for example at opposite ends of the device.
This, in combination with the manner in which mobile phones are often held, has been observed to result in a particular surprising consequence, namely that wind noise tends to be observed on either one microphone or the other but not often both. It is speculated that this may be a result of the combination of microphone positioning, the manner in which the devices are held, and the way in which air eddies swirl around the device.
The inventors have been able to exploit these observations to provide improved techniques for processing video data captured from mobile phones.
SUMMARY OF THE INVENTION
According to the present invention there is therefore provided a method of cleaning a stereo audio track of video captured from a mobile device, the stereo audio track comprising left and right channels, the method comprising: inputting video data for said video; extracting left and right channel stereo audio data for said audio track from said video data; detecting wind noise in said audio data on one of said left and right channels; and processing said audio data for said one of said left and right channels using the other of said left and right channels to mitigate said wind noise.
In embodiments the stereo audio data is processed to label each channel with data representing a probability that the audio on the channel comprises greater than a threshold level of wind noise. This results in two time series of label data, one for each channel, which may in embodiments comprise a string of 1s and 0s representing the presence/absence of wind respectively. These may then be combined to determine regions where wind noise is present on one or other of the channels, on both, or on neither. Where wind noise is present on one channel but not the other, the channel on which the noise is present may be partially or wholly replaced by audio from the other channel; where noise is present on both channels, in embodiments no action is taken.
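By way of illustration only, the label-driven replacement described above might be sketched as follows. The function name clean_by_labels, the representation of each channel as a list of equal-length segments, and the use of plain Python lists are assumptions made for this sketch and do not appear in the specification; local realignment of the channels is omitted here.

```python
from typing import List, Sequence, Tuple

Segment = Sequence[float]


def clean_by_labels(
    left: Sequence[Segment],
    right: Sequence[Segment],
    left_wind: Sequence[int],
    right_wind: Sequence[int],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Replace a windy segment with the other channel's segment.

    left/right are lists of audio segments; left_wind/right_wind are the
    matching 1/0 label series (1 = wind predicted present).
    """
    if not (len(left) == len(right) == len(left_wind) == len(right_wind)):
        raise ValueError("channels and label series must have equal length")

    out_left: List[List[float]] = []
    out_right: List[List[float]] = []
    for l_seg, r_seg, l_w, r_w in zip(left, right, left_wind, right_wind):
        if l_w and not r_w:
            # Wind on the left only: use the right channel's audio for both.
            out_left.append(list(r_seg))
            out_right.append(list(r_seg))
        elif r_w and not l_w:
            # Wind on the right only: use the left channel's audio for both.
            out_left.append(list(l_seg))
            out_right.append(list(l_seg))
        else:
            # Wind on neither channel, or on both: no action is taken.
            out_left.append(list(l_seg))
            out_right.append(list(r_seg))
    return out_left, out_right
```

In this sketch a segment labelled as windy on both channels is passed through unchanged, matching the "no action is taken" case described above.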
The replacing or partial replacing of audio in one channel with audio from the other preferably also comprises locally realigning sections of the audio to mitigate phasing effects when switching between the channels. This may be achieved, for example, by cross-correlation of the audio in the two channels and then time shifting the data to be in phase. One channel may then be replaced by another and/or the channels may be partially mixed.
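One possible form of the cross-correlation and time shifting mentioned above is sketched below, purely as an example. The helper names best_lag and shift, the brute-force search over a bounded range of lags, and the zero padding at the segment edges are assumptions of this sketch rather than details taken from the specification.

```python
from typing import List, Sequence


def best_lag(reference: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Return the lag in [-max_lag, max_lag] that maximises
    sum(reference[i] * other[i + lag]) over the overlapping samples."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(reference)):
            j = i + lag
            if 0 <= j < len(other):
                score += reference[i] * other[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def shift(signal: Sequence[float], lag: int) -> List[float]:
    """Return a copy with out[i] = signal[i + lag], zero-padded at the edges."""
    n = len(signal)
    return [signal[i + lag] if 0 <= i + lag < n else 0.0 for i in range(n)]


def align_to(reference: Sequence[float], other: Sequence[float], max_lag: int) -> List[float]:
    """Time-shift `other` so that it lines up with `reference`."""
    return shift(other, best_lag(reference, other, max_lag))
```

The aligned segment returned by align_to could then be substituted for, or mixed into, the channel being cleaned; a practical implementation would likely cross-fade at the segment boundaries, which is not shown.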
In some embodiments, rather than a simple wind present/wind absent detection, the audio is scored and labelled with probability data representing a probability that each successive segment of the audio comprises wind noise. The audio channels may then be mixed dependent upon this probability data. For example the proportion of a channel in the mix may be inversely proportional to the probability that the audio in that channel comprises wind noise.
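A minimal sketch of such probability-dependent mixing is given below. It assumes that the weights are taken as the normalised reciprocals of the per-channel wind probabilities, so that the left weight is (1/p_left) / (1/p_left + 1/p_right); the function names and the small floor eps, used to avoid division by zero, are hypothetical.

```python
from typing import List, Sequence, Tuple


def mix_weights(p_left: float, p_right: float, eps: float = 1e-9) -> Tuple[float, float]:
    """Return (w_left, w_right), each inversely proportional to the
    probability of wind on that channel and normalised to sum to 1."""
    inv_left = 1.0 / max(p_left, eps)
    inv_right = 1.0 / max(p_right, eps)
    total = inv_left + inv_right
    return inv_left / total, inv_right / total


def mix_segment(
    left: Sequence[float],
    right: Sequence[float],
    p_left: float,
    p_right: float,
) -> List[float]:
    """Mix one segment of the two channels using the weights above."""
    w_left, w_right = mix_weights(p_left, p_right)
    return [w_left * l + w_right * r for l, r in zip(left, right)]
```

For example, with wind probabilities of 0.75 on the left and 0.25 on the right, the left channel contributes one quarter of the mix and the right channel three quarters.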
Additionally or alternatively the detection of wind noise in a channel may comprise identifying a shared or coherent portion of audio in the audio for the left and right channels, to separate the wind noise from the remainder of the audio. Broadly speaking wind noise has less coherency as compared with the target, desired audio, and thus by selecting audio with greater coherency, and/or rejecting incoherent audio, the wind noise can be separated from the desired sound. The audio may then be processed, for example as described above, to increase a proportion of the shared or coherent portion of audio on a channel in which the wind noise is detected.
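As a simple illustration of scoring how much audio the two channels share, the sketch below computes a zero-lag normalised correlation for one segment. This time-domain measure is used here only as a stand-in for coherence; the specification does not state which coherence measure is employed, and the function names and the threshold of 0.5 are assumptions.

```python
import math
from typing import Sequence


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag correlation of two segments, normalised to [-1, 1].

    Returns 0.0 if either segment is silent.
    """
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def is_incoherent(a: Sequence[float], b: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag a segment pair whose shared content falls below the threshold."""
    return abs(normalized_correlation(a, b)) < threshold
```

A low score indicates only that the channels differ; deciding which channel carries the wind noise would need a further cue, such as one of the other detection techniques described herein.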
A further technique which may be applied, separately or in combination with the above described techniques, relies on determining the temporal envelope of the audio on a channel. Broadly speaking wind noise tends to have a time series envelope comprising an initial period of rapid attack (spike), followed by a longer decaying tail. Thus the envelope is asymmetric in time, over a period typically of order 0.05 to 5 seconds, for example of order 0.5 seconds. Thus embodiments of the method may further detect an asymmetrical temporal envelope of the audio on a channel to detect the wind noise, for example by calculating an envelope of the signal and then calculating a rate of change with time of this envelope data.
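The envelope calculation and rate-of-change test could, for example, be sketched as follows. The use of a trailing moving average of the rectified signal as the envelope, and of the ratio of steepest rise to steepest fall as the asymmetry measure, are assumptions of this sketch; the names envelope and attack_decay_ratio are hypothetical.

```python
from typing import List, Sequence


def envelope(signal: Sequence[float], window: int) -> List[float]:
    """Trailing moving average of the rectified (absolute) signal."""
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    rectified = [abs(x) for x in signal]
    out: List[float] = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def attack_decay_ratio(env: Sequence[float]) -> float:
    """Ratio of the steepest rise to the steepest fall of an envelope.

    Values well above 1 indicate a fast attack followed by a slower decay.
    """
    diffs = [b - a for a, b in zip(env, env[1:])]
    max_rise = max((d for d in diffs if d > 0), default=0.0)
    max_fall = max((-d for d in diffs if d < 0), default=0.0)
    if max_fall == 0.0:
        return float("inf") if max_rise > 0.0 else 0.0
    return max_rise / max_fall
```

A segment might then be labelled as wind when the ratio exceeds a chosen threshold; the threshold value would need to be tuned on real recordings.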
Further additionally or alternatively, detection of the wind noise may employ spectral filtering/matching. It is observed that wind noise tends to be brown noise with a low frequency rumble component. Thus wind noise detection may employ low pass filtering and thresholding to identify the presence or absence of this low frequency noise. A further observed feature of wind noise is that over short time periods, for example of order 10 ms (say 1 ms to 1 second) the energy in some frequency bands, for example less than 200 Hz, tends to be approximately invariant from one audio frame or segment (for example of order 10 ms in length) to the next. Thus detection of wind noise may further comprise detecting continuity of an energy level in at least two frequency bands of audio in the channel.
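The frame-to-frame continuity test described above might be sketched as below. The sketch evaluates a plain discrete Fourier transform over the bins falling in each band and treats the energy as continuous when consecutive frames differ by no more than a relative tolerance; the band edges, the tolerance of 0.25 and the function names are illustrative assumptions only.

```python
import cmath
import math
from typing import Sequence, Tuple


def band_energy(frame: Sequence[float], sample_rate: float, f_lo: float, f_hi: float) -> float:
    """Sum of squared DFT magnitudes for bins with f_lo <= frequency < f_hi."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_lo <= freq < f_hi:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)
            )
            energy += abs(coeff) ** 2
    return energy


def energy_is_continuous(
    frames: Sequence[Sequence[float]],
    sample_rate: float,
    bands: Sequence[Tuple[float, float]],
    tolerance: float = 0.25,
) -> bool:
    """True if, in every band, the energy changes by at most `tolerance`
    (relative to the larger value) between each pair of consecutive frames."""
    for f_lo, f_hi in bands:
        energies = [band_energy(f, sample_rate, f_lo, f_hi) for f in frames]
        for prev, cur in zip(energies, energies[1:]):
            ref = max(prev, cur)
            # An empty band, or a jump larger than the tolerance, breaks continuity.
            if ref == 0.0 or abs(cur - prev) > tolerance * ref:
                return False
    return True
```

Passing two or more low-frequency bands, for example (0, 100) and (100, 200) Hz, corresponds to the case of detecting continuity in at least two frequency bands. A direct DFT is used only to keep the sketch self-contained; an FFT would normally be preferred.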
The skilled person will appreciate that, in embodiments, a number of the above described techniques may optionally be combined.
As previously mentioned, embodiments of the above described techniques are particularly advantageous when applied to a mobile phone or, more generally, a mobile device in which the left and right microphones are separated by more than 2 cm, 3 cm or 5 cm.
In embodiments the method includes decoding the stereo audio data from video data, then applying the above described technique to detect wind noise features in the left/right audio. This detection is then used to clean the stereo audio, and the audio data is then re-encoded into a combined audio/video data stream comprising wind-noise-mitigated audio.
One audio encoding/decoding technique which may be employed is AAC (Advanced Audio Coding) which, broadly speaking, involves time-to-frequency domain mapping.
With such an encoding technique one or both of the wind noise detection and the processing of the audio to mitigate the wind noise may be performed on the encoded data, more particularly without fully decoding the audio. Thus, for example, the frequency domain data may be employed to detect wind noise and/or wind noise may be mitigated by substituting encoded audio frames in the left/right channel data.
The invention further provides processor control code to implement the above-described systems and methods, for example on a general purpose computer system or on a digital signal processor (DSP). The code is provided on a physical data carrier such as a disk, CD- or DVD-ROM, programmed memory such as non-volatile memory (e.g. Flash) or read-only memory (Firmware). Code (and/or data) to implement embodiments of the invention may comprise source, object or executable code in a conventional programming language (interpreted or compiled) such as C, or assembly code. As the skilled person will appreciate such code and/or data may be distributed between a plurality of coupled components in communication with one another.
In a related aspect the invention provides an audio signal processing system, the system comprising: an audio input to receive encoded stereo audio data; an audio decoder to decode said stereo audio data to provide left and right audio channel data for left and right audio channels of said stereo audio; an audio feature detector coupled to said audio decoder to process said stereo audio data to detect wind noise in each of said left and right channels, and to output left and right wind noise data for each of said channels, wherein said left and right wind noise data comprises time series data defining times when wind noise is predicted to be present in said left and right audio channels respectively; an audio signal cleaning module, coupled to said audio decoder and to said audio feature detector, to process said left and right audio channel data responsive to left and right wind noise data identifying wind noise in a respective channel, to mitigate wind noise in one said channel using audio channel data from the other of said channels, to provide cleaned left and right audio channel data; and an audio encoder, coupled to said audio signal cleaning module, to re-encode said cleaned left and right audio channel data to provide a cleaned encoded stereo audio data output.
Embodiments of the above described audio signal processing system may be implemented either on a mobile device such as a mobile phone or on a server.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other aspects of the invention will now be further described, by way of example only, with reference to the accompanying figures in which: Figure 1 shows a block diagram of a procedure for cleaning a stereo audio track of video captured from a mobile device, according to an embodiment of the invention; and Figure 2 shows a block diagram of an audio signal processing system according to an embodiment of the invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Referring to Figure 1, this shows a mobile device 100 comprising a camera 102 and a pair of stereo microphones 104a, b. The mobile device captures video data and corresponding stereo audio using the camera and microphones, compresses this data, and makes the compressed video and audio available to one or more mobile devices and/or a server via an RF communications link 106. A procedure 150, which may be implemented using computer program code, operates to clean a stereo audio track of the captured video and may be implemented, for example, on a server or in alternative embodiments on mobile device 100.
The procedure 150, in outline, separates a video stream 152 from a stereo audio stream 154, performs signal processing 156 to 'de-wind' the captured audio and provides a processed audio output stream 158 which may optionally then be recombined with video stream 152. Generally but not essentially, the procedure 150 will involve de-compression and re-compression of at least the audio stream.
Figure 2 illustrates the audio signal processing in more detail. Thus a combined video and stereo audio compressed data input stream 200 is first decoded in module 202 to separate the video stream 204 from the stereo audio stream 206. In embodiments the video is encoded using H264 and the audio is encoded AAC, and decoder 202 is configured accordingly.
A wind feature extraction module 208 operates on the stereo audio stream 206 to produce an annotated data output 210 comprising time series data labelling each audio channel according to whether or not wind is present. Example annotated data streams 210a, b illustrate example wind present/absent channel labelling data.
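For illustration, two binary label series of the kind shown as 210a, b could be combined into regions in which wind is present on neither channel, on the left only, on the right only, or on both, as sketched below. The function name wind_regions, the region tuple layout (start index, end index exclusive, state) and the state names are assumptions made for this sketch.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

# Map a (left label, right label) pair to a region state.
_STATE_NAMES = {(0, 0): "none", (1, 0): "left", (0, 1): "right", (1, 1): "both"}


def wind_regions(
    left_labels: Sequence[int], right_labels: Sequence[int]
) -> List[Tuple[int, int, str]]:
    """Group per-segment labels into runs of (start, end_exclusive, state)."""
    states = [
        _STATE_NAMES[(int(bool(l)), int(bool(r)))]
        for l, r in zip(left_labels, right_labels)
    ]
    regions: List[Tuple[int, int, str]] = []
    start = 0
    for state, run in groupby(states):
        length = len(list(run))
        regions.append((start, start + length, state))
        start += length
    return regions
```

Regions in the "left" or "right" state are the candidates for replacement or mixing from the other channel.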
The wind feature extraction module 208 may perform wind detection based upon one or more of detected coherence/incoherence, signal envelope rate of change, signal spectrum, and spectral features in multiple frequency bands, as previously described.
In embodiments rather than the data streams 210a, b comprising binary wind present/absent data these data streams may comprise probability data indicating the probability that wind noise is present.
The annotated data 210 in combination with the audio data 206 is passed to an audio cleaning module 212 which operates with the time series wind labelling data on the stereo audio to mitigate the wind noise. In one embodiment cleaning module 212 replaces the audio data in one channel with that of the other channel where wind noise is detected on one channel but not the other; this is schematically illustrated by time series 214 illustrating cleaned (stereo) audio data. Alternatively the stereo channels may be mixed in inverse proportion to the probability that each comprises wind noise.
Cleaning module 212 provides a cleaned stereo audio data output 216 to a re-encoder module 218 which combines this data with the original decoded video stream 204 to provide a cleaned output data stream 220 comprising video and stereo audio.
No doubt many other effective alternatives will occur to the skilled person. For example, although embodiments of the method are particularly suitable for application to mobile phones they may also be applied to other types of mobile computing devices with wireless connectivity such as wireless tablets.
It will be understood that the invention is not limited to the described embodiments and encompasses modifications apparent to those skilled in the art lying within the spirit and scope of the claims appended hereto.

Claims (14)

  1. A method of cleaning a stereo audio track of video captured from a mobile device, the stereo audio track comprising left and right channels, the method comprising: inputting video data for said video; extracting left and right channel stereo audio data for said audio track from said video data; detecting wind noise in said audio data on one of said left and right channels; and processing said audio data for said one of said left and right channels using the other of said left and right channels to mitigate said wind noise.
  2. A method as claimed in claim 1 wherein said detecting of said wind noise comprises processing said stereo audio data to generate time series wind noise label data for each of said left and right channels, said time series wind noise label data labelling times at which each of said left and right channels is predicted to comprise greater than a threshold level of wind noise; and wherein said processing of said audio data comprises at least partially replacing audio in one said channel predicted to comprise wind noise with audio from the other said channel, responsive to said time series wind noise label data.
  3. A method as claimed in claim 2 wherein said replacing includes phase aligning said audio of said replacing channel with said audio of said at least partially replaced channel.
  4. A method as claimed in claim 2 or 3 wherein said partial replacing of audio in said one channel predicted to comprise wind noise comprises mixing audio in said one channel with audio in said other channel in a proportion dependent on a probability of said one channel comprising noise and replacing said audio on said channel predicted to comprise wind noise with said mixed audio.
  5. A method as claimed in any preceding claim wherein said detecting of said wind noise comprises identifying a shared or coherent portion of audio in said audio for said left and right channels to separate said wind noise; and wherein said processing of said audio data comprises processing to increase a proportion of said shared or coherent portion of said audio on a said channel in which said wind noise is detected.
  6. A method as claimed in any preceding claim wherein said detecting of said wind noise comprises detecting an asymmetrical temporal envelope of audio on a said channel.
  7. A method as claimed in any preceding claim wherein said detecting of said wind noise comprises detecting continuity of an energy level in at least two frequency bands of audio in a said channel.
  8. A method as claimed in any preceding claim further comprising decoding said audio data from said video data, and re-encoding said processed audio data into cleaned video data.
  9. A method as claimed in claim 9 wherein said audio data is AAC coded, and wherein said detecting of said wind noise comprises operating on said AAC coded data without fully decoding said audio data.
  10. A method as claimed in any preceding claim wherein said left and right channels of said stereo audio track are captured from microphones separated by greater than 3cm.
  11. A method as claimed in any preceding claim wherein said mobile device is a mobile phone.
  12. A physical data carrier carrying processor code to, when running, implement the method of any preceding claim.
  13. A mobile device or server comprising the data carrier of claim 12.
  14. An audio signal processing system, the system comprising: an audio input to receive encoded stereo audio data; an audio decoder to decode said stereo audio data to provide left and right audio channel data for left and right audio channels of said stereo audio; an audio feature detector coupled to said audio decoder to process said stereo audio data to detect wind noise in each of said left and right channels, and to output left and right wind noise data for each of said channels, wherein said left and right wind noise data comprises time series data defining times when wind noise is predicted to be present in said left and right audio channels respectively; an audio signal cleaning module, coupled to said audio decoder and to said audio feature detector, to process said left and right audio channel data responsive to left and right wind noise data identifying wind noise in a respective channel, to mitigate wind noise in one said channel using audio channel data from the other of said channels, to provide cleaned left and right audio channel data; and an audio encoder, coupled to said audio signal cleaning module, to re-encode said cleaned left and right audio channel data to provide a cleaned encoded stereo audio data output.
GB1110740.6A 2011-06-24 2011-06-24 Audio signal processing systems Active GB2492162B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1110740.6A GB2492162B (en) 2011-06-24 2011-06-24 Audio signal processing systems

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1110740.6A GB2492162B (en) 2011-06-24 2011-06-24 Audio signal processing systems

Publications (3)

Publication Number Publication Date
GB201110740D0 GB201110740D0 (en) 2011-08-10
GB2492162A true GB2492162A (en) 2012-12-26
GB2492162B GB2492162B (en) 2018-11-21

Family

ID=44485111

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1110740.6A Active GB2492162B (en) 2011-06-24 2011-06-24 Audio signal processing systems

Country Status (1)

Country Link
GB (1) GB2492162B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11735202B2 (en) 2019-01-23 2023-08-22 Sound Genetics, Inc. Systems and methods for pre-filtering audio content based on prominence of frequency content

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050238183A1 (en) * 2002-08-20 2005-10-27 Kazuhiko Ozawa Automatic wind noise reduction circuit and automatic wind noise reduction method
US20060233391A1 (en) * 2005-04-19 2006-10-19 Park Jae-Ha Audio data processing apparatus and method to reduce wind noise
US20070058822A1 (en) * 2005-09-12 2007-03-15 Sony Corporation Noise reducing apparatus, method and program and sound pickup apparatus for electronic equipment
JP2008263483A (en) * 2007-04-13 2008-10-30 Sanyo Electric Co Ltd Wind noise reducing device, sound signal recorder, and imaging apparatus
US20090002498A1 (en) * 2007-04-13 2009-01-01 Sanyo Electric Co., Ltd. Wind Noise Reduction Apparatus, Audio Signal Recording Apparatus And Imaging Apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3153912B2 (en) * 1991-06-25 2001-04-09 ソニー株式会社 Microphone device

Also Published As

Publication number Publication date
GB2492162B (en) 2018-11-21
GB201110740D0 (en) 2011-08-10

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)

Free format text: REGISTERED BETWEEN 20230202 AND 20230208