AU2004205225A1 - Frontal Audio Source Location Using Very Closely Spaced Stereo Microphones - Google Patents
- Publication number
- AU2004205225A1
- Authority
- AU
- Australia
- Prior art keywords
- signals
- band
- section
- time delay
- dominant frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Landscapes
- Circuit For Audible Band Transducer (AREA)
Description
S&F Ref: 690303
AUSTRALIA
PATENTS ACT 1990
COMPLETE SPECIFICATION FOR A STANDARD PATENT

Name and Address of Applicant: Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo, 146, Japan
Actual Inventor(s): Timothy John Wark
Address for Service: Spruson & Ferguson, St Martins Tower, Level 31 Market Street, Sydney NSW 2000 (CCN 3710000177)
Invention Title: Frontal Audio Source Location Using Very Closely Spaced Stereo Microphones

ASSOCIATED PROVISIONAL APPLICATION DETAILS: [33] Country: AU; [31] Applic. No(s): 2003904879; [32] Application Date: 05 Sep 2003

The following statement is a full description of this invention, including the best method of performing it known to me/us:

FRONTAL AUDIO SOURCE LOCATION USING VERY CLOSELY SPACED STEREO MICROPHONES

Field of the Invention
The present invention relates generally to audio source location and, in particular, to audio source direction estimation from a stereo signal.
Background
Numerous systems exist to estimate the angular direction of a dominant audio source, such as a person speaking, using two or more microphones. Most systems use microphones that are placed a fair distance apart, so that the difference between the signals captured by the two or more microphones is significant. It is that significant difference in the signals that is used to estimate the angular direction of the audio source.
A majority of known systems estimate a time-delay between the signals captured by the two microphones, computing the time-delay τ̂ as follows:

    τ̂ = arg max_τ ∫ X₁(ω) X₂*(ω) e^(−jωτ) dω        (1)

where X₁ and X₂ are the frequency spectra of the two signals. Equation (1) is known as the Generalized Cross-Correlation (GCC). The angular direction of the source is then estimated using the calculated time-delay τ̂ between the signals. This frequency-based approach has been shown to be very robust when the microphones are widely spaced, which results in a significant time-delay τ between the signals.
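By way of illustration only (not part of the specification), a minimal NumPy sketch of GCC-style delay estimation in the spirit of Equation (1) might look as follows; the function name, the FFT-based evaluation of the integral and the discrete peak search over candidate lags are all assumptions:

```python
import numpy as np

def gcc_time_delay(x1: np.ndarray, x2: np.ndarray, fs: float) -> float:
    """Estimate the delay of x2 relative to x1, in seconds."""
    n = len(x1) + len(x2) - 1
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    # The inverse FFT of the cross-power spectrum evaluates the integral
    # of Equation (1) over a grid of candidate lags tau.
    cc = np.fft.irfft(X1 * np.conj(X2), n)
    # Reorder so lags run from -(len(x2)-1) to len(x1)-1.
    cc = np.concatenate((cc[-(len(x2) - 1):], cc[:len(x1)]))
    lag = int(np.argmax(cc)) - (len(x2) - 1)
    return lag / fs
```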
Another known system uses a time-based cross-correlation. In this system it is assumed that the microphones were placed 20 cm apart when the audio signals were captured. Furthermore, a single-frequency tone is used as the audio source, rather than a far more complicated signal, such as a speech signal.
Other known audio source direction estimation systems have used microphone arrays, which involve multiple microphones placed in either a 1-dimensional or 2-dimensional configuration. Techniques known as "beam-forming" are then used to pinpoint a source direction using the inputs from all microphones, taking into account their geometric configuration.
A problem experienced when applying known source location systems and methods to audio captured on a small, portable device, such as a typical mini-DV camera, is that the microphones are typically only around 20 mm apart. Beam-forming is therefore not an option for estimating the source direction, as there is no provision for a microphone array in such a small, portable device. Also, for the known systems that take the signals from two microphones as input, a signal sampled at 48 kHz yields a maximum time delay of only around 2 samples between the two signals captured by such closely spaced microphones. Previous approaches are ineffective for expected delays of this order.
Summary
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
Disclosed are arrangements which seek to address the above problems by analysing a section of at least one channel of stereo audio signals to estimate a dominant frequency therein. The audio signals are then band-pass filtered to form two band-passed signals, with the dominant frequency being in the pass-band of said filter. A time delay between the two band-passed signals is then estimated, and the time delay is converted to an angular direction of the audio source of the audio signals.
According to an aspect of the present invention, there is provided a method of estimating an angular direction of an audio source from stereo audio signals, said method comprising the steps of: analysing a section of at least one channel of said stereo audio signals to estimate a dominant frequency therein; applying a band-pass filter to each channel of said stereo audio signals to form two band-passed signals, with said dominant frequency being in the pass-band of said filter; estimating a time delay between said two band-passed signals; and converting the time delay to said angular direction of said audio source.
According to another aspect of the present invention, there is provided a method of estimating a time delay between two one-dimensional signals, said method comprising the steps of: analysing a section of at least one of said signals to estimate a dominant frequency therein; applying a band-pass filter to each of said signals to form two band-passed signals, with said dominant frequency being in the pass-band of said filter; and cross-correlating said two band-passed signals to estimate said time delay between said signals.
Other aspects of the invention are also disclosed.
Brief Description of the Drawings
One or more embodiments of the present invention will now be described with reference to the drawings in which:
Fig. 1 shows a mini digital video camera for capturing audiovisual data, together with a definition of angles used when estimating an angular direction of a source;
Fig. 2 is a flow diagram of a method of estimating an angular direction of an audio source from the stereo audio data according to the present invention;
Fig. 3 is a flow diagram, in more detail, of the step of estimating an angular direction for the most dominant audio source present in an audio segment, which is performed in the method shown in Fig. 2;
Fig. 4A shows a typical 5 ms section of an audio signal before band-pass filtering;
Fig. 4B shows the audio signal of Fig. 4A after band-pass filtering;
Fig. 5 shows small parts of typical corresponding band-passed signals from two channels; and
Fig. 6 is a schematic block diagram of a general purpose computer upon which the method of estimating an angular direction of an audio source can be practiced.
Detailed Description including Best Mode
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
Fig. 1 shows a mini digital video (DV) camera 214 for capturing audiovisual data. The mini-DV camera 214 has two microphones 250 and 260 for capturing the audio component of the audiovisual data, and in particular for capturing stereo audio data or digital signals. The microphones 250 and 260 of typical mini-DV cameras 214 are about 20 mm apart. The captured stereo audio data is typically stored in the mini-DV camera 214 in a DV format.
Fig. 2 is a flow diagram of a method 100 of estimating an angular direction of an audio source from the stereo audio data. The method 100 may be performed by a processor (not illustrated) in the mini-DV camera 214. In another implementation a general purpose computer 200, a schematic block diagram of which is illustrated in Fig. 6, is connected to the mini-DV camera 214, allowing the audiovisual data to be transferred to the computer 200. The computer 200 then performs the method 100 of estimating the angular direction of the audio source in a real-time application or a post-processing application. In the real-time application, the stereo audio data captured using the microphones 250 and 260 (Fig. 1) is fed to the computer 200, via connection 230, as the audiovisual data is captured. The angular direction of the audio source is estimated from incoming audio data in real-time. In the post-processing application, the audiovisual data is first stored on the mini-DV camera 214, or transferred to and stored on the storage medium of the computer 200 described below in the .AVI or .DV file formats, and then analysed at a later time.
The steps of method 100 are implemented in the computer 200 as software, such as an application program executing within the computer system 200. In particular, the steps of method 100 are effected by instructions in the software that are carried out by the computer 200. The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer from the computer readable medium, and then executed by the computer. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer 200 preferably effects an advantageous apparatus for estimating an angular direction of an audio source from the stereo audio signal.
Referring now to Fig. 6, the computer system 200 is formed by a computer module 201, input devices such as a keyboard 202 and mouse 203, and output devices including a display 215. The computer module 201 typically includes at least one processor unit 205, and a memory unit 206, for example formed from semiconductor random access memory (RAM) and read only memory (ROM). The module 201 also includes a number of input/output interfaces, including: a device interface 207 that couples via connection 230 to the mini-DV camera 214, shown in more detail in Fig. 1, an I/O interface 213 for the keyboard 202 and mouse 203, and an interface 208 for the display 215. I/O interface 207 typically accepts the audiovisual data via connection 230 according to the IEEE 1394 protocol.
A storage device 209 is provided and typically includes a hard disk drive 210 and a floppy disk drive 211. A CD-ROM drive 212 is typically provided as a non-volatile source of data. The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner which results in a conventional mode of operation of the computer system 200 known to those in the relevant art.
Typically, the application program is resident on the hard disk drive 210 and read and controlled in its execution by the processor 205. Intermediate storage of the program and any data accepted from the mini-DV camera 214 via the connection 230 may be accomplished using the semiconductor memory 206, possibly in concert with the hard disk drive 210.
Referring now to Fig. 2, the method 100 of estimating an angular direction of an audio source from the stereo audio data starts in step 120 where the processor 205 extracts the stereo audio data from the audiovisual data captured by the mini-DV camera 214. One of a number of software libraries, known to those skilled in the art, is used in step 120 to extract the stereo audio data from the DV format input, in the case where the audiovisual data is stored on the camera 214, or from the .AVI or .DV file, in the case where the audiovisual data is stored in the storage device 209.
The method 100 then proceeds to step 130 where the processor 205 buffers a segment of the audio data to memory 206. Preferably the buffer stores audio data corresponding to a duration of 0.4 seconds. Using the audio samples in the buffer, the processor 205 in step 140 estimates the angular direction for the most dominant audio source present in the 0.4 second segment being analysed, and provides an output of the estimation in step 150. If step 140 is unable to estimate the angular direction for the most dominant audio source, which may for example occur because the segment consists of white noise or silence only, then step 150 returns a result to the effect that no estimation exists for the current segment. The estimate may be displayed on the display 215.
After the estimation has been output in step 150, the method 100 returns to step 130 where a next segment is stored to the buffer in memory 206. Steps 140 and 150 are then repeated with the next segment to again provide an output of the estimation. In particular, steps 130 to 150 are repeated as long as there is still audio data to be processed.
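A minimal sketch of this buffering loop, assuming the stereo samples have already been extracted into a NumPy array of shape (num_samples, 2) at 48 kHz; `estimate_direction()` is a hypothetical stand-in for step 140, which the sketches that follow break down:

```python
import numpy as np

SEGMENT_SECONDS = 0.4  # duration buffered in step 130

def process_stream(stereo: np.ndarray, fs: int = 48000) -> None:
    """Run steps 130-150 over consecutive 0.4 s segments."""
    seg_len = int(SEGMENT_SECONDS * fs)
    for start in range(0, len(stereo) - seg_len + 1, seg_len):
        segment = stereo[start:start + seg_len]       # step 130
        estimate = estimate_direction(segment, fs)    # step 140 (hypothetical)
        if estimate is None:                          # step 150
            print("no estimation exists for the current segment")
        else:
            print(f"dominant source at roughly {estimate:+.0f} degrees")
```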
Fig. 3 illustrates a flow diagram of step 140 in more detail. Step 140 starts in sub-step 305 where the processor 205 separates the audio data captured by the left and right microphones 250 and 260 contained within the buffered stereo audio data segment, forming left and right audio channel signals respectively.
The processor 205 then, in sub-step 310, applies a high-pass filter on each of the left and right audio channel signals. Preferably a Finite Impulse Response (FIR) filter of order 49 and having a cut-off frequency of 3500 Hz is used. The high-pass filter removes low-frequency components from the audio data, as low-frequency components are detrimental to the calculation of the zero-cross rate (ZCR) that follows.
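A minimal sketch of this high-pass filtering using SciPy; note that `firwin` requires an odd tap count for a high-pass design, so 49 taps (order 48) are used here as a deliberate approximation of the order-49 filter described:

```python
import numpy as np
from scipy.signal import firwin, lfilter

def high_pass(channel: np.ndarray, fs: int = 48000) -> np.ndarray:
    """Remove low-frequency components before the ZCR calculation."""
    # 49 taps (order 48); firwin needs an odd tap count for a high-pass.
    taps = firwin(49, 3500, fs=fs, pass_zero=False)
    return lfilter(taps, 1.0, channel)
```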
Instead of using the whole 0.4 second segment to estimate the angular direction for the most dominant audio source, step 140 uses only small sub-segments of each 0.4 second segment for its estimation. Such a sub-segment, to be useful for the estimation, has to have sufficiently high energy to ensure that the sub-segment does not merely contain background noise. Also, to prevent ambiguity occurring in the cross-correlation step that follows, the ZCR of the signal in the sub-segment has to be below a predetermined maximum.
Accordingly, in sub-step 320 the processor 205 analyses one of the left or right audio channel data to find within the 0.4 second segment a sub-segment that has sufficiently high energy and has a suitable ZCR. Preferably each sub-segment is of duration 0.1 seconds, with the beginnings of successive sub-segments being 0.05 seconds apart. In the preferred implementation, one of the left or right audio channel signals is chosen by determining which one of the left or right audio channel signals has the greatest range. The range of the signal is calculated as:

    range = max_i s(i) − min_i s(i)        (2)

where s(i) is the i-th data value of the buffered audio signal after high-pass filtering.
Alternatively, either channel may be chosen at random. Also, in the preferred implementation the sub-segment has to have an energy of at least 0.6, and a ZCR below 17000 zero crossings per second.
The energy E for each sub-segment is calculated as:

    E = (1/N) Σ_{i=1..N} s(i)²        (3)

where N is the number of samples in the sub-segment being considered. The ZCR for each sub-segment is calculated as:

    ZCR = (1/2) Σ_{i=2..N} |sign(s(i) − E{s}) − sign(s(i−1) − E{s})|        (4)

where E{s} is the expected value of the signal s(i) in the current sub-segment.
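A minimal sketch of the qualification test of sub-step 320, implementing Equations (2) to (4); the per-second scaling of the ZCR and the mean-square normalisation of the energy are assumptions made so that the 0.6 energy and 17000 crossings-per-second thresholds can be applied:

```python
import numpy as np

def energy(s: np.ndarray) -> float:
    return float(np.mean(s ** 2))                       # Equation (3)

def zcr_per_second(s: np.ndarray, fs: int) -> float:
    centred = np.sign(s - np.mean(s))                   # sign(s(i) - E{s})
    crossings = 0.5 * np.sum(np.abs(np.diff(centred)))  # Equation (4)
    return float(crossings) * fs / len(s)               # crossings per second

def choose_channel(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Pick the channel with the greatest range, per Equation (2)."""
    return left if np.ptp(left) >= np.ptp(right) else right

def find_best_subsegment(s: np.ndarray, fs: int, sub_len: float = 0.1,
                         hop: float = 0.05, min_energy: float = 0.6,
                         max_zcr: float = 17000):
    """Return the highest-energy qualifying sub-segment, or None."""
    n, h = int(sub_len * fs), int(hop * fs)
    candidates = [s[i:i + n] for i in range(0, len(s) - n + 1, h)]
    qualifying = [c for c in candidates
                  if energy(c) >= min_energy and zcr_per_second(c, fs) < max_zcr]
    return max(qualifying, key=energy) if qualifying else None
```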
Step 140 continues to sub-step 330 where the processor 205 determines whether a sub-segment has been found within the segment being analysed that satisfies the energy and ZCR requirements. If sub-step 330 determines that no such sub-segment exists, then step 140 continues to sub-step 380, which terminates step 140 with the information that an angular direction for an audio source cannot be estimated from the current buffered segment.
Alternatively, if sub-step 330 determines that at least one such sub-segment exists, then step 140 processes that segment further to estimate the angular direction for the most dominant audio source. In particular, processing continues only on the sub-segment of the segment under consideration having the highest energy E, provided that sub-segment fulfils the ZCR requirement. Within the sub-segment with the highest energy E, the processor 205 in sub-step 340 finds a section of that sub-segment having the greatest combination of ZCR and energy E by calculating the product of the energy E and the ZCR of each section. In the preferred implementation each section is of duration 5 ms, with the beginnings of successive sections being 3 ms apart.
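A minimal sketch of sub-step 340, reusing the `energy()` and `zcr_per_second()` helpers from the previous sketch; the section length and hop follow the 5 ms and 3 ms values given above:

```python
import numpy as np

def best_section(sub: np.ndarray, fs: int,
                 sec_len: float = 0.005, hop: float = 0.003) -> np.ndarray:
    """Return the 5 ms section with the highest energy * ZCR product."""
    n, h = int(sec_len * fs), int(hop * fs)
    sections = [sub[i:i + n] for i in range(0, len(sub) - n + 1, h)]
    return max(sections, key=lambda sec: energy(sec) * zcr_per_second(sec, fs))
```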
Step 140 continues by analysing only the section of the segment resulting from sub-step 340, and a corresponding section in time in the other audio channel. The processor 205 first estimates a fundamental frequency f₀ in the section in sub-step 350, and then in sub-step 360 band-pass filters the section and the corresponding section in the other audio channel around the fundamental frequency f₀. In the preferred implementation an FIR filter of order 99 is used, having a lower cutoff frequency set at 0.9·f₀ and an upper cutoff frequency set at 1.1·f₀.
The fundamental frequency estimated in sub-step 350 is the dominant frequency in the spectrum of the filtered audio data. Rather than taking the computationally expensive approach of finding the spectrum, a much simpler approach is taken where the fundamental frequency f₀ is estimated from the ZCR of the data in the section as follows:

    f₀ = ZCR(s₅ms) / 2        (5)

where ZCR(s₅ms) is the ZCR for the signal in the 5 ms section of the chosen channel having the highest product of the energy E and the ZCR. In other words, the fundamental frequency is estimated to be half the ZCR of the signal in the section, which is a good estimation, especially since the signal has already been high-pass filtered in sub-step 310.
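A minimal sketch of sub-steps 350 and 360 combined, again reusing `zcr_per_second()`: the fundamental frequency is taken as half the section's ZCR per Equation (5), and both channels are then band-pass filtered with an order-99 FIR filter between 0.9·f₀ and 1.1·f₀ as described; the use of `firwin`/`lfilter` is an assumption:

```python
import numpy as np
from scipy.signal import firwin, lfilter

def band_pass_around_f0(left: np.ndarray, right: np.ndarray,
                        section: np.ndarray, fs: int = 48000):
    """Estimate f0 from the section's ZCR, then band-pass both channels."""
    f0 = zcr_per_second(section, fs) / 2.0              # Equation (5)
    # Order-99 FIR band-pass (100 taps) with cutoffs at 0.9*f0 and 1.1*f0.
    taps = firwin(100, [0.9 * f0, 1.1 * f0], fs=fs, pass_zero=False)
    return lfilter(taps, 1.0, left), lfilter(taps, 1.0, right), f0
```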
A typical result achieved by the band-pass filtering of sub-step 360 is illustrated in Figs. 4A and 4B. Fig. 4A shows a typical 5 ms section of an audio signal before band-pass filtering, whereas Fig. 4B shows the same audio signal after band-pass filtering.
From this typical result it can be seen that the signal after band-pass filtering (Fig. 4B) is almost sinusoidal in appearance. Sinusoidal signals are far more robust for cross-channel comparisons.
Step 140 continues in sub-step 370 where the processor 205 cross-correlates the band-passed signals to estimate a time delay between the audio signals from the two channels. Fig. 5 shows small parts of typical corresponding band-passed signals from both channels within the section being analysed. A time delay between the two corresponding signals is clearly visible.
In the preferred implementation only the first 200 data samples from each of the corresponding signals are used to estimate the time delay between the audio data signals from the two channels. The cross-correlation between two digital signals x₁ and x₂ at a lag of k samples is defined as:

    r₁₂(k) = (1/M) Σ_{m=0..M−1} x₁(m) x₂(m + k)        (6)

where M is the number of samples in each signal and lag k is the number of sample intervals by which x₂ is shifted back in time. It is noted that the time-based approach of calculating the cross-correlation r₁₂ is significantly faster than the frequency-based approaches based on the Generalized Cross-Correlation defined in Equation (1). This is because there is no need to calculate the Fast Fourier Transform (FFT) of the audio data of the two channels.
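A minimal sketch of Equation (6), evaluated directly in the time domain with no FFT; the guard on the shifted index is an assumption about how the signal boundary is handled:

```python
import numpy as np

def cross_correlation(x1: np.ndarray, x2: np.ndarray, k: int) -> float:
    """r12(k) = (1/M) * sum over m of x1(m) * x2(m + k), per Equation (6)."""
    M = min(len(x1), len(x2))
    total = 0.0
    for m in range(M):
        if 0 <= m + k < len(x2):   # keep the shifted index inside x2
            total += float(x1[m]) * float(x2[m + k])
    return total / M
```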
Typically the cross-correlation r₁₂ is evaluated across all possible values of lag k. However, due to restrictions in the number of discrete samples making up the expected time delay, a faster approach is preferably employed.
It can be shown that the angle θ in radians, with respect to the perpendicular line between the two microphones, to the most dominant audio source can be calculated as:

    θ = arcsin(vτ / d)        (7)

where v is the speed of sound in metres per second, d is the distance (in metres) between the microphones, and τ is the time-delay in seconds between the left and right audio signals. Using an audio sampling frequency of 48000 Hz, a distance between the microphones of 0.02 m (20 mm), and the speed of sound at standard temperature and pressure of 346.65 m/s, it can be shown that the maximum range of integer sample shifts in the 180-degree semi-circle in front of the microphones 250 and 260 is −2 to 2 samples.
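A minimal sketch of Equation (7), converting an integer sample shift to an angle; with the stated fs = 48000 Hz, d = 0.02 m and v = 346.65 m/s, shifts of −2 to 2 samples map to roughly −46, −21, 0, 21 and 46 degrees, matching the five direction ranges described below:

```python
import math

def shift_to_angle_degrees(shift: int, fs: int = 48000,
                           d: float = 0.02, v: float = 346.65) -> float:
    """Equation (7): angle from the perpendicular for an integer sample shift."""
    tau = shift / fs                       # time delay in seconds
    return math.degrees(math.asin(v * tau / d))

# Sanity check of the five direction ranges:
# [round(shift_to_angle_degrees(k)) for k in range(-2, 3)] -> [-46, -21, 0, 21, 46]
```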
Thus, for the given distance d between the microphones 250 and 260, and sampling frequency, the direction of the most dominant audio source can be estimated as one of 5 direction ranges in front of the camera 214. With integer sample shifts of −2 to 2 samples, the 5 direction ranges correspond to approximately 46 degrees left, 21 degrees left, center, 21 degrees right, and 46 degrees right respectively (given that the left channel is the reference signal). The 5 direction ranges are illustrated in Fig. 1 using positions A to E.
Given that there are only 5 integer sample shifts of interest, in sub-step 370 the cross-correlation r₁₂ between the two band-passed signals from sub-step 360 is only calculated for lags k ∈ {−2, −1, 0, 1, 2}. An estimation of the most likely shift β̂ ∈ {−2, −1, 0, 1, 2} between the signals is thus calculated as:

    β̂ = arg max_k Σ_{i=|k|..N} x₁(i) x₂(i − k)        (8)

where x₁ is the first N data samples of the left channel data in the 5 ms section being analysed, x₂ is the first N samples of the right channel data in the 5 ms section being analysed, and N = 200. The reason for the index i to start from |k| in Equation (8) is to ensure that the index of x₂ will never be negative.
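A minimal sketch of the restricted-lag search of Equation (8); `x1` and `x2` are the band-passed left and right signals, and `x2` is assumed to hold at least N + max_lag samples so the shifted index stays in range:

```python
import numpy as np

def most_likely_shift(x1: np.ndarray, x2: np.ndarray,
                      N: int = 200, max_lag: int = 2) -> int:
    """Equation (8): pick the lag in -2..2 with the largest correlation sum."""
    best_k, best_score = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        # Starting i at |k| keeps the index i - k of x2 non-negative.
        score = sum(float(x1[i]) * float(x2[i - k]) for i in range(abs(k), N))
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```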
Step 140 ends in sub-step 390 where the angular direction corresponding to the most likely shift is returned as the estimation of the angular direction to the most dominant audio source.
Hence, referring again to step 150 of the method 100 (Fig. 2), if the signal is of sufficient energy and ZCR, and with the distance d between the microphones of 0.02 m and an audio sampling frequency of 48000 Hz, one of the 5 direction ranges (46 degrees left, 21 degrees left, center, 21 degrees right, and 46 degrees right) is returned as the estimation of the angular direction to the most dominant audio source. If the signal is not of sufficient energy and ZCR, then a "no available information" message is outputted.
When comparing the method 100 with prior art methods, it is noted that method 100 is performed in the time domain, rather than the frequency domain. As described above, performing cross-correlation in the time domain avoids the need to calculate the Fast Fourier Transform (FFT) of the audio data of the two channels. It is also noted that a simple cross-correlation in the time domain of the audio data captured by microphones spaced less than a wavelength of the dominant frequency apart will not work, because of the order of the shifts occurring when the microphones are so closely spaced. Such shifts typically amount to only a very small number of samples. As set out above, using a microphone spacing of 0.02 m and an audio sampling frequency of 48000 Hz, the maximum shift is 2 samples in either direction. Prior art methods are unreliable with shifts of that order.
Apart from displaying the estimate on display 215, the estimate may be used in a portable video conferencing camera system. Whilst automatic video conferencing systems are known, such systems rely on microphones placed around the room to track the person talking. This is because the methods used in such systems require the microphones to be spaced as far apart as possible. In a portable video conferencing camera system wherein method 100 is employed to estimate the angular direction of an audio source, the input from the microphones 250 and 260 of the camera 214 is used without the need for separate microphones spaced far apart. The estimate of the angular direction is converted by the processor 205 into instructions to control a servo motor of a camera head connected to the camera 214 to turn the specified number of degrees from the center of the field of view, thereby changing the field of view towards the direction of the audio source. A number of suitable camera heads are available on the market, such as the Panja AXB-PT10 or the Fujinon CPT-JA-10D.
Alternatively a separate pan tilt zoom (PTZ) camera, such as the VCC4 manufactured by Canon, could be mounted near or on top of the mini-DV camera 214 to turn in the directions estimated by the angular direction estimation method 100. The only requirement would be that the PTZ camera must have the same (or similar) field of view as the mini-DV camera 214 that is used to capture the stereo audio data. As with the video conferencing application described above, the estimate of the angular direction is converted by the processor 205 into instructions to control the PTZ camera to turn the specified number of degrees.
Either of these video conferencing systems has the advantage of being fully portable and much cheaper than current automatic video-conferencing systems. There is no requirement to mount microphones around the room, which allows the system to be much faster and easier to set up and use.
In another application a system incorporating method 100 is used as a "smart" video camera system where visual and audio cues are used to automatically film a scene in a room. The estimate of the angular direction of the audio source is used to give an approximate direction towards an object of interest in the audio domain. Accordingly, image based algorithms, such as face and motion detection, could be greatly enhanced by using the approximate direction towards the object of interest in the audio domain to detect the presence of humans, for example.
Such a smart video camera system would use a higher-level algorithm which takes the angular direction estimate of an audio source and information from image based algorithms and makes subsequent filming decisions based thereon. The higher-level algorithm makes decisions about how long, for example, speech would need to come from the same direction before the camera turned and filmed the event of interest. The decisions are then converted into instructions to control a PTZ camera to turn the specified number of degrees as described in the application above.
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.
Claims (20)
1. A method of estimating an angular direction of an audio source from stereo audio signals, said method comprising the steps of: analysing a section of at least one channel of said stereo audio signals to estimate a dominant frequency therein; applying a band-pass filter to each channel of said stereo audio signals to form two band-passed signals, with said dominant frequency being in the pass-band of said filter; estimating a time delay between said two band-passed signals; and converting the time delay to said angular direction of said audio source.
2. A method as claimed in claim 1, said method having the initial step of qualifying said section of at least one channel of said stereo audio signals by determining whether the signal contained in said section has energy above a first predefined value.
3. A method as claimed in claim 2 wherein said section is further qualified by determining whether said signal in said section has a zero crossing rate below a second predefined value.
4. A method as claimed in any one of claims 1 to 3 wherein said dominant frequency is estimated by calculating the zero crossing rate of said section.
5. A method as claimed in any one of claims 1 to 4 wherein said time delay is estimated by cross correlating said two band-passed signals in the time domain.
6. A method as claimed in claim 5 wherein the cross-correlation is restricted to a predetermined domain of lag values, where said domain of lag values is a function of a sample frequency of said stereo audio signals and a distance between microphones used to capture said stereo audio signals.
7. A method of estimating a time delay between two one-dimensional signals, said method comprising the steps of: analysing a section of at least one of said signals to estimate a dominant frequency therein; applying a band-pass filter to each of said signals to form two band-passed signals, with said dominant frequency being in the pass-band of said filter; and cross-correlating said two band-passed signals to estimate said time delay between said signals.
8. A method as claimed in claim 7 wherein said dominant frequency is estimated by calculating the zero crossing rate of said section.
9. Apparatus for estimating an angular direction of an audio source from stereo audio signals, said apparatus comprising: means for analysing a section of at least one channel of said stereo audio signals to estimate a dominant frequency therein; means for applying a band-pass filter to each channel of said stereo audio signals to form two band-passed signals, with said dominant frequency being in the pass-band of said filter; means for estimating a time delay between said two band-passed signals; and means for converting the time delay to said angular direction of said audio source.
10. Apparatus as claimed in claim 9, said apparatus further comprising means for qualifying said section of at least one channel of said stereo audio signals by determining whether the signal contained in said section has energy above a first predefined value.
11. Apparatus as claimed in claim 10 wherein said section is further qualified by determining whether said signal in said section has a zero crossing rate below a second predefined value.
12. Apparatus as claimed in any one of claims 9 to 11 wherein said dominant frequency is estimated by calculating the zero crossing rate of said section.
13. Apparatus as claimed in any one of claims 9 to 12 wherein said time delay is estimated by cross correlating said two band-passed signals in the time domain.
14. Apparatus as claimed in claim 13 wherein the cross-correlation is restricted to a predetermined domain of lag values, where said domain of lag values is a function of a sample frequency of said stereo audio signals and a distance between microphones used to capture said stereo audio signals.
15. Apparatus for estimating a time delay between two one-dimensional signals, said apparatus comprising: means for analysing a section of at least one of said signals to estimate a dominant frequency therein; means for applying a band-pass filter to each of said signals to form two band-passed signals, with said dominant frequency being in the pass-band of said filter; and means for cross-correlating said two band-passed signals to estimate said time delay between said signals.
16. Apparatus as claimed in claim 15 wherein said dominant frequency is estimated by calculating the zero crossing rate of said section.
17. A program stored on a computer readable medium for estimating an angular direction of an audio source from stereo audio signals, said program comprising: code for analysing a section of at least one channel of said stereo audio signals to estimate a dominant frequency therein; code for applying a band-pass filter to each channel of said stereo audio signals to form two band-passed signals, with said dominant frequency being in the pass-band of said filter; code for estimating a time delay between said two band-passed signals; and code for converting the time delay to said angular direction of said audio source.
18. A program stored on a computer readable medium for estimating a time delay between two one-dimensional signals, said program comprising: code for analysing a section of at least one of said signals to estimate a dominant frequency therein; code for applying a band-pass filter to each of said signals to form two band-passed signals, with said dominant frequency being in the pass-band of said filter; and code for cross-correlating said two band-passed signals to estimate said time delay between said signals.
19. A method of estimating an angular direction of an audio source from stereo audio signals, said method being substantially as herein described with reference to the accompanying drawings.
20. Apparatus for estimating an angular direction of an audio source from stereo audio signals, said apparatus being substantially as herein described with reference to the accompanying drawings.

DATED this 24th Day of August 2004
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
SPRUSON & FERGUSON
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2004205225A AU2004205225A1 (en) | 2003-09-05 | 2004-08-25 | Frontal Audio Source Location Using Very Closely Spaced Stereo Microphones |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2003904879A AU2003904879A0 (en) | 2003-09-05 | Frontal Audio Source Location Using Very Closely Spaced Stereo Microphones | |
AU2003904879 | 2003-09-05 | ||
AU2004205225A AU2004205225A1 (en) | 2003-09-05 | 2004-08-25 | Frontal Audio Source Location Using Very Closely Spaced Stereo Microphones |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2004205225A1 true AU2004205225A1 (en) | 2005-03-24 |
Family
ID=34423811
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2004205225A Abandoned AU2004205225A1 (en) | 2003-09-05 | 2004-08-25 | Frontal Audio Source Location Using Very Closely Spaced Stereo Microphones |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2004205225A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011001005A1 (en) | 2009-06-30 | 2011-01-06 | Nokia Corporation | Audio-controlled image capturing |
US9007477B2 (en) | 2009-06-30 | 2015-04-14 | Nokia Corporation | Audio-controlled image capturing |
EP2449426A4 (en) * | 2009-06-30 | 2015-05-27 | Nokia Corp | Audio-controlled image capturing |
CN109858508A (en) * | 2018-10-23 | 2019-06-07 | 重庆邮电大学 | IP localization method based on Bayes and deep neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MK1 | Application lapsed section 142(2)(a) - no request for examination in relevant period |