EP0968624A2 - Telephonic transmission of three-dimensional sound - Google Patents
Telephonic transmission of three-dimensional sound
- Publication number
- EP0968624A2
- Authority
- EP
- European Patent Office
- Prior art keywords
- signal
- signals
- output signals
- channel
- khz
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
- 230000005540 biological transmission Effects 0.000 title claims abstract description 16
- 238000007906 compression Methods 0.000 claims description 20
- 230000006835 compression Effects 0.000 claims description 20
- 230000000007 visual effect Effects 0.000 claims description 6
- 230000003595 spectral effect Effects 0.000 claims description 5
- 238000013144 data compression Methods 0.000 claims description 2
- 238000000034 method Methods 0.000 claims 2
- 238000001914 filtration Methods 0.000 claims 1
- 230000000694 effects Effects 0.000 abstract description 13
- 210000003128 head Anatomy 0.000 description 32
- 210000005069 ears Anatomy 0.000 description 11
- 230000005236 sound signal Effects 0.000 description 7
- 210000000883 ear external Anatomy 0.000 description 5
- 210000004556 brain Anatomy 0.000 description 4
- 239000002023 wood Substances 0.000 description 4
- 230000015572 biosynthetic process Effects 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 229920002379 silicone rubber Polymers 0.000 description 2
- 239000004945 silicone rubber Substances 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
- 230000006837 decompression Effects 0.000 description 1
- 210000003027 ear inner Anatomy 0.000 description 1
- 238000007689 inspection Methods 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
Definitions
- This invention relates to telephonic transmission of three dimensional (3D) sounds and more particularly to an apparatus for communicating three dimensional sounds between two or more remote locations by telephone transmission.
- The present invention is concerned with telephone conference systems irrespective of whether or not a visual image is transmitted at the same time as the audio transmission.
- Video telephone conference systems, which employ large, expensive equipment to enable a large group of people in one location to communicate with another group at another location, are well known.
- Video telephones which incorporate a camera and video screen at each location are also well known.
- With the integration of office technologies such as fax, telephone and video, such systems are becoming more readily available.
- An object of the present invention is to overcome, or reduce, these undesirable effects by reproducing a three dimensional sound field of the transmitting station at the receiving location.
- Binaural technology is based on using a so-called "artificial head" microphone system to receive sound from a sound source and convert the acoustic energy into an electrical signal which is subsequently processed digitally.
- the use of an artificial head ensures that the natural three dimensional sound cues, which the brain of a listener uses to determine the position of sound sources in three dimensional space, are incorporated into the audio signal.
- The artificial head is preferably constructed to resemble as closely as possible an actual human head and upper torso, and has silicone rubber ears which precisely resemble human ears; in some applications, however, good (but less precise) results can be achieved using two spaced microphones with a block or sheet of wood between the microphones.
- The term "binaural signals" is intended to mean two-channel or stereophonic signals which include one or more components representing audio diffraction effects created by an artificial head means positioned between a pair of microphones.
- The term "artificial head" is intended to cover not only a precise model of a human head but also imprecise models (such as, for example, a block of wood between microphones) and electrical synthesis of the audio diffraction signals.
- A further problem with artificial-head microphone systems is that, when listening to the reproduced sound through loudspeakers, interaural cross-talk occurs: an audio signal intended for one ear of a listener is also received by the other ear. In order to compensate for this effect it is well known to employ cross-talk cancellation circuits. See for example International Patent Application WO-A-9515069.
- a further object of the present invention is to provide apparatus which enables binaural processing of the audio signals of a telephone conference system.
- Apparatus for communicating three dimensional sounds via a telephone link comprising an input device consisting of two spaced microphones operable to produce left and right channel monophonic microphone output signals, signal processing means for each channel comprising filter means for receiving the microphone output signals and modifying the signals to compensate for head related air-to-ear transfer functions and equalise the spectral response of the microphone output signals, cross-talk cancellation means for cancelling out interaural cross-talk between the channels, and data compression means operable to receive an output signal from each channel, combine them to produce a binaural signal and compress said binaural signal to produce a compressed binaural signal for transmission over the telephone link, said compression means using a first compression algorithm to compress frequencies below 1 kHz whilst preserving relative phase differences between the channel output signals, a second algorithm to compress frequencies above 2 kHz whilst preserving relative differences between amplitudes of the channel output signals and a third algorithm to compress frequencies between 1 kHz and 2 kHz whilst preserving the IAD and ITD.
- The apparatus further includes a receiving means for receiving a compressed binaural signal transmitted over a telephone link and converting said compressed signal into left and right channel audio output signals, and spaced left and right channel sound reproduction means each of which is operable to receive a respective channel audio output signal from said receiving means and reproduce sound corresponding to said respective channel audio output signal.
- the sound reproduction means may comprise a pair of loudspeakers, or a pair of headphones.
- the apparatus may be provided with a video signal means comprising a camera operable to produce a video output signal, and the compression means is operable to receive the video output signal and to combine said video output signal with said compressed binaural signal to produce a combined output signal for transmission via the telephone link.
- receiving means further includes means for receiving a video signal transmitted over a telephone link and converting the video signal into a video output signal, and display means operable to receive said video output signal and display a visual image.
- Figure 1 illustrates schematically apparatus incorporating the present invention for telephone conference connection between two conference centres.
- Figure 2 shows in block diagram form apparatus incorporating signal processors in accordance with the present invention.
- Figure 3 shows schematically a human head, and
- Figure 4 shows a further embodiment of the present invention.
- each conference station 10, 11 is provided with a personal computer (PC) 12 which includes a monitor, two spaced microphones 13, 14 mounted in silicone rubber moulded ears 15 (which model precisely human outer ears) and two spaced loudspeakers 16.
- PC personal computer
- The microphones 13, 14 should ideally be placed about 15 cm apart (the approximate width of a human head). Although it is preferable that the microphones are mounted in moulded ears 15 on an artificial head, the microphones could be mounted in moulded ears mounted on a structure 17 (such as a block or sheet of wood). Alternatively, the microphones 13, 14 and moulded ears could simply be mounted on the sides of the computer case 12, but this would give less precise detail to the three-dimensional sound field.
- Each of the stations 10, 11 is connected to the other by means of the public telephone system 27 in the usual way.
- both microphones 13, 14 are positioned to receive sound generated at their respective station 10, 11, where they are located.
- Each microphone converts the pressure variations associated with the sound waves that it receives into an analogue electrical signal at inputs 18a, 18b of each channel (representing left and right ears 13,14) of a digital signal processor 19.
- The processor 19 comprises an HRTF filter 20 and an equalisation filter 21 for each channel.
- HRTF or "Head Related Transfer Function" is intended to mean a function representing the transfer function of a path between a source of sound and the ear of the listener, either the ear nearer the sound (near HRTF) or the ear further from the sound (far HRTF).
- HRTFs may be obtained by measurements on a real human head equipped with suitable microphones. Alternatively, they may be obtained using an artificial head means, which may be, as is common, a precise model of a human head or torso with microphones in the ear structures; it may be something far less precise, for example a block or sheet of wood positioned between a pair of spaced-apart microphones; or it might even be an electrical synthesis circuit or system which creates such functions.
- Filters 21 correct the spectral response to compensate for the mid-range gain associated with the concha-related resonance, as explained in International Patent Applications WO-A-9422278 and WO-A-9515069.
- the outputs 21a, 21b of the filters 21 are fed to cross-talk cancellation circuits 22 which cancel out the interaural crosstalk as explained in International Patent Applications WO-A-9422278 and WO-A-9515069.
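The cancellation idea can be illustrated at a single frequency. The following is a minimal sketch only, assuming a symmetric listening arrangement in which each loudspeaker reaches the ear on its own side through one path and the opposite ear through a weaker, delayed path. It uses plain inversion of the resulting 2x2 matrix, a textbook approach, and is not taken from WO-A-9422278 or WO-A-9515069. The names `crosstalk_canceller`, `apply_2x2`, `SAME_SIDE` and `CROSS_SIDE`, and the path values, are hypothetical.

```python
import cmath


def crosstalk_canceller(same_side, cross_side):
    """Invert the symmetric 2x2 speaker-to-ear matrix at one frequency."""
    det = same_side * same_side - cross_side * cross_side
    if abs(det) < 1e-12:
        raise ValueError("paths are too similar to invert")
    return ((same_side / det, -cross_side / det),
            (-cross_side / det, same_side / det))


def apply_2x2(matrix, left, right):
    """Multiply a 2x2 matrix by a (left, right) pair."""
    return (matrix[0][0] * left + matrix[0][1] * right,
            matrix[1][0] * left + matrix[1][1] * right)


# Hypothetical single-frequency path responses (illustrative values only).
SAME_SIDE = 1.0 + 0.0j              # speaker to the ear on its own side
CROSS_SIDE = cmath.rect(0.4, -0.6)  # weaker, delayed path to the far ear
ACOUSTIC = ((SAME_SIDE, CROSS_SIDE), (CROSS_SIDE, SAME_SIDE))

canceller = crosstalk_canceller(SAME_SIDE, CROSS_SIDE)
# Feed a signal intended for the left ear only through the canceller,
# then through the acoustic paths from the speakers to the ears.
speaker_left, speaker_right = apply_2x2(canceller, 1.0, 0.0)
ear_left, ear_right = apply_2x2(ACOUSTIC, speaker_left, speaker_right)
```

In this idealised case the left ear receives the intended signal and the right ear receives nothing. A practical circuit would have to do this across the whole band, with measured rather than assumed path responses.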
- The output signal at each channel output 23 comprises a monophonic digital audio signal.
- the normal signals transmitted over internationally acceptable telephone networks are typically a monophonic signal covering a range of frequencies from about 200 Hz to 3.4 kHz.
- the output signals 23 of each channel are combined and compressed by a signal compression means 25 to produce a stereophonic output signal 24.
- the compression algorithms used by the compression means 25 are designed to preserve the three dimensional cues in the audio output signals 23 from each channel.
- a second key aspect is to preserve the time relationship between the signals in the two channels.
- the manner in which the head and outer ears of a listener modify soundwaves before they are registered by the inner ears is complex, with several contributing factors playing a part.
- each pinna outer ear flap
- each pinna together with its auditory canal
- the sound source is moved to one side of the head of the listener, then the more distant ear lies in the shadow of the head, and the ear closer to the sound source is aligned more on-axis with the source.
- When sound waves encounter the listener's head, they diffract around it. In general, the average width of a human head is 15 cm, with an interaural path length of about 20 cm when the circumference effect is taken into account. Sound waves of greater wavelength than 15 cm (corresponding to frequencies below about 1.7 kHz) can diffract efficiently around a human head, whereas at higher frequencies the sound wave cannot diffract efficiently around the head. This effect, known as "head-shadowing", creates differences in the amplitudes of the sound signals arriving at each ear of the listener. This interaural amplitude difference (IAD) is one of the primary 3D cues which need to be preserved.
- IAD interaural amplitude difference
- the effects of diffraction on the intensity of the sound are noticeable in the range of between 700 Hz and 8 kHz and are more noticeable at higher frequencies (say above 2 kHz), where the head-shadowing creates noticeable differences in the intensities of the sound waves reaching the ears.
- the listener's brain uses these differences in intensity as cues to locate the direction of the source of high frequency sounds. Therefore it is important to retain the relationship between the intensities (or amplitudes) of the high frequency sounds.
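The relationship between frequency, wavelength and head size can be checked with a few lines of arithmetic. This is a rough sketch, assuming a speed of sound of 343 m/s (air at about 20 °C, a value not stated in the text) and expressing an IAD in decibels as 20·log10 of the amplitude ratio. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C
HEAD_WIDTH_M = 0.15         # average head width quoted in the text


def wavelength_m(freq_hz):
    """Wavelength in metres of a tone at freq_hz."""
    return SPEED_OF_SOUND_M_S / freq_hz


def iad_db(near_amplitude, far_amplitude):
    """Interaural amplitude difference in dB (positive: near ear louder)."""
    return 20.0 * math.log10(near_amplitude / far_amplitude)


# Compare wavelengths with the head width across the 700 Hz to 8 kHz range.
for freq_hz in (700.0, 2000.0, 8000.0):
    ratio = wavelength_m(freq_hz) / HEAD_WIDTH_M
    print(f"{freq_hz:6.0f} Hz: wavelength {wavelength_m(freq_hz):.3f} m, "
          f"{ratio:.2f} x head width")
```

At the low end of the range the wavelength is several times the head width, so the wave bends around the head easily; at the high end it is much shorter than the head, which is where shadowing, and hence a large IAD, is expected.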
- phase difference is approximately proportional to frequency.
- the listener's brain therefore uses the phase differences of the low frequencies as an important cue to determine the direction of the source of low frequency sounds. It is therefore important to retain the phase relationships between the output signals of the left and right channels for the low frequency sounds.
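For a pure tone, a fixed arrival-time difference between the ears corresponds to a phase difference that grows linearly with frequency. The sketch below illustrates this, assuming a hypothetical delay of 0.5 ms, and shows why phase is a dependable cue only at low frequencies: once the lag reaches half a cycle, the ear cannot tell which signal leads. The function names and the 180° criterion are illustrative assumptions, not taken from the text.

```python
def interaural_phase_deg(freq_hz, itd_s):
    """Phase lag of the far-ear signal, in degrees, for a pure tone."""
    return 360.0 * freq_hz * itd_s


def is_phase_unambiguous(freq_hz, itd_s):
    """True while the lag stays under half a cycle (180 degrees)."""
    return interaural_phase_deg(freq_hz, itd_s) < 180.0


ITD_S = 0.0005  # hypothetical 0.5 ms interaural delay
phases = {f: interaural_phase_deg(f, ITD_S) for f in (250.0, 500.0, 2000.0)}
for freq_hz, phase in phases.items():
    print(f"{freq_hz:6.0f} Hz: {phase:6.1f} degrees, "
          f"unambiguous={is_phase_unambiguous(freq_hz, ITD_S)}")
```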
- time-of-arrival differences between the left and right ears of the listener, unless the sound source is exactly in front, behind, above or below the head of the listener.
- ITD interaural time delay
- Figure 3 shows a plan view of a conceptual head with a left ear (LE) and a right ear (RE) receiving a sound signal from a distant source at azimuth angle θ (about +45° as shown in the drawing).
- the wave front (W - W 1 ) arrives at the right ear (RE)
- the path distance a represents a proportion of the circumference subtended by θ.
- the path length (a+b) is given by.
- ITDs are measured to be slightly greater than this, possibly because of the non-spherical nature of human heads, the complex diffractive situation and surface effects. Hence ITDs lying in the range of 0 to 0.8 ms are also important primary 3D cues.
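The spherical-head geometry around Figure 3 can be sketched numerically. The arc term a follows from the text (the part of the circumference subtended by the azimuth angle). The expression used here for the straight segment b, r·sin(θ), is the usual spherical-head approximation and is an assumption, since the formula itself does not survive in the text. The head radius of 7.5 cm is half the quoted 15 cm width, and the speed of sound of 343 m/s is also assumed.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C
HEAD_RADIUS_M = 0.075       # half of the 15 cm head width quoted in the text


def path_difference_m(azimuth_deg, radius_m=HEAD_RADIUS_M):
    """Extra path (a + b) to the far ear, for azimuths from 0 to 90 degrees."""
    theta = math.radians(azimuth_deg)
    arc_a = radius_m * theta                  # (theta / 360) * 2 * pi * r
    straight_b = radius_m * math.sin(theta)   # assumed spherical-head term
    return arc_a + straight_b


def itd_s(azimuth_deg, radius_m=HEAD_RADIUS_M):
    """Interaural time delay in seconds for a distant source."""
    return path_difference_m(azimuth_deg, radius_m) / SPEED_OF_SOUND_M_S


for azimuth in (0.0, 45.0, 90.0):
    print(f"azimuth {azimuth:4.0f} deg: ITD {itd_s(azimuth) * 1e6:6.1f} us")
```

Under these assumptions the delay is zero for a source straight ahead and stays below the 0.8 ms upper bound mentioned in the text for a source directly to one side.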
- the mid-range gain due to the concha related resonance and the resonance in the auditory canal of the outer ear occurs at about 3 kHz or slightly higher and this is at the extreme end of the normal bandwidth of conventional telephone transmission lines.
- the Fossa a cavity at the uppermost region of the Pinna of the outer ear
- the brain of the listener makes use of the higher frequency sounds at 13 kHz or above to assist in determining whether the source of sound is in front of or behind the listener. It is therefore important to retain the detail of high frequency sounds above 13 kHz, if front and back cues are necessary.
- The compression means 25 uses a first algorithm which allows compression of frequencies below 1 kHz whilst preserving phase differences between the channel output signals, and uses a second algorithm to compress frequencies above 2 kHz whilst preserving relative differences in the amplitudes of the channel output signals 23.
- the compression means 25 also employs algorithms that allow the compression of the mid range frequencies, whilst preserving the IAD and ITD information over the whole frequency band.
- The compression means 25 thus preserves the phase and amplitude relationships up to 8 kHz for reproducing three dimensional sound fields without front and back cues, or up to 13 kHz or above when front and back cues are wanted.
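The band-dependent rule described above can be summarised as a small lookup. This is only an illustrative sketch of which interaural cue each band is meant to protect; it does not implement any compression algorithm, and the text does not say how frequencies of exactly 1 kHz or 2 kHz are treated, so assigning them to the middle band is an assumption. The function and constant names are hypothetical.

```python
LOW_BAND_LIMIT_HZ = 1000.0   # below this: preserve relative phase
HIGH_BAND_LIMIT_HZ = 2000.0  # above this: preserve relative amplitude


def cues_to_preserve(freq_hz):
    """Return the interaural cues that must survive compression at freq_hz."""
    if freq_hz < LOW_BAND_LIMIT_HZ:
        return ("phase",)
    if freq_hz > HIGH_BAND_LIMIT_HZ:
        return ("amplitude",)
    # Assumption: the band edges themselves belong to the middle band.
    return ("IAD", "ITD")


def upper_limit_hz(front_back_cues):
    """Highest frequency whose cues are kept, per the text's 8/13 kHz figures."""
    return 13000.0 if front_back_cues else 8000.0


for freq_hz in (500.0, 1500.0, 4000.0):
    print(f"{freq_hz:6.0f} Hz -> preserve {cues_to_preserve(freq_hz)}")
```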
- the output signal 24 of the compression means 25 is a compressed binaural signal which is transmitted over a conventional public telephone link 27 to another receiving station 10, 11.
- Each station 10, 11 further includes a receiving means 28 for receiving an incoming compressed combined binaural signal transmitted via the telephone link 27.
- the receiving means 28, (see Figure 2), comprises a signal processor which operates to re-expand the incoming compressed signal 26 and produce two channel input signals 30.
- Each channel input signal 30 is supplied to a sound reproduction device 16 which may be the pair of loudspeakers 16 or a pair of headphones 32.
- The apparatus of Figures 1, 2, and 3 further includes means for transmitting and receiving video signals over a telephone link 27, as shown in Figure 4.
- In Figure 4, the same reference numbers are given to the components that are common to the Figure 2 embodiment.
- each station 10, 11 is provided with a video camera 32 and video processor 33 which is operable to produce a video output signal 34.
- the video output signal 34 from the camera 32 is supplied to the compression means 25 of the signal processor 19 (see Figure 4).
- The compression means 25 includes circuits for combining the binaural output signal 24 with the video output signal 34 to produce a combined video and binaural output signal 36 for transmission over the telephone link.
- the apparatus is also provided with a receiving means 37 for receiving an incoming combined video and binaural signal 38 transmitted over the telephone link 27 from another remote conference centre 10 or 11.
- The receiving means 37 includes a decompression means 39 for expanding the received video and binaural signal 38, and operates to supply a video signal 40 to a video processor 41 and two audio output signals 30 to the speakers 16 or headphones 32.
- The output of the video processor 41 drives the monitor 12 to produce a visual image.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Stereophonic System (AREA)
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB9705565 | 1997-03-18 | | |
| GBGB9705565.1A GB9705565D0 (en) | 1997-03-18 | 1997-03-18 | Telephone transmission of 3d sound |
| GBGB9707962.8A GB9707962D0 (en) | 1997-04-19 | 1997-04-19 | Telephonic transmission of 3D sound |
| GB9707962 | 1997-04-19 | | |
| PCT/GB1998/000813 WO1998042161A2 (en) | 1997-03-18 | 1998-03-18 | Telephonic transmission of three-dimensional sound |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP0968624A2 (de) | 2000-01-05 |
Family
ID=26311214
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP98909666A (Withdrawn; published as EP0968624A2, de) | Telephonic transmission of three-dimensional sound | 1997-03-18 | 1998-03-18 |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP0968624A2 (de) |
| WO (1) | WO1998042161A2 (de) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| DE102006048295B4 (de) * | 2006-10-12 | 2008-06-12 | Andreas Max Pavel | Method and device for recording, transmitting and reproducing sound events for communication applications |
| CN102809742B (zh) | 2011-06-01 | 2015-03-18 | Dolby Laboratories Licensing Corporation | Sound source localization apparatus and method |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP0689756B1 (de) * | 1993-03-18 | 1999-10-27 | Central Research Laboratories Limited | Plural-channel sound processing |
| US5434913A (en) * | 1993-11-24 | 1995-07-18 | Intel Corporation | Audio subsystem for computer-based conferencing system |
| GB9324240D0 (en) * | 1993-11-25 | 1994-01-12 | Central Research Lab Ltd | Method and apparatus for processing a binaural pair of signals |
| US5596644A (en) * | 1994-10-27 | 1997-01-21 | Aureal Semiconductor Inc. | Method and apparatus for efficient presentation of high-quality three-dimensional audio |
1998
- 1998-03-18 EP EP98909666A patent/EP0968624A2/de not_active Withdrawn
- 1998-03-18 WO PCT/GB1998/000813 patent/WO1998042161A2/en not_active Ceased
Non-Patent Citations (1)
| Title |
|---|
| See references of WO9842161A3 * |
Also Published As
| Publication number | Publication date |
|---|---|
| WO1998042161A2 (en) | 1998-09-24 |
| WO1998042161A3 (en) | 1998-12-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| AU2008362920B2 (en) | | Method of rendering binaural stereo in a hearing aid system and a hearing aid system |
| US7012630B2 (en) | | Spatial sound conference system and apparatus |
| US4118599A (en) | | Stereophonic sound reproduction system |
| JP4166435B2 (ja) | | Teleconference system |
| JP3435156B2 (ja) | | Sound image localization device |
| CN109640235B (zh) | | Binaural hearing system using localization of sound sources |
| US20160140947A1 (en) | | Apparatus, Method, and Computer Program for Adjustable Noise Cancellation |
| US8340315B2 (en) | | Assembly, system and method for acoustic transducers |
| US8488820B2 (en) | | Spatial audio processing method, program product, electronic device and system |
| CN101208989A (zh) | | Device, system and method for acoustic signals |
| JP7070910B2 (ja) | | Video conference system |
| KR20090077934A (ko) | | Process and apparatus for acquisition, transmission and reproduction of sound events for communication applications |
| EP0968624A2 (de) | | Telephonic transmission of three-dimensional sound |
| West et al. | | Teleconferencing system using head-related signals |
| JP6972858B2 (ja) | | Acoustic processing device, program and method |
| KR102613033B1 (ko) | | Earphone based on head-related transfer function, telephone device including the same, and call method using the same |
| JP2662825B2 (ja) | | Conference call terminal device |
| JP2662824B2 (ja) | | Conference call terminal device |
| Horiuchi et al. | | Adaptive estimation of transfer functions for sound localization using stereo earphone-microphone combination |
| JPH07107599A (ja) | | Headphone listening device |
| WO2005069680A1 (en) | | Sound receiving arrangement comprising sound receiving means and sound receiving method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| | 17P | Request for examination filed | Effective date: 19991007 |
| | AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): CH DE FR GB LI NL SE |
| | RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: CREATIVE TECHNOLOGY LTD. |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| | 18D | Application deemed to be withdrawn | Effective date: 20041001 |