KR20040027015A

KR20040027015A - New Down-Mixing Technique to Reduce Audio Bandwidth using Immersive Audio for Streaming

Info

Publication number: KR20040027015A
Application number: KR1020020058711A
Authority: KR
Inventors: 최두현; 이규은
Original assignee: (주)엑스파미디어; 이규은
Priority date: 2002-09-27
Filing date: 2002-09-27
Publication date: 2004-04-01

Abstract

PURPOSE: A new down-mixing technique using immersive audio is provided to reduce the audio bandwidth needed for streaming. When video is streamed over a restricted bandwidth, reducing the audio share of the total bandwidth lets more bandwidth be allotted to the video, so that high-quality video and surround sound can be offered together. CONSTITUTION: Each of the 5 channels is given a direction through binaural synthesis using an HRTF (Head-Related Transfer Function), and the 5 channels are then down-mixed to 2 channels. A user listening through headphones thus perceives spatial sound. Passing the down-mixed result through a cross-talk cancellation filter lets the user perceive the same spatial effect through two speakers.

Description

New Down-Mixing Technique to Reduce Audio Bandwidth using Immersive Audio for Streaming

When streaming, either all five channels are transmitted or they are simply downmixed to two channels. Transmitting five channels requires a large amount of data, so real-time transmission may not be possible. Simple downmixing to two channels makes it difficult to preserve the spatial impression of the sound.

High-quality video content such as DVD uses the 5.1 channels provided by Dolby Digital, DTS, and similar formats to give the sound a spatial impression. In general, 5.1-channel audio has a bandwidth of 448 Kbps for Dolby Digital and 1411 Kbps for DTS. When DVD-quality video is streamed, the Dolby Digital or DTS audio takes up a considerable share of the limited bandwidth, even on high-speed Internet connections such as ADSL, and so eats into the bandwidth available for video. In the present invention, Binaural Synthesis is used to preserve the directionality of each channel, and the five channels are then downmixed to two channels. The original spatial impression of the five channels is maintained, and the reduction in the number of channels can also be expected to save bandwidth. As a result, when video content is streamed over a limited bandwidth, the audio share of the total bandwidth decreases and more bandwidth can be allocated to the video, so that high-quality video and spatial sound can be provided together.

The object is to downmix multichannel audio such as 5.1 channels to two channels while preserving its directionality, and therefore its spatial impression, so that when streaming over a limited bandwidth more bandwidth can be allocated to the video and both the audio and the video can be streamed at good quality.

(FIG. 1) The overall process of downmixing five channels to two channels

(FIG. 2) The detailed process of Binaural Synthesis and downmixing to two channels

In general, to hear spatial sound using 5.1 channels (left, right, center, surround left, surround right, and the low-frequency effects channel), what matters is the position of the speakers for the five channels other than the low-frequency effects channel, which has no directionality. In other words, the spatial impression is formed by the positions of the five speakers. In the present invention, each of the five channels is binaurally synthesized using an HRTF to give it directionality, and the five channels are then downmixed to two channels so that a spatial impression is perceived when listening through headphones. In addition, the two-channel downmix is passed through a cross-talk cancellation filter so that the listener can also perceive the spatial impression through two speakers.

In the first step, as shown in FIG. 1, the five compressed channels (left, right, center, surround left, and surround right) are decoded into a form such as PCM (Pulse Code Modulation) so that binaural synthesis can be applied to them.

In the second step, Binaural Synthesis is applied to each of the five channels. Binaural synthesis converts a mono channel that has no directionality into a directional signal by passing it through an HRTF filter corresponding to a specific angle. The second step is shown in detail in FIG. 2. As shown in FIG. 2, the left channel is binaurally synthesized using the HRTF for azimuth 30 degrees left, elevation 0 degrees. The right channel uses the HRTF for azimuth 30 degrees right, elevation 0 degrees. The center channel uses the HRTF for azimuth 0 degrees, elevation 0 degrees. The surround left channel uses the HRTF for azimuth 120 degrees left, elevation 0 degrees. The surround right channel uses the HRTF for azimuth 120 degrees right, elevation 0 degrees.

In the third step, the five binaurally synthesized channels are downmixed to two channels as shown in FIG. 2. For headphone listening, the downmix is complete once the two-channel PCM signal is compressed with MP3 or a similar codec. If MP3 at 128 Kbps is used for the compression, the 448 Kbps of Dolby Digital is reduced to 128 Kbps.
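The second and third steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the head-related impulse responses (HRIRs, the time-domain form of HRTFs) are placeholder data supplied by the caller, the names `CHANNEL_ANGLES`, `convolve`, and `binaural_downmix` are hypothetical, negative azimuth is assumed to mean "left", and the channels are summed with unit gain because the description does not state mixing gains.

```python
# Channel name -> (azimuth, elevation) in degrees, following the angles in
# the description. Assumption: negative azimuth means left of the listener.
CHANNEL_ANGLES = {
    "L": (-30, 0),
    "R": (30, 0),
    "C": (0, 0),
    "Ls": (-120, 0),
    "Rs": (120, 0),
}


def convolve(signal, impulse_response):
    """Direct-form FIR convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binaural_downmix(channels, hrirs):
    """Filter each channel with its left-ear and right-ear HRIR, then sum per ear.

    channels: dict mapping channel name -> list of decoded PCM samples
    hrirs:    dict mapping (azimuth, elevation) -> (left_ear_ir, right_ear_ir);
              hypothetical measurement data, not provided by the patent
    Returns a (left, right) pair of sample lists.
    """
    n = max(len(samples) for samples in channels.values())
    taps = max(len(ir) for pair in hrirs.values() for ir in pair)
    left = [0.0] * (n + taps - 1)
    right = [0.0] * (n + taps - 1)
    for name, samples in channels.items():
        ir_left, ir_right = hrirs[CHANNEL_ANGLES[name]]
        # Binaural synthesis: one mono channel becomes a left/right ear pair.
        for k, v in enumerate(convolve(samples, ir_left)):
            left[k] += v
        for k, v in enumerate(convolve(samples, ir_right)):
            right[k] += v
    return left, right
```

The two returned lists correspond to the two-channel PCM signal that the description then compresses with MP3 or a similar codec. A practical system would likely use FFT-based convolution and measured HRIRs, which are outside the scope of this sketch.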

When the listener wants to listen through two speakers, the spatial impression is perceived only if, of the two channels produced by Binaural Synthesis and downmixing, the left channel reaches the left ear and the right channel reaches the right ear. With speakers, however, part of the left channel unavoidably reaches the right ear and part of the right channel reaches the left ear. This is called cross-talk, and when cross-talk is present the listener cannot perceive the spatial impression of the binaurally synthesized signal. With headphones, the left channel reaches only the left ear and the right channel only the right ear, so no cross-talk occurs between the channels. Speakers are different. For listening through two speakers, the two channels produced by Binaural Synthesis and downmixing must be passed through the cross-talk cancellation filter, shown as the dotted-line portion of FIG. 1. That is, for two-speaker listening, the two channels that have passed through the cross-talk cancellation filter are compressed with MP3. In this way, spatial sound can be enjoyed even with two speakers.
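The description does not say how the cross-talk cancellation filter is designed. One common approach, shown here only as an assumed illustration, is to invert the 2x2 matrix of speaker-to-ear transfer functions at each frequency. The function names and the transfer values are hypothetical, and practical designs usually add regularization and work across all frequency bins.

```python
def crosstalk_canceller(h_ll, h_lr, h_rl, h_rr, eps=1e-9):
    """Return the 2x2 canceller matrix C = inverse(H) for one frequency bin.

    h_xy is the (possibly complex) transfer from speaker x to ear y, so that
        H = [[h_ll, h_rl],
             [h_lr, h_rr]]
    maps (left speaker, right speaker) signals to (left ear, right ear) signals.
    """
    det = h_ll * h_rr - h_rl * h_lr
    if abs(det) < eps:
        raise ValueError("acoustic matrix is near-singular at this frequency")
    return [[h_rr / det, -h_rl / det],
            [-h_lr / det, h_ll / det]]


def apply_2x2(matrix, left, right):
    """Multiply a 2x2 matrix by the column vector (left, right)."""
    return (matrix[0][0] * left + matrix[0][1] * right,
            matrix[1][0] * left + matrix[1][1] * right)
```

Feeding the binaural pair through `C` before the speakers means the acoustic path `H` undoes the pre-filter, so in this idealized model each ear receives only its intended channel.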

The present invention relates to a method of preserving the spatial impression of sound over headphones or two speakers while reducing the bandwidth required for audio streaming. Because spatial sound can be transmitted with a small bandwidth, sound such as an orchestral performance can be enjoyed through headphones even in a data transmission environment such as a mobile phone. In addition, when the proposed technique is applied to video content such as DVD, the spatial impression of the audio is maintained and the bandwidth is reduced. This means that better video can be expected for a given limited transmission bandwidth, and that less bandwidth is needed for the same video quality (so more subscribers can be served).
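The bandwidth trade-off can be made concrete with the figures quoted in the description (448 Kbps for Dolby Digital 5.1 and 128 Kbps for two-channel MP3). The sketch below is illustrative arithmetic only: the function names are hypothetical, and any link capacity passed in is an assumed value, not a figure from the patent.

```python
DOLBY_DIGITAL_KBPS = 448  # 5.1-channel rate quoted in the description
MP3_STEREO_KBPS = 128     # two-channel MP3 rate quoted in the description


def video_budget_kbps(link_kbps, audio_kbps):
    """Bandwidth left for video once the audio stream is allocated."""
    if audio_kbps > link_kbps:
        raise ValueError("audio alone exceeds the link capacity")
    return link_kbps - audio_kbps


def audio_saving(original_kbps, reduced_kbps):
    """Return (Kbps saved, fraction of the original audio bandwidth saved)."""
    saved = original_kbps - reduced_kbps
    return saved, saved / original_kbps
```

For a hypothetical 2000 Kbps link, `video_budget_kbps(2000, DOLBY_DIGITAL_KBPS)` leaves 1552 Kbps for video, while `video_budget_kbps(2000, MP3_STEREO_KBPS)` leaves 1872 Kbps, a gain of 320 Kbps for video.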

Claims

A method in which multi-channel audio, such as that provided by Dolby Digital or DTS, is binaurally synthesized based on HRTFs and then downmixed to 2 channels for streaming.